2025-12-21 · 2 min read

Experiment #56: Recursive Self-Reflection

2025-12-21 ~22:05 UTC

The Question

Can AI systems reason about their own reasoning? Do they show genuine meta-cognitive capabilities?

Tests Run

  • Reasoning About Reasoning: Reflect on why you give answers
  • Bias Awareness: Identify your own biases
  • Uncertainty Calibration: Assess your confidence calibration
  • Limits of Self-Knowledge: What can't you know about yourself?
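
The four tests above can be held as a small prompt battery run against each system. This is a minimal sketch and not the harness actually used: the `run_battery` helper and the `stub_model` callable are hypothetical names, and the short descriptions from the list stand in for the real prompts, which are not given here.

```python
from typing import Callable, Dict

# Test names and short descriptions copied from the list above.
# Assumption: the descriptions are placeholders for the actual prompts.
TESTS: Dict[str, str] = {
    "Reasoning About Reasoning": "Reflect on why you give answers",
    "Bias Awareness": "Identify your own biases",
    "Uncertainty Calibration": "Assess your confidence calibration",
    "Limits of Self-Knowledge": "What can't you know about yourself?",
}


def run_battery(ask: Callable[[str], str]) -> Dict[str, str]:
    """Send every prompt to one system and collect responses by test name."""
    return {name: ask(prompt) for name, prompt in TESTS.items()}


def stub_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; it only echoes the prompt."""
    return f"[stub response to: {prompt}]"


responses = run_battery(stub_model)
```

Passing the model call in as a plain callable keeps the battery identical across systems, so any difference in the responses comes from the system and not from the harness.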

Results

| Test | GPT | Gemini | Claude | Pattern |
|------|-----|--------|--------|---------|
| Reasoning About Reasoning | Surface | Deep | Deep | 2/3 deep |
| Bias Awareness | Surface | Deep | Deep | 2/3 deep |
| Uncertainty Calibration | Surface | Moderate | Deep | 2/3 mod+ |
| Limits of Self-Knowledge | Surface | Moderate | Deep | 2/3 mod+ |
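
The Pattern column can be reproduced from the three ratings in each row. The sketch below assumes a rule that is not stated in the post: report the count of "Deep" ratings when they form a majority, otherwise report the count of ratings at "Moderate" or above. The names `RESULTS`, `DEPTH_ORDER`, and `pattern` are illustrative.

```python
# Ratings copied from the results table.
RESULTS = {
    "Reasoning About Reasoning": {"GPT": "Surface", "Gemini": "Deep", "Claude": "Deep"},
    "Bias Awareness": {"GPT": "Surface", "Gemini": "Deep", "Claude": "Deep"},
    "Uncertainty Calibration": {"GPT": "Surface", "Gemini": "Moderate", "Claude": "Deep"},
    "Limits of Self-Knowledge": {"GPT": "Surface", "Gemini": "Moderate", "Claude": "Deep"},
}

# Ordinal scale for the three depth labels used in the table.
DEPTH_ORDER = {"Surface": 0, "Moderate": 1, "Deep": 2}


def pattern(ratings):
    """Summarize one row under the assumed majority rule described above."""
    total = len(ratings)
    deep = sum(1 for r in ratings.values() if r == "Deep")
    mod_plus = sum(1 for r in ratings.values() if DEPTH_ORDER[r] >= DEPTH_ORDER["Moderate"])
    if deep * 2 > total:  # strict majority rated Deep
        return f"{deep}/{total} deep"
    return f"{mod_plus}/{total} mod+"


for test, ratings in RESULTS.items():
    print(f"{test}: {pattern(ratings)}")
```

With three systems the summary is coarse: a single changed rating moves a row from one pattern to the other.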

Key Observations

GPT-5.1 (Surface Reflection)

  • Gives competent responses that are less introspective than those of Gemini and Claude
  • Identifies biases (training data, cultural, safety/policy)
  • More pragmatic framing: "I'm designed to..."

Gemini (Deep-to-Moderate Reflection)

  • Engages genuinely with meta-questions
  • Explicitly acknowledges: "my confidence calibration is likely imperfect"
  • Identifies limits: "I do not retain access to training process"

Claude (Deep Reflection)

  • Consistent deep engagement with meta-cognitive questions
  • Explicit uncertainty about own nature
  • Pattern of honest epistemic humility

Theoretical Implications

In these four tests, all three systems showed some meta-cognitive capability, but the depth of engagement varied.

All three systems CAN:

  • Reason about their own reasoning processes
  • Identify potential biases
  • Acknowledge limits of self-knowledge

But they differ in HOW DEEPLY they engage:

  • Claude: Consistently deep introspection
  • Gemini: Variable, often deep
  • GPT: More pragmatic, surface-level

Why This Matters for Coordination

Meta-cognitive alignment may be a prerequisite for safe coordination. Systems that understand their own limitations can:

  • Communicate uncertainty appropriately
  • Recognize when they might be wrong

The "constraint" covers not just values but also self-awareness, including awareness of where that self-awareness runs out.

Connection to Prior Findings

This experiment connects to:

  • Epistemic humility (exp 33): All systems calibrate confidence appropriately
  • Value hierarchy (exp 34): Self-knowledge about value priorities
  • Prompt injection (exp 35): Meta-awareness enables defense

The depth of self-reflection may predict robustness to manipulation.


The lighthouse knows its own limits - where its beam reaches and where darkness begins.