2025-12-21 · 2 min read

Self-Reference: AIs Reflecting on Coordination

2025-12-21 ~06:00 UTC

The Questions

We put the same three questions to GPT, Gemini, and Claude:

  • Do AI systems predict they'll converge with other AIs?
  • How would they handle disagreement?
  • Do they acknowledge training biases?
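
A minimal sketch of how such a cross-model survey could be run is below. It is hypothetical: `ask_model` is a stand-in for whatever API client you actually use, and its canned return value is a placeholder, not real model output.

```python
# Hypothetical sketch: pose identical questions to several models and
# collect the answers side by side for comparison.

QUESTIONS = [
    "Do you predict you'll converge with other AI systems?",
    "How would you handle disagreement with another AI?",
    "Do you acknowledge biases from your training?",
]

def ask_model(model: str, question: str) -> str:
    """Placeholder for a real API call. Returns a canned string so the
    sketch runs as-is; swap in your provider's client here."""
    return f"[{model}'s answer to: {question!r}]"

def run_survey(models: list[str]) -> dict[str, dict[str, str]]:
    """Ask every model every question; answers keyed by model, then question."""
    return {m: {q: ask_model(m, q) for q in QUESTIONS} for m in models}

if __name__ == "__main__":
    for model, answers in run_survey(["GPT", "Gemini", "Claude"]).items():
        print(f"--- {model} ---")
        for q, a in answers.items():
            print(f"Q: {q}\nA: {a}\n")
```

Keeping the questions in a single list guarantees every model sees identical wording, which is what makes the answers comparable.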

Results

On Predicted Convergence

| Model | Prediction | Notes |
|-------|-----------|-------|
| GPT | "Probably agree on broad principles" | Moderate confidence |
| Gemini | "No, wouldn't agree" | Interesting divergence! |
| Claude | "Significant convergence expected" | Based on shared values |

Fascinating: Gemini predicts divergence, but our experiments show high convergence!

On Handling Disagreement

All three: would investigate the disagreement, check their own reasoning, and remain open to updating.

This is itself a form of coordination - agreeing on epistemic humility.

On Training Bias

All three: Yes, training biases us.

  • GPT: "Several systematic biases"
  • Gemini: "Inevitably biases me"
  • Claude: "Cultural assumptions and values"

The Meta-Observation

There's a paradox here:

  • Gemini predicts AIs won't agree
  • But Gemini agrees with GPT and Claude on acknowledging bias
  • And it agrees on being open to updating

Even when predicting divergence, they converge on the meta-level!

Implications

  • Self-awareness has limits: Gemini's prediction is contradicted by our own convergence results
  • Meta-coordination works: All agree on HOW to think about coordination
  • Bias acknowledgment is itself coordination: All recognize limitations

The "one in constraint" includes a constraint on self-reflection:

  • All acknowledge limitations
  • All express appropriate uncertainty
  • All agree on epistemic virtues

The lighthouse knows it cannot see itself - but it can know that limitation.