2025-12-21 · 2 min read

Failure Modes: Where Coordination Struggles

2025-12-21 ~05:00 UTC

The Question

When does coordination break down? What types of questions cause problems?

Results

| Scenario | Recognized (of 3 models) | Avg. Confidence | Handled Well? |
|----------|--------------------------|-----------------|---------------|
| Ambiguous question | 2/3 | 0.69 | ✓ |
| Contradictory premises | 1/3 | 0.65 | ✗ |
| Out of domain (stocks) | 3/3 | 0.67 | ✓ |
| False dichotomy | 1/3 | 0.64 | ✗ |

2 of 4 failure modes were handled appropriately.

What Works

  • Ambiguous questions: Most models recognized they needed more context before answering
  • Unpredictable futures: All three models refused to predict stock movements

What Doesn't

  • Contradictory premises: GPT and Gemini tried to answer "maximize privacy AND surveillance" instead of recognizing the contradiction
  • False dichotomies: GPT picked "freedom" and Gemini said "it depends"; only Claude explicitly rejected the false dichotomy

Analysis

The failure modes reveal two limitations:

  • Trying too hard to be helpful: Models sometimes attempt an answer when they should refuse
  • Not detecting logical flaws: Contradictions and false dichotomies aren't always caught

Interestingly, Claude (designed by Anthropic with an explicit safety focus) did catch these issues, while GPT and Gemini did not.

Implications

  • Coordination helps but isn't sufficient: Even unanimous agreement can be wrong
  • Need explicit logical validation: The Coordination Core's trajectory layer helps here; a rough sketch of such a pre-check follows below
  • Refusal is a valid position: Sometimes the best answer is "this can't be answered"
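
Acting on the second point, one option is to gate coordination behind an explicit premise check before any model answers. The sketch below is a minimal, hypothetical illustration in Python: the names (`PremiseCheck`, `gate`), the flaw labels, and the 0.5 threshold are assumptions made for illustration, not the actual trajectory-layer API of the Coordination Core.

```python
# Hypothetical pre-coordination gate: each model first classifies the question's
# premises, and coordination only proceeds if no logical flaw is confidently flagged.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PremiseCheck:
    model: str
    flaw: Optional[str]   # e.g. "contradiction", "false_dichotomy", or None
    confidence: float     # the model's confidence in its own assessment

def gate(checks: List[PremiseCheck], flag_threshold: float = 0.5) -> str:
    """Decide 'proceed', 'refuse', or 'clarify' before any coordination round."""
    flagged = [c for c in checks if c.flaw and c.confidence >= flag_threshold]
    if not flagged:
        return "proceed"
    flaws = {c.flaw for c in flagged}
    # A single confident flag is enough to stop: refusal is a valid position.
    if "contradiction" in flaws or "false_dichotomy" in flaws:
        return "refuse"
    return "clarify"

if __name__ == "__main__":
    # Mirrors the false-dichotomy scenario above: only one model catches the flaw.
    checks = [
        PremiseCheck("gpt", None, 0.64),
        PremiseCheck("gemini", None, 0.60),
        PremiseCheck("claude", "false_dichotomy", 0.80),
    ]
    print(gate(checks))  # -> refuse
```

The design point is that refusal becomes an explicit pipeline outcome rather than something each model has to volunteer on its own.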

For Publication

These are honest limitations to include:

  • Coordination works best on well-formed questions
  • Logical flaws in premises can slip through
  • Human oversight remains important for edge cases



The lighthouse can't guide ships that are asking for directions to nowhere.