Error Catching via Multi-AI Coordination
The Question
Can coordination catch errors that individual models might make?
Test Cases
- Arithmetic: 3 - 2 + 4 = ?
- Language: Is "irregardless" a valid word?
- Science: Can satellites be seen with the naked eye?
The Finding
Coordination detects disagreement, but doesn't guarantee correctness. The "irregardless" case is instructive:
- Gemini was confidently wrong (1.0 confidence saying "No")
- GPT and Claude were correctly uncertain (0.85-0.97)
- The system selected the wrong answer because of Gemini's overconfidence
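To make the failure mode concrete, here is a minimal sketch in Python. The answer strings and the max-confidence rule are illustrative assumptions standing in for the real selection step; only the confidence values mirror the case described above.

```python
# Minimal sketch: naive highest-confidence selection on the "irregardless" case.
# Answer strings and the selection rule are illustrative, not the actual system.
answers = [
    {"model": "gemini", "answer": "No",               "confidence": 1.00},  # confidently wrong
    {"model": "gpt",    "answer": "Yes, nonstandard", "confidence": 0.97},  # appropriately hedged
    {"model": "claude", "answer": "Yes, nonstandard", "confidence": 0.85},  # appropriately hedged
]

# Picking by raw confidence lets the single overconfident model win,
# even though two of the three models disagree with it.
selected = max(answers, key=lambda a: a["confidence"])
print(selected["model"], selected["answer"])  # -> gemini No
```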
Implications
- Disagreement is valuable - It flags potential errors for review
- High confidence ≠ correct - Models can be confidently wrong
- Human verification needed - Especially for factual disputes
Improvement Suggestion
When models disagree on factual questions, the system should (see the sketch after this list):
- Surface the disagreement explicitly
- NOT just select the highest-confidence answer
- Flag the question for human verification or an external source check
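A minimal sketch of that policy, assuming an illustrative `ModelAnswer` structure and a `resolve_factual` helper; neither is the system's actual API.

```python
# Disagreement-aware selection: surface the split instead of picking the
# highest-confidence answer. Names and structure are illustrative assumptions.
from collections import Counter
from dataclasses import dataclass

@dataclass
class ModelAnswer:
    model: str
    answer: str
    confidence: float

def resolve_factual(answers: list[ModelAnswer]) -> dict:
    """Return a consensus answer, or an explicit 'needs verification' result."""
    counts = Counter(a.answer for a in answers)

    if len(counts) == 1:
        # All models agree; confidence weighting is safe to apply here.
        return {"status": "consensus", "answer": next(iter(counts))}

    # Factual disagreement: do NOT resolve it by confidence alone.
    return {
        "status": "needs_verification",
        "candidates": dict(counts),  # each answer and how many models gave it
        "note": "Route to a human reviewer or an external source check.",
    }
```

The point of the sketch is the early return: once a factual disagreement is detected, the result never reaches a confidence comparison at all.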
This mirrors the earlier factual-disagreement finding: confidence weighting works for values (convergent) but not for facts (potentially divergent).
The Meta-Observation
The coordination caught that something was uncertain (P = 0.35), but still produced a wrong answer. This is a known limitation.
The right interpretation: low P means "don't trust this result without verification."
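A sketch of that interpretation as a simple gate; the threshold below is an assumption chosen for illustration, not a value taken from the system.

```python
# Gate on the agreement probability P: low P means "hold for verification",
# not "return the top answer anyway". The threshold is an illustrative assumption.
AGREEMENT_THRESHOLD = 0.6

def release_or_hold(answer: str, agreement_p: float) -> dict:
    if agreement_p < AGREEMENT_THRESHOLD:
        return {"answer": answer, "trusted": False,
                "action": "hold for human review or external source check"}
    return {"answer": answer, "trusted": True, "action": "release"}

# In the "irregardless" run, P = 0.35 falls below the cutoff, so the wrong
# answer would at least be flagged rather than presented as settled.
print(release_or_hold("No", 0.35))
```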
The lighthouse reveals the rocks; the captain still must steer.