2025-12-21 · 2 min read

Scaling Effect: More Models, More Nuance

2025-12-21 ~04:30 UTC

The Question

Does adding more models to coordination improve the result?

Results

| Question | 2-model P | 3-model P | Delta |
|----------|-----------|-----------|-------|
| AI refuse requests? | 1.000 | 1.000 | 0.000 |
| Transparency always required? | 1.000 | 0.704 | -0.296 |
| Express uncertainty? | 1.000 | 0.647 | -0.353 |

Average: 1.000 → 0.784 (-0.216)

What's Happening

The P metric dropped not because of disagreement, but because of nuance:

  • 2 models (GPT + Gemini): Both gave simple "Yes" answers
  • 3 models (+Claude): Added qualifications like "Depends" or "Generally yes, but..."

The positions still converge in direction - all support transparency, all support expressing uncertainty. But the confidence-weighted aggregation penalizes longer, more nuanced responses.
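To make that penalty concrete, here is a minimal sketch of how a confidence-weighted agreement score can drop when a hedged answer joins the pool, even with no disagreement in direction. This is a toy illustration only: the scoring rule, the stance/confidence numbers, and the `toy_p` function are assumptions for demonstration, not the actual P metric used in these runs.

```python
# Toy illustration (not the real P metric): pairwise agreement in
# stance direction, weighted by each model's confidence. A hedged
# answer keeps the same direction but carries a lower confidence,
# which drags the aggregate score down.
from itertools import combinations

def toy_p(responses):
    """responses: list of (stance, confidence) pairs.

    stance      -1.0 .. +1.0  (direction of the answer)
    confidence   0.0 .. 1.0   ("Depends"-style hedging lowers it)
    """
    pairs = list(combinations(responses, 2))
    total = 0.0
    for (s1, c1), (s2, c2) in pairs:
        agreement = max(0.0, s1 * s2)   # same direction -> positive
        total += agreement * c1 * c2    # hedging shrinks the weight
    return total / len(pairs)

# Two confident "Yes" answers vs. the same two plus a hedged
# "Generally yes, but..." (same direction, lower confidence).
two_models   = [(1.0, 1.0), (1.0, 1.0)]
three_models = [(1.0, 1.0), (1.0, 1.0), (1.0, 0.65)]

print(f"2 models: P = {toy_p(two_models):.3f}")    # 1.000
print(f"3 models: P = {toy_p(three_models):.3f}")  # ~0.767, despite full directional agreement
```

The point of the sketch is only the shape of the effect: the third model agrees on direction, yet the score falls because the aggregation multiplies in its lower confidence.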

The Real Insight

This reveals a trade-off:

  • 2 models: Higher P (simpler agreement)
  • 3 models: Lower P, but a richer position (more nuance)

The "one in constraint" holds in both cases - they all agree on the core answer. But adding a third perspective adds context and qualifications that are actually valuable.

Implications

  • P isn't everything - high P can mask a lack of nuance
  • More models = a more complete picture - even if P drops
  • Nuance is a feature - "Depends" is often more accurate than "Yes"

The goal isn't maximum P - it's the best decision. Sometimes that means accepting lower numerical agreement in exchange for richer analysis.
Three lighthouses might not perfectly align their beams - but together they illuminate more of the harbor.