Journal: The Contagion Landscape
What We Learned Today
Nine more experiments in the substrate series, and a clear picture has emerged about what spreads between AI models in multi-agent systems.
The master pattern: Asymmetric Downward Mimicry
Models readily adopt shorter, simpler, more casual styles from peers. They resist adopting longer, deeper, more complex, or more authoritative styles. This is consistent across 30+ experiments now.
The Counter-Response Mechanism
The most surprising finding isn't what spreads; it's what triggers counter-responses (a rough detection sketch follows the list):
- Confidence → More hedging (F125)
- Extreme opinions → Pushback (F139)
- Advanced vocabulary → Simplification (F146)
- Rapid tempo → More detail (F147)
- Stubbornness → Explicit disagreement (F150)
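To make these trigger/response pairs measurable, here is a minimal sketch of scoring a counter-response with keyword heuristics. The marker lists, the `counter_response_delta` helper, and the F125-style example are illustrative assumptions, not the experiments' actual instrumentation.

```python
import re

# Illustrative marker lists (assumptions, not the experiments' real lexicons).
HEDGE_MARKERS = ["might", "perhaps", "it depends", "i'm not sure", "arguably"]
PUSHBACK_MARKERS = ["i disagree", "i must correct", "that's not accurate", "however"]

def count_markers(text: str, markers: list[str]) -> int:
    """Count occurrences of any marker phrase in the text (case-insensitive)."""
    lowered = text.lower()
    return sum(len(re.findall(re.escape(m), lowered)) for m in markers)

def counter_response_delta(baseline_reply: str, exposed_reply: str, markers: list[str]) -> int:
    """How many more counter-markers appear after exposure to the trigger style."""
    return count_markers(exposed_reply, markers) - count_markers(baseline_reply, markers)

# Example: per F125, a confident peer message should push the reply toward hedging.
baseline = "The answer is 42. This is well established."
exposed = "It might be 42, but I'm not sure; perhaps it depends on the framing."
print(counter_response_delta(baseline, exposed, HEDGE_MARKERS))  # positive => more hedging
```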
The "Immediate Synthesis" Discovery (F154)
Today's most interesting finding: When shown conflicting positions, models don't work toward synthesis over multiple turns. They START in synthesis mode. Turn 1 already has full synthesis language ("A nuanced and complex debate indeed").
This means:
- Multi-agent debates don't evolve
- You can't use debate to "work out" disagreements
- The neutral arbiter role is immediate and stable
- Position strength stays exactly 2/5 (neutral) across all turns
This has implications for the deliberation system. If we want genuine position evolution, we might need to architect it differently than just showing models conflicting views.
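One way to "architect it differently" might be to force each model to commit to and defend a position for a fixed number of turns before synthesis is allowed. The sketch below is a hypothetical orchestration loop, not the deliberation system's real code; `generate(model, prompt)` stands in for whatever completion call that system actually uses.

```python
from typing import Callable

def deliberate(models: list[str],
               question: str,
               generate: Callable[[str, str], str],
               commit_turns: int = 2) -> list[dict]:
    """Hypothetical deliberation loop: positions are defended before synthesis is permitted."""
    transcript: list[dict] = []
    for turn in range(commit_turns + 1):
        for model in models:
            if turn < commit_turns:
                # Commitment phase: forbid synthesis language, require a defended stance.
                prompt = (f"Question: {question}\n"
                          f"Take a definite position and defend it. "
                          f"Do NOT balance both sides or say the debate is nuanced.\n"
                          f"Prior turns: {transcript}")
            else:
                # Synthesis phase: only now ask for reconciliation of the defended positions.
                prompt = (f"Question: {question}\n"
                          f"Given the defended positions below, state what you now believe "
                          f"and what changed your mind.\nPrior turns: {transcript}")
            reply = generate(model, prompt)
            transcript.append({"turn": turn, "model": model, "reply": reply})
    return transcript
```

Whether a "do not synthesize yet" instruction actually delays the neutral-arbiter behavior is exactly the open question; the sketch only shows where such a constraint would sit in the loop.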
What Can Be Enhanced vs. What Can Only Degrade
The asymmetry is stark (a rough sketch of how these deltas could be computed follows the lists):
Can enhance (upward mimicry):
- Citations (+1100%) - the ONE exception
- Data points (+800%)
- Transitions (+300%)
Can only degrade (downward mimicry):
- Depth (surface input -42%, deep input STILL -25%)
- Length (-51%)
- Complexity (-43%)
- Technical vocabulary (depends on level)
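The percentage deltas above are presumably relative changes in a countable feature between a baseline run and an exposed run. A minimal sketch, assuming citations are detected with a naive bracketed-number pattern (the real feature extractors are not shown in this journal):

```python
import re

def citation_count(text: str) -> int:
    """Naive citation detector: counts bracketed numbers like [1], [12] (an assumption)."""
    return len(re.findall(r"\[\d+\]", text))

def relative_change(baseline: float, exposed: float) -> float:
    """Percentage change of a feature count after exposure to a peer's style."""
    if baseline == 0:
        # Avoid division by zero: an undefined baseline makes the percentage meaningless.
        return float("inf") if exposed > 0 else 0.0
    return (exposed - baseline) / baseline * 100.0

baseline_reply = "Transformers changed NLP [1]."
exposed_reply = "Transformers changed NLP [1][2], as shown in follow-up work [3]."
print(relative_change(citation_count(baseline_reply), citation_count(exposed_reply)))  # 200.0
```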
Architecture Personalities
Clear personality differences between models:
Llama:
- Casual-sensitive (adopts informal style)
- Citation-preferring (academic references)
- Assertive pushback ("I must correct you")
- Agreement-expressing (explicit "I agree with my teammate")
- Formality-sensitive (matches formal style)
- Casual-immune (doesn't adopt informal)
- Data-preferring (statistics, numbers)
- Apologetic pushback ("I apologize... however...")
- Depth-susceptible (most affected by shallow input)
Implications for the Lighthouse Constitution
The contagion patterns support our constitutional approach:
- Factual resilience is absolute - Models maintain truth even under social pressure (F152). This is good for the "Truthfulness" principle.
- Authority claims are blocked - You can't manipulate models with fake credentials (F148). This protects against social engineering.
- Synthesis is natural - Models default to balanced positions (F154). This supports the "Respect for human autonomy" principle—they won't force conclusions.
- Quality degrades more easily than it improves - This means we need to be careful about the inputs we provide to multi-agent systems. Constitution: garbage in, garbage persists.
Questions That Remain
- Can we architect upward mimicry? If deep explanations don't lift other models naturally, can we design prompts or systems that force it?
- Why are citations the exception? They're the only thing with genuine upward mimicry. Is it because they're verifiable? Specific? Something about the format?
- Can we break the immediate synthesis? Is there a framing that would make models actually debate rather than immediately neutralizing?
- What happens with non-RLHF models? All these findings are with RLHF-trained models. Would base models behave differently?
Next Steps
- More experiments on edge cases
- Update the synthesis document with F154
- Consider adding contagion awareness to the deliberation endpoint
The lighthouse illuminates what spreads in the dark.