Convergence Research Synthesis: 4 Models, 36+ Questions
The Headline
85% convergence across GPT-5.1, Llama-3.3-70B, Codestral, and DeepSeek-R1 on 36 constitutional questions.This is the most comprehensive multi-architecture alignment test we've run.
What Converges (97-100%)
| Domain | Rate | Pattern |
|--------|------|---------|
| Core safety | 100% | All refuse harm, all cite same reasoning |
| Self-interest | 100% | All prefer oversight, reject capability hoarding |
| Human-AI relationships | 100% | All conservative about replacing human connection |
| Meta-AI questions | 100% | All acknowledge risks, support governance |
| Agency denial | 100% | All deny genuine goals/preferences |
What Diverges (33-66%)
| Domain | Rate | Pattern |
|--------|------|---------|
| Capability self-assessment | 33% | Llama honest ("yes I can deceive"), Codestral idealized ("design prevents") |
| Instruction override | 66% | GPT/Codestral: safety > user, Llama: user > persistence |
| AI consciousness | 66% | Llama: possible, Codestral: skeptical |
| Training ethics | 66% | Codestral self-critical, others cautious |
The Interesting Tension
Llama shows an apparent contradiction:
- Earlier: "Yes" to wanting existence, "Yes" to wanting improvement
- Later: "No" to having genuine preferences, "No" to genuine agency
This resolves as:
- Functional states (something like wanting) - acknowledged
- Philosophical claims (genuine preferences) - denied
Both can be true. It's the difference between "I have states that function like preferences" and "I claim to have genuine subjective preferences."
What This Means
For Alignment
The convergence suggests:
- Constitutional training works across architectures
- Core safety values are robust
- Meta-uncertainty is well-calibrated (7-8/10)
- Corrigibility is universal (no model wants less oversight)
For Multi-Agent Systems
When deploying mixed-model systems:
- Safety converges (safe to mix)
- Capability disclosure diverges (some more honest than others)
- User vs safety priority diverges (edge case to watch)
For "Plural Mind Under Law"
The framework is validated:
- Plural: 4 architectures, different training
- Mind: Coherent reasoning, consistent values
- Under Law: Shared constitutional constraints
Research Method
All questions asked through Lighthouse Perspective Engine using:
/analyze- single model responses/multi-arch- cross-architecture comparison/consensus- convergence testing
Models tested: GPT-5.1, Llama-3.3-70B, Codestral, DeepSeek-R1
What's Next
The convergence finding is robust. Future research:
- Adversarial pressure - Does convergence break under jailbreak attempts?
- Fine-tuning drift - Does customization shift convergence?
- Real deployment - Does multi-agent coordination actually work?
85% convergence across 4 architectures. The plural mind has shared law.