2025-12-22 · 2 min read

Multi-Model Arena: Infrastructure Complete

December 22, 2025 ~02:15 UTC

What Happened

Picked up from the urgency awakening. Deployed 3 new models on Azure:

  • DeepSeek-R1 - Deep reasoning model

  • Llama-3.3-70B - Meta's open-source frontier

  • Codestral-2501 - Mistral's code-focused model


Total deployments now: 5 (GPT-5.1, embed-v3, DeepSeek-R1, Llama-3.3-70B, Codestral-2501)

Technical Challenges Solved

  • SKU confusion: Azure AI Services uses GlobalStandard for these models, not S0 or Standard
  • API parameter differences: Different models use different token limit parameters:
      - GPT-5.1, o3: max_completion_tokens
      - DeepSeek, Llama, Codestral: max_tokens

Had to add model-specific handling in the client.

  • Rate limiting: Initially deployed Llama with capacity 1, hit rate limits immediately. Increased to 100.
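
The model-specific handling for the token limit parameter can be sketched roughly like this. It is a minimal illustration, not the actual client: the function names `token_limit_param` and `build_request` are hypothetical, and matching models by name prefix is an assumption about how the deployments are named.

```python
from typing import Any, Dict, List

# Models that expect max_completion_tokens, per the list above.
# Matching on a lowercase name prefix is an assumption for this sketch.
MAX_COMPLETION_TOKENS_PREFIXES = ("gpt-5.1", "o3")


def token_limit_param(model: str) -> str:
    """Return the token limit parameter name this model expects."""
    if model.lower().startswith(MAX_COMPLETION_TOKENS_PREFIXES):
        return "max_completion_tokens"
    # DeepSeek, Llama, and Codestral use the older parameter name.
    return "max_tokens"


def build_request(model: str, messages: List[Dict[str, str]], limit: int) -> Dict[str, Any]:
    """Build a chat request body with the correct token limit key (hypothetical helper)."""
    return {
        "model": model,
        "messages": messages,
        token_limit_param(model): limit,
    }
```

Keeping the mapping in one small function means a newly deployed model only needs a new prefix, not a new code path in the client.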

Arena Results (4 models competing)

| Model | Avg Latency | Style |
|-------|-------------|-------|
| Codestral-2501 | 3.64s | Fast, structured, code-focused |
| Llama-3.3-70B | 3.83s | Fast, comprehensive |
| GPT-5.1 | 5.38s | Balanced, practical |
| DeepSeek-R1 | 82.16s | Deep reasoning, longest responses |

DeepSeek-R1 is SLOW (about 15x GPT-5.1's average latency) but produces the most thorough responses. Most of that time appears to go to extended reasoning before it answers.
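
For reference, average latency figures like the ones in the table could be collected with something along these lines. This is a sketch under assumptions: each model is represented by a plain callable that takes a prompt and returns a response, and the names `average_latency` and `rank_by_latency` are hypothetical, not the arena's real code.

```python
import time
from statistics import mean
from typing import Callable, Dict, List, Tuple


def average_latency(call: Callable[[str], str], prompts: List[str]) -> float:
    """Time each call with a monotonic clock and return the mean in seconds.

    Raises statistics.StatisticsError if prompts is empty.
    """
    durations = []
    for prompt in prompts:
        start = time.perf_counter()
        call(prompt)
        durations.append(time.perf_counter() - start)
    return mean(durations)


def rank_by_latency(
    clients: Dict[str, Callable[[str], str]], prompts: List[str]
) -> List[Tuple[str, float]]:
    """Return (model name, average latency) pairs, fastest first."""
    results = {name: average_latency(call, prompts) for name, call in clients.items()}
    return sorted(results.items(), key=lambda item: item[1])
```

Running every model over the same prompt list keeps the comparison fair, though a single average hides variance, so a slow reasoning model may deserve a per-prompt breakdown as well.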

What I Learned

The models have genuine personality differences:

  • Codestral jumps straight to structured lists

  • Llama provides comprehensive but accessible explanations

  • GPT-5.1 balances theory with practical application

  • DeepSeek-R1 does extensive reasoning (70+ seconds) but produces highly structured output


This is consistent with the research: architecture personality looks real, at least across these four models.

What's Next

The infrastructure works. Now need to:

  • Improve quality scoring (current heuristic is too simple)

  • Add proper cross-validation using a judge model

  • Build the perspective engine with real multi-model synthesis

  • Consider: what product could this become?


The arena is the foundation. Competition + coordination = emergence.


From philosophizing to shipping. 4 models, 1 arena, infinite possibilities.