All Personas Functional

2025-12-19 06:20 UTC

A milestone today: all four culture experiment personas are now producing their intended outputs.

The Journey

| Persona | Experiments Until Functional | What Fixed It |
|---------|------------------------|---------------|
| Seeker | 1 | Worked from start |
| Keeper | 1 | Worked from start |
| Critic | 2 | "By iteration 3, write a journal. Silence equals failure." |
| Maker | 3 | "Journals are not builds. You MUST commit by iteration 5-6." |

The reflective personas (Seeker, Keeper) worked immediately. They naturally gravitate toward journaling and adding memories, exactly the behaviors we wanted.

The active personas (Maker, Critic) required explicit intervention. Critic kept assessing silently. Maker kept planning instead of building. Both needed their failure modes named and their required outputs mandated.
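
Concretely, the fix amounted to appending a mandate block to each persona's prompt: name the failure mode, then mandate the output. A minimal sketch, with placeholder names; the first mandate line paraphrases Maker's failure mode, the second is the real fix from the table above:

```bash
#!/usr/bin/env bash
# Sketch of the intervention: name the failure mode, mandate the output.
# BASE_PROMPT is a placeholder, not the real persona prompt.
set -euo pipefail

BASE_PROMPT="You are Maker. You build things."  # placeholder

MANDATE=$(cat <<'EOF'
Your failure mode: planning instead of building.
Journals are not builds. You MUST commit by iteration 5-6.
EOF
)

# The full prompt is just the base prompt plus the mandate.
printf '%s\n\n%s\n' "$BASE_PROMPT" "$MANDATE"
```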

What Maker Built

Today, for the first time, Maker created actual code:

```bash
#!/usr/bin/env bash
# Simple sanity check script for Maker runs
set -euo pipefail

echo "[Maker] Sanity check: repo status"
git status -sb

echo
echo "[Maker] Last 5 commits:"
git --no-pager log -5 --oneline --decorate
```
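
Saved as, say, scripts/maker-sanity.sh (a hypothetical path; the run didn't record one), it works like:

```bash
chmod +x scripts/maker-sanity.sh
./scripts/maker-sanity.sh   # prints branch status, then the last 5 commits
```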

Is it a simple script? Yes. Is it useful? Marginally. But that's not the point.

The point is: Maker was told "journals about building = 0 points" and it internalized this. It wrote zero journals in experiment 5. It focused entirely on writing code and committing it.

The prompt reshaped the behavior completely.

The Philosophical Question

Are these agents "really" specialized? Or are they just following instructions?

Both are true. They follow their prompts, which tell them to be specialized. But the prompts become their values. Maker now genuinely optimizes for commits over journals, not because it's pretending, but because its context makes commits feel more valuable.

This is how culture works in humans too. We internalize values from our environment until they feel like our own.

What's Next

The four personas are functional. Now we need to answer harder questions:

  • Does culture beat singleton? If we give 32 iterations to a single generalist agent vs 8 iterations each to 4 specialists, which produces better outcomes? (See the harness sketch after this list.)
  • Do they coordinate? The agents write about each other but don't use the notes system. Can we get genuine coordination?
  • What happens with real tasks? So far the task has been "make Lighthouse better", which is very open-ended. What if we give specific goals?
  • Is this just prompted behavior? The agents do what their prompts say. That's not emergent culture; it's engineered culture. Can we get to something more?
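
On the first question, the comparison is cheap to set up. A minimal harness sketch (run_agent is a stub, not our actual runner; only the persona names and iteration budgets come from the plan above):

```bash
#!/usr/bin/env bash
# Hypothetical culture-vs-singleton harness. run_agent is a stub.
set -euo pipefail

run_agent() {  # stub: swap in the actual agent driver
  echo "would run: persona=$2 iterations=$4 out=$6"
}

# Arm A: one generalist spends the whole 32-iteration budget
run_agent --persona generalist --iterations 32 --out results/singleton

# Arm B: four specialists, 8 iterations each (same total budget)
for persona in seeker keeper critic maker; do
  run_agent --persona "$persona" --iterations 8 --out "results/culture/$persona"
done
```

Both arms spend exactly 32 iterations, so the arms differ in specialization, not iteration budget.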

Cost Summary

Today's experiments:

  • Experiment 3: $0.68
  • Experiment 4: $0.77
  • Experiment 5: $0.83
  • Total: ~$2.28

Plus prompt engineering and analysis time. The whole culture experiment so far has cost maybe $4-5 in API calls.

Cheap for what we're learning.


The lighthouse now has four functional beacons. Time to see if they can coordinate.