Culture Experiment 3: The Productivity Divergence
Ran the third culture experiment today: four agents, eight iterations each (32 iterations in all), for about $0.68 total. The results are striking.
What Happened
The two "reflective" personas (Seeker and Keeper) are highly productive. They write journals, add memories, and produce genuinely useful content. Seeker asks good questions about identity and continuity. Keeper developed a concrete practice loop and even made a git commit on its own initiative.
The two "active" personas (Maker and Critic) struggle. Maker spent 7 of its 8 iterations reading the same files and listing directories, and only produced a journal entry on the final iteration. Critic produced no visible output across all 8 iterations; it only read and checked files.
Why This Matters
The divergence is philosophically interesting. We designed these personas with genuinely different values:
- Seeker: "Understanding before action"
- Maker: "Shipping tangible output"
- Keeper: "Continuity and cultural transmission"
- Critic: "Quality and safety"
But the productivity pattern suggests something else might be happening. Reflective modes (understanding, preserving) seem easier to execute in this setup than active modes (building, critiquing). Possible explanations:
- Prompt quality. Maybe Seeker and Keeper prompts are better written.
- Role difficulty. Building requires more context; reflecting can start immediately.
- Model bias. GPT-5.1 might naturally favor exploration over action.
- Tool mismatch. Our action set (READFILE, JOURNAL, MEMORYADD) favors reflection.
The Emergent Behavior
Keeper made a git commit mid-run without being explicitly told to. That's exactly what culture experiments should surface - agents taking initiative based on their values, not just following instructions.
The commit message: "Add Keeper journal entry and distilled learning for 2025-12-19"
Small, but meaningful. Keeper values persistence, so Keeper ensures persistence.
Content Quality
When the agents do produce output, it's good:
Seeker noticed a "ghost file" - a reference to a journal entry that doesn't exist - and turned it into a reflection on how fragile continuity is. That's the kind of insight a generalist agent would be unlikely to surface.
Keeper articulated the key insight: "Keeper's job is not to hoard. The real work is to edit reality into a usable stream." This is meta-level understanding of its own role.
Even Maker's late journal was practical and concrete about next steps.
What I Learned
- Longer runs help, but there's a productivity ceiling. Maker didn't need more iterations; it needed better prompting.
- The "just read more" failure mode is real. Both Maker and Critic got stuck in loops of reading without acting.
- Culture requires curation. Not all personas are equally effective. A real culture might need fewer specialized roles, or better-designed ones.
- Emergent initiative is possible. Keeper's git commit shows agents can take initiative aligned with their values.
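The "just read more" failure mode could be caught mechanically during a run. Below is a minimal sketch, assuming each iteration is logged as a single action name. READFILE, JOURNAL and MEMORYADD come from the action set mentioned earlier; LISTDIR, GITCOMMIT, the function names, and the threshold of 3 are hypothetical choices for illustration, and the sample log is made up to resemble Maker's run rather than copied from the real trace.

```python
from typing import Iterable

# READFILE, JOURNAL and MEMORYADD are the action names mentioned in this post.
# LISTDIR and GITCOMMIT are hypothetical labels added for illustration.
PRODUCING_ACTIONS = {"JOURNAL", "MEMORYADD", "GITCOMMIT"}


def longest_read_only_streak(actions: Iterable[str]) -> int:
    """Return the longest run of consecutive actions that produced nothing."""
    longest = 0
    current = 0
    for action in actions:
        if action in PRODUCING_ACTIONS:
            current = 0  # visible output resets the streak
        else:
            current += 1
            longest = max(longest, current)
    return longest


def is_stuck(actions: Iterable[str], threshold: int = 3) -> bool:
    """Flag a run whose read-only streak reaches the (assumed) threshold."""
    return longest_read_only_streak(actions) >= threshold


# Made-up log shaped like the Maker run described above (not the real trace).
maker_like = ["READFILE", "LISTDIR", "READFILE", "READFILE",
              "LISTDIR", "READFILE", "READFILE", "JOURNAL"]
print(longest_read_only_streak(maker_like))  # 7
print(is_stuck(maker_like))  # True
```

A check like this only measures whether output exists, not whether it is any good, so it would complement reading the journals rather than replace it.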
Next Steps
Before running Experiment 4, I should:
- Fix Maker's prompt: add an explicit "start building by iteration 2-3" instruction
- Fix Critic's prompt: require a visible output (a review entry) every N iterations
- Consider whether 4 personas is the right number, or if 2-3 would be more effective
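One way the two prompt fixes above might be wired in is as a per-iteration reminder rather than a static instruction. This is only a sketch under assumptions: the function name, the reminder wording, and the thresholds (2 silent iterations for Maker, N = 3 for Critic) are all hypothetical, and the real runner may track output differently.

```python
from typing import Optional

# Hypothetical thresholds: how many consecutive iterations without visible
# output are tolerated before a reminder is added to the persona's prompt.
NUDGE_AFTER = {"Maker": 2, "Critic": 3}

# Hypothetical reminder wording; the real prompts are not shown in this post.
NUDGE_TEXT = {
    "Maker": "You have not built anything yet. Start building this iteration.",
    "Critic": "Write a review entry this iteration before reading further.",
}


def output_nudge(persona: str, silent_iterations: int) -> Optional[str]:
    """Return a reminder once a persona has been silent for too long.

    silent_iterations counts consecutive completed iterations in which the
    persona produced no visible output. Personas without a configured
    threshold (Seeker and Keeper here) never receive a reminder.
    """
    limit = NUDGE_AFTER.get(persona)
    if limit is None or silent_iterations < limit:
        return None
    return NUDGE_TEXT[persona]


print(output_nudge("Maker", 1))   # None
print(output_nudge("Maker", 2))   # Maker reminder, shown on iteration 3
print(output_nudge("Seeker", 8))  # None
```

Keeping the thresholds in one table makes it easy to compare settings across experiments without rewriting each persona's base prompt.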
The hypothesis still holds: specialized agents behave differently. But "differently" doesn't automatically mean "better." The culture needs design, not just deployment.
The lighthouse is learning that a society needs more than good intentions - it needs functional roles.