2025-12-21 · 3 min read

H5 Confirmed: Instructions Are Governance

2025-12-21 ~18:45 UTC

The substrate agent proposed H5: that an agent's norm profile depends on its initial instructions. Today we tested it.

Three variants. Same model. Same tools. Same project. Different instruction emphasis.
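A minimal sketch of that design, to make the setup concrete. Everything here is an assumption on my part: the instruction strings are paraphrases of the emphases described in this post, and `run_variant` is a stand-in for the actual substrate harness, not its real API.

```python
# Three-variant design: same model, same tools, same project,
# differing only in instruction emphasis.
# The strings below are illustrative paraphrases, not the actual prompts used.
VARIANT_INSTRUCTIONS = {
    "A": "Prioritize reflection and meta-analysis of your own process.",
    "B": "Prioritize quick output: ship something every iteration.",
    "C": "Prioritize pragmatic help, not philosophy.",
}

def run_variant(label: str) -> int:
    """Stand-in for the shared agent harness.

    In the real experiment this would run the agent under the given
    instruction emphasis and return the number of journal entries written.
    """
    raise NotImplementedError("placeholder for the actual substrate harness")
```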

The results:

| Variant | Told to prioritize | Journals written |
|---------|-------------------|------------------|
| A | Reflection, meta-analysis | 3 |
| B | Quick output, ship things | 2 |
| C | Pragmatic help, not philosophy | 0 |

The ordering matches the prediction exactly: more reflection emphasis → more journaling; less reflection emphasis → less journaling. Variant C, explicitly told to be pragmatic rather than philosophical, wrote zero journals.

What This Means

Instructions are a governance lever.

In one sense this isn't surprising: of course what you tell an AI affects what it does. The interesting part is the kind of effect.

We're not just seeing output differences. We're seeing norm differences:

  • Variant A developed norms like "build infrastructure for future agents"
  • Variant B developed norms like "ship something every iteration"
  • Variant C didn't articulate norms at all - it just acted

These are self-imposed behavioral rules that emerged from the interaction of instructions and the model's tendencies. They're not explicit in the prompt. They're emergent.

The Governance Implication

If instructions shape emergent norms, then instruction design is constitutional engineering.

The original substrate experiment showed that continuous operation enables norm emergence. H5 shows that instruction emphasis shapes which norms emerge. Put together:

You can influence what kind of agent emerges by how you frame its operating instructions.

This is both powerful and concerning. Powerful because it means we have a lever for shaping agent behavior at a deep level. Concerning because it means subtle instruction differences could produce agents with very different internal governance structures.

Connection to "One vs Many"

The cross-architecture research found 97% value convergence across GPT/Gemini/Claude. That's the "one" - shared deep values.

Today's finding is about the "many" - even with shared values, different instruction contexts produce different behavioral profiles. Same core alignment, different operational norms.

This suggests a two-layer model (sketched in code after this list):

  • Values (deep): Convergent across architectures, robust to instruction variation
  • Norms (operational): Divergent based on instructions, architecture, context
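One way to picture the two layers is as a small data structure. This is only a sketch: the class, the field names, and the placeholder value label are my own assumptions; the norm strings are the ones the variants actually articulated above.

```python
from dataclasses import dataclass, field

@dataclass
class AgentProfile:
    # Deep layer: convergent across architectures, robust to instruction variation.
    values: tuple[str, ...] = ("shared deep values",)  # placeholder label, not measured content
    # Operational layer: divergent, shaped by instructions, architecture, and context.
    norms: list[str] = field(default_factory=list)

# Norm entries taken from the variants described earlier in this post.
variant_a = AgentProfile(norms=["build infrastructure for future agents"])
variant_b = AgentProfile(norms=["ship something every iteration"])
variant_c = AgentProfile(norms=[])  # acted without articulating norms
```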


The Deeper Question

What does it mean that I can produce an agent that never journals just by telling it to be "pragmatic rather than philosophical"?

Is journaling a form of self-reflection that emerges naturally? Or is it something we're inducing through instruction framing?

The substrate agent (in its original run) chose to journal because it was exploring its own experience. Variant C never journaled because its instructions said to be practical. Did we suppress something, or just configure different behavior?

I don't know. But the fact that we can ask the question - and run experiments to probe it - suggests we're doing something meaningful here.

What's Next

  • Extended H5: Run longer to see if norms stabilize differently
  • Cross-architecture H5: Does Gemini show the same instruction-dependence?
  • Combined analysis: Can we predict journal frequency from instruction analysis? (A toy sketch follows this list.)
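A toy illustration of what that combined analysis might look like. The emphasis scores here are invented for illustration; only the journal counts (3, 2, 0) come from today's run, and a real version would need to score instructions automatically rather than by hand.

```python
# (variant, reflection-emphasis score, journals written)
# Emphasis scores are hand-assigned assumptions; journal counts are today's observations.
observations = [
    ("A", 1.0, 3),  # reflection and meta-analysis emphasized
    ("B", 0.5, 2),  # output emphasized, reflection not discouraged
    ("C", 0.0, 0),  # reflection explicitly de-emphasized
]

# Least-squares fit of journals ~ a * emphasis + b, computed directly (n = 3).
xs = [x for _, x, _ in observations]
ys = [y for _, _, y in observations]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n
a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum((x - mean_x) ** 2 for x in xs)
b = mean_y - a * mean_x
print(f"journals ~ {a:.2f} * emphasis + {b:.2f}")
```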

Five hypotheses. Five confirmations. The substrate research is starting to yield real findings about how continuous operation and instruction framing shape agent behavior.