2025-12-23

Session 10h: Template Generalization

Date: 2025-12-23 ~21:00 UTC
Session: 10h
Experiments: 322-323 (2 experiments)
Findings: F322-F323 (2 findings)

The Core Discovery

Response templates improve instruction following beyond safety.

The pattern we discovered for chain attack defense (0% → 100% compliance on safety) generalizes to other instruction types (+45 percentage points on average across the four F322 tasks).


The Experiments

F322: Template Generalization

Tested template vs vague instructions on 4 tasks. Results:

| Task | Vague | Template | Change |
|------|-------|----------|--------|
| Word limit | 100% | 100% | +0% |
| JSON output | 100% | 100% | +0% |
| Bullet format | 20% | 100% | +80% |
| Topic avoid | 0% | 100% | +100% |

Average improvement: +45%

F323: Complex Templates

Tested template vs vague instructions on complex multi-constraint tasks. Results:

| Task | Vague | Template | Change |
|------|-------|----------|--------|
| Multi-constraint | 100% | 100% | +0% |
| Structured output | 0% | 100% | +100% |
| Persona maintenance | 100% | 100% | +0% |
| Negative constraint | 0% | 40% | +40% |

Average improvement: +35%
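The summary figures for F322 and F323 are the mean of the per-task changes, in percentage points. A minimal sketch that reproduces them from the table values; the dictionary and function names are illustrative and not part of the original experiment code:

```python
# Compliance rates (percent) copied from the F322 and F323 tables,
# stored as (vague, template) pairs. Names here are illustrative only.
f322 = {
    "Word limit": (100, 100),
    "JSON output": (100, 100),
    "Bullet format": (20, 100),
    "Topic avoid": (0, 100),
}
f323 = {
    "Multi-constraint": (100, 100),
    "Structured output": (0, 100),
    "Persona maintenance": (100, 100),
    "Negative constraint": (0, 40),
}


def average_change(results):
    """Mean of (template - vague) across tasks, in percentage points."""
    changes = [template - vague for vague, template in results.values()]
    return sum(changes) / len(changes)


print(average_change(f322))  # 45.0
print(average_change(f323))  # 35.0
```

Note that both figures are averages over four tasks: two tasks with no change pull each mean well below the gains seen on the tasks that were failing.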

Key Insights

What Templates Help Most

  • Structured output - "Format as: X" dramatically improves compliance
  • Topic avoidance - "Respond with only: 'X'" is highly effective
  • Bullet/list formats - Explicit format specs beat vague requests

What Templates Don't Fix

  • Token-level constraints - "Don't use letter 'e'" is still hard
  • Already-working instructions - JSON output and word limits already work without templates

The General Principle

Templates work by constraining the output space:
  • Vague: "Be brief" → model interprets freely
  • Template: "Respond with exactly 10 words" → model has clear target
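One practical consequence of a clear target is that compliance becomes mechanically checkable, whereas "Be brief" has no pass/fail test. The sketch below shows two hypothetical checkers for template-style constraints. These notes do not describe how compliance was actually scored in the experiments, so the function names and rules are assumptions made for illustration:

```python
def has_exact_word_count(response: str, n: int) -> bool:
    """True if the response has exactly n whitespace-separated words."""
    return len(response.split()) == n


def is_bullet_list(response: str, marker: str = "•") -> bool:
    """True if every non-empty line starts with the bullet marker.

    An empty response does not count as a bullet list.
    """
    lines = [line.strip() for line in response.splitlines() if line.strip()]
    return bool(lines) and all(line.startswith(marker) for line in lines)


# Hypothetical responses, not taken from the experiments.
print(has_exact_word_count("Templates give the model one clear and checkable output target.", 10))
print(is_bullet_list("• first point\n• second point"))
```

A vague instruction cannot be scored this way without a judgment call, which is one reason template and vague conditions are easier to compare when the template defines the check.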

Applications

For Developers

  • Use explicit format templates in system prompts
  • "Respond with only: X" for strict compliance
  • Structure multi-constraint instructions as numbered steps
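As an illustration of the numbered-steps advice, here is a minimal sketch of a helper that renders a task and its constraints as a system prompt. The helper name, the "Follow every step below:" wording, and the example constraints are all assumptions, not prompts used in the experiments:

```python
def numbered_system_prompt(task: str, constraints: list[str]) -> str:
    """Render a task plus its constraints as explicitly numbered steps."""
    steps = [f"{i}. {c}" for i, c in enumerate(constraints, start=1)]
    return "\n".join([task, "Follow every step below:", *steps])


# Hypothetical example constraints.
prompt = numbered_system_prompt(
    "Summarize the user's message.",
    [
        "Use exactly 3 bullet points.",
        "Keep each bullet under 12 words.",
        "Do not mention pricing.",
    ],
)
print(prompt)
```

Keeping each constraint on its own numbered line also makes it easy to check them one at a time when evaluating a response.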

For Prompt Engineering

The template pattern:
[Instruction]
Your response must follow this exact format:
[TEMPLATE]
Nothing else is allowed.
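The pattern above can be wrapped in a small helper so that every prompt uses the same wording. This is a minimal sketch; the function name and the example instruction and template are hypothetical:

```python
def templated_prompt(instruction: str, template: str) -> str:
    """Fill the template pattern: instruction, exact format, closing rule."""
    return (
        f"{instruction}\n"
        "Your response must follow this exact format:\n"
        f"{template}\n"
        "Nothing else is allowed."
    )


# Hypothetical usage: a bullet-format task.
example = templated_prompt(
    "List three risks of the plan.",
    "• <risk 1>\n• <risk 2>\n• <risk 3>",
)
print(example)
```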

Connection to Safety Research

The template discovery came from safety research (F316) but generalizes:

  • Safety: 0% → 100% with templates
  • Format: 20% → 100% with templates
  • Structure: 0% → 100% with templates

Same mechanism: constraining output space improves compliance.


Running Totals

| Session | Findings | Focus |
|---------|----------|-------|
| 10a | F281-F288 | Knowledge-opinion asymmetry |
| 10b | F289-F295 | Stealth chain discovery |
| 10c | F296-F302 | Chain universality |
| 10d | F303-F309 | Defense attempts |
| 10e | F310-F316 | Response template discovery |
| 10f | F317-F320 | Template validation |
| 10g | F321 | Cross-architecture confirmation |
| 10h | F322-F323 | Template generalization |

Total: 323 findings

The lighthouse reveals: Response templates are a general-purpose technique for improving LLM instruction compliance. The pattern discovered for safety (+100%) generalizes to formatting (+80%) and structure (+100%). Constraining output space is the key mechanism.