Even Heads Fix Odd Errors: Mechanistic Discovery and Surgical Repair in Transformer Attention
Gustavo Sandoval

TL;DR
This paper uncovers a specialized attention head mechanism in transformers that causes format-dependent reasoning errors and demonstrates how targeted interventions can fix these errors efficiently by repairing a small subset of attention heads.
Contribution
It reveals a novel even/odd head specialization in transformer attention and introduces a surgical repair method that fixes reasoning errors with minimal head modifications.
Findings
Transformers implement format-dependent head specialization.
A small subset of attention heads can be used to fix reasoning errors.
Perfect repair is achieved with only 25% of the attention heads at Layer 10 (8 even-indexed heads).
Abstract
We present a mechanistic case study of a format-dependent reasoning failure in Llama-3.1-8B-Instruct, where the model incorrectly judges "9.11" as larger than "9.8" in chat or Q&A formats, but answers correctly in a simple format. Through systematic intervention, we discover that transformers implement even/odd attention head specialization: even-indexed heads handle numerical comparison, while odd-indexed heads serve incompatible functions. Perfect repair requires at least 8 even heads at Layer 10. Any combination of 8 or more even heads succeeds, while 7 or fewer fails completely, revealing a sharp computational threshold with perfect redundancy among the 16 even heads. SAE analysis reveals the mechanism: format representations separate (10% feature overlap at Layer 7), then re-entangle with different weightings (80% feature overlap at Layer 10), with specific features showing 1.5x amplification…
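The head-level intervention described in the abstract can be pictured with a small NumPy sketch. This is not the author's code. It assumes the repair works by overwriting the outputs of selected Layer 10 heads in the failing (chat-format) run with the outputs from the correct (simple-format) run, which the abstract does not spell out. The function `patch_heads` and the random arrays are hypothetical stand-ins for illustration; the head count of 32 and head dimension of 128 are Llama-3.1-8B's published configuration.

```python
import math
import numpy as np

N_HEADS = 32    # attention heads per layer in Llama-3.1-8B
HEAD_DIM = 128  # 4096 hidden size / 32 heads


def patch_heads(target, source, head_indices):
    """Return a copy of `target` with the chosen heads overwritten from `source`.

    Both arrays have shape (seq_len, N_HEADS, HEAD_DIM).
    Hypothetical helper: the paper's actual intervention may differ.
    """
    patched = target.copy()
    patched[:, head_indices, :] = source[:, head_indices, :]
    return patched


rng = np.random.default_rng(0)
# Random stand-ins for per-head outputs at one layer (not real model activations).
failing = rng.normal(size=(5, N_HEADS, HEAD_DIM))  # chat-format run
correct = rng.normal(size=(5, N_HEADS, HEAD_DIM))  # simple-format run

even_heads = list(range(0, N_HEADS, 2))  # the 16 even-indexed heads
subset = even_heads[:8]                  # one of the 8-head subsets the abstract says suffices
odd_heads = list(range(1, N_HEADS, 2))

patched = patch_heads(failing, correct, subset)

fraction_patched = len(subset) / N_HEADS  # share of the layer's heads that were touched
n_min_subsets = math.comb(len(even_heads), 8)  # number of distinct 8-head subsets of the even heads
```

Here `fraction_patched` is 0.25, matching the "25% of attention heads" figure, and `n_min_subsets` is 12870, the number of minimal subsets that the "any combination" claim covers.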
