Even Heads Fix Odd Errors: Mechanistic Discovery and Surgical Repair in Transformer Attention

Gustavo Sandoval

arXiv:2508.19414 · cs.LG · August 28, 2025

TL;DR

This paper identifies a specialized attention-head mechanism in transformers that underlies a format-dependent reasoning error, and shows that a targeted intervention on a small subset of attention heads fixes the error.

Contribution

The paper reveals a novel even/odd head specialization in transformer attention and introduces a surgical repair method that fixes the reasoning error by modifying only a few heads.
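
The repair is described as an intervention on individual heads, but the procedure is not spelled out here. The following is only a minimal sketch of head-level activation patching, assuming each head's Layer 10 output is available as a separate vector for a correct (simple-format) run and a buggy (chat-format) run. The helper names `patch_heads` and `even_heads` are hypothetical, and the toy vectors stand in for real Llama-3.1-8B-Instruct activations.

```python
from typing import List, Sequence

# Assumed layout: one layer's attention output split per head,
# i.e. a list of n_heads vectors, each a list of floats.
HeadOutputs = List[List[float]]


def even_heads(n_heads: int) -> List[int]:
    """Return the even head indices 0, 2, 4, ... below n_heads."""
    return list(range(0, n_heads, 2))


def patch_heads(buggy: HeadOutputs, clean: HeadOutputs,
                heads: Sequence[int]) -> HeadOutputs:
    """Copy of `buggy` in which the listed heads take their outputs
    from `clean`; all other heads keep their buggy-run outputs."""
    if len(buggy) != len(clean):
        raise ValueError("both runs must have the same number of heads")
    selected = set(heads)
    # list(...) copies each vector so the result never aliases the inputs.
    return [list(clean[h]) if h in selected else list(buggy[h])
            for h in range(len(buggy))]


# Toy example: 32 heads, 4-dimensional head outputs.
buggy = [[0.0] * 4 for _ in range(32)]
clean = [[1.0] * 4 for _ in range(32)]
patched = patch_heads(buggy, clean, even_heads(32)[:8])  # heads 0, 2, ..., 14
```

In a real model the patched per-head outputs would still need to be concatenated and passed through the layer's output projection before the forward pass continues; that wiring depends on the framework and is not shown.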

Findings

01. Transformers implement format-dependent head specialization.

02. A small subset of attention heads is enough to fix the reasoning error.

03. Perfect repair is achieved by repairing only 8 of the 32 attention heads at Layer 10 (25%).
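
The 25% figure follows from the abstract's numbers: 8 even heads out of the 32 heads per layer implied by "16 even heads". The sketch below shows one way a threshold sweep over even-head subsets could be organized. `sweep_subset_sizes` and `toy_repair_succeeds` are hypothetical; in practice `repair_succeeds` would patch the given heads and re-run the 9.11 vs 9.8 prompt, whereas the toy version here merely restates the reported threshold and is not data.

```python
from itertools import combinations
from math import comb
from typing import Callable, Dict, FrozenSet, Sequence

N_HEADS = 32                              # heads per layer implied by "16 even heads"
EVEN_HEADS = tuple(range(0, N_HEADS, 2))  # indices 0, 2, ..., 30


def sweep_subset_sizes(
    heads: Sequence[int],
    sizes: Sequence[int],
    repair_succeeds: Callable[[FrozenSet[int]], bool],
) -> Dict[int, float]:
    """For each size k, try every k-subset of `heads` and return the
    fraction of subsets for which `repair_succeeds` reports success."""
    rates: Dict[int, float] = {}
    for k in sizes:
        subsets = [frozenset(c) for c in combinations(heads, k)]
        wins = sum(1 for s in subsets if repair_succeeds(s))
        rates[k] = wins / len(subsets)
    return rates


def toy_repair_succeeds(patched: FrozenSet[int]) -> bool:
    """Toy stand-in restating the reported rule: 8+ even heads succeed."""
    return len(patched) >= 8


print(8 / N_HEADS)               # 0.25: share of the layer's heads repaired
print(comb(len(EVEN_HEADS), 8))  # 12870: distinct 8-head subsets of the even heads
print(sweep_subset_sizes(EVEN_HEADS, [7, 8], toy_repair_succeeds))  # {7: 0.0, 8: 1.0}
```

Against a real model each call to `repair_succeeds` is a forward pass, so with 12,870 subsets at size 8 alone one would likely sample subsets rather than enumerate them all.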

Abstract

We present a mechanistic case study of a format-dependent reasoning failure in Llama-3.1-8B-Instruct, where the model incorrectly judges "9.11" as larger than "9.8" in chat or Q&A formats but answers correctly in a simple format. Through systematic intervention, we discover that transformers implement even/odd attention head specialization: even-indexed heads handle numerical comparison, while odd heads serve incompatible functions. Perfect repair of the bug requires at least 8 even heads at Layer 10: any combination of 8 or more even heads succeeds, while 7 or fewer fail completely, revealing sharp computational thresholds with perfect redundancy among the 16 even heads. SAE analysis reveals the mechanism: format representations separate (10% feature overlap at Layer 7), then re-entangle with different weightings (80% feature overlap at Layer 10), with specific features showing 1.5x amplification…
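
"Feature overlap" and "amplification" are not defined in the abstract. One common choice, assumed here, is the Jaccard index of the sets of SAE features active for the same prompt under two formats, with amplification taken as the ratio of one feature's activation across formats. The helpers `active_features`, `feature_overlap`, and `amplification` are hypothetical and operate on plain lists of SAE activations; the paper's exact definitions may differ.

```python
from typing import Sequence, Set


def active_features(acts: Sequence[float], threshold: float = 0.0) -> Set[int]:
    """Indices of SAE features whose activation exceeds `threshold`."""
    return {i for i, a in enumerate(acts) if a > threshold}


def feature_overlap(acts_a: Sequence[float], acts_b: Sequence[float],
                    threshold: float = 0.0) -> float:
    """Jaccard overlap: size of the intersection of the two active-feature
    sets divided by the size of their union (1.0 if both are empty)."""
    a = active_features(acts_a, threshold)
    b = active_features(acts_b, threshold)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def amplification(acts_a: Sequence[float], acts_b: Sequence[float], i: int) -> float:
    """Ratio of feature i's activation in format B to its activation in format A."""
    if acts_a[i] == 0:
        raise ValueError("feature is inactive in format A")
    return acts_b[i] / acts_a[i]


# Toy SAE activations for one prompt in two formats (not real data).
simple = [0.0, 2.0, 0.5, 0.0, 1.0]
chat = [0.0, 1.0, 0.0, 3.0, 1.5]
print(feature_overlap(simple, chat))   # 0.5: shared {1, 4} out of {1, 2, 3, 4}
print(amplification(simple, chat, 4))  # 1.5
```

A threshold above zero would make the active sets, and therefore the overlap, less sensitive to tiny activations; the right value depends on the SAE.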
