What Is the Minimum Architecture for Prolepsis? Early Irrevocable Commitment Across Tasks in Small Transformers
Éric Jacopin

TL;DR
This paper studies prolepsis, the early and uncorrected commitment of small transformers to a decision, and identifies the architectural mechanisms and task-specific attention heads that sustain such commitments across tasks.
Contribution
The paper introduces the concept of prolepsis in transformers, showing how early commitment arises and which architectural components and attention heads sustain it.
Findings
Planning is invisible to residual-stream methods; CLTs are necessary.
The planning-site spike replicates with identical geometry.
Specific attention heads route decisions to output, filling attribution gaps.
Abstract
When do transformers commit to a decision, and what prevents them from correcting it? We introduce prolepsis: a transformer commits early, task-specific attention heads sustain the commitment, and no layer corrects it. Replicating Lindsey et al.'s (2025) planning-site finding on open models (Gemma 2 2B, Llama 3.2 1B), we ask five questions. (Q1) Planning is invisible to six residual-stream methods; CLTs are necessary. (Q2) The planning-site spike replicates with identical geometry. (Q3) Specific attention heads route the decision to the output, filling a gap flagged as invisible to attribution graphs. (Q4) Search requires layers; commitment requires more. (Q5) Factual recall shows the same motif at a different network depth, with zero overlap between recurring planning heads and the factual top-10. Prolepsis is architectural:…
