Where's the Plan? Locating Latent Planning in Language Models with Lightweight Mechanistic Interventions
Nicole Ma, Nick Rui

TL;DR
This paper investigates how language models internally represent and causally utilize future token constraints, using lightweight interventions and focusing on rhyme completion as a test case.
Contribution
It introduces methods to locate and analyze the causal role of internal representations in language models during structured generation tasks.
Findings
Future-rhyme information is linearly decodable at line boundaries across models.
Only Gemma-3-27B causally relies on this encoding, with the causal driver shifting from the rhyme word to the line boundary around layer 30.
Five attention heads are identified as responsible for rhyme routing in Gemma-3-27B.
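To make "linearly decodable" in the first finding concrete, here is a minimal sketch of a linear probe: a logistic-regression classifier trained to read a label off activation vectors. Everything in it is an assumption made for illustration. The activations are synthetic 4-dimensional vectors rather than real model states, the binary label stands in for a rhyme class, and the names make_activations, train_probe, and probe_accuracy are hypothetical. It is not the authors' code.

```python
import math
import random


def make_activations(n, dim=4, seed=0):
    """Synthetic stand-in for activations at a line boundary (assumption).

    The binary label (a placeholder for "rhyme class") is encoded along
    dimension 0; every dimension also carries Gaussian noise.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = [rng.gauss(0.0, 0.3) for _ in range(dim)]
        x[0] += 1.0 if label == 1 else -1.0
        data.append((x, label))
    return data


def train_probe(data, dim=4, lr=0.1, epochs=200):
    """Fit a logistic-regression probe with plain stochastic gradient descent."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log loss with respect to z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def probe_accuracy(w, b, data):
    """Fraction of examples whose label the probe predicts correctly."""
    correct = sum(
        1 for x, y in data
        if (sum(wi * xi for wi, xi in zip(w, x)) + b > 0) == (y == 1)
    )
    return correct / len(data)


train = make_activations(200, seed=0)
held_out = make_activations(200, seed=1)
w, b = train_probe(train)
acc = probe_accuracy(w, b, held_out)
print(f"held-out probe accuracy: {acc:.2f}")
```

High held-out accuracy shows only that the information is present in a linearly readable form. It does not show that the model uses it, which is why the second finding depends on a causal intervention rather than on probing.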
Abstract
We study planning site formation in language models -- where internal representations of structurally-constrained future tokens form during the forward pass, and whether they causally drive generation. Using rhyming-couplet completion as a clean test of forward-looking constraint, we apply two lightweight methods (linear probing and activation patching) across Qwen3, Gemma-3, and Llama-3 at more than ten scales. Probing shows that future-rhyme information is linearly decodable at the line boundary, with signal that strengthens with scale in all three families. Activation patching reveals that only Gemma-3-27B causally relies on this encoding, exhibiting a handoff in which the causal driver migrates from the rhyme word to the line boundary around layer 30. Every other model we test conditions on the rhyme word throughout generation, with near-zero causal effect at the line boundary…
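The activation-patching logic described in the abstract can be sketched on a toy model. The sketch below is hand-wired so that information moves from a "rhyme word" position to a "line boundary" position at one layer, which makes a handoff visible by construction. It is an assumption-laden illustration, not the authors' model, code, or results, and the names LAYERS, run, and restored are hypothetical.

```python
RHYME, BOUNDARY = 0, 1  # toy token positions (assumption)


def identity(state):
    return list(state)


def move(state):
    # Toy stand-in for attention: copy the rhyme-word value to the boundary.
    return [state[RHYME], state[BOUNDARY] + state[RHYME]]


LAYERS = [identity, move, identity]


def run(tokens, patch=None, cache=None):
    """Run the toy model and return (output, activations entering each layer).

    patch is an optional (layer, position) pair: the activation entering
    that layer at that position is overwritten with the cached clean value.
    """
    state = list(tokens)
    entering = []
    for i, layer in enumerate(LAYERS):
        if patch is not None and patch[0] == i:
            state[patch[1]] = cache[i][patch[1]]
        entering.append(list(state))
        state = layer(state)
    return state[BOUNDARY], entering  # the output reads the boundary only


clean_out, clean_cache = run([1.0, 0.0])
corrupt_out, _ = run([-1.0, 0.0])


def restored(layer, pos):
    """Fraction of the clean-vs-corrupted output gap recovered by one patch."""
    patched_out, _ = run([-1.0, 0.0], patch=(layer, pos), cache=clean_cache)
    return (patched_out - corrupt_out) / (clean_out - corrupt_out)


effects = {
    (layer, pos): restored(layer, pos)
    for layer in range(len(LAYERS))
    for pos in (RHYME, BOUNDARY)
}
for (layer, pos), value in sorted(effects.items()):
    name = "rhyme word" if pos == RHYME else "line boundary"
    print(f"layer {layer}, {name}: restored {value:.1f}")
```

In this toy, patching the rhyme-word position restores the clean output only before the copy layer, and patching the boundary position restores it only afterwards. A model that keeps conditioning on the rhyme word throughout would instead show near-zero effect at the boundary in every layer.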
