TL;DR
This paper introduces simple metrics to assess implicit planning in large language models, demonstrating its presence across various models and applications like rhyme generation and question answering.
Contribution
It proposes scalable techniques for measuring implicit planning in LLMs and shows this ability exists even in models with only 1 billion parameters.
Findings
Implicit planning influences token selection in LLMs.
Steering with a vector at the end of the preceding line can change the generated rhyme or answer, along with the intermediate tokens leading up to it.
Implicit planning is present in smaller models than previously believed.
Abstract
Prior work suggests that language models, while trained on next-token prediction, show implicit planning behavior: they may select the next token in preparation for a predicted future token, such as a likely rhyming word, as supported by a prior qualitative study of Claude 3.5 Haiku using a cross-layer transcoder. We propose much simpler techniques for assessing implicit planning in language models. With case studies on rhyming poetry generation and question answering, we demonstrate that our methodology easily scales to many models. Across models, we find that the generated rhyme (e.g., "-ight") or answer to a question (e.g., "whale") can be manipulated by steering at the end of the preceding line with a vector, affecting the generation of intermediate tokens leading up to the rhyme or answer word. We show that implicit planning is a universal mechanism, present in smaller models than previously…
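The steering intervention described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes hidden states are available as a NumPy array of shape (seq_len, d_model), and it assumes the steering vector is a difference of mean activations between two sets of examples, a common choice that the text above does not confirm. The names `mean_difference_vector`, `steer_at_position`, and `newline_position` are hypothetical, and the activations are random stand-ins for real model states.

```python
import numpy as np


def mean_difference_vector(target_acts, source_acts):
    """Hypothetical steering vector: mean target activation minus mean source activation.

    Assumption: the source text only says "a vector"; difference of means is
    one common way to build such a vector, not necessarily the paper's.
    """
    return np.mean(target_acts, axis=0) - np.mean(source_acts, axis=0)


def steer_at_position(hidden, position, vector, scale=1.0):
    """Return a copy of `hidden` (seq_len, d_model) with scale * vector added at one position."""
    steered = hidden.copy()
    steered[position] += scale * vector
    return steered


# Random stand-ins for activations collected at line ends (illustrative only).
rng = np.random.default_rng(0)
d_model = 8
target_acts = rng.normal(size=(16, d_model)) + 1.0  # e.g. lines ending in the target rhyme
source_acts = rng.normal(size=(16, d_model))        # e.g. lines ending in the original rhyme
vector = mean_difference_vector(target_acts, source_acts)

# Steer only the last position of the preceding line; earlier positions are untouched.
hidden = rng.normal(size=(5, d_model))
newline_position = 4
steered = steer_at_position(hidden, newline_position, vector, scale=2.0)
```

In a real model the same addition would be applied inside a forward pass at a chosen layer, and the effect would be read off from the tokens generated on the following line. The sketch only shows that the intervention is local to a single position.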