Instructions Shape Production of Language, not Processing
Andreas Waldis, Leshem Choshen, Yufang Hou, Yotam Perlitz

TL;DR
This paper shows that instructions primarily shape the production stage of language models: task-specific information in output tokens correlates strongly with behavior, whereas the same information in sample tokens correlates only weakly.
Contribution
It reveals an asymmetry between processing and production stages in language models, emphasizing the importance of analyzing output tokens for understanding capabilities.
Findings
Blocking instruction flow to all subsequent tokens reduces both behavior and task-specific information in output tokens.
Model scale and instruction tuning sharpen the production-stage effects.
The asymmetry between input processing and output production generalizes across models.
Abstract
Instructions trigger a production-centered mechanism in language models. Through a cognitively inspired lens that separates language processing and production, we reveal this mechanism as an asymmetry between the two stages by probing task-specific information layer-wise across five binary judgment tasks. Specifically, we measure how instruction tokens shape information both when sample tokens, the input under evaluation, are processed and when output tokens are produced. Across prompting variations, task-specific information in sample tokens remains largely stable and correlates only weakly with behavior, whereas the same information in output tokens varies substantially and correlates strongly with behavior. Attention-based interventions confirm this pattern causally: blocking instruction flow to all subsequent tokens reduces both behavior and information in output tokens, whereas…
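The attention-based intervention mentioned in the abstract can be pictured with a small sketch. The code below is an illustration only, not the authors' implementation: it assumes a single causal attention head, a contiguous instruction span, and hypothetical helper names (`blocked_causal_mask`, `masked_attention_weights`). It builds a mask under which every token after the instruction span can no longer attend to instruction tokens.

```python
import numpy as np


def blocked_causal_mask(seq_len, instr_start, instr_end):
    """Boolean mask where True means the query (row) may attend to the key (column).

    Hypothetical helper: starts from a standard causal mask, then forbids every
    token after the instruction span [instr_start, instr_end) from attending to it.
    """
    mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
    mask[instr_end:, instr_start:instr_end] = False
    return mask


def masked_attention_weights(scores, mask):
    """Softmax over keys, with disallowed positions set to -inf beforehand."""
    masked = np.where(mask, scores, -np.inf)
    # Subtract the row maximum for numerical stability; each row keeps at
    # least its diagonal entry, so the maximum is always finite.
    masked = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    return weights / weights.sum(axis=-1, keepdims=True)


# Toy layout (an assumption): token 0 is a start token, tokens 1-2 are the
# instruction, tokens 3-4 are the sample, token 5 is the output position.
mask = blocked_causal_mask(seq_len=6, instr_start=1, instr_end=3)
rng = np.random.default_rng(0)
weights = masked_attention_weights(rng.normal(size=(6, 6)), mask)
```

In this toy layout, the sample and output positions receive zero attention weight on the instruction tokens, while the instruction tokens themselves still attend causally as usual. Restricting the blocked rows to only the sample or only the output positions would give a more targeted variant of the same idea.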
