TL;DR
This paper investigates how best to aggregate the hidden states of autoregressive language models, finding that mean pooling across generated tokens yields more semantically meaningful representations than any individual token.
Contribution
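It demonstrates that mean pooling the hidden states of generated tokens captures information distributed across positions better than any single token, and that generated-token representations outperform prompt-token representations, with implications for understanding a model's internal dynamics.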
Findings
Mean pooling of generated tokens improves semantic representation quality.
Generated token representations outperform prompt token representations.
Alignment across generated tokens reveals interpretable model dynamics.
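The findings contrast three ways of collapsing per-token hidden states into a single vector: a mean over generated tokens, a single generated position, and a mean over prompt tokens. The sketch below is a minimal NumPy illustration, not the paper's code. It assumes one layer's states for one sequence arrive as a `(seq_len, d_model)` array with the prompt positions first. The function name `pool_generated_states` and this layout are hypothetical choices made for the example.

```python
import numpy as np


def pool_generated_states(hidden_states: np.ndarray, prompt_len: int) -> dict:
    """Collapse one sequence's hidden states into fixed-size vectors.

    hidden_states: (seq_len, d_model) array from a single layer, assumed to hold
        prompt positions first and generated positions after them.
    prompt_len: number of prompt positions at the start of the sequence.
    """
    if not 0 < prompt_len < hidden_states.shape[0]:
        raise ValueError("need at least one prompt token and one generated token")
    prompt = hidden_states[:prompt_len]
    generated = hidden_states[prompt_len:]
    return {
        # Mean over all generated positions: the aggregation the findings favor.
        "generated_mean": generated.mean(axis=0),
        # Single-position baseline: the final generated token only.
        "generated_last": generated[-1],
        # Prompt-side baseline: mean over prompt positions.
        "prompt_mean": prompt.mean(axis=0),
    }


# Toy example: 4 prompt tokens followed by 8 generated tokens, d_model = 8.
rng = np.random.default_rng(0)
states = rng.normal(size=(12, 8))
reps = pool_generated_states(states, prompt_len=4)
print({name: vec.shape for name, vec in reps.items()})
# → {'generated_mean': (8,), 'generated_last': (8,), 'prompt_mean': (8,)}
```

All three variants produce vectors of the same size. This makes them directly comparable under any alignment score computed over a batch of inputs.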
Abstract
How should hidden states generated autoregressively be collapsed into a representation that reflects a language model's internal state? Despite tokens being generated under causal masking, we find that mean pooling across their hidden states yields more semantically meaningful representations than any individual token alone. We quantify this through kernel alignment to reference spaces in language, vision, and protein domains. The improvement from mean pooling is consistent with information being distributed across generated tokens rather than localized to a single position. Furthermore, representations derived from generated tokens outperform those from prompt tokens, and alignment across generation reveals interpretable dynamics in model behavior.
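The abstract scores representations by kernel alignment to reference spaces, but it does not name the exact alignment measure. As one hedged illustration, the sketch below implements linear centered kernel alignment (linear CKA), a standard kernel alignment score. Whether the paper uses this variant, another kernel, or a different alignment metric is an assumption here. The inputs stand in for pooled model representations and reference embeddings (for example, from a vision or protein encoder) for the same `n` inputs.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear centered kernel alignment between two representation matrices.

    x: (n, d1) model representations, e.g. pooled hidden states for n inputs.
    y: (n, d2) reference representations for the same n inputs.
    Returns a score in [0, 1]; higher means the two spaces induce more similar
    pairwise-similarity structure over the n inputs.
    """
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must describe the same n inputs")
    # Centre each feature column so the linear kernels x @ x.T and y @ y.T are centred.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # With centred features, linear-kernel HSIC reduces to Frobenius norms of
    # feature cross-products; the shared normalising constant cancels in the ratio.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Toy check with stand-in data (not results from the paper).
rng = np.random.default_rng(0)
reference = rng.normal(size=(100, 16))
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))  # random orthogonal matrix
# A rotated and rescaled copy carries the same similarity structure.
print(round(linear_cka(reference, 2.0 * reference @ rotation), 6))  # → 1.0
# An unrelated random space aligns far less.
print(linear_cka(reference, rng.normal(size=(100, 16))))
```

Linear CKA is invariant to orthogonal transforms and isotropic scaling of either space. This invariance is why it is a common choice for comparing representations whose feature dimensions do not correspond. In the setting the abstract describes, `x` would hold one pooled vector per input, and `y` would hold the matching reference embeddings.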
