The Truth Lies Somewhere in the Middle (of the Generated Tokens)

Sophie L. Wang; Phillip Isola; Brian Cheung

arXiv:2605.09969 · cs.LG · May 12, 2026

TL;DR

This paper investigates how best to aggregate an autoregressive model's hidden states and finds that mean pooling across generated tokens yields more semantic representations than any individual token.

Contribution

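It shows that mean pooling the hidden states of generated tokens captures information that is distributed across positions rather than localized to one, and that generated-token representations outperform prompt-token representations, with implications for understanding a model's internal dynamics.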

Findings

01. Mean pooling over generated-token hidden states yields more semantic representations than any single token's hidden state.

02. Representations derived from generated tokens outperform those derived from prompt tokens.

03. Tracking alignment over the course of generation reveals interpretable dynamics in model behavior.

Abstract

How should hidden states generated autoregressively be collapsed into a representation that reflects a language model's internal state? Despite tokens being generated under causal masking, we find that mean pooling across their hidden states yields more semantic representations than any individual token alone. We quantify this through kernel alignment to reference spaces in language, vision, and protein domains. The improvement through mean pooling is consistent with information being distributed across generated tokens rather than localized to a single position. Furthermore, representations derived from generated tokens outperform those from prompt tokens, and alignment across generation reveals interpretable dynamics in model behavior.
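The comparison described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the abstract does not say which kernel alignment metric is used, so linear CKA stands in here as an assumption, and the function names (`mean_pool_generated`, `linear_cka`), array shapes, and random placeholder data are all hypothetical. In practice the hidden states would come from a model's generation pass and the reference space from an independent embedding of the same inputs.

```python
import numpy as np


def mean_pool_generated(hidden, prompt_len):
    """Average hidden states over generated positions only.

    hidden: (seq_len, d_model) array for one sequence.
    prompt_len: number of leading prompt positions to exclude.
    """
    generated = hidden[prompt_len:]
    if generated.shape[0] == 0:
        raise ValueError("no generated tokens to pool")
    return generated.mean(axis=0)


def linear_cka(X, Y):
    """Linear CKA between two representation matrices.

    X: (n, d1), Y: (n, d2); row i of each describes the same input.
    Assumed stand-in for the paper's kernel alignment metric.
    """
    # Center each feature so the implied Gram matrices are centered.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(X.T @ Y, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return cross / (norm_x * norm_y)


# Placeholder data: random arrays standing in for real hidden states
# and a real reference embedding space (hypothetical shapes).
rng = np.random.default_rng(0)
n_inputs, prompt_len, gen_len, d_model, d_ref = 16, 5, 7, 32, 8
hidden = rng.normal(size=(n_inputs, prompt_len + gen_len, d_model))
reference = rng.normal(size=(n_inputs, d_ref))

# One vector per input: mean over generated tokens vs. the final token alone.
pooled = np.stack([mean_pool_generated(h, prompt_len) for h in hidden])
final = hidden[:, -1, :]

print("alignment, mean-pooled generated tokens:", linear_cka(pooled, reference))
print("alignment, final token only:", linear_cka(final, reference))
```

With random placeholder data the two scores carry no meaning; the sketch only shows where the pooling step and the alignment score sit relative to each other.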

Code & Models

Repositories

sophicle/tokens (GitHub)
