On the Semantic and Syntactic Information Encoded in Proto-Tokens for One-Step Text Reconstruction
Ivan Bondarenko, Egor Palkin, Fedor Tikunov

TL;DR
This paper investigates the semantic and syntactic information encoded in proto-tokens used for one-step text reconstruction in large language models, revealing their properties and how to impose semantic structure without losing reconstruction accuracy.
Contribution
It provides a detailed analysis of proto-token content, stability, and attention patterns, and introduces regularization schemes to embed semantic structure into proto-tokens.
Findings
m-token captures more semantic information than e-token
Anchor-based constraints reduce reconstruction accuracy
Relational distillation transfers semantic relations without harming reconstruction
Abstract
Autoregressive large language models (LLMs) generate text token-by-token, requiring n forward passes to produce a sequence of length n. Recent work, Exploring the Latent Capacity of LLMs for One-Step Text Reconstruction (Mezentsev and Oseledets), shows that frozen LLMs can reconstruct hundreds of tokens from only two learned proto-tokens in a single forward pass, suggesting a path beyond the autoregressive paradigm. In this paper, we study what information these proto-tokens encode and how they behave under reconstruction and controlled constraints. We perform a series of experiments aimed at disentangling semantic and syntactic content in the two proto-tokens, analyzing stability properties of the e-token, and visualizing attention patterns to the e-token during reconstruction. Finally, we test two regularization schemes for "imposing" semantic structure on the e-token using teacher…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Machine Learning and Algorithms
