Exploring the Hidden Capacity of LLMs for One-Step Text Generation
Gleb Mezentsev, Ivan Oseledets

TL;DR
This paper shows that frozen large language models can generate hundreds of accurate tokens from just two learned input embeddings in a single forward pass, pointing to a latent multi-token generation capability that does not depend on step-by-step autoregressive decoding.
Contribution
The paper demonstrates that frozen LLMs can produce text segments hundreds of tokens long from only two learned input embeddings, uncovering a multi-token generation ability that requires no retraining of the model.
Findings
Frozen LLMs can generate hundreds of tokens from two learned embeddings.
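The learned embeddings for a given text are not unique, but they form connected, local regions in embedding space.
Because of this structure, it may be possible to train a practical encoder that produces such embeddings, enabling multi-token generation without retraining the LLM.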
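
The first finding can be made concrete with a small sketch. The code below is a toy illustration, not the authors' implementation: the "frozen model" is a tiny random network rather than an LLM, the input layout (one vector e followed by repeated copies of a second vector m) is an assumption made for this example, and gradients come from finite differences so that the example runs with the Python standard library alone. All names and sizes are hypothetical.

```python
import math
import random

random.seed(0)

# Toy sizes, chosen only for illustration (assumptions, not values from the paper).
VOCAB, DIM, N_TOKENS = 5, 4, 6

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

# Frozen toy "model": these weights are created once and never updated.
# It is a stand-in for a real LLM, not a transformer.
W_IN = rand_matrix(DIM, DIM)      # transforms the input vector at each position
W_CTX = rand_matrix(DIM, DIM)     # transforms the causal mean of inputs so far
POS = rand_matrix(N_TOKENS, DIM)  # fixed positional signal
W_OUT = rand_matrix(VOCAB, DIM)   # hidden state -> vocabulary logits

def matvec(mat, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in mat]

def forward(e, m):
    """One pass over the assumed input layout [e, m, m, ..., m]; logits per position."""
    inputs = [e] + [m] * (N_TOKENS - 1)
    logits = []
    for i in range(N_TOKENS):
        # Causal context: mean of the inputs at positions 0..i.
        ctx = [sum(v[k] for v in inputs[: i + 1]) / (i + 1) for k in range(DIM)]
        a, b = matvec(W_IN, inputs[i]), matvec(W_CTX, ctx)
        hidden = [math.tanh(a[k] + b[k] + POS[i][k]) for k in range(DIM)]
        logits.append(matvec(W_OUT, hidden))
    return logits

def loss(params, target):
    """Mean cross-entropy between per-position logits and the target tokens."""
    total = 0.0
    for row, t in zip(forward(params[:DIM], params[DIM:]), target):
        top = max(row)
        log_z = top + math.log(sum(math.exp(v - top) for v in row))
        total += log_z - row[t]
    return total / len(target)

def decode(params):
    """Greedy readout: argmax token at every position of the single pass."""
    logits = forward(params[:DIM], params[DIM:])
    return [max(range(VOCAB), key=row.__getitem__) for row in logits]

def numeric_grad(params, target, eps=1e-4):
    """Central finite differences; a real setup would use backpropagation."""
    grad = []
    for i in range(len(params)):
        up = params[:i] + [params[i] + eps] + params[i + 1:]
        down = params[:i] + [params[i] - eps] + params[i + 1:]
        grad.append((loss(up, target) - loss(down, target)) / (2 * eps))
    return grad

def fit(target, steps=300, lr=0.5):
    """Optimize only the two embeddings; the model weights stay frozen."""
    params = [random.gauss(0.0, 0.1) for _ in range(2 * DIM)]  # e followed by m
    initial = current = loss(params, target)
    for _ in range(steps):
        grad = numeric_grad(params, target)
        trial = [p - lr * g for p, g in zip(params, grad)]
        trial_loss = loss(trial, target)
        if trial_loss < current:
            params, current = trial, trial_loss  # accept the step
        else:
            lr *= 0.5  # reject the step and shrink the step size
    return params, initial, current

target = [3, 1, 4, 1, 0, 2]  # hypothetical token ids to reconstruct
params, initial_loss, final_loss = fit(target)
print("loss before/after:", initial_loss, final_loss)
print("decoded:", decode(params), "target:", target)
```

Only the eight numbers in `params` (the two embeddings) are trained; every weight of the toy model stays fixed, and `decode` reads out all positions from a single call to `forward` rather than generating one token at a time. A model this small may not reconstruct the target exactly. With a real frozen LLM, the same idea would presumably feed the two vectors in as input embeddings (for example through the `inputs_embeds` argument in Hugging Face Transformers) and rely on automatic differentiation instead of finite differences.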
Abstract
A recent study showed that large language models (LLMs) can reconstruct surprisingly long texts - up to thousands of tokens - via autoregressive generation from just one trained input embedding. In this work, we explore whether autoregressive decoding is essential for such reconstruction. We show that frozen LLMs can generate hundreds of accurate tokens in just one token-parallel forward pass, when provided with only two learned embeddings. This reveals a surprising and underexplored multi-token generation capability of autoregressive LLMs. We examine these embeddings and characterize the information they encode. We also empirically show that, although these representations are not unique for a given text, they form connected and local regions in embedding space - suggesting the potential to train a practical encoder. The existence of such representations hints that multi-token…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
