Memory Tokens: Large Language Models Can Generate Reversible Sentence Embeddings

Ignacio Sastre; Aiala Rosá

arXiv:2506.15001 · cs.CL · January 9, 2026

TL;DR

This paper demonstrates that large language models can generate reversible sentence embeddings using a special memory token, enabling exact reconstruction of original text without changing model weights, across multiple languages and model sizes.

Contribution

Introduction of a memory token technique that allows LLMs to produce reversible sentence embeddings for the first time, without modifying model weights.

Findings

1. Llama 3.1 8B successfully reconstructs all tested sequences.

2. Reversible embeddings work on both English and Spanish datasets.

3. The approach is evaluated on sequences of up to approximately 240 tokens.

Abstract

In this work, we observe an interesting phenomenon: it is possible to generate reversible sentence embeddings that allow an LLM to reconstruct the original text exactly, without modifying the model's weights. This is achieved by introducing a special memory token, whose embedding is optimized through training on a fixed sequence. When prompted with this embedding, the model reconstructs the fixed sequence exactly. We evaluate this phenomenon across English and Spanish datasets, sequences of up to approximately 240 tokens, and model scales ranging from 100M to 8B parameters. Notably, Llama 3.1 8B successfully reconstructs all tested sequences. Our findings highlight an interesting capability of LLMs and suggest potential applications in memory-based retrieval, compression, and controlled text generation.
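
The mechanism in the abstract (keep the model frozen, optimize only one input embedding on one fixed sequence, then decode from that embedding) can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the "model" is a tiny recurrent network with random frozen weights rather than an LLM, the memory embedding is used as the initial state, gradients are estimated by finite differences, and all names (`W_REC`, `W_TOK`, `W_OUT`, `loss`, `decode`, `train`) are hypothetical. Whether greedy decoding reproduces the target exactly depends on this toy model's capacity, so the sketch only guarantees that the loss does not increase.

```python
import math
import random

random.seed(0)
VOCAB, DIM = 5, 8
TARGET = [3, 1, 4, 1, 2]  # the fixed token sequence to memorize (arbitrary toy data)


def rand_matrix(rows, cols, scale):
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


# Frozen toy "language model": these weights are never updated during training.
W_REC = rand_matrix(DIM, DIM, 0.5)    # recurrent weights
W_TOK = rand_matrix(VOCAB, DIM, 0.5)  # token embeddings
W_OUT = rand_matrix(VOCAB, DIM, 1.0)  # output projection to vocabulary logits


def matvec(matrix, vector):
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def step(state, token):
    """Advance the recurrent state after consuming one token."""
    pre = matvec(W_REC, state)
    return [math.tanh(p + e) for p, e in zip(pre, W_TOK[token])]


def loss(memory):
    """Teacher-forced cross-entropy of TARGET, starting from the memory embedding."""
    state, total = list(memory), 0.0
    for token in TARGET:
        logits = matvec(W_OUT, state)
        peak = max(logits)
        log_z = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_z - logits[token]
        state = step(state, token)
    return total


def decode(memory, length):
    """Greedy decoding that starts from the memory embedding alone."""
    state, out = list(memory), []
    for _ in range(length):
        logits = matvec(W_OUT, state)
        token = max(range(VOCAB), key=lambda i: logits[i])
        out.append(token)
        state = step(state, token)
    return out


def gradient(memory, eps=1e-5):
    """Central finite-difference gradient of the loss w.r.t. the memory embedding."""
    grad = []
    for i in range(DIM):
        up, down = memory[:], memory[:]
        up[i] += eps
        down[i] -= eps
        grad.append((loss(up) - loss(down)) / (2 * eps))
    return grad


def train(steps=200):
    """Optimize only the memory embedding; the model weights are left untouched."""
    memory = [0.0] * DIM
    current = loss(memory)
    for _ in range(steps):
        grad = gradient(memory)
        lr = 1.0
        while lr > 1e-6:  # backtracking: halve the step until the loss drops
            candidate = [m - lr * g for m, g in zip(memory, grad)]
            cand_loss = loss(candidate)
            if cand_loss < current:
                memory, current = candidate, cand_loss
                break
            lr /= 2
        else:
            break  # no improving step found; stop early
    return memory, current


if __name__ == "__main__":
    memory, final_loss = train()
    print("final loss:", final_loss)
    print("target :", TARGET)
    print("decoded:", decode(memory, len(TARGET)))
```

In a real setting, the analogous setup would presumably use an LLM with all parameters frozen and a single trainable input embedding updated by backpropagation; the finite-difference gradient here is only a stand-in that keeps the sketch dependency-free.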

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification