Large Language Model as Token Compressor and Decompressor

Wenbing Li; Yiran Wang; Zikai Song; Jielei Zhang; Tianhao Zhao; Junkai Lin; Wei Yang

arXiv:2603.25340 · cs.CL · May 14, 2026


TL;DR

This paper introduces a method to adapt large language models into token compressors and decompressors, enabling efficient long-context processing by encoding texts into compact latent codes with minimal performance loss.

Contribution

It presents a self-expressive autoencoding framework using LoRA adapters to create content-adaptive, variable-length token compression for long texts.

Findings

01. Preserves reconstruction quality on long-context datasets

02. Reduces memory usage and latency during generation

03. Supports direct decoding and autoregressive generation in compressed space

Abstract

In this paper, we study whether an off-the-shelf LLM can be adapted into a discrete, variable-length token compressor and decompressor for long-context processing. To this end, we design a self-expressive autoencoding framework that fine-tunes a pretrained LLM with lightweight LoRA adapters to map long texts into compact sequences of learned latent codes, termed Z-tokens, and to decode them back into natural language or task outputs. The resulting representation is content-adaptive: information-dense or less predictable segments can receive more Z-tokens, while redundant regions can be represented more compactly through a budget-aware length regularizer. Our method is evaluated on long-context datasets such as Wikipedia, CNN/DailyMail, HotpotQA, and QuALITY, showing that it preserves reconstruction quality and downstream performance while reducing effective context length,…
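
To make the content-adaptive idea in the abstract concrete, the following is a minimal sketch of budgeted allocation: segments with higher estimated surprisal receive more latent codes, subject to a fixed total budget. It is an illustration only and not the paper's method, which learns the allocation through a length regularizer during fine-tuning. The function name `allocate_codes`, the `min_per_segment` parameter, and the per-segment surprisal inputs are all hypothetical.

```python
from typing import List


def allocate_codes(surprisal: List[float], budget: int,
                   min_per_segment: int = 1) -> List[int]:
    """Split `budget` latent codes across segments in proportion to surprisal.

    Hypothetical heuristic for illustration; the paper learns its allocation
    rather than computing it with a closed-form rule like this one.
    """
    n = len(surprisal)
    if n == 0:
        return []
    if any(s < 0 for s in surprisal):
        raise ValueError("surprisal values must be non-negative")
    if budget < n * min_per_segment:
        raise ValueError("budget too small to give every segment its minimum")

    # Codes left over after every segment receives its guaranteed minimum.
    spare = budget - n * min_per_segment
    total = sum(surprisal)
    if total == 0:
        shares = [spare / n] * n  # nothing to distinguish segments: split evenly
    else:
        shares = [spare * s / total for s in surprisal]

    # Round down first, then hand out the remaining codes to the segments
    # with the largest fractional parts (largest-remainder rounding).
    counts = [min_per_segment + int(share) for share in shares]
    leftover = budget - sum(counts)
    by_remainder = sorted(range(n), key=lambda i: shares[i] - int(shares[i]),
                          reverse=True)
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # One information-dense segment followed by two redundant ones.
    print(allocate_codes([8.0, 1.0, 1.0], budget=13))
```

In this toy setting the dense first segment ends up with most of the budget while each redundant segment keeps only slightly more than its minimum, which mirrors the qualitative behavior the abstract describes.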
