Compressing Sequences in the Latent Embedding Space: $K$-Token Merging for Large Language Models

Zihao Xu; John Harvill; Ziwei Fan; Yizhou Sun; Hao Ding; Hao Wang

arXiv:2604.15153 · cs.CL · April 23, 2026


TL;DR

This paper introduces K-Token Merging, a latent-space compression method for LLMs that merges each contiguous block of K token embeddings into a single embedding. It reduces input length by up to 75% with minimal performance loss, making long prompts cheaper to process.
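A 75% reduction in input length means the merged sequence is one quarter as long, which is what merging blocks of K = 4 tokens gives. Because full self-attention scales quadratically with length, the attention term then shrinks by roughly K² = 16×. The back-of-the-envelope sketch below shows this arithmetic; the helper names are hypothetical, and it ignores the merging encoder's own cost and the parts of the model that scale linearly with length.

```python
def compressed_length(n_tokens: int, k: int) -> int:
    """Merged embeddings when each block of k tokens becomes one (a short last block is kept)."""
    return -(-n_tokens // k)  # ceiling division


def attention_pair_count(seq_len: int) -> int:
    """Query-key pairs scored by full self-attention: quadratic in sequence length."""
    return seq_len * seq_len


n, k = 4096, 4
m = compressed_length(n, k)                                   # 1024 merged embeddings
reduction = 1 - m / n                                         # 0.75, i.e. 75% shorter input
attn_ratio = attention_pair_count(n) / attention_pair_count(m)  # 16.0
print(m, reduction, attn_ratio)  # 1024 0.75 16.0
```

In general the attention saving is about K² for a length reduction of 1 − 1/K, so doubling K from 2 to 4 moves the input reduction from 50% to 75% while the attention saving goes from 4× to 16×.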

Contribution

The paper proposes a latent-space token-merging framework that is more efficient than existing token-compression methods while maintaining high task performance.

Findings

01. Achieves up to 75% input length reduction.
02. Maintains task performance with minimal degradation.
03. Lies on the Pareto frontier of performance vs. compression.
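"Lies on the Pareto frontier" means that no other method is at least as good on both axes (compression and task performance) and strictly better on one. The sketch below makes that definition concrete. The method names and numbers are toy values invented for illustration; they are not results from the paper.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    compression: float  # fraction of input tokens removed (higher is better)
    score: float        # task metric (higher is better)


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is at least as good on both axes and strictly better on one."""
    return (a.compression >= b.compression and a.score >= b.score
            and (a.compression > b.compression or a.score > b.score))


def pareto_frontier(points: list[Point]) -> list[Point]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Toy values for illustration only; these are NOT results from the paper.
toy = [
    Point("method_a", 0.50, 0.80),
    Point("method_b", 0.75, 0.78),
    Point("method_c", 0.50, 0.70),  # dominated by method_a
    Point("method_d", 0.25, 0.82),
]
print([p.name for p in pareto_frontier(toy)])  # ['method_a', 'method_b', 'method_d']
```

A point never dominates itself (the strict-improvement condition fails), so each point can safely be compared against the whole list.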

Abstract

Large Language Models (LLMs) incur significant computational and memory costs when processing long prompts, as full self-attention scales quadratically with input length. Token compression aims to address this challenge by reducing the number of tokens representing inputs. However, existing prompt-compression approaches primarily operate in token space and overlook inefficiencies in the latent embedding space. In this paper, we propose K-Token Merging, a latent-space compression framework that merges each contiguous block of K token embeddings into a single embedding via a lightweight encoder. The compressed sequence is processed by a LoRA-adapted LLM, while generation remains in the original vocabulary. Experiments on structural reasoning (Textualized Tree), sentiment classification (Amazon Reviews), and code editing (CommitPackFT) show that K-Token Merging lies on the Pareto frontier…
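The merging step described in the abstract can be sketched as follows: split the sequence of token embeddings into contiguous blocks of K and map each block to one embedding with a small encoder. The abstract calls the encoder "lightweight" without specifying it here, so the `LinearMergeEncoder` below (concatenate the K embeddings and apply a linear projection from K·d to d, zero-padding a short final block) is a hypothetical stand-in, as are all the names. It is written in plain Python so it runs without extra dependencies.

```python
import random
from typing import List

Vector = List[float]


def chunk(embeddings: List[Vector], k: int) -> List[List[Vector]]:
    """Split token embeddings into contiguous blocks of k (the last block may be shorter)."""
    return [embeddings[i:i + k] for i in range(0, len(embeddings), k)]


class LinearMergeEncoder:
    """Hypothetical lightweight encoder: concatenate k embeddings, project k*d -> d.

    Illustrative stand-in only; the paper's encoder architecture may differ.
    """

    def __init__(self, k: int, d: int, seed: int = 0):
        rng = random.Random(seed)
        self.k, self.d = k, d
        scale = (k * d) ** -0.5
        # weight[j] is the row of the projection that produces output dimension j
        self.weight = [[rng.gauss(0.0, scale) for _ in range(k * d)] for _ in range(d)]
        self.bias = [0.0] * d

    def __call__(self, block: List[Vector]) -> Vector:
        # Zero-pad a short final block so every input has length k*d (an assumption).
        padded = block + [[0.0] * self.d] * (self.k - len(block))
        flat = [x for vec in padded for x in vec]
        return [sum(w * x for w, x in zip(row, flat)) + b
                for row, b in zip(self.weight, self.bias)]


def k_token_merge(embeddings: List[Vector], encoder: LinearMergeEncoder) -> List[Vector]:
    """Replace each block of k token embeddings with one merged embedding."""
    return [encoder(block) for block in chunk(embeddings, encoder.k)]


d, k, n = 8, 4, 10
rng = random.Random(1)
tokens = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(n)]
merged = k_token_merge(tokens, LinearMergeEncoder(k=k, d=d))
print(len(tokens), "->", len(merged))  # 10 -> 3
```

Per the abstract, the shorter merged sequence is what the LoRA-adapted LLM consumes, while generation still produces tokens from the original vocabulary; only the prompt side is compressed.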


Code & Models

Repositories

shsjxzh/K-Token-Merging (GitHub)
