KV-Embedding: Training-free Text Embedding via Internal KV Re-routing in Decoder-only LLMs

Yixuan Tang; Yi Yang

arXiv:2601.01046 · cs.CL · January 6, 2026

TL;DR

KV-Embedding is a training-free method that re-routes internal key-value states in decoder-only LLMs to produce better text embeddings by accessing sequence-level context within a single forward pass.
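The re-routing idea can be illustrated with a minimal, pure-Python sketch of one attention head in one layer. It assumes the query/key/value vectors are already projected and ignores multi-head structure, positional encodings (e.g. how a prepended prefix interacts with RoPE), and pooling. The helper names (`causal_attention`, `reroute_last_kv`) are hypothetical and this is not the authors' code. Under plain causal attention, token 0 can only attend to itself; once the final token's key-value pair is prepended as a prefix, token 0 also attends to a state that has seen the whole sequence.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over the given keys and values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [sum(a * b for a, b in zip(q, k)) * scale for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def causal_attention(qs: List[Vector], ks: List[Vector], vs: List[Vector],
                     reroute_last_kv: bool = False) -> List[Vector]:
    """Single-head causal attention over one layer's projected Q/K/V.

    With reroute_last_kv=True (hypothetical flag), the final token's
    (key, value) pair is prepended as a prefix visible to every position.
    The final token's own entry is not deduplicated in this sketch.
    """
    outputs = []
    for i, q in enumerate(qs):
        keys, values = ks[: i + 1], vs[: i + 1]  # causal mask: positions 0..i
        if reroute_last_kv:
            keys = [ks[-1]] + keys      # prefix: last token's key
            values = [vs[-1]] + values  # prefix: last token's value
        outputs.append(attend(q, keys, values))
    return outputs


# Toy 3-token sequence with 2-d projections (illustrative numbers only).
qs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
vs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

plain = causal_attention(qs, ks, vs)
rerouted = causal_attention(qs, ks, vs, reroute_last_kv=True)
print(plain[0])     # → [1.0, 0.0]   token 0 only sees itself
print(rerouted[0])  # → [0.75, 0.25] token 0 also sees the last token's KV
```

Because the prefix is taken from keys and values the same layer has already computed, no second pass over the input is needed in this sketch, which matches the "single forward pass" framing above.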

Contribution

The paper introduces KV-Embedding, an approach that draws better text embeddings from the internal states of frozen LLMs without any additional training.

Findings

01

Outperforms existing training-free baselines by up to 10%

02

Maintains robust performance on sequences up to 4,096 tokens

03

Demonstrates effectiveness across multiple LLM backbones (Qwen, Mistral, and Llama)

Abstract

While LLMs are powerful embedding backbones, their application in training-free settings faces two structural challenges: causal attention restricts early tokens from accessing subsequent context, and the next-token prediction objective biases representations toward generation rather than semantic compression. To address these limitations, we propose KV-Embedding, a framework that activates the latent representation power of frozen LLMs. Our method leverages the observation that the key-value (KV) states of the final token at each layer encode a compressed view of the sequence. By re-routing these states as a prepended prefix, we enable all tokens to access sequence-level context within a single forward pass. To ensure model-agnostic applicability, we introduce an automated layer selection strategy based on intrinsic dimensionality. Evaluations on MTEB across Qwen, Mistral, and Llama…
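The abstract names intrinsic dimensionality as the basis for automated layer selection, but this excerpt does not say which estimator or selection rule is used. The sketch below assumes the TwoNN estimator (Facco et al., 2017) in its maximum-likelihood form, plus a hypothetical `select_layer` rule that simply picks the layer with the lowest (or highest) estimate. Both are stand-ins for illustration, not the paper's procedure, and the two "layers" are synthetic point clouds rather than real hidden states.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def two_nn_intrinsic_dim(points: List[Point]) -> float:
    """TwoNN intrinsic-dimension estimate (maximum-likelihood form).

    For each point, mu = r2 / r1 is the ratio of its second- to first-nearest
    neighbour distance. Under the TwoNN model, log(mu) ~ Exponential(rate=d),
    so the MLE of d is N / sum(log mu).
    """
    log_ratios = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        if r1 > 0:  # skip exact duplicates, which would give an infinite ratio
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)


def select_layer(layer_states: List[List[Point]], lowest: bool = True) -> int:
    """Hypothetical rule: index of the layer with the lowest (or highest) ID."""
    ids = [two_nn_intrinsic_dim(states) for states in layer_states]
    target = min(ids) if lowest else max(ids)
    return ids.index(target)


# Synthetic stand-ins for two layers' token states (not real model data).
rng = random.Random(0)
ts = [rng.random() for _ in range(200)]
line = [(t, 2 * t) for t in ts]                               # 1-D manifold in 2-D
plane = [(rng.random(), rng.random()) for _ in range(200)]    # 2-D manifold

print(round(two_nn_intrinsic_dim(line), 2), round(two_nn_intrinsic_dim(plane), 2))
print(select_layer([plane, line]))  # → 1 (the line has the lower estimated ID)
```

The estimate for the line should come out near 1 and for the square near 2; with real LLM hidden states one would run the estimator on each layer's token representations and apply whatever selection criterion the paper actually specifies.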

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Generative Adversarial Networks and Image Synthesis