Causal2Vec: Improving Decoder-only LLMs as Embedding Models through a Contextual Token

Ailiang Lin; Zhuoyun Li; Yusong Wang; Kotaro Funakoshi; Manabu Okumura

arXiv:2507.23386·cs.CL·May 5, 2026

TL;DR

Causal2Vec improves decoder-only LLMs as embedding models by prepending a single Contextual token, produced by a lightweight BERT-style encoder, to the LLM input. It does so without altering the LLM architecture or adding significant computational overhead.

Contribution

The paper introduces a method that lets decoder-only LLMs produce contextualized embeddings through a lightweight pre-encoding step, without modifying the model architecture.

Findings

01

Achieves state-of-the-art performance on the MTEB benchmark among models trained solely on publicly available data.

02

Improves embeddings without introducing significant computational overhead.

03

Uses a lightweight BERT-style model to pre-encode the input text into a single Contextual token.

Abstract

Decoder-only large language models (LLMs) have been increasingly adopted to build embedding models for diverse tasks. To overcome the inherent limitations of causal attention in representation learning, many existing methods modify the attention mechanism to be bidirectional, potentially undermining LLMs' ability to extract semantic information acquired during pre-training. Meanwhile, leading unidirectional approaches often rely on extra input text to generate contextualized embeddings, inevitably increasing computational costs. In this work, we propose Causal2Vec, a general-purpose embedding model tailored to enhance the performance of decoder-only LLMs without altering their original architectures or introducing significant computational overhead. Specifically, we first employ a lightweight BERT-style model to pre-encode the input text into a single Contextual token, which is then…
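The pre-encoding step described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: it only shows, under stated assumptions, how a single contextual vector could be placed in front of an LLM's token embeddings. Under causal attention each position sees only earlier positions, so a vector at position 0 is visible to every later token. The mean pooling, the bias-free linear projection, and all names (mean_pool, linear, prepend_contextual_token) are assumptions made for illustration, and random numbers stand in for real model outputs.

```python
import random


def mean_pool(token_vectors):
    """Average a list of equal-length vectors into one vector.

    Assumption: mean pooling stands in for however the lightweight
    encoder condenses its outputs into a single vector.
    """
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]


def linear(vec, weight):
    """Apply a bias-free linear map; weight has shape (out_dim, in_dim).

    Assumption: a projection is needed because the encoder and the LLM
    may use different hidden sizes.
    """
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def prepend_contextual_token(encoder_states, llm_token_embeddings, projection):
    """Return the LLM input sequence with one contextual vector at position 0."""
    pooled = mean_pool(encoder_states)          # shape: (enc_dim,)
    contextual = linear(pooled, projection)     # shape: (llm_dim,)
    return [contextual] + llm_token_embeddings  # shape: (seq_len + 1, llm_dim)


# Toy data: random values replace real encoder and LLM embeddings.
rng = random.Random(0)
enc_dim, llm_dim, seq_len = 4, 6, 3
encoder_states = [[rng.uniform(-1, 1) for _ in range(enc_dim)] for _ in range(seq_len)]
llm_token_embeddings = [[rng.uniform(-1, 1) for _ in range(llm_dim)] for _ in range(seq_len)]
projection = [[rng.uniform(-1, 1) for _ in range(enc_dim)] for _ in range(llm_dim)]

llm_inputs = prepend_contextual_token(encoder_states, llm_token_embeddings, projection)
print(len(llm_inputs), len(llm_inputs[0]))  # prints "4 6"
```

The sequence grows by exactly one position regardless of input length, which is why this kind of prepending adds little cost compared with approaches that append extra input text.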
