Causal2Vec: Improving Decoder-only LLMs as Embedding Models through a Contextual Token
Ailiang Lin, Zhuoyun Li, Yusong Wang, Kotaro Funakoshi, Manabu Okumura

TL;DR
Causal2Vec enhances decoder-only LLMs as embedding models by prepending a contextual token, produced by a lightweight encoder, to the LLM input. This improves performance without altering the architecture or introducing significant computational overhead.
Contribution
It introduces a method to generate contextualized embeddings in decoder-only LLMs using a lightweight pre-encoding step, avoiding architecture modifications.
Findings
Achieves state-of-the-art performance on the MTEB benchmark among models trained on publicly available data.
Improves embedding quality without significant computational overhead.
Utilizes a lightweight BERT-style model for context encoding.
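A rough way to see why prepending one token is cheap is to count what the decoder-only LLM has to process. The sketch below compares a plain input, a hypothetical "extra input text" approach that feeds the text twice, and the same text plus one prepended Contextual token. The strategy names are illustrative, the attention count ignores per-layer constants, and the lightweight encoder's own forward pass is not included.

```python
def llm_input_length(n_tokens: int, strategy: str) -> int:
    """Tokens the decoder-only LLM processes for an n-token input (illustrative).

    'plain'            : the text alone
    'repeat_input'     : hypothetical extra-text approach that feeds the text twice
    'contextual_token' : the text plus one prepended Contextual token
    """
    if strategy == "plain":
        return n_tokens
    if strategy == "repeat_input":
        return 2 * n_tokens
    if strategy == "contextual_token":
        return n_tokens + 1
    raise ValueError(f"unknown strategy: {strategy}")


def causal_attention_pairs(seq_len: int) -> int:
    """(query, key) pairs under a causal mask: 1 + 2 + ... + seq_len."""
    return seq_len * (seq_len + 1) // 2
```

For a 512-token text, the LLM sees 513 tokens (131,841 causal attention pairs) with the Contextual token, versus 1,024 tokens (524,800 pairs) when the input is repeated. The extra cost shifts to the small encoder rather than lengthening the LLM sequence.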
Abstract
Decoder-only large language models (LLMs) have been increasingly adopted to build embedding models for diverse tasks. To overcome the inherent limitations of causal attention in representation learning, many existing methods modify the attention mechanism to be bidirectional, potentially undermining LLMs' ability to extract semantic information acquired during pre-training. Meanwhile, leading unidirectional approaches often rely on extra input text to generate contextualized embeddings, inevitably increasing computational costs. In this work, we propose Causal2Vec, a general-purpose embedding model tailored to enhance the performance of decoder-only LLMs without altering their original architectures or introducing significant computational overhead. Specifically, we first employ a lightweight BERT-style model to pre-encode the input text into a single Contextual token, which is then…
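To make the mechanism concrete, here is a toy Python sketch of prepending a single Contextual token to a causal model's input. Everything in it is hypothetical: `lightweight_encoder` is plain mean pooling standing in for the BERT-style model, `causal_layer` is one uniform-weight causal attention step standing in for the LLM, the encoder and the LLM share token vectors for simplicity, and the final embedding (concatenating the last hidden states of the Contextual token and the final token) is an assumed pooling choice.

```python
from typing import List, Tuple

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def lightweight_encoder(token_embs: List[Vector]) -> Vector:
    """Hypothetical stand-in for the BERT-style pre-encoder.

    Bidirectional: the single output vector depends on every input token.
    Here it is just mean pooling; the real model is a trained transformer.
    """
    return mean(token_embs)


def project(vec: Vector, weight: List[Vector]) -> Vector:
    """Map the encoder output into the LLM embedding space (weight: d_llm x d_enc)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def causal_layer(seq: List[Vector]) -> List[Vector]:
    """Toy causal attention with uniform weights: position i sees positions 0..i only."""
    return [mean(seq[: i + 1]) for i in range(len(seq))]


def causal2vec_style_embed(
    token_embs: List[Vector], weight: List[Vector]
) -> Tuple[Vector, List[Vector]]:
    """Prepend one Contextual token, run the causal layer, and pool.

    Assumption: the embedding concatenates the last hidden states of the
    Contextual token (position 0) and the final token.
    """
    contextual = project(lightweight_encoder(token_embs), weight)
    hidden = causal_layer([contextual] + token_embs)  # Contextual token goes first
    return hidden[0] + hidden[-1], hidden
```

With tokens A = [1.0, 0.0] and B = [0.0, 1.0] and an identity projection, the first text token's hidden state becomes [0.75, 0.25] instead of [1.0, 0.0]: it carries information about the later token B through the Contextual token, while the attention mask itself stays causal and the model's layers are unchanged.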
