LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders
Parishad BehnamGhader, Vaibhav Adlakha, Marius Mosbach, Dzmitry Bahdanau, Nicolas Chapados, Siva Reddy

TL;DR
This paper introduces LLM2Vec, a simple unsupervised method that transforms decoder-only large language models into strong, general-purpose text encoders. The transformed models outperform encoder-only models on word-level tasks and set a new unsupervised state of the art on the MTEB benchmark.
Contribution
The paper presents LLM2Vec, an approach that lets decoder-only LLMs serve as effective text encoders without expensive training or synthetic data, achieving state-of-the-art results among unsupervised models on MTEB.
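At the attention level, the change that lets a decoder-only model act as an encoder is small: LLM2Vec's first step replaces the causal mask with bidirectional (full) attention. The NumPy sketch below contrasts the two masks for a single toy attention head with random query and key vectors. The function name `attention_weights` and the shapes are illustrative assumptions, not code from the paper, and the abstract lists this switch as only the first of three steps.

```python
import numpy as np

def attention_weights(q, k, causal):
    """Scaled dot-product attention weights for one illustrative head.

    q, k: arrays of shape (seq_len, d). With causal=True, token i may only
    attend to tokens 0..i (decoder-style). With causal=False, every token
    attends to every token (encoder-style, as in LLM2Vec's first step).
    This is a toy illustration, not the paper's implementation.
    """
    seq_len, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    if causal:
        # Block future positions: entries strictly above the diagonal.
        future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    # Numerically stable row-wise softmax (the diagonal is never masked,
    # so every row has a finite maximum).
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
q = rng.normal(size=(4, 8))
k = rng.normal(size=(4, 8))
causal_w = attention_weights(q, k, causal=True)
bidir_w = attention_weights(q, k, causal=False)
print(np.round(causal_w, 2))  # upper triangle is exactly zero
print(np.round(bidir_w, 2))   # every entry is non-zero
```

In the causal case the first token can only see itself, so its row is all weight on position 0; in the bidirectional case every token's representation can draw on the whole sequence, which is what an embedding model needs.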
Findings
Outperforms encoder-only models by a large margin on word-level tasks
Achieves a new unsupervised state of the art on the MTEB benchmark
Effective parameter-efficient transformation of LLMs into text encoders
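The findings above do not name the parameter-efficient technique. As an illustration only, here is a minimal sketch of low-rank adaptation (LoRA), one widely used option, assuming a frozen pretrained weight matrix plus a trainable rank-`r` update. The class `LoRALinear` and its `alpha` scaling are hypothetical choices for this sketch; it shows why such training is cheap, since only two small factors are updated.

```python
import numpy as np

class LoRALinear:
    """Hypothetical LoRA-style layer: frozen weight W plus trainable B @ A.

    Illustrative only; the paper's adapter settings are not given here.
    """

    def __init__(self, weight, rank, alpha, rng):
        out_dim, in_dim = weight.shape
        self.weight = weight  # frozen pretrained weight, never updated
        self.A = rng.normal(scale=0.01, size=(rank, in_dim))  # trainable
        # Trainable and zero-initialised, so the layer starts unchanged.
        self.B = np.zeros((out_dim, rank))
        self.scale = alpha / rank

    def __call__(self, x):
        # Base projection plus the scaled low-rank correction.
        return x @ self.weight.T + self.scale * (x @ self.A.T @ self.B.T)

    def trainable_params(self):
        return self.A.size + self.B.size

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 512))
layer = LoRALinear(W, rank=8, alpha=16, rng=rng)
x = rng.normal(size=(2, 512))
print(np.allclose(layer(x), x @ W.T))    # True
print(layer.trainable_params(), W.size)  # 8192 262144
```

Because `B` starts at zero, the adapted layer initially reproduces the pretrained one exactly; for this 512×512 layer at rank 8, only 8,192 of 262,144 weights (about 3%) would be trained.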
Abstract
Large decoder-only language models (LLMs) are the state-of-the-art models on most of today's NLP tasks and benchmarks. Yet, the community is only slowly adopting these models for text embedding tasks, which require rich contextualized representations. In this work, we introduce LLM2Vec, a simple unsupervised approach that can transform any decoder-only LLM into a strong text encoder. LLM2Vec consists of three simple steps: 1) enabling bidirectional attention, 2) masked next token prediction, and 3) unsupervised contrastive learning. We demonstrate the effectiveness of LLM2Vec by applying it to 4 popular LLMs ranging from 1.3B to 8B parameters and evaluate the transformed models on English word- and sequence-level tasks. We outperform encoder-only models by a large margin on word-level tasks and reach a new unsupervised state-of-the-art performance on the Massive Text Embeddings…
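Steps 2 and 3 can be sketched as two small data-and-loss functions, assuming NumPy arrays in place of a real model. In masked next token prediction, some input tokens are replaced by a mask token and the model must recover the masked token at position i from its output at position i − 1, matching how the decoder was pretrained. Unsupervised contrastive learning follows SimCSE: the same batch is encoded twice with different dropout masks, and each embedding must identify its own second view among the in-batch alternatives. `MASK_ID`, the always-replace masking rule, the ignore label -100 (PyTorch's default `ignore_index` for cross-entropy), and the temperature of 0.05 are assumptions for illustration, not the paper's settings.

```python
import numpy as np

MASK_ID = 0  # hypothetical id of the mask token

def mntp_inputs_and_targets(token_ids, mask_prob, rng):
    """Build one masked-next-token-prediction example (simplified sketch).

    Returns (inputs, targets). targets[j] holds the original id of the token
    at position j + 1 if that token was masked, else -100 (ignored), so the
    output at position i - 1 is trained to predict the masked token at i.
    """
    ids = np.asarray(token_ids)
    masked = rng.random(ids.shape) < mask_prob
    masked[0] = False  # position 0 has no predecessor to predict it from
    inputs = np.where(masked, MASK_ID, ids)
    targets = np.full(ids.shape, -100)
    targets[:-1] = np.where(masked[1:], ids[1:], -100)
    return inputs, targets

def simcse_loss(z1, z2, temperature=0.05):
    """Unsupervised SimCSE (InfoNCE) loss over a batch.

    z1, z2: (batch, dim) embeddings of the same texts from two forward passes
    with different dropout masks. Row i of z2 is the positive for row i of
    z1; all other rows act as in-batch negatives.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
inputs, targets = mntp_inputs_and_targets([11, 12, 13, 14, 15], mask_prob=0.4, rng=rng)
base = rng.normal(size=(8, 16))
view1 = base + 0.1 * rng.normal(size=base.shape)  # stand-in for dropout noise
view2 = base + 0.1 * rng.normal(size=base.shape)
print(inputs, targets)
print(simcse_loss(view1, view2), simcse_loss(view1, np.roll(view2, 1, axis=0)))
```

The loss on correctly paired views should be lower than on a shuffled pairing, which is the signal that pulls two views of the same text together.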
Code & Models
- McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp (model, 6.3k downloads, 12 likes)
- McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp-unsup-simcse (model, 3.5k downloads, 7 likes)
- McGill-NLP/LLM2Vec-Mistral-7B-Instruct-v2-mntp-supervised (model, 328 downloads, 13 likes)
- McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp (model, 54 downloads)
- McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp-unsup-simcse (model, 11 downloads)
- McGill-NLP/LLM2Vec-Llama-2-7b-chat-hf-mntp-supervised (model, 86 downloads, 3 likes)
- McGill-NLP/LLM2Vec-Sheared-LLaMA-mntp (model, 251 downloads, 6 likes)
- McGill-NLP/LLM2Vec-Sheared-LLaMA-mntp-unsup-simcse (model, 25 downloads, 1 like)
- McGill-NLP/LLM2Vec-Sheared-LLaMA-mntp-supervised (model, 145 downloads, 5 likes)
- McGill-NLP/LLM2Vec-Meta-Llama-3-8B-Instruct-mntp-supervised (model, 26k downloads, 50 likes)
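The checkpoints above produce one hidden vector per token, so a single text embedding still needs a pooling step. Below is a minimal sketch, assuming mean pooling over non-padding tokens followed by cosine-similarity ranking; toy arrays stand in for real model outputs, and which pooling each checkpoint expects is an assumption to check against its model card. The helpers `mean_pool` and `cosine_rank` are hypothetical and not part of any published API.

```python
import numpy as np

def mean_pool(hidden_states, attention_mask):
    """Average token vectors while ignoring padding.

    Assumption: mean pooling; check each checkpoint's model card.
    hidden_states: (batch, seq_len, dim); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask[..., None].astype(hidden_states.dtype)
    summed = (hidden_states * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts

def cosine_rank(query, docs):
    """Rank document embeddings by cosine similarity to a query embedding."""
    q = query / np.linalg.norm(query)
    d = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    scores = d @ q
    return np.argsort(-scores), scores

hidden = np.array([[[1.0, 0.0], [3.0, 0.0], [100.0, 100.0]]])  # last token is padding
mask = np.array([[1, 1, 0]])
print(mean_pool(hidden, mask))  # [[2. 0.]]

order, scores = cosine_rank(np.array([1.0, 0.0]),
                            np.array([[0.0, 1.0], [1.0, 0.1], [1.0, 1.0]]))
print(order)  # [1 2 0]
```

Masking before averaging matters: without it, the padding vector in the toy batch would dominate the mean.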
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling
Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Dense Connections · Label Smoothing · Residual Connection · Multi-Head Attention · Adam · Dropout · Softmax
