LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

Parishad BehnamGhader; Vaibhav Adlakha; Marius Mosbach; Dzmitry Bahdanau; Nicolas Chapados; Siva Reddy

arXiv:2404.05961·cs.CL·August 23, 2024·21 cites

TL;DR

This paper introduces LLM2Vec, a simple unsupervised method that turns decoder-only large language models into text encoders. The transformed models outperform encoder-only models on word-level tasks and set a new unsupervised state of the art on MTEB.

Contribution

The paper presents LLM2Vec, a three-step recipe (bidirectional attention, masked next token prediction, and unsupervised contrastive learning) that lets decoder-only LLMs serve as effective text encoders without expensive training or synthetic data.

Findings

01 Outperforms encoder-only models by a large margin on word-level tasks

02 Reaches a new unsupervised state of the art on the MTEB benchmark

03 Transforms LLMs into text encoders in a parameter-efficient way

Abstract

Large decoder-only language models (LLMs) are the state-of-the-art models on most of today's NLP tasks and benchmarks. Yet, the community is only slowly adopting these models for text embedding tasks, which require rich contextualized representations. In this work, we introduce LLM2Vec, a simple unsupervised approach that can transform any decoder-only LLM into a strong text encoder. LLM2Vec consists of three simple steps: 1) enabling bidirectional attention, 2) masked next token prediction, and 3) unsupervised contrastive learning. We demonstrate the effectiveness of LLM2Vec by applying it to 4 popular LLMs ranging from 1.3B to 8B parameters and evaluate the transformed models on English word- and sequence-level tasks. We outperform encoder-only models by a large margin on word-level tasks and reach a new unsupervised state-of-the-art performance on the Massive Text Embeddings…
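The three steps named in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: all function names are hypothetical, the one-position shift in `mntp_pairs` is an assumption about how masked next token prediction is scored (the abstract does not spell it out), and the temperature value is purely illustrative.

```python
import math


def causal_mask(n):
    """Decoder-style mask: position i may attend to position j only if j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Step 1: every position may attend to every other position."""
    return [[True] * n for _ in range(n)]


def mntp_pairs(masked_positions):
    """Step 2 (assumed variant): pair each masked position i with position i - 1.

    Each pair is (source, target): the logits at `source` are scored against
    the original token at `target`, keeping the next-token shift of a decoder.
    Position 0 has no predecessor and is skipped.
    """
    return [(i - 1, i) for i in sorted(masked_positions) if i > 0]


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(view_a, view_b, temperature=0.05):
    """Step 3: contrastive loss with in-batch negatives.

    view_a[i] and view_b[i] are two embeddings of the same text (for example,
    two forward passes with different dropout); all other rows of view_b act
    as negatives for view_a[i].
    """
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)
```

Calling `info_nce` with correctly paired views should give a lower loss than with the pairing shuffled, which is the signal the contrastive step trains on.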

Code & Models

Repositories

mcgill-nlp/llm2vec (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Dense Connections · Label Smoothing · Residual Connection · Multi-Head Attention · Adam · Dropout · Softmax