GEM: Empowering LLM for both Embedding Generation and Language Understanding

Caojin Zhang; Qiang Zhang; Ke Li; Sai Vidyaranya Nuthalapati; Benyu Zhang; Jason Liu; Serena Li; Lizhu Zhang; Xiangjun Fan

arXiv:2506.04344 · cs.CL · June 6, 2025

Open Access

TL;DR

This paper introduces GEM, a simple self-supervised method that enables large decoder-only LLMs to generate high-quality text embeddings without sacrificing their original generation and reasoning abilities.

Contribution

GEM is a novel approach that inserts special tokens and manipulates attention masks, allowing any decoder-only LLM to produce effective text embeddings during post-training or fine-tuning.

Findings

01. Significantly improves LLMs on text embedding benchmarks (MTEB).

02. Maintains original NLP performance on benchmarks like MMLU.

03. Applicable to models ranging from 1B to 8B parameters.

Abstract

Large decoder-only language models (LLMs) have achieved remarkable success in generation and reasoning tasks, where they generate text responses given instructions. However, many applications, e.g., retrieval-augmented generation (RAG), still rely on separate embedding models to generate text embeddings, which can complicate the system and introduce discrepancies between how the embedding model and the LLM understand the query. To address this limitation, we propose a simple self-supervised approach, Generative Embedding large language Model (GEM), that enables any large decoder-only LLM to generate high-quality text embeddings while maintaining its original text generation and reasoning capabilities. Our method inserts new special token(s) into a text body and generates a summarization embedding of the text by manipulating the attention mask. This method could be easily integrated…
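The abstract says that special tokens are inserted into the text and that the attention mask is manipulated, but it does not spell out the mask pattern. The sketch below illustrates one plausible reading and should not be taken as the authors' implementation: starting from an ordinary causal mask, tokens that come after the special tokens are blocked from attending to the text before them, so any information about that earlier text has to pass through the special tokens. The function names `build_gem_style_mask` and `render`, the three-segment layout (prefix, special, suffix), and the blocking rule itself are all assumptions made for illustration.

```python
def build_gem_style_mask(n_prefix, n_special, n_suffix):
    """Return allowed[q][k]: True if query position q may attend to key position k.

    Assumed layout (hypothetical, not taken from the paper):
    [prefix text tokens][special tokens][suffix text tokens]
    """
    n = n_prefix + n_special + n_suffix
    special_start = n_prefix
    suffix_start = n_prefix + n_special

    # Start from a standard causal mask: each position sees itself and the past.
    allowed = [[k <= q for k in range(n)] for q in range(n)]

    # Assumed bottleneck rule: suffix tokens may not look at prefix tokens
    # directly, only at the special tokens and at earlier suffix tokens.
    for q in range(suffix_start, n):
        for k in range(special_start):
            allowed[q][k] = False
    return allowed


def render(mask):
    """Render the mask as rows of 1 (may attend) and 0 (blocked)."""
    return "\n".join("".join("1" if a else "0" for a in row) for row in mask)


# 3 prefix tokens, 2 special tokens, 2 suffix tokens.
mask = build_gem_style_mask(3, 2, 2)
print(render(mask))
```

Under this assumed rule, the special tokens (rows 3 and 4) still see the whole prefix, while the suffix rows have zeros over the prefix columns. In a setup like this, the hidden states at the special-token positions would be the natural candidates for the text embedding.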

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks

Methods: Softmax · Attention Is All You Need