Repetition Improves Language Model Embeddings

Jacob Mitchell Springer; Suhas Kotha; Daniel Fried; Graham Neubig; Aditi Raghunathan

arXiv:2402.15449 · cs.CL · September 9, 2025 · 1 citation

TL;DR

This paper introduces 'echo embeddings', a method that transforms autoregressive language models into high-quality text embedding models without architectural changes or fine-tuning, by simply repeating inputs.

Contribution

The paper proposes a novel 'echo embeddings' technique that enables autoregressive models to produce strong text embeddings without modifying their architecture or requiring additional training.

Findings

01. Echo embeddings outperform classical LM embeddings by over 5% in zero-shot settings.

02. Zero-shot echo embeddings nearly match those from bidirectionally converted LMs that undergo additional masked-language-modeling training.

03. Echo embeddings are compatible with supervised fine-tuning, matching or surpassing bidirectional models.

Abstract

Bidirectional models are considered essential for strong text embeddings. Recent approaches to adapting autoregressive language models (LMs) into strong text embedding models have largely required modifying the LM architecture to be bidirectional. We challenge this premise by introducing "echo embeddings", which convert autoregressive LMs into high-quality text embedding models without changing the architecture or requiring fine-tuning. By repeating the input and extracting embeddings from the repeated tokens -- which have access to all original tokens -- echo embeddings improve over classical LM embeddings by over 5% in zero-shot settings. Our zero-shot embeddings nearly match those obtained by bidirectionally-converted LMs that undergo additional masked-language modeling training. Echo embeddings are also compatible with supervised fine-tuning, matching or outperforming…
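The core idea in the abstract, that tokens in the repeated copy can attend to every original token under a causal mask, can be illustrated with a small sketch. This is not the authors' implementation: it uses no language model, and the helper names (`causal_visible`, `echo_layout`, `sees_whole_input`) are hypothetical. It only checks which positions are visible to which tokens, assuming a standard causal mask in which each token attends to itself and everything before it.

```python
def causal_visible(position):
    """Positions a token can attend to under a causal mask: itself and all earlier ones."""
    return set(range(position + 1))


def echo_layout(tokens):
    """Repeat the input once; return the repeated sequence and the indices of the second copy."""
    n = len(tokens)
    repeated = tokens + tokens
    second_copy = list(range(n, 2 * n))
    return repeated, second_copy


def sees_whole_input(position, n):
    """True if the token at `position` can attend to every original token 0..n-1."""
    return set(range(n)) <= causal_visible(position)


tokens = ["the", "cat", "sat"]
n = len(tokens)
repeated, pool_idx = echo_layout(tokens)

# Classical embeddings pool over the single copy: early tokens miss later context.
classical_coverage = [sees_whole_input(i, n) for i in range(n)]

# Echo embeddings pool over the second copy: every pooled token sees the full input.
echo_coverage = [sees_whole_input(i, n) for i in pool_idx]

print(classical_coverage)
print(echo_coverage)
```

In an actual model, the pooled vectors would be hidden states taken at the `pool_idx` positions; the sketch covers only the visibility argument, not how the embeddings are computed or prompted.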

Peer Reviews

Decision · ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. The echo embedding method is both easy and effective. While previous studies have demonstrated that repetition is beneficial for reasoning tasks and recurrent language models, this paper shows that it is also effective for causal language model embedding.
2. The paper is clearly written and easy to understand.
3. The use of a simple synthetic dataset to analyze why causal attention might inhibit embeddings from reliably capturing information across the entire context is interesting.

Weaknesses

The echo embedding method inevitably doubles the input length. Although experiments show that halving the input length and the number of training steps still yields good results, this approach may not be suitable when important information is located in the latter half of the input context, as in the S2 (early redundant; late discriminatory) cases described in Section 3.1 of the paper. Additionally, because self-attention has a computational complexity of O(n^2) with respect to i

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. I appreciate the toy experiment, which clearly supports their claim about the limitation of classical embeddings and the advantages of echo embeddings.
2. The results on the MTEB dataset show clear improvements over classical embedding extraction settings, achieving comparable results with LLM2Vec, which needs backbone changes and unsupervised fine-tuning.
3. The method itself is very simple and insightful, requiring no changes to the backbone.

Weaknesses

1. The setting of the most relevant baseline, PromptEOL, does not seem to exactly align with that in the original paper. The results of PromptEOL appear significantly different from those reported in the original paper. In the original study, PromptEOL achieved an average score of 72.10 across seven STS tasks using the OPT-6.7B model. However, in your paper, PromptEOL only obtains an average of 67.14 on ten STS tasks. I didn't expect such a big performance discrepancy. Is this because of the thr

Reviewer 03 · Rating 6 · Confidence 4

Strengths

1. The proposed method is simple: it repeats the input sentence twice to obtain the text embeddings.
2. The toy example design is interesting.
3. The zero-shot results are impressive.

Weaknesses

1. The motivation regarding causal attention seems questionable. Although LLM2Vec utilizes causal attention, it still performs exceptionally well in extracting text embeddings.
2. After fine-tuning the model, the performance gap between "echo embedding" and other models is minor. However, "echo embedding" requires the input sentence to be repeated twice, increasing computational costs. This limitation confines the proposed method to zero-shot settings only.
3. At least one illustrative example

Code & Models

Models

jspringer/echo-mistral-7b-instruct-lasttoken (Hugging Face model · 141 downloads · 6 likes)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems