AnglE-optimized Text Embeddings

Xianming Li; Jing Li

arXiv:2309.12871 · cs.CL · January 3, 2025 · 21 cites

Open Access · 2 Repos · 10 Models · 1 Dataset · 4 Reviews

TL;DR

This paper introduces AnglE, a novel angle-optimized text embedding model that addresses vanishing gradient issues caused by cosine saturation, improving semantic textual similarity tasks across various datasets and domains.

Contribution

AnglE is the first model to incorporate angle optimization in complex space to mitigate cosine saturation effects in text embeddings.
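To make "angle optimization in complex space" concrete, here is a minimal sketch of how an angle difference between two real-valued embeddings could be computed. It assumes that a real vector is turned into complex numbers by treating its first half as real parts and its second half as imaginary parts; this page does not state that convention, and the helper names `to_complex` and `angle_difference` are hypothetical. It illustrates the idea only and is not the paper's exact training objective.

```python
import cmath
import math


def to_complex(vec):
    """Hypothetical helper: first half = real parts, second half = imaginary parts."""
    half = len(vec) // 2
    return [complex(re, im) for re, im in zip(vec[:half], vec[half:])]


def angle_difference(x, y):
    """Mean absolute phase of z / w over all complex components, in radians.

    Assumes no component of y maps to the complex number 0 (division by zero).
    """
    zs, ws = to_complex(x), to_complex(y)
    # phase(z / w) equals phase(z) - phase(w), wrapped into (-pi, pi]
    return sum(abs(cmath.phase(z / w)) for z, w in zip(zs, ws)) / len(zs)


x = [1.0, 0.0, 0.0, 1.0]  # complex components: 1+0j, 0+1j
y = [0.0, 1.0, 1.0, 0.0]  # complex components: 0+1j, 1+0j
print(angle_difference(x, x))  # identical inputs: no angle difference
print(angle_difference(x, y))  # each component is rotated by a quarter turn
```

Unlike the cosine, the angle difference is used here directly as the quantity to compare, so it does not flatten out for nearly identical or nearly opposite inputs.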

Findings

01. AnglE outperforms state-of-the-art STS models on multiple datasets.

02. Angle optimization effectively mitigates cosine saturation issues.

03. The model is effective in domain-specific and limited data scenarios.

Abstract

High-quality text embedding is pivotal in improving semantic textual similarity (STS) tasks, which are crucial components in Large Language Model (LLM) applications. However, a common challenge that existing text embedding models face is the problem of vanishing gradients, primarily due to their reliance on the cosine function in the optimization objective, which has saturation zones. To address this issue, this paper proposes a novel angle-optimized text embedding model called AnglE. The core idea of AnglE is to introduce angle optimization in a complex space. This novel approach effectively mitigates the adverse effects of the saturation zone in the cosine function, which can impede gradients and hinder the optimization process. To set up a comprehensive STS evaluation, we experimented on existing short-text STS datasets and a newly collected long-text STS dataset from GitHub Issues.…
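The saturation effect described in the abstract can be checked numerically. The sketch below is an illustration rather than code from the paper: it takes two unit vectors in the plane separated by an angle θ and computes the norm of the gradient of their cosine similarity with respect to the first vector. For unit vectors this gradient is v − cos(θ)·u, whose norm equals sin(θ), so it shrinks toward zero as θ approaches 0° or 180°. The function name `cosine_and_grad_norm` is hypothetical.

```python
import math


def cosine_and_grad_norm(theta):
    """Return (cosine similarity, gradient norm) for unit vectors at angle theta.

    u = (1, 0) and v = (cos theta, sin theta); the gradient is taken w.r.t. u.
    """
    u = (1.0, 0.0)
    v = (math.cos(theta), math.sin(theta))
    cos_sim = u[0] * v[0] + u[1] * v[1]
    # For unit-norm u and v: d cos_sim / d u = v - cos_sim * u
    grad = (v[0] - cos_sim * u[0], v[1] - cos_sim * u[1])
    return cos_sim, math.hypot(grad[0], grad[1])


for deg in (1, 45, 90, 135, 179):
    cos_sim, grad_norm = cosine_and_grad_norm(math.radians(deg))
    print(f"{deg:>3} deg  cos={cos_sim:+.4f}  |grad|={grad_norm:.4f}")
```

The gradient norm peaks at 90° and is close to zero at 1° and 179°, which is where pairs that are already very similar or very dissimilar end up during training.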

Peer Reviews

Decision · ICLR 2024 Conference Withdrawn Submission

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

* The paper addresses an important issue in optimizing the cosine similarity of learning text embeddings, and the proposed method is interesting and novel.
* It introduces the GitHub Issues Similarity Dataset as a testbed for evaluating model performance on long-text STS tasks.
* The proposed method achieves promising results on a wide range of STS tasks.

Weaknesses

* Some technical details are not clearly explained. For example, while the angle objective optimizes the text representations in a complex space, it's unclear how these complex vectors are obtained as the representations from language models are real vectors.
* The paper seems to have missed discussions with a few important related studies. For example, [1] addresses the gradient vanishing issue by incorporating cosine distance in learning text embeddings, [2] designs angular softmax objectives

Reviewer 02 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

1. The proposed method of calculating similarity looks novel to me.
2. The impact of the method has the potential to be significant in many fields.

Weaknesses

1. According to the paper, the motivation for introducing a complex space is to deal with the vanishing gradient of cos. In this sense, it would be great if techniques like gradient clipping and gradient normalization could be compared.
2. The writing can be improved. E.g., section 3.4 is a bit confusing to me. See my questions below.
3. I am also worried about the empirical significance. In table 2, the proposed method only improves the performance marginally (<1%) compared to SimCSE-BERT. I

Reviewer 03 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

1. This paper proposed a novel angle-optimized target to enhance the learning ability of contrastive learning-based representation learning models, which tries to alleviate the problem of vanishing gradients.
2. This paper developed a novel long-text STS dataset to better evaluate the performance of representation learning models.
3. This paper also explored LLM-based supervised data generation and contrastive learning, which is very interesting.

Weaknesses

1. First of all, the authors argued that the gradient vanishing problem is caused by the saturation zones of the cosine function in the optimization target. However, as far as I know, the gradient vanishing problem is mainly due to the deep structure. The saturation zones can be used to prove the high similarity between sentences. Therefore, the motivation of this paper is not so convincing. More explanations are needed.
2. Second, the authors focused on contrastive learning target, which limits the a

Reviewer 04 · Rating 5 (marginally below the acceptance threshold) · Confidence 4

Strengths

1. This paper identifies an interesting research question: the gradient vanishing problem appearing at the saturation zone of the cosine function influences the quality of text embeddings.
2. The proposed solution of using angle difference for optimization is original and novel.
3. Experiments on the semantic textual similarity task are sufficiently conducted.

Weaknesses

Despite an appealing motivation and an interesting solution, I still have the following concerns:
1. From my point of view, the only technical contribution of this paper is to design how to evaluate angle difference. This contribution is indeed interesting, but is a bit superficial and insufficient for a long research paper of ICLR standard. I expect the authors to propose more insightful designs to better solve the gradient vanishing problem.
2. The explanation of why saturation zone in cosin

Code & Models

Datasets

WhereIsAI/github-issue-similarity
dataset· 189 dl

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies