Improving General Text Embedding Model: Tackling Task Conflict and Data Imbalance through Model Merging

Mingxin Li; Zhijie Nie; Yanzhao Zhang; Dingkun Long; Richong Zhang; Pengjun Xie

arXiv:2410.15035·cs.CL·October 22, 2024

Open Access

TL;DR

This paper introduces a model merging technique called Self Positioning to improve general text embedding models by addressing task conflict and data imbalance, leading to better multi-task performance.

Contribution

It proposes a novel model merging method, Self Positioning, that searches for an optimal combination of independently trained models to improve general-purpose text embeddings.
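Model merging works directly on weights: each independently trained model is expressed as a "task vector" (its parameters minus the shared base parameters), and the merged model adds a weighted sum of those vectors back onto the base. The sketch below shows only that generic task-vector interpolation, assuming models are plain dicts of float lists; the helpers `task_vector` and `merge` are hypothetical. It is not the paper's Self Positioning procedure, whose details this page does not give, and the weights here are fixed by hand rather than searched for.

```python
from typing import Dict, List

# Hypothetical toy representation: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def task_vector(finetuned: Params, base: Params) -> Params:
    """Per-parameter difference between a fine-tuned model and its base."""
    return {k: [f - b for f, b in zip(finetuned[k], base[k])] for k in base}


def merge(base: Params, experts: List[Params], weights: List[float]) -> Params:
    """Weighted interpolation: theta = theta_0 + sum_i w_i * (theta_i - theta_0)."""
    if len(experts) != len(weights):
        raise ValueError("need exactly one weight per expert model")
    merged = {k: list(v) for k, v in base.items()}  # copy so base is untouched
    for expert, w in zip(experts, weights):
        for k, delta in task_vector(expert, base).items():
            merged[k] = [m + w * d for m, d in zip(merged[k], delta)]
    return merged


if __name__ == "__main__":
    base = {"w": [0.0, 0.0]}
    retrieval_model = {"w": [1.0, 0.0]}  # toy "retrieval" expert
    sts_model = {"w": [0.0, 2.0]}        # toy "STS" expert
    print(merge(base, [retrieval_model, sts_model], [0.5, 0.5]))
    # prints {'w': [0.5, 1.0]}
```

Because each expert is trained separately, their gradients never interact during training; balancing between tasks happens only through the merge weights.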

Findings

01

Self Positioning improves MTEB multi-task performance by 0.7 points.

02

The method outperforms resampling techniques in efficiency and effectiveness.

03

Model merging reduces negative transfer caused by task conflict and data imbalance.
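Finding 02 compares Self Positioning with resampling, which tackles data imbalance by changing how often each task's data is drawn during joint training. The page does not say which resampling scheme served as the baseline; as an assumed, commonly used example, temperature-based sampling draws task i with probability proportional to n_i^(1/T), where n_i is the task's dataset size. The helper `sampling_probs` below is hypothetical.

```python
from typing import Dict


def sampling_probs(sizes: Dict[str, int], temperature: float = 1.0) -> Dict[str, float]:
    """Temperature-based task sampling: p_i is proportional to n_i ** (1 / T).

    T = 1 keeps the natural (imbalanced) proportions; larger T flattens
    the distribution toward uniform sampling across tasks.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {task: n ** (1.0 / temperature) for task, n in sizes.items()}
    total = sum(scaled.values())
    return {task: s / total for task, s in scaled.items()}


if __name__ == "__main__":
    sizes = {"retrieval": 900, "sts": 100}  # toy dataset sizes
    print(sampling_probs(sizes, temperature=1.0))  # roughly 0.9 / 0.1
    print(sampling_probs(sizes, temperature=2.0))  # roughly 0.75 / 0.25
```

Resampling still trains every task in one joint run, so it can rebalance data but does not remove gradient interference between tasks.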

Abstract

Text embeddings are vital for tasks such as text retrieval and semantic textual similarity (STS). Recently, the advent of pretrained language models, along with unified benchmarks like the Massive Text Embedding Benchmark (MTEB), has facilitated the development of versatile general-purpose text embedding models. Advanced embedding models are typically developed using large-scale multi-task data and joint training across multiple tasks. However, our experimental analysis reveals two significant drawbacks of joint training: 1) Task Conflict: Gradients from different tasks interfere with each other, leading to negative transfer. 2) Data Imbalance: Disproportionate data distribution introduces biases that negatively impact performance across tasks. To overcome these challenges, we explore model merging, a technique that combines independently trained models to mitigate gradient conflicts and…
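Task conflict, as described above, means that gradients from different tasks interfere on the shared parameters. One common diagnostic, assumed here and not necessarily the analysis the authors ran, is the cosine similarity between two tasks' gradients: a negative value means a step that lowers one task's loss pushes against the other. The helpers `cosine` and `conflicts` and the gradient values are hypothetical toy inputs.

```python
import math
from typing import Sequence


def cosine(g1: Sequence[float], g2: Sequence[float]) -> float:
    """Cosine similarity between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(g1, g2))
    n1 = math.sqrt(sum(a * a for a in g1))
    n2 = math.sqrt(sum(b * b for b in g2))
    if n1 == 0.0 or n2 == 0.0:
        return 0.0  # a zero gradient neither helps nor conflicts
    return dot / (n1 * n2)


def conflicts(g1: Sequence[float], g2: Sequence[float]) -> bool:
    """Flag a conflict when the two task gradients point in opposing directions."""
    return cosine(g1, g2) < 0.0


if __name__ == "__main__":
    g_retrieval = [1.0, 0.5]  # toy gradient from a retrieval batch
    g_sts = [-0.8, 0.2]       # toy gradient from an STS batch
    print(round(cosine(g_retrieval, g_sts), 3), conflicts(g_retrieval, g_sts))
```

In joint training these two gradients are summed into one update, so the STS term partly cancels the retrieval direction; training the tasks separately and merging afterwards avoids that summation.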
