Improving General Text Embedding Model: Tackling Task Conflict and Data Imbalance through Model Merging
Mingxin Li, Zhijie Nie, Yanzhao Zhang, Dingkun Long, Richong Zhang, Pengjun Xie

TL;DR
This paper introduces a model merging technique called Self Positioning to improve general text embedding models by addressing task conflict and data imbalance, leading to better multi-task performance.
Contribution
It proposes a novel model merging method, Self Positioning, that optimally combines independently trained models to enhance general text embeddings.
Findings
Self Positioning improves MTEB multi-task performance by 0.7 points.
The method outperforms resampling techniques in efficiency and effectiveness.
Model merging reduces negative transfer caused by task conflict and data imbalance.
Abstract
Text embeddings are vital for tasks such as text retrieval and semantic textual similarity (STS). Recently, the advent of pretrained language models, along with unified benchmarks like the Massive Text Embedding Benchmark (MTEB), has facilitated the development of versatile general-purpose text embedding models. Advanced embedding models are typically developed using large-scale multi-task data and joint training across multiple tasks. However, our experimental analysis reveals two significant drawbacks of joint training: 1) Task Conflict: Gradients from different tasks interfere with each other, leading to negative transfer. 2) Data Imbalance: Disproportionate data distribution introduces biases that negatively impact performance across tasks. To overcome these challenges, we explore model merging, a technique that combines independently trained models to mitigate gradient conflicts and…
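To make the idea of combining independently trained models concrete, here is a minimal sketch of one generic merging scheme: a weighted sum of "task vectors" (each fine-tuned model's parameters minus a shared base model's parameters). It is not the paper's Self Positioning method, and the abstract above does not say how the merging coefficients are chosen. The function name `merge_task_vectors`, the parameter names, the toy values, and the hand-picked weights are all hypothetical.

```python
from typing import Dict, List

# Toy stand-in for a model checkpoint: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def merge_task_vectors(base: Params, finetuned: List[Params], weights: List[float]) -> Params:
    """Return base + sum_i weights[i] * (finetuned[i] - base), parameter by parameter.

    Assumption: every model shares the same architecture and was fine-tuned
    from the same base checkpoint, so parameters line up by name and index.
    """
    if len(finetuned) != len(weights):
        raise ValueError("need one weight per fine-tuned model")
    merged: Params = {}
    for name, base_values in base.items():
        merged_values = list(base_values)  # start from the base parameters
        for model, weight in zip(finetuned, weights):
            for i, (tuned, start) in enumerate(zip(model[name], base_values)):
                # Add this model's scaled task vector (tuned - start).
                merged_values[i] += weight * (tuned - start)
        merged[name] = merged_values
    return merged


# Hypothetical checkpoints: one model trained only on retrieval, one only on STS.
base = {"encoder.w": [0.0, 1.0], "encoder.b": [0.5]}
retrieval_model = {"encoder.w": [1.0, 1.0], "encoder.b": [0.5]}
sts_model = {"encoder.w": [0.0, 3.0], "encoder.b": [1.5]}

# Hand-picked weights for illustration only.
merged = merge_task_vectors(base, [retrieval_model, sts_model], weights=[0.5, 0.5])
print(merged)
```

Because each model is trained separately, gradients from different tasks never meet during training; the only interaction happens in this final combination step, where the weights control how much each task contributes.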
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling
