CoDiEmb: A Collaborative yet Distinct Framework for Unified Representation Learning in Information Retrieval and Semantic Textual Similarity

Bowen Zhang; Zixin Song; Chunquan Chen; Qian-Wen Zhang; Di Yin; Xing Sun

arXiv:2508.11442·cs.CL·September 30, 2025

TL;DR

CoDiEmb is a novel framework that effectively learns unified text embeddings for both Information Retrieval and Semantic Textual Similarity tasks by decoupling task-specific signals and employing a dynamic, collaborative training strategy.

Contribution

It introduces a decoupled, task-specific training approach with a delta-guided model fusion and a single-stage pipeline for improved multi-task embedding learning.

Findings

01

Outperforms baseline models on 15 IR and STS benchmarks.

02

Mitigates negative transfer between IR and STS tasks.

03

Enhances the geometric properties of learned embeddings.

Abstract

Learning unified text embeddings that excel across diverse downstream tasks is a central goal in representation learning, yet negative transfer remains a persistent obstacle. This challenge is particularly pronounced when jointly training a single encoder for Information Retrieval (IR) and Semantic Textual Similarity (STS), two essential but fundamentally disparate tasks for which naive co-training typically yields steep performance trade-offs. We argue that resolving this conflict requires systematically decoupling task-specific learning signals throughout the training pipeline. To this end, we introduce CoDiEmb, a unified framework that reconciles the divergent requirements of IR and STS in a collaborative yet distinct manner. CoDiEmb integrates three key innovations for effective joint optimization: (1) Task-specialized objectives paired with a dynamic sampler that forms single-task…
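The idea of a sampler that forms single-task batches can be illustrated with a minimal sketch. The scheduling rule here is an assumption, not the paper's: the page does not say how IR and STS batches are alternated, so this version simply shuffles all batches together, which makes each task appear in proportion to its dataset size. The function name `single_task_batches` and the toy data are hypothetical.

```python
import random
from typing import Dict, Iterator, List, Tuple


def single_task_batches(
    datasets: Dict[str, List[str]],
    batch_size: int,
    seed: int = 0,
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (task_name, batch) pairs; every batch draws from exactly one task."""
    rng = random.Random(seed)
    batches: List[Tuple[str, List[str]]] = []
    for task, examples in datasets.items():
        shuffled = examples[:]          # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        for start in range(0, len(shuffled), batch_size):
            batches.append((task, shuffled[start:start + batch_size]))
    # ASSUMPTION: interleave tasks by shuffling whole batches, so task
    # frequency is proportional to dataset size. The paper's actual
    # schedule is not given on this page.
    rng.shuffle(batches)
    yield from batches


if __name__ == "__main__":
    toy = {
        "ir": [f"ir-{i}" for i in range(10)],
        "sts": [f"sts-{i}" for i in range(6)],
    }
    for task, batch in single_task_batches(toy, batch_size=4):
        print(task, batch)
```

Because each batch carries its task name, a training loop could dispatch to a task-specific loss per step without ever mixing IR and STS examples in one batch.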

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01·Rating 2·Confidence 4

Strengths

- The paper tackles persistent "negative transfer" between IR and STS and argues for task‑aware training. The design aligns losses with evaluation targets (nDCG@k vs Spearman). Figure 1 and Sec. 2 make the approach concrete.
- Consistent gains across three backbones on STS tasks only. CoDiEmb outperforms InfoNCE/CoSENT/mixed‑sampler baselines on the CMTEB suite (Table 1), with per‑task details in Tables 6–7.
- Useful ablations/robustness. Loss‑component ablations (Table 9) and batch‑size robustn

Weaknesses

1. Limited novelty relative to listwise LTR. The proposed LRankKL is extremely close to classical listwise losses (ListNet/ListMLE/RocketQA). The paper should explicitly connect to that literature and temper novelty claims around the STS loss.
2. Scope of baselines. Results largely compare to internal objectives (InfoNCE, CoSENT, mixed sampler). Missing are strong generalist comparators such as NV‑Embed and Jina‑v3/Task‑LoRA, which directly address multi‑task IR+STS. Even frozen‑backbone adapt
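For readers unfamiliar with the classical listwise losses the reviewer refers to, here is a minimal sketch of a ListNet-style top-one objective written as a KL divergence. It is not the paper's LRankKL, whose definition does not appear on this page. ListNet itself is usually stated as a cross-entropy between the two softmax distributions; the KL form below differs from it only by the entropy of the target distribution, which is constant with respect to the predictions. The function names are illustrative.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_kl(pred_scores: Sequence[float], target_scores: Sequence[float]) -> float:
    """KL(softmax(target) || softmax(pred)) over one list of candidates.

    Zero when the two score lists induce the same top-one distribution,
    positive otherwise.
    """
    p = softmax(target_scores)
    q = softmax(pred_scores)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    gold = [1.0, 2.0, 3.0]
    print(listnet_kl([1.0, 2.0, 3.0], gold))  # matching scores
    print(listnet_kl([3.0, 2.0, 1.0], gold))  # reversed ordering
```

The loss compares whole score distributions over a list rather than individual pairs, which is the property that makes it "listwise".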

Reviewer 02·Rating 4·Confidence 3

Strengths

The paper is well-motivated, providing a clear and insightful diagnosis of why joint IR/STS training typically fails, correctly identifying the core discrepancies in their respective data structures, text lengths, and evaluation metrics. The proposed solution is systematic, providing a comprehensive framework that decouples tasks at the data ingestion, loss calculation, and batch sampling levels. The design of the task-specific losses is a significant strength. Each objective is explicitly chose

Weaknesses

1. The weights $\alpha$, $\beta$, and $\gamma$ of the total STS loss are never specified in the paper, which hinders reproducibility.
2. The dynamic single-source data sampler is a core component, but its scheduling mechanism is completely undefined. How are the tasks (IR vs. STS) alternated? Is it a 1:1 iteration, or proportional to dataset size, or some other curriculum?
3. The paper calls this a "unified framework", but it functions more like a task switcher. It doesn't use a single, unified

Reviewer 03·Rating 2·Confidence 2

Strengths

* Tackles an important and widely relevant problem: joint optimization across tasks like IR and STS, which mirrors the broader multi-task goal seen in setups such as MTEB (though the paper focuses on only two of those categories).
* The proposed framework is conceptually simple, avoiding complex multi-stage pipelines or architectural modifications.
* The algorithm is practical and easy to implement, making it accessible for real-world adaptation and integration into existing embedding training wo

Weaknesses

* Limited novelty: the proposed methods, such as extended InfoNCE, rank-normalized losses, and task-specific sampling, are largely incremental adaptations of existing techniques.
* The scope of joint optimization is narrow. Recent IR work (e.g., BEIR, MTEB) already treats multi-task or zero-shot generalization as standard, so balancing only IR and STS tasks represents a subset of a broader, already-explored challenge.
* The reported performance gains are modest, often falling within the expected

Code & Models

Models

🤗
tencent/Youtu-Embedding
model· 904 dl· ♡ 70
