TL;DR
CoDiEmb is a novel framework that effectively learns unified text embeddings for both Information Retrieval and Semantic Textual Similarity tasks by decoupling task-specific signals and employing a dynamic, collaborative training strategy.
Contribution
CoDiEmb introduces a decoupled, task-specific training approach, delta-guided model fusion, and a single-stage pipeline for improved multi-task embedding learning.
Findings
Outperforms baseline models on 15 IR and STS benchmarks.
Mitigates negative transfer between IR and STS tasks.
Enhances the geometric properties of learned embeddings.
Abstract
Learning unified text embeddings that excel across diverse downstream tasks is a central goal in representation learning, yet negative transfer remains a persistent obstacle. This challenge is particularly pronounced when jointly training a single encoder for Information Retrieval (IR) and Semantic Textual Similarity (STS), two essential but fundamentally disparate tasks for which naive co-training typically yields steep performance trade-offs. We argue that resolving this conflict requires systematically decoupling task-specific learning signals throughout the training pipeline. To this end, we introduce CoDiEmb, a unified framework that reconciles the divergent requirements of IR and STS in a collaborative yet distinct manner. CoDiEmb integrates three key innovations for effective joint optimization: (1) Task-specialized objectives paired with a dynamic sampler that forms single-task…
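As an illustration of the single-task batching idea mentioned in the abstract, here is a minimal sketch of a sampler in which every batch is drawn from exactly one task. It is not the authors' implementation. The function name `single_task_batches`, the `datasets` mapping, and the rule for choosing the next task (probability proportional to the number of examples a task has left) are all assumptions made for this example.

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple


def single_task_batches(
    datasets: Dict[str, Sequence],
    batch_size: int,
    seed: int = 0,
) -> Iterator[Tuple[str, List]]:
    """Yield (task_name, batch) pairs; each batch holds examples of one task only."""
    rng = random.Random(seed)

    # Shuffle the example indices of every task independently.
    pools: Dict[str, List[int]] = {}
    for name, data in datasets.items():
        indices = list(range(len(data)))
        rng.shuffle(indices)
        pools[name] = indices

    # Keep drawing batches until every task is exhausted.
    while any(pools.values()):
        names = [n for n, idx in pools.items() if idx]
        # ASSUMPTION (not from the paper): pick the next task with probability
        # proportional to how many examples it still has.
        weights = [len(pools[n]) for n in names]
        task = rng.choices(names, weights=weights, k=1)[0]

        take, pools[task] = pools[task][:batch_size], pools[task][batch_size:]
        yield task, [datasets[task][i] for i in take]
```

Because each yielded batch carries its task name, a training loop could dispatch on that name to apply a task-specific loss. Any other schedule, such as strict alternation, would only change the line that picks `task`.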
Peer Reviews
Decision: Submitted to ICLR 2026
- The paper tackles persistent "negative transfer" between IR and STS and argues for task‑aware training. The design aligns losses with evaluation targets (nDCG@k vs Spearman). Figure 1 and Sec. 2 make the approach concrete.
- Consistent gains across three backbones on STS tasks only. CoDiEmb outperforms InfoNCE/CoSENT/mixed‑sampler baselines on the CMTEB suite (Table 1), with per‑task details in Tables 6–7.
- Useful ablations/robustness. Loss‑component ablations (Table 9) and batch‑size robustn
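To make the contrast between the two evaluation targets concrete, the sketch below computes nDCG@k and Spearman correlation in plain Python. It assumes the linear-gain form of DCG (some evaluations use an exponential gain instead) and average ranks for ties. The function names are illustrative and do not come from the paper.

```python
import math
from typing import List, Sequence


def ndcg_at_k(relevance_in_ranked_order: Sequence[float], k: int) -> float:
    """nDCG@k with linear gain; input is the gold relevance of each ranked item."""
    def dcg(rels: Sequence[float]) -> float:
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

    ideal = dcg(sorted(relevance_in_ranked_order, reverse=True))
    return dcg(list(relevance_in_ranked_order)) / ideal if ideal > 0 else 0.0


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks in ascending order; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when one side is constant
    return cov / math.sqrt(vx * vy)
```

For example, swapping the two lowest-ranked items leaves `ndcg_at_k([3, 2, 0, 1], k=2)` at 1.0, whereas `spearman([3, 2, 1, 0], [4, 3, 1, 2])` drops to 0.8. The first metric only looks at the top of the list, while the second is sensitive to the ordering of every pair.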
1. Limited novelty relative to listwise LTR. The proposed LRankKL is extremely close to classical listwise losses (ListNet/ListMLE/RocketQA). The paper should explicitly connect to that literature and temper novelty claims around the STS loss.
2. Scope of baselines. Results largely compare to internal objectives (InfoNCE, CoSENT, mixed sampler). Missing are strong generalist comparators such as NV‑Embed and Jina‑v3/Task‑LoRA, which directly address multi‑task IR+STS. Even frozen‑backbone adapt
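For readers unfamiliar with the listwise losses the reviewer refers to, the following is a minimal sketch of the classical ListNet top-one objective, written as a KL divergence between two softmax distributions over the same list. This is the textbook ListNet formulation, not the paper's LRankKL, whose exact definition is not reproduced here. The function names are illustrative.

```python
import math
from typing import List, Sequence


def log_softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum_exp for s in scores]


def listnet_top1_kl(pred_scores: Sequence[float],
                    target_scores: Sequence[float]) -> float:
    """KL(softmax(target) || softmax(pred)) over one list of candidates.

    ListNet's original top-one loss is the cross-entropy between the two
    distributions; it differs from this KL only by the entropy of the
    target distribution, which does not depend on the model.
    """
    log_p = log_softmax(target_scores)  # target top-one distribution
    log_q = log_softmax(pred_scores)    # predicted top-one distribution
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
```

The loss is zero when predicted and target scores agree up to an additive constant, and positive otherwise, so it penalizes mismatches in the whole score distribution rather than in individual pairs.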
The paper is well-motivated, providing a clear and insightful diagnosis of why joint IR/STS training typically fails, correctly identifying the core discrepancies in their respective data structures, text lengths, and evaluation metrics. The proposed solution is systematic, providing a comprehensive framework that decouples tasks at the data ingestion, loss calculation, and batch sampling levels. The design of the task-specific losses is a significant strength. Each objective is explicitly chose
1. The weights $\alpha$, $\beta$, and $\gamma$ of the total STS loss are never specified in the paper, which hinders reproducibility.
2. The dynamic single-source data sampler is a core component, but its scheduling mechanism is completely undefined. How are the tasks (IR vs. STS) alternated? Is it a 1:1 iteration, or proportional to dataset size, or some other curriculum?
3. The paper calls this a "unified framework", but it functions more like a task switcher. It doesn't use a single, unified
* Tackles an important and widely relevant problem: joint optimization across tasks like IR and STS, which mirrors the broader multi-task goal seen in setups such as MTEB (though the paper focuses on only two of those categories).
* The proposed framework is conceptually simple, avoiding complex multi-stage pipelines or architectural modifications.
* The algorithm is practical and easy to implement, making it accessible for real-world adaptation and integration into existing embedding training wo
* Limited novelty: the proposed methods, such as extended InfoNCE, rank-normalized losses, and task-specific sampling, are largely incremental adaptations of existing techniques.
* The scope of joint optimization is narrow. Recent IR work (e.g., BEIR, MTEB) already treats multi-task or zero-shot generalization as standard, so balancing only IR and STS tasks represents a subset of a broader, already-explored challenge.
* The reported performance gains are modest, often falling within the expected
