Influence Guided Sampling for Domain Adaptation of Text Retrievers
Meet Doshi, Vishwajeet Kumar, Yulong Li, Jaydeep Sen

TL;DR
This paper introduces Inf-DDS, a reinforcement learning-based sampling method that adaptively prioritizes training datasets to improve domain adaptation in text retrieval models, achieving significant performance gains and reduced computational costs.
Contribution
It presents a novel influence-guided sampling framework that adaptively reweighs datasets for training text retrievers, outperforming existing methods in effectiveness and efficiency.
Findings
Achieved a 5.03 absolute NDCG@10 improvement on a multilingual model.
Improved retrieval performance by 0.94 NDCG@10 on MiniLM-L6-v2.
Reduced GPU costs by 1.5 to 4 times compared to baseline methods.
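The first two findings are stated in NDCG@10. As a reminder of what that metric measures, here is a minimal sketch for a single query. It assumes linear gain with a log2 rank discount, and that the input list holds the graded relevance labels of all judged documents in ranked order. The function names are illustrative and do not come from the paper.

```python
import math

def dcg_at_k(relevances, k=10):
    # Discounted cumulative gain over the top-k positions (rank is 0-based here).
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, k=10):
    # Normalize by the DCG of the ideal (descending-relevance) ordering.
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0

# Made-up relevance labels: the only relevant document is ranked second.
score = ndcg_at_k([0, 1, 0])
```

Benchmark-level NDCG@10 is typically the mean of this per-query value over all queries in an evaluation set.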
Abstract
General-purpose open-domain dense retrieval systems are usually trained with a large, eclectic mix of corpora and search tasks. How should these diverse corpora and tasks be sampled for training? Conventional approaches sample them uniformly, proportional to their instance population sizes, or depend on human-level expert supervision. It is well known that the training data sampling strategy can greatly impact model performance. However, how to find the optimal strategy has not been adequately studied in the context of embedding models. We propose Inf-DDS, a novel reinforcement learning driven sampling framework that adaptively reweighs training datasets guided by influence-based reward signals and is much more lightweight with respect to GPU consumption. Our technique iteratively refines the sampling policy, prioritizing datasets that maximize model performance on a target development…
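To make the idea of an adaptively reweighted sampling policy concrete, the sketch below keeps one logit per training dataset and nudges it with a policy-gradient step whenever a per-dataset reward is observed. This is not the authors' algorithm: the reward values are made up, the influence-based reward computation is not modeled, and `update_logits`, the learning rate, and the three-dataset setup are hypothetical choices for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over dataset logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def update_logits(logits, rewards, lr=0.5):
    # Exact gradient of the expected reward sum_i p_i * r_i with respect to
    # each logit: p_j * (r_j - expected reward under the current policy).
    probs = softmax(logits)
    baseline = sum(p * r for p, r in zip(probs, rewards))
    return [l + lr * p * (r - baseline) for l, p, r in zip(logits, probs, rewards)]

# Hypothetical rewards for three datasets (stand-ins for influence-based
# signals measured on a target development set).
rewards = [0.1, 0.5, -0.2]
logits = [0.0, 0.0, 0.0]  # start from a uniform sampling distribution
for _ in range(50):
    logits = update_logits(logits, rewards)
probs = softmax(logits)  # sampling probabilities after refinement
```

In this toy setup the rewards are fixed, so probability mass drifts steadily toward the second dataset. In a real training loop the rewards would be re-estimated as the model changes, and the policy would track them.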
Peer Reviews
Decision · Submitted to ICLR 2026
* Novel influence-based reward mechanism offering more stable, interpretable sampling than gradient-based baselines.
* Computational efficiency via gradient reuse and partial subsampling.
* Clear motivation bridging influence functions with adaptive sampling.

* Reward estimation cost: computing per-domain influence still scales poorly for very large dataset pools; proxy reliance may not generalize.
* Influence score stability: although more stable than gradient-based methods, influence estimation still depends on the correctness of the proxy update steps.
* Initialization sensitivity: Inf-DDS performance depends heavily on the initial sampling distribution.
* Overfitting to dev sets: using dev-based rewards risks domain leakage.
* Influence effect on
The problem of selecting informative samples during training is important, especially for reducing computation or improving convergence. The core idea of prioritizing domains by observed dev improvement (rather than gradient alignment) is intuitive and close to the end goal.
1. The paper optimizes proxy losses (e.g., InfoNCE / KD loss deltas) but does not demonstrate that these correlate with ranking metrics such as NDCG@10.
2. Influence estimation requires extra forward/backward or Hessian-vector steps. The paper calls the method efficient but does not report GPU-hours / wall-clock / memory, so it is unclear whether gains outweigh the additional compute.
3. The paper alternates between linear-normalized influence weights and softmax Reptile updates, but does not cl
- This work addresses the important topic of sampling datasets and tasks for training general-purpose dense retrievers in a more principled manner.
- The proposed sampling method was shown to improve retrieval performance across multiple datasets while requiring considerably less compute than the baselines.
- The presentation is clear overall, though Fig 2 (step 3) is not consistent with the pseudocode.

- The choice of the initial dataset sampling probability distribution is not justified.
- Performance on all dev sets is treated as equally important, which may not be the case.
Taxonomy
Topics: Information Retrieval and Search Behavior · Domain Adaptation and Few-Shot Learning · Topic Modeling
