Learning to Retrieve with Weakened Labels: Robust Training under Label Noise

Arnab Sharma

arXiv:2512.13237·cs.LG·December 16, 2025

TL;DR

This paper introduces a label weakening technique for training neural retrieval models that enhances robustness against label noise by considering multiple plausible labels rather than a single potentially incorrect label.

Contribution

The work proposes a novel label weakening approach that improves retrieval model robustness to label noise without requiring complex loss functions or extensive hyperparameter tuning.

Findings

01. Label weakening outperforms 10 state-of-the-art loss functions.

02. The approach is effective across multiple retrieval models and datasets.

03. Robustness is demonstrated in realistic noisy settings with semantic-aware noise.

Abstract

Neural encoders are frequently used in NLP for dense retrieval, for instance, to generate the candidate documents for a given query in question-answering tasks. However, sparse annotation and label noise in the training data make it challenging to train or fine-tune such retrieval models. Although existing works have attempted to mitigate these problems with modified loss functions or data cleaning, these approaches either introduce hyperparameters that must be tuned during training or add substantial complexity to the training setup. In this work, we consider a label weakening approach to generate robust retrieval models in the presence of label noise. Instead of enforcing a single, potentially erroneous label for each query-document pair, we allow for a set of plausible labels derived from both the observed supervision and the model's confidence scores. We…
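The abstract is truncated before it gives the exact formulation, so the following is only a minimal sketch of one possible reading of "a set of plausible labels derived from both the observed supervision and the model's confidence scores". It assumes a binary relevance label per query-document pair, a hypothetical confidence threshold `threshold`, and an optimistic loss that scores the model against the best-fitting label in the plausible set. The function names, the threshold rule, and the min-over-candidates loss are illustrative assumptions, not the paper's method.

```python
import math


def plausible_labels(observed, probs, threshold=0.7):
    """Build a candidate label set for one query-document pair.

    Assumption: the set always contains the observed (possibly noisy) label,
    plus any label the model currently predicts with probability >= threshold.
    """
    candidates = {observed}
    for label, p in enumerate(probs):
        if p >= threshold:
            candidates.add(label)
    return candidates


def weakened_loss(observed, probs, threshold=0.7, eps=1e-12):
    """Cross-entropy against the most favourable label in the candidate set.

    Assumption: an optimistic (min-over-candidates) loss; the paper may
    aggregate over the set differently.
    """
    candidates = plausible_labels(observed, probs, threshold)
    return min(-math.log(max(probs[c], eps)) for c in candidates)


# A pair annotated "not relevant" (0) that the model confidently scores as relevant (1).
probs = [0.1, 0.9]
print(plausible_labels(0, probs))  # both labels are kept as plausible
print(weakened_loss(0, probs))     # penalised against label 1, not the suspect label 0
```

With a less confident model, for example `probs = [0.6, 0.4]`, no label clears the threshold, so the candidate set collapses to the observed label and the loss reduces to ordinary cross-entropy.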

Taxonomy

Topics: Machine Learning and Data Classification · Text and Document Classification Technologies · Topic Modeling