Learning to Rank from Samples of Variable Quality

Mostafa Dehghani; Jaap Kamps

arXiv:1806.08694 · cs.IR · June 25, 2018

TL;DR

This paper introduces fidelity-weighted learning (FWL), a semi-supervised approach that leverages both high-quality and weakly-labeled data by estimating label confidence to improve deep neural network training.

Contribution

The paper proposes a novel semi-supervised student-teacher framework that accounts for label quality, enhancing learning from mixed-quality datasets.

Findings

01. FWL outperforms state-of-the-art semi-supervised methods in document ranking.

02. The approach effectively utilizes weakly-labeled data with confidence weighting.

03. Experimental results demonstrate improved ranking performance.

Abstract

Training deep neural networks requires many training samples, but in practice, training labels are expensive to obtain and may be of varying quality: some may come from trusted expert labelers, while others might come from heuristics or other sources of weak supervision such as crowd-sourcing. This creates a fundamental quality-versus-quantity trade-off in the learning process. Do we learn from the small amount of high-quality data or the potentially large amount of weakly-labeled data? We argue that if the learner could somehow know the label quality and take it into account when learning the data representation, we could get the best of both worlds. To this end, we introduce "fidelity-weighted learning" (FWL), a semi-supervised student-teacher approach for training deep neural networks using weakly-labeled data. FWL modulates the parameter updates to a student network (trained on the task…
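The abstract is cut off at this point, so the following is only a minimal sketch of the general idea of confidence-weighted parameter updates, not the authors' implementation. It assumes a linear student model with squared loss, and it assumes that a per-sample confidence in [0, 1] is already supplied by some external estimator. The names `weighted_sgd_step`, `confidences`, and `base_lr` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def weighted_sgd_step(
    weights: List[float],
    features: Sequence[Sequence[float]],
    weak_labels: Sequence[float],
    confidences: Sequence[float],
    base_lr: float = 0.1,
) -> List[float]:
    """Run one pass of per-sample SGD on a linear model with squared loss.

    Assumption (illustrative only): each sample's step size is
    base_lr * confidence, so a sample whose weak label is judged
    unreliable (confidence near 0) barely moves the weights.
    """
    w = list(weights)
    for x, y, c in zip(features, weak_labels, confidences):
        # Linear prediction for this sample.
        pred = sum(wi * xi for wi, xi in zip(w, x))
        error = pred - y
        # The gradient of 0.5 * error**2 with respect to w is error * x.
        step = base_lr * c
        w = [wi - step * error * xi for wi, xi in zip(w, x)]
    return w


if __name__ == "__main__":
    # Two weakly-labeled samples with the same input but conflicting labels;
    # the second label is given a much lower (hypothetical) confidence.
    updated = weighted_sgd_step(
        weights=[0.0],
        features=[[1.0], [1.0]],
        weak_labels=[1.0, -1.0],
        confidences=[0.9, 0.1],
    )
    print(updated)
```

Scaling the step size per sample is one simple way to let label quality modulate learning. A sample with confidence 0 leaves the weights untouched, while a sample with confidence 1 receives the full learning rate.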

Taxonomy

Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Machine Learning and Algorithms