Towards Theoretical Understanding of Weak Supervision for Information Retrieval

Hamed Zamani; W. Bruce Croft

arXiv:1806.04815·cs.IR·June 14, 2018·1 cites

Open Access

TL;DR

This paper explores the theoretical foundations of using weak supervision in neural information retrieval, aiming to explain why models trained on weakly labeled data can outperform their labelers.

Contribution

It provides a theoretical analysis of weak supervision in IR, offering insights and guidelines for training models effectively with weakly labeled data.

Findings

01 Theoretical insights into learning from weakly supervised data

02 Guidelines for training IR models with weak supervision

03 Empirical evidence supporting the effectiveness of weak supervision

Abstract

Neural network approaches have recently been shown to be effective in several information retrieval (IR) tasks. However, neural approaches often require large volumes of training data to perform effectively, and such data are not always available. To mitigate the shortage of labeled data, training neural IR models with weak supervision has recently been proposed and has received considerable attention in the literature. In weak supervision, an existing model automatically generates labels for a large set of unlabeled data, and a machine learning model is then trained on the generated "weak" data. Surprisingly, prior work has shown that the trained neural model can outperform the weak labeler by a significant margin. Although these improvements have been intuitively justified in previous work, the literature still lacks theoretical justification for the observed empirical findings.…
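The weak supervision setup described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the weak labeler is a hypothetical term-count scorer, the features are toy values, and the student is a linear ranker trained with a pairwise hinge loss. The sketch only shows the data flow (weak labeler produces labels, student trains on them); it does not reproduce the paper's analysis or results.

```python
import random

def weak_score(query, doc):
    """Hypothetical weak labeler: counts query-term occurrences in the document."""
    terms = doc.split()
    return sum(terms.count(t) for t in query.split())

def features(query, doc):
    """Toy features: [number of doc tokens found in the query, that number / doc length]."""
    terms = doc.split()
    q = set(query.split())
    matches = sum(1 for t in terms if t in q)
    return [float(matches), matches / len(terms) if terms else 0.0]

def make_weak_pairs(queries, docs):
    """Build pairwise training data labeled by the weak scorer; ties are skipped."""
    pairs = []
    for q in queries:
        for i, d1 in enumerate(docs):
            for d2 in docs[i + 1:]:
                s1, s2 = weak_score(q, d1), weak_score(q, d2)
                if s1 != s2:
                    label = 1 if s1 > s2 else -1
                    pairs.append((features(q, d1), features(q, d2), label))
    return pairs

def train_pairwise(pairs, epochs=20, lr=0.1, seed=0):
    """Train a linear ranker with a pairwise hinge loss (an assumed choice of loss)."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for f1, f2, y in pairs:
            diff = [a - b for a, b in zip(f1, f2)]
            margin = y * sum(wi * di for wi, di in zip(w, diff))
            if margin < 1.0:  # update only when the pair violates the margin
                w = [wi + lr * y * di for wi, di in zip(w, diff)]
    return w

def score(w, query, doc):
    """Score a query-document pair with the trained linear ranker."""
    return sum(wi * fi for wi, fi in zip(w, features(query, doc)))

# Toy, made-up data purely for illustration.
queries = ["neural retrieval", "pasta"]
docs = [
    "neural ranking models for retrieval",
    "cooking pasta at home",
    "retrieval retrieval evaluation",
]
weights = train_pairwise(make_weak_pairs(queries, docs))
```

In this toy setting the student only sees labels produced by `weak_score`, never human relevance judgments, which is the defining property of the setup the abstract describes.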

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Machine Learning and Data Classification