Efficient Test Collection Construction via Active Learning
Md Mustafizur Rahman, Mucahid Kutlu, Tamer Elsayed, Matthew Lease

TL;DR
This paper presents active learning strategies for efficiently constructing information retrieval test collections. The strategies select which documents humans should judge and automatically classify the relevance of the remaining unjudged documents, reducing reliance on shared task campaigns such as NIST TREC.
Contribution
It introduces active learning methods for building test collections without relying on system rankings, and demonstrates their effectiveness on five TREC collections that differ in how scarce relevant documents are.
Findings
High labeling accuracy achieved
Effective relevance classification without full judgments
Performance varies with relevance scarcity
Abstract
To create a new IR test collection at low cost, it is valuable to carefully select which documents merit human relevance judgments. Shared task campaigns such as NIST TREC pool document rankings from many participating systems (and often interactive runs as well) in order to identify the most likely relevant documents for human judging. However, if one's primary goal is merely to build a test collection, it would be useful to be able to do so without needing to run an entire shared task. Toward this end, we investigate multiple active learning strategies which, without reliance on system rankings: 1) select which documents human assessors should judge; and 2) automatically classify the relevance of additional unjudged documents. To assess our approach, we report experiments on five TREC collections with varying scarcity of relevant documents. We report labeling accuracy achieved, as…
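The two steps in the abstract (choosing documents for human judgment, then classifying the unjudged remainder) can be illustrated with a pool-based active learning loop. This is a minimal sketch, not the authors' implementation. It assumes each document already has a numeric feature vector (for example, TF-IDF weights) and uses uncertainty sampling as the selection rule. That is one standard strategy; the paper investigates several. It also uses a tiny logistic-regression classifier and a simulated assessor. The names `build_judgments`, `train_logreg`, and `predict`, along with the seed set, the judging budget, and the 0.5 decision threshold, are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical sketch: uncertainty sampling + logistic regression.
# Not the paper's code; feature vectors and the assessor are assumed inputs.
Vector = Sequence[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: List[float], b: float, x: Vector) -> float:
    """Estimated probability that a document is relevant."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logreg(xs: List[Vector], ys: List[int],
                 epochs: int = 200, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit a tiny logistic-regression relevance classifier by batch gradient descent."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = predict(w, b, x) - y  # gradient of log loss w.r.t. the logit
            for j in range(dim):
                gw[j] += err * x[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def build_judgments(features: Dict[str, Vector], assess: Callable[[str], int],
                    seed_ids: List[str], budget: int):
    """Spend `budget` human judgments via uncertainty sampling, then classify the rest.

    Returns (judged, predicted): human labels and classifier labels (0/1)
    for disjoint sets of document ids.
    """
    judged = {d: assess(d) for d in seed_ids}  # seed judgments (must be non-empty)
    while len(judged) < min(budget, len(features)):
        ids = list(judged)
        w, b = train_logreg([features[d] for d in ids], [judged[d] for d in ids])
        unjudged = [d for d in features if d not in judged]
        # Uncertainty sampling: ask the assessor about the document whose
        # predicted relevance probability is closest to 0.5.
        pick = min(unjudged, key=lambda d: abs(predict(w, b, features[d]) - 0.5))
        judged[pick] = assess(pick)
    # Retrain on all human judgments and label every remaining document.
    ids = list(judged)
    w, b = train_logreg([features[d] for d in ids], [judged[d] for d in ids])
    predicted = {d: int(predict(w, b, features[d]) >= 0.5)
                 for d in features if d not in judged}
    return judged, predicted


# Toy usage: one feature per document; the simulated assessor calls x > 0 relevant.
xs = [-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0]
features = {f"d{i}": [x] for i, x in enumerate(xs)}
truth = {d: int(v[0] > 0) for d, v in features.items()}
judged, predicted = build_judgments(features, lambda d: truth[d], ["d2", "d5"], budget=4)
print("judged:", judged)
print("predicted:", predicted)
```

Uncertainty sampling spends each judgment where the current classifier is least sure. When relevant documents are scarce, a rule that instead favors likely-relevant documents may surface more of them. This is one reason the choice of strategy can matter across collections.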
