Cheap IR Evaluation: Fewer Topics, No Relevance Judgements, and Crowdsourced Assessments
Kevin Roitero

TL;DR
This paper proposes a resource-efficient approach to IR evaluation that uses fewer topics, reduces reliance on extensive relevance judgments, and draws on crowdsourced assessments, with the aim of improving test collection methodology.
Contribution
It introduces a novel, more principled method for IR effectiveness evaluation that minimizes resource use and leverages crowdsourcing, advancing current evaluation practices.
Findings
Effective IR evaluation with fewer topics
Crowdsourced relevance assessments are viable
Resource savings without compromising accuracy
Abstract
To evaluate Information Retrieval (IR) effectiveness, a possible approach is to use test collections, which are composed of a collection of documents, a set of descriptions of information needs (called topics), and a set of relevant documents for each topic. Test collections are modelled in a competition scenario: for example, in the well-known TREC initiative, participants run their own retrieval systems over a set of topics and provide a ranked list of retrieved documents; some of the retrieved documents (usually the first ranked) constitute the so-called pool, and their relevance is evaluated by human assessors; the document list is then used to compute effectiveness metrics and rank the participant systems. Private Web Search companies also run their in-house evaluation exercises; although the details are mostly unknown, and the aims are somewhat different, the overall approach…
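The pooling workflow the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the paper's method: the function names, the pool depth, the toy runs, the hand-written judgments, and the choice of precision at k as the effectiveness metric are all assumptions made for this example.

```python
from typing import Dict, List, Set, Tuple

# A run maps each topic id to that system's ranked list of document ids.
Run = Dict[str, List[str]]


def build_pool(runs: Dict[str, Run], depth: int) -> Dict[str, Set[str]]:
    """Per topic, take the union of the top-`depth` documents of every run."""
    pool: Dict[str, Set[str]] = {}
    for run in runs.values():
        for topic, ranking in run.items():
            pool.setdefault(topic, set()).update(ranking[:depth])
    return pool


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the first k retrieved documents that were judged relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def rank_systems(
    runs: Dict[str, Run], qrels: Dict[str, Set[str]], k: int
) -> List[Tuple[str, float]]:
    """Score each system by mean precision@k over its topics, best first."""
    scores = {}
    for name, run in runs.items():
        per_topic = [
            precision_at_k(ranking, qrels.get(topic, set()), k)
            for topic, ranking in run.items()
        ]
        scores[name] = sum(per_topic) / len(per_topic)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: two systems, two topics.
runs = {
    "sysA": {"t1": ["d1", "d2", "d3"], "t2": ["d4", "d5", "d6"]},
    "sysB": {"t1": ["d3", "d7", "d1"], "t2": ["d6", "d8", "d4"]},
}

pool = build_pool(runs, depth=2)

# Hypothetical assessor output: the pooled documents judged relevant per topic.
qrels = {"t1": {"d1", "d7"}, "t2": {"d4"}}

ranking = rank_systems(runs, qrels, k=2)
```

In this sketch only pooled documents are ever judged, so any document outside the pool is treated as non-relevant when the metric is computed.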
Taxonomy
Topics: Advanced Text Analysis Techniques · Topic Modeling · Text and Document Classification Technologies
