Augmented Test Collections: A Step in the Right Direction
Laura Hasler, Martin Halvey, Robert Villa

TL;DR
This paper advocates for augmenting IR test collections with contextual and subjective information from human assessors to improve the realism and user-focus of system evaluations.
Contribution
It proposes enhancing relevance assessments with additional data about assessors and their interpretation, addressing oversimplifications in current evaluation methods.
Findings
Initial user studies to understand assessor judgment processes
Potential for more realistic and user-centered evaluation metrics
Framework for augmenting test collections with contextual information
Abstract
In this position paper we argue that certain aspects of relevance assessment in the evaluation of IR systems are oversimplified and that human assessments represented by qrels should be augmented to take account of contextual factors and the subjectivity of the task at hand. We propose enhancing test collections used in evaluation with information related to human assessors and their interpretation of the task. Such augmented collections would provide a more realistic and user-focused evaluation, enabling us to better understand the evaluation process, the performance of systems and user interactions. A first step is to conduct user studies to examine in more detail what people actually do when we ask them to judge the relevance of a document.
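As a minimal sketch of what an augmented qrel might look like, the snippet below parses a standard TREC-style qrels line (topic, iteration, document, relevance) and attaches assessor-related fields to it. The field names `assessor_id`, `confidence`, and `context`, as well as the sample values, are illustrative assumptions; the paper does not prescribe a schema.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass
class AugmentedQrel:
    # Core fields found in a standard TREC-style qrels line.
    topic_id: str
    doc_id: str
    relevance: int
    # Hypothetical augmentation fields (assumptions, not from the paper).
    assessor_id: Optional[str] = None
    confidence: Optional[float] = None
    context: dict = field(default_factory=dict)


def parse_trec_qrel(line: str) -> AugmentedQrel:
    """Parse one 'topic iteration docno relevance' line into a record."""
    topic_id, _iteration, doc_id, relevance = line.split()
    return AugmentedQrel(topic_id, doc_id, int(relevance))


# A plain judgment, as it would appear in a conventional qrels file.
base = parse_trec_qrel("301 0 FBIS3-10082 1")

# The same judgment with made-up assessor and context information attached.
augmented = replace(
    base,
    assessor_id="a07",
    confidence=0.6,
    context={"seconds_spent": 42},
)
```

Keeping the core fields unchanged means such a record could still be written back out as an ordinary qrels line for existing evaluation tools, with the extra fields stored alongside it.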
Taxonomy
Topics: Information Retrieval and Search Behavior · Mobile Crowdsensing and Crowdsourcing · Semantic Web and Ontologies
