Fine-Grained Relevance Annotations for Multi-Task Document Ranking and Question Answering

Sebastian Hofstätter; Markus Zlabinger; Mete Sertkan; Michael Schröder; Allan Hanbury

arXiv:2008.05363 · cs.IR · August 13, 2020

TL;DR

FiRA is a new dataset with detailed relevance annotations at passage and word levels, enabling better evaluation of document ranking and question answering models, and revealing insights into relevance distribution within long documents.

Contribution

Introduces FiRA, a dataset with fine-grained relevance annotations extending TREC 2019 data, facilitating improved evaluation of multi-task document ranking and QA models.

Findings

01 TKL model achieves state-of-the-art results on long documents

02 TKL misses many relevant passages despite strong overall performance

03 Relevance distribution varies across different positions in long documents

Abstract

There are many existing retrieval and question answering datasets. However, most of them either focus on ranked list evaluation or single-candidate question answering. This divide makes it challenging to properly evaluate approaches concerned with ranking documents and providing snippets or answers for a given query. In this work, we present FiRA: a novel dataset of Fine-Grained Relevance Annotations. We extend the ranked retrieval annotations of the Deep Learning track of TREC 2019 with passage and word level graded relevance annotations for all relevant documents. We use our newly created data to study the distribution of relevance in long documents, as well as the attention of annotators to specific positions of the text. As an example, we evaluate the recently introduced TKL document ranking model. We find that although TKL exhibits state-of-the-art retrieval results for long…
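As a rough illustration of what studying "the distribution of relevance in long documents" could look like in practice, the sketch below averages passage-level relevance grades by relative position in the document. The tuple layout, the function name `relevance_by_position`, and the toy grades are all hypothetical and chosen only for this example; they do not reflect FiRA's actual file format or the authors' analysis code.

```python
from collections import defaultdict


def relevance_by_position(annotations, num_bins=5):
    """Average graded relevance per relative-position bin.

    annotations: iterable of (doc_id, passage_index, passages_in_doc, grade)
    tuples. This layout is an assumption for illustration, not FiRA's format.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for _doc_id, idx, total, grade in annotations:
        # Map the passage index to a bin in [0, num_bins - 1].
        bin_id = min(idx * num_bins // total, num_bins - 1)
        sums[bin_id] += grade
        counts[bin_id] += 1
    return {b: sums[b] / counts[b] for b in sorted(counts)}


# Invented toy data: two documents with made-up passage grades.
toy = [
    ("d1", 0, 4, 3), ("d1", 1, 4, 1), ("d1", 2, 4, 0), ("d1", 3, 4, 0),
    ("d2", 0, 2, 2), ("d2", 1, 2, 2),
]
result = relevance_by_position(toy, num_bins=2)
print(result)
```

Binning by relative rather than absolute position keeps documents of different lengths comparable; an analysis of absolute position would bin on `passage_index` directly instead.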

Code & Models

Repositories

sebastian-hofstaetter/fira-trec-19-dataset (Official)