Towards Robust Evaluation: A Comprehensive Taxonomy of Datasets and Metrics for Open Domain Question Answering in the Era of Large Language Models

Akchay Srivastava; Atif Memon

arXiv:2406.13232·cs.CL·June 21, 2024

Open Access

TL;DR

This paper provides a comprehensive review and taxonomy of datasets and evaluation metrics for open domain question answering, aiming to improve the robustness and comparability of system assessments in the era of large language models.

Contribution

It introduces a novel taxonomy for ODQA datasets based on modality and difficulty, and offers a structured organization and critical analysis of evaluation metrics.

Findings

01 Reviewed 52 datasets and 20 evaluation techniques

02 Proposed a taxonomy incorporating modality and question difficulty

03 Analyzed trade-offs in current evaluation metrics

Abstract

Open Domain Question Answering (ODQA) within natural language processing involves building systems that answer factual questions using large-scale knowledge corpora. Recent advances stem from the confluence of several factors, such as large-scale training datasets, deep learning techniques, and the rise of large language models. High-quality datasets are used to train models on realistic scenarios and enable the evaluation of the system on potentially unseen data. Standardized metrics facilitate comparisons between different ODQA systems, allowing researchers to objectively track advancements in the field. Our study presents a thorough examination of the current landscape of ODQA benchmarking by reviewing 52 datasets and 20 evaluation techniques across textual and multimodal modalities. We introduce a novel taxonomy for ODQA datasets that incorporates both the modality and difficulty of…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques