Towards Robust Evaluation: A Comprehensive Taxonomy of Datasets and Metrics for Open Domain Question Answering in the Era of Large Language Models
Akchay Srivastava, Atif Memon

TL;DR
This paper provides a comprehensive review and taxonomy of datasets and evaluation metrics for open domain question answering, aiming to improve the robustness and comparability of system assessments in the era of large language models.
Contribution
It introduces a novel taxonomy for ODQA datasets based on modality and difficulty, and offers a structured organization and critical analysis of evaluation metrics.
Findings
Reviewed 52 datasets and 20 evaluation techniques
Proposed a taxonomy incorporating modality and question difficulty
Analyzed trade-offs in current evaluation metrics
Abstract
Open Domain Question Answering (ODQA) within natural language processing involves building systems that answer factual questions using large-scale knowledge corpora. Recent advances stem from the confluence of several factors, such as large-scale training datasets, deep learning techniques, and the rise of large language models. High-quality datasets are used to train models on realistic scenarios and enable the evaluation of the system on potentially unseen data. Standardized metrics facilitate comparisons between different ODQA systems, allowing researchers to objectively track advancements in the field. Our study presents a thorough examination of the current landscape of ODQA benchmarking by reviewing 52 datasets and 20 evaluation techniques across textual and multimodal modalities. We introduce a novel taxonomy for ODQA datasets that incorporates both the modality and difficulty of…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
