Narrowing the Knowledge Evaluation Gap: Open-Domain Question Answering with Multi-Granularity Answers

Gal Yona; Roee Aharoni; Mor Geva

arXiv:2401.04695 · cs.CL · August 2, 2024 · 2 cites

TL;DR

This paper introduces GRANOLA QA, an evaluation framework for open-domain question answering that considers multi-granularity answers, and proposes DRAG, a decoding method that improves answer accuracy by aligning response granularity with model uncertainty.

Contribution

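The paper presents GRANOLA QA, an evaluation setting that scores answers against multi-granularity references, and GRANOLA-EQ, a multi-granularity version of the EntityQuestions dataset. It also introduces DRAG, a decoding algorithm that improves answer accuracy by taking answer granularity and model uncertainty into account.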

Findings

01. DRAG improves accuracy by nearly 20 points on GRANOLA-EQ.

02. Standard decoding often produces overly specific answers that are incorrect.

03. Multi-granularity evaluation reveals more knowledge in language models than standard methods.

Abstract

Factual questions typically can be answered correctly at different levels of granularity. For example, both "August 4, 1961" and "1961" are correct answers to the question "When was Barack Obama born?". Standard question answering (QA) evaluation protocols, however, do not explicitly take this into account and compare a predicted answer against answers of a single granularity level. In this work, we propose GRANOLA QA, a novel evaluation setting where a predicted answer is evaluated in terms of accuracy and informativeness against a set of multi-granularity answers. We present a simple methodology for enriching existing datasets with multi-granularity answers, and create GRANOLA-EQ, a multi-granularity version of the EntityQuestions dataset. We evaluate a range of decoding methods on GRANOLA-EQ, including a new algorithm, called Decoding with Response Aggregation (DRAG), that is…
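To make the evaluation setting in the abstract concrete, here is a minimal Python sketch of scoring one prediction against a list of multi-granularity answers. It is an illustration under stated assumptions, not the paper's definition: the function names, the exact-match normalization, and the `decay` weighting that discounts coarser answers are all hypothetical choices made for this example.

```python
import string
from typing import Optional, Sequence, Tuple

# Translation table that deletes ASCII punctuation (a hypothetical normalization choice).
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(text.lower().translate(_PUNCT_TABLE).split())


def matched_level(prediction: str, answers: Sequence[str]) -> Optional[int]:
    """Return the index of the first answer the prediction matches exactly.

    `answers` is assumed to be ordered from most specific (index 0)
    to most coarse. Returns None when nothing matches.
    """
    pred = normalize(prediction)
    for level, answer in enumerate(answers):
        if pred == normalize(answer):
            return level
    return None


def granola_scores(
    prediction: str, answers: Sequence[str], decay: float = 0.5
) -> Tuple[bool, float]:
    """Score a prediction as (accurate, informativeness).

    accurate: True if the prediction matches an answer at any granularity.
    informativeness: decay ** level, so the most specific answer scores 1.0
    and each coarser level is discounted. The weighting is an assumption
    made for this sketch, not a formula taken from the paper.
    """
    level = matched_level(prediction, answers)
    if level is None:
        return False, 0.0
    return True, decay ** level


# Multi-granularity answers for "When was Barack Obama born?" (from the abstract).
obama_answers = ["August 4, 1961", "1961"]
fine = granola_scores("August 4, 1961", obama_answers)
coarse = granola_scores("1961", obama_answers)
wrong = granola_scores("1962", obama_answers)
```

Under these assumptions both "August 4, 1961" and "1961" count as accurate, but the coarser answer receives a lower informativeness score, while "1962" is inaccurate. A single-granularity protocol that only accepts the full date would mark "1961" as wrong.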

Code & Models

Datasets

google/granola-entity-questions
dataset · 34 downloads

Videos

Narrowing the Knowledge Evaluation Gap: Open-Domain Question Answering with Multi-Granularity Answers· underline

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Sparse Evolutionary Training