Return of EM: Entity-driven Answer Set Expansion for QA Evaluation

Dongryeol Lee; Minwoo Lee; Kyungmin Min; Joonsuk Park; Kyomin Jung

arXiv:2404.15650 · cs.CL · December 13, 2024


TL;DR

This paper evaluates QA models with soft EM over a gold answer set expanded according to the answer's entity type. The method outperforms traditional evaluation metrics, approaches the reliability of LLM-based evaluators, and is more interpretable and less environmentally costly than those LLM-based methods.

Contribution

It pairs soft EM with an entity-driven expansion of the gold answer set: surface-form variants are added according to the answer's entity type, which improves evaluation accuracy while keeping each judgment interpretable.
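Soft EM, in the sense commonly used for open-domain QA evaluation, counts a prediction as correct when some normalized gold answer appears inside the normalized prediction. The sketch below assumes SQuAD-style normalization (lowercasing, removing punctuation and the articles a/an/the, collapsing whitespace) and whole-token matching; the function names `normalize_answer` and `soft_em` are hypothetical and not taken from the authors' repository.

```python
import re
import string

PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, strip punctuation and articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def soft_em(prediction: str, gold_answers: list[str]) -> bool:
    """Return True if some normalized gold answer occurs in the normalized prediction.

    Matching is on whole tokens (both strings are padded with spaces) so that,
    e.g., "ten" does not match inside "often"; this boundary rule is an assumption.
    """
    padded_pred = f" {normalize_answer(prediction)} "
    for gold in gold_answers:
        norm_gold = normalize_answer(gold)
        if norm_gold and f" {norm_gold} " in padded_pred:
            return True
    return False


print(soft_em("It was Barack Obama.", ["Barack Obama"]))  # True
print(soft_em("Obama", ["Barack Obama"]))  # False: the surface form differs
```

The second call shows the failure mode that motivates expanding the gold set: a correct answer written in a different surface form is rejected unless the gold set already contains that form.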

Findings

01. Outperforms traditional QA evaluation methods by a large margin

02. Achieves reliability comparable to LLM-based evaluation

03. Offers higher interpretability and lower environmental impact than LLM-based evaluation

Abstract

Recently, directly using large language models (LLMs) has been shown to be the most reliable method to evaluate QA models. However, it suffers from limited interpretability, high cost, and environmental harm. To address these, we propose to use soft EM with entity-driven answer set expansion. Our approach expands the gold answer set to include diverse surface forms, based on the observation that the surface forms often follow particular patterns depending on the entity type. The experimental results show that our method outperforms traditional evaluation methods by a large margin. Moreover, the reliability of our evaluation method is comparable to that of LLM-based ones, while offering the benefits of high interpretability and reduced environmental harm.
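The abstract's key observation is that acceptable surface forms follow entity-type-specific patterns. The sketch below makes that concrete with a few hand-written rules; the entity labels (borrowed from common NER tag names such as PERSON, DATE, and CARDINAL), the rules themselves, and the function name `expand_answer_set` are hypothetical illustrations, not the authors' expansion procedure. The expanded set would then be scored with soft EM in place of the original gold answers.

```python
from datetime import datetime

# Hypothetical lookup for small cardinal numbers written as words.
NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five",
                "six", "seven", "eight", "nine", "ten"]


def expand_answer_set(gold_answers: list[str], entity_type: str) -> list[str]:
    """Hypothetical rule-based expansion of a gold answer set by entity type."""
    expanded: list[str] = list(gold_answers)

    def add(form: str) -> None:
        # Keep insertion order and skip duplicates.
        if form and form not in expanded:
            expanded.append(form)

    for answer in gold_answers:
        if entity_type == "PERSON":
            # Assumed pattern: people are often referred to by surname alone.
            parts = answer.split()
            if len(parts) > 1:
                add(parts[-1])
        elif entity_type == "DATE":
            # Assumed pattern: a full date can be written in other orders.
            try:
                parsed = datetime.strptime(answer, "%B %d, %Y")
            except ValueError:
                continue  # not in the one format this sketch understands
            add(f"{parsed.day} {parsed.strftime('%B')} {parsed.year}")
            add(parsed.strftime("%Y-%m-%d"))
        elif entity_type == "CARDINAL":
            # Assumed pattern: small numbers appear as digits or as words.
            if answer.isdigit() and int(answer) < len(NUMBER_WORDS):
                add(NUMBER_WORDS[int(answer)])
            elif answer.lower() in NUMBER_WORDS:
                add(str(NUMBER_WORDS.index(answer.lower())))
    return expanded


print(expand_answer_set(["Barack Obama"], "PERSON"))
print(expand_answer_set(["March 5, 1990"], "DATE"))
```

Because the accepted forms are listed explicitly, one can inspect exactly why a prediction was judged correct, in line with the interpretability benefit the abstract describes.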


Code & Models

Repositories

dongryeollee96/entqa (official)

