Towards a Better Metric for Evaluating Question Generation Systems

Preksha Nema; Mitesh M. Khapra

arXiv:1808.10192 · cs.CL · September 3, 2018 · 6 cites

Open Access · 1 Repo

TL;DR

This paper critically examines whether n-gram-based metrics such as BLEU are suitable for evaluating question generation systems, and proposes a scoring function that aligns better with human judgments of answerability.

Contribution

It introduces a new answerability scoring function and demonstrates how integrating it with existing metrics enhances their correlation with human evaluations.
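The idea of combining an answerability score with an n-gram metric can be illustrated with a minimal sketch. This is not the paper's formula: the function names, the question-word list, the single-token entity matching, the clipped unigram precision used as a stand-in for BLEU, and the mixing weight `delta` are all assumptions made for illustration.

```python
from collections import Counter

# Assumed list of question-type words; not taken from the paper.
QUESTION_WORDS = {"what", "who", "whom", "whose", "when",
                  "where", "which", "why", "how"}


def unigram_precision(hypothesis: str, reference: str) -> float:
    """Clipped unigram precision, a simple stand-in for an n-gram metric."""
    hyp_tokens = hypothesis.lower().split()
    if not hyp_tokens:
        return 0.0
    ref_counts = Counter(reference.lower().split())
    hyp_counts = Counter(hyp_tokens)
    # Each hypothesis token is credited at most as often as it occurs
    # in the reference.
    overlap = sum(min(count, ref_counts[tok])
                  for tok, count in hyp_counts.items())
    return overlap / len(hyp_tokens)


def answerability(hypothesis: str, reference: str, entities: list[str]) -> float:
    """Hypothetical score: share of the reference's question words and
    (single-token) entities that survive in the hypothesis."""
    hyp = set(hypothesis.lower().split())
    ref = set(reference.lower().split())
    required = (ref & QUESTION_WORDS) | {e.lower() for e in entities
                                         if e.lower() in ref}
    if not required:
        return 1.0  # nothing to check, so nothing is missing
    return len(required & hyp) / len(required)


def combined_score(hypothesis: str, reference: str, entities: list[str],
                   delta: float = 0.5) -> float:
    """Assumed linear mix of answerability and the n-gram score."""
    return (delta * answerability(hypothesis, reference, entities)
            + (1.0 - delta) * unigram_precision(hypothesis, reference))


REFERENCE = "who was the director of titanic"
GOOD = "who directed titanic"      # keeps the question word and the entity
BAD = "was the director of the"    # high overlap, but cannot be answered
ENTITIES = ["titanic"]

for name, cand in (("good", GOOD), ("bad", BAD)):
    print(name,
          round(unigram_precision(cand, REFERENCE), 3),
          round(combined_score(cand, REFERENCE, ENTITIES), 3))
```

In this toy case the overlap score alone ranks the unanswerable candidate higher, because it shares more filler words with the reference, while the combined score ranks the answerable candidate higher.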

Findings

01 Current metrics correlate poorly with human judgments of answerability.

02 The proposed scoring function improves the correlation of metrics with human assessments.

03 Integrating the new score with existing metrics improves evaluation accuracy.

Abstract

There has always been criticism of using $n$-gram based similarity metrics, such as BLEU, NIST, etc., for evaluating the performance of NLG systems. However, these metrics remain popular and have recently been used for evaluating the performance of systems which automatically generate questions from documents, knowledge graphs, images, etc. Given the rising interest in such automatic question generation (AQG) systems, it is important to objectively examine whether these metrics are suitable for this task. In particular, it is important to verify whether such metrics, when used for evaluating AQG systems, focus on the answerability of the generated question by preferring questions which contain all relevant information such as question type (Wh-types), entities, relations, etc. In this work, we show that current automatic evaluation metrics based on $n$-gram similarity do not always…

Code & Models

Repositories

PrekshaNema25/Answerability-Metric (Official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications