QAScore -- An Unsupervised Unreferenced Metric for the Question Generation Evaluation
Tianbo Ji, Chenyang Lyu, Gareth Jones, Liting Zhou, Yvette Graham

TL;DR
QAScore is an unsupervised, reference-free metric for evaluating question generation quality, demonstrating stronger correlation with human judgments than traditional metrics like BLEU, ROUGE, and BERTScore.
Contribution
The paper introduces QAScore, an evaluation metric that scores a generated question using the cross entropy of a language model, without requiring reference questions or fine-tuning on human judgements.
Findings
QAScore correlates better with human judgments than BLEU, ROUGE, and BERTScore.
QAScore is unsupervised and reference-free, simplifying the evaluation process.
Experimental results validate QAScore's effectiveness in QG evaluation.
Abstract
Question Generation (QG) aims to automate the task of composing questions for a passage with a set of chosen answers found within the passage. In recent years, the introduction of neural generation models has resulted in substantial improvements of automatically generated questions in terms of quality, especially compared to traditional approaches that employ manually crafted heuristics. However, the metrics commonly applied in QG evaluations have been criticized for their low agreement with human judgement. We therefore propose a new reference-free evaluation metric that has the potential to provide a better mechanism for evaluating QG systems, called QAScore. Instead of fine-tuning a language model to maximize its correlation with human judgements, QAScore evaluates a question by computing the cross entropy according to the probability that the language model can correctly generate…
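To make the cross-entropy idea in the abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation. The abstract is cut off before it says what the language model is asked to generate, so the sketch assumes only that some model has already produced a probability for each correct target token. The function name `cross_entropy_score` and its input format are hypothetical.

```python
import math
from typing import Sequence


def cross_entropy_score(token_probs: Sequence[float]) -> float:
    """Return the summed log-probability of the correct target tokens.

    Assumption (not from the source): `token_probs[i]` is the probability
    that a language model assigned to the i-th correct token. The result is
    the negative of the summed cross entropy, so it is always <= 0 and a
    value closer to 0 means the model was more confident.
    """
    if not token_probs:
        raise ValueError("token_probs must contain at least one probability")
    for p in token_probs:
        if not 0.0 < p <= 1.0:
            raise ValueError(f"probability out of range (0, 1]: {p}")
    # Cross entropy against a one-hot target reduces to -log p(correct token),
    # so summing log p over tokens gives the negated total cross entropy.
    return sum(math.log(p) for p in token_probs)


# Hypothetical usage: two candidate questions, scored by how confidently
# the model recovers the same target tokens under each one.
confident = cross_entropy_score([0.9, 0.8, 0.95])
uncertain = cross_entropy_score([0.3, 0.2, 0.4])
print(confident > uncertain)
```

Under these assumptions, the candidate that yields higher probabilities for the correct tokens receives the higher (less negative) score. No reference question appears anywhere in the computation, which is the property the abstract emphasizes.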
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and Dialogue Systems
