QRelScore: Better Evaluating Generated Questions with Deeper Understanding of Context-aware Relevance
Xiaoqiang Wang, Bang Liu, Siliang Tang, Lingfei Wu

TL;DR
QRelScore is a context-aware metric for evaluating question generation. By taking the input context and reasoning complexity into account, it aligns better with human judgment than existing metrics.
Contribution
The paper introduces QRelScore, an evaluation metric that uses off-the-shelf language models such as BERT and GPT2 to assess the relevance of a generated question to its input context, including cases that require reasoning over that context.
Findings
QRelScore correlates more strongly with human judgments than existing metrics.
It demonstrates robustness against adversarial samples.
It handles questions that involve complex reasoning or are grounded in multiple pieces of evidence in the context.
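Agreement between an automatic metric and human judgment is usually reported as a correlation coefficient. The sketch below shows how Pearson and Spearman correlations could be computed for any metric's scores against human ratings. It is a minimal illustration, not the paper's evaluation code: the function names and the toy numbers in the usage lines are hypothetical placeholders, not results from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Toy placeholder values (NOT data from the paper).
metric_scores = [0.10, 0.40, 0.35, 0.80]
human_ratings = [1, 3, 2, 5]
print(pearson(metric_scores, human_ratings))
print(spearman(metric_scores, human_ratings))
```

Spearman correlation compares only the ordering of the scores, so it is the more forgiving of the two when a metric and the human ratings use different scales.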
Abstract
Existing metrics for assessing question generation not only require costly human reference but also fail to take into account the input context of generation, rendering the lack of deep understanding of the relevance between the generated questions and input contexts. As a result, they may wrongly penalize a legitimate and reasonable candidate question when it (i) involves complicated reasoning with the context or (ii) can be grounded by multiple evidences in the context. In this paper, we propose QRelScore, a context-aware Relevance evaluation metric for Question Generation. Based on off-the-shelf language models such as BERT and GPT2, QRelScore employs both word-level hierarchical matching and sentence-level prompt-based generation to cope with the complicated reasoning and diverse generation from multiple evidences,…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Linear Layer · Attention Dropout · Dense Connections · Residual Connection · Weight Decay · Softmax · Dropout · Multi-Head Attention
