Evaluating Rewards for Question Generation Models
Tom Hosking, Sebastian Riedel

TL;DR
This paper investigates reward-based training for question generation models. It finds that current metrics poorly reflect human judgment and that models tend to exploit weaknesses in the reward, which makes question quality hard to optimize directly.
Contribution
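The paper optimizes question generation models directly for quality metrics using policy gradient reinforcement learning, including a novel reward from a discriminator learned from the training data, and finds that these metrics do not align well with human evaluations.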
Findings
Reward optimization improves metric scores but not human-perceived quality.
Metrics are poorly aligned with human judgment of question quality.
Models exploit weaknesses in reward functions rather than genuinely improving question quality.
Abstract
Recent approaches to question generation have used modifications to a Seq2Seq architecture inspired by advances in machine translation. Models are trained using teacher forcing to optimise only the one-step-ahead prediction. However, at test time, the model is asked to generate a whole sequence, causing errors to propagate through the generation process (exposure bias). A number of authors have proposed countering this bias by optimising for a reward that is less tightly coupled to the training data, using reinforcement learning. We optimise directly for quality metrics, including a novel approach using a discriminator learned directly from the training data. We confirm that policy gradient methods can be used to decouple training from the ground truth, leading to increases in the metrics used as rewards. We perform a human evaluation, and show that although these metrics have…
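The policy gradient idea mentioned in the abstract can be illustrated with a minimal sketch. This is a toy REINFORCE loop, not the authors' model or training setup: the per-position softmax policy, the tiny vocabulary, the `toy_reward` function (a position-wise match score standing in for a real quality metric), and all hyperparameters are assumptions chosen only to make the example self-contained.

```python
import math
import random

# Hypothetical toy vocabulary and reference question (illustration only).
VOCAB = ["what", "is", "the", "capital", "of", "france", "?"]
REFERENCE = ["what", "is", "the", "capital", "of", "france", "?"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(tokens, reference):
    """Assumed stand-in for a quality metric: fraction of positions that match."""
    matches = sum(1 for a, b in zip(tokens, reference) if a == b)
    return matches / len(reference)


def sample_sequence(logits, rng):
    """Sample one token id per position from the current policy."""
    ids = []
    for row in logits:
        probs = softmax(row)
        ids.append(rng.choices(range(len(row)), weights=probs)[0])
    return ids


def reinforce_step(logits, reference, rng, baseline, lr=0.5):
    """One REINFORCE update: scale grad log p(sample) by (reward - baseline)."""
    ids = sample_sequence(logits, rng)
    reward = toy_reward([VOCAB[i] for i in ids], reference)
    advantage = reward - baseline
    for t, row in enumerate(logits):
        probs = softmax(row)  # computed before the row is modified
        for v in range(len(row)):
            # d log softmax(row)[ids[t]] / d row[v] = 1[v == ids[t]] - probs[v]
            grad_logp = (1.0 if v == ids[t] else 0.0) - probs[v]
            row[v] += lr * advantage * grad_logp
    return reward


def train(steps=2000, seed=0):
    """Train the toy policy on whole sampled sequences, without teacher forcing."""
    rng = random.Random(seed)
    logits = [[0.0] * len(VOCAB) for _ in REFERENCE]
    baseline = 0.0  # moving-average baseline to reduce gradient variance
    for _ in range(steps):
        reward = reinforce_step(logits, REFERENCE, rng, baseline)
        baseline = 0.9 * baseline + 0.1 * reward
    return logits


def greedy_decode(logits):
    """Pick the highest-scoring token at each position."""
    return [VOCAB[max(range(len(row)), key=row.__getitem__)] for row in logits]


if __name__ == "__main__":
    trained = train()
    question = greedy_decode(trained)
    print(" ".join(question), toy_reward(question, REFERENCE))
```

The update only ever sees the scalar reward for a whole sampled sequence, so the policy is free to raise that number in any way the reward permits. That is the property that allows a model to exploit weaknesses in a reward function rather than improve the questions themselves.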
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence
