DualReward: A Dynamic Reinforcement Learning Framework for Cloze Tests Distractor Generation
Tianyou Huang, Xinglu Chen, Jingshen Zhang, Xinying Qiu, Ruiying Niu

TL;DR
DualReward is a reinforcement learning framework for generating cloze-test distractors. Its dual reward structure with adaptive scaling treats human-created gold distractors and model-generated candidates differently, and it outperforms state-of-the-art baselines, with the largest gains on diverse, cross-domain data.
Contribution
The paper introduces a dual reward structure with adaptive scaling for distractor generation and reports improvements over state-of-the-art baselines on both the passage-level CLOTH-F and the sentence-level MCQ datasets.
Findings
Modest but consistent improvements on the homogeneous, passage-level CLOTH-F dataset.
More substantial gains (3.48-3.86% in P@1) on the diverse, cross-domain, sentence-level MCQ dataset.
Effective handling of varied question types and domains.
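The P@1 figure above is a precision-at-k score with k = 1: it checks whether the top-ranked generated distractor matches one of the human-written distractors. The sketch below shows one common way to compute it. The function names and the case-insensitive exact-match rule are assumptions for illustration; the paper's evaluation script may normalize or match differently.

```python
from typing import Iterable, List, Sequence, Tuple


def precision_at_k(generated: Sequence[str], gold: Iterable[str], k: int = 1) -> float:
    """Fraction of the top-k generated distractors that appear in the gold set.

    Matching is case-insensitive exact match after stripping whitespace
    (an assumption, not necessarily the paper's rule).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    gold_set = {g.strip().lower() for g in gold}
    top_k = generated[:k]
    hits = sum(1 for d in top_k if d.strip().lower() in gold_set)
    return hits / k


def mean_precision_at_k(examples: List[Tuple[Sequence[str], Iterable[str]]], k: int = 1) -> float:
    """Average P@k over (generated, gold) pairs, one pair per cloze item."""
    if not examples:
        raise ValueError("examples must be non-empty")
    scores = [precision_at_k(generated, gold, k) for generated, gold in examples]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy items, invented purely for illustration.
    items = [
        (["quickly", "slowly", "badly"], ["Quickly", "loudly", "softly"]),
        (["river", "ocean", "lake"], ["pond", "lake", "stream"]),
    ]
    print(mean_precision_at_k(items, k=1))
```

With k = 1 each item contributes either 0 or 1, so the dataset-level score is simply the share of items whose first generated distractor is a gold distractor.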
Abstract
This paper introduces DualReward, a novel reinforcement learning framework for automatic distractor generation in cloze tests. Unlike conventional approaches that rely primarily on supervised learning or static generative models, our method employs a dual reward structure with adaptive scaling that differentiates between human-created gold standard distractors and model-generated candidates. The framework dynamically adjusts reward signal intensity based on model performance and confidence. We evaluate our approach on both passage-level (CLOTH-F) and sentence-level (MCQ) cloze test datasets, demonstrating consistent improvements over state-of-the-art baselines. Experimental results show that our adaptive reward scaling mechanism provides modest but consistent benefits on homogeneous datasets (CLOTH-F) and more substantial improvements (3.48-3.86% in P@1) on diverse, cross-domain data…
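The abstract describes two ingredients: a reward that differs for gold and model-generated distractors, and a scale factor that depends on model performance and confidence. The following is a minimal sketch of how such a scheme could be wired together. The abstract gives no formula, so every name, constant, and the linear scaling rule here are hypothetical and should not be read as the authors' method.

```python
from dataclasses import dataclass


@dataclass
class DualRewardConfig:
    # All values are illustrative assumptions, not taken from the paper.
    gold_base: float = 1.0       # base reward for a human-created gold distractor
    generated_base: float = 0.5  # base reward for a model-generated candidate
    alpha: float = 0.5           # strength of the adaptive scaling


def _clip01(x: float) -> float:
    """Clamp a value to the interval [0, 1]."""
    return min(max(x, 0.0), 1.0)


def adaptive_scale(confidence: float, running_accuracy: float, alpha: float) -> float:
    """Hypothetical scaling rule: amplify the reward signal when the model is
    unconfident or performing poorly, and leave it at 1.0 when both are high."""
    uncertainty = 1.0 - _clip01(confidence)
    shortfall = 1.0 - _clip01(running_accuracy)
    return 1.0 + alpha * (uncertainty + shortfall) / 2.0


def dual_reward(is_gold: bool, confidence: float, running_accuracy: float,
                cfg: DualRewardConfig) -> float:
    """Pick the base reward by distractor source, then apply the adaptive scale."""
    base = cfg.gold_base if is_gold else cfg.generated_base
    return base * adaptive_scale(confidence, running_accuracy, cfg.alpha)


if __name__ == "__main__":
    cfg = DualRewardConfig()
    # Same model state, different distractor source.
    print(dual_reward(True, confidence=0.3, running_accuracy=0.6, cfg=cfg))
    print(dual_reward(False, confidence=0.3, running_accuracy=0.6, cfg=cfg))
```

In this sketch the gold distractor always earns the larger reward for a given model state, and the gap between weak and strong model states is controlled by a single parameter, `alpha`. In a policy-gradient setup, the returned value would typically weight the log-likelihood of the sampled distractor.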
Taxonomy
Topics: Reinforcement Learning in Robotics · Fuzzy Logic and Control Systems · Robot Manipulation and Learning
