Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time
Jiazheng Li, Yuxiang Zhou, Junru Lu, Gladys Tyen, Lin Gui, Cesare Aloisi, Yulan He

TL;DR
This paper introduces DARS, a dual-model framework that enhances automated student answer scoring by generating precise verbal feedback through contrastive reflection, improving transparency and performance in reasoning tasks.
Contribution
The paper presents a novel contrastive reflection synthesis pipeline and a dual-model framework, DARS, for improved explainability and accuracy in automated student answer scoring.
Findings
DARS outperforms existing baselines across all metrics.
Reflection data significantly improves scoring performance.
The framework scales effectively with larger models.
Abstract
Although preference optimization methods have improved reasoning performance in Large Language Models (LLMs), they often lack transparency regarding why one reasoning outcome is preferred over another. This limitation is especially critical in Automated Student Answer Scoring (ASAS), where explainability is essential to justify assessment outcomes. Verbal reinforcement learning offers the potential to generate explicit reflection, but it tends to produce superficial critiques that can harm assessment performance. Existing LLMs also struggle to reliably detect subtle reasoning errors in ASAS tasks. Moreover, manually identifying intermediate reasoning errors is expensive and difficult to scale. To address these challenges, we introduce a contrastive reflection synthesis pipeline that generates precise verbal feedback by identifying discrepancies in structured reasoning graph paths.…
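To make the idea of contrasting reasoning paths concrete, here is a minimal sketch in Python. It is not the authors' pipeline: the representation of a reasoning path as a list of (rubric item, decision) pairs, the function names `first_discrepancy` and `verbal_feedback`, and the feedback wording are all assumptions made for illustration. The sketch only shows how the first point of divergence between a preferred and a rejected path could be located and turned into a short verbal critique.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: each step is (rubric item, decision).
Step = Tuple[str, str]


def first_discrepancy(preferred: List[Step], rejected: List[Step]) -> Optional[int]:
    """Return the index of the first step where the two paths differ, or None."""
    for i, (good, bad) in enumerate(zip(preferred, rejected)):
        if good != bad:
            return i
    # One path is a strict prefix of the other: they diverge where the shorter ends.
    if len(preferred) != len(rejected):
        return min(len(preferred), len(rejected))
    return None


def verbal_feedback(preferred: List[Step], rejected: List[Step]) -> str:
    """Turn the first discrepancy into a one-sentence critique (illustrative wording)."""
    idx = first_discrepancy(preferred, rejected)
    if idx is None:
        return "No discrepancy found between the two reasoning paths."
    if idx >= len(rejected):
        item, decision = preferred[idx]
        return f"Step {idx + 1}: the reasoning stops early; '{item}' should be assessed as '{decision}'."
    if idx >= len(preferred):
        item, decision = rejected[idx]
        return f"Step {idx + 1}: the extra step '{item}' ('{decision}') is not supported."
    good_item, good_decision = preferred[idx]
    bad_item, bad_decision = rejected[idx]
    if good_item != bad_item:
        return f"Step {idx + 1}: the reasoning assesses '{bad_item}' where '{good_item}' was expected."
    return (
        f"Step {idx + 1}: '{bad_item}' was judged '{bad_decision}', "
        f"but the preferred path judges it '{good_decision}'."
    )


# Toy example with made-up rubric items.
preferred_path = [("key element 1", "met"), ("key element 2", "not met"), ("final score", "1")]
rejected_path = [("key element 1", "met"), ("key element 2", "met"), ("final score", "2")]
feedback = verbal_feedback(preferred_path, rejected_path)
print(feedback)
```

Reporting only the first divergence is a deliberate simplification here: later differences (such as the final score) often follow from the earliest incorrect decision, so pointing at that step keeps the critique specific.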
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling · Multimodal Machine Learning Applications
