Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time

Jiazheng Li; Yuxiang Zhou; Junru Lu; Gladys Tyen; Lin Gui; Cesare Aloisi; Yulan He

arXiv:2502.19230 · cs.CL · September 30, 2025

TL;DR

This paper introduces DARS, a dual-model framework that enhances automated student answer scoring by generating precise verbal feedback through contrastive reflection, improving transparency and performance in reasoning tasks.

Contribution

The paper presents a novel contrastive reflection synthesis pipeline and a dual-model framework, DARS, for improved explainability and accuracy in automated student answer scoring.

Findings

1. DARS outperforms existing baselines across all metrics.
2. Reflection data significantly improves scoring performance.
3. The framework scales effectively with larger models.

Abstract

Although preference optimization methods have improved reasoning performance in Large Language Models (LLMs), they often lack transparency regarding why one reasoning outcome is preferred over another. This limitation is especially critical in Automated Student Answer Scoring (ASAS), where explainability is essential to justify assessment outcomes. Verbal reinforcement learning offers the potential to generate explicit reflection, but it tends to produce superficial critiques that can harm assessment performance. Existing LLMs also struggle to reliably detect subtle reasoning errors in ASAS tasks. Moreover, manually identifying intermediate reasoning errors is expensive and difficult to scale. To address these challenges, we introduce a contrastive reflection synthesis pipeline that generates precise verbal feedback by identifying discrepancies in structured reasoning graph paths.…

Code & Models

Datasets

jiazhengli/DARS_synthethsis_reflection
dataset· 37 dl

Videos

Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time· underline

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling · Multimodal Machine Learning Applications