Multi-turn Training with Basic Human Feedback Helps Little on LLM Reasoning

Qiang Liu; Wuganjing Song; Zhenzhou Lin; Feifan Chen; Qiaolong Cai; Chen Li; Yongduo Sui

arXiv:2510.21339·cs.CL·October 28, 2025

Multi-turn Training with Basic Human Feedback Helps Little on LLM Reasoning

Qiang Liu, Wuganjing Song, Zhenzhou Lin, Feifan Chen, Qiaolong Cai, Chen Li, Yongduo Sui

PDF

Open Access

TL;DR

This study shows that for reasoning tasks with complete information, single-turn training with human feedback is more effective than multi-turn strategies, which can harm performance.

Contribution

The paper challenges previous assumptions by demonstrating that multi-turn training with basic human feedback offers limited benefits and can degrade reasoning in LLMs.

Findings

01

Single-turn training generalizes well to multi-turn tasks.

02

Multi-turn training can reduce single-turn reasoning performance.

03

Multi-turn strategies provide limited or negative benefits for reasoning.

Abstract

The reasoning capabilities of Large Language Models (LLMs) are typically developed through the single-turn reinforcement learning, whereas real-world applications often involve multi-turn interactions with human feedback, leading to a potential mismatch between training and deployment conditions. In this work, we study whether multi-turn training with human feedback is necessary for reasoning tasks. We compare conventional single-turn training with three multi-turn strategies and reach contrary conclusions to previous research. We find that models trained in a single-turn setting generalize effectively to both single- and multi-turn evaluations, while models trained with multi-turn strategies exhibit a significant degradation in single-turn reasoning performance. These results suggest that for tasks with complete information, robust single-turn training remains more effective and…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsTopic Modeling · Natural Language Processing Techniques · Speech and dialogue systems