MulFeRL: Enhancing Reinforcement Learning with Verbal Feedback in a Multi-turn Loop

Xuancheng Li; Haitao Li; Yujia Zhou; Yiqun Liu; Qingyao Ai

arXiv:2601.22900·cs.AI·February 2, 2026

Open Access

TL;DR

This paper introduces MulFeRL, a multi-turn reinforcement learning framework that uses verbal feedback, especially on failed samples, to improve reasoning and training efficiency. It outperforms supervised fine-tuning and RLVR baselines.

Contribution

It proposes a novel multi-turn feedback-guided RL framework that incorporates verbal feedback into the training process, enhancing reasoning and generalization capabilities.

Findings

01 Outperforms supervised fine-tuning and RLVR baselines in-domain
02 Generalizes well out-of-domain
03 Effectively leverages verbal feedback for training

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) is widely used to improve reasoning in multiple domains, yet outcome-only scalar rewards are often sparse and uninformative, especially on failed samples, where they merely indicate failure and provide no insight into why the reasoning fails. In this paper, we investigate how to leverage richer verbal feedback to guide RLVR training on failed samples, and how to convert such feedback into a trainable learning signal. Specifically, we propose a multi-turn feedback-guided reinforcement learning framework. It builds on three mechanisms: (1) dynamic multi-turn regeneration guided by feedback, triggered only on failed samples, (2) two complementary learning signals for within-turn and cross-turn optimization, and (3) structured feedback injection into the model's reasoning process. Trained on sampled OpenR1-Math, the approach outperforms…
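As a rough illustration of mechanism (1) only, the feedback-triggered regeneration loop might be sketched as follows. This is not the authors' implementation: the function names (`multi_turn_rollout`, `generate`, `verify`, `give_feedback`), the prompt template, the binary reward, and the `max_turns` cap are all assumptions made for the example, and the two learning signals in mechanism (2) are not modeled here.

```python
from typing import Callable, List, Tuple


def multi_turn_rollout(
    question: str,
    generate: Callable[[str], str],            # hypothetical policy call
    verify: Callable[[str], bool],             # hypothetical verifiable-reward check
    give_feedback: Callable[[str, str], str],  # hypothetical verbal-feedback source
    max_turns: int = 3,                        # assumed cap, not from the paper
) -> List[Tuple[str, float]]:
    """Collect (response, reward) per turn, regenerating only after a failure."""
    prompt = question
    trajectory: List[Tuple[str, float]] = []
    for _ in range(max_turns):
        response = generate(prompt)
        reward = 1.0 if verify(response) else 0.0
        trajectory.append((response, reward))
        if reward == 1.0:
            # Successful samples do not trigger another turn.
            break
        # Failed sample: inject the verbal feedback into the next prompt.
        feedback = give_feedback(question, response)
        prompt = (
            f"{question}\n\nPrevious attempt:\n{response}"
            f"\n\nFeedback:\n{feedback}\n\nRevised answer:"
        )
    return trajectory


# Toy stand-ins so the sketch runs without a model.
def toy_generate(prompt: str) -> str:
    return "4" if "Feedback:" in prompt else "5"


def toy_verify(response: str) -> bool:
    return response == "4"


def toy_feedback(question: str, response: str) -> str:
    return "Recheck the addition."


if __name__ == "__main__":
    print(multi_turn_rollout("What is 2 + 2?", toy_generate, toy_verify, toy_feedback))
```

With the toy stand-ins, the first attempt fails, feedback is injected, and the second attempt passes, so the loop stops after two turns. In a training setting the per-turn rewards collected here would feed whatever within-turn and cross-turn objectives the framework defines.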

Taxonomy

Topics: Topic Modeling · Reinforcement Learning in Robotics · Intelligent Tutoring Systems and Adaptive Learning