Mitigating Lost in Multi-turn Conversation via Curriculum RL with Verifiable Accuracy and Abstention Rewards

Ming Li; Pei Chen; Zhenhao Zhang; Tao Yang; Xinyang Zhang; Han Li; Tianyu Cao; Ming Zeng; Zhuofeng Wu; Meng Jiang; Huasheng Li; Lihong Li; Bing Yin

arXiv:2510.18731 · cs.CL · May 1, 2026

TL;DR

This paper introduces RLAAR, a curriculum reinforcement learning framework that trains multi-turn language models both to answer correctly and to judge whether a question is solvable, mitigating the performance degradation known as Lost-in-Conversation.

Contribution

The paper proposes a novel curriculum reinforcement learning method with verifiable rewards to improve multi-turn LLM reliability and abstention capabilities.

Findings

1. RLAAR mitigates Lost-in-Conversation performance decay, raising performance on LiC benchmarks from 62.6% to 75.1%.

2. It raises calibrated abstention rates from 33.5% to 73.4%.

3. The competence-gated curriculum stabilizes training and promotes reliable multi-turn dialogue.

Abstract

Large Language Models demonstrate strong capabilities in single-turn instruction following but suffer from Lost-in-Conversation (LiC), a degradation in performance as information is revealed progressively in multi-turn settings. Motivated by the current progress on Reinforcement Learning with Verifiable Rewards (RLVR), we propose Curriculum Reinforcement Learning with Verifiable Accuracy and Abstention Rewards (RLAAR), a framework that encourages models not only to generate correct answers, but also to judge the solvability of questions in the multi-turn conversation setting. Our approach employs a competence-gated curriculum that incrementally increases dialogue difficulty (in terms of instruction shards), stabilizing training while promoting reliability. Using multi-turn, on-policy rollouts and a mixed-reward system, RLAAR teaches models to balance problem-solving with informed…
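
The mixed reward and the competence-gated curriculum mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the reward values, the advancement threshold, the window size, and the names `mixed_reward` and `CompetenceGatedCurriculum` are all assumptions made for illustration. The sketch only shows the general idea of rewarding correct answers and justified abstentions, and of increasing the number of instruction shards once recent rewards are high enough.

```python
from collections import deque


def mixed_reward(answer, gold, solvable, abstained):
    """Hypothetical verifiable reward (assumed values, not from the paper):
    1.0 for a correct answer on a solvable question or for abstaining on an
    unsolvable one, otherwise 0.0."""
    if abstained:
        return 0.0 if solvable else 1.0
    return 1.0 if solvable and answer == gold else 0.0


class CompetenceGatedCurriculum:
    """Raises dialogue difficulty (number of instruction shards) only after
    the recent mean reward clears a threshold. All defaults are assumptions."""

    def __init__(self, start_shards=1, max_shards=4, threshold=0.8, window=4):
        self.shards = start_shards
        self.max_shards = max_shards
        self.threshold = threshold
        self.recent = deque(maxlen=window)

    def update(self, reward):
        """Record one rollout reward and return the current shard count."""
        self.recent.append(reward)
        window_full = len(self.recent) == self.recent.maxlen
        if window_full and sum(self.recent) / len(self.recent) >= self.threshold:
            if self.shards < self.max_shards:
                self.shards += 1
                self.recent.clear()  # re-measure competence at the new level
        return self.shards
```

In a training loop, each on-policy rollout would be scored with `mixed_reward` and the result passed to `update`, which returns the shard count to use for the next dialogue.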
