A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning
Licheng Liu, Zihan Wang, Linjie Li, Chenwei Xu, Yiping Lu, Han Liu, Avirup Sil, Manling Li

TL;DR
This paper introduces a simple unary feedback mechanism called UFO for reinforcement learning, which enhances multi-turn reasoning in large language models by enabling better reflection and revision during iterative problem solving.
Contribution
The work presents UFO, a minimal unary feedback approach for multi-turn RL training that improves reasoning accuracy and feedback responsiveness in large language models.
Findings
Multi-turn RL with UFO improves reasoning accuracy by up to 14%.
UFO maintains single-turn performance while enhancing multi-turn reasoning.
Models trained with UFO better reflect and revise answers based on feedback.
Abstract
Multi-turn problem solving is critical yet challenging for Large Reasoning Models (LRMs) to reflect on their reasoning and revise from feedback. Existing Reinforcement Learning (RL) methods train large reasoning models on a single-turn paradigm with verifiable rewards. However, we observe that models trained with existing RL paradigms often lose their ability to solve problems across multiple turns and struggle to revise answers based on contextual feedback, leading to repetitive responses. We ask: can LRMs learn to reflect on their answers in a multi-turn context? In this work, we find that training models with multi-turn RL using only unary feedback (e.g., "Let's try again") after wrong answers can improve both single-turn performance and multi-turn reasoning. We introduce Unary Feedback as Observation (UFO) for reinforcement learning, which uses minimal yet common unary user feedback…
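The interaction pattern described in the abstract can be illustrated with a minimal sketch of a multi-turn rollout, in which the only signal returned after a wrong answer is a fixed unary message. This is not the authors' implementation: the names `rollout`, `policy`, `FEEDBACK`, and `max_turns`, the exact-match answer check, and the prompt formatting are all assumptions made for illustration, and the RL update that would consume such rollouts is omitted.

```python
from typing import Callable, List, Optional, Tuple

# Assumed wording of the unary feedback; the abstract gives "Let's try again" as an example.
FEEDBACK = "Let's try again."


def rollout(
    question: str,
    answer: str,
    policy: Callable[[List[str]], str],
    max_turns: int = 3,
) -> Tuple[List[str], Optional[int]]:
    """Run one multi-turn episode with unary feedback (hypothetical helper).

    The policy sees the full history on every turn. After a wrong answer the
    only new observation is the fixed FEEDBACK string; no hint or correction
    is given. Returns the history and the 1-based turn at which the answer
    was correct, or None if the turn budget ran out.
    """
    history = [f"Question: {question}"]
    for turn in range(1, max_turns + 1):
        guess = policy(history)
        history.append(f"Answer: {guess}")
        if guess == answer:  # assumed verifier: exact string match
            return history, turn
        history.append(FEEDBACK)  # unary feedback becomes part of the context
    return history, None


# Toy stand-in for a model: it revises its guess each time it sees the feedback.
guesses = ["3", "5", "4"]


def toy_policy(history: List[str]) -> str:
    return guesses[history.count(FEEDBACK)]


history, solved_turn = rollout("What is 2 + 2?", "4", toy_policy)
print(solved_turn)
```

In a training setup, `toy_policy` would be replaced by sampling from the language model, and the returned turn index (or a failure) would feed whatever reward the training recipe defines.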
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics
