A Simple "Try Again" Can Elicit Multi-Turn LLM Reasoning

Licheng Liu; Zihan Wang; Linjie Li; Chenwei Xu; Yiping Lu; Han Liu; Avirup Sil; Manling Li

arXiv:2507.14295 · cs.LG · August 25, 2025

TL;DR

This paper introduces Unary Feedback as Observation (UFO), a reinforcement learning setup in which the model receives only minimal unary feedback (e.g., "Let's try again") after a wrong answer. Training this way improves how large language models reflect on and revise their answers across multiple turns.

Contribution

The work presents UFO, a minimal unary feedback approach for multi-turn RL training that improves reasoning accuracy and feedback responsiveness in large language models.

Findings

01. Multi-turn RL with UFO improves reasoning accuracy by up to 14%.

02. UFO maintains single-turn performance while enhancing multi-turn reasoning.

03. Models trained with UFO better reflect and revise answers based on feedback.

Abstract

Multi-turn problem solving is critical yet challenging for Large Reasoning Models (LRMs), which must reflect on their reasoning and revise it based on feedback. Existing Reinforcement Learning (RL) methods train large reasoning models in a single-turn paradigm with verifiable rewards. However, we observe that models trained with existing RL paradigms often lose their ability to solve problems across multiple turns and struggle to revise answers based on contextual feedback, leading to repetitive responses. We ask: can LRMs learn to reflect on their answers in a multi-turn context? In this work, we find that training models with multi-turn RL using only unary feedback (e.g., "Let's try again") after wrong answers can improve both single-turn performance and multi-turn reasoning. We introduce Unary Feedback as Observation (UFO) for reinforcement learning, which uses minimal yet common unary user feedback…
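
The multi-turn setup the abstract describes can be illustrated with a minimal rollout sketch. It is not the authors' implementation: `run_episode`, `toy_policy`, the `max_turns` cap, and the binary reward are hypothetical choices made for illustration, and only the feedback string comes from the abstract. The sketch shows how a fixed unary message is appended to the context as the only observation after each wrong answer.

```python
from typing import Callable, List, Tuple

# The only feedback the model ever sees after a wrong answer (wording from the abstract).
UNARY_FEEDBACK = "Let's try again"


def run_episode(
    policy: Callable[[List[str]], str],
    question: str,
    is_correct: Callable[[str], bool],
    max_turns: int = 5,  # assumption: the abstract does not state a turn limit
) -> Tuple[List[str], float, int]:
    """Roll out one multi-turn episode with unary feedback as the observation.

    Returns the full interaction history, a placeholder binary reward
    (assumption, not the paper's reward design), and the number of turns used.
    """
    history: List[str] = [question]
    for turn in range(1, max_turns + 1):
        answer = policy(history)  # the policy conditions on all previous turns
        history.append(answer)
        if is_correct(answer):
            return history, 1.0, turn
        if turn < max_turns:
            # No hint about what was wrong: just the unary "try again" signal.
            history.append(UNARY_FEEDBACK)
    return history, 0.0, max_turns


def toy_policy(history: List[str]) -> str:
    """Hypothetical stand-in for an LLM: proposes the first candidate not yet tried."""
    tried = set(history[1:])
    for candidate in ["3", "4", "5"]:
        if candidate not in tried:
            return candidate
    return "5"


if __name__ == "__main__":
    history, reward, turns = run_episode(
        toy_policy, "What is 2 + 3?", lambda a: a == "5"
    )
    print(history, reward, turns)
```

In a real training loop the policy would be a language model and the collected histories and rewards would feed an RL update. The toy policy avoids repeating earlier answers, which is the kind of revision behavior the abstract says single-turn-trained models tend to lose.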

Code & Models

Models

Hugging Face: LichengLiu03/Qwen2.5-3B-UFO (model, 5 downloads, 2 likes)

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics