Do Not Step Into the Same River Twice: Learning to Reason from Trial and Error

Chenming Tang; Hsiu-Yuan Huang; Weijie Liu; Clive Bai; Saiyong Yang; Yunfang Wu

arXiv:2510.26109 · cs.LG · April 17, 2026

TL;DR

This paper introduces LTE, a reinforcement learning approach in which a language model learns from its own previous mistakes rather than from external guidance, improving its performance on mathematical reasoning tasks.

Contribution

LTE is a novel method that enables language models to learn from their own errors, overcoming exploration stagnation without relying on external experts.

Findings

1. LTE outperforms GRPO by 5.02 in Pass@1 and 9.96 in Pass@k on average.

2. LTE surpasses methods that require external guidance.

3. LTE mitigates exploration stagnation and improves both exploration and exploitation during training.
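For readers unfamiliar with the Pass@1 and Pass@k metrics named in the findings, the sketch below shows one common way to estimate them from n sampled responses per problem, of which c are correct. It uses the widely used unbiased estimator 1 - C(n-c, k) / C(n, k). This is an illustrative assumption: the summary here does not state how the paper computes Pass@k, and the function name `pass_at_k` is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k responses is correct.

    n: number of responses sampled for a problem
    c: number of those responses that are correct
    k: budget of attempts being evaluated (1 <= k <= n)

    Assumption: this is the standard unbiased estimator, not necessarily
    the exact evaluation protocol used in the paper.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k incorrect responses: every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    # 1 minus the fraction of size-k subsets made up only of incorrect responses.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # One correct response out of four samples.
    print(pass_at_k(n=4, c=1, k=1))
    print(pass_at_k(n=4, c=1, k=2))
```

With k = 1 the estimator reduces to the plain fraction of correct responses, c / n, which is the usual reading of Pass@1.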

Abstract

Reinforcement learning with verifiable rewards (RLVR) has significantly boosted the reasoning capability of language models (LMs). However, existing RLVR approaches train LMs based on their own on-policy responses and are constrained by the initial capability of LMs, thus prone to exploration stagnation, in which LMs fail to solve more training problems and cannot further learn from the training data. Some approaches try to address this by leveraging off-policy solutions to training problems, but rely on external expert guidance that is limited in availability and scalability. In this work, we propose LTE (Learning to reason from Trial and Error), an approach that hints LMs with their previously self-made mistakes, not requiring any external expert guidance. Experiments validate the effectiveness of LTE, which outperforms the normal group relative policy optimization (GRPO) by 5.02 in…
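The exploration stagnation described in the abstract can be illustrated with the group-relative advantage that GRPO computes: each response's reward is normalized against the other responses sampled for the same problem. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. The function name `group_relative_advantages`, the `eps` value, the binary 0/1 rewards, and the use of the population standard deviation are all choices made here for clarity.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    rewards: verifiable rewards for responses to ONE problem
             (assumed binary here: 1 = correct, 0 = incorrect).
    eps:     small constant to avoid division by zero (assumed value).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Mixed group: the correct response gets a positive advantage,
    # the incorrect ones get negative advantages.
    print(group_relative_advantages([1, 0, 0, 0]))

    # All-fail group: every advantage is zero, so this problem
    # contributes no learning signal to the policy update.
    print(group_relative_advantages([0, 0, 0, 0]))
```

When every sampled response to a problem fails, all advantages are zero and the problem yields no gradient. This is the situation in which, per the abstract, LTE hints the model with its own earlier mistakes instead of relying on an external expert solution.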

Code & Models

Repositories

JamyDon/LTE (GitHub)
