Beyond Correctness: Learning Robust Reasoning via Transfer

Hyunseok Lee; Soheil Abbasloo; Jihoon Tack; Jinwoo Shin

arXiv:2602.08489·cs.LG·February 10, 2026

TL;DR

This paper introduces RLTR, a reinforcement learning method that enhances the robustness and transferability of reasoning in large language models, leading to more reliable and sample-efficient reasoning capabilities.

Contribution

The paper proposes RLTR, a novel transfer-based reward mechanism that improves reasoning robustness and efficiency in large language models.

Findings

01. RLTR improves sampling consistency and answer accuracy.

02. RLTR reaches performance comparable to RLVR in fewer training steps.

03. On MATH500, RLTR outperforms RLVR in accuracy and efficiency.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has recently strengthened LLM reasoning, but its focus on final answer correctness leaves a critical gap: it does not ensure the robustness of the reasoning process itself. We adopt a simple philosophical view: robust reasoning should remain useful beyond the mind that produced it. Accordingly, we treat reasoning as a form of meaning transfer that must survive truncation, reinterpretation, and continuation. Building on this principle, we introduce Reinforcement Learning with Transferable Reward (RLTR), which operationalizes robustness via a transfer reward that tests whether a partial reasoning prefix from one model can guide a separate model to the correct answer. This encourages LLMs to produce reasoning that is stable, interpretable, and genuinely generalizable. Our approach improves sampling consistency while also improving final answer accuracy,…
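
The transfer reward described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`truncate_reasoning`, `transfer_reward`, `toy_receiver`), the whitespace-token truncation rule, the `keep_fraction` parameter, and the exact-match answer check are all assumptions made for illustration. The sketch only shows the core idea stated above: cut a reasoning trace to a prefix, hand the prefix to a separate receiver model, and reward the prefix if the receiver reaches the correct answer.

```python
from typing import Callable

# Hypothetical receiver interface: (question, reasoning_prefix) -> final answer.
Receiver = Callable[[str, str], str]


def truncate_reasoning(reasoning: str, keep_fraction: float) -> str:
    """Keep roughly the first `keep_fraction` of whitespace-separated tokens.

    Assumption: truncation by token count; the paper may truncate differently.
    """
    tokens = reasoning.split()
    keep = max(1, int(len(tokens) * keep_fraction))
    return " ".join(tokens[:keep])


def transfer_reward(
    question: str,
    reasoning: str,
    reference_answer: str,
    receiver: Receiver,
    keep_fraction: float = 0.5,
) -> float:
    """Return 1.0 if the receiver, continuing from the prefix, answers correctly."""
    prefix = truncate_reasoning(reasoning, keep_fraction)
    answer = receiver(question, prefix)
    # Assumption: exact string match stands in for a real answer verifier.
    return 1.0 if answer.strip() == reference_answer.strip() else 0.0


def toy_receiver(question: str, prefix: str) -> str:
    """Stand-in for a second model: it succeeds only if the prefix sets up the sum."""
    return "4" if "2 + 2" in prefix else "unknown"


question = "What is two plus two?"
# The useful step appears early, so it survives truncation.
transferable = "We need 2 + 2 . Adding gives 4 . So the answer is 4"
# The useful step appears late, so the prefix carries no guidance.
brittle = "The answer is obviously 4 because I recall that 2 + 2 is 4"

reward_transferable = transfer_reward(question, transferable, "4", toy_receiver)
reward_brittle = transfer_reward(question, brittle, "4", toy_receiver)
print(reward_transferable, reward_brittle)
```

In this toy setting both traces end in the correct answer, so a correctness-only reward would not separate them; only the first trace's prefix lets the receiver finish the problem, which is the distinction the transfer reward is meant to capture.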

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics