Large Language Models Reasoning Abilities Under Non-Ideal Conditions After RL-Fine-Tuning

Chang Tian; Matthew B. Blaschko; Mingzhe Xing; Xiuxing Li; Yinliang Yue; Marie-Francine Moens

arXiv:2508.04848·cs.AI·August 8, 2025

TL;DR

This paper evaluates large language models' reasoning abilities under realistic non-ideal conditions after RL fine-tuning, revealing significant performance drops and highlighting the need for more robust reasoning methods.

Contribution

The paper introduces a new evaluation framework for LLM reasoning under non-ideal scenarios and assesses how RL fine-tuning affects reasoning in those scenarios.

Findings

01 RL fine-tuning improves reasoning in idealized scenarios

02 Performance declines significantly in non-ideal scenarios

03 Current remediation methods are largely ineffective
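
The first two findings compare reasoning accuracy under ideal and non-ideal conditions. The sketch below shows one way such a gap could be quantified. It is only an illustration: the function names `accuracy` and `robustness_gap`, the exact-match scoring, and the toy answers are assumptions made here, not the paper's metric or data.

```python
from typing import Sequence, Tuple


def accuracy(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Exact-match accuracy (an assumed scoring rule, not the paper's)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one example is required")
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)


def robustness_gap(ideal_acc: float, non_ideal_acc: float) -> Tuple[float, float]:
    """Return the (absolute, relative) accuracy drop from ideal to non-ideal inputs."""
    absolute = ideal_acc - non_ideal_acc
    relative = absolute / ideal_acc if ideal_acc > 0 else 0.0
    return absolute, relative


# Hypothetical toy data: the same four questions answered with clean
# inputs and with perturbed (non-ideal) inputs.
references = ["4", "9", "12", "7"]
ideal_preds = ["4", "9", "12", "7"]
non_ideal_preds = ["4", "8", "12", "5"]

ideal_acc = accuracy(ideal_preds, references)
non_ideal_acc = accuracy(non_ideal_preds, references)
abs_drop, rel_drop = robustness_gap(ideal_acc, non_ideal_acc)

print(f"ideal={ideal_acc:.2f} non_ideal={non_ideal_acc:.2f} "
      f"absolute_drop={abs_drop:.2f} relative_drop={rel_drop:.2f}")
```

Reporting both the absolute and the relative drop makes it easier to compare models whose ideal-condition accuracy differs.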

Abstract

Reinforcement learning (RL) has become a key technique for enhancing the reasoning abilities of large language models (LLMs), with policy-gradient algorithms dominating the post-training stage because of their efficiency and effectiveness. However, most existing benchmarks evaluate large-language-model reasoning under idealized settings, overlooking performance in realistic, non-ideal scenarios. We identify three representative non-ideal scenarios with practical relevance: summary inference, fine-grained noise suppression, and contextual filtering. We introduce a new research direction guided by brain-science findings that human reasoning remains reliable under imperfect inputs. We formally define and evaluate these challenging scenarios. We fine-tune three LLMs and a state-of-the-art large vision-language model (LVLM) using RL with a representative policy-gradient algorithm and then…
