ReaLM: Reflection-Enhanced Autonomous Reasoning with Small Language Models

Yuanfeng Xu; Zehui Dai; Jian Liang; Jiapeng Guan; Guangrun Wang; Liang Lin; Xiaohui Lv

arXiv:2508.12387 · cs.CL · August 19, 2025

TL;DR

ReaLM is a reinforcement learning framework that improves the reasoning, autonomy, and generalization of small language models by contrasting reasoning paths, gradually fading external signals, and distilling domain knowledge into the model. The authors report gains over baseline models on reasoning tasks.
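The idea of "fading external signals" mentioned above can be illustrated with a small sketch. This is not the paper's method: the linear decay schedule, the function names `guidance_probability` and `build_prompt`, and the `Hint:` prompt format are all assumptions chosen purely for illustration. The sketch attaches an external hint to a training prompt with a probability that shrinks as training progresses, so the model sees less outside guidance over time.

```python
import random


def guidance_probability(step: int, total_steps: int,
                         start: float = 1.0, end: float = 0.0) -> float:
    """Hypothetical schedule: linearly decay the hint probability from
    `start` at step 0 to `end` at the final step."""
    if total_steps <= 1:
        return end
    # Clamp progress to [0, 1] so out-of-range steps stay well defined.
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start + (end - start) * frac


def build_prompt(question: str, hint: str, step: int,
                 total_steps: int, rng: random.Random) -> str:
    """Attach the external hint with the scheduled probability
    (assumed prompt format, for illustration only)."""
    p = guidance_probability(step, total_steps)
    if rng.random() < p:
        return f"{question}\nHint: {hint}"
    return question


if __name__ == "__main__":
    rng = random.Random(0)
    for step in (0, 5, 10):
        p = guidance_probability(step, 11)
        prompt = build_prompt("What is 12 * 7?", "Split 12 into 10 + 2.",
                              step, 11, rng)
        print(step, round(p, 2), repr(prompt))
```

Under these assumptions the hint is always present at the first step and never present at the last one; any other decay shape (exponential, stepwise) could be swapped in by changing `guidance_probability` alone.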

Contribution

The paper introduces ReaLM, a reinforcement learning approach that combines Multi-Route Process Verification (MRPV), EAAI, and guided distillation to improve the reasoning and autonomy of small language models.

Findings

01 ReaLM outperforms baseline models on reasoning tasks.

02 ReaLM shows improved reasoning and autonomy across multiple benchmarks.

03 Knowledge distillation improves generalization.

Abstract

Small Language Models (SLMs) are a cost-effective alternative to Large Language Models (LLMs), but often struggle with complex reasoning due to their limited capacity and a tendency to produce mistakes or inconsistent answers during multi-step reasoning. Existing efforts have improved SLM performance, but typically at the cost of one or more of three key aspects: (1) reasoning capability, due to biased supervision that filters out negative reasoning paths and limits learning from errors; (2) autonomy, due to over-reliance on externally generated reasoning signals; and (3) generalization, which suffers when models overfit to teacher-specific patterns. In this paper, we introduce ReaLM, a reinforcement learning framework for robust and self-sufficient reasoning in vertical domains. To enhance reasoning capability, we propose Multi-Route Process Verification (MRPV), which contrasts both…

Taxonomy

Topics: Topic Modeling · Intelligent Tutoring Systems and Adaptive Learning · Advanced Data Processing Techniques