Beyond the Trade-off: Self-Supervised Reinforcement Learning for Reasoning Models' Instruction Following

Qingyu Ren; Qianyu He; Bowei Zhang; Jie Zeng; Jiaqing Liang; Yanghua Xiao; Weikang Zhou; Zeye Sun; Fei Yu

arXiv:2508.02150 · cs.AI · August 5, 2025

Open Access

TL;DR

This paper introduces a self-supervised reinforcement learning framework that improves reasoning models' instruction-following abilities using the models' own internal signals, avoiding external supervision and the costs that come with it.

Contribution

It presents a novel self-supervised RL method that improves instruction following in reasoning models without relying on external models or supervision.

Findings

01 Significant improvement in instruction-following capabilities.

02 Maintains reasoning performance after enhancement.

03 Offers a scalable, cost-effective solution.

Abstract

Reasoning models excel in complex problem solving but exhibit a concerning trade-off between reasoning capabilities and instruction-following abilities. Existing approaches for improving instruction following rely on stronger external models, creating methodological bottlenecks and practical limitations including increased costs and accessibility constraints. We propose a self-supervised RL framework that leverages reasoning models' own internal signals to improve instruction-following capabilities without external supervision. Extensive experiments demonstrate that our framework significantly improves instruction-following capabilities while maintaining reasoning performance, offering a scalable and cost-effective approach to enhancing instruction following in reasoning models. The data and code are publicly available at https://github.com/Rainier-rq/verl-if.

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Reinforcement Learning in Robotics · Teaching and Learning Programming