Incentivizing Reasoning for Advanced Instruction-Following of Large Language Models

Yulei Qin; Gang Li; Zongyi Li; Zihan Xu; Yuchen Shi; Zhekai Lin; Xiao Cui; Ke Li; Xing Sun

arXiv:2506.01413·cs.CV·October 1, 2025

TL;DR

This paper introduces RAIF, a reinforcement learning-based method that incentivizes reasoning in large language models to better follow complex instructions, significantly improving performance and generalizability.

Contribution

RAIF systematically enhances LLMs' instruction-following ability by decomposing complex instructions, using RL with rule-based rewards, and employing behavior cloning to foster reasoning skills.
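To make the idea of a rule-based reward concrete, here is a minimal sketch of scoring a response against programmatically checkable constraints. This is an illustration, not RAIF's actual reward: the helper names (`max_words`, `contains_keyword`, `ends_with`, `rule_based_reward`) and the choice of scoring by the fraction of satisfied constraints are assumptions made for this example.

```python
from typing import Callable, List

# A constraint is a predicate over the model's response text.
# All names below are hypothetical, chosen only for illustration.
Constraint = Callable[[str], bool]


def max_words(limit: int) -> Constraint:
    """Constraint: the response has at most `limit` whitespace-separated words."""
    return lambda text: len(text.split()) <= limit


def contains_keyword(keyword: str) -> Constraint:
    """Constraint: the response mentions `keyword` (case-insensitive)."""
    return lambda text: keyword.lower() in text.lower()


def ends_with(suffix: str) -> Constraint:
    """Constraint: the response ends with `suffix`, ignoring trailing whitespace."""
    return lambda text: text.rstrip().endswith(suffix)


def rule_based_reward(response: str, constraints: List[Constraint]) -> float:
    """Return the fraction of constraints the response satisfies (assumed scoring rule)."""
    if not constraints:
        return 0.0
    satisfied = sum(1 for check in constraints if check(response))
    return satisfied / len(constraints)


# Three parallel constraints attached to one instruction.
constraints = [max_words(10), contains_keyword("ocean"), ends_with("Thank you.")]
good = "The ocean is vast and calm. Thank you."
bad = "Mountains are tall."

print(rule_based_reward(good, constraints))  # all three constraints hold
print(rule_based_reward(bad, constraints))   # only the word limit holds
```

In an RL loop, a scalar like this could serve as the verifiable reward for each sampled response; chained or branching constraints would need checks that depend on one another, which this sketch does not model.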

Findings

01 A 1.5B LLM gains 11.74% across seven benchmarks

02 The 1.5B model reaches performance comparable to an 8B LLM

03 Generalizes well to out-of-distribution constraints

Abstract

Existing large language models (LLMs) face challenges of following complex instructions, especially when multiple constraints are present and organized in paralleling, chaining, and branching structures. One intuitive solution, namely chain-of-thought (CoT), is expected to universally improve capabilities of LLMs. However, we find that the vanilla CoT exerts a negative impact on performance due to its superficial reasoning pattern of simply paraphrasing the instructions. It fails to peel back the compositions of constraints for identifying their relationship across hierarchies of types and dimensions. To this end, we propose RAIF, a systematic method to boost LLMs in dealing with complex instructions via incentivizing reasoning for test-time compute scaling. First, we stem from the decomposition of complex instructions under existing taxonomies and propose a reproducible data…

Code & Models

Repositories

yuleiqin/raif
PyTorch (official implementation)
