RAIN-Merging: A Gradient-Free Method to Enhance Instruction Following in Large Reasoning Models with Preserved Thinking Format

Zhehao Huang; Yuhang Liu; Baijiong Lin; Yixin Lou; Zhengbao He; Hanling Tian; Tao Li; Xiaolin Huang

arXiv:2602.22538·cs.LG·February 27, 2026

Open Access · 3 Reviews

TL;DR

RAIN-Merging is a gradient-free technique that merges an instruction-tuned model into a large reasoning model, improving instruction following while preserving the reasoning model's thinking format and reasoning capability.

Contribution

The paper introduces RAIN-Merging, a novel gradient-free method that merges instruction-tuned models with large reasoning models, maintaining reasoning performance and output format fidelity.

Findings

01 · Significant improvement in instruction adherence across benchmarks.

02 · Maintains reasoning quality while enhancing instruction following.

03 · Effective across various model scales and architectures.

Abstract

Large reasoning models (LRMs) excel at a long chain of reasoning but often fail to faithfully follow instructions regarding output format, constraints, or specific requirements. We investigate whether this gap can be closed by integrating an instruction-tuned model (ITM) into an LRM. Analyzing their differences in parameter space, namely task vectors, we find that their principal subspaces are nearly orthogonal across key modules, suggesting a lightweight merging with minimal interference. However, we also demonstrate that naive merges are fragile because they overlook the output format mismatch between LRMs (with explicit thinking and response segments) and ITMs (answers-only). We introduce RAIN-Merging (Reasoning-Aware Instruction-attention guided Null-space projection Merging), a gradient-free method that integrates instruction following while preserving thinking format and reasoning…
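The two ideas named in the abstract, task vectors with nearly orthogonal principal subspaces and a null-space projection of the merged update, can be illustrated with a small NumPy sketch. This is an illustration under stated assumptions, not the paper's procedure: the function names, the choice of left singular vectors as the "principal subspace", the overlap score, the feature matrix `feats`, and the scalar coefficient `lam` are all hypothetical stand-ins chosen for this example.

```python
import numpy as np


def task_vector(w_tuned, w_base):
    """Parameter-space difference between a tuned model and its shared base."""
    return w_tuned - w_base


def subspace_overlap(delta_a, delta_b, k):
    """Mean squared cosine of the principal angles between the top-k left
    singular subspaces of two task vectors (0 = orthogonal, 1 = identical).
    Assumption: this particular score is chosen for illustration only."""
    ua = np.linalg.svd(delta_a, full_matrices=False)[0][:, :k]
    ub = np.linalg.svd(delta_b, full_matrices=False)[0][:, :k]
    return float(np.sum((ua.T @ ub) ** 2) / k)


def null_space_project(delta, feats):
    """Project `delta` (d_out x d_in) so that it maps every row of `feats`
    (n x d_in) to approximately zero, leaving outputs on those inputs unchanged.
    Assumption: `feats` stands in for features collected at thinking-format tokens."""
    d_in = feats.shape[1]
    # pinv(feats) @ feats projects onto the row space of feats;
    # subtracting it from the identity projects onto the null space.
    projector = np.eye(d_in) - np.linalg.pinv(feats) @ feats
    return delta @ projector


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_base = rng.normal(size=(8, 6))              # hypothetical shared base weights
    w_lrm = w_base + rng.normal(size=(8, 6))      # hypothetical reasoning model
    w_itm = w_base + rng.normal(size=(8, 6))      # hypothetical instruction-tuned model
    feats = rng.normal(size=(3, 6))               # hypothetical protected features

    tv_lrm = task_vector(w_lrm, w_base)
    tv_itm = task_vector(w_itm, w_base)
    print("overlap:", subspace_overlap(tv_lrm, tv_itm, k=2))

    lam = 1.0  # hypothetical scalar; the paper describes attention-guided scaling instead
    w_merged = w_lrm + lam * null_space_project(tv_itm, feats)
    # The merged weights act like the LRM weights on the protected features.
    print("max drift:", np.abs((w_merged - w_lrm) @ feats.T).max())
```

The projection only guarantees unchanged outputs for inputs in the span of `feats`, so in practice the quality of such a merge would depend on how well the collected features cover the behavior being protected.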

Peer Reviews

Decision·ICLR 2026 Oral

Reviewer 01 · Rating 6 · Confidence 3

Strengths

Clear problem & neat insight. The paper pinpoints a real pain point: LRMs reason well but violate format/constraints. The idea to protect the thinking segment explicitly while injecting instruction-following behavior is crisp and well-motivated.

Strong empirical results. On the headline 7B setting, RAIN-Merging improves instruction-following average (48.11 vs. 44.12 LRM; +4 points absolute) while also improving reasoning/general (55.59 vs. 51.03) and beating task-arithmetic, SLERP, Karcher, TI

Weaknesses

Calibration-set specificity. The instruction calibration set is distilled from IFEval-style instructions (365 samples). This may bias the proxy to rule-verifiable patterns and possibly underrepresent open-ended or tool-use instructions.

Reliance on explicit thinking markers. Stage 1 presumes accessible special tokens and feature extraction around them. It is unclear how well this transfers to LRMs with different templates (or hidden/implicit thinking) or to models without consistent <think> tag

Reviewer 02 · Rating 8 · Confidence 5

Strengths

1. Novel Research Problem: The work addresses an important and underexplored challenge (balancing instruction-following and reasoning capabilities in LRMs) through model merging, a lightweight and training-free approach.
2. Effective Methodology: The two-stage RAIN-Merging framework is well-motivated, combining null-space projection to preserve reasoning structure with attention-guided scaling to enhance instruction alignment, all without gradient updates.
3. Comprehensive Experiments: The pa

Weaknesses

While the method is evaluated on several model families (Qwen, Llama), further validation on a wider range of architectures and modalities (e.g., multimodal or multilingual models) would strengthen the generalizability claims.

Reviewer 03 · Rating 4 · Confidence 4

Strengths

1. Merging ITM and LRM is an interesting and practical problem.
2. The finding that task vectors' principal subspaces are nearly orthogonal across key modules provides an interesting understanding about the parameter space structure of these capabilities.
3. The evaluation spans multiple model families and sizes.
4. The gradient-free nature makes this a practical, accessible alternative to SFT.
5. Using four instruction-following benchmarks and multiple reasoning datasets provides reasonable emp

Weaknesses

### 1. Data Contamination / Generalization Concerns

For example, Qwen2.5-7B-Instruct is trained on IFEval, InfoBench, and ComplexBench as calibration data, and this paper evaluates RAIN-Merging on the same benchmarks. Results may not generalize to unseen instruction-following or reasoning scenarios. Maybe the null-space projection and coefficients are optimized on the same distribution they're tested on.

### 2. Data

The paper evaluates instruction-following and reasoning on separate benchmark d

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Topic Modeling