Decouple before Integration: Test-time Synthesis of SFT and RLVR Task Vectors

Chaohao Yuan; Chenghao Xiao; Yu Rong; Hong Cheng; Long-Kai Huang

arXiv:2605.00610·cs.LG·May 4, 2026


TL;DR

This paper introduces DoTS, a post-hoc framework that synthesizes SFT and RLVR capabilities at inference time via task vector arithmetic, avoiding catastrophic forgetting and gradient conflicts.

Contribution

Proposes Decoupled Test-time Synthesis (DoTS), enabling independent training of SFT and RLVR checkpoints and their combination at inference without model updates.
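The task vector arithmetic behind this contribution can be illustrated with a minimal sketch. It shows only the generic idea (a task vector is a fine-tuned checkpoint minus the shared base, and vectors are added back onto the base); it is not the DoTS procedure itself, whose details are not given on this page. The names `task_vector`, `synthesize`, `alpha`, and `beta` are hypothetical, and checkpoints are represented as plain dictionaries of float lists rather than real model weights.

```python
from typing import Dict, List

# Toy stand-in for a checkpoint: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def task_vector(finetuned: Params, base: Params) -> Params:
    """Return finetuned - base, element-wise, for every named parameter."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def synthesize(base: Params, tau_sft: Params, tau_rlvr: Params,
               alpha: float = 1.0, beta: float = 1.0) -> Params:
    """Return base + alpha * tau_sft + beta * tau_rlvr.

    alpha and beta are hypothetical scaling coefficients, not values
    taken from the paper.
    """
    return {
        name: [
            b + alpha * s + beta * r
            for b, s, r in zip(base[name], tau_sft[name], tau_rlvr[name])
        ]
        for name in base
    }


# Toy checkpoints that share one base model.
base = {"w": [1.0, 2.0]}
sft_ckpt = {"w": [1.5, 2.0]}
rlvr_ckpt = {"w": [1.0, 2.25]}

tau_sft = task_vector(sft_ckpt, base)
tau_rlvr = task_vector(rlvr_ckpt, base)
merged = synthesize(base, tau_sft, tau_rlvr)
print(merged)
```

Because the two checkpoints are only combined after training, neither training run has to see the other's objective; the combination step touches weights but performs no gradient updates.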

Findings

01. DoTS matches or exceeds training-based SFT-RLVR methods on reasoning benchmarks.

02. DoTS surpasses state-of-the-art models when applied to stronger checkpoints.

03. DoTS generalizes to out-of-domain benchmarks without re-tuning.

Abstract

SFT and RLVR represent two fundamental yet distinct paradigms for LLM post-training, each excelling along a different dimension: SFT expands knowledge breadth, while RLVR enhances reasoning depth. Yet integrating these complementary strengths remains a formidable challenge. Sequential training can cause catastrophic forgetting, and joint optimization often suffers from severe gradient conflicts. We analyze SFT and RLVR through the lens of task vectors and reveal three structural properties behind these failures: a 30* magnitude disparity, 45* sign interference, and heterogeneous module-wise update distributions. These findings show that SFT and RLVR are difficult to integrate directly, but they also suggest that the two paradigms modify partly complementary components of the model. Motivated by these observations, we propose Decoupled Test-time Synthesis (DoTS), a post-hoc framework that allows SFT and…
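The three structural properties named in the abstract can be made concrete with a small diagnostic sketch. The metric definitions below (ratio of L2 norms, fraction of coordinates with opposite signs, per-module L2 norm) are assumptions chosen for illustration; the paper may define them differently. The function names and the toy numbers are hypothetical and do not reproduce any reported result.

```python
import math
from typing import Dict, List

# Toy task vector: module name -> flat list of weight deltas.
Vec = Dict[str, List[float]]


def l2_norm(v: Vec) -> float:
    """L2 norm over all entries of all modules."""
    return math.sqrt(sum(x * x for vals in v.values() for x in vals))


def magnitude_ratio(a: Vec, b: Vec) -> float:
    """Ratio of overall update sizes, ||a|| / ||b||."""
    return l2_norm(a) / l2_norm(b)


def sign_conflict_rate(a: Vec, b: Vec) -> float:
    """Fraction of coordinates, nonzero in both vectors, whose signs differ."""
    conflicts = 0
    total = 0
    for name in a:
        for x, y in zip(a[name], b[name]):
            if x != 0.0 and y != 0.0:
                total += 1
                if (x > 0) != (y > 0):
                    conflicts += 1
    return conflicts / total if total else 0.0


def per_module_norms(v: Vec) -> Dict[str, float]:
    """L2 norm of the update inside each module."""
    return {name: math.sqrt(sum(x * x for x in vals)) for name, vals in v.items()}


# Toy task vectors (illustrative values only).
tau_a = {"attn": [3.0, -4.0], "mlp": [0.0]}
tau_b = {"attn": [0.375, 0.5], "mlp": [0.0]}

ratio = magnitude_ratio(tau_a, tau_b)
conflict = sign_conflict_rate(tau_a, tau_b)
norms_a = per_module_norms(tau_a)
print(ratio, conflict, norms_a)
```

A large magnitude ratio means a naive sum is dominated by one vector, and a high conflict rate means many coordinates partially cancel; comparing the per-module norms of the two vectors shows whether they concentrate their updates in different parts of the model.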


Code & Models

Repositories

chaohaoyuan/DoTS (GitHub)
