Consolidation or Adaptation? PRISM: Disentangling SFT and RL Data via Gradient Concentration

Yang Zhao; Yangou Ouyang; Xiao Ding; Hepeng Wang; Bibo Cai; Kai Xiong; Jinglong Gao; Zhouhao Sun; Li Du; Bing Qin; Ting Liu

arXiv:2601.07224·cs.AI·April 14, 2026

TL;DR

PRISM is a dynamics-aware framework for hybrid supervised fine-tuning (SFT) and reinforcement learning (RL) of large language model agents. It analyzes the spatial structure of gradients to decide which training data to allocate to each stage.

Contribution

PRISM introduces a data arbitration method based on gradient concentration and grounded in Schema Theory, aimed at making LLM training more efficient and effective.

Findings

01. PRISM outperforms state-of-the-art methods on WebShop and ALFWorld.
02. It reduces computational costs by up to 3.22 times.
03. PRISM achieves a Pareto improvement in training outcomes.

Abstract

While Hybrid Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) has become the standard paradigm for training LLM agents, effective mechanisms for data allocation between these stages remain largely underexplored. Current data arbitration strategies often rely on surface-level heuristics that fail to diagnose intrinsic learning needs. Since SFT targets pattern consolidation through imitation while RL drives structural adaptation via exploration, misaligning data with these functional roles causes severe optimization interference. We propose PRISM, a dynamics-aware framework grounded in Schema Theory that arbitrates data based on its degree of cognitive conflict with the model's existing knowledge. By analyzing the spatial geometric structure of gradients, PRISM identifies data triggering high spatial concentration as high-conflict signals that require RL for structural…
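The abstract shown here is truncated and does not state how gradient concentration is measured, so the following is only a toy sketch of the general idea, not PRISM's metric. It assumes a stand-in measure (the share of squared gradient norm held by the k largest-magnitude coordinates) and a fixed threshold. The names `topk_energy_share`, `arbitrate`, `k`, and `threshold` are hypothetical, and routing low-concentration samples to SFT is an assumption, since the abstract only states that high-concentration data is sent to RL.

```python
from typing import Dict, List, Sequence


def topk_energy_share(grad: Sequence[float], k: int) -> float:
    """Share of squared gradient norm carried by the k largest-magnitude entries.

    Stand-in concentration measure (an assumption, not the paper's definition):
    values near 1.0 mean the gradient is concentrated on a few coordinates.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    energy = sorted((g * g for g in grad), reverse=True)
    total = sum(energy)
    if total == 0.0:
        return 0.0  # zero gradient carries no signal; treat as not concentrated
    return sum(energy[:k]) / total


def arbitrate(
    grads: Dict[str, Sequence[float]], k: int = 2, threshold: float = 0.8
) -> Dict[str, List[str]]:
    """Route each sample to 'rl' (high concentration) or 'sft' (otherwise).

    The 'sft' branch and the threshold value are illustrative assumptions.
    """
    routes: Dict[str, List[str]] = {"rl": [], "sft": []}
    for sample_id, grad in grads.items():
        stage = "rl" if topk_energy_share(grad, k) >= threshold else "sft"
        routes[stage].append(sample_id)
    return routes


# Toy per-sample gradients: one dominated by a single coordinate, one uniform.
toy_grads = {
    "spiky": [5.0, 0.1, 0.1, 0.1, 0.1, 0.1],
    "diffuse": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
}
routes = arbitrate(toy_grads)
print(routes)
```

In a real setting the per-sample gradients would come from the model being trained, and both the concentration measure and the decision rule would follow the paper's definitions rather than this placeholder.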

Code & Models

Datasets

molmohsen/awesome-ai-agent-papers
dataset· 39 dl
