FRISM: Fine-Grained Reasoning Injection via Subspace-Level Model Merging for Vision-Language Models

Chenyu Huang; Peng Ye; Xudong Tan; Jinhan Mu; Shenghe Zheng; Li Shen; Tao Chen

arXiv:2601.21187 · cs.CV · May 8, 2026

TL;DR

FRISM injects reasoning ability from Large Reasoning Models (LRMs) into vision-language models (VLMs) by merging at the level of SVD subspaces, with a learned scaling coefficient for each subspace, improving reasoning without sacrificing visual performance.

Contribution

It introduces a subspace-level merging approach that decomposes LRM task vectors with SVD, together with a label-free self-distillation strategy, for better reasoning injection in VLMs.

Findings

01

FRISM improves reasoning capabilities across multiple benchmarks.

02

It maintains visual performance while enhancing reasoning.

03

The method outperforms existing coarse-grained merging techniques.

Abstract

Efficiently enhancing the reasoning capabilities of Vision-Language Models (VLMs) by merging them with Large Reasoning Models (LRMs) has emerged as a promising direction. However, existing methods typically operate at a coarse-grained layer level, which often leads to a trade-off between injecting reasoning capabilities and preserving visual capabilities. To address this limitation, we propose FRISM (Fine-grained Reasoning Injection via Subspace-level model Merging), a fine-grained reasoning injection framework based on subspace-level model merging. Observing that different SVD subspaces contribute differently to reasoning and perception, FRISM decomposes LRM task vectors via Singular Value Decomposition (SVD) and adaptively tunes the scaling coefficients of each subspace through learning to realize fine-grained reasoning injection. Furthermore, we introduce a label-free…
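The subspace-level scaling described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `merge_subspaces`, the definition of the task vector as LRM weights minus a shared base model's weights, and the use of one coefficient per singular direction are all assumptions made for illustration. In FRISM the coefficients are learned; here they are simply passed in.

```python
import numpy as np


def merge_subspaces(w_vlm, w_lrm, w_base, coeffs):
    """Add an SVD-rescaled LRM task vector to one VLM weight matrix.

    Hypothetical helper, not taken from the paper.
    w_vlm, w_lrm, w_base: 2-D arrays of identical shape (m, n).
    coeffs: 1-D array of length min(m, n), one scale per singular direction.
    """
    # Assumed task vector: what the LRM changed relative to the shared base.
    tau = w_lrm - w_base

    # Thin SVD: tau = u @ diag(s) @ vt, with s sorted in descending order.
    u, s, vt = np.linalg.svd(tau, full_matrices=False)

    # Rescale each rank-one subspace independently, then recompose.
    scaled_tau = (u * (coeffs * s)) @ vt

    return w_vlm + scaled_tau


# Illustrative usage with random matrices standing in for real weights.
rng = np.random.default_rng(0)
w_base = rng.normal(size=(6, 4))
w_lrm = w_base + 0.1 * rng.normal(size=(6, 4))
w_vlm = w_base + 0.1 * rng.normal(size=(6, 4))

# Hand-picked coefficients: keep the top two directions, damp the rest.
coeffs = np.array([1.0, 1.0, 0.2, 0.2])
w_merged = merge_subspaces(w_vlm, w_lrm, w_base, coeffs)
```

Setting every coefficient to 1 recovers plain task-vector addition, and setting every coefficient to 0 leaves the VLM weights unchanged, so the coefficients interpolate between those two extremes one subspace at a time.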
