CLASP: Class-Adaptive Layer Fusion and Dual-Stage Pruning for Multimodal Large Language Models

Yunkai Dang, Yizhu Jiang, Yifan Jiang, Qi Fan, Yinghuan Shi, Wenbin Li, and Yang Gao

arXiv:2604.12767 · cs.CV · April 15, 2026


TL;DR

CLASP is a flexible token reduction framework for multimodal large language models that uses class-adaptive layer fusion and dual-stage pruning to improve efficiency and robustness.

Contribution

It introduces a novel class-adaptive pruning method with multi-layer feature fusion and dual-stage token selection, outperforming existing approaches.

Findings

01. CLASP achieves superior performance across various benchmarks.

02. It effectively reduces visual tokens while maintaining accuracy.

03. The method is robust under diverse instructions and architectures.

Abstract

Multimodal Large Language Models (MLLMs) suffer from substantial computational overhead due to the high redundancy in visual token sequences. Existing approaches typically address this issue using single-layer Vision Transformer (ViT) features and static pruning strategies. However, such fixed configurations are often brittle under diverse instructions. To overcome these limitations, we propose CLASP, a plug-and-play token reduction framework based on class-adaptive layer fusion and dual-stage pruning. Specifically, CLASP first constructs category-specific visual representations through multi-layer vision feature fusion. It then performs dual-stage pruning, allocating the token budget between attention-salient pivot tokens for relevance and redundancy-aware completion tokens for coverage. Through class-adaptive pruning, CLASP enables prompt-conditioned feature fusion and budget…
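The two steps named in the abstract, multi-layer feature fusion followed by dual-stage pruning, can be illustrated with a small sketch. This is not the authors' implementation: the function names (`fuse_layers`, `dual_stage_prune`), the weighted-average fusion, the cosine-similarity redundancy measure, and the `pivot_ratio` parameter are all assumptions made for illustration. In a class-adaptive setting the fusion weights and the pivot/completion split would presumably depend on the prompt category; here they are simply passed in as arguments.

```python
import math


def fuse_layers(layer_feats, weights):
    """Weighted average of token features across ViT layers (assumed fusion rule).

    layer_feats[l][t] is the feature vector of token t at layer l.
    """
    total = sum(weights)
    n_tokens = len(layer_feats[0])
    dim = len(layer_feats[0][0])
    return [
        [sum(w * layer[t][d] for w, layer in zip(weights, layer_feats)) / total
         for d in range(dim)]
        for t in range(n_tokens)
    ]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def dual_stage_prune(feats, attn, budget, pivot_ratio=0.5):
    """Return the sorted indices of the tokens kept under the budget.

    Stage 1 (pivot): keep the tokens with the highest attention scores.
    Stage 2 (completion): greedily add the token least similar to those
    already kept, a hypothetical stand-in for redundancy-aware selection.
    """
    n = len(feats)
    budget = min(budget, n)
    if budget <= 0:
        return []
    n_pivot = max(1, int(budget * pivot_ratio))

    # Stage 1: attention-salient pivot tokens.
    by_attention = sorted(range(n), key=lambda i: attn[i], reverse=True)
    selected = by_attention[:n_pivot]

    # Stage 2: completion tokens chosen to add coverage, not redundancy.
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < budget:
        best = min(
            remaining,
            key=lambda i: max(cosine(feats[i], feats[j]) for j in selected),
        )
        selected.append(best)
        remaining.remove(best)
    return sorted(selected)
```

With four toy tokens where tokens 0, 1, and 3 point in nearly the same direction and token 2 is orthogonal, a budget of two keeps the most-attended token (0) as the pivot and then adds token 2 for coverage, even though token 1 has a higher attention score.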


Code & Models

Repositories

Yunkaidang/CLASP (GitHub)
