Collaborative Multi-Mode Pruning for Vision-Language Models

Zimeng Wu; Yunhong Wang; Donghao Wang; Jiaxin Chen

arXiv:2604.02956·cs.CV·April 6, 2026

TL;DR

This paper introduces CoMP (Collaborative Multi-Mode Pruning), a framework that jointly prunes parameters and tokens in vision-language models. By exploiting redundancy in both modes rather than only one, it is reported to improve performance at high pruning ratios.

Contribution

The paper proposes a Collaborative Importance Metric (CIM) and a multi-mode pruning strategy that together prune parameters and tokens in VLMs jointly, and it reports outperforming existing pruning methods.

Findings

01

CoMP achieves better performance than existing pruning methods at high pruning ratios.

02

It exploits redundancy in both parameters and tokens, rather than in a single mode.

03

Source code is publicly available at https://github.com/Wuzimeng/CoMP.git.

Abstract

Vision-Language Models (VLMs) have advanced rapidly within the unified Transformer architecture, yet their deployment on resource-constrained devices remains challenging due to high computational complexity. While pruning has emerged as an effective technique for compressing VLMs, existing approaches predominantly focus on a single mode, pruning either parameters or tokens, and fail to fully explore the inherent redundancy in each mode, which leads to substantial performance degradation at high pruning ratios. To address these limitations, we propose Collaborative Multi-Mode Pruning (CoMP), a novel framework tailored to VLMs that performs joint parameter and token pruning. Specifically, we first design a Collaborative Importance Metric (CIM) that investigates the mutual interference between the coupled parameters and tokens. It incorporates distinct significance of tokens into…
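To make the two pruning modes concrete, here is a minimal sketch of parameter pruning and token pruning applied to the same linear layer. It does not implement CoMP or its CIM, whose details the truncated abstract does not give. As stand-ins it uses generic heuristics (weight magnitude for parameters, L2 norm for tokens). The function names, the toy data, and the 0.5 ratios are all assumptions chosen for illustration.

```python
import math

def prune_parameters(weight, ratio):
    """Zero the `ratio` fraction of entries with the smallest magnitude.
    Magnitude is a generic stand-in score, not the paper's CIM."""
    flat = sorted(abs(w) for row in weight for w in row)
    k = int(len(flat) * ratio)
    if k == 0:
        return [row[:] for row in weight]
    threshold = flat[k - 1]
    return [[w if abs(w) > threshold else 0.0 for w in row] for row in weight]

def prune_tokens(tokens, ratio):
    """Drop the `ratio` fraction of tokens with the smallest L2 norm,
    preserving the original order of the tokens that remain."""
    n_keep = len(tokens) - int(len(tokens) * ratio)
    norms = [math.sqrt(sum(x * x for x in t)) for t in tokens]
    ranked = sorted(range(len(tokens)), key=lambda i: norms[i], reverse=True)
    keep = sorted(ranked[:n_keep])
    return [tokens[i] for i in keep]

def linear(tokens, weight):
    """Apply a bias-free linear layer: one output row per input token."""
    n_in, n_out = len(weight), len(weight[0])
    return [[sum(t[i] * weight[i][j] for i in range(n_in)) for j in range(n_out)]
            for t in tokens]

# Toy data (hypothetical): 4 tokens of width 2, and a 2x2 weight matrix.
tokens = [[0.1, 0.0], [2.0, -1.0], [0.5, 0.5], [3.0, 3.0]]
weight = [[0.05, -1.2], [0.9, 0.01]]

sparse_weight = prune_parameters(weight, 0.5)  # [[0.0, -1.2], [0.9, 0.0]]
kept_tokens = prune_tokens(tokens, 0.5)        # [[2.0, -1.0], [3.0, 3.0]]
output = linear(kept_tokens, sparse_weight)    # 2 tokens x 2 features
```

In this sketch each mode is scored independently, which is the single-mode behavior the abstract criticizes. A collaborative metric would instead let token significance influence parameter scores, and vice versa.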

Code & Models

Repositories

Wuzimeng/CoMP.git (GitHub)
