CoreMatching: A Co-adaptive Sparse Inference Framework with Token and Neuron Pruning for Comprehensive Acceleration of Vision-Language Models

Qinsi Wang; Hancheng Ye; Ming-Yu Chung; Yudong Liu; Yueqian Lin; Martin Kuo; Mingyuan Ma; Jianyi Zhang; Yiran Chen

arXiv:2505.19235 · cs.LG · May 27, 2025



TL;DR

CoreMatching is a co-adaptive sparse inference framework that combines token pruning and neuron pruning to accelerate vision-language models, outperforming state-of-the-art baselines on multiple image understanding tasks.

Contribution

This work uncovers the interplay between token and neuron sparsity in VLMs and proposes a novel framework leveraging their synergy for improved inference efficiency.

Findings

01  Achieved 5x FLOPs reduction on NVIDIA Titan Xp
02  Realized 10x overall speedup in inference
03  Surpassed state-of-the-art baselines on multiple image understanding tasks

Abstract

Vision-Language Models (VLMs) excel across diverse tasks but suffer from high inference costs in time and memory. Token sparsity mitigates inefficiencies in token usage, while neuron sparsity reduces high-dimensional computations, both offering promising solutions to enhance efficiency. Recently, these two sparsity paradigms have evolved largely in parallel, fostering the prevailing assumption that they function independently. However, a fundamental yet underexplored question remains: Do they truly operate in isolation, or is there a deeper underlying interplay that has yet to be uncovered? In this paper, we conduct the first comprehensive investigation into this question. By introducing and analyzing the matching mechanism between Core Neurons and Core Tokens, we found that key neurons and tokens for inference mutually influence and reinforce each other. Building on this insight, we…


Code & Models

Repositories

wangqinsi1/2025-icml-corematching
jax · Official
