LLMC+: Benchmarking Vision-Language Model Compression with a Plug-and-play Toolkit

Chengtao Lv; Bilang Zhang; Yang Yong; Ruihao Gong; Yushi Huang; Shiqiao Gu; Jiajun Wu; Yumeng Shi; Jinyang Guo; Wenya Wang

arXiv:2508.09981 · cs.CV · November 18, 2025

TL;DR

LLMC+ is a comprehensive benchmark and toolkit for evaluating and combining various vision-language model compression techniques, addressing current limitations and promoting fair, realistic assessments of model efficiency.

Contribution

Introduces LLMC+, a versatile benchmark with a plug-and-play toolkit supporting over 20 algorithms for systematic VLM compression evaluation.

Findings

01 Spatial and temporal redundancies require different compression strategies.

02 Token reduction degrades performance on multi-turn and detail-sensitive tasks.

03 Combining token and model compression yields high efficiency with minimal performance loss.
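To make the third finding concrete, here is a minimal sketch of what "combining token and model compression" can mean in practice: drop low-importance visual tokens, then store weights at lower precision. This is not the LLMC+ API. The function names (`prune_visual_tokens`, `quantize_int8`, `dequantize`) are hypothetical, and top-k pruning by an importance score and symmetric per-tensor int8 quantization are simple stand-ins chosen for illustration, not the specific algorithms the benchmark evaluates.

```python
from typing import List, Sequence, Tuple


def prune_visual_tokens(
    tokens: Sequence[str], scores: Sequence[float], keep_ratio: float
) -> List[str]:
    """Token compression (illustrative): keep the highest-scoring fraction
    of tokens while preserving their original order."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    keep = max(1, round(len(tokens) * keep_ratio))
    # Indices of the `keep` largest scores, then restored to sequence order.
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:keep]
    return [tokens[i] for i in sorted(top)]


def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Model compression (illustrative): symmetric per-tensor int8
    quantization. Returns integer codes in [-127, 127] and the scale."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes: Sequence[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


# Assumed toy inputs: four visual tokens with made-up importance scores.
kept = prune_visual_tokens(["t0", "t1", "t2", "t3"], [0.1, 0.7, 0.05, 0.15], 0.5)
codes, scale = quantize_int8([1.0, -1.0, 0.0, 0.3])
restored = dequantize(codes, scale)
```

The two steps are independent: pruning shortens the sequence the model processes, while quantization shrinks the parameters it is processed with, which is why they can be stacked.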

Abstract

Large Vision-Language Models (VLMs) exhibit impressive multi-modal capabilities but suffer from prohibitive computational and memory demands, due to their long visual token sequences and massive parameter sizes. To address these issues, recent works have proposed training-free compression methods. However, existing efforts often suffer from three major limitations: (1) Current approaches do not decompose techniques into comparable modules, hindering fair evaluation across spatial and temporal redundancy. (2) Evaluation is confined to simple single-turn tasks, failing to reflect performance in realistic scenarios. (3) Individual compression techniques are used in isolation, without exploring their joint potential. To overcome these gaps, we introduce LLMC+, a comprehensive VLM compression benchmark with a versatile, plug-and-play toolkit. LLMC+ supports over 20 algorithms across five…
