GM-Skip: Metric-Guided Transformer Block Skipping for Efficient Vision-Language Models

Lianming Huang; Haibo Hu; Qiao Li; Xin He; Nan Guan; Chun Jason Xue

arXiv:2508.18227 · cs.CV · August 26, 2025

TL;DR

GM-Skip is a metric-guided Transformer block skipping method that accelerates vision-language model inference while preserving task performance, which makes it suitable for latency-sensitive applications such as autonomous driving.

Contribution

The paper proposes a metric-adaptive framework that selectively skips Transformer blocks in VLMs, trading speed against accuracy through a greedy, feedback-driven selection procedure.

Findings

01. Speeds up inference by over 40% on COCO tasks

02. Maintains high accuracy, e.g., 87.3% on Person classification

03. Reduces latency by up to 45.4% in autonomous vehicle deployment
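
A "speedup" percentage and a "latency reduction" percentage are different quantities, and the summary above does not say which definition the 40% figure uses. The short sketch below only shows the arithmetic for converting between the two. The function names are hypothetical and nothing here comes from the paper.

```python
def speedup_from_latency_reduction(reduction: float) -> float:
    """Convert a fractional latency reduction into a speedup factor.

    A reduction of r means new_latency = (1 - r) * old_latency,
    so the speedup factor is old_latency / new_latency = 1 / (1 - r).
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def latency_reduction_from_speedup(speedup: float) -> float:
    """Convert a speedup factor (>= 1) into a fractional latency reduction."""
    if speedup < 1.0:
        raise ValueError("speedup must be >= 1")
    return 1.0 - 1.0 / speedup


# A 45.4% latency reduction corresponds to roughly a 1.83x speedup factor.
factor = speedup_from_latency_reduction(0.454)

# If "40% faster" were read as a 1.4x throughput gain (an assumption),
# the matching latency reduction would be only about 28.6%.
reduction = latency_reduction_from_speedup(1.4)
```

Under this arithmetic, the two headline numbers are comparable only once the definition behind each is known.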

Abstract

Transformer-based Vision-Language Models (VLMs) have achieved impressive performance on tasks such as image captioning, object recognition, and visual reasoning, but their high computational cost hinders deployment in latency-sensitive applications like autonomous driving. We introduce GM-Skip, a flexible and metric-adaptive framework for Transformer block skipping that accelerates VLM inference while preserving output quality. GM-Skip features a greedy, metric-guided block selection strategy that uses metric feedback (e.g., accuracy, CIDEr) to identify redundant layers, along with a reverse-order deletion mechanism that preserves early foundational blocks to avoid performance collapse. To support diverse deployment needs, it incorporates a tunable trade-off between sparsity and performance via a score-sparsity balance objective. Experiments across multiple tasks and datasets, including…
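
The abstract names three ingredients: greedy metric-guided selection, reverse-order deletion, and a score-sparsity trade-off. A minimal sketch of how they could fit together follows. It is an illustration under stated assumptions, not the authors' implementation: the function `greedy_reverse_skip`, the weight `alpha`, the `protect_first` parameter, the linear form of the objective, and the toy metric are all hypothetical.

```python
from typing import Callable, FrozenSet, List


def greedy_reverse_skip(
    num_blocks: int,
    evaluate: Callable[[FrozenSet[int]], float],
    alpha: float = 0.5,
    protect_first: int = 0,
) -> List[int]:
    """Greedily choose Transformer blocks to skip, scanning from last to first.

    evaluate(skipped) returns a task metric in [0, 1] (e.g. accuracy or a
    normalized CIDEr) for the model run with the given blocks skipped.
    ASSUMPTION: the objective is alpha * metric + (1 - alpha) * sparsity;
    the paper's actual score-sparsity balance objective may differ.
    """
    def objective(skipped: FrozenSet[int]) -> float:
        sparsity = len(skipped) / num_blocks
        return alpha * evaluate(skipped) + (1.0 - alpha) * sparsity

    skipped: FrozenSet[int] = frozenset()
    best = objective(skipped)

    # Reverse order: late blocks are tried first and the earliest
    # `protect_first` blocks are never candidates for removal.
    for idx in range(num_blocks - 1, protect_first - 1, -1):
        candidate = skipped | {idx}
        score = objective(candidate)
        if score > best:  # keep the skip only if the objective improves
            skipped, best = candidate, score

    return sorted(skipped)


# Toy stand-in for a real evaluation loop: each block has a made-up
# "importance", and skipping a block subtracts it from a 0.9 baseline.
IMPORTANCE = [0.30, 0.20, 0.10, 0.01, 0.02, 0.01]


def toy_metric(skipped: FrozenSet[int]) -> float:
    return max(0.0, 0.9 - sum(IMPORTANCE[i] for i in skipped))


chosen = greedy_reverse_skip(len(IMPORTANCE), toy_metric, alpha=0.5, protect_first=1)
```

In this toy setup, `alpha` close to 1 favors the metric and skips little, while `alpha` close to 0 favors sparsity and skips every unprotected block. In a real pipeline, `evaluate` would run the VLM on a validation set with the chosen blocks bypassed, which is far more expensive than the stand-in used here.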
