HiPP-Prune: Hierarchical Preference-Conditioned Structured Pruning for Vision-Language Models

Lincen Bai; Hedi Tabia; Raul Santos-Rodriguez

arXiv:2603.06270 · cs.CV · March 9, 2026

Open Access

TL;DR

HiPP-Prune is a hierarchical, preference-conditioned structured pruning framework for vision-language models (VLMs) that balances multiple objectives, including task utility and hallucination robustness, by making plan-level pruning decisions under a user-adjustable trade-off.
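A user-adjustable trade-off between objectives is often expressed as a preference vector that weights per-objective scores into a single reward. The sketch below assumes a simple weighted-sum scalarization. The objective names, scores, and helper functions (scalarize, best_plan) are hypothetical, and HiPP-Prune conditions its policy on the preference vector rather than only ranking fixed plans, so this shows the general idea, not the paper's reward.

```python
from typing import Dict

Scores = Dict[str, float]  # objective name -> score in [0, 1], higher is better


def scalarize(scores: Scores, preference: Scores) -> float:
    """Weighted-sum reward; preference weights are normalized to sum to 1 (an assumption)."""
    if set(scores) != set(preference):
        raise ValueError("scores and preference must use the same objective names")
    total = sum(preference.values())
    if total <= 0:
        raise ValueError("preference weights must have a positive sum")
    return sum(preference[name] / total * scores[name] for name in scores)


def best_plan(plans: Dict[str, Scores], preference: Scores) -> str:
    """Return the name of the plan with the highest scalarized reward."""
    return max(plans, key=lambda name: scalarize(plans[name], preference))


# Hypothetical evaluation scores for two candidate pruning plans.
plans = {
    "A": {"utility": 0.80, "hallucination_robustness": 0.55},
    "B": {"utility": 0.70, "hallucination_robustness": 0.75},
}
print(best_plan(plans, {"utility": 0.8, "hallucination_robustness": 0.2}))  # -> A
print(best_plan(plans, {"utility": 0.3, "hallucination_robustness": 0.7}))  # -> B
```

With these made-up numbers, the utility-leaning query selects plan A and the robustness-leaning query selects plan B, which is the kind of queryable trade-off a preference vector is meant to expose.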

Contribution

It introduces a novel hierarchical pruning approach built on a preference-conditioned policy whose state integrates a visual sensitivity signal, enabling controllable robustness-utility trade-offs in VLM pruning.
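One simple way to approximate a per-layer visual sensitivity signal is to measure how much attention text-token queries pay to vision-token keys in each layer. The sketch below is a hypothetical proxy, not the paper's definition: it assumes access to per-layer attention matrices (rows are queries and each row sums to 1) and known positions of the vision tokens, and the function names are illustrative.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # one layer's attention: rows are query tokens, columns are key tokens


def vision_attention_mass(attn: Matrix, vision_idx: Sequence[int], text_idx: Sequence[int]) -> float:
    """Average attention mass that text-token queries place on vision-token keys."""
    vision = set(vision_idx)
    per_query = [sum(w for j, w in enumerate(attn[i]) if j in vision) for i in text_idx]
    return sum(per_query) / len(per_query)


def visual_sensitivity(layers: List[Matrix], vision_idx: Sequence[int],
                       text_idx: Sequence[int]) -> List[float]:
    """Hypothetical per-layer sensitivity, normalized to sum to 1 across layers."""
    raw = [vision_attention_mass(attn, vision_idx, text_idx) for attn in layers]
    total = sum(raw)
    if total == 0:
        return [1.0 / len(raw)] * len(raw)  # no attention to vision tokens: fall back to uniform
    return [r / total for r in raw]


# Toy example: tokens 0 and 1 are vision tokens, token 2 is a text token (causal attention).
layer0 = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.4, 0.4, 0.2]]
layer1 = [[1.0, 0.0, 0.0], [0.5, 0.5, 0.0], [0.1, 0.1, 0.8]]
print(visual_sensitivity([layer0, layer1], vision_idx=[0, 1], text_idx=[2]))
```

Layer 0, whose text query attends mostly to the image tokens, receives the larger share, so a pruning policy that sees this signal would have reason to prune it more cautiously.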

Findings

01

Discovery of diverse, non-dominated pruning plans.

02

Enhanced robustness-utility trade-offs demonstrated on LLaVA and ScienceQA.

03

Effective control over the balance between hallucination robustness and task performance.
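A pruning plan is non-dominated when no other plan is at least as good on every objective and strictly better on at least one. A minimal filter, assuming each plan is summarized by hypothetical (utility, hallucination robustness) scores where higher is better on both axes:

```python
from typing import List, Tuple

Scores = Tuple[float, ...]  # one value per objective, higher is better on every axis


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(plans: List[Scores]) -> List[Scores]:
    """Keep plans that no other plan dominates (O(n^2), fine for a handful of candidates)."""
    # A plan never dominates itself, so it is safe to compare against the full list.
    return [p for p in plans if not any(dominates(q, p) for q in plans)]


# Hypothetical (utility, hallucination robustness) scores for four candidate plans.
plans = [(0.80, 0.55), (0.70, 0.75), (0.65, 0.70), (0.60, 0.80)]
print(non_dominated(plans))  # -> [(0.8, 0.55), (0.7, 0.75), (0.6, 0.8)]
```

The plan scoring (0.65, 0.70) is dropped because (0.70, 0.75) beats it on both objectives; the three survivors each represent a different trade-off.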

Abstract

Pruning vision-language models (VLMs) for efficient deployment is challenging because compression can affect not only task utility but also visual grounding, often amplifying object hallucinations even at the same sparsity level. We present HiPP-Prune, a hierarchical preference-conditioned structured pruning framework that treats pruning as conditional resource allocation under multiple objectives. HiPP-Prune makes plan-level decisions: a single policy invocation outputs a global pruning blueprint by factorizing decisions into an overall sparsity budget and a layer-wise allocation, enabling queryable trade-offs via a user-specified preference vector. To account for VLM-specific failure modes, our policy state integrates a visual sensitivity signal derived from attention flow between vision tokens and language hidden states, discouraging over-pruning of vision-critical layers that…
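The plan-level factorization (a global sparsity budget plus a layer-wise allocation) can be sketched as follows. This is a hypothetical illustration under stated assumptions: all layers have equal size, the budget and allocation logits are passed in directly (in HiPP-Prune they come from the preference-conditioned policy), and visual sensitivity enters as a fixed penalty on the logits, whereas in the paper it is part of the policy state. Function and parameter names (pruning_blueprint, penalty, max_layer_sparsity) are illustrative. Per-layer sparsity is capped, and any overflow is redistributed to uncapped layers so the mean sparsity still matches the budget.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pruning_blueprint(budget: float, layer_logits: List[float], sensitivity: List[float],
                      penalty: float = 2.0, max_layer_sparsity: float = 0.9) -> List[float]:
    """Hypothetical plan: split a global sparsity budget across equally sized layers.

    Subtracting penalty * sensitivity from the logits shifts pruning away from
    vision-critical layers. Returns one sparsity ratio per layer; their mean equals budget.
    """
    n = len(layer_logits)
    if len(sensitivity) != n:
        raise ValueError("need one sensitivity value per layer")
    if not 0.0 <= budget <= max_layer_sparsity:
        raise ValueError("budget must lie in [0, max_layer_sparsity]")
    alloc = softmax([z - penalty * s for z, s in zip(layer_logits, sensitivity)])
    ratios = [0.0] * n
    remaining = budget * n  # total sparsity still to distribute (in units of whole layers)
    active = list(range(n))
    # Water-filling: hand out what remains in proportion to alloc, cap layers at
    # max_layer_sparsity, and redistribute the overflow among layers still below the cap.
    while active and remaining > 1e-12:
        weight = sum(alloc[i] for i in active)
        overflow, still_active = 0.0, []
        for i in active:
            ratios[i] += remaining * alloc[i] / weight
            if ratios[i] >= max_layer_sparsity:
                overflow += ratios[i] - max_layer_sparsity
                ratios[i] = max_layer_sparsity
            else:
                still_active.append(i)
        remaining, active = overflow, still_active
    return ratios


# Hypothetical policy outputs: a 50% global budget and equal logits for four layers;
# layer 1 is the most vision-sensitive, so it should receive the lowest sparsity.
ratios = pruning_blueprint(0.5, [0.0, 0.0, 0.0, 0.0], [0.1, 0.6, 0.2, 0.1])
print([round(r, 3) for r in ratios])
```

Factoring the decision this way keeps the total compression fixed by a single scalar, while the allocation only decides where that compression lands.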

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning