Gated Relational Alignment via Confidence-based Distillation for Efficient VLMs

Yanlong Chen; Amirhossein Habibian; Luca Benini; Yawei Li

arXiv:2601.22709·cs.CV·May 5, 2026

TL;DR

GRACE is a framework that combines knowledge distillation with quantization-aware training (QAT) under the Information Bottleneck principle, enabling efficient vision-language models (VLMs) with minimal accuracy loss.

Contribution

The paper introduces confidence-gated decoupled distillation, relational centered kernel alignment, and an adaptive controller based on Lagrangian relaxation to improve the quantization of VLMs, outperforming existing methods.
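To make the idea of confidence gating concrete, here is a minimal NumPy sketch. It is not the paper's formulation: the hard threshold on the teacher's maximum probability, the function name `confidence_gated_kd`, and the `threshold` and `temperature` parameters are all assumptions made for illustration, and the "decoupled" part of the method is not modeled. The sketch only shows how low-confidence teacher tokens can be masked out of a standard KL distillation loss.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def confidence_gated_kd(student_logits, teacher_logits,
                        threshold=0.5, temperature=1.0):
    """Mean KL(teacher || student) over tokens the teacher is confident about.

    Hypothetical gate: a token contributes only if the teacher's top
    probability is at least `threshold`. Inputs have shape (n_tokens, vocab).
    Returns (loss, gate) where gate is a 0/1 vector of length n_tokens.
    """
    p_t = softmax(teacher_logits / temperature)
    p_s = softmax(student_logits / temperature)
    eps = 1e-12  # avoids log(0)
    kl = np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=-1)
    gate = (p_t.max(axis=-1) >= threshold).astype(kl.dtype)
    # Normalize by the number of kept tokens (at least 1 to avoid 0/0).
    loss = float((gate * kl).sum() / max(gate.sum(), 1.0))
    return loss, gate


# Token 0: confident teacher, kept. Token 1: uniform teacher, filtered out.
teacher = np.array([[10.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
student = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
loss, gate = confidence_gated_kd(student, teacher)
print(gate, loss)
```

A soft gate (for example, weighting each token by teacher confidence or entropy) would be an equally plausible reading; the hard mask is chosen here only because it is the simplest to inspect.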

Findings

01. INT4 models outperform FP16 baselines on benchmarks.

02. The quantized models nearly match teacher performance with significant resource savings.

03. They achieve 3x throughput and a 54% memory reduction.

Abstract

Vision-Language Models (VLMs) achieve strong multimodal performance but are costly to deploy, and post-training quantization often causes significant accuracy loss. Despite its potential, quantization-aware training for VLMs remains underexplored. We propose GRACE, a framework unifying knowledge distillation and QAT under the Information Bottleneck principle: quantization constrains information capacity while distillation guides what to preserve within this budget. Treating the teacher as a proxy for task-relevant information, we introduce confidence-gated decoupled distillation to filter unreliable supervision, relational centered kernel alignment to transfer visual token structures, and an adaptive controller via Lagrangian relaxation to balance fidelity against capacity constraints. Across extensive benchmarks on LLaVA and Qwen families, our INT4 models consistently outperform FP16…
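The abstract mentions relational centered kernel alignment for transferring visual token structure. As a rough illustration, the sketch below computes standard linear CKA between two sets of token features and turns it into an alignment loss. This is the generic, widely used linear CKA definition, not necessarily the relational variant the paper uses; the names `linear_cka` and `cka_alignment_loss` and the `1 - CKA` loss form are assumptions.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between feature matrices of shape (n_tokens, dim).

    x and y must share n_tokens but may have different feature dims.
    The score lies in [0, 1] and is invariant to orthogonal transforms
    and isotropic scaling of either input.
    """
    # Center each feature dimension across tokens.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(x.T @ y, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def cka_alignment_loss(student_tokens, teacher_tokens):
    """Hypothetical loss: 0 when the token similarity structures match."""
    return 1.0 - linear_cka(student_tokens, teacher_tokens)


rng = np.random.default_rng(0)
teacher_tokens = rng.normal(size=(16, 6))   # e.g. 16 visual tokens, dim 6
student_tokens = rng.normal(size=(16, 4))   # student may use a smaller dim
print(cka_alignment_loss(student_tokens, teacher_tokens))
```

Because CKA compares token-to-token similarity structure and not raw feature values, a loss of this kind does not require the student and teacher to share a hidden size, which is one reason it is a natural fit for distillation between models of different widths.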


