LQA: A Lightweight Quantized-Adaptive Framework for Vision-Language Models on the Edge

Xin Wang; Hong Jia; Hualin Zhou; Sheng Guang Wang; Yu Zhang; Ting Dang; Tao Gu

arXiv:2602.07849 · cs.AI · February 18, 2026

Open Access

TL;DR

LQA is a lightweight, quantized framework that enables efficient and robust deployment of vision-language models on edge devices by combining modality-aware quantization with gradient-free test-time adaptation.

Contribution

It introduces Selective Hybrid Quantization and a gradient-free adaptation mechanism, making VLMs more resource-efficient and robust against distribution shifts on edge hardware.

Findings

01 Improves adaptation performance by 4.5% across datasets.

02 Reduces memory usage by up to 19.9 times compared to full-precision models.

03 Outperforms gradient-based TTA methods in resource-constrained environments.

Abstract

Deploying Vision-Language Models (VLMs) on edge devices is challenged by resource constraints and performance degradation under distribution shifts. While test-time adaptation (TTA) can counteract such shifts, existing methods are too resource-intensive for on-device deployment. To address this challenge, we propose LQA, a lightweight, quantized-adaptive framework for VLMs that combines a modality-aware quantization strategy with gradient-free test-time adaptation. We introduce Selective Hybrid Quantization (SHQ) and a quantized, gradient-free adaptation mechanism to enable robust and efficient VLM deployment on resource-constrained hardware. Experiments across both synthetic and real-world distribution shifts show that LQA improves overall adaptation performance by 4.5%, uses less memory than full-precision models, and significantly outperforms gradient-based TTA methods, achieving up…
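For readers unfamiliar with weight quantization, the following is a minimal, generic sketch of symmetric per-tensor integer quantization in plain Python. It is not the paper's Selective Hybrid Quantization, whose details are not given on this page; the function names, the 8-bit default, and the sample weights are all assumptions chosen for illustration. It only shows why storing weights as low-bit integers with a shared scale takes less memory than 32-bit floats, at the cost of a bounded rounding error.

```python
def quantize_symmetric(weights, num_bits=8):
    """Map floats to signed integers using one shared scale (per-tensor).

    Hypothetical helper for illustration; not the method from the paper.
    """
    qmax = 2 ** (num_bits - 1) - 1  # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from integer codes."""
    return [v * scale for v in q]


def storage_ratio(fp_bits=32, q_bits=8):
    """Ratio of weight storage, ignoring the small overhead of the scale."""
    return fp_bits / q_bits


# Illustrative weights (made up for this example).
weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_symmetric(weights)
restored = dequantize(q, scale)

# The rounding error per weight is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

In this sketch the weight with the largest magnitude maps to the extreme integer level, and every other weight is off by at most half of `scale` after dequantization. The memory figures reported for LQA cover the whole adaptation pipeline, so they should not be read off this weight-only ratio.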

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning