QuantVLA: Scale-Calibrated Post-Training Quantization for Vision-Language-Action Models

Jingxuan Zhang; Yunta Hsieh; Zhongwei Wan; Haokun Lin; Xin Wang; Ziqi Wang; Yingtie Lei; Mi Zhang

arXiv:2602.20309·cs.LG·April 8, 2026

TL;DR

QuantVLA is a training-free post-training quantization (PTQ) framework for vision-language-action (VLA) models that reduces memory and compute demands while maintaining high task success rates.

Contribution

To the authors' knowledge, QuantVLA is the first PTQ method for VLA systems and the first to successfully quantize a diffusion transformer (DiT) action head, enabling scalable low-bit embodied intelligence.

Findings

01. Exceeds full-precision baseline success rates on LIBERO tasks.
02. Achieves approximately 70% relative memory savings.
03. Supports low-bit integer kernels without architecture changes.

Abstract

Vision-language-action (VLA) models unify perception, language, and control for embodied agents but face significant challenges in practical deployment due to rapidly increasing compute and memory demands, especially as models scale to longer horizons and larger backbones. To address these bottlenecks, we introduce QuantVLA, a training-free post-training quantization (PTQ) framework that, to our knowledge, is the first PTQ approach for VLA systems and the first to successfully quantize a diffusion transformer (DiT) action head. QuantVLA incorporates three scale-calibrated components: (1) a selective quantization layout that integerizes all linear layers in both the language backbone and the DiT while keeping attention projections in floating point to preserve the original operator schedule; (2) attention temperature matching, a lightweight per-head scaling mechanism that stabilizes…
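
To make the first component concrete, here is a minimal sketch of a selective quantization layout: linear layers are mapped to integers while attention projections stay in floating point. It is an illustration under stated assumptions, not the authors' implementation. The layer names, the suffix list used to detect attention projections, the helper names, and the choice of symmetric per-tensor 8-bit quantization are all hypothetical; the abstract does not specify them.

```python
# Illustrative sketch only. All names below are hypothetical and are not
# taken from the QuantVLA codebase.

# Assumption: attention projections can be recognized by these name suffixes.
ATTENTION_PROJ_SUFFIXES = ("q_proj", "k_proj", "v_proj", "o_proj")


def quantize_symmetric(weights, num_bits=8):
    """Symmetric per-tensor quantization of a flat list of floats.

    Returns (integer_values, scale) so that value ~= integer * scale.
    """
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer values back to approximate floats."""
    return [v * scale for v in q]


def select_layout(layer_names):
    """Mark each layer 'fp' (kept in floating point) or 'int' (integerized)."""
    return {
        name: "fp" if name.endswith(ATTENTION_PROJ_SUFFIXES) else "int"
        for name in layer_names
    }


def apply_layout(layers, num_bits=8):
    """Quantize only the layers the layout marks as 'int'.

    `layers` maps a layer name to a flat list of weights. The result maps
    each name to (values, scale); scale is None for floating-point layers.
    """
    layout = select_layout(layers)
    result = {}
    for name, weights in layers.items():
        if layout[name] == "int":
            result[name] = quantize_symmetric(weights, num_bits)
        else:
            result[name] = (list(weights), None)
    return result


if __name__ == "__main__":
    toy_layers = {
        "dit.block0.attn.q_proj": [0.5, -1.0],     # stays floating point
        "dit.block0.mlp.fc1": [0.1, -0.2, 0.4],    # integerized
    }
    for name, (values, scale) in apply_layout(toy_layers).items():
        print(name, values, scale)
```

A real deployment would operate on tensors and dispatch to low-bit integer kernels; the toy lists here only show how a name-based layout decides which weights are integerized.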

Code & Models

Repositories

aiot-mlsys-lab/QuantVLA (GitHub)
