HBVLA: Pushing 1-Bit Post-Training Quantization for Vision-Language-Action Models
Xin Yan, Zhenglin Wan, Feiyang Ye, Xingrui Yu, Hangyu Du, Yang You, Ivor Tsang

TL;DR
This paper introduces HBVLA, a novel binarization framework tailored for vision-language-action models, significantly reducing model size and computation while maintaining high performance for deployment on resource-limited platforms.
Contribution
We propose a policy-aware Hessian-based weight importance measure and a sparse orthogonal transform for effective 1-bit quantization of VLA models, narrowing the distribution gap between binarized and full-precision weights.
Findings
Quantized models retain over 92% of full-precision performance.
HBVLA outperforms existing binarization methods in accuracy.
HBVLA demonstrates robust deployment on real-world robotic platforms.
Abstract
Vision-Language-Action (VLA) models enable instruction-following embodied control, but their large compute and memory footprints hinder deployment on resource-constrained robots and edge platforms. While reducing weights to 1-bit precision through binarization can greatly improve efficiency, existing methods fail to narrow the distribution gap between binarized and full-precision weights, causing quantization errors to accumulate under long-horizon closed-loop execution and severely degrade actions. To fill this gap, we propose HBVLA, a VLA-tailored binarization framework. First, we use a policy-aware enhanced Hessian to identify weights that are truly critical for action generation. Then, we employ a sparse orthogonal transform for non-salient weights to induce a low-entropy intermediate state. Finally, we quantize both salient and non-salient weights in the Haar domain with group-wise…
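To make the last step of the abstract more concrete, the following is a minimal sketch of binarizing one weight row in a Haar domain. It is not the authors' implementation. The abstract is cut off after "group-wise", so the per-group scale used here (the mean absolute value of each group) is an assumption borrowed from common 1-bit quantization practice. The function names, the single-level transform, and the group size are likewise hypothetical, and the policy-aware Hessian and sparse orthogonal transform steps are omitted.

```python
import math


def haar_forward(row):
    """Single-level orthonormal Haar transform of an even-length list.

    Returns the pairwise averages followed by the pairwise differences.
    """
    s = math.sqrt(2.0)
    avg = [(row[i] + row[i + 1]) / s for i in range(0, len(row), 2)]
    diff = [(row[i] - row[i + 1]) / s for i in range(0, len(row), 2)]
    return avg + diff


def haar_inverse(coeffs):
    """Invert haar_forward, recovering the original ordering."""
    half = len(coeffs) // 2
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(coeffs[:half], coeffs[half:]):
        out.extend([(a + d) / s, (a - d) / s])
    return out


def binarize_groupwise(values, group_size):
    """Replace each group with alpha * sign(v).

    ASSUMPTION: alpha is the mean absolute value of the group. The source
    abstract does not state which scale HBVLA actually uses.
    """
    out = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        alpha = sum(abs(v) for v in group) / len(group)
        out.extend(alpha if v >= 0 else -alpha for v in group)
    return out


def quantize_row(row, group_size=4):
    """Haar transform -> group-wise 1-bit quantization -> inverse transform."""
    coeffs = haar_forward(row)
    return haar_inverse(binarize_groupwise(coeffs, group_size))


if __name__ == "__main__":
    weights = [0.5, -1.0, 2.0, 0.25, -0.75, 1.5, -0.1, 0.3]
    approx = quantize_row(weights, group_size=4)
    err = sum((w - a) ** 2 for w, a in zip(weights, approx))
    print("reconstruction squared error:", err)
```

Because the Haar transform is orthonormal, the squared error measured on the coefficients equals the squared error on the reconstructed weights, so the quantization error can be reasoned about directly in the transformed domain.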
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Advanced Memory and Neural Computing
