BitVLA: 1-bit Vision-Language-Action Models for Robotics Manipulation

Hongyu Wang; Chuyan Xiong; Ruiping Wang; Xilin Chen

arXiv:2506.07530·cs.RO·March 3, 2026

TL;DR

BitVLA introduces a fully native 1-bit vision-language-action model for robotics, significantly reducing memory and latency while maintaining strong task performance, enabling efficient deployment on edge devices.

Contribution

The paper presents BitVLA, a 1-bit VLA model built on a 1-bit LLM, together with a quantize-then-distill strategy for compressing the vision encoder. The reported result is a large efficiency gain with task performance on par with the full-precision baseline.

Findings

01 · Matches full-precision baseline performance
02 · Reduces model memory by 11.0x
03 · Lowers end-to-end latency by 4.4x

Abstract

Deploying powerful Vision-Language-Action (VLA) models on edge devices is limited by their massive size. In this paper, we take a deployment-oriented view of VLA training: we target efficiency through model design and optimization, rather than relying solely on post-hoc compression. Thus, we propose BitVLA, a fully native 1-bit VLA model for robotic manipulation, where every parameter is ternary, i.e., {-1, 0, 1}. BitVLA is built on the publicly available 1-bit LLM BitNet b1.58 2B4T, and is trained as a vision-language-action policy that inherits the compactness of 1-bit pretraining while retaining strong task performance. To further reduce the memory footprint of the vision backbone, we introduce Quantize-then-Distill, a post-training quantization-aware training strategy that compresses a full-precision vision encoder to 1.58-bit weights, while a full-precision teacher guides…
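To make "ternary" and "1.58-bit" concrete, here is a minimal sketch of absmean ternarization, the weight quantizer described for BitNet b1.58. The abstract does not spell out BitVLA's quantizer, so treating it as the same scheme is an assumption. The function names (`absmean_ternarize`, `dequantize`) and the sample weights are hypothetical and not taken from the authors' code.

```python
import math

def absmean_ternarize(weights, eps=1e-8):
    """Map a flat list of float weights to codes in {-1, 0, 1} plus one scale.

    Assumption: absmean scaling as described for BitNet b1.58; the BitVLA
    abstract itself does not state which quantizer is used.
    """
    # Per-tensor scale: mean absolute value of the weights.
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round the scaled weight to the nearest integer, then clip to [-1, 1].
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale

def dequantize(codes, scale):
    """Approximate reconstruction of the original weights."""
    return [c * scale for c in codes]

# Hypothetical sample weights, chosen only for illustration.
weights = [0.9, -0.05, 0.4, -1.3, 0.02, -0.6]
codes, scale = absmean_ternarize(weights)
approx = dequantize(codes, scale)

# A three-valued symbol carries log2(3) bits, hence the "1.58-bit" label.
bits_per_weight = math.log2(3)

print(codes)
print(round(bits_per_weight, 2))
```

Small weights collapse to 0 and large ones saturate at ±1, so only the codes and a single scale per tensor need to be stored.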

Code & Models

Repositories

ustcwhy/bitvla · jax · Official

Models

Datasets

hongyuw/BitVLA-MAmmoTH-VL
dataset · 14 dl

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning

Methods: ALIGN