A1: A Fully Transparent Open-Source, Adaptive and Efficient Truncated Vision-Language-Action Model

Kaidong Zhang; Jian Zhang; Rongtao Xu; Yu Sun; Shuoshuo Xue; Youpeng Wen; Xiaoyu Guo; Minghao Guo; Weijia Liufu; Liu Zihou; Kangyi Ji; Yangsong Zhang; Jiarun Zhu; Jingzhi Liu; Zihang Li; Ruiyi Chen; Meng Cao; Jingming Zhang; Shen Zhao; Xiaojun Chang; Feng Zheng; Ivan Laptev; Xiaodan Liang

arXiv:2604.05672·cs.RO·April 16, 2026


TL;DR

A1 is an open-source, efficient vision-language-action framework for robot manipulation that reduces inference costs through adaptive schemes while maintaining high success rates.

Contribution

The paper introduces a fully transparent VLA model with adaptive inference techniques, enabling low-cost, high-throughput robot manipulation without performance loss.

Findings

01. Achieves up to 72% lower latency in flow-matching inference.

02. Reduces backbone computation by up to 76.6%.

03. Outperforms several baselines on RoboChallenge.
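The two percentage figures above can be easier to compare as multiplicative factors. The short sketch below does only that arithmetic; the helper name `speedup_from_reduction` is hypothetical, and because the source states both numbers as "up to" values, the resulting factors are best read as upper bounds rather than typical results.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    """Convert an 'X% lower' figure into an 'N times lower' factor.

    A reduction of X% leaves (100 - X)% of the original cost,
    so the factor is 1 / (1 - X / 100).
    """
    if not 0.0 <= reduction_pct < 100.0:
        raise ValueError("reduction_pct must be in [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


# Figures quoted in the findings (both are "up to" values).
latency_factor = speedup_from_reduction(72.0)   # flow-matching inference latency
compute_factor = speedup_from_reduction(76.6)   # backbone computation

print(f"latency: up to ~{latency_factor:.2f}x lower")
print(f"backbone compute: up to ~{compute_factor:.2f}x lower")
```

Under this reading, 72% lower latency corresponds to roughly a 3.6x reduction, and 76.6% less backbone computation to roughly a 4.3x reduction.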

Abstract

Vision-Language-Action (VLA) models have emerged as a powerful paradigm for open-world robot manipulation, but their practical deployment is often constrained by cost: billion-scale VLM backbones and iterative diffusion/flow-based action heads incur high latency and compute, making real-time control expensive on commodity hardware. We present A1, a fully open-source and transparent VLA framework designed for low-cost, high-throughput inference without sacrificing manipulation success. Our approach leverages pretrained VLMs that provide implicit affordance priors for action generation. We release the full training stack (training code, data/data-processing pipeline, intermediate checkpoints, and evaluation scripts) to enable end-to-end reproducibility. Beyond optimizing the VLM alone, A1 targets the full inference pipeline by introducing a budget-aware adaptive inference scheme that…
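To illustrate why iterative flow-based action heads are costly, here is a minimal sketch of generic flow-matching sampling by Euler integration. It is not A1's implementation: `toy_velocity`, `sample_actions`, and the fixed target are hypothetical stand-ins for a learned velocity network, chosen only to show that each integration step costs one network evaluation.

```python
import numpy as np


def toy_velocity(x, t, target):
    """Hypothetical stand-in for a learned velocity network.

    For a straight-line path from noise to `target`, the velocity
    at time t is (target - x) / (1 - t).
    """
    return (target - x) / (1.0 - t)


def sample_actions(velocity_fn, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with Euler steps.

    Returns the final sample and the number of velocity evaluations,
    which is the quantity that dominates inference latency.
    """
    x = x0.copy()
    dt = 1.0 / num_steps
    evals = 0
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so (1 - t) is never zero
        x = x + dt * velocity_fn(x, t)
        evals += 1
    return x, evals


rng = np.random.default_rng(0)
target = np.array([0.1, -0.3, 0.5])  # assumed 3-D action, for illustration only
noise = rng.standard_normal(3)

for steps in (10, 2):
    action, evals = sample_actions(
        lambda x, t: toy_velocity(x, t, target), noise, steps
    )
    print(steps, evals, np.allclose(action, target))
```

In this toy setting the path is exactly straight, so even two steps reach the target. A learned velocity field is generally not that well behaved, which is why the step count becomes a trade-off between latency and action quality.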
