OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation
Can Cui, Pengxiang Ding, Wenxuan Song, Shuanghao Bai, Xinyang Tong, Zirui Ge, Runze Suo, Wanqi Zhou, Yang Liu, Bofang Jia, Han Zhao, Siteng Huang, Donglin Wang

TL;DR
This paper provides a short survey and empirical analysis of dual-system VLA models for robotic manipulation and introduces an open-source model to support further research and optimization.
Contribution
It offers a systematic comparison and empirical evaluation of existing dual-system VLA architectures, along with a low-cost open-source model for community use.
Findings
Comparison of structural designs of dual-system architectures
Empirical evaluation of core design elements
Open-source VLA model for further research
Abstract
Dual-system VLA (Vision-Language-Action) architectures have become a hot topic in embodied intelligence research, but there is not enough open-source work to support further performance analysis and optimization. To address this problem, this paper summarizes and compares the structural designs of existing dual-system architectures and conducts systematic empirical evaluations of their core design elements. It also provides a low-cost open-source model for further exploration. The project will continue to be updated with more experimental conclusions and open-source models with improved performance. Project page: https://openhelix-robot.github.io/.
Taxonomy
Topics: Multimodal Machine Learning Applications · Social Robot Interaction and HRI · Robot Manipulation and Learning
