VLAgents: A Policy Server for Efficient VLA Inference
Tobias Jülg, Khaled Gamal, Nisarga Nilavadi, Pierre Krack, Seongjin Bien, Michael Krawez, Florian Walter, Wolfram Burgard

TL;DR
VLAgents is a modular policy server that streamlines VLA inference in robotics by supporting flexible communication methods, improving deployment efficiency and performance in distributed systems.
Contribution
Introduces VLAgents, a unified, adaptable policy server for VLA inference that enhances deployment and communication efficiency in robotics applications.
Findings
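Outperforms the default policy servers of OpenVLA, OpenPi, and LeRobot in a benchmark with local and remote communication
Supports both zero-copy shared memory for high-speed simulation and compressed streaming for remote hardware
Integrates seven VLA policies, including OpenVLA and Pi Zero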
Abstract
The rapid emergence of Vision-Language-Action models (VLAs) has had a significant impact on robotics. However, their deployment remains complex due to fragmented interfaces and the inherent communication latency of distributed setups. To address this, we introduce VLAgents, a modular policy server that abstracts VLA inference behind a unified Gymnasium-style protocol. Crucially, its communication layer transparently adapts to the context by supporting both zero-copy shared memory for high-speed simulation and compressed streaming for remote hardware. In this work, we present the architecture of VLAgents and validate it by integrating seven policies, including OpenVLA and Pi Zero. In a benchmark with both local and remote communication, we further demonstrate that it outperforms the default policy servers provided by OpenVLA, OpenPi, and LeRobot. VLAgents is available at…
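To make the two ideas in the abstract concrete, here is a minimal Python sketch of what a Gymnasium-style policy interface and a context-dependent transport choice could look like. It is not the VLAgents API: the names `Policy`, `ConstantPolicy`, `choose_transport`, `pack_compressed`, `unpack_compressed`, and `pack_shared` are hypothetical. The use of `zlib` and `multiprocessing.shared_memory` is also an assumption, standing in for whatever compression and shared-memory mechanisms the actual server uses.

```python
import zlib
from abc import ABC, abstractmethod
from multiprocessing import shared_memory


class Policy(ABC):
    """Hypothetical policy interface in the spirit of Gymnasium's reset/step."""

    @abstractmethod
    def reset(self, instruction: str) -> None:
        """Start a new episode for the given language instruction."""

    @abstractmethod
    def act(self, observation: dict) -> list:
        """Map one observation to one action."""


class ConstantPolicy(Policy):
    """Stand-in for a real VLA: always returns a zero action."""

    def __init__(self, action_dim: int = 7):
        self.action_dim = action_dim
        self.instruction = None

    def reset(self, instruction: str) -> None:
        self.instruction = instruction

    def act(self, observation: dict) -> list:
        return [0.0] * self.action_dim


def choose_transport(host: str) -> str:
    """Assumed rule: shared memory on the same machine, compression otherwise."""
    return "shared_memory" if host in ("localhost", "127.0.0.1") else "compressed"


def pack_compressed(image: bytes) -> bytes:
    """Compress raw image bytes before sending them over the network."""
    return zlib.compress(image)


def unpack_compressed(payload: bytes) -> bytes:
    """Recover the raw image bytes on the receiving side."""
    return zlib.decompress(payload)


def pack_shared(image: bytes) -> shared_memory.SharedMemory:
    """Write image bytes into a shared-memory block.

    A reader on the same machine can attach by name and access `buf`
    without copying. The caller must call close() and unlink() when done.
    """
    shm = shared_memory.SharedMemory(create=True, size=len(image))
    shm.buf[:len(image)] = image
    return shm
```

In this sketch a client would call `choose_transport` once per connection and then send each observation through the matching path, so the policy code itself never needs to know which transport is in use.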
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotics and Automated Systems · Reinforcement Learning in Robotics
