How Fast Can I Run My VLA? Demystifying VLA Inference Performance with VLA-Perf
Wenqi Jiang, Jason Clemons, Karu Sankaralingam, Christos Kozyrakis

TL;DR
This paper introduces VLA-Perf, an analytical performance model for VLA inference, and uses it to study systematically how model design and deployment choices affect real-time inference in embodied AI tasks.
Contribution
The paper presents VLA-Perf, the first analytical performance model for VLA inference, and provides comprehensive insights into optimizing model design and deployment for real-time applications.
Findings
Inference performance is significantly affected by model scaling and architecture.
Long-context inputs and asynchronous inference influence latency.
Optimal deployment depends on hardware and network conditions.
Abstract
Vision-Language-Action (VLA) models have recently demonstrated impressive capabilities across various embodied AI tasks. While deploying VLA models on real-world robots imposes strict real-time inference constraints, the inference performance landscape of VLA remains poorly understood due to the large combinatorial space of model architectures and inference systems. In this paper, we ask a fundamental research question: How should we design future VLA models and systems to support real-time inference? To address this question, we first introduce VLA-Perf, an analytical performance model that can analyze inference performance for arbitrary combinations of VLA models and inference systems. Using VLA-Perf, we conduct the first systematic study of the VLA inference performance landscape. From a model-design perspective, we examine how inference performance is affected by model scaling,…
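To make the idea of an analytical performance model concrete, here is a minimal sketch. It assumes a roofline-style bound, where latency is the larger of compute time and memory-transfer time, plus an optional network round trip for off-robot deployment. This is an illustration only, not VLA-Perf's actual formulation: the names Hardware, roofline_latency, end_to_end_latency, and max_control_frequency_hz are hypothetical, as are all numeric values.

```python
from dataclasses import dataclass


@dataclass
class Hardware:
    """Hypothetical accelerator description (values are illustrative)."""
    peak_flops: float     # peak compute throughput, FLOP/s
    mem_bandwidth: float  # peak memory bandwidth, bytes/s


def roofline_latency(flops: float, bytes_moved: float, hw: Hardware) -> float:
    """Lower-bound latency in seconds under a roofline assumption.

    The step is bounded by whichever is slower: doing the arithmetic
    or moving the data (weights, activations) through memory.
    """
    compute_time = flops / hw.peak_flops
    memory_time = bytes_moved / hw.mem_bandwidth
    return max(compute_time, memory_time)


def end_to_end_latency(flops: float, bytes_moved: float, hw: Hardware,
                       network_rtt_s: float = 0.0) -> float:
    """Add an assumed network round trip for server-side inference."""
    return roofline_latency(flops, bytes_moved, hw) + network_rtt_s


def max_control_frequency_hz(latency_s: float) -> float:
    """Highest synchronous control rate if each action waits for one inference."""
    return 1.0 / latency_s


if __name__ == "__main__":
    # Illustrative numbers only; they do not describe any real device or model.
    hw = Hardware(peak_flops=1e12, mem_bandwidth=1e11)
    latency = end_to_end_latency(flops=2e12, bytes_moved=1e11, hw=hw,
                                 network_rtt_s=0.5)
    print(f"latency: {latency:.3f} s, "
          f"max rate: {max_control_frequency_hz(latency):.2f} Hz")
```

A model of this kind can be evaluated for any pairing of workload and hardware parameters without running the model itself, which is what makes a large design space tractable to explore.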
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Explainable Artificial Intelligence (XAI)
