How Fast Can I Run My VLA? Demystifying VLA Inference Performance with VLA-Perf

Wenqi Jiang; Jason Clemons; Karu Sankaralingam; Christos Kozyrakis

arXiv:2602.18397 · cs.RO · February 23, 2026

Open Access

TL;DR

This paper introduces VLA-Perf, an analytical performance model for Vision-Language-Action (VLA) inference, and uses it to study how model design and deployment choices affect real-time inference on embodied AI tasks.

Contribution

The paper presents VLA-Perf, an analytical performance model that covers arbitrary combinations of VLA models and inference systems, and uses it to conduct the first systematic study of the VLA inference performance landscape.

Findings

01. Inference performance is significantly affected by model scaling and architecture.

02. Long-context inputs and asynchronous inference influence latency.

03. Optimal deployment depends on hardware and network conditions.

Abstract

Vision-Language-Action (VLA) models have recently demonstrated impressive capabilities across various embodied AI tasks. While deploying VLA models on real-world robots imposes strict real-time inference constraints, the inference performance landscape of VLA remains poorly understood due to the large combinatorial space of model architectures and inference systems. In this paper, we ask a fundamental research question: How should we design future VLA models and systems to support real-time inference? To address this question, we first introduce VLA-Perf, an analytical performance model that can analyze inference performance for arbitrary combinations of VLA models and inference systems. Using VLA-Perf, we conduct the first systematic study of the VLA inference performance landscape. From a model-design perspective, we examine how inference performance is affected by model scaling,…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Explainable Artificial Intelligence (XAI)