EdgeVLA: Efficient Vision-Language-Action Models

Paweł Budzianowski; Wesley Maa; Matthew Freed; Jingxiang Mo; Winston Hsiao; Aaron Xie; Tomasz Młoduchowski; Viraj Tipnis; Benjamin Bolte

arXiv:2507.14049·cs.RO·July 21, 2025

TL;DR

EdgeVLA (EVLA) is a resource-efficient vision-language-action model for robotics that targets real-time inference on edge devices while keeping training performance comparable to larger models.

Contribution

The paper presents EVLA, which speeds up VLA inference by 7x by removing the autoregressive requirement for end-effector position prediction, and lowers computational demands by building on Small Language Models (SLMs) with comparable training performance.

Findings

01. 7x inference speedup on edge devices

02. Comparable training performance to larger models

03. Significant reduction in memory usage

Abstract

Vision-Language Models (VLMs) have emerged as a promising approach to address the data scarcity challenge in robotics, enabling the development of generalizable visuomotor control policies. While models like OpenVLA showcase the potential of this paradigm, deploying large-scale VLMs on resource-constrained mobile manipulation systems remains a significant hurdle. This paper introduces Edge VLA (EVLA), a novel approach designed to significantly enhance the inference speed of Vision-Language-Action (VLA) models. EVLA maintains the representational power of these models while enabling real-time performance on edge devices. We achieve this through two key innovations: 1) Eliminating the autoregressive requirement for end-effector position prediction, leading to a 7x speedup in inference, and 2) Leveraging the efficiency of Small Language Models (SLMs), demonstrating comparable training…
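The first innovation in the abstract, dropping autoregressive decoding for end-effector prediction, can be illustrated with a toy sketch. This is not the authors' implementation: `ToyPolicy`, the 7-dimensional action, and the pass counting are all assumptions made for illustration, and the abstract does not say whether the reported 7x speedup corresponds to this count. The sketch only shows why emitting every action dimension in one forward pass needs fewer backbone calls than decoding one dimension at a time.

```python
import random

# Assumption: a 7-dimensional end-effector action
# (x, y, z, roll, pitch, yaw, gripper). Not taken from the abstract.
ACTION_DIM = 7


class ToyPolicy:
    """Hypothetical stand-in for a VLA backbone.

    It returns random values and counts forward passes, so the only
    thing it models is how many times the backbone is invoked.
    """

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.forward_calls = 0

    def forward(self, context, n_outputs):
        """One (expensive) backbone pass that emits n_outputs values."""
        self.forward_calls += 1
        return [self.rng.uniform(-1.0, 1.0) for _ in range(n_outputs)]


def predict_autoregressive(policy, observation):
    """Decode one action dimension per pass, feeding each output back in."""
    context = list(observation)
    action = []
    for _ in range(ACTION_DIM):
        (value,) = policy.forward(context, n_outputs=1)
        action.append(value)
        context.append(value)  # the next step conditions on this output
    return action


def predict_joint(policy, observation):
    """Emit all action dimensions from a single pass."""
    return policy.forward(list(observation), n_outputs=ACTION_DIM)


obs = [0.0] * 4  # placeholder observation features
ar_policy, joint_policy = ToyPolicy(), ToyPolicy()
ar_action = predict_autoregressive(ar_policy, obs)
joint_action = predict_joint(joint_policy, obs)
print(ar_policy.forward_calls, joint_policy.forward_calls)  # prints "7 1"
```

Under these assumptions the joint variant calls the backbone once per action instead of once per action dimension. Real speedups depend on the model, hardware, and caching, none of which this toy captures.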

Taxonomy

Topics: Robot Manipulation and Learning · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics