Lite VLA: Efficient Vision-Language-Action Control on CPU-Bound Edge Robots
Justin Williams, Kishor Datta Gupta, Roy George, and Mrinmoy Sarkar

TL;DR
This paper presents Lite VLA, a compact vision-language model system that performs real-time scene understanding and reasoning on resource-constrained mobile robots without relying on cloud connectivity, supporting autonomous operation in GPS-denied environments.
Contribution
The work introduces a novel integrated framework for deploying small VLMs on embedded hardware, allowing simultaneous perception, reasoning, and mobility in dynamic environments.
Findings
Successfully deployed small VLMs on mobile robots for real-time reasoning
Achieved a balance between computational efficiency and task accuracy
Demonstrated system responsiveness in dynamic scenarios
Abstract
The deployment of artificial intelligence models at the edge is increasingly critical for autonomous robots operating in GPS-denied environments where local, resource-efficient reasoning is essential. This work demonstrates the feasibility of deploying small Vision-Language Models (VLMs) on mobile robots to achieve real-time scene understanding and reasoning under strict computational constraints. Unlike prior approaches that separate perception from mobility, the proposed framework enables simultaneous movement and reasoning in dynamic environments using only on-board hardware. The system integrates a compact VLM with multimodal perception to perform contextual interpretation directly on embedded hardware, eliminating reliance on cloud connectivity. Experimental validation highlights the balance between computational efficiency, task accuracy, and system responsiveness. Implementation…
Taxonomy
Topics: Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization · Advanced Neural Network Applications
