Scalable, Training-Free Visual Language Robotics: A Modular Multi-Model Framework for Consumer-Grade GPUs
Marie Samson, Bastien Muraccioli, Fumio Kanehiro

TL;DR
This paper presents SVLR, a scalable, modular, training-free framework for vision-language robotic control that integrates lightweight open-source models to enable flexible, real-time task execution on consumer-grade GPUs.
Contribution
Introduces SVLR, a novel open-source framework that allows scalable, retraining-free robotic control using modular AI models for visual and language understanding.
Findings
Operates effectively on an NVIDIA RTX 2070 GPU
Successfully performs pick-and-place tasks
Supports easy integration of new tasks without retraining
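As a rough illustration of how new tasks might be added without retraining, the sketch below matches an instruction against a registry of task descriptions by text similarity. It is not the authors' implementation: the `TASKS` registry, the `match_task` helper, the threshold value, and the toy bag-of-words `embed` function are all hypothetical. In particular, `embed` is a stand-in for a real sentence-embedding model such as the all-MiniLM model named in the abstract.

```python
import math
import re
from collections import Counter

# Hypothetical task registry: task name -> short natural-language description.
TASKS = {
    "pick_and_place": "pick up an object and place it on another object",
}

def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a sentence-embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def match_task(instruction, tasks, threshold=0.2):
    """Return the best-matching task name, or None if nothing is similar enough."""
    query = embed(instruction)
    best_name, best_score = None, 0.0
    for name, description in tasks.items():
        score = cosine(query, embed(description))
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

# Registering a new task is a dictionary update; no model weights change.
TASKS["push"] = "push an object across the table"
```

In this sketch, `match_task("push the box across the table", TASKS)` selects the newly registered `push` entry, while an unrelated instruction falls below the threshold and returns `None`. A real system would replace `embed` with a learned sentence encoder so that paraphrases match as well.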
Abstract
The integration of language instructions with robotic control, particularly through Vision Language Action (VLA) models, has shown significant potential. However, these systems are often hindered by high computational costs, the need for extensive retraining, and limited scalability, making them less accessible for widespread use. In this paper, we introduce SVLR (Scalable Visual Language Robotics), an open-source, modular framework that operates without the need for retraining, providing a scalable solution for robotic control. SVLR leverages a combination of lightweight, open-source AI models including the Vision-Language Model (VLM) Mini-InternVL, zero-shot image segmentation model CLIPSeg, Large Language Model Phi-3, and sentence similarity model all-MiniLM to process visual and language inputs. These models work together to identify objects in an unknown environment, use them as…
Taxonomy
Topics: Robotic Path Planning Algorithms · Robotics and Automated Systems · Multimodal Machine Learning Applications
