ROSGPT_Vision: Commanding Robots Using Only Language Models' Prompts

Bilel Benjdira; Anis Koubaa; Anas M. Ali

arXiv:2308.11236 · cs.RO · February 18, 2025 · 1 citation

TL;DR

ROSGPT_Vision introduces a novel framework that enables robots to perform complex tasks using only language model prompts, integrating visual and task prompts for autonomous decision-making.

Contribution

This paper presents the PRM design pattern and implements ROSGPT_Vision, a framework that automates visual and task prompting for robot control using language models.

Findings

1. ROSGPT_Vision reduces development costs significantly.
2. The framework effectively processes visual data for real-world tasks.
3. Prompting strategies can be optimized to improve application quality.

Abstract

In this paper, we argue that the next generation of robots can be commanded using only Language Models' prompts. Each prompt separately interrogates a specific Robotic Modality via its Modality Language Model (MLM). A central Task Modality mediates the whole communication to execute the robotic mission via a Large Language Model (LLM). This paper names this new robotic design pattern Prompting Robotic Modalities (PRM). Moreover, this paper applies the PRM design pattern to build a new robotic framework named ROSGPT_Vision. ROSGPT_Vision allows the execution of a robotic task using only two prompts: a Visual Prompt and an LLM Prompt. The Visual Prompt extracts, in natural language, the visual semantic features related to the task under consideration (Visual Robotic Modality). Meanwhile, the LLM Prompt regulates the robotic reaction to the visual description (Task Modality).…
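
The two-prompt flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the ROSGPT_Vision implementation: the names `PRMPipeline`, `stub_mlm`, `stub_llm`, and `MARKER`, the example prompts, and the STOP/FORWARD action vocabulary are all hypothetical, and the two stub functions stand in for a real Modality Language Model and a real LLM.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interfaces, not the ROSGPT_Vision API:
# an MLM maps (visual prompt, image bytes) to a text description,
# and an LLM maps a text prompt to a text reply.
ModalityModel = Callable[[str, bytes], str]
TaskModel = Callable[[str], str]

# Assumed separator between the LLM Prompt and the visual description.
MARKER = "Visual description:"


@dataclass
class PRMPipeline:
    visual_prompt: str   # interrogates the Visual Robotic Modality
    llm_prompt: str      # regulates the reaction (Task Modality)
    mlm: ModalityModel
    llm: TaskModel

    def step(self, image: bytes) -> str:
        # 1. The Visual Prompt extracts task-related features as natural language.
        description = self.mlm(self.visual_prompt, image)
        # 2. The LLM Prompt plus that description decides the robotic reaction.
        return self.llm(f"{self.llm_prompt}\n{MARKER} {description}")


def stub_mlm(visual_prompt: str, image: bytes) -> str:
    # Stand-in for a vision-language model: any non-empty image "shows" an obstacle.
    return "An obstacle is directly ahead." if image else "The path is clear."


def stub_llm(prompt: str) -> str:
    # Stand-in for an LLM: inspects only the text after the marker.
    description = prompt.split(MARKER, 1)[-1]
    return "STOP" if "obstacle" in description.lower() else "FORWARD"


pipeline = PRMPipeline(
    visual_prompt="Describe anything blocking the robot's path.",
    llm_prompt="Reply STOP if the path is blocked, otherwise FORWARD.",
    mlm=stub_mlm,
    llm=stub_llm,
)

print(pipeline.step(b"\x01"))  # prints STOP
print(pipeline.step(b""))      # prints FORWARD
```

In this sketch, changing the robot's behavior means editing only the two prompt strings, which is the point the abstract makes about commanding a robot through prompts alone.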

Code & Models

Repositories

bilel-bj/rosgpt_vision
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Robotics and Automated Systems · Advanced Image and Video Retrieval Techniques