TL;DR
ROSClaw is a unified framework that integrates semantic reasoning and physical control for heterogeneous multi-robot systems, enabling efficient policy learning, real-time state access, and robust task execution across platforms.
Contribution
ROSClaw introduces a hierarchical agent framework, built on a vision-language model, that unifies reasoning and control to support rapid cross-platform deployment and continual skill improvement.
Findings
Supports real-time access to physical states of robots.
Enables iterative policy optimization during real-world execution.
Facilitates rapid transfer and validation across hardware platforms.
Abstract
The integration of large language models (LLMs) with embodied agents has improved high-level reasoning capabilities; however, a critical gap remains between semantic understanding and physical execution. While vision-language-action (VLA) and vision-language-navigation (VLN) systems enable robots to perform manipulation and navigation tasks from natural language instructions, they still struggle with long-horizon sequential and temporally structured tasks. Existing frameworks typically adopt modular pipelines for data collection, skill training, and policy deployment, resulting in high costs in experimental validation and policy optimization. To address these limitations, we propose ROSClaw, an agent framework for heterogeneous robots that integrates policy learning and task execution within a unified vision-language model (VLM) controller. The framework leverages e-URDF representations…
