ROSClaw: A Hierarchical Semantic-Physical Framework for Heterogeneous Multi-Agent Collaboration

Rongfeng Zhao; Xuanhao Zhang; Zhaochen Guo; Xiang Shao; Zhongpan Zhu; Bin He; Jie Chen

arXiv:2604.04664 · cs.RO · April 7, 2026

TL;DR

ROSClaw is a unified framework that integrates semantic reasoning and physical control for heterogeneous multi-robot systems, enabling efficient policy learning, real-time state access, and robust task execution across platforms.

Contribution

It introduces a hierarchical, vision-language-based agent framework that unifies reasoning and control, facilitating rapid cross-platform deployment and continual skill improvement.

Findings

1. Supports real-time access to physical states of robots.
2. Enables iterative policy optimization during real-world execution.
3. Facilitates rapid transfer and validation across hardware platforms.

Abstract

The integration of large language models (LLMs) with embodied agents has improved high-level reasoning capabilities; however, a critical gap remains between semantic understanding and physical execution. While vision-language-action (VLA) and vision-language-navigation (VLN) systems enable robots to perform manipulation and navigation tasks from natural language instructions, they still struggle with long-horizon sequential and temporally structured tasks. Existing frameworks typically adopt modular pipelines for data collection, skill training, and policy deployment, resulting in high costs in experimental validation and policy optimization. To address these limitations, we propose ROSClaw, an agent framework for heterogeneous robots that integrates policy learning and task execution within a unified vision-language model (VLM) controller. The framework leverages e-URDF representations…
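The abstract describes a controller that reasons about a task at the semantic level and then drives heterogeneous robots at the physical level. The following is a minimal sketch of that generic two-layer planner/executor pattern. It assumes a semantic layer that decomposes an instruction into per-robot subtasks and a physical layer that maps each subtask to a registered skill. It is not ROSClaw's implementation. `Subtask`, `PhysicalLayer`, `semantic_plan`, `run`, and the robot and skill names are all hypothetical, and `semantic_plan` is a hard-coded stand-in for a VLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical skill signature: takes arguments, returns True on success.
SkillFn = Callable[[Dict[str, float]], bool]


@dataclass
class Subtask:
    robot: str  # which platform should run this step (hypothetical names)
    skill: str  # name of a low-level skill or learned policy
    args: Dict[str, float] = field(default_factory=dict)


class PhysicalLayer:
    """Hypothetical physical layer: per-robot registry of skill handlers."""

    def __init__(self) -> None:
        self._skills: Dict[str, Dict[str, SkillFn]] = {}

    def register(self, robot: str, skill: str, fn: SkillFn) -> None:
        self._skills.setdefault(robot, {})[skill] = fn

    def execute(self, task: Subtask) -> bool:
        try:
            fn = self._skills[task.robot][task.skill]
        except KeyError:
            return False  # unknown robot or skill: report failure upward
        return fn(task.args)


def semantic_plan(instruction: str) -> List[Subtask]:
    """Stand-in for a VLM planner: a fixed lookup instead of a model call."""
    if instruction == "fetch the cup":
        return [
            Subtask("quadruped", "navigate", {"x": 2.0, "y": 1.0}),
            Subtask("arm", "grasp", {"width": 0.06}),
        ]
    return []


def run(instruction: str, layer: PhysicalLayer) -> List[bool]:
    # Execute subtasks in order; stop at the first failure so a planner could replan.
    results: List[bool] = []
    for task in semantic_plan(instruction):
        ok = layer.execute(task)
        results.append(ok)
        if not ok:
            break
    return results


if __name__ == "__main__":
    layer = PhysicalLayer()
    layer.register("quadruped", "navigate", lambda a: True)
    layer.register("arm", "grasp", lambda a: a["width"] < 0.1)
    print(run("fetch the cup", layer))  # [True, True]
```

Returning per-step results and stopping at the first failure is one simple way to give the semantic layer a signal for replanning. How ROSClaw actually feeds execution outcomes back into reasoning or policy optimization is not described in this excerpt.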

Code & Models

Repositories

https://www.rosclaw.io
github