RoBridge: A Hierarchical Architecture Bridging Cognition and Execution for General Robotic Manipulation

Kaidong Zhang; Rongtao Xu; Pengzhen Ren; Junfan Lin; Hefeng Wu; Liang Lin; Xiaodan Liang

arXiv:2505.01709 · cs.RO · July 24, 2025

TL;DR

RoBridge is a hierarchical architecture that combines cognitive reasoning with execution, integrating vision-language models and reinforcement learning to improve general robotic manipulation in open-ended scenarios.

Contribution

The paper presents RoBridge, a hierarchical system that bridges cognition and execution in robots, leveraging large-scale pre-trained models and reinforcement learning for enhanced generalization.

Findings

01. Achieves a 75% success rate on new tasks
02. Achieves an 83% success rate in sim-to-real transfer with minimal data
03. Demonstrates effective integration of cognition and manipulation

Abstract

Operating robots in open-ended scenarios with diverse tasks is a crucial research and application direction in robotics. While recent progress in natural language processing and large multimodal models has enhanced robots' ability to understand complex instructions, robot manipulation still faces the procedural skill dilemma and the declarative skill dilemma in open environments. Existing methods often compromise cognitive and executive capabilities. To address these challenges, we propose RoBridge, a hierarchical intelligent architecture for general robotic manipulation. It consists of a high-level cognitive planner (HCP) based on a large-scale pre-trained vision-language model (VLM), an invariant operable representation (IOR) serving as a symbolic bridge, and a generalist embodied agent (GEA). RoBridge maintains the declarative skill of the VLM and unleashes the procedural…
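The three-part structure named in the abstract (HCP, then IOR, then GEA) can be pictured as a planner that emits symbolic steps and an agent that executes them one at a time. The sketch below is only an illustration of that data flow, not the authors' implementation: the `IOR` fields, `toy_planner`, `toy_agent`, and `run_pipeline` are all hypothetical names, and the rule-based planner and always-succeeding agent merely stand in for the VLM and the learned policy.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class IOR:
    """Hypothetical stand-in for an invariant operable representation.

    The real IOR contents are not given in the abstract; these two
    fields are assumptions chosen only to make the example concrete.
    """
    primitive: str  # assumed: a symbolic action name such as "grasp"
    target: str     # assumed: the object the action applies to


Planner = Callable[[str], List[IOR]]  # plays the role of the HCP
Agent = Callable[[IOR], bool]         # plays the role of the GEA


def toy_planner(instruction: str) -> List[IOR]:
    """Rule-based placeholder for the VLM planner (assumption)."""
    target = instruction.split()[-1]  # naive: last word is the object
    return [IOR("reach", target), IOR("grasp", target), IOR("lift", target)]


def toy_agent(step: IOR) -> bool:
    """Placeholder for the embodied agent; succeeds on known primitives."""
    return step.primitive in {"reach", "grasp", "lift"}


def run_pipeline(
    instruction: str, planner: Planner, agent: Agent
) -> List[Tuple[str, str, bool]]:
    """Plan once, then execute each symbolic step until one fails."""
    log: List[Tuple[str, str, bool]] = []
    for step in planner(instruction):
        ok = agent(step)
        log.append((step.primitive, step.target, ok))
        if not ok:
            break  # stop at the first failed step
    return log


result = run_pipeline("pick up the cup", toy_planner, toy_agent)
```

The point of the intermediate `IOR` type in this sketch is that the planner and the agent share only a small symbolic interface, so either side could be swapped without touching the other.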

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Robot Manipulation and Learning