Learning to Manipulate Anywhere: A Visual Generalizable Framework For Reinforcement Learning

Zhecheng Yuan; Tianming Wei; Shuiqi Cheng; Gu Zhang; Yuanpei Chen; Huazhe Xu

arXiv:2407.15815·cs.RO·October 24, 2024

TL;DR

This paper introduces Maniwhere, a visual reinforcement learning framework that enhances robot generalization across diverse visual disturbances using multi-view learning, spatial transformers, and curriculum-based augmentation, demonstrating superior sim2real transfer.

Contribution

The paper presents Maniwhere, a novel framework combining multi-view representation learning with spatial transformers and curriculum augmentation to improve visual generalization in reinforcement learning for robots.

Findings

1. Maniwhere outperforms existing methods in diverse manipulation tasks.

2. The framework achieves strong sim2real transfer across multiple hardware platforms.

3. It effectively generalizes across various visual disturbances and viewpoints.

Abstract

Can we endow visuomotor robots with generalization capabilities to operate in diverse open-world scenarios? In this paper, we propose Maniwhere, a generalizable framework tailored for visual reinforcement learning, enabling the trained robot policies to generalize across a combination of multiple visual disturbance types. Specifically, we introduce a multi-view representation learning approach fused with a Spatial Transformer Network (STN) module to capture shared semantic information and correspondences among different viewpoints. In addition, we employ a curriculum-based randomization and augmentation approach to stabilize the RL training process and strengthen the visual generalization ability. To exhibit the effectiveness of Maniwhere, we meticulously design 8 tasks encompassing articulated objects, bi-manual, and dexterous hand manipulation tasks, demonstrating Maniwhere's…
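The abstract does not spell out how the curriculum-based randomization is scheduled, so the following is only a minimal sketch of the general idea, not the authors' implementation: disturbances stay off during an initial warmup so the policy can first learn the task, then their magnitude ramps up. The function names, the linear ramp, the step counts, and the disturbance ranges are all assumptions chosen for illustration.

```python
import random


def curriculum_strength(step, warmup_steps=1000, ramp_steps=4000, max_strength=1.0):
    """Hypothetical schedule: 0 during warmup, then a linear ramp to max_strength."""
    if step < warmup_steps:
        return 0.0
    progress = (step - warmup_steps) / ramp_steps
    return max_strength * min(1.0, progress)


def sample_disturbance(strength, rng):
    """Sample visual disturbance parameters whose ranges scale with strength.

    The parameter names and ranges are illustrative assumptions,
    not values taken from the paper.
    """
    return {
        "camera_yaw_deg": strength * rng.uniform(-30.0, 30.0),
        "brightness_scale": 1.0 + strength * rng.uniform(-0.5, 0.5),
        "color_shift": [strength * rng.uniform(-0.2, 0.2) for _ in range(3)],
    }


rng = random.Random(0)
early = sample_disturbance(curriculum_strength(0), rng)     # no disturbance yet
late = sample_disturbance(curriculum_strength(5000), rng)   # full-range disturbance
```

In a training loop, the sampled parameters would be applied to the simulator camera and rendered image before each episode; starting at zero strength is one plausible way to obtain the training stability the abstract refers to.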

Code & Models

Models

🤗 gemcollector/maniwhere (model)

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications

Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Label Smoothing · Linear Layer · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Multi-Head Attention · Dense Connections