SLIM: Sim-to-Real Legged Instructive Manipulation via Long-Horizon Visuomotor Learning

Haichao Zhang; Haonan Yu; Le Zhao; Andrew Choi; Qinxun Bai; Break Yang; Wei Xu

arXiv:2501.09905·cs.RO·January 31, 2025

TL;DR

This paper introduces SLIM, a low-cost, hierarchical visuomotor system trained in simulation that effectively performs long-horizon manipulation tasks in the real world with high success rates, using accessible hardware and techniques to bridge the sim-to-real gap.

Contribution

The paper presents a novel hierarchical reinforcement learning approach with a teacher-student training pipeline and sim-to-real transfer techniques for low-cost legged robots performing complex tasks.

Findings

01. Achieves a nearly 80% success rate on real-world long-horizon tasks.

02. Operates at 1.5x the speed of expert teleoperation.

03. Demonstrates effective deployment in diverse environments.

Abstract

We present a low-cost legged mobile manipulation system that solves long-horizon real-world tasks, trained by reinforcement learning purely in simulation. This system is made possible by 1) a hierarchical design of a high-level policy for visual-mobile manipulation following task instructions, and a low-level quadruped locomotion policy, 2) a teacher and student training pipeline for the high level, which trains a teacher to tackle long-horizon tasks using privileged task decomposition and target object information, and further trains a student for visual-mobile manipulation via RL guided by the teacher's behavior, and 3) a suite of techniques for minimizing the sim-to-real gap. In contrast to many previous works that use high-end equipment, our system demonstrates effective performance with more accessible hardware -- specifically, a Unitree Go1 quadruped, a WidowX-250S arm, and a…
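
The hierarchical design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `HighLevelCommand` layout, the `run_hierarchy` helper, the 4:1 step ratio, and both toy policies are assumptions made only to show how a high-level policy might hand velocity commands to a low-level locomotion policy.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class HighLevelCommand:
    # Hypothetical command layout; the paper's actual action space may differ.
    base_velocity: Tuple[float, float, float]  # (vx, vy, yaw_rate) for the quadruped
    arm_targets: Tuple[float, ...]             # arm joint targets


def run_hierarchy(
    high_level: Callable[[dict], HighLevelCommand],
    low_level: Callable[[Tuple[float, float, float]], List[float]],
    observations: Sequence[dict],
    low_steps_per_high_step: int = 4,  # assumed ratio, chosen only for illustration
) -> List[List[float]]:
    """Query the high-level policy once per observation, then let the
    low-level locomotion policy track its velocity command for several steps."""
    leg_actions = []
    for obs in observations:
        command = high_level(obs)
        for _ in range(low_steps_per_high_step):
            leg_actions.append(low_level(command.base_velocity))
    return leg_actions


def toy_high_level(obs: dict) -> HighLevelCommand:
    # Stand-in for the visuomotor policy: walk forward only when instructed to.
    vx = 0.3 if obs["instruction"] == "go" else 0.0
    return HighLevelCommand(base_velocity=(vx, 0.0, 0.0), arm_targets=(0.0,) * 6)


def toy_low_level(base_velocity: Tuple[float, float, float]) -> List[float]:
    # Stand-in for the locomotion policy: 12 leg-joint targets scaled by vx.
    return [base_velocity[0]] * 12


actions = run_hierarchy(
    toy_high_level,
    toy_low_level,
    [{"instruction": "go"}, {"instruction": "stop"}],
)
```

In this sketch the two levels communicate only through the velocity command, so the low-level policy could in principle be trained or replaced separately from the high-level one. How SLIM actually splits the interface is described in the paper itself.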

Taxonomy

Topics: Robot Manipulation and Learning · Soft Robotics and Applications · Advanced Vision and Imaging

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings