SLIM: Sim-to-Real Legged Instructive Manipulation via Long-Horizon Visuomotor Learning
Haichao Zhang, Haonan Yu, Le Zhao, Andrew Choi, Qinxun Bai, Break Yang, Wei Xu

TL;DR
This paper introduces SLIM, a low-cost, hierarchical visuomotor system trained in simulation that effectively performs long-horizon manipulation tasks in the real world with high success rates, using accessible hardware and techniques to bridge the sim-to-real gap.
Contribution
The paper presents a novel hierarchical reinforcement learning approach with a teacher-student training pipeline and sim-to-real transfer techniques for low-cost legged robots performing complex tasks.
Findings
Achieves nearly 80% success rate in real-world long-horizon tasks.
Operates at 1.5x the speed of expert teleoperation.
Demonstrates effective deployment in diverse environments.
Abstract
We present a low-cost legged mobile manipulation system that solves long-horizon real-world tasks, trained by reinforcement learning purely in simulation. This system is made possible by 1) a hierarchical design of a high-level policy for visual-mobile manipulation following task instructions, and a low-level quadruped locomotion policy, 2) a teacher and student training pipeline for the high level, which trains a teacher to tackle long-horizon tasks using privileged task decomposition and target object information, and further trains a student for visual-mobile manipulation via RL guided by the teacher's behavior, and 3) a suite of techniques for minimizing the sim-to-real gap. In contrast to many previous works that use high-end equipment, our system demonstrates effective performance with more accessible hardware -- specifically, a Unitree Go1 quadruped, a WidowX-250S arm, and a…
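The hierarchical design described in the abstract can be pictured with a small control-loop sketch. This is only an illustration under stated assumptions, not the authors' implementation: the names `HighLevelCommand`, `run_hierarchy`, and `steps_per_command`, the fields of the command, and the idea that the low-level policy runs several steps per high-level decision are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class HighLevelCommand:
    # Hypothetical command layout; the paper's actual interface may differ.
    base_velocity: Sequence[float]  # assumed (vx, vy, yaw_rate) for the quadruped base
    arm_target: Sequence[float]     # assumed joint targets for the arm


def run_hierarchy(
    high_level: Callable[[object], HighLevelCommand],
    low_level: Callable[[Sequence[float]], Sequence[float]],
    observations: Sequence[object],
    steps_per_command: int = 4,  # assumed rate ratio, chosen only for illustration
) -> List[Sequence[float]]:
    """Query the high-level policy once per observation, then let the
    low-level locomotion policy track its base command for several steps."""
    leg_actions: List[Sequence[float]] = []
    for obs in observations:
        command = high_level(obs)
        for _ in range(steps_per_command):
            leg_actions.append(low_level(command.base_velocity))
    return leg_actions
```

With stand-in callables for the two policies, `run_hierarchy` returns one low-level action per inner step, which makes the separation between instruction-following decisions and locomotion tracking easy to see.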
Taxonomy
Topics: Robot Manipulation and Learning · Soft Robotics and Applications · Advanced Vision and Imaging
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
