Learning Humanoid End-Effector Control for Open-Vocabulary Visual Loco-Manipulation
Runpei Dong, Ziyan Li, Xialin He, Saurabh Gupta

TL;DR
This paper introduces HERO, a humanoid robot control system that combines large vision models with a residual-aware end-effector tracking policy, enabling accurate, open-vocabulary loco-manipulation in diverse real-world environments.
Contribution
HERO integrates large vision models with a residual-aware control policy, improving end-effector tracking accuracy and generalization for humanoid loco-manipulation tasks.
Findings
3.2x reduction in end-effector tracking error
Effective manipulation of diverse objects in real-world settings
Robust performance across various environments
Abstract
Visual loco-manipulation of arbitrary objects in the wild with humanoid robots requires accurate end-effector (EE) control and a generalizable understanding of the scene via visual inputs (e.g., RGB-D images). Existing approaches are based on real-world imitation learning and exhibit limited generalization due to the difficulty in collecting large-scale training datasets. This paper presents a new paradigm, HERO, for object loco-manipulation with humanoid robots that combines the strong generalization and open-vocabulary understanding of large vision models with strong control performance from simulated training. We achieve this by designing an accurate residual-aware EE tracking policy. This EE tracking policy combines classical robotics with machine learning. It uses a) inverse kinematics to convert residual end-effector targets into reference trajectories, b) a learned neural forward…
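Step a) of the abstract uses inverse kinematics to turn residual end-effector targets into reference trajectories. The sketch below illustrates that idea only. The abstract does not specify the solver, robot model, or trajectory parameterization, so all of the following are assumptions: a hypothetical planar 2-link arm, damped least-squares IK standing in for whatever solver HERO actually uses, and linear Cartesian interpolation from the current end-effector position to that position plus the residual. The function names (forward_kinematics, jacobian, ik_step, residual_target_to_reference) are illustrative and do not come from the paper.

```python
import numpy as np

# Hypothetical planar 2-link arm (an assumption, not HERO's humanoid model);
# link lengths in metres are illustrative only.
LINK1, LINK2 = 0.30, 0.25


def forward_kinematics(q: np.ndarray) -> np.ndarray:
    """End-effector (x, y) position for joint angles q = [q1, q2]."""
    return np.array([
        LINK1 * np.cos(q[0]) + LINK2 * np.cos(q[0] + q[1]),
        LINK1 * np.sin(q[0]) + LINK2 * np.sin(q[0] + q[1]),
    ])


def jacobian(q: np.ndarray) -> np.ndarray:
    """Analytic 2x2 positional Jacobian d(x, y)/d(q1, q2)."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([
        [-LINK1 * s1 - LINK2 * s12, -LINK2 * s12],
        [LINK1 * c1 + LINK2 * c12, LINK2 * c12],
    ])


def ik_step(q: np.ndarray, target: np.ndarray, damping: float = 1e-2) -> np.ndarray:
    """One damped least-squares IK update toward a Cartesian target."""
    err = target - forward_kinematics(q)
    J = jacobian(q)
    dq = J.T @ np.linalg.solve(J @ J.T + damping**2 * np.eye(2), err)
    return q + dq


def residual_target_to_reference(q0, ee_residual, n_waypoints=20, iters=10):
    """Convert a residual EE target into a joint-space reference trajectory.

    The goal is the current EE position plus the residual. The Cartesian
    segment is linearly interpolated, and IK is solved at each waypoint,
    warm-started from the previous joint solution.
    """
    q = np.asarray(q0, dtype=float)
    start = forward_kinematics(q)
    goal = start + np.asarray(ee_residual, dtype=float)
    reference = []
    for alpha in np.linspace(0.0, 1.0, n_waypoints + 1)[1:]:
        waypoint = (1.0 - alpha) * start + alpha * goal
        for _ in range(iters):
            q = ik_step(q, waypoint)
        reference.append(q.copy())
    return np.stack(reference)


if __name__ == "__main__":
    q0 = np.array([0.3, 0.8])
    residual = np.array([0.05, -0.03])  # desired EE displacement (m)
    ref = residual_target_to_reference(q0, residual)
    print(ref.shape)  # (20, 2): one joint-space waypoint per row
    print(forward_kinematics(ref[-1]) - forward_kinematics(q0))
```

Warm-starting each IK solve from the previous waypoint keeps consecutive joint references close together, which is one practical reason to emit a trajectory rather than a single joint-space goal. The learned components mentioned in the abstract are not modeled here.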
Taxonomy
Topics: Robot Manipulation and Learning · Social Robot Interaction and HRI · Reinforcement Learning in Robotics
