EgoPush: Learning End-to-End Egocentric Multi-Object Rearrangement for Mobile Robots
Boyuan An, Zhexiong Wang, Yipeng Wang, Jiaqi Li, Sihang Li, Jing Zhang, Chen Feng

TL;DR
EgoPush is a perception-driven reinforcement learning framework that enables mobile robots to perform multi-object rearrangement in cluttered environments using egocentric vision, without relying on global state estimation.
Contribution
We propose EgoPush, which combines a novel object-centric latent space with active perception for egocentric, end-to-end learning of multi-object rearrangement on mobile robots.
Findings
EgoPush outperforms RL baselines in success rate in simulation.
EgoPush achieves zero-shot sim-to-real transfer on a mobile robot.
Design choices in EgoPush are validated through ablation studies.
Abstract
Humans can rearrange objects in cluttered environments using egocentric perception, navigating occlusions without global coordinates. Inspired by this capability, we study long-horizon multi-object non-prehensile rearrangement for mobile robots using a single egocentric camera. We introduce EgoPush, a policy learning framework that enables egocentric, perception-driven rearrangement without relying on explicit global state estimation that often fails in dynamic scenes. EgoPush designs an object-centric latent space to encode relative spatial relations among objects, rather than absolute poses. This design enables a privileged reinforcement-learning (RL) teacher to jointly learn latent states and mobile actions from sparse keypoints, which is then distilled into a purely visual student policy. To reduce the supervision gap between the omniscient teacher and the partially observed…
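To make the idea of encoding relative spatial relations rather than absolute poses concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the (x, y, yaw) pose layout, and the 2D keypoint format are all assumptions made for illustration. It only shows that keypoints expressed in the robot's own frame do not change when the whole scene is shifted and rotated in world coordinates, which is the property that makes a global reference frame unnecessary.

```python
import math


def to_robot_frame(robot_pose, points):
    """Express world-frame 2D points in the robot's egocentric frame.

    robot_pose: (x, y, yaw) in the world frame (hypothetical layout).
    points: list of (x, y) world-frame keypoints (hypothetical format).
    """
    rx, ry, yaw = robot_pose
    c, s = math.cos(yaw), math.sin(yaw)
    relative = []
    for px, py in points:
        dx, dy = px - rx, py - ry
        # Rotate the offset by -yaw so the robot's heading becomes the +x axis.
        relative.append((c * dx + s * dy, -s * dx + c * dy))
    return relative


def rigid_transform(points, tx, ty, theta):
    """Rotate 2D points by theta about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


# Illustrative scene: one robot and two object keypoints in world coordinates.
robot = (1.0, 2.0, 0.5)
keypoints = [(2.0, 2.0), (0.0, 3.5)]
rel_before = to_robot_frame(robot, keypoints)

# Move the entire scene (robot and objects together) by an arbitrary rigid motion.
tx, ty, theta = -4.0, 7.0, 1.2
moved_xy = rigid_transform([(robot[0], robot[1])], tx, ty, theta)[0]
moved_robot = (moved_xy[0], moved_xy[1], robot[2] + theta)
moved_keypoints = rigid_transform(keypoints, tx, ty, theta)
rel_after = to_robot_frame(moved_robot, moved_keypoints)

# The egocentric coordinates agree up to floating-point error.
max_diff = max(
    abs(a - b)
    for p, q in zip(rel_before, rel_after)
    for a, b in zip(p, q)
)
```

In this sketch `max_diff` stays at floating-point noise level, so a policy that consumes only such relative quantities would see the same input regardless of where the scene sits in a global map.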
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Social Robot Interaction and HRI
