TL;DR
EgoNav lets a humanoid robot navigate unseen environments zero-shot by learning only from human walking data, combining a diffusion model, a 360° visual memory, and a hybrid sampling scheme for real-time inference.
Contribution
Introduces EgoNav, a novel system that learns humanoid navigation solely from human data without robot-specific training or fine-tuning.
Findings
Outperforms baselines in collision avoidance and multi-modal coverage in offline evaluations
Navigates unseen indoor and outdoor environments zero-shot on a Unitree G1 humanoid
Exhibits emergent behaviors such as waiting for doors to open and navigating through crowds
Abstract
We present EgoNav, a system that enables a humanoid robot to traverse diverse, unseen environments by learning entirely from 5 hours of human walking data, with no robot data or fine-tuning. A diffusion model predicts distributions of plausible future trajectories conditioned on past trajectory, a 360° visual memory fusing color, depth, and semantics, and video features from a frozen DINOv3 backbone that capture appearance cues invisible to depth sensors. A hybrid sampling scheme achieves real-time inference in 10 denoising steps, and a receding-horizon controller selects paths from the predicted distribution. We validate EgoNav through offline evaluations, where it outperforms baselines in collision avoidance and multi-modal coverage, and through zero-shot deployment on a Unitree G1 humanoid across unseen indoor and outdoor environments. Behaviors such as waiting for doors to open,…
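The abstract does not say how the receding-horizon controller chooses among sampled trajectories, so the following is only an illustrative sketch of the general idea, not EgoNav's method. It assumes trajectories are lists of 2D waypoints, obstacles are 2D points, and selection uses a clearance threshold followed by distance to a goal; the names `min_clearance`, `select_path`, `receding_horizon_step`, `safety_radius`, and `execute_n` are all hypothetical.

```python
import math

def min_clearance(traj, obstacles):
    """Smallest distance from any waypoint in traj to any obstacle point."""
    if not obstacles:
        return math.inf
    return min(math.dist(p, o) for p in traj for o in obstacles)

def select_path(candidates, obstacles, goal, safety_radius=0.5):
    """Assumed criterion: among candidates that keep at least safety_radius
    clearance, pick the one whose endpoint is closest to the goal.
    Returns None when every candidate is unsafe."""
    safe = [t for t in candidates if min_clearance(t, obstacles) >= safety_radius]
    if not safe:
        return None
    return min(safe, key=lambda t: math.dist(t[-1], goal))

def receding_horizon_step(candidates, obstacles, goal, execute_n=2):
    """Commit only to the first execute_n waypoints of the chosen path;
    the caller is expected to resample and replan afterwards."""
    best = select_path(candidates, obstacles, goal)
    if best is None:
        return []  # no safe path: execute nothing until the next replan
    return best[:execute_n]

# Three hand-written candidates standing in for samples from a trajectory model.
candidates = [
    [(0.5, 0.0), (1.0, 0.0), (1.5, 0.0)],     # straight, passes close to the obstacle
    [(0.5, 0.5), (1.0, 1.0), (1.5, 1.2)],     # veers left
    [(0.5, -0.5), (1.0, -1.0), (1.5, -1.2)],  # veers right
]
obstacles = [(1.0, 0.1)]
goal = (3.0, 1.0)

step = receding_horizon_step(candidates, obstacles, goal)
print(step)
```

Executing only a short prefix of the chosen path and then replanning is what makes the scheme receding-horizon: each new set of samples can react to changes in the scene.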
