DD-PPO: Learning Near-Perfect PointGoal Navigators from 2.5 Billion Frames
Erik Wijmans, Abhishek Kadian, Ari Morcos, Stefan Lee, Irfan Essa, Devi Parikh, Manolis Savva, Dhruv Batra

TL;DR
This paper introduces DD-PPO, a scalable distributed reinforcement learning method, and uses it to train a near-perfect PointGoal navigation agent on 2.5 billion frames, setting the state of the art on the Habitat Autonomous Navigation Challenge 2019.
Contribution
The paper presents DD-PPO, a simple, scalable, and decentralized distributed RL algorithm enabling massive-scale training for embodied AI navigation tasks.
Findings
Achieved a 107x speedup on 128 GPUs over a serial implementation.
Trained an agent for 2.5 billion steps of experience, equivalent to 80 years of human experience.
Set a new state of the art on the Habitat Autonomous Navigation Challenge 2019.
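The headline numbers above can be sanity-checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope check, not anything from the paper: the function names are hypothetical, and the length of a year (365.25 days) is an assumption made here.

```python
# Back-of-the-envelope checks on the reported numbers (values taken from the text).

SPEEDUP = 107.0       # reported speedup over a serial implementation
NUM_GPUS = 128        # GPUs used to obtain that speedup
TOTAL_STEPS = 2.5e9   # steps of experience collected
HUMAN_YEARS = 80      # stated human-experience equivalent

# Assumption made for this sketch: a year of 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def scaling_efficiency(speedup: float, workers: int) -> float:
    """Fraction of ideal linear speedup that was actually achieved."""
    return speedup / workers


def implied_steps_per_second(steps: float, years: float) -> float:
    """Step rate at which `steps` would fill `years` of continuous experience."""
    return steps / (years * SECONDS_PER_YEAR)


efficiency = scaling_efficiency(SPEEDUP, NUM_GPUS)
rate = implied_steps_per_second(TOTAL_STEPS, HUMAN_YEARS)

print(f"scaling efficiency: {efficiency:.1%}")
print(f"implied step rate: {rate:.2f} steps/s")
```

Under these assumptions the speedup corresponds to roughly 84% of ideal linear scaling, and the 80-year figure implies a rate of about one step per second of continuous experience.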
Abstract
We present Decentralized Distributed Proximal Policy Optimization (DD-PPO), a method for distributed reinforcement learning in resource-intensive simulated environments. DD-PPO is distributed (uses multiple machines), decentralized (lacks a centralized server), and synchronous (no computation is ever stale), making it conceptually simple and easy to implement. In our experiments on training virtual robots to navigate in Habitat-Sim, DD-PPO exhibits near-linear scaling -- achieving a speedup of 107x on 128 GPUs over a serial implementation. We leverage this scaling to train an agent for 2.5 billion steps of experience (the equivalent of 80 years of human experience) -- over 6 months of GPU-time training in under 3 days of wall-clock time with 64 GPUs. This massive-scale training not only sets the state of the art on the Habitat Autonomous Navigation Challenge 2019, but essentially solves the…
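The three properties named in the abstract (distributed, decentralized, synchronous) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes that workers synchronize by averaging gradients, it replaces the PPO objective with a one-parameter regression loss, and it simulates the workers in a single process. `all_reduce_mean`, `local_gradient`, and `train` are hypothetical names introduced for illustration.

```python
import random


def local_gradient(w, batch):
    """Gradient of the mean of 0.5 * (w * x - y)**2 over one worker's batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)


def all_reduce_mean(values):
    """Stand-in for a collective all-reduce: every worker gets the same mean.

    No parameter server is involved; each worker contributes a value and
    each worker receives the identical result.
    """
    mean = sum(values) / len(values)
    return [mean for _ in values]


def train(num_workers=4, steps=200, lr=1.0, seed=0):
    rng = random.Random(seed)
    true_w = 3.0
    # Each worker keeps its own replica of the parameter (toy: one scalar).
    weights = [0.0] * num_workers
    for _ in range(steps):
        grads = []
        for rank in range(num_workers):
            # Each worker gathers its own experience (toy: noiseless samples).
            xs = [rng.uniform(-1.0, 1.0) for _ in range(8)]
            batch = [(x, true_w * x) for x in xs]
            grads.append(local_gradient(weights[rank], batch))
        # Synchronous step: nobody updates until all gradients are averaged,
        # so no replica ever trains on stale parameters.
        synced = all_reduce_mean(grads)
        weights = [w - lr * g for w, g in zip(weights, synced)]
    return weights


if __name__ == "__main__":
    print(train())
```

Because every replica starts from the same value and applies the same averaged gradient, the replicas stay identical without any central copy of the parameters. In a real multi-machine setup the in-process loop would be replaced by one process per GPU and a collective communication call.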
Taxonomy
Topics: Advanced Neural Network Applications · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications
Methods: Decentralized Distributed Proximal Policy Optimization
