VER: Scaling On-Policy RL Leads to the Emergence of Navigation in Embodied Rearrangement

Erik Wijmans; Irfan Essa; Dhruv Batra

arXiv:2210.05064 · cs.LG · October 12, 2022 · 6 cites


TL;DR

The paper introduces Variable Experience Rollout (VER), a novel RL scaling technique that combines synchronous and asynchronous methods, leading to significant speed-ups and enabling emergent navigation skills in embodied AI tasks.

Contribution

VER is a method for scaling on-policy RL efficiently across multiple GPUs, potentially on multiple machines, without synchronization points. It improves training speed and enables emergent navigation behaviors.

Findings

01

VER achieves a 1.6-2x speedup (60-100% faster) over DD-PPO on PointGoal and ObjectGoal navigation in Habitat 1.0.

02

VER is 2.5-2.7x faster than DD-PPO in mobile manipulation tasks.

03

Navigation skills emerge unexpectedly in non-navigation training scenarios.

Abstract

We present Variable Experience Rollout (VER), a technique for efficiently scaling batched on-policy reinforcement learning in heterogeneous environments (where different environments take vastly different times to generate rollouts) to many GPUs residing on, potentially, many machines. VER combines the strengths of and blurs the line between synchronous and asynchronous on-policy RL methods (SyncOnRL and AsyncOnRL, respectively). VER learns from on-policy experience (like SyncOnRL) and has no synchronization points (like AsyncOnRL). VER leads to significant and consistent speed-ups across a broad range of embodied navigation and mobile manipulation tasks in photorealistic 3D simulation environments. Specifically, for PointGoal navigation and ObjectGoal navigation in Habitat 1.0, VER is 60-100% faster (1.6-2x speedup) than DD-PPO, the current state of the art distributed SyncOnRL, with…
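The abstract's point about heterogeneous environments can be illustrated with a toy timing model. The sketch below is not the authors' implementation and not the habitat-lab API. It assumes environments step in parallel at fixed, hypothetical per-step times and ignores policy inference and learning cost. It compares a fixed-length rollout, where every environment must deliver the same number of steps, against a variable-length rollout that takes whichever steps are ready first until a total step budget is met. The function names `fixed_rollout_time` and `variable_rollout` are made up for illustration.

```python
import heapq


def fixed_rollout_time(step_times, steps_per_env):
    """Wall-clock time when every env must produce the same number of steps.

    The batch is only complete once the slowest env has finished.
    """
    return max(t * steps_per_env for t in step_times)


def variable_rollout(step_times, total_steps):
    """Collect total_steps across envs, taking each step as soon as it is ready.

    Returns (wall_clock_time, steps_contributed_by_each_env).
    """
    # Each heap entry is (time at which the env's next step completes, env index).
    heap = [(t, i) for i, t in enumerate(step_times)]
    heapq.heapify(heap)
    counts = [0] * len(step_times)
    now = 0.0
    for _ in range(total_steps):
        now, i = heapq.heappop(heap)
        counts[i] += 1
        # The env immediately starts its next step.
        heapq.heappush(heap, (now + step_times[i], i))
    return now, counts


# Hypothetical per-step times: three fast envs and one env that is 4x slower.
step_times = [1.0, 1.0, 1.0, 4.0]
steps_per_env = 8
total_steps = steps_per_env * len(step_times)  # same batch size in both schemes

print(fixed_rollout_time(step_times, steps_per_env))  # 32.0
print(variable_rollout(step_times, total_steps))      # (10.0, [10, 10, 10, 2])
```

In this toy setting both schemes gather 32 steps, but the fixed-length rollout waits for the slow environment while the variable-length one lets fast environments contribute more. The numbers come from the made-up step times above and say nothing about the speed-ups reported in the paper.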


Code & Models

Repositories

facebookresearch/habitat-lab
PyTorch · Official

Videos

VER: Scaling On-Policy RL Leads to the Emergence of Navigation in Embodied Rearrangement · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Human Pose and Action Recognition

Methods: Balanced Selection · Decentralized Distributed Proximal Policy Optimization · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings