Learning Object-conditioned Exploration using Distributed Soft Actor Critic
Ayzaan Wahid, Austin Stone, Kevin Chen, Brian Ichter, Alexander Toshev

TL;DR
This paper introduces a scalable reinforcement learning approach for object-guided exploration and low-level control in robotic navigation, reaching a success rate of 0.68 and an SPL of 0.58 on unseen, visually complex scans of real homes.
Contribution
It presents an end-to-end trained policy for object-conditioned exploration, together with a highly scalable implementation of distributed Soft Actor Critic, an off-policy reinforcement learning algorithm.
Findings
Success rate of 0.68 on unseen, visually complex scans of real homes
SPL of 0.58 on the same unseen environments
98M experience steps used in 24 hours on 8 GPUs (about 1,100 steps per second on average)
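SPL is not defined in this summary. The sketch below assumes the commonly used definition, Success weighted by Path Length: each episode contributes its success flag times the ratio of the shortest-path length to the larger of the shortest-path length and the length of the path actually taken. The function name `spl` and its argument names are illustrative, and the paper's exact evaluation protocol may differ.

```python
from typing import Sequence


def spl(
    successes: Sequence[bool],
    shortest_lengths: Sequence[float],
    path_lengths: Sequence[float],
) -> float:
    """Success weighted by Path Length, averaged over episodes.

    Assumes the common definition:
        SPL = (1/N) * sum_i S_i * l_i / max(p_i, l_i)
    where S_i is the success flag, l_i the shortest-path length and
    p_i the length of the path the agent actually took.
    """
    n = len(successes)
    if n == 0:
        raise ValueError("need at least one episode")
    if not (n == len(shortest_lengths) == len(path_lengths)):
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for success, shortest, taken in zip(successes, shortest_lengths, path_lengths):
        denom = max(taken, shortest)
        if success and denom > 0:
            total += shortest / denom
    return total / n


# Three hypothetical episodes: an optimal success, a success that took
# twice the shortest path, and a failure.
example = spl([True, True, False], [10.0, 10.0, 10.0], [10.0, 20.0, 15.0])
```

Under this definition SPL can never exceed the success rate, which is consistent with the reported 0.58 against a success rate of 0.68.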
Abstract
Object navigation is defined as navigating to an object of a given label in a complex, unexplored environment. In its general form, this problem poses several challenges for Robotics: semantic exploration of unknown environments in search of an object and low-level control. In this work we study object-guided exploration and low-level control, and present an end-to-end trained navigation policy achieving a success rate of 0.68 and SPL of 0.58 on unseen, visually complex scans of real homes. We propose a highly scalable implementation of an off-policy Reinforcement Learning algorithm, distributed Soft Actor Critic, which allows the system to utilize 98M experience steps in 24 hours on 8 GPUs. Our system learns to control a differential drive mobile base in simulation from a stack of high dimensional observations commonly used on robotic platforms. The learned policy is capable of…
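The abstract names Soft Actor Critic but this summary gives no details of the distributed variant. As a reference point, here is a minimal sketch of the soft temporal-difference target used in the standard, single-machine form of the algorithm, assuming two critics and a fixed entropy temperature. The function name, argument names, and default values for `gamma` and `alpha` are illustrative assumptions, not taken from the paper.

```python
def soft_td_target(
    reward: float,
    done: bool,
    q1_next: float,
    q2_next: float,
    log_prob_next: float,
    gamma: float = 0.99,  # discount factor (assumed value)
    alpha: float = 0.2,   # entropy temperature (assumed value)
) -> float:
    """Soft Bellman target for one transition in standard SAC.

    q1_next and q2_next are the two target critics evaluated at the next
    state and an action sampled from the current policy; log_prob_next is
    the log-probability of that sampled action.
    """
    # Take the smaller critic estimate, then add the entropy bonus
    # (-alpha * log pi), which rewards less deterministic policies.
    soft_value = min(q1_next, q2_next) - alpha * log_prob_next
    # No bootstrapping past a terminal state.
    return reward + gamma * (0.0 if done else 1.0) * soft_value


# One hypothetical non-terminal transition.
target = soft_td_target(1.0, False, 2.0, 3.0, -1.0, gamma=0.5, alpha=0.1)
```

Because the target depends only on stored transitions and the current networks, it can be computed from replayed data, which is the off-policy property the abstract relies on to consume experience at scale.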
Taxonomy
Topics: Robotic Path Planning Algorithms · Reinforcement Learning in Robotics · Multimodal Machine Learning Applications
