Stochastic Neural Networks for Hierarchical Reinforcement Learning
Carlos Florensa, Yan Duan, Pieter Abbeel

TL;DR
This paper introduces a hierarchical reinforcement learning framework that uses stochastic neural networks and minimal domain knowledge to pre-train interpretable skills, significantly improving learning efficiency in sparse reward tasks.
Contribution
It presents a novel approach combining stochastic neural networks and intrinsic motivation to pre-train a wide range of skills with minimal domain knowledge, enhancing downstream learning.
Findings
Learns interpretable skills efficiently
Significantly improves exploration in sparse reward environments
Boosts performance across diverse downstream tasks
Abstract
Deep reinforcement learning has achieved many impressive results in recent years. However, tasks with sparse rewards or long horizons continue to pose significant challenges. To tackle these important problems, we propose a general framework that first learns useful skills in a pre-training environment, and then leverages the acquired skills for learning faster in downstream tasks. Our approach brings together some of the strengths of intrinsic motivation and hierarchical methods: the learning of useful skills is guided by a single proxy reward, the design of which requires very minimal domain knowledge about the downstream tasks. Then a high-level policy is trained on top of these skills, significantly improving exploration and making it possible to tackle sparse rewards in the downstream tasks. To efficiently pre-train a large span of skills, we use Stochastic Neural Networks…
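The two-level structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: every class and function name below is hypothetical, the low-level weights are random rather than pre-trained, the high-level policy is a uniform placeholder rather than a learned one, and the environment is a toy 2-D point task. The sketch also assumes that a skill is selected by a one-hot latent code concatenated to the observation, and that the high-level policy commits to each skill for a fixed number of steps.

```python
import random
from typing import List, Sequence, Tuple


def one_hot(index: int, size: int) -> List[float]:
    """Encode a discrete skill index as a one-hot latent code (assumed encoding)."""
    return [1.0 if i == index else 0.0 for i in range(size)]


class SkillConditionedPolicy:
    """Low-level policy: shared weights, behaviour selected by a latent code.

    Hypothetical stand-in for a pre-trained skill network; the weights here
    are random, not learned.
    """

    def __init__(self, obs_dim: int, num_skills: int, act_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.num_skills = num_skills
        in_dim = obs_dim + num_skills
        self.weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
                        for _ in range(act_dim)]

    def act(self, obs: Sequence[float], skill: int) -> List[float]:
        # Concatenate observation and latent code, then apply a linear map.
        x = list(obs) + one_hot(skill, self.num_skills)
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.weights]


class UniformHighLevelPolicy:
    """High-level policy placeholder: picks a skill uniformly at random."""

    def __init__(self, num_skills: int, seed: int = 0):
        self.num_skills = num_skills
        self.rng = random.Random(seed)

    def select_skill(self, obs: Sequence[float]) -> int:
        return self.rng.randrange(self.num_skills)


def toy_step(state: List[float], action: Sequence[float]) -> Tuple[List[float], float]:
    """Hypothetical 2-D point environment with a sparse reward near (1, 1)."""
    next_state = [s + a for s, a in zip(state, action)]
    reached = all(abs(s - 1.0) < 0.1 for s in next_state)
    return next_state, (1.0 if reached else 0.0)


def hierarchical_rollout(high, low, horizon: int, skill_duration: int):
    """Run one episode; the high level re-selects a skill every skill_duration steps."""
    state = [0.0, 0.0]
    skills_used: List[int] = []
    total_reward = 0.0
    skill = 0
    for t in range(horizon):
        if t % skill_duration == 0:
            skill = high.select_skill(state)
            skills_used.append(skill)
        state, reward = toy_step(state, low.act(state, skill))
        total_reward += reward
    return skills_used, total_reward


low = SkillConditionedPolicy(obs_dim=2, num_skills=4, act_dim=2)
high = UniformHighLevelPolicy(num_skills=4)
skills_used, total_reward = hierarchical_rollout(high, low, horizon=20, skill_duration=5)
```

In this sketch the high-level policy makes only four decisions over a 20-step episode. That temporal abstraction is one plausible reason a hierarchy can help exploration under sparse rewards: each high-level choice moves the agent through several steps of coherent behaviour instead of a single primitive action.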
Taxonomy
Topics: Reinforcement Learning in Robotics · Neural Networks and Applications · Adaptive Dynamic Programming Control
