Learning by Playing - Solving Sparse Reward Tasks from Scratch
Martin Riedmiller, Roland Hafner, Thomas Lampe, Michael Neunert, Jonas Degrave, Tom Van de Wiele, Volodymyr Mnih, Nicolas Heess, Jost Tobias Springenberg

TL;DR
This paper introduces SAC-X, a reinforcement learning framework that uses auxiliary tasks and learned scheduling to efficiently learn complex behaviors from scratch in environments with sparse rewards.
Contribution
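The paper presents SAC-X (Scheduled Auxiliary Control), which equips the agent with a set of general auxiliary tasks learned simultaneously via off-policy RL, and actively schedules the resulting auxiliary policies to drive exploration in sparse reward settings.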
Findings
SAC-X outperforms baseline methods in robotic manipulation tasks.
Active scheduling of auxiliary policies enhances exploration efficiency.
The approach enables learning complex behaviors from scratch.
Abstract
We propose Scheduled Auxiliary Control (SAC-X), a new learning paradigm in the context of Reinforcement Learning (RL). SAC-X enables learning of complex behaviors - from scratch - in the presence of multiple sparse reward signals. To this end, the agent is equipped with a set of general auxiliary tasks, that it attempts to learn simultaneously via off-policy RL. The key idea behind our method is that active (learned) scheduling and execution of auxiliary policies allows the agent to efficiently explore its environment - enabling it to excel at sparse reward RL. Our experiments in several challenging robotic manipulation settings demonstrate the power of our approach.
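The scheduling idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a simplified, bandit-style scheduler that keeps a running average of the main-task return observed after executing each task policy and samples the next policy from a Boltzmann (softmax) distribution over those averages. The class name, the task names ("reach", "grasp", "lift"), the temperature, and the simulated returns are all hypothetical and chosen only for illustration.

```python
import math
import random


class BoltzmannScheduler:
    """Toy scheduler that picks which task policy to execute next.

    Assumption: a bandit-style simplification, not the method from the paper.
    """

    def __init__(self, task_names, temperature=1.0, seed=0):
        self.task_names = list(task_names)
        self.temperature = temperature
        self.rng = random.Random(seed)
        # Running average of the main-task return seen after running each task.
        self.value = {name: 0.0 for name in self.task_names}
        self.count = {name: 0 for name in self.task_names}

    def probabilities(self):
        # Softmax over value estimates; subtracting the max keeps exp() stable.
        best = max(self.value.values())
        weights = [
            math.exp((self.value[name] - best) / self.temperature)
            for name in self.task_names
        ]
        total = sum(weights)
        return [w / total for w in weights]

    def choose(self):
        # Sample one task name according to the softmax probabilities.
        return self.rng.choices(self.task_names, weights=self.probabilities(), k=1)[0]

    def update(self, name, main_task_return):
        # Incremental mean update for the task that was just executed.
        self.count[name] += 1
        self.value[name] += (main_task_return - self.value[name]) / self.count[name]


def run_toy_experiment(num_episodes=500, seed=0):
    # Hypothetical numbers: average main-task return after running each policy.
    hypothetical_mean_return = {"reach": 0.2, "grasp": 0.6, "lift": 0.1}
    scheduler = BoltzmannScheduler(hypothetical_mean_return, temperature=0.1, seed=seed)
    noise = random.Random(seed + 1)
    for _ in range(num_episodes):
        task = scheduler.choose()
        observed = hypothetical_mean_return[task] + noise.gauss(0.0, 0.05)
        scheduler.update(task, observed)
    return scheduler


if __name__ == "__main__":
    trained = run_toy_experiment()
    for name, prob in zip(trained.task_names, trained.probabilities()):
        print(f"{name}: value={trained.value[name]:.2f} prob={prob:.2f}")
```

In this toy setting the scheduler starts out uniform and gradually concentrates on whichever policy tends to be followed by the highest main-task return. A real agent would also have to learn the task policies themselves via off-policy RL, which this sketch leaves out entirely.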
Videos
DeepMind's AI Learns Complex Behaviors From Scratch | Two Minute Papers #239 · YouTube
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Adaptive Dynamic Programming Control
