Abstract Demonstrations and Adaptive Exploration for Efficient and Stable Multi-step Sparse Reward Reinforcement Learning
Xintong Yang, Ze Ji, Jing Wu, Yu-kun Lai

TL;DR
This paper introduces A^2, a novel DRL exploration method combining abstract demonstrations and adaptive exploration, significantly enhancing learning efficiency and stability in long-horizon, sparse reward tasks like robotic manipulation.
Contribution
The paper proposes A^2, a new exploration technique that decomposes tasks into subtasks and adaptively adjusts exploration, improving DRL performance on complex sparse reward tasks.
Findings
A^2 improves learning efficiency on long-horizon, sparse reward robotic manipulation tasks.
A^2 improves the training stability of DRL algorithms.
A^2 outperforms baseline methods in the reported experiments.
Abstract
Although Deep Reinforcement Learning (DRL) has been popular in many disciplines including robotics, state-of-the-art DRL algorithms still struggle to learn long-horizon, multi-step and sparse reward tasks, such as stacking several blocks given only a task-completion reward signal. To improve learning efficiency for such tasks, this paper proposes a DRL exploration technique, termed A^2, which integrates two components inspired by human experiences: Abstract demonstrations and Adaptive exploration. A^2 starts by decomposing a complex task into subtasks, and then provides the correct orders of subtasks to learn. During training, the agent explores the environment adaptively, acting more deterministically for well-mastered subtasks and more stochastically for ill-learnt subtasks. Ablation and comparative experiments are conducted on several grid-world tasks and three robotic manipulation…
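The adaptive exploration idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class name, the sliding-window success rate, the `1 - success_rate` mapping, the `min_eps` floor, and the subtask names are all assumptions chosen for illustration. The abstract states only that the agent acts more deterministically on well-mastered subtasks and more stochastically on ill-learnt ones, given a correct ordering of subtasks.

```python
from collections import deque


class AdaptiveExploration:
    """Hypothetical helper: maps a per-subtask success rate to an exploration rate."""

    def __init__(self, num_subtasks, window=100, min_eps=0.05):
        # min_eps is an assumed floor so that exploration never vanishes entirely.
        self.min_eps = min_eps
        # One sliding window of recent outcomes (1.0 = success) per subtask.
        self.outcomes = [deque(maxlen=window) for _ in range(num_subtasks)]

    def record(self, subtask, success):
        """Store the outcome of one attempt at the given subtask index."""
        self.outcomes[subtask].append(1.0 if success else 0.0)

    def success_rate(self, subtask):
        """Fraction of recent attempts that succeeded; 0.0 if never attempted."""
        history = self.outcomes[subtask]
        return sum(history) / len(history) if history else 0.0

    def epsilon(self, subtask):
        """Assumed mapping: the better a subtask is mastered, the less the agent explores."""
        return max(self.min_eps, 1.0 - self.success_rate(subtask))


# Hypothetical abstract demonstration: only the order of subtasks, no low-level actions.
abstract_demo = ["pick_block_1", "place_block_1", "pick_block_2", "stack_block_2"]

explorer = AdaptiveExploration(num_subtasks=len(abstract_demo))
for _ in range(9):
    explorer.record(0, True)   # subtask 0 succeeds nine times
explorer.record(0, False)      # and fails once
explorer.record(2, False)      # subtask 2 has only failed so far

for index, name in enumerate(abstract_demo):
    print(f"{name}: epsilon = {explorer.epsilon(index):.2f}")
```

In this sketch a mostly mastered subtask receives a small exploration rate, while a failing or untried subtask stays at full exploration. With a continuous-action learner, the same per-subtask quantity could instead scale the action noise; that choice is likewise an assumption rather than something the abstract specifies.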
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Data Stream Mining Techniques
Methods: Weight Decay · Convolution · Adam · Batch Normalization · Experience Replay · Dense Connections · Deep Deterministic Policy Gradient
