Abstract Demonstrations and Adaptive Exploration for Efficient and Stable Multi-step Sparse Reward Reinforcement Learning

Xintong Yang; Ze Ji; Jing Wu; Yu-kun Lai

arXiv:2207.09243·cs.RO·March 10, 2023

TL;DR

This paper introduces A^2, a novel DRL exploration method combining abstract demonstrations and adaptive exploration, significantly enhancing learning efficiency and stability in long-horizon, sparse reward tasks like robotic manipulation.

Contribution

The paper proposes A^2, a new exploration technique that decomposes tasks into subtasks and adaptively adjusts exploration, improving DRL performance on complex sparse reward tasks.

Findings

01. A^2 improves learning efficiency in robotic tasks.

02. A^2 enhances stability of DRL algorithms.

03. A^2 outperforms baseline methods in experiments.

Abstract

Although Deep Reinforcement Learning (DRL) has been popular in many disciplines including robotics, state-of-the-art DRL algorithms still struggle to learn long-horizon, multi-step and sparse reward tasks, such as stacking several blocks given only a task-completion reward signal. To improve learning efficiency for such tasks, this paper proposes a DRL exploration technique, termed A^2, which integrates two components inspired by human experiences: Abstract demonstrations and Adaptive exploration. A^2 starts by decomposing a complex task into subtasks, and then provides the correct orders of subtasks to learn. During training, the agent explores the environment adaptively, acting more deterministically for well-mastered subtasks and more stochastically for ill-learnt subtasks. Ablation and comparative experiments are conducted on several grid-world tasks and three robotic manipulation…
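
The adaptive exploration idea in the abstract (more deterministic on well-mastered subtasks, more stochastic on ill-learnt ones) can be illustrated with a small sketch. This is not the authors' implementation: the class `AdaptiveEpsilon`, the sliding-window success rate, the rule "epsilon = 1 - success rate", and the bounds `eps_min` and `eps_max` are all assumptions made for illustration, and the paper may define the adaptation differently.

```python
import random
from collections import deque


class AdaptiveEpsilon:
    """Per-subtask exploration rate driven by recent success (illustrative only)."""

    def __init__(self, num_subtasks, window=20, eps_min=0.05, eps_max=1.0):
        self.eps_min = eps_min
        self.eps_max = eps_max
        # One sliding window of recent outcomes (1 = success, 0 = failure) per subtask.
        self.history = [deque(maxlen=window) for _ in range(num_subtasks)]

    def record(self, subtask, success):
        """Store the outcome of one attempt at the given subtask."""
        self.history[subtask].append(1 if success else 0)

    def success_rate(self, subtask):
        """Fraction of recent successes; 0.0 if the subtask was never attempted."""
        outcomes = self.history[subtask]
        return sum(outcomes) / len(outcomes) if outcomes else 0.0

    def epsilon(self, subtask):
        # Assumed rule: explore in proportion to the recent failure rate,
        # clamped to [eps_min, eps_max].
        raw = 1.0 - self.success_rate(subtask)
        return min(self.eps_max, max(self.eps_min, raw))

    def select_action(self, subtask, greedy_action, num_actions, rng=random):
        """Epsilon-greedy choice using the subtask's current exploration rate."""
        if rng.random() < self.epsilon(subtask):
            return rng.randrange(num_actions)  # stochastic: random action
        return greedy_action  # deterministic: follow the learnt policy


if __name__ == "__main__":
    explorer = AdaptiveEpsilon(num_subtasks=2)
    for _ in range(20):
        explorer.record(0, True)   # subtask 0: consistently solved
        explorer.record(1, False)  # subtask 1: consistently failed
    print(explorer.epsilon(0), explorer.epsilon(1))
```

In this sketch a subtask that has never been attempted gets the maximum exploration rate, and the rate falls toward `eps_min` as recent attempts succeed. The subtask index passed to `select_action` would come from the ordered subtask sequence that the abstract demonstration supplies.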

Code & Models

Repositories

ianyangchina/a-2-paper-code
PyTorch · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Data Stream Mining Techniques

Methods: Weight Decay · Convolution · Adam · Batch Normalization · Experience Replay · Dense Connections · Deep Deterministic Policy Gradient