Learning Memory-Dependent Continuous Control from Demonstrations

Siqing Hou; Dongqi Han; Jun Tani

arXiv:2102.09208·cs.LG·February 19, 2021

Open Access

TL;DR

This paper introduces READER, a novel reinforcement learning algorithm that leverages demonstrations and experience replay to improve memory-dependent continuous control in partially observable environments, enhancing sample efficiency.

Contribution

The paper presents READER, a recurrent actor-critic algorithm that learns memory-dependent control tasks from demonstrations, extending demonstration-based RL from MDPs to partially observable environments.

Findings

01 Significantly reduces environment interactions.

02 Outperforms baseline RL algorithms in sample efficiency.

03 Effective in memory-critical control tasks.

Abstract

Efficient exploration has been a long-standing challenge in reinforcement learning, especially when rewards are sparse. A developmental system can overcome this difficulty by learning from both demonstrations and self-exploration. However, existing methods are not applicable to most real-world robotic control problems because they assume that environments follow Markov decision processes (MDPs); thus, they do not extend to partially observable environments, where historical observations are necessary for decision making. This paper builds on the idea of replaying demonstrations for memory-dependent continuous control by proposing a novel algorithm, Recurrent Actor-Critic with Demonstration and Experience Replay (READER). Experiments on several memory-crucial continuous control tasks show that our method significantly reduces interactions with the environment with a…
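The abstract names two ingredients, replay of demonstrations alongside the agent's own experience and a recurrent actor-critic that needs observation histories. The following is a minimal Python sketch of how such a replay buffer could be organized; it is not READER's actual implementation. The class name `MixedSequenceReplay`, the parameters `demo_fraction`, `seq_len`, and `capacity`, and the fixed mixing ratio are all assumptions made for illustration and do not come from the paper.

```python
import random
from collections import deque


class MixedSequenceReplay:
    """Hypothetical buffer: whole episodes from two sources, sampled as windows."""

    def __init__(self, capacity=1000, seed=0):
        self.demos = []                      # demonstration episodes, never evicted
        self.agent = deque(maxlen=capacity)  # agent episodes, oldest evicted first
        self.rng = random.Random(seed)

    def add_demo(self, episode):
        self.demos.append(list(episode))

    def add_agent(self, episode):
        self.agent.append(list(episode))

    def _window(self, episode, seq_len):
        # Return a contiguous slice so a recurrent network sees consecutive steps.
        if len(episode) <= seq_len:
            return list(episode)
        start = self.rng.randrange(len(episode) - seq_len + 1)
        return episode[start:start + seq_len]

    def sample(self, batch_size, seq_len, demo_fraction=0.25):
        # demo_fraction is an assumed hyperparameter, not a value from the paper.
        if not self.demos and not self.agent:
            raise ValueError("buffer is empty")
        if not self.agent:
            n_demo = batch_size   # only demonstrations are available
        elif not self.demos:
            n_demo = 0            # only agent experience is available
        else:
            n_demo = round(batch_size * demo_fraction)
        batch = []
        for i in range(batch_size):
            source = self.demos if i < n_demo else self.agent
            episode = self.rng.choice(source)
            batch.append(self._window(episode, seq_len))
        return batch


# Usage: steps are placeholder tuples; real steps would hold (obs, action, reward).
buf = MixedSequenceReplay(capacity=10, seed=0)
buf.add_demo([("demo", t) for t in range(20)])
buf.add_agent([("agent", t) for t in range(5)])
batch = buf.sample(batch_size=8, seq_len=4, demo_fraction=0.25)
```

Sampling contiguous windows rather than single transitions is what lets a recurrent network rebuild its hidden state from history, which is the property a partially observable task requires. How READER actually weights demonstrations against self-generated experience is not stated in the text above, so the fixed ratio here is only a placeholder.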

Taxonomy

Topics: Reinforcement Learning in Robotics · Robotic Path Planning Algorithms · Advanced Bandit Algorithms Research

Methods: Experience Replay