Reinforcement Learning from Imperfect Demonstrations

Yang Gao; Huazhe Xu; Ji Lin; Fisher Yu; Sergey Levine; Trevor Darrell

arXiv:1802.05313 · cs.AI · May 31, 2019 · 99 cites

TL;DR

The paper introduces Normalized Actor-Critic (NAC), a unified reinforcement learning algorithm that learns from both imperfect demonstrations and environment interaction, yielding policies that outperform existing baselines in driving game simulations.

Contribution

NAC is a unified algorithm that normalizes the Q-function, which lowers the Q-values of actions that do not appear in the demonstration data. This makes learning robust to imperfect demonstrations without combining separate supervised and reinforcement learning losses.

Findings

01. NAC outperforms existing baselines in driving game simulations.

02. NAC is robust to noisy and suboptimal demonstration data.

03. NAC surpasses demonstrator performance through interactive refinement.

Abstract

Robust real-world learning should benefit from both demonstrations and interactions with the environment. Current approaches to learning from demonstration and reward perform supervised learning on expert demonstration data and use reinforcement learning to further improve performance based on the reward received from the environment. These tasks have divergent losses, which are difficult to jointly optimize, and such methods can be very sensitive to noisy demonstrations. We propose a unified reinforcement learning algorithm, Normalized Actor-Critic (NAC), that effectively normalizes the Q-function, reducing the Q-values of actions unseen in the demonstration data. NAC learns an initial policy network from demonstrations and refines the policy in the environment, surpassing the demonstrator's performance. Crucially, both learning from demonstration and interactive refinement use the same…
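
To make the normalization idea concrete, here is a minimal tabular sketch for a single state. The abstract does not spell out the update, so this is an illustration under stated assumptions, not the authors' implementation. It assumes a maximum-entropy (soft Q-learning) setup in which V(s) = alpha * log(sum_a exp(Q(s,a) / alpha)) and the policy is exp((Q(s,a) - V(s)) / alpha). It also assumes the update direction is (dQ(s,a) - dV(s)) times the error Q(s,a) - target. The function names, `alpha`, `lr`, and `target` are all hypothetical.

```python
import math


def soft_value(q_values, alpha=1.0):
    """Soft value V(s) = alpha * log(sum_a exp(Q(s,a) / alpha)), computed stably."""
    m = max(q_values)
    return m + alpha * math.log(sum(math.exp((q - m) / alpha) for q in q_values))


def policy(q_values, alpha=1.0):
    """Assumed maximum-entropy policy: pi(a|s) = exp((Q(s,a) - V(s)) / alpha)."""
    v = soft_value(q_values, alpha)
    return [math.exp((q - v) / alpha) for q in q_values]


def normalized_update(q_values, action, target, lr=0.1, alpha=1.0):
    """One tabular step along (dQ(s,a) - dV(s)) * (Q(s,a) - target).

    The dV(s) term is the normalization: since dV/dQ(s,a') = pi(a'|s),
    raising the Q-value of the demonstrated action lowers the others.
    """
    pi = policy(q_values, alpha)
    error = q_values[action] - target
    new_q = list(q_values)
    for a in range(len(q_values)):
        grad_q = 1.0 if a == action else 0.0  # dQ(s, action) / dQ(s, a)
        grad_v = pi[a]                        # dV(s) / dQ(s, a)
        new_q[a] -= lr * (grad_q - grad_v) * error
    return new_q


# Hypothetical example: three actions, the demonstration shows action 0
# and its target value (1.0) is above the current estimate (0.0).
q_before = [0.0, 0.0, 0.0]
q_after = normalized_update(q_before, action=0, target=1.0)
print(q_after)
```

In this toy case the demonstrated action's Q-value rises while the two actions that never appear in the data are pushed down, which is the behavior the abstract attributes to normalization. A plain Q-learning step would drop the `grad_v` term and leave the unseen actions untouched.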

Taxonomy

Topics: Reinforcement Learning in Robotics · Sports Analytics and Performance · Autonomous Vehicle Technology and Safety