Deep Reinforcement Learning from Policy-Dependent Human Feedback

Dilip Arumugam; Jun Ki Lee; Sophie Saskin; Michael L. Littman

arXiv:1902.04257 · cs.LG · February 13, 2019 · 31 cites

TL;DR

This paper introduces Deep COACH, an algorithm enabling deep reinforcement learning agents to learn complex behaviors from real-time human feedback in high-dimensional environments like Minecraft, with reduced sample complexity.

Contribution

It extends the COACH algorithm to deep neural networks, incorporating modifications for high-dimensional observations and efficient learning from sparse human feedback.

Findings

01 Successfully learned Minecraft tasks from raw pixels using human feedback

02 Achieved rapid learning within 10-15 minutes of interaction

03 Demonstrated effectiveness in complex 3D environments

Abstract

To widen their accessibility and increase their utility, intelligent agents must be able to learn complex behaviors as specified by (non-expert) human users. Moreover, they will need to learn these behaviors within a reasonable amount of time while efficiently leveraging the sparse feedback a human trainer is capable of providing. Recent work has shown that human feedback can be characterized as a critique of an agent's current behavior rather than as an alternative reward signal to be maximized, culminating in the COnvergent Actor-Critic by Humans (COACH) algorithm for making direct policy updates based on human feedback. Our work builds on COACH, moving to a setting where the agent's policy is represented by a deep neural network. We employ a series of modifications on top of the original COACH algorithm that are critical for successfully learning behaviors from high-dimensional…
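The abstract describes COACH as making direct policy updates from human feedback, treating that feedback as a critique of the agent's current behavior. A minimal sketch of that idea follows, under assumptions that are not taken from the text above: a linear softmax policy, a scalar feedback value standing in for the advantage term of a policy-gradient update, and an eligibility trace so that sparse feedback can credit recent actions. The function name `coach_style_update` and the parameters `lr` and `decay` are hypothetical, and this is not the paper's Deep COACH implementation, which uses a deep network and further modifications not shown here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def coach_style_update(theta, trace, state, action, feedback, lr=0.1, decay=0.9):
    """Apply one feedback-weighted policy update in place (illustrative only).

    theta[a][i] is the weight of feature i for action a (linear softmax policy).
    trace has the same shape and accumulates decayed log-policy gradients.
    feedback is the trainer's scalar signal (e.g. +1, -1, or 0 when silent).
    Returns the action probabilities computed before the update.
    """
    logits = [sum(w * x for w, x in zip(row, state)) for row in theta]
    probs = softmax(logits)
    for a, row in enumerate(theta):
        indicator = 1.0 if a == action else 0.0
        for i, x in enumerate(state):
            # Gradient of log pi(action | state) w.r.t. theta[a][i].
            grad_log_pi = (indicator - probs[a]) * x
            # The trace is updated every step, even when feedback is 0.
            trace[a][i] = decay * trace[a][i] + grad_log_pi
            # Feedback plays the role an advantage estimate would play.
            row[i] += lr * feedback * trace[a][i]
    return probs
```

With this sketch, positive feedback after an action raises that action's probability in the same state, negative feedback lowers it, and a silent trainer (feedback of 0) leaves the weights unchanged while the trace keeps accumulating.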

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)