DIP-RL: Demonstration-Inferred Preference Learning in Minecraft

Ellen Novoseller; Vinicius G. Goecks; David Watkins; Josh Miller; Nicholas Waytowich

arXiv:2307.12158 · cs.LG · July 25, 2023 · 1 citation


Open Access

TL;DR

DIP-RL is a reinforcement learning approach for unstructured environments such as Minecraft, where no reliable reward signal is available: it uses human demonstrations to infer preferences over behaviors and learns a reward function from those preferences to guide the agent.

Contribution

The paper introduces DIP-RL, a method that uses human demonstrations, together with preferences inferred from them, to guide RL in complex environments that lack an explicit reward signal.
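The preference-inference idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes that any demonstration segment is preferred over any segment generated by the agent, models those preferences with a Bradley-Terry likelihood (a common choice in preference-based RL), and fits a hypothetical linear reward over per-step feature vectors by plain gradient ascent. The functions `infer_preferences` and `train_reward` and the toy features are hypothetical.

```python
import math

# Hypothetical setup (not from the paper): a trajectory segment is a list of
# per-step feature vectors, and the reward model is linear, r(step) = w . step.


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def segment_return(w, segment):
    """Predicted return of a segment: sum of per-step linear rewards."""
    return sum(sum(wi * xi for wi, xi in zip(w, step)) for step in segment)


def infer_preferences(demo_segments, agent_segments):
    """Assumption: every demonstration segment is preferred over every agent segment."""
    return [(demo, agent) for demo in demo_segments for agent in agent_segments]


def train_reward(pairs, dim, lr=0.1, epochs=200):
    """Fit w by gradient ascent on the mean Bradley-Terry log-likelihood
    log sigmoid(R(preferred) - R(other)) over all (preferred, other) pairs."""
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for better, worse in pairs:
            diff = segment_return(w, better) - segment_return(w, worse)
            # d/dw log sigmoid(diff) = (1 - sigmoid(diff)) * d(diff)/dw
            coeff = 1.0 - sigmoid(diff)
            for step in better:
                for i, x in enumerate(step):
                    grad[i] += coeff * x
            for step in worse:
                for i, x in enumerate(step):
                    grad[i] -= coeff * x
        w = [wi + lr * gi / len(pairs) for wi, gi in zip(w, grad)]
    return w


# Toy data (hypothetical): feature 0 marks "chopping" steps, feature 1 "idle" steps.
demo_segments = [[[1.0, 0.0], [1.0, 0.0]], [[0.8, 0.2], [1.0, 0.0]]]
agent_segments = [[[0.0, 1.0], [0.2, 0.8]], [[0.0, 1.0], [0.0, 1.0]]]

w = train_reward(infer_preferences(demo_segments, agent_segments), dim=2)
print(w)  # weight on feature 0 ends up larger than on feature 1
```

On this toy data the learned weights favor the "chopping" feature, so every demonstration segment receives a higher predicted return than every agent segment. A real system would replace the linear model with a neural network over learned features, and the paper's exact preference-inference rule may differ from the all-pairs assumption used here.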

Findings

01

Results suggest that DIP-RL can learn a reward function that reflects human preferences.

02

The method performs competitively against baselines on a tree-chopping task in Minecraft.

03

DIP-RL integrates demonstrations into RL in three ways: autoencoder training, seeding RL training batches, and preference inference for reward learning.

Abstract

In machine learning for sequential decision-making, an algorithmic agent learns to interact with an environment while receiving feedback in the form of a reward signal. However, in many unstructured real-world settings, such a reward signal is unknown and humans cannot reliably craft a reward signal that correctly captures desired behavior. To solve tasks in such unstructured and open-ended environments, we present Demonstration-Inferred Preference Reinforcement Learning (DIP-RL), an algorithm that leverages human demonstrations in three distinct ways: training an autoencoder, seeding reinforcement learning (RL) training batches with demonstration data, and inferring preferences over behaviors to learn a reward function to guide RL. We evaluate DIP-RL in a tree-chopping task in Minecraft. Results suggest that the method can guide an RL agent to learn a reward function that…
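Of the three uses of demonstrations listed in the abstract, seeding RL training batches is the simplest to illustrate. The sketch below is an illustration under assumptions, not the authors' implementation: it assumes an off-policy learner that draws batches from a replay buffer and mixes in a fixed, hypothetical fraction `demo_fraction` of demonstration transitions. The function `seeded_batch` and the placeholder buffer contents are hypothetical.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def seeded_batch(
    agent_buffer: Sequence[T],
    demo_buffer: Sequence[T],
    batch_size: int,
    demo_fraction: float = 0.25,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Build one training batch in which a fixed fraction of transitions comes
    from human demonstrations and the rest from the agent's own experience.

    demo_fraction is a hypothetical knob; the paper's actual mixing scheme may differ.
    """
    if rng is None:
        rng = random.Random()
    # Number of demonstration transitions, capped by what the demo buffer holds.
    n_demo = min(len(demo_buffer), round(batch_size * demo_fraction))
    n_agent = batch_size - n_demo
    # Demonstrations sampled without replacement, agent data with replacement.
    batch = list(rng.sample(demo_buffer, n_demo))
    batch += rng.choices(agent_buffer, k=n_agent)
    # Shuffle so demo and agent transitions are interleaved within the batch.
    rng.shuffle(batch)
    return batch


# Usage with placeholder transitions (strings stand in for (s, a, s') tuples).
agent_buffer = [f"agent-{i}" for i in range(100)]
demo_buffer = [f"demo-{i}" for i in range(10)]
batch = seeded_batch(agent_buffer, demo_buffer, batch_size=32, rng=random.Random(0))
print(sum(t.startswith("demo") for t in batch), "of", len(batch), "from demos")
# prints: 8 of 32 from demos
```

Sampling demonstrations without replacement keeps a single batch from repeating the same human transition. Whether DIP-RL uses a fixed fraction, anneals it over training, or mixes data differently is not stated in this summary.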


Taxonomy

Topics: Anomaly Detection Techniques and Applications