Information Directed Reward Learning for Reinforcement Learning

David Lindner; Matteo Turchetta; Sebastian Tschiatschek; Kamil Ciosek; Andreas Krause

arXiv:2102.12466 · cs.LG · February 1, 2022



TL;DR

This paper introduces IDRL, a Bayesian active reward learning method for reinforcement learning that learns a reward model from expert queries and selects those queries so that high-performing policies can be found with fewer of them across a range of environments.

Contribution

The paper proposes IDRL, a novel active reward learning approach that handles diverse query types and focuses on improving the performance of the induced policy rather than only reducing the reward approximation error.

Findings

01

IDRL reaches comparable or better policy performance with fewer expert queries.

02

It accommodates different query types, such as evaluations of individual states and binary preferences over trajectories.

03

Extensive evaluations demonstrate its efficiency across multiple environments.

Abstract

For many reinforcement learning (RL) applications, specifying a reward is difficult. This paper considers an RL setting where the agent obtains information about the reward only by querying an expert that can, for example, evaluate individual states or provide binary preferences over trajectories. From such expensive feedback, we aim to learn a model of the reward that allows standard RL algorithms to achieve high expected returns with as few expert queries as possible. To this end, we propose Information Directed Reward Learning (IDRL), which uses a Bayesian model of the reward and selects queries that maximize the information gain about the difference in return between plausibly optimal policies. In contrast to prior active reward learning methods designed for specific types of queries, IDRL naturally accommodates different query types. Moreover, it achieves similar or better…
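The query-selection criterion described in the abstract can be illustrated with a small sketch. It assumes a linear-Gaussian reward model (reward r(s) = φ(s)ᵀθ with a Gaussian posterior over θ), queries that are noisy evaluations of individual states (one of the query types mentioned above), and a fixed list of candidate policies standing in for the "plausibly optimal" set, each summarized by hypothetical expected feature counts. Choosing the policy pair with the largest posterior variance of the return difference is an assumption of this sketch, and all function names are hypothetical, not taken from the official repository.

```python
from itertools import combinations

import numpy as np


def info_gain(d, phi, cov, noise_var):
    """Mutual information (nats) between z = d @ theta and a noisy state
    evaluation y = phi @ theta + eps, with theta ~ N(mean, cov) and
    eps ~ N(0, noise_var). z and y are jointly Gaussian, so
    I(z; y) = -0.5 * log(1 - corr(z, y)**2)."""
    var_z = d @ cov @ d
    if var_z <= 0.0:
        return 0.0  # the return difference is already known exactly
    var_y = phi @ cov @ phi + noise_var
    cov_zy = d @ cov @ phi
    rho_sq = cov_zy ** 2 / (var_z * var_y)  # < 1 whenever noise_var > 0
    return -0.5 * np.log(1.0 - rho_sq)


def diff_variance(pair, policy_features, cov):
    """Posterior variance of the return difference (f_i - f_j) @ theta."""
    i, j = pair
    d = policy_features[i] - policy_features[j]
    return d @ cov @ d


def select_query(policy_features, query_features, cov, noise_var):
    """Hypothetical IDRL-style step: find the candidate policy pair whose
    return difference is most uncertain, then pick the query that is most
    informative about that difference."""
    pair = max(
        combinations(range(len(policy_features)), 2),
        key=lambda p: diff_variance(p, policy_features, cov),
    )
    d = policy_features[pair[0]] - policy_features[pair[1]]
    gains = [info_gain(d, phi, cov, noise_var) for phi in query_features]
    return int(np.argmax(gains)), pair


def update_posterior(mean, cov, phi, y, noise_var):
    """Exact Gaussian posterior update after observing y = phi @ theta + eps."""
    s = phi @ cov @ phi + noise_var
    k = cov @ phi / s  # gain vector
    new_mean = mean + k * (y - phi @ mean)
    new_cov = cov - np.outer(k, phi @ cov)
    return new_mean, new_cov


if __name__ == "__main__":
    # Toy tabular case: 3 states with one-hot features, so theta holds the
    # per-state rewards. Each policy is summarized by (hypothetical) expected
    # state-visit counts; state 2 is never visited but has the largest prior variance.
    policy_features = np.array([[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [1.0, 1.0, 0.0]])
    query_features = np.eye(3)  # query k means "evaluate state k"
    mean, cov = np.zeros(3), np.diag([1.0, 1.0, 5.0])
    q, pair = select_query(policy_features, query_features, cov, noise_var=0.1)
    print(q, pair)  # 0 (0, 1)
    y = 1.0  # stand-in for the expert's answer to the chosen query
    mean, cov = update_posterior(mean, cov, query_features[q], y, noise_var=0.1)
```

In the toy example, state 2 has zero information gain despite having the largest prior variance, because neither candidate policy visits it and it therefore cannot change their return difference. A method that only reduced reward uncertainty would query it first, which illustrates the stated shift in focus from reward approximation to the induced policy.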


Code & Models

Repositories

david-lindner/idrl (official)

Videos

Information Directed Reward Learning for Reinforcement Learning (SlidesLive)

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems