Bayesian Inverse Reinforcement Learning for Non-Markovian Rewards

Noah Topper; Alvaro Velasquez; George Atia

arXiv:2406.13991 · cs.LG · June 21, 2024

TL;DR

This paper introduces a Bayesian IRL framework to infer non-Markovian reward functions, such as reward machines, directly from expert behavior, addressing limitations of existing methods that require reward signals.
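To make "non-Markovian" concrete: a reward machine is a finite automaton whose transitions fire on high-level events and emit rewards, so the reward for reaching a state can depend on what happened earlier. The sketch below is a minimal illustration, not the authors' representation. It assumes each step produces a single string label (RM formulations usually use sets of propositions), and the class name `RewardMachine` and the coffee-then-office task are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Sequence, Tuple


@dataclass
class RewardMachine:
    """Minimal reward machine: label-driven transitions that emit rewards (hypothetical sketch)."""

    initial: int
    # (rm_state, label) -> next rm_state; missing pairs are treated as self-loops.
    delta: Dict[Tuple[int, str], int]
    # (rm_state, next_rm_state) -> reward; missing pairs give 0.
    rewards: Dict[Tuple[int, int], float] = field(default_factory=dict)

    def step(self, u: int, label: str) -> Tuple[int, float]:
        u_next = self.delta.get((u, label), u)
        return u_next, self.rewards.get((u, u_next), 0.0)

    def run(self, labels: Sequence[str]) -> float:
        """Total reward for a sequence of per-step labels ("" means no event)."""
        u, total = self.initial, 0.0
        for label in labels:
            u, r = self.step(u, label)
            total += r
        return total


# Hypothetical task: reward 1 for reaching the office, but only after getting coffee.
coffee_then_office = RewardMachine(
    initial=0,
    delta={(0, "coffee"): 1, (1, "office"): 2},
    rewards={(1, 2): 1.0},
)

print(coffee_then_office.run(["office"]))                # 0.0: office before coffee earns nothing
print(coffee_then_office.run(["coffee", "", "office"]))  # 1.0
```

Reaching the office pays off only after the coffee event, so the same environment state earns different rewards under different histories, which no reward defined on the current state alone can express.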

Contribution

It develops a novel Bayesian IRL approach for non-Markovian rewards, including a new reward space, history-aware demonstrations, and a modified annealing algorithm.
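One way to picture a history-aware demonstration: under a candidate RM, each prefix of the expert's trajectory determines an RM state, and pairing the environment state with that RM state yields a Markovian "product" state on which standard BIRL machinery can operate. The sketch below assumes this product-state view, a labeling function `label_of`, and the convention that the RM advances on the label of the state just entered. All names are hypothetical, and the paper's exact construction may differ.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# (rm_state, label) -> next rm_state; missing pairs are self-loops.
Delta = Dict[Tuple[int, str], int]


def augment_demo(
    states: Sequence[str],
    actions: Sequence[str],
    label_of: Callable[[str], str],
    delta: Delta,
    u0: int = 0,
) -> List[Tuple[Tuple[str, int], str]]:
    """Pair each expert (state, action) with the RM state its history induces (hypothetical helper)."""
    if len(states) != len(actions):
        raise ValueError("states and actions must have the same length")
    u = u0
    augmented = []
    for s, a in zip(states, actions):
        # Assumed convention: advance the RM on the label of the state just entered.
        u = delta.get((u, label_of(s)), u)
        augmented.append(((s, u), a))
    return augmented


# Hypothetical environment labels and candidate RM (coffee, then office).
events = {"kitchen": "coffee", "office": "office"}
demo = augment_demo(
    states=["hall", "kitchen", "hall", "office"],
    actions=["right", "left", "up", "stay"],
    label_of=lambda s: events.get(s, ""),
    delta={(0, "coffee"): 1, (1, "office"): 2},
)
print(demo)
# [(('hall', 0), 'right'), (('kitchen', 1), 'left'), (('hall', 1), 'up'), (('office', 2), 'stay')]
```

The two visits to `hall` map to different augmented states, `("hall", 0)` and `("hall", 1)`, which is exactly the distinction a Markovian reward over environment states cannot make. Because the RM is unknown in IRL, this augmentation has to be recomputed for every candidate RM.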

Findings

01 Performs well when optimizing according to its inferred reward
02 Outperforms existing methods for binary non-Markovian rewards
03 Effectively infers complex reward structures from behavior

Abstract

Inverse reinforcement learning (IRL) is the problem of inferring a reward function from expert behavior. There are several approaches to IRL, but most are designed to learn a Markovian reward. However, a reward function might be non-Markovian, depending on more than just the current state, such as a reward machine (RM). Although there has been recent work on inferring RMs, it assumes access to the reward signal, absent in IRL. We propose a Bayesian IRL (BIRL) framework for inferring RMs directly from expert behavior, requiring significant changes to the standard framework. We define a new reward space, adapt the expert demonstration to include history, show how to compute the reward posterior, and propose a novel modification to simulated annealing to maximize this posterior. We demonstrate that our method performs well when optimizing according to its inferred reward and compares…
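For the posterior and its maximization, a common BIRL choice is a Boltzmann-rational expert, P(a | x, R) ∝ exp(α Q*_R(x, a)), with x the history-aware state. The sketch below computes that log-likelihood from a given Q-table and maximizes a log-posterior with plain simulated annealing (Metropolis acceptance, geometric cooling). It is an assumption-laden stand-in: it does not implement the paper's modified annealing, solving for Q*_R is left out, and the toy hypothesis space (integers 0 to 10 standing in for candidate RMs) and all helper names are hypothetical.

```python
import math
import random
from typing import Callable, Dict, Hashable, Sequence, Tuple, TypeVar

H = TypeVar("H")  # hypothesis type (stands in for a candidate reward machine)


def boltzmann_log_likelihood(
    demo: Sequence[Tuple[Hashable, Hashable]],
    q: Dict[Hashable, Dict[Hashable, float]],
    alpha: float = 1.0,
) -> float:
    """Sum of log P(a | x), with P(a | x) proportional to exp(alpha * Q(x, a))."""
    total = 0.0
    for x, a in demo:
        scores = [alpha * v for v in q[x].values()]
        m = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += alpha * q[x][a] - log_z
    return total


def simulated_annealing(
    init: H,
    log_posterior: Callable[[H], float],
    propose: Callable[[H, random.Random], H],
    steps: int = 2000,
    t0: float = 1.0,
    cooling: float = 0.995,
    seed: int = 0,
) -> Tuple[H, float]:
    """Plain simulated annealing that returns the best hypothesis seen."""
    rng = random.Random(seed)
    current, current_lp = init, log_posterior(init)
    best, best_lp = current, current_lp
    t = t0
    for _ in range(steps):
        cand = propose(current, rng)
        cand_lp = log_posterior(cand)
        # Metropolis rule: accept improvements, accept worse moves with prob exp(delta / t).
        if cand_lp >= current_lp or rng.random() < math.exp((cand_lp - current_lp) / t):
            current, current_lp = cand, cand_lp
            if current_lp > best_lp:
                best, best_lp = current, current_lp
        t = max(t * cooling, 1e-6)  # geometric cooling with a floor to avoid division by zero
    return best, best_lp


# Toy usage: hypothesis h in 0..10 sets Q(x, left) = h / 10 and Q(x, right) = 1 - h / 10.
demo = [("x", "left")] * 5  # the expert always chooses "left"


def q_for(h: int) -> Dict[Hashable, Dict[Hashable, float]]:
    return {"x": {"left": h / 10, "right": 1 - h / 10}}


def log_posterior(h: int) -> float:
    # A uniform prior only adds a constant, so the log-likelihood suffices here.
    return boltzmann_log_likelihood(demo, q_for(h), alpha=5.0)


def propose(h: int, rng: random.Random) -> int:
    return min(10, max(0, h + rng.choice([-1, 1])))


best_h, best_lp = simulated_annealing(0, log_posterior, propose)
```

In the toy problem the likelihood rises monotonically with h, so the search should settle on the hypothesis that makes "left" most attractive. In a real RM search, `propose` would edit the automaton (add or remove states, retarget transitions), and each evaluation would require solving the product MDP for Q*_R, which dominates the cost.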


Taxonomy

Topics: Distributed Sensor Networks and Detection Algorithms · Innovation Diffusion and Forecasting · Reinforcement Learning in Robotics