Inferring Probabilistic Reward Machines from Non-Markovian Reward Processes for Reinforcement Learning

Taylor Dohmen; Noah Topper; George Atia; Andre Beckus; Ashutosh Trivedi; Alvaro Velasquez

arXiv:2107.04633·cs.LG·March 29, 2022

TL;DR

This paper introduces probabilistic reward machines (PRMs) to model non-Markovian stochastic rewards in reinforcement learning, along with an algorithm to learn them from data, enhancing the ability to handle complex reward signals.

Contribution

The paper proposes probabilistic reward machines (PRMs) as a novel representation for stochastic non-Markovian rewards and provides a learning algorithm with correctness and convergence guarantees.

Findings

01 Successfully models stochastic non-Markovian rewards

02 Provides a convergent learning algorithm for PRMs

03 Enhances reinforcement learning with structured reward representations

Abstract

The success of reinforcement learning in typical settings is predicated on Markovian assumptions on the reward signal by which an agent learns optimal policies. In recent years, the use of reward machines has relaxed this assumption by enabling a structured representation of non-Markovian rewards. In particular, such representations can be used to augment the state space of the underlying decision process, thereby facilitating non-Markovian reinforcement learning. However, these reward machines cannot capture the semantics of stochastic reward signals. In this paper, we make progress on this front by introducing probabilistic reward machines (PRMs) as a representation of non-Markovian stochastic rewards. We present an algorithm to learn PRMs from the underlying decision process and prove results around its correctness and convergence.
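To make the idea of a probabilistic reward machine more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's formal definition or its learning algorithm: the class name `ProbabilisticRewardMachine`, the transition-table layout, the helper `augmented_state`, and the coffee-delivery example are all hypothetical. The sketch assumes that, for each machine state and observed label, the machine holds a probability distribution over (next state, reward) pairs, and that a missing entry means "stay in place with zero reward".

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (probability, next_state, reward); this layout is an assumption of the sketch.
Outcome = Tuple[float, str, float]


@dataclass
class ProbabilisticRewardMachine:
    initial_state: str
    # transitions[(state, label)] -> distribution over (next_state, reward)
    transitions: Dict[Tuple[str, str], List[Outcome]]
    state: str = field(init=False)

    def __post_init__(self):
        # Each distribution must sum to one.
        for key, outcomes in self.transitions.items():
            total = sum(p for p, _, _ in outcomes)
            if abs(total - 1.0) > 1e-9:
                raise ValueError(f"probabilities for {key} sum to {total}")
        self.state = self.initial_state

    def reset(self):
        self.state = self.initial_state

    def step(self, label, rng=random):
        """Sample a transition for `label`, update the state, return the reward."""
        outcomes = self.transitions.get((self.state, label))
        if outcomes is None:
            return 0.0  # assumed default: self-loop with zero reward
        r = rng.random()
        cumulative = 0.0
        for p, next_state, reward in outcomes:
            cumulative += p
            if r < cumulative:
                self.state = next_state
                return reward
        # Guard against floating-point round-off: take the last outcome.
        _, self.state, reward = outcomes[-1]
        return reward


def augmented_state(env_state, prm):
    """Pair the environment state with the machine state (product construction)."""
    return (env_state, prm.state)


def make_example_prm():
    # Hypothetical task: fetch coffee (the machine fails 10% of the time),
    # then deliver it to the office for a reward of 1.
    return ProbabilisticRewardMachine(
        initial_state="start",
        transitions={
            ("start", "coffee"): [(0.9, "has_coffee", 0.0), (0.1, "start", 0.0)],
            ("has_coffee", "office"): [(1.0, "done", 1.0)],
        },
    )
```

In this sketch, the reward for reaching the office depends on whether coffee was picked up earlier, so it is non-Markovian with respect to the environment state alone. Pairing the environment state with the machine state via `augmented_state` gives a learner the memory it needs, which is the state-space augmentation the abstract describes.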

Taxonomy

Topics: Reinforcement Learning in Robotics · Smart Grid Security and Resilience · Simulation Techniques and Applications