A Bayesian Approach to Learning Bandit Structure in Markov Decision Processes

Kelly W. Zhang; Omer Gottesman; Finale Doshi-Velez

arXiv:2208.00250 · cs.LG · August 2, 2022

TL;DR

This paper introduces a Bayesian online algorithm that helps determine whether a decision-making environment is better modeled as a contextual bandit or an MDP, improving learning efficiency and robustness.

Contribution

It presents a Bayesian hypothesis-testing approach that incorporates prior knowledge to adaptively distinguish between contextual bandit (CB) and Markov decision process (MDP) environments, interpolating between the two models.
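
One way to make such a test concrete for small tabular problems is a Bayes factor between two transition models. The sketch below is not the authors' model; it assumes discrete states and actions, encodes the CB hypothesis as "the next-state distribution does not depend on the action" and the MDP hypothesis as "there is a separate next-state distribution for every state-action pair", places independent symmetric Dirichlet priors (concentration `alpha`) on each distribution, and lets a prior probability `prior_cb` stand in for the practitioner's prior knowledge. All function and parameter names are hypothetical.

```python
import math

def log_dirichlet_categorical(counts, alpha):
    """Log marginal likelihood of an observed sequence of categorical outcomes
    with the given per-category counts, under a symmetric Dirichlet(alpha) prior."""
    k, n = len(counts), sum(counts)
    result = math.lgamma(k * alpha) - math.lgamma(k * alpha + n)
    for c in counts:
        result += math.lgamma(alpha + c) - math.lgamma(alpha)
    return result

def posterior_prob_cb(transitions, n_states, n_actions, prior_cb=0.5, alpha=1.0):
    """Hypothetical helper: posterior probability that next states ignore actions.

    transitions: iterable of (s, a, s_next) integer tuples.
    H_CB  assumes P(s_next | s, a) = P(s_next | s)  (actions never affect states).
    H_MDP assumes a separate next-state distribution for every (s, a) pair.
    prior_cb encodes prior knowledge; it must lie strictly between 0 and 1.
    """
    cb_counts = [[0] * n_states for _ in range(n_states)]
    mdp_counts = [[[0] * n_states for _ in range(n_actions)] for _ in range(n_states)]
    for s, a, s_next in transitions:
        cb_counts[s][s_next] += 1
        mdp_counts[s][a][s_next] += 1

    # Each conditional distribution has its own independent Dirichlet prior,
    # so log marginal likelihoods add across conditioning contexts.
    log_ml_cb = sum(log_dirichlet_categorical(c, alpha) for c in cb_counts)
    log_ml_mdp = sum(log_dirichlet_categorical(c, alpha)
                     for row in mdp_counts for c in row)

    # Posterior log-odds = prior log-odds + log Bayes factor.
    log_odds = math.log(prior_cb) - math.log1p(-prior_cb) + log_ml_cb - log_ml_mdp
    # Numerically stable logistic function mapping log-odds to a probability.
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    z = math.exp(log_odds)
    return z / (1.0 + z)

# Usage: every (s, a) pair is followed equally often by each next state ...
balanced = [(s, a, s2) for s in range(2) for a in range(2) for s2 in range(2)] * 100
# ... versus data in which the next state always equals the chosen action.
action_driven = [(s, a, a) for s in range(2) for a in range(2)] * 100
p_cb_balanced = posterior_prob_cb(balanced, n_states=2, n_actions=2)    # above 0.9
p_cb_action = posterior_prob_cb(action_driven, n_states=2, n_actions=2)  # essentially 0
```

With perfectly balanced counts both hypotheses fit the data equally well, so the CB hypothesis wins only through the Bayesian Occam factor for having fewer free parameters. In this sketch, raising `prior_cb` shifts the posterior log-odds by a constant, so prior knowledge matters most when data are scarce.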

Findings

01. Lower regret than MDP algorithms in CB settings
02. Effective learning of the optimal policy in MDP settings
03. Robustness to environment misspecification

Abstract

In the reinforcement learning literature, there are many algorithms developed for either Contextual Bandit (CB) or Markov Decision Processes (MDP) environments. However, when deploying reinforcement learning algorithms in the real world, even with domain expertise, it is often difficult to know whether it is appropriate to treat a sequential decision-making problem as a CB or an MDP. In other words, do actions affect future states, or only the immediate rewards? Making the wrong assumption regarding the nature of the environment can lead to inefficient learning, or even prevent the algorithm from ever learning an optimal policy, even with infinite data. In this work we develop an online algorithm that uses a Bayesian hypothesis testing approach to learn the nature of the environment. Our algorithm allows practitioners to incorporate prior knowledge about whether the environment is that…
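
The claim that a wrong CB assumption can block the optimal policy even with infinite data can be checked on a toy example. This is a minimal sketch on a hypothetical two-state MDP that is not from the paper. It assumes the learner already knows the exact rewards and transitions (the infinite-data limit), models a CB learner as one that maximizes immediate reward (planning with discount factor `gamma = 0`), and gives the MDP learner an assumed `gamma = 0.9`. The names `NEXT`, `REWARD`, `q_values`, `greedy_policy`, and `policy_value` are illustrative only.

```python
# Hypothetical deterministic MDP, not taken from the paper.
GOOD, BAD = 0, 1
# NEXT[s][a]: next state after taking action a in state s.
NEXT = [[BAD, GOOD],   # in GOOD: action 0 leads to BAD, action 1 stays in GOOD
        [BAD, BAD]]    # BAD is absorbing
# REWARD[s][a]: immediate reward for taking action a in state s.
REWARD = [[1.0, 0.5],
          [0.0, 0.0]]

def q_values(gamma, iters=500):
    """Value iteration on the exact model; gamma = 0 uses immediate rewards only."""
    v = [0.0] * len(REWARD)
    q = [row[:] for row in REWARD]
    for _ in range(iters):
        q = [[REWARD[s][a] + gamma * v[NEXT[s][a]] for a in range(len(REWARD[s]))]
             for s in range(len(REWARD))]
        v = [max(row) for row in q]
    return q

def greedy_policy(q):
    """Pick the highest-valued action in each state (ties go to the lowest index)."""
    return [max(range(len(row)), key=lambda a: row[a]) for row in q]

def policy_value(policy, gamma, start, horizon=500):
    """Discounted return of a deterministic policy, truncated at `horizon` steps."""
    s, total, discount = start, 0.0, 1.0
    for _ in range(horizon):
        a = policy[s]
        total += discount * REWARD[s][a]
        s = NEXT[s][a]
        discount *= gamma
    return total

cb_policy = greedy_policy(q_values(gamma=0.0))   # [0, 0]: always grab the reward of 1
mdp_policy = greedy_policy(q_values(gamma=0.9))  # [1, 0]: stay in GOOD for 0.5 per step
cb_return = policy_value(cb_policy, gamma=0.9, start=GOOD)    # 1.0
mdp_return = policy_value(mdp_policy, gamma=0.9, start=GOOD)  # about 0.5 / (1 - 0.9) = 5.0
```

Action 0 in GOOD pays more immediately but moves the agent into the absorbing BAD state, so a learner that ignores the effect of actions on future states keeps choosing it no matter how much data it sees; only a model in which actions affect future states recovers the better policy.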

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Reinforcement Learning in Robotics