Harnessing Causality in Reinforcement Learning With Bagged Decision Times
Daiqi Gao, Hsin-Yu Lai, Predrag Klasnja, and Susan A. Murphy

TL;DR
This paper introduces an online reinforcement learning method that leverages causal DAGs to handle non-Markovian, non-stationary decision problems with bagged decision times, demonstrated on mobile health data.
Contribution
It develops a novel RL algorithm using causal DAGs to construct Markov states in non-Markovian, non-stationary settings with bagged decision times.
Findings
The proposed method effectively maximizes rewards in mobile health data.
The constructed states achieve the maximal optimal value among all state constructions for a periodic MDP.
The approach handles non-Markovian and non-stationary dynamics successfully.
Abstract
We consider reinforcement learning (RL) for a class of problems with bagged decision times. A bag contains a finite sequence of consecutive decision times. The transition dynamics are non-Markovian and non-stationary within a bag. All actions within a bag jointly impact a single reward, observed at the end of the bag. For example, in mobile health, multiple activity suggestions in a day collectively affect a user's daily commitment to being active. Our goal is to develop an online RL algorithm to maximize the discounted sum of the bag-specific rewards. To handle non-Markovian transitions within a bag, we utilize an expert-provided causal directed acyclic graph (DAG). Based on the DAG, we construct states as a dynamical Bayesian sufficient statistic of the observed history, which results in Markov state transitions within and across bags. We then formulate this problem as a periodic…
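To make the bagged setup concrete, the following is a minimal sketch of the objective described in the abstract: consecutive decision times are grouped into bags, each bag yields one reward, and the goal is the discounted sum of those bag-level rewards. The function names, the fixed bag size, and the choice to apply one discount step per bag (rather than per decision time) are assumptions made for illustration, not details taken from the paper.

```python
from typing import List, Sequence


def split_into_bags(actions: Sequence[int], bag_size: int) -> List[List[int]]:
    """Group a flat action sequence into consecutive bags of equal size.

    Assumption: every bag has the same number of decision times.
    """
    if bag_size <= 0:
        raise ValueError("bag_size must be positive")
    if len(actions) % bag_size != 0:
        raise ValueError("number of actions must be a multiple of bag_size")
    return [list(actions[i:i + bag_size]) for i in range(0, len(actions), bag_size)]


def discounted_bag_return(bag_rewards: Sequence[float], gamma: float) -> float:
    """Discounted sum of bag-level rewards: sum over d of gamma**d * R_d.

    Assumption: the discount is applied once per bag, with one reward
    observed at the end of each bag.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    total = 0.0
    for d, reward in enumerate(bag_rewards):
        total += (gamma ** d) * reward
    return total


if __name__ == "__main__":
    # Two bags (for example, two days) with three decision times each.
    bags = split_into_bags([1, 0, 1, 0, 0, 1], bag_size=3)
    print(bags)
    # One hypothetical reward per bag, observed at the end of the bag.
    print(discounted_bag_return([1.0, 2.0, 3.0], gamma=0.5))
```

In the mobile health example, a bag would correspond to one day, the actions to that day's activity suggestions, and the bag reward to the user's daily commitment to being active.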
Taxonomy
Topics: Evolutionary Algorithms and Applications · Reinforcement Learning in Robotics
