Risk-Averse Multi-Armed Bandits with Unobserved Confounders: A Case Study in Emotion Regulation in Mobile Health
Yi Shen, Jessilyn Dunn, Michael M. Zavlanos

TL;DR
This paper introduces a risk-averse multi-armed bandit framework in which a learner uses expert data affected by unobserved confounders to identify the minimum-risk arm, with a case study in emotion regulation in mobile health.
Contribution
The paper formulates risk-averse bandit learning as a transfer learning problem between an expert, who observes the contexts, and a learner, who does not, so that the contexts act as unobserved confounders. It applies this setting to mobile health.
Findings
Develops a new transfer learning method for risk-averse bandits with unobserved confounders.
Demonstrates that the learner needs fewer online learning steps to identify the minimum-risk arm.
Addresses bias caused by unobserved confounders in expert data.
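To make the confounding bias concrete, here is a minimal sketch under assumed numbers. It is not taken from the paper: the two arms, the binary context, the reward means, and the expert policy are all hypothetical. For simplicity it compares expected rewards rather than a risk measure. The expert picks arms using a context the learner never sees, so per-arm averages computed from the expert's log differ from what the learner would get by pulling the arm itself.

```python
# Hypothetical model: binary context U, two arms "a" and "b".
P_U = {0: 0.5, 1: 0.5}  # assumed context distribution

# Assumed expected reward for each (arm, context) pair.
MEAN_REWARD = {
    ("a", 0): 1.0, ("a", 1): 0.0,
    ("b", 0): 0.6, ("b", 1): 0.6,
}

# Assumed expert policy: the expert sees U and picks the best arm for it.
EXPERT_POLICY = {0: {"a": 1.0}, 1: {"b": 1.0}}


def observational_mean(arm, p_u, policy, mean_reward):
    """Average reward of `arm` as it appears in the expert's log (U hidden)."""
    weights = {u: p_u[u] * policy[u].get(arm, 0.0) for u in p_u}
    total = sum(weights.values())
    if total == 0:
        return None  # the expert never plays this arm
    return sum(w * mean_reward[(arm, u)] for u, w in weights.items()) / total


def interventional_mean(arm, p_u, mean_reward):
    """Average reward the learner gets by pulling `arm` regardless of U."""
    return sum(p_u[u] * mean_reward[(arm, u)] for u in p_u)


for arm in ("a", "b"):
    obs = observational_mean(arm, P_U, EXPERT_POLICY, MEAN_REWARD)
    true = interventional_mean(arm, P_U, MEAN_REWARD)
    print(arm, "logged:", obs, "true:", true)
```

In this toy model the log makes arm "a" look better, because the expert only plays it when the context is favorable, while the learner, who cannot see the context, does better on average with arm "b".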
Abstract
In this paper, we consider a risk-averse multi-armed bandit (MAB) problem where the goal is to learn a policy that minimizes the risk of low expected return, as opposed to maximizing the expected return itself, which is the objective in the usual approach to risk-neutral MAB. Specifically, we formulate this problem as a transfer learning problem between an expert and a learner agent in the presence of contexts that are only observable by the expert but not by the learner. Thus, such contexts are unobserved confounders (UCs) from the learner's perspective. Given a dataset generated by the expert that excludes the UCs, the goal for the learner is to identify the true minimum-risk arm with fewer online learning steps, while avoiding possible biased decisions due to the presence of UCs in the expert's data.
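The difference between the risk-neutral and risk-averse objectives can be sketched as follows. The abstract does not name a risk measure, so this example assumes a lower-tail conditional value at risk (CVaR), estimated as the mean of the worst fraction `alpha` of observed rewards. The function name, the arm names, and the reward samples are hypothetical.

```python
import math


def empirical_cvar(rewards, alpha):
    """Mean of the worst ceil(alpha * n) rewards (assumed lower-tail CVaR)."""
    if not rewards:
        raise ValueError("rewards must be non-empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    k = max(1, math.ceil(alpha * len(rewards)))
    worst = sorted(rewards)[:k]
    return sum(worst) / k


# Hypothetical reward samples for two arms.
samples = {
    "arm_a": [0.0, 2.0, 10.0, 12.0],  # higher mean, heavy lower tail
    "arm_b": [4.0, 5.0, 5.0, 6.0],    # lower mean, tight spread
}

# Risk-neutral choice: maximize the sample mean.
risk_neutral = max(samples, key=lambda a: sum(samples[a]) / len(samples[a]))

# Risk-averse choice: maximize the mean of the worst half of outcomes.
risk_averse = max(samples, key=lambda a: empirical_cvar(samples[a], alpha=0.5))

print("risk-neutral pick:", risk_neutral)
print("risk-averse pick:", risk_averse)
```

With these samples the two criteria disagree: the mean favors `arm_a`, while the tail average favors `arm_b`, which is the kind of arm a risk-averse learner is meant to find.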
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Machine Learning and Algorithms
