Evaluating and Learning Robust Bandit Policies Under Uncertain Causal Mechanisms
Katherine Avery, Chinmay Pendse, David Jensen

TL;DR
This paper introduces a causal multi-armed bandit algorithm that evaluates and learns policies when the general causal structure is known but the exact causal mechanisms are not, using structural equation models (SEMs) and conditional independence testing.
Contribution
It combines causal graphical models with SEMs for robust policy evaluation and learning under uncertainty over conditional probability distributions, and reports more accurate evaluations than traditional approaches.
Findings
SEM-based evaluation is more accurate than traditional approaches, particularly as the range of possible causal mechanisms grows.
The SEM approach learns low-variance, near-optimal policies.
Traditional methods may fail to converge or get stuck in local extrema.
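To make the idea of evaluating a policy under uncertain mechanisms concrete, here is a minimal sketch. It is not the authors' algorithm. It assumes a hypothetical linear SEM with one context variable X, two arms, and a reward Y = coeffs[A] * X + noise, where each arm's coefficient is only known to lie in an interval. The names `simulate_reward`, `evaluate_under_uncertainty`, and `coeff_ranges` are illustrative assumptions. The sketch samples many candidate mechanisms and reports the mean, worst-case, and spread of the policy's value.

```python
import random
import statistics


def simulate_reward(policy, coeffs, rng, n=2000):
    """Monte Carlo estimate of a policy's average reward under one concrete SEM.

    Hypothetical SEM (an assumption for illustration):
      X ~ Uniform(-1, 1), A = policy(X), Y = coeffs[A] * X + Gaussian noise.
    """
    total = 0.0
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        a = policy(x)
        total += coeffs[a] * x + rng.gauss(0.0, 0.1)
    return total / n


def evaluate_under_uncertainty(policy, coeff_ranges, n_mechanisms=50, seed=0):
    """Evaluate a policy across mechanisms sampled from the allowed ranges."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_mechanisms):
        # Draw one candidate mechanism: a coefficient per arm.
        coeffs = [rng.uniform(lo, hi) for lo, hi in coeff_ranges]
        values.append(simulate_reward(policy, coeffs, rng))
    return {
        "mean": statistics.fmean(values),
        "worst": min(values),
        "stdev": statistics.stdev(values),
    }


# Arm 0 has a positive effect of X on reward, arm 1 a negative one,
# but the exact magnitudes are uncertain.
coeff_ranges = [(0.5, 1.5), (-1.5, -0.5)]


def context_policy(x):
    """Pick the arm whose coefficient sign matches the sign of X."""
    return 0 if x >= 0 else 1


def constant_policy(x):
    """Ignore the context and always pull arm 0."""
    return 0


context_result = evaluate_under_uncertainty(context_policy, coeff_ranges)
constant_result = evaluate_under_uncertainty(constant_policy, coeff_ranges)
print("context-aware:", context_result)
print("constant:", constant_result)
```

In this toy setting the context-aware policy keeps a positive value for every coefficient in the allowed ranges, while the constant policy averages out to roughly zero. Comparing worst-case values rather than a single point estimate is one simple way to express robustness to mechanism uncertainty.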
Abstract
Causal graphical models can encode large amounts of structural knowledge, both from the background knowledge of domain experts and the structural knowledge discovered from randomized experiments or observational data. However, though we may know the general structure of causal relationships, we often do not know the exact causal mechanisms. In this work, we propose a causal multi-armed bandit evaluation and learning algorithm that can reason effectively despite uncertainty over conditional probability distributions. Further, we show how conditional independence testing can be used to choose variables for modeling. We find that the structural equation model (SEM) approach gives more accurate evaluations compared to traditional approaches, particularly as the range of possible causal mechanisms grows. Further, the SEM approach learns low-variance policies, and it learns an optimal policy,…
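The abstract mentions using conditional independence testing to choose variables for modeling. As a rough illustration, and not the specific test used in the paper, the sketch below applies a partial-correlation test with a Fisher z-transform, which assumes roughly linear-Gaussian relationships. The synthetic variables and the helper names `pearson` and `partial_corr_test` are assumptions made for this example. Here Z drives both X and the reward Y, so X should look irrelevant once Z is conditioned on.

```python
import math
import random
from statistics import NormalDist, fmean


def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def partial_corr_test(x, y, z):
    """Test X independent of Y given a single variable Z.

    Returns (partial correlation, two-sided p-value) using the Fisher
    z-transform with one conditioning variable.
    """
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    r = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    n = len(x)
    stat = math.atanh(r) * math.sqrt(n - 1 - 3)  # |conditioning set| = 1
    p_value = 2 * (1 - NormalDist().cdf(abs(stat)))
    return r, p_value


# Synthetic data (hypothetical): Z -> X and Z -> Y, no direct X -> Y edge.
rng = random.Random(0)
n = 2000
z = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [2 * zi + rng.gauss(0, 1) for zi in z]

r_x, p_x = partial_corr_test(x, y, z)  # X vs reward, given Z
r_z, p_z = partial_corr_test(z, y, x)  # Z vs reward, given X
print(f"X given Z: r={r_x:.3f}, p={p_x:.3g}")
print(f"Z given X: r={r_z:.3f}, p={p_z:.3g}")
```

A variable whose partial correlation with the reward stays near zero given the others (X here) is a candidate to leave out of the model, while one that remains strongly dependent (Z here) should be kept.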
