Causal Bandit Over Unknown Graphs: Upper Confidence Bounds With Backdoor Adjustment
Yijia Zhao, Qing Zhou

TL;DR
This paper introduces BA-UCB, an algorithm for causal bandit problems in which the underlying DAG is unknown. It uses backdoor adjustment to estimate causal effects, which guides intervention selection and supports improved regret bounds.
Contribution
The paper combines observational and experimental data, collected sequentially during the bandit process, to identify candidate backdoor adjustment sets for each intervention arm. These sets allow causal effects to be estimated without a known causal graph.
Findings
BA-UCB achieves lower cumulative regret than existing methods.
Regret bounds are established theoretically, with a relaxed dependence on the intervention arms.
Simulation results show improved efficiency and accuracy in unknown causal graph settings.
Abstract
The causal bandit problem seeks to identify, through sequential experimentation, an intervention that maximizes the expected reward in a causal system modeled by a directed acyclic graph (DAG). Existing methods typically assume that the causal graph is known or impose restrictive structural assumptions. In this paper, we study causal bandit problems when the causal graph is unknown. We first consider Gaussian DAG models without latent confounders. By combining observational and experimental data collected sequentially during the bandit process, we identify candidate backdoor adjustment sets for each intervention arm. These sets enable estimation of causal effects and construction of upper confidence bounds that integrate information from both data sources. Based on these estimates, we propose a new algorithm, termed backdoor-adjustment upper confidence bound (BA-UCB), for sequential…
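The backdoor adjustment idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' BA-UCB algorithm: it assumes a hypothetical three-variable linear Gaussian model (Z -> X, Z -> Y, X -> Y) in which the adjustment set {Z} is already known, whereas the paper identifies candidate sets from data. The function names and coefficient values are illustrative assumptions. In a linear Gaussian model, regressing Y on X together with a valid backdoor set recovers the causal effect of X on Y, while regressing on X alone is biased by the confounder.

```python
import numpy as np

def simulate(n, beta_zx=0.8, beta_zy=1.5, beta_xy=2.0, seed=0):
    """Draw n samples from a toy linear Gaussian DAG: Z -> X, Z -> Y, X -> Y.

    The coefficients are hypothetical; beta_xy is the causal effect of X on Y.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)
    x = beta_zx * z + rng.normal(size=n)
    y = beta_xy * x + beta_zy * z + rng.normal(size=n)
    return z, x, y

def ols_slope(y, *regressors):
    """Return the OLS coefficient on the first regressor (intercept included)."""
    design = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]

z, x, y = simulate(20000)

# Naive estimate: ignores the confounder Z, so it is biased.
naive = ols_slope(y, x)

# Backdoor-adjusted estimate: conditions on Z, which blocks X <- Z -> Y.
adjusted = ols_slope(y, x, z)

print(f"naive estimate:    {naive:.3f}")
print(f"adjusted estimate: {adjusted:.3f}")
```

With the confounder left out, the naive slope absorbs part of the Z -> Y effect; adding Z to the regression brings the estimate close to the true coefficient of 2.0 used in the simulation.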
