Causal Bandits with Unknown Graph Structure
Yangyi Lu, Amirhossein Meisami, Ambuj Tewari

TL;DR
This paper introduces causal bandit algorithms that do not require prior knowledge of the causal graph and, under mild conditions, achieve better regret bounds than standard multi-armed bandit (MAB) algorithms.
Contribution
The paper develops the first causal bandit algorithms that operate without a known causal graph. They apply to causal trees, causal forests, and a general class of causal graphs, with proven regret guarantees.
Findings
The algorithms work for causal trees, causal forests, and a general class of causal graphs.
Under mild conditions, their regret bounds are significantly better than those of standard MAB algorithms.
The mild conditions are shown to be necessary: without them, one cannot do better than standard MAB algorithms.
Abstract
In causal bandit problems, the action set consists of interventions on variables of a causal graph. Several researchers have recently studied such bandit problems and pointed out their practical applications. However, all existing works rely on a restrictive and impractical assumption that the learner is given full knowledge of the causal graph structure upfront. In this paper, we develop novel causal bandit algorithms without knowing the causal graph. Our algorithms work well for causal trees, causal forests and a general class of causal graphs. The regret guarantees of our algorithms greatly improve upon those of standard multi-armed bandit (MAB) algorithms under mild conditions. Lastly, we prove our mild conditions are necessary: without them one cannot do better than standard MAB algorithms.
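To make the setting concrete, the following is a minimal sketch of a causal bandit whose arms are interventions, paired with the kind of standard MAB baseline the abstract compares against. It is not the paper's algorithm. The chain X1 -> X2 -> Y, all probabilities, and the names `sample_reward` and `ucb1` are assumptions chosen only for illustration.

```python
import math
import random

# Hypothetical toy model: a causal chain X1 -> X2 -> Y over binary variables.
# All probabilities below are made-up illustration values.
def sample_reward(intervention, rng):
    """Sample the reward Y after do(variable = value)."""
    var, val = intervention
    x1 = val if var == "X1" else int(rng.random() < 0.5)
    x2 = val if var == "X2" else int(rng.random() < (0.7 if x1 else 0.3))
    return int(rng.random() < (0.8 if x2 else 0.2))

def ucb1(arms, horizon, rng):
    """Standard UCB1 that treats each intervention as an unrelated arm,
    ignoring any causal structure shared between the arms."""
    counts = {a: 0 for a in arms}
    sums = {a: 0.0 for a in arms}
    for t in range(1, horizon + 1):
        if t <= len(arms):
            arm = arms[t - 1]  # play every arm once first
        else:
            arm = max(arms, key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        counts[arm] += 1
        sums[arm] += sample_reward(arm, rng)
    return counts

# The action set: one arm per atomic intervention do(X_i = b).
arms = [(v, b) for v in ("X1", "X2") for b in (0, 1)]
counts = ucb1(arms, horizon=5000, rng=random.Random(0))
```

In this toy model the reward depends on the interventions only through X2, the parent of Y. A baseline like the one above has to explore every intervention separately, which is the cost that exploiting causal structure aims to reduce.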
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics
