Regret Analysis of Bandit Problems with Causal Background Knowledge
Yangyi Lu, Amirhossein Meisami, Ambuj Tewari, Zhenyu Yan

TL;DR
This paper introduces causal bandit algorithms that use a known causal graph to learn more efficiently, achieving lower cumulative regret bounds than algorithms that ignore causal information and scaling better with the number of interventions under certain causal structures.
Contribution
The paper proposes two algorithms, causal UCB (C-UCB) and causal Thompson Sampling (C-TS), with theoretical regret guarantees, extends them to the linear bandit setting as CL-UCB and CL-TS, and demonstrates their effectiveness in experiments.
Findings
In experiments, the causal algorithms achieve lower regret than standard bandit algorithms within hundreds of iterations.
The proposed algorithms scale better with the number of interventions under certain causal structures.
In the linear setting, CL-UCB and CL-TS achieve cumulative regret bounds that scale only with the feature dimension.
Abstract
We study how to learn optimal interventions sequentially given causal information represented as a causal graph along with associated conditional distributions. Causal modeling is useful in real-world problems like online advertisement where complex causal mechanisms underlie the relationship between interventions and outcomes. We propose two algorithms, causal upper confidence bound (C-UCB) and causal Thompson Sampling (C-TS), that enjoy improved cumulative regret bounds compared with algorithms that do not use causal information. We thus resolve an open problem posed by \cite{lattimore2016causal}. Further, we extend C-UCB and C-TS to the linear bandit setting and propose causal linear UCB (CL-UCB) and causal linear TS (CL-TS) algorithms. These algorithms enjoy a cumulative regret bound that only scales with the feature dimension. Our experiments show the benefit of using causal…
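To make the idea in the abstract concrete, here is a minimal sketch of a causal-UCB-style loop. It is an illustration under stated assumptions, not the authors' implementation. It assumes the reward depends on the intervention only through a discrete variable Z (for example, the parents of the reward node in the causal graph), that the conditional distributions P(Z | do(a)) are known, and that rewards are Bernoulli. The names `c_ucb`, `p_z_given_a`, `mean_reward_z`, and `delta`, as well as the exact form of the confidence bonus, are hypothetical choices made for this example.

```python
import numpy as np


def c_ucb(p_z_given_a, mean_reward_z, horizon, delta=0.01, seed=0):
    """Illustrative causal-UCB-style loop (assumed setup, not the paper's code).

    p_z_given_a:   (n_actions, n_z) array; row a is P(Z = z | do(a)), assumed known.
    mean_reward_z: (n_z,) array of true E[Y | Z = z]; used only to simulate rewards.
    Returns the cumulative (pseudo-)regret and the visit count of each Z value.
    """
    rng = np.random.default_rng(seed)
    n_actions, n_z = p_z_given_a.shape
    counts = np.zeros(n_z)  # number of times each Z value was observed
    sums = np.zeros(n_z)    # total reward observed under each Z value

    # True expected reward of each intervention, used only to measure regret.
    true_action_means = p_z_given_a @ mean_reward_z
    best = true_action_means.max()
    regret = 0.0

    for _ in range(horizon):
        # Confidence bounds are kept per Z value, not per intervention.
        safe_counts = np.maximum(counts, 1)
        ucb_z = sums / safe_counts + np.sqrt(2 * np.log(1 / delta) / safe_counts)

        # Score each intervention by averaging the Z-level bounds under P(Z | do(a)).
        a = int(np.argmax(p_z_given_a @ ucb_z))

        # Simulate the environment: draw Z given the intervention, then a reward.
        z = rng.choice(n_z, p=p_z_given_a[a])
        y = rng.binomial(1, mean_reward_z[z])

        counts[z] += 1
        sums[z] += y
        regret += best - true_action_means[a]

    return regret, counts


if __name__ == "__main__":
    # Two interventions, two values of Z (toy numbers chosen for illustration).
    p = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
    mu = np.array([0.2, 0.8])
    total_regret, z_counts = c_ucb(p, mu, horizon=500)
    print("cumulative regret:", total_regret)
    print("observations per Z value:", z_counts)
```

The design point this sketch tries to show is that every round updates the estimate for the observed value of Z, and that estimate is shared by all interventions through the known conditionals. This sharing is one way the number of quantities to learn can depend on the size of Z rather than on the number of interventions.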
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics
Methods: Spatio-temporal stability analysis
