Regret Analysis of Bandit Problems with Causal Background Knowledge

Yangyi Lu; Amirhossein Meisami; Ambuj Tewari; Zhenyu Yan

arXiv:1910.04938 · stat.ML · June 12, 2020 · 6 cites

TL;DR

This paper introduces causal bandit algorithms that use causal graph information to learn more efficiently, achieving lower cumulative regret bounds than algorithms that ignore causal information and scaling better with the number of interventions.

Contribution

The paper proposes two algorithms, C-UCB and C-TS, with theoretical regret guarantees, extends them to the linear bandit setting as CL-UCB and CL-TS, and demonstrates their effectiveness through experiments.

Findings

01 The causal algorithms achieve lower regret than standard bandit algorithms within a few hundred iterations.

02 The proposed algorithms scale better with the number of interventions under certain causal structures.

03 In the linear setting, CL-UCB and CL-TS achieve cumulative regret bounds that scale only with the feature dimension.

Abstract

We study how to learn optimal interventions sequentially given causal information represented as a causal graph along with associated conditional distributions. Causal modeling is useful in real-world problems like online advertisement, where complex causal mechanisms underlie the relationship between interventions and outcomes. We propose two algorithms, causal upper confidence bound (C-UCB) and causal Thompson Sampling (C-TS), that enjoy improved cumulative regret bounds compared with algorithms that do not use causal information. We thus resolve an open problem posed by Lattimore et al. (2016). Further, we extend C-UCB and C-TS to the linear bandit setting and propose causal linear UCB (CL-UCB) and causal linear TS (CL-TS) algorithms. These algorithms enjoy a cumulative regret bound that only scales with the feature dimension. Our experiments show the benefit of using causal…
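
To make the idea behind C-UCB concrete, the following is a minimal sketch and not the authors' implementation. It assumes the reward depends on the intervention only through a discrete variable z (the values of the reward's parents in the causal graph), that the conditional distribution P(z | a) is known, and that rewards are Bernoulli. Confidence bounds are kept per value of z rather than per intervention. The function name `c_ucb`, its parameters, and the UCB1-style bonus term are illustrative assumptions; the paper's exact constants may differ.

```python
import math
import random


def c_ucb(p_z_given_a, mean_reward_z, horizon, seed=0):
    """Hypothetical causal-UCB-style loop (illustrative sketch only).

    p_z_given_a[a][z]: known probability of parent value z under intervention a.
    mean_reward_z[z]:  hidden Bernoulli reward mean given z (used to simulate).
    Returns the number of times each intervention was chosen.
    """
    rng = random.Random(seed)
    n_actions = len(p_z_given_a)
    n_z = len(mean_reward_z)
    counts = [0] * n_z    # observations of each parent value z
    sums = [0.0] * n_z    # summed rewards observed with each z
    pulls = [0] * n_actions

    for t in range(1, horizon + 1):
        # Upper confidence bound on E[reward | z], shared by all interventions.
        ucb = []
        for z in range(n_z):
            if counts[z] == 0:
                ucb.append(1.0)  # optimistic start: rewards lie in [0, 1]
            else:
                bonus = math.sqrt(2.0 * math.log(t) / counts[z])  # assumed bonus
                ucb.append(min(1.0, sums[z] / counts[z] + bonus))

        # Score each intervention by averaging the bounds under known P(z | a).
        scores = [
            sum(p * u for p, u in zip(p_z_given_a[a], ucb))
            for a in range(n_actions)
        ]
        a = max(range(n_actions), key=scores.__getitem__)

        # Simulate the environment: draw z from P(z | a), then a reward.
        z = rng.choices(range(n_z), weights=p_z_given_a[a])[0]
        r = 1.0 if rng.random() < mean_reward_z[z] else 0.0

        counts[z] += 1
        sums[z] += r
        pulls[a] += 1

    return pulls


pulls = c_ucb([[0.9, 0.1], [0.1, 0.9]], [0.2, 0.8], horizon=2000)
```

Because every observation updates an estimate indexed by z, data gathered under one intervention also tightens the scores of all other interventions. Under these assumptions, the number of quantities to learn is the number of parent values rather than the number of interventions.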

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics

Methods: Spatio-temporal stability analysis