Causal Bandits: Learning Good Interventions via Causal Inference
Finnian Lattimore, Tor Lattimore, Mark D. Reid

TL;DR
This paper introduces a causal bandit framework that uses causal inference to speed up the learning of good interventions in stochastic environments, achieving lower simple regret than algorithms that ignore the causal information.
Contribution
It presents a novel algorithm that exploits causal feedback in bandit problems and provides theoretical guarantees showing improved regret bounds over existing approaches.
Findings
The proposed algorithm achieves lower simple regret than non-causal methods.
Theoretical regret bounds demonstrate the advantage of using causal information.
Empirical results confirm the effectiveness of the causal bandit approach.
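For readers unfamiliar with the term, simple regret can be written as below. This is the standard definition from the best-arm-identification literature, not a formula quoted from this page; the symbols $\mu_a$, $\mu^*$ and $\hat{a}_T$ are notation assumed here for illustration.

```latex
% Assumed notation: \mu_a is the expected reward of intervention (arm) a,
% \mu^* is the best achievable expected reward, and \hat{a}_T is the
% intervention the learner recommends after T rounds of exploration.
\[
  \mu^* = \max_{a} \mu_a ,
  \qquad
  R_T = \mu^* - \mathbb{E}\left[ \mu_{\hat{a}_T} \right]
\]
```

Unlike cumulative regret, this quantity only measures the quality of the final recommendation, so rewards collected during exploration do not count against the learner.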
Abstract
We study the problem of using causal models to improve the rate at which good interventions can be learned online in a stochastic environment. Our formalism combines multi-arm bandits and causal inference to model a novel type of bandit feedback that is not exploited by existing approaches. We propose a new algorithm that exploits the causal feedback and prove a bound on its simple regret that is strictly better (in all quantities) than algorithms that do not use the additional causal information.
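The kind of extra feedback the abstract refers to can be illustrated with a toy sketch. It is not the paper's algorithm: the reward model, the function names `sample_reward` and `estimate_from_observation`, and the assumption that the variables are independent, unconfounded binary causes of the reward are all hypothetical choices made for this example. Under those assumptions, the conditional mean of the reward given `X_i = j` matches the expected reward of the intervention that sets `X_i` to `j`, so a single observed round updates the estimate of one arm per variable.

```python
import random


def sample_reward(x, rng):
    # Hypothetical reward model: Y is Bernoulli and depends only on x[0].
    p = 0.8 if x[0] == 1 else 0.3
    return 1 if rng.random() < p else 0


def estimate_from_observation(n_vars, n_rounds, seed=0):
    """Estimate the reward of every arm (i, j), meaning 'set X_i to j',
    from purely observational rounds.

    Assumes the X_i are independent fair coins with no confounding, so
    E[Y | X_i = j] equals the expected reward under that intervention.
    """
    rng = random.Random(seed)
    sums = {(i, j): 0 for i in range(n_vars) for j in (0, 1)}
    counts = {(i, j): 0 for i in range(n_vars) for j in (0, 1)}
    for _ in range(n_rounds):
        # Observe all variables and the reward without intervening.
        x = [rng.randint(0, 1) for _ in range(n_vars)]
        y = sample_reward(x, rng)
        # One round contributes a sample to n_vars different arms.
        for i, xi in enumerate(x):
            sums[(i, xi)] += y
            counts[(i, xi)] += 1
    return {arm: sums[arm] / counts[arm] for arm in sums if counts[arm] > 0}


estimates = estimate_from_observation(n_vars=5, n_rounds=4000)
best_arm = max(estimates, key=estimates.get)
print("recommended intervention:", best_arm)
```

A standard bandit learner would receive feedback on only one arm per round, whereas here every round informs half of the arms. This sketch sidesteps the harder case where some values of `X_i` are rarely observed and would have to be explored by explicit intervention.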
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics
Methods: Causal inference
