Robustly Improving Bandit Algorithms with Confounded and Selection Biased Offline Data: A Causal Approach

Wen Huang; Xintao Wu

arXiv:2312.12731 · cs.LG · December 21, 2023 · 1 citation

TL;DR

This paper introduces a causal framework that lets bandit algorithms use offline data affected by confounding and selection bias. It derives bias-robust bounds on each arm's mean reward and uses them to guide online learning and reduce regret.

Contribution

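The paper formulates the problem causally, categorizes the biases in offline data as confounding bias and selection bias, and derives per-arm causal bounds that remain valid under both, so that bandit algorithms can use the offline data to learn near-optimal policies.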
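
As a point of reference for what a bias-robust bound can look like, the block below states the classic natural (Manski-type) bound for confounding alone, assuming rewards $Y \in [0, 1]$ and an offline distribution $P(A, Y)$ over arms $A$ and rewards. This is a standard textbook result offered as an illustration. It is not the bound derived in the paper, which also accounts for selection bias.

```latex
% Decompose the interventional mean over the observed arm choice:
%   E[Y | do(A=a)] = E[Y | A=a] P(A=a) + E[Y_a | A \neq a] (1 - P(A=a)),
% where Y_a is the potential reward under arm a.
% The second term is unidentified, but 0 <= E[Y_a | A \neq a] <= 1, hence:
\[
  \mathbb{E}[Y \mid A = a]\, P(A = a)
  \;\le\;
  \mathbb{E}[Y \mid do(A = a)]
  \;\le\;
  \mathbb{E}[Y \mid A = a]\, P(A = a) + 1 - P(A = a).
\]
```

The interval always contains the true mean reward of arm $a$, and its width $1 - P(A = a)$ shrinks as the offline policy plays arm $a$ more often.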

Findings

1. Derived causal bounds effectively guide policy learning.
2. Incorporating bounds reduces asymptotic regret.
3. Framework applicable to both contextual and non-contextual bandits.

Abstract

This paper studies bandit problems where an agent has access to offline data that may be used to improve the estimation of each arm's reward distribution. A major obstacle in this setting is the existence of compound biases in the observational data. Ignoring these biases and blindly fitting a model to the biased data can even harm the online learning phase. In this work, we formulate this problem from a causal perspective. First, we categorize the biases into confounding bias and selection bias based on the causal structure they imply. Next, we extract from the biased observational data a causal bound for each arm that is robust to compound biases. The derived bounds contain the ground truth mean reward and can effectively guide the bandit agent to learn a nearly optimal decision policy. We also conduct regret analysis in both contextual and…
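
To make the idea of bounds guiding a bandit agent concrete, here is a minimal sketch of a UCB1 variant whose index is clipped to a per-arm interval assumed to contain the true mean reward. It is a generic clipped-UCB illustration, not the algorithm from the paper. The function name `clipped_ucb`, the Bernoulli reward model, and the example bounds are all assumptions made for this sketch.

```python
import math
import random


def clipped_ucb(true_means, bounds, horizon, seed=0):
    """Run UCB1 on Bernoulli arms, clipping each index to a prior interval.

    true_means: hypothetical ground-truth means, used only to simulate rewards.
    bounds: list of (lower, upper) intervals assumed to contain each true mean.
    Returns the number of pulls per arm.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k

    # An arm whose upper bound is below another arm's lower bound
    # cannot be optimal, so it is never played.
    best_lower = max(lo for lo, _ in bounds)
    active = [a for a in range(k) if bounds[a][1] >= best_lower]

    def index(a, t):
        lo, hi = bounds[a]
        if counts[a] == 0:
            return hi  # unpulled arm: unbounded UCB clipped to the upper bound
        ucb = sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        return min(max(ucb, lo), hi)  # keep the index inside the prior interval

    for t in range(1, horizon + 1):
        arm = max(active, key=lambda a: index(a, t))
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts
```

For example, calling `clipped_ucb([0.2, 0.7, 0.5], [(0.0, 0.3), (0.5, 0.9), (0.2, 0.6)], horizon=2000)` excludes the first arm before any online pull, because its upper bound of 0.3 lies below the second arm's lower bound of 0.5. Tighter intervals remove more exploration, while the trivial interval (0, 1) for every arm leaves plain UCB1 unchanged.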

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Cognitive Radio Networks and Spectrum Sensing