Contextual Bandits with Latent Confounders: An NMF Approach

Rajat Sen, Karthikeyan Shanmugam, Murat Kocaoglu, Alexandros G. Dimakis, and Sanjay Shakkottai

arXiv:1606.00119·cs.LG·October 28, 2016·5 cites

Open Access

TL;DR

This paper introduces an NMF-based approach to causal stochastic contextual bandits with latent confounders, achieving improved regret bounds by exploiting low-dimensional structure in the reward matrix.

Contribution

It proposes a novel NMF-based algorithm for contextual bandits with latent confounders, providing the first regret guarantees for online matrix completion with bandit feedback when rank exceeds one.

Findings

01

Achieves regret of $\tilde{O}(L \, \text{poly}(m, \log K) \log T)$

02

Outperforms conventional bandit algorithms in experiments

03

Provides theoretical lower bounds matching upper bounds under mild conditions

Abstract

Motivated by online recommendation and advertising systems, we consider a causal model for stochastic contextual bandits with a latent low-dimensional confounder. In our model, there are $L$ observed contexts and $K$ arms of the bandit. The observed context influences the reward obtained through a latent confounder variable with cardinality $m$ ($m \ll L, K$). The arm choice and the latent confounder causally determine the reward, while the observed context is correlated with the confounder. Under this model, the $L \times K$ mean reward matrix $U$ (for each context in $[L]$ and each arm in $[K]$) factorizes into non-negative factors $A$ ($L \times m$) and $W$ ($m \times K$). This insight enables us to propose an $\epsilon$-greedy NMF-Bandit algorithm that designs a sequence of interventions (selecting specific arms), that achieves a balance between learning…
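The factorization described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm; it only builds a toy instance of the model under stated assumptions: each row of $A$ is read as a probability distribution over the $m$ confounder values given the context, each entry of $W$ is a Bernoulli mean reward for a (confounder, arm) pair, and the helper names `make_model` and `pull` are hypothetical. It then checks empirically that the mean reward for one (context, arm) pair is close to the corresponding entry of $U = AW$.

```python
import random


def make_model(L, K, m, seed=0):
    """Build toy factors A (L x m), W (m x K) and the mean-reward matrix U = A W."""
    rng = random.Random(seed)
    # Assumption: A[l][z] is P(confounder = z | context = l), so each row sums to 1.
    A = []
    for _ in range(L):
        raw = [rng.random() + 1e-9 for _ in range(m)]
        total = sum(raw)
        A.append([x / total for x in raw])
    # Assumption: W[z][k] is the mean reward in [0, 1] of arm k when the confounder is z.
    W = [[rng.random() for _ in range(K)] for _ in range(m)]
    # U[l][k] = sum over z of A[l][z] * W[z][k]
    U = [[sum(A[l][z] * W[z][k] for z in range(m)) for k in range(K)]
         for l in range(L)]
    return A, W, U


def pull(A, W, context, arm, rng):
    """Sample one Bernoulli reward: draw the latent confounder, then the reward."""
    z = rng.choices(range(len(W)), weights=A[context])[0]
    return 1 if rng.random() < W[z][arm] else 0


# Toy sizes chosen for illustration only (m is smaller than both L and K).
A, W, U = make_model(L=6, K=5, m=2)
rng = random.Random(1)
n = 20000
empirical = sum(pull(A, W, 3, 2, rng) for _ in range(n)) / n
print("U[3][2] =", round(U[3][2], 3), "empirical mean =", round(empirical, 3))
```

The learner in this setting never observes the confounder `z`; it sees only the context, the arm it chose, and the reward, which is why the low-rank structure of `U` has to be learned from interventions.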

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Reinforcement Learning in Robotics