Compliance-Aware Bandits

Nicolás Della Penna; Mark D. Reid; David Balduzzi

arXiv:1602.02852·stat.ML·February 10, 2016·1 cites

TL;DR

This paper studies how observable non-compliance affects learning in bandit problems. It proposes hybrid algorithms that incorporate compliance information while keeping regret bounds up to a multiplicative factor, and evaluates them in simulations based on real clinical trial data.

Contribution

The paper introduces hybrid algorithms that use compliance information in bandit settings while preserving regret guarantees up to a multiplicative factor, and shows their practical potential in simulations based on real trial data.

Findings

01. Non-compliance can either help or hurt the learner.

02. Naively incorporating compliance information into bandit algorithms loses the guarantee of sublinear regret.

03. Hybrid algorithms maintain regret bounds up to a multiplicative factor and show practical potential in simulations based on real data.

Abstract

Motivated by clinical trials, we study bandits with observable non-compliance. At each step, the learner chooses an arm and then, instead of observing only the reward, also observes the action that actually took place. We show that, in general, such non-compliance can either help or hurt the learner. Unfortunately, naively incorporating compliance information into bandit algorithms loses guarantees on sublinear regret. We present hybrid algorithms that can incorporate compliance information while maintaining regret bounds up to a multiplicative factor. Simulations based on real data from the International Stroke Trial show the practical potential of these algorithms.
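The interaction protocol described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's algorithm or experimental setup: the two-arm setting, the names `pull` and `run`, the compliance probability, and the reward means are all hypothetical, and compliance is assumed to be independent of the outcome, which need not hold in a real trial. It only shows that crediting the reward to the chosen arm and crediting it to the action that actually took place give different estimates.

```python
import random


def pull(chosen, rng, comply_prob=0.7, means=(0.3, 0.6)):
    """One round: the learner picks `chosen`; the subject may take the other arm.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    actual = chosen if rng.random() < comply_prob else 1 - chosen
    reward = 1 if rng.random() < means[actual] else 0
    return actual, reward


def run(rounds=5000, seed=0):
    rng = random.Random(seed)
    # [reward_sum, count] per arm, under two ways of crediting the reward.
    by_chosen = [[0, 0], [0, 0]]
    by_actual = [[0, 0], [0, 0]]
    for t in range(rounds):
        chosen = t % 2  # alternate arms; a stand-in for a real bandit policy
        actual, reward = pull(chosen, rng)
        for table, arm in ((by_chosen, chosen), (by_actual, actual)):
            table[arm][0] += reward
            table[arm][1] += 1

    def means_of(table):
        return [total / count for total, count in table]

    return means_of(by_chosen), means_of(by_actual)


if __name__ == "__main__":
    chosen_means, actual_means = run()
    print("credited to chosen arm:", chosen_means)
    print("credited to actual arm:", actual_means)
```

Under these assumed parameters, the estimates credited to the chosen arm concentrate around 0.39 and 0.51 (each a mixture of both treatments), while the estimates credited to the actual action concentrate around the treatment means 0.3 and 0.6. When compliance instead depends on the outcome, the second set of estimates can be misleading, which is consistent with the abstract's point that compliance information can help or hurt.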

Taxonomy

Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Reinforcement Learning in Robotics