Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward

Baihan Lin

arXiv:2009.08457 · cs.LG · October 27, 2020

TL;DR

This paper introduces BerlinUCB, an online semi-supervised learning algorithm for contextual bandits with episodically revealed rewards. It targets settings where contexts are nonstationary across episodes and reward feedback is sometimes missing, and it is evaluated on a variety of datasets in six scenarios.

Contribution

The paper presents BerlinUCB, a novel algorithm that integrates clustering for self-supervision in online semi-supervised contextual bandits with episodic rewards.

Findings

01

BerlinUCB shows clear advantages over the standard contextual bandit across six scenarios, in both stationary and nonstationary environments.

02

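When rewards are not observed, a clustering module acts as self-supervision and supplies side information, which lets the method cope with nonstationary contexts and missing reward feedback.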

03

Experiments on a variety of datasets show clear advantages over the standard contextual bandit.

Abstract

We considered a novel practical problem of online learning with episodically revealed rewards, motivated by several real-world applications, where the contexts are nonstationary over different episodes and reward feedback is not always available to the decision-making agent. For this online semi-supervised learning setting, we introduced Background Episodic Reward LinUCB (BerlinUCB), a solution that easily incorporates clustering as a self-supervision module to provide useful side information when rewards are not observed. Our experiments on a variety of datasets, in both stationary and nonstationary environments across six different scenarios, demonstrated clear advantages of the proposed approach over the standard contextual bandit. Lastly, we introduced a relevant real-life example where this problem setting is especially useful.
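
To make the setting in the abstract concrete, here is a minimal Python sketch of a LinUCB agent that falls back on clustering when the reward is not revealed. It is an illustration under stated assumptions, not the paper's BerlinUCB update rule: the class name `ClusterAssistedLinUCB`, the online k-means module, and the pseudo-reward imputed from per-cluster reward averages are all hypothetical choices made for this example. The official repository listed below is the reference for the actual algorithm.

```python
import numpy as np


class ClusterAssistedLinUCB:
    """Illustrative LinUCB variant; NOT the paper's exact BerlinUCB update."""

    def __init__(self, n_arms, dim, n_clusters=3, alpha=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.alpha = alpha
        # Per-arm ridge-regression statistics, as in standard LinUCB.
        self.A = np.stack([np.eye(dim) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, dim))
        # Online k-means state (assumed self-supervision module).
        self.centroids = rng.normal(size=(n_clusters, dim))
        self.counts = np.zeros(n_clusters)
        # Running reward sums and counts per (cluster, arm) pair.
        self.r_sum = np.zeros((n_clusters, n_arms))
        self.r_cnt = np.zeros((n_clusters, n_arms))

    def select(self, x):
        """Pick the arm with the highest upper confidence bound."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def _assign_cluster(self, x):
        """Assign x to its nearest centroid and move that centroid toward x."""
        c = int(np.argmin(np.linalg.norm(self.centroids - x, axis=1)))
        self.counts[c] += 1
        self.centroids[c] += (x - self.centroids[c]) / self.counts[c]
        return c

    def update(self, x, arm, reward=None):
        """Update with a revealed reward, or with a cluster-based pseudo-reward."""
        c = self._assign_cluster(x)
        if reward is not None:
            self.r_sum[c, arm] += reward
            self.r_cnt[c, arm] += 1
        elif self.r_cnt[c, arm] > 0:
            # Assumption: impute the mean reward seen for this (cluster, arm).
            reward = self.r_sum[c, arm] / self.r_cnt[c, arm]
        else:
            return  # no side information yet, so skip the bandit update
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    agent = ClusterAssistedLinUCB(n_arms=3, dim=4)
    for _ in range(200):
        x = rng.normal(size=4)
        arm = agent.select(x)
        revealed = rng.random() < 0.5  # reward is revealed only sometimes
        reward = float(arm == int(np.argmax(x[:3]))) if revealed else None
        agent.update(x, arm, reward)
```

The synthetic loop at the end reveals the reward on roughly half of the steps. On the remaining steps the agent still refines its clusters and, once a (cluster, arm) pair has any observed reward, updates the bandit with the imputed value.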

Code & Models

Repositories

doerlbh/BerlinUCB
none · Official

Taxonomy

Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Smart Grid Energy Management