Online Semi-Supervised Learning in Contextual Bandits with Episodic Reward
Baihan Lin

TL;DR
This paper introduces BerlinUCB, an online semi-supervised learning algorithm for contextual bandits with episodically revealed rewards. It targets settings where contexts are nonstationary across episodes and reward feedback is sometimes missing, and it is evaluated on several datasets in six scenarios.
Contribution
The paper presents BerlinUCB (Background Episodic Reward LinUCB), a contextual bandit algorithm that incorporates clustering as a self-supervision module to provide side information when rewards are not observed.
Findings
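BerlinUCB shows clear advantages over the standard contextual bandit across six scenarios.
The advantage holds in both stationary and nonstationary environments, including episodes where reward feedback is not revealed.
The experiments cover a variety of datasets, and the paper closes with a real-life example where this problem setting is especially useful.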
Abstract
We considered a novel practical problem of online learning with episodically revealed rewards, motivated by several real-world applications, where the contexts are nonstationary over different episodes and the reward feedback is not always available to the decision-making agents. For this online semi-supervised learning setting, we introduced Background Episodic Reward LinUCB (BerlinUCB), a solution that easily incorporates clustering as a self-supervision module to provide useful side information when rewards are not observed. Our experiments on a variety of datasets, in both stationary and nonstationary environments across six different scenarios, demonstrated clear advantages of the proposed approach over the standard contextual bandit. Lastly, we introduced a relevant real-life example where this problem setting is especially useful.
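The abstract does not say how the clustering module feeds into the bandit update, so the following is only a minimal sketch of one possible reading, not the paper's algorithm. It assumes a per-arm LinUCB learner, an online k-means clusterer over contexts, and a rule that replaces a missing reward with the mean reward previously observed for the same arm in the same cluster. The class name `ClusterAssistedLinUCB`, the `n_clusters` and `alpha` parameters, and the imputation rule are all hypothetical choices made for illustration.

```python
import numpy as np


class ClusterAssistedLinUCB:
    """LinUCB with a hypothetical cluster-based reward imputation step.

    Assumption: this is NOT the published BerlinUCB update rule; it only
    illustrates clustering as side information when a reward is missing.
    """

    def __init__(self, n_arms, dim, n_clusters=3, alpha=1.0, seed=0):
        self.alpha = alpha
        # Standard LinUCB statistics: one design matrix and one vector per arm.
        self.A = np.stack([np.eye(dim) for _ in range(n_arms)])
        self.b = np.zeros((n_arms, dim))
        # Online k-means state (assumed clustering choice).
        rng = np.random.default_rng(seed)
        self.centroids = rng.normal(size=(n_clusters, dim))
        self.counts = np.zeros(n_clusters)
        # Per-cluster, per-arm running reward statistics used for imputation.
        self.reward_sum = np.zeros((n_clusters, n_arms))
        self.reward_cnt = np.zeros((n_clusters, n_arms))

    def select(self, x):
        """Pick the arm with the largest upper confidence bound."""
        scores = []
        for A_a, b_a in zip(self.A, self.b):
            A_inv = np.linalg.inv(A_a)
            theta = A_inv @ b_a
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def _assign(self, x):
        """Assign x to its nearest centroid and move that centroid toward x."""
        k = int(np.argmin(np.linalg.norm(self.centroids - x, axis=1)))
        self.counts[k] += 1
        self.centroids[k] += (x - self.centroids[k]) / self.counts[k]
        return k

    def update(self, x, arm, reward=None):
        """Update with an observed reward, or impute one when reward is None."""
        k = self._assign(x)
        if reward is None:
            if self.reward_cnt[k, arm] == 0:
                return  # no history for this cluster and arm: skip the update
            reward = self.reward_sum[k, arm] / self.reward_cnt[k, arm]
        else:
            self.reward_sum[k, arm] += reward
            self.reward_cnt[k, arm] += 1
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

In use, an agent would call `select(x)` on each incoming context and then `update(x, arm, reward)`, passing `reward=None` in episodes where feedback is withheld. Contexts still refine the clusters in those episodes, which is the sense in which the unlabeled rounds contribute information in this sketch.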
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Smart Grid Energy Management
