Self-Supervised Contextual Bandits in Computer Vision
Aniket Anand Deshmukh, Abhimanu Kumar, Levi Boyles, Denis Charles, Eren Manavoglu, Urun Dogan

TL;DR
This paper introduces a novel method combining self-supervised learning with contextual bandits to improve reward optimization in computer vision tasks, demonstrating significant gains across multiple datasets.
Contribution
It proposes a new approach that integrates self-supervision into contextual bandit algorithms, addressing the lack of implicit labels in early learning stages.
Findings
Substantial improvements in cumulative reward on eight datasets
Identification of cases where the method underperforms, along with alternative solutions for them
Enhanced data representation learning for better decision-making
Abstract
Contextual bandits are a common problem faced by machine learning practitioners in domains ranging from hypothesis testing to product recommendations. There have been many approaches to exploiting rich data representations for contextual bandit problems, with varying degrees of success. Self-supervised learning is a promising approach to finding rich data representations without explicit labels. In a typical self-supervised learning scheme, the primary task is defined by the problem objective (e.g. clustering, classification, embedding generation, etc.) and the secondary task is defined by the self-supervision objective (e.g. rotation prediction, words in neighborhood, colorization, etc.). In the usual self-supervision setting, we learn implicit labels from the training data for a secondary task. However, in the contextual bandit setting, we don't have the advantage of getting implicit labels…
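To make the idea of "implicit labels for a secondary task" concrete, here is a minimal sketch of the rotation-prediction objective named in the abstract: every unlabeled image is rotated by 0, 90, 180 and 270 degrees, and the rotation index becomes a free label. It assumes square images stored as NumPy arrays, and the function name `make_rotation_task` is hypothetical. This is a generic illustration of rotation self-supervision, not the authors' implementation or their bandit algorithm.

```python
import numpy as np


def make_rotation_task(images):
    """Build a self-supervised rotation-prediction dataset (hypothetical helper).

    images: array of shape (N, H, W) or (N, H, W, C) with H == W (square images),
            so every rotation keeps the same shape and can be stacked.
    Returns (rotated_images, labels), where label k means "rotated by k * 90
    degrees counterclockwise". No human annotation is needed: the labels are
    implicit in how the data was transformed.
    """
    rotated, labels = [], []
    for img in images:
        if img.shape[0] != img.shape[1]:
            raise ValueError("this sketch assumes square images (H == W)")
        for k in range(4):
            # np.rot90 rotates counterclockwise in the plane of the first two axes
            rotated.append(np.rot90(img, k=k, axes=(0, 1)))
            labels.append(k)
    return np.stack(rotated), np.array(labels)


# Usage: two tiny 3x3 "images" become eight rotated copies with labels 0..3.
images = np.arange(2 * 3 * 3).reshape(2, 3, 3)
x, y = make_rotation_task(images)
print(x.shape, y.tolist())
```

A network trained to predict `y` from `x` must learn orientation-sensitive features, which is why such a secondary task can yield useful representations for the primary task.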
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning
