Selectively Contextual Bandits
Claudia Roberts, Maria Dimakopoulou, Qifeng Qiao, Ashok Chandrashekhar, Tony Jebara

TL;DR
This paper introduces a hybrid online learning algorithm for contextual bandits that balances personalized treatment with shared community experiences, leveraging context only when it offers significant gains.
Contribution
It proposes a novel selective interpolation method between contextual and context-free bandits, improving personalization and community benefits while simplifying treatment policies.
Findings
The hybrid policy balances personalized treatment against a shared experience across users.
Relying on context selectively speeds up learning.
The method is effective on public classification datasets.
Abstract
Contextual bandits are widely used in industrial personalization systems. These online learning frameworks learn a treatment assignment policy in the presence of treatment effects that vary with the observed contextual features of the users. While personalization creates a rich user experience that reflects individual interests, there are benefits of a shared experience across a community that enable participation in the zeitgeist. Such benefits are emergent through network effects and are not captured in regret metrics typically employed in evaluating bandits. To balance these needs, we propose a new online learning algorithm that preserves benefits of personalization while increasing the commonality in treatments across users. Our approach selectively interpolates between a contextual bandit algorithm and a context-free multi-arm bandit and leverages the contextual information for a…
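The abstract is truncated here, so the exact interpolation rule is not available. The following is only a minimal sketch of the general idea of falling back to a shared, context-free choice unless context appears to offer a clear gain. It is not the authors' algorithm. It assumes discrete contexts, running-mean reward estimates, and epsilon-greedy exploration, and the names `SelectiveBandit`, `gain_threshold`, and `min_pulls` are hypothetical.

```python
import random
from collections import defaultdict


class SelectiveBandit:
    """Illustrative sketch only (assumed design, not the paper's method)."""

    def __init__(self, n_arms, gain_threshold=0.1, min_pulls=20,
                 epsilon=0.1, seed=0):
        self.n_arms = n_arms
        self.gain_threshold = gain_threshold  # hypothetical: required gain
        self.min_pulls = min_pulls            # hypothetical: evidence needed
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Context-free statistics, shared by all users.
        self.global_n = [0] * n_arms
        self.global_mean = [0.0] * n_arms
        # Per-context statistics (contexts assumed discrete and hashable).
        self.ctx_n = defaultdict(lambda: [0] * n_arms)
        self.ctx_mean = defaultdict(lambda: [0.0] * n_arms)

    def select(self, context):
        """Return (arm, mode), mode in {'explore', 'shared', 'contextual'}."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms), "explore"
        arms = range(self.n_arms)
        shared = max(arms, key=lambda a: self.global_mean[a])
        n, mean = self.ctx_n[context], self.ctx_mean[context]
        personal = max(arms, key=lambda a: mean[a])
        enough = n[personal] >= self.min_pulls and n[shared] >= self.min_pulls
        # Personalize only when this context's estimates show a clear gain
        # over the shared arm; otherwise keep the common treatment.
        if enough and mean[personal] - mean[shared] > self.gain_threshold:
            return personal, "contextual"
        return shared, "shared"

    def update(self, context, arm, reward):
        """Update both the shared and the per-context running means."""
        self.global_n[arm] += 1
        self.global_mean[arm] += (reward - self.global_mean[arm]) / self.global_n[arm]
        n, mean = self.ctx_n[context], self.ctx_mean[context]
        n[arm] += 1
        mean[arm] += (reward - mean[arm]) / n[arm]
```

In this sketch, raising `gain_threshold` or `min_pulls` pushes more users onto the shared arm, while lowering them makes the policy behave more like a per-context bandit.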
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Online Learning and Analytics
