Contextual Bandits with Cross-learning
Santiago Balseiro, Negin Golrezaei, Mohammad Mahdian, Vahab Mirrokni, Jon Schneider

TL;DR
This paper introduces algorithms for a new variant of the contextual bandits problem with cross-learning, achieving improved regret bounds especially when rewards for all contexts are learned simultaneously, and demonstrates their effectiveness on real auction data.
Contribution
The paper proposes novel algorithms for contextual bandits with cross-learning that reduce the regret's dependence on the number of contexts, and it analyzes partial cross-learning scenarios.
Findings
Achieves regret $\tilde{O}(\sqrt{KT})$ under complete cross-learning.
Outperforms traditional algorithms on real auction data.
Provides theoretical analysis for partial cross-learning cases.
Abstract
In the classical contextual bandits problem, in each round $t$, a learner observes some context $c$, chooses some action $i$ to perform, and receives some reward $r_{i,t}(c)$. We consider the variant of this problem where, in addition to receiving the reward $r_{i,t}(c)$, the learner also learns the values of $r_{i,t}(c')$ for some other contexts $c'$ in set $\mathcal{O}_i(c)$; i.e., the rewards that would have been achieved by performing that action under different contexts $c' \in \mathcal{O}_i(c)$. This variant arises in several strategic settings, such as learning how to bid in non-truthful repeated auctions, which has gained a lot of attention lately as many platforms have switched to running first-price auctions. We call this problem the contextual bandits problem with cross-learning. The best algorithms for the classical contextual bandits problem achieve …
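The feedback model in the abstract can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the paper's algorithm as specified: it assumes complete cross-learning (playing an action reveals its reward under every context), contexts drawn uniformly at random, and a standard UCB1-style index swapped in as the selection rule. The names `ucb_cross_learning`, `reward_fn`, and `first_price_reward` are hypothetical.

```python
import math
import random


def ucb_cross_learning(reward_fn, num_actions, contexts, horizon, seed=0):
    """UCB1-style learner under complete cross-learning (illustrative sketch).

    reward_fn(action, context, u) -> reward; `u` is per-round randomness in
    [0, 1) shared by all contexts (hypothetical signature, an assumption).
    """
    rng = random.Random(seed)
    pulls = [0] * num_actions
    # sums[i][c]: running sum of rewards observed for action i under context c.
    sums = [{c: 0.0 for c in contexts} for _ in range(num_actions)]
    total = 0.0

    for t in range(1, horizon + 1):
        c = rng.choice(contexts)  # assumption: contexts are drawn uniformly
        u = rng.random()          # shared randomness for this round

        def index(i, c=c, t=t):
            if pulls[i] == 0:
                return float("inf")  # try every action once
            mean = sums[i][c] / pulls[i]
            return mean + math.sqrt(2.0 * math.log(t) / pulls[i])

        a = max(range(num_actions), key=index)

        # Cross-learning feedback: the reward of action `a` is revealed for
        # every context, so one pull updates all of that action's estimates.
        for cp in contexts:
            sums[a][cp] += reward_fn(a, cp, u)
        pulls[a] += 1
        total += reward_fn(a, c, u)

    return total, pulls


def first_price_reward(action, context, u, bids=(0.2, 0.4, 0.6)):
    """Hypothetical first-price auction: context = value, action = bid index."""
    bid = bids[action]
    # Win (and pay the bid) when the bid is at least the competing bid `u`.
    return context - bid if bid >= u else 0.0


if __name__ == "__main__":
    total, pulls = ucb_cross_learning(
        first_price_reward, num_actions=3, contexts=[0.5, 0.7, 0.9], horizon=2000
    )
    print(total, pulls)
```

Because each pull updates the estimates for all contexts at once, the exploration bonus depends only on how often an action was played, not on how often each (action, context) pair was seen. That is the intuition behind removing the dependence on the number of contexts in this setting.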
