Contextual Bandits with Cross-learning

Santiago Balseiro; Negin Golrezaei; Mohammad Mahdian; Vahab Mirrokni; Jon Schneider

arXiv:1809.09582·cs.LG·November 17, 2021

TL;DR

This paper introduces algorithms for a new variant of the contextual bandits problem with cross-learning, achieving improved regret bounds especially when rewards for all contexts are learned simultaneously, and demonstrates their effectiveness on real auction data.

Contribution

The paper proposes novel algorithms for contextual bandits with cross-learning, reducing regret dependence on the number of contexts and extending understanding of partial cross-learning scenarios.

Findings

01

Achieves regret $\tilde{O}(\sqrt{KT})$ under complete cross-learning.

02

Outperforms traditional algorithms on real auction data.

03

Provides theoretical analysis for partial cross-learning cases.

Abstract

In the classical contextual bandits problem, in each round $t$, a learner observes some context $c$, chooses some action $i$ to perform, and receives some reward $r_{i,t}(c)$. We consider the variant of this problem where, in addition to receiving the reward $r_{i,t}(c)$, the learner also learns the values of $r_{i,t}(c')$ for some other contexts $c'$ in the set $O_i(c)$; i.e., the rewards that would have been achieved by performing that action under different contexts $c' \in O_i(c)$. This variant arises in several strategic settings, such as learning how to bid in non-truthful repeated auctions, which has gained a lot of attention lately as many platforms have switched to running first-price auctions. We call this problem the contextual bandits problem with cross-learning. The best algorithms for the classical contextual bandits problem achieve $\tilde{O}(\sqrt{CKT})$ …
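To make the cross-learning feedback concrete, here is a minimal Python sketch of the first-price bidding example under complete cross-learning. It is not the paper's algorithm: it uses a plain epsilon-greedy learner, and the value grid, bid grid, uniform competing bid, tie-breaking rule, and all function names are illustrative assumptions. The point it shows is that once the learner sees whether a bid won, it can compute the reward $(v - b)$ or $0$ for every value $v$, so one observation updates the estimates for all contexts.

```python
import random


def cross_learned_rewards(bid, highest_competing_bid, values):
    """Reward the same bid would have earned under every context (value)."""
    won = bid >= highest_competing_bid  # ties go to the learner (assumption)
    return {v: (v - bid) if won else 0.0 for v in values}


def epsilon_greedy_with_cross_learning(num_rounds=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy bidder that shares each observation across contexts."""
    rng = random.Random(seed)
    values = [0.25, 0.5, 0.75, 1.0]   # contexts c (illustrative grid)
    bids = [0.0, 0.2, 0.4, 0.6, 0.8]  # actions i (illustrative grid)
    pulls = {b: 0 for b in bids}      # one counter per action, shared by contexts
    mean = {(v, b): 0.0 for v in values for b in bids}

    for _ in range(num_rounds):
        v = rng.choice(values)  # context observed this round
        if rng.random() < epsilon:
            b = rng.choice(bids)  # explore
        else:
            b = max(bids, key=lambda a: mean[(v, a)])  # exploit
        # Hypothetical environment: highest competing bid uniform on [0, 1).
        competing = rng.random()
        pulls[b] += 1
        # Cross-learning: one observation updates the estimate for every context.
        for v_other, r in cross_learned_rewards(b, competing, values).items():
            mean[(v_other, b)] += (r - mean[(v_other, b)]) / pulls[b]

    policy = {v: max(bids, key=lambda a: mean[(v, a)]) for v in values}
    return policy, pulls


if __name__ == "__main__":
    policy, pulls = epsilon_greedy_with_cross_learning()
    for value, bid in policy.items():
        print(f"value {value:.2f} -> bid {bid:.1f}")
```

Because the pull counter is kept per action rather than per (context, action) pair, the amount of data behind each estimate does not shrink as the number of contexts grows, which is the intuition for why the dependence on $C$ can be removed in this setting.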
