Learning to Bid Without Knowing your Value
Zhe Feng, Chara Podimata, Vasilis Syrgkanis

TL;DR
This paper develops a fast-converging online learning algorithm for bidders in complex auctions with unknown and evolving values, leveraging auction feedback structure to outperform traditional bandit methods.
Contribution
The paper introduces an online learning algorithm tailored to outcome-based feedback in auctions. Its regret rates converge exponentially faster, in terms of dependence on the size of the action space, than those of a generic bandit algorithm.
Findings
The algorithm outperforms generic bandit approaches in experiments.
Regret grows logarithmically with the number of actions and linearly with the number of outcomes.
Performance remains robust under noise and relaxed assumptions.
Abstract
We address online learning in complex auction settings, such as sponsored search auctions, where the value of the bidder is unknown to her, evolving in an arbitrary manner and observed only if the bidder wins an allocation. We leverage the structure of the utility of the bidder and the partial feedback that bidders typically receive in auctions, in order to provide algorithms with regret rates against the best fixed bid in hindsight, that are exponentially faster in convergence in terms of dependence on the action space, than what would have been derived by applying a generic bandit algorithm and almost equivalent to what would have been achieved in the full information setting. Our results are enabled by analyzing a new online learning setting with outcome-based feedback, which generalizes learning with feedback graphs. We provide an online learning algorithm for this setting, of…
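To make the setting in the abstract concrete, the following is a minimal sketch of a bidder who runs exponential weights over a discretized bid grid and sees her value only when she wins. It is not the authors' implementation. The single-item second-price rule, the assumption that the highest competing bid is revealed after every round, the shifted importance-weighted estimate, the i.i.d. uniform simulation, the learning rate, and the names `run_bidder` and `best_fixed_utility` are all illustrative assumptions.

```python
import numpy as np


def best_fixed_utility(values, competing, bid_grid):
    """Utility of the best single bid in hindsight (assumed second-price rule)."""
    wins = bid_grid[None, :] > competing[:, None]            # shape (T, |B|)
    per_bid = ((values - competing)[:, None] * wins).sum(axis=0)
    return float(per_bid.max())


def run_bidder(values, competing, bid_grid, eta, rng):
    """Exponential weights over bids; the value is used only in rounds that are won."""
    cum_est = np.zeros(len(bid_grid))   # cumulative estimated (shifted) utilities
    realized = 0.0
    for v, m in zip(values, competing):
        # Sampling distribution: softmax of the cumulative estimates.
        logits = eta * cum_est
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        i = rng.choice(len(bid_grid), p=probs)

        # Assumption: the competing bid m is revealed after the round, so the
        # allocation of every candidate bid is known in hindsight.
        x = (bid_grid > m).astype(float)
        won = x[i] == 1.0

        # Indicator that each candidate bid would have led to the realized outcome.
        ind = x if won else 1.0 - x
        p_outcome = float(probs @ ind)          # >= probs[i] > 0
        reward = (v - m) if won else 0.0        # v is only touched when won
        if won:
            realized += v - m

        # Importance-weighted estimate of (utility - 1) for every bid; the shift
        # keeps the estimates non-positive when values and bids lie in [0, 1].
        cum_est += (reward - 1.0) * ind / p_outcome
    return float(realized)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T = 2000
    grid = np.linspace(0.0, 1.0, 21)
    values = rng.uniform(size=T)        # illustrative i.i.d. values
    competing = rng.uniform(size=T)     # illustrative competing bids
    eta = np.sqrt(np.log(len(grid)) / T)
    gained = run_bidder(values, competing, grid, eta, rng)
    regret = best_fixed_utility(values, competing, grid) - gained
    print(f"average regret per round: {regret / T:.4f}")
```

The estimate is unbiased for each bid's utility up to the constant shift of 1, because the realized outcome occurs with probability `p_outcome` under the sampling distribution. Every bid on the grid is updated in every round, which is the sense in which this feedback is richer than plain bandit feedback.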
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Machine Learning and Algorithms
