Bandit Learning in Decentralized Matching Markets
Lydia T. Liu, Feng Ruan, Horia Mania, Michael I. Jordan

TL;DR
This paper extends multi-armed bandit models to decentralized two-sided matching markets where players learn preferences without communication, proposing an algorithm with logarithmic regret and analyzing its incentive compatibility.
Contribution
Introduces a new algorithm for decentralized matching markets in which players must learn their preferences, achieving logarithmic or polylogarithmic regret in the time horizon, and analyzes its incentive properties under different preference assumptions.
Findings
Achieves $O(\log(T))$ regret with shared preferences.
Achieves $O(\log(T)^2)$ regret without preference assumptions.
The algorithm is incentive compatible when the arms' preferences over players are shared, but not necessarily otherwise.
Abstract
We study two-sided matching markets in which one side of the market (the players) does not have a priori knowledge about its preferences for the other side (the arms) and is required to learn its preferences from experience. Also, we assume the players have no direct means of communication. This model extends the standard stochastic multi-armed bandit framework to a decentralized multiple player setting with competition. We introduce a new algorithm for this setting that, over a time horizon $T$, attains $O(\log(T))$ stable regret when preferences of the arms over players are shared, and $O(\log(T)^2)$ regret when there are no assumptions on the preferences on either side. Moreover, in the setting where a single player may deviate, we show that the algorithm is incentive compatible whenever the arms' preferences are shared, but not necessarily so when preferences are…
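To make the setting in the abstract concrete, the following is a minimal sketch of a single round of such a market, not the paper's algorithm. It assumes that each player independently pulls one arm, that an arm pulled by several players accepts only the one it ranks highest, and that rejected players receive no reward. The function name `play_round`, the dictionary layout, and the unit-variance Gaussian reward noise are all illustrative assumptions rather than details taken from the paper.

```python
import random


def play_round(choices, arm_prefs, mean_reward, rng):
    """Simulate one round of a decentralized matching market (hypothetical helper).

    choices:     dict mapping each player to the arm it pulls this round
    arm_prefs:   dict mapping each arm to a list of players, most preferred first
    mean_reward: dict of dicts, mean_reward[player][arm] is the unknown mean
    rng:         a random.Random instance used for the reward noise

    Returns a dict mapping each player to its observed reward, or None when
    the player lost a conflict and was rejected by the arm.
    """
    # Group players by the arm they chose; players do not see each other's choices.
    proposers = {}
    for player, arm in choices.items():
        proposers.setdefault(arm, []).append(player)

    rewards = {player: None for player in choices}
    for arm, players in proposers.items():
        # The arm keeps the proposer that appears earliest in its preference list.
        winner = min(players, key=arm_prefs[arm].index)
        # Assumption: noisy reward drawn around the unknown mean (Gaussian, std 1).
        rewards[winner] = rng.gauss(mean_reward[winner][arm], 1.0)
    return rewards


# Example: p1 and p2 collide on a1; a1 prefers p2, so p1 gets nothing.
arm_prefs = {"a1": ["p2", "p1", "p3"], "a2": ["p1", "p2", "p3"]}
mean_reward = {
    "p1": {"a1": 0.9, "a2": 0.5},
    "p2": {"a1": 0.8, "a2": 0.2},
    "p3": {"a1": 0.7, "a2": 0.6},
}
choices = {"p1": "a1", "p2": "a1", "p3": "a2"}
rewards = play_round(choices, arm_prefs, mean_reward, random.Random(0))
```

In this sketch a player learns about an arm only in the rounds where it wins the arm, which is why competition makes the learning problem harder than in the single-player bandit setting.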
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Game Theory and Voting Systems
