Player-optimal Stable Regret for Bandit Learning in Matching Markets
Fang Kong, Shuai Li

TL;DR
This paper introduces an algorithm for bandit learning in matching markets that achieves near-optimal stable regret bounds for players, significantly improving over previous methods especially when preference gaps are small.
Contribution
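The paper proposes the explore-then-Gale-Shapley (ETGS) algorithm, which bounds each player's optimal stable regret by O(K log T / Δ^2), where K is the number of arms, T is the horizon, and Δ is the players' minimum preference gap. This advances the theoretical understanding of learning in matching markets.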
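The explore-then-match idea behind the algorithm's name can be illustrated with a small simulation. This is only a rough sketch under several assumptions that do not come from the paper: a centralized simulator, a fixed exploration budget (`explore_rounds`), Gaussian reward noise, at most as many players as arms, and arm preferences that are known and given as ranks. The function names `gale_shapley` and `explore_then_gs` are hypothetical, and the paper's actual ETGS procedure may differ in all of these respects.

```python
import random


def gale_shapley(player_prefs, arm_ranks):
    """Player-proposing deferred acceptance.

    player_prefs[i] lists arms for player i, most preferred first.
    arm_ranks[a][i] is the rank arm a gives player i (lower is better).
    Returns a dict mapping each player to its matched arm.
    Assumes there are at most as many players as arms.
    """
    next_choice = [0] * len(player_prefs)  # next arm each player proposes to
    holder = {}                            # arm -> player it currently holds
    free = list(range(len(player_prefs)))
    while free:
        i = free.pop()
        a = player_prefs[i][next_choice[i]]
        next_choice[i] += 1
        j = holder.get(a)
        if j is None:
            holder[a] = i                  # arm was unmatched, accept
        elif arm_ranks[a][i] < arm_ranks[a][j]:
            holder[a] = i                  # arm prefers the new proposer
            free.append(j)
        else:
            free.append(i)                 # proposal rejected
    return {i: a for a, i in holder.items()}


def explore_then_gs(mu, arm_ranks, explore_rounds, seed=0):
    """Explore arms round-robin, then match on the estimated rankings.

    mu[i][a] is the (hidden) mean reward of arm a for player i.
    The fixed exploration budget and noise level are assumptions.
    """
    rng = random.Random(seed)
    n_players, n_arms = len(mu), len(mu[0])
    sums = [[0.0] * n_arms for _ in range(n_players)]
    counts = [[0] * n_arms for _ in range(n_players)]
    for t in range(explore_rounds):
        for i in range(n_players):
            a = (i + t) % n_arms           # distinct arms, so no conflicts
            sums[i][a] += mu[i][a] + rng.gauss(0.0, 0.1)
            counts[i][a] += 1
    est = [[sums[i][a] / max(counts[i][a], 1) for a in range(n_arms)]
           for i in range(n_players)]
    # Rank arms by estimated mean reward, best first.
    player_prefs = [sorted(range(n_arms), key=lambda a: -est[i][a])
                    for i in range(n_players)]
    return gale_shapley(player_prefs, arm_ranks)


# Two players, three arms; arm 0 prefers player 1, the others prefer player 0.
mu = [[0.9, 0.5, 0.1], [0.8, 0.6, 0.2]]
arm_ranks = [[1, 0], [0, 1], [0, 1]]
matching = explore_then_gs(mu, arm_ranks, explore_rounds=300)
print(matching)
```

In this toy market both players prefer arm 0, but arm 0 prefers player 1, so under the true means the only stable matching pairs player 1 with arm 0 and player 0 with arm 1. With enough exploration the estimated rankings agree with the true ones and the sketch recovers that matching.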
Findings
Achieves a polynomial upper bound on player-optimal stable regret.
Improves on previous results, whose bounds can be exponentially large when preference gaps are small.
Matches lower bounds when preferences satisfy certain conditions.
Abstract
The problem of matching markets has been studied for a long time in the literature due to its wide range of applications. Finding a stable matching is a common equilibrium objective in this problem. Since market participants are usually uncertain of their preferences, a rich line of recent work studies the online setting where participants on one side (players) learn their unknown preferences from iterative interactions with the other side (arms). Most previous works in this line are only able to derive theoretical guarantees for player-pessimal stable regret, which is defined relative to the players' least-preferred stable matching. However, under the pessimal stable matching, players only obtain the least reward among all stable matchings. To maximize players' profits, the player-optimal stable matching would be the most desirable. Though Basu et al. (2021) successfully bring an upper…
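The two regret notions contrasted in the abstract can be written side by side. The notation here is an assumption chosen for illustration and may not match the paper's: $\mu_{i,j}$ is player $i$'s mean reward for arm $j$, $X_i(t)$ is the reward player $i$ receives at round $t$, and $\overline{m}_i$ and $\underline{m}_i$ are player $i$'s partners in the player-optimal and player-pessimal stable matchings.

```latex
\begin{align*}
\overline{\mathrm{Reg}}_i(T)  &= T\,\mu_{i,\overline{m}_i}  - \mathbb{E}\Big[\sum_{t=1}^{T} X_i(t)\Big]
  && \text{(player-optimal stable regret)} \\
\underline{\mathrm{Reg}}_i(T) &= T\,\mu_{i,\underline{m}_i} - \mathbb{E}\Big[\sum_{t=1}^{T} X_i(t)\Big]
  && \text{(player-pessimal stable regret)}
\end{align*}
```

Because $\mu_{i,\overline{m}_i} \ge \mu_{i,\underline{m}_i}$ for every player, $\overline{\mathrm{Reg}}_i(T) \ge \underline{\mathrm{Reg}}_i(T)$, so an upper bound on the player-optimal stable regret is the stronger guarantee of the two.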
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Stochastic Gradient Optimization Techniques
