Optimal Analysis for Bandit Learning in Matching Markets with Serial Dictatorship
Zilong Wang, Shuai Li

TL;DR
This paper introduces an algorithm for bandit learning in matching markets under serial dictatorship whose regret upper bound matches the known lower bound, clarifying how efficiently players can learn their preferences in such markets.
Contribution
The paper presents the first algorithm that matches the regret lower bound for bandit learning in matching markets with serial dictatorship.
Findings
Proposed a multi-level successive selection algorithm with optimal regret bounds.
Achieved regret bounds that match the theoretical lower bounds.
Enhanced understanding of learning efficiency in matching markets with unified preferences.
Abstract
The problem of two-sided matching markets is well-studied in computer science and economics, owing to its diverse applications across numerous domains. Since market participants are usually uncertain about their preferences in various online matching platforms, an emerging line of research is dedicated to the online setting where one-side participants (players) learn their unknown preferences through multiple rounds of interactions with the other side (arms). Sankararaman et al. provide an $\Omega\left(\frac{N\log(T)}{\Delta^2} + \frac{K\log(T)}{\Delta}\right)$ regret lower bound for this problem under the serial dictatorship assumption, where $N$ is the number of players, $K$ is the number of arms, $\Delta$ is the minimum reward gap across players and arms, and $T$ is the time horizon. Serial dictatorship assumes arms have the same preferences, which is common in reality when…
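As a small illustration of the serial dictatorship structure described in the abstract: when every arm ranks the players in the same order, the stable matching can be computed by letting players choose in that order, each taking their favorite arm that is still free. The sketch below assumes preferences are already known, so it shows only the target matching and not the paper's learning algorithm. The function name, player and arm labels, and preference lists are hypothetical.

```python
def serial_dictatorship_matching(player_order, player_prefs):
    """Compute the matching when all arms share one ranking over players.

    player_order: players listed from highest to lowest in the arms' shared ranking.
    player_prefs: dict mapping each player to a list of arms, most preferred first.
    Assumes there are at least as many arms as players (illustrative assumption).
    """
    taken = set()
    matching = {}
    for player in player_order:
        # Each player takes their most preferred arm not yet claimed
        # by a higher-ranked player.
        for arm in player_prefs[player]:
            if arm not in taken:
                matching[player] = arm
                taken.add(arm)
                break
    return matching


# Hypothetical example: three players, three arms.
prefs = {
    "p1": ["a1", "a2", "a3"],
    "p2": ["a1", "a3", "a2"],
    "p3": ["a1", "a2", "a3"],
}
matching = serial_dictatorship_matching(["p1", "p2", "p3"], prefs)
print(matching)
```

In the bandit setting, players do not know `player_prefs` in advance and must estimate it from noisy rewards; regret measures the reward lost relative to this matching while they learn.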
Taxonomy
Topics: Advanced Bandit Algorithms Research · Game Theory and Voting Systems · Auction Theory and Applications
