Beyond $\log^2(T)$ Regret for Decentralized Bandits in Matching Markets
Soumya Basu, Karthik Abinav Sankararaman, Abishek Sankararaman

TL;DR
This paper introduces decentralized algorithms for regret minimization in two-sided matching markets with one-sided bandit feedback, achieving near-optimal regret bounds and improving upon previous methods, especially in markets more general than the serial dictatorship setting.
Contribution
The paper presents new decentralized algorithms that reduce regret to near-logarithmic order, $O(\log^{1+\varepsilon}(T))$, in general markets and achieve the optimal $\Theta(\log(T))$ regret in markets satisfying uniqueness consistency, surpassing prior work.
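For concreteness, the regret in these results can be written in the form commonly used in this line of work. The notation below is assumed for illustration and is not copied from the paper. For agent $i$ and horizon $T$:

$$
R_i(T) = T\,\mu_{i,\overline{m}_i} - \mathbb{E}\Big[\sum_{t=1}^{T} X_i(t)\Big],
$$

where $\mu_{i,j}$ is agent $i$'s mean reward from arm $j$, $\overline{m}_i$ is the arm matched to $i$ in the agent-optimal stable matching, and $X_i(t)$ is the reward $i$ collects in round $t$ (taken as $0$ when $i$ is rejected by the arm it requests). A bound such as $O(\log(T))$ then says this expected shortfall grows at most logarithmically in $T$.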
Findings
Achieved $O(\log^{1+\varepsilon}(T))$ regret for general markets.
Established $\Theta(\log(T))$ regret in markets satisfying uniqueness consistency.
Demonstrated the algorithms' superiority through simulations.
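To give a rough feel for the gap between the general-market bound $O(\log^{1+\varepsilon}(T))$ and the earlier $O(\log^2(T))$ bound, the sketch below compares only their growth in $T$. It is illustrative, not a statement about actual regret: the constants hidden in $O(\cdot)$ may depend on $\varepsilon$ and on market parameters (numbers of agents and arms, reward gaps) and are ignored here. The helper name `growth_ratio` is hypothetical, and natural logarithms are used since the base only changes constants.

```python
import math


def growth_ratio(T: float, eps: float) -> float:
    """log^2(T) / log^(1+eps)(T), i.e. log^(1-eps)(T); constants ignored."""
    log_t = math.log(T)
    return log_t ** 2 / log_t ** (1 + eps)


# Compare the two growth rates for a few horizons and values of eps.
for T in (1e3, 1e6, 1e9):
    log_t = math.log(T)
    for eps in (0.1, 0.5):
        print(f"T={T:.0e} eps={eps}: log^(1+eps)={log_t ** (1 + eps):7.1f} "
              f"log^2={log_t ** 2:7.1f} ratio={growth_ratio(T, eps):5.2f}")
```

Because the ratio equals $\log^{1-\varepsilon}(T)$, it grows without bound for any fixed $\varepsilon < 1$, which is the sense in which the new bound goes beyond $\log^2(T)$.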
Abstract
We design decentralized algorithms for regret minimization in the two-sided matching market with one-sided bandit feedback that significantly improve upon prior work (Liu et al. 2020a, 2020b, Sankararaman et al. 2020). First, for general markets, for any $\varepsilon > 0$, we design an algorithm that achieves $O(\log^{1+\varepsilon}(T))$ regret to the agent-optimal stable matching with an unknown time horizon $T$, improving upon the $O(\log^2(T))$ regret achieved in (Liu et al. 2020b). Second, we provide the optimal $\Theta(\log(T))$ agent-optimal regret for markets satisfying uniqueness consistency, i.e., markets in which participants leaving does not alter the original stable matching. Previously, $\Theta(\log(T))$ regret was achievable (Sankararaman et al. 2020, Liu et al. 2020b) in the much more restricted serial dictatorship setting, in which all arms have the same preference over the agents.…
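The benchmark in the abstract is the agent-optimal stable matching. As a hedged illustration (not the paper's learning algorithm, which must learn agents' preferences from bandit feedback in a decentralized way), the sketch below computes that benchmark when all preferences are already known, using agent-proposing deferred acceptance (Gale-Shapley). It also checks that in a serial dictatorship market, where every arm ranks the agents identically, the result coincides with letting agents pick greedily in that shared order. The function names and the toy market are hypothetical.

```python
from typing import Dict, List

Prefs = Dict[str, List[str]]  # name -> ranked list, most preferred first


def agent_optimal_stable_matching(agent_prefs: Prefs, arm_prefs: Prefs) -> Dict[str, str]:
    """Agent-proposing deferred acceptance (Gale-Shapley).

    Assumes strict, complete preference lists and at least as many arms as
    agents, so every agent ends up matched. Returns agent -> arm.
    """
    # rank[m][a]: position of agent a in arm m's list (lower is better)
    rank = {m: {a: r for r, a in enumerate(p)} for m, p in arm_prefs.items()}
    next_idx = {a: 0 for a in agent_prefs}  # next arm each agent will propose to
    held: Dict[str, str] = {}               # arm -> agent it tentatively holds
    free = list(agent_prefs)
    while free:
        a = free.pop()
        m = agent_prefs[a][next_idx[a]]
        next_idx[a] += 1
        cur = held.get(m)
        if cur is None:
            held[m] = a                     # arm was unclaimed
        elif rank[m][a] < rank[m][cur]:
            held[m] = a                     # arm prefers the new proposer
            free.append(cur)                # displaced agent proposes again
        else:
            free.append(a)                  # rejected; tries its next arm later
    return {a: m for m, a in held.items()}


def serial_dictatorship(order: List[str], agent_prefs: Prefs) -> Dict[str, str]:
    """Agents pick greedily, following the arms' shared ranking of agents."""
    taken, result = set(), {}
    for a in order:
        m = next(m for m in agent_prefs[a] if m not in taken)
        result[a] = m
        taken.add(m)
    return result


# Hypothetical toy market: every arm ranks agents a3 > a1 > a2.
agent_prefs = {"a1": ["m1", "m2", "m3"],
               "a2": ["m1", "m3", "m2"],
               "a3": ["m1", "m2", "m3"]}
common_order = ["a3", "a1", "a2"]
arm_prefs = {m: common_order for m in ("m1", "m2", "m3")}

print(agent_optimal_stable_matching(agent_prefs, arm_prefs))
print(serial_dictatorship(common_order, agent_prefs))
```

In this toy market both functions match a1 to m2, a2 to m3, and a3 to m1. Under serial dictatorship the stable matching is unique, which is why the greedy rule suffices there; general markets need the full proposal process.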
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Mobile Crowdsensing and Crowdsourcing
