Competing Bandits in Time Varying Matching Markets
Deepan Muthirayan, Chinmay Maheshwari, Pramod P. Khargonekar, Shankar Sastry

TL;DR
This paper introduces a method for online learning in non-stationary two-sided matching markets with time-varying preferences, achieving near-optimal regret bounds despite changing preferences.
Contribution
It proposes a versatile algorithm that handles any preference structure and variation scenario, extending prior fixed-preference models to dynamic settings.
Findings
Achieves sub-linear regret of Õ(L_T^{1/2} T^{1/2}) when the number of preference changes, L_T, is known.
Matches optimal single-agent learning rates despite market competition.
Extends to scenarios with an unknown number of preference changes.
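The sub-linearity claim can be unpacked in one step. Writing R_T for a player's cumulative regret over T rounds (our notation, not necessarily the paper's), dividing the bound by T gives the per-round average regret. This is a sketch only: constants, logarithmic factors and any dependence on the numbers of players and arms are hidden in the Õ, and the growth condition on L_T is an illustrative assumption.

```latex
\frac{R_T}{T}
  = \widetilde{\mathcal{O}}\!\left(\frac{L_T^{1/2}\,T^{1/2}}{T}\right)
  = \widetilde{\mathcal{O}}\!\left(\sqrt{\frac{L_T}{T}}\right),
\qquad
L_T = \mathcal{O}(T^{\alpha}),\ \alpha < 1
\;\Longrightarrow\;
\frac{R_T}{T} = \widetilde{\mathcal{O}}\!\left(T^{-(1-\alpha)/2}\right) \xrightarrow[T\to\infty]{} 0 .
```

For a fixed number of changes (α = 0) the average regret decays like T^{-1/2} up to logarithmic factors.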
Abstract
We study the problem of online learning in two-sided non-stationary matching markets, where the objective is to converge to a stable match. In particular, we consider the setting where one side of the market, the arms, has a fixed, known set of preferences over the other side, the players. While this problem has been studied when the players have fixed but unknown preferences, in this work we study how to learn when the preferences of the players are time varying and unknown. Our contribution is a methodology that can handle any type of preference structure and variation scenario. We show that, with the proposed algorithm, each player receives a uniform sub-linear regret of Õ(L_T^{1/2} T^{1/2}) up to the number of changes in the underlying preferences of the agents, L_T. Therefore, we show that the optimal rates for single-agent learning can be…
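To make the benchmark concrete, here is a minimal Python sketch of how a per-round stable match and a player's regret against it could be computed. It assumes, as is common in the competing-bandits literature, that regret is measured against the player-optimal stable matching under the true preferences of each round, found by player-proposing deferred acceptance (Gale-Shapley); the paper's exact benchmark may differ. This is not the authors' learning algorithm: `player_proposing_da`, `ranking`, `stable_regret` and the toy reward means are hypothetical illustrations.

```python
from typing import Dict, List

Prefs = Dict[str, List[str]]  # agent -> ranked list, most preferred first


def player_proposing_da(player_prefs: Prefs, arm_prefs: Prefs) -> Dict[str, str]:
    """Player-proposing deferred acceptance (Gale-Shapley).

    Assumes complete preference lists and no more players than arms,
    so every player ends up matched. Returns {player: arm}.
    """
    # rank[a][p]: position of player p in arm a's list (lower = preferred)
    rank = {a: {p: i for i, p in enumerate(lst)} for a, lst in arm_prefs.items()}
    next_idx = {p: 0 for p in player_prefs}  # next arm each player proposes to
    held: Dict[str, str] = {}  # arm -> player it tentatively holds
    free = list(player_prefs)
    while free:
        p = free.pop()
        a = player_prefs[p][next_idx[p]]
        next_idx[p] += 1
        rival = held.get(a)
        if rival is None:
            held[a] = p
        elif rank[a][p] < rank[a][rival]:
            held[a] = p  # arm trades up; displaced rival proposes again
            free.append(rival)
        else:
            free.append(p)  # rejected; p tries its next arm later
    return {p: a for a, p in held.items()}


def ranking(means: Dict[str, float]) -> List[str]:
    """Rank arms by mean reward, highest first (ties broken arbitrarily)."""
    return sorted(means, key=means.get, reverse=True)


def stable_regret(player: str,
                  means_per_round: List[Dict[str, Dict[str, float]]],
                  arm_prefs: Prefs,
                  played: List[Dict[str, str]]) -> float:
    """Sum over rounds of reward at the round-t stable arm minus reward at the played arm."""
    total = 0.0
    for means, match in zip(means_per_round, played):
        # Benchmark: stable matching under the TRUE preferences of round t
        true_prefs = {p: ranking(m) for p, m in means.items()}
        benchmark = player_proposing_da(true_prefs, arm_prefs)
        total += means[player][benchmark[player]] - means[player][match[player]]
    return total


# Toy market: arms' preferences are fixed and known, players' means change once.
arm_prefs = {"a1": ["p1", "p2"], "a2": ["p1", "p2"]}
means_per_round = [
    {"p1": {"a1": 0.9, "a2": 0.5}, "p2": {"a1": 0.8, "a2": 0.3}},
    # p1's preferences flip after round 0 (a single change)
    {"p1": {"a1": 0.2, "a2": 0.7}, "p2": {"a1": 0.8, "a2": 0.3}},
]
stale = {"p1": "a1", "p2": "a2"}  # a learner that never adapts to the change
for p in ("p1", "p2"):
    print(p, stable_regret(p, means_per_round, arm_prefs, [stale, stale]))
```

The stale matching is stable in round 0 but not after p1's preferences flip, so both players accumulate regret only from the change onward; a learner for this setting has to detect or forget stale preference estimates to keep that regret sub-linear.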
Taxonomy
Topics: Auction Theory and Applications · Advanced Bandit Algorithms Research · Game Theory and Applications
