Competing Bandits in Decentralized Contextual Matching Markets
Satush Parikh, Soumya Basu, Avishek Ghosh, Abishek Sankararaman

TL;DR
This paper introduces algorithms for decentralized multi-agent matching markets with non-stationary preferences, achieving low regret and stable matches by learning latent environments in a linear contextual bandit setting.
Contribution
It develops novel algorithms that simultaneously identify latent environments and learn stable matchings in non-stationary, multi-agent markets with linear contextual preferences.
Findings
Achieves instance-dependent logarithmic regret.
Regret scales independently of the number of arms.
Handles non-stationary latent environments.
Abstract
Sequential learning in multi-agent, resource-constrained matching markets has received significant interest in the past few years. We study decentralized learning in two-sided matching markets where the demand side (aka players or agents) competes for the supply side (aka arms) with potentially time-varying preferences to obtain a stable match. Motivated by the linear contextual bandit framework, we assume that for each agent, an arm-mean may be represented by a linear function of a known feature vector and an unknown (agent-specific) parameter. Moreover, the preferences over arms depend on a latent environment in each round, where the latent environment varies across rounds in a non-stationary manner. We propose learning algorithms to identify the latent environment and obtain stable matchings simultaneously. Our proposed algorithms achieve instance-dependent logarithmic regret,…
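To make the modelling assumption in the abstract concrete, here is a minimal sketch of the linear arm-mean model and of what "stable match" means. It is not the paper's algorithm: it assumes the agent parameters are already known (no learning, no latent environment), and the names `arm_means`, `preference_order`, `is_stable`, and the toy numbers are hypothetical, chosen only for illustration.

```python
import numpy as np

def arm_means(features, theta):
    """Mean reward of every arm for one agent: mu_j = <x_j, theta>."""
    return features @ theta

def preference_order(features, theta):
    """Arm indices sorted from most to least preferred under the linear model."""
    return [int(j) for j in np.argsort(-arm_means(features, theta))]

def is_stable(matching, agent_prefs, arm_prefs):
    """Return True if the matching has no blocking pair.

    matching:    dict agent -> arm (one-to-one, every agent matched; an assumption)
    agent_prefs: dict agent -> list of arms, best first
    arm_prefs:   dict arm -> list of agents, best first
    """
    holder = {arm: agent for agent, arm in matching.items()}
    for agent, prefs in agent_prefs.items():
        for arm in prefs:
            if arm == matching[agent]:
                break  # every remaining arm is worse than the current match
            rival = holder.get(arm)
            # The arm is free, or it ranks this agent above its current partner.
            if rival is None or arm_prefs[arm].index(agent) < arm_prefs[arm].index(rival):
                return False
    return True

# Toy market: two arms with 2-dimensional features, two agents (hypothetical values).
features = np.array([[1.0, 0.0], [0.0, 1.0]])
thetas = {0: np.array([0.9, 0.1]), 1: np.array([0.8, 0.3])}
agent_prefs = {a: preference_order(features, th) for a, th in thetas.items()}
arm_prefs = {0: [0, 1], 1: [0, 1]}  # both arms prefer agent 0

stable = is_stable({0: 0, 1: 1}, agent_prefs, arm_prefs)
unstable = is_stable({0: 1, 1: 0}, agent_prefs, arm_prefs)
```

In this toy market both agents prefer arm 0 and both arms prefer agent 0, so matching agent 0 to arm 0 is stable, while swapping the assignments leaves agent 0 and arm 0 as a blocking pair. In the setting the abstract describes, the parameters are unknown and the preferences change with the latent environment, so agents would have to estimate them from observed rewards.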
Taxonomy
Topics: Auction Theory and Applications · Experimental Behavioral Economics Studies · Game Theory and Voting Systems
