TL;DR
This paper introduces a framework and algorithms for learning stable outcomes in large-scale, two-sided matching markets where user preferences are uncertain and must be learned from data. It uses bandit algorithms to balance stability against learning efficiency.
Contribution
The paper designs an incentive-aware learning objective that measures how far a market outcome is from equilibrium. It then applies bandit algorithms to matching with transfers and obtains near-optimal regret bounds.
Findings
Bandit algorithms can effectively learn market equilibria under uncertainty.
The proposed approach achieves near-optimal regret bounds.
Stability can be approximated in data-driven matching markets.
Abstract
Large-scale, two-sided matching platforms must find market outcomes that align with user preferences while simultaneously learning these preferences from data. Classical notions of stability (Gale and Shapley, 1962; Shapley and Shubik, 1971) are unfortunately of limited value in the learning setting, given that preferences are inherently uncertain and destabilizing while they are being learned. To bridge this gap, we develop a framework and algorithms for learning stable market outcomes under uncertainty. Our primary setting is matching with transferable utilities, where the platform both matches agents and sets monetary transfers between them. We design an incentive-aware learning objective that captures the distance of a market outcome from equilibrium. Using this objective, we analyze the complexity of learning as a function of preference structure, casting learning as a stochastic…