Bandit Learning in Housing Markets
Shiyun Lin

TL;DR
This paper models the housing market as a multi-armed bandit problem where agents learn preferences over goods through repeated interactions, proposing algorithms with provable regret bounds for stable allocations.
Contribution
It introduces a novel bandit learning framework for housing markets, providing algorithms and theoretical bounds for both centralized and decentralized preference learning.
Findings
Achieves $O\left(\frac{\log T}{\Delta^2}\right)$ regret bounds
Establishes a matching lower bound for decentralized settings
Demonstrates order-optimality of the proposed algorithms
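A rough intuition for why bounds of the form $\log T / \Delta^2$ arise in bandit problems can be sketched as follows. This is a standard concentration argument, not the paper's proof. It assumes rewards lie in $[0,1]$ and that $\Delta$ denotes a gap between the mean rewards of two goods that a player must tell apart; the symbols $n$, $\hat{\mu}_n$, and $\mu$ are introduced here for illustration only.

```latex
% Hoeffding's inequality for the empirical mean of n i.i.d. rewards in [0,1]:
\[
  \Pr\left( \lvert \hat{\mu}_n - \mu \rvert \ge \tfrac{\Delta}{2} \right)
  \le 2 \exp\left( -2 n \left( \tfrac{\Delta}{2} \right)^2 \right)
  = 2 \exp\left( -\tfrac{n \Delta^2}{2} \right).
\]
% Requiring this failure probability to be at most 2/T gives
\[
  2 \exp\left( -\tfrac{n \Delta^2}{2} \right) \le \frac{2}{T}
  \quad\Longleftrightarrow\quad
  n \ge \frac{2 \log T}{\Delta^2}.
\]
```

Under these assumptions, on the order of $\log T / \Delta^2$ samples per good suffice to rank two goods correctly with high probability, which matches the scaling of the stated bound.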
Abstract
The housing market, also known as the one-sided matching market, is a classic exchange economy model where each agent on the demand side initially owns an indivisible good (a house) and has a personal preference over all goods. The goal is to find a core-stable allocation that exhausts all mutually beneficial exchanges among subgroups of agents. While this model has been extensively studied in economics and computer science due to its broad applications, little attention has been paid to settings where preferences are unknown and must be learned through repeated interactions. In this paper, we propose a statistical learning model within the multi-player multi-armed bandit framework, where players (agents) learn their preferences over arms (goods) from stochastic rewards. We introduce the notion of core regret for each player as the market objective. We study both centralized and…
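To make the notion of a core-stable allocation concrete, here is a minimal sketch of the classic Top Trading Cycles procedure, which computes the core allocation of a housing market when strict preferences are already known. It is not the paper's learning algorithm, which must cope with unknown preferences. The function name `top_trading_cycles` and the input layout (a dict mapping each agent to a full ranking of houses, with house `j` initially owned by agent `j`) are assumptions made for illustration.

```python
def top_trading_cycles(prefs):
    """Return a dict agent -> house for a housing market with known preferences.

    prefs[i] is agent i's strict ranking of all houses, best first.
    House j is assumed to be initially owned by agent j (illustrative convention).
    """
    remaining = set(prefs)  # agents (and their houses) still in the market
    allocation = {}
    while remaining:
        # Each remaining agent points to the owner of its favorite remaining house.
        points_to = {
            i: next(h for h in prefs[i] if h in remaining) for i in remaining
        }
        # Follow pointers from one agent until an agent repeats: that closes a cycle.
        path = []
        current = min(remaining)
        while current not in path:
            path.append(current)
            current = points_to[current]
        cycle = path[path.index(current):]
        # Agents on the cycle trade along it and leave the market.
        for i in cycle:
            allocation[i] = points_to[i]
        remaining -= set(cycle)
    return allocation


if __name__ == "__main__":
    example = {0: [1, 0, 2], 1: [0, 1, 2], 2: [0, 1, 2]}
    print(top_trading_cycles(example))
```

In the example, agents 0 and 1 each prefer the other's house and swap in the first round, while agent 2 keeps its own house. No subgroup of agents can then improve by trading among themselves, which is the core-stability property the abstract describes.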
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Game Theory and Applications
