Matching while Learning
Ramesh Johari, Vijay Kamble, Yash Kanoria

TL;DR
This paper models a platform's challenge of matching limited supply with demand while learning worker attributes, providing a complete characterization of optimal policies balancing exploration and exploitation in a multi-armed bandit framework.
Contribution
It introduces a benchmark model with heterogeneous workers and capacity-constrained jobs, and characterizes the structure of the optimal policy in the limit where each worker performs many jobs.
Findings
Optimal policy involves estimating shadow prices for job types.
Platform balances exploration and exploitation based on learned worker types.
Framework applies broadly beyond labor markets.
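To make the first finding concrete, here is a minimal sketch of how shadow prices on job types can enter a matching decision. It assumes a finite set of worker types, a known expected-payoff table `payoff[i][j]` for worker type i and job type j, a current belief (posterior) over the worker's type, and given shadow prices `prices[j]`. The function names, the myopic "pick the best net payoff" rule, and the numbers are hypothetical illustrations. This is not the paper's optimal policy, which also accounts for the value of exploration.

```python
def net_payoffs(posterior, payoff, prices):
    # For each job type j: expected payoff under the current belief, minus j's shadow price
    return [
        sum(q * payoff[i][j] for i, q in enumerate(posterior)) - prices[j]
        for j in range(len(prices))
    ]


def choose_job_type(posterior, payoff, prices):
    # Myopic (exploit-only) choice: the job type with the highest price-adjusted payoff
    net = net_payoffs(posterior, payoff, prices)
    return max(range(len(net)), key=net.__getitem__)


# Hypothetical example: worker types (strong, weak) x job types (hard, easy)
payoff = [[0.9, 0.6],
          [0.2, 0.5]]
prices = [0.1, 0.3]  # hypothetical shadow prices for (hard, easy)

print(choose_job_type([0.5, 0.5], payoff, prices))  # uncertain worker
print(choose_job_type([0.1, 0.9], payoff, prices))  # worker believed to be weak
```

Charging each job type its shadow price is what lets an individual decision respect the limited supply of jobs. A job type that is scarce relative to demand carries a high price, so it is assigned only to workers whose expected payoff on it clears that price.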
Abstract
We consider the problem faced by a service platform that needs to match limited supply with demand but also to learn the attributes of new users in order to match them better in the future. We introduce a benchmark model with heterogeneous "workers" (demand) and a limited supply of "jobs" that arrive over time. Job types are known to the platform, but worker types are unknown and must be learned by observing match outcomes. Workers depart after performing a certain number of jobs. The expected payoff from a match depends on the pair of types and the goal is to maximize the steady-state rate of accumulation of payoff. Though we use terminology inspired by labor markets, our framework applies more broadly to platforms where a limited supply of heterogeneous products is matched to users over time. Our main contribution is a complete characterization of the structure of the optimal policy…
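The abstract states that worker types are unknown and must be learned by observing match outcomes. Below is a minimal sketch of that learning step as a Bayesian update. It assumes binary (success/failure) outcomes and a finite set of candidate worker types, with `success_prob[i][j]` denoting the hypothetical probability that a type-i worker succeeds on a type-j job. The function name and numbers are illustrative assumptions, not the paper's notation.

```python
def update_posterior(posterior, success_prob, job_type, success):
    # Likelihood of the observed outcome under each candidate worker type
    likelihood = [
        row[job_type] if success else 1.0 - row[job_type]
        for row in success_prob
    ]
    # Bayes' rule: prior times likelihood, then normalize
    unnormalized = [q * l for q, l in zip(posterior, likelihood)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("observed outcome has zero probability under the current belief")
    return [u / total for u in unnormalized]


# Hypothetical example: worker types (strong, weak) x job types (hard, easy)
success_prob = [[0.9, 0.6],
                [0.2, 0.5]]
belief = [0.5, 0.5]

# A worker performs a fixed number of jobs and then departs; here, three hard jobs
for outcome in [True, True, False]:
    belief = update_posterior(belief, success_prob, job_type=0, success=outcome)
print(belief)
```

Because each worker leaves after a fixed number of jobs, every job spent sharpening this belief is one fewer job matched on what has already been learned. That is the exploration/exploitation trade-off the paper characterizes.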
Taxonomy
Topics: Advanced Bandit Algorithms Research · Auction Theory and Applications · Smart Grid Energy Management
