Off-Policy Evaluation and Learning for Matching Markets
Yudai Hayashi, Shuhei Goda, Yuta Saito

TL;DR
This paper introduces novel off-policy evaluation estimators, DiPS and DPR, tailored for matching markets, which improve bias-variance trade-offs and enable offline policy learning in large-scale, bidirectional recommendation systems.
Contribution
The paper proposes new OPE estimators specifically designed for matching markets, combining existing techniques with intermediate labels to enhance evaluation accuracy and support offline policy optimization.
Findings
DiPS and DPR outperform existing OPE methods in experiments.
Theoretical analysis confirms reduced bias and variance of the proposed estimators.
Empirical results on synthetic and real data demonstrate improved evaluation and learning performance.
Abstract
Matching users based on mutual preferences is a fundamental aspect of services driven by reciprocal recommendations, such as job search and dating applications. Although A/B tests remain the gold standard for evaluating new policies in recommender systems for matching markets, they are costly and impractical for frequent policy updates. Off-Policy Evaluation (OPE) thus plays a crucial role by enabling the evaluation of recommendation policies using only offline logged data naturally collected on the platform. However, unlike conventional recommendation settings, the large scale and bidirectional nature of user interactions in matching platforms introduce variance issues and exacerbate reward sparsity, making standard OPE methods unreliable. To address these challenges and facilitate effective offline evaluation, we propose novel OPE estimators, DiPS and DPR, specifically…
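As background for the "standard OPE methods" the abstract refers to, the following is a minimal sketch of inverse propensity scoring (IPS), a common baseline OPE estimator. It is not the paper's DiPS or DPR, whose definitions are not given in this summary. The function name `ips_estimate`, its argument names, and the toy log are assumptions made purely for illustration.

```python
from typing import Sequence


def ips_estimate(
    rewards: Sequence[float],
    logging_probs: Sequence[float],
    target_probs: Sequence[float],
) -> float:
    """Baseline IPS estimate of a target policy's value from logged data.

    Each logged record i holds the observed reward r_i, the probability
    p0_i that the logging policy chose the logged action, and the
    probability p1_i that the target policy would choose the same action.
    The estimate is the mean of (p1_i / p0_i) * r_i.
    """
    n = len(rewards)
    if n == 0 or not (n == len(logging_probs) == len(target_probs)):
        raise ValueError("inputs must be non-empty and of equal length")

    total = 0.0
    for r, p0, p1 in zip(rewards, logging_probs, target_probs):
        if p0 <= 0.0:
            raise ValueError("logging probabilities must be positive")
        # Importance weight: re-weights the logged reward toward the target policy.
        total += (p1 / p0) * r
    return total / n


# Hypothetical toy log: four recommendations, two of which led to a match.
rewards = [1.0, 0.0, 0.0, 1.0]
logging_probs = [0.5, 0.5, 0.25, 0.25]
target_probs = [0.25, 0.5, 0.5, 0.5]

estimate = ips_estimate(rewards, logging_probs, target_probs)
print(estimate)  # 0.625
```

In this toy log only two of four records carry a nonzero reward, so the estimate rests on two importance-weighted terms. When matches are rare and logging probabilities are small, a few large weights can dominate the average, which is the variance and sparsity problem the abstract describes.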
Taxonomy
Topics: Recommender Systems and Techniques · Mobile Crowdsensing and Crowdsourcing · Game Theory and Voting Systems
