Matching-Based Policy Learning
Xuqiao Li, Ying Yan

TL;DR
This paper introduces a matching-based framework for policy learning that estimates the advantage function to identify optimal policies. The approach is backed by theoretical regret bounds and shows competitive performance in empirical evaluations.
Contribution
The paper adapts standard and bias-corrected matching methods to policy learning by estimating the advantage function, establishes theoretical regret bounds, and shows that the resulting learner is nearly rate-optimal.
Findings
The method achieves competitive finite-sample performance.
Theoretical regret bounds are established.
Empirical results validate the approach in simulations and real data.
Abstract
The beneficial effects of treatments vary across individuals in most studies. Treatment heterogeneity motivates practitioners to search for the optimal policy based on personal characteristics. A long-standing common practice in policy learning has been estimating and maximizing the value function using weighting techniques. Matching is widely used in many applied disciplines to infer causal effects, which is intuitively appealing because the observed covariates are directly balanced across different treatment groups. Nevertheless, matching is rarely explored in policy learning. In this work, we propose a matching-based policy learning framework. We adapt standard and bias-corrected matching methods to estimate an alternative form of the value function: the advantage function, which can be interpreted as the expected improvement achieved by implementing a given policy compared to the…
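To make the idea of matching-based policy evaluation concrete, here is a minimal sketch in Python. It is not the authors' estimator: it uses plain nearest-neighbor matching with no bias correction, and the function names `matched_effects` and `policy_improvement` are hypothetical. Because the abstract above is cut off before naming the comparison policy, the baseline is left as an explicit argument rather than assumed.

```python
import numpy as np

def matched_effects(X, T, Y, n_neighbors=1):
    """Estimate a treatment effect for each unit by nearest-neighbor matching.

    Each unit's unobserved potential outcome is imputed as the mean outcome
    of its closest units (Euclidean distance) in the opposite treatment arm.
    Illustrative only: no bias correction is applied.
    """
    X = np.asarray(X, dtype=float)
    T = np.asarray(T, dtype=int)
    Y = np.asarray(Y, dtype=float)
    tau = np.empty(len(Y))
    for i in range(len(Y)):
        # Candidate matches are the units that received the other treatment.
        opposite = np.flatnonzero(T != T[i])
        dist = np.linalg.norm(X[opposite] - X[i], axis=1)
        nearest = opposite[np.argsort(dist)[:n_neighbors]]
        y_other = Y[nearest].mean()
        # Effect = treated outcome minus control outcome.
        tau[i] = Y[i] - y_other if T[i] == 1 else y_other - Y[i]
    return tau

def policy_improvement(X, tau, policy, baseline):
    """Estimated gain in mean outcome from following `policy` instead of `baseline`.

    Both arguments are functions mapping covariates to 0/1 treatment decisions.
    The choice of baseline is an assumption supplied by the caller.
    """
    X = np.asarray(X, dtype=float)
    pi = np.asarray(policy(X), dtype=float)
    pi0 = np.asarray(baseline(X), dtype=float)
    return float(np.mean((pi - pi0) * tau))

# Toy data (made up for illustration): treatment helps when x is small.
X = np.array([[0.0], [0.1], [1.0], [1.1]])
T = np.array([1, 0, 1, 0])
Y = np.array([2.0, 1.0, 0.0, 1.0])

tau_hat = matched_effects(X, T, Y)

def treat_small_x(X):
    return X[:, 0] < 0.5

def treat_nobody(X):
    return np.zeros(len(X))

gain = policy_improvement(X, tau_hat, treat_small_x, treat_nobody)
```

A policy learner would then search a policy class for the rule that maximizes this estimated gain; the sketch only covers the evaluation step for a single candidate rule.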
Taxonomy
Topics: Social Policy and Reform Studies · Local Government Finance and Decentralization
