Personalized Policy Learning through Discrete Experimentation: Theory and Empirical Evidence
Zhiqi Zhang, Zhiyu Zeng, Ruohan Zhan, Dennis Zhang

TL;DR
This paper develops a deep learning framework for personalized continuous policy learning from RCT data with discrete treatments, addressing limitations of traditional discretization methods and demonstrating significant empirical improvements.
Contribution
The paper introduces a theoretically grounded deep learning framework for personalized policy learning from limited RCT treatment levels, with proven asymptotic properties and practical validation.
Findings
Significant improvement in policy value estimation accuracy.
Enhanced ability to identify optimal personalized policies.
Empirical validation shows superior performance over benchmarks.
Abstract
Randomized Controlled Trials (RCTs), or A/B testing, have become the gold standard for optimizing various operational policies on online platforms. However, RCTs on these platforms typically cover a limited number of discrete treatment levels, while the platforms increasingly face complex operational challenges involving optimizing continuous variables, such as pricing and incentive programs. The current industry practice involves discretizing these continuous decision variables into several treatment levels and selecting the optimal discrete treatment level. This approach, however, often leads to suboptimal decisions as it cannot accurately extrapolate performance for untested treatment levels and fails to account for heterogeneity in treatment effects across user characteristics. This study addresses these limitations by developing a theoretically solid and empirically verified…
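The gap between picking the best tested level and learning a personalized continuous policy can be illustrated with a small simulation. The sketch below is not the paper's deep learning framework: it swaps in an ordinary least-squares fit that is quadratic in the treatment, and every name and number in it (true_mean_outcome, the four treatment levels, the noise scale) is a hypothetical chosen for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ground truth (an assumption, not from the paper): the expected
# outcome is concave in the treatment t, with a slope that depends on a
# user covariate x.
def true_mean_outcome(x, t):
    return (1.0 + 2.0 * x) * t - 0.5 * t ** 2

# Simulated RCT that only tests a few discrete treatment levels.
levels = np.array([0.0, 1.0, 2.0, 3.0])
n = 20000
x = rng.uniform(0.0, 1.0, size=n)
t = rng.choice(levels, size=n)
y = true_mean_outcome(x, t) + rng.normal(0.0, 0.5, size=n)

# Baseline: choose the single tested level with the best average outcome.
arm_means = np.array([y[t == a].mean() for a in levels])
best_level = levels[np.argmax(arm_means)]

# Stand-in for a learned response model: least squares on
# y ~ b0 + b1*x + b2*t + b3*x*t + b4*t^2.
design = np.column_stack([np.ones(n), x, t, x * t, t ** 2])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)

def personalized_treatment(x_new):
    # Maximizer of the fitted quadratic in t (valid when beta[4] < 0),
    # clipped to the range covered by the experiment.
    t_star = -(beta[2] + beta[3] * x_new) / (2.0 * beta[4])
    return np.clip(t_star, levels.min(), levels.max())

# Compare both policies under the known ground truth on fresh users.
x_eval = rng.uniform(0.0, 1.0, size=5000)
value_discrete = true_mean_outcome(x_eval, best_level).mean()
value_personalized = true_mean_outcome(x_eval, personalized_treatment(x_eval)).mean()

print("best tested level:", best_level)
print("value of best tested level:", round(float(value_discrete), 3))
print("value of personalized continuous policy:", round(float(value_personalized), 3))
```

In this toy setup the fitted model is correctly specified, so the comparison favors the continuous policy by construction. With real RCT data the response surface is unknown, which is the setting the paper's framework targets.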
Taxonomy
Topics: Advanced Causal Inference Techniques · Advanced Bandit Algorithms Research · Mobile Crowdsensing and Crowdsourcing
