In-Sample Evaluation of Subgroups Identified by Generic Machine Learning
Shuoxun Xu, Xinzhou Guo

TL;DR
This paper introduces a method for evaluating machine-learning-identified subgroups on the same dataset used to identify them, correcting the selection bias that arises when a post-hoc subgroup is evaluated as if it were predefined.
Contribution
It proposes a conditional adaptive perturbation approach that removes selection bias in in-sample subgroup evaluation, applicable to complex, data-dependent subgroups.
Findings
The method achieves valid inference whether or not regularity conditions hold.
The approach is model-free and works with black-box subgroup identification.
Its effectiveness is demonstrated through a re-analysis of the ACTG 175 trial.
Abstract
When a subgroup is identified from the data, it must be evaluated in a replicable way. The usual in-sample approach, which evaluates the post-hoc identified subgroup as predefined, might suffer from selection bias. This issue of in-sample evaluation of data-dependent objects is well recognized but particularly challenging here. Unlike discrete or finite-dimensional data-dependent objects addressed before, the selection bias here is induced by post-hoc identified subgroups, data-dependent sets potentially defined by infinite-dimensional functionals with nonsmooth boundaries known as nonregularity. The out-of-sample approach, which splits data for subgroup identification and evaluation, can help address selection bias but might suffer from efficiency loss and instability. In this paper, we propose a conditional adaptive perturbation approach to remove selection bias in in-sample subgroup…
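The selection bias described in the abstract can be seen in a small simulation. The sketch below is not the paper's conditional adaptive perturbation approach; it only contrasts the usual in-sample evaluation with the out-of-sample (sample-splitting) approach on synthetic data where the true treatment effect is zero in every subgroup. The identifier `identify_subgroup`, the sample sizes, and the data-generating process are all hypothetical choices made for illustration.

```python
import numpy as np


def effect_estimate(y, t, mask):
    """Difference in mean outcome (treated minus control) inside the subgroup mask."""
    return y[mask & (t == 1)].mean() - y[mask & (t == 0)].mean()


def identify_subgroup(x, y, t):
    """Toy identifier (an assumption, not the paper's): choose the covariate j and
    side (x_j > 0 or x_j < 0) whose subgroup has the largest estimated effect."""
    best = None
    for j in range(x.shape[1]):
        for sign in (1, -1):
            est = effect_estimate(y, t, sign * x[:, j] > 0)
            if best is None or est > best[0]:
                best = (est, j, sign)
    return best[1], best[2]


def simulate(n=400, p=10, reps=500, seed=0):
    """Return the average naive in-sample estimate and the average split-sample estimate."""
    rng = np.random.default_rng(seed)
    naive, split = [], []
    half = n // 2
    for _ in range(reps):
        x = rng.normal(size=(n, p))
        t = rng.integers(0, 2, size=n)   # randomized treatment assignment
        y = rng.normal(size=n)           # true effect is zero in every subgroup

        # In-sample: identify and evaluate the subgroup on the same data.
        j, s = identify_subgroup(x, y, t)
        naive.append(effect_estimate(y, t, s * x[:, j] > 0))

        # Out-of-sample: identify on the first half, evaluate on the second half.
        j, s = identify_subgroup(x[:half], y[:half], t[:half])
        split.append(effect_estimate(y[half:], t[half:], s * x[half:, j] > 0))
    return float(np.mean(naive)), float(np.mean(split))


if __name__ == "__main__":
    naive_mean, split_mean = simulate()
    print(f"naive in-sample mean estimate: {naive_mean:.3f}")
    print(f"split-sample mean estimate:    {split_mean:.3f}")
```

Under these assumptions the naive average sits clearly above zero even though no subgroup has a real effect, while the split-sample average stays near zero. The split estimate uses only half the data for evaluation, which is the efficiency loss the abstract mentions.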
