In-Sample Evaluation of Subgroups Identified by Generic Machine Learning

Shuoxun Xu; Xinzhou Guo

arXiv:2605.03141 · stat.ME · May 6, 2026

TL;DR

This paper introduces a method for evaluating subgroups identified by machine learning on the same dataset used to identify them, addressing the selection bias of the usual in-sample evaluation.

Contribution

The paper proposes a conditional adaptive perturbation approach that removes selection bias in in-sample subgroup evaluation and applies to complex, data-dependent subgroups.

Findings

01. The method yields valid inference whether or not regularity conditions hold.

02. The approach is model-free and suits black-box subgroup identification.

03. A re-analysis of the ACTG 175 trial demonstrates its effectiveness.

Abstract

When a subgroup is identified from the data, it must be evaluated in a replicable way. The usual in-sample approach, which evaluates the post-hoc identified subgroup as predefined, might suffer from selection bias. This issue of in-sample evaluation of data-dependent objects is well recognized but particularly challenging here. Unlike discrete or finite-dimensional data-dependent objects addressed before, the selection bias here is induced by post-hoc identified subgroups, data-dependent sets potentially defined by infinite-dimensional functionals with nonsmooth boundaries known as nonregularity. The out-of-sample approach, which splits data for subgroup identification and evaluation, can help address selection bias but might suffer from efficiency loss and instability. In this paper, we propose a conditional adaptive perturbation approach to remove selection bias in in-sample subgroup…
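
The selection bias described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' conditional adaptive perturbation approach; it only contrasts the naive in-sample evaluation with a simple sample split. Everything in it is an assumption made for illustration: the data-generating process (one uniform covariate, a randomized binary treatment, and a true treatment effect of zero everywhere), the threshold-search rule used to identify a subgroup, and the hypothetical function names `diff_in_means`, `pick_threshold`, and `simulate`.

```python
import numpy as np


def diff_in_means(y, a, mask):
    """Treated-minus-control mean outcome within the subgroup `mask`."""
    treated = mask & (a == 1)
    control = mask & (a == 0)
    if treated.sum() < 2 or control.sum() < 2:
        return np.nan  # too few observations to estimate an effect
    return y[treated].mean() - y[control].mean()


def pick_threshold(x, y, a, grid):
    """Toy subgroup identification (an assumption, not the paper's method):
    choose the cutoff c whose subgroup {x > c} has the largest estimated effect."""
    effects = np.array([diff_in_means(y, a, x > c) for c in grid])
    return grid[np.nanargmax(effects)]


def simulate(n_reps=500, n=400, seed=0):
    """Average estimated subgroup effect under a true effect of zero.

    Returns (naive_mean, split_mean): the naive estimate reuses the full
    sample for identification and evaluation; the split estimate identifies
    on the first half and evaluates on the second half.
    """
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 0.7, 15)  # candidate cutoffs (assumed)
    naive, split = [], []
    half = n // 2
    for _ in range(n_reps):
        x = rng.uniform(size=n)
        a = rng.integers(0, 2, size=n)  # randomized treatment
        y = rng.normal(size=n)          # outcome unrelated to x and a

        # Naive in-sample evaluation: treat the selected subgroup as predefined.
        c_full = pick_threshold(x, y, a, grid)
        naive.append(diff_in_means(y, a, x > c_full))

        # Out-of-sample evaluation: identify on one half, evaluate on the other.
        c_half = pick_threshold(x[:half], y[:half], a[:half], grid)
        split.append(diff_in_means(y[half:], a[half:], x[half:] > c_half))
    return float(np.nanmean(naive)), float(np.nanmean(split))


if __name__ == "__main__":
    naive_mean, split_mean = simulate()
    print(f"naive in-sample mean estimate: {naive_mean:.3f}")
    print(f"split-sample mean estimate:    {split_mean:.3f}")
```

In this setup the naive estimate averages noticeably above zero even though no subgroup benefits, while the split-sample estimate centers near zero at the cost of using only half the data for each step, which is the efficiency loss the abstract mentions.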
