Sample Splitting as an M-Estimator with Application to Physical Activity Scoring
Eli S. Kravitz, Raymond J. Carroll, David Ruppert

TL;DR
This paper studies sample splitting, viewed as an M-estimator, as a way to obtain valid inference in physical activity research when the same data are used both to create a score and to model an outcome.
Contribution
It introduces a new application of sample splitting to physical activity scoring, derives the estimator's limiting distribution, and analyzes the use of multiple sample splits.
Findings
Sample splitting provides valid inference for physical activity scores.
Repeating the split many times leads, in the limit, to an estimator defined by a set of estimating equations.
The estimator's limiting distribution is derived theoretically.
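To illustrate why a single split can be treated as an M-estimator, one can stack the score-construction equations (solved on one half) and the outcome-model equations (solved on the other half) into one system. The notation here is an assumption for illustration, not the paper's own: $O_i$ is subject $i$'s data, $\delta_i \in \{0,1\}$ is an independent random split indicator, $\beta$ indexes the score, $\gamma$ the outcome model, and $\psi_1, \psi_2$ are generic estimating functions. Under the usual regularity conditions, the textbook M-estimator (sandwich) result then applies; this is the generic form, not the paper's specific derived expression.

```latex
% Stacked estimating equations for one random split (illustrative notation)
\sum_{i=1}^{n} \Psi_i(\theta)
  = \sum_{i=1}^{n}
    \begin{pmatrix}
      \delta_i \, \psi_1(O_i; \beta) \\
      (1 - \delta_i) \, \psi_2(O_i; \beta, \gamma)
    \end{pmatrix}
  = 0,
  \qquad \theta = (\beta^\top, \gamma^\top)^\top .

% Generic M-estimator limit (sandwich form)
\sqrt{n}\,(\hat\theta - \theta_0) \xrightarrow{d}
  N\!\left(0,\; A^{-1} B \, A^{-\top}\right),
\qquad
A = E\!\left\{ \frac{\partial \Psi_i(\theta_0)}{\partial \theta^\top} \right\},
\qquad
B = E\!\left\{ \Psi_i(\theta_0)\, \Psi_i(\theta_0)^\top \right\}.
```

Because $\delta_i$ is drawn independently of the data, $(O_i, \delta_i)$ are i.i.d., which is what lets the split estimator fit the standard M-estimator framework.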
Abstract
Sample splitting is widely used in statistical applications, classically in classification and more recently for inference after model selection. Motivated by problems in the study of diet, physical activity, and health, we consider a new application of sample splitting. Physical activity researchers wanted to create a scoring system to quickly assess physical activity levels. A score is created using a large cohort study. Then, using the same data, this score serves as a covariate in a model for the risk of disease or mortality. Because the data are used twice in this way, standard errors and confidence intervals from fitting the second model are not valid. To allow proper inference, sample splitting can be used: one builds the score with a random half of the data and then uses the score when fitting a model to the other half. We derive the limiting distribution…
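The split procedure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the excerpt does not say how the score is built, so the score here is a hypothetical least-squares combination of simulated activity measures fit to a hypothetical reference measurement (`target`), and the outcome model is a logistic regression of a simulated disease indicator on the score. All data, names, and coefficients are invented for the example.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients for y ~ X (X already contains an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson; returns coefficients and model-based SEs."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                        # score of the log-likelihood
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se


def split_sample_fit(activity, target, disease, rng):
    """Build the score on a random half (A), then fit the outcome model on the other half (B).

    Hypothetical score: OLS fit of `target` on `activity` using half A only.
    """
    n = activity.shape[0]
    idx = rng.permutation(n)
    half_a, half_b = idx[: n // 2], idx[n // 2:]

    # Half A: estimate the score weights.
    Xa = np.column_stack([np.ones(len(half_a)), activity[half_a]])
    w = fit_ols(Xa, target[half_a])

    # Half B: apply the frozen weights, then fit the disease model on the score.
    score_b = np.column_stack([np.ones(len(half_b)), activity[half_b]]) @ w
    Xb = np.column_stack([np.ones(len(half_b)), score_b])
    beta, se = fit_logistic(Xb, disease[half_b])
    return w, beta, se


# Usage on simulated data (all values invented for illustration).
rng = np.random.default_rng(0)
n = 2000
true_w = np.array([1.0, 0.5, -0.5])
activity = rng.normal(size=(n, 3))
target = activity @ true_w + rng.normal(scale=0.5, size=n)
eta = -1.0 + 0.8 * (activity @ true_w)
disease = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-eta))).astype(float)

w, beta, se = split_sample_fit(activity, target, disease, rng)
print("score weights:", np.round(w, 3))
print("score coefficient:", round(beta[1], 3), "SE:", round(se[1], 3))
```

The model-based standard errors from half B treat the score as fixed. That is reasonable here because half B played no part in building the score, which is the point of splitting; they do not account for variability due to the particular random split.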
Taxonomy
Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Nutritional Studies and Diet
