Variation in prediction accuracy due to randomness in data division and fair evaluation using interval estimation
Isao Goto

TL;DR
This study investigates how randomness in data partitioning affects the accuracy of diabetes prediction models and proposes using statistical interval estimation for fair comparison.
Contribution
It demonstrates how randomness that depends on the initial state affects model accuracy, and it introduces interval estimation for equitable evaluation.
Findings
Prediction accuracy varies with initial state-dependent randomness.
Prediction accuracy distribution can be approximated by a normal distribution.
Interval estimation enables fair comparison of models.
Abstract
This paper attempts to answer a "simple question" about building predictive models with machine learning algorithms. Although diagnostic and predictive models for various diseases have been proposed using data from large cohort studies and machine learning algorithms, their generalizability remains a challenge. Several causes have been pointed out, and random partitioning of the dataset is considered to be one of them. In this study, we constructed 33,600 diabetes diagnosis models with "initial state"-dependent randomness using AutoML (an automated machine learning framework) and open diabetes data, and evaluated their prediction accuracy. The results showed that the prediction accuracy followed a distribution that depended on the initial state. Since this distribution could be approximated by a normal distribution, we estimated the expected interval of prediction accuracy using…
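The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's pipeline: the synthetic one-feature dataset, the midpoint-threshold classifier, and the helper names `make_dataset`, `split_accuracy`, and `accuracy_interval` are all hypothetical stand-ins for the open diabetes data and the AutoML models. The interval used here (mean ± 1.96 standard deviations under a normal approximation) is likewise an assumption, since the abstract is truncated before it names the exact estimator.

```python
import random
import statistics


def make_dataset(n=400, seed=0):
    """Synthetic two-class data with one feature (hypothetical stand-in for real data)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2  # balanced classes
        # Class 1 is shifted by one standard deviation, so the classes overlap.
        x = rng.gauss(1.0 if label else 0.0, 1.0)
        data.append((x, label))
    return data


def split_accuracy(data, seed, test_fraction=0.25):
    """Partition the data with the given seed, fit a threshold classifier, return test accuracy."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)  # the only source of randomness: the partition
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]
    # "Training": place the decision threshold midway between the class means.
    mean0 = statistics.fmean(x for x, y in train if y == 0)
    mean1 = statistics.fmean(x for x, y in train if y == 1)
    threshold = (mean0 + mean1) / 2
    correct = sum((x > threshold) == bool(y) for x, y in test)
    return correct / n_test


def accuracy_interval(accuracies, z=1.96):
    """Normal-approximation interval expected to cover about 95% of per-seed accuracies."""
    mean = statistics.fmean(accuracies)
    sd = statistics.stdev(accuracies)
    return mean - z * sd, mean + z * sd


data = make_dataset()
# Same data, same model family; only the partitioning seed changes.
accuracies = [split_accuracy(data, seed) for seed in range(200)]
low, high = accuracy_interval(accuracies)
print(f"min={min(accuracies):.3f} max={max(accuracies):.3f}")
print(f"approx. 95% interval: [{low:.3f}, {high:.3f}]")
```

Under these assumptions, a single reported accuracy is one draw from the per-seed distribution, so two models are more fairly compared by their intervals than by one split each.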
Taxonomy
Topics: Neural Networks and Applications
