A Reproducible Log-Driven AutoML Framework for Interpretable Pipeline Optimization in Healthcare Risk Prediction
Rui Huang, Lican Huang

TL;DR
This paper presents yvsoucom-iterkit, a reproducible, log-driven AutoML framework for healthcare risk prediction that analyzes component importance and redundancy to improve pipeline performance and robustness.
Contribution
The study introduces a deterministic, log-based AutoML system that enables detailed analysis of pipeline components, interactions, and redundancy for improved healthcare risk prediction.
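To make "log-based" concrete, here is a minimal sketch of how a pipeline configuration might be turned into a traceable log entry with a stable identifier. This is not the yvsoucom-iterkit API: the function names, the field names, the hashing scheme, the component values, and the score are all hypothetical placeholders chosen for illustration.

```python
import hashlib
import json


def pipeline_id(config: dict) -> str:
    """Derive a stable ID from a configuration (hypothetical scheme).

    Keys are sorted before hashing, so the same configuration always
    maps to the same ID regardless of the order its keys were written in.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def make_log_entry(config: dict, seed: int, score: float) -> dict:
    """Bundle configuration, seed, and result into one log record."""
    return {
        "id": pipeline_id(config),
        "seed": seed,
        "config": config,
        "score": score,
    }


# Two spellings of the same pipeline; component values are illustrative only.
config_a = {"augmentation": "smote", "model": "random_forest", "imbalance": "class_weight"}
config_b = {"imbalance": "class_weight", "model": "random_forest", "augmentation": "smote"}

# The score is a placeholder, not a result reported in the paper.
entry = make_log_entry(config_a, seed=0, score=0.81)
print(json.dumps(entry, sort_keys=True))
```

Because the ID depends only on the canonicalized configuration, runs of the same pipeline under different seeds share an ID, which is one simple way to group entries for cross-seed comparisons.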
Findings
Performance is driven by a small subset of components, chiefly augmentation, model choice, and imbalance handling.
Component similarity analysis reveals high redundancy among feature selection variants and augmentation methods.
Ensemble models achieve high and stable predictive performance on healthcare datasets.
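The idea of attributing performance to individual components can be illustrated from logged runs alone. The sketch below is a deliberately simpler stand-in for the Random Forest importance analysis the paper reports: for each component it computes the range of mean scores across that component's options. The function name, the log layout, and the toy scores are assumptions made up for this example and are not taken from the paper.

```python
from collections import defaultdict
from statistics import mean


def component_effects(logs, components):
    """Range of per-option mean scores for each component.

    A larger range suggests the choice made for that component moves
    the score more. This is a crude main-effect summary and ignores
    interactions between components.
    """
    effects = {}
    for comp in components:
        by_option = defaultdict(list)
        for entry in logs:
            by_option[entry["config"][comp]].append(entry["score"])
        option_means = [mean(scores) for scores in by_option.values()]
        effects[comp] = max(option_means) - min(option_means)
    return effects


# Toy log entries with made-up scores, for illustration only.
toy_logs = [
    {"config": {"augmentation": "none", "model": "a"}, "score": 0.70},
    {"config": {"augmentation": "none", "model": "b"}, "score": 0.72},
    {"config": {"augmentation": "aug1", "model": "a"}, "score": 0.80},
    {"config": {"augmentation": "aug1", "model": "b"}, "score": 0.82},
]

effects = component_effects(toy_logs, ["augmentation", "model"])
for name, spread in sorted(effects.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {spread:.3f}")
```

In this toy data the augmentation choice shifts the mean score more than the model choice does, so it ranks first. A tree-based importance measure would additionally capture interactions, which this summary cannot.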
Abstract
Accurate and reproducible disease risk prediction remains challenging due to heterogeneous features, limited samples, and severe class imbalance. This study introduces yvsoucom-iterkit, a deterministic and log-driven automated machine learning framework that formulates pipeline optimization as a fully reproducible, configuration-level system. Each pipeline is encoded as a traceable log entity, enabling analysis of component attribution, interactions, similarity, and cross-seed robustness. Experiments on the Pima Indians Diabetes and Stroke datasets across more than 18,000 pipeline configurations reveal a structured and partially redundant search space, where performance is governed by a small subset of interacting components. Random Forest importance analysis identifies augmentation (0.454), model choice (0.198), and imbalance handling (0.101) as key drivers on Pima, while imbalance…
