fastml: Guarded Resampling Workflows for Safer Automated Machine Learning in R
Selcuk Korkmaz, Dincer Goksuluk, Eda Karaismailoglu

TL;DR
fastml is an R package that guards against preprocessing leakage through guarded resampling, which re-estimates preprocessing within each resample. It supports grouped and time-ordered data and yields more reliable performance estimates.
Contribution
fastml implements guarded resampling in R. Preprocessing is re-estimated within each fold and applied only to that fold's assessment data, which prevents leakage and makes model evaluation more reliable.
Findings
Global preprocessing inflates performance estimates relative to guarded resampling.
Under matched specifications, fastml matches tidymodels performance while simplifying the workflow.
fastml supported consistent benchmarking of survival models across datasets of different sizes.
Abstract
Preprocessing leakage arises when scaling, imputation, or other data-dependent transformations are estimated before resampling, inflating apparent performance while remaining hard to detect. We present fastml, an R package that provides a single-call interface for leakage-aware machine learning through guarded resampling, where preprocessing is re-estimated inside each resample and applied to the corresponding assessment data. The package supports grouped and time-ordered resampling, blocks high-risk configurations, audits recipes for external dependencies, and includes sandboxed execution and integrated model explanation. We evaluate fastml with a Monte Carlo simulation contrasting global and fold-local normalization, a usability comparison with tidymodels under matched specifications, and survival benchmarks across datasets of different sizes. The simulation demonstrates that global…
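The difference between global and fold-local normalization can be sketched in a few lines. This is a language-neutral illustration in Python, not fastml's R interface, which is not shown here. It assumes z-score scaling and contiguous, unshuffled k-fold splits, and every function name (`kfold_indices`, `fit_scaler`, `apply_scaler`, `global_then_resample`, `guarded_resample`) is hypothetical. The leaky version fits one scaler on all rows before splitting. The guarded version refits the scaler on each fold's analysis rows and only then applies it to the held-out assessment rows.

```python
from statistics import fmean, pstdev


def kfold_indices(n, k):
    """Split row indices 0..n-1 into k contiguous folds (no shuffling, for clarity)."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_scaler(values):
    """Estimate centering and scaling parameters from the given values only."""
    mu = fmean(values)
    sd = pstdev(values) or 1.0  # fall back to 1.0 if the column is constant
    return mu, sd


def apply_scaler(values, params):
    """Standardize values with previously estimated (mean, sd)."""
    mu, sd = params
    return [(v - mu) / sd for v in values]


def global_then_resample(x, k):
    """Leaky pattern: one scaler is fit on all rows, assessment rows included."""
    params = fit_scaler(x)
    scaled = apply_scaler(x, params)
    return [([scaled[i] for i in fold], params) for fold in kfold_indices(len(x), k)]


def guarded_resample(x, k):
    """Guarded pattern: the scaler is re-fit on each fold's analysis rows only."""
    results = []
    for fold in kfold_indices(len(x), k):
        held_out = set(fold)
        analysis = [x[i] for i in range(len(x)) if i not in held_out]
        params = fit_scaler(analysis)  # assessment rows never influence params
        assessment = apply_scaler([x[i] for i in fold], params)
        results.append((assessment, params))
    return results


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 100.0, 6.0]
    for name, fn in [("global", global_then_resample), ("guarded", guarded_resample)]:
        for j, (_, (mu, sd)) in enumerate(fn(x, k=3)):
            print(f"{name} fold {j}: mean={mu:.2f}, sd={sd:.2f}")
```

In this toy run, the global scaler uses the same mean (about 19.33) for every fold because it has already seen the outlying value 100.0. In the guarded run, the fold whose assessment set contains that value is centered at 2.50, using statistics from its analysis rows alone. This mirrors what happens at prediction time, when new data cannot inform preprocessing. Whether and how much such leakage inflates a performance metric depends on the data and model. The sketch only shows where the leak enters.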
