A New Flexible Train-Test Split Algorithm, an approach for choosing among the Hold-out, K-fold cross-validation, and Hold-out iteration
Zahra Bami, Ali Behnampour, Aniruddha Bora, Hassan Doosti

TL;DR
This paper introduces a flexible framework for selecting the most appropriate data partitioning strategy in machine learning, demonstrating that the optimal validation method varies with algorithm, dataset, and metric.
Contribution
The paper presents a systematic Python-based approach to compare various validation schemes across multiple algorithms and datasets, highlighting the importance of tailored validation strategies.
Findings
No single validation method is best for all scenarios.
Validation strategy effectiveness depends on algorithm, dataset, and metric.
Choosing the validation strategy flexibly, rather than by default, improves the accuracy of model evaluation.
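To make the three partitioning schemes named in the title concrete, here is a minimal standard-library sketch of how each one could generate train/test index splits. It is not the authors' framework: the function names, the 20% test fraction, the seeds, and the reading of "hold-out iteration" as a hold-out split repeated with different random seeds are all assumptions made for illustration.

```python
import random


def holdout_split(n, test_fraction=0.2, seed=0):
    """Single hold-out: one shuffled train/test partition of range(n)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(n * test_fraction))
    # Returned as a one-element list so all schemes share the same shape.
    return [(idx[n_test:], idx[:n_test])]


def kfold_splits(n, k=5, seed=0):
    """K-fold CV: every sample is in the test set exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for f, fold in enumerate(folds) if f != i for j in fold], folds[i])
        for i in range(k)
    ]


def repeated_holdout_splits(n, repeats=5, test_fraction=0.2, seed=0):
    """Assumed meaning of 'hold-out iteration': independent hold-out
    splits drawn with different seeds; test sets may overlap."""
    return [holdout_split(n, test_fraction, seed + r)[0] for r in range(repeats)]


# Hypothetical comparison on 20 samples: list the (train, test) sizes per scheme.
schemes = {
    "hold-out": holdout_split(20),
    "5-fold CV": kfold_splits(20, k=5),
    "hold-out iteration (5x)": repeated_holdout_splits(20, repeats=5),
}
for name, splits in schemes.items():
    print(name, [(len(train), len(test)) for train, test in splits])
```

In a comparison like the one the paper describes, each (train, test) pair would be used to fit and score a model, and the per-scheme scores would then be compared for a given algorithm, dataset, and metric. The key structural difference is that k-fold test sets partition the data, while repeated hold-out test sets are drawn independently and can overlap.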
Abstract
Choosing an appropriate strategy for partitioning data into training and evaluation sets is a critical step in machine learning, yet validation methods are often selected using default or conventional settings without considering their impact on generalizability and real-world performance. Common approaches such as hold-out validation or k-fold cross-validation with fixed k values are frequently applied based solely on empirical practice. To address this issue, we propose a flexible Python-based framework that systematically examines how different validation strategies affect predictive performance across seven widely used machine learning algorithms, including Decision Trees, K-Nearest Neighbors, Naive Bayes variants, Logistic Regression, calibrated linear Support Vector Machines, and histogram-based gradient boosting. The framework evaluates these methods under a wide range of…
