Fusion Sampling Validation in Data Partitioning for Machine Learning
Christopher Godwin Udomboso, Caston Sigauke, Ini Adinya

TL;DR
This paper introduces Fusion Sampling Validation (FSV), a hybrid data partitioning method combining SRS and KFCV, which improves accuracy and reliability in machine learning model evaluation, especially for large datasets and limited resources.
Contribution
The study proposes and validates FSV, a novel hybrid sampling approach that enhances data partitioning accuracy over traditional methods in machine learning.
Findings
FSV outperforms SRS and KFCV in accuracy and reliability.
FSV achieves lower bias and mean squared error.
FSV is effective for large datasets and resource-constrained environments.
Abstract
Effective data partitioning is crucial in machine learning. Traditional cross-validation methods like K-Fold Cross-Validation (KFCV) enhance model robustness but often compromise generalisation assessment due to high computational demands and extensive data shuffling. To address these issues, the study considers integrating Simple Random Sampling (SRS), which, although it generally provides representative samples, can produce non-representative sets when the data are imbalanced. The study introduces a hybrid model, Fusion Sampling Validation (FSV), combining SRS and KFCV to optimise data partitioning. FSV aims to minimise biases and merge the simplicity of SRS with the accuracy of KFCV. The study used three datasets of 10,000, 50,000, and 100,000 samples, generated with a normal distribution (mean 0, variance 1) and initialised with seed 42. KFCV was performed with five folds and ten repetitions,…
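The experimental setup described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it covers only the two baseline partitioning schemes (SRS and repeated KFCV) on simulated standard-normal data, and it does not attempt to reproduce FSV itself, whose combination rule is not given in the text above. The 20% SRS fraction, the use of the sample mean as the quantity being estimated, the single shared random generator, and all function names are assumptions made for the example.

```python
import numpy as np

SEED = 42                          # seed stated in the abstract
SIZES = (10_000, 50_000, 100_000)  # dataset sizes stated in the abstract
K_FOLDS = 5                        # five folds, as stated
N_REPEATS = 10                     # ten repetitions, as stated


def simple_random_sample(data, frac, rng):
    """Draw a simple random sample without replacement.

    The sampling fraction is an assumption; the abstract does not state it.
    """
    n = int(len(data) * frac)
    idx = rng.choice(len(data), size=n, replace=False)
    return data[idx]


def repeated_kfold_indices(n, k, repeats, rng):
    """Yield (train_idx, test_idx) pairs for repeated K-fold.

    Each repetition reshuffles the indices and cuts them into k folds.
    """
    for _ in range(repeats):
        folds = np.array_split(rng.permutation(n), k)
        for i in range(k):
            test_idx = folds[i]
            train_idx = np.concatenate(
                [folds[j] for j in range(k) if j != i]
            )
            yield train_idx, test_idx


def run(sizes=SIZES, seed=SEED, srs_frac=0.2):
    """Estimate the mean of N(0, 1) data under SRS and repeated KFCV."""
    rng = np.random.default_rng(seed)
    results = {}
    for n in sizes:
        # Variance 1 corresponds to a standard deviation (scale) of 1.
        data = rng.normal(loc=0.0, scale=1.0, size=n)
        srs_mean = simple_random_sample(data, srs_frac, rng).mean()
        fold_means = [
            data[test_idx].mean()
            for _, test_idx in repeated_kfold_indices(
                n, K_FOLDS, N_REPEATS, rng
            )
        ]
        results[n] = {
            "srs_mean": float(srs_mean),
            "kfcv_mean": float(np.mean(fold_means)),
            "n_splits": len(fold_means),
        }
    return results


if __name__ == "__main__":
    for n, stats in run().items():
        print(n, stats)
```

Because the true mean is 0, the bias of either estimate in this sketch is simply the estimate itself, which makes it easy to compare the two schemes across dataset sizes.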
Taxonomy
Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Face and Expression Recognition
