Fusion Sampling Validation in Data Partitioning for Machine Learning

Christopher Godwin Udomboso; Caston Sigauke; Ini Adinya

arXiv:2508.01325·cs.LG·August 5, 2025

Open Access

TL;DR

This paper introduces Fusion Sampling Validation (FSV), a hybrid data partitioning method that combines Simple Random Sampling (SRS) and K-Fold Cross-Validation (KFCV). It improves accuracy and reliability in machine learning model evaluation, especially for large datasets and limited computational resources.

Contribution

The study proposes and validates FSV, a novel hybrid sampling approach that enhances data partitioning accuracy over traditional methods in machine learning.

Findings

01. FSV outperforms SRS and KFCV in accuracy and reliability.

02. FSV achieves lower bias and mean squared error.

03. FSV is effective for large datasets and resource-constrained environments.

Abstract

Effective data partitioning is known to be crucial in machine learning. Traditional cross-validation methods like K-Fold Cross-Validation (KFCV) enhance model robustness but often compromise generalisation assessment due to high computational demands and extensive data shuffling. One way to address these issues is to integrate Simple Random Sampling (SRS), although SRS, despite generally providing representative samples, can yield non-representative sets when the data are imbalanced. The study introduces a hybrid model, Fusion Sampling Validation (FSV), combining SRS and KFCV to optimise data partitioning. FSV aims to minimise biases and merge the simplicity of SRS with the accuracy of KFCV. The study used three datasets of 10,000, 50,000, and 100,000 samples, generated with a normal distribution (mean 0, variance 1) and initialised with seed 42. KFCV was performed with five folds and ten repetitions,…
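The experimental setup named in the abstract (standard normal data, seed 42, five folds, ten repetitions) can be illustrated with a minimal standard-library sketch. The abstract does not specify how FSV combines SRS and KFCV, so the combination below (draw one simple random sample, then run repeated k-fold on it) is only an assumption, not the authors' procedure. The function names, the subsample size of 2,000, and the choice of the sample mean as the quantity being estimated are all hypothetical.

```python
import random
import statistics


def simulate_data(n, seed=42):
    """Draw n values from N(0, 1) with a fixed seed, as in the abstract."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def simple_random_sample(data, size, rng):
    """SRS without replacement."""
    return rng.sample(data, size)


def repeated_kfold_indices(n, k=5, repeats=10, rng=None):
    """Yield (train, test) index lists for repeated k-fold cross-validation."""
    rng = rng or random.Random(42)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)                      # fresh shuffle per repetition
        for fold in range(k):
            test = idx[fold::k]               # every k-th index forms one fold
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            yield train, test


def fold_mean_estimates(data, k=5, repeats=10, rng=None):
    """Estimate the mean on each training split (assumed estimand)."""
    estimates = []
    for train, _test in repeated_kfold_indices(len(data), k, repeats, rng):
        estimates.append(statistics.fmean([data[i] for i in train]))
    return estimates


def bias_and_mse(estimates, truth=0.0):
    """Bias and mean squared error of the estimates against the true mean."""
    bias = statistics.fmean(estimates) - truth
    mse = statistics.fmean([(e - truth) ** 2 for e in estimates])
    return bias, mse


rng = random.Random(42)
data = simulate_data(10_000)
# Assumed fusion step: SRS first, then repeated k-fold on the subsample.
subsample = simple_random_sample(data, 2_000, rng)   # hypothetical size
estimates = fold_mean_estimates(subsample, k=5, repeats=10, rng=rng)
bias, mse = bias_and_mse(estimates)
print(f"folds evaluated: {len(estimates)}, bias: {bias:.4f}, MSE: {mse:.6f}")
```

Running the same `bias_and_mse` comparison on the full dataset, or on an SRS split alone, would give the kind of side-by-side bias and MSE figures the findings refer to, under the assumptions stated above.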

Taxonomy

Topics: Machine Learning and Data Classification · Imbalanced Data Classification Techniques · Face and Expression Recognition