A random shuffle method to expand a narrow dataset and overcome the associated challenges in a clinical study: a heart failure cohort example
Lorenzo Fassina, Alessandro Faragli, Francesco Paolo Lo Muzio, Sebastian Kelle, Carlo Campana, Burkert Pieske, Frank Edelmann, Alessio Alogna

TL;DR
This paper introduces a random shuffle method that artificially expands small clinical datasets, improving predictive modeling in heart failure studies without requiring specific hypotheses or regression models.
Contribution
The study presents a new random shuffle technique that significantly increases dataset size while maintaining statistical validity, outperforming traditional repeated-measures methods.
Findings
Dataset cardinality increased approximately 10-fold with the shuffle method.
Cardinality increased further, to 21 times the original, when the shuffle method was combined with the repeated-measures approach.
Enhanced datasets improved the accuracy of machine learning and regression models.
Abstract
Heart failure (HF) affects at least 26 million people worldwide, so predicting adverse events in HF patients is a major target of clinical data science. However, achieving large sample sizes can be a challenge because of difficulties in patient recruitment and long follow-up times, which aggravate the problem of missing data. Population-enhancing algorithms are therefore crucial to overcome a narrow dataset cardinality (in a clinical dataset, the cardinality is the number of patients in that dataset). The aim of this study was to design a random shuffle method that enhances the cardinality of an HF dataset in a statistically legitimate way, without the need for specific hypotheses or regression models. The cardinality enhancement was validated against an established random repeated-measures method with regard to the correctness in predicting clinical conditions…
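The abstract does not spell out the shuffle procedure, so the following is only a minimal sketch of one generic way a shuffle can raise cardinality. It is an assumption for illustration and not the authors' algorithm. The swapped-in technique is a within-class, column-wise permutation: each synthetic patient receives, for every feature, a value drawn from a real patient with the same label. The function name `shuffle_expand`, its parameters, and the toy values are all hypothetical.

```python
import random


def shuffle_expand(rows, labels, n_copies, seed=0):
    """Hypothetical helper (not from the paper).

    Appends n_copies synthetic copies of the dataset. In each copy, every
    feature column is permuted independently among patients that share the
    same label, so per-class marginal distributions are preserved.
    """
    rng = random.Random(seed)
    n_features = len(rows[0])

    # Group row indices by class label.
    by_label = {}
    for i, lab in enumerate(labels):
        by_label.setdefault(lab, []).append(i)

    out_rows = [list(r) for r in rows]  # start with the real patients
    out_labels = list(labels)

    for _ in range(n_copies):
        new_rows = [list(r) for r in rows]
        for idx in by_label.values():
            for col in range(n_features):
                donors = idx[:]
                rng.shuffle(donors)  # random reassignment within the class
                for target, donor in zip(idx, donors):
                    new_rows[target][col] = rows[donor][col]
        out_rows.extend(new_rows)
        out_labels.extend(labels)

    return out_rows, out_labels


# Toy values invented for illustration only (two features, two classes).
rows = [[61, 35.0], [70, 28.0], [55, 40.0], [66, 55.0], [72, 60.0], [58, 52.0]]
labels = [1, 1, 1, 0, 0, 0]

big_rows, big_labels = shuffle_expand(rows, labels, n_copies=9)
print(len(rows), "->", len(big_rows))  # 6 -> 60, a 10-fold cardinality
```

With `n_copies=9` the sketch yields a 10-fold cardinality, chosen here only to mirror the order of magnitude quoted in the findings. Permuting columns independently discards the within-patient dependence between features, so whether such a scheme is statistically legitimate for a given dataset has to be checked separately.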
