A Hamiltonian Monte Carlo Model for Imputation and Augmentation of Healthcare Data
Narges Pourshahrokhi, Samaneh Kouchaki, Kord M. Kober, Christine Miaskowski, Payam Barnaghi

TL;DR
This paper introduces a Bayesian Hamiltonian Monte Carlo method for imputing missing healthcare data and generating augmented samples, improving data quality and model performance in high-dimensional, small-sample datasets.
Contribution
It presents a novel folded Hamiltonian Monte Carlo approach for joint imputation and augmentation of healthcare data, addressing privacy and correlation challenges.
Findings
Enhanced data quality in a cancer symptom assessment dataset
Improved model metrics such as precision, recall, and F1 score
Effective handling of high-dimensional, small-sample healthcare data
Abstract
Missing values exist in nearly all clinical studies because data for a variable or question are not collected or not available. Inadequate handling of missing values can lead to biased results and loss of statistical power in analysis. Existing models usually do not consider privacy concerns or do not utilise the inherent correlations across multiple features to impute the missing values. In healthcare applications, we are usually confronted with high dimensional and sometimes small sample size datasets that need more effective augmentation or imputation techniques. Besides, imputation and augmentation processes are traditionally conducted individually. However, imputing missing values and augmenting data can significantly improve generalisation and avoid bias in machine learning models. A Bayesian approach to impute missing values and creating augmented samples in high dimensional…
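To make the idea of sampling-based imputation concrete, here is a minimal sketch of plain Hamiltonian Monte Carlo drawing the missing entries of a single record. It is not the folded variant the paper proposes. It assumes, purely for illustration, that the record follows a multivariate Gaussian with known mean and covariance. The function name `hmc_impute`, its parameters, and the toy values in the usage lines are hypothetical and do not come from the paper.

```python
import numpy as np

def hmc_impute(x, mu, cov, n_samples=500, step=0.15, n_leapfrog=6, seed=0):
    """Sample the NaN entries of x with plain HMC, assuming x ~ N(mu, cov).

    Returns the boolean mask of missing entries and an array of shape
    (n_samples, n_missing) holding one draw per HMC iteration.
    """
    rng = np.random.default_rng(seed)
    prec = np.linalg.inv(cov)          # precision matrix of the assumed Gaussian
    miss = np.isnan(x)
    full = np.where(miss, mu, x)       # observed values kept, missing start at the mean

    def potential(q):
        # Negative log-density (up to a constant) with missing entries set to q.
        z = full.copy()
        z[miss] = q
        d = z - mu
        return 0.5 * d @ prec @ d

    def grad(q):
        # Gradient of the potential with respect to the missing entries only.
        z = full.copy()
        z[miss] = q
        return (prec @ (z - mu))[miss]

    q = full[miss].copy()
    samples = []
    for _ in range(n_samples):
        p = rng.standard_normal(q.shape)          # fresh momentum each iteration
        q_new, p_new = q.copy(), p.copy()
        p_new -= 0.5 * step * grad(q_new)         # leapfrog: initial half step
        for i in range(n_leapfrog):
            q_new += step * p_new
            if i < n_leapfrog - 1:
                p_new -= step * grad(q_new)
        p_new -= 0.5 * step * grad(q_new)         # leapfrog: final half step
        h_old = potential(q) + 0.5 * p @ p
        h_new = potential(q_new) + 0.5 * p_new @ p_new
        if np.log(rng.uniform()) < h_old - h_new:  # Metropolis accept/reject
            q = q_new
        samples.append(q.copy())
    return miss, np.array(samples)

# Toy usage: two correlated features, the second one missing.
x = np.array([1.0, np.nan])
mu = np.zeros(2)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
miss, draws = hmc_impute(x, mu, cov, n_samples=2000)
imputed = x.copy()
imputed[miss] = draws.mean(axis=0)   # posterior-mean imputation
```

Because each draw is a plausible completion of the record, the same samples could in principle serve both purposes the abstract mentions: their mean as an imputed value, and individual draws as augmented records. The step size and number of leapfrog steps are tuned for this toy example and would need adjusting for other data.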
Taxonomy
Topics: Machine Learning in Healthcare · Statistical Methods and Inference · Bayesian Methods and Mixture Models
