Impact of Comprehensive Data Preprocessing on Predictive Modelling of COVID-19 Mortality
Sangita Das, Subhrajyoti Maji

TL;DR
This study demonstrates that a tailored data preprocessing pipeline significantly improves the accuracy of machine learning models predicting COVID-19 mortality, emphasizing the importance of customized data handling techniques.
Contribution
The paper introduces a novel, comprehensive preprocessing pipeline that enhances predictive accuracy for COVID-19 mortality models beyond standard methods.
Findings
Custom pipeline improves model performance (e.g., RMSE and R-squared)
MLP Regressor outperforms DecisionTree in this context
Tailored preprocessing techniques are valuable for predictive modeling
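For readers unfamiliar with the two metrics named in the findings, the sketch below computes RMSE and R-squared from their standard definitions. It is illustrative only: the function names are hypothetical and the numbers are toy values, not results from the paper.

```python
from math import sqrt


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    n = len(y_true)
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values, not data from the study.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(rmse(observed, predicted), r_squared(observed, predicted))
```

Lower RMSE and higher R-squared both indicate a closer fit, so an effective preprocessing pipeline should move the two metrics in opposite directions.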
Abstract
Accurate predictive models are crucial for analysing COVID-19 mortality trends. This study evaluates the impact of a custom data preprocessing pipeline on ten machine learning models predicting COVID-19 mortality using data from Our World in Data (OWID). Our pipeline differs from a standard preprocessing pipeline through four key steps. Firstly, it transforms weekly reported totals into daily updates, correcting reporting biases and providing more accurate estimates. Secondly, it uses localised outlier detection and processing to preserve data variance and enhance accuracy. Thirdly, it utilises computational dependencies among columns to ensure data consistency. Finally, it incorporates an iterative feature selection process to optimise the feature set and improve model performance. Results show a significant improvement with the custom pipeline: the MLP Regressor achieved a test RMSE…
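The first three steps of the pipeline can be illustrated with a minimal sketch. This is not the authors' implementation: the even spreading of weekly totals, the median/MAD outlier rule, the window size, the threshold `k`, and all function names are assumptions chosen for illustration. The iterative feature selection step is omitted because it depends on the model being trained.

```python
from itertools import accumulate
from statistics import median


def spread_weekly_totals(values, period=7):
    """Spread each non-zero report evenly over the days since the last report.

    Assumption: zeros between reports mean "not reported", not "no deaths".
    At most `period` days are covered by any single report.
    """
    daily = [0.0] * len(values)
    last_report = -1  # index of the previous reporting day
    for i, v in enumerate(values):
        if v > 0:
            start = max(last_report + 1, i - period + 1)
            span = i - start + 1
            for j in range(start, i + 1):
                daily[j] = v / span
            last_report = i
    return daily


def clip_local_outliers(values, window=7, k=3.0):
    """Replace points far from their local median with that median.

    "Local" means a centred window of `window` points, so a spike is judged
    against its neighbours rather than against the whole series.
    """
    half = window // 2
    cleaned = list(values)
    for i, v in enumerate(values):
        neighbours = values[max(0, i - half):i + half + 1]
        med = median(neighbours)
        mad = median([abs(x - med) for x in neighbours])
        if mad > 0 and abs(v - med) > k * mad:
            cleaned[i] = med
    return cleaned


def rebuild_cumulative(daily):
    """Recompute a cumulative column from its daily counterpart."""
    return list(accumulate(daily))


# Toy series: two weekly reports of 70 and 140, zeros in between.
weekly = [0, 0, 0, 0, 0, 0, 70, 0, 0, 0, 0, 0, 0, 140]
daily = spread_weekly_totals(weekly)
total = rebuild_cumulative(daily)
print(daily[0], daily[13], total[-1])
```

Recomputing the cumulative column after adjusting the daily one keeps the two consistent, which is the kind of computational dependency between columns that the abstract refers to. In the OWID data, `new_deaths` and `total_deaths` are one such pair.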
Taxonomy
Topics: Artificial Intelligence in Healthcare · Machine Learning in Healthcare · COVID-19 diagnosis using AI
Methods: Sparse Evolutionary Training · Feature Selection
