Curse of Small Sample Size in Forecasting of the Active Cases in COVID-19 Outbreak
Mert Nakıp, Onur Çopur, Cüneyt Güzeliş

TL;DR
This study investigates why machine learning models struggle to forecast COVID-19 active cases accurately over the long term. It highlights the impact of small sample sizes and shows that simple models perform best within a short 3-day window.
Contribution
The paper analyzes the limitations of complex machine learning models in COVID-19 forecasting due to small sample sizes and compares feature selection methods and models to improve short-term prediction accuracy.
Findings
Linear regression achieves high accuracy for 2-week forecasts
Complex models fail to generalize well with limited data
Short-term predictions (up to 3 days) are feasible with current data
Abstract
During the COVID-19 pandemic, many attempts have been made to predict the number of cases and other future trends of the outbreak. However, these attempts fail to predict, reliably and within acceptable accuracy, the medium- and long-term evolution of fundamental features of the COVID-19 outbreak. This paper explains why machine learning models fail in this particular forecasting problem. It shows that simple linear regression models reliably provide high prediction accuracy, but only for a 2-week period, and that relatively complex machine learning models, which have the potential to learn long-term predictions with low errors, fail to produce good predictions with high generalization ability. The paper suggests that the lack of a sufficient number of samples is the source of low prediction…
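The abstract's central argument, that a flexible model can fit a short series exactly yet forecast it poorly, can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' method or data: it uses a synthetic series (a linear trend plus a fixed, made-up noise pattern) instead of COVID-19 case counts, ten training days, and a degree-9 interpolating polynomial as a stand-in for a "complex model". The helper names `fit_line`, `lagrange_eval` and `mae` are hypothetical, and the 3-day horizon only mirrors the short-term window mentioned in the summary.

```python
from typing import List, Sequence, Tuple


def fit_line(ys: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: ordinary least squares fit of y = a + b*t for t = 0..n-1."""
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    b = sxy / sxx
    return y_mean - b * t_mean, b


def lagrange_eval(ys: Sequence[float], x: float) -> float:
    """Hypothetical helper: evaluate the degree-(n-1) polynomial through (t, ys[t]) at x."""
    total = 0.0
    for i, yi in enumerate(ys):
        weight = 1.0
        for j in range(len(ys)):
            if j != i:
                weight *= (x - j) / (i - j)
        total += yi * weight
    return total


def mae(pred: Sequence[float], true: Sequence[float]) -> float:
    """Mean absolute error between predictions and targets."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


# Synthetic toy series (NOT real case counts): trend 100 + 5t plus made-up noise.
noise = [3, -2, 4, -3, 2, -4, 3, -2, 1, -3]
train: List[float] = [100 + 5 * t + e for t, e in enumerate(noise)]  # 10 "observed" days
horizon = 3
future_t = list(range(len(train), len(train) + horizon))
truth = [100 + 5 * t for t in future_t]  # noise-free trend on the held-out days

# Simple model: a straight line fitted by least squares.
a, b = fit_line(train)
linear_fit = [a + b * t for t in range(len(train))]
linear_pred = [a + b * t for t in future_t]

# "Complex" stand-in: a polynomial passing exactly through every training point.
interp_fit = [lagrange_eval(train, t) for t in range(len(train))]
interp_pred = [lagrange_eval(train, t) for t in future_t]

results = {
    "linear": (mae(linear_fit, train), mae(linear_pred, truth)),
    "interpolant": (mae(interp_fit, train), mae(interp_pred, truth)),
}
for name, (train_err, test_err) in results.items():
    print(f"{name:12s} train MAE = {train_err:10.2f}  {horizon}-day-ahead MAE = {test_err:10.2f}")
```

In this toy setting the interpolant reaches zero training error, while its 3-day-ahead error is far larger than the straight line's: with only ten samples, the extra flexibility is spent fitting noise rather than the trend.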
Taxonomy
Topics: COVID-19 diagnosis using AI · Anomaly Detection Techniques and Applications · COVID-19 epidemiological studies
Methods: Feature Selection · Linear Regression
