# Hierarchical forecasting of COVID-19 cases in Africa using machine learning models

**Authors:** Claris Shoko, Caston Sigauke, Katleho Makatjane

PMC · DOI: 10.3389/fepid.2026.1696282 · Frontiers in Epidemiology · 2026-02-11

## TL;DR

This paper introduces a machine learning-based hierarchical forecasting method to predict COVID-19 cases in Africa, improving accuracy compared to traditional models.

## Contribution

The study proposes a novel bottom-up hierarchical time series forecasting approach using XGBoost and random forest models for accurate disease forecasting in data-scarce regions.

## Key findings

- XGBoost outperformed the other single models, and a weighted ensemble of XGBoost and random forest that favours XGBoost achieved the best mean absolute error, root mean square error, and mean absolute scaled error.
- Southern Africa reported the highest number of cases despite low population density, highlighting health vulnerabilities and socioeconomic factors.
- The bottom-up hierarchical method combined with machine learning proved effective in limited data environments.

## Abstract

The COVID-19 pandemic posed significant challenges for public health systems, especially in Africa, where data scarcity, inadequate healthcare infrastructure, and regional disparities hindered effective forecasting and response efforts. Conventional forecasting methods have faced challenges in adequately addressing the complexity and detail necessary for effective policy interventions at various administrative levels. This study examines the challenge of producing accurate and coherent forecasts of COVID-19 cases within the hierarchical structure of Africa, which includes the continental, regional, and national levels.

The objective is to establish a comprehensive forecasting model that uses hierarchical time series forecasting through a bottom-up reconciliation approach augmented by machine learning algorithms. We employ extreme gradient boosting (XGBoost) and random forest models, then improve predictive accuracy with a weighted average ensemble method. We produce forecasts at the national level and then aggregate them to ensure consistency across all hierarchical levels. The models are evaluated against conventional methods such as ARIMA and exponential smoothing.
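The bottom-up step described above can be illustrated with a minimal sketch. It assumes national forecasts are already available as lists of per-step values; the country-to-region mapping, the function name `bottom_up`, and all numbers are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical country -> region mapping (illustrative, not the paper's hierarchy).
REGION_OF = {
    "South Africa": "Southern Africa",
    "Botswana": "Southern Africa",
    "Kenya": "Eastern Africa",
    "Nigeria": "Western Africa",
}


def bottom_up(national):
    """Sum national forecasts into regional and continental forecasts.

    `national` maps a country name to a list of forecast values, one per
    horizon step. Because upper levels are pure sums of the bottom level,
    the result is coherent by construction.
    """
    horizon = len(next(iter(national.values())))
    regional = defaultdict(lambda: [0.0] * horizon)
    continental = [0.0] * horizon
    for country, path in national.items():
        if len(path) != horizon:
            raise ValueError(f"forecast horizon mismatch for {country}")
        region = REGION_OF[country]
        for step, value in enumerate(path):
            regional[region][step] += value
            continental[step] += value
    return dict(regional), continental


# Made-up two-step-ahead national forecasts.
national_forecasts = {
    "South Africa": [100.0, 110.0],
    "Botswana": [10.0, 12.0],
    "Kenya": [30.0, 33.0],
    "Nigeria": [40.0, 41.0],
}
regional_forecasts, continental_forecast = bottom_up(national_forecasts)
print(regional_forecasts)
print(continental_forecast)
```

Only the national series are modelled in this scheme; the regional and continental figures are never forecast directly, which is what guarantees that the levels add up.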

Empirical findings indicate that XGBoost is the best of the single forecasting models used in this study. Combining the XGBoost and random forest forecasts, with more weight assigned to XGBoost, surpasses all other models in terms of mean absolute error, root mean square error, and mean absolute scaled error. Results further reveal that Southern Africa, despite its low population density, reported the highest number of cases, indicating underlying health vulnerabilities and socioeconomic factors. In summary, the bottom-up hierarchical time series forecasting (HTSF) method, when combined with machine learning, serves as an effective tool for forecasting in environments with limited data availability.
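As a rough illustration of the weighted average ensemble and the three error metrics named above, the following sketch combines two forecast vectors and scores the result. The weight of 0.7 on XGBoost is an assumption chosen only to reflect "more weight on XGBoost"; the abstract does not state the actual weights. The function names and all numbers are hypothetical, and MASE is computed in its common form, scaled by the in-sample error of a one-step naive forecast.

```python
import math


def weighted_ensemble(xgb_pred, rf_pred, w_xgb=0.7):
    """Convex combination of two forecasts; w_xgb=0.7 is an illustrative assumption."""
    if not 0.0 <= w_xgb <= 1.0:
        raise ValueError("w_xgb must lie in [0, 1]")
    if len(xgb_pred) != len(rf_pred):
        raise ValueError("forecasts must have the same length")
    return [w_xgb * x + (1.0 - w_xgb) * r for x, r in zip(xgb_pred, rf_pred)]


def mae(actual, pred):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)


def rmse(actual, pred):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))


def mase(actual, pred, train):
    """MAE scaled by the in-sample MAE of the one-step naive forecast."""
    naive_errors = [abs(train[i] - train[i - 1]) for i in range(1, len(train))]
    scale = sum(naive_errors) / len(naive_errors)
    return mae(actual, pred) / scale


# Made-up series: training history, held-out actuals, and two model forecasts.
train = [10.0, 12.0, 15.0, 14.0, 18.0]
actual = [20.0, 22.0]
xgb_forecast = [19.0, 23.0]
rf_forecast = [22.0, 20.0]

combined = weighted_ensemble(xgb_forecast, rf_forecast, w_xgb=0.7)
print("MAE :", mae(actual, combined))
print("RMSE:", rmse(actual, combined))
print("MASE:", mase(actual, combined, train))
```

In this toy example the two models err in opposite directions, so the combination scores better than either model alone. Whether that holds on real data depends on how the weights are chosen and how correlated the models' errors are.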

It is advisable to apply similar models to other infectious diseases and to expand their use to guide health interventions, resource allocation, and early warning systems in future pandemics.

## Linked entities

- **Diseases:** COVID-19 (MONDO:0100096)

## Full-text entities

- **Diseases:** tuberculosis co-infection (MESH:D060085), infected (MESH:D007239), COVID-19 (MESH:D000086382), tuberculosis (MESH:D014376), TB (MESH:D014390), HIV (MESH:D015658), communicable diseases (MESH:D003141)
- **Chemicals:** H2O (MESH:D014867), Cabo (MESH:C036046)
- **Species:** Gammacoronavirus (genus) [taxon 694013]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12932500/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12932500/full.md

## References

38 references — full list in the complete paper: https://tomesphere.com/paper/PMC12932500/full.md

---
Source: https://tomesphere.com/paper/PMC12932500