# Integrating DHIS2 and R for Enhanced Cholera Surveillance in Lebanon: A Case Study on Improving Data Quality

**Authors:** Abass Toufic Jouny, Hawraa Sweidan, Maryo Baakliny, Nada Ghosn

PMC · DOI: 10.3390/ijerph22111684 · International Journal of Environmental Research and Public Health · 2025-11-06

## TL;DR

This paper describes how integrating R with DHIS2 improved cholera surveillance in Lebanon by enhancing data quality and analysis during an outbreak.

## Contribution

The novel contribution is the development and application of openly shared R scripts that automate the cleaning, standardization, and reclassification of DHIS2 cholera surveillance data.

## Key findings

- Before cleaning, missingness reached 99.7% for inpatient status and 17–35% for other key variables; after cleaning, all fields were complete.
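- Reclassification changed the case breakdown from 92.8% suspected and 7.2% confirmed to 40% suspected, 5.8% confirmed, and 48.6% with unspecified classification.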
- Integration enabled better spatial and laboratory analysis for public health decisions.
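
The completeness figures above are per-variable missingness percentages. As a rough illustration of how such a figure can be computed, here is a minimal Python sketch. It is not the authors' R code; the field names, the set of values treated as missing, and the toy records are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Mapping, Optional

# Assumption: these markers count as "missing"; a real DHIS2 export may use others.
MISSING_MARKERS = {"", "na", "n/a", "unknown"}


def is_missing(value: Optional[str]) -> bool:
    """Return True when a field is absent or holds a placeholder value."""
    return value is None or str(value).strip().lower() in MISSING_MARKERS


def missingness_pct(
    records: Iterable[Mapping[str, Optional[str]]], fields: List[str]
) -> Dict[str, float]:
    """Percentage of records with a missing value, per field."""
    rows = list(records)
    if not rows:
        return {field: 0.0 for field in fields}
    return {
        field: round(100.0 * sum(is_missing(r.get(field)) for r in rows) / len(rows), 1)
        for field in fields
    }


# Toy records with hypothetical field names (not the paper's data).
cases = [
    {"inpatient_status": "", "district": "Akkar"},
    {"inpatient_status": None, "district": "Baalbek"},
    {"inpatient_status": "yes", "district": ""},
    {"inpatient_status": "NA", "district": "Tripoli"},
]

report = missingness_pct(cases, ["inpatient_status", "district"])
print(report)
```

Running the same function before and after a cleaning step gives a simple before/after completeness comparison for each variable.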

## Abstract

During the 2022–2023 cholera outbreak in Lebanon, cases were reported through the District Health Information System 2 (DHIS2). We developed automated procedures in the R computing language to improve completeness of routinely notified variables, apply case definition criteria, and improve geographic accuracy and documentation of laboratory results. We developed R scripts for data cleaning, standardization, and reclassification, plotted epidemic curves and produced maps to display cholera incidence rates and rapid diagnostic test (RDT) coverage by district. We shared the R scripts on the GitHub platform for open adaptation and use. Prior to cleaning, missingness reached 99.7% for inpatient status and 17–35% for other key variables. After cleaning, all fields were complete. Initially, 92.8% of cases were notified through DHIS2 as suspected and 7.2% as confirmed. Following reclassification, 40% were classified as suspected, 5.8% as confirmed, and 48.6% with unspecified classification. Laboratory data revealed that 5.8% of cases were culture positive, 2.2% RDT positive, and 65.1% had no documented testing. Among facility-entered cases (n = 5953), 11.4% were reported from a different governorate than the patient’s residence. At the time of the outbreak, the daily maps were generated based on place of residence. Integrating R-based analytics with DHIS2 enhanced data completeness, improved case classification, and enabled better spatial and laboratory analysis. This combined approach provided a clearer epidemiological picture of the cholera outbreak, supporting data-driven public health decision-making and highlighting the value of integrating analytical tools with routine surveillance systems.
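
Two of the spatial quantities mentioned in the abstract, the share of cases reported from a governorate other than the patient's residence and the incidence rate by district, reduce to simple arithmetic. The Python sketch below illustrates both under stated assumptions: the field names `residence_governorate` and `reporting_governorate`, the per-100,000 scaling, and all numbers are hypothetical, and the authors' actual implementation is in R.

```python
from typing import Iterable, Mapping, Optional


def cross_governorate_share(
    records: Iterable[Mapping[str, Optional[str]]],
    residence_key: str = "residence_governorate",  # hypothetical field name
    reporting_key: str = "reporting_governorate",  # hypothetical field name
) -> float:
    """Percentage of cases reported from a governorate other than the residence.

    Records lacking either value are excluded from the denominator.
    """
    comparable = [r for r in records if r.get(residence_key) and r.get(reporting_key)]
    if not comparable:
        return 0.0
    mismatched = sum(
        1
        for r in comparable
        if r[residence_key].strip().lower() != r[reporting_key].strip().lower()
    )
    return round(100.0 * mismatched / len(comparable), 1)


def incidence_per_100k(case_count: int, population: int) -> float:
    """Cases per 100,000 population (the scaling factor is an assumption)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return round(100_000 * case_count / population, 1)


# Toy records, not the paper's data.
toy_cases = [
    {"residence_governorate": "North", "reporting_governorate": "North"},
    {"residence_governorate": "North", "reporting_governorate": "Akkar"},
    {"residence_governorate": "Bekaa", "reporting_governorate": "bekaa "},
    {"residence_governorate": "Akkar", "reporting_governorate": "Akkar"},
    {"residence_governorate": None, "reporting_governorate": "Akkar"},
]

share = cross_governorate_share(toy_cases)
rate = incidence_per_100k(30, 150_000)
print(share, rate)
```

Comparing governorates after trimming whitespace and lowercasing avoids counting spelling variants of the same name as a mismatch, which matters when place names are entered as free text.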

## Linked entities

- **Diseases:** cholera (MONDO:0015766)

## Full-text entities

- **Diseases:** Cholera (MESH:D002771)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12652578/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12652578/full.md

## References

14 references — full list in the complete paper: https://tomesphere.com/paper/PMC12652578/full.md

---
Source: https://tomesphere.com/paper/PMC12652578