# Statistical methods for predicting the presence of Salmonella Typhi in wastewater samples at Asante Akyem Agogo, Ghana

**Authors:** Sampson Twumasi-Ankrah, Michael Owusu, Michael Owusu-Ansah, Seidu Amenyaglo, Caleb Osei-Wusu Sarfo, Eric Darko, Portia Okyere Boakye, Christopher B. Uzzell, Isobel M. Blake, Nicholas C. Grassly, Yaw Adu-Sarkodie, Ellis Owusu-Dabo, Ana LTO Nascimento

PMC · DOI: 10.1371/journal.pntd.0013973 · PLOS Neglected Tropical Diseases · 2026-02-17

## TL;DR

This study uses statistical and machine learning models to predict Salmonella Typhi in wastewater in Ghana, finding that non-spatial Random Forest models perform best and that pH, season, and dissolved oxygen are key factors.

## Contribution

The study shows that a non-spatial Random Forest model outperforms the spatial and traditional statistical models it was compared against for predicting S. Typhi detection in wastewater.

## Key findings

- 44.13% of wastewater samples tested positive for Salmonella Typhi.
- Non-spatial Random Forest achieved 99.3% accuracy in predicting S. Typhi presence.
- pH, season, dissolved oxygen, and channel width were identified as key predictors.

## Abstract

Monitoring wastewater is vital for tracking typhoid fever in endemic areas. This study evaluated the performance of both spatial and non-spatial models in predicting Salmonella Typhi detection in wastewater from the Asante Akim North district in Ghana and identified key environmental risk factors.

We collected wastewater samples using Moore swabs at 40 sites across Agogo, Juansa, Hwidiem, and Domeabra over a period of 27 months. Multiplex PCR targeting the ttr, tviB, and staG genes was used to detect Salmonella Typhi. An Aquaprobe AP-2000 was used to measure physicochemical factors such as pH, temperature, dissolved oxygen, and salinity. Three non-spatial models (Generalized Estimating Equations (Logistic), Mixed-Effects Models, and Random Forest) and four spatial models, including Bayesian Generalized Additive Models (GAM) and Spatial Generalized Linear Mixed Models (GLMM), were fitted to the wastewater dataset. Models were fitted using 5-fold cross-validation, stratified by site, and their performance was evaluated using accuracy, sensitivity, and specificity. We also used SHapley Additive exPlanations (SHAP) analysis to identify the most important predictors.
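The phrase "5-fold cross-validation, stratified by site" can be read in more than one way. The sketch below takes the literal reading, in which each site's samples are spread as evenly as possible over the five folds. The function name `stratified_folds_by_site` and the toy layout of one sample per site per month are assumptions made for illustration, not details from the paper.

```python
import random
from collections import Counter, defaultdict


def stratified_folds_by_site(site_ids, n_folds=5, seed=0):
    """Assign each sample a fold index so every site is spread evenly over folds."""
    rng = random.Random(seed)

    # Group sample indices by the site they came from.
    by_site = defaultdict(list)
    for idx, site in enumerate(site_ids):
        by_site[site].append(idx)

    folds = [0] * len(site_ids)
    for site in sorted(by_site):
        indices = by_site[site]
        rng.shuffle(indices)
        # Random starting fold so the leftover samples do not always land in fold 0.
        offset = rng.randrange(n_folds)
        for k, idx in enumerate(indices):
            folds[idx] = (k + offset) % n_folds
    return folds


# Hypothetical layout: 40 sites, one sample per site per month for 27 months.
site_ids = [f"site_{s:02d}" for s in range(40) for _ in range(27)]
folds = stratified_folds_by_site(site_ids)

# Number of samples that each (site, fold) pair received.
per_site_fold = Counter(zip(site_ids, folds))
print(min(per_site_fold.values()), max(per_site_fold.values()))
```

Under this reading, every fold contains samples from every site. A stricter alternative is to keep all samples from a site in the same fold, which tests how well a model generalizes to unseen sites; the summary does not say which variant the authors used.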

Overall, 44.13% of the samples tested positive for S. Typhi. Detection was much higher during wet seasons (50.17% vs. 35.11%; p < 0.001), with fast flows (64.45%), and in channels that were 1–2 meters wide (58.70%). Positive samples had higher pH (7.46 vs. 7.40; p < 0.001), dissolved oxygen (46.97% vs. 36.77%; p < 0.001), and rainfall (3.92 mm vs. 3.30 mm; p = 0.022). Across the non-spatial and spatial models, the non-spatial Random Forest model performed best, with an accuracy of 0.993, sensitivity of 0.997, and specificity of 0.989. SHAP analysis of the preferred non-spatial Random Forest model identified pH, season, dissolved oxygen, positivity in the previous month, and channel width as the most important predictors.
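For readers less familiar with the three reported metrics, the following minimal sketch shows how accuracy, sensitivity, and specificity are derived from a confusion matrix. The labels are a made-up toy example (1 = S. Typhi detected, 0 = not detected), not data from the study, and the helper name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")

    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of true positives that were caught
        "specificity": tn / (tn + fp),  # share of true negatives that were caught
    }


# Toy labels for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy case one positive sample is missed and one negative sample is falsely flagged, giving an accuracy of 0.8, a sensitivity of 0.75, and a specificity of 5/6.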

S. Typhi detection is influenced by wastewater physicochemical properties, with pH, seasonal rainfall, and hydraulic conditions being the most significant. The non-spatial Random Forest model significantly outperformed both the spatial models and the other non-spatial statistical methods.
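The predictor ranking reported above comes from SHAP, which attributes a model's prediction to its input features using Shapley values. The sketch below computes exact Shapley values for a small, hypothetical scoring function by enumerating all feature subsets and replacing "absent" features with a baseline value. It is not the SHAP library and not the paper's fitted model: `toy_score`, its coefficients, and the `narrow_channel` feature are invented for illustration, and only the pH and dissolved oxygen inputs echo the group means quoted in the abstract.

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a single baseline point."""
    names = list(x)
    n = len(names)
    phi = {}
    for name in names:
        others = [f for f in names if f != name]
        total = 0.0
        for r in range(len(others) + 1):
            # Shapley weight for coalitions of size r that exclude `name`.
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                with_f = {f: x[f] if (f in subset or f == name) else baseline[f]
                          for f in names}
                without_f = {f: x[f] if f in subset else baseline[f]
                             for f in names}
                total += weight * (predict(with_f) - predict(without_f))
        phi[name] = total
    return phi


def toy_score(f):
    # Hypothetical scoring function with made-up coefficients (assumption).
    score = 0.5 * (f["ph"] - 7.0) + 0.2 * f["wet_season"]
    score += 0.01 * f["dissolved_oxygen"]
    score += 0.1 * f["wet_season"] * f["narrow_channel"]  # interaction term
    return score


x = {"ph": 7.46, "wet_season": 1, "dissolved_oxygen": 46.97, "narrow_channel": 1}
baseline = {"ph": 7.40, "wet_season": 0, "dissolved_oxygen": 36.77, "narrow_channel": 0}

phi = exact_shapley(toy_score, x, baseline)
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.3f}")
```

The attributions sum to the difference between the score at `x` and the score at the baseline, and the interaction term is split equally between the two features involved. Exhaustive enumeration scales exponentially with the number of features, which is why practical tools rely on model-specific or sampling-based approximations.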

Typhoid fever remains a significant public health concern in resource-limited areas with inadequate water and sanitation infrastructure. Monitoring Salmonella Typhi in wastewater provides a cost-effective method for tracking community transmission, particularly in regions where clinical surveillance is limited. In this study, we analyzed wastewater samples collected over 27 months from 40 sites in the Asante Akim North district of Ghana. We used statistical and machine learning models to predict the presence of S. Typhi and to identify key environmental factors that influence its detection. Our results indicate that pH levels, seasonality, dissolved oxygen, and channel width significantly affect detection rates. The non-spatial Random Forest model outperformed both spatial and traditional models, achieving an accuracy of 99.3%. These findings highlight the potential of combining wastewater-based surveillance with machine learning techniques to improve predictions of typhoid outbreaks and inform targeted public health interventions in endemic areas.

## Linked entities

- **Genes:** TTR (transthyretin) [NCBI Gene 7276], tviB (Vi polysaccharide biosynthesis UDP-N-acetylglucosamine C-6 dehydrogenase TviB) [NCBI Gene 9384277], stag (small t antigen) [NCBI Gene 6373585]
- **Diseases:** typhoid fever (MONDO:0005619)

## Full-text entities

- **Diseases:** fecal contamination (MESH:D005242), Typhoid fever (MESH:D014435), deaths (MESH:D003643), infection (MESH:D007239), Neglected Tropical Diseases (MESH:D058069)
- **Chemicals:** Aquaprobe AP-2000 (-), Water (MESH:D014867), oxygen (MESH:D010100)
- **Species:** Salmonella enterica subsp. enterica serovar Paratyphi A (no rank) [taxon 54388], Vibrio cholerae (species) [taxon 666], Homo sapiens (human, species) [taxon 9606], Enterovirus (genus) [taxon 12059], Salmonella enterica subsp. enterica serovar Typhi (no rank) [taxon 90370]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12923122/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12923122/full.md

## References

25 references — full list in the complete paper: https://tomesphere.com/paper/PMC12923122/full.md

---
Source: https://tomesphere.com/paper/PMC12923122