# Sub-population identification of multimorbidity in sub-Saharan African populations

**Authors:** Adebayo Oshingbesan, Michelle Kamp, Phelelani Thokozani Mpangase, Kayode Adetunji, Samuel Iddi, Daniel Maina Nderitu, Tanya Akumu, Okechinyere Achilonu, Isaac Kisiangani, Theophilous Mathema, Girmaw Tadesse, F. Xavier Gomez-Olive, Chodziwadziwa Whiteson Kabudula, Scott Hazelhurst, Gershim Asiki, Michele Ramsay, Skyler Speakman

PMC · DOI: 10.1038/s41598-025-96569-4 · 2025-04-22

## TL;DR

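This paper proposes a working definition of multimorbidity (at least two disease diagnoses from a pre-determined list) and uses automated stratification to find sub-populations with anomalously high or low multimorbidity rates in two sub-Saharan African cohorts: Nairobi, Kenya and Agincourt, South Africa.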

## Contribution

A novel definition of multimorbidity and an automated method to identify high-risk sub-populations in health data.

## Key findings

- High-risk sub-populations identified in one region also showed higher multimorbidity rates in another region.
- A more-at-risk sub-population was found beyond traditional age and sex stratifications.
- Automated stratification reveals nuanced health patterns that manual methods may miss.

## Abstract

This work provides three contributions that straddle the medical literature on multimorbidity and the data science community with an interest in exploratory analysis of health-related research data. First, we propose a definition for multimorbidity as the co-occurrence of (at least) two disease diagnoses from a pre-determined list. This interpretation adds to a growing body of working definitions emerging from the literature. Second, we apply this novel outcome-of-interest to two sub-Saharan populations located in Nairobi, Kenya and Agincourt, South Africa. The source data for this analysis was collected as part of the Africa Wits-INDEPTH Partnership for Genomic Studies project. Third, we stratify this outcome-of-interest across all possible sub-populations and identify sub-populations with anomalously high (or low) rates of multimorbidity. Critically, the automatic stratification approach emphasizes efficient, disciplined exploratory analysis as a complementary alternative to more commonly used confirmatory analysis methods. Our results show two things. First, high-risk sub-populations identified in one part of the continent transfer to the other location (and vice versa), with the equivalent sub-population at the other location also experiencing higher rates of multimorbidity. Second, we discover a real-world scenario where a more-at-risk sub-population existed beyond the simpler sub-populations traditionally stratified by age and sex. This is in contrast to existing literature, which commonly stratifies disease diagnoses by sex when reporting results. Patterns in diseases, and healthcare more generally, are likely more nuanced than manual approaches may be able to describe. This work helps introduce public health researchers to data science methods that scale to the size and complexity of modern-day datasets.
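To make the abstract's definition and stratification idea concrete, the following is a minimal sketch, not the authors' implementation. It flags a record as multimorbid when at least two conditions from a list are present, then enumerates sub-populations by brute force and ranks them by how far their rate departs from the overall rate. The condition list, the feature names (`sex`, `age_band`), the toy records, and the `min_size` cutoff are all illustrative assumptions. The sketch also simplifies "all possible sub-populations" to one fixed value per chosen feature, and it ranks by raw rate difference rather than by any statistical anomaly score.

```python
from itertools import combinations, product

# Hypothetical condition list; the paper's actual pre-determined list is not reproduced here.
CONDITIONS = ("hypertension", "diabetes", "ckd", "obesity")


def is_multimorbid(record, conditions=CONDITIONS, threshold=2):
    """Return True when at least `threshold` of the listed conditions are diagnosed."""
    return sum(bool(record.get(c, False)) for c in conditions) >= threshold


def subgroup_rates(records, features, min_size=2):
    """Brute-force stratification (an assumption, not the paper's algorithm).

    For every non-empty subset of `features` and every combination of observed
    values, compute the multimorbidity rate of the matching records. Returns
    tuples (definition, size, rate, rate - overall_rate), largest excess first.
    """
    overall = sum(is_multimorbid(r) for r in records) / len(records)
    results = []
    for k in range(1, len(features) + 1):
        for subset in combinations(features, k):
            observed = [sorted({r[f] for r in records}) for f in subset]
            for combo in product(*observed):
                members = [
                    r for r in records
                    if all(r[f] == v for f, v in zip(subset, combo))
                ]
                if len(members) < min_size:
                    continue  # skip sub-populations too small to summarise
                rate = sum(is_multimorbid(r) for r in members) / len(members)
                results.append((dict(zip(subset, combo)), len(members), rate, rate - overall))
    return sorted(results, key=lambda item: item[3], reverse=True)


# Toy records, invented purely for illustration.
records = [
    {"sex": "F", "age_band": "40-49", "hypertension": 1, "diabetes": 1},
    {"sex": "F", "age_band": "40-49", "hypertension": 1, "obesity": 1},
    {"sex": "F", "age_band": "50-60", "hypertension": 1},
    {"sex": "M", "age_band": "40-49", "diabetes": 1},
    {"sex": "M", "age_band": "50-60"},
    {"sex": "M", "age_band": "50-60", "ckd": 1, "obesity": 1},
]

ranked = subgroup_rates(records, features=["sex", "age_band"])
for definition, size, rate, excess in ranked[:3]:
    print(definition, size, round(rate, 2), round(excess, 2))
```

In this toy data the highest-ranked sub-population combines both features, which mirrors the abstract's point that a more-at-risk group can sit beyond single-variable strata. Exhaustive enumeration grows exponentially with the number of features, so it only illustrates the idea; it would not scale the way the abstract's automated approach is described as doing.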

## Full-text entities

- **Genes:** ALB (albumin) [NCBI Gene 213] {aka FDAHT, HSA, PRO0883, PRO0903, PRO1341}, INS (insulin) [NCBI Gene 3630] {aka IDDM, IDDM1, IDDM2, ILPR, IRDN, MODY10}
- **Diseases:** CVD (MESH:D002318), obesity (MESH:D009765), type 1 and type 2 diabetes (MESH:D003924), CKD (MESH:D051436), AIDS (MESH:D000163), heart attack (MESH:D009203), congestive heart failure (MESH:D006333), infectious diseases (MESH:D003141), HT (MESH:D006973), stroke (MESH:D020521), coronary artery disease (MESH:D003324), albuminuria (MESH:D000419), Diabetes and Digestive and Kidney Diseases (MESH:D003928), transient ischaemic attack (MESH:D002546), angina (MESH:D000787), Diabetes (MESH:D003920)
- **Chemicals:** triglycerides (MESH:D014280), TC (MESH:D013667), blood glucose (MESH:D001786), LDL-C (-), lipid (MESH:D008055), creatinine (MESH:D003404), alcohol (MESH:D000438), glucose (MESH:D005947)
- **Species:** Homo sapiens (human, species) [taxon 9606], Human immunodeficiency virus 1 (no rank) [taxon 11676]

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12015547/full.md

---
Source: https://tomesphere.com/paper/PMC12015547