# Overcoming denominator problems in refugee settings with fragmented electronic records for health and immigration data: a prediction-based approach

**Authors:** Stella Erdmann, Rosa Jahn, Sven Rohleder, Kayvan Bozorgmehr

PMC · DOI: 10.1186/s12874-024-02204-7 · BMC Medical Research Methodology · 2024-04-01

## TL;DR

This study proposes a prediction-based method to estimate population denominators in refugee health data using occupancy and patient records, improving disease frequency accuracy.

## Contribution

A novel empirical approach to address the denominator problem in refugee health studies by predicting missing population data using regression models.

## Key findings

- Using occupancy data as a denominator reduced disease frequency overestimation compared to using patient counts.
- Regression models based on age, sex, and center type predicted missing denominator data effectively.
- The method enabled more accurate comparisons of disease incidence across refugee centers and time periods.

## Abstract

Epidemiological studies in refugee settings are often challenged by the denominator problem, i.e. the lack of data on the population at risk. We develop an empirical approach to address this problem by assessing the relationships between occupancy data in refugee centres, the number of refugee patients in walk-in clinics, and diseases of the digestive system.

Individual-level patient data from a primary care surveillance system (PriCarenet) were matched with occupancy data retrieved from immigration authorities. The three relationships were analysed using regression models that considered age, sex, and type of centre. Predictions were then made for whichever data category was unavailable in each relationship. Twenty-one German on-site health care facilities in state-level registration and reception centres participated in the study, which covered the period from November 2017 to July 2021.
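The matching step can be pictured as a partition of centre-months by which data source is available. The following is a minimal sketch, not the authors' pipeline: the dictionary layout, the centre labels, the counts, and the helper `split_centre_months` are all hypothetical and chosen only for illustration.

```python
# Toy records keyed by (centre, month); every value here is invented.
ehr_patients = {
    ("A", "2018-01"): 120,
    ("A", "2018-02"): 140,
    ("B", "2018-01"): 300,
}
occupancy = {
    ("A", "2018-01"): 210,
    ("B", "2018-01"): 450,
    ("C", "2018-01"): 380,
}


def split_centre_months(ehr, occ):
    """Partition centre-months into matched, EHR-unmatched, and OCC-unmatched."""
    matched = sorted(ehr.keys() & occ.keys())        # both sources present
    ehr_unmatched = sorted(ehr.keys() - occ.keys())  # patients known, occupancy missing
    occ_unmatched = sorted(occ.keys() - ehr.keys())  # occupancy known, patients missing
    return matched, ehr_unmatched, occ_unmatched


matched, ehr_unmatched, occ_unmatched = split_centre_months(ehr_patients, occupancy)
print(matched, ehr_unmatched, occ_unmatched)
```

In this framing, the matched subset is what a model would be fitted on, and the two unmatched subsets are where the missing component would need to be predicted.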

In total, 445 observations (“centre-months”) of patient data from electronic health records (EHR; mean of 230 refugee patients visiting walk-in clinics per month and centre; standard deviation, sd: 202), covering 47,617 refugee patients, were available. Occupancy data (OCC; mean occupancy of 348 residents, sd: 287) were available for 215 observations and both data types for 147 (matched), leaving 270 observations without occupancy (EHR-unmatched) and 40 without patient data (OCC-unmatched). The incidence of diseases of the digestive system, using patients as denominators in the different sub-data sets, was 9.2% (sd: 5.9) in EHR, 8.8% (sd: 5.1) in matched, 9.6% (sd: 6.4) in EHR-unmatched, and 12% (sd: 2.9) in OCC-unmatched data. Using the available or predicted occupancy as denominator yielded average incidence estimates (per centre and month) of 4.7% (sd: 3.2) in matched, 4.8% (sd: 3.3) in EHR-unmatched, and 7.4% (sd: 2.7) in OCC-unmatched data.
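To see why the choice of denominator matters, consider one centre-month with a fixed number of cases. The sketch below borrows the mean patient count (230) and mean occupancy (348) from the abstract as denominators, but the case count of 20 is a hypothetical value. Because the reported estimates are averages over centre-months, this single calculation is not expected to reproduce them.

```python
def incidence_percent(cases, denominator):
    """Return cases per 100 units of the chosen denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * cases / denominator


cases = 20       # hypothetical number of digestive-system diagnoses in one centre-month
patients = 230   # mean monthly patients per centre (from the abstract)
residents = 348  # mean occupancy per centre (from the abstract)

per_patients = incidence_percent(cases, patients)    # denominator: clinic attendees only
per_residents = incidence_percent(cases, residents)  # denominator: everyone housed in the centre

print(f"per patients:  {per_patients:.1f}%")
print(f"per residents: {per_residents:.1f}%")
```

Since clinic attendees are only a subset of residents, the patient-based figure is the larger of the two whenever occupancy exceeds the patient count.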

By modelling the ratio between patient and occupancy numbers in refugee centres depending on sex and age, as well as on the total number of patients or occupancy, the denominator problem in health monitoring systems could be mitigated. The approach helped to estimate the missing component of the denominator, and to compare disease frequency across time and refugee centres more accurately using an empirically grounded prediction of disease frequency based on demographics and centre typology. This avoided the over-estimation of disease frequency that results from using patients as denominators.
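The idea of predicting a missing denominator can be sketched with a one-predictor least-squares fit on matched centre-months. This is a deliberate simplification and not the authors' model: the abstract states that their regressions also consider age, sex, and type of centre, and it does not give the model specification. The data points, the helper names `fit_line` and `predict_occupancy`, and the case count are all hypothetical.

```python
from statistics import fmean

# Hypothetical matched centre-months as (patients, residents) pairs.
matched_pairs = [(100, 160), (150, 235), (200, 310), (250, 385), (300, 460)]


def fit_line(pairs):
    """Ordinary least squares for residents = intercept + slope * patients."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict_occupancy(patients, slope, intercept):
    """Predict the missing occupancy for a centre-month with known patient count."""
    return intercept + slope * patients


slope, intercept = fit_line(matched_pairs)

# An EHR-unmatched centre-month: patients are known, occupancy is not.
patients_seen = 180
cases = 14  # hypothetical case count
predicted_residents = predict_occupancy(patients_seen, slope, intercept)
incidence = 100.0 * cases / predicted_residents

print(f"predicted occupancy: {predicted_residents:.0f}, incidence: {incidence:.1f}%")
```

A fuller version would add covariates such as age group, sex, and centre type, and would carry the prediction uncertainty through to the incidence estimate.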

The online version contains supplementary material available at 10.1186/s12874-024-02204-7.

## Full-text entities

- **Diseases:** diseases of the digestive system (MESH:D004066)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC10983725/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC10983725/full.md

## References

23 references — full list in the complete paper: https://tomesphere.com/paper/PMC10983725/full.md

---
Source: https://tomesphere.com/paper/PMC10983725