# A protocol for the HEaring Impairment Data Infrastructure (HEIDI) study

**Authors:** Yvonne Tran, Mariano Cabezas, Frank Tran, Jo-Anne Manski-Nankervis, Jitendra Jonnagaddala, Diana Tang, Kompal Sinha, Mohammad Nure Alam, Jessica Monaghan, Andrew Donald, Rebecca Mitchell, Matthew Crossley, Niloufer Selvadurai, Bamini Gopinath

PMC · DOI: 10.1371/journal.pone.0320294 · 2025-05-07

## TL;DR

This study creates a secure data infrastructure to better understand the care pathways of patients with hearing loss using real-world data.

## Contribution

The novel contribution is the development of the HEIDI data lake for integrated analysis of hearing impairment care pathways.

## Key findings

- HEIDI will link GP and audiology data to map patient journeys from general practice to specialist care.
- Machine learning will be used to predict which patients may benefit from proactive management by their GP and to evaluate the outcomes of interventions.
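Predicting which patients may benefit from proactive management is a supervised classification task. The sketch below shows one of the methods the protocol names, a random forest, in plain Python. The features (age, pure-tone average, number of hearing-related GP visits), the label, and the toy data are all hypothetical and do not come from HEIDI. To keep it short, each tree is a decision stump (depth 1). Because a stump has only one split, drawing a random feature subset per tree is the same as drawing one per split.

```python
import random
from collections import Counter
from dataclasses import dataclass


@dataclass
class Stump:
    """A depth-1 decision tree: one feature, one threshold, one label per side."""
    feature: int
    threshold: float
    left_label: int   # predicted when x[feature] <= threshold
    right_label: int  # predicted when x[feature] > threshold

    def predict(self, x):
        return self.left_label if x[self.feature] <= self.threshold else self.right_label


def majority(labels, default=0):
    """Most common label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(X, y, features):
    """Choose the split (among `features`) with the fewest misclassified training rows."""
    fallback = majority(y)
    best, best_err = None, None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            l_lab, r_lab = majority(left, fallback), majority(right, fallback)
            err = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if best_err is None or err < best_err:
                best, best_err = Stump(f, t, l_lab, r_lab), err
    return best


def fit_forest(X, y, n_trees=25, max_features=2, seed=0):
    """Bagging plus a random feature subset for each tree."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample, with replacement
        feats = rng.sample(range(d), max_features)   # random subset of feature indices
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, x):
    """Majority vote across the trees."""
    return majority([tree.predict(x) for tree in forest])


# Hypothetical toy features: (age, pure-tone average in dB HL, hearing-related GP visits)
# Hypothetical label: 1 = might benefit from proactive GP management, 0 = not
X_train = [(45, 15, 0), (50, 20, 1), (52, 18, 0), (48, 22, 1),
           (70, 45, 3), (75, 50, 4), (68, 40, 2), (80, 55, 5)]
y_train = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(X_train, y_train)
print(predict(forest, (78, 52, 4)))
```

A real analysis would use deeper trees from an established library (for example scikit-learn's `RandomForestClassifier`), held-out validation, and calibrated probabilities rather than hard votes.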

## Abstract

Research suggests that early detection of hearing loss, coupled with prompt and appropriate treatment, can significantly alleviate its negative impacts. Routinely collected real-world data, such as electronic health record (EHR) data, provide an opportunity to enhance our understanding of how hearing loss is managed. This project aims to create the HEaring Impairment Data Infrastructure (HEIDI) data lake by assembling datasets from general practice (GP), audiology clinic registries, and cohort studies to investigate the care pathways of hearing-impaired patients. The study seeks to answer key research questions such as “How do patients with hearing loss navigate the care pathway from general practice clinics to audiology clinics?”.
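At its simplest, answering the care-pathway question means ordering each patient's linked GP and audiology events in time. The sketch below assumes a hypothetical linked event table with `(patient_id, event_date, care_setting, event)` rows. The setting labels `"GP"` and `"audiology"`, the events, and the dates are illustrative only and are not HEIDI's schema. For each patient, the code derives the sequence of care settings and the number of days from the first GP event to the first subsequent audiology event.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical linked events: (patient_id, event_date, care_setting, event)
events = [
    ("p1", date(2021, 3, 1), "GP", "hearing complaint"),
    ("p1", date(2021, 3, 15), "GP", "audiology referral"),
    ("p1", date(2021, 5, 2), "audiology", "hearing assessment"),
    ("p2", date(2022, 1, 10), "GP", "hearing complaint"),
    ("p2", date(2022, 6, 20), "audiology", "hearing assessment"),
    ("p3", date(2020, 8, 5), "GP", "hearing complaint"),
]


def build_journeys(events):
    """Group events by patient and sort each patient's events chronologically."""
    journeys = defaultdict(list)
    for pid, when, setting, event in events:
        journeys[pid].append((when, setting, event))
    for pid in journeys:
        journeys[pid].sort()
    return dict(journeys)


def setting_path(journey):
    """Collapse a journey to its sequence of care settings, merging consecutive repeats."""
    path = []
    for _, setting, _ in journey:
        if not path or path[-1] != setting:
            path.append(setting)
    return tuple(path)


def days_gp_to_audiology(journey):
    """Days from the first GP event to the first audiology event on or after it, else None."""
    first_gp = next((d for d, s, _ in journey if s == "GP"), None)
    if first_gp is None:
        return None
    first_aud = next((d for d, s, _ in journey if s == "audiology" and d >= first_gp), None)
    return (first_aud - first_gp).days if first_aud is not None else None


journeys = build_journeys(events)
path_counts = Counter(setting_path(j) for j in journeys.values())
waits = {pid: days_gp_to_audiology(j) for pid, j in journeys.items()}
print(path_counts.most_common())
print(waits)  # {'p1': 62, 'p2': 161, 'p3': None}
```

Patients such as `p3`, who have not (yet) reached audiology, return `None`. A fuller analysis would treat such patients as censored observations rather than dropping them.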

The HEIDI data lake will be hosted in a secure research environment at Macquarie University, Sydney, Australia, that complies with Australian legal and ethical requirements to protect patient privacy. New integrated datasets will then be built by linking the hearing and GP datasets. Finally, the HEIDI data warehouse will be developed and used as a stand-alone dataset for future research.

Descriptive and predictive analytics will be undertaken on the data warehouse to answer the research questions. Descriptive analysis will combine conventional and advanced statistical techniques with visualisation to help characterise the journey of patients with hearing loss. For predictive analytics, machine learning methods such as deep neural networks, support vector machines, and random forests will be used to identify participants who could benefit from proactive management by their GP, and to determine the effect of interventions along the patient’s journey (e.g., referral to a specialist) on outcomes (e.g., adherence to the intervention).
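The abstract does not specify how the hearing and GP datasets will be linked. The sketch below illustrates one common privacy-preserving approach: records are matched deterministically on a keyed hash (HMAC-SHA256) of normalised name, date of birth, and sex, so analysts see only an opaque linkage key and never the identifiers themselves. The secret key, field names, and records are hypothetical.

```python
import hashlib
import hmac
import unicodedata

# Hypothetical secret held by a trusted linkage unit; never shared with analysts.
LINKAGE_KEY = b"replace-with-a-secret-key"


def normalise(text):
    """Strip accents, case, spaces and punctuation so trivial formatting differences still match."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return "".join(ch for ch in text.lower() if ch.isalnum())


def linkage_key(given_name, family_name, dob_iso, sex):
    """Keyed hash of normalised identifiers; identical inputs always give identical keys."""
    message = "|".join([normalise(given_name), normalise(family_name), dob_iso, sex.upper()])
    return hmac.new(LINKAGE_KEY, message.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenise(records):
    """Replace direct identifiers with a linkage key, keeping only the clinical fields."""
    return [
        {"link_key": linkage_key(r["given"], r["family"], r["dob"], r["sex"]), **r["clinical"]}
        for r in records
    ]


def link(gp_rows, audiology_rows):
    """Inner join on link_key: one output row per matching GP/audiology pair."""
    by_key = {}
    for row in audiology_rows:
        by_key.setdefault(row["link_key"], []).append(row)
    return [{**g, **a} for g in gp_rows for a in by_key.get(g["link_key"], [])]


# Hypothetical source extracts
gp = tokenise([
    {"given": "José", "family": "Nguyen", "dob": "1950-04-12", "sex": "m",
     "clinical": {"gp_reason": "hearing complaint"}},
    {"given": "Ann", "family": "Lee", "dob": "1948-09-30", "sex": "F",
     "clinical": {"gp_reason": "tinnitus"}},
])
aud = tokenise([
    {"given": "JOSE", "family": "Nguyen ", "dob": "1950-04-12", "sex": "M",
     "clinical": {"pta_db": 42}},
])
linked = link(gp, aud)
print(linked)
```

Exact matching like this misses records that contain typos in the identifiers. Probabilistic linkage, or privacy-preserving linkage based on Bloom-filter encodings, is a common alternative when identifier quality varies.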

The findings will be disseminated widely through academic journals, conferences and other presentations.

## Linked entities

- **Diseases:** hearing loss (MONDO:0005365), hearing impairment (MONDO:0005365)

## Full-text entities

- **Diseases:** hearing loss (MESH:D034381)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12057931/full.md

---
Source: https://tomesphere.com/paper/PMC12057931