# Identification of sensorineural hearing loss subtypes using unsupervised machine learning and assessment of their replicability

**Authors:** Lilia Dimitrov, Watjana Lilaonitkul, Nishchay Mehta

PMC · DOI: 10.1038/s41598-025-33815-9 · 2026-01-20

## TL;DR

This study uses machine learning to identify subtypes of hearing loss and proposes a framework to improve the reliability of such methods for future research and clinical use.

## Contribution

The novel contribution is the development of a Clustering Replicability Framework to enhance the robustness of unsupervised machine learning in health research.

## Key findings

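- A Gaussian Mixture Model (GMM) identified 9 sensorineural hearing loss (SNHL) phenotypes in a UK cohort of 109,854 audiograms, partially overlapping with the 10 phenotypes previously reported in a US cohort.
- The GMM was unstable when the original dataset was varied.
- The proposed Clustering Replicability Framework aims to improve replicability in health research that uses unsupervised machine learning (UML).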
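To make the clustering step concrete, here is a minimal sketch of fitting a GMM to audiograms with expectation-maximization (EM). This summary does not state the covariance type, initialization, or model-selection criterion the authors used, so the following are assumptions: diagonal covariances, farthest-point initialization, a variance floor `reg`, and the Bayesian Information Criterion (BIC) for comparing component counts. All function names and the toy two-shape dataset (`synthetic_audiograms`) are hypothetical and serve only as an illustration, not as the paper's pipeline.

```python
import math
import random

def log_gauss(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - mu) ** 2 / v
                      for xi, mu, v in zip(x, mean, var))

def farthest_point_init(data, k, rng):
    """Start from one random point, then repeatedly add the point farthest from those chosen."""
    means = [list(rng.choice(data))]
    while len(means) < k:
        far = max(data, key=lambda x: min(sum((a - b) ** 2 for a, b in zip(x, m)) for m in means))
        means.append(list(far))
    return means

def fit_diag_gmm(data, k, n_iter=100, reg=1.0, seed=0):
    """EM for a k-component diagonal-covariance mixture.
    reg (assumed, in dB^2) floors each variance so no component collapses onto one audiogram."""
    rng = random.Random(seed)
    n, d = len(data), len(data[0])
    means = farthest_point_init(data, k, rng)
    grand = [sum(x[j] for x in data) / n for j in range(d)]
    variances = [[sum((x[j] - grand[j]) ** 2 for x in data) / n + reg for j in range(d)]
                 for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each audiogram (log-sum-exp for stability)
        resp = []
        for x in data:
            logs = [math.log(weights[c]) + log_gauss(x, means[c], variances[c]) for c in range(k)]
            top = max(logs)
            exps = [math.exp(s - top) for s in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate mixing weights, mean thresholds and per-frequency variances
        for c in range(k):
            nc = sum(r[c] for r in resp) + 1e-12
            weights[c] = nc / n
            means[c] = [sum(r[c] * x[j] for r, x in zip(resp, data)) / nc for j in range(d)]
            variances[c] = [sum(r[c] * (x[j] - means[c][j]) ** 2 for r, x in zip(resp, data)) / nc + reg
                            for j in range(d)]
    return weights, means, variances

def log_likelihood(data, model):
    weights, means, variances = model
    total = 0.0
    for x in data:
        logs = [math.log(w) + log_gauss(x, m, v) for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        total += top + math.log(sum(math.exp(s - top) for s in logs))
    return total

def bic(data, model):
    """Lower is better; penalizes the extra parameters of each added component."""
    k, d = len(model[0]), len(data[0])
    n_params = (k - 1) + 2 * k * d  # mixing weights + means + diagonal variances
    return -2 * log_likelihood(data, model) + n_params * math.log(len(data))

def predict(x, model):
    """Assign an audiogram to its most probable component (its 'phenotype')."""
    weights, means, variances = model
    return max(range(len(weights)),
               key=lambda c: math.log(weights[c]) + log_gauss(x, means[c], variances[c]))

def synthetic_audiograms(seed, n_per_shape=50):
    """Hypothetical toy data: flat and high-frequency sloping shapes plus 5 dB noise."""
    rng = random.Random(seed)
    flat, sloping = [20, 20, 20, 20], [10, 20, 50, 70]  # dB HL at 0.5, 1, 2, 4 kHz
    return [[t + rng.gauss(0, 5) for t in shape]
            for shape in [flat] * n_per_shape + [sloping] * n_per_shape]

if __name__ == "__main__":
    data = synthetic_audiograms(1)
    for k in (1, 2, 3):
        print(k, round(bic(data, fit_diag_gmm(data, k)), 1))
```

A real audiogram dataset has more frequencies, both ears, and far more than two shapes; the pure-Python EM here is only meant to expose the E-step and M-step, and a production analysis would normally use a tested library implementation.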

## Abstract

Despite nearly 20% of the global population experiencing hearing loss, there remains limited insight into the underlying subtypes of its most prevalent cause, sensorineural hearing loss (SNHL). This understanding is crucial for effective therapeutic and preventative strategies. A recent study using a Gaussian Mixture Model (GMM) identified 10 distinct SNHL phenotypes in a large US cohort, highlighting the potential of unsupervised machine learning (UML) to provide a data-driven solution to this task. Rigorous validation of these models is essential; however, such validation is limited by several factors, including the absence of ground truth labels for model evaluation, restricted data access, and the lack of a standardized reporting framework for comparing results across clustering studies. Here, we apply a GMM to a UK database of 109,854 audiograms, revealing 9 phenotypes, partly overlapping with prior findings. Notably, our study cohort is characterized by advanced age, a higher proportion of female participants, and more severe hearing impairments. We observed instability in the GMM when subjected to variations in the original dataset. To enhance practices, we propose a Clustering Replicability Framework, ensuring robustness in UML-driven health research for safe clinical translation.
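Because there are no ground-truth phenotype labels, one common way to quantify the kind of instability described above is to refit the model on perturbed versions of the data (for example bootstrap resamples or subsets) and measure how well each rerun's cluster assignments agree with a reference run on the same audiograms. The sketch below uses the adjusted Rand index (ARI), a chance-corrected pairwise agreement score; this is an illustrative, generic check and an assumption on our part, not a description of the authors' Clustering Replicability Framework. The function names and example labelings are hypothetical.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two clusterings of the same items.
    1.0 means identical partitions up to relabeling; values near 0 mean chance-level agreement."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same audiograms")
    total = comb(len(labels_a), 2)
    if total == 0:
        return 1.0
    # pairs of audiograms placed together in both clusterings
    pairs = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = rows * cols / total
    maximum = (rows + cols) / 2
    if maximum == expected:  # degenerate case, e.g. both put everything in one cluster
        return 1.0
    return (pairs - expected) / (maximum - expected)

def mean_agreement(reference, reruns):
    """Average ARI between a reference clustering and clusterings from perturbed reruns."""
    scores = [adjusted_rand_index(reference, r) for r in reruns]
    return sum(scores) / len(scores)

if __name__ == "__main__":
    reference = [0, 0, 1, 1, 2, 2]
    reruns = [[1, 1, 0, 0, 2, 2],  # same partition, cluster IDs permuted
              [0, 0, 1, 1, 1, 1]]  # two phenotypes merged in this rerun
    print(mean_agreement(reference, reruns))
```

ARI is used here because GMM component indices are arbitrary between runs, so comparing raw label values would wrongly flag a pure relabeling as instability. When reruns are fit on resampled data, the comparison should be made on a shared set of audiograms, for instance by predicting the reference set with each refitted model.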

The online version contains supplementary material available at 10.1038/s41598-025-33815-9.

## Linked entities

- **Diseases:** sensorineural hearing loss (MONDO:0010576), hearing loss (MONDO:0005365)

## Full-text entities

- **Diseases:** SNHL (MESH:D006319), presbycusis (MESH:D011304), low-frequency hearing loss (MESH:C565121), cochlear dysfunction (MESH:D000160), infection (MESH:D007239), noise trauma (MESH:D014012), ototoxic (MESH:D006311), sensory disorder (MESH:D012678), Meniere's disease (MESH:D008575), flat-loss (MESH:D005413), dementia (MESH:D003704), ADS (MESH:D016464), atrophy (MESH:D001284), age-related hearing loss (MESH:D010024), Hearing loss (MESH:D034381), high-frequency hearing loss (MESH:D006316), CHL (MESH:D006314), MEE (MESH:D004427), drop in hearing (MESH:D020427), NIHL (MESH:D006317)
- **Chemicals:** DEX (MESH:D003915)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12852672/full.md

---
Source: https://tomesphere.com/paper/PMC12852672