# Estimating Sleep-Stage Distribution from Respiratory Sounds via Deep Audio Segmentation

**Authors:** Seungeon Choi, Joshep Shin, Yunu Kim, Jaemyung Shin, Minsam Ko

PMC · DOI: 10.3390/s25206282 · Sensors (Basel, Switzerland) · 2025-10-10

## TL;DR

This paper introduces a non-invasive method for estimating the distribution of sleep stages (the proportion of a single sleep episode spent in Wake, Light, Deep, and REM) from respiratory sounds, offering a less intrusive alternative to polysomnography.

## Contribution

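A three-part framework that segments respiratory cycles from audio with a fine-tuned Transformer-based model, extracts interpretable statistical, spectral, and distributional features from those cycles, and uses stage-specific regression models to predict the proportion of time spent in each sleep stage.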

## Key findings

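- The segmentation model achieved lower RMSE and MAE than classical signal-processing baselines when predicting respiratory rate and cycle duration.
- On the public PSG-Audio dataset (287 subjects; mean 5.3 h per subject) with subject-wise cross-validation, the method yielded favorable RMSE and MAE for sleep-stage proportions across all stages, with the TabPFN model consistently delivering the best results.
- Because it relies on interpretable respiratory features rather than black-box end-to-end modeling, the system may support transparent, contact-free sleep monitoring using passive audio.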
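The findings above are reported in terms of RMSE and MAE. As a reminder of what those two metrics measure, here is a minimal sketch using their standard definitions; the function names are illustrative and are not taken from the paper, and no values from the paper are reproduced.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up per-subject proportions of one sleep stage (for illustration only).
true_share = [0.10, 0.20, 0.30]
predicted_share = [0.10, 0.25, 0.25]
print(rmse(true_share, predicted_share), mae(true_share, predicted_share))
```

RMSE penalizes large per-subject errors more heavily than MAE, so the two metrics together give a fuller picture of prediction error than either alone.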

## Abstract

Accurate assessment of sleep architecture is critical for diagnosing and managing sleep disorders, which significantly impact global health and well-being. While polysomnography (PSG) remains the clinical gold standard, its inherent intrusiveness, high cost, and logistical complexity limit its utility for routine or home-based monitoring. Recent advances highlight that subtle variations in respiratory dynamics, such as respiratory rate and cycle regularity, exhibit meaningful correlations with distinct sleep stages and could serve as valuable non-invasive biomarkers. In this work, we propose a framework for estimating sleep stage distribution—specifically Wake, Light (N1+N2), Deep (N3), and REM—based on respiratory audio captured over a single sleep episode. The framework comprises three principal components: (1) a segmentation module that identifies distinct respiratory cycles in respiratory sounds using a fine-tuned Transformer-based architecture; (2) a feature extraction module that derives a suite of statistical, spectral, and distributional descriptors from these segmented respiratory patterns; and (3) stage-specific regression models that predict the proportion of time spent in each sleep stage. Experiments on the public PSG-Audio dataset (287 subjects; mean 5.3 h per subject), using subject-wise cross-validation, demonstrate the efficacy of the proposed approach. The segmentation model achieved lower RMSE and MAE in predicting respiratory rate and cycle duration, outperforming classical signal-processing baselines. For sleep stage proportion prediction, the proposed method yielded favorable RMSE and MAE across all stages, with the TabPFN model consistently delivering the best results. By quantifying interpretable respiratory features and intentionally avoiding black-box end-to-end modeling, our system may support transparent, contact-free sleep monitoring using passive audio.
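To make the abstract's pipeline more concrete, the sketch below illustrates two of its ideas: turning segmented respiratory cycles into simple descriptors (respiratory rate, cycle duration, and a regularity measure), and splitting data subject-wise for cross-validation. It assumes the segmentation module outputs cycle onset times in seconds. The function names, the choice of coefficient of variation as the regularity measure, and the round-robin fold assignment are all assumptions made for illustration, not the authors' implementation.

```python
from statistics import mean, pstdev


def cycle_features(onsets_s):
    """Summarize breathing from cycle onset times (seconds, ascending).

    Hypothetical helper: the paper's actual feature set is broader
    (statistical, spectral, and distributional descriptors).
    """
    if len(onsets_s) < 2:
        raise ValueError("need at least two cycle onsets")
    # Duration of each cycle is the gap between consecutive onsets.
    durations = [b - a for a, b in zip(onsets_s, onsets_s[1:])]
    if any(d <= 0 for d in durations):
        raise ValueError("onsets must be strictly increasing")
    mean_dur = mean(durations)
    return {
        "mean_cycle_s": mean_dur,
        # Breaths per minute implied by the mean cycle duration.
        "rate_bpm": 60.0 / mean_dur,
        # Coefficient of variation: an assumed, simple regularity descriptor.
        "cycle_cv": pstdev(durations) / mean_dur,
    }


def subject_wise_folds(subject_ids, k):
    """Assign every sample to a fold so that no subject spans two folds."""
    if k < 1:
        raise ValueError("k must be at least 1")
    unique = sorted(set(subject_ids))
    # Round-robin assignment over subjects (an assumed, simple scheme).
    fold_of = {sid: i % k for i, sid in enumerate(unique)}
    return [fold_of[sid] for sid in subject_ids]


features = cycle_features([0.0, 4.0, 8.0, 12.0])
folds = subject_wise_folds(["a", "a", "b", "c", "b"], k=2)
print(features, folds)
```

Grouping by subject before assigning folds keeps all of one person's data on the same side of each train/test split, which is the point of subject-wise cross-validation.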

## Full-text entities

- **Diseases:** sleep disorders (MESH:D012893)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12567477/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12567477/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/PMC12567477/full.md

---
Source: https://tomesphere.com/paper/PMC12567477