Interpretable Acoustic Features from Wakefulness Tracheal Breathing for OSA Severity Assessment

Ali Mohammad Alqudah; Walid Ashraf; Brian Lithgow; Zahra Moussavi

PMC · DOI: 10.3390/jcm15031081 · January 29, 2026

Open Access

TL;DR

This study presents a non-invasive method using breathing sounds and body measurements to assess the severity of sleep apnea, offering a more accessible alternative to traditional tests.

Contribution

The work introduces a machine-learning framework using interpretable acoustic features from tracheal breathing sounds for OSA severity classification.

Findings

01

The framework effectively discriminates among four OSA severity groups using tracheal breathing sounds and anthropometric variables.

02

Combining acoustic features with body measurements improves classification performance and reliability across all severity classes.

03

The approach shows potential for scalable and accessible OSA screening, enabling earlier detection.

Abstract

Background: Obstructive Sleep Apnea (OSA) is one of the most prevalent sleep disorders associated with cardiovascular complications, cognitive impairments, and reduced quality of life. Early and accurate diagnosis is essential. The present gold standard, polysomnography, is expensive and resource-intensive. This work develops a non-invasive machine-learning-based framework to classify four OSA severity groups (non, mild, moderate, and severe) using tracheal breathing sounds (TBSs) and anthropometric variables. Methods: A total of 199 participants were recruited, and TBS were recorded whilst awake (wakefulness) using a suprasternal microphone. The workflow included the following steps: signal preprocessing (segmentation, filtering, and normalization), multi-domain feature extraction representing spectral, temporal, nonlinear, and morphological features, adaptive feature normalization,…

Funding

Natural Sciences and Engineering Research Council of Canada (NSERC)

Keywords

obstructive sleep apnea; tracheal breathing sounds; machine learning; feature selection; signal processing; ensemble models; anthropometric data; explainable AI

Taxonomy

Topics: Obstructive Sleep Apnea Research · Phonocardiography and Auscultation Techniques · Voice and Speech Disorders

Full text

1. Introduction

Obstructive sleep apnea (OSA) is a common yet underdiagnosed sleep-related breathing disorder affecting nearly 20% of adults in North America and linked to cardiovascular disease, hypertension, diabetes, and increased perioperative risk [1,2]. Despite its prevalence, up to 80% of cases remain undiagnosed [3], creating substantial healthcare and economic burdens. OSA arises from recurrent upper airway obstruction during sleep, and its severity is classified by the apnea-hypopnea index (AHI) [4]. While polysomnography (PSG) remains the diagnostic gold standard [5], it is costly, time-intensive, and often inaccessible. Screening tools such as the STOP-Bang and Berlin questionnaires provide high sensitivity but low specificity, leading to frequent false-positive classifications of OSA status [6,7].

Recent advances in biomedical signal analysis offer promising alternatives. For instance, tracheal breathing sounds (TBSs) recorded during wakefulness have been shown to contain distinctive acoustic markers related to upper airway physiology [8,9,10,11,12,13,14,15,16]. Studies using power spectral, bispectral, and fractal analyses, and more recently, machine learning (ML) models, have demonstrated strong potential for OSA detection [8,9,10,11,12,13,14,15,16]. However, a significant gap remains between the extraction of acoustic features and their clinical interpretation. While many studies report statistically significant differences in signal characteristics, the physiological meaning of these features and their relationships with airway mechanics, airflow resistance, and neuromuscular control remain poorly understood [17,18].

Bridging this gap is essential to translate signal-based metrics into clinically interpretable and actionable tools. This study focuses on interpreting acoustic features extracted from wakefulness TBS across different OSA severity groups. By analyzing spectral power, bispectral coupling, and fractal dimensions, we explore how acoustic signatures reflect physiological mechanisms underlying airway obstruction. This approach aims to link quantitative signal analysis with clinical interpretation, supporting the development of objective, accessible, and scalable OSA screening.

2. Materials and Methods

In this study, we applied our previously validated workflow [8] for acquiring, preprocessing, and analyzing wakefulness tracheal breathing sounds (TBSs). Participants were recruited from individuals referred for overnight PSG, representing a clinically enriched cohort with elevated pre-test probability of OSA. Suprasternal TBS recordings were collected under controlled conditions using a Sony ECM-77B omnidirectional condenser microphone (Sony, Tokyo, Japan; sensitivity: −52 dB ± 3.5 dB, frequency response: 40 Hz–20 kHz). Subjects were positioned supine and instructed to perform five full deep breaths through the nose with the mouth closed, followed by five deep breaths through the mouth while wearing a nose clip. Snoring history was not collected via subject self-report as part of the anthropometric questionnaire, and no snoring events were present in the wakeful breathing recordings analyzed in this study. Table 1 presents the distribution of subjects in the dataset by anthropometric features. Preprocessing followed our established procedures, including artifact inspection, adaptive segmentation of inspiration and expiration, and bandpass filtering to isolate physiological components. We then extracted a comprehensive set of spectral, nonlinear, fractal, morphological, and time-frequency features using the same methods detailed in [8]. These features were optimized for 1-vs-1 subgroup analyses to improve the interpretability and personalization of acoustic biomarkers. The complete workflow is summarized in Figure 1.

Preprocessing comprised three steps: inspecting recordings for background noise and vocal artifacts, which were excluded; segmenting the breathing sounds into inspiratory and expiratory phases using adaptive thresholding of the log-variance envelope together with Signal-to-Noise Ratio (SNR) computation; and bandpass filtering (75–3000 Hz, 4th-order Butterworth) to discard extraneous physiological and ambient signals [8,11,12]. Filtered signals were subsequently normalized using automated methods (mean-range scaling, z-score, min-max, and robust scaling), guided by mutual information to maximize feature-label dependency [8]. Features were then extracted from each processed mid-flow signal across multiple analytical domains, including spectral, temporal, nonlinear, and cross-domain analyses, to represent both linear and nonlinear signal dynamics. The extracted features are grouped and explicitly optimized for 1-vs-1 labels [8]. This group-specific feature selection yields personalized feature sets that enhance model robustness and improve interpretation for diagnostic and predictive applications [8]. The following features were extracted:

Spectral features: Power spectral density via Welch’s method, spectral centroid, entropy, kurtosis, bandwidth, flux, and crest metrics [19].
Bispectral features: Bootstrap-based confidence interval detection of nonstationary gaps and coupling metrics [20].
Fractal and nonlinear features: Hurst exponent, Lyapunov exponent, Recurrence Quantification Analysis (RQA), Katz and Higuchi fractal dimensions [21,22,23].
Wavelet and time-frequency features: Wavelet coefficients, Mel-Frequency Cepstral Coefficients (MFCCs), Constant-Q Transform statistics [24,25,26].
Morphological features: Image-based representation of spectrogram and bispectrum (bounding box area, holes, connected components, Euler number, contrast/homogeneity/correlation/energy descriptors) [20,27,28,29].
Time-domain metrics: Zero-crossing rate, root mean square, shimmer, jitter, and noise-to-harmonics ratio [30,31,32].
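As a rough illustration of the filtering step and a handful of the spectral and time-domain descriptors listed above, the sketch below uses SciPy. It is not the authors' implementation: the sampling rate `FS`, the Welch segment length, and the function and key names are assumptions made for this example, and whether the "4th-order" specification corresponds to SciPy's `order=4` for a band-pass design is also an assumption.

```python
import numpy as np
from scipy import signal

FS = 10_000  # assumed sampling rate in Hz; not stated in this section


def bandpass(x, fs=FS, low=75.0, high=3000.0, order=4):
    """Zero-phase Butterworth band-pass over the 75-3000 Hz band.

    SciPy doubles the pole count for band-pass designs and sosfiltfilt
    runs the filter forward and backward, so `order` is an assumption.
    """
    sos = signal.butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return signal.sosfiltfilt(sos, x)


def spectral_features(x, fs=FS, nperseg=1024):
    """Welch PSD plus a few summary descriptors of its shape."""
    freqs, psd = signal.welch(x, fs=fs, nperseg=nperseg)
    p = psd / psd.sum()  # treat the PSD as a probability-like mass
    centroid = float(np.sum(freqs * p))
    bandwidth = float(np.sqrt(np.sum((freqs - centroid) ** 2 * p)))
    entropy = float(-np.sum(p * np.log2(p + 1e-12)))
    crest = float(psd.max() / psd.mean())
    return {
        "centroid_hz": centroid,
        "bandwidth_hz": bandwidth,
        "entropy_bits": entropy,
        "crest": crest,
    }


def time_features(x):
    """Zero-crossing rate (per sample) and root mean square."""
    zcr = float(np.mean(np.signbit(x[1:]) != np.signbit(x[:-1])))
    rms = float(np.sqrt(np.mean(x ** 2)))
    return {"zcr": zcr, "rms": rms}
```

Calling `spectral_features(bandpass(segment))` on a single inspiratory or expiratory segment returns one small feature dictionary; the bispectral, fractal, wavelet, and morphological families would need their own routines.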

To identify stable and physiologically meaningful predictors of OSA severity, we applied the same three-stage feature selection framework described in [8], consisting of univariate statistical filtering, SHapley Additive exPlanations (SHAP)-based feature ranking, and recursive feature elimination (RFE). Within this framework, the final feature subset was defined as the minimum number of features that preserved consistent performance across cross-validation folds while maintaining feature stability and physiological interpretability, as established in [8]. A further reduction in the feature set beyond this subset was previously shown to increase performance variability and reduce robustness; therefore, no additional feature pruning was applied in the present study. Model evaluation was conducted using a custom stratified k-fold cross-validation (CV) scheme specifically designed to preserve the joint distribution of OSA severity labels and key anthropometric risk factors, including age, body mass index (BMI), neck circumference, sex, and Mallampati score [12]. Appendix A presents a summary of the top 35 selected features for each model, based on the extracted features.
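A simplified sketch of a filter-then-eliminate selection pipeline is given below using scikit-learn. It covers only the univariate filtering and RFE stages; the SHAP-based ranking stage is deliberately omitted, and the ANOVA F-test, the logistic-regression estimator, and the values of `k_filter` and `n_final` are assumptions for illustration rather than the settings used in [8].

```python
import numpy as np
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def select_features(X, y, k_filter=20, n_final=5):
    """Return column indices kept after a univariate filter followed by RFE.

    Stage 1 (assumed): ANOVA F-test keeps the k_filter best columns.
    Stage 2 (SHAP ranking in the text) is omitted from this sketch.
    Stage 3: RFE drops the weakest remaining column one at a time.
    """
    filt = SelectKBest(f_classif, k=min(k_filter, X.shape[1])).fit(X, y)
    kept = np.flatnonzero(filt.get_support())

    # Standardize so that coefficient magnitudes are comparable in RFE.
    X_kept = StandardScaler().fit_transform(X[:, kept])
    rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=n_final)
    rfe.fit(X_kept, y)
    return kept[rfe.get_support()]
```

In practice the selection would be fitted on the training portion of each fold only, so that the held-out fold does not leak into the chosen subset.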

As illustrated in Figure 2, the whole dataset is first partitioned into k folds, ensuring each fold has approximately equal representation of OSA severity classes (Non, Mild, Moderate, and Severe) and comparable distributions of the selected anthropometric variables. Rather than stratifying solely by OSA severity, the proposed strategy employs multi-criteria stratification to ensure that clinically relevant subgroups are consistently represented in both training and validation sets. This approach reduces sampling bias caused by population heterogeneity and yields a more reliable estimate of generalization performance across physiologically diverse subjects. In addition, severity stratification followed standard AHI-based clinical definitions to preserve physiological granularity and enable interpretation of progressive airway dysfunction beyond binary disease detection. Table 2 shows the distribution of subjects’ anthropometric data across the k-fold splits.
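One common way to approximate multi-criteria stratification is to build a composite stratum label from the severity class and coarsely binned covariates, then pass it to a standard stratified splitter. The sketch below takes that route with pandas and scikit-learn; the median split, the fallback for small strata, and all column names are assumptions for illustration, and the custom scheme in [12] may differ.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold


def multi_criteria_folds(df, label_col, numeric_cols, cat_cols, k=3, seed=0):
    """Return k (train_idx, test_idx) pairs stratified on a composite key."""
    label = df[label_col].astype(str)
    strata = label.copy()
    for col in numeric_cols:
        # Median split keeps strata large enough in a small cohort.
        strata = strata + "_" + (df[col] > df[col].median()).astype(int).astype(str)
    for col in cat_cols:
        strata = strata + "_" + df[col].astype(str)

    # Strata too small to appear in every fold fall back to the label alone.
    counts = strata.map(strata.value_counts())
    strata = strata.where(counts >= k, label)

    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    return list(skf.split(df, strata))
```

With `k=3`, matching the three folds reported in the Results, each held-out fold then carries roughly the same mix of severity classes and covariate bins as the full cohort.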

To evaluate the discriminative power and physiological relevance of tracheal sound features, several complementary metrics were used. Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) quantified each feature’s ability to distinguish between OSA severity groups, with higher values indicating stronger discrimination [33]. Pearson correlation coefficients measured the linear association between feature values and true labels, providing insight into how consistently a feature reflects the clinical outcome across cross-validation folds [34]. To assess robustness and generalizability, Absolute Delta AUC (AbsDeltaAUC) was calculated as the absolute difference between the training and test AUCs, with smaller values indicating more stable features that are less sensitive to data variability [35]. Finally, SHAP quantified each feature’s contribution to model predictions, while accounting for interactions with other features, thereby enhancing interpretability by highlighting physiologically meaningful patterns [36]. Together, these metrics enable ranking of features by both their discriminative ability and stability, supporting the identification of robust biomarkers that link acoustic and morphological descriptors to airway dynamics, airflow turbulence, and anatomical variations associated with OSA. As shown in Figure 3, the proposed feature evaluation framework integrates discriminative, correlational, stability, and explainability analyses to identify physiologically relevant tracheal-sound features in OSA.
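The per-feature metrics described above can be computed for one feature and one train/test split roughly as follows. This is a sketch under assumptions: the raw feature value is used directly as the ranking score for the AUC, the binary label codes the more severe group as 1, and the function and key names are illustrative; SHAP values are not computed here.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import roc_auc_score


def feature_metrics(x_train, y_train, x_test, y_test):
    """AUC on each split, their absolute gap, and test-set Pearson r."""
    auc_train = float(roc_auc_score(y_train, x_train))
    auc_test = float(roc_auc_score(y_test, x_test))
    r = float(pearsonr(x_test, y_test)[0])
    return {
        "auc_train": auc_train,
        "auc_test": auc_test,
        "abs_delta_auc": abs(auc_train - auc_test),  # smaller = more stable
        "pearson_r": r,
    }
```

A feature that is lower in the more severe group yields an AUC below 0.5 under this convention; whether such values are reflected to `1 - AUC` before ranking is not stated in this section.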

The final analytic framework integrates feature-level metrics, stability measures, and explainability-driven insights to produce tables and visualizations that highlight the most physiologically relevant acoustic and morphological features across OSA severity groups. Rather than focusing solely on classification performance, the analysis emphasizes how each feature contributes to model predictions and relates to underlying airway physiology. Ranked lists based on AUC [33], fold-wise stability (AbsDeltaAUC) [35], and SHAP values [36] provide a multidimensional perspective on feature importance, while correlation analyses link these features to clinical and anthropometric variables, including AHI, neck circumference (NC), and Mallampati Score (MPS) [34]. Image-based and spectro-temporal feature maps illustrate changes in sound texture, frequency patterns, and event shapes, revealing airflow turbulence, intermittent obstruction, and variations in airway mechanics. By combining quantitative metrics with visual interpretations, this framework transforms raw signal descriptors into clinically meaningful biomarkers, enhancing understanding of upper-airway dynamics, airflow irregularities, and anatomical risk factors associated with OSA severity [37].

3. Results

This section presents the key findings from the feature extraction and selection pipeline, highlighting the most discriminative tracheal-breathing-sound features for OSA severity classification. Analyses were conducted across six 1-vs-1 base models (Non-OSA vs. Mild, Non-OSA vs. Moderate, Non-OSA vs. Severe, Mild vs. Moderate, Mild vs. Severe, Moderate vs. Severe) and three folds of a custom stratified cross-validation, designed to preserve the joint distribution of severity groups and key anthropometric factors. Feature importance was assessed using both correlation-based ranking and SHAP values to identify consistently essential features. The top selected features for each model are detailed in Appendix A.
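The six pairwise comparisons can be enumerated mechanically from the four severity classes. The sketch below also maps AHI to a class index using the conventional cut-offs of 5, 15, and 30 events per hour; those thresholds are an assumption, since this section refers only to "standard AHI-based clinical definitions", and the helper names are illustrative.

```python
from itertools import combinations

import numpy as np

CLASSES = ["Non-OSA", "Mild", "Moderate", "Severe"]
PAIRS = list(combinations(CLASSES, 2))  # six 1-vs-1 comparisons


def ahi_to_severity(ahi):
    """Map AHI (events/h) to 0..3 using assumed cut-offs at 5, 15, and 30."""
    return np.digitize(np.asarray(ahi, dtype=float), [5.0, 15.0, 30.0])


def pairwise_subset(X, labels, pair):
    """Keep only the two classes in `pair`; the more severe one is coded 1."""
    labels = np.asarray(labels)
    mask = np.isin(labels, pair)
    y = (labels[mask] == pair[1]).astype(int)
    return X[mask], y
```

Because `CLASSES` is ordered by severity, `combinations` yields the pairs in the same order as listed in the text, and `pair[1]` is always the more severe class.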

The models and selected features are empirical; their exact frequency bands or characteristics may differ for other datasets depending on the sensor used (e.g., different microphones). To keep feature names readable, we categorize them by main characteristic (such as spectral or bispectral), breathing type (mouth or nose), and phase (inspiration or expiration). The frequency regions from which the features were extracted are based on the 95% confidence interval of the training set, as proposed in our previous work [8]. For example, Bispectral_Centroid_Mean represents the mean bispectral energy centroid across all breathing conditions. Similarly, Spectral_Skewness_Mouth_Inspiration captures the skewness of the spectral distribution during mouth inspiration, and Spectral_FrequencyRatio_Mouth_Expiration represents the frequency ratio feature during mouth expiration. Detailed definitions of these features, including what a bounding box (BBox) is, the specific coordinates or frequency/time ranges, and the corresponding breathing conditions, are provided in the appendices (see Appendix B for feature definitions). This general naming approach keeps the main text readable, avoids dataset-specific details, makes the features more interpretable for readers outside the team, and preserves the essential information on how each feature was derived (Figure 4).
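To give a concrete sense of what a bounding-box descriptor on an image-like representation could look like, the sketch below thresholds a 2-D magnitude map (for example a spectrogram or bispectrum magnitude), labels connected regions, and summarizes the bounding box of the largest one. The quantile threshold, the choice of the largest region, and every output name are assumptions for illustration only; the paper's actual BBox definition is given in its appendices and may differ.

```python
import numpy as np
from scipy import ndimage


def bbox_features(mag, quantile=0.95):
    """Summarize the bounding box of the largest above-threshold region.

    `mag` is a non-negative 2-D array. Returns None if nothing exceeds
    the threshold.
    """
    mask = mag > np.quantile(mag, quantile)
    labeled, n_regions = ndimage.label(mask)  # 4-connected components
    if n_regions == 0:
        return None

    sizes = np.bincount(labeled.ravel())[1:]  # pixel count per region
    largest = int(np.argmax(sizes))           # zero-based region index
    rows, cols = ndimage.find_objects(labeled)[largest]
    patch = mag[rows, cols]
    height, width = patch.shape
    return {
        "connected_components": int(n_regions),
        "bbox_area": int(height * width),
        "bbox_diagonal": float(np.hypot(height, width)),
        "mean_value": float(patch.mean()),
        "energy_value": float(np.sum(patch ** 2)),
    }
```

Descriptors of this kind are expressed in pixel units of the underlying map, so they depend on the time and frequency resolution used to build it.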

Detailed clinical interpretation of the top features is given in Appendix B, and comparison-specific supporting tables and figures are provided in Appendix C (Non-OSA vs. Mild OSA), Appendix D (Non-OSA vs. Moderate OSA), Appendix E (Non-OSA vs. Severe OSA), Appendix F (Mild vs. Moderate OSA), Appendix G (Mild vs. Severe OSA), and Appendix H (Moderate vs. Severe OSA). Each of Appendices C to H includes the top-ranked features by test AUC, the most stable features by AbsDeltaAUC, and the strongest anthropometric- and AHI-associated features for the corresponding comparison. Higher AUC values indicate greater discrimination between the two severity groups, whereas smaller AbsDeltaAUC values suggest more consistent feature performance across cross-validation folds, reflecting reduced variability in AUC estimates across data partitions. Correlation analyses with anthropometric variables and AHI provide additional insight into potential physiological relevance and relationships with clinical severity. The following subsections focus on the clinical interpretation of the most consistently supported features across these comparisons.

3.1. Clinical Interpretation of Top Features

To provide a physiologically grounded interpretation of the observed acoustic differences, we adopt a Structure–Function–Symptom framework. In this narrative, anatomical and structural characteristics of the upper airway (Structure), such as tissue compliance, airway narrowing, and fat deposition, influence airflow behavior during breathing (Function), including turbulence, nonlinear coupling, and ventilatory instability. These functional alterations manifest clinically as differences in apnea–hypopnea burden and disease severity (Symptom), quantified by the AHI. The following interpretations therefore explain how each significant acoustic feature reflects a structural-functional pathway underlying OSA progression.

This subsection provides a clinical interpretation of the most discriminative features, prioritized based on agreement across AUC ranking, AbsDeltaAUC, and SHAP importance. The goal is to link key acoustic and spectro-temporal descriptors to potential physiological and airflow changes associated with early manifestations of sleep-disordered breathing. Table 3 gives an overview of the clinical interpretation of features across different models, while more detailed feature-by-feature interpretations are provided in Appendix B. Collectively, this analysis facilitates a clearer understanding of how specific acoustic patterns may reflect underlying upper airway dynamics and disease progression. Furthermore, aligning model-derived features with known clinical mechanisms enhances the interpretability and translational relevance of the proposed framework.

3.2. Top-Ranked Features

The top 10 features, identified by their overall average rank across both correlation- and SHAP-based ranking methods, are presented in Table 4. They include a mix of spectral, temporal, and morphological characteristics of the tracheal breathing sounds and consistently demonstrated high importance in distinguishing between OSA severity groups. Notably, features related to spectral bandwidth (the range of frequencies contributing to the signal), texture energy (the uniformity and repetitiveness of bispectral patterns), spectral flux (frame-to-frame changes in the power spectrum), and statistical moments (mean, standard deviation, kurtosis, skewness) recur among the most important features. Together, they capture the frequency distribution, temporal dynamics, and overall intensity and complexity of the sound signal. Their robustness across cross-validation folds and severity comparisons indicates that they reliably capture physiologically relevant changes in airflow dynamics and turbulence.
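Averaging ranks from two importance measures can be sketched as below. The inputs are assumed to be precomputed per-feature scores (absolute correlation with the label and mean absolute SHAP value); how ties are handled and how the scores are aggregated across folds and models in the paper is not specified here, so those choices are illustrative.

```python
import numpy as np
from scipy.stats import rankdata


def average_rank(abs_corr, mean_abs_shap, names, top=10):
    """Rank features by each score (1 = best) and sort by the mean rank."""
    r_corr = rankdata(-np.asarray(abs_corr, dtype=float), method="average")
    r_shap = rankdata(-np.asarray(mean_abs_shap, dtype=float), method="average")
    avg = (r_corr + r_shap) / 2.0
    order = np.argsort(avg, kind="stable")[:top]  # ties keep input order
    return [(names[i], float(avg[i])) for i in order]
```

For instance, a feature ranked first by correlation and second by SHAP receives an average rank of 1.5, the same as one ranked second and first.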

3.3. Feature Stability Across Folds and Models

The top 10 most stable tracheal breathing sound features were identified based on the lowest absolute differences between training and testing Area Under the Curve (AUC) values, as shown in Table 5. These features demonstrate minimal variability across different data splits, highlighting their robustness and consistency in discriminating OSA severity. Lower absolute Delta AUC values indicate that the predictive power of these features is reliably maintained across training and test datasets. Likewise, lower absolute Delta Corr values reflect more consistent correlations between the features and clinical measurements, indicating stable physiological relevance across folds. Collectively, these metrics suggest that the selected features are both robust and physiologically meaningful, making them strong candidates for inclusion in predictive models and clinical decision-support systems.

3.4. Correlation with Anthropometric Data

Strong associations were observed between several extracted tracheal-sound features and key clinical anthropometric measurements relevant to OSA, including BMI, NC, Sex, and MPS. These relationships, summarized in Table 6, highlight the clinical relevance of the acoustic and spectro-temporal descriptors identified in this study. The table emphasizes the features most strongly correlated with anthropometric parameters, providing insight into the physiological underpinnings of OSA severity and supporting their potential utility in predictive modeling.

Pearson correlation coefficients between acoustic features and anthropometric variables were computed independently within each cross-validation fold, rather than on pooled data. For each severity comparison, correlations were calculated using the fold-specific data subset, and the corresponding fold index is explicitly reported in Table 6.

These correlations were used exclusively for interpretability and physiological analysis and did not influence model training, feature selection, or classifier optimization. As such, they should not be interpreted as estimates of generalization performance, but rather as indicators of strong fold-specific associations between acoustic characteristics and anthropometric measures.
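Computing correlations separately within each fold, and carrying the fold index alongside each coefficient, might look like the following sketch. It assumes the "fold-specific data subset" is the held-out part of each split and that the anthropometric variables are numerically coded; both are assumptions, as are the column and function names.

```python
import numpy as np
import pandas as pd


def foldwise_correlations(features, anthro, folds):
    """Pearson r between every feature and anthropometric column, per fold.

    `features` and `anthro` are row-aligned DataFrames; `folds` is a list
    of (train_idx, test_idx) positional index pairs.
    """
    rows = []
    for fold_idx, (_, held_out) in enumerate(folds, start=1):
        f = features.iloc[held_out]
        a = anthro.iloc[held_out]
        for f_name in f.columns:
            for a_name in a.columns:
                r = np.corrcoef(f[f_name].to_numpy(float),
                                a[a_name].to_numpy(float))[0, 1]
                rows.append({"fold": fold_idx, "feature": f_name,
                             "variable": a_name, "pearson_r": float(r)})
    return pd.DataFrame(rows)
```

Sorting the resulting table by absolute `pearson_r` within each comparison gives a listing in which every coefficient stays tied to the fold it came from.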

4. Discussion

This study aimed to identify interpretable features from wakefulness tracheal breathing sounds that are clinically relevant for assessing OSA severity. The consistent emergence of specific features across different models and folds, coupled with their stability and correlation with anthropometric data, underscores their potential as robust biomarkers for OSA [8,9,10,11,12,13,17,18,38].

4.1. Clinical Relevance of Key Features

Interpreting acoustic biomarkers through a structure–function–symptom lens enables a mechanistic understanding of how anatomical vulnerability of the upper airway translates into altered airflow dynamics and ultimately manifests as increasing OSA severity. OSA manifests through complex interactions between the upper airway anatomy, airflow turbulence, and respiratory control. Identifying features from tracheal breathing sounds recorded whilst awake that reliably reflect these physiological processes is crucial for non-invasive assessment. This section focuses on clinically meaningful indicators of OSA severity [39,40,41].

4.1.1. Non-OSA vs. Mild-OSA

From a structure–function–symptom perspective, early anatomical vulnerability of the upper airway (Structure), including mild tissue compliance and partial narrowing, leads to subtle functional airflow disturbances (Function), characterized by intermittent turbulence and disrupted nonlinear airflow–tissue coupling during inspiration. These functional alterations manifest clinically (Symptom) as mild elevations in AHI without sustained airway obstruction. The distinction between Non-OSA and Mild-OSA is characterized by the emergence of subtle yet consistent early signs of upper airway instability during wakeful breathing. These changes are captured by features such as MouthInspiration_Range_FreqSkewness, Average_BBox_TextureEnergy, and Average_BBox_FrequencyCentroidX, which quantify shifts in spectral energy, disruption of structured bispectral coupling, and changes in dominant frequency interactions, respectively (see Appendix B.1 for full feature definitions and physiological interpretation). The observed patterns indicate a transition from predominantly laminar airflow toward intermittently turbulent inspiratory flow, consistent with early upper-airway collapsibility and soft-tissue vibration [38,40,42]. These changes suggest the onset of periodic flow limitation without sustained obstruction, aligning with early physiological manifestations of mild OSA described in prior studies [16,41].

Overall, mild OSA is marked not by significant increases in breathing sound intensity but by early disruption of airflow regularity and spectro-temporal organization, as captured by these spectral and bispectral features. This finding supports the concept that the earliest stage of OSA manifests primarily as micro-instability and intermittent turbulence rather than overt obstruction, reinforcing the value of wakeful acoustic markers for early detection [16,43,44].

4.1.2. Non-OSA vs. Moderate-OSA

Within the structure–function–symptom framework, progressive anatomical narrowing and reduced airway stiffness (Structure) produce sustained airflow limitation and elevated inspiratory effort (Function), resulting in prolonged turbulent breathing events. Clinically (Symptom), these changes correspond to a clear increase in AHI and more frequent obstructive events, consistent with moderate OSA. In contrast to mild OSA, the transition from Non-OSA to Moderate-OSA reveals a clear escalation in airflow disturbance and respiratory effort. These changes are captured by features such as Average_Range_Maximum, Average_BBox_BoundingBoxDiagonal, Average_Range_MeanPower, and Average_BBox_ConnectedComponents, which quantify peak breathing sound energy, expansion of nonlinear bispectral interactions, overall sound intensity, and fragmentation of coupling patterns, respectively (see Appendix B.2 for full feature definitions and physiological interpretation). The results demonstrate stronger, more sustained turbulent breathing events, reflecting prolonged partial airway collapse and increased inspiratory drive [13,39,45]. Breathing sounds become more energetic and fragmented, consistent with repetitive cycles of obstruction and compensatory recovery.

These acoustic characteristics indicate that moderate OSA is physiologically defined by persistent airflow instability rather than isolated abnormalities. The increased duration, intensity, and fragmentation of breathing events align with established descriptions of heightened airway collapsibility and more frequent arousal-related breathing responses in moderate disease [46,47,48].

4.1.3. Non-OSA vs. Severe-OSA

Structurally, severe OSA is characterized by pronounced upper-airway collapsibility and reduced neuromuscular compensation. Functionally, this leads to chaotic airflow, repeated collapse–reopening cycles, and highly nonlinear breathing dynamics. These functional disturbances manifest clinically as high AHI values (Symptom), reflecting frequent apneic and hypopneic events. Severe OSA exhibits a markedly distinct acoustic phenotype, dominated by chaotic, high-energy, and highly irregular breathing patterns. These changes are captured by features such as Average_BBox_MeanValue, Average_BBox_TextureEnergy, Average_BBox_FractalDimension, Average_BBox_EnergyValue, and Average_BBox_KurtosisValue, which quantify average sound intensity, heterogeneity of bispectral coupling, complexity of local patterns, total sound energy, and prevalence of abrupt or impulsive events, respectively (see Appendix B.3 for complete feature definitions and physiological interpretation). The results indicate frequent and intense airflow collapse followed by forceful recovery breaths, producing complex and impulsive acoustic events across a broad frequency range [12,38,40]. The pronounced variability and structural disruption observed are consistent with unstable ventilatory control and recurrent airway obstruction.

Physiologically, these findings reflect deep upper-airway collapsibility, exaggerated negative pressure swings, and repeated collapse–reopening cycles characteristic of advanced OSA [41,46,49]. The elevated complexity and unpredictability of the acoustic patterns are in line with prior reports linking severe disease to chaotic airflow and disordered breathing mechanics [50,51,52].

4.1.4. Mild-OSA vs. Moderate-OSA

In structure–function–symptom terms, the transition from mild to moderate OSA reflects worsening anatomical compromise of the airway (Structure), which shifts airflow behavior from intermittent to persistent instability (Function). This progression manifests clinically (Symptom) as a sustained increase in AHI and reduced effectiveness of compensatory airway control. The progression from mild to moderate OSA represents a shift from intermittent airflow disturbance to more persistent and structurally disruptive obstruction. These changes are captured by features such as Average_BBox_MeanValue, Average_BBox_TextureEnergy, Average_BBox_FractalDimension, Average_BBox_EnergyValue, and Average_BBox_KurtosisValue, which quantify average sound intensity, heterogeneity of bispectral coupling, complexity of local patterns, total sound energy, and the prevalence of sharp or impulsive events, respectively (see Appendix B.4 for complete feature definitions and physiological interpretation). The results indicate increasing turbulence during both inspiration and expiration, accompanied by broader spectral involvement and greater fragmentation of breathing sounds [12,38,44,53]. This suggests that airflow irregularities are no longer isolated but sustained throughout the respiratory cycle.

Clinically, this transition reflects worsening airway collapsibility and reduced effectiveness of neuromuscular compensation during wakefulness. Moderate OSA therefore emerges as a state in which airflow instability becomes chronic rather than episodic, consistent with physiological models of disease progression [9,17,40,50].

4.1.5. Mild-OSA vs. Severe-OSA

Here, structural airway vulnerability becomes dominant (Structure), overwhelming compensatory mechanisms. Functionally, this produces highly variable, noisy, and energetically intense airflow patterns. Clinically (Symptom), these effects correspond to severe OSA, marked by large AHI values and pronounced breathing instability. Comparisons between mild and severe OSA highlight a pronounced escalation in airflow irregularity, respiratory effort, and acoustic unpredictability. These changes are captured by features such as MouthInspiration_BBox_FrequencyCentroidX, Average_Average_BBoxes_Entropy, Average_Range_RMS, Average_BBox_EnergyValue, and MouthExpiration_Range_SpectralEnergy, which quantify shifts in dominant frequencies, overall entropy of bispectral patterns, amplitude variability, total sound energy, and broadband spectral energy, respectively (see Appendix B.5 for complete feature definitions and physiological interpretation). The results reveal prolonged and noisy inspiratory phases, increased breath-to-breath variability, and intense turbulent bursts extending into expiration [38,44,45]. These patterns indicate a breakdown of compensatory airway control mechanisms that remain partially effective in mild disease.

From a physiological standpoint, severe OSA is characterized by loss of airflow stability, where collapsibility dominates over neuromuscular control. The marked increases in variability and turbulence observed align with descriptions of unstable ventilatory control and repeated collapse–recovery dynamics in severe disease [18,38,44,48].

4.1.6. Moderate-OSA vs. Severe-OSA

From a structure–function–symptom standpoint, severe OSA represents a qualitative shift rather than a linear extension of moderate disease: deeper structural collapse and airway instability (Structure) lead to near-chaotic airflow dynamics (Function), which clinically manifest (Symptom) as extreme AHI values and frequent obstructive episodes. The transition from moderate to severe OSA is marked by a qualitative shift from structured instability to near-chaotic airflow dynamics. These changes are captured by features such as Average_BBox_MedianValue, Average_BBox_IQRValue, Average_BBox_EnergyValue, Average_BBox_KurtosisValue, Average_BBox_Compactness, Average_BBox_StdValue, and Average_BBox_EntropyValue, which quantify overall breathing sound intensity, central and total variability, abrupt peaks, diffusion of high-intensity regions, dispersion, and randomness of airflow-related acoustic patterns, respectively (see Appendix B.6 for full feature definitions and physiological interpretation). The results indicate greater breath-to-breath variability, stronger and more erratic respiratory effort, and increasingly diffuse turbulent sound patterns [39,41,48]. Acoustic events become less compact and more topologically complex, reflecting deeper and more frequent airway collapse.

These findings suggest that severe OSA represents not merely an amplification of moderate disease but a distinct physiological regime characterized by unpredictable airflow, unstable arousal responses, and diminished airway resilience [18,38,44,50,54]. This distinction supports the clinical importance of separating moderate and severe OSA in severity stratification and management.

4.1.7. Physiological Themes Across Models

Across severity comparisons, certain recurring acoustic patterns reflect underlying physiological mechanisms of OSA. By examining features related to turbulence, airflow complexity, variability, and energy, we can identify consistent markers of airway instability, vibration, and compensatory respiratory effort. The following themes summarize how these features may collectively capture the progression of OSA.

Escalating turbulence, bandwidth, and centroid shifts correspond to rising Reynolds number and more pronounced vibration/snoring as the airway narrows [44].
Event complexity (diagonals, perimeters, shape metrics) tracks segmented, irregular airflow fragments as OSA severity increases [43].
Variability (interquartile range (IQR), standard deviation (SD), entropy) reveals unstable ventilatory control, frequent arousals, and abrupt collapse–recovery dynamics [43].
Amplitude/energy (mean, RMS, total) reflects increasing respiratory effort, loud post-obstructive inspiration, and compensatory surges in disease progression.

Each feature thus provides physiological insight into how OSA disrupts upper-airway patency, generates turbulence and vibration, and drives instability and variability across both models [43,44].
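To make the variability and amplitude themes concrete, the following minimal sketch computes RMS, IQR, SD, and a histogram-based Shannon entropy for a short sequence of values. It is an illustration only: the function names, the bin count, and the two example sequences are assumptions and are not taken from the study's feature-extraction pipeline.

```python
import math
import statistics
from collections import Counter


def rms(values):
    """Root-mean-square amplitude of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def iqr(values):
    """Interquartile range (Q3 - Q1) using inclusive quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


def shannon_entropy(values, n_bins=8):
    """Shannon entropy in bits of an equal-width histogram of the values."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant sequence carries no uncertainty
    width = (hi - lo) / n_bins
    # The maximum value is clamped into the last bin.
    counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical per-breath amplitude values (not study data).
steady = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95]
erratic = [0.2, 2.5, 0.4, 3.1, 0.1, 1.9]

for name, seq in (("steady", steady), ("erratic", erratic)):
    print(name, round(rms(seq), 3), round(iqr(seq), 3),
          round(statistics.pstdev(seq), 3), round(shannon_entropy(seq), 3))
```

In this toy example the erratic sequence yields a larger IQR and SD than the steady one, which is the direction the variability theme above describes for more unstable breathing.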

4.2. Rationale for Multi-Class OSA Severity Stratification

While binary OSA classification (OSA vs. non-OSA) is common in screening-oriented studies, it does not capture the progressive and heterogeneous nature of the upper-airway dysfunction. Clinically defined AHI severity categories (mild, moderate, and severe) reflect distinct physiological states, including differences in airway collapsibility, airflow turbulence, ventilatory compensation, and symptom burden.

In this study, several acoustic features exhibited nonlinear or stage-specific behavior across severity levels, particularly in comparisons involving mild-to-moderate and moderate-to-severe transitions. Collapsing these groups into a binary framework would obscure intermediate phenotypes and reduce sensitivity to early or transitional disease mechanisms. By adopting a four-class framework, the proposed model preserves physiologically meaningful distinctions and enables severity-aware interpretation aligned with clinical risk stratification, perioperative assessment, and treatment decision-making.
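The difference between the four-class and binary formulations can be sketched as a simple labeling step. The cut-offs below (5, 15, and 30 events/h) are the conventional adult AHI thresholds and are assumed here; this section does not restate the exact thresholds used in the study, and the function names are hypothetical.

```python
def ahi_to_severity(ahi):
    """Map an AHI value (events/h) to one of four severity labels.

    Thresholds are the conventional 5/15/30 cut-offs (an assumption here).
    """
    if ahi < 0:
        raise ValueError("AHI cannot be negative")
    if ahi < 5:
        return "non-OSA"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"


def ahi_to_binary(ahi, cutoff=5.0):
    """Collapse the same AHI value into a binary OSA vs. non-OSA label."""
    return "OSA" if ahi >= cutoff else "non-OSA"


# Hypothetical AHI values: the binary label cannot separate 8 from 45.
for ahi in (2.0, 8.0, 22.0, 45.0):
    print(ahi, ahi_to_severity(ahi), ahi_to_binary(ahi))
```

Under the binary labeling, the mild, moderate, and severe examples all receive the same label, which illustrates how intermediate phenotypes are merged when severity classes are collapsed.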

4.3. Physiological and Clinical Interpretation of Feature Linkage to Severity

The high-ranking and stable features are not merely statistical constructs; they are direct acoustic manifestations of the anatomical and functional changes associated with progressive OSA severity. Tracheal sounds are generated by turbulent airflow, and their characteristics are susceptible to subtle changes in airway geometry, collapsibility, and compensatory respiratory effort, even during wakefulness [8,11].

Although all acoustic recordings in this study were obtained whilst awake under quiet breathing conditions and therefore do not contain snoring events, snoring history remains an important symptom associated with OSA severity. Chronic snoring reflects repetitive vibration of upper-airway soft tissues during sleep, which has been hypothesized to contribute to long-term structural changes such as tissue remodeling, inflammation, or altered compliance. These chronic structural modifications may persist beyond sleep and subtly influence airflow behavior and tracheal breathing acoustics even during wakefulness. Consequently, while the extracted acoustic features do not represent snoring sounds per se, they may indirectly reflect cumulative airway alterations associated with both OSA severity and a history of habitual snoring. Future studies incorporating quantitative snoring indices alongside wakeful and sleep-state recordings may further clarify this relationship.

4.3.1. Acoustic Signatures of Airway Chaos and Ventilatory Effort (AHI Correlation)

The correlations with the AHI provide the most direct clinical linkage. The following summarizes the correlations of features with AHI:

Decreased Texture Energy (Acoustic Disorganization): The feature exhibits the strongest negative correlation with OSA severity. Texture Energy is a quantitative measure of the uniformity and repetitiveness of local patterns in a spectrogram, computed by summing the squared values of the co-occurrence or filtered spectrogram matrix, reflecting how consistent and regular the acoustic structure is. As severity increases, the pharyngeal airway becomes intrinsically more compliant and prone to intermittent vibration and collapse, leading to flow separation and highly random, broadband turbulence. This shift from structured, laminar-like noise to chaotic, broadband turbulence disrupts the consistency of the spectrogram, resulting in a significant decrease in texture energy. This feature, therefore, serves as a powerful acoustic marker of increasing pharyngeal instability and vulnerability [39,42].
Increased Skewness (Compensatory Drive): Conversely, the high positive correlation (r ≈ 0.99) between spectral skewness and OSA severity indicates systematic changes in the distribution of sound amplitude. Positive skewness signifies a heavier tail toward high-amplitude values. Physiologically, this represents the subject’s increased reliance on intermittent, high-force maneuvers (such as a forceful, highly turbulent inhalation or a loud snort/gasp) to maintain adequate flow against increasing pharyngeal resistance. Clinically, this feature is an acoustic signature of heightened respiratory drive and compensatory effort, which scales directly with disease burden [41,45].
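The two descriptors above can be illustrated with a small sketch. Texture energy is computed here as the sum of squared entries of a normalized co-occurrence matrix built from horizontally adjacent cells of a quantized patch, and skewness as the third standardized moment. The quantization into four levels, the choice of horizontal neighbors, the function names, and the toy inputs are all assumptions for illustration and may differ from the study's implementation.

```python
def cooccurrence_energy(patch, levels=4):
    """Sum of squared entries of a normalized co-occurrence matrix.

    `patch` is a 2D list of integers in [0, levels), e.g. a quantized
    spectrogram region. Only horizontally adjacent pairs are counted.
    """
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for row in patch:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
            total += 1
    if total == 0:
        raise ValueError("patch needs at least one horizontal pair")
    return sum((c / total) ** 2 for r in counts for c in r)


def skewness(values):
    """Third standardized moment (population form)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        return 0.0
    return m3 / m2 ** 1.5


# Toy patches: a perfectly regular one and a disorganized one.
regular = [[1, 1, 1], [1, 1, 1]]
disordered = [[0, 1, 2], [3, 0, 2]]
print(cooccurrence_energy(regular), cooccurrence_energy(disordered))

# Toy amplitudes: one large value creates a heavy right tail.
print(skewness([1, 1, 1, 1, 10]))
```

The regular patch concentrates all pairs in one matrix cell and gives the maximum energy of 1.0, whereas the disordered patch spreads pairs across cells and gives a lower value, matching the interpretation of decreased texture energy as acoustic disorganization.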

4.3.2. Morphological and Spectral Markers of Flow Limitation and Airway Dynamics

The consistently top-ranked spectral and morphological features provide a detailed view of the fluid dynamics within the compromised airway.

Spectral Bandwidth and Flux (Venturi Effect): These features are crucial markers of dynamic flow behavior. Airflow acceleration through a narrow, compliant pharyngeal segment (the site of flow limitation, a manifestation of the Venturi effect) generates high-velocity jets. The high spectral flux reflects the rapid, transient changes in the power spectrum as these turbulent jets form and dissipate during the breathing cycle. In contrast, increased bandwidth reflects a broader spread of acoustic energy across frequencies. Together, these changes are consistent with the presence and severity of flow-limiting segments, where the degree of narrowing modulates the strength and spectral extent of turbulent eddies [40,55].
Fractal Dimension and Complexity (Non-linear System Behavior): The high-ranking fractal dimension quantifies the non-linear complexity of the signal. Increased airway resistance and turbulence are hallmarks of a system pushed toward instability. A higher fractal dimension suggests a highly complex, chaotic, and less predictable airflow pattern, aligning with established non-linear control theory, which views the respiratory system as operating close to a chaotic bifurcation point [50,51].
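As a rough illustration of the bandwidth and flux descriptors, the sketch below computes a magnitude-weighted spectral centroid, the bandwidth as the weighted standard deviation around that centroid, and the flux as the mean Euclidean distance between consecutive magnitude spectra. These are common textbook definitions and are assumed here; the study's exact formulas, the function names, and the toy spectra are not taken from the paper.

```python
import math


def spectral_centroid(freqs, mags):
    """Magnitude-weighted mean frequency of one spectrum."""
    total = sum(mags)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * m for f, m in zip(freqs, mags)) / total


def spectral_bandwidth(freqs, mags):
    """Magnitude-weighted standard deviation around the centroid."""
    centroid = spectral_centroid(freqs, mags)
    total = sum(mags)
    var = sum(m * (f - centroid) ** 2 for f, m in zip(freqs, mags)) / total
    return math.sqrt(var)


def spectral_flux(frames):
    """Mean Euclidean distance between consecutive magnitude spectra."""
    if len(frames) < 2:
        raise ValueError("need at least two frames")
    dists = [
        math.sqrt(sum((b - a) ** 2 for a, b in zip(prev, cur)))
        for prev, cur in zip(frames, frames[1:])
    ]
    return sum(dists) / len(dists)


freqs = [100.0, 200.0, 300.0]        # hypothetical bin centers in Hz
narrow = [0.0, 1.0, 0.0]             # energy in a single bin
broad = [1.0, 1.0, 1.0]              # energy spread across bins
print(spectral_bandwidth(freqs, narrow), spectral_bandwidth(freqs, broad))
print(spectral_flux([narrow, narrow]), spectral_flux([narrow, broad]))
```

A spectrum with energy spread across bins yields a larger bandwidth than one concentrated in a single bin, and frames that change from one to the next yield non-zero flux, which is the behavior the paragraph above associates with turbulent jets forming and dissipating.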

4.3.3. Validation Through Established Anatomical Risk Factors

The extremely high correlations between specific features and anthropometric measures provide vital proof of concept: the acoustic features are not just abstract discriminators but directly encode the physical risk factors [12,56,57]. The near-perfect correlation between peak intensity and Neck Circumference (NC) is physiologically meaningful. NC is a validated proxy for fat deposition and reduced pharyngeal tissue stiffness. This deposition not only narrows the airway but also influences sound wave propagation. The high peak acoustic value in expiration is a measurable outcome of a sound wave propagating through a physically constrained, often partially occluded, and highly compressible tissue structure. Similarly, the strong correlation between entropy and BMI (a proxy for systemic obesity and increased soft tissue mass) links overall body habitus to the acoustic randomness and disorganization of the expired airflow pattern.

The combined evidence from correlation, feature stability, and anthropometric validation strongly supports the use of wakefulness tracheal breathing sound features for the objective and clinically meaningful assessment of OSA severity [12,56,57].

4.4. Correlation with Anthropometric Data

The strong correlations observed between several tracheal sound features and anthropometric data further strengthen their clinical utility [12,17]. For example, features such as Mouth Expiration_BBox_650_15_1_0_peakValue, which correlates strongly with Neck Circumference (NC), and Mouth Expiration_BBox_155_2_0_0_entropyValue, which correlates strongly with BMI, are particularly noteworthy. NC and BMI are well-established risk factors and indicators of OSA severity. The direct relationship between these anthropometric measures and specific acoustic features suggests that the structural characteristics of the upper airway, influenced by body posture, are reflected in tracheal breathing sounds [15,58]. This provides a mechanistic link between anatomical predispositions to OSA and the acoustic manifestations captured by our features. Similarly, correlations with the Mallampati Score (MPS), an indicator of the oral cavity and pharyngeal space, further support the idea that the airway’s physical configuration influences sound production during breathing [47]. Although some fold-wise correlations approached unity, these values reflect strong associations observed within specific severity contrasts and folds and do not imply model overfitting, as correlation analysis was conducted independently of the predictive learning pipeline.
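For reference, the kind of feature-to-anthropometric correlation discussed above can be computed with the Pearson coefficient. The sketch below is a plain implementation of that standard formula; the variable names and the numeric values are hypothetical and are not study data, and the paper does not state in this section which correlation estimator was used.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical neck circumference (cm) and an acoustic peak-value feature.
neck_cm = [36.0, 38.5, 40.0, 42.0, 44.5]
peak_value = [0.21, 0.30, 0.33, 0.41, 0.52]
print(round(pearson_r(neck_cm, peak_value), 3))
```

When such a coefficient is computed within a single fold or severity contrast, the number of subjects contributing to it is worth reporting alongside the value, since it determines how precisely the association is estimated.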

These findings suggest that wakefulness tracheal breathing sounds contain rich information reflecting the physiological state of the upper airway. The ability of these features to discriminate OSA severity during wakefulness, their stability across different models and folds, and their strong correlations with established anthropometric risk factors are particularly significant. This offers a non-invasive, convenient, and potentially cost-effective method for screening and monitoring [9,44,59]. The interpretability of these features allows for a deeper understanding of the underlying mechanisms of OSA, moving beyond black-box model predictions [9,44,59]. This interpretability is crucial for clinical acceptance and for guiding future research into targeted interventions.

4.5. Alignment with Prior Wakefulness-Based OSA Studies

The present findings align strongly with and extend our group’s previous works [12,17]. The 2018 study [17] demonstrated that spectral, bispectral, and fractal features extracted from tracheal sounds could effectively differentiate OSA from non-OSA subjects, achieving classification accuracies of approximately 70–75% and ROC values of 0.73–0.80. It further showed that mouth-inspiratory features provided the highest discrimination power and that several acoustic descriptors were only weakly influenced by anthropometric variability, such as body mass index (BMI) or neck circumference (NC). The 2019 study [12] reinforced these results using a larger dataset of 199 subjects and confirmed that combining tracheal sounds with anthropometric features increased diagnostic performance to 81.4% accuracy (sensitivity = 82.1%, specificity = 80.9%).

Consistent with prior findings, our results achieved diagnostic performance comparable to or superior to that of previous models, yielding AUC values ranging from 0.86 to 0.97 across multiple OSA severity levels. The strongest predictive power was again observed for mouth-inspiratory and low-frequency components (150–450 Hz), which exhibited elevated spectral energy and distinct morphological patterns in moderate and severe OSA subjects. Correlation analysis in our dataset similarly showed significant relationships between acoustic features and anthropometric markers, including BMI (r = 0.52–0.75) and NC (r = 0.47–0.72), corroborating earlier physiological interpretations of airway constriction and turbulence during inspiration.

Beyond confirming the earlier results, the present study extends the prior research in three significant ways:

Instead of a binary OSA vs. non-OSA classification, our framework performs multi-level severity stratification (non-OSA, mild, moderate, and severe), offering finer clinical granularity.
We introduce novel morphological and time–frequency gap descriptors, extracted from harmonic–percussive (HP) decompositions and spectrogram bounding boxes, which capture airway-specific acoustic signatures not examined in previous work.
Our use of ensemble-based models with SHAP explainability provides transparent quantification of feature contributions and robustness validation (Abs ΔAUC < 0.04 across folds), establishing reproducibility across subjects and folds.

Together, these advances confirm and expand on the foundational evidence presented previously by our group [12,17], demonstrating that wakefulness-based tracheal breathing sounds, combined with simple anthropometric measures, constitute a physiologically meaningful, non-invasive, and reproducible tool for OSA detection and severity classification.

4.6. Comparison with Other Awake Screening Modalities

To contextualize the proposed methodology, it is necessary to benchmark it against the full spectrum of awake screening modalities. While questionnaires (e.g., STOP-Bang) are ubiquitous due to their zero-cost administration, they are hindered by low specificity (often <40%), leading to high false-positive rates [60]. Facial image analysis offers a non-contact alternative, but current methods often plateau at approximately 70% accuracy or require large, balanced datasets to avoid demographic bias [61]. Functional methods such as Negative Expiratory Pressure (NEP) [62] and Acoustic Pharyngometry [63] offer high accuracy by directly measuring airway collapsibility and geometry. However, these techniques often require specialized equipment and strictly controlled protocols, reducing their utility for rapid, large-scale screening compared to microphone-based approaches [12]. Speech analysis, while similar in modality to tracheal breathing sound analysis [64], often relies on complex phonetic tasks and has shown lower accuracy and variable specificity depending on the features used.

Table 7 presents a comprehensive quantitative comparison between the proposed framework and representative studies across five distinct wakefulness-based modalities. To ensure a fair comparison, we focus on dataset size, task formulation, and performance metrics (Sensitivity/Specificity). This comparison highlights that while functional tests (NEP/Pharyngometry) offer high precision, they lack the portability of acoustic methods. Conversely, while questionnaires are portable, they lack the diagnostic accuracy of the proposed method.

4.7. Limitations and Future Work

Although the proposed framework achieved high interpretability, several technical limitations merit consideration. First, the analysis was performed on data collected from a single clinical site [8,11,12], which may limit the generalizability of the results to broader populations with differing acoustic environments, recording hardware, and demographic characteristics. Second, the sound recordings were made with a high-end Sony microphone, and the selected features may change (although not significantly) depending on the sensor used. Future studies should incorporate multi-center datasets and cross-device validation to ensure robustness under real-world variability. Additionally, while the stratified k-fold design effectively balanced anthropometric covariates, the current results were based on a finite number of wakefulness recordings per subject, restricting the temporal representation of respiratory dynamics. Expanding the framework to include multi-cycle, sleep-stage-specific, or longitudinal tracheal sound data would enable modeling of disease progression and treatment response. Despite these constraints, the consistent feature stability and explainable ensemble structure provide a strong foundation for advancing automated, non-invasive OSA assessment.

In addition to these technical considerations, the study cohort consisted of individuals referred to overnight PSG and therefore represents a clinically enriched population with a higher pre-test probability of OSA than general or primary-care populations. This enrichment may lead to optimistic estimates of discrimination metrics, such as AUC, compared with deployment in lower-risk settings, and may particularly affect positive predictive value when disease prevalence is lower. However, the primary objective of this work was not to estimate population-level screening accuracy but to identify robust, physiologically interpretable acoustic biomarkers of OSA severity under controlled clinical conditions. These biomarkers reflect underlying airway dynamics and anatomical vulnerability, which are expected to generalize beyond referral-based cohorts.
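The dependence of positive predictive value on prevalence follows directly from Bayes' rule, and a short worked example may help. The sketch below uses round, hypothetical sensitivity, specificity, and prevalence values chosen only for illustration; they are not the performance figures of the proposed framework.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV from Bayes' rule: P(disease | positive test)."""
    for name, p in (("sensitivity", sensitivity),
                    ("specificity", specificity),
                    ("prevalence", prevalence)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0:
        raise ValueError("no positive test results are possible")
    return true_pos / (true_pos + false_pos)


# Hypothetical operating point applied to two hypothetical settings.
sens, spec = 0.80, 0.80
for label, prev in (("referral cohort", 0.60), ("community cohort", 0.10)):
    ppv = positive_predictive_value(sens, spec, prev)
    print(f"{label}: prevalence={prev:.2f}, PPV={ppv:.3f}")
```

With the same sensitivity and specificity, the PPV in this toy example falls from about 0.86 at 60% prevalence to about 0.31 at 10% prevalence, which is why decision thresholds typically need recalibration before use in lower-prevalence populations.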

In practical deployment, the proposed framework is best positioned as a first-line risk stratification or prioritization tool rather than a standalone diagnostic test. In primary-care or community settings, it could be used to identify individuals who would benefit most from expedited PSG, thereby improving resource allocation and reducing diagnostic delays. Future studies will focus on validating the framework in lower-prevalence populations, including primary-care and community-based cohorts, and on recalibrating decision thresholds to account for differences in disease prevalence and pre-test probability.

5. Conclusions

In conclusion, this research successfully established an interpretable, machine-learning-driven framework that uses wakefulness tracheal breathing sounds as objective, severity-stratifying biomarkers for Obstructive Sleep Apnea. By identifying and validating stable acoustic features (texture energy, spectral bandwidth, and fractal dimension) that exhibit strong mechanistic correlations with established anatomical risk factors, we show that these features acoustically encode the underlying physiological vulnerability of the upper airway. This interpretability moves beyond black-box diagnostics, offering clinicians clear, physiological correlates for disease progression. These findings represent a decisive step toward developing a non-invasive, cost-effective, and highly accessible screening tool, which is critically needed for timely diagnosis, perioperative risk stratification, and scalable long-term management in clinical settings globally.

Bibliography

1. Rizzo D., Baltzan M., Sirpal S., Dosman J., Kaminska M., Chung F. Prevalence and regional distribution of obstructive sleep apnea in Canada: Analysis from the Canadian Longitudinal Study on Aging. Can. J. Public Health 2024, 115, 970–979. doi:10.17269/s41997-024-00911-8. PMID: 39037568. PMCID: PMC11644135.
2. Lechat B., Naik G., Reynolds A., Aishah A., Scott H., Loffler K.A., Vakulin A., Escourrou P., McEvoy R.D., Adams R.J. Multinight Prevalence, Variability, and Diagnostic Misclassification of Obstructive Sleep Apnea. Am. J. Respir. Crit. Care Med. 2022, 205, 563–569. doi:10.1164/rccm.202107-1761OC. PMID: 34904935. PMCID: PMC8906484.
3. Faria A., Allen A.H., Fox N., Ayas N., Laher I. The public health burden of obstructive sleep apnea. Sleep Sci. 2021, 14, 257–265. doi:10.5935/1984-0063.20200111. PMID: 35186204. PMCID: PMC8848533.
4. Singh M., Liao P., Kobah S., Wijeysundera D.N., Shapiro C., Chung F. Proportion of surgical patients with undiagnosed obstructive sleep apnoea. Br. J. Anaesth. 2013, 110, 629–636. doi:10.1093/bja/aes465. PMID: 23257990.
5. Kushida C.A., Littner M.R., Morgenthaler T., Alessi C.A., Bailey D., Coleman J. Jr., Friedman L., Hirshkowitz M., Kapen S., Kramer M. Practice parameters for the indications for polysomnography and related procedures: An update for 2005. Sleep 2005, 28, 499–521. doi:10.1093/sleep/28.4.499. PMID: 16171294.
6. Chen L., Pivetta B., Nagappa M., Saripella A., Islam S., Englesakis M., Chung F. Validation of the STOP-Bang questionnaire for screening of obstructive sleep apnea in the general population and commercial drivers: A systematic review and meta-analysis. Sleep Breath. 2021, 25, 1741–1751. doi:10.1007/s11325-021-02299-y. PMID: 33507478. PMCID: PMC8590671.
7. Mazzotti D.R., Keenan B.T., Thorarinsdottir E.H., Gislason T., Pack A.I., Sleep Apnea Global Interdisciplinary, C. Is the Epworth Sleepiness Scale Sufficient to Identify the Excessively Sleepy Subtype of OSA? Chest 2022, 161, 557–561. doi:10.1016/j.chest.2021.10.027. PMID: 34756944. PMCID: PMC8941607.
8. Alqudah A.M., Moussavi Z. Assessing Obstructive Sleep Apnea Severity During Wakefulness via Tracheal Breathing Sound Analysis. Sensors 2025, 25, 6280. doi:10.3390/s25206280. PMID: 41157332. PMCID: PMC12567693.