# Speech-based respiratory diagnostics: A study on COVID-19 detection with machine learning

**Authors:** Gaurav Datkhile, Pramod H. Kachare, Sandeep B. Sangle, Ibrahim Al-Shourbaji, Abdoh Jabbari, Raimund Kirner, Abdalla Alameen

PMC · DOI: 10.1371/journal.pone.0332146 · PLOS One · 2025-11-21

## TL;DR

This study uses vowel speech sounds and machine learning to detect COVID-19, with the aim of supporting non-invasive respiratory diagnostic tools.

## Contribution

The paper introduces a novel combination of OpenSMILE features and ANOVA-based selection for improving machine learning accuracy in detecting COVID-19 from speech.
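ANOVA-based selection is commonly implemented by scoring each feature with a one-way ANOVA F-statistic, F = (SS_between / (k - 1)) / (SS_within / (n - k)) for n samples in k classes, and keeping the highest-scoring features. The sketch below assumes that standard recipe and assumes the OpenSMILE features have already been extracted into a samples-by-features matrix. The helper names and toy data are hypothetical, not taken from the paper. In practice, scikit-learn's `SelectKBest(f_classif, k=...)` produces the same ranking.

```python
from statistics import fmean


def anova_f_score(values: list[float], labels: list[int]) -> float:
    """One-way ANOVA F-statistic of one feature across the class labels."""
    groups: dict[int, list[float]] = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = fmean(values)
    means = {label: fmean(g) for label, g in groups.items()}
    # Between-class sum of squares: how far each class mean sits from the grand mean.
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2 for label, g in groups.items())
    # Within-class sum of squares: spread of samples around their own class mean.
    ss_within = sum((v - means[label]) ** 2 for label, g in groups.items() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_k(features: list[list[float]], labels: list[int], n_keep: int) -> list[int]:
    """Hypothetical helper: indices of the n_keep columns with the highest F-scores."""
    n_features = len(features[0])
    scores = [anova_f_score([row[j] for row in features], labels) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:n_keep]


# Toy data (not from COSWARA): 6 samples x 3 features, labels 0 = negative, 1 = positive.
X = [
    [0.10, 5.0, 1.0],
    [0.20, 3.0, 1.5],
    [0.15, 4.0, 0.8],
    [0.90, 4.5, 2.0],
    [1.00, 3.5, 2.5],
    [0.95, 4.2, 1.9],
]
y = [0, 0, 0, 1, 1, 1]
print(select_top_k(X, y, 2))  # [0, 2]: feature 0 separates the classes best
```

In a real pipeline, the scores should be computed on the training folds only, so held-out data does not influence which features are kept.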

## Key findings

- Random Forest with ANOVA-based feature selection achieved 76.47% accuracy for vowel /a/.
- Random Forest with ANOVA-selected features also reached 75.54% accuracy, reported for vowels /a/ and /o/.
- A Friedman test across classifiers and feature selection techniques supported Random Forest with ANOVA as a robust combination.
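The Friedman test ranks the compared methods within each block, for example each vowel or cross-validation fold. It then checks whether the mean ranks R_j of the k methods over N blocks differ more than chance would allow, using chi2_F = 12N / (k(k + 1)) * (sum_j R_j^2 - k(k + 1)^2 / 4). What follows is a minimal sketch of that statistic. It assumes higher accuracy is better and uses made-up scores, not the paper's results. `scipy.stats.friedmanchisquare` computes the same statistic (with a tie correction) and a p-value from a chi-square distribution with k - 1 degrees of freedom.

```python
def average_ranks(scores: list[float]) -> list[float]:
    """Rank scores so the highest gets rank 1; tied scores share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of tied scores.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def friedman_statistic(table: list[list[float]]) -> tuple[float, list[float]]:
    """Friedman chi-square for a blocks x methods score table (no tie correction)."""
    n_blocks, k = len(table), len(table[0])
    rank_rows = [average_ranks(row) for row in table]
    mean_ranks = [sum(row[j] for row in rank_rows) / n_blocks for j in range(k)]
    chi2 = 12 * n_blocks / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return chi2, mean_ranks


# Made-up accuracies: rows are blocks (e.g. vowels), columns are four classifiers.
table = [
    [0.80, 0.70, 0.60, 0.65],
    [0.75, 0.72, 0.55, 0.60],
    [0.78, 0.66, 0.62, 0.70],
]
chi2, mean_ranks = friedman_statistic(table)
print(round(chi2, 3), [round(r, 3) for r in mean_ranks])  # 8.2 [1.0, 2.333, 4.0, 2.667]
```

A lower mean rank means a method placed better more consistently across blocks. The test statistic only says whether the ranks differ overall, not which pairs of methods differ.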

## Abstract

Respiratory sound analysis has emerged as a promising approach for detecting and diagnosing respiratory diseases, including COVID-19. This study investigates OpenSMILE features for COVID-19 detection from the vowel speech sounds /a/, /e/, and /o/ in the COSWARA dataset. OpenSMILE extracts audio and functional features, which are then classified using several machine learning (ML) algorithms. Four ML classifiers are evaluated: Random Forest (RF), Support Vector Machine, Decision Tree, and Artificial Neural Network. To enhance classification performance, five feature selection techniques were applied: ANOVA, chi-square, Information Gain, ReliefF, and Gini index. Among these, ANOVA-based selection yielded the most consistent results across classifiers and vowel sounds. Among the models evaluated, the RF classifier achieved the highest accuracies of 76.47% for vowel /a/ and 75.54% for vowels /a/ and /o/, respectively, when combined with ANOVA-selected features (155, 163, and 161 features). To statistically assess model and feature selection performance, the Friedman test was conducted across classifiers and feature selection techniques. The results confirmed Random Forest with ANOVA as a robust combination. This research contributes to developing accessible, scalable, and non-invasive diagnostic tools, enhancing the potential of telemedicine and remote healthcare systems for the early detection of respiratory diseases.
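Information Gain, one of the five selection techniques the abstract compares, scores a feature by how much knowing its value reduces the entropy of the class labels. It needs discrete feature values, so this sketch assumes a single binary split at the feature's median. The paper's actual discretization is not described in this summary, and the function names are hypothetical.

```python
from collections import Counter
from math import log2
from statistics import median


def entropy(labels: list[int]) -> float:
    """Shannon entropy (bits) of a label list."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(values: list[float], labels: list[int]) -> float:
    """Entropy reduction from splitting one feature at its median (assumed discretization)."""
    threshold = median(values)
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    # Weighted entropy of the labels remaining on each side of the split.
    conditional = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - conditional


y = [0, 0, 0, 1, 1, 1]
print(information_gain([0.10, 0.20, 0.15, 0.90, 1.00, 0.95], y))  # 1.0: perfect split
print(round(information_gain([5.0, 3.0, 4.0, 4.5, 3.5, 4.2], y), 3))  # 0.082
```

Unlike the ANOVA F-score, which works directly on the continuous values, this score depends on the chosen split. That is one reason different selection techniques can keep different feature subsets.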

## Linked entities

- **Diseases:** COVID-19 (MONDO:0100096)

## Full-text entities

- **Diseases:** COVID-19 (MESH:D000086382), respiratory diseases (MESH:D012140)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12637952/full.md

## Figures

12 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12637952/full.md

## References

15 references — full list in the complete paper: https://tomesphere.com/paper/PMC12637952/full.md

---
Source: https://tomesphere.com/paper/PMC12637952