A Semi-Supervised Algorithm for Improving the Consistency of Crowdsourced Datasets: The COVID-19 Case Study on Respiratory Disorder Classification
Lara Orlandic, Tomas Teijeiro, David Atienza

TL;DR
This paper presents a semi-supervised learning method to enhance the labeling consistency of crowdsourced cough audio data, improving COVID-19 detection accuracy and dataset reliability for respiratory disorder classification.
Contribution
It introduces a novel semi-supervised learning (SSL) approach to correct and augment noisy crowdsourced labels, increasing class separability and spectral feature clarity in cough sound datasets.
Findings
Re-labeled data shows 3x higher class separability than the same data under the original labels.
Spectral differences between healthy and COVID-19 coughs are amplified in re-labeled data.
Re-labeled dataset improves cough classifier training effectiveness.
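To make "class separability" concrete, here is a minimal sketch of one common way to score it for a single scalar feature: the Fisher discriminant ratio (squared gap between class means divided by the summed class variances). This summary does not state which separability metric the paper uses, so the choice of metric, the function name `fisher_ratio`, and the toy feature values below are all assumptions for illustration, not the authors' method or data.

```python
from statistics import mean, pvariance


def fisher_ratio(class_a, class_b):
    """Fisher discriminant ratio for one scalar feature.

    Larger values mean the two classes overlap less.
    Assumes at least one class has non-zero variance.
    """
    gap = mean(class_a) - mean(class_b)
    spread = pvariance(class_a) + pvariance(class_b)
    return gap * gap / spread


# Hypothetical feature values (made up, not taken from COUGHVID).
healthy_original = [0.0, 2.0, 4.0]
covid_original = [2.0, 4.0, 6.0]
healthy_relabeled = [1.0, 2.0, 3.0]
covid_relabeled = [4.0, 5.0, 6.0]

before = fisher_ratio(healthy_original, covid_original)
after = fisher_ratio(healthy_relabeled, covid_relabeled)
print(f"separability before: {before:.2f}, after: {after:.2f}")
```

A ratio that rises after re-labeling would indicate that the new labels group acoustically similar coughs together more consistently, which is the kind of effect the findings above describe.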
Abstract
Cough audio signal classification is a potentially useful tool in screening for respiratory disorders, such as COVID-19. Since it is dangerous to collect data from patients with such contagious diseases, many research teams have turned to crowdsourcing to quickly gather cough sound data, as was done to generate the COUGHVID dataset. The COUGHVID dataset enlisted expert physicians to diagnose the underlying diseases present in a limited number of uploaded recordings. However, this approach suffers from potential mislabeling of the coughs, as well as notable disagreement between experts. In this work, we use a semi-supervised learning (SSL) approach to improve the labeling consistency of the COUGHVID dataset and the robustness of COVID-19 versus healthy cough sound classification. First, we leverage existing SSL expert knowledge aggregation techniques to overcome the labeling…
Taxonomy
Topics: Respiratory and Cough-Related Research · Respiratory viral infections research · Speech Recognition and Synthesis
