The COUGHVID crowdsourcing dataset: A corpus for the study of large-scale cough analysis algorithms

Lara Orlandic; Tomas Teijeiro; David Atienza

arXiv:2009.11644·cs.SD·June 24, 2021

TL;DR

The COUGHVID dataset offers a large, diverse collection of over 20,000 crowdsourced cough recordings, more than 2,000 of which were labeled by experienced pulmonologists, to support machine learning research on respiratory disease diagnosis, including COVID-19 screening.

Contribution

This paper introduces one of the largest expert-labeled cough datasets, combining crowdsourced collection with labeling by pulmonologists to support the development of cough analysis algorithms.

Findings

01 Over 20,000 cough recordings collected and filtered

02 More than 2,000 recordings labeled by pulmonologists

03 Dataset includes diverse demographics and COVID-19 statuses

Abstract

Cough audio signal classification has been successfully used to diagnose a variety of respiratory conditions, and there has been significant interest in leveraging Machine Learning (ML) to provide widespread COVID-19 screening. However, there is currently no validated database of cough sounds with which to train such ML models. The COUGHVID dataset provides over 20,000 crowdsourced cough recordings representing a wide range of subject ages, genders, geographic locations, and COVID-19 statuses. First, we filtered the dataset using our open-sourced cough detection algorithm. Second, experienced pulmonologists labeled more than 2,000 recordings to diagnose medical abnormalities present in the coughs, thereby contributing one of the largest expert-labeled cough datasets in existence that can be used for a plethora of cough audio classification tasks. Finally, we ensured that coughs labeled…
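
The filtering step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each recording carries a score from a cough detector and that low-scoring recordings are dropped. The field name `cough_score`, the function `filter_recordings`, and the 0.8 threshold are hypothetical choices for illustration. They are not taken from the paper or its released code.

```python
from typing import Dict, Iterable, List


def filter_recordings(records: Iterable[Dict], threshold: float = 0.8) -> List[Dict]:
    """Keep records whose detector score is at or above the threshold.

    `cough_score` is a hypothetical field name; the threshold is an
    illustrative assumption, not a value stated in the source.
    """
    kept = []
    for rec in records:
        score = rec.get("cough_score")
        # Records without a score are treated as unverified and dropped.
        if score is not None and score >= threshold:
            kept.append(rec)
    return kept


# Made-up example records, for illustration only.
example = [
    {"id": "a", "cough_score": 0.97},
    {"id": "b", "cough_score": 0.12},
    {"id": "c", "cough_score": None},
]
kept = filter_recordings(example, threshold=0.8)
print([r["id"] for r in kept])
```

A stricter threshold trades dataset size for a lower share of non-cough recordings, so the value would normally be tuned against a manually checked sample.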
