Supervised and Self-supervised Pretraining Based COVID-19 Detection Using Acoustic Breathing/Cough/Speech Signals
Xing-Yu Chen, Qiu-Shi Zhu, Jie Zhang, Li-Rong Dai

TL;DR
This study introduces a BiLSTM-based COVID-19 detection method that uses acoustic signals from breath, speech, and cough, enhanced by pretraining and feature extraction techniques, and reaches an AUC of 88.44% on the DiCOVA blind test.
Contribution
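It presents a bi-directional LSTM approach for COVID-19 detection from acoustic signals in which each task's model is initialized from the parameter average of the breath, speech, and cough models, and in which high-level features from a wav2vec2.0 model further pre-trained on DiCOVA data are used as model input.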
Findings
Achieved an AUC score of 88.44% on the DiCOVA blind test.
Pretraining with wav2vec2.0 improves feature quality and detection performance.
Combining high-level features with MFCC enhances model accuracy.
Abstract
In this work, we propose a bi-directional long short-term memory (BiLSTM) network based COVID-19 detection method using breath/speech/cough signals. Training the network separately on each type of acoustic signal yields an individual model for each of the three tasks. The parameters of these models are averaged to obtain an average model, which then serves as the initialization for BiLSTM training on each task. This initialization significantly improves performance on all three tasks and surpasses the official baseline results. In addition, we take the public pre-trained wav2vec2.0 model and further pre-train it on the official DiCOVA datasets. The resulting wav2vec2.0 model extracts high-level features of the sound, which replace conventional mel-frequency cepstral coefficient (MFCC) features as the model input. Experimental results reveal that using high-level features together…
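The averaged-initialization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the parameter names (`lstm.weight`, `fc.bias`), the toy values, and the helper `average_params` are hypothetical, and each model is represented as a plain dictionary of flattened weights rather than a real BiLSTM.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flattened list of weights.
Params = Dict[str, List[float]]


def average_params(models: List[Params]) -> Params:
    """Element-wise average of parameters across models with identical shapes."""
    if not models:
        raise ValueError("need at least one model")
    names = models[0].keys()
    for m in models[1:]:
        if m.keys() != names:
            raise ValueError("models must share the same parameter names")
    averaged: Params = {}
    for name in names:
        length = len(models[0][name])
        if any(len(m[name]) != length for m in models):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # zip groups the i-th weight of every model together.
        averaged[name] = [sum(ws) / len(models) for ws in zip(*(m[name] for m in models))]
    return averaged


# Toy stand-ins for the three individually trained task models (assumed values).
breath = {"lstm.weight": [0.0, 3.0], "fc.bias": [1.0]}
cough = {"lstm.weight": [3.0, 3.0], "fc.bias": [2.0]}
speech = {"lstm.weight": [6.0, 3.0], "fc.bias": [3.0]}

init = average_params([breath, cough, speech])

# Each task then starts its own training run from a separate copy of the average.
task_inits = {
    task: {name: list(values) for name, values in init.items()}
    for task in ("breath", "cough", "speech")
}
```

In a real framework the same idea would apply to the models' weight tensors. The sketch only shows that all three tasks share one starting point and then train independently from it.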
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Bidirectional LSTM
