The EIHW-GLAM Deep Attentive Multi-model Fusion System for Cough-based COVID-19 Recognition in the DiCOVA 2021 Challenge

Zhao Ren; Yi Chang; Björn W. Schuller

arXiv:2108.03041 · cs.SD · August 9, 2021

TL;DR

This paper introduces a deep attentive multi-model fusion system that combines hand-crafted features, image-from-audio-based deep representations, and audio-based deep representations to detect COVID-19 from cough sounds. On the DiCOVA 2021 Track-1 test set, it reaches an AUC of 77.96%, an 8.05% improvement over the official baseline.

Contribution

The paper presents a multi-model fusion approach for cough-based COVID-19 recognition that applies attention mechanisms when fusing models at the feature level and the decision level.

Findings

01

Attention-based fusion at the feature level achieves the highest AUC on the test set (77.96%).

02

The system improves on the official baseline by 8.05% in test-set AUC.

03

Fusing the best models for the three representation types improves detection performance.

Abstract

Aiming to automatically detect COVID-19 from cough sounds, we propose a deep attentive multi-model fusion system evaluated on the Track-1 dataset of the DiCOVA 2021 challenge. Three kinds of representations are extracted, including hand-crafted features, image-from-audio-based deep representations, and audio-based deep representations. Afterwards, the best models on the three types of features are fused at both the feature level and the decision level. The experimental results demonstrate that the proposed attention-based fusion at the feature level achieves the best performance (AUC: 77.96%) on the test set, resulting in an 8.05% improvement over the official baseline.
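The two fusion levels named in the abstract can be illustrated with a minimal sketch. It assumes each of the three models yields a fixed-length embedding and a COVID-19 probability; the dot-product attention scoring, the `query` vector, the plain averaging at the decision level, and all toy values are hypothetical choices for illustration, not the authors' architecture.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_feature_fusion(embeddings, query):
    """Fuse same-length embeddings using softmax attention weights.

    Each embedding is scored by its dot product with `query` (a hypothetical
    vector that would be learned in practice); the fused vector is the
    attention-weighted sum of the embeddings.
    """
    scores = [sum(q * x for q, x in zip(query, emb)) for emb in embeddings]
    weights = softmax(scores)
    dim = len(query)
    fused = [
        sum(w * emb[i] for w, emb in zip(weights, embeddings))
        for i in range(dim)
    ]
    return fused, weights


def decision_fusion(probabilities):
    """Decision-level fusion by averaging per-model probabilities (assumed rule)."""
    return sum(probabilities) / len(probabilities)


# Toy embeddings standing in for the three representation types.
handcrafted = [0.2, 0.1, 0.4]
image_based = [0.9, 0.3, 0.5]
audio_based = [0.4, 0.8, 0.1]
query = [1.0, 0.0, 0.0]  # hypothetical attention query

fused, weights = attentive_feature_fusion(
    [handcrafted, image_based, audio_based], query
)
avg_prob = decision_fusion([0.6, 0.7, 0.8])  # toy per-model probabilities
```

In this sketch, feature-level fusion produces a single vector that a downstream classifier would consume, while decision-level fusion combines the outputs of classifiers that were trained separately.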

Taxonomy

Topics: COVID-19 diagnosis using AI · Speech and Audio Processing · Music and Audio Processing