Are you wearing a mask? Improving mask detection from speech using augmentation by cycle-consistent GANs
Nicolae-Cătălin Ristea, Radu Tudor Ionescu

TL;DR
This paper introduces a novel GAN-based data augmentation method for improving speech mask detection, demonstrating enhanced performance in a challenge setting and outperforming existing augmentation techniques.
Contribution
The paper presents a cycle-consistent GAN approach for generating augmented speech data to improve mask detection accuracy from speech signals.
Findings
Achieved a 2.8% improvement over the baseline in the Mask Sub-Challenge.
The data augmentation increased performance by 0.9% on the private test set.
Outperformed other baseline and state-of-the-art augmentation methods.
Abstract
The task of detecting whether a person wears a face mask from speech is useful in modelling speech in forensic investigations, communication between surgeons or people protecting themselves against infectious diseases such as COVID-19. In this paper, we propose a novel data augmentation approach for mask detection from speech. Our approach is based on (i) training Generative Adversarial Networks (GANs) with cycle-consistency loss to translate unpaired utterances between two classes (with mask and without mask), and on (ii) generating new training utterances using the cycle-consistent GANs, assigning opposite labels to each translated utterance. Original and translated utterances are converted into spectrograms which are provided as input to a set of ResNet neural networks with various depths. The networks are combined into an ensemble through a Support Vector Machines (SVM) classifier.…
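The augmentation step described in the abstract (translate each utterance to the other class, then assign the opposite label) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the names `augment_with_translations`, `fake_to_mask`, and `fake_to_no_mask` are hypothetical, spectrograms are represented as plain nested lists, and the two stand-in translators simply rescale values where the paper would use trained cycle-consistent GAN generators.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: a spectrogram is a 2D list of floats (frequency bins x time frames).
Spectrogram = List[List[float]]
Translator = Callable[[Spectrogram], Spectrogram]

NO_MASK, MASK = 0, 1  # hypothetical label encoding


def augment_with_translations(
    samples: Sequence[Spectrogram],
    labels: Sequence[int],
    to_mask: Translator,
    to_no_mask: Translator,
) -> Tuple[List[Spectrogram], List[int]]:
    """Return the originals plus one translated copy of each sample.

    Each translated copy receives the opposite label of its source,
    mirroring step (ii) of the abstract.
    """
    aug_samples = list(samples)
    aug_labels = list(labels)
    for spec, label in zip(samples, labels):
        if label == NO_MASK:
            aug_samples.append(to_mask(spec))
            aug_labels.append(MASK)
        else:
            aug_samples.append(to_no_mask(spec))
            aug_labels.append(NO_MASK)
    return aug_samples, aug_labels


# Stand-ins for trained generators (assumption: real ones would be neural networks).
def fake_to_mask(spec: Spectrogram) -> Spectrogram:
    return [[value * 0.5 for value in row] for row in spec]


def fake_to_no_mask(spec: Spectrogram) -> Spectrogram:
    return [[value * 2.0 for value in row] for row in spec]


samples = [[[0.2, 0.4]], [[0.6, 0.8]]]
labels = [NO_MASK, MASK]
aug_samples, aug_labels = augment_with_translations(
    samples, labels, fake_to_mask, fake_to_no_mask
)
print(len(aug_samples), aug_labels)
```

In this sketch the training set doubles in size, and every translated spectrogram carries the label of the class it was translated into.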
Taxonomy
Methods: 1x1 Convolution · Bottleneck Residual Block · Batch Normalization · Average Pooling · Max Pooling · Global Average Pooling · Residual Connection · Kaiming Initialization · Convolution
