Surgical Mask Detection with Convolutional Neural Networks and Data Augmentations on Spectrograms
Steffen Illium, Robert Müller, Andreas Sedlmeier, Claudia Linnhoff-Popien

TL;DR
This paper evaluates data augmentation techniques for convolutional neural networks that detect surgical masks from mel-spectrograms of human voice, and reports that most of the ComParE 2020 baselines are outperformed.
Contribution
It analyzes how data augmentation affects CNN performance on audio-based mask detection, comparing four architectures to assess how robust the gains are.
Findings
Data augmentation improves CNN accuracy in mask detection.
Most of the ComParE baselines are outperformed with augmentation.
Robustness varies across different CNN architectures.
Abstract
In many fields of research, labeled datasets are hard to acquire. Data augmentation promises to compensate for this lack of training data in neural network engineering and classification tasks. The idea is to reduce model overfitting to the feature distribution of a small, under-descriptive training dataset. We evaluate such data augmentation techniques to gain insight into the performance boost they provide for several convolutional neural networks on mel-spectrogram representations of audio data. We show the impact of data augmentation on the binary classification task of surgical mask detection in samples of human voice (ComParE Challenge 2020). We also consider four different architectures to account for augmentation robustness. Results show that most of the baselines given by ComParE are outperformed.
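To make the idea of augmenting spectrograms concrete, here is a minimal sketch in plain Python. It is an illustration only: the abstract does not name the augmentations the authors used, so the three transforms below (circular time shift, additive Gaussian noise, block masking), the function names, and all parameter values are assumptions chosen because they are common for spectrogram inputs. A spectrogram is represented as a list of mel bands, each a list of per-frame values; real pipelines would normally use array libraries instead.

```python
import random

def time_shift(spec, max_shift, rng):
    """Circularly shift all mel bands by the same random number of frames."""
    n_frames = len(spec[0])
    shift = rng.randint(-max_shift, max_shift) % n_frames
    # row[-0:] + row[:-0] equals row, so shift == 0 needs no special case.
    return [row[-shift:] + row[:-shift] for row in spec]

def add_noise(spec, sigma, rng):
    """Add independent zero-mean Gaussian noise to every bin."""
    return [[v + rng.gauss(0.0, sigma) for v in row] for row in spec]

def mask_block(spec, max_bands, max_frames, rng, fill=0.0):
    """Overwrite one random range of bands and one random range of frames."""
    n_bands, n_frames = len(spec), len(spec[0])
    out = [row[:] for row in spec]  # copy so the input stays untouched
    band_width = rng.randint(0, min(max_bands, n_bands))
    band_start = rng.randint(0, n_bands - band_width)
    frame_width = rng.randint(0, min(max_frames, n_frames))
    frame_start = rng.randint(0, n_frames - frame_width)
    for b in range(band_start, band_start + band_width):
        out[b] = [fill] * n_frames
    for row in out:
        row[frame_start:frame_start + frame_width] = [fill] * frame_width
    return out

def augment(spec, rng, p=0.5):
    """Apply each hypothetical transform independently with probability p."""
    out = [row[:] for row in spec]
    if rng.random() < p:
        out = time_shift(out, max_shift=4, rng=rng)
    if rng.random() < p:
        out = add_noise(out, sigma=0.01, rng=rng)
    if rng.random() < p:
        out = mask_block(out, max_bands=2, max_frames=3, rng=rng)
    return out

# Toy 4-band, 8-frame "spectrogram"; the values are placeholders, not real audio.
toy_spec = [[float(b * 10 + t) for t in range(8)] for b in range(4)]
augmented = augment(toy_spec, random.Random(0))
```

Because each transform is drawn at random per call, the network sees a slightly different version of the same labeled sample in every epoch, which is the mechanism by which augmentation is meant to reduce overfitting to a small training set.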
