Classification of Speech with and without Face Mask using Acoustic Features
Rohan Kumar Das, Haizhou Li

TL;DR
This paper investigates acoustic features for classifying speech produced with or without a face mask, and shows that combining novel features with existing baselines improves classification accuracy.
Contribution
The study introduces novel acoustic features based on linear filterbanks, instantaneous phase, and long-term information. Fusing these features with existing baseline methods improves mask classification accuracy.
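As a rough illustration of what fusing systems can look like, the sketch below averages per-utterance scores from two classifiers and thresholds the result. The fusion scheme, the function names `fuse_scores` and `decide`, the score values, and the 0.5 threshold are all assumptions made for this example; the summary does not state which fusion method the authors used.

```python
def fuse_scores(score_lists, weights=None):
    """Weighted average of per-utterance scores from several systems.

    score_lists: one list of scores per system, all of equal length.
    weights: optional per-system weights; defaults to a plain average.
    """
    n_systems = len(score_lists)
    if n_systems == 0:
        raise ValueError("need at least one system")
    n_utts = len(score_lists[0])
    if any(len(s) != n_utts for s in score_lists):
        raise ValueError("all systems must score the same utterances")
    if weights is None:
        weights = [1.0 / n_systems] * n_systems
    if len(weights) != n_systems:
        raise ValueError("one weight per system is required")
    return [
        sum(w * scores[i] for w, scores in zip(weights, score_lists))
        for i in range(n_utts)
    ]


def decide(fused, threshold=0.5):
    """Map fused scores to labels (hypothetical labels and threshold)."""
    return ["mask" if s >= threshold else "clear" for s in fused]


# Hypothetical posterior-like scores for three utterances.
baseline_scores = [0.9, 0.4, 0.2]      # e.g. a baseline system
new_feature_scores = [0.7, 0.8, 0.1]   # e.g. a system using a new feature

fused = fuse_scores([baseline_scores, new_feature_scores])
labels = decide(fused)
```

In this toy case the second utterance is rejected by the baseline alone (0.4) but accepted after fusion, which is the kind of complementary behaviour that motivates combining systems.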
Findings
Achieved 73.50% unweighted average recall (UAR) on the test set.
Acoustic features effectively capture mask-related speech artifacts.
Fusion of features improves classification performance.
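Unweighted average recall is the mean of the per-class recalls, so each class counts equally regardless of how many samples it has. The sketch below computes it for a toy two-class example; the labels `mask` and `clear` and the predictions are made up for illustration and are not data from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("need at least one sample")
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy, imbalanced example: four "mask" utterances and two "clear" ones.
y_true = ["mask", "mask", "mask", "mask", "clear", "clear"]
y_pred = ["mask", "mask", "mask", "clear", "clear", "mask"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Here the recall is 3/4 for `mask` and 1/2 for `clear`, giving a UAR of 0.625, while plain accuracy is 4/6. The two diverge because accuracy is dominated by the larger class.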
Abstract
The understanding and interpretation of speech can be affected by various external factors. The use of face masks is one such factor that can obstruct speech during communication. This may degrade speech processing and affect human perception. Knowing whether a speaker wears a mask may be useful for modeling speech for different applications. With this motivation, determining from a given speech sample whether a speaker wears a face mask is included as a task in the Computational Paralinguistics Evaluation (ComParE) 2020. We study novel acoustic features based on linear filterbanks, instantaneous phase and long-term information that can capture the artifacts for classification of speech with and without a face mask. These acoustic features are used along with the state-of-the-art baselines of ComParE functionals, bag-of-audio-words, DeepSpectrum and auDeep features for…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research
