They are wearing a mask! Identification of Subjects Wearing a Surgical Mask from their Speech by means of x-vectors and Fisher Vectors
José Vicente Egas-López

TL;DR
This paper compares Fisher Vector and x-vector features for identifying speech from subjects wearing surgical masks, demonstrating Fisher Vectors' superior performance and improved results through feature fusion.
Contribution
It introduces a novel application of Fisher Vectors for mask detection in speech and compares it with x-vectors, showing Fisher Vectors' effectiveness in this context.
Findings
Fisher Vector encodings outperform x-vectors for this task.
Fusion of features yields better classification accuracy.
The approach improves baseline scores in the Mask Sub-Challenge.
Abstract
Challenges based on Computational Paralinguistics at the INTERSPEECH Conference have always been well received by attendees owing to their competitive academic and research demands. This year, the INTERSPEECH 2020 Computational Paralinguistics Challenge offers three different problems; here, the Mask Sub-Challenge is of specific interest. This challenge involves the classification of speech recorded from subjects while wearing a surgical mask. In this study, we address this problem with two different types of feature extraction methods: x-vector embeddings, the current state-of-the-art approach for Speaker Recognition, and the Fisher Vector (FV), a method originally intended for Image Recognition that we use here to discriminate utterances. These approaches employ distinct frame-level representations: MFCC and PLP. Using Support…
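To make the Fisher Vector step in the abstract more concrete, here is a minimal sketch of how a variable-length sequence of frame-level features (such as MFCCs) can be encoded as one fixed-length vector. It follows the standard "improved" Fisher Vector formulation (gradients with respect to the means and standard deviations of a diagonal-covariance GMM, then power and L2 normalization). It is not taken from the paper: the function name `fisher_vector`, the toy GMM parameters, the 13-dimensional frames, and the number of components are all assumptions for illustration. In practice the GMM would be fitted to training frames rather than set by hand.

```python
import numpy as np

def fisher_vector(frames, weights, means, variances):
    """Encode (T, D) frame-level features as a 2*K*D Fisher Vector.

    weights: (K,), means: (K, D), variances: (K, D) describe a
    diagonal-covariance GMM (hypothetical parameters, assumed already fitted).
    """
    T, _ = frames.shape
    diff = frames[:, None, :] - means[None, :, :]            # (T, K, D)

    # Log-density of each frame under each Gaussian component: (T, K)
    log_prob = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_weighted = log_prob + np.log(weights)

    # Posterior (soft assignment) of each frame to each component
    log_weighted -= log_weighted.max(axis=1, keepdims=True)  # numerical stability
    gamma = np.exp(log_weighted)
    gamma /= gamma.sum(axis=1, keepdims=True)                # (T, K)

    # Normalized gradients w.r.t. component means and standard deviations
    norm_diff = diff / np.sqrt(variances)
    g_mu = np.einsum('tk,tkd->kd', gamma, norm_diff)
    g_mu /= T * np.sqrt(weights)[:, None]
    g_sigma = np.einsum('tk,tkd->kd', gamma, norm_diff ** 2 - 1)
    g_sigma /= T * np.sqrt(2 * weights)[:, None]

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])

    # Power normalization followed by L2 normalization
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv

# Toy usage: random data standing in for 13-dimensional MFCC frames
rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 13))
K = 4
weights = np.full(K, 1.0 / K)
means = rng.normal(size=(K, 13))
variances = np.ones((K, 13))

fv = fisher_vector(frames, weights, means, variances)
print(fv.shape)  # (104,), i.e. 2 * K * D
```

The useful property is that every utterance, regardless of its number of frames, maps to a vector of the same length, which can then be passed to a standard classifier.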
Taxonomy
Topics: Speech and Audio Processing
