They are wearing a mask! Identification of Subjects Wearing a Surgical Mask from their Speech by means of x-vectors and Fisher Vectors

José Vicente Egas-López

arXiv:2008.10014·eess.AS·August 25, 2020

TL;DR

This paper compares Fisher Vector and x-vector features for identifying speech from subjects wearing surgical masks, demonstrating Fisher Vectors' superior performance and improved results through feature fusion.

Contribution

It introduces a novel application of Fisher Vectors for mask detection in speech and compares it with x-vectors, showing Fisher Vectors' effectiveness in this context.

Findings

01

Fisher Vector encodings outperform x-vectors for this task.

02

Fusion of features yields better classification accuracy.

03

The approach improves baseline scores in the Mask Sub-Challenge.

Abstract

The Computational Paralinguistics challenges held at the INTERSPEECH Conference have always been well received by attendees, owing to their competitive academic and research demands. This year, the INTERSPEECH 2020 Computational Paralinguistics Challenge offers three different problems; here, the Mask Sub-Challenge is of specific interest. This challenge involves the classification of speech recorded from subjects while they wear a surgical mask. To address this problem, we employ two different feature extraction methods: x-vector embeddings, the current state-of-the-art approach for Speaker Recognition, and the Fisher Vector (FV), a method originally intended for Image Recognition that we use here to discriminate utterances. These approaches employ distinct frame-level representations: MFCC and PLP. Using Support…
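To make the Fisher Vector idea in the abstract more concrete, the following is a minimal sketch of how a variable-length sequence of frame-level features (such as MFCC or PLP frames) could be encoded into one fixed-length vector. It is not the paper's implementation: the function name `fisher_vector`, the diagonal-covariance Gaussian mixture model, and the power and L2 normalization steps are assumptions based on the commonly used "improved" Fisher Vector formulation. In practice the mixture parameters would be fitted on training frames; here they are passed in as plain arrays.

```python
import numpy as np

def fisher_vector(frames, weights, means, variances):
    """Encode frames (T, D) as a Fisher Vector of length 2 * K * D.

    Hypothetical helper: weights (K,), means (K, D), and variances (K, D)
    describe a diagonal-covariance GMM assumed to be fitted beforehand.
    """
    T, _ = frames.shape
    diff = frames[:, None, :] - means[None, :, :]            # (T, K, D)

    # Log-density of every frame under every Gaussian component.
    log_prob = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2.0 * np.pi * variances), axis=1))

    # Posterior (soft assignment) of each frame to each component.
    log_post = np.log(weights) + log_prob                    # (T, K)
    log_post -= log_post.max(axis=1, keepdims=True)          # numerical stability
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Gradients with respect to the means and standard deviations.
    z = diff / np.sqrt(variances)                            # (T, K, D)
    g_mu = np.einsum("tk,tkd->kd", gamma, z)
    g_mu /= T * np.sqrt(weights)[:, None]
    g_sigma = np.einsum("tk,tkd->kd", gamma, z ** 2 - 1.0)
    g_sigma /= T * np.sqrt(2.0 * weights)[:, None]

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])

    # Assumed post-processing: signed square root, then L2 normalization.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv


# Placeholder data: 200 random "frames" of dimension 13 and a 4-component GMM.
rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 13))
weights = np.full(4, 0.25)
means = rng.normal(size=(4, 13))
variances = np.ones((4, 13))

fv = fisher_vector(frames, weights, means, variances)
print(fv.shape)
```

Whatever the number of frames, the encoding has a fixed length of 2 × K × D (here 2 × 4 × 13), so utterances of different durations can be compared directly by a downstream classifier.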

Taxonomy

Topics: Speech and Audio Processing