Investigation of Self-supervised Pre-trained Models for Classification of Voice Quality from Speech and Neck Surface Accelerometer Signals
Sudarsana Reddy Kadiri, Farhad Javanmardi, Paavo Alku

TL;DR
This study evaluates features from self-supervised pre-trained models (wav2vec2 and HuBERT) for classifying voice quality from speech and neck surface accelerometer (NSA) signals. NSA input yields higher accuracy than speech input, pre-trained features outperform conventional features, and HuBERT features outperform the wav2vec2 variants.
Contribution
The paper compares features from pre-trained models against conventional features for classifying voice quality (breathy, modal, and pressed) from simultaneously recorded speech and NSA signals.
Findings
NSA signals yield better classification accuracy than speech signals.
Pre-trained model features outperform conventional features.
HuBERT features outperform wav2vec2 variants.
Abstract
Prior studies in the automatic classification of voice quality have mainly studied the use of the acoustic speech signal as input. Recently, a few studies have been carried out by jointly using both speech and neck surface accelerometer (NSA) signals as inputs, and by extracting MFCCs and glottal source features. This study examines simultaneously recorded speech and NSA signals in the classification of voice quality (breathy, modal, and pressed) using features derived from three self-supervised pre-trained models (wav2vec2-BASE, wav2vec2-LARGE, and HuBERT) and using an SVM as well as CNNs as classifiers. Furthermore, the effectiveness of the pre-trained models is compared in feature extraction between glottal source waveforms and raw signal waveforms for both speech and NSA inputs. Using two signal processing methods (quasi-closed phase (QCP) glottal inverse filtering and zero frequency…
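The classification stage described in the abstract (pre-trained model features fed to an SVM over three voice-quality classes) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the frame-level features are synthetic random arrays standing in for wav2vec2 or HuBERT outputs, mean pooling over time is one common way to obtain a fixed-length vector per utterance, and the RBF kernel, `C` value, feature dimension, and train/test split are arbitrary choices. The helper names `pool_frames` and `make_synthetic_utterances` are hypothetical.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# The three voice-quality classes named in the abstract.
LABELS = ["breathy", "modal", "pressed"]


def pool_frames(frame_features):
    """Average frame-level features (T x D) into one utterance vector (D,).

    Assumption: mean pooling over time; the source does not state how
    frame-level features are aggregated.
    """
    return np.asarray(frame_features).mean(axis=0)


def make_synthetic_utterances(n_per_class=30, dim=16, seed=0):
    """Build synthetic stand-ins for pre-trained model features.

    Each utterance is a random (T x dim) array whose mean depends on the
    class index, so the classes are separable by construction.
    """
    rng = np.random.default_rng(seed)
    X, y = [], []
    for label_idx in range(len(LABELS)):
        for _ in range(n_per_class):
            n_frames = int(rng.integers(20, 60))  # variable utterance length
            frames = rng.normal(loc=3.0 * label_idx, scale=1.0,
                                size=(n_frames, dim))
            X.append(pool_frames(frames))
            y.append(label_idx)
    return np.stack(X), np.array(y)


X, y = make_synthetic_utterances()

# Simple interleaved split: even-indexed utterances train, odd-indexed test.
X_train, y_train = X[::2], y[::2]
X_test, y_test = X[1::2], y[1::2]

# Standardize features, then fit an SVM (kernel and C are illustrative).
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
predicted = [LABELS[i] for i in clf.predict(X_test[:3])]
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
print("first predictions:", predicted)
```

With real data, the synthetic arrays would be replaced by features extracted from the speech or NSA waveform, and the accuracy printed here says nothing about the results reported in the paper.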
Taxonomy
Methods: Support Vector Machine
