voice2mode: Phonation Mode Classification in Singing using Self-Supervised Speech Models
Aju Ani Justus, Ruchit Agrawal, Sudarsana Reddy Kadiri, Shrikanth Narayanan

TL;DR
voice2mode leverages self-supervised speech models to accurately classify singing phonation modes, outperforming traditional features and revealing layer-wise behavior of models like HuBERT.
Contribution
This work demonstrates that large self-supervised speech models transfer to singing phonation classification, with substantial accuracy gains over conventional spectral features (spectrogram, mel-spectrogram, MFCC).
Findings
HuBERT embeddings from early layers achieve ~95.7% accuracy.
Foundation-model features outperform spectral baselines.
Lower layers retain more acoustic/phonetic detail.
Abstract
We present voice2mode, a method for classification of four singing phonation modes (breathy, neutral (modal), flow, and pressed) using embeddings extracted from large self-supervised speech models. Prior work on singing phonation has relied on handcrafted signal features or task-specific neural nets; this work evaluates the transferability of speech foundation models to singing phonation classification. voice2mode extracts layer-wise representations from HuBERT and two wav2vec2 variants, applies global temporal pooling, and classifies the pooled embeddings with lightweight classifiers (SVM, XGBoost). Experiments on a publicly available soprano dataset (763 sustained vowel recordings, four labels) show that foundation-model features substantially outperform conventional spectral baselines (spectrogram, mel-spectrogram, MFCC). HuBERT embeddings obtained from early layers yield the best…
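The pooling step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes mean pooling (the abstract says only "global temporal pooling"), uses random numbers in place of real model outputs, and the array sizes, the layer index, and the function name `global_mean_pool` are all hypothetical.

```python
import numpy as np


def global_mean_pool(hidden_states: np.ndarray) -> np.ndarray:
    """Average each layer's frame-level embeddings over the time axis.

    hidden_states: array of shape (num_layers, num_frames, dim)
    returns:       array of shape (num_layers, dim)
    """
    if hidden_states.ndim != 3:
        raise ValueError("expected shape (num_layers, num_frames, dim)")
    # Assumption: mean pooling; the source does not name the pooling operator.
    return hidden_states.mean(axis=1)


# Synthetic stand-in for the layer-wise output of a speech model on one
# recording. The sizes below are hypothetical, chosen only for illustration.
rng = np.random.default_rng(0)
num_layers, num_frames, dim = 13, 49, 768
hidden_states = rng.standard_normal((num_layers, num_frames, dim))

# One fixed-length vector per layer, regardless of recording duration.
pooled = global_mean_pool(hidden_states)

# Pick one layer's pooled vector as the feature for a downstream classifier.
layer_index = 2  # hypothetical choice of an "early" layer
features = pooled[layer_index]

print(pooled.shape, features.shape)
```

In a full pipeline, one such vector per recording would be stacked into a feature matrix and passed to a lightweight classifier such as an SVM, repeating the procedure per layer to compare layers.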
Taxonomy
Topics: Voice and Speech Disorders · Music and Audio Processing · Speech Recognition and Synthesis
