Accounting for Variations in Speech Emotion Recognition with Nonparametric Hierarchical Neural Network
Lance Ying, Amrit Romana, Emily Mower Provost

TL;DR
This paper introduces the Nonparametric Hierarchical Neural Network (NHNN), a novel deep learning model for speech emotion recognition that adaptively accounts for variations without requiring explicit domain labels, outperforming existing methods.
Contribution
The study presents NHNN, a lightweight Bayesian nonparametric neural network that learns group-specific features without domain labels, improving emotion recognition accuracy.
Findings
NHNN outperforms models of similar complexity in the paper's experiments.
NHNN effectively learns group-specific features without domain labels.
The model bridges performance gaps between groups.
Abstract
In recent years, deep-learning-based speech emotion recognition models have outperformed classical machine learning models. Previously, neural network designs, such as Multitask Learning, have accounted for variations in emotional expressions due to demographic and contextual factors. However, existing models face two constraints: 1) they rely on a clear definition of domains (e.g., gender or noise condition) and the availability of domain labels; 2) they often attempt to learn domain-invariant features even though emotion expressions can be domain-specific. In the present study, we propose the Nonparametric Hierarchical Neural Network (NHNN), a lightweight hierarchical neural network model based on Bayesian nonparametric clustering. In comparison to Multitask Learning approaches, the proposed model does not require domain/task labels. In our experiments, the NHNN models generally…
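To give a feel for what "Bayesian nonparametric clustering" without domain labels can mean in practice, here is a minimal sketch of DP-means, a hard-clustering algorithm derived from Dirichlet process mixtures. It is not the NHNN model and is not taken from the paper; it is swapped in only because it is small enough to read in one pass. The function names, the penalty `lam`, and the toy points are all assumptions made for illustration. The point to notice is that the number of groups is never specified: a new group is opened whenever a sample is farther than `lam` (in squared distance) from every existing centroid.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vector]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def dp_means(points: List[Vector], lam: float,
             max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Illustrative DP-means: cluster without fixing the number of groups.

    lam is a hypothetical penalty: a point whose squared distance to every
    centroid exceeds lam starts a new cluster.
    """
    centroids = [mean(points)]          # start with one global cluster
    assignments = [0] * len(points)
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            dists = [sq_dist(p, c) for c in centroids]
            k = min(range(len(dists)), key=dists.__getitem__)
            if dists[k] > lam:          # too far from everything: new group
                centroids.append(list(p))
                k = len(centroids) - 1
            if assignments[i] != k:
                assignments[i] = k
                changed = True
        # Recompute centroids, dropping clusters that lost all members.
        members: Dict[int, List[Vector]] = {}
        for i, k in enumerate(assignments):
            members.setdefault(k, []).append(points[i])
        kept = sorted(members)
        remap = {old: new for new, old in enumerate(kept)}
        centroids = [mean(members[old]) for old in kept]
        assignments = [remap[k] for k in assignments]
        if not changed:
            break
    return assignments, centroids


# Toy data: two well-separated blobs, with no group labels supplied.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
assignments, centroids = dp_means(points, lam=1.0)
print(len(centroids), assignments)
```

In a model like the one the abstract describes, the discovered groups would presumably inform group-specific parameters rather than being an end in themselves; how NHNN does this is not stated in the text above, so the sketch stops at the clustering step.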
Taxonomy
Topics: Emotion and Mood Recognition · Music and Audio Processing · Speech Recognition and Synthesis
