Accounting for Variations in Speech Emotion Recognition with Nonparametric Hierarchical Neural Network

Lance Ying; Amrit Romana; Emily Mower Provost

arXiv:2109.04316 · cs.LG · September 10, 2021 · 1 citation

Open Access

TL;DR

This paper introduces the Nonparametric Hierarchical Neural Network (NHNN), a deep learning model for speech emotion recognition that adapts to variations in emotional expression without requiring explicit domain labels and outperforms models of similar complexity in the authors' experiments.

Contribution

The study presents NHNN, a lightweight hierarchical neural network based on Bayesian nonparametric clustering. It learns group-specific features without domain labels, which improves emotion recognition accuracy.

Findings

01 NHNN outperforms models of similar complexity in the authors' experiments.

02 NHNN learns group-specific features without being given group labels.

03 NHNN narrows the performance gap between groups.

Abstract

In recent years, deep-learning-based speech emotion recognition models have outperformed classical machine learning models. Earlier neural network designs, such as Multitask Learning, have accounted for variations in emotional expression due to demographic and contextual factors. However, existing models face two constraints: 1) they rely on a clear definition of domains (e.g., gender or noise condition) and on the availability of domain labels; 2) they often attempt to learn domain-invariant features, even though emotion expressions can be domain-specific. In the present study, we propose the Nonparametric Hierarchical Neural Network (NHNN), a lightweight hierarchical neural network model based on Bayesian nonparametric clustering. In comparison to Multitask Learning approaches, the proposed model does not require domain/task labels. In our experiments, the NHNN models generally…
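To make the idea of clustering without domain labels more concrete, here is a minimal sketch of a DP-means-style procedure, in which the number of groups is not fixed in advance and a new group is opened whenever a sample is too far from every existing one. This is an illustration only and not the authors' NHNN: the function name `dp_means`, the threshold `lam`, and the use of plain Euclidean distance on toy feature tuples are all assumptions made for this example.

```python
import math


def dp_means(points, lam, n_iters=10):
    """DP-means-style clustering (illustrative; not the NHNN from the paper).

    points: list of equal-length tuples of floats (hypothetical utterance features)
    lam: distance threshold; a point farther than this from every
         existing centroid opens a new cluster
    """
    centroids = [tuple(points[0])]  # start with a single cluster
    assignments = [0] * len(points)
    for _ in range(n_iters):
        # Assignment step: nearest centroid, or a new cluster if too far away.
        for i, p in enumerate(points):
            dists = [math.dist(p, c) for c in centroids]
            k = min(range(len(centroids)), key=dists.__getitem__)
            if dists[k] > lam:
                centroids.append(tuple(p))
                k = len(centroids) - 1
            assignments[i] = k
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [points[i] for i, a in enumerate(assignments) if a == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignments, centroids


# Toy usage: two well-separated groups are found without any group labels.
toy = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
assignments, centroids = dp_means(toy, lam=2.0)
print(assignments, len(centroids))
```

The threshold `lam` plays the role that a fixed cluster count plays in k-means: a smaller value yields more, finer-grained groups. How the paper couples its clustering to the network's hierarchical layers is not described in the truncated abstract above, so that part is left out of the sketch.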

Taxonomy

Topics: Emotion and Mood Recognition · Music and Audio Processing · Speech Recognition and Synthesis