Shaking Acoustic Spectral Sub-bands Can Better Regularize Learning in Affective Computing

Che-Wei Huang; Shrikanth Narayanan

arXiv:1804.06779 · cs.SD · April 19, 2018 · 1 cites

TL;DR

This paper explores a novel regularization method using spectral sub-bands in speech emotion recognition, demonstrating improved model performance and stability over traditional approaches.

Contribution

It introduces a spectral sub-band shaking technique and variants incorporating domain knowledge, enhancing regularization effectiveness in speech emotion recognition models.

Findings

01

Shaking spectral sub-bands outperforms shaking entire spectral features.

02

Proper early stopping leads to better generalization and smaller training-validation gap.

03

Proposed methods outperform baseline models in experiments.

Abstract

In this work, we investigate Shake-Shake regularization, a recently proposed regularization technique based on multi-branch architectures, for the task of speech emotion recognition. We also propose variants that incorporate domain knowledge into the model configurations. The experimental results demonstrate that (1) independently shaking sub-bands delivers more favorable models than shaking the entire spectral-temporal feature maps, and (2) with proper patience in early stopping, the proposed models can simultaneously outperform the baseline and maintain a smaller performance gap between training and validation.
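The contrast the abstract draws between shaking whole feature maps and shaking sub-bands independently can be illustrated with a minimal sketch of the forward pass only. It assumes two parallel branches that produce feature maps of the same shape, stored as lists of frequency rows, and it blends them with one random weight per sub-band. The function name `shake`, the `n_subbands` parameter, and the equal-width split along the frequency axis are hypothetical choices made for illustration and are not taken from the paper. The original Shake-Shake method also draws a separate random weight for the backward pass, which this sketch leaves out.

```python
import random


def shake(branch_a, branch_b, n_subbands=1, training=True, rng=random):
    """Blend two branch outputs with one random weight per frequency sub-band.

    branch_a, branch_b: lists of frequency rows (each row a list of floats).
    n_subbands=1 shakes the entire feature map with a single weight;
    n_subbands>1 draws an independent weight for each sub-band.
    At evaluation time (training=False) the branches are simply averaged.
    This is an illustrative forward pass only, not the authors' code.
    """
    n_freq = len(branch_a)
    if len(branch_b) != n_freq:
        raise ValueError("branches must have the same number of frequency rows")
    if n_subbands < 1 or n_freq % n_subbands != 0:
        raise ValueError("n_subbands must evenly divide the frequency axis")

    band_size = n_freq // n_subbands
    mixed = []
    for band in range(n_subbands):
        # One weight per sub-band: uniform in [0, 1) while training, 0.5 otherwise.
        alpha = rng.random() if training else 0.5
        for f in range(band * band_size, (band + 1) * band_size):
            mixed.append(
                [alpha * a + (1.0 - alpha) * b
                 for a, b in zip(branch_a[f], branch_b[f])]
            )
    return mixed


if __name__ == "__main__":
    a = [[1.0, 1.0] for _ in range(4)]   # toy output of branch A (4 frequency rows)
    b = [[0.0, 0.0] for _ in range(4)]   # toy output of branch B
    print(shake(a, b, n_subbands=1, rng=random.Random(0)))  # one weight for all rows
    print(shake(a, b, n_subbands=2, rng=random.Random(0)))  # lower/upper halves differ
```

With `n_subbands=1` every frequency row shares the same blending weight, while `n_subbands=2` lets the lower and upper halves of the frequency axis receive different weights, which is the kind of independence the first finding refers to.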

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing