Towards Universal End-to-End Affect Recognition from Multilingual Speech by ConvNets

Dario Bertero; Onno Kampman; Pascale Fung

arXiv:1901.06486 · cs.CL · January 28, 2019 · 1 citation

TL;DR

This paper introduces a universal end-to-end CNN model for affect recognition from multilingual speech, leveraging raw waveforms to improve emotion and personality detection across languages.

Contribution

The paper presents a universal CNN-based affect recognition model trained on multiple languages simultaneously. It outperforms both the same model trained on a single language and a similar CNN that takes spectrograms as input.

Findings

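01. 12.8% average improvement on emotion recognition compared with the same model trained on a single language

02. 10.1% average improvement on personality recognition compared with the same model trained on a single language

03. The network learns language-independent features such as pitch and energy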

Abstract

We propose an end-to-end affect recognition approach using a Convolutional Neural Network (CNN) that handles multiple languages, with applications to emotion and personality recognition from speech. We lay the foundation of a universal model that is trained on multiple languages at once. As affect is shared across all languages, we are able to leverage shared information between languages and improve the overall performance for each one. We obtained an average improvement of 12.8% on emotion and 10.1% on personality when compared with the same model trained on each language only. It is end-to-end because we directly take narrow-band raw waveforms as input. This allows us to accept as input audio recorded from any source and to avoid the overhead and information loss of feature extraction. It outperforms a similar CNN using spectrograms as input by 12.8% for emotion and 6.3% for…
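To make the "end-to-end from raw waveforms" idea concrete, here is a minimal sketch in plain Python of a forward pass that goes from audio samples straight to class probabilities, with no spectrogram or hand-crafted feature step. It is not the authors' architecture: the 8 kHz sample rate, the kernel count and width, the stride, the three output classes, the random (untrained) weights, and the helper names `conv1d`, `global_max_pool`, and `predict` are all assumptions chosen for illustration.

```python
import math
import random


def conv1d(signal, kernels, stride):
    """Slide each kernel over the raw samples (valid mode), then apply ReLU."""
    width = len(kernels[0])
    feature_maps = []
    for kernel in kernels:
        row = []
        for start in range(0, len(signal) - width + 1, stride):
            acc = sum(signal[start + i] * kernel[i] for i in range(width))
            row.append(max(0.0, acc))  # ReLU
        feature_maps.append(row)
    return feature_maps  # shape: [n_kernels][n_frames]


def global_max_pool(feature_maps):
    """Keep the strongest response of each kernel across the whole clip."""
    return [max(row) for row in feature_maps]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(waveform, kernels, weights, biases, stride=4):
    """Raw samples in, class probabilities out."""
    pooled = global_max_pool(conv1d(waveform, kernels, stride))
    logits = [
        sum(w * p for w, p in zip(row, pooled)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


# Hypothetical hyperparameters; none of these values come from the paper.
SAMPLE_RATE = 8000  # assumed narrow-band sampling rate, in Hz
N_KERNELS, KERNEL_WIDTH, N_CLASSES = 4, 16, 3

rng = random.Random(0)  # fixed seed so the untrained weights are reproducible
kernels = [[rng.uniform(-0.1, 0.1) for _ in range(KERNEL_WIDTH)]
           for _ in range(N_KERNELS)]
weights = [[rng.uniform(-0.1, 0.1) for _ in range(N_KERNELS)]
           for _ in range(N_CLASSES)]
biases = [0.0] * N_CLASSES

# Synthetic stand-in for speech: 0.1 s of a 200 Hz tone.
waveform = [math.sin(2 * math.pi * 200 * t / SAMPLE_RATE)
            for t in range(SAMPLE_RATE // 10)]

probs = predict(waveform, kernels, weights, biases)
print(probs)
```

Because the pooling step collapses the time axis, the same function accepts clips of different lengths. In a multilingual setup of the kind the abstract describes, utterances from every language would presumably be fed through one shared set of weights during training, which this sketch does not implement.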

Taxonomy

Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Speech Recognition and Synthesis