Affective Burst Detection from Speech using Kernel-fusion Dilated Convolutional Neural Networks

Berkay Kopru; Engin Erzin

arXiv:2110.04091·cs.SD·October 11, 2021

Open Access

TL;DR

This paper introduces a kernel-fusion dilated convolutional neural network (KFDCNN) that detects high-intensity affective bursts in speech, targeting changes in affective state that the correlation metrics common in continuous emotion recognition do not always capture.

Contribution

The paper defines affective burst detection as a two-class classification problem within continuous emotion recognition and proposes a kernel-fusion dilated CNN (KFDCNN), driven by speech spectral features, to solve it.
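To make the "kernel-fusion dilated" idea concrete, the following is a minimal sketch in plain Python. It is not the authors' architecture: this page does not give layer counts, kernel sizes, dilation rates, or the fusion operator, so the function names (`dilated_conv1d`, `kernel_fusion`), the zero padding, and fusion by per-frame concatenation are all assumptions made for illustration.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float],
                   dilation: int) -> List[float]:
    """Same-length 1D dilated convolution (cross-correlation form).

    Kernel taps are spaced `dilation` frames apart and the input is
    treated as zero outside its bounds. Hypothetical helper.
    """
    # Offset that centers the dilated kernel on the output frame.
    half = (len(kernel) - 1) * dilation // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t - half + j * dilation
            if 0 <= idx < len(x):  # zero padding outside the signal
                acc += w * x[idx]
        out.append(acc)
    return out


def kernel_fusion(x: Sequence[float], kernels: Sequence[Sequence[float]],
                  dilation: int) -> List[List[float]]:
    """Run one branch per kernel size and fuse the branch outputs.

    ASSUMPTION: fusion is per-frame concatenation of branch outputs;
    the source page does not state which fusion operator is used.
    """
    branches = [dilated_conv1d(x, k, dilation) for k in kernels]
    return [list(frame) for frame in zip(*branches)]


# Toy input: a single impulse, seen by a 1-tap and a 3-tap branch.
impulse = [0.0, 0.0, 1.0, 0.0, 0.0]
fused = kernel_fusion(impulse, kernels=[[2.0], [1.0, 1.0, 1.0]], dilation=2)
```

With dilation 2, the 3-tap branch reads frames t-2, t, and t+2, so it covers a five-frame span with three weights. Running branches with different kernel sizes side by side gives each frame features at several temporal scales, which is one plausible reading of "kernel fusion".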

Findings

1. KFDCNN outperforms baseline neural networks on the RECOLA and CreativeIT datasets.

2. The model effectively segments affective state contours into idle and burst regions.

3. Affective burst detection enhances the understanding of emotion dynamics in speech.

Abstract

As speech interfaces become richer and more widespread, speech emotion recognition promises more attractive applications. In the continuous emotion recognition (CER) problem, tracking changes across affective states is an important and desired capability. Although CER studies widely use correlation metrics in evaluations, these metrics do not always capture all the high-intensity changes in the affective domain. In this paper, we define a novel affective burst detection problem to accurately capture high-intensity changes of the affective attributes. For this problem, we formulate a two-class classification approach to isolate affective burst regions over the affective state contour. The proposed classifier is a kernel-fusion dilated convolutional neural network (KFDCNN) architecture driven by speech spectral features to segment the affective attribute contour into idle and burst…
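The two-class framing in the abstract (each point on the affective contour is either idle or burst) can be illustrated with a small sketch. The labeling rule below is an assumption, not the paper's definition: the abstract shown here is truncated before it says how burst regions are derived, so `label_bursts`, its `threshold`, and its `window` are hypothetical stand-ins that mark a frame as burst when the contour changes by more than a threshold over a short window.

```python
from typing import List, Sequence, Tuple


def label_bursts(contour: Sequence[float], threshold: float,
                 window: int = 1) -> List[int]:
    """Label each frame 1 (burst) or 0 (idle).

    ASSUMPTION: a frame is a burst when the contour moved by more than
    `threshold` relative to the frame `window` steps earlier. The paper's
    actual burst definition is not given in the text above.
    """
    labels = []
    for t in range(len(contour)):
        prev = contour[max(0, t - window)]  # clamp at the first frame
        labels.append(1 if abs(contour[t] - prev) > threshold else 0)
    return labels


def burst_segments(labels: Sequence[int]) -> List[Tuple[int, int]]:
    """Collapse frame labels into half-open (start, end) burst regions."""
    segments = []
    start = None
    for t, lab in enumerate(labels):
        if lab == 1 and start is None:
            start = t  # a burst region opens
        elif lab == 0 and start is not None:
            segments.append((start, t))  # the region closes before frame t
            start = None
    if start is not None:  # burst runs to the end of the contour
        segments.append((start, len(labels)))
    return segments


# Toy arousal contour: flat, a sharp rise, then flat again.
contour = [0.0, 0.0, 0.5, 1.0, 1.0, 1.0]
labels = label_bursts(contour, threshold=0.3)
segments = burst_segments(labels)
```

On this toy contour only the two frames of the sharp rise are labeled burst, giving a single burst region. A classifier trained on such labels predicts the idle or burst class per frame from speech features, which is the segmentation task the abstract describes.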

Taxonomy

Topics: Emotion and Mood Recognition · Speech and Audio Processing · Speech Recognition and Synthesis