Evaluating Gammatone Frequency Cepstral Coefficients with Neural Networks for Emotion Recognition from Speech

Gabrielle K. Liu

arXiv:1806.09010·cs.SD·June 26, 2018·34 cites

Open Access · 1 Repo

TL;DR

This paper compares Gammatone Frequency Cepstral Coefficients (GFCCs) with Mel Frequency Cepstral Coefficients (MFCCs) as input features for speech emotion recognition, and reports evidence that GFCCs outperform MFCCs when classified with neural networks.

Contribution

It proposes GFCCs as an alternative to MFCCs for emotion recognition and evaluates both representations on emotion and intensity classification tasks using fully connected and recurrent neural networks.

Findings

01 GFCCs outperform MFCCs in emotion classification accuracy.

02 Recurrent neural networks perform better than fully connected networks.

03 GFCCs show promise for improved speech emotion recognition.

Abstract

Current approaches to speech emotion recognition focus on speech features that can capture the emotional content of a speech signal. Mel Frequency Cepstral Coefficients (MFCCs) are one of the most commonly used representations for audio speech recognition and classification. This paper proposes Gammatone Frequency Cepstral Coefficients (GFCCs) as a potentially better representation of speech signals for emotion recognition. The effectiveness of MFCC and GFCC representations is compared and evaluated over emotion and intensity classification tasks with fully connected and recurrent neural network architectures. The results provide evidence that GFCCs outperform MFCCs in speech emotion recognition.
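One difference between the two representations named in the abstract is the frequency scale on which the filterbank is laid out: MFCC pipelines typically place triangular filters on the mel scale, while GFCC pipelines typically place gammatone filters on the ERB-rate scale. The sketch below only compares where filter center frequencies would fall under each scale. It uses the widely cited mel formula and the Glasberg and Moore ERB-rate formula; the frequency range and filter count are hypothetical values chosen for illustration and are not taken from the paper, and the helper names are made up for this example.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel using the common 2595 * log10(1 + f / 700) form."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def hz_to_erb_rate(f_hz):
    """Convert Hz to ERB-rate (Glasberg and Moore form)."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(erb):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437


def center_frequencies(to_scale, from_scale, f_min, f_max, n_filters):
    """Return n_filters center frequencies (Hz), equally spaced on a warped scale."""
    lo, hi = to_scale(f_min), to_scale(f_max)
    step = (hi - lo) / (n_filters - 1)
    return [from_scale(lo + i * step) for i in range(n_filters)]


# Hypothetical settings for illustration only; not the paper's configuration.
F_MIN, F_MAX, N_FILTERS = 50.0, 8000.0, 8

mel_centers = center_frequencies(hz_to_mel, mel_to_hz, F_MIN, F_MAX, N_FILTERS)
erb_centers = center_frequencies(hz_to_erb_rate, erb_rate_to_hz, F_MIN, F_MAX, N_FILTERS)

for m, e in zip(mel_centers, erb_centers):
    print(f"mel: {m:8.1f} Hz   erb-rate: {e:8.1f} Hz")
```

This covers filter placement only. A full MFCC or GFCC front end would also apply the filters to each frame, compress the filter energies, and take a discrete cosine transform, and the two differ in filter shape as well as in spacing.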

Code & Models

Repositories

SoyBison/gammatone

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing