Bag-of-Audio-Words based on Autoencoder Codebook for Continuous Emotion Prediction

Mohammed Senoussaoui; Patrick Cardinal; Alessandro Lameiras Koerich

arXiv:1907.04928·eess.AS·July 12, 2019·1 cites

TL;DR

This paper introduces a Bag-of-Words approach that uses an autoencoder as the codebook for continuous emotion prediction, and reports a higher Concordance Correlation Coefficient (CCC) than a conventional BoW on the AVEC 2017 audio dataset.

Contribution

It proposes building the codebook with a neural network: a single autoencoder handles both dictionary learning and the assignment of elementary representations to dictionary entries, which improves emotion prediction performance.

Findings

01. Improved CCC from 0.225 to 0.322 for arousal

02. Enhanced CCC from 0.244 to 0.368 for valence

03. Outperforms conventional BoW in emotion prediction
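
The findings above are reported in terms of CCC. As a reading aid, here is a minimal NumPy sketch of the standard definition of the Concordance Correlation Coefficient, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²). The function name `concordance_cc` and the toy inputs are hypothetical; this is not the evaluation script used by the authors or by the AVEC 2017 challenge.

```python
import numpy as np

def concordance_cc(y_true, y_pred):
    """Concordance Correlation Coefficient between two 1-D sequences.

    Uses population (biased) variance and covariance. Illustrative only:
    the exact evaluation protocol of the paper is not given in this listing.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    # The squared mean difference penalizes a constant offset,
    # which plain Pearson correlation would ignore.
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)

# Toy usage with made-up values (not data from the paper).
perfect = concordance_cc([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
shifted = concordance_cc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

In this toy case, `shifted` is lower than `perfect` even though the two sequences are perfectly correlated, because CCC also measures agreement in mean and scale.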

Abstract

In this paper, we present a novel approach for extracting a Bag-of-Words (BoW) representation based on a Neural Network codebook. The conventional BoW model is based on a dictionary (codebook) built from elementary representations which are selected randomly or by using a clustering algorithm on a training dataset. A metric is then used to assign unseen elementary representations to the closest dictionary entries in order to produce a histogram. In the proposed approach, an autoencoder (AE) takes on the roles of both dictionary creation and the assignment metric. The dimension of the encoded layer of the AE corresponds to the size of the dictionary and the output of its neurons represents the assignment metric. Experimental results for the continuous emotion prediction task on the AVEC 2017 audio dataset show an improvement of the Concordance Correlation Coefficient (CCC)…
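
To make the contrast in the abstract concrete, the following is a rough NumPy sketch of the two histogram constructions, under several assumptions. The conventional path samples codewords at random from the frames and assigns each frame to its nearest codeword by Euclidean distance. The encoder path treats the activations of a K-unit encoded layer as soft assignments and sums them over frames; that pooling rule, the sigmoid activation, the untrained random weights, and all names (`bow_hard`, `bow_encoder`, `W`, `b`) are illustrative choices, not details confirmed by the paper.

```python
import numpy as np

def bow_hard(frames, codebook):
    """Conventional BoW: count nearest-codeword assignments per frame."""
    # Squared Euclidean distance from every frame to every codeword.
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = dists.argmin(axis=1)
    hist = np.bincount(nearest, minlength=codebook.shape[0]).astype(float)
    return hist / hist.sum()

def bow_encoder(frames, W, b):
    """Encoder-style BoW: pool encoded-layer activations over frames.

    W and b stand in for the encoder half of an autoencoder. Here they are
    random placeholders; in practice they would come from training the AE
    to reconstruct its input.
    """
    activations = 1.0 / (1.0 + np.exp(-(frames @ W + b)))  # sigmoid units
    hist = activations.sum(axis=0)  # assumed pooling: sum over frames
    return hist / hist.sum()

rng = np.random.default_rng(0)
n_frames, feat_dim, dict_size = 200, 8, 10   # hypothetical sizes
frames = rng.normal(size=(n_frames, feat_dim))  # stand-in acoustic features

# Conventional codebook: codewords selected randomly from the data.
codebook = frames[rng.choice(n_frames, size=dict_size, replace=False)]
h_hard = bow_hard(frames, codebook)

# Encoder codebook: encoded-layer width equals the dictionary size.
W = rng.normal(scale=0.1, size=(feat_dim, dict_size))
b = np.zeros(dict_size)
h_soft = bow_encoder(frames, W, b)
```

Both functions return a normalized vector of length `dict_size`, so either histogram could feed the same downstream regressor. The difference is that the encoder variant has no separate distance metric: the assignment is whatever the encoded layer outputs.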

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Emotion and Mood Recognition

Methods: Autoencoders