Bag-of-Audio-Words based on Autoencoder Codebook for Continuous Emotion Prediction
Mohammed Senoussaoui, Patrick Cardinal, Alessandro Lameiras Koerich

TL;DR
This paper introduces a Bag-of-Audio-Words approach that uses an autoencoder as the codebook for continuous emotion prediction, raising the Concordance Correlation Coefficient (CCC) for both arousal and valence on the AVEC 2017 audio dataset.
Contribution
It proposes an autoencoder-based codebook in which a single network performs both dictionary learning and the assignment of features to dictionary entries, improving continuous emotion prediction over the conventional BoW model.
Findings
Improved CCC from 0.225 to 0.322 for arousal
Enhanced CCC from 0.244 to 0.368 for valence
Outperforms conventional BoW in emotion prediction
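The findings above are reported as CCC values. As a minimal sketch of how that metric is usually computed (the function name `concordance_cc` is illustrative, and the use of population variances follows the standard definition of the coefficient rather than anything stated on this page):

```python
import numpy as np

def concordance_cc(y_true, y_pred):
    """Concordance Correlation Coefficient between two 1-D sequences.

    Standard definition (assumed here, not taken from the paper's code):
    CCC = 2*cov / (var_true + var_pred + (mean_true - mean_pred)^2)
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    # Population (biased) variance and covariance, as in the usual definition.
    var_t, var_p = y_true.var(), y_pred.var()
    cov = np.mean((y_true - mean_t) * (y_pred - mean_p))
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)
```

Unlike plain Pearson correlation, the squared mean difference in the denominator penalizes predictions that are correlated with the target but shifted or scaled, which is why a constant offset lowers the score.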
Abstract
In this paper we present a novel approach for extracting a Bag-of-Words (BoW) representation based on a Neural Network codebook. The conventional BoW model is based on a dictionary (codebook) built from elementary representations which are selected randomly or by using a clustering algorithm on a training dataset. A metric is then used to assign unseen elementary representations to the closest dictionary entries in order to produce a histogram. In the proposed approach, an autoencoder (AE) encompasses the role of both the dictionary creation and the assignment metric. The dimension of the encoded layer of the AE corresponds to the size of the dictionary and the output of its neurons represents the assignment metric. Experimental results for the continuous emotion prediction task on the AVEC 2017 audio dataset have shown an improvement of the Concordance Correlation Coefficient (CCC)…
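The contrast drawn in the abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: the function names, the 13-dimensional random "frames", the dictionary size of 8, the softmax normalization of the encoder outputs, and the untrained random encoder weights `W` and `b` (standing in for a trained autoencoder's encoder) are all hypothetical choices made for the example.

```python
import numpy as np

def bow_nearest(frames, codebook):
    """Conventional BoW: each frame votes for its nearest codebook entry."""
    # Squared Euclidean distances, shape (n_frames, n_words).
    d2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=codebook.shape[0]).astype(float)
    return hist / hist.sum()

def bow_encoder(frames, W, b):
    """AE-style BoW: encoded-layer outputs act as the assignment.

    ASSUMPTION: activations are softmax-normalized per frame and summed;
    the abstract does not say how the histogram is accumulated.
    """
    z = frames @ W + b                      # (n_frames, n_words)
    z = z - z.max(axis=1, keepdims=True)    # numerical stability
    soft = np.exp(z)
    soft /= soft.sum(axis=1, keepdims=True)
    hist = soft.sum(axis=0)
    return hist / hist.sum()

rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 13))         # hypothetical low-level features
# Conventional codebook: entries selected randomly from the training frames.
codebook = frames[rng.choice(200, size=8, replace=False)]
# Placeholder encoder weights; a real system would use trained AE weights.
W = rng.normal(scale=0.1, size=(13, 8))
b = np.zeros(8)

h_conv = bow_nearest(frames, codebook)
h_ae = bow_encoder(frames, W, b)
```

Both functions return a histogram whose length equals the dictionary size. In the second, that size is fixed by the width of the encoded layer, and no separate distance metric is needed.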
Taxonomy
Topics: Music and Audio Processing · Speech and Audio Processing · Emotion and Mood Recognition
Methods: Autoencoders
