Jointly Predicting Emotion, Age, and Country Using Pre-Trained Acoustic Embedding

Bagus Tris Atmaja, Zanjabila, and Akira Sasou

arXiv:2207.10333 · eess.AS · September 28, 2022 · ACIIW

TL;DR

This study demonstrates how pre-trained acoustic embeddings can be used in multitask learning to predict emotion, age, and country from speech, showing benefits over traditional features.

Contribution

It introduces a multitask learning framework using wav2vec 2.0 embeddings for simultaneous prediction of emotion, age, and country, highlighting the effectiveness of pre-trained models.

Findings

01 Pre-trained acoustic embeddings improve prediction accuracy.

02 Multitask learning with shared representations benefits all tasks.

03 Different acoustic features and normalization methods impact performance.

Abstract

In this paper, we demonstrate the benefit of using a pre-trained model to extract acoustic embeddings for jointly predicting (via multitask learning) three tasks: emotion, age, and native country. The pre-trained model was built on the wav2vec 2.0 large robust model and trained on a speech emotion corpus. The emotion and age tasks were regression problems, while country prediction was a classification task. A single harmonic mean of the three task metrics was used to evaluate multitask performance. The classifier was a linear network with two independent layers and shared layers, including the output layers. This study explores multitask learning across different acoustic features (including the acoustic embedding extracted from a model trained on an affective speech dataset), seed numbers, batch sizes, and normalizations for predicting paralinguistic information from speech.
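The single-score evaluation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not name the three metrics or say how each is scaled, so the function name `multitask_score`, its argument names, and the example values below are hypothetical. The only assumption is that each task yields a positive score where higher is better.

```python
def multitask_score(emotion_score: float, age_score: float, country_score: float) -> float:
    """Combine three per-task scores into one number via their harmonic mean.

    Assumption (not stated in the abstract): every score is positive and
    higher means better. The harmonic mean is pulled toward the weakest
    task, so a model cannot score well by excelling at only one task.
    """
    scores = [emotion_score, age_score, country_score]
    if any(s <= 0 for s in scores):
        raise ValueError("all scores must be positive for a harmonic mean")
    # Harmonic mean: n divided by the sum of reciprocals.
    return len(scores) / sum(1.0 / s for s in scores)


# Hypothetical scores, chosen only to show the behaviour.
balanced = multitask_score(0.5, 0.5, 0.5)
unbalanced = multitask_score(1.0, 0.5, 0.25)
print(f"balanced:   {balanced:.4f}")
print(f"unbalanced: {unbalanced:.4f}")
```

With the unbalanced inputs the arithmetic mean would be about 0.583, while the harmonic mean is lower because the weakest task dominates. That property is a plausible reason for choosing a harmonic mean as a joint metric, though the abstract does not state the authors' motivation.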

Code & Models

Repositories

bagustris/ExVo2022
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Emotion and Mood Recognition · Music and Audio Processing