PERSONA: An Application for Emotion Recognition, Gender Recognition and Age Estimation

Devyani Koshal; Orchid Chetia Phukan; Sarthak Jain; Arun Balaji Buduru; Rajesh Sharma

arXiv:2406.06781 · eess.AS · June 12, 2024

TL;DR

PERSONA is a multi-task application that simultaneously predicts emotion, gender, and age from speech using a single model, demonstrating that speaker recognition pre-trained models outperform self-supervised models for this purpose.

Contribution

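The paper introduces a multi-task learning framework for emotion recognition (ER), gender recognition (GR), and age estimation (AE), and shows that representations from a speaker recognition pre-trained model outperform those from a self-supervised (SSL) model on these tasks.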
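The single-model idea can be illustrated with a minimal sketch: one shared utterance embedding feeds three small task heads, and the per-task losses are summed into one training objective. Everything concrete here is an assumption made for illustration and is not taken from the paper: the embedding size, the label sets, the linear heads, the unweighted loss sum, and the normalized age target. The random vector only stands in for a frozen pre-trained model's output.

```python
import math
import random

# Hypothetical sizes and label sets; none of these come from the paper.
EMBED_DIM = 8
EMOTIONS = ["angry", "happy", "neutral", "sad"]
GENDERS = ["female", "male"]


def make_linear(in_dim, out_dim, rng):
    """Create a small random linear layer as (weights, biases)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
               for _ in range(out_dim)]
    biases = [0.0] * out_dim
    return weights, biases


def linear(layer, x):
    """Apply y = W x + b to a single input vector."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(heads, embedding):
    """Run all three task heads on one shared utterance embedding."""
    emotion_probs = softmax(linear(heads["emotion"], embedding))
    gender_probs = softmax(linear(heads["gender"], embedding))
    age = linear(heads["age"], embedding)[0]  # scalar regression output
    return emotion_probs, gender_probs, age


def multitask_loss(heads, embedding, emotion_idx, gender_idx, age_target):
    """Unweighted sum of cross-entropy (emotion, gender) and squared error (age)."""
    emotion_probs, gender_probs, age = predict(heads, embedding)
    loss_emotion = -math.log(emotion_probs[emotion_idx])
    loss_gender = -math.log(gender_probs[gender_idx])
    loss_age = (age - age_target) ** 2
    return loss_emotion + loss_gender + loss_age


rng = random.Random(0)
heads = {
    "emotion": make_linear(EMBED_DIM, len(EMOTIONS), rng),
    "gender": make_linear(EMBED_DIM, len(GENDERS), rng),
    "age": make_linear(EMBED_DIM, 1, rng),
}

# Stand-in for a frozen pre-trained model's embedding of one utterance.
embedding = [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]

emotion_probs, gender_probs, age = predict(heads, embedding)
loss = multitask_loss(heads, embedding, emotion_idx=1, gender_idx=0, age_target=0.3)
```

Because the embedding is computed once and reused by every head, adding a task costs only one extra small head rather than a second full model, which is the resource argument made for a single-model backend.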

Findings

1. Speaker recognition PTM outperforms SSL PTM in multi-task learning.

2. Single model approach reduces resource and time consumption.

3. Demonstrates feasibility of combined emotion, gender, and age prediction.

Abstract

Emotion Recognition (ER), Gender Recognition (GR), and Age Estimation (AE) constitute paralinguistic tasks that rely not on the spoken content but primarily on speech characteristics such as pitch and tone. While previous research has made significant strides in developing models for each task individually, there has been comparatively less emphasis on learning these tasks concurrently, despite their inherent interconnectedness. As such, in this demonstration we present PERSONA, an application that performs ER, GR, and AE with a single model in the backend. Notably, through a comparative study we show that representations from a speaker recognition pre-trained model (PTM) are better suited to such a multi-task learning format than those from the state-of-the-art (SOTA) self-supervised (SSL) PTM. Our methodology obviates the need for deploying separate models for each task and…

Taxonomy

Topics: Face recognition and analysis

Methods: Autoencoders