PERSONA: An Application for Emotion Recognition, Gender Recognition and Age Estimation
Devyani Koshal, Orchid Chetia Phukan, Sarthak Jain, Arun Balaji Buduru, Rajesh Sharma

TL;DR
PERSONA is a multi-task application that simultaneously predicts emotion, gender, and age from speech using a single model, demonstrating that speaker recognition pre-trained models outperform self-supervised models for this purpose.
Contribution
The paper introduces a multi-task learning framework for emotion recognition (ER), gender recognition (GR), and age estimation (AE), and shows that speaker recognition pre-trained models outperform self-supervised (SSL) models on these tasks.
Findings
A speaker recognition pre-trained model (PTM) outperforms a self-supervised (SSL) PTM in the multi-task setting.
A single model replaces separate per-task models, reducing resource and time consumption.
The demonstration shows that joint emotion, gender, and age prediction is feasible.
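The single-model idea in the findings above can be illustrated with a minimal sketch: one shared utterance embedding feeds three task heads, and their losses are summed. Everything specific here is an assumption made for illustration, not the paper's implementation: the emotion and gender label sets, the embedding size, the single linear layer per head, the unweighted loss sum, and the random vector standing in for a PTM representation.

```python
import math
import random

# Hypothetical label sets; the summary does not state the actual classes.
EMOTIONS = ["angry", "happy", "neutral", "sad"]
GENDERS = ["female", "male"]


def linear(weights, bias, x):
    """Dense layer: `weights` holds one row of coefficients per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_head(n_out, n_in, rng):
    """Small random weights and zero biases for one task head."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultiTaskHeads:
    """Three heads that all read the same shared embedding."""

    def __init__(self, embed_dim, seed=0):
        rng = random.Random(seed)
        self.emotion = init_head(len(EMOTIONS), embed_dim, rng)
        self.gender = init_head(len(GENDERS), embed_dim, rng)
        self.age = init_head(1, embed_dim, rng)  # regression head

    def predict(self, embedding):
        emotion_probs = softmax(linear(*self.emotion, embedding))
        gender_probs = softmax(linear(*self.gender, embedding))
        age = linear(*self.age, embedding)[0]
        return emotion_probs, gender_probs, age


def multitask_loss(preds, emotion_idx, gender_idx, age_target):
    """Unweighted sum of two cross-entropies and a squared error (an assumption)."""
    emotion_probs, gender_probs, age = preds
    ce_emotion = -math.log(emotion_probs[emotion_idx])
    ce_gender = -math.log(gender_probs[gender_idx])
    mse_age = (age - age_target) ** 2
    return ce_emotion + ce_gender + mse_age


# Random vector standing in for a pre-trained model's utterance embedding.
rng = random.Random(1)
embedding = [rng.gauss(0.0, 1.0) for _ in range(8)]

model = MultiTaskHeads(embed_dim=8)
preds = model.predict(embedding)
loss = multitask_loss(preds, emotion_idx=1, gender_idx=0, age_target=30.0)
```

Because the embedding is computed once and shared, serving all three predictions costs one encoder pass plus three small heads, which is the resource saving the findings refer to.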
Abstract
Emotion Recognition (ER), Gender Recognition (GR), and Age Estimation (AE) are paralinguistic tasks that rely not on spoken content but primarily on speech characteristics such as pitch and tone. While previous research has made significant strides in developing models for each task individually, comparatively little attention has gone to learning these tasks concurrently, despite their inherent interconnectedness. In this demonstration, we therefore present PERSONA, an application that performs ER, GR, and AE with a single model in the backend. Notably, through a comparative study, we show that representations from a speaker recognition pre-trained model (PTM) are better suited to such a multi-task learning format than those from the state-of-the-art (SOTA) self-supervised (SSL) PTM. Our methodology obviates the need for deploying separate models for each task and…
Taxonomy
Topics: Face recognition and analysis
Methods: Autoencoders
