Prompting Audios Using Acoustic Properties For Emotion Representation

Hira Dhamyal; Benjamin Elizalde; Soham Deshmukh; Huaming Wang; Bhiksha Raj; Rita Singh

arXiv:2310.02298 · cs.SD · December 8, 2023 · 1 citation

Open Access

TL;DR

This paper represents emotion in audio with automatically generated "acoustic prompts": natural language descriptions built from acoustic properties such as pitch and speech rate. A contrastive learning objective pairs each audio clip with its prompt, which improves performance on emotion recognition tasks.

Contribution

It proposes a new method of using acoustic properties to generate prompts for better emotion modeling from audio, enhancing performance in emotion recognition tasks.
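
The prompt-generation idea can be illustrated with a minimal sketch. It assumes each acoustic property has already been measured as a single number per utterance and is binned into low, medium, or high. The bin edges, the prompt wording, and the names `PROPERTY_EDGES`, `level`, and `acoustic_prompt` are hypothetical, chosen only for illustration; the source does not specify them.

```python
from bisect import bisect_right

LEVELS = ("low", "medium", "high")

# Hypothetical bin edges per property (assumption, not taken from the paper).
PROPERTY_EDGES = {
    "pitch": (150.0, 250.0),          # Hz
    "intensity": (55.0, 70.0),        # dB
    "speech rate": (3.0, 5.0),        # syllables per second
    "articulation rate": (4.0, 6.0),  # syllables per second
}


def level(value, edges):
    """Map a scalar to 'low', 'medium', or 'high' given two ascending edges."""
    return LEVELS[bisect_right(edges, value)]


def acoustic_prompt(features):
    """Build a text prompt from a dict of measured acoustic properties."""
    parts = [
        f"{level(features[name], edges)} {name}"
        for name, edges in PROPERTY_EDGES.items()
        if name in features
    ]
    if not parts:
        raise ValueError("no known acoustic properties in features")
    # Illustrative template; the actual prompt wording is an assumption.
    return "a person speaking with " + ", ".join(parts)


print(acoustic_prompt({"pitch": 300.0, "speech rate": 2.0}))
```

Binning keeps the prompt vocabulary small, so utterances with similar acoustic profiles share the same description.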

Findings

01. Significant improvement in Emotion Audio Retrieval performance.
02. 3.8% relative accuracy increase in Speech Emotion Recognition on Ravdess.
03. Acoustic prompts enhance model understanding of emotional expressions.

Abstract

Emotions lie on a continuum, but current models treat emotion as a finite-valued discrete variable. This representation does not capture the diversity in the expression of emotion. To better represent emotions, we propose the use of natural language descriptions (or prompts). In this work, we address the challenge of automatically generating these prompts and training a model to better learn emotion representations from audio and prompt pairs. We use acoustic properties that are correlated with emotion, such as pitch, intensity, speech rate, and articulation rate, to automatically generate prompts, i.e., "acoustic prompts". We use a contrastive learning objective to map each speech sample to its respective acoustic prompt. We evaluate our model on Emotion Audio Retrieval (EAR) and Speech Emotion Recognition (SER). Our results show that the acoustic prompts significantly improve the model's performance in EAR, in…
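
The abstract names a contrastive objective without giving its form. One common choice for audio-text pairs is a symmetric cross-entropy over cosine similarities; the sketch below assumes that form. The temperature value and the function names are assumptions, not details from the paper, and embeddings are assumed to be non-zero vectors.

```python
import math


def _normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _cross_entropy_diag(logits):
    """Mean cross-entropy where the target for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_sum = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss; row i of each input is a matched pair."""
    audio = [_normalize(v) for v in audio_emb]
    text = [_normalize(v) for v in text_emb]
    # Cosine similarity between every audio clip and every prompt.
    logits = [
        [sum(a * t for a, t in zip(av, tv)) / temperature for tv in text]
        for av in audio
    ]
    logits_t = [list(col) for col in zip(*logits)]
    # Average the audio-to-text and text-to-audio directions.
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(logits_t))


aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

The loss is small when each audio embedding is closest to its own prompt embedding and large when the pairs are mismatched, which is the behavior a retrieval task such as EAR relies on.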

Taxonomy

Topics: Music and Audio Processing · Emotion and Mood Recognition · Speech and Audio Processing

Methods: Contrastive Learning