Prompting Audios Using Acoustic Properties For Emotion Representation
Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh

TL;DR
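This paper represents emotion in audio with automatically generated "acoustic prompts", natural language descriptions built from properties such as pitch, intensity, speech rate, and articulation rate. A contrastive learning objective maps speech to its acoustic prompt, and the resulting model is evaluated on Emotion Audio Retrieval (EAR) and Speech Emotion Recognition (SER).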
Contribution
It proposes a new method of using acoustic properties to generate prompts for better emotion modeling from audio, enhancing performance in emotion recognition tasks.
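As a rough illustration of how acoustic properties could be turned into prompts, the sketch below bins each measured property into a coarse level and fills a text template. The cut points, the three-level binning, the template wording, and the names `CUTS`, `bin_level`, and `acoustic_prompt` are all assumptions made for this example; the summary above does not specify how the paper performs this step.

```python
from typing import Dict, Tuple

# Hypothetical cut points (low/medium boundary, medium/high boundary).
# These values are illustrative only and are not taken from the paper.
CUTS: Dict[str, Tuple[float, float]] = {
    "pitch": (150.0, 250.0),            # Hz
    "intensity": (55.0, 70.0),          # dB
    "speech rate": (3.0, 5.0),          # syllables per second
    "articulation rate": (4.0, 6.0),    # syllables per second, pauses excluded
}


def bin_level(value: float, low_cut: float, high_cut: float) -> str:
    """Map a numeric measurement to a coarse verbal level."""
    if value < low_cut:
        return "low"
    if value < high_cut:
        return "medium"
    return "high"


def acoustic_prompt(features: Dict[str, float]) -> str:
    """Build a text prompt from whichever known properties are present."""
    parts = [
        f"{bin_level(features[name], *CUTS[name])} {name}"
        for name in CUTS
        if name in features
    ]
    return "speech with " + ", ".join(parts)


print(acoustic_prompt({"pitch": 300.0, "intensity": 50.0}))
```

In a real pipeline the feature values would come from a signal-processing toolkit, and the cut points would more plausibly be derived from the training data than fixed by hand.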
Findings
Significant improvement in Emotion Audio Retrieval performance.
3.8% relative accuracy increase in Speech Emotion Recognition on Ravdess.
Acoustic prompts enhance model understanding of emotional expressions.
Abstract
Emotions lie on a continuum, but current models treat emotions as a finite valued discrete variable. This representation does not capture the diversity in the expression of emotion. To better represent emotions we propose the use of natural language descriptions (or prompts). In this work, we address the challenge of automatically generating these prompts and training a model to better learn emotion representations from audio and prompt pairs. We use acoustic properties that are correlated to emotion like pitch, intensity, speech rate, and articulation rate to automatically generate prompts i.e. 'acoustic prompts'. We use a contrastive learning objective to map speech to their respective acoustic prompts. We evaluate our model on Emotion Audio Retrieval and Speech Emotion Recognition. Our results show that the acoustic prompts significantly improve the model's performance in EAR, in…
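The abstract states only that a contrastive learning objective maps speech to acoustic prompts. One common form of such an objective, assumed here purely for illustration, is a symmetric cross-entropy over the audio-to-text similarity matrix of a batch, where the i-th audio clip and the i-th prompt form the positive pair. The function name `symmetric_contrastive_loss` and the temperature value are hypothetical, and the code is plain Python for readability rather than a tensor-library implementation.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _log_softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def symmetric_contrastive_loss(
    audio_emb: Sequence[Sequence[float]],
    text_emb: Sequence[Sequence[float]],
    temperature: float = 0.07,  # assumed value, not from the paper
) -> float:
    """Average of audio-to-text and text-to-audio cross-entropy losses."""
    a = [_normalize(v) for v in audio_emb]
    t = [_normalize(v) for v in text_emb]
    n = len(a)
    # logits[i][j]: scaled cosine similarity between audio i and prompt j.
    logits = [
        [sum(x * y for x, y in zip(a[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Audio-to-text: each row should put its mass on the diagonal entry.
    a2t = -sum(_log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-audio: the same, taken over columns.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2a = -sum(_log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (a2t + t2a)
```

The loss is small when each audio embedding is closest to its own prompt embedding and large when the pairs are mismatched, which is the behavior a retrieval task such as EAR relies on.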
Taxonomy
Topics: Music and Audio Processing · Emotion and Mood Recognition · Speech and Audio Processing
Methods: Contrastive Learning
