Foundation Model Assisted Automatic Speech Emotion Recognition: Transcribing, Annotating, and Augmenting
Tiantian Feng, Shrikanth Narayanan

TL;DR
This paper investigates how foundational models can automate and improve speech emotion recognition by transcribing, annotating, and augmenting datasets, reducing manual effort and enhancing system performance.
Contribution
The paper introduces an approach that uses foundational models to automate three stages of building speech emotion recognition datasets: transcription, emotion annotation, and augmentation.
Findings
Foundational models improve transcription quality for SER.
Combining multiple LLM outputs enhances emotion annotation accuracy.
Augmentation of datasets with unlabeled speech is feasible and beneficial.
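As a rough illustration of the second finding, the sketch below merges emotion labels proposed by several LLMs for the same utterance. The summary does not say how the outputs are combined, so simple majority voting is an assumption here, and the function name `majority_vote`, the utterance IDs, and the label set are all hypothetical.

```python
from collections import Counter


def majority_vote(labels, fallback=None):
    """Return the most frequent label, or `fallback` on a tie or empty input.

    Majority voting is an assumed combination rule, not necessarily
    the one used in the paper.
    """
    if not labels:
        return fallback
    counts = Counter(labels).most_common()
    # A tie between the two most frequent labels means no clear winner.
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return fallback
    return counts[0][0]


# Hypothetical labels from three LLM annotators per utterance.
annotations = {
    "utt_001": ["happy", "happy", "neutral"],
    "utt_002": ["sad", "angry", "neutral"],
}

merged = {utt: majority_vote(labels) for utt, labels in annotations.items()}
print(merged)
```

Utterances without a clear majority come back as `None`, so they can be dropped or routed to a human annotator rather than being assigned an arbitrary label.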
Abstract
Significant advances are being made in speech emotion recognition (SER) using deep learning models. Nonetheless, training SER systems remains challenging, requiring both time and costly resources. Like many other machine learning tasks, acquiring datasets for SER requires substantial data annotation efforts, including transcription and labeling. These annotation processes present challenges when attempting to scale up conventional SER systems. Recent developments in foundational models have had a tremendous impact, giving rise to applications such as ChatGPT. These models have enhanced human-computer interactions and opened up unique possibilities for streamlining data collection in fields like SER. In this research, we explore the use of foundational models to assist in automating SER from transcription and annotation to augmentation. Our study demonstrates that these models can…
Taxonomy
Topics: Speech Recognition and Synthesis · Emotion and Mood Recognition · Speech and dialogue systems
