Effect of Attention and Self-Supervised Speech Embeddings on Non-Semantic Speech Tasks

Payal Mohapatra; Akash Pandey; Yueyuan Sui; Qi Zhu

arXiv:2308.14359 · cs.AI · September 29, 2023

TL;DR

This paper investigates how attention mechanisms and self-supervised speech embeddings influence the performance of non-semantic speech tasks, specifically emotion perception, demonstrating improved results with HuBERT-Large and a lightweight sequence model.

Contribution

It highlights the impact of training schemes of foundation models on non-semantic speech tasks and introduces an effective approach using HuBERT-Large with a self-attention model.
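
To make the idea of a lightweight self-attention model on top of HuBERT-Large embeddings concrete, here is a minimal PyTorch sketch. It is not the authors' implementation: the class name `AttentionRegressionHead`, the number of heads, the mean pooling, the sigmoid output range, and the default of 9 emotion labels are all illustrative assumptions. The only fixed quantity is the 1024-dimensional frame embedding that HuBERT-Large produces. The head takes precomputed frame-level embeddings and returns one "emotion share" score per label, matching the multi-label regression setup described in the abstract.

```python
import torch
import torch.nn as nn


class AttentionRegressionHead(nn.Module):
    """Hypothetical lightweight head (an assumption, not the paper's model):
    self-attention over frame embeddings, mean pooling over time,
    then one bounded score per emotion label."""

    def __init__(self, embed_dim: int = 1024, num_heads: int = 4,
                 num_emotions: int = 9):
        super().__init__()
        # embed_dim=1024 matches HuBERT-Large; the other defaults are assumed.
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)
        self.out = nn.Linear(embed_dim, num_emotions)

    def forward(self, frames, padding_mask=None):
        # frames: (batch, time, embed_dim)
        # padding_mask: (batch, time) bool tensor, True marks padded frames
        attended, _ = self.attn(frames, frames, frames,
                                key_padding_mask=padding_mask)
        hidden = self.norm(frames + attended)  # residual connection
        if padding_mask is not None:
            # Average only over real (non-padded) frames.
            keep = (~padding_mask).unsqueeze(-1).to(hidden.dtype)
            pooled = (hidden * keep).sum(dim=1) / keep.sum(dim=1).clamp(min=1.0)
        else:
            pooled = hidden.mean(dim=1)
        # Sigmoid assumes each emotion share lies in [0, 1].
        return torch.sigmoid(self.out(pooled))


# Usage with random stand-ins for HuBERT-Large frame embeddings.
head = AttentionRegressionHead()
frames = torch.randn(2, 50, 1024)   # 2 clips, 50 frames each
shares = head(frames)               # shape: (2, 9)
```

In practice the random tensor would be replaced by hidden states extracted from a pretrained HuBERT-Large encoder, and the head would be trained with a regression loss against the annotated emotion shares.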

Findings

01. HuBERT-Large with a lightweight self-attention sequence model improves emotion-share prediction by 4.6% over the reported baseline.

02. The training scheme of a foundation model strongly affects how well it transfers to non-semantic speech tasks.

03. Multilingual speakers and imbalanced labels make the task harder, but an appropriate choice of model can mitigate these challenges.

Abstract

Human emotion understanding is pivotal in making conversational technology mainstream. We view speech emotion understanding as a perception task, which is a more realistic setting. Across varying contexts (languages, demographics, etc.), different shares of people perceive the same speech segment as expressing different emotions, so the perceived emotion is not unanimous. As part of the ACM Multimedia 2023 Computational Paralinguistics ChallengE (ComParE) in the EMotion Share track, we leverage the challenge's rich dataset of multilingual speakers and its multi-label regression target of 'emotion share', that is, the perception of each emotion. We demonstrate that the training scheme of different foundation models dictates their effectiveness for tasks beyond speech recognition, especially for non-semantic speech tasks like emotion understanding. This is a very complex task due to multilingual speakers, variability in the target labels, and inherent imbalance in…

Code & Models

Repositories

payalmohapatra/emotionshare_acmmm23 (PyTorch, official)
