An Efficient Multitask Learning Architecture for Affective Vocal Burst Analysis
Tobias Hallmen, Silvan Mertes, Dominik Schiller, Elisabeth André

TL;DR
This paper introduces an efficient multitask learning architecture using data2vec for affective vocal burst analysis, outperforming the baseline in three subtasks of the 2022 ACII Affective Vocal Burst Challenge.
Contribution
It presents a novel multitask learning approach built on data2vec features for affective vocal burst analysis and shows that it outperforms the challenge baseline.
Findings
Outperforms the challenge baseline in three subtasks
Effective use of data2vec with multitask learning
Achieved state-of-the-art results in vocal burst analysis
Abstract
Affective speech analysis is an ongoing topic of research. A relatively new problem in this field is the analysis of vocal bursts, which are nonverbal vocalisations such as laughs or sighs. Current state-of-the-art approaches to affective vocal burst analysis are mostly based on wav2vec2 or HuBERT features. In this paper, we investigate the use of the wav2vec successor data2vec in combination with a multitask learning pipeline to tackle different analysis problems at once. To assess the performance of our efficient multitask learning architecture, we participate in the 2022 ACII Affective Vocal Burst Challenge, showing that our approach substantially outperforms the baseline established there in three different subtasks.
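The general idea of multitask learning on top of pretrained speech features can be illustrated with a small sketch: one shared representation feeds several task-specific output heads, and training minimises a weighted sum of the per-task losses. This is not the authors' architecture. It assumes that frame-level embeddings (for example, hidden states from a pretrained data2vec audio model) have already been extracted, and the mean pooling, single shared ReLU layer, linear heads, MSE loss, fixed loss weights, and the task names `task_a`/`task_b` are all hypothetical choices made for illustration. It is written in dependency-free Python so that it runs anywhere.

```python
import math
import random
from typing import Dict, List

Vector = List[float]


class LinearLayer:
    """A dense layer computing y = W x + b with small random weights."""

    def __init__(self, in_dim: int, out_dim: int, rng: random.Random):
        scale = 1.0 / math.sqrt(in_dim)
        self.weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
                        for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weights, self.bias)]


def mean_pool(frames: List[Vector]) -> Vector:
    """Average frame-level embeddings over time into one utterance vector."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


class MultitaskModel:
    """Shared layer over pooled features, plus one linear head per task.

    Hypothetical sketch: the layer sizes and task names are assumptions.
    """

    def __init__(self, feat_dim: int, hidden_dim: int,
                 task_dims: Dict[str, int], seed: int = 0):
        rng = random.Random(seed)
        self.shared = LinearLayer(feat_dim, hidden_dim, rng)
        self.heads = {name: LinearLayer(hidden_dim, dim, rng)
                      for name, dim in task_dims.items()}

    def forward(self, frames: List[Vector]) -> Dict[str, Vector]:
        pooled = mean_pool(frames)
        # ReLU on the shared representation; every head reads the same vector.
        hidden = [max(0.0, h) for h in self.shared(pooled)]
        return {name: head(hidden) for name, head in self.heads.items()}


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(preds: Dict[str, Vector], targets: Dict[str, Vector],
                   weights: Dict[str, float]) -> float:
    """Weighted sum of per-task MSE losses (fixed weights are an assumption)."""
    return sum(weights[name] * mse(preds[name], targets[name]) for name in preds)


if __name__ == "__main__":
    # Toy input: 5 frames of 8-dimensional "features" standing in for data2vec output.
    frames = [[0.1 * i + 0.01 * d for d in range(8)] for i in range(5)]
    model = MultitaskModel(feat_dim=8, hidden_dim=4,
                           task_dims={"task_a": 3, "task_b": 2})
    outputs = model.forward(frames)
    print({name: len(vec) for name, vec in outputs.items()})
```

Sharing the lower layers means a single forward pass produces predictions for every subtask, which is one plausible reading of why a multitask design can be efficient. In practice the heads and shared layers would be trained jointly with a deep learning framework rather than hand-written loops.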
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing
