An Efficient Multitask Learning Architecture for Affective Vocal Burst Analysis

Tobias Hallmen; Silvan Mertes; Dominik Schiller; Elisabeth André

arXiv:2209.13914 · cs.SD · September 29, 2022 · 1 citation

TL;DR

This paper introduces an efficient multitask learning architecture that uses data2vec features for affective vocal burst analysis and outperforms the baseline of the 2022 ACII Affective Vocal Burst Challenge in three subtasks.

Contribution

It presents a multitask learning approach built on data2vec features that addresses several vocal burst analysis problems at once and substantially outperforms the challenge baseline.

Findings

01 Outperforms the challenge baseline in three subtasks

02 Uses data2vec features within a multitask learning pipeline

03 Reports results on the 2022 ACII Affective Vocal Burst Challenge

Abstract

Affective speech analysis is an ongoing topic of research. A relatively new problem in this field is the analysis of vocal bursts, which are nonverbal vocalisations such as laughs or sighs. Current state-of-the-art approaches to address affective vocal burst analysis are mostly based on wav2vec2 or HuBERT features. In this paper, we investigate the use of the wav2vec successor data2vec in combination with a multitask learning pipeline to tackle different analysis problems at once. To assess the performance of our efficient multitask learning architecture, we participate in the 2022 ACII Affective Vocal Burst Challenge, showing that our approach substantially outperforms the baseline established there in three different subtasks.
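The abstract describes a single model that tackles several vocal burst analysis problems at once on top of data2vec features. As a rough illustration only, the following sketch shows hard parameter sharing: a shared hidden layer over a pooled clip embedding feeds one output head per task, and training would minimize a weighted sum of per-task losses (mean squared error for regression targets, cross-entropy for classification). It is written in dependency-free Python rather than PyTorch so it runs anywhere. The task names, output sizes, the 768-dimensional input (the hidden size of base-sized data2vec audio models), the hidden width, and the equal loss weights are all assumptions, not details taken from the paper or its repository.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical task set: names and output sizes are illustrative assumptions,
# loosely inspired by the challenge's subtasks, not the paper's exact setup.
TASKS: Dict[str, Tuple[str, int]] = {
    "high": ("regression", 10),     # e.g. per-emotion intensity scores
    "two": ("regression", 2),       # e.g. arousal and valence
    "type": ("classification", 8),  # e.g. vocal burst type label
}

Layer = Tuple[List[List[float]], List[float]]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Small random weights and zero biases for a dense layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x: List[float], layer: Layer) -> List[float]:
    """Dense layer: y = W x + b."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class MultitaskHead:
    """Shared hidden layer feeding one output head per task (hard parameter sharing)."""

    def __init__(self, in_dim: int, hidden_dim: int,
                 tasks: Dict[str, Tuple[str, int]], seed: int = 0):
        rng = random.Random(seed)
        self.tasks = tasks
        self.shared = init_layer(in_dim, hidden_dim, rng)
        self.heads = {name: init_layer(hidden_dim, size, rng)
                      for name, (_, size) in tasks.items()}

    def forward(self, embedding: List[float]) -> Dict[str, List[float]]:
        hidden = [max(0.0, h) for h in linear(embedding, self.shared)]  # ReLU
        return {name: linear(hidden, head) for name, head in self.heads.items()}


def mse(pred: List[float], target: List[float]) -> float:
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(logits: List[float], label: int) -> float:
    """Numerically stable softmax cross-entropy for one example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multitask_loss(outputs, targets, tasks, weights=None) -> float:
    """Weighted sum of per-task losses; every task defaults to weight 1.0."""
    weights = weights or {}
    total = 0.0
    for name, (kind, _) in tasks.items():
        if kind == "regression":
            loss = mse(outputs[name], targets[name])
        else:
            loss = cross_entropy(outputs[name], targets[name])
        total += weights.get(name, 1.0) * loss
    return total


# Usage with a stand-in for one pooled 768-dimensional clip embedding.
model = MultitaskHead(in_dim=768, hidden_dim=64, tasks=TASKS)
outputs = model.forward([0.01] * 768)
targets = {"high": [0.5] * 10, "two": [0.5, 0.5], "type": 3}
print(f"combined loss: {multitask_loss(outputs, targets, TASKS):.4f}")
```

In a real pipeline the embedding would come from a pretrained data2vec encoder and the weights would be learned by gradient descent; neither step is shown here. Sharing one hidden layer across tasks is what lets a single forward pass serve every subtask.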

Code & Models

Repositories

hcmlab/acii-2022-vb-challenge (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing