ParaLBench: A Large-Scale Benchmark for Computational Paralinguistics over Acoustic Foundation Models

Zixing Zhang; Weixiang Xu; Zhongren Dong; Kanglin Wang; Yimeng Wu; Jing Peng; Runming Wang; Dong-Yan Huang

arXiv:2411.09349·cs.SD·November 15, 2024

Open Access

TL;DR

This paper introduces ParaLBench, a comprehensive benchmark for evaluating diverse computational paralinguistics tasks across multiple datasets and acoustic foundation models, aiming to standardize performance comparison and advance the field.

Contribution

It provides the first large-scale, unified evaluation framework for diverse paralinguistic tasks using acoustic foundation models, facilitating fair comparison and future research directions.

Findings

01 Standardized evaluation across 10 datasets and 13 tasks

02 Comparison of 14 acoustic foundation models

03 Insights into cross-corpus generalizability

Abstract

Computational paralinguistics (ComParal) aims to develop algorithms and models to automatically detect, analyze, and interpret non-verbal information from speech communication, e.g., emotion, health state, age, and gender. Despite its rapid progress, the field depends heavily on sophisticated models designed for specific paralinguistic tasks. This heterogeneity and diversity of ComParal models largely prevents their deployment in realistic settings. Recently, with the advent of acoustic foundation models enabled by self-supervised learning, developing more generic models that can efficiently perceive a plethora of paralinguistic information has become an active topic in speech processing. However, the field lacks a unified evaluation framework for fair and consistent performance comparison. To bridge this gap, we conduct a large-scale benchmark, namely ParaLBench, which…

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques