SpeechParaling-Bench: A Comprehensive Benchmark for Paralinguistic-Aware Speech Generation

Ruohan Liu; Shukang Yin; Tao Wang; Dong Zhang; Weiji Zhuang; Shuhuai Ren; Ran He; Caifeng Shan; Chaoyou Fu

arXiv:2604.20842 · cs.CL · April 23, 2026

TL;DR

This paper introduces SpeechParaling-Bench, a benchmark for evaluating paralinguistic-aware speech generation in Large Audio-Language Models (LALMs). It targets two weaknesses of existing evaluation: coarse feature coverage and the subjectivity of assessment.

Contribution

It expands feature coverage from fewer than 50 to over 100 fine-grained features, organizes evaluation into three progressively harder tasks, and develops a pairwise comparison pipeline to make assessment more reliable.

Findings

01 Current LALMs show significant limitations in controlling paralinguistic features.

02 Leading models struggle with static control and dynamic modulation of paralinguistic cues.

03 Misinterpretation of cues accounts for 43.3% of errors in dialogue scenarios.

Abstract

Paralinguistic cues are essential for natural human-computer interaction, yet their evaluation in Large Audio-Language Models (LALMs) remains limited by coarse feature coverage and the inherent subjectivity of assessment. To address these challenges, we introduce SpeechParaling-Bench, a comprehensive benchmark for paralinguistic-aware speech generation. It expands existing coverage from fewer than 50 to over 100 fine-grained features, supported by more than 1,000 English-Chinese parallel speech queries, and is organized into three progressively challenging tasks: fine-grained control, intra-utterance variation, and context-aware adaptation. To enable reliable evaluation, we further develop a pairwise comparison pipeline, in which candidate responses are evaluated against a fixed baseline by an LALM-based judge. By framing evaluation as relative preference rather than absolute scoring,…
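The pairwise comparison idea in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline. The verdict labels, the function names `compare_to_baseline` and `win_rate`, the tie-counts-as-half rule, and the `toy_judge` stand-in are all assumptions made for illustration. In practice the judge would be an LALM comparing two audio responses.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical verdict labels; the label set used by the paper's judge is not given here.
WIN, TIE, LOSS = "win", "tie", "loss"


def compare_to_baseline(
    candidates: Sequence[str],
    baselines: Sequence[str],
    judge: Callable[[str, str], str],
) -> List[str]:
    """Ask the judge for one relative verdict per (candidate, baseline) pair."""
    if len(candidates) != len(baselines):
        raise ValueError("candidates and baselines must have the same length")
    return [judge(c, b) for c, b in zip(candidates, baselines)]


def win_rate(verdicts: Sequence[str]) -> float:
    """Share of comparisons won by the candidate, counting a tie as half a win (assumed rule)."""
    if not verdicts:
        raise ValueError("no verdicts to aggregate")
    counts = Counter(verdicts)
    unknown = set(counts) - {WIN, TIE, LOSS}
    if unknown:
        raise ValueError(f"unexpected verdict labels: {sorted(unknown)}")
    return (counts[WIN] + 0.5 * counts[TIE]) / len(verdicts)


def toy_judge(candidate: str, baseline: str) -> str:
    """Stand-in for an LALM judge: prefers the longer string. Illustration only."""
    if len(candidate) > len(baseline):
        return WIN
    if len(candidate) < len(baseline):
        return LOSS
    return TIE


if __name__ == "__main__":
    verdicts = compare_to_baseline(["aaa", "b", "cc"], ["aa", "bb", "cc"], toy_judge)
    print(verdicts)            # ['win', 'loss', 'tie']
    print(win_rate(verdicts))  # 0.5
```

Because every candidate is compared against the same fixed baseline, the resulting win rates share a reference point and can be ranked against each other without asking the judge for absolute scores.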

Code & Models

Datasets

Ruohan2/SpeechParaling-Bench
dataset · 2.1k downloads
