StyleBench: Evaluating Speech Language Models on Conversational Speaking Style Control

Haishu Zhao; Aokai Hao; Yuan Ge; Zhenqiang Hong; Tong Xiao; Jingbo Zhu

arXiv:2603.07599 · cs.CL · March 10, 2026

Open Access

TL;DR

StyleBench is a comprehensive benchmark designed to evaluate how well speech language models control speaking style intensity across four dimensions (emotion, speed, volume, and pitch) in conversational settings, highlighting current performance gaps.

Contribution

This paper introduces StyleBench, the first systematic benchmark for assessing style intensity control in speech language models during multi-turn dialogues.

Findings

1. Evaluation reveals performance gaps between leading SLMs and omni language models (OLMs) in style intensity control.
2. Performance varies across the emotion, speed, volume, and pitch dimensions.
3. Analysis suggests potential directions for improving style control in future models.

Abstract

Speech language models (SLMs) have significantly extended the interactive capability of text-based Large Language Models (LLMs) by incorporating paralinguistic information. For a more realistic interactive experience with customized styles, current SLMs have managed to interpret and control speaking style intensity from user prompts during the dialogue process. However, there remains a lack of systematic benchmarks that quantify and evaluate the style intensity control ability in conversations. In this paper, we propose StyleBench, a multi-turn dialogue benchmark for comprehensively evaluating the style intensity control ability across four dimensions: emotion, speed, volume, and pitch. Our results reveal the performance gaps between leading SLMs and omni language models (OLMs), suggesting the underlying reasons and promising approaches for future exploration.
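The abstract does not spell out how intensity is scored. As an illustration only, and not StyleBench's actual protocol, one simple way to quantify control along the volume dimension is to measure each response's loudness and check whether it rises with the requested intensity level. The sketch below assumes float audio samples in [-1, 1] and uses hypothetical helper names (`rms_dbfs`, `follows_intensity_order`, `pairwise_order_accuracy`) invented for this example.

```python
import math
from typing import Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level of float samples in [-1, 1], in dB relative to full scale.

    Illustrative loudness proxy only (an assumption, not StyleBench's metric).
    """
    if not samples:
        raise ValueError("empty signal")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Floor at a tiny value so silence maps to a large negative level instead of -inf.
    return 20.0 * math.log10(max(rms, 1e-12))


def follows_intensity_order(measurements: Sequence[float]) -> bool:
    """True if measurements strictly increase with the requested intensity level.

    measurements[i] is the acoustic measure of the response produced when the
    model was asked for intensity level i (0 = lowest). Hypothetical helper.
    """
    return all(a < b for a, b in zip(measurements, measurements[1:]))


def pairwise_order_accuracy(measurements: Sequence[float]) -> float:
    """Fraction of level pairs (i < j) whose measurements satisfy m[i] < m[j]."""
    n = len(measurements)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 1.0  # a single level is trivially consistent
    correct = sum(1 for i, j in pairs if measurements[i] < measurements[j])
    return correct / len(pairs)


if __name__ == "__main__":
    # Synthetic constant-amplitude "responses" standing in for low/medium/high volume.
    levels = [rms_dbfs([amp] * 100) for amp in (0.1, 0.3, 0.9)]
    print(levels, follows_intensity_order(levels))
    print(pairwise_order_accuracy([1.0, 3.0, 2.0]))
```

The pairwise score gives partial credit when a model gets some, but not all, intensity levels in the right order; the same checks could in principle be applied to speaking rate or mean pitch in place of loudness, again as an assumption rather than the paper's method.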

Taxonomy

Topics: Emotion and Mood Recognition · Mental Health via Writing · Speech and dialogue systems