Character Beyond Speech: Leveraging Role-Playing Evaluation in Audio Large Language Models via Reinforcement Learning
Dongjie Fu, Fangming Feng, Xize Cheng, Linjun Li, Zhou Zhao, and Tao Jin

TL;DR
This paper introduces RoleJudge, an audio-based evaluation framework for assessing character alignment in speech dialogue models, supported by a new dataset and reinforcement learning techniques.
Contribution
It presents RoleJudge, an evaluation framework built on audio large language models, and RoleChat, a voice role-playing evaluation dataset with chain-of-thought reasoning annotations, for assessing character alignment in speech dialogue models.
Findings
RoleJudge outperforms baseline models in accuracy and subjective assessments.
The RoleChat dataset combines authentic and LLM-generated speech samples with chain-of-thought reasoning annotations.
Reinforcement learning with Standard Alignment improves reward accuracy in character evaluation.
Abstract
The rapid evolution of multimodal large models has revolutionized the simulation of diverse characters in speech dialogue systems, enabling a novel interactive paradigm. Character attributes are manifested not only in textual responses but also through vocal features, as speech conveys rich paralinguistic information that is challenging to quantify. This poses significant difficulties in evaluating the character alignment of role-playing agents. To address these challenges, we present RoleJudge, an evaluation framework that leverages audio large language models to systematically assess the alignment between speech and character across multiple modalities and dimensions. Furthermore, we introduce RoleChat, the first voice role-playing evaluation dataset enriched with chain-of-thought reasoning annotations, comprising a diverse set of authentic and LLM-generated speech samples. Utilizing…
