DEBATE: A Dataset for Disentangling Textual Ambiguity in Mandarin Through Speech

Haotian Guo; Jing Han; Yongfeng Tu; Shihao Gao; Shengfan Shen; Wulong Xiang; Weihao Gan; Zixing Zhang

arXiv:2506.07502 · cs.CL · June 10, 2025

TL;DR

DEBATE is a Chinese speech-text dataset for studying how speech cues resolve textual ambiguity; benchmarks on it show large gaps between machine and human understanding of spoken intent.

Contribution

This paper introduces DEBATE, the first dataset pairing ambiguous Chinese utterances with speech cues, enabling research on speech-based disambiguation and speaker intent.

Findings

01

Benchmarked models fall well short of humans at understanding spoken ambiguity.

02

DEBATE contains 1,001 ambiguous utterances, each recorded by 10 native speakers.

03

Benchmarks of three large speech and audio-language models show that current models struggle to use speech cues for disambiguation.

Abstract

Despite extensive research on textual and visual disambiguation, disambiguation through speech (DTS) remains underexplored. This is largely due to the lack of high-quality datasets that pair spoken sentences with richly ambiguous text. To address this gap, we present DEBATE, a unique public Chinese speech-text dataset designed to study how speech cues and patterns (pronunciation, pause, stress, and intonation) can help resolve textual ambiguity and reveal a speaker's true intent. DEBATE contains 1,001 carefully selected ambiguous utterances, each recorded by 10 native speakers, capturing diverse linguistic ambiguities and their disambiguation through speech. We detail the data collection pipeline and provide rigorous quality analysis. Additionally, we benchmark three state-of-the-art large speech and audio-language models, illustrating clear and huge performance gaps between machine and…

Code & Models

Repositories

smilehnu/debate (official)

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Multimodal Machine Learning Applications