The Role of Prosody in Spoken Question Answering

Jie Chi; Maureen de Seyssel; Natalie Schluter

arXiv:2502.05389·cs.CL·February 11, 2025

Open Access

TL;DR

This paper examines the role of prosody (the intonation and rhythm of speech) in spoken question answering. It finds that prosody alone is informative, but that models rely mostly on lexical cues when those are available.

Contribution

The paper shows that prosody carries usable information for spoken QA and highlights the need for better integration methods that make effective use of it.

Findings

01. Models trained on prosody alone perform reasonably well.

02. Lexical information dominates when available.

03. Prosody provides valuable supplementary cues.

Abstract

Spoken language understanding research to date has generally carried a heavy text perspective. Most datasets are derived from text, which is then synthesized into speech, and most models rely on automatic transcriptions of speech. This is to the detriment of prosody: additional information carried by the speech signal beyond the phonetics of the words themselves, and difficult to recover from text alone. In this work, we investigate the role of prosody in Spoken Question Answering. By isolating prosodic and lexical information on the SLUE-SQA-5 dataset, which consists of natural speech, we demonstrate that models trained on prosodic information alone can perform reasonably well by utilizing prosodic cues. However, we find that when lexical information is available, models tend to rely predominantly on it. Our findings suggest that while prosodic cues provide…

Taxonomy

Topics: Speech and dialogue systems · Neurobiology of Language and Bilingualism