Prosodic segmentation for parsing spoken dialogue

Elizabeth Nielsen; Mark Steedman; Sharon Goldwater

arXiv:2105.12667 · cs.CL · October 13, 2021

TL;DR

This paper shows that prosody lets a turn-based parser, which receives a whole dialogue turn rather than pre-segmented input, segment sentence-like units (SUs) well enough to match the parse performance of a model given gold-standard segmentation.

Contribution

The paper demonstrates that prosodic features can stand in for gold-standard SU segmentation in turn-based dialogue parsing, a setting closer to real speech applications, where input does not arrive pre-segmented.

Findings

01 Prosody enables turn-based models to match SU-based model performance.

02 Pitch and intensity are key features for boundary detection.

03 Prosody helps distinguish between SU boundaries and disfluencies.

Abstract

Parsing spoken dialogue poses unique difficulties, including disfluencies and unmarked boundaries between sentence-like units. Previous work has shown that prosody can help with parsing disfluent speech (Tran et al. 2018), but has assumed that the input to the parser is already segmented into sentence-like units (SUs), which isn't true in existing speech applications. We investigate how prosody affects a parser that receives an entire dialogue turn as input (a turn-based model), instead of gold standard pre-segmented SUs (an SU-based model). In experiments on the English Switchboard corpus, we find that when using transcripts alone, the turn-based model has trouble segmenting SUs, leading to worse parse performance than the SU-based model. However, prosody can effectively replace gold standard SU boundaries: with prosody, the turn-based model performs as well as the SU-based model…
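
To make the distinction between the two input conditions concrete, here is a minimal sketch of how one dialogue turn could be presented to an SU-based model versus a turn-based model. The toy turn, the boundary indices, and the function names (su_based_inputs, turn_based_input, boundary_labels) are all hypothetical and chosen for illustration; they are not taken from the paper or from the Switchboard corpus, and the per-token label framing is an assumption about how the segmentation task might be represented.

```python
from typing import List


def su_based_inputs(tokens: List[str], su_ends: List[int]) -> List[List[str]]:
    """Split a turn into SUs using gold boundaries.

    su_ends holds the index of the last token of each SU (hypothetical format).
    Each returned list is one separate parser input.
    """
    inputs, start = [], 0
    for end in su_ends:
        inputs.append(tokens[start:end + 1])
        start = end + 1
    return inputs


def turn_based_input(tokens: List[str]) -> List[List[str]]:
    """Give the parser the whole turn as a single input, with no boundaries."""
    return [list(tokens)]


def boundary_labels(n_tokens: int, su_ends: List[int]) -> List[int]:
    """Per-token labels (1 = token ends an SU): the information a turn-based
    model must recover on its own, since it never sees su_ends."""
    ends = set(su_ends)
    return [1 if i in ends else 0 for i in range(n_tokens)]


# Toy turn containing three SUs and a filled pause ("uh"); invented for illustration.
turn = "yeah i think so uh do you have kids".split()
gold_su_ends = [0, 3, 8]

su_inputs = su_based_inputs(turn, gold_su_ends)
turn_input = turn_based_input(turn)
labels = boundary_labels(len(turn), gold_su_ends)
```

In this sketch the SU-based condition yields three short parser inputs, while the turn-based condition yields one nine-token input in which the model has to decide for itself where "yeah", "i think so", and "uh do you have kids" end.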
