On the Role of Style in Parsing Speech with Neural Models
Trang Tran, Jiahong Yuan, Yang Liu, Mari Ostendorf

TL;DR
This paper explores how neural models can leverage written text and prosody to improve parsing of spontaneous speech, addressing style mismatches between written and spoken language.
Contribution
The paper shows that neural approaches make written text useful for parsing spontaneous speech, that prosody adds further gains, and that spontaneous speech is more generally useful than read speech as training data.
Findings
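Neural approaches make it possible to use written text to improve parsing of spontaneous speech.
Prosody further improves parsing over the text-based result.
Mismatch between read and spontaneous speech degrades parsing asymmetrically; spontaneous speech is more generally useful for training parsers.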
Abstract
The differences in written text and conversational speech are substantial; previous parsers trained on treebanked text have given very poor results on spontaneous speech. For spoken language, the mismatch in style also extends to prosodic cues, though it is less well understood. This paper re-examines the use of written text in parsing speech in the context of recent advances in neural language processing. We show that neural approaches facilitate using written text to improve parsing of spontaneous speech, and that prosody further improves over this state-of-the-art result. Further, we find an asymmetric degradation from read vs. spontaneous mismatch, with spontaneous speech more generally useful for training parsers.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
