Assessing the Use of Prosody in Constituency Parsing of Imperfect Transcripts

Trang Tran; Mari Ostendorf

arXiv:2106.07794 · cs.CL · June 16, 2021

TL;DR

This paper investigates whether prosodic features improve constituency parsing of imperfect transcripts of conversational speech, i.e. transcripts containing ASR errors, using a parser that learns prosodic feature extraction jointly with parsing.

Contribution

The paper presents a neural parser whose sentence encoder contextualizes word vectors with prosodic features, and assesses it by reranking N-best ASR hypotheses for conversational speech from Switchboard.

Findings

01

Prosody accounts for a significant part of the parsing gain on transcripts with ASR errors.

02

Reranking with the parser obtains 13-15% of the oracle N-best gain relative to parsing the 1-best ASR output, with insignificant impact on word recognition error rate.

03

Analyses suggest that prosody helps recover function words, leading to more grammatical utterances.

Abstract

This work explores constituency parsing on automatically recognized transcripts of conversational speech. The neural parser is based on a sentence encoder that leverages word vectors contextualized with prosodic features, jointly learning prosodic feature extraction with parsing. We assess the utility of prosody in parsing imperfect transcripts, i.e. transcripts with automatic speech recognition (ASR) errors, by applying the parser in an N-best reranking framework. In experiments on Switchboard, we obtain 13-15% of the oracle N-best gain relative to parsing the 1-best ASR output, with insignificant impact on word recognition error rate. Prosody provides a significant part of the gain, and analyses suggest that it leads to more grammatical utterances via recovering function words.
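To make the phrase "13-15% of the oracle N-best gain" concrete, the following is a minimal sketch of N-best reranking and of one plausible reading of that metric: the share of the headroom between the 1-best hypothesis and the best hypothesis in the list that the reranker recovers. It is not the authors' implementation. The `Hypothesis` class, the linear score interpolation, the `weight` parameter, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One entry of an N-best list (all fields are illustrative assumptions)."""
    text: str
    asr_score: float    # recognizer log-score, higher is better
    parse_score: float  # parser log-score, higher is better
    f1: float           # parse F1 (in percent) against the reference, evaluation only


def rerank(nbest: List[Hypothesis], weight: float = 0.5) -> Hypothesis:
    """Pick the hypothesis with the best interpolated score (assumed linear mix)."""
    return max(
        nbest,
        key=lambda h: (1.0 - weight) * h.asr_score + weight * h.parse_score,
    )


def oracle(nbest: List[Hypothesis]) -> Hypothesis:
    """Pick the hypothesis with the highest F1; needs the reference, so it is an upper bound."""
    return max(nbest, key=lambda h: h.f1)


def oracle_gain_fraction(one_best_f1: float, reranked_f1: float, oracle_f1: float) -> float:
    """Fraction of the 1-best-to-oracle headroom recovered by reranking."""
    headroom = oracle_f1 - one_best_f1
    if headroom <= 0:
        return 0.0  # the 1-best is already the oracle choice
    return (reranked_f1 - one_best_f1) / headroom


# Toy N-best list; the first entry plays the role of the 1-best ASR output.
toy_nbest = [
    Hypothesis("i think is fine", asr_score=-1.0, parse_score=-5.0, f1=70.0),
    Hypothesis("i think it is fine", asr_score=-1.2, parse_score=-3.0, f1=80.0),
    Hypothesis("i think that it is fine", asr_score=-2.5, parse_score=-3.5, f1=90.0),
]

chosen = rerank(toy_nbest)
fraction = oracle_gain_fraction(toy_nbest[0].f1, chosen.f1, oracle(toy_nbest).f1)
```

On this toy list the reranker selects the second hypothesis, which restores a dropped function word, and recovers half of the available headroom. In the paper's experiments the reported fraction is 13-15%.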

Code & Models

Repositories

trangham283/asr_preps (Official)
