Improving coreference resolution with automatically predicted prosodic information
Ina Rösiger, Sabrina Stehwien, Arndt Riester, Ngoc Thang Vu

TL;DR
This paper shows that pitch accents and phrase boundaries, predicted automatically by a CNN from acoustic features of the speech signal, significantly improve coreference resolution on spoken language.
Contribution
It replaces manual prosodic annotation with pitch accents and phrase boundaries predicted by a CNN, and shows that these automatic annotations still improve coreference resolution on spoken language data.
Findings
Automatically predicted prosodic features significantly improve coreference resolution.
A CNN predicts pitch accents and phrase boundaries from acoustic features extracted from the speech signal.
Prosodic information enhances text-based coreference models for spoken language.
Abstract
Adding manually annotated prosodic information, specifically pitch accents and phrasing, to the typical text-based feature set for coreference resolution has previously been shown to have a positive effect on German data. Practical applications on spoken language, however, would rely on automatically predicted prosodic information. In this paper we predict pitch accents (and phrase boundaries) using a convolutional neural network (CNN) model from acoustic features extracted from the speech signal. After an assessment of the quality of these automatic prosodic annotations, we show that they also significantly improve coreference resolution.
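As a rough illustration of how predicted prosody could be added to a text-based feature set, the sketch below derives two mention-level features from token-level predictions. It assumes the CNN output is already available as one binary pitch-accent label and one binary phrase-boundary label per token. The names `Mention`, `prosodic_features`, `pitch_accent_present` and `internal_phrase_boundary` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Mention:
    start: int  # index of the first token of the mention (inclusive)
    end: int    # index of the last token of the mention (inclusive)


def prosodic_features(mention: Mention,
                      accents: List[int],
                      boundaries: List[int]) -> Dict[str, bool]:
    """Derive hypothetical prosodic features for one mention.

    accents[i] is 1 if token i is predicted to carry a pitch accent.
    boundaries[i] is 1 if a phrase boundary is predicted after token i.
    """
    span = range(mention.start, mention.end + 1)
    # True if any token inside the mention is accented.
    has_accent = any(accents[i] == 1 for i in span)
    # True if a boundary falls inside the mention, i.e. after any token
    # except the mention's final one.
    has_internal_boundary = any(
        boundaries[i] == 1 for i in range(mention.start, mention.end)
    )
    return {
        "pitch_accent_present": has_accent,
        "internal_phrase_boundary": has_internal_boundary,
    }


# Toy input (invented for illustration): "the old man saw him"
accents = [0, 0, 1, 1, 0]
boundaries = [0, 0, 1, 0, 1]
full_np = prosodic_features(Mention(0, 2), accents, boundaries)
pronoun = prosodic_features(Mention(4, 4), accents, boundaries)
```

In this toy input the noun phrase carries an accent and the pronoun does not. A resolver could use such a feature next to its text-based ones.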
Taxonomy
Topics: Phonetics and Phonology Research · Speech Recognition and Synthesis · Speech and dialogue systems
