Combining semantic and syntactic structure for language modeling
Rens Bod

TL;DR
This paper shows that dependencies between non-headwords, which previous structured language models ignore, significantly improve word error rate in speech recognition, and that a data-oriented parsing (DOP) model trained on semantically and syntactically annotated data can exploit them.
Contribution
It introduces the first DOP model trained by maximum likelihood reestimation, which solves some of the theoretical shortcomings of previous DOP models, and shows that this model can exploit non-headword dependencies for language modeling.
Findings
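Non-headword dependencies contribute to significantly improved word error rate
The first DOP model trained by maximum likelihood reestimation solves some theoretical shortcomings of previous DOP models
A DOP model trained on semantically and syntactically annotated data can exploit non-headword dependencies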
Abstract
Structured language models for speech recognition have been shown to remedy the weaknesses of n-gram models. All current structured language models are, however, limited in that they do not take into account dependencies between non-headwords. We show that non-headword dependencies contribute to significantly improved word error rate, and that a data-oriented parsing model trained on semantically and syntactically annotated data can exploit these dependencies. This paper also contains the first DOP model trained by means of a maximum likelihood reestimation procedure, which solves some of the theoretical shortcomings of previous DOP models.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
