Exploiting Syntactic Structure for Natural Language Modeling
Ciprian Chelba (CLSP, The Johns Hopkins University)

TL;DR
This thesis introduces a structured language model that combines automatic parsing with probabilistic language modeling, improving both perplexity and word error rate over the standard 3-gram model.
Contribution
It presents an original probabilistic parameterization of a shift-reduce parser that doubles as a language model, trained with a maximum likelihood reestimation procedure from the expectation-maximization class.
Findings
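Improved perplexity over the standard 3-gram model on the Wall Street Journal, Switchboard and Broadcast News corpora
Reduced word error rate in speech recognition, measured by word lattice rescoring
Hierarchical syntactic structure can be exploited to improve language models for large vocabulary speech recognition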
Abstract
The thesis presents an attempt at using the syntactic structure in natural language for improved language models for speech recognition. The structured language model merges techniques in automatic parsing and language modeling using an original probabilistic parameterization of a shift-reduce parser. A maximum likelihood reestimation procedure belonging to the class of expectation-maximization algorithms is employed for training the model. Experiments on the Wall Street Journal, Switchboard and Broadcast News corpora show improvement in both perplexity and word error rate (word lattice rescoring) over the standard 3-gram language model. The significance of the thesis lies in presenting an original approach to language modeling that uses the hierarchical (syntactic) structure in natural language to improve on current 3-gram modeling techniques for large vocabulary speech recognition.
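For readers unfamiliar with the baseline and the perplexity metric named in the abstract, the following is a minimal sketch of a 3-gram model and its perplexity. It is not the thesis's structured language model. The function names `train_trigram` and `perplexity` are hypothetical, and add-one smoothing is an assumption chosen for brevity, not the smoothing used in the reported experiments.

```python
import math
from collections import Counter


def train_trigram(tokens, vocab_size):
    """Build p(w | u, v) from trigram counts.

    Add-one smoothing is an illustrative assumption, not the method
    used in the thesis.
    """
    trigrams = list(zip(tokens, tokens[1:], tokens[2:]))
    tri_counts = Counter(trigrams)
    # Count each two-word context only where a third word follows it,
    # so the smoothed distribution sums to one over the vocabulary.
    ctx_counts = Counter((u, v) for u, v, _ in trigrams)

    def prob(u, v, w):
        return (tri_counts[(u, v, w)] + 1) / (ctx_counts[(u, v)] + vocab_size)

    return prob


def perplexity(tokens, prob):
    """Exponential of the negative mean log-probability per predicted word."""
    log_sum = 0.0
    count = 0
    for u, v, w in zip(tokens, tokens[1:], tokens[2:]):
        log_sum += math.log(prob(u, v, w))
        count += 1
    return math.exp(-log_sum / count)


if __name__ == "__main__":
    train = "the model predicts the next word given the previous two words".split()
    model = train_trigram(train, vocab_size=len(set(train)))
    print(perplexity(train, model))
```

Lower perplexity means the model assigns higher probability to held-out text; the abstract reports that the structured model improves on the 3-gram baseline under this measure.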
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management
