Prefix Probabilities from Stochastic Tree Adjoining Grammars
Mark-Jan Nederhof (DFKI), Anoop Sarkar (UPenn), Giorgio Satta (UPadova)

TL;DR
This paper presents an efficient algorithm that computes prefix probabilities from stochastic Tree Adjoining Grammars, which makes these grammars usable as language models for speech recognition.
Contribution
It introduces a novel O(n^6) algorithm for prefix probability calculation from stochastic TAGs, bridging structural grammar models and probabilistic language modeling.
Findings
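The algorithm computes prefix probabilities in O(n^6) time.
It enables stochastic TAGs to be used for language modeling.
It precomputes the probabilities of subderivations that derive no words of the prefix but contribute structurally to its derivation, which guarantees termination.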
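The last finding involves summing probabilities over unboundedly many subderivations ahead of time. As a rough illustration only, the sketch below shows one standard way to evaluate such infinite sums, on a stochastic context-free grammar rather than a TAG (a simplification swapped in for brevity; it is not the paper's procedure): the probability that a nonterminal derives some finite string is the least fixed point of a system of polynomial equations, approximated here by iterating from zero. The function name `termination_probabilities` and the grammar encoding are hypothetical.

```python
from math import prod

def termination_probabilities(grammar, tol=1e-12, max_iter=10_000):
    """Approximate, for each nonterminal A, the total probability that A
    derives some finite terminal string: the least fixed point of
        q[A] = sum over rules (p, rhs) of A of p * prod(q[B] for nonterminals B in rhs).
    Hypothetical encoding: grammar maps a nonterminal to a list of
    (probability, rhs_tuple); any symbol that is not a key is a terminal."""
    q = {a: 0.0 for a in grammar}  # start below the least fixed point
    for _ in range(max_iter):
        # One Kleene iteration step; terminals contribute a factor of 1.
        new_q = {
            a: sum(p * prod(q[x] for x in rhs if x in grammar) for p, rhs in rules)
            for a, rules in grammar.items()
        }
        delta = max(abs(new_q[a] - q[a]) for a in grammar)
        q = new_q
        if delta < tol:
            break
    return q

# Toy grammars (hypothetical): with S -> S S at 0.6, derivations fail to
# terminate with probability 1/3; at 0.4 the grammar is consistent.
INCONSISTENT = {"S": [(0.6, ("S", "S")), (0.4, ("a",))]}
CONSISTENT = {"S": [(0.4, ("S", "S")), (0.6, ("a",))]}
```

For `INCONSISTENT` the equation q = 0.6 q^2 + 0.4 has roots 2/3 and 1, and iterating from zero converges to the smaller one, about 0.6667; for `CONSISTENT` it converges to 1. Starting from zero matters because the iterates increase monotonically toward the least solution.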
Abstract
Language models for speech recognition typically use a probability model of the form Pr(a_n | a_1, a_2, ..., a_{n-1}). Stochastic grammars, on the other hand, are typically used to assign structure to utterances. A language model of the above form is constructed from such grammars by computing the prefix probability Sum_{w in Sigma*} Pr(a_1 ... a_n w), where w ranges over all possible terminations of the prefix a_1 ... a_n. The main result of this paper is an algorithm that computes such prefix probabilities for a stochastic Tree Adjoining Grammar (TAG). The algorithm performs the required computation in O(n^6) time. The probabilities of subderivations that do not derive any words in the prefix, but contribute structurally to its derivation, are precomputed to achieve termination. This algorithm enables existing corpus-based estimation techniques for stochastic TAGs to be used for language…
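To see how prefix probabilities yield a language model of the form Pr(a_n | a_1, ..., a_{n-1}), note that this conditional probability equals the prefix probability of a_1 ... a_n divided by that of a_1 ... a_{n-1}. The sketch below checks this by brute force on a hypothetical, finite toy distribution over complete strings; a stochastic TAG defines such a distribution only implicitly and usually over infinitely many strings, which is why the paper needs a dedicated algorithm instead of enumeration. `TOY_DIST`, `prefix_probability`, and `next_word_probability` are illustrative names, not from the paper.

```python
from fractions import Fraction

# Hypothetical toy distribution over complete strings (tuples of words).
TOY_DIST = {
    ("the", "dog", "barks"): Fraction(3, 10),
    ("the", "dog", "sleeps"): Fraction(1, 10),
    ("the", "cat", "sleeps"): Fraction(2, 10),
    ("a", "cat", "purrs"): Fraction(4, 10),
}

def prefix_probability(prefix, dist):
    """Sum of Pr(prefix + w) over every string w that completes the prefix,
    computed here by enumerating the (finite) support of dist."""
    k = len(prefix)
    return sum((p for s, p in dist.items() if s[:k] == tuple(prefix)), Fraction(0))

def next_word_probability(prefix, word, dist):
    """Pr(word | prefix) as a ratio of two prefix probabilities."""
    denom = prefix_probability(prefix, dist)
    if denom == 0:
        raise ValueError("prefix has probability zero")
    return prefix_probability(list(prefix) + [word], dist) / denom

# Example: Pr("dog" | "the") = Pr(the dog ...) / Pr(the ...) = (4/10) / (6/10)
print(next_word_probability(["the"], "dog", TOY_DIST))  # 2/3
```

Exact fractions are used so that the ratios can be compared without rounding error; the empty prefix has prefix probability 1, as expected of a proper distribution.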
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
