Lexicalized Stochastic Modeling of Constraint-Based Grammars using Log-Linear Measures and EM Training
Stefan Riezler, Detlef Prescher, Jonas Kuhn, Mark Johnson

TL;DR
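This paper presents a stochastic modeling approach for constraint-based grammars that uses log-linear models estimated with EM from unannotated data. Applied to an LFG grammar for German, it reaches 86% precision on an exact match task, with 10% gains both from EM training over training from a parsebank and from class-based lexicalization over unlexicalized models.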
Contribution
It presents a new EM-based method for estimating log-linear models of constraint-based grammars from unannotated data, and introduces a class-based grammar lexicalization that gains 10% over unlexicalized models.
Findings
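86% precision on an exact match task at an ambiguity rate of 5.4
90% precision on a subcat frame match at an ambiguity rate of 25
10% gain from EM training over training from a parsebank
10% gain from class-based lexicalization over unlexicalized models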
Abstract
We present a new approach to stochastic modeling of constraint-based grammars that is based on log-linear models and uses EM for estimation from unannotated data. The techniques are applied to an LFG grammar for German. Evaluation on an exact match task yields 86% precision for an ambiguity rate of 5.4, and 90% precision on a subcat frame match for an ambiguity rate of 25. Experimental comparison to training from a parsebank shows a 10% gain from EM training. Also, a new class-based grammar lexicalization is presented, showing a 10% gain over unlexicalized models.
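As a rough illustration of the ingredients named in the abstract, the sketch below shows a log-linear distribution over the candidate parses of a sentence, and the expected feature counts that an EM-style E-step would collect from unannotated sentences. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the feature names, and the toy data are hypothetical, and the parameter update (M-step) is omitted.

```python
import math

def parse_probabilities(weights, parses):
    """Log-linear distribution over the candidate parses of one sentence.

    weights: dict mapping feature name -> weight (hypothetical features)
    parses:  list of dicts mapping feature name -> count, one per parse
    """
    scores = [sum(weights.get(f, 0.0) * v for f, v in feats.items())
              for feats in parses]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def expected_feature_counts(weights, corpus):
    """E-step style statistic: feature counts expected under the current
    model, summed over the parses of every (unannotated) sentence."""
    expected = {}
    for parses in corpus:
        probs = parse_probabilities(weights, parses)
        for p, feats in zip(probs, parses):
            for f, v in feats.items():
                expected[f] = expected.get(f, 0.0) + p * v
    return expected

# Toy corpus: one sentence with two candidate parses (assumed data).
corpus = [[{"subj_np": 1.0}, {"obj_np": 1.0}]]
uniform = expected_feature_counts({}, corpus)
skewed = parse_probabilities({"subj_np": math.log(3.0)}, corpus[0])
```

With all weights at zero, every parse of a sentence is equally likely, so the expected counts simply split the mass evenly; raising the weight of a feature shifts probability toward the parses that contain it.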
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Topic Modeling
