Lexicalized Stochastic Modeling of Constraint-Based Grammars using Log-Linear Measures and EM Training

Stefan Riezler; Detlef Prescher; Jonas Kuhn; Mark Johnson

arXiv:cs/0008034 · cs.CL · May 23, 2007

Open Access

TL;DR

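This paper introduces a stochastic modeling approach for constraint-based grammars that uses log-linear models and EM estimation from unannotated data. Applied to an LFG grammar for German, it reaches 86% precision on an exact match task and reports 10% gains both from EM training over parsebank training and from class-based lexicalization over unlexicalized models.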

Contribution

The paper presents an EM-based method for estimating log-linear models of constraint-based grammars from unannotated data, and a class-based grammar lexicalization that improves on unlexicalized models.

Findings

01. 86% precision on the exact match task (ambiguity rate 5.4)

02. 10% gain from EM training over training from a parsebank

03. 10% gain from class-based lexicalization over unlexicalized models

Abstract

We present a new approach to stochastic modeling of constraint-based grammars that is based on log-linear models and uses EM for estimation from unannotated data. The techniques are applied to an LFG grammar for German. Evaluation on an exact match task yields 86% precision for an ambiguity rate of 5.4, and 90% precision on a subcat frame match for an ambiguity rate of 25. Experimental comparison to training from a parsebank shows a 10% gain from EM training. Also, a new class-based grammar lexicalization is presented, showing a 10% gain over unlexicalized models.
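
As a rough illustration of the log-linear part of the approach, the sketch below scores the candidate parses of one sentence with a weighted sum of feature values and normalizes the exponentiated scores into a distribution. It also computes the expected feature values under that distribution, which is the kind of conditional expectation an EM-style E-step would need. The function names, the feature vectors, and the weights are hypothetical and chosen only for this example; this is not the authors' implementation, and the weight re-estimation step is not shown.

```python
import math

def parse_distribution(weights, parses):
    """Log-linear distribution over a finite list of candidate parses.

    Each parse is a feature vector f(x); p(x) is proportional to
    exp(weights . f(x)), normalized over the given candidates only.
    """
    scores = [sum(w * f for w, f in zip(weights, fv)) for fv in parses]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def expected_features(weights, parses):
    """Expected feature values under the distribution over one sentence's parses."""
    probs = parse_distribution(weights, parses)
    return [sum(p * fv[i] for p, fv in zip(probs, parses))
            for i in range(len(weights))]

# Hypothetical toy data: one sentence, two candidate parses, three features.
parses = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
uniform = parse_distribution([0.0, 0.0, 0.0], parses)  # all weights zero
skewed = parse_distribution([2.0, 0.0, 0.0], parses)   # first feature favored
```

With all weights at zero, both parses are equally likely; raising the weight of a feature that only the first parse has shifts probability toward that parse. A feature shared by every parse (the third one here) has no effect on the ranking.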

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Topic Modeling