Valence Induction with a Head-Lexicalized PCFG
Glenn Carroll, Mats Rooth (IMS, Universität Stuttgart)

TL;DR
This paper describes an experiment in learning verb valences (subcategorization frames) from a 50 million word corpus using a head-lexicalized PCFG. Frame distributions are estimated with a modified EM algorithm, and the authors report highly accurate results.
Contribution
It combines a head-lexicalized PCFG with a modified EM algorithm to induce valence frames from a 50 million word text corpus.
Findings
The induced frame distributions are reported to be highly accurate
The acquired lexicon is evaluated by comparison with a dictionary
Entropy measures provide a second evaluation of the acquired lexicon
Abstract
This paper presents an experiment in learning valences (subcategorization frames) from a 50 million word text corpus, based on a lexicalized probabilistic context free grammar. Distributions are estimated using a modified EM algorithm. We evaluate the acquired lexicon both by comparison with a dictionary and by entropy measures. Results show that our model produces highly accurate frame distributions.
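To make the idea of estimating frame distributions with EM and scoring them by entropy more concrete, here is a minimal toy sketch. It is not the authors' modified EM, and it does not use a PCFG: it assumes each verb occurrence has already been reduced to a set of candidate frames, and it estimates one verb's frame distribution from those ambiguous observations. The frame labels (`NP`, `NP_PP`), the function names, and the data are all hypothetical.

```python
import math
from collections import defaultdict


def em_frame_distribution(observations, frames, iterations=50):
    """Estimate P(frame) for one verb from ambiguous observations.

    observations: list of non-empty sets, each holding the candidate
    frames compatible with one occurrence of the verb (toy assumption).
    frames: list of all frame labels under consideration.
    """
    # Start from a uniform distribution over frames.
    probs = {f: 1.0 / len(frames) for f in frames}
    for _ in range(iterations):
        counts = defaultdict(float)
        # E-step: split each occurrence across its candidate frames
        # in proportion to the current frame probabilities.
        for candidates in observations:
            total = sum(probs[f] for f in candidates)
            for f in candidates:
                counts[f] += probs[f] / total
        # M-step: renormalize expected counts into probabilities.
        n = sum(counts.values())
        probs = {f: counts[f] / n for f in frames}
    return probs


def entropy(probs):
    """Shannon entropy in bits of a frame distribution."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


# Hypothetical data: five occurrences of one verb, two of them ambiguous.
toy_observations = [
    {"NP"},
    {"NP"},
    {"NP", "NP_PP"},
    {"NP_PP"},
    {"NP", "NP_PP"},
]
toy_probs = em_frame_distribution(toy_observations, ["NP", "NP_PP"])
toy_entropy = entropy(toy_probs)
```

In this toy setting the ambiguous occurrences are shared between the two frames according to the current estimate, so the unambiguous occurrences end up deciding which frame dominates. A lower entropy would indicate a verb whose usage is concentrated on fewer frames.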
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Authorship Attribution and Profiling
