Learning Efficient Disambiguation
Khalil Sima'an

TL;DR
This dissertation introduces a framework for specializing performance models such as Data Oriented Parsing (DOP) to limited domains. The aim is to improve efficiency and mitigate the models' computational limitations by minimizing model entropy within the domain.
Contribution
It proposes the Ambiguity-Reduction Specialization (ARS) framework, together with algorithms that specialize DOP models to improve efficiency and that integrate the specialized models with the original ones.
Findings
Specialized DOP models outperform original models in experiments
The algorithms effectively limit the hypothesis space to 'safe' models
Specialization reduces model entropy and improves processing speed
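As a rough illustration of what "reducing model entropy" can mean, the sketch below computes the Shannon entropy of a probability distribution over the candidate analyses of a single input. It is not taken from the dissertation: the function name `entropy` and the two example distributions are hypothetical, and the ARS framework may define its entropy measure differently.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    # Zero-probability outcomes contribute nothing and are skipped.
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Hypothetical distributions over four candidate analyses of one sentence.
broad = [0.25, 0.25, 0.25, 0.25]        # all analyses equally likely
specialized = [0.97, 0.01, 0.01, 0.01]  # one analysis dominates

print(entropy(broad))
print(entropy(specialized))
```

Under these assumed numbers, the distribution concentrated on one analysis has lower entropy than the uniform one, which is the sense in which a less ambiguous model is "lower entropy".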
Abstract
This dissertation analyses the computational properties of current performance-models of natural language parsing, in particular Data Oriented Parsing (DOP), points out some of their major shortcomings and suggests suitable solutions. It provides proofs that various problems of probabilistic disambiguation are NP-Complete under instances of these performance-models, and it argues that none of these models accounts for attractive efficiency properties of human language processing in limited domains, e.g. that frequent inputs are usually processed faster than infrequent ones. The central hypothesis of this dissertation is that these shortcomings can be eliminated by specializing the performance-models to the limited domains. The dissertation addresses "grammar and model specialization" and presents a new framework, the Ambiguity-Reduction Specialization (ARS) framework, that formulates…
Taxonomy
Topics: Natural Language Processing Techniques · AI-based Problem Solving and Planning · Topic Modeling
