Unsupervised Disambiguation of Syncretism in Inflected Lexicons
Ryan Cotterell, Christo Kirov, Sabrina J. Mielke, Jason Eisner

TL;DR
This paper introduces an unsupervised neural network approach to disambiguate morphological syncretism in inflected lexicons, enabling probabilistic analysis of ambiguous word forms without context.
Contribution
It presents a novel neural model that disambiguates morphological feature bundles using only unigram counts, without relying on contextual information.
Findings
Reports disambiguation results on 5 languages
A neural prior over feature bundles smooths estimates, even for rare bundles
Discusses evaluation metrics for this novel task
Abstract
Lexical ambiguity makes it difficult to compute various useful statistics of a corpus. A given word form might represent any of several morphological feature bundles. One can, however, use unsupervised learning (as in EM) to fit a model that probabilistically disambiguates word forms. We present such an approach, which employs a neural network to smoothly model a prior distribution over feature bundles (even rare ones). Although this basic model does not consider a token's context, that very property allows it to operate on a simple list of unigram type counts, partitioning each count among different analyses of that unigram. We discuss evaluation metrics for this novel task and report results on 5 languages.
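The count-partitioning idea in the abstract can be illustrated with a minimal EM sketch. This is not the paper's model: it swaps the neural prior for a plain categorical prior over feature bundles, and it assumes each form's posterior over its candidate analyses is proportional to that prior alone. The function name `em_disambiguate`, the toy word forms, their counts, and the bundle labels are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def em_disambiguate(counts, analyses, iterations=50):
    """Split each unigram count among its candidate feature bundles.

    counts:   dict mapping word form -> unigram count
    analyses: dict mapping word form -> list of candidate feature bundles
    Simplifying assumption: a categorical prior stands in for the
    neural prior described in the abstract.
    """
    bundles = sorted({b for cands in analyses.values() for b in cands})
    prior = {b: 1.0 / len(bundles) for b in bundles}  # uniform start
    posteriors = {}
    for _ in range(iterations):
        expected = defaultdict(float)
        # E-step: posterior over each form's analyses under the current prior.
        for form, count in counts.items():
            cands = analyses[form]
            z = sum(prior[b] for b in cands)
            posteriors[form] = {b: prior[b] / z for b in cands}
            for b, p in posteriors[form].items():
                expected[b] += count * p  # fractional (partitioned) count
        # M-step: re-estimate the prior from the expected counts.
        total = sum(expected.values())
        prior = {b: expected[b] / total for b in bundles}
    return prior, posteriors


# Hypothetical toy lexicon: "walked" is syncretic between two bundles.
counts = {"went": 10, "walked": 30, "gone": 5}
analyses = {
    "went": ["PST"],
    "walked": ["PST", "PST.PTCP"],
    "gone": ["PST.PTCP"],
}
prior, posteriors = em_disambiguate(counts, analyses)
split = {b: counts["walked"] * p for b, p in posteriors["walked"].items()}
```

On this toy data the unambiguous forms pull the prior toward `PST`, so roughly two thirds of the 30 tokens of "walked" end up assigned to `PST` and the rest to `PST.PTCP`. No context is consulted at any point; the input is just a list of type counts, which is the property the abstract highlights.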
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Authorship Attribution and Profiling
