Unsupervised Disambiguation of Syncretism in Inflected Lexicons

Ryan Cotterell; Christo Kirov; Sabrina J. Mielke; Jason Eisner

arXiv:1806.03740 · cs.CL · February 26, 2020

TL;DR

This paper introduces an unsupervised neural network approach to disambiguate morphological syncretism in inflected lexicons, enabling probabilistic analysis of ambiguous word forms without context.

Contribution

It presents a neural model that disambiguates word forms among their candidate morphological feature bundles using only unigram type counts, with no contextual information.

Findings

01 Results reported on 5 languages

02 Neural prior smoothly models feature bundles, including rare ones

03 Discusses evaluation metrics for this novel disambiguation task

Abstract

Lexical ambiguity makes it difficult to compute various useful statistics of a corpus. A given word form might represent any of several morphological feature bundles. One can, however, use unsupervised learning (as in EM) to fit a model that probabilistically disambiguates word forms. We present such an approach, which employs a neural network to smoothly model a prior distribution over feature bundles (even rare ones). Although this basic model does not consider a token's context, that very property allows it to operate on a simple list of unigram type counts, partitioning each count among different analyses of that unigram. We discuss evaluation metrics for this novel task and report results on 5 languages.
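
To make the idea of "partitioning each count among different analyses" concrete, here is a minimal sketch of EM over a list of unigram type counts. It is not the paper's model: the neural prior is swapped for a plain categorical prior over feature bundles, and lemma-level probabilities are ignored. The function name `em_partition`, the toy word forms, their counts, and the bundle labels are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def em_partition(counts, analyses, iterations=50):
    """Split each form's count among its candidate feature bundles with EM.

    counts:   dict mapping word form -> unigram type count
    analyses: dict mapping word form -> list of candidate feature bundles
    Assumption: a categorical prior over bundles stands in for the
    neural prior described in the abstract.
    """
    bundles = sorted({b for cands in analyses.values() for b in cands})
    prior = {b: 1.0 / len(bundles) for b in bundles}  # uniform start

    for _ in range(iterations):
        # E-step: posterior over a form's candidates is the prior
        # renormalized over those candidates; accumulate expected counts.
        expected = defaultdict(float)
        for form, count in counts.items():
            z = sum(prior[b] for b in analyses[form])
            for b in analyses[form]:
                expected[b] += count * prior[b] / z
        # M-step: re-estimate the prior from the expected counts.
        total = sum(expected.values())
        prior = {b: expected[b] / total for b in bundles}

    # Final fractional counts per form under the fitted prior.
    split = {}
    for form, count in counts.items():
        z = sum(prior[b] for b in analyses[form])
        split[form] = {b: count * prior[b] / z for b in analyses[form]}
    return prior, split


# Hypothetical toy lexicon: "walked" is syncretic, "ate" and "eaten" are not.
counts = {"walked": 10, "ate": 6, "eaten": 4}
analyses = {
    "walked": ["V;PST", "V;PTCP;PST"],
    "ate": ["V;PST"],
    "eaten": ["V;PTCP;PST"],
}
prior, split = em_partition(counts, analyses)
print(prior)
print(split["walked"])
```

In this toy setting the unambiguous forms anchor the prior, and the ambiguous form's count is divided in proportion to it. A smoother prior, such as the neural one the abstract describes, would matter most for bundles that few or no unambiguous forms attest.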

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Authorship Attribution and Profiling