TL;DR
This paper introduces a semi-supervised generative model for morphological inflection that leverages raw token-level data, improving accuracy in low-resource language settings through a structured variational autoencoder and efficient inference.
Contribution
The paper presents a structured variational autoencoder for the semi-supervised learning of morphological inflection, together with an efficient variational inference procedure based on the wake-sleep algorithm.
Findings
Improvements of over 10% absolute accuracy in some cases
Effective use of raw token-level data in low-resource scenarios
Evaluated on 23 languages using the Universal Dependencies corpora, in a simulated low-resource setting
Abstract
Statistical morphological inflectors are typically trained on fully supervised, type-level data. One remaining open research question is the following: How can we effectively exploit raw, token-level data to improve their performance? To this end, we introduce a novel generative latent-variable model for the semi-supervised learning of inflection generation. To enable posterior inference over the latent variables, we derive an efficient variational inference procedure based on the wake-sleep algorithm. We experiment on 23 languages, using the Universal Dependencies corpora in a simulated low-resource setting, and find improvements of over 10% absolute accuracy in some cases.
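For readers unfamiliar with the wake-sleep algorithm mentioned in the abstract, the generic form of its two objectives can be sketched as follows. The notation is an assumption made for illustration and is not taken from the paper: x stands for the observed data, z for the latent variables, p_θ for the generative model, and q_φ for the recognition (inference) network. The paper's actual model and training procedure may differ in its details.

```latex
% Generic wake-sleep objectives (illustrative notation, not the paper's own).
% x: observed data, z: latent variables
% p_\theta: generative model, q_\phi: recognition (inference) model
\begin{align}
\text{Wake phase:}\quad
  & \max_{\theta}\; \mathbb{E}_{x \sim \mathcal{D}}\,
    \mathbb{E}_{z \sim q_\phi(z \mid x)}
    \big[ \log p_\theta(x, z) \big] \\
\text{Sleep phase:}\quad
  & \max_{\phi}\; \mathbb{E}_{(x, z) \sim p_\theta(x, z)}
    \big[ \log q_\phi(z \mid x) \big]
\end{align}
```

In this generic formulation, the wake phase fits the generative model to latent values proposed by the recognition network, while the sleep phase fits the recognition network to samples "dreamed" by the generative model. Because the sleep phase trains on fully observed samples, it does not require differentiating through samples of the latent variables, which is one reason the algorithm is often considered for models with discrete latent variables.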
