A Structured Variational Autoencoder for Contextual Morphological Inflection

Lawrence Wolf-Sonkin; Jason Naradowsky; Sabrina J. Mielke; Ryan Cotterell

arXiv:1806.03746·cs.CL·February 26, 2020

TL;DR

This paper introduces a semi-supervised generative model for morphological inflection that leverages raw token-level data, improving accuracy in low-resource language settings through a structured variational autoencoder and efficient inference.

Contribution

It presents a novel structured variational autoencoder model with a wake-sleep inference algorithm for semi-supervised morphological inflection learning.

Findings

01 Improvements of over 10% absolute accuracy in some cases

02 Effective use of raw, token-level data in a simulated low-resource setting

03 Evaluated on 23 languages using the Universal Dependencies corpora

Abstract

Statistical morphological inflectors are typically trained on fully supervised, type-level data. One remaining open research question is the following: How can we effectively exploit raw, token-level data to improve their performance? To this end, we introduce a novel generative latent-variable model for the semi-supervised learning of inflection generation. To enable posterior inference over the latent variables, we derive an efficient variational inference procedure based on the wake-sleep algorithm. We experiment on 23 languages, using the Universal Dependencies corpora in a simulated low-resource setting, and find improvements of over 10% absolute accuracy in some cases.
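
For readers unfamiliar with the wake-sleep algorithm mentioned in the abstract, its generic form can be sketched as two alternating updates. This is the textbook version, not the paper's specific derivation: the symbols $x$ (observed data), $z$ (latent variables), $p_\theta$ (generative model), $q_\phi$ (inference network), and the step size $\eta$ are notational assumptions introduced here for illustration, and the paper's actual factorization over its latent variables may differ.

```latex
\begin{align*}
\text{wake:}\quad
  & \theta \leftarrow \theta + \eta\,\nabla_\theta \log p_\theta(x, z),
  && z \sim q_\phi(z \mid x),\ x \text{ observed} \\
\text{sleep:}\quad
  & \phi \leftarrow \phi + \eta\,\nabla_\phi \log q_\phi(z \mid x),
  && (x, z) \sim p_\theta(x, z)
\end{align*}
```

In the wake phase the generative parameters are fit to latent values proposed by the inference network for real data; in the sleep phase the inference network is fit to samples "dreamed" by the generative model, which avoids differentiating through discrete latent samples.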
