SIGMORPHON 2020 Shared Task 0: Typologically Diverse Morphological Inflection

Ekaterina Vylomova; Jennifer White; Elizabeth Salesky; Sabrina J. Mielke; Shijie Wu; Edoardo Ponti; Rowan Hall Maudslay; Ran Zmigrod; Josef Valvoda; Svetlana Toldova; Francis Tyers; Elena Klyachko; Ilya Yegorov; Natalia Krizhanovsky; Paula Czarnowska; Irene Nikkarinen; Andrew Krizhanovsky; Tiago Pimentel; Lucas Torroba Hennigen; Christo Kirov; Garrett Nicolai; Adina Williams; Antonios Anastasopoulos; Hilaria Cruz; Eleanor Chodroff; Ryan Cotterell; Miikka Silfverberg; Mans Hulden

arXiv:2006.11572·cs.CL·July 15, 2020

TL;DR

The SIGMORPHON 2020 shared task evaluated the ability of systems to perform morphological inflection across 90 typologically diverse languages, highlighting neural models' effectiveness and the benefits of data augmentation, with some non-neural methods excelling in low-resource scenarios.

Contribution

This study introduces a large-scale multilingual morphological inflection challenge that compares neural and non-neural approaches across 90 languages and a range of resource levels, testing how well systems generalize to typologically distinct, often low-resource languages and which data strategies (such as hallucination and augmentation) help.

Findings

01

All four winning systems were neural: two monolingual transformers and two massively multilingual RNN-based models with gated attention.

02

Data augmentation and ensemble methods improved results on low-resource languages.

03

Some non-neural methods outperformed neural models on certain languages with limited data.
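
Finding 02 credits data augmentation. One simple variant, loosely modeled on the stem "hallucination" mentioned in the abstract, treats the longest substring shared by a lemma and its inflected form as the stem and swaps it for random characters, yielding a new synthetic training triple with the same affix pattern. This is a hypothetical, simplified sketch: the function names, the `min_len` threshold of 3, and the single-substring stem heuristic are assumptions, not any team's submitted method.

```python
from __future__ import annotations

import random
from typing import Optional, Sequence


def longest_common_substring(a: str, b: str) -> tuple[int, int, int]:
    """Return (start_in_a, start_in_b, length) of one longest common substring."""
    best = (0, 0, 0)
    # prev[j] holds the length of the common run ending at a[i-2] and b[j-1]
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best[2]:
                    best = (i - cur[j], j - cur[j], cur[j])
        prev = cur
    return best


def hallucinate(
    lemma: str,
    form: str,
    tags: tuple[str, ...],
    alphabet: Sequence[str],
    rng: random.Random,
    min_len: int = 3,  # hypothetical threshold for what counts as a "stem"
) -> Optional[tuple[str, str, tuple[str, ...]]]:
    """Replace the shared stem of (lemma, form) with random characters."""
    sa, sb, n = longest_common_substring(lemma, form)
    if n < min_len:
        return None  # no convincing shared stem to swap out
    fake = "".join(rng.choice(alphabet) for _ in range(n))
    new_lemma = lemma[:sa] + fake + lemma[sa + n:]
    new_form = form[:sb] + fake + form[sb + n:]
    return new_lemma, new_form, tags
```

For example, `hallucinate("walk", "walked", ("V", "PST"), "xyz", random.Random(0))` returns a random four-character lemma over `xyz` whose inflected form is that lemma plus `-ed`, while a suppletive pair such as `go`/`went` is skipped. A real system would likely align characters more carefully than a single longest common substring.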

Abstract

A broad goal in natural language processing (NLP) is to develop a system that has the capacity to process any natural language. Most systems, however, are developed using data from just one language such as English. The SIGMORPHON 2020 shared task on morphological reinflection aims to investigate systems' ability to generalize across typologically distinct languages, many of which are low resource. Systems were developed using data from 45 languages and just 5 language families, fine-tuned with data from an additional 45 languages and 10 language families (13 in total), and evaluated on all 90 languages. A total of 22 systems (19 neural) from 10 teams were submitted to the task. All four winning systems were neural (two monolingual transformers and two massively multilingual RNN-based models with gated attention). Most teams demonstrated the utility of data hallucination and augmentation,…
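
In reinflection, a system receives a lemma plus a bundle of morphological features and must produce the inflected form. The sketch below shows how predictions for one language might be scored, assuming UniMorph-style three-column TSV lines (lemma, inflected form, semicolon-separated features) and the exact-match accuracy and mean Levenshtein distance metrics commonly reported for SIGMORPHON inflection tasks. The function names are hypothetical; this is not the official evaluation script.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Example:
    lemma: str
    form: str
    tags: tuple[str, ...]


def parse_line(line: str) -> Example:
    # Assumed format: lemma<TAB>inflected form<TAB>features joined by ";"
    lemma, form, feats = line.rstrip("\n").split("\t")
    return Example(lemma, form, tuple(feats.split(";")))


def levenshtein(a: str, b: str) -> int:
    # Standard dynamic program with unit insert/delete/substitute costs.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def evaluate(golds: list[str], preds: list[str]) -> tuple[float, float]:
    """Return (accuracy in percent, mean Levenshtein distance)."""
    if len(golds) != len(preds) or not golds:
        raise ValueError("need equal-length, non-empty gold and prediction lists")
    n = len(golds)
    acc = sum(g == p for g, p in zip(golds, preds)) / n
    dist = sum(levenshtein(g, p) for g, p in zip(golds, preds)) / n
    return 100 * acc, dist
```

For instance, `evaluate(["walked", "ran"], ["walked", "runned"])` gives `(50.0, 2.0)`: exact-match accuracy rewards only fully correct forms, while the mean edit distance gives partial credit to near misses such as `runned` for `ran`.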

Code & Models

Repositories

sigmorphon2020/task0-data
Official
