Can a Transformer Pass the Wug Test? Tuning Copying Bias in Neural Morphological Inflection Models

Ling Liu; Mans Hulden

arXiv:2104.06483 · cs.CL · April 15, 2021 · 5 cites

TL;DR

This paper investigates how to improve neural morphological inflection models, especially Transformers, in generalizing to unseen words by enhancing copying bias through substring-aware data augmentation techniques.

Contribution

It introduces a substring-based hallucination method that significantly improves the Transformer model's ability to generalize to unseen lemmata in morphological inflection tasks.
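
To contrast substring-aware hallucination with character-level hallucination, here is one hypothetical way to build the replacement stem: concatenate length-`k` substrings sampled from real training words instead of independent characters. This is an illustrative assumption, not necessarily the procedure described in the paper. The chunk length `k`, the helper names (`substring_pool`, `hallucinate_substrings`), the `min_stem` threshold, and the longest-common-substring stem heuristic are all hypothetical.

```python
import random
from difflib import SequenceMatcher
from typing import Iterable, List, Optional, Tuple


def longest_common_substring(a: str, b: str) -> Tuple[int, int, int]:
    """Return (start in a, start in b, length); used here as a stand-in for the stem."""
    m = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(0, len(a), 0, len(b))
    return m.a, m.b, m.size


def substring_pool(words: Iterable[str], k: int) -> List[str]:
    """Collect every length-k substring that occurs in the given words."""
    return sorted({w[s:s + k] for w in words for s in range(len(w) - k + 1)})


def hallucinate_substrings(
    lemma: str,
    form: str,
    pool: List[str],
    rng: random.Random,
    min_stem: int = 3,  # hypothetical threshold for a usable stem
) -> Optional[Tuple[str, str]]:
    """Replace the shared stem with a string stitched together from pooled substrings."""
    i, j, n = longest_common_substring(lemma, form)
    if n < min_stem or not pool:
        return None  # no usable stem, or nothing to sample from
    fake = ""
    while len(fake) < n:
        fake += rng.choice(pool)
    fake = fake[:n]  # keep the original stem length (an arbitrary design choice)
    return lemma[:i] + fake + lemma[i + n:], form[:j] + fake + form[j + n:]


# Usage: the pool is built from words seen in the training data.
words = ["walk", "walked", "talk", "talked", "spielen", "gespielt"]
pool = substring_pool(words, k=2)
print(hallucinate_substrings("walk", "walked", pool, random.Random(0)))
```

With `k=1` the pool is just the set of characters found in the data, so this sketch reduces to roughly the character-level baseline. Larger `k` keeps more of the language's local character sequences in the synthetic stems.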

Findings

01. Substring-based hallucination outperforms character-level methods.
02. Enhanced copying bias improves generalization to unseen words.
03. Transformers benefit from targeted data augmentation in low-overlap scenarios.
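
In this setting, "low-overlap" refers to how many test lemmata already appeared in training. The following minimal sketch measures that overlap and separates the wug-test-like items. It assumes SIGMORPHON-style `(lemma, form, tags)` triples, and the function names are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (lemma, inflected form, feature tags)


def lemma_overlap(train: List[Triple], test: List[Triple]) -> float:
    """Fraction of distinct test lemmata that also occur as lemmata in training."""
    train_lemmas = {lemma for lemma, _, _ in train}
    test_lemmas = {lemma for lemma, _, _ in test}
    if not test_lemmas:
        return 0.0
    return len(test_lemmas & train_lemmas) / len(test_lemmas)


def split_seen_unseen(
    train: List[Triple], test: List[Triple]
) -> Tuple[List[Triple], List[Triple]]:
    """Partition test items into seen-lemma items and unseen-lemma (wug-like) items."""
    seen = {lemma for lemma, _, _ in train}
    return [t for t in test if t[0] in seen], [t for t in test if t[0] not in seen]


train = [("walk", "walked", "V;PST"), ("talk", "talks", "V;3;SG;PRS")]
test = [("walk", "walks", "V;3;SG;PRS"), ("wug", "wugged", "V;PST")]
print(lemma_overlap(train, test))  # 0.5
seen, unseen = split_seen_unseen(train, test)
print(len(seen), len(unseen))  # 1 1
```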

Abstract

Deep learning sequence models have been successfully applied to the task of morphological inflection. The results of the SIGMORPHON shared tasks in the past several years indicate that such models can perform well, but only if the training data cover a good number of different lemmata, or if the lemmata that are inflected at test time have also been seen in training, as has indeed been largely the case in these tasks. Surprisingly, standard models such as the Transformer almost completely fail at generalizing inflection patterns when asked to inflect previously unseen lemmata, i.e., under "wug test"-like circumstances. While established data augmentation techniques can be employed to alleviate this shortcoming by introducing a copying bias through hallucinating synthetic new word forms using the alphabet in the language at hand, we show that, to be more effective, the hallucination…
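
The alphabet-based hallucination mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code. The shared stem is approximated by the longest common substring of lemma and inflected form, and the helper names (`longest_common_substring`, `hallucinate_chars`) and the `min_stem` threshold are hypothetical. In both strings, the stem is replaced by random characters drawn from the language's alphabet, so the model sees many pairs where the correct behavior is to copy an arbitrary string.

```python
import random
from difflib import SequenceMatcher
from typing import Optional, Sequence, Tuple


def longest_common_substring(a: str, b: str) -> Tuple[int, int, int]:
    """Return (start in a, start in b, length) of the longest common substring.

    Assumption: the shared stem is approximated by this substring.
    """
    m = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(0, len(a), 0, len(b))
    return m.a, m.b, m.size


def hallucinate_chars(
    lemma: str,
    form: str,
    alphabet: Sequence[str],
    rng: random.Random,
    min_stem: int = 3,  # hypothetical threshold for a usable stem
) -> Optional[Tuple[str, str]]:
    """Replace the shared stem in both strings with random alphabet characters."""
    i, j, n = longest_common_substring(lemma, form)
    if n < min_stem:
        return None  # e.g. suppletive pairs like go/went have no usable stem
    fake = "".join(rng.choice(alphabet) for _ in range(n))
    return lemma[:i] + fake + lemma[i + n:], form[:j] + fake + form[j + n:]


# Usage: the alphabet is collected from the training data itself.
train = [("spielen", "gespielt"), ("walk", "walked")]
alphabet = sorted({c for pair in train for word in pair for c in word})
rng = random.Random(0)
for lemma, form in train:
    print(hallucinate_chars(lemma, form, alphabet, rng))
```

Because the affixes around the stem are kept (for example `ge-` and `-t` in the German pair), each synthetic pair still demonstrates the inflection pattern while making the stem itself something the model can only copy.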

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dense Connections · Softmax · Dropout · Adam · Layer Normalization