Can a Transformer Pass the Wug Test? Tuning Copying Bias in Neural Morphological Inflection Models
Ling Liu, Mans Hulden

TL;DR
This paper investigates how to make neural morphological inflection models, especially the Transformer, generalize to unseen lemmata by strengthening their copying bias with substring-aware data augmentation.
Contribution
It introduces a substring-based hallucination method that significantly improves the Transformer model's ability to generalize to unseen lemmata in morphological inflection tasks.
Findings
Substring-based hallucination outperforms character-level methods.
Enhanced copying bias improves generalization to unseen words.
Transformers benefit from targeted data augmentation when training and test lemmata overlap little.
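The contrast between character-level and substring-based hallucination can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function names, the use of the longest common substring as the shared stem, the minimum stem length of 3, and the 2- and 3-character chunks harvested from training words are all assumptions made for this example.

```python
import random


def longest_common_substring(a, b):
    """Return (start_in_a, start_in_b, length) of the longest shared substring."""
    best_len, best_end_a, best_end_b = 0, 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end_a, best_end_b = cur[j], i, j
        prev = cur
    return best_end_a - best_len, best_end_b - best_len, best_len


def harvest_chunks(words, sizes=(2, 3)):
    """Collect character n-grams from real words (hypothetical chunk inventory)."""
    chunks = set()
    for word in words:
        for n in sizes:
            for k in range(len(word) - n + 1):
                chunks.add(word[k:k + n])
    return sorted(chunks)


def _swap_stem(lemma, form, make_stem, min_len=3):
    """Replace the shared stem in both strings with a synthetic stem."""
    i, j, n = longest_common_substring(lemma, form)
    if n < min_len:  # assumed threshold: too little shared material to swap
        return lemma, form
    new_stem = make_stem(n)
    return (lemma[:i] + new_stem + lemma[i + n:],
            form[:j] + new_stem + form[j + n:])


def hallucinate_chars(lemma, form, alphabet, rng):
    """Character-level variant: fill the stem with independently sampled characters."""
    return _swap_stem(lemma, form,
                      lambda n: "".join(rng.choice(alphabet) for _ in range(n)))


def hallucinate_substrings(lemma, form, chunks, rng):
    """Substring-level variant: fill the stem with chunks taken from real words."""
    if not chunks:
        return lemma, form

    def make_stem(n):
        pieces = ""
        while len(pieces) < n:
            pieces += rng.choice(chunks)
        return pieces[:n]  # trim so the synthetic stem keeps the original length

    return _swap_stem(lemma, form, make_stem)


if __name__ == "__main__":
    rng = random.Random(0)
    chunks = harvest_chunks(["talk", "jump", "play"])
    print(hallucinate_chars("walk", "walked", "abcdefghijklmnopqrstuvwxyz", rng))
    print(hallucinate_substrings("walk", "walked", chunks, rng))
```

In both variants the affix (here "ed") is left untouched and the same synthetic stem appears in the lemma and the inflected form, so each hallucinated pair shows the model a copy of the stem from input to output. The only difference is whether the stem is built from single characters or from multi-character pieces of attested words.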
Abstract
Deep learning sequence models have been successfully applied to the task of morphological inflection. The results of the SIGMORPHON shared tasks in the past several years indicate that such models can perform well, but only if the training data cover a good amount of different lemmata, or if the lemmata that are inflected at test time have also been seen in training, as has indeed been largely the case in these tasks. Surprisingly, standard models such as the Transformer almost completely fail at generalizing inflection patterns when asked to inflect previously unseen lemmata -- i.e. under "wug test"-like circumstances. While established data augmentation techniques can be employed to alleviate this shortcoming by introducing a copying bias through hallucinating synthetic new word forms using the alphabet in the language at hand, we show that, to be more effective, the hallucination…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dense Connections · Softmax · Dropout · Adam · Layer Normalization
