A Three Step Training Approach with Data Augmentation for Morphological Inflection
Gabor Szolnok, Botond Barta, Dorina Lakatos, Judit Acs

TL;DR
This paper introduces a three-step training method with data augmentation for morphological inflection across typologically diverse languages. The resulting model is simpler than the Transformer baseline, and its augmentation techniques are easily applicable to new languages, though it does not surpass that baseline.
Contribution
It proposes a novel three-step training approach (all languages, then language families, then individual languages) combined with data augmentation for multilingual morphological inflection.
Findings
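Outperformed the only other submission in the shared task
Data augmentation and the three training steps often improve performance, but sometimes have a negative effect
Model is simpler than the Transformer baseline, and the augmentation techniques are easily applicable to new languages, though the model remains less accurate than that baseline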
Abstract
We present the BME submission for the SIGMORPHON 2021 Task 0 Part 1, Generalization Across Typologically Diverse Languages shared task. We use an LSTM encoder-decoder model with three-step training: it is first trained on all languages, then fine-tuned on each language family, and finally fine-tuned on individual languages. We use a different type of data augmentation in each of the first two steps. Our system outperformed the only other submission. Although it remains worse than the Transformer baseline released by the organizers, our model is simpler and our data augmentation techniques are easily applicable to new languages. We perform ablation studies and show that the augmentation techniques and the three training steps often help but sometimes have a negative effect.
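The three-step schedule described in the abstract can be sketched as follows. This is a minimal, framework-agnostic Python sketch under stated assumptions, not the authors' implementation: `Example`, `three_step_training`, `train`, `augment_all`, and `augment_family` are hypothetical names. `train` stands in for whatever LSTM encoder-decoder training loop is used, and the two augmentation hooks default to identity because the abstract does not say which techniques are applied. Following the abstract, augmentation is applied only in the first two steps.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Params = Any  # opaque model parameters (hypothetical stand-in)


@dataclass(frozen=True)
class Example:
    language: str  # language code, e.g. "deu"
    family: str    # language-family label, e.g. "germanic"
    lemma: str
    tags: str      # morphosyntactic feature bundle, e.g. "V;PST"
    form: str      # target inflected form


Augment = Callable[[List[Example]], List[Example]]
Train = Callable[[Params, List[Example], str], Params]


def three_step_training(
    examples: List[Example],
    init_params: Params,
    train: Train,
    augment_all: Augment = lambda xs: xs,     # hypothetical step-1 augmentation hook
    augment_family: Augment = lambda xs: xs,  # hypothetical step-2 augmentation hook
) -> Dict[str, Params]:
    """Train on all data, fine-tune per family, then fine-tune per language.

    `train` must return new parameters instead of mutating its input,
    because several families (and languages) branch from one parent.
    Returns one set of parameters per language.
    """
    by_family: Dict[str, List[Example]] = defaultdict(list)
    by_language: Dict[str, List[Example]] = defaultdict(list)
    family_of: Dict[str, str] = {}
    for ex in examples:
        by_family[ex.family].append(ex)
        by_language[ex.language].append(ex)
        family_of[ex.language] = ex.family

    # Step 1: a single multilingual model trained on every language.
    base = train(init_params, augment_all(examples), "all")

    # Step 2: one copy of the base model fine-tuned per language family.
    family_params = {
        fam: train(base, augment_family(data), f"family:{fam}")
        for fam, data in by_family.items()
    }

    # Step 3: each language fine-tunes its own family's model (no augmentation).
    return {
        lang: train(family_params[family_of[lang]], data, f"lang:{lang}")
        for lang, data in by_language.items()
    }


def toy_train(params: Params, data: List[Example], stage: str) -> Params:
    # Stand-in for real training: just record the lineage of stages.
    return params + (stage,)


data = [
    Example("deu", "germanic", "gehen", "V;PST", "ging"),
    Example("eng", "germanic", "go", "V;PST", "went"),
    Example("fin", "uralic", "talo", "N;PL", "talot"),
]
models = three_step_training(data, (), toy_train)
print(models["deu"])  # ('all', 'family:germanic', 'lang:deu')
```

The toy `train` only records which stages produced each language's parameters, which makes the branching visible: every language model descends from the shared multilingual model through exactly one family model. In a real setup, `train` would copy the parent checkpoint before fine-tuning so that sibling families do not overwrite each other.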
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Layer Normalization · Label Smoothing · Adam · Residual Connection · Multi-Head Attention · Softmax
