Effective General-Domain Data Inclusion for the Machine Translation Task by Vanilla Transformers
Hassan Soliman

TL;DR
This paper explores how adding general-domain data to the training of a Transformer-based machine translation model improves translation quality. The gain is 2 BLEU points on German-to-English translation, measured on the WMT'13 test set.
Contribution
The paper proposes integrating general-domain parallel data into Transformer training and analyzes the impact on translation performance.
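This page does not describe how the general-domain data is combined with the in-domain data. Options include plain concatenation, oversampling, or staged fine-tuning. The sketch below assumes the simplest option: concatenate the WMT'13 and IWSLT'16 sentence pairs, drop exact duplicates, and shuffle. The helper `mix_corpora`, the deduplication step, and the fixed seed are hypothetical illustrations, not the paper's procedure.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]  # (German source sentence, English target sentence)


def mix_corpora(in_domain: List[Pair], general: List[Pair], seed: int = 0) -> List[Pair]:
    """Hypothetical inclusion strategy: concatenate, deduplicate, shuffle.

    This is an assumed, minimal approach, not necessarily the paper's method.
    """
    seen = set()
    mixed: List[Pair] = []
    for pair in in_domain + general:
        if pair not in seen:  # keep only the first copy of each (src, tgt) pair
            seen.add(pair)
            mixed.append(pair)
    random.Random(seed).shuffle(mixed)  # seeded shuffle so runs are reproducible
    return mixed


# Toy stand-ins for the two corpora (not real dataset contents).
wmt13 = [("Guten Morgen.", "Good morning."), ("Danke.", "Thank you.")]
iwslt16 = [("Danke.", "Thank you."), ("Wie geht es dir?", "How are you?")]
train = mix_corpora(wmt13, iwslt16)
print(len(train))  # → 3 (the shared pair is kept once)
```

Shuffling matters when batches are drawn sequentially. Without it, the model would see all in-domain data before any general-domain data.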
Findings
Including IWSLT'16 data in training improves the BLEU score on the WMT'13 test set by 2 points.
Qualitative analysis reveals better translation quality with more diverse data.
General-domain data inclusion benefits Transformer-based translation systems.
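BLEU is reported on a 0 to 100 scale, so "2 BLEU points" is an absolute difference on that scale. Corpus BLEU combines two parts:

* clipped n-gram precisions for n = 1 to 4, pooled over the whole test set and combined by a geometric mean;
* a brevity penalty for output that is shorter than the references.

The following is a minimal sketch under simplifying assumptions: one reference per sentence, whitespace tokenization, and no smoothing. The function names are hypothetical. Published scores usually come from a standard tool such as sacreBLEU, which applies its own tokenization, so numbers from this sketch will not match reported results exactly.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Corpus-level BLEU on a 0-100 scale (single reference, no smoothing)."""
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # the geometric mean is zero if any precision is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only hypotheses shorter than the references are penalized.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


ref = ["the cat sat on the mat"]
print(corpus_bleu(["the cat sat on the mat"], ref))  # → 100.0
print(corpus_bleu(["the cat sat on the"], ref))      # below 100: all n-grams match, but the output is too short
```

The n-gram statistics are pooled over the whole corpus before dividing, rather than averaging per-sentence scores. This is why BLEU is normally quoted as a single test-set number.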
Abstract
One of the vital breakthroughs in the history of machine translation is the development of the Transformer model. It is revolutionary not only for various translation tasks but also for most other NLP tasks. In this paper, we build a Transformer-based system that translates a source sentence in German into its counterpart target sentence in English. We perform the experiments on the German-English parallel sentences of the News Commentary corpus from the WMT'13 dataset. In addition, we investigate whether adding general-domain data from the IWSLT'16 dataset to the training data improves the Transformer model's performance. We find that including the IWSLT'16 dataset in training yields a gain of 2 BLEU points on the WMT'13 test set. A qualitative analysis examines how the use of general-domain data helps improve the…
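The Transformer named in the abstract is built from scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V. It also appears in the taxonomy under Attention Is All You Need and Multi-Head Attention. The sketch below computes a single unmasked attention head in pure Python for illustration. A multi-head layer runs several such heads on learned linear projections of the inputs and concatenates their outputs. The function names are hypothetical, and nothing here reflects the paper's actual model size or hyperparameters.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row per token position


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V for one head, without masking."""
    d_k = len(k[0])
    out = []
    for q_row in q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output entry per value dimension.
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))])
    return out


q = [[0.0, 0.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
print(scaled_dot_product_attention(q, k, v))  # → [[2.0, 3.0]] (uniform weights average the values)
```

Dividing by sqrt(d_k) keeps the dot products from growing with the key dimension. Large scores would push the softmax into a near one-hot regime with very small gradients.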
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: Attention Is All You Need · Test · Linear Layer · Byte Pair Encoding · Softmax · Dropout · Dense Connections · Residual Connection · Multi-Head Attention · Absolute Position Encodings
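The methods list includes Byte Pair Encoding, which machine translation systems commonly use to split words into subword units before training. This page does not say how the paper configured BPE, such as the number of merges, joint or separate German/English vocabularies, or which toolkit was used. The following is therefore a generic sketch of learning merge operations in the style of Sennrich et al. (2016). Each word in a toy frequency vocabulary is pre-split into characters plus an end-of-word marker `</w>`. All function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def pair_counts(vocab: Dict[str, int]) -> Dict[Tuple[str, str], int]:
    """Count adjacent symbol pairs, weighted by word frequency."""
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            counts[a, b] += freq
    return counts


def merge_pair(pair: Tuple[str, str], vocab: Dict[str, int]) -> Dict[str, int]:
    """Replace every standalone occurrence of the pair with the merged symbol."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}


def learn_bpe(vocab: Dict[str, int], num_merges: int) -> List[Tuple[str, str]]:
    """Greedily merge the most frequent adjacent pair, num_merges times."""
    merges: List[Tuple[str, str]] = []
    for _ in range(num_merges):
        counts = pair_counts(vocab)
        if not counts:  # every word is already a single symbol
            break
        best = max(counts, key=counts.get)  # ties go to the pair counted first
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


toy = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6, "w i d e s t </w>": 3}
print(learn_bpe(toy, 4))  # → [('e', 's'), ('es', 't'), ('est', '</w>'), ('l', 'o')]
```

The learned merge list is then applied in order to segment new text. Rare German compounds can thus be represented by frequent subword pieces instead of an unknown-word token.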
