Attention Link: An Efficient Attention-Based Low Resource Machine Translation Architecture
Zeping Min

TL;DR
This paper introduces attention link, a novel architecture that improves transformer-based neural machine translation models, especially when training data is scarce. It yields significant performance gains and a new state-of-the-art BLEU score on IWSLT14 de-en.
Contribution
The paper proposes the attention link architecture, demonstrating its theoretical and empirical advantages for low-resource machine translation tasks.
Findings
Attention link improves translation quality in low-resource scenarios.
Attention link reaches a new state-of-the-art BLEU score of 37.9 on IWSLT14 de-en.
Attention link yields significant performance gains across multiple language pairs.
Abstract
Transformers have achieved great success in machine translation, but transformer-based NMT models often require millions of bilingual parallel sentence pairs for training. In this paper, we propose a novel architecture named attention link (AL) to improve the performance of transformer models, especially when training resources are limited. We theoretically demonstrate the advantage of the attention link architecture under low training resources. We also run extensive experiments on the en-de, de-en, en-fr, en-it, it-en, and en-ro translation tasks of the IWSLT14 dataset, as well as on genuinely low-resource bn-gu and gu-ta translation tasks from the CVIT PIB dataset. All experimental results show that attention link is effective and leads to significant improvements. In addition, we achieve a BLEU score of 37.9, a new state of the art, on the IWSLT14 de-en task by combining our attention link and…
