Analyzing Architectures for Neural Machine Translation Using Low Computational Resources

Aditya Mandke; Onkar Litake; Dipali Kadam

arXiv:2111.03813·cs.CL·November 30, 2021

TL;DR

This paper compares neural machine translation architectures trained on limited computational resources, finding that while transformers excel in accuracy, LSTMs offer a faster alternative suitable for time-constrained environments.

Contribution

It provides an empirical evaluation of different NMT architectures trained with limited computational resources, highlighting trade-offs between accuracy and training time.

Findings

01

Transformers achieved higher BLEU scores but required more training time.

02

LSTMs trained faster and achieved competitive BLEU scores.

03

More complex transformer architectures with additional encoders and decoders took longer to train yet achieved lower BLEU scores.

Abstract

With recent developments in Natural Language Processing, a growing variety of architectures is being used for Neural Machine Translation. Transformer architectures achieve state-of-the-art accuracy, but they are computationally expensive to train. Not everyone has access to setups with high-end GPUs and other resources. We train our models on low computational resources and investigate the results. As expected, transformers outperformed other architectures, but there were some surprising results. Transformers with more encoders and decoders took more time to train but achieved lower BLEU scores. LSTM performed well in the experiment and took comparatively less time to train than transformers, making it suitable for time-constrained situations.
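
Since the comparison above rests on BLEU, a minimal sketch of how a BLEU score is formed may help in reading the results. This is not the authors' evaluation code, and the page does not say which BLEU implementation or settings they used. The function names `ngrams` and `sentence_bleu` are hypothetical, and the sketch assumes a single reference, whitespace tokenization, n-grams up to length 4, and no smoothing. Reported scores are normally computed at corpus level with a dedicated tool such as sacreBLEU.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Simplified single-reference, unsmoothed sentence-level BLEU in [0, 1].

    Illustrative only: an assumption-laden sketch, not the paper's metric code.
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives zero BLEU
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    hyp = "the cat sat on the mat".split()
    ref = "the cat sat on a mat".split()
    print(f"BLEU = {sentence_bleu(hyp, ref):.4f}")
```

The score is the geometric mean of clipped n-gram precisions multiplied by a brevity penalty, so a model can lose BLEU either by producing wrong n-grams or by producing output that is too short.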

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Sigmoid Activation · Dropout · Residual Connection · Dense Connections · Absolute Position Encodings · Byte Pair Encoding · Softmax