Exploring Hyper-Parameter Optimization for Neural Machine Translation on GPU Architectures
Robert Lim, Kenneth Heafield, Hieu Hoang, Mark Briers, Allen Malony

TL;DR
This paper investigates hyper-parameter optimization for neural machine translation on GPU architectures, analyzing how different settings impact performance, convergence, and translation accuracy to guide high-performing NMT system development.
Contribution
It provides a comprehensive analysis of hyper-parameter effects on NMT training across various GPU architectures, highlighting key parameters for optimal performance.
Findings
Certain hyper-parameters significantly influence translation speed and accuracy.
Multi-node GPU setups improve training efficiency and model convergence.
The analysis offers guidance on which hyper-parameters to prioritize when building high-performance NMT systems.
Abstract
Neural machine translation (NMT) has been accelerated by deep learning neural networks over statistical-based approaches, due to the plethora and programmability of commodity heterogeneous computing architectures such as FPGAs and GPUs and the massive amount of training corpora generated from news outlets, government agencies and social media. Training a learning classifier for neural networks entails tuning hyper-parameters to yield the best performance. Unfortunately, the parameters for machine translation include discrete categories as well as continuous options, which makes for a combinatorially explosive problem. This research explores optimizing hyper-parameters when training deep learning neural networks for machine translation. Specifically, our work investigates training a language model with Marian NMT. Results compare NMT under various hyper-parameter…
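The abstract's point about mixing discrete and continuous options can be made concrete with a small sketch. The search space below is hypothetical: the parameter names, choices, and ranges are assumptions chosen for illustration and are not taken from the paper or from Marian NMT's option list. The sketch counts how many configurations a grid search would visit once each continuous range is discretized, and shows random sampling as one common alternative to enumerating the grid.

```python
import random

# Hypothetical search space, for illustration only. These names and values
# are assumptions and do not come from the paper or from Marian NMT.
DISCRETE = {
    "cell_type": ["gru", "lstm"],
    "optimizer": ["sgd", "adam", "adagrad"],
    "encoder_depth": [1, 2, 4],
}
CONTINUOUS = {
    "learning_rate": (1e-4, 1e-2),
    "dropout": (0.0, 0.5),
}


def grid_size(discrete, continuous, points_per_axis):
    """Count grid configurations when every continuous range is cut
    into `points_per_axis` candidate values."""
    size = 1
    for choices in discrete.values():
        size *= len(choices)
    return size * points_per_axis ** len(continuous)


def sample_config(discrete, continuous, rng):
    """Draw one configuration: a uniform choice per discrete parameter
    and a uniform draw per continuous range."""
    config = {name: rng.choice(choices) for name, choices in discrete.items()}
    for name, (low, high) in continuous.items():
        config[name] = rng.uniform(low, high)
    return config


if __name__ == "__main__":
    # The grid grows multiplicatively with every added parameter or grid point.
    for points in (3, 5, 10):
        print(points, "points per continuous axis:",
              grid_size(DISCRETE, CONTINUOUS, points), "configurations")
    # Random sampling evaluates a fixed budget instead of the whole grid.
    print(sample_config(DISCRETE, CONTINUOUS, random.Random(0)))
```

With only five parameters the grid already reaches 1,800 configurations at ten points per continuous axis, and each configuration would require a full training run, which is why exhaustive search becomes impractical as parameters are added.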
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
