Pragmatic Neural Language Modelling in Machine Translation
Paul Baltescu, Phil Blunsom

TL;DR
This paper investigates how to scale neural language models for machine translation. It measures the effect of scaling techniques on end-to-end translation quality, examines when explicit normalisation and which optimisation tricks are needed, and compares neural models with back-off n-gram models, with the aim of guiding scalable model development.
Contribution
It provides a comprehensive evaluation of scaling techniques, normalization, optimization tricks, and trade-offs for neural language models in machine translation.
Findings
Neural models are promising for memory-constrained environments.
Explicit normalization is sometimes necessary for neural models.
Neural models still lag behind traditional back-off n-gram models in raw translation quality.
Abstract
This paper presents an in-depth investigation on integrating neural language models in translation systems. Scaling neural language models is a difficult task, but crucial for real-world applications. This paper evaluates the impact on end-to-end MT quality of both new and existing scaling techniques. We show when explicitly normalising neural models is necessary and what optimisation tricks one should use in such scenarios. We also focus on scalable training algorithms and investigate noise contrastive estimation and diagonal contexts as sources for further speed improvements. We explore the trade-offs between neural models and back-off n-gram models and find that neural models make strong candidates for natural language applications in memory constrained environments, yet still lag behind traditional models in raw translation quality. We conclude with a set of recommendations one…
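To illustrate why explicit normalisation is a scaling concern, here is a minimal sketch, not the authors' implementation. It assumes a simple model in which a word's score is the dot product of a context vector and a word vector plus a bias; the function names and the toy vectors are hypothetical. The unnormalised score touches one word, whereas the normalised log-probability must score every word in the vocabulary.

```python
import math

def unnormalised_score(context_vec, word_vec, bias):
    # Cost is independent of vocabulary size: one dot product and a bias.
    return sum(c * w for c, w in zip(context_vec, word_vec)) + bias

def normalised_log_prob(context_vec, word_vecs, biases, target):
    # Explicit normalisation: score every word, then subtract log Z.
    scores = [unnormalised_score(context_vec, w, b)
              for w, b in zip(word_vecs, biases)]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[target] - log_z

# Toy example (hypothetical values): a 3-word vocabulary, 2-dimensional vectors.
ctx = [0.5, -1.0]
vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
biases = [0.0, 0.1, -0.2]
log_probs = [normalised_log_prob(ctx, vecs, biases, t) for t in range(len(vecs))]
```

With a realistic vocabulary, the loop inside `normalised_log_prob` dominates the cost of each query, which is the motivation for techniques such as noise contrastive estimation that the abstract mentions.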
