Chunk-based Nearest Neighbor Machine Translation
Pedro Henrique Martins, Zita Marinho, André F. T. Martins

TL;DR
This paper introduces a chunk-based $k$NN-MT model for machine translation that retrieves token chunks instead of individual tokens, significantly improving decoding speed with minimal quality loss.
Contribution
It proposes a novel chunk-based retrieval approach for $k$NN-MT, enhancing translation speed while maintaining translation quality in domain adaptation scenarios.
Findings
Achieves up to 4x speed-up in decoding
Maintains translation quality, with only small drops
Effective in static and on-the-fly domain adaptation
Abstract
Semi-parametric models, which augment generation with retrieval, have led to impressive results in language modeling and machine translation, due to their ability to retrieve fine-grained information from a datastore of examples. One of the most prominent approaches, $k$NN-MT, exhibits strong domain adaptation capabilities by retrieving tokens from domain-specific datastores (Khandelwal et al., 2020). However, $k$NN-MT requires an expensive retrieval operation for every single generated token, leading to a very low decoding speed (around 8 times slower than a parametric model). In this paper, we introduce a chunk-based $k$NN-MT model which retrieves chunks of tokens from the datastore, instead of a single token. We propose several strategies for incorporating the retrieved chunks into the generation process, and for selecting the steps at which the model needs to search for…
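The abstract contrasts per-token retrieval with chunk retrieval. The pure-Python sketch below illustrates the difference under stated assumptions. The token-level part follows the standard $k$NN-MT formulation: p_kNN(y) is proportional to the sum of exp(-d/T) over retrieved neighbours whose value is y, and the final distribution is λ·p_kNN + (1 − λ)·p_MT. The chunk part is a hypothetical simplification for illustration only, not the authors' method. It assumes each datastore key maps to the next `chunk_size` target tokens, searches only every `interval` steps, and at the intermediate steps reads the token at the matching offset inside the cached chunks while reusing the distance from the original search. All function names and parameters (`build_datastore`, `chunk_neighbours`, `interval`, etc.) are invented for this sketch.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical datastore entry: (key vector, chunk of target tokens that followed it).
# With chunk_size=1 this reduces to a token-level kNN-MT datastore.
Entry = Tuple[List[float], List[str]]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_datastore(states: List[List[float]], targets: List[str], chunk_size: int) -> List[Entry]:
    """Pair each decoder state with the next `chunk_size` target tokens (assumed layout)."""
    return [(states[i], targets[i:i + chunk_size]) for i in range(len(targets))]


def knn_search(datastore: List[Entry], query: Sequence[float], k: int) -> List[Tuple[float, List[str]]]:
    """Exhaustive k-NN search; a real system would use an approximate index instead."""
    scored = sorted(((sq_dist(key, query), chunk) for key, chunk in datastore), key=lambda p: p[0])
    return scored[:k]


def chunk_neighbours(datastore: List[Entry], query: Sequence[float], step: int,
                     cache: List[Tuple[float, List[str]]], interval: int, k: int) -> List[Tuple[float, str]]:
    """Return (distance, token) pairs for this decoding step.

    Hypothetical schedule: search only when step % interval == 0 (decoding is assumed to
    start at step 0); otherwise reuse the cached chunks. Assumes chunk_size >= interval.
    """
    if step % interval == 0:
        cache[:] = knn_search(datastore, query, k)  # the expensive datastore search
    offset = step % interval
    return [(dist, chunk[offset]) for dist, chunk in cache if offset < len(chunk)]


def knn_distribution(neighbours: List[Tuple[float, str]], temperature: float) -> Dict[str, float]:
    """p_kNN(y) proportional to the sum of exp(-d / T) over neighbours whose token is y."""
    weights: Dict[str, float] = {}
    for dist, token in neighbours:
        weights[token] = weights.get(token, 0.0) + math.exp(-dist / temperature)
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()} if total > 0 else {}


def interpolate(p_mt: Dict[str, float], p_knn: Dict[str, float], lam: float) -> Dict[str, float]:
    """lam * p_kNN + (1 - lam) * p_MT; falls back to p_MT when nothing was retrieved."""
    if not p_knn:
        return dict(p_mt)
    vocab = set(p_mt) | set(p_knn)
    return {y: lam * p_knn.get(y, 0.0) + (1 - lam) * p_mt.get(y, 0.0) for y in vocab}


if __name__ == "__main__":
    ds = build_datastore([[0.0], [1.0], [2.0], [3.0]], ["the", "cat", "sat", "down"], chunk_size=2)
    cache: List[Tuple[float, List[str]]] = []
    for t, query in enumerate([[0.1], [0.9]]):
        nbrs = chunk_neighbours(ds, query, t, cache, interval=2, k=2)
        print(t, [tok for _, tok in nbrs])
```

With `interval=2`, step 0 performs a search and step 1 reuses the cached chunks (its query is ignored), so the number of datastore searches drops roughly by a factor of `interval`. How much wall-clock speed-up that yields, and how quality is affected, depends on details the abstract does not spell out here.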
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
