C-NMT: A Collaborative Inference Framework for Neural Machine Translation

Yukai Chen; Roberta Chiaro; Enrico Macii; Massimo Poncino; Daniele Jahier Pagliari

arXiv:2204.04043·cs.LG·April 11, 2022

Open Access

TL;DR

This paper introduces C-NMT, a collaborative inference (CI) framework that reduces the latency of neural machine translation by up to 44% through edge-cloud cooperation. It does so by adapting existing CI methods to sequence generation tasks.

Contribution

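It adapts collaborative inference techniques to neural machine translation, addresses the problem of estimating the latency of generating an output sequence whose length is unknown in advance, and demonstrates a latency reduction of up to 44% compared to a non-collaborative approach.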

Findings

01 Latency reduced by up to 44% compared to a non-collaborative approach

02 Effective edge-cloud collaboration for NMT

03 Existing CI methods adapted to sequence-to-sequence tasks

Abstract

Collaborative Inference (CI) optimizes the latency and energy consumption of deep learning inference through the inter-operation of edge and cloud devices. Albeit beneficial for other tasks, CI has never been applied to the sequence-to-sequence mapping problem at the heart of Neural Machine Translation (NMT). In this work, we address the specific issues of collaborative NMT, such as estimating the latency required to generate the (unknown) output sequence, and show how existing CI methods can be adapted to these applications. Our experiments show that CI can reduce the latency of NMT by up to 44% compared to a non-collaborative approach.
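
To make the idea of collaborative NMT concrete, the following is a minimal sketch of an edge-versus-cloud decision, not the authors' method. It assumes a hypothetical linear latency model per device, a hypothetical fixed network round-trip time, and a guess that the unknown output length is proportional to the input length. All names (`Device`, `estimate_output_length`, `choose_device`) and all numbers are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical linear latency model (an assumption, not from the paper)."""
    encode_s_per_token: float  # seconds per input token
    decode_s_per_token: float  # seconds per generated output token

    def latency(self, n_in: int, n_out: float) -> float:
        return self.encode_s_per_token * n_in + self.decode_s_per_token * n_out


def estimate_output_length(n_in: int, ratio: float = 1.0) -> float:
    # Assumption: the output length is unknown before decoding, so it is
    # guessed as a fixed multiple of the input length.
    return ratio * n_in


def choose_device(n_in: int, edge: Device, cloud: Device,
                  round_trip_s: float, ratio: float = 1.0):
    """Return (device name, estimated latency in seconds) for the faster option."""
    n_out = estimate_output_length(n_in, ratio)
    edge_latency = edge.latency(n_in, n_out)
    # Offloading pays the network round trip on top of the cloud compute time.
    cloud_latency = round_trip_s + cloud.latency(n_in, n_out)
    if edge_latency <= cloud_latency:
        return "edge", edge_latency
    return "cloud", cloud_latency


# Illustrative numbers only: a slow edge device, a fast cloud, 100 ms round trip.
edge = Device(encode_s_per_token=0.002, decode_s_per_token=0.02)
cloud = Device(encode_s_per_token=0.0002, decode_s_per_token=0.002)

short_choice = choose_device(3, edge, cloud, round_trip_s=0.1)
long_choice = choose_device(20, edge, cloud, round_trip_s=0.1)
print(short_choice[0], long_choice[0])
```

Under these assumed numbers, a short sentence stays on the edge because the round trip dominates, while a longer sentence is offloaded because the cloud's faster decoding outweighs the network cost. A real system would need measured latency profiles and a better output-length estimate.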

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications