Dynamic Transformer for Efficient Machine Translation on Embedded Devices

Hishan Parry; Lei Xun; Amin Sabet; Jia Bi; Jonathon Hare; Geoff V. Merrett

arXiv:2107.08199 · cs.CL · August 3, 2021

TL;DR

This paper introduces Dynamic-HAT, a resource-aware Transformer model for machine translation on embedded devices, enabling real-time adaptation to hardware constraints with minimal performance loss.

Contribution

The paper presents a dynamic Transformer approach that switches between SubTransformers at run-time based on the resources currently available, trading translation quality against latency.
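
The run-time switching step can be pictured as a lookup over a design-time table of SubTransformer configurations. This is a minimal sketch, assuming each candidate carries a latency measured on the target device and a validation BLEU score from the design-time search; `SubTransformerConfig`, `select_subtransformer`, and all table values are hypothetical placeholders, not the paper's code or measurements.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SubTransformerConfig:
    """One design-time search result (hypothetical fields for illustration)."""
    name: str
    latency_ms: float  # latency assumed to be measured on the target device
    bleu: float        # validation BLEU assumed to come from the design-time search


def select_subtransformer(
    candidates: Sequence[SubTransformerConfig], latency_budget_ms: float
) -> Optional[SubTransformerConfig]:
    """Return the highest-BLEU config whose latency fits the budget, or None."""
    feasible = [c for c in candidates if c.latency_ms <= latency_budget_ms]
    if not feasible:
        return None  # no candidate meets the current latency constraint
    return max(feasible, key=lambda c: c.bleu)


# Placeholder table: these numbers are made up, not results from the paper.
table = [
    SubTransformerConfig("small", latency_ms=100.0, bleu=20.0),
    SubTransformerConfig("medium", latency_ms=200.0, bleu=24.0),
    SubTransformerConfig("large", latency_ms=400.0, bleu=26.0),
]

chosen = select_subtransformer(table, latency_budget_ms=250.0)
print(chosen.name if chosen else "no feasible SubTransformer")  # prints "medium"
```

Returning `None` leaves the fallback policy to the caller (for example, always keeping the smallest SubTransformer loaded); the paper's actual policy for tight budgets is not described in this summary.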

Findings

1. Switching between SubTransformers takes less than 1 second.

2. The BLEU score loss is under 1.5% when inherited SubTransformers are used without retraining.

3. Performance can be scaled across different hardware with a 1% BLEU score improvement.

Abstract

The Transformer architecture is widely used for machine translation tasks. However, its resource-intensive nature makes it challenging to implement on constrained embedded devices, particularly where available hardware resources can vary at run-time. We propose a dynamic machine translation model that scales the Transformer architecture based on the available resources at any particular time. The proposed approach, 'Dynamic-HAT', uses a HAT SuperTransformer as the backbone to search for SubTransformers with different accuracy-latency trade-offs at design time. The optimal SubTransformers are sampled from the SuperTransformer at run-time, depending on latency constraints. Dynamic-HAT is tested on the Jetson Nano, and the approach uses inherited SubTransformers sampled directly from the SuperTransformer with a switching time of <1 s. Using inherited SubTransformers results in a BLEU…
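
"Inherited" SubTransformers reuse SuperTransformer weights directly instead of being retrained. A minimal sketch of that idea, assuming HAT-style weight sharing in which a smaller SubTransformer takes the leading rows and columns of each SuperTransformer weight matrix and the first N layers; `inherit_linear_weight` and `inherit_layers` are hypothetical helpers, not code from the paper or the HAT repository.

```python
from typing import List

Matrix = List[List[float]]


def inherit_linear_weight(super_weight: Matrix, out_dim: int, in_dim: int) -> Matrix:
    """Slice the leading out_dim x in_dim block of a SuperTransformer weight.

    Assumes weight sharing where a SubTransformer reuses the front portion
    of each matrix, so no new parameters are trained after sampling.
    """
    if out_dim > len(super_weight) or in_dim > len(super_weight[0]):
        raise ValueError("SubTransformer dims exceed SuperTransformer dims")
    return [row[:in_dim] for row in super_weight[:out_dim]]


def inherit_layers(super_layers: list, num_layers: int) -> list:
    """Keep the first num_layers layers (elastic depth, assumed)."""
    if num_layers > len(super_layers):
        raise ValueError("SubTransformer depth exceeds SuperTransformer depth")
    return super_layers[:num_layers]


# Toy 3x4 "SuperTransformer" weight; a 2x3 SubTransformer reuses its top-left block.
super_w = [[float(r * 10 + c) for c in range(4)] for r in range(3)]
print(inherit_linear_weight(super_w, out_dim=2, in_dim=3))
# prints [[0.0, 1.0, 2.0], [10.0, 11.0, 12.0]]
```

Because sampling is only slicing, switching can be fast, but the sliced weights were never fine-tuned for that exact configuration, which is consistent with the small BLEU loss reported for inherited SubTransformers.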

Taxonomy

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Dense Connections · Softmax · Layer Normalization