Dynamic Transformer for Efficient Machine Translation on Embedded Devices
Hishan Parry, Lei Xun, Amin Sabet, Jia Bi, Jonathon Hare, Geoff V. Merrett

TL;DR
This paper introduces Dynamic-HAT, a resource-aware Transformer model for machine translation on embedded devices, enabling real-time adaptation to hardware constraints with minimal performance loss.
Contribution
It presents a dynamic Transformer approach that switches between sub-models (SubTransformers) at run time according to resource availability, trading off translation quality against latency.
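The run-time switching described above can be pictured as a lookup over SubTransformer candidates prepared at design time. The sketch below assumes each candidate's latency on the target device and its BLEU score were recorded in advance. At run time it picks the highest-BLEU candidate that fits the current latency budget.

The names `SubConfig` and `select_subtransformer` are hypothetical. The selection rule is an assumption for illustration, not the paper's exact policy, and the numbers are placeholders, not results from the paper.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class SubConfig:
    """One design-time SubTransformer candidate (hypothetical fields)."""
    name: str
    latency_ms: float  # latency measured on the target device at design time
    bleu: float        # translation quality of this candidate


def select_subtransformer(
    candidates: Sequence[SubConfig], budget_ms: float
) -> Optional[SubConfig]:
    """Return the highest-BLEU candidate whose latency fits the budget.

    Returns None when no candidate meets the constraint, leaving the
    fallback policy (e.g. keep the current model) to the caller.
    """
    feasible = [c for c in candidates if c.latency_ms <= budget_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.bleu)


# Placeholder candidates for illustration only; not measurements from the paper.
pareto = [
    SubConfig("small", latency_ms=120.0, bleu=24.0),
    SubConfig("medium", latency_ms=250.0, bleu=26.0),
    SubConfig("large", latency_ms=480.0, bleu=27.5),
]

# As other workloads claim the device, the budget shrinks and the choice adapts.
for budget in (500.0, 300.0, 100.0):
    choice = select_subtransformer(pareto, budget)
    print(budget, choice.name if choice else "none")
```

Because the candidates are searched offline, the run-time step is only a filter and a max over a short list. That is consistent with the cheap switching the summary describes.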
Findings
Switching time between SubTransformers is under 1 second.
BLEU score loss is under 1.5% when inherited SubTransformers are used without retraining.
Performance can be scaled across different hardware with a 1% BLEU score improvement.
Abstract
The Transformer architecture is widely used for machine translation tasks. However, its resource-intensive nature makes it challenging to implement on constrained embedded devices, particularly where available hardware resources can vary at run-time. We propose a dynamic machine translation model that scales the Transformer architecture based on the available resources at any particular time. The proposed approach, 'Dynamic-HAT', uses a HAT SuperTransformer as the backbone to search for SubTransformers with different accuracy-latency trade-offs at design time. The optimal SubTransformers are sampled from the SuperTransformer at run-time, depending on latency constraints. Dynamic-HAT is tested on the Jetson Nano, and the approach uses inherited SubTransformers sampled directly from the SuperTransformer with a switching time of <1s. Using inherited SubTransformers results in a BLEU…
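"Inherited" SubTransformers reuse SuperTransformer weights directly instead of being retrained. The sketch below assumes a HAT-style weight-sharing scheme: a smaller SubTransformer keeps the first layers of the SuperTransformer and the leading rows and columns of each weight matrix.

This is an assumption about the details, stated for illustration. Real SubTransformers may also vary dimensions per layer, whereas this toy uses one size for every layer. The helpers `inherit_linear` and `inherit_layers` are hypothetical names.

```python
from typing import List

Matrix = List[List[float]]  # stored as out_features rows of in_features columns


def inherit_linear(super_weight: Matrix, out_dim: int, in_dim: int) -> Matrix:
    """Take the leading out_dim x in_dim block of a SuperTransformer weight."""
    if out_dim > len(super_weight) or in_dim > len(super_weight[0]):
        raise ValueError("SubTransformer dimension exceeds the SuperTransformer")
    # Plain list slicing copies; a tensor library would usually return a view,
    # which is what makes switching cheap in practice.
    return [row[:in_dim] for row in super_weight[:out_dim]]


def inherit_layers(
    super_layers: List[Matrix], num_layers: int, out_dim: int, in_dim: int
) -> List[Matrix]:
    """Keep the first num_layers layers, each sliced to the sub dimensions."""
    if num_layers > len(super_layers):
        raise ValueError("SubTransformer depth exceeds the SuperTransformer")
    return [inherit_linear(w, out_dim, in_dim) for w in super_layers[:num_layers]]


# Toy SuperTransformer: 3 layers of 4x4 weights with recognisable values.
super_layers = [
    [[float(layer * 100 + r * 10 + c) for c in range(4)] for r in range(4)]
    for layer in range(3)
]

# A smaller SubTransformer: 2 layers, each weight cut down to 2x3.
sub_layers = inherit_layers(super_layers, num_layers=2, out_dim=2, in_dim=3)
print(sub_layers[1])  # → [[100.0, 101.0, 102.0], [110.0, 111.0, 112.0]]
```

No weights are trained or created in this step; they are only selected. This mirrors why inherited SubTransformers can be switched quickly, at the cost of some BLEU compared with retraining.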
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Dense Connections · Softmax · Layer Normalization
