HAT: Hardware-Aware Transformers for Efficient Natural Language Processing
Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan, Song Han

TL;DR
This paper introduces Hardware-Aware Transformers (HAT), a neural architecture search method that designs efficient, hardware-specific transformer models for NLP tasks, reducing inference latency and model size on resource-constrained devices.
Contribution
HAT employs neural architecture search with a large design space and evolutionary algorithms to create hardware-optimized transformer models for diverse hardware platforms.
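As a rough illustration of an evolutionary search over a design space under a hardware budget, here is a minimal sketch. It is not the authors' implementation: the option lists in `SPACE`, the `toy_latency` and `toy_quality` functions, and every hyperparameter are hypothetical stand-ins. In a real setting, latency would be measured or predicted for the target device, and quality would come from evaluating each candidate on validation data.

```python
import random

# Hypothetical search space; the option lists are illustrative only.
SPACE = {
    "num_layers": [1, 2, 3, 4, 5, 6],
    "embed_dim": [512, 640],
    "ffn_dim": [1024, 2048, 3072],
}

def toy_latency(arch):
    # Stand-in for measured or predicted on-device latency (arbitrary units).
    return arch["num_layers"] * (arch["embed_dim"] + arch["ffn_dim"]) / 1000.0

def toy_quality(arch):
    # Stand-in for validation quality; larger models simply score higher here.
    return (arch["num_layers"] * 2.0
            + arch["embed_dim"] / 256.0
            + arch["ffn_dim"] / 1024.0)

def random_arch(rng):
    return {name: rng.choice(options) for name, options in SPACE.items()}

def mutate(arch, rng, prob=0.3):
    # Resample each field independently with probability `prob`.
    return {name: (rng.choice(SPACE[name]) if rng.random() < prob else value)
            for name, value in arch.items()}

def crossover(a, b, rng):
    # Take each field from one of the two parents at random.
    return {name: (a[name] if rng.random() < 0.5 else b[name]) for name in a}

def sample_feasible(make, budget, max_tries=1000):
    # Rejection sampling: retry until a candidate meets the latency budget.
    for _ in range(max_tries):
        candidate = make()
        if toy_latency(candidate) <= budget:
            return candidate
    raise RuntimeError("no feasible candidate found within max_tries")

def evolutionary_search(budget, generations=20, population=20, parents=5, seed=0):
    rng = random.Random(seed)
    pop = [sample_feasible(lambda: random_arch(rng), budget)
           for _ in range(population)]
    for _ in range(generations):
        # Keep the best candidates, then refill with mutated and crossed children.
        top = sorted(pop, key=toy_quality, reverse=True)[:parents]
        children = []
        while len(children) < population - parents:
            if rng.random() < 0.5:
                make = lambda: mutate(rng.choice(top), rng)
            else:
                make = lambda: crossover(rng.choice(top), rng.choice(top), rng)
            children.append(sample_feasible(make, budget))
        pop = top + children
    return max(pop, key=toy_quality)

best = evolutionary_search(budget=8.0)
print(best, toy_latency(best))
```

Candidates that exceed the budget are rejected before they enter the population, so every architecture the search returns satisfies the constraint by construction. Changing `budget` is the hypothetical knob for targeting a slower or faster device.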
Findings
3x speedup over the baseline Transformer on Raspberry Pi-4 (WMT'14 translation)
3.7x smaller model size than the baseline Transformer
12,041x lower search cost than Evolved Transformer, with no performance loss
Abstract
Transformers are ubiquitous in Natural Language Processing (NLP) tasks, but they are difficult to be deployed on hardware due to the intensive computation. To enable low-latency inference on resource-constrained hardware platforms, we propose to design Hardware-Aware Transformers (HAT) with neural architecture search. We first construct a large design space with arbitrary encoder-decoder attention and heterogeneous layers. Then we train a SuperTransformer that covers all candidates in the design space, and efficiently produces many SubTransformers with weight sharing. Finally, we perform an evolutionary search with a hardware latency constraint to find a specialized SubTransformer dedicated to run fast on the target hardware. Extensive experiments on four machine translation tasks demonstrate that HAT can discover efficient models…
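One way to picture the weight sharing mentioned in the abstract is a smaller candidate that reuses a slice of the largest model's weight matrix rather than training its own. The sketch below assumes a leading (top-left) slice and uses plain Python lists. The function names and dimensions are hypothetical, and this is an illustration of the general idea rather than the paper's code.

```python
import random

def make_super_weight(out_dim, in_dim, seed=0):
    # Hypothetical shared weight matrix of the largest model, randomly filled.
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
            for _ in range(out_dim)]

def slice_weight(super_weight, out_dim, in_dim):
    """Return the top-left (out_dim x in_dim) block of the shared matrix.

    With Python lists this makes a copy; a tensor library would typically
    hand back a view, so no extra parameters are stored per candidate.
    """
    if out_dim > len(super_weight) or in_dim > len(super_weight[0]):
        raise ValueError("sub-network cannot exceed the shared matrix")
    return [row[:in_dim] for row in super_weight[:out_dim]]

def linear(weight, x):
    # Plain matrix-vector product: one output value per weight row.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]

super_w = make_super_weight(out_dim=8, in_dim=6)
sub_w = slice_weight(super_w, out_dim=4, in_dim=3)  # a smaller candidate layer
y = linear(sub_w, [1.0, 0.5, -0.5])
print(len(sub_w), len(sub_w[0]), len(y))
```

Because every candidate reads from the same matrix, many architectures can be evaluated without training each one from scratch, which is what keeps the search affordable.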
Videos
[ACL 2020] HAT: Hardware-Aware Transformers for Efficient Natural Language Processing · YouTube
Taxonomy
Topics: Advanced Neural Network Applications · Topic Modeling · Adversarial Robustness in Machine Learning
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Multi-Head Attention · Adam · Dropout · Byte Pair Encoding
