HEAT: Hardware-Efficient Automatic Tensor Decomposition for Transformer Compression
Jiaqi Gu, Ben Keller, Jean Kossaifi, Anima Anandkumar, Brucek Khailany, David Z. Pan

TL;DR
This paper introduces HEAT, a hardware-aware tensor decomposition framework that automates and optimizes transformer compression, significantly improving energy efficiency with minimal accuracy loss.
Contribution
HEAT provides an automated, hardware-aware tensor decomposition method for transformer compression that selects tensorization shapes and decomposition ranks to improve hardware efficiency.
Findings
Reduces energy-delay product by 5.7x
Less than 1.1% accuracy loss
Outperforms hand-tuned baselines
Abstract
Transformers have attained superior performance in natural language processing and computer vision. Their self-attention and feedforward layers are overparameterized, limiting inference speed and energy efficiency. Tensor decomposition is a promising technique to reduce parameter redundancy by leveraging tensor algebraic properties to express the parameters in a factorized form. Prior efforts used manual or heuristic factorization settings without hardware-aware customization, resulting in poor hardware efficiencies and large performance degradation. In this work, we propose a hardware-aware tensor decomposition framework, dubbed HEAT, that enables efficient exploration of the exponential space of possible decompositions and automates the choice of tensorization shape and decomposition rank with hardware-aware co-optimization. We jointly investigate tensor contraction path…
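To make the idea of expressing parameters in a factorized form concrete, the sketch below counts parameters for a dense weight matrix and for a plain two-factor low-rank factorization of it. This is a simplified stand-in, not HEAT's method: the function names, the 768 x 3072 layer shape, and the candidate ranks are illustrative assumptions, and the abstract's hardware-aware search over tensorization shapes and ranks is not modeled here.

```python
def dense_cost(m, n):
    """Parameters and per-token multiply-accumulates of a dense m x n layer."""
    return {"params": m * n, "macs": m * n}


def low_rank_cost(m, n, rank):
    """Same quantities when W (m x n) is stored as U (m x rank) times V (rank x n).

    Assumption: the input is multiplied by one factor and then the other,
    so the MAC count equals the parameter count.
    """
    params = rank * (m + n)
    return {"params": params, "macs": params}


def compression_table(m, n, ranks):
    """Return (rank, factorized params, compression ratio) for each candidate rank."""
    dense = dense_cost(m, n)
    rows = []
    for rank in ranks:
        factored = low_rank_cost(m, n, rank)
        rows.append((rank, factored["params"], dense["params"] / factored["params"]))
    return rows


if __name__ == "__main__":
    # Hypothetical feedforward layer shape and candidate ranks, for illustration only.
    for rank, params, ratio in compression_table(768, 3072, [32, 64, 128, 256]):
        print(f"rank={rank:4d}  params={params:8d}  compression={ratio:.2f}x")
```

A smaller rank shrinks the parameter count but generally costs accuracy, and fewer parameters do not by themselves guarantee lower latency or energy on real hardware, which is the gap the abstract attributes to prior manual and heuristic settings.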
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Tensor decomposition and applications · Advanced Neural Network Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Weight Decay · Dense Connections · WordPiece · Residual Connection · Linear Warmup With Linear Decay · Layer Normalization
