On-device AI: Quantization-aware Training of Transformers in Time-Series
Tianheng Ling, Gregor Schiele

TL;DR
This paper explores quantization-aware training to optimize Transformer models for time-series forecasting, enabling efficient deployment on resource-constrained embedded FPGA hardware.
Contribution
The paper proposes a quantization-aware training approach tailored to Transformer models for time-series forecasting, aimed at more efficient deployment on embedded devices.
Findings
Reduced model size and runtime memory footprint.
Maintained forecasting accuracy with quantization.
Facilitated efficient FPGA deployment.
Abstract
Artificial Intelligence (AI) models for time-series in pervasive computing keep getting larger and more complicated. The Transformer model is by far the most compelling of these AI models. However, it is difficult to obtain the desired performance when deploying such a massive model on a sensor device with limited resources. My research focuses on optimizing the Transformer model for time-series forecasting tasks. The optimized model will be deployed as hardware accelerators on embedded Field Programmable Gate Arrays (FPGAs). I will investigate the impact of applying Quantization-aware Training to the Transformer model to reduce its size and runtime memory footprint while maximizing the advantages of FPGAs.
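As a rough illustration of what quantization-aware training involves, the sketch below trains a single weight while the forward pass sees only its fake-quantized (quantize, then dequantize) value, and the gradient is passed straight through to the float weight (the straight-through estimator). This is not the authors' implementation: the toy model y = w * x, the fixed scale of 0.1, the 4-bit width, and the names fake_quantize and train_qat are all assumptions chosen for brevity.

```python
def fake_quantize(x, scale, bits=8):
    """Snap x to a signed `bits`-bit integer grid, then map back to float."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = max(qmin, min(qmax, round(x / scale)))  # quantize and clamp
    return q * scale                            # dequantize


def train_qat(data, scale=0.1, bits=4, lr=0.05, epochs=200):
    """Fit y = w * x by gradient descent on the mean squared error.

    The loss is evaluated with the fake-quantized weight wq. The
    straight-through estimator treats d(wq)/d(w) as 1, so the gradient
    taken at wq is applied directly to the float "shadow" weight w.
    """
    w = 0.0
    for _ in range(epochs):
        wq = fake_quantize(w, scale, bits)
        grad = sum(2 * (wq * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w, fake_quantize(w, scale, bits)


# Hypothetical toy data generated from a target weight of 0.33,
# which does not lie on the 0.1-spaced quantization grid.
data = [(x, 0.33 * x) for x in (0.5, 1.0, 1.5, 2.0)]
w_float, w_quant = train_qat(data)
print(f"float weight: {w_float:.4f}, deployed quantized weight: {w_quant:.1f}")
```

Only the quantized weight would be stored on the device, which is where the savings in size and memory come from: a 4-bit integer plus a shared scale instead of a 32-bit float per weight. Because the target here falls between two grid points, the quantized weight may alternate between the two neighbouring levels during training, a simplification a real setup would need to handle.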
Taxonomy
Methods: Linear Layer · Adam · Layer Normalization · Attention Is All You Need · Position-Wise Feed-Forward Layer · Dense Connections · Residual Connection · Multi-Head Attention · Byte Pair Encoding · Absolute Position Encodings
