On-device AI: Quantization-aware Training of Transformers in Time-Series

Tianheng Ling; Gregor Schiele

arXiv:2408.16495·cs.LG·August 30, 2024

TL;DR

This paper explores quantization-aware training to optimize Transformer models for time-series forecasting, enabling efficient deployment on resource-constrained embedded FPGA hardware.

Contribution

It introduces a quantization-aware training approach specifically tailored for Transformer models in time-series forecasting tasks, enhancing deployment efficiency on embedded devices.

Findings

1. Reduced model size and runtime memory footprint.

2. Maintained forecasting accuracy with quantization.

3. Facilitated efficient FPGA deployment.

Abstract

Artificial Intelligence (AI) models for time-series in pervasive computing keep getting larger and more complicated. The Transformer model is by far the most compelling of these AI models. However, it is difficult to obtain the desired performance when deploying such a massive model on a sensor device with limited resources. My research focuses on optimizing the Transformer model for time-series forecasting tasks. The optimized model will be deployed as hardware accelerators on embedded Field Programmable Gate Arrays (FPGAs). I will investigate the impact of applying Quantization-aware Training to the Transformer model to reduce its size and runtime memory footprint while maximizing the advantages of FPGAs.
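The abstract does not specify which quantization scheme is used, so the following is only a minimal sketch of the quantize-then-dequantize ("fake quantization") step that Quantization-aware Training typically inserts into the forward pass. It assumes symmetric, per-tensor quantization to signed integers; the function name `fake_quantize`, the 8-bit default, and the sample weights are hypothetical and chosen for illustration. In an actual training setup the rounding step would be paired with a gradient approximation such as the straight-through estimator, which is omitted here.

```python
def fake_quantize(values, num_bits=8):
    """Quantize floats to signed integers and map them back.

    Assumption: symmetric, per-tensor scaling (not taken from the paper).
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer level and clamp to the representable range.
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    # Map back to floats so later layers see the quantization error during training.
    dequantized = [q * scale for q in quantized]
    return quantized, dequantized, scale


# Hypothetical weights, used only to exercise the function.
weights = [0.42, -1.27, 0.05, 0.9]
quantized, dequantized, scale = fake_quantize(weights, num_bits=8)

# Worst-case rounding error is bounded by half a quantization step.
max_error = max(abs(w - d) for w, d in zip(weights, dequantized))

# Storage ratio when 32-bit floats are replaced by 8-bit integers.
size_ratio = 32 / 8
```

Because the model sees the rounding error while it trains, it can adapt its weights to the integer grid, which is the usual argument for Quantization-aware Training over quantizing only after training. The integer values are what a fixed-point accelerator on an FPGA would store and compute with.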

Taxonomy

Methods: Linear Layer · Adam · Layer Normalization · Attention Is All You Need · Position-Wise Feed-Forward Layer · Dense Connections · Residual Connection · Multi-Head Attention · Byte Pair Encoding · Absolute Position Encodings