A Study of Quantisation-aware Training on Time Series Transformer Models for Resource-constrained FPGAs
Tianheng Ling, Chao Qian, Lukas Einhaus, Gregor Schiele

TL;DR
This paper introduces an adaptive quantisation-aware training method for time series Transformer models. The method dynamically selects a quantisation scheme based on the data distribution, reducing computational overhead on FPGAs while maintaining acceptable precision.
Contribution
It proposes a novel adaptive quantisation scheme that switches between symmetric and asymmetric quantisation during training, tailored to data distribution, for efficient FPGA deployment.
Findings
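Matching the quantisation scheme to the data distribution reduces computational overhead while maintaining acceptable precision
Performance remains robust on real-world data and under mixed-precision quantisation
In the mixed-precision setting, most objects are quantised to 4 bits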
Abstract
This study explores quantisation-aware training (QAT) of time series Transformer models. We propose a novel adaptive quantisation scheme that dynamically selects between symmetric and asymmetric schemes during the QAT phase. Our approach demonstrates that matching the quantisation scheme to the real data distribution can reduce computational overhead while maintaining acceptable precision. Moreover, our approach is robust when applied to real-world data and mixed-precision quantisation, where most objects are quantised to 4 bits. Our findings inform model quantisation and deployment decisions while providing a foundation for advancing quantisation techniques.
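To make the distinction between the two schemes concrete, the following is a minimal sketch in plain Python of symmetric and asymmetric uniform quantisation at a given bit width, together with a toy rule for choosing between them. The selection rule (`choose_params` and its `skew_threshold`) is a hypothetical illustration and is not the criterion used in the paper, which this summary does not state. The names `QParams`, `symmetric_params`, `asymmetric_params`, `quantise`, and `dequantise` are likewise assumptions made for this example. The sketch covers only the quantise/dequantise step and omits the training loop that QAT would wrap around it.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class QParams:
    scheme: str      # "symmetric" or "asymmetric"
    scale: float     # real-valued step size between adjacent integer levels
    zero_point: int  # integer level that represents the real value 0.0
    qmin: int
    qmax: int


def symmetric_params(x: Sequence[float], bits: int) -> QParams:
    # Signed integer grid centred on zero, e.g. [-7, 7] for 4 bits.
    # The zero point is fixed at 0, so no offset arithmetic is needed.
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in x)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return QParams("symmetric", scale, 0, -qmax, qmax)


def asymmetric_params(x: Sequence[float], bits: int) -> QParams:
    # Unsigned integer grid, e.g. [0, 15] for 4 bits. The zero point shifts
    # the grid so the observed [min, max] range uses every available level.
    qmin, qmax = 0, 2 ** bits - 1
    lo, hi = min(min(x), 0.0), max(max(x), 0.0)
    scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
    zero_point = max(qmin, min(qmax, round(qmin - lo / scale)))
    return QParams("asymmetric", scale, zero_point, qmin, qmax)


def choose_params(x: Sequence[float], bits: int,
                  skew_threshold: float = 0.5) -> QParams:
    # HYPOTHETICAL selection rule (not taken from the paper): measure how far
    # the value range is from being centred on zero. skew is 0 for a range
    # such as [-a, a] and 1 for a one-sided range such as [0, a].
    lo, hi = min(min(x), 0.0), max(max(x), 0.0)
    span = hi - lo
    skew = abs(hi + lo) / span if span > 0 else 0.0
    if skew > skew_threshold:
        return asymmetric_params(x, bits)
    return symmetric_params(x, bits)


def quantise(x: Sequence[float], p: QParams) -> List[int]:
    # Round to the nearest integer level, add the offset, clamp to the grid.
    return [max(p.qmin, min(p.qmax, round(v / p.scale) + p.zero_point))
            for v in x]


def dequantise(q: Sequence[int], p: QParams) -> List[float]:
    # Map integer levels back to approximate real values.
    return [(v - p.zero_point) * p.scale for v in q]
```

Under this toy rule, a tensor whose values are roughly centred on zero gets the symmetric scheme, which avoids zero-point arithmetic. A one-sided tensor (for example, non-negative activations) gets the asymmetric scheme, which does not waste half of the integer levels on values that never occur.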
Taxonomy
Topics: Neural Networks and Applications · CCD and CMOS Imaging Sensors · Neural Networks and Reservoir Computing
Methods: Multi-Head Attention · Dense Connections · Linear Layer · Label Smoothing · Absolute Position Encodings · Attention Is All You Need · Adam · Residual Connection · Layer Normalization · Softmax
