A Study of Quantisation-aware Training on Time Series Transformer Models for Resource-constrained FPGAs

Tianheng Ling; Chao Qian; Lukas Einhaus; Gregor Schiele

arXiv:2310.02654 · cs.LG · October 5, 2023 · 1 citation

TL;DR

This paper introduces an adaptive quantisation-aware training method for time series Transformer models. It selects the quantisation scheme dynamically based on the data distribution, reducing resource use for deployment on FPGAs while maintaining acceptable precision and robustness.

Contribution

The paper proposes a novel adaptive quantisation scheme that, during training, selects between symmetric and asymmetric quantisation according to the data distribution, with the goal of efficient FPGA deployment.
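The idea of choosing between symmetric and asymmetric quantisation per tensor can be illustrated with a small NumPy sketch. The selection rule in `choose_scheme` (a range-imbalance test with a threshold of 0.5) is a hypothetical stand-in, since the exact criterion is not stated here; per-tensor min/max calibration and all function names are likewise assumptions. `fake_quantise` shows only the quantise-then-dequantise forward pass used in QAT; the gradient path (commonly a straight-through estimator) is omitted.

```python
import numpy as np

def quant_params(x, bits, scheme):
    """Scale, zero point and integer range for one tensor (per-tensor min/max calibration)."""
    if scheme == "symmetric":
        qmax = 2 ** (bits - 1) - 1             # 7 for 4 bits
        qmin = -qmax                           # restricted signed range; zero point is always 0
        max_abs = float(np.max(np.abs(x)))
        scale = max_abs / qmax if max_abs > 0 else 1.0
        return scale, 0, qmin, qmax
    if scheme == "asymmetric":
        qmin, qmax = 0, 2 ** bits - 1          # 0..15 for 4 bits
        lo = min(float(np.min(x)), 0.0)        # widen the range so 0.0 stays representable
        hi = max(float(np.max(x)), 0.0)
        scale = (hi - lo) / (qmax - qmin) if hi > lo else 1.0
        zero_point = int(np.clip(round(qmin - lo / scale), qmin, qmax))
        return scale, zero_point, qmin, qmax
    raise ValueError(f"unknown scheme: {scheme}")

def fake_quantise(x, bits, scheme):
    """Quantise then dequantise: the forward pass of a QAT fake-quantisation node."""
    scale, zero_point, qmin, qmax = quant_params(x, bits, scheme)
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale

def choose_scheme(x, imbalance_threshold=0.5):
    """Hypothetical rule: symmetric if the range is roughly centred on 0, else asymmetric."""
    lo = min(float(np.min(x)), 0.0)
    hi = max(float(np.max(x)), 0.0)
    if hi == lo:
        return "symmetric"
    imbalance = abs(hi + lo) / (hi - lo)       # 0 = centred on 0, 1 = one-sided (e.g. after ReLU)
    return "asymmetric" if imbalance > imbalance_threshold else "symmetric"

x_relu = np.linspace(0.0, 1.0, 101)            # one-sided, like post-ReLU activations
x_centred = np.linspace(-1.0, 1.0, 101)        # zero-centred, like many weight tensors
for x in (x_relu, x_centred):
    scheme = choose_scheme(x)
    mse = float(np.mean((fake_quantise(x, 4, scheme) - x) ** 2))
    print(scheme, mse)
```

At 4 bits, a one-sided input uses only the non-negative half of the symmetric grid (8 of its 15 levels), which is why the asymmetric scheme fits it better. For a zero-centred input, the symmetric scheme loses little and keeps the zero point at 0.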

Findings

01

Reduces computational overhead while maintaining acceptable precision

02

Robust performance on real-world data and under mixed-precision quantisation

03

Remains robust in the mixed-precision setting, where most objects are quantised to 4 bits

Abstract

This study explores quantisation-aware training (QAT) on time series Transformer models. We propose a novel adaptive quantisation scheme that dynamically selects between symmetric and asymmetric schemes during the QAT phase. Our approach demonstrates that matching the quantisation scheme to the real data distribution can reduce computational overhead while maintaining acceptable precision. Moreover, our approach is robust when applied to real-world data and mixed-precision quantisation, where most objects are quantised to 4 bits. Our findings inform model quantisation and deployment decisions while providing a foundation for advancing quantisation techniques.
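One standard way to see how the choice of scheme affects computational overhead is to expand an integer dot product of quantised operands. This is the usual integer-arithmetic derivation, not code from the paper, and whether the paper's FPGA design computes it this way is an assumption. With x = s_x(q_x - z_x) and w = s_w(q_w - z_w), the sum over n elements of (q_x - z_x)(q_w - z_w) equals sum(q_x q_w) - z_w sum(q_x) - z_x sum(q_w) + n z_x z_w. Under symmetric quantisation both zero points are 0 and only the first term survives; asymmetric quantisation adds correction terms. The function names below are hypothetical.

```python
import numpy as np

def int_dot(qx, qw, zx, zw):
    """Integer dot product of two quantised vectors, expanded into four terms:

    sum((qx - zx) * (qw - zw))
      = sum(qx * qw) - zw * sum(qx) - zx * sum(qw) + n * zx * zw
    """
    qx = np.asarray(qx, dtype=np.int64)
    qw = np.asarray(qw, dtype=np.int64)
    n = qx.size
    acc = int(np.dot(qx, qw))          # main multiply-accumulate, needed in every scheme
    if zw != 0:
        acc -= zw * int(qx.sum())      # depends on the input codes, so it is recomputed per input
    if zx != 0:
        acc -= zx * int(qw.sum())      # depends only on the weights (for a fixed zx), so it can be precomputed
    acc += n * zx * zw                 # constant offset; zero if either zero point is 0
    return acc

def dequantised_dot(qx, qw, sx, sw, zx=0, zw=0):
    """Real-valued result: a single rescale by sx * sw after integer accumulation."""
    return sx * sw * int_dot(qx, qw, zx, zw)

rng = np.random.default_rng(0)
qx = rng.integers(0, 16, size=8)       # 4-bit unsigned codes (asymmetric)
qw = rng.integers(0, 16, size=8)
print(int_dot(qx, qw, 3, 8), int(np.dot(qx - 3, qw - 8)))  # both sides of the identity
```

The symmetric case reduces to the plain `np.dot` of the codes, so no per-input correction pass over `qx` is needed.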

Taxonomy

Topics: Neural Networks and Applications · CCD and CMOS Imaging Sensors · Neural Networks and Reservoir Computing

Methods: Multi-Head Attention · Dense Connections · Linear Layer · Label Smoothing · Absolute Position Encodings · Attention Is All You Need · Adam · Residual Connection · Layer Normalization · Softmax