Sparse Binary Transformers for Multivariate Time Series Modeling
Matt Gorbett, Hossein Shirazi, Indrakshi Ray

TL;DR
This paper demonstrates that sparse binary Transformers can handle multivariate time series classification, anomaly detection, and forecasting with accuracy comparable to dense Transformers, while reducing storage size by up to 53x and FLOPs by up to 10.5x.
Contribution
The work introduces sparse binary-weighted Transformers for multivariate time series, achieving comparable accuracy to dense models while substantially reducing computational and storage requirements.
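To make "sparse binary-weighted" concrete, here is a minimal sketch of one common way to build such a layer: keep only the largest-magnitude weights, then replace each kept weight with its sign times a single shared scale. This is an illustrative assumption, not the authors' procedure; the helper names `sparse_binarize` and `sparse_binary_linear`, the magnitude-based selection, and the mean-magnitude scale `alpha` are all hypothetical choices made for this example.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def sparse_binarize(weights: Matrix, keep_fraction: float) -> Tuple[List[List[int]], List[List[int]], float]:
    """Return (mask, signs, alpha) for a dense weight matrix.

    Assumption for illustration: weights are selected by magnitude and share
    one scale factor. The paper's actual selection rule may differ.
    """
    magnitudes = sorted((abs(w) for row in weights for w in row), reverse=True)
    n_keep = max(1, round(len(magnitudes) * keep_fraction))
    threshold = magnitudes[n_keep - 1]
    # Ties at the threshold are all kept, so slightly more than n_keep may survive.
    mask = [[1 if abs(w) >= threshold else 0 for w in row] for row in weights]
    signs = [[1 if w >= 0 else -1 for w in row] for row in weights]
    kept = [abs(w) for row, mrow in zip(weights, mask) for w, m in zip(row, mrow) if m]
    alpha = sum(kept) / len(kept)  # shared scale: mean magnitude of kept weights
    return mask, signs, alpha


def sparse_binary_linear(x: List[float], mask: List[List[int]], signs: List[List[int]], alpha: float) -> List[float]:
    """Compute y = alpha * ((mask * signs) @ x) using only additions and subtractions."""
    out = []
    for mrow, srow in zip(mask, signs):
        acc = 0.0
        for xi, m, s in zip(x, mrow, srow):
            if m:  # pruned weights contribute nothing
                acc += xi if s > 0 else -xi
        out.append(alpha * acc)
    return out


weights = [[0.9, -0.1], [-0.7, 0.2]]
mask, signs, alpha = sparse_binarize(weights, keep_fraction=0.5)
y = sparse_binary_linear([2.0, 3.0], mask, signs, alpha)
```

In this sketch the inner loop never multiplies by a weight: each kept weight either adds or subtracts its input, and the single multiplication by `alpha` happens once per output. That is the general intuition behind the compute savings of binary weights, though the figures reported here come from the paper's own measurements.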
Findings
Achieved up to 53x reduction in storage size.
Reduced FLOPs by up to 10.5x.
Maintained accuracy comparable to dense Transformers.
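As a rough guide to where storage savings of this kind can come from, the following back-of-the-envelope sketch compares 32-bit floating-point weights against 1-bit weights with pruning. It is not the paper's accounting: the function names, the `keep_fraction` parameter, and the optional per-position mask cost `mask_bits_per_weight` are assumptions introduced for illustration, and the reported 53x figure depends on details not given in this summary.

```python
def dense_bits(n_weights: int, bits_per_weight: int = 32) -> int:
    """Storage for a dense floating-point layer, in bits."""
    return n_weights * bits_per_weight


def sparse_binary_bits(n_weights: int, keep_fraction: float, mask_bits_per_weight: int = 0) -> int:
    """Storage for a pruned binary layer, in bits.

    Each kept weight costs 1 bit (its sign). mask_bits_per_weight models the
    optional cost of recording which positions were kept (hypothetical knob).
    """
    kept = round(n_weights * keep_fraction)
    return kept + n_weights * mask_bits_per_weight


def compression_ratio(n_weights: int, keep_fraction: float, mask_bits_per_weight: int = 0) -> float:
    """Dense storage divided by sparse binary storage."""
    return dense_bits(n_weights) / sparse_binary_bits(n_weights, keep_fraction, mask_bits_per_weight)


no_pruning = compression_ratio(1000, keep_fraction=1.0)      # binarization alone
half_pruned = compression_ratio(1000, keep_fraction=0.5)     # binarization plus 50% pruning
with_bitmap = compression_ratio(1000, keep_fraction=0.5, mask_bits_per_weight=1)
```

Under these assumptions, binarization alone gives a 32x ratio, and pruning pushes it higher only if the positions of the kept weights can be stored cheaply, which is why the mask cost is exposed as a parameter.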
Abstract
Compressed Neural Networks have the potential to enable deep learning across new applications and smaller computational environments. However, understanding the range of learning tasks in which such models can succeed is not well studied. In this work, we apply sparse and binary-weighted Transformers to multivariate time series problems, showing that the lightweight models achieve accuracy comparable to that of dense floating-point Transformers of the same structure. Our model achieves favorable results across three time series learning tasks: classification, anomaly detection, and single-step forecasting. Additionally, to reduce the computational complexity of the attention mechanism, we apply two modifications, which show little to no decline in model performance: 1) in the classification task, we apply a fixed mask to the query, key, and value activations, and 2) for forecasting and…
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Time Series Analysis and Forecasting · Stock Market Forecasting Methods
Methods: Multi-Head Attention · Linear Layer · Softmax · Layer Normalization · Label Smoothing · Adam · Residual Connection · Dense Connections · Dropout · Absolute Position Encodings
