Sparse-VQ Transformer: An FFN-Free Framework with Vector Quantization for Enhanced Time Series Forecasting
Yanjun Zhao, Tian Zhou, Chao Chen, Liang Sun, Yi Qian, Rong Jin

TL;DR
Sparse-VQ introduces an FFN-free transformer framework with vector quantization and RevIN, effectively reducing noise and improving accuracy in time series forecasting, while also being computationally efficient.
Contribution
It proposes a novel FFN-free transformer architecture using sparse vector quantization and RevIN, addressing noise and distribution shifts in time series forecasting.
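To make the vector quantization idea concrete, here is a minimal NumPy sketch of plain nearest-codeword quantization. It is an illustration only: the function name `quantize`, the toy codebook, and the use of squared Euclidean distance are assumptions, and the paper's sparse variant and the way the codebook is learned are not reproduced here.

```python
import numpy as np

def quantize(x, codebook):
    """Map each row of x (n, d) to its nearest codeword in codebook (k, d).

    Hypothetical helper for illustration; not the authors' implementation.
    """
    # Squared Euclidean distance between every input vector and every codeword.
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    # Index of the closest codeword for each input vector.
    idx = dists.argmin(axis=1)
    return codebook[idx], idx

# Toy codebook with three 2-dimensional codewords (assumed values).
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
# Noisy inputs, each lying close to one of the codewords.
x = np.array([[0.1, -0.2], [0.9, 1.2], [3.5, 4.1]])

xq, idx = quantize(x, codebook)
```

Snapping each noisy vector to a codeword discards small perturbations, which is the intuition behind using quantization to reduce the impact of noise.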
Findings
Outperforms leading models, reducing MAE by 7.84% and 4.17% for univariate and multivariate forecasting, respectively.
Reduces model parameter count and overfitting.
Can enhance existing transformer models.
Abstract
Time series analysis is vital for numerous applications, and transformers have become increasingly prominent in this domain. Leading methods customize the transformer architecture from NLP and CV, utilizing a patching technique to convert continuous signals into segments. Yet, time series data are uniquely challenging due to significant distribution shifts and intrinsic noise levels. To address these two challenges, we introduce the Sparse Vector Quantized FFN-Free Transformer (Sparse-VQ). Our methodology capitalizes on a sparse vector quantization technique coupled with Reverse Instance Normalization (RevIN) to reduce noise impact and capture sufficient statistics for forecasting, serving as an alternative to the Feed-Forward layer (FFN) in the transformer architecture. Our FFN-free approach trims the parameter count, enhancing computational efficiency and reducing overfitting. Through…
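The RevIN step mentioned in the abstract can be sketched as a normalize/denormalize pair: each input window is standardized with its own statistics before the model, and the same statistics are restored on the output. The sketch below is a simplified version under stated assumptions: the function names are hypothetical, the learnable affine parameters usually included in RevIN are omitted, and inputs are assumed to have shape (batch, length).

```python
import numpy as np

def revin_normalize(x, eps=1e-5):
    """Standardize each series in x (batch, length) with its own mean and std.

    Simplified sketch: no learnable affine scale or shift.
    """
    mean = x.mean(axis=-1, keepdims=True)
    std = np.sqrt(x.var(axis=-1, keepdims=True) + eps)
    return (x - mean) / std, (mean, std)

def revin_denormalize(y, stats):
    """Restore the original scale using the statistics saved at normalization."""
    mean, std = stats
    return y * std + mean

# Two series on very different scales (assumed toy data).
x = np.array([[10.0, 12.0, 14.0, 16.0],
              [100.0, 90.0, 110.0, 100.0]])

x_norm, stats = revin_normalize(x)
# A real model would map x_norm to a forecast; here the identity stands in for it.
x_back = revin_denormalize(x_norm, stats)
```

Because the statistics are computed per instance, series whose level or scale shifts over time are presented to the model in a comparable range, which is how this kind of normalization targets distribution shift.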
Taxonomy
Topics: Neural Networks and Applications · Image and Signal Denoising Methods · Complex Systems and Time Series Analysis
Methods: Attention Is All You Need · Activation Patching · Residual Connection · Dropout · Layer Normalization · Dense Connections · Position-Wise Feed-Forward Layer · Label Smoothing · Softmax · Absolute Position Encodings
