Sparse-VQ Transformer: An FFN-Free Framework with Vector Quantization for Enhanced Time Series Forecasting

Yanjun Zhao; Tian Zhou; Chao Chen; Liang Sun; Yi Qian; Rong Jin

arXiv:2402.05830 · cs.LG · February 9, 2024 · 1 citation

TL;DR

Sparse-VQ introduces an FFN-free transformer framework with vector quantization and RevIN, effectively reducing noise and improving accuracy in time series forecasting, while also being computationally efficient.

Contribution

It proposes a novel FFN-free transformer architecture using sparse vector quantization and RevIN, addressing noise and distribution shifts in time series forecasting.

Findings

01

Outperforms leading models, reducing MAE by 7.84% in univariate and 4.17% in multivariate forecasting.

02

Removing the FFN reduces the parameter count and mitigates overfitting.

03

Can be integrated into existing transformer models to improve their performance.
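
To make the parameter-count finding concrete, the sketch below compares a standard position-wise feed-forward block with a vector-quantization codebook. All sizes (`d_model`, `d_ff`, `codebook_size`) are hypothetical values chosen for illustration and are not taken from the paper; the actual Sparse-VQ layer may contain further parameters.

```python
def ffn_param_count(d_model: int, d_ff: int) -> int:
    """Parameters of a standard position-wise FFN: two linear layers with biases."""
    first_layer = d_model * d_ff + d_ff      # weights + bias, d_model -> d_ff
    second_layer = d_ff * d_model + d_model  # weights + bias, d_ff -> d_model
    return first_layer + second_layer


def codebook_param_count(codebook_size: int, d_model: int) -> int:
    """Parameters of a plain VQ codebook: one d_model-dimensional vector per entry."""
    return codebook_size * d_model


# Hypothetical sizes, for illustration only (not the paper's configuration).
d_model, d_ff, codebook_size = 128, 512, 256

ffn_params = ffn_param_count(d_model, d_ff)
vq_params = codebook_param_count(codebook_size, d_model)
print(f"FFN parameters per layer:      {ffn_params}")
print(f"Codebook parameters per layer: {vq_params}")
```

Under these assumed sizes the codebook holds roughly a quarter of the parameters of the feed-forward block it stands in for. The ratio depends entirely on the chosen codebook size and hidden width.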

Abstract

Time series analysis is vital for numerous applications, and transformers have become increasingly prominent in this domain. Leading methods customize the transformer architecture from NLP and CV, utilizing a patching technique to convert continuous signals into segments. Yet, time series data are uniquely challenging due to significant distribution shifts and intrinsic noise levels. To address these two challenges, we introduce the Sparse Vector Quantized FFN-Free Transformer (Sparse-VQ). Our methodology capitalizes on a sparse vector quantization technique coupled with Reverse Instance Normalization (RevIN) to reduce noise impact and capture sufficient statistics for forecasting, serving as an alternative to the Feed-Forward layer (FFN) in the transformer architecture. Our FFN-free approach trims the parameter count, enhancing computational efficiency and reducing overfitting. Through…
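
The two building blocks named in the abstract can be illustrated with a minimal sketch, using only the Python standard library. It shows per-instance normalization with stored statistics (the core idea behind RevIN, here without the learnable affine parameters) and a plain nearest-neighbor codebook lookup. The abstract does not specify the sparse variant of the quantizer, so this is ordinary vector quantization rather than the paper's method, and all function names and values are hypothetical.

```python
import math


def revin_normalize(series, eps=1e-5):
    """Normalize one series instance by its own mean and standard deviation.

    Returns the normalized values plus the statistics needed to undo the step.
    """
    mean = sum(series) / len(series)
    variance = sum((x - mean) ** 2 for x in series) / len(series)
    std = math.sqrt(variance + eps)
    return [(x - mean) / std for x in series], (mean, std)


def revin_denormalize(values, stats):
    """Map values back to the original scale using the stored statistics."""
    mean, std = stats
    return [v * std + mean for v in values]


def quantize(vector, codebook):
    """Return the index and entry of the nearest codebook vector (squared Euclidean)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: squared_distance(vector, codebook[i]))
    return index, codebook[index]


# Toy input window and a hypothetical three-entry codebook.
window = [10.0, 12.0, 11.0, 13.0]
normalized, stats = revin_normalize(window)
restored = revin_denormalize(normalized, stats)

codebook = [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
index, entry = quantize([0.9, 1.2], codebook)
```

In a forecasting model the normalization would be applied to the input window and the inverse step to the model's predictions, so that shifts in level and scale are handled outside the network. Snapping a noisy embedding to its nearest codebook entry discards small perturbations, which is the noise-reduction intuition the abstract describes.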

Taxonomy

Topics: Neural Networks and Applications · Image and Signal Denoising Methods · Complex Systems and Time Series Analysis

Methods: Attention Is All You Need · Activation Patching · Residual Connection · Dropout · Layer Normalization · Dense Connections · Position-Wise Feed-Forward Layer · Label Smoothing · Softmax · Absolute Position Encodings