Persistence Initialization: A novel adaptation of the Transformer architecture for Time Series Forecasting

Espen Haugsdal; Erlend Aune; Massimiliano Ruocco

arXiv:2208.14236 · cs.LG · August 31, 2022 · 1 citation

TL;DR

This paper introduces Persistence Initialization, a novel adaptation of the Transformer architecture for time series forecasting, which improves performance and convergence speed by initializing models as naive persistence models.
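A naive persistence model, the starting point described above, forecasts every future value as the last observed value. The following is a minimal Python sketch of that baseline; the function name `persistence_forecast` and the example series are hypothetical and not taken from the paper.

```python
from typing import List


def persistence_forecast(history: List[float], horizon: int) -> List[float]:
    """Naive persistence forecast: repeat the last observed value for every future step."""
    if not history:
        raise ValueError("history must contain at least one observation")
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    return [history[-1]] * horizon


series = [112.0, 118.0, 132.0, 129.0, 121.0]
print(persistence_forecast(series, horizon=3))  # [121.0, 121.0, 121.0]
```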

Contribution

The paper proposes Persistence Initialization, an adaptation of Transformer models for time series forecasting that initializes the model as a naive persistence model, and reports better forecasting performance and faster convergence during training.

Findings

1. Achieves competitive results on the M4 dataset.
2. Outperforms existing Transformer models for time series forecasting.
3. Gains accuracy as model size increases; the choice of normalization (ReZero) and positional encoding (Rotary) also affects performance.
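The abstract names the normalization and encoding choices used in the model: ReZero normalization and Rotary positional encodings. The sketch below illustrates both ideas in plain Python, assuming their standard formulations: ReZero replaces LayerNorm in a residual block with x + alpha * F(x), where the learnable scalar alpha starts at 0, and Rotary encodings rotate each pair of query/key dimensions by an angle proportional to the token position, here using the commonly chosen base of 10000. The function names are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import Callable, List

Vector = List[float]


def rezero_residual(x: Vector, sublayer: Callable[[Vector], Vector], alpha: float) -> Vector:
    """ReZero residual connection: x + alpha * F(x), with no LayerNorm.

    In a trained model `alpha` is a learnable scalar per layer, initialised to 0.
    """
    fx = sublayer(x)
    return [xi + alpha * fi for xi, fi in zip(x, fx)]


def rotary_encode(vec: Vector, position: int, base: float = 10000.0) -> Vector:
    """Rotary positional encoding: rotate each (even, odd) pair of dimensions
    by position * base ** (-i / d), where i is the even index of the pair."""
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("rotary encoding needs an even dimension")
    out = list(vec)
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out[i] = vec[i] * c - vec[i + 1] * s
        out[i + 1] = vec[i] * s + vec[i + 1] * c
    return out


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


# ReZero: with alpha = 0 the block is the identity, whatever the sublayer computes.
x = [0.5, -1.0, 2.0, 0.25]
print(rezero_residual(x, lambda v: [10.0 * t for t in v], alpha=0.0))  # [0.5, -1.0, 2.0, 0.25]

# Rotary: the query/key score depends only on the offset between positions.
q, k = [1.0, 0.0, 0.5, -0.5], [0.3, 0.8, -0.2, 0.1]
score_a = dot(rotary_encode(q, 5), rotary_encode(k, 2))    # offset 3
score_b = dot(rotary_encode(q, 13), rotary_encode(k, 10))  # offset 3
print(math.isclose(score_a, score_b, abs_tol=1e-9))  # True
```

With alpha at zero, every ReZero block starts as the identity, and the final check shows the property that motivates Rotary encodings: attention scores depend on relative rather than absolute positions.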

Abstract

Time series forecasting is an important problem, with many real-world applications. Ensembles of deep neural networks have recently achieved impressive forecasting accuracy, but such large ensembles are impractical in many real-world settings. Transformer models have been successfully applied to a diverse set of challenging problems. We propose a novel adaptation of the original Transformer architecture focusing on the task of time series forecasting, called Persistence Initialization. The model is initialized as a naive persistence model by using a multiplicative gating mechanism combined with a residual skip connection. We use a decoder Transformer with ReZero normalization and Rotary positional encodings, but the adaptation is applicable to any auto-regressive neural network model. We evaluate our proposed architecture on the challenging M4 dataset, achieving competitive performance…
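The gating idea can be illustrated with a toy auto-regressive forecaster. This is a sketch under stated assumptions, not the paper's model: the next value is predicted as the last observed value (the residual skip connection) plus a learnable scalar gate times the network output (the multiplicative gate), with the gate initialized to zero, and a single random linear map stands in for the decoder Transformer. The class `GatedResidualForecaster` and its attributes are hypothetical.

```python
import random
from typing import List


class GatedResidualForecaster:
    """Hypothetical sketch of Persistence Initialization around a stand-in network.

    Next-step prediction: x_hat[t+1] = x[t] + gate * f(context).
    With gate = 0.0 the model is exactly a naive persistence model.
    """

    def __init__(self, window: int, seed: int = 0) -> None:
        if window < 1:
            raise ValueError("window must be positive")
        rng = random.Random(seed)
        self.window = window
        # Stand-in for the decoder Transformer: one randomly initialised linear map.
        self.weights = [rng.gauss(0.0, 1.0) for _ in range(window)]
        self.bias = rng.gauss(0.0, 1.0)
        # Multiplicative gate (assumed here to be a learnable scalar), initialised to zero.
        self.gate = 0.0

    def f(self, context: List[float]) -> float:
        # Network output for the most recent `window` values.
        return sum(w * x for w, x in zip(self.weights, context)) + self.bias

    def predict_next(self, history: List[float]) -> float:
        context = history[-self.window:]
        # Residual skip connection from the last value plus the gated network output.
        return history[-1] + self.gate * self.f(context)

    def rollout(self, history: List[float], horizon: int) -> List[float]:
        # Auto-regressive decoding: each prediction is appended and fed back in.
        values = list(history)
        for _ in range(horizon):
            values.append(self.predict_next(values))
        return values[len(history):]


series = [112.0, 118.0, 132.0, 129.0, 121.0]
model = GatedResidualForecaster(window=4)
print(model.rollout(series, horizon=3))  # [121.0, 121.0, 121.0]

model.gate = 0.1  # as if training had moved the gate away from zero
print(model.rollout(series, horizon=3))  # output now depends on the network's weights
```

Because the gate is exactly zero at initialization, the untrained model reproduces the persistence forecast regardless of the network's random weights, so training only has to learn corrections to that baseline. The wrapper does not depend on what `f` is, in line with the abstract's note that the adaptation applies to any auto-regressive model.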

Taxonomy

Topics: Time Series Analysis and Forecasting · Stock Market Forecasting Methods · Hydrological Forecasting Using AI

Methods: Attention Is All You Need · Linear Layer · Label Smoothing · Softmax · Absolute Position Encodings · Layer Normalization · Dropout · Dense Connections · Adam · Position-Wise Feed-Forward Layer