U-shaped Transformer: Retain High Frequency Context in Time Series Analysis
Qingkui Chen, Yiqin Zhang

TL;DR
This paper introduces a U-shaped Transformer model that preserves high-frequency information in time series prediction by combining transformer and MLP advantages, leading to improved performance across datasets.
Contribution
It proposes a novel U-shaped Transformer architecture with skip-layer connections and multi-scale feature extraction for better time series analysis.
Findings
Achieves state-of-the-art results on multiple datasets.
Effectively retains high-frequency information in predictions.
Operates with relatively low computational cost.
Abstract
Time series prediction plays a crucial role in various industrial fields. In recent years, neural networks with a transformer backbone have achieved remarkable success in many domains, including computer vision and NLP. In the time series analysis domain, some studies have suggested that even the simplest MLP networks outperform advanced transformer-based networks on time series forecasting tasks. However, we believe these findings indicate that time series sequences have low-rank properties. In this paper, we consider the low-pass characteristics of transformers and try to incorporate the advantages of MLP. We adopt skip-layer connections inspired by Unet into a traditional transformer backbone, thus preserving high-frequency context from input to output; we call the resulting model the U-shaped Transformer. We introduce patch merge and split operations to extract features with different scales and use larger…
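The idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`patch_merge`, `patch_split`, `smooth`, `u_shaped_forward`) are hypothetical, the transformer stage is replaced by a per-patch mean acting as a crude low-pass filter, and the skip connection is assumed to be a simple addition (the paper may combine features differently).

```python
def patch_merge(patches):
    """Concatenate neighbouring patches: half as many patches, each twice as wide."""
    if len(patches) % 2 != 0:
        raise ValueError("expected an even number of patches")
    return [patches[i] + patches[i + 1] for i in range(0, len(patches), 2)]


def patch_split(patches):
    """Inverse of patch_merge: cut every patch into two halves."""
    out = []
    for p in patches:
        half = len(p) // 2
        out.extend([p[:half], p[half:]])
    return out


def smooth(patches):
    """Stand-in for a transformer stage (assumption): replace each value
    by its patch mean, which removes high-frequency detail."""
    return [[sum(p) / len(p)] * len(p) for p in patches]


def u_shaped_forward(patches):
    """One merge -> process -> split round trip with an additive skip connection."""
    skip = patches                          # input kept for the skip connection
    coarse = smooth(patch_merge(patches))   # coarser scale, low-pass processing
    decoded = patch_split(coarse)           # back to the original scale
    return [[d + s for d, s in zip(dp, sp)] for dp, sp in zip(decoded, skip)]


# A rapidly alternating series is pure high-frequency content.
series = [1.0, -1.0] * 4
patches = [series[i:i + 2] for i in range(0, len(series), 2)]

without_skip = patch_split(smooth(patch_merge(patches)))
with_skip = u_shaped_forward(patches)
```

In this toy setting the low-pass path alone flattens the alternating signal, while the skip connection carries the original detail through to the output, which is the behaviour the abstract attributes to the skip-layer design.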
Taxonomy
Topics: Time Series Analysis and Forecasting · Neural Networks and Applications · Anomaly Detection Techniques and Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Byte Pair Encoding · Position-Wise Feed-Forward Layer · Residual Connection · Absolute Position Encodings · Adam · Layer Normalization · Label Smoothing
