Domain Adaptation for Time series Transformers using One-step fine-tuning
Subina Khanal, Seshu Tirupathi, Giulio Zizzo, Ambrish Rawat, and Torben Bach Pedersen

TL;DR
This paper proposes a one-step fine-tuning method for time series Transformers that improves prediction accuracy in limited-data domains by combining pre-training on a data-rich source domain, reuse of source data during fine-tuning, and gradual unfreezing. It outperforms state-of-the-art baselines on two real-world datasets.
Contribution
It introduces a one-step fine-tuning approach that integrates source-domain data and gradually unfreezes the pre-trained model to improve Transformer performance on time series tasks with limited data.
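The two ingredients named here, source data integration and gradual unfreezing, can be illustrated with a minimal, framework-free sketch. This is not the authors' implementation: the function names, the `source_fraction` parameter, the layer names, and the choice to unfreeze one layer per epoch starting from the output side are all assumptions made for illustration, since this summary does not specify those details.

```python
import random


def mix_source_into_target(source, target, source_fraction, seed=0):
    """Return all target samples plus a random subset of source samples.

    `source_fraction` is a hypothetical knob: the share of the source set
    that is added to the fine-tuning data.
    """
    if not 0.0 <= source_fraction <= 1.0:
        raise ValueError("source_fraction must be in [0, 1]")
    rng = random.Random(seed)
    n_source = round(len(source) * source_fraction)
    mixed = list(target) + rng.sample(list(source), n_source)
    rng.shuffle(mixed)
    return mixed


def gradual_unfreezing_schedule(layers, num_epochs):
    """Yield, for each epoch, the list of layers that are trainable.

    Assumption: `layers` is ordered from input to output, and one more
    layer is unfrozen per epoch, starting from the output side.
    """
    for epoch in range(num_epochs):
        n_trainable = min(epoch + 1, len(layers))
        yield layers[len(layers) - n_trainable:]


if __name__ == "__main__":
    # Hypothetical layer names for a small pre-trained Transformer.
    layers = ["embedding", "encoder_1", "encoder_2", "head"]
    for epoch, trainable in enumerate(gradual_unfreezing_schedule(layers, 5)):
        print(epoch, trainable)
```

In a real training loop, each yielded list would decide which parameter groups receive gradient updates in that epoch, and the mixed dataset would replace the target-only dataset in a single fine-tuning run.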
Findings
Improves indoor temperature prediction by 4.35%.
Enhances wind power prediction by 11.54%.
Outperforms state-of-the-art baselines on two real-world datasets.
Abstract
The recent breakthrough of Transformers in deep learning has drawn significant attention from the time series community due to their ability to capture long-range dependencies. However, like other deep learning models, Transformers face limitations in time series prediction, including insufficient temporal understanding, generalization challenges, and data shift issues in domains with limited data. Additionally, catastrophic forgetting, where models forget previously learned information when exposed to new data, must be addressed to make Transformers more robust for time series tasks. To address these limitations, in this paper, we pre-train the time series Transformer model on a source domain with sufficient data and fine-tune it on the target domain with limited data. We introduce the One-step fine-tuning…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Time Series Analysis and Forecasting · Anomaly Detection Techniques and Applications
Methods: Multi-Head Attention · Attention Is All You Need · Label Smoothing · Absolute Position Encodings · Layer Normalization · Softmax · Residual Connection · Linear Layer · Byte Pair Encoding · Dropout
