Enhancing Masked Time-Series Modeling via Dropping Patches
Tianyu Qiu, Yi Xie, Yun Xiong, Hao Niu, Xiaofeng Gao

TL;DR
This paper introduces DropPatch, a method that enhances masked time-series modeling by randomly dropping subsequence-level patches. Supported by empirical and theoretical analysis, it improves training efficiency and performance across a range of scenarios.
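The drop-then-mask idea can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: a univariate series is split into non-overlapping patches, patches are dropped uniformly at random before masking, surviving patches keep their original position indices, and masked patches are zero-filled. The names `patchify`, `drop_then_mask`, `drop_ratio` and `mask_ratio` are hypothetical.

```python
import numpy as np


def patchify(series: np.ndarray, patch_len: int, stride: int) -> np.ndarray:
    """Split a 1-D series into patches of shape (num_patches, patch_len)."""
    num_patches = (len(series) - patch_len) // stride + 1
    return np.stack([series[i * stride : i * stride + patch_len] for i in range(num_patches)])


def drop_then_mask(patches: np.ndarray, drop_ratio: float, mask_ratio: float,
                   rng: np.random.Generator):
    """Hypothetical sketch: randomly drop patches, then mask some of the survivors.

    Returns:
      kept_idx:  sorted original indices of the surviving patches; feeding these
                 to the positional encoding keeps each patch's true position.
      masked:    boolean array over the kept patches, True = reconstruction target.
      enc_input: kept patches with masked ones zero-filled (assumed encoder input).
      targets:   original values of the masked patches.
    """
    n = patches.shape[0]
    n_keep = max(1, round(n * (1.0 - drop_ratio)))
    # Dropped patches are discarded entirely: neither encoded nor reconstructed.
    kept_idx = np.sort(rng.choice(n, size=n_keep, replace=False))
    kept = patches[kept_idx]

    # Standard random masking, applied only to the patches that survived dropping.
    n_mask = round(n_keep * mask_ratio)
    masked = np.zeros(n_keep, dtype=bool)
    masked[rng.choice(n_keep, size=n_mask, replace=False)] = True

    enc_input = kept.copy()
    enc_input[masked] = 0.0  # assumption: zero-fill masking
    targets = kept[masked]
    return kept_idx, masked, enc_input, targets


rng = np.random.default_rng(0)
patches = patchify(np.arange(96, dtype=float), patch_len=12, stride=12)  # 8 patches
kept_idx, masked, enc_input, targets = drop_then_mask(
    patches, drop_ratio=0.5, mask_ratio=0.5, rng=rng)
print(kept_idx, masked)  # the encoder now sees 4 tokens instead of 8
```

Because a new random subset is drawn on every call, the same series yields different views across epochs, which is one plausible reading of the data-augmentation effect mentioned in the abstract.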
Contribution
It proposes DropPatch, a novel patch-dropping technique that boosts pre-training efficiency and model robustness in masked time-series modeling.
Findings
DropPatch substantially improves pre-training efficiency, with what the authors describe as a square-level gain.
It enhances model performance in in-domain, cross-domain, and few-shot scenarios.
Theoretically, it slows the collapse of Transformer representations into a rank-1 linear subspace.
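The efficiency finding can be made concrete with back-of-the-envelope arithmetic under two assumptions not spelled out on this page: the encoder processes only the patches that survive dropping, and self-attention cost grows with the square of the token count while projections and feed-forward layers grow linearly. Under those assumptions, keeping a fraction 1 - r of N patches shrinks the attention cost by a factor of (1 - r)^2, which is one way to read the "square-level" gain. The function `cost_ratios` is hypothetical.

```python
def cost_ratios(num_patches: int, drop_ratio: float) -> tuple[float, float]:
    """Cost with dropping relative to cost without dropping (assumed cost model).

    Returns (attention_ratio, linear_ratio):
      attention_ratio: N x N attention scores, quadratic in token count.
      linear_ratio:    per-token projections / feed-forward, linear in token count.
    """
    kept = round(num_patches * (1.0 - drop_ratio))
    linear = kept / num_patches
    return linear ** 2, linear


print(cost_ratios(64, 0.5))  # -> (0.25, 0.5)
```

Real wall-clock savings will also depend on the model's linear-cost components and hardware, so the quadratic factor is an upper-level intuition rather than a measured speedup.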
Abstract
This paper explores how to enhance existing masked time-series modeling by randomly dropping subsequence-level patches of a time series. On this basis, it proposes a simple yet effective method named DropPatch, which has two notable advantages: 1) it yields a square-level (quadratic) improvement in pre-training efficiency; 2) it provides additional benefits for modeling in scenarios such as in-domain, cross-domain, few-shot learning, and cold start. The paper conducts comprehensive experiments to verify the method's effectiveness and analyze its internal mechanism. Empirically, DropPatch strengthens the attention mechanism, reduces information redundancy, and serves as an efficient means of data augmentation. Theoretically, the authors prove that by randomly dropping patches, DropPatch slows down the rate at which Transformer representations collapse into the rank-1 linear subspace, thus…
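The theoretical claim concerns Transformer representations collapsing toward a rank-1 subspace, where every token carries the same vector. One simple way to quantify this, assumed here and not necessarily the metric used in the paper, is the relative Frobenius distance of the token matrix X (tokens x dim) from the nearest matrix of the form 1 x^T. Since the optimal x is the mean token, this equals the norm of the mean-centred X divided by the norm of X. `rank1_residual` is a hypothetical helper.

```python
import numpy as np


def rank1_residual(X: np.ndarray) -> float:
    """Relative Frobenius distance of X from the set {1 x^T} (all tokens identical).

    The best x in Frobenius norm is the mean token, so the residual is
    ||X - mean||_F / ||X||_F. A value near 0 means the tokens have collapsed.
    """
    centred = X - X.mean(axis=0, keepdims=True)
    return float(np.linalg.norm(centred) / np.linalg.norm(X))


# Toy illustration (not the paper's experiment): uniform attention, where every
# token attends equally to all tokens, replaces each token by the mean token and
# therefore collapses X onto the rank-1 subspace.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
A = np.full((6, 6), 1.0 / 6)
print(rank1_residual(X), rank1_residual(A @ X))  # second value is ~0
```

Tracking this value layer by layer, with and without patch dropping, is one way to probe the claim empirically; no results from the paper are reproduced here.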
Taxonomy
Topics: Data Visualization and Analytics · Time Series Analysis and Forecasting
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Absolute Position Encodings · Dense Connections · Multi-Head Attention · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Adam
