VarDrop: Enhancing Training Efficiency by Reducing Variate Redundancy in Periodic Time Series Forecasting

Junhyeok Kang, Yooju Shin, and Jae-Gil Lee

arXiv:2501.14183·cs.LG·April 8, 2026

TL;DR

VarDrop is a training strategy that drops redundant variate tokens in multivariate time series forecasting, using frequency-based grouping and sparse attention to reduce the cost of training.

Contribution

The paper introduces VarDrop, a new strategy that adaptively omits redundant variate tokens using frequency hashing and stratified sampling to enhance training efficiency.

Findings

01. VarDrop outperforms existing efficient baselines on benchmark datasets.

02. It significantly reduces the computational cost of attention in multivariate forecasting.

03. The method maintains forecasting accuracy while improving efficiency.

Abstract

Variate tokenization, which independently embeds each variate as separate tokens, has achieved remarkable improvements in multivariate time series forecasting. However, employing self-attention with variate tokens incurs a quadratic computational cost with respect to the number of variates, thus limiting its training efficiency for large-scale applications. To address this issue, we propose VarDrop, a simple yet efficient strategy that reduces the token usage by omitting redundant variate tokens during training. VarDrop adaptively excludes redundant tokens within a given batch, thereby reducing the number of tokens used for dot-product attention while preserving essential information. Specifically, we introduce k-dominant frequency hashing (k-DFH), which utilizes the ranked dominant frequencies in the frequency domain as a hash value to efficiently group variate tokens exhibiting…
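The grouping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the hash value is simply the tuple of the k strongest non-DC frequency indices of each variate, ranked by amplitude, and that "stratified sampling" means keeping one randomly chosen variate per hash group. The function names, the choice k=2, and the toy signals are all hypothetical.

```python
import numpy as np
from collections import defaultdict


def k_dominant_frequency_hash(x, k=2):
    """Return one hash key per variate for x of shape (num_variates, length).

    Assumption: the key is the tuple of the k strongest frequency indices,
    ordered from largest to smallest amplitude, with the DC term ignored.
    """
    amp = np.abs(np.fft.rfft(x, axis=-1))[:, 1:]  # drop the DC component
    top = np.argsort(-amp, axis=-1, kind="stable")[:, :k] + 1  # undo the DC shift
    return [tuple(int(i) for i in row) for row in top]


def group_by_hash(keys):
    """Map each hash key to the list of variate indices that share it."""
    groups = defaultdict(list)
    for idx, key in enumerate(keys):
        groups[key].append(idx)
    return dict(groups)


def sample_one_per_group(groups, rng):
    """Keep one representative variate per group (assumed sampling rule)."""
    return sorted(int(rng.choice(members)) for members in groups.values())


# Toy batch: variates 0 and 1 share periodicity, variate 2 does not.
t = np.arange(96)
x = np.stack([
    1.0 * np.sin(2 * np.pi * 4 * t / 96) + 0.5 * np.sin(2 * np.pi * 8 * t / 96),
    2.0 * np.sin(2 * np.pi * 4 * t / 96 + 0.3) + 0.8 * np.sin(2 * np.pi * 8 * t / 96 + 1.0),
    1.0 * np.sin(2 * np.pi * 6 * t / 96) + 0.5 * np.sin(2 * np.pi * 12 * t / 96),
])

keys = k_dominant_frequency_hash(x, k=2)
groups = group_by_hash(keys)
kept = sample_one_per_group(groups, np.random.default_rng(0))
```

In this toy batch the three variates fall into two groups, so attention would run over two tokens instead of three, shrinking the score matrix from 3 × 3 = 9 entries to 2 × 2 = 4. Hashing each variate and bucketing by key takes time linear in the number of variates, which is the appeal of a hash over pairwise similarity comparisons.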
