Efficient Time Series Processing for Transformers and State-Space Models through Token Merging
Leon Götz, Marcel Kollovieh, Stephan Günnemann, Leo Schwinn

TL;DR
This paper introduces local token merging for time series processing, which significantly improves the efficiency of transformers and state-space models with minimal accuracy loss and enables scalable long-sequence analysis.
Contribution
It proposes local merging, a domain-specific token merging algorithm that reduces complexity and enables causal merging in transformer decoders for time series.
Findings
Achieves accelerations of up to 5400% on the Chronos model.
Local merging can adjust its computational complexity from quadratic to linear, depending on the neighborhood size.
Spectral properties of the input data predict merging benefits without downstream evaluation.
Abstract
Despite recent advances in subquadratic attention mechanisms or state-space models, processing long token sequences still imposes significant computational requirements. Token merging has emerged as a solution to increase computational efficiency in computer vision architectures. In this work, we perform the first investigations of token merging in time series analysis on both transformers and state-space models. We further introduce local merging, a domain-specific token merging algorithm that selectively combines tokens within a local neighborhood, achieving two major benefits: a) Local merging can adjust its computational complexity from quadratic to linear based on the neighborhood size to effectively scale to long sequences; b) Local merging is the first causal merging scheme enabling token merging in transformer decoders. Further, we identify spectral properties of the input data…
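The local merging idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a scheme in which tokens are split alternately into two subsets A and B, similarity is measured by cosine similarity, and merged tokens are plain averages. The names `local_merge`, `cosine`, `r` (number of merges), and `k` (neighborhood size) are hypothetical. The point it shows is that restricting candidate pairs to a band of width `k` needs roughly `len(tokens) * k` similarity evaluations instead of all pairs.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / norm if norm else 0.0


def local_merge(tokens, r, k=1):
    """Merge up to r token pairs, matching only within a local neighborhood.

    tokens: list of equal-length float vectors, ordered in time.
    r: number of merges to perform (hypothetical parameter name).
    k: neighborhood size (hypothetical parameterization): the i-th token of
       subset A may only merge with the j-th token of subset B if |i - j| < k.
    """
    a_pos = list(range(0, len(tokens), 2))  # subset A: even positions
    b_pos = list(range(1, len(tokens), 2))  # subset B: odd positions

    # For each A token, find its most similar B token inside the band.
    # Each A token inspects at most 2k - 1 candidates, so the work grows
    # linearly with the sequence length for fixed k.
    candidates = []  # entries are (similarity, index into A, index into B)
    for i, pa in enumerate(a_pos):
        band = range(max(0, i - k + 1), min(len(b_pos), i + k))
        if len(band) > 0:
            j = max(band, key=lambda j: cosine(tokens[pa], tokens[b_pos[j]]))
            candidates.append((cosine(tokens[pa], tokens[b_pos[j]]), i, j))

    # Keep the r most similar pairs; the sort is stable, so ties go to
    # the earlier token.
    candidates.sort(key=lambda c: -c[0])
    chosen = candidates[:max(r, 0)]
    merged_a = {i for _, i, _ in chosen}

    # Average every B token with the A tokens merged into it.
    groups = {j: [tokens[pb]] for j, pb in enumerate(b_pos)}
    for _, i, j in chosen:
        groups[j].append(tokens[a_pos[i]])
    out = [(pb, [sum(col) / len(groups[j]) for col in zip(*groups[j])])
           for j, pb in enumerate(b_pos)]
    out += [(pa, list(tokens[pa])) for i, pa in enumerate(a_pos) if i not in merged_a]

    # Restore temporal order using the original positions.
    return [vec for _, vec in sorted(out, key=lambda item: item[0])]


# Six tokens in, two merges requested, so four tokens come out.
series = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 1.0]]
shortened = local_merge(series, r=2, k=1)
```

In this sketch, `k=1` means each even-position token can only be averaged with its immediate successor, and the merged token keeps the later position, so no output token contains information from beyond its own time step. Larger `k` widens the band and moves the cost toward the all-pairs case.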
Taxonomy
Topics: Time Series Analysis and Forecasting · Neural Networks and Applications
Methods: Attention Is All You Need · Dense Connections · Softmax · Layer Normalization · Linear Layer · Multi-Head Attention · Residual Connection · Vision Transformer
