Efficient Time Series Processing for Transformers and State-Space Models through Token Merging

Leon Götz; Marcel Kollovieh; Stephan Günnemann; Leo Schwinn

arXiv:2405.17951 · cs.LG · August 6, 2025

Open Access

TL;DR

This paper introduces local token merging for time series processing, which significantly improves the efficiency of transformers and state-space models with minimal accuracy loss and scales to long sequences.

Contribution

The paper proposes local merging, a domain-specific token merging algorithm whose computational complexity can be adjusted from quadratic to linear through the neighborhood size, and which enables causal merging in transformer decoders for time series.

Findings

01

Achieves up to 5400% acceleration on the Chronos model.

02

Local merging can adjust its complexity from quadratic to linear, depending on the neighborhood size.

03

Spectral properties predict merging benefits without downstream evaluation.

Abstract

Despite recent advances in subquadratic attention mechanisms or state-space models, processing long token sequences still imposes significant computational requirements. Token merging has emerged as a solution to increase computational efficiency in computer vision architectures. In this work, we perform the first investigations of token merging in time series analysis on both transformers and state-space models. We further introduce local merging, a domain-specific token merging algorithm that selectively combines tokens within a local neighborhood, achieving two major benefits: a) Local merging can adjust its computational complexity from quadratic to linear based on the neighborhood size to effectively scale to long sequences; b) Local merging is the first causal merging scheme enabling token merging in transformer decoders. Further, we identify spectral properties of the input data…
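The idea of merging only within a local neighborhood can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `local_merge`, the alternating split into two token sets, the cosine-similarity matching, and the plain averaging of merged tokens are all assumptions made for illustration. Here `r` is the number of tokens to remove and `k` is the neighborhood size; with a fixed small `k` the similarity cost grows linearly in the sequence length, while a `k` covering the whole sequence compares all pairs. The sketch does not address causal masking in decoders.

```python
import numpy as np

def local_merge(tokens, r, k=1):
    """Illustrative sketch: remove r tokens by merging each into a similar
    token within a local neighborhood of size k (names are hypothetical)."""
    tokens = np.asarray(tokens, dtype=float)
    t = tokens.shape[0]
    a, b = tokens[0::2], tokens[1::2]        # alternating split into two sets
    r = max(0, min(r, len(b)))               # cannot merge more than len(b) tokens

    # Unit-normalize so that dot products are cosine similarities.
    an = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    bn = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)

    # For each token in a, find its best match in b with |i - j| < k.
    best_sim = np.full(len(a), -np.inf)
    best_j = np.zeros(len(a), dtype=int)
    for i in range(len(a)):
        lo, hi = max(0, i - k + 1), min(len(b), i + k)
        if lo < hi:                          # neighborhood is non-empty
            sims = bn[lo:hi] @ an[i]
            best_j[i] = lo + int(np.argmax(sims))
            best_sim[i] = sims.max()

    # Merge the r tokens of a that have the most similar partner.
    merged = np.argsort(-best_sim)[:r]
    b_sum, b_cnt = b.copy(), np.ones(len(b))
    for i in merged:
        b_sum[best_j[i]] += a[i]
        b_cnt[best_j[i]] += 1
    b_new = b_sum / b_cnt[:, None]           # average over merged tokens

    # Reassemble the surviving tokens in their original temporal order.
    keep_a = np.ones(len(a), dtype=bool)
    keep_a[merged] = False
    out = []
    for pos in range(t):
        idx, is_b = divmod(pos, 2)
        if is_b:
            out.append(b_new[idx])
        elif keep_a[idx]:
            out.append(a[idx])
    return np.stack(out)
```

Calling `local_merge(x, r=2, k=1)` on an array `x` of shape `(t, d)` returns an array of shape `(t - 2, d)`, where only directly adjacent tokens were candidates for merging. Increasing `k` widens the search band and raises the cost of the similarity step accordingly.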

Taxonomy

Topics: Time Series Analysis and Forecasting · Neural Networks and Applications

Methods: Attention Is All You Need · Dense Connections · Softmax · Layer Normalization · Linear Layer · Multi-Head Attention · Residual Connection · Vision Transformer