MICA: Multivariate Infini Compressive Attention for Time Series Forecasting

Willa Potosnak; Nina Żukowska; Michał Wiliński; Dan Howarth; Ignacy Stępka; Mononito Goswami; Artur Dubrawski

arXiv:2604.06473·cs.LG·May 12, 2026

TL;DR

MICA adds a cross-channel attention mechanism to channel-independent Transformer forecasters. The mechanism scales linearly with channel count and context length, and it reduces forecast error relative to the channel-independent counterparts.

Contribution

The paper proposes MICA, an architectural design that adapts efficient attention techniques from the sequence dimension to the channel dimension, extending channel-independent Transformers to channel-dependent multivariate forecasting.

Findings

1. MICA reduces forecast error by 5.4% on average across benchmarks, relative to its channel-independent counterparts.

2. Models with MICA outperform deep multivariate Transformer and MLP baselines.

3. MICA scales linearly with channel count and context length, avoiding the quadratic channel scaling of full cross-channel attention.

Abstract

Multivariate forecasting with Transformers faces a core scalability challenge: modeling cross-channel dependencies via attention compounds attention's quadratic sequence complexity with quadratic channel scaling, making full cross-channel attention impractical for high-dimensional time series. We propose Multivariate Infini Compressive Attention (MICA), an architectural design to extend channel-independent Transformers to channel-dependent forecasting. By adapting efficient attention techniques from the sequence dimension to the channel dimension, MICA adds a cross-channel attention mechanism to channel-independent backbones that scales linearly with channel count and context length. We evaluate channel-independent Transformer architectures with and without MICA across multiple forecasting benchmarks. MICA reduces forecast error over its channel-independent counterparts by 5.4% on…
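To make the linear-in-channels idea concrete, here is a minimal sketch of a compressive, linear-attention-style read and write over the channel axis. It is not taken from the paper: the function names, the ELU+1 feature map, and the single shared memory matrix are assumptions borrowed from common linear-attention formulations, and MICA's actual design may differ. Each channel contributes one key/value pair to a fixed-size memory, and every channel then reads from that memory, so the cost grows with the channel count C rather than with C squared.

```python
import math


def feature_map(x):
    # ELU(x) + 1: a positive feature map common in linear attention (assumed here).
    return x + 1.0 if x > 0 else math.exp(x)


def compressive_channel_attention(queries, keys, values):
    """Hypothetical sketch: mix information across channels via a fixed-size memory.

    queries, keys: lists of C vectors of length d (one per channel).
    values: list of C vectors of length dv.
    Returns a list of C output vectors of length dv.
    """
    d = len(keys[0])
    dv = len(values[0])
    memory = [[0.0] * dv for _ in range(d)]  # accumulates phi(k)^T v, size d x dv
    norm = [0.0] * d                         # accumulates phi(k), size d

    # Write pass: one update per channel, so the work is linear in C.
    for k, v in zip(keys, values):
        phi_k = [feature_map(x) for x in k]
        for i in range(d):
            norm[i] += phi_k[i]
            for j in range(dv):
                memory[i][j] += phi_k[i] * v[j]

    # Read pass: each channel queries the shared memory, again linear in C.
    outputs = []
    for q in queries:
        phi_q = [feature_map(x) for x in q]
        denom = sum(phi_q[i] * norm[i] for i in range(d))
        outputs.append(
            [sum(phi_q[i] * memory[i][j] for i in range(d)) / denom for j in range(dv)]
        )
    return outputs


# Three channels, two-dimensional keys and values (toy numbers).
q = [[0.1, -0.2], [0.5, 0.3], [-1.0, 0.7]]
k = [[0.2, 0.1], [-0.4, 0.9], [0.3, -0.6]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
mixed = compressive_channel_attention(q, k, v)
```

The result equals explicit C-by-C attention with kernel weights phi(q)·phi(k), but the pairwise weight matrix is never formed. The memory size depends only on d and dv, which is what keeps the channel dimension linear in this sketch.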
