MICA: Multivariate Infini Compressive Attention for Time Series Forecasting
Willa Potosnak, Nina Żukowska, Michał Wiliński, Dan Howarth, Ignacy Stępka, Mononito Goswami, Artur Dubrawski

TL;DR
MICA adds a scalable cross-channel attention mechanism to channel-independent Transformers for multivariate time series forecasting, reducing forecast error while scaling linearly with channel count and context length.
Contribution
The paper proposes MICA, an architectural design that adapts efficient attention techniques from the sequence dimension to the channel dimension, extending channel-independent Transformers to scalable channel-dependent forecasting.
Findings
MICA reduces forecast error by 5.4% on average relative to its channel-independent counterparts across the evaluated benchmarks.
Models with MICA outperform deep multivariate Transformer and MLP baselines.
MICA scales linearly with channel count and context length, whereas full cross-channel attention scales quadratically in both.
Abstract
Multivariate forecasting with Transformers faces a core scalability challenge: modeling cross-channel dependencies via attention compounds attention's quadratic sequence complexity with quadratic channel scaling, making full cross-channel attention impractical for high-dimensional time series. We propose Multivariate Infini Compressive Attention (MICA), an architectural design to extend channel-independent Transformers to channel-dependent forecasting. By adapting efficient attention techniques from the sequence dimension to the channel dimension, MICA adds a cross-channel attention mechanism to channel-independent backbones that scales linearly with channel count and context length. We evaluate channel-independent Transformer architectures with and without MICA across multiple forecasting benchmarks. MICA reduces forecast error over its channel-independent counterparts by 5.4% on…
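The scaling argument in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a generic kernelized (linear) attention with an ELU + 1 feature map and a single shared compressive memory, and the names `elu_plus_one` and `compressive_cross_channel_attention` are hypothetical. It only shows how keys and values from every channel could be folded into a fixed-size memory so that cost grows linearly with the number of channels C and the context length T.

```python
import numpy as np

def elu_plus_one(x):
    # Positive feature map sigma(x) = ELU(x) + 1 (an assumption; common in linear attention).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def compressive_cross_channel_attention(q, k, v, eps=1e-6):
    """Illustrative sketch. q, k, v have shape (C, T, d); returns shape (C, T, d)."""
    sq, sk = elu_plus_one(q), elu_plus_one(k)
    # Fold keys and values from all channels and time steps into one (d, d) memory.
    memory = np.einsum("ctd,cte->de", sk, v)
    # Normalization vector: sum of mapped keys over channels and time, shape (d,).
    norm = sk.sum(axis=(0, 1))
    # Every query reads from the shared memory; no (C*T) x (C*T) matrix is built.
    num = np.einsum("ctd,de->cte", sq, memory)
    den = np.einsum("ctd,d->ct", sq, norm)[..., None]
    return num / (den + eps)

# Usage with random inputs: 8 channels, 16 time steps, 4 features per token.
rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(8, 16, 4)) for _ in range(3))
out = compressive_cross_channel_attention(q, k, v)
print(out.shape)
```

Under these assumptions the memory has a fixed d × d size, so the work is on the order of C · T · d², compared with (C · T)² · d for explicit attention over all channel and time positions.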
