Low Rank Transformer for Multivariate Time Series Anomaly Detection and Localization
Charalampos Shimillas, Kleanthis Malialis, Konstantinos Fokianos, Marios M. Polycarpou

TL;DR
This paper introduces ALoRa-T, a low-rank regularized Transformer model for multivariate time series anomaly detection and localization, providing theoretical insights and outperforming existing methods in accuracy.
Contribution
It presents a novel low-rank regularization for Transformers and a new localization method, advancing the understanding and performance of anomaly diagnosis in multivariate time series.
Findings
Significantly outperforms state-of-the-art methods in detection accuracy
Effectively localizes anomalies to specific variables
Provides theoretical insights linking Transformers to statistical methods
Abstract
Multivariate time series (MTS) anomaly diagnosis, which encompasses both anomaly detection and localization, is critical for the safety and reliability of complex, large-scale real-world systems. The vast majority of existing anomaly diagnosis methods offer limited theoretical insights, especially for anomaly localization, which is a vital but largely unexplored area. The aim of this contribution is to study the learning process of a Transformer when applied to MTS by revealing connections to statistical time series methods. Based on these theoretical insights, we propose the Attention Low-Rank Transformer (ALoRa-T) model, which applies low-rank regularization to self-attention, and we introduce the Attention Low-Rank score, effectively capturing the temporal characteristics of anomalies. Finally, to enable anomaly localization, we propose the ALoRa-Loc method, a novel approach that…
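The idea of constraining and then measuring the rank of a self-attention matrix can be illustrated with a small sketch. This summary does not give the paper's actual regularizer or its Attention Low-Rank score, so everything below is an assumption made for illustration: the nuclear norm (sum of singular values) stands in as a generic low-rank penalty, a thresholded singular-value count stands in as a rank-style score, and the names `self_attention_matrix`, `nuclear_norm`, `numerical_rank`, and `rel_tol` are hypothetical.

```python
import numpy as np

def self_attention_matrix(X, Wq, Wk):
    """Row-stochastic attention matrix softmax(Q K^T / sqrt(d)) for one window X (T x d_in)."""
    Q, K = X @ Wq, X @ Wk
    logits = Q @ K.T / np.sqrt(Q.shape[1])
    logits -= logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    A = np.exp(logits)
    return A / A.sum(axis=1, keepdims=True)

def nuclear_norm(A):
    """Sum of singular values: a common convex surrogate for matrix rank
    (assumed here; not necessarily the penalty used in the paper)."""
    return np.linalg.svd(A, compute_uv=False).sum()

def numerical_rank(A, rel_tol=1e-3):
    """Count singular values above rel_tol times the largest one
    (an illustrative rank-style score, not the paper's definition)."""
    s = np.linalg.svd(A, compute_uv=False)
    return int((s > rel_tol * s[0]).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(8, 3))              # one window: 8 time steps, 3 variables
    Wq = rng.normal(size=(3, 2))
    Wk = rng.normal(size=(3, 2))
    A = self_attention_matrix(X, Wq, Wk)
    print("nuclear norm:", nuclear_norm(A))
    print("numerical rank:", numerical_rank(A))
```

In a training loop, a term such as `lam * nuclear_norm(A)` could be added to the reconstruction loss, with `lam` a hypothetical weight. At inference, a window whose rank-style score is unusually high would be a candidate anomaly, in line with the intuition quoted in the reviews.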
Peer Reviews
Decision: ICLR 2026 Poster
- The authors theoretically relate the Transformer architecture back to the techniques from classical time series modeling. Based on this insight, they propose technically sound and well-motivated modifications to the Transformer architecture, further specializing it for the task of MTS anomaly detection.
- The authors propose novel detection and localization frameworks that are more reliable than previously used metrics.
- Together, the proposed method and detection/localization methods succe
- According to Table 1, ALoRa-Det appears to be more effective on some datasets (e.g., HAI or SMD) than on others (SWaT, MSL). What causes this discrepancy in the results? Is ALoRa-Det more effective at detecting certain anomaly types than others?
- The majority of the baselines are drawn from Transformer-based anomaly detection methods (for a good reason). Yet, it would be helpful to add some baselines from other families of MTS anomaly detection methods, such as reconstruction or contrastiv
(1) The paper provides a coherent spectral perspective on attention that is conceptually simple to compute and ties to an interpretable diagnostic.
(2) The authors diagnose that point-adjustment inflates results, sometimes making them indistinguishable from random scoring, and therefore pivot to range-aware/affiliation-based metrics, improving evaluation validity.
(3) The localization section explicitly models propagation via contribution weights (E, C), which is more principled than per-dimens
(1) While the detection pipeline uses two thresholds ($h_1, h_2$), Appendix A provides a data-driven approach for choosing $h_1$, but this is still a per-dataset manual step, which introduces hyperparameter sensitivity. Also, neither an ablation on the selection of $h_2$ nor a heuristic for choosing it is provided.
(2) The paper's central intuition, that anomalous windows yield higher attention rank, is supported empirically (plots/observations) but lacks a formal guarantee. No theoretical background
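The point-adjustment concern raised in this review can be made concrete with a short sketch. It assumes the commonly used form of point adjustment, in which a whole ground-truth anomaly segment counts as detected once any single point inside it is flagged. The function names `point_adjust` and `f1` and the toy labels are hypothetical, and this is not the paper's evaluation code.

```python
def point_adjust(pred, truth):
    """Return a copy of pred in which every true anomaly segment that contains
    at least one flagged point is marked as fully detected."""
    adjusted = list(pred)
    i, n = 0, len(truth)
    while i < n:
        if truth[i]:
            j = i
            while j < n and truth[j]:
                j += 1                      # j is one past the end of the segment
            if any(pred[i:j]):
                adjusted[i:j] = [1] * (j - i)
            i = j
        else:
            i += 1
    return adjusted

def f1(pred, truth):
    """Point-wise F1 score for binary label sequences."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

if __name__ == "__main__":
    truth = [0, 1, 1, 1, 0, 0]
    pred = [0, 0, 1, 0, 0, 0]               # only one of three anomalous points flagged
    print("raw F1:", f1(pred, truth))
    print("point-adjusted F1:", f1(point_adjust(pred, truth), truth))
```

A single hit inside a long segment is credited as a detection of every point in that segment. This is why a detector that flags points at random can score well under this protocol when anomaly segments are long.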
1. The idea of detecting anomalies by analyzing the transformer’s learning behavior is original and insightful. It opens a new direction for understanding model-internal representations in time-series anomaly detection.
2. The focus on anomaly localization is meaningful and practically valuable.
1. The paper uses Spearman correlation to estimate dependencies among sequence pairs but does not justify why this choice is preferred over Pearson correlation or cosine similarity. Furthermore, the paper states that only the top-K correlated pairs are retained, yet the criterion for determining K is not specified or experimentally analyzed.
2. The central claim that “the rank of SA-matrices increases in the presence of anomalies” is only supported by empirical observation on a few datasets. Th
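The Spearman and top-K step that this review questions can be sketched as follows. The sketch assumes pairs are ranked by absolute Spearman correlation and that `k` is supplied by the user. Neither detail is confirmed by this summary, and the helper names (`ranks`, `pearson`, `spearman`, `top_k_pairs`) are hypothetical.

```python
from itertools import combinations

def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5 if va and vb else 0.0

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(ranks(a), ranks(b))

def top_k_pairs(series, k):
    """Return the k variable pairs with the largest |Spearman| (ranking rule assumed)."""
    scored = [((i, j), spearman(series[i], series[j]))
              for i, j in combinations(range(len(series)), 2)]
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:k]

if __name__ == "__main__":
    a = [1, 2, 3, 4, 5]
    b = [1, 4, 9, 16, 25]                   # monotone but nonlinear in a
    d = [2, 1, 4, 3, 5]
    print(top_k_pairs([a, b, d], k=1))
    print("pearson(a, b):", pearson(a, b))
```

Because Spearman works on ranks, it gives the monotone but nonlinear pair `(a, b)` a perfect score, while Pearson does not. This is one general property that distinguishes the two measures, not a justification taken from the paper.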
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Time Series Analysis and Forecasting · Software System Performance and Reliability
