Correlated Attention in Transformers for Multivariate Time Series
Quang Minh Nguyen, Lam M. Nguyen, Subhro Das

TL;DR
This paper introduces a correlated attention mechanism for Transformers that better captures feature cross-correlations and lagged dependencies in multivariate time series, improving performance across various tasks.
Contribution
It proposes a novel correlated attention mechanism that integrates into Transformer encoders to effectively model cross-feature and lagged dependencies in multivariate time series.
Findings
Achieves state-of-the-art results in imputation, anomaly detection, and classification.
Enhances Transformer models with correlated attention for better feature dependency modeling.
Demonstrates consistent improvements across multiple multivariate time series tasks.
Abstract
Multivariate time series (MTS) analysis prevails in real-world applications such as finance, climate science and healthcare. The various self-attention mechanisms, the backbone of the state-of-the-art Transformer-based models, efficiently discover the temporal dependencies, yet cannot well capture the intricate cross-correlation between different features of MTS data, which inherently stems from complex dynamical systems in practice. To this end, we propose a novel correlated attention mechanism, which not only efficiently captures feature-wise dependencies, but can also be seamlessly integrated within the encoder blocks of existing well-known Transformers to gain efficiency improvement. In particular, correlated attention operates across feature channels to compute cross-covariance matrices between queries and keys with different lag values, and selectively aggregate representations at…
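The core operation named in the abstract, a cross-covariance matrix between queries and keys at a given lag, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes circular time shifts and per-feature mean centering, the function names (`center_columns`, `lagged_cross_covariance`, `strongest_lag`) are hypothetical, and the toy data are made up for illustration. The aggregation step, which the truncated abstract does not spell out, is not reproduced here.

```python
def center_columns(X):
    """Subtract each feature's mean over time from a T x d list of lists."""
    T, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / T for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X]


def lagged_cross_covariance(Q, K, lag):
    """Return the d x d matrix C with
    C[i][j] = (1/T) * sum_t Qc[t][i] * Kc[(t + lag) % T][j].
    Assumption: the shift is circular (wraps around the series end)."""
    T, d = len(Q), len(Q[0])
    Qc, Kc = center_columns(Q), center_columns(K)
    return [
        [sum(Qc[t][i] * Kc[(t + lag) % T][j] for t in range(T)) / T
         for j in range(d)]
        for i in range(d)
    ]


def strongest_lag(Q, K, i, j, max_lag):
    """Lag in [0, max_lag] with the largest |covariance| between query
    feature i and key feature j (an illustrative selection rule only)."""
    return max(
        range(max_lag + 1),
        key=lambda lag: abs(lagged_cross_covariance(Q, K, lag)[i][j]),
    )


# Toy data: key feature 1 is query feature 0 delayed by 2 time steps.
q = [1.0, 4.0, 2.0, 0.0, 3.0, 5.0, 1.0, 0.0]
T = len(q)
Q = [[q[t], float(t % 2)] for t in range(T)]
K = [[float(t % 3), q[(t - 2) % T]] for t in range(T)]

C = lagged_cross_covariance(Q, K, lag=2)
best = strongest_lag(Q, K, i=0, j=1, max_lag=4)
```

In this toy setup the covariance between query feature 0 and key feature 1 peaks at the lag that undoes the 2-step delay, which is the kind of lagged cross-feature dependency that attention over time steps alone does not expose directly.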
Peer Reviews
Decision · Submitted to ICLR 2024
This paper is well-structured, presenting a thorough background introduction and a step-by-step introduction of the novel concept, the Correlated Attention Block (CAB). A notable feature of this work is the seamless integration of CAB into encoder-only architectures of Transformers, making it a potentially good-to-have addition to the field. Furthermore, the authors conducted an extensive set of experiments across three different tasks, utilizing a variety of common datasets. The results consis
1. Page 9, Line 1 of **Conclusion And Future Work**: There's a minor typo that needs correction - "bloc" should be changed to "block." 2. Citation Style: The reference list shows some inconsistency in the citation style. To enhance clarity and uniformity, consider standardizing the format across all references. For example, you could list all NeurIPS papers with consistent formatting, and for papers from other conferences or sources, ensure that their respective publication details are included
1. The authors propose a correlated attention mechanism to capture lagged cross-covariance between variates, which can be combined with existing encoder-only Transformer structures. 2. The experiments show that the correlated attention mechanism enhances base Transformer models.
1. The novelty of the paper is limited. The proposed correlated attention is basically an extension of Autoformer, which only captures auto-correlation. However, the paper neither proposes a good way to reduce the computational complexity caused by calculating cross-correlation, which is almost unacceptable in actual scenarios, nor do the authors conduct a comparative experiment with Autoformer to prove that the introduction of cross-correlation can bring to achieve practical improvements
The paper's main focus is learning feature-wise correlation in the Transformer attention setup. The authors explore whether learning feature-wise correlation actually helps in tasks other than forecasting, such as anomaly detection, imputation, and classification. The proposed correlated attention can capture not only conventional cross-correlation but also auto-correlation and lagged cross-correlation. The idea that allows one to learn lagged correlation and be able to integ
1. Some parts of the paper's presentation could be improved, such as the explanation of the methods; for more details, see the question section. 2. The experiment section does not look very convincing due to the comparison setup (whether it is fair or not; please refer to the question section) and the results. Given the huge computational cost of integrating the cross-correlation, the experimental results do not look that significant.
Taxonomy
Topics: Time Series Analysis and Forecasting · Anomaly Detection Techniques and Applications · Complex Systems and Time Series Analysis
Methods: Attention Is All You Need · Dense Connections · Dropout · Softmax · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Linear Layer · Adam · Multi-Head Attention
