MEMTO: Memory-guided Transformer for Multivariate Time Series Anomaly Detection
Junho Song, Keonwoo Kim, Jeonglyul Oh, Sungzoon Cho

TL;DR
MEMTO introduces a memory-guided Transformer model with a novel memory module and bi-dimensional detection criterion, significantly improving multivariate time series anomaly detection performance across diverse real-world datasets.
Contribution
The paper proposes MEMTO, a novel memory-guided Transformer with a two-phase training paradigm and bi-dimensional detection, addressing over-generalization and enhancing anomaly detection accuracy.
Findings
Achieves an average F1-score of 95.74% on five datasets.
Outperforms previous state-of-the-art methods.
Validates the effectiveness of each key component through extensive experiments.
Abstract
Detecting anomalies in real-world multivariate time series data is challenging due to complex temporal dependencies and inter-variable correlations. Recently, reconstruction-based deep models have been widely used to solve the problem. However, these methods still suffer from an over-generalization issue and fail to deliver consistently high performance. To address this issue, we propose MEMTO, a memory-guided Transformer using a reconstruction-based approach. It is designed to incorporate a novel memory module that can learn the degree to which each memory item should be updated in response to the input data. To stabilize the training procedure, we use a two-phase training paradigm which involves using K-means clustering for initializing memory items. Additionally, we introduce a bi-dimensional deviation-based detection criterion that calculates anomaly scores considering both…
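The two memory-related ideas in the abstract (K-means initialization of memory items, and an update whose strength is learned per item) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names kmeans_init_memory and gated_memory_update, the sigmoid gate, the projection matrices w_mem and w_in, and the dot-product attention used to summarize the inputs are all assumptions chosen for illustration, and the features here are random stand-ins for encoder outputs.

```python
import numpy as np


def kmeans_init_memory(features, num_items, num_iters=20, seed=0):
    """Return `num_items` K-means centroids of `features` (N, D) as initial memory items."""
    rng = np.random.default_rng(seed)
    # Start from distinct, randomly chosen feature vectors.
    start = rng.choice(len(features), size=num_items, replace=False)
    centroids = features[start].copy()
    for _ in range(num_iters):
        # Squared Euclidean distance from every feature to every centroid: (N, M).
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for k in range(num_items):
            members = features[assign == k]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[k] = members.mean(axis=0)
    return centroids


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_memory_update(memory, queries, w_mem, w_in):
    """Hypothetical gated update: each item moves toward a summary of the queries.

    memory: (M, D) memory items, queries: (N, D) input features,
    w_mem, w_in: (D, D) gate projections (assumed learnable parameters).
    """
    # Attention of each memory item over the queries (softmax along queries): (M, N).
    scores = memory @ queries.T
    scores = scores - scores.max(axis=1, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)
    summary = attn @ queries  # (M, D) attention-weighted query summary per item
    # Gate in (0, 1) decides, per item and per dimension, how much to update.
    gate = sigmoid(memory @ w_mem + summary @ w_in)
    return (1.0 - gate) * memory + gate * summary


# Usage with random stand-in features (assumed shapes: 200 vectors of dimension 8).
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 8))
memory = kmeans_init_memory(features, num_items=5)
w_mem = rng.normal(scale=0.1, size=(8, 8))
w_in = rng.normal(scale=0.1, size=(8, 8))
updated = gated_memory_update(memory, features, w_mem, w_in)
print(memory.shape, updated.shape)  # (5, 8) (5, 8)
```

Because the gate lies strictly between 0 and 1, each updated item is an elementwise convex combination of its previous value and the input summary, so a gate near 0 leaves an item almost unchanged. In this sketch, starting from centroids rather than random vectors simply gives the gate meaningful items to work with from the first step.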
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Time Series Analysis and Forecasting · Network Security and Intrusion Detection
Methods: Multi-Head Attention · Linear Layer · Attention Is All You Need · Absolute Position Encodings · Dropout · Dense Connections · Byte Pair Encoding · Softmax · Layer Normalization · Position-Wise Feed-Forward Layer
