Harnessing Vision-Language Models for Time Series Anomaly Detection
Zelin He, Sarah Alnegheimish, Matthew Reimherr

TL;DR
This paper introduces a two-stage approach that uses vision-language models (VLMs) for time series anomaly detection, improving both accuracy and token efficiency without any training on time-series data.
Contribution
The method pairs a lightweight pre-trained vision encoder (ViT4TS) for screening with a VLM-based stage (VLM4TS), outperforming existing models without additional training.
Findings
VLM4TS outperforms baselines, with a 24.6% higher F1-max score.
The approach is 36x more efficient in token usage.
It achieves superior accuracy without time-series training.
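For readers unfamiliar with the F1-max metric cited above, the following is a minimal sketch of one common way to compute it: sweep a threshold over the anomaly scores and keep the best F1. It assumes point-wise binary labels; this summary does not state the paper's exact evaluation protocol, and the function name `f1_max` is hypothetical.

```python
def f1_max(scores, labels):
    """Return (best_f1, best_threshold) over thresholds taken from the scores.

    Assumption: point-wise evaluation, where a point is flagged as
    anomalous when its score is >= the threshold.
    """
    best_f1, best_thr = 0.0, None
    for thr in sorted(set(scores)):
        pairs = list(zip(scores, labels))
        tp = sum(1 for s, y in pairs if s >= thr and y == 1)
        fp = sum(1 for s, y in pairs if s >= thr and y == 0)
        fn = sum(1 for s, y in pairs if s < thr and y == 1)
        if tp == 0:
            continue  # F1 is zero (or undefined) with no true positives
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_f1, best_thr = f1, thr
    return best_f1, best_thr


# Toy example: the two highest scores are the true anomalies.
example_scores = [0.1, 0.2, 0.9, 0.8, 0.3]
example_labels = [0, 0, 1, 1, 0]
best_f1, best_thr = f1_max(example_scores, example_labels)
```

Because F1-max picks the best threshold after the fact, it measures how well the scores separate anomalies from normal points rather than the quality of any particular threshold choice.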
Abstract
Time-series anomaly detection (TSAD) has played a vital role in a variety of fields, including healthcare, finance, and sensor-based condition monitoring. Prior methods, which mainly focus on training domain-specific models on numerical data, lack the visual-temporal understanding capacity that human experts have to identify contextual anomalies. To fill this gap, we explore a solution based on vision language models (VLMs). Recent studies have shown the ability of VLMs for visual understanding tasks, yet their direct application to time series has fallen short on both accuracy and efficiency. To harness the power of VLMs for TSAD, we propose a two-stage solution, with (1) ViT4TS, a vision-screening stage built on a relatively lightweight pre-trained vision encoder, which leverages 2D time series representations to accurately localize candidate anomalies; (2) VLM4TS, a VLM-based stage…
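To make the idea of a "2D time series representation" concrete, here is a rough sketch of slicing a series into sliding windows and rasterizing each window into a small binary image. This is an illustration only: the truncated abstract does not say how ViT4TS builds its images, and the helper names `sliding_windows` and `rasterize`, along with the `window`, `stride`, and `height` parameters, are hypothetical.

```python
def sliding_windows(series, window, stride):
    """Split a series into (start_index, values) windows of fixed length."""
    last_start = len(series) - window
    return [(s, series[s:s + window]) for s in range(0, last_start + 1, stride)]


def rasterize(values, height=16):
    """Render one window as a height x len(values) binary grid.

    Row 0 is the top of the image, so the window maximum maps to row 0
    and the window minimum maps to the bottom row.
    """
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat window
    grid = [[0] * len(values) for _ in range(height)]
    for col, v in enumerate(values):
        row = round((hi - v) * (height - 1) / span)
        grid[row][col] = 1
    return grid


# Toy series with a spike at index 4.
series = [0, 1, 2, 3, 10, 3, 2, 1]
windows = sliding_windows(series, window=4, stride=2)
images = [rasterize(values, height=4) for _, values in windows]
```

In a pipeline of the kind the abstract describes, images like these would be passed to a pre-trained vision encoder for screening; that step is omitted here because its details are not given in this summary.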
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Time Series Analysis and Forecasting · Machine Learning in Healthcare
