# CALM: A Framework for Continuous, Adaptive, and LLM-Mediated Anomaly Detection in Time-Series Streams

**Authors:** Ashok Devireddy, Shunping Huang

arXiv: 2508.21273 · 2025-09-01

## TL;DR

CALM is a framework that combines continuous model fine-tuning with large language model (LLM) judgments to improve real-time anomaly detection in evolving time-series data streams.

## Contribution

The paper introduces an end-to-end adaptive anomaly detection system that uses an LLM for semantic judgment of detected anomalies and continuous fine-tuning to handle concept drift.

## Key findings

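- The continuously fine-tuned model improves ROC AUC on most TSB-UAD benchmark datasets compared with the static, pre-trained base model.
- A closed-loop fine-tuning mechanism adapts the detector to changing data patterns in near real-time.
- An LLM-as-a-Judge curates the fine-tuning dataset by deciding whether each detected anomaly is transient noise or a meaningful pattern shift.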
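
Since the findings are reported as ROC AUC, a small worked example of the metric may help. The sketch below computes ROC AUC as the probability that a randomly chosen anomalous point receives a higher score than a randomly chosen normal point (the Mann-Whitney formulation). The function name `roc_auc`, the labels, and both score lists are illustrative assumptions and are not taken from the paper or from TSB-UAD.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison; tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up anomaly scores for two hypothetical detectors on the same labels
# (1 = anomalous point, 0 = normal point). These are not results from the paper.
labels = [0, 0, 1, 0, 1, 0]
static_scores = [0.1, 0.4, 0.35, 0.2, 0.8, 0.5]
tuned_scores = [0.1, 0.3, 0.6, 0.2, 0.9, 0.4]

print(roc_auc(labels, static_scores))  # 0.75
print(roc_auc(labels, tuned_scores))   # 1.0
```

The pairwise loop is quadratic and only meant for illustration; a rank-based implementation would be the usual choice on full benchmark series.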

## Abstract

The detection of anomalies in non-stationary time-series streams is a critical but challenging task across numerous industrial and scientific domains. Traditional models, trained offline, suffer significant performance degradation when faced with concept drift, where the underlying statistical properties of the data change over time. This paper introduces CALM (Continuous, Adaptive, and LLM-Mediated), a novel, end-to-end framework for real-time anomaly detection designed to address this challenge. CALM is built on the Apache Beam distributed processing framework and leverages the TimesFM foundation model for forecasting-based anomaly detection. The framework's novelty lies in two core contributions. First, it implements a closed-loop, continuous fine-tuning mechanism that allows the anomaly detection model to adapt to evolving data patterns in near real-time. Second, it introduces an LLM-as-a-Judge component, a Large Language Model that provides semantic, context-aware judgments on detected anomalies to curate a high-quality training dataset, deciding whether an anomaly represents transient noise or a meaningful pattern shift. We evaluate CALM on the comprehensive TSB-UAD benchmark. Our results demonstrate that the continuously fine-tuned model improves the ROC AUC score on most datasets compared to the static, pre-trained base model, validating the efficacy of our adaptive, LLM-guided approach to maintaining high-performance anomaly detection in dynamic streaming environments.
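
The closed loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: a moving-average forecaster stands in for the forecasting foundation model, a hard-coded rule (`stub_judge`) stands in for the LLM judge, and there is no Apache Beam pipeline or actual fine-tuning step. Every name, threshold, and data value below is a hypothetical choice made for illustration. The sketch only shows the control flow: flag points that deviate from the forecast, ask a judge whether each is noise or a shift, and admit only non-noise points into the adaptation data.

```python
from collections import deque
from statistics import mean, pstdev


def stub_judge(recent_raw, value, tolerance=1.0):
    """Stand-in for the LLM judge (hypothetical rule, not the paper's prompt).

    Says "shift" when at least two of the latest raw observations sit close
    to the flagged value, i.e. the new level looks sustained; else "noise".
    """
    near = sum(1 for x in recent_raw if abs(x - value) < tolerance)
    return "shift" if near >= 2 else "noise"


def run_stream(stream, window_size=8, z_threshold=3.0, min_spread=0.1):
    context = deque(maxlen=window_size)  # points the forecaster is allowed to see
    recent_raw = deque(maxlen=3)         # latest raw points, including flagged ones
    flagged, training_buffer = [], []
    for t, value in enumerate(stream):
        verdict = None
        if len(context) == window_size:
            forecast = mean(context)     # naive forecast, NOT a foundation model
            spread = max(pstdev(context), min_spread)
            if abs(value - forecast) / spread > z_threshold:
                verdict = stub_judge(recent_raw, value)
                flagged.append((t, value, verdict))
        if verdict != "noise":
            # Accepted points feed both the forecaster's context and the buffer
            # that a real system would use for periodic fine-tuning.
            context.append(value)
            training_buffer.append(value)
        recent_raw.append(value)
    return flagged, training_buffer


# Synthetic stream: stable near 10, one spike at 50, then a sustained shift to 20.
stream = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.2, 9.8,
          50.0, 10.1, 9.9, 20.0, 20.1, 19.9, 20.2, 20.0]
flagged, training_buffer = run_stream(stream)
for t, value, verdict in flagged:
    print(t, value, verdict)
```

On this synthetic stream the isolated spike is judged as noise and kept out of the buffer, while the level change is admitted once a few consecutive points confirm it, after which the naive forecaster tracks the new level. A judge with richer context could plausibly recognize a shift sooner than this fixed rule does.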

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.21273/full.md

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/2508.21273/full.md

## References

50 references — full list in the complete paper: https://tomesphere.com/paper/2508.21273/full.md

---
Source: https://tomesphere.com/paper/2508.21273