AnomSeer: Reinforcing Multimodal LLMs to Reason for Time-Series Anomaly Detection

Junru Zhang; Lang Feng; Haoran Shi; Xu Guo; Han Yu; Yabo Dong; Duanqing Xu

arXiv:2602.08868·cs.LG·February 10, 2026

TL;DR

AnomSeer enhances multimodal large language models for time-series anomaly detection by grounding reasoning in detailed structural analysis, improving accuracy and interpretability across diverse scenarios.

Contribution

It introduces a novel reinforcement learning approach with time-series grounded policy optimization and expert reasoning traces for improved anomaly detection.

Findings

1. Outperforms larger commercial models in classification accuracy
2. Provides verifiable, fine-grained reasoning traces
3. Effective across diverse anomaly scenarios

Abstract

Time-series anomaly detection (TSAD) with multimodal large language models (MLLMs) is an emerging area, yet a persistent challenge remains: MLLMs rely on coarse time-series heuristics but struggle with multi-dimensional, detailed reasoning, which is vital for understanding complex time-series data. We present AnomSeer to address this by reinforcing the model to ground its reasoning in precise, structural details of time series, unifying anomaly classification, localization, and explanation. At its core, an expert chain-of-thought trace is generated to provide a verifiable, fine-grained reasoning from classical analyses (e.g., statistical measures, frequency transforms). Building on this, we propose a novel time-series grounded policy optimization (TimerPO) that incorporates two additional components beyond standard reinforcement learning: a time-series grounded advantage based on…
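
As a rough illustration of what a reasoning trace built from classical analyses could look like, the sketch below combines a statistical measure (z-scores) with a frequency transform (a naive DFT) and renders the findings as text. It is not the authors' ExpCoT pipeline: the function names, the 3-sigma threshold, and the trace wording are all assumptions made for this example.

```python
import cmath
import math
import statistics


def dominant_period(series):
    """Return the period (in samples) of the strongest non-DC DFT component."""
    n = len(series)
    mean = statistics.fmean(series)
    centered = [x - mean for x in series]
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        # Naive O(n^2) DFT; adequate for a short illustrative series.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k


def zscore_outliers(series, threshold=3.0):
    """Return indices whose absolute z-score exceeds the (assumed) threshold."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    if std == 0:
        return []
    return [i for i, x in enumerate(series) if abs(x - mean) / std > threshold]


def build_trace(series, threshold=3.0):
    """Render the two analyses as a short, checkable text trace (hypothetical format)."""
    period = dominant_period(series)
    outliers = zscore_outliers(series, threshold)
    flagged = outliers if outliers else "none"
    return "\n".join([
        f"Dominant period: {period:.1f} samples.",
        f"Points beyond {threshold} standard deviations: {flagged}.",
    ])


# Toy input: a 16-sample sine wave with one injected spike at index 40.
series = [math.sin(2 * math.pi * t / 16) for t in range(64)]
series[40] += 6.0
print(build_trace(series))
```

On this toy series the sketch reports a 16-sample period and flags index 40. Every statement in the trace can be re-derived from the raw values, which is the sense in which such a trace is verifiable.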

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 4

Strengths

- **Originality (compositional):** A thoughtful pairing of *process-evidence alignment* via OT with *orthogonal advantage composition* inside GRPO, targeted at TSAD. The ExpCoT design grounds CoT in verifiable TSAD signals rather than generic text heuristics.
- **Quality:** Solid performance improvements over zero-shot MLLMs and a strong RL baseline, with the biggest wins on the hard frequency/trend categories that motivated the paper. Ablations show each component matters; sensitivity to (\alph

Weaknesses

- **Related Work coverage:** Missing discussion of **OT in RL/alignment** and **multi-objective/gradient-projection** literature (e.g., PCGrad). As a result, novelty may be under-justified as more than a careful composition.
- **Baselines:** Ablations remove components, but comparisons lack *alternative* multi-objective schemes: (i) simple weighted-sum (no projection), (ii) PCGrad-style gradient orthogonalization, (iii) replacing OT with cosine/CLIP-style similarity. These are crucia
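
The difference between a weighted sum and a projection-based composition, which the reviewer asks to see compared, can be sketched on per-sample advantages within one sampled group. This is an illustration under assumptions, not the paper's formulation: the exact TimerPO composition is not given in this excerpt, `weight` is a hypothetical mixing coefficient, and PCGrad itself projects gradients rather than advantage vectors.

```python
import statistics


def group_normalize(rewards):
    """GRPO-style group-relative advantage: (r - mean) / std within one group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(task_adv, aux_adv, weight):
    """Baseline (i): add the auxiliary advantage directly, with no projection."""
    return [a + weight * b for a, b in zip(task_adv, aux_adv)]


def orthogonal_compose(task_adv, aux_adv, weight):
    """Add only the part of aux_adv that is orthogonal to task_adv."""
    denom = dot(task_adv, task_adv)
    if denom == 0:
        # No task signal to protect, so fall back to the plain sum.
        return weighted_sum(task_adv, aux_adv, weight)
    scale = dot(aux_adv, task_adv) / denom
    residual = [b - scale * a for a, b in zip(task_adv, aux_adv)]
    return [a + weight * r for a, r in zip(task_adv, residual)]


# Hypothetical rewards for a group of four sampled responses.
task_adv = group_normalize([1.0, 0.0, 1.0, 0.0])   # e.g. answer correctness
aux_adv = group_normalize([0.9, 0.2, 0.4, 0.1])    # e.g. reasoning alignment

summed = weighted_sum(task_adv, aux_adv, weight=0.5)
projected = orthogonal_compose(task_adv, aux_adv, weight=0.5)
print(summed)
print(projected)
```

With the plain sum, the auxiliary term rescales the task advantage wherever the two signals correlate. With the projection, the added term has zero inner product with the task advantage, so it only redistributes credit among responses the task reward already scores alike.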

Reviewer 02 · Rating 4 · Confidence 4

Strengths

The idea of incorporating expert-generated reasoning traces from classical TSAD methods is conceptually sound. It provides a structured and verifiable way to include numerical priors into the model. The TimerPO algorithm is also technically interesting, especially its use of Optimal Transport to measure reasoning similarity. The framework demonstrates decent generalization, with stable performance across datasets despite being trained only on synthetic data. This suggests a certain degree of ro
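
For readers unfamiliar with Optimal Transport as a similarity measure, the following minimal sketch computes an entropy-regularized OT cost (Sinkhorn iterations) between two sets of step embeddings. How AnomSeer actually embeds and compares reasoning steps is not described in this excerpt; the 2-D vectors, the cosine cost, and the `epsilon` and `iterations` values are assumptions chosen for illustration.

```python
import math


def cosine_cost(u, v):
    """Cosine distance between two nonzero vectors."""
    inner = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - inner / norm


def sinkhorn_cost(xs, ys, epsilon=0.1, iterations=200):
    """Entropy-regularized OT cost between two uniformly weighted vector sets."""
    n, m = len(xs), len(ys)
    cost = [[cosine_cost(x, y) for y in ys] for x in xs]
    kernel = [[math.exp(-c / epsilon) for c in row] for row in cost]
    a, b = 1.0 / n, 1.0 / m          # uniform mass on every step
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iterations):
        # Alternately rescale rows and columns to match the marginals.
        u = [a / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]
    return sum(plan[i][j] * cost[i][j] for i in range(n) for j in range(m))


# Hypothetical embeddings: one vector per reasoning step.
expert_steps = [[1.0, 0.0], [0.0, 1.0]]
reordered_steps = [[0.0, 1.0], [1.0, 0.0]]   # same steps, different order
repeated_steps = [[1.0, 0.0], [1.0, 0.0]]    # second expert step is missing

print(sinkhorn_cost(expert_steps, reordered_steps))
print(sinkhorn_cost(expert_steps, repeated_steps))
```

Because OT matches steps as sets, the reordered trace scores close to zero while the trace that omits a step pays for the unmatched mass. A position-wise comparison would penalize the reordering instead.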

Weaknesses

Several issues limit the strength of the paper’s claims.

- First, the evaluation on the AnomLLM dataset may be affected by potential information leakage, since ExpCoT traces include ground-truth anomaly intervals.
- Second, the paper lacks ablation studies isolating the effects of ExpCoT and GRPO, which makes it difficult to understand their individual contributions.
- Third, the manuscript does not provide clear definitions or implementation details for the reported Affinity-Precision, Affinity

Reviewer 03 · Rating 4 · Confidence 4

Strengths

Clear and meaningful motivation addressing the lack of fine-grained reasoning in MLLMs for time-series tasks. Well-designed framework combining ExpCoT (expert chain-of-thought supervision) and TimerPO (reinforcement optimization). Strong experimental results demonstrating improved interpretability and reasoning quality.

Weaknesses

My concerns are as follows:

1. The generation of ExpCoT requires traditional statistical analyses (FFT, residual detection, Matrix Profile, etc.), and each anomaly type needs specific parameters and templates. When transferring to new domains, the “expert reasoning templates” need to be redefined, and it is difficult to scale automatically to large heterogeneous datasets.
2. Since each anomaly type is defined by fixed parameters and templates, if such reasoning chains are already effective, I am

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Time Series Analysis and Forecasting · Topic Modeling