ALARM: Automated MLLM-Based Anomaly Detection in Complex-EnviRonment Monitoring with Uncertainty Quantification

Congjing Zhang; Feng Lin; Xinyi Zhao; Pei Guo; Wei Li; Lin Chen; Chaoyue Zhao; Shuai Huang

arXiv:2512.03101 · cs.LG · March 4, 2026

TL;DR

ALARM is a framework that pairs multi-modal large language models (MLLMs) with uncertainty quantification (UQ) for visual anomaly detection (VAD) in complex environments; the authors report superior performance on smart-home and wound-image data.

Contribution

This paper introduces ALARM, a UQ-supported MLLM-based VAD framework that combines reasoning chains, self-reflection, and an MLLM ensemble for robust anomaly detection.

Findings

01

ALARM outperforms existing methods on real-world smart-home benchmark data and wound image classification data.

02

It demonstrates high reliability and applicability across different complex environments.

03

The framework effectively quantifies uncertainty to enhance decision-making.

Abstract

The advance of Large Language Models (LLMs) has greatly stimulated research interest in developing multi-modal LLM (MLLM)-based visual anomaly detection (VAD) algorithms that can be deployed in complex environments. The challenge is that in these complex environments, the anomalies are sometimes highly contextual and also ambiguous, and thereby, uncertainty quantification (UQ) is a crucial capacity for an MLLM-based VAD system to succeed. In this paper, we introduce our UQ-supported MLLM-based VAD framework called ALARM. ALARM integrates UQ with quality-assurance techniques like reasoning chain, self-reflection, and MLLM ensemble for robust and accurate performance and is designed based on a rigorous probabilistic inference pipeline and computational process. Extensive empirical evaluations are conducted using the real-world smart-home benchmark data and wound image classification data,…
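The abstract names an MLLM ensemble as one ingredient of the UQ pipeline but does not spell out the computation here. As a rough illustration only, and not the authors' method, the sketch below shows one common way to turn ensemble disagreement into an uncertainty score: the Shannon entropy of the vote distribution, with a deferral rule when entropy is high. The function names, the label strings, and the `max_entropy` threshold are all hypothetical.

```python
import math
from collections import Counter


def ensemble_uncertainty(votes):
    """Summarise labels returned by several ensemble members.

    Returns (majority_label, majority_share, entropy_in_bits).
    This is a generic vote-entropy heuristic, assumed for illustration;
    it is not taken from the ALARM paper.
    """
    if not votes:
        raise ValueError("votes must contain at least one label")
    counts = Counter(votes)
    n = len(votes)
    # Empirical probability of each label across ensemble members.
    probs = {label: c / n for label, c in counts.items()}
    # Shannon entropy: 0 when all members agree, higher when they split.
    entropy = -sum(p * math.log2(p) for p in probs.values())
    majority = max(probs, key=probs.get)
    return majority, probs[majority], entropy


def decide(votes, max_entropy=0.9):
    """Return the majority label, or "defer" when disagreement is too high.

    max_entropy is a hypothetical threshold chosen for this example.
    """
    label, _, entropy = ensemble_uncertainty(votes)
    return label if entropy <= max_entropy else "defer"
```

For example, `decide(["anomaly", "anomaly", "anomaly", "normal"])` keeps the majority label because the vote entropy is about 0.81 bits, while an even split between two labels has an entropy of 1 bit and is deferred, for instance to a human reviewer.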

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning