PrismVAU: Prompt-Refined Inference System for Multimodal Video Anomaly Understanding

Iñaki Erregue; Kamal Nasrollahi; Sergio Escalera

arXiv:2601.02927·cs.CV·January 9, 2026

TL;DR

PrismVAU is a lightweight, real-time system for video anomaly understanding. It uses a single off-the-shelf multimodal large language model to score anomalies, explain them, and optimize its own prompts, without costly annotations or external modules.

Contribution

The paper introduces PrismVAU, a system that simplifies and accelerates multimodal video anomaly understanding. It combines a two-stage pipeline (coarse frame-level anomaly scoring followed by MLLM-based refinement) with prompt optimization under minimal supervision.

Findings

01. Achieves competitive detection performance on standard benchmarks

02. Provides interpretable anomaly explanations

03. Operates efficiently without external modules or dense processing

Abstract

Video Anomaly Understanding (VAU) extends traditional Video Anomaly Detection (VAD) by not only localizing anomalies but also describing and reasoning about their context. Existing VAU approaches often rely on fine-tuned multimodal large language models (MLLMs) or external modules such as video captioners, which introduce costly annotations, complex training pipelines, and high inference overhead. In this work, we introduce PrismVAU, a lightweight yet effective system for real-time VAU that leverages a single off-the-shelf MLLM for anomaly scoring, explanation, and prompt optimization. PrismVAU operates in two complementary stages: (1) a coarse anomaly scoring module that computes frame-level anomaly scores via similarity to textual anchors, and (2) an MLLM-based refinement module that contextualizes anomalies through system and user prompts. Both textual anchors and prompts are…
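
The first stage described in the abstract, frame-level scoring via similarity to textual anchors, can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so everything here is an assumption made for illustration: the split into "normal" and "anomalous" anchor sets, the max-over-anchors cosine similarity, the sigmoid with a `temperature` parameter, and the function names `cosine` and `anomaly_scores`. The embeddings are toy 2-D vectors, not outputs of any real model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def anomaly_scores(frame_embs, normal_anchors, anomalous_anchors, temperature=0.1):
    """Hypothetical coarse scorer: one score in (0, 1) per frame.

    Assumption (not from the paper): each frame is compared with its
    best-matching normal anchor and best-matching anomalous anchor, and the
    scaled difference is squashed with a sigmoid (a two-way softmax).
    """
    scores = []
    for frame in frame_embs:
        s_normal = max(cosine(frame, a) for a in normal_anchors)
        s_anomalous = max(cosine(frame, a) for a in anomalous_anchors)
        logit = (s_anomalous - s_normal) / temperature
        scores.append(1.0 / (1.0 + math.exp(-logit)))
    return scores

# Toy embeddings standing in for text-anchor and frame features.
normal_anchors = [[1.0, 0.0]]
anomalous_anchors = [[0.0, 1.0]]
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

scores = anomaly_scores(frames, normal_anchors, anomalous_anchors)
for i, s in enumerate(scores):
    print(f"frame {i}: anomaly score = {s:.4f}")
```

In this toy setup the frame aligned with the normal anchor scores near 0, the frame aligned with the anomalous anchor scores near 1, and the frame equidistant from both scores 0.5. In a pipeline like the one the abstract outlines, such coarse scores would then be handed to the MLLM-based refinement stage.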

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Human Pose and Action Recognition · Video Analysis and Summarization