Transformer-Driven Multimodal Fusion for Explainable Suspiciousness Estimation in Visual Surveillance

Kuldeep Singh Yadav; Lalan Kumar

arXiv:2512.09311 · cs.CV · December 11, 2025


Open Access

TL;DR

This paper introduces a large-scale dataset and a transformer-based multimodal framework for real-time suspiciousness estimation in visual surveillance, enhancing accuracy and interpretability in threat detection.

Contribution

It presents the USE50k dataset and the DeepUSEvision framework, which combines multimodal fusion with transformer networks for suspiciousness analysis in complex environments.

Findings

01 Superior accuracy over state-of-the-art methods
02 Robustness across diverse surveillance scenarios
03 Enhanced interpretability of suspiciousness scores

Abstract

Suspiciousness estimation is critical for proactive threat detection and for ensuring public safety in complex environments. This work introduces a large-scale annotated dataset, USE50k, along with a computationally efficient vision-based framework for real-time suspiciousness analysis. The USE50k dataset contains 65,500 images captured in diverse, uncontrolled environments such as airports, railway stations, restaurants, parks, and other public areas, covering a broad spectrum of cues, including weapons, fire, crowd density, abnormal facial expressions, and unusual body postures. Building on this dataset, we present DeepUSEvision, a lightweight and modular system integrating three key components: a Suspicious Object Detector based on an enhanced YOLOv12 architecture, dual Deep Convolutional Neural Networks (DCNN-I and DCNN-II) for facial expression and body-language recognition…
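The excerpt above does not say how DeepUSEvision fuses the outputs of its components, so the following is only a minimal sketch of the general idea of transformer-style multimodal fusion, not the authors' method. It assumes each component (object detector, facial-expression network, body-language network) emits a fixed-size feature vector; one hypothetical token per modality is passed through a single self-attention layer (a stripped-down stand-in for a transformer encoder), mean-pooled, and mapped to a score in (0, 1). The class name `ToyAttentionFusion`, the modality keys, the dimension of 4, and the feature values are all hypothetical, and the weights are random rather than learned. The average attention each modality receives is returned as a rough, commonly used (and debated) interpretability proxy.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax."""
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyAttentionFusion:
    """Hypothetical single-head self-attention over one token per modality.

    Weights are random here; a real system would learn them from labelled data.
    """

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.dim = dim
        self.w_q = random_matrix(dim, dim, rng)
        self.w_k = random_matrix(dim, dim, rng)
        self.w_v = random_matrix(dim, dim, rng)
        self.w_out = [rng.gauss(0.0, 0.5) for _ in range(dim)]

    def __call__(self, tokens: Dict[str, Vector]) -> Tuple[float, Dict[str, float]]:
        names = list(tokens)
        xs = [tokens[n] for n in names]
        qs = [matvec(self.w_q, x) for x in xs]
        ks = [matvec(self.w_k, x) for x in xs]
        vs = [matvec(self.w_v, x) for x in xs]
        scale = math.sqrt(self.dim)
        attended: List[Vector] = []
        received = [0.0] * len(names)  # total attention each modality receives
        for q in qs:
            # Scaled dot-product attention weights of this query over all modality tokens.
            weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in ks])
            for j, w in enumerate(weights):
                received[j] += w
            attended.append([sum(w * v[d] for w, v in zip(weights, vs)) for d in range(self.dim)])
        # Mean-pool the attended tokens, then a linear head + sigmoid gives a score in (0, 1).
        pooled = [sum(t[d] for t in attended) / len(attended) for d in range(self.dim)]
        logit = sum(w * p for w, p in zip(self.w_out, pooled))
        score = 1.0 / (1.0 + math.exp(-logit))
        # Each query's weights sum to 1, so these per-modality shares also sum to 1.
        attribution = {n: r / len(names) for n, r in zip(names, received)}
        return score, attribution


# Hypothetical per-modality feature vectors (not taken from the paper).
fusion = ToyAttentionFusion(dim=4, seed=42)
features = {
    "object_detector": [0.9, 0.1, 0.0, 0.3],
    "facial_expression": [0.2, 0.7, 0.1, 0.0],
    "body_language": [0.4, 0.4, 0.5, 0.1],
}
score, attribution = fusion(features)
print(f"suspiciousness={score:.3f}")
for name, share in sorted(attribution.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {share:.2f}")
```

Using one token per modality keeps the attention map small (3 x 3 here), which makes the per-modality shares easy to read; a production system would more likely use many tokens per modality and learned weights.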


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Explainable Artificial Intelligence (XAI)