Detection of Hate and Threat in Digital Forensics: A Case-Driven Multimodal Approach
Ponkoj Chandra Shill

TL;DR
This paper introduces a multimodal forensic framework that detects hate and threats by analyzing heterogeneous evidence such as images and text, improving interpretability and evidentiary traceability.
Contribution
It presents a case-driven approach that explicitly identifies evidence types and applies modality-specific analysis, enhancing forensic decision-making and evidence interpretation.
Findings
Consistent detection performance across heterogeneous evidence scenarios
Improved interpretability and traceability in forensic analysis
Effective use of vision-language models for semantic reasoning
Abstract
Digital forensic investigations increasingly rely on heterogeneous evidence such as images, scanned documents, and contextual reports. These artifacts may contain explicit or implicit expressions of harm, hate, threat, violence, or intimidation, yet existing automated approaches often assume clean text input or apply vision models without forensic justification. This paper presents a case-driven multimodal approach for hate and threat detection in forensic analysis. The proposed framework explicitly determines the presence and source of textual evidence, distinguishing between embedded text, associated contextual text, and image-only evidence. Based on the identified evidence configuration, the framework selectively applies text analysis, multimodal fusion, or image-only semantic reasoning using vision-language models with vision transformer (ViT) backbones. By conditioning inference on…
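The case-driven routing described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the names `EvidenceItem`, `EvidenceConfig`, `AnalysisPath`, `identify_config`, and `route` are hypothetical, and the mapping from each evidence configuration to an analysis path is an assumption, since the abstract lists the three configurations and the three analysis modes without stating how they pair up. The sketch also assumes that embedded text takes precedence when both text sources are present.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EvidenceConfig(Enum):
    """Evidence configurations named in the abstract."""
    EMBEDDED_TEXT = "embedded_text"
    CONTEXTUAL_TEXT = "contextual_text"
    IMAGE_ONLY = "image_only"


class AnalysisPath(Enum):
    """Analysis modes named in the abstract."""
    TEXT_ANALYSIS = "text_analysis"
    MULTIMODAL_FUSION = "multimodal_fusion"
    IMAGE_ONLY_REASONING = "image_only_reasoning"


@dataclass
class EvidenceItem:
    """Hypothetical container for one piece of forensic evidence."""
    image_path: str
    embedded_text: Optional[str] = None    # text found inside the image, e.g. via OCR
    contextual_text: Optional[str] = None  # associated text, e.g. a case report


# ASSUMPTION: this pairing is illustrative; the abstract does not specify it.
ROUTES = {
    EvidenceConfig.EMBEDDED_TEXT: AnalysisPath.TEXT_ANALYSIS,
    EvidenceConfig.CONTEXTUAL_TEXT: AnalysisPath.MULTIMODAL_FUSION,
    EvidenceConfig.IMAGE_ONLY: AnalysisPath.IMAGE_ONLY_REASONING,
}


def _has_text(value: Optional[str]) -> bool:
    """True when the string exists and is not only whitespace."""
    return bool(value and value.strip())


def identify_config(item: EvidenceItem) -> EvidenceConfig:
    """Determine the presence and source of textual evidence."""
    # ASSUMPTION: embedded text wins when both sources are present.
    if _has_text(item.embedded_text):
        return EvidenceConfig.EMBEDDED_TEXT
    if _has_text(item.contextual_text):
        return EvidenceConfig.CONTEXTUAL_TEXT
    return EvidenceConfig.IMAGE_ONLY


def route(item: EvidenceItem) -> dict:
    """Return a traceable record of which path was chosen and why."""
    config = identify_config(item)
    return {
        "image_path": item.image_path,
        "evidence_config": config.value,
        "analysis_path": ROUTES[config].value,
    }


if __name__ == "__main__":
    scanned_note = EvidenceItem("note.png", embedded_text="example text")
    photo = EvidenceItem("photo.jpg")
    print(route(scanned_note))
    print(route(photo))
```

Returning the chosen configuration alongside the analysis path keeps a record of why a given item was analyzed the way it was, which is one simple way to support the traceability goal the summary mentions.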
