TL;DR
This paper introduces LogicAD, an approach that combines Vision Language Models with a logic reasoner to detect logical anomalies in images, providing explainable results and achieving state-of-the-art performance on benchmark datasets.
Contribution
The paper demonstrates that autoregressive, multimodal Vision Language Models (AVLMs) are effective for logical anomaly detection, and introduces a method that combines format embedding with a logic reasoner to improve interpretability and accuracy.
Findings
Achieves a state-of-the-art (SOTA) AUROC of 86.0% on the MVTec LOCO AD benchmark
Reaches an F1-max score of 83.7%, outperforming previous methods
Provides explainable anomaly detection results
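For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how AUROC and F1-max are commonly computed from per-image anomaly scores. It is not taken from the paper: the function names `auroc` and `f1_max`, the label convention (1 = anomalous, 0 = normal), and the toy data are assumptions made for illustration only.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random anomalous sample
    scores higher than a random normal one (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_max(labels, scores):
    """Best F1 over all decision thresholds; a sample is flagged
    as anomalous when its score is >= the threshold."""
    best = 0.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < t)
        if tp == 0:
            continue  # F1 is zero at this threshold
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Hypothetical toy data: two normal images followed by two anomalous ones.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores), f1_max(toy_labels, toy_scores))
```

AUROC is threshold-free, while F1-max reports the F1 score at the single best threshold, which is why the two are usually quoted together in anomaly detection results.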
Abstract
Logical image understanding involves interpreting and reasoning about the relationships and consistency within an image's visual content. This capability is essential in applications such as industrial inspection, where logical anomaly detection is critical for maintaining high-quality standards and minimizing costly recalls. Previous research in anomaly detection (AD) has relied on prior knowledge for designing algorithms, which often requires extensive manual annotations, significant computing power, and large amounts of data for training. Autoregressive, multimodal Vision Language Models (AVLMs) offer a promising alternative due to their exceptional performance in visual reasoning across various domains. Despite this, their application to logical AD remains unexplored. In this work, we investigate using AVLMs for logical AD and demonstrate that they are well-suited to the task.…