JUDO: A Juxtaposed Domain-Oriented Multimodal Reasoner for Industrial Anomaly QA

Hyunju Kang; Woohyun Lee; Jaewon Kim; Hogun Park

arXiv:2605.20284·cs.CV·May 21, 2026

TL;DR

JUDO is a multimodal reasoning framework that integrates domain knowledge and visual context to improve industrial anomaly detection and explanation, outperforming existing models on the MMAD benchmark.

Contribution

The paper introduces JUDO, a novel multimodal reasoning model that incorporates domain knowledge and visual comparison for enhanced industrial anomaly analysis.

Findings

01. JUDO surpasses models like Qwen2.5-VL-7B and GPT-4o on the MMAD benchmark.

02. Visual juxtaposition improves defect segmentation accuracy.

03. Domain knowledge injection enhances reasoning quality.

Abstract

Industrial anomaly detection has been significantly advanced by Large Multimodal Models (LMMs), enabling diverse human instructions beyond detection, particularly through visually grounded reasoning for better image understanding. However, LMMs lack domain-specific knowledge, which limits their ability to generate accurate responses in complex industrial scenarios. In this work, we present JUDO, Juxtaposed Domain-Oriented Multimodal Reasoner, a framework that efficiently incorporates domain knowledge and context in visual and textual reasoning. Through visual reasoning, our model segments the defect region by juxtaposing query images with normal images as visual domain context, enabling a fine-grained visual comparative inspection. Furthermore, we inject domain knowledge through supervised fine-tuning (SFT) to enhance context understanding and subsequently guide domain reasoning through…
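
As a rough illustration of the juxtaposition idea described in the abstract, the sketch below places a normal reference image next to a query image on one canvas and flags pixels that differ. It is not JUDO's method: the function names `juxtapose` and `difference_mask`, the `gap` and `threshold` parameters, and the use of plain pixel differencing in place of the model's learned segmentation are all assumptions made for this example.

```python
import numpy as np


def juxtapose(normal: np.ndarray, query: np.ndarray, gap: int = 8) -> np.ndarray:
    """Place a normal reference image and a query image side by side.

    Hypothetical helper: both inputs are H x W x C arrays of the same shape.
    """
    if normal.shape != query.shape:
        raise ValueError("normal and query images must share the same shape")
    h, w, c = normal.shape
    # Blank canvas wide enough for both images plus a separating gap.
    canvas = np.zeros((h, 2 * w + gap, c), dtype=normal.dtype)
    canvas[:, :w] = normal
    canvas[:, w + gap:] = query
    return canvas


def difference_mask(normal: np.ndarray, query: np.ndarray, threshold: int = 30) -> np.ndarray:
    """Naive stand-in for comparative inspection: per-pixel absolute difference.

    Assumption: a learned segmenter would replace this simple thresholding.
    """
    # Cast to a signed type so the subtraction cannot wrap around for uint8 input.
    diff = np.abs(normal.astype(np.int16) - query.astype(np.int16)).max(axis=-1)
    return diff > threshold


if __name__ == "__main__":
    normal = np.full((4, 4, 3), 100, dtype=np.uint8)
    query = normal.copy()
    query[1, 2] = 200  # simulate a single defective pixel
    print(juxtapose(normal, query, gap=2).shape)
    print(np.argwhere(difference_mask(normal, query)))
```

The side-by-side canvas is the kind of input a multimodal model could be shown so that the normal sample acts as visual context; the difference mask only marks where the two images disagree and says nothing about defect type.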

Videos

JUDO: A Juxtaposed Domain-Oriented Multimodal Reasoner for Industrial Anomaly QA· slideslive