EAGLE: Expert-Augmented Attention Guidance for Tuning-Free Industrial Anomaly Detection in Multimodal Large Language Models
Xiaomeng Peng, Xilang Huang, Seon Han Choi

TL;DR
EAGLE is a tuning-free framework that enhances multimodal large language models for industrial anomaly detection by integrating expert detectors and guiding attention without model fine-tuning.
Contribution
EAGLE introduces a tuning-free approach that pairs expert anomaly detectors with frozen MLLMs, improving binary detection accuracy while preserving the MLLM's semantic reasoning.
Findings
EAGLE improves anomaly detection accuracy to as high as 94.4% on MVTec-AD.
EAGLE enhances alignment between MLLM attention and ground-truth defect regions.
The framework performs well across five MLLM backbones without parameter updates.
Abstract
Multimodal large language models (MLLMs) can enrich industrial anomaly detection with semantic descriptions and anomaly reasoning, but they still lag specialist anomaly detectors in binary detection accuracy. Existing approaches address this gap by fine-tuning MLLMs or training bridging modules to align expert outputs with MLLM inputs, limiting flexibility across backbones. We propose EAGLE, a tuning-free framework that integrates expert anomaly detectors with frozen MLLMs. EAGLE consists of Threshold-Guided Prompt Selection (TGPS), which estimates a decision threshold from expert model statistics and selects textual and visual prompts, and Confidence-Aware Attention Sharpening (CAAS), which shifts MLLM attention toward visual evidence when expert confidence is low. Beyond improving accuracy, we analyze MLLM attention and find that correct anomaly predictions are associated with…
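The abstract describes TGPS and CAAS only at a high level, so the following is a minimal sketch of the two ideas rather than the authors' implementation. The mean-plus-k-sigma threshold rule, the exponential confidence mapping, the additive bias on visual-token logits, the prompt wording, and every function and parameter name here are assumptions made for illustration.

```python
import math
from statistics import mean, stdev


def estimate_threshold(normal_scores, k=3.0):
    """Assumed rule: mean + k * std of expert anomaly scores on normal images."""
    return mean(normal_scores) + k * stdev(normal_scores)


def select_prompt(score, threshold):
    """Pick a textual hint from the expert's verdict (wording is illustrative)."""
    if score >= threshold:
        return "An expert detector flags this image as likely anomalous."
    return "An expert detector finds no strong evidence of a defect."


def expert_confidence(score, threshold, scale):
    """Map distance from the threshold into [0, 1); scores near it give low confidence."""
    return 1.0 - math.exp(-abs(score - threshold) / scale)


def sharpen_attention(logits, is_visual, confidence, max_boost=2.0):
    """Bias visual-token logits more strongly as confidence falls, then apply softmax."""
    boost = max_boost * (1.0 - confidence)
    shifted = [l + boost if v else l for l, v in zip(logits, is_visual)]
    peak = max(shifted)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]
```

In this sketch, a score that sits exactly on the threshold yields zero confidence and the full bias toward visual tokens, while a score far from the threshold leaves the attention distribution essentially unchanged. No model parameters are touched, which is the sense in which such a scheme would be tuning-free.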
