Architecture-Aware Explanation Auditing for Industrial Visual Inspection
Sibo Jia, Zihang Zhao, Kunrong Li

TL;DR
This paper introduces an architecture-aware explanation audit protocol for industrial visual inspection models, revealing how explanation faithfulness depends on model structure and guiding better explanation design.
Contribution
It formalizes an explanation auditing method grounded in model architecture, demonstrating the importance of readout structure for faithful explanations in industrial vision models.
Findings
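ViT-Tiny + Attention Rollout attains a Deletion AUC of 0.211, against 0.432-0.525 for Swin-Tiny, ResNet18+CBAM, and DenseNet121 with Grad-CAM, despite ViT-Tiny's lower classification accuracy.
Although Swin-Tiny is a Transformer, its spatial feature-map hierarchy makes it compatible with Grad-CAM, which separates the effect of readout structure from that of architecture family.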
Native explanation methods are less faithful than model-agnostic approaches like RISE.
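The Deletion AUC figures in these findings can be illustrated with a minimal sketch. It assumes the common definition of the metric: pixels are removed in descending saliency order, each removed pixel is replaced by zero (the zero-fill perturbation named in the abstract), the target-class score is recorded after each step, and the area under that curve is reported, so a lower value means the heatmap found pixels the model actually relies on. The function name `deletion_auc`, the `predict_fn` interface, the toy classifier, and the step count are all hypothetical and are not the authors' protocol.

```python
import numpy as np

def deletion_auc(image, saliency, predict_fn, target, n_steps=4):
    """Area under the class-score curve as the most salient pixels are zeroed."""
    h, w = saliency.shape
    order = np.argsort(saliency.ravel())[::-1]      # most salient pixels first
    perturbed = image.astype(float).copy()
    flat = perturbed.reshape(h * w, -1)             # view: edits hit `perturbed`
    scores = [predict_fn(perturbed)[target]]        # score before any deletion
    fracs = [0.0]
    removed = 0
    for chunk in np.array_split(order, n_steps):
        flat[chunk] = 0.0                           # zero-fill perturbation
        removed += len(chunk)
        scores.append(predict_fn(perturbed)[target])
        fracs.append(removed / (h * w))
    scores, fracs = np.asarray(scores), np.asarray(fracs)
    # Trapezoidal rule over the fraction of pixels removed.
    return float(np.sum((scores[1:] + scores[:-1]) / 2 * np.diff(fracs)))

# Hypothetical toy classifier: the class-0 score is the mean of the top-left quadrant.
def toy_predict(img):
    p = float(img[:4, :4].mean())
    return np.array([p, 1.0 - p])

image = np.ones((8, 8))
faithful = np.zeros((8, 8))
faithful[:4, :4] = 1.0        # heatmap marks the pixels the toy model uses
unfaithful = np.zeros((8, 8))
unfaithful[4:, 4:] = 1.0      # heatmap marks pixels the toy model ignores

auc_faithful = deletion_auc(image, faithful, toy_predict, target=0)
auc_unfaithful = deletion_auc(image, unfaithful, toy_predict, target=0)
print(auc_faithful, auc_unfaithful)
```

In this toy setting the heatmap that marks the pixels the classifier reads yields the smaller area, because the score collapses after the first deletion step; the heatmap that marks irrelevant pixels leaves the score intact for longer.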
Abstract
Industrial visual inspection systems increasingly rely on deep classifiers whose heatmap explanations may appear visually plausible while failing to identify the image regions that actually drive model decisions. This paper operationalizes an architecture-aware explanation audit protocol grounded in the native-readout hypothesis: the perturbation-based faithfulness of an explanation method is bounded by its structural distance from the model's native decision mechanism. On WM-811K wafer maps (9 classes, 172k images) under a three-seed zero-fill perturbation protocol, ViT-Tiny + Attention Rollout attains Deletion AUC 0.211 against 0.432-0.525 for Swin-Tiny / ResNet18+CBAM / DenseNet121 + Grad-CAM (|Cohen's d| > 1.1), despite lower classification accuracy. Swin-Tiny disentangles architecture family from readout structure: despite being a Transformer, its spatial feature-map hierarchy…
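The effect size quoted in the abstract can be computed along the following lines. This is a sketch that assumes the pooled-standard-deviation form of Cohen's d for two independent groups; the abstract does not say which variant the authors used, and the input values below are placeholders chosen for easy arithmetic, not results from the paper.

```python
import numpy as np

def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    na, nb = len(a), len(b)
    # Pooled variance from the unbiased (ddof=1) sample variances.
    pooled_var = ((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))

# Placeholder per-seed scores for two hypothetical model/explainer pairs.
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 3.0, 4.0]
d = cohens_d(group_a, group_b)
print(abs(d))
```

With only three seeds per group the pooled variance rests on four degrees of freedom, so an effect size computed this way is best read as a rough indicator of separation rather than a precise estimate.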
