Architecture-Aware Explanation Auditing for Industrial Visual Inspection

Sibo Jia; Zihang Zhao; Kunrong Li

arXiv:2605.14255 · cs.LG · May 19, 2026

TL;DR

This paper introduces an architecture-aware explanation audit protocol for industrial visual inspection models, revealing how explanation faithfulness depends on model structure and guiding better explanation design.

Contribution

It formalizes an explanation auditing method grounded in model architecture, demonstrating the importance of readout structure for faithful explanations in industrial vision models.

Findings

01

ViT-Tiny + Attention Rollout attains a lower Deletion AUC (0.211) than Swin-Tiny, ResNet18+CBAM, and DenseNet121 with Grad-CAM (0.432-0.525), despite lower classification accuracy.

02

Although Swin-Tiny is a Transformer, its spatial feature-map hierarchy makes it compatible with Grad-CAM, which separates the effect of readout structure from that of architecture family.

03

Native explanation methods are less faithful than model-agnostic approaches like RISE.
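
Finding 01 refers to Attention Rollout. As a rough illustration, the sketch below follows the commonly used formulation: average the attention heads in each layer, mix in the identity matrix to account for residual connections, renormalize the rows, and multiply the per-layer matrices together. The function name attention_rollout, the 0.5/0.5 residual weighting, and the assumption that token 0 is the class token are illustrative choices, not details confirmed by the paper.

```python
import numpy as np

def attention_rollout(attentions):
    """Roll attention out across layers (illustrative sketch).

    attentions: list with one array per layer, each of shape
    (heads, tokens, tokens), where every row sums to 1.
    Returns the class-token relevance over the remaining tokens,
    assuming token 0 is the class token.
    """
    n = attentions[0].shape[-1]
    rollout = np.eye(n)
    for layer_attn in attentions:
        a = layer_attn.mean(axis=0)            # average over heads
        a = 0.5 * a + 0.5 * np.eye(n)          # model the residual connection
        a = a / a.sum(axis=-1, keepdims=True)  # keep rows normalized
        rollout = a @ rollout                  # compose with earlier layers
    return rollout[0, 1:]                      # class-token row, patch tokens only
```

For a ViT, the returned vector would typically be reshaped to the patch grid and upsampled to image size before being used as a heatmap.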

Abstract

Industrial visual inspection systems increasingly rely on deep classifiers whose heatmap explanations may appear visually plausible while failing to identify the image regions that actually drive model decisions. This paper operationalizes an architecture-aware explanation audit protocol grounded in the native-readout hypothesis: the perturbation-based faithfulness of an explanation method is bounded by its structural distance from the model's native decision mechanism. On WM-811K wafer maps (9 classes, 172k images) under a three-seed zero-fill perturbation protocol, ViT-Tiny + Attention Rollout attains Deletion AUC 0.211 against 0.432-0.525 for Swin-Tiny / ResNet18+CBAM / DenseNet121 + Grad-CAM (|Cohen's d| > 1.1), despite lower classification accuracy. Swin-Tiny disentangles architecture family from readout structure: despite being a Transformer, its spatial feature-map hierarchy…
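
The Deletion AUC figures above can be made concrete with a minimal sketch, assuming the usual deletion protocol: pixels are zero-filled in order of decreasing saliency, and the target-class score is integrated over the fraction of pixels removed, so a lower value indicates a more faithful heatmap. The function name deletion_auc, the model_fn callable, and the steps parameter are illustrative assumptions rather than the authors' code; the paper's three-seed protocol and exact step schedule are not reproduced here.

```python
import numpy as np

def deletion_auc(model_fn, image, saliency, target, steps=20):
    """Deletion AUC with zero-fill perturbation (illustrative sketch).

    model_fn: hypothetical callable mapping an image array to class scores.
    image:    array of shape (H, W) or (H, W, C).
    saliency: array of shape (H, W); higher means more important.
    target:   index of the class whose score is tracked.
    """
    n = saliency.size
    # Pixel indices sorted from most to least salient.
    order = np.argsort(-saliency.ravel(), kind="stable")
    perturbed = np.array(image, dtype=np.float64)  # working copy
    flat = perturbed.reshape(n, -1)                # view: pixels x channels
    scores = [float(model_fn(perturbed)[target])]
    fractions = [0.0]
    chunk = int(np.ceil(n / steps))
    for start in range(0, n, chunk):
        flat[order[start:start + chunk]] = 0.0     # zero-fill the next chunk
        scores.append(float(model_fn(perturbed)[target]))
        fractions.append(min(start + chunk, n) / n)
    scores = np.asarray(scores)
    fractions = np.asarray(fractions)
    # Trapezoidal area under the score-vs-fraction-removed curve.
    return float(np.sum(0.5 * (scores[1:] + scores[:-1]) * np.diff(fractions)))
```

On a toy model whose score depends on a single pixel, a heatmap that ranks that pixel first yields a lower value than one that ranks it last, which is the behavior the metric is meant to reward.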
