AGOP as Explanation: From Feature Learning to Per-Sample Attribution in Image Classifiers
Raj Kiran Gupta Katakam

TL;DR
This paper introduces AGOP-Weighted, a novel attribution method based on the Average Gradient Outer Product (AGOP) that improves explanation accuracy for image classifiers. A companion variant, AGOP-Global, has zero inference cost.
Contribution
The paper proposes AGOP-Weighted and two companion variants, AGOP-Local and AGOP-Global, and reports better attribution performance than existing methods such as Integrated Gradients (IG) on benchmark datasets.
Findings
AGOP-Weighted achieves 44% higher mIoU than IG on synthetic tasks.
AGOP-Global achieves 7x higher mIoU than IG on multiplicative tasks at zero inference cost.
Both AGOP variants outperform GradCAM and VanillaGrad in benchmark tests.
Abstract
The Average Gradient Outer Product (AGOP) governs feature learning in neural networks: the Neural Feature Ansatz states that weight Gram matrices at each layer align with the corresponding AGOP matrices computed over the training distribution. We ask a complementary question: can this same quantity serve as a post-hoc attribution method for explaining individual predictions? We introduce AGOP-Weighted: a novel attribution method that multiplies the per-sample gradient by sqrt(diag(M) / max diag(M)), a training-distribution prior that suppresses gradient noise and amplifies consistently important pixels -- a combination not present in any prior attribution method. We formalise two companion variants -- AGOP-Local (per-sample gradient, equivalent to VanillaGrad) and AGOP-Global (diag(M) directly as a zero-cost saliency map) -- and implement an efficient training-time accumulation hook;…
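The three attribution maps described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the model is a toy scalar function f(x) = tanh(w . x) with an analytic input gradient, M is taken to be the mean of input-gradient outer products over the training set (so diag(M) is the mean squared gradient per feature), and the names `toy_model_grad`, `DiagAGOPAccumulator`, `agop_prior`, and `agop_weighted` are hypothetical. The accumulator only gestures at the training-time accumulation hook mentioned in the abstract; the real hook's interface is not given in the text above.

```python
import numpy as np


def toy_model_grad(x, w):
    """Input gradient of the toy scalar model f(x) = tanh(w . x) (assumed model)."""
    return (1.0 - np.tanh(x @ w) ** 2) * w


class DiagAGOPAccumulator:
    """Running estimate of diag(M), fed one batch of per-sample gradients at a time.

    Hypothetical helper: it keeps a sum of squared gradients and a sample count,
    so diag(M) is available without storing the full d x d matrix.
    """

    def __init__(self, dim):
        self.sum_sq = np.zeros(dim)
        self.count = 0

    def update(self, batch_grads):
        # batch_grads has shape (batch, dim); diag(g g^T) is simply g ** 2.
        self.sum_sq += np.sum(batch_grads ** 2, axis=0)
        self.count += batch_grads.shape[0]

    def value(self):
        return self.sum_sq / self.count


def agop_prior(diag_m):
    """Training-distribution prior sqrt(diag(M) / max diag(M)), with values in [0, 1]."""
    return np.sqrt(diag_m / diag_m.max())


def agop_weighted(grad, diag_m):
    """AGOP-Weighted: per-sample gradient multiplied elementwise by the prior."""
    return grad * agop_prior(diag_m)


rng = np.random.default_rng(0)
w = np.array([2.0, -1.0, 0.0, 0.5])      # feature 2 is irrelevant to the toy model
train_x = rng.normal(size=(256, 4))

# Accumulate diag(M) over mini-batches, as a training loop might.
acc = DiagAGOPAccumulator(dim=4)
for start in range(0, len(train_x), 32):
    batch = train_x[start:start + 32]
    acc.update(np.stack([toy_model_grad(x, w) for x in batch]))
diag_m = acc.value()

x = rng.normal(size=4)
local_map = toy_model_grad(x, w)          # AGOP-Local (equivalent to VanillaGrad)
global_map = diag_m                       # AGOP-Global: no per-sample computation
weighted_map = agop_weighted(local_map, diag_m)
```

In this toy setting the prior is proportional to |w|, so the irrelevant feature receives zero attribution in both the global and the weighted map. For an image classifier, the per-sample gradients would instead come from automatic differentiation, and how the result is reduced over channels or made non-negative is left open here.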
