An interpretation of the final fully connected layer
Siddhartha

TL;DR
This paper proposes a method for interpreting the learnt weights of the final fully connected layer in image classification networks. It motivates the method by relating the cross-entropy supervised learning objective to the policy gradient objective in reinforcement learning, and uses it to find the most discriminative and confusing parts of an image.
Contribution
The paper proposes an interpretation technique that makes no prior assumption about the network architecture and has low computational cost. It is motivated by viewing the cross-entropy supervised learning objective as a special case of the policy gradient objective.
Findings
Identifies the most discriminative and confusing parts of an image
Applies to publicly available pre-trained models
Offers a way to understand the learnt weights of the final fully connected layer
Abstract
In recent years, neural networks have achieved state-of-the-art accuracy on various tasks, but interpreting the outputs they generate remains difficult. In this work we attempt to provide a method for understanding the learnt weights of the final fully connected layer in image classification models. We motivate our method by drawing a connection between the policy gradient objective in RL and the supervised learning objective. We suggest that the commonly used cross-entropy-based supervised learning objective can be regarded as a special case of the policy gradient objective. Using this insight, we propose a method to find the most discriminative and confusing parts of an image. Our method does not make any prior assumption about the neural network architecture and has low computational cost. We apply our method to publicly available pre-trained models and report the generated results.
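The connection between cross-entropy and policy gradient mentioned in the abstract can be illustrated numerically. The sketch below is not taken from the paper. It assumes one common way of stating the link: treat the softmax output as a policy over class "actions", always take the ground-truth label as the action, and assign it a reward of 1. Under those assumptions, the REINFORCE-style gradient with respect to the logits is exactly the negative of the cross-entropy gradient. The function names and example values are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D vector of logits.
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def cross_entropy_grad(logits, label):
    # Gradient of -log softmax(logits)[label] with respect to the logits.
    g = softmax(logits)
    g[label] -= 1.0
    return g

def reinforce_grad(logits, action, reward):
    # Gradient of reward * log softmax(logits)[action] with respect to the logits.
    p = softmax(logits)
    onehot = np.zeros_like(p)
    onehot[action] = 1.0
    return reward * (onehot - p)

# Hypothetical logits for a 3-class problem; class 0 is the true label.
logits = np.array([2.0, -1.0, 0.5])
label = 0

ce = cross_entropy_grad(logits, label)
# Assumption: the "action" is always the true label and its reward is 1.
pg = reinforce_grad(logits, action=label, reward=1.0)

# Minimizing cross-entropy is gradient ascent on the reward-weighted log-likelihood.
grads_match = np.allclose(ce, -pg)
print(grads_match)
```

With any other reward value the two gradients differ only by that scalar factor, which is one way to read "special case" here; the paper's own formulation may differ in detail.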
Taxonomy
Topics: Explainable Artificial Intelligence (XAI)
