Explaining, Evaluating and Enhancing Neural Networks' Learned Representations
Marco Bertolini, Djork-Arné Clevert, Floriane Montanari

TL;DR
This paper introduces an explainability framework for neural networks trained without a specific downstream task. It proposes an aggregation method for attribution maps and two evaluation scores that, used as constraints, improve representation quality and downstream performance.
Contribution
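The paper proposes an aggregation method that generalizes attribution maps between any two (convolutional) layers of a neural network, and uses these attributions to define two scores for evaluating the informativeness and disentanglement of learned representations.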
Findings
The proposed scores correlate with desired properties of representations.
Adopting the scores as constraints improves downstream task performance.
Saliency strategies can be independent of model parameters.
Abstract
Most efforts in interpretability in deep learning have focused on (1) extracting explanations of a specific downstream task in relation to the input features and (2) imposing constraints on the model, often at the expense of predictive performance. New advances in (unsupervised) representation learning and transfer learning, however, raise the need for an explanatory framework for networks that are trained without a specific downstream task. We address these challenges by showing how explainability can be an aid, rather than an obstacle, towards better and more efficient representations. Specifically, we propose a natural aggregation method generalizing attribution maps between any two (convolutional) layers of a neural network. Additionally, we employ such attributions to define two novel scores for evaluating the informativeness and the disentanglement of latent embeddings. Extensive…
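To make the idea of aggregating attribution maps for a latent representation more concrete, here is a minimal sketch in plain Python. It is not the paper's method: the toy `tanh` encoder, the gradient-times-input attribution, and the sum-of-absolute-values aggregation are all assumptions chosen for illustration, and the names `encode`, `unit_attributions`, and `aggregate` are hypothetical.

```python
import math

def encode(W, x):
    """Toy encoder (assumption): z_j = tanh(sum_i W[j][i] * x[i])."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def unit_attributions(W, x):
    """Gradient-times-input attribution of each latent unit w.r.t. each input feature.

    For z_j = tanh(W[j] . x), dz_j/dx_i = (1 - z_j^2) * W[j][i].
    """
    z = encode(W, x)
    return [[(1.0 - zj * zj) * w * xi for w, xi in zip(row, x)]
            for row, zj in zip(W, z)]

def aggregate(maps):
    """Sum absolute per-unit maps over latent units, then normalize to sum to 1.

    This aggregation rule is an illustrative choice, not the paper's definition.
    """
    n = len(maps[0])
    total = [sum(abs(m[i]) for m in maps) for i in range(n)]
    s = sum(total)
    return [t / s for t in total] if s > 0 else total

# Two latent units, three input features; feature 1 is ignored by the encoder.
W = [[1.0, 0.0, 0.5],
     [0.0, 0.0, 2.0]]
x = [0.5, 3.0, 0.1]
saliency = aggregate(unit_attributions(W, x))
```

In this toy setup, the aggregated map gives zero weight to the input feature that no latent unit depends on, which is the kind of task-free, representation-level explanation the abstract describes.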
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Adversarial Robustness in Machine Learning
