TL;DR
This paper introduces concept saliency maps for visualizing relevant features in the latent space of deep generative models like VAEs, enhancing interpretability in both image and biological data analysis.
Contribution
It proposes a novel method to generate saliency maps for high-level concepts in unsupervised generative models, extending interpretability tools beyond classification tasks.
Findings
Concept saliency maps highlight the input features that matter most for a given high-level concept.
Applied to the CelebA dataset, the maps show which input features correspond to facial attributes.
Applied to spatial transcriptomics data, the maps demonstrate biological interpretability.
Abstract
Evaluating, explaining, and visualizing high-level concepts in generative models, such as variational autoencoders (VAEs), is challenging in part due to a lack of known prediction classes that are required to generate saliency maps in supervised learning. While saliency maps may help identify relevant features (e.g., pixels) in the input for classification tasks of deep neural networks, similar frameworks are understudied in unsupervised learning. Therefore, we introduce a new method of obtaining saliency maps for latent representations of known or novel high-level concepts, often called concept vectors in generative models. Concept scores, analogous to class scores in classification tasks, are defined as dot products between concept vectors and encoded input data, which can be readily used to compute the gradients. The resulting concept saliency maps are shown to highlight input…
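The concept score described in the abstract can be illustrated with a minimal sketch. It assumes a toy deterministic encoder `z = tanh(Wx + b)` standing in for a VAE encoder, a concept vector `v` that is simply given, and a saliency map taken as the absolute gradient of the score with respect to the input. The names `encode`, `concept_score`, and `concept_saliency`, the toy weights, and the choice of the absolute value are all assumptions made for illustration and are not taken from the paper.

```python
import math

def encode(x, W, b):
    """Toy encoder (assumption): z = tanh(W x + b), standing in for a VAE encoder."""
    return [math.tanh(sum(w * xj for w, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def concept_score(x, W, b, v):
    """Concept score: dot product between the concept vector v and the encoded input."""
    z = encode(x, W, b)
    return sum(vi * zi for vi, zi in zip(v, z))

def concept_saliency(x, W, b, v):
    """Absolute gradient of the concept score with respect to each input feature."""
    z = encode(x, W, b)
    # Chain rule for the toy encoder: d score / d x_j = sum_i v_i * (1 - z_i^2) * W[i][j]
    grad = [sum(vi * (1.0 - zi * zi) * row[j] for vi, zi, row in zip(v, z, W))
            for j in range(len(x))]
    return [abs(g) for g in grad]

# Hypothetical toy values: 3 input features, 2 latent dimensions.
W = [[0.5, -1.0, 0.0],
     [0.2, 0.3, -0.7]]
b = [0.1, -0.2]
v = [1.0, -0.5]          # concept vector in latent space (assumed given)
x = [0.3, 0.8, -0.4]     # one input example

saliency = concept_saliency(x, W, b, v)
print(saliency)          # one non-negative value per input feature
```

In practice the gradient would come from an automatic differentiation framework applied to the trained encoder. The hand-derived chain rule here only keeps the sketch dependency-free.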
Taxonomy
Methods, Interpretability
