Label-Free Explainability for Unsupervised Models
Jonathan Crabbé, Mihaela van der Schaar

TL;DR
This paper introduces label-free explainability methods for unsupervised models, enabling interpretation without labels by highlighting influential features and training examples, demonstrated on autoencoders.
Contribution
It extends existing explainability techniques to unsupervised settings by proposing label-free importance measures for features and examples.
Findings
The label-free importance measures identify influential features and training examples in unsupervised models.
They can be applied as wrappers around existing importance methods.
They improve the interpretability of autoencoder representations.
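To make the wrapper idea concrete, here is a minimal Python sketch. It rests on an assumption that this summary does not spell out: the representation vector is reduced to a scalar by taking its inner product with the representation of the input being explained, and an ordinary scalar-output attribution method is then applied to that scalar. The names `label_free_wrapper`, `finite_difference_saliency`, and `toy_encoder` are hypothetical, and the finite-difference gradient is only a stand-in for whichever existing importance method would be wrapped.

```python
import numpy as np

def label_free_wrapper(encoder, attribution):
    """Turn a scalar-output attribution method into a label-free one (sketch)."""
    def explain(x):
        # Representation of the input being explained, held fixed.
        anchor = encoder(x)

        def scalar_proxy(x_tilde):
            # Assumed scalar surrogate: inner product with the fixed anchor.
            return float(np.dot(anchor, encoder(x_tilde)))

        return attribution(scalar_proxy, x)
    return explain

def finite_difference_saliency(scalar_fn, x, eps=1e-5):
    """Stand-in attribution method: central-difference gradient of scalar_fn at x."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = eps
        grad[i] = (scalar_fn(x + step) - scalar_fn(x - step)) / (2 * eps)
    return grad

# Hypothetical toy encoder: a linear map that ignores the third input feature.
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])

def toy_encoder(x):
    return W @ x

explain = label_free_wrapper(toy_encoder, finite_difference_saliency)
x = np.array([1.0, 1.0, 1.0])
scores = explain(x)  # one importance score per input feature
```

In this toy setup the third feature never reaches the representation, so its score is zero, while the second feature, which the encoder scales by 2, receives the largest score. No label or chosen output component is needed at any point.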
Abstract
Unsupervised black-box models are challenging to interpret. Indeed, most existing explainability methods require labels to select which component(s) of the black-box's output to interpret. In the absence of labels, black-box outputs often are representation vectors whose components do not correspond to any meaningful quantity. Hence, choosing which component(s) to interpret in a label-free unsupervised/self-supervised setting is an important, yet unsolved problem. To bridge this gap in the literature, we introduce two crucial extensions of post-hoc explanation techniques: (1) label-free feature importance and (2) label-free example importance that respectively highlight influential features and training examples for a black-box to construct representations at inference time. We demonstrate that our extensions can be successfully implemented as simple wrappers around many existing…
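As an illustration of what label-free example importance can look like, the sketch below ranks training examples by how close their representations are to the representation of a test input. This is a simple similarity-based stand-in chosen for illustration, not necessarily one of the methods the paper wraps; `representation_similarity_scores`, `toy_encoder`, and the data are hypothetical.

```python
import numpy as np

def representation_similarity_scores(encoder, train_examples, x):
    """Cosine similarity between the representation of x and each training example."""
    train_reps = np.stack([encoder(t) for t in train_examples])
    test_rep = encoder(x)
    norms = np.linalg.norm(train_reps, axis=1) * np.linalg.norm(test_rep)
    # Guard against division by zero for all-zero representations.
    return train_reps @ test_rep / np.maximum(norms, 1e-12)

# Hypothetical toy encoder: a linear map that ignores the third input feature.
W = np.array([[1.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])

def toy_encoder(x):
    return W @ x

train_examples = np.array([[1.0, 0.0, 5.0],
                           [0.0, 1.0, -3.0],
                           [1.0, 0.5, 0.0]])
x = np.array([2.0, 0.0, 9.0])

scores = representation_similarity_scores(toy_encoder, train_examples, x)
ranking = np.argsort(-scores)  # training indices, most similar first
```

The scores depend only on the encoder's representations, so the ranking reflects what the model treats as similar rather than raw input distance: the third input feature differs widely across these examples but has no effect on the result.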
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare · Machine Learning and Data Classification
