Concept Probing: Where to Find Human-Defined Concepts (Extended Version)
Manuel de Sousa Ribeiro, Afonso Leote, João Leite

TL;DR
This paper introduces a method to automatically identify the most informative neural network layer for probing human-defined concepts, enhancing interpretability of model internal representations.
Contribution
It proposes an automated approach to select the optimal layer for concept probing based on informativeness and regularity, validated through extensive empirical analysis.
Findings
The method effectively identifies the best layer for concept probing.
Probing results vary significantly across layers and models.
Automated layer selection improves interpretability of neural networks.
Abstract
Concept probing has recently gained popularity as a way for humans to peek into what is encoded within artificial neural networks. In concept probing, additional classifiers are trained to map the internal representations of a model into human-defined concepts of interest. However, the performance of these probes is highly dependent on the internal representations they probe from, making identifying the appropriate layer to probe an essential task. In this paper, we propose a method to automatically identify which layer's representations in a neural network model should be considered when probing for a given human-defined concept of interest, based on how informative and regular the representations are with respect to the concept. We validate our findings through an exhaustive empirical analysis over different neural network models and datasets.
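To make the layer-selection problem concrete, here is a minimal sketch of probing several layers and comparing the results. It is not the authors' method: the abstract describes a criterion based on how informative and regular the representations are, whereas this sketch simply uses held-out probe accuracy as a stand-in. The activations are synthetic, and the names `train_linear_probe`, `probe_accuracy`, `select_layer`, and the `layer1`..`layer3` keys are hypothetical, introduced only for illustration.

```python
import numpy as np


def train_linear_probe(X, y, lr=0.1, steps=500):
    """Fit a logistic-regression probe with plain gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(concept = 1)
        grad = p - y                             # gradient of the log loss w.r.t. the logits
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def probe_accuracy(X, y, w, b):
    """Fraction of samples whose concept label the probe predicts correctly."""
    predictions = ((X @ w + b) > 0).astype(int)
    return float((predictions == y).mean())


def select_layer(activations, y, train_frac=0.7):
    """Train one probe per layer and return the layer with the best held-out accuracy.

    Assumption: accuracy is used here as a simple proxy for how well a layer
    encodes the concept; the paper's own selection criterion is different.
    """
    n_train = int(train_frac * len(y))
    scores = {}
    for name, X in activations.items():
        w, b = train_linear_probe(X[:n_train], y[:n_train])
        scores[name] = probe_accuracy(X[n_train:], y[n_train:], w, b)
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic stand-in for per-layer activations (not real model data):
# the concept label shifts one coordinate, more strongly in later layers.
rng = np.random.default_rng(0)
n_samples, n_features = 400, 16
y = rng.integers(0, 2, size=n_samples)
signal_strength = {"layer1": 0.0, "layer2": 0.5, "layer3": 3.0}

activations = {}
for name, strength in signal_strength.items():
    X = rng.normal(size=(n_samples, n_features))
    X[:, 0] += strength * (2 * y - 1)  # inject the concept signal into feature 0
    activations[name] = X

best_layer, scores = select_layer(activations, y)
print(best_layer, scores)
```

In a real setting, `activations` would hold the recorded outputs of each layer of the model under study for a set of inputs labeled with the concept of interest. Training one probe per layer, as done here, is the exhaustive baseline that an automatic layer-selection method aims to avoid.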
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks · Domain Adaptation and Few-Shot Learning
