Concept Probing: Where to Find Human-Defined Concepts (Extended Version)

Manuel de Sousa Ribeiro; Afonso Leote; João Leite

arXiv:2507.18681·cs.LG·July 28, 2025

Open Access

TL;DR

This paper introduces a method that automatically identifies the most informative neural network layer to probe for a given human-defined concept, making the model's internal representations easier to interpret.

Contribution

It proposes an automated approach to select the optimal layer for concept probing based on informativeness and regularity, validated through extensive empirical analysis.

Findings

01 The method effectively identifies the best layer for concept probing.

02 Probing results vary significantly across layers and models.

03 Automated layer selection improves interpretability of neural networks.

Abstract

Concept probing has recently gained popularity as a way for humans to peek into what is encoded within artificial neural networks. In concept probing, additional classifiers are trained to map the internal representations of a model into human-defined concepts of interest. However, the performance of these probes is highly dependent on the internal representations they probe from, making identifying the appropriate layer to probe an essential task. In this paper, we propose a method to automatically identify which layer's representations in a neural network model should be considered when probing for a given human-defined concept of interest, based on how informative and regular the representations are with respect to the concept. We validate our findings through an exhaustive empirical analysis over different neural network models and datasets.
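
The probing setup described in the abstract can be illustrated with a minimal sketch. This is not the authors' layer-selection method: their criterion rests on how informative and regular the representations are, and this page does not specify it. The sketch simply trains one linear probe per layer on synthetic activations and compares held-out accuracy as a naive stand-in. All names (`train_linear_probe`, `make_synthetic_activations`, `layer_a`, `layer_b`) and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def train_linear_probe(X, y, lr=0.1, epochs=200):
    """Fit a logistic-regression probe with plain gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted concept probability
        grad = p - y                             # gradient of the log loss w.r.t. the logit
        w -= lr * (X.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b


def probe_accuracy(X, y, w, b):
    """Fraction of samples whose concept label the probe predicts correctly."""
    return float((((X @ w + b) > 0).astype(int) == y).mean())


def make_synthetic_activations(n=400, dim=16, seed=0):
    """Hypothetical stand-in for real activations: one layer carries no
    concept signal, the other encodes the concept along a random direction."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)               # binary concept labels
    noise_layer = rng.normal(size=(n, dim))      # unrelated to the concept
    direction = rng.normal(size=dim)
    signal_layer = rng.normal(size=(n, dim)) + np.outer(2 * y - 1, direction)
    return {"layer_a": noise_layer, "layer_b": signal_layer}, y


def probe_all_layers(activations, y, train_frac=0.5):
    """Train one probe per layer and score it on the held-out samples."""
    split = int(len(y) * train_frac)
    scores = {}
    for name, X in activations.items():
        w, b = train_linear_probe(X[:split], y[:split])
        scores[name] = probe_accuracy(X[split:], y[split:], w, b)
    return scores


activations, concept = make_synthetic_activations()
scores = probe_all_layers(activations, concept)
best_layer = max(scores, key=scores.get)  # naive choice: highest held-out accuracy
print(scores, best_layer)
```

With real models, the synthetic arrays would be replaced by activations recorded at each layer for a labelled set of inputs. Training a probe at every layer in this way is the exhaustive baseline that an automated layer-selection method aims to avoid.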

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Advanced Graph Neural Networks · Domain Adaptation and Few-Shot Learning