TL;DR
ConceptTracer is an interactive tool that uses information-theoretic measures to analyze and interpret neural network representations by identifying neurons responsive to human-understandable concepts.
Contribution
It introduces a novel framework that combines two information-theoretic measures, concept saliency and concept selectivity, to explore how neural networks encode human-interpretable concepts, and demonstrates it on representations learned by TabPFN.
Findings
Facilitates the discovery of interpretable neurons, demonstrated on TabPFN representations
Enables analysis of how salient and how selective individual neurons are for each concept
Provides a practical, interactive framework for studying how concepts are encoded
Abstract
Neural networks deliver impressive predictive performance across a variety of tasks, but they are often opaque in their decision-making processes. Despite a growing interest in mechanistic interpretability, tools for systematically exploring the representations learned by neural networks in general, and tabular foundation models in particular, remain limited. In this work, we introduce ConceptTracer, an interactive application for analyzing neural representations through the lens of human-interpretable concepts. ConceptTracer integrates two information-theoretic measures that quantify concept saliency and selectivity, enabling researchers and practitioners to identify neurons that respond strongly to individual concepts. We demonstrate the utility of ConceptTracer on representations learned by TabPFN and show that our approach facilitates the discovery of interpretable neurons.…
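The abstract does not say how the saliency and selectivity measures are computed. As a rough illustration only, the sketch below assumes (1) a saliency score equal to a plug-in estimate of the mutual information, in bits, between one neuron's binned activations and a discrete concept label, and (2) a hypothetical selectivity score equal to the share of the neuron's summed mutual information (across all concepts) that falls on one target concept. The functions `mutual_information` and `selectivity`, the binning choice, and the synthetic data are assumptions for illustration, not ConceptTracer's actual measures or API.

```python
import numpy as np

# Illustrative assumption: these are NOT ConceptTracer's published definitions.


def mutual_information(activations, concept, n_bins=8):
    """Plug-in estimate of I(A; C) in bits between one neuron's activations A
    (discretized into equal-width bins) and a discrete concept label C."""
    a = np.asarray(activations, dtype=float)
    c = np.asarray(concept)
    if a.max() == a.min():
        return 0.0  # A constant neuron carries no information about C.
    # Map each activation to one of n_bins equal-width bins (indices 0..n_bins-1).
    edges = np.linspace(a.min(), a.max(), n_bins + 1)
    a_bins = np.digitize(a, edges[1:-1])
    # Map concept values to indices 0..k-1 and build the joint histogram.
    _, c_idx = np.unique(c, return_inverse=True)
    joint = np.zeros((n_bins, c_idx.max() + 1))
    np.add.at(joint, (a_bins, c_idx), 1)
    p_joint = joint / joint.sum()
    # Marginals, kept 2-D so that p_a * p_c broadcasts to the joint's shape.
    p_a = p_joint.sum(axis=1, keepdims=True)
    p_c = p_joint.sum(axis=0, keepdims=True)
    nz = p_joint > 0  # cells with zero probability contribute 0 (0 * log 0 := 0)
    return float(np.sum(p_joint[nz] * np.log2(p_joint[nz] / (p_a * p_c)[nz])))


def selectivity(activations, concepts, target, n_bins=8):
    """Hypothetical selectivity proxy: the share of the neuron's mutual
    information, summed over all concepts, that falls on the target concept."""
    scores = {name: mutual_information(activations, labels, n_bins)
              for name, labels in concepts.items()}
    total = sum(scores.values())
    return scores[target] / total if total > 0 else 0.0


# Synthetic usage: two independent binary concepts and one neuron that
# responds (noisily) to concept_a only.
rng = np.random.default_rng(0)
n = 2000
concepts = {
    "concept_a": rng.integers(0, 2, n),
    "concept_b": rng.integers(0, 2, n),
}
neuron = 2.0 * concepts["concept_a"] + rng.normal(0.0, 0.5, n)

print(f"MI with concept_a: {mutual_information(neuron, concepts['concept_a']):.3f} bits")
print(f"MI with concept_b: {mutual_information(neuron, concepts['concept_b']):.3f} bits")
print(f"selectivity for concept_a: {selectivity(neuron, concepts, 'concept_a'):.3f}")
```

In this toy setup the neuron should score high on both proxies for `concept_a` and near zero mutual information for `concept_b`. Plug-in estimates on binned activations are biased upward for small samples, so a real tool would likely need a more careful estimator.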