Concept-Guided Interpretability via Neural Chunking

Shuchen Wu; Stephan Alaniz; Shyamgopal Karthik; Peter Dayan; Eric Schulz; Zeynep Akata

arXiv:2505.11576 · cs.LG · October 23, 2025

TL;DR

This paper introduces methods for interpreting neural networks by segmenting their population activity into recurring chunks that reflect concepts, and shows that grafting these chunks changes model behavior predictably.

Contribution

The paper proposes three chunking methods for neural network interpretability that draw on the cognitive principle of chunking and on structure in the data to reveal concept representations.

Findings

01 Effective extraction of concept-encoding chunks across architectures

02 Grafting chunks influences model behavior predictably

03 Methods applicable with or without labeled data

Abstract

Neural networks are often described as black boxes, reflecting the significant challenge of understanding their internal workings and interactions. We propose a different perspective that challenges the prevailing view: rather than being inscrutable, neural networks exhibit patterns in their raw population activity that mirror regularities in the training data. We refer to this as the Reflection Hypothesis and provide evidence for this phenomenon in both simple recurrent neural networks (RNNs) and complex large language models (LLMs). Building on this insight, we propose to leverage our cognitive tendency of chunking to segment high-dimensional neural population dynamics into interpretable units that reflect underlying concepts. We propose three methods to extract recurring chunks on a neural population level, complementing each other based on label availability and neural data…
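
To make the idea of a recurring chunk in population activity concrete, here is a minimal sketch for the labeled case. It is an illustration under stated assumptions and not the paper's method: the template-averaging approach, the cosine-similarity threshold, the synthetic data, and the names `concept_template` and `detect_chunk` are all hypothetical.

```python
import numpy as np

def concept_template(activity, labels, concept):
    """Mean population activity over the time steps labeled with `concept`."""
    mask = labels == concept
    return activity[mask].mean(axis=0)

def detect_chunk(activity, template, threshold=0.9):
    """Boolean mask of time steps whose activity is cosine-similar to the template."""
    norms = np.linalg.norm(activity, axis=1) * np.linalg.norm(template)
    sims = activity @ template / np.maximum(norms, 1e-12)
    return sims >= threshold

# Synthetic stand-in for hidden states (an assumption, not real model data):
# 200 time steps, 16 units; when the concept is present (label 1),
# a fixed activity pattern is added on top of small noise.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
pattern = rng.normal(size=16)
activity = 0.1 * rng.normal(size=(200, 16)) + np.outer(labels, 5.0 * pattern)

template = concept_template(activity, labels, concept=1)
detected = detect_chunk(activity, template)

# Fraction of time steps where detection agrees with the ground-truth label.
accuracy = (detected == (labels == 1)).mean()
```

In this toy setting the template recovers the planted pattern, so thresholding the similarity separates concept-present from concept-absent time steps. Real hidden states are unlikely to be this clean, and the unlabeled case would need a different approach.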

Videos

Concept-Guided Interpretability via Neural Chunking · SlidesLive

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Multimodal Machine Learning Applications