Discovering Chunks in Neural Embeddings for Interpretability

Shuchen Wu; Stephan Alaniz; Eric Schulz; Zeynep Akata

arXiv:2502.01803 · cs.LG · February 5, 2025

Open Access

TL;DR

This paper introduces a framework for interpreting neural networks. Inspired by chunking in human cognition, it identifies and extracts recurring chunks in neural embeddings to make the network's internal representations easier to understand.

Contribution

The paper shows how to extract interpretable chunks from the neural embeddings of RNNs and large language models, offering a new approach to understanding neural population activity.

Findings

01. Hidden states reflect imposed regularities in RNNs.

02. Recurring embedding states correspond to concepts in LLMs.

03. Perturbations to embedding states influence concept activation.

Abstract

Understanding neural networks is challenging due to their high-dimensional, interacting components. Inspired by human cognition, which processes complex sensory data by chunking it into recurring entities, we propose leveraging this principle to interpret artificial neural population activities. Biological and artificial intelligence share the challenge of learning from structured, naturalistic data, and we hypothesize that the cognitive mechanism of chunking can provide insights into artificial systems. We first demonstrate this concept in recurrent neural networks (RNNs) trained on artificial sequences with imposed regularities, observing that their hidden states reflect these patterns, which can be extracted as a dictionary of chunks that influence network responses. Extending this to large language models (LLMs) like LLaMA, we identify similar recurring embedding states…
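
To make the idea of a "dictionary of chunks" concrete, here is a minimal toy sketch. It is not the authors' method: it assumes that a recurring state can be approximated by binarizing each hidden vector and counting how often the same on/off pattern reappears. The function names (`make_hidden_states`, `binarize`, `discover_chunks`), the prototype vectors, the noise level, and the `min_count` threshold are all hypothetical choices made for illustration, and the "hidden states" are synthetic rather than taken from a trained RNN or LLM.

```python
import random
from collections import Counter


def make_hidden_states(prototypes, sequence, noise=0.1, seed=0):
    """Synthetic stand-in for network activations (an assumption):
    each time step emits a noisy copy of the prototype for its symbol."""
    rng = random.Random(seed)
    return [
        [x + rng.gauss(0.0, noise) for x in prototypes[symbol]]
        for symbol in sequence
    ]


def binarize(state, threshold=0.0):
    """Reduce a real-valued state to an on/off pattern, one bit per unit."""
    return tuple(1 if x > threshold else 0 for x in state)


def discover_chunks(states, min_count=3):
    """Keep every binarized pattern that recurs at least min_count times."""
    counts = Counter(binarize(s) for s in states)
    return {pattern: n for pattern, n in counts.items() if n >= min_count}


# Hypothetical prototypes: one 4-unit activation pattern per input symbol.
prototypes = {
    "A": [1.0, -1.0, 1.0, -1.0],
    "B": [-1.0, 1.0, 1.0, -1.0],
    "C": [1.0, 1.0, -1.0, -1.0],
}
sequence = list("ABCABCABAB")  # A and B occur 4 times each, C twice

states = make_hidden_states(prototypes, sequence)
chunks = discover_chunks(states, min_count=3)
print(chunks)
```

In this toy run the patterns for A and B each recur four times and are kept, while the pattern for C recurs only twice and is dropped. A real analysis would replace the synthetic states with recorded activations and would likely need a more careful notion of similarity than a fixed sign threshold.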

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: LLaMA