DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models
Albert Garde, Esben Kran, Fazl Barez

TL;DR
DeepDecipher is a user-friendly API and interface that facilitates the analysis and interpretation of neuron activations in large language models, enhancing transparency and understanding of model internals.
Contribution
This work introduces DeepDecipher, a novel tool that makes advanced interpretability techniques accessible and scalable for analyzing transformer-based LLMs.
Findings
Enables efficient neuron analysis in large models
Allows comparison of different models' internal behaviors
Improves transparency and trustworthiness of LLMs
Abstract
As large language models (LLMs) become more capable, there is an urgent need for interpretable and transparent tools. Current methods are difficult to implement, and accessible tools to analyze model internals are lacking. To bridge this gap, we present DeepDecipher, an API and interface for probing neurons in transformer models' MLP layers. DeepDecipher makes the outputs of advanced interpretability techniques for LLMs readily available. The easy-to-use interface also makes inspecting these complex models more intuitive. This paper outlines DeepDecipher's design and capabilities. We demonstrate how to analyze neurons, compare models, and gain insights into model behavior. For example, we contrast DeepDecipher's functionality with similar tools like Neuroscope and OpenAI's Neuron Explainer. DeepDecipher enables efficient, scalable analysis of LLMs. By granting access to…
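To make "probing neurons in transformer models' MLP layers" concrete, the following is a minimal, self-contained sketch of one common neuron-level analysis: computing a neuron's activation on each token and listing the tokens that activate it most. It is not DeepDecipher's API or implementation. The function names, the toy weights, and the two-dimensional "residual" vectors are all hypothetical and chosen only for illustration, and the GELU nonlinearity is an assumption about the MLP being probed.

```python
import math

def gelu(x: float) -> float:
    """Tanh approximation of the GELU nonlinearity (assumed MLP activation)."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def mlp_neuron_activations(residual, w_in, b_in):
    """Return activations[t][n]: the activation of neuron n on token t.

    residual: one input vector per token.
    w_in:     one input-weight vector per neuron (hypothetical toy weights).
    b_in:     one bias per neuron.
    """
    return [
        [gelu(sum(x * w for x, w in zip(vec, w_in[n])) + b_in[n]) for n in range(len(w_in))]
        for vec in residual
    ]

def top_activating_tokens(tokens, activations, neuron, k=3):
    """Return the k (activation, token) pairs on which `neuron` fires most strongly."""
    scored = [(activations[t][neuron], tokens[t]) for t in range(len(tokens))]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

# Toy data: five tokens, two neurons. All numbers are made up for illustration.
tokens = ["the", "cat", "sat", "on", "mat"]
residual = [[0.1, 0.0], [2.0, 0.1], [0.3, 1.5], [0.0, 0.2], [1.0, 0.1]]
w_in = [[1.0, 0.0], [0.0, 1.0]]
b_in = [0.0, 0.0]

acts = mlp_neuron_activations(residual, w_in, b_in)
for neuron in range(len(w_in)):
    print(neuron, top_activating_tokens(tokens, acts, neuron, k=2))
```

A tool of the kind the abstract describes would serve precomputed results like these for real models, so that users do not have to run the model themselves.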
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning in Materials Science · Ferroelectric and Negative Capacitance Devices
