GLUScope: A Tool for Analyzing GLU Neurons in Transformer Language Models
Sebastian Gerstner, Hinrich Schütze

TL;DR
GLUScope is an open-source tool designed to analyze neurons in Transformer language models, especially those with gated activation functions like SwiGLU, providing interpretability insights by examining sign combinations of activations.
Contribution
The paper introduces GLUScope, a tool for analyzing neuron activations in recent Transformer models that use gated activation functions. It addresses the challenge that a neuron's gate and in activations can each be positive or negative, so examining positive activations alone is not enough.
Findings
The gate and in activations of a neuron yield four possible sign combinations, which in some cases have quite different functionalities.
For any neuron, the tool shows text examples for each of the four sign combinations.
The tool indicates how often each sign combination occurs.
Abstract
We present GLUScope, an open-source tool for analyzing neurons in Transformer-based language models, intended for interpretability researchers. We focus on more recent models than previous tools do; specifically we consider gated activation functions such as SwiGLU. This introduces a new challenge: understanding positive activations is not enough. Instead, both the gate and the in activation of a neuron can be positive or negative, leading to four different possible sign combinations that in some cases have quite different functionalities. Accordingly, for any neuron, our tool shows text examples for each of the four sign combinations, and indicates how often each combination occurs. We describe examples of how our tool can lead to novel insights. A demo is available at https://sjgerstner.github.io/gluscope.
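The four sign combinations can be illustrated with a minimal sketch. It assumes the standard SwiGLU neuron form, activation = Swish(w_gate · x) * (w_in · x), and is not taken from GLUScope's code: the function names, the toy weight vectors, and the choice to count an exact zero as negative are all assumptions made for illustration.

```python
import math
from collections import Counter


def swish(z: float) -> float:
    """Swish / SiLU: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def dot(u, v) -> float:
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def glu_neuron(x, w_gate, w_in):
    """Return (gate pre-activation, in activation, neuron activation).

    Assumes the standard SwiGLU form: Swish(w_gate . x) * (w_in . x).
    """
    gate = dot(w_gate, x)
    lin = dot(w_in, x)
    return gate, lin, swish(gate) * lin


def sign_combo(gate: float, lin: float) -> str:
    """Label one of the four sign combinations (zero counted as negative)."""
    return ("gate+" if gate > 0 else "gate-") + "/" + ("in+" if lin > 0 else "in-")


def combo_frequencies(inputs, w_gate, w_in):
    """Fraction of inputs that fall into each sign combination."""
    counts = Counter(
        sign_combo(*glu_neuron(x, w_gate, w_in)[:2]) for x in inputs
    )
    total = sum(counts.values())
    return {combo: n / total for combo, n in counts.items()}


# Hypothetical toy neuron: the gate reads dimension 0, the in projection reads dimension 1.
W_GATE = [1.0, 0.0]
W_IN = [0.0, 1.0]
INPUTS = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]

if __name__ == "__main__":
    for combo, freq in sorted(combo_frequencies(INPUTS, W_GATE, W_IN).items()):
        print(combo, freq)
```

Because Swish maps negative inputs to small negative values, the neuron activation is positive both when gate and in are positive and when both are negative, which is one reason the sign combinations are worth separating rather than looking only at the final activation.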
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Embodied and Extended Cognition
