SuperActivators: Only the Tail of the Distribution Contains Reliable Concept Signals
Cassandra Goldberg, Chaehyeon Kim, Adam Stein, Eric Wong

TL;DR
This paper shows that the extreme high tail of in-concept token activations provides a reliable signal for concept detection. Detection based on this tail outperforms standard vector-based and prompting approaches across modalities and architectures, and it also improves feature attributions.
Contribution
It introduces the SuperActivator Mechanism, showing that high-tail activations reliably indicate concept presence and can be used to improve interpretability methods.
Findings
SuperActivator tokens achieve up to a 14% higher F1 score than standard concept detection methods.
The mechanism is consistent across image and text modalities, architectures, and layers.
Using high tail activations improves feature attribution for concepts.
Abstract
Concept vectors aim to enhance model interpretability by linking internal representations with human-understandable semantics, but their utility is often limited by noisy and inconsistent activations. In this work, we uncover a clear pattern within the noise, which we term the SuperActivator Mechanism: while in-concept and out-of-concept activations overlap considerably, the token activations in the extreme high tail of the in-concept distribution provide a reliable signal of concept presence. We demonstrate the generality of this mechanism by showing that SuperActivator tokens consistently outperform standard vector-based and prompting concept detection approaches, achieving up to a 14% higher F1 score across image and text modalities, model architectures, model layers, and concept extraction techniques. Finally, we leverage SuperActivator tokens to improve feature attributions for…
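The tail-based detection idea in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the `top_fraction` parameter, the "any token above the threshold" decision rule, and the toy activation values are all assumptions made for this example.

```python
import numpy as np


def tail_threshold(in_concept_acts, top_fraction=0.2):
    """Return the activation value that cuts off the top `top_fraction`
    of in-concept token activations (hypothetical calibration step)."""
    return float(np.quantile(in_concept_acts, 1.0 - top_fraction))


def detect_concept(token_acts, threshold):
    """Flag a sample as containing the concept if its strongest token
    activation reaches the tail threshold (assumed decision rule)."""
    return bool(np.max(token_acts) >= threshold)


# Toy per-token activations for samples known to contain the concept.
# Most tokens activate weakly; a few sit far out in the high tail.
calibration = np.array([0.1, 0.2, 0.15, 0.3, 0.25, 0.2, 0.1, 0.35, 2.5, 3.0])
thr = tail_threshold(calibration, top_fraction=0.2)

# One token in the high tail is enough to flag the first sample;
# the second sample has only bulk-level activations.
has_concept = detect_concept(np.array([0.2, 0.1, 2.8]), thr)
no_concept = detect_concept(np.array([0.3, 0.2, 0.4]), thr)
print(thr, has_concept, no_concept)
```

The design choice illustrated here is that the bulk of the in-concept distribution is ignored entirely: only activations beyond a high quantile count as evidence, which is one way to read the claim that the overlapping middle of the distribution is unreliable.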
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Multimodal Machine Learning Applications
