Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework

Laura Kopf; Nils Feldhus; Kirill Bykov; Philine Lou Bommer; Anna Hedström; Marina M.-C. Höhne; Oliver Eberle

arXiv:2506.15538 · cs.LG · November 13, 2025

Open Access · 1 Video

TL;DR

PRISM is a new framework for interpreting large language models that captures both single and multiple concepts encoded in neurons, improving the accuracy and depth of feature descriptions over existing methods.

Contribution

PRISM introduces a novel approach to identify and score polysemantic features in LLMs, addressing limitations of monosemantic assumptions in interpretability methods.

Findings

01 PRISM outperforms existing methods in description quality.

02 PRISM effectively captures polysemantic neuron behaviors.

03 Benchmark results show improved interpretability metrics.

Abstract

Automated interpretability research aims to identify concepts encoded in neural network features to enhance human understanding of model behavior. Within the context of large language models (LLMs) for natural language processing (NLP), current automated neuron-level feature description methods face two key challenges: limited robustness and the assumption that each neuron encodes a single concept (monosemanticity), despite increasing evidence of polysemanticity. This assumption restricts the expressiveness of feature descriptions and limits their ability to capture the full range of behaviors encoded in model internals. To address this, we introduce Polysemantic FeatuRe Identification and Scoring Method (PRISM), a novel framework specifically designed to capture the complexity of features in LLMs. Unlike approaches that assign a single description per neuron, common in many automated…
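To make the contrast between a single description per neuron and a multi-concept description concrete, here is a toy sketch in Python. It is not PRISM's algorithm: the abstract shown here is truncated and does not spell out the mechanism. The sample texts, the hand-assigned concept labels, the activation values, and the function names `single_description` and `multi_description` are all hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical toy data for ONE neuron: (text, hand-assigned concept, activation).
# Real pipelines would not have concept labels given up front; this is an assumption
# made to keep the illustration self-contained.
SAMPLES = [
    ("deposit at the bank", "finance", 0.95),
    ("the river bank flooded", "geography", 0.91),
    ("bank loan interest", "finance", 0.88),
    ("steep bank of the stream", "geography", 0.84),
    ("central bank raised rates", "finance", 0.80),
    ("def main():", "code", 0.62),
    ("a quiet afternoon", "other", 0.05),
    ("import os", "code", 0.03),
]


def top_activating(samples, k):
    """Return the k samples with the highest activation."""
    return sorted(samples, key=lambda s: s[2], reverse=True)[:k]


def single_description(samples, k=6):
    """Monosemantic view: keep only the most frequent concept among the top-k samples."""
    counts = Counter(label for _, label, _ in top_activating(samples, k))
    return counts.most_common(1)[0][0]


def multi_description(samples, k=6, min_share=0.3):
    """Multi-concept view: keep every concept covering at least min_share of the top-k samples."""
    top = top_activating(samples, k)
    counts = Counter(label for _, label, _ in top)
    return sorted(label for label, n in counts.items() if n / len(top) >= min_share)


if __name__ == "__main__":
    print("single:", single_description(SAMPLES))
    print("multi: ", multi_description(SAMPLES))
```

In this toy data the single-description view reports only one concept for the neuron, while the multi-concept view also keeps a second concept that accounts for a third of the top activations. The threshold `min_share` is an arbitrary illustrative choice.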

Videos

Capturing Polysemanticity with PRISM: A Multi-Concept Feature Description Framework · SlidesLive

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Multimodal Machine Learning Applications