Visual Exploration of Feature Relationships in Sparse Autoencoders with Curated Concepts

Xinyuan Yan; Shusen Liu; Kowshik Thopalli; Bei Wang

arXiv:2511.06048 · cs.CL · November 11, 2025

TL;DR

This paper introduces an interactive visualization system for exploring sparse autoencoder (SAE) features in large language models (LLMs), focusing on curated concepts to improve the interpretability and analysis of feature relationships.

Contribution

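It presents a focused exploration framework that prioritizes curated concepts and their corresponding SAE features, together with a hybrid visualization approach that combines topology-based visual encoding with dimensionality reduction to represent both local and global relationships among the selected features.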
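One way to picture a hybrid encoding of this kind: build a neighborhood graph in the full feature space, where distances are not distorted by projection, then draw its edges on top of a 2D layout produced by any dimensionality-reduction method. The sketch below assumes each SAE feature is represented by its decoder direction, uses a k-nearest-neighbor graph under cosine distance with connected components as a coarse topological summary, and takes the 2D layout as given. These choices and the helper names (`knn_graph`, `components`, `overlay`) are hypothetical illustrations, not the construction described in the paper.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity between two decoder directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def knn_graph(vectors: List[Sequence[float]], k: int) -> Set[Edge]:
    """Undirected k-NN graph built in the original high-dimensional space.

    Edge (i, j) with i < j exists if j is among i's k nearest neighbors
    or i is among j's (hypothetical construction, for illustration only).
    """
    edges: Set[Edge] = set()
    for i, v in enumerate(vectors):
        ranked = sorted(
            (cosine_distance(v, w), j) for j, w in enumerate(vectors) if j != i
        )
        for _, j in ranked[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def components(n: int, edges: Set[Edge]) -> List[List[int]]:
    """Connected components via union-find: groups of mutually linked features."""
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in edges:
        parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def overlay(edges: Set[Edge],
            layout_2d: List[Tuple[float, float]]) -> List[Tuple[int, int, float]]:
    """Pair each high-dimensional edge with its drawn length in the 2D layout."""
    return sorted((i, j, math.dist(layout_2d[i], layout_2d[j])) for i, j in edges)


# Toy decoder directions: features 0/1 and 2/3 point in similar directions.
directions = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]
graph = knn_graph(directions, k=1)
print(components(len(directions), graph))  # [[0, 1], [2, 3]]
```

Long edges in the overlay mark pairs of features that are close in the original space but were pulled apart by the projection, which is exactly the kind of relationship a 2D scatterplot alone can hide.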

Findings

1. Enables targeted analysis of SAE features related to curated concepts.
2. Reduces overplotting and neighborhood distortion in visualizations.
3. Facilitates deeper understanding of concept representations in latent space.
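The neighborhood-distortion point can be made measurable with a simple k-nearest-neighbor overlap score: for each feature, the fraction of its k nearest neighbors in the original space that are still among its k nearest neighbors in the 2D layout. This metric and the helper names are assumptions chosen for illustration; the summary above does not say how, or whether, the authors quantify distortion.

```python
import math
from typing import List, Sequence, Set


def knn(points: List[Sequence[float]], i: int, k: int) -> Set[int]:
    """Indices of the k nearest neighbors of point i (Euclidean distance)."""
    ranked = sorted(
        (math.dist(points[i], p), j) for j, p in enumerate(points) if j != i
    )
    return {j for _, j in ranked[:k]}


def neighborhood_preservation(high: List[Sequence[float]],
                              low: List[Sequence[float]], k: int) -> float:
    """Mean fraction of each point's k high-dimensional neighbors that stay
    among its k nearest neighbors in the 2D layout (1.0 = fully preserved)."""
    n = len(high)
    total = 0.0
    for i in range(n):
        total += len(knn(high, i, k) & knn(low, i, k)) / k
    return total / n


high = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]
faithful = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
swapped = [(0.0, 0.0), (10.0, 0.0), (1.0, 0.0), (11.0, 0.0)]
print(neighborhood_preservation(high, faithful, k=1))  # 1.0
print(neighborhood_preservation(high, swapped, k=1))   # 0.0
```

Swapping two points in the layout, as a projection might, drives the score from 1.0 to 0.0 even though the underlying high-dimensional points are unchanged.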

Abstract

Sparse autoencoders (SAEs) have emerged as a powerful tool for uncovering interpretable features in large language models (LLMs) through the sparse directions they learn. However, the sheer number of extracted directions makes comprehensive exploration intractable. While conventional embedding techniques such as UMAP can reveal global structure, they suffer from limitations including high-dimensional compression artifacts, overplotting, and misleading neighborhood distortions. In this work, we propose a focused exploration framework that prioritizes curated concepts and their corresponding SAE features over attempts to visualize all available features simultaneously. We present an interactive visualization system that combines topology-based visual encoding with dimensionality reduction to faithfully represent both local and global relationships among selected features. This hybrid…
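To make "sparse directions" concrete, here is a minimal SAE forward pass in plain Python, assuming the common formulation f = ReLU(W_enc (x - b_dec) + b_enc) with reconstruction x_hat = sum_i f_i d_i + b_dec, where row d_i of the decoder is feature i's direction. The `concept_features` helper is a hypothetical selection rule (rank features by how much more strongly they fire on a curated concept's examples than on other inputs); the abstract does not describe the paper's own procedure for linking curated concepts to SAE features.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # one row per SAE feature


def relu(v: float) -> float:
    return v if v > 0.0 else 0.0


def encode(x: Vector, w_enc: Matrix, b_enc: Vector, b_dec: Vector) -> Vector:
    """Sparse code f = ReLU(W_enc (x - b_dec) + b_enc), one entry per feature."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    return [
        relu(sum(w * c for w, c in zip(row, centered)) + b)
        for row, b in zip(w_enc, b_enc)
    ]


def decode(f: Vector, w_dec: Matrix, b_dec: Vector) -> Vector:
    """Reconstruction x_hat = sum_i f_i * d_i + b_dec (d_i = row i of w_dec)."""
    x_hat = list(b_dec)
    for fi, direction in zip(f, w_dec):
        for j, dj in enumerate(direction):
            x_hat[j] += fi * dj
    return x_hat


def concept_features(concept_xs: List[Vector], other_xs: List[Vector],
                     w_enc: Matrix, b_enc: Vector, b_dec: Vector,
                     top_k: int = 5) -> List[int]:
    """HYPOTHETICAL selection rule: rank features by mean activation on a
    curated concept's examples minus mean activation on other examples."""
    def mean_code(batch: List[Vector]) -> Vector:
        codes = [encode(x, w_enc, b_enc, b_dec) for x in batch]
        return [sum(col) / len(codes) for col in zip(*codes)]

    pos, neg = mean_code(concept_xs), mean_code(other_xs)
    scores = [p - n for p, n in zip(pos, neg)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
```

In a trained SAE the weights are learned from a model's internal activations and the code is much wider than the input; the toy shapes one would use with this sketch only show the data flow from activations to a short list of concept-related features.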

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Data Visualization and Analytics