Visual Exploration of Feature Relationships in Sparse Autoencoders with Curated Concepts
Xinyuan Yan, Shusen Liu, Kowshik Thopalli, Bei Wang

TL;DR
This paper introduces an interactive visualization system for exploring sparse autoencoder features in large language models, focusing on curated concepts to improve interpretability and analysis of feature relationships.
Contribution
The paper presents a focused exploration framework and a hybrid visualization approach that combines topology-based visual encoding with dimensionality reduction to represent both local and global relationships among selected features.
Findings
Enables targeted analysis of SAE features related to curated concepts.
Reduces issues of overplotting and neighborhood distortion in visualizations.
Facilitates deeper understanding of concept representations in latent space.
Abstract
Sparse autoencoders (SAEs) have emerged as a powerful tool for uncovering interpretable features in large language models (LLMs) through the sparse directions they learn. However, the sheer number of extracted directions makes comprehensive exploration intractable. While conventional embedding techniques such as UMAP can reveal global structure, they suffer from limitations including high-dimensional compression artifacts, overplotting, and misleading neighborhood distortions. In this work, we propose a focused exploration framework that prioritizes curated concepts and their corresponding SAE features over attempts to visualize all available features simultaneously. We present an interactive visualization system that combines topology-based visual encoding with dimensionality reduction to faithfully represent both local and global relationships among selected features. This hybrid…
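Illustrative sketch
The neighborhood distortion that the abstract attributes to 2D embeddings can be made concrete with a simple k-nearest-neighbor overlap score. The sketch below is not the authors' method or system. It is a minimal, standard-library-only illustration, assuming that each selected SAE feature is given as a direction vector (`high_dim`) and as a 2D layout position (`low_dim`). All function names and the toy data are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two direction vectors (assumes non-zero norms)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Euclidean distance between two layout positions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def knn_indices(points, k, dist):
    """For each point, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: dist(p, points[j]))
        neighbors.append(set(others[:k]))
    return neighbors


def neighborhood_overlap(high_dim, low_dim, k):
    """Per-feature fraction of high-dimensional neighbors kept in the 2D layout.

    1.0 means the layout preserves a feature's k nearest neighbors exactly;
    0.0 means none of them remain neighbors after projection.
    """
    high_nn = knn_indices(high_dim, k, cosine_distance)
    low_nn = knn_indices(low_dim, k, euclidean_distance)
    return [len(h & l) / k for h, l in zip(high_nn, low_nn)]


# Toy data (hypothetical): four feature directions forming two tight pairs.
directions = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]
faithful_layout = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
distorted_layout = [[0.0, 0.0], [5.0, 5.0], [0.1, 0.0], [5.1, 5.0]]

print(neighborhood_overlap(directions, faithful_layout, k=1))
print(neighborhood_overlap(directions, distorted_layout, k=1))
```

A low score for a feature indicates that its apparent neighbors in the 2D view differ from its neighbors in the original space, which is the kind of misleading proximity the abstract describes. Restricting the input to features tied to a curated concept keeps the quadratic neighbor search small.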
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI) · Data Visualization and Analytics
