Towards Open-Ended Visual Scientific Discovery with Sparse Autoencoders
Samuel Stevens, Jacob Beattie, Tanya Berger-Wolf, Yu Su

TL;DR
This paper asks whether sparse autoencoders (SAEs) can support open-ended scientific discovery by extracting meaningful features from foundation model representations. The approach is intended to apply across scientific domains and is demonstrated on ecological imagery.
Contribution
The paper introduces a domain-agnostic method that uses sparse autoencoders to uncover previously unknown patterns in foundation model representations, going beyond methods that extract structure only for pre-specified targets.
Findings
SAE features align with semantic concepts on a segmentation task
SAEs surface fine-grained anatomical structures without labels
The approach is applicable across diverse scientific data types
Abstract
Scientific archives now contain hundreds of petabytes of data across genomics, ecology, climate, and molecular biology that could reveal undiscovered patterns if systematically analyzed at scale. Large-scale, weakly-supervised datasets in language and vision have driven the development of foundation models whose internal representations encode structure (patterns, co-occurrences and statistical regularities) beyond their training objectives. Most existing methods extract structure only for pre-specified targets; they excel at confirmation but do not support open-ended discovery of unknown patterns. We ask whether sparse autoencoders (SAEs) can enable open-ended feature discovery from foundation model representations. We evaluate this question in controlled rediscovery studies, where the learned SAE features are tested for alignment with semantic concepts on a standard segmentation…
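To make the idea of SAE feature discovery concrete, here is a minimal sketch of a sparse autoencoder applied to a batch of representation vectors. It assumes a generic ReLU encoder with an L1 sparsity penalty, which is a common formulation but is not confirmed by this summary to be the authors' architecture. The names `init_sae`, `sae_forward`, `sae_loss`, the dimensions, and the random stand-in activations are all hypothetical and chosen for illustration only.

```python
import numpy as np


def init_sae(d_model, d_hidden, seed=0):
    """Randomly initialise encoder/decoder weights (illustrative scaling)."""
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(d_model)
    return {
        "W_enc": rng.normal(0.0, scale, size=(d_model, d_hidden)),
        "b_enc": np.zeros(d_hidden),
        "W_dec": rng.normal(0.0, scale, size=(d_hidden, d_model)),
        "b_dec": np.zeros(d_model),
    }


def sae_forward(params, x):
    """Encode x of shape (n, d_model) into sparse features, then reconstruct."""
    pre = (x - params["b_dec"]) @ params["W_enc"] + params["b_enc"]
    features = np.maximum(0.0, pre)  # ReLU keeps feature activations non-negative
    x_hat = features @ params["W_dec"] + params["b_dec"]
    return features, x_hat


def sae_loss(params, x, l1_coeff=1e-3):
    """Mean squared reconstruction error plus an L1 penalty on the features."""
    features, x_hat = sae_forward(params, x)
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(features), axis=1))
    return recon + l1_coeff * sparsity


# Assumption: random vectors stand in for foundation model patch embeddings.
acts = np.random.default_rng(1).normal(size=(8, 16))
params = init_sae(d_model=16, d_hidden=64)
features, recon = sae_forward(params, acts)
loss = sae_loss(params, acts)
```

In a setup like this, each column of `features` would be a candidate concept: after training, one would inspect the inputs that most strongly activate a given column to see whether it corresponds to a recognisable pattern.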
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Biomedical Text Mining and Ontologies
