Sparse Autoencoders Reveal Interpretable Structure in Small Gene Language Models
Haoxiang Guan, Jiyan He, Jie Zhang

TL;DR
This paper demonstrates that sparse autoencoders can interpret small gene language models by uncovering biologically meaningful genomic features, indicating that even compact models learn structured representations.
Contribution
The paper shows that sparse autoencoders recover interpretable genomic features from small gene language models, extending SAE-based interpretability from large models to compact ones.
Findings
Small gene models encode biologically relevant features.
SAEs uncover transcription factor binding motifs.
Small models learn structured genomic representations.
Abstract
Sparse autoencoders (SAEs) have recently emerged as a powerful tool for interpreting the internal representations of large language models (LLMs), revealing latent features with semantic meaning. This interpretability has also proven valuable in biological domains: applying SAEs to protein language models uncovered meaningful features related to protein structure and function. More recently, SAEs have been used to analyze genomics-focused models such as Evo 2, identifying interpretable features in gene sequences. However, it remains unclear whether SAEs can extract meaningful representations from small gene language models, which have fewer parameters and potentially less expressive capacity. To address this question, we propose applying SAEs to the activations of a small gene language model. We demonstrate that even small-scale models encode biologically relevant genomic features, such…
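To make the idea of applying an SAE to model activations concrete, here is a minimal sketch of a generic sparse autoencoder forward pass and loss. It is not the authors' implementation: the abstract does not specify the architecture, loss, or dimensions, so the ReLU encoder, the L1 sparsity penalty, the names `SparseAutoencoder` and `l1_coeff`, and the sizes below are all assumptions, and the random array only stands in for real activations from a gene language model.

```python
import numpy as np


class SparseAutoencoder:
    """Generic SAE sketch: overcomplete ReLU encoder, linear decoder."""

    def __init__(self, d_model, d_hidden, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d_model)
        # Hypothetical initialisation; real training setups vary.
        self.W_enc = rng.normal(0.0, scale, size=(d_model, d_hidden))
        self.b_enc = np.zeros(d_hidden)
        self.W_dec = rng.normal(0.0, scale, size=(d_hidden, d_model))
        self.b_dec = np.zeros(d_model)

    def encode(self, x):
        # Non-negative feature activations; most should be zero after training.
        return np.maximum(0.0, (x - self.b_dec) @ self.W_enc + self.b_enc)

    def decode(self, f):
        # Reconstruct the original activation from the feature activations.
        return f @ self.W_dec + self.b_dec

    def loss(self, x, l1_coeff=1e-3):
        f = self.encode(x)
        x_hat = self.decode(f)
        recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))  # reconstruction error
        sparsity = np.mean(np.sum(np.abs(f), axis=-1))      # L1 penalty on features
        return recon + l1_coeff * sparsity


# Stand-in for per-token activations of shape (tokens, d_model); NOT real model output.
activations = np.random.default_rng(1).normal(size=(8, 16))
sae = SparseAutoencoder(d_model=16, d_hidden=64)
features = sae.encode(activations)
reconstruction = sae.decode(features)
total_loss = sae.loss(activations)
```

In a setup like this, each column of `features` is a candidate interpretable feature: after training, one would inspect which sequence positions activate a given column most strongly and compare them against known genomic annotations.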
Taxonomy
Topics: Bioinformatics and Genomic Networks · Genomics and Rare Diseases · Biomedical Text Mining and Ontologies
