Sparse Autoencoders Reveal Interpretable Structure in Small Gene Language Models

Haoxiang Guan; Jiyan He; Jie Zhang

arXiv:2507.07486·q-bio.OT·July 11, 2025

Open Access

TL;DR

This paper demonstrates that sparse autoencoders can interpret small gene language models by uncovering biologically meaningful genomic features, indicating that even compact models learn structured representations.

Contribution

It shows that sparse autoencoders reveal interpretable genomic features in small gene language models, extending SAE-based interpretability from large genomic models such as Evo 2 to compact ones.

Findings

01  Small gene models encode biologically relevant features.

02  SAEs uncover transcription factor binding motifs.

03  Small models learn structured genomic representations.

Abstract

Sparse autoencoders (SAEs) have recently emerged as a powerful tool for interpreting the internal representations of large language models (LLMs), revealing latent features with semantic meaning. This interpretability has also proven valuable in biological domains: applying SAEs to protein language models uncovered meaningful features related to protein structure and function. More recently, SAEs have been used to analyze genomics-focused models such as Evo 2, identifying interpretable features in gene sequences. However, it remains unclear whether SAEs can extract meaningful representations from small gene language models, which have fewer parameters and potentially less expressive capacity. To address this question, we propose applying SAEs to the activations of a small gene language model. We demonstrate that even small-scale models encode biologically relevant genomic features, such…
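To make the idea of "applying SAEs to the activations" concrete, the following is a minimal sketch of a generic sparse autoencoder with a ReLU encoder and an L1 sparsity penalty. It is not the authors' implementation: the listing does not specify their architecture, so the function names (`init_sae`, `sae_forward`, `sae_loss`), the dictionary size, the initialization, and the `l1_coeff` value are all assumptions chosen for illustration, and random vectors stand in for real model activations.

```python
import numpy as np


def init_sae(d_model, d_hidden, seed=0):
    """Create parameters for a small SAE (hypothetical sizes and init scale)."""
    rng = np.random.default_rng(seed)
    W_enc = rng.normal(0.0, 0.1, size=(d_model, d_hidden))
    return {
        "W_enc": W_enc,
        "b_enc": np.zeros(d_hidden),
        # Assumption: start the decoder as the encoder transpose (a common choice).
        "W_dec": W_enc.T.copy(),
        "b_dec": np.zeros(d_model),
    }


def sae_forward(params, x):
    """Encode activations x of shape (n_tokens, d_model) and reconstruct them."""
    # ReLU keeps feature activations non-negative, so most can be exactly zero.
    pre = (x - params["b_dec"]) @ params["W_enc"] + params["b_enc"]
    features = np.maximum(0.0, pre)
    x_hat = features @ params["W_dec"] + params["b_dec"]
    return features, x_hat


def sae_loss(params, x, l1_coeff=1e-3):
    """Mean squared reconstruction error plus an L1 penalty on the features."""
    features, x_hat = sae_forward(params, x)
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    sparsity = np.mean(np.sum(np.abs(features), axis=1))
    return recon + l1_coeff * sparsity


# Stand-in data: 8 token positions with 16-dimensional activations.
activations = np.random.default_rng(1).normal(size=(8, 16))
params = init_sae(d_model=16, d_hidden=64)
features, reconstruction = sae_forward(params, activations)
loss = sae_loss(params, activations)
```

In practice the parameters would be trained by gradient descent on this loss over activations collected from one layer of the gene language model, and individual columns of `features` would then be inspected for biological meaning. The training loop and the choice of layer are omitted here because the listing does not describe them.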

Taxonomy

Topics: Bioinformatics and Genomic Networks · Genomics and Rare Diseases · Biomedical Text Mining and Ontologies