Interpreting and Steering Protein Language Models through Sparse Autoencoders
Edith Natalia Villegas Garcia, Alessio Ansuini

TL;DR
This paper uses sparse autoencoders to interpret internal representations of protein language models, linking latent components to protein features and guiding sequence generation for targeted biological properties.
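To make the setup concrete, here is a minimal NumPy sketch of a sparse autoencoder of the kind described. A ReLU encoder maps each residue's hidden state to a wider, mostly-zero latent vector, and a linear decoder reconstructs the hidden state from it. Several details are illustrative assumptions and not the authors' training configuration: the class name, the L1-penalized reconstruction objective, the dictionary size, and `d_model=320`, which is assumed to match the hidden width of the 8M ESM-2 checkpoint.

```python
import numpy as np


class SparseAutoencoder:
    """Minimal SAE sketch: ReLU encoder, linear decoder, L1 sparsity penalty.

    Assumptions: d_model is the protein LM's hidden size (320 is assumed for
    ESM-2 8M); d_sae and l1_coeff are illustrative, not the paper's values.
    """

    def __init__(self, d_model: int, d_sae: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W_enc = rng.normal(0.0, 0.02, size=(d_model, d_sae))
        self.b_enc = np.zeros(d_sae)
        self.W_dec = rng.normal(0.0, 0.02, size=(d_sae, d_model))
        self.b_dec = np.zeros(d_model)

    def encode(self, x: np.ndarray) -> np.ndarray:
        # Latent activations f = ReLU((x - b_dec) @ W_enc + b_enc); nonnegative, ideally sparse.
        return np.maximum(0.0, (x - self.b_dec) @ self.W_enc + self.b_enc)

    def decode(self, f: np.ndarray) -> np.ndarray:
        # Reconstruction is a weighted sum of decoder rows ("dictionary directions").
        return f @ self.W_dec + self.b_dec

    def loss(self, x: np.ndarray, l1_coeff: float = 1e-3):
        # Objective = mean squared reconstruction error + l1_coeff * mean L1 norm of latents.
        f = self.encode(x)
        x_hat = self.decode(f)
        mse = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
        l1 = np.mean(np.sum(np.abs(f), axis=-1))
        return mse + l1_coeff * l1, f, x_hat


# Usage on random stand-ins for 50 per-residue hidden states.
sae = SparseAutoencoder(d_model=320, d_sae=1280)
x = np.random.default_rng(1).normal(size=(50, 320))
total, f, x_hat = sae.loss(x)
```

The wider-than-input latent space (here 4x) is what lets individual latents specialize; the L1 term pushes most of them to zero for any given residue.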
Contribution
It introduces a method to interpret and steer protein language models using sparse autoencoders, enhancing understanding and control over model outputs.
Findings
Identified latent components associated with protein features like transmembrane regions and motifs.
Demonstrated guiding sequence generation towards specific targets such as zinc finger domains.
Provided insights into the interpretability of biological sequence models.
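One plausible way to link latents to annotations such as transmembrane regions is to score, for each latent, how well "latent is active" predicts "residue carries the annotation". The sketch below uses a per-residue F1 score. This metric, the function name `latent_annotation_f1`, and the zero activation threshold are assumptions standing in for the paper's statistical analysis, whose exact test is not given here.

```python
import numpy as np


def latent_annotation_f1(acts: np.ndarray, labels, threshold: float = 0.0) -> np.ndarray:
    """Score every SAE latent against one binary per-residue annotation.

    acts:   (n_residues, n_latents) SAE latent activations
    labels: (n_residues,) boolean mask, e.g. True inside a transmembrane region
    Returns an (n_latents,) array of F1 scores between "latent fires" and
    "residue is annotated".
    """
    fires = acts > threshold                          # (n_residues, n_latents)
    labels = np.asarray(labels, dtype=bool)[:, None]  # broadcast over latents
    tp = np.sum(fires & labels, axis=0)
    fp = np.sum(fires & ~labels, axis=0)
    fn = np.sum(~fires & labels, axis=0)
    denom = 2 * tp + fp + fn
    # F1 = 2TP / (2TP + FP + FN); latents that never fire on an empty label get 0.
    return np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)


# Toy example: latent 0 fires exactly on the annotated residues, latent 1 only partly.
labels = np.array([True, True, False, False])
acts = np.array([[1.0, 0.0], [2.0, 0.5], [0.0, 0.5], [0.0, 0.0]])
scores = latent_annotation_f1(acts, labels)
ranking = np.argsort(scores)[::-1]
```

Latents would then be ranked per annotation type and the top candidates inspected by hand before being called interpretable.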
Abstract
The rapid advancements in transformer-based language models have revolutionized natural language processing, yet understanding the internal mechanisms of these models remains a significant challenge. This paper explores the application of sparse autoencoders (SAE) to interpret the internal representations of protein language models, specifically focusing on the ESM-2 8M parameter model. By performing a statistical analysis on each latent component's relevance to distinct protein annotations, we identify potential interpretations linked to various protein characteristics, including transmembrane regions, binding sites, and specialized motifs. We then leverage these insights to guide sequence generation, shortlisting the relevant latent components that can steer the model towards desired targets such as zinc finger domains. This work contributes to the emerging field of mechanistic…
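Steering generation towards a target such as a zinc finger domain can be sketched as adding a chosen latent's decoder direction to the model's hidden states at the SAE's layer, before the remaining layers and the language-model head produce residue predictions. The function `steer_with_latent`, the unit-normalized additive intervention, and the strength `alpha` are assumptions for illustration. The abstract does not specify whether the authors add, clamp, or otherwise modify latents.

```python
import numpy as np


def steer_with_latent(hidden: np.ndarray, W_dec: np.ndarray, latent_idx: int,
                      alpha: float, positions=None) -> np.ndarray:
    """Add alpha times one latent's unit-norm decoder direction to hidden states.

    hidden:    (seq_len, d_model) activations at the layer the SAE was trained on
    W_dec:     (n_latents, d_model) SAE decoder matrix
    positions: optional iterable of residue indices to steer (default: all)
    """
    direction = W_dec[latent_idx] / np.linalg.norm(W_dec[latent_idx])
    steered = hidden.copy()  # leave the caller's activations untouched
    idx = slice(None) if positions is None else np.asarray(list(positions))
    steered[idx] += alpha * direction
    return steered


# Usage with random stand-ins for hidden states and a decoder matrix.
rng = np.random.default_rng(0)
h = rng.normal(size=(10, 320))
W = rng.normal(size=(1280, 320))
out = steer_with_latent(h, W, latent_idx=7, alpha=4.0, positions=[2, 3])
```

Feeding the steered states through the rest of ESM-2 and sampling residues from the resulting logits is not shown, since that plumbing depends on how the model is hooked.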
Taxonomy
Topics: Machine Learning in Bioinformatics · Topic Modeling · Biomedical Text Mining and Ontologies
