Interpreting and Steering Protein Language Models through Sparse Autoencoders

Edith Natalia Villegas Garcia; Alessio Ansuini

arXiv:2502.09135·cs.LG·February 14, 2025

TL;DR

This paper uses sparse autoencoders to interpret internal representations of protein language models, linking latent components to protein features and guiding sequence generation for targeted biological properties.

Contribution

It introduces a method to interpret and steer protein language models using sparse autoencoders, enhancing understanding and control over model outputs.

Findings

1. Identified latent components associated with protein features such as transmembrane regions and motifs.

2. Demonstrated steering of sequence generation toward specific targets such as zinc finger domains.

3. Provided insights into the interpretability of biological sequence models.
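
The association between latent components and annotated protein features can be illustrated with a small sketch. This summary does not state which statistical test the paper uses, so the code below assumes a simple stand-in: per-latent precision, recall, and F1 against a binary per-residue annotation. The function name `latent_annotation_scores`, the activation threshold, and the toy data are all hypothetical.

```python
from typing import List, Sequence, Tuple


def latent_annotation_scores(
    activations: Sequence[Sequence[float]],
    annotated: Sequence[bool],
    threshold: float = 0.0,
) -> List[Tuple[float, float, float]]:
    """Score each latent against a binary per-residue annotation.

    activations[i][j] is the value of latent j at residue i.
    annotated[i] is True if residue i carries the annotation
    (for example, "inside a transmembrane region").
    Returns one (precision, recall, f1) tuple per latent.

    Assumption: a latent counts as "active" when its value exceeds
    `threshold`. This is an illustrative choice, not the paper's test.
    """
    if len(activations) != len(annotated):
        raise ValueError("activations and annotated must have equal length")
    n_latents = len(activations[0]) if activations else 0

    scores = []
    for j in range(n_latents):
        tp = fp = fn = 0
        for row, is_annotated in zip(activations, annotated):
            active = row[j] > threshold
            if active and is_annotated:
                tp += 1
            elif active:
                fp += 1
            elif is_annotated:
                fn += 1
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1))
    return scores


# Toy data (hypothetical): 6 residues, 2 latents.
toy_activations = [
    [0.9, 0.0],
    [0.7, 0.2],
    [0.0, 0.3],
    [0.0, 0.0],
    [0.8, 0.0],
    [0.0, 0.4],
]
toy_annotated = [True, True, False, False, True, False]

for idx, (p, r, f1) in enumerate(
    latent_annotation_scores(toy_activations, toy_annotated)
):
    print(f"latent {idx}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this toy case latent 0 fires on exactly the annotated residues, so it would be shortlisted as a candidate interpretation, while latent 1 would not. A real analysis would also need to correct for testing many latents against many annotations.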

Abstract

The rapid advancements in transformer-based language models have revolutionized natural language processing, yet understanding the internal mechanisms of these models remains a significant challenge. This paper explores the application of sparse autoencoders (SAE) to interpret the internal representations of protein language models, specifically focusing on the ESM-2 8M parameter model. By performing a statistical analysis on each latent component's relevance to distinct protein annotations, we identify potential interpretations linked to various protein characteristics, including transmembrane regions, binding sites, and specialized motifs. We then leverage these insights to guide sequence generation, shortlisting the relevant latent components that can steer the model towards desired targets such as zinc finger domains. This work contributes to the emerging field of mechanistic…
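
To make the two ideas in the abstract concrete, here is a minimal, dependency-free sketch of a sparse autoencoder forward pass and of one common way to steer with it: adding a scaled decoder direction to a hidden activation. It assumes a ReLU encoder and a linear decoder. The names `TinySAE` and `steer`, the dimensions, and the random weights are hypothetical; this is not the authors' implementation (the listed repository uses PyTorch), and no training loop or sparsity penalty is shown.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class TinySAE:
    """Illustrative sparse autoencoder: z = ReLU(W_enc x + b_enc),
    x_hat = W_dec z + b_dec. Weights are random here; a real SAE would
    be trained with a reconstruction loss plus a sparsity penalty."""

    def __init__(self, d_model: int, d_latent: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w_enc = [[rng.gauss(0.0, 0.1) for _ in range(d_model)]
                      for _ in range(d_latent)]
        self.b_enc = [0.0] * d_latent
        self.w_dec = [[rng.gauss(0.0, 0.1) for _ in range(d_latent)]
                      for _ in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, x: Vector) -> Vector:
        pre = matvec(self.w_enc, x)
        return [max(0.0, p + b) for p, b in zip(pre, self.b_enc)]

    def decode(self, z: Vector) -> Vector:
        return [r + b for r, b in zip(matvec(self.w_dec, z), self.b_dec)]

    def decoder_direction(self, latent: int) -> Vector:
        """Column `latent` of W_dec: the direction that latent writes
        into the model's activation space."""
        return [row[latent] for row in self.w_dec]


def steer(x: Vector, sae: TinySAE, latent: int, strength: float) -> Vector:
    """Assumed steering rule: x' = x + strength * decoder_direction."""
    direction = sae.decoder_direction(latent)
    return [xi + strength * di for xi, di in zip(x, direction)]


# Usage with toy sizes (ESM-2 8M uses a much wider hidden state).
sae = TinySAE(d_model=4, d_latent=8)
hidden = [0.5, -1.0, 0.25, 2.0]   # stand-in for one residue's activation
latents = sae.encode(hidden)      # non-negative latent activations
reconstruction = sae.decode(latents)
steered = steer(hidden, sae, latent=3, strength=2.0)
print(len(latents), len(reconstruction), len(steered))
```

In practice the steered activation would be written back into the protein language model at the layer the SAE was trained on, and generation would continue from there. Which latents to use and how strongly to apply them is what the shortlisting step in the abstract addresses.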

Code & Models

Repositories

edithvillegas/plm-sae
PyTorch · Official

Models

evillegasgarcia/sae_esm2_6_l3 (Hugging Face model)

Taxonomy

Topics: Machine Learning in Bioinformatics · Topic Modeling · Biomedical Text Mining and Ontologies