TL;DR
ProtoMech is a framework that discovers cross-layer computational circuits in protein language models (pLMs) and uses them both to explain model predictions and to guide high-fitness protein design.
Contribution
ProtoMech introduces cross-layer transcoders that learn sparse, interpretable latents jointly across layers. The resulting circuits capture the model's full computation and can be used to improve protein design.
Findings
Recovers 82-89% of the original ESM2 performance on protein family classification and function prediction tasks.
Identifies compressed circuits that use less than 1% of the latent space while retaining up to 79% accuracy.
Outperforms baseline methods in over 70% of protein design cases.
Abstract
Protein language models (pLMs) have emerged as powerful predictors of protein structure and function. However, the computational circuits underlying their predictions remain poorly understood. Recent mechanistic interpretability methods decompose pLM representations into interpretable features, but they treat each layer independently and thus fail to capture cross-layer computation, limiting their ability to approximate the full model. We introduce ProtoMech, a framework for discovering computational circuits in pLMs using cross-layer transcoders that learn sparse latent representations jointly across layers to capture the model's full computational circuitry. Applied to the pLM ESM2, ProtoMech recovers 82-89% of the original performance on protein family classification and function prediction tasks. ProtoMech then identifies compressed circuits that use <1% of the latent space while…
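The core idea of a cross-layer transcoder can be sketched in NumPy. This is a minimal illustration, not ProtoMech's implementation. It assumes the common cross-layer transcoder design, in which latents encoded from the residual stream at layer l write, through separate decoders, to the MLP outputs of layer l and every later layer. It also uses a ReLU with an L1 penalty as a stand-in sparsity mechanism, which the paper may not use. The class name, tensor shapes, and `l1_coeff` are hypothetical, and the training loop (gradient updates on ESM2 activations) is omitted.

```python
import numpy as np


class CrossLayerTranscoder:
    """Hypothetical minimal cross-layer transcoder (CLT) sketch.

    Assumption: latents encoded at layer `src` read the residual stream at
    `src` and write, via a dedicated decoder, to the MLP output of every
    layer `tgt >= src`. This is not ProtoMech's actual code.
    """

    def __init__(self, n_layers, d_model, n_latents, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(d_model)
        self.n_layers = n_layers
        # One encoder per layer: residual stream (d_model) -> latents (n_latents).
        self.W_enc = rng.normal(0.0, scale, (n_layers, n_latents, d_model))
        self.b_enc = np.zeros((n_layers, n_latents))
        # Decoder indexed [src, tgt]; only entries with src <= tgt are ever used.
        self.W_dec = rng.normal(0.0, scale, (n_layers, n_layers, d_model, n_latents))
        self.b_dec = np.zeros((n_layers, d_model))

    def encode(self, resid):
        """resid: (n_layers, n_tokens, d_model) -> latents: (n_layers, n_tokens, n_latents)."""
        pre = np.einsum("lkd,ltd->ltk", self.W_enc, resid) + self.b_enc[:, None, :]
        # ReLU keeps activations non-negative; the L1 term in `loss` pushes them toward sparsity.
        return np.maximum(pre, 0.0)

    def decode(self, latents):
        """Reconstruct each layer's MLP output from latents at that layer and all earlier layers."""
        n_tokens = latents.shape[1]
        d_model = self.W_dec.shape[2]
        out = np.zeros((self.n_layers, n_tokens, d_model))
        for tgt in range(self.n_layers):
            for src in range(tgt + 1):
                # (n_tokens, n_latents) @ (n_latents, d_model) -> (n_tokens, d_model)
                out[tgt] += latents[src] @ self.W_dec[src, tgt].T
            out[tgt] += self.b_dec[tgt]
        return out

    def loss(self, resid, mlp_out, l1_coeff=1e-3):
        """Reconstruction MSE over all layers plus an L1 sparsity penalty on the latents."""
        latents = self.encode(resid)
        recon = self.decode(latents)
        mse = np.mean((recon - mlp_out) ** 2)
        sparsity = np.abs(latents).sum(axis=-1).mean()
        return mse + l1_coeff * sparsity, latents


if __name__ == "__main__":
    # Random stand-ins for pLM activations: 4 layers, 10 residues, width 16.
    rng = np.random.default_rng(1)
    clt = CrossLayerTranscoder(n_layers=4, d_model=16, n_latents=64)
    resid = rng.normal(size=(4, 10, 16))
    mlp_out = rng.normal(size=(4, 10, 16))
    total, latents = clt.loss(resid, mlp_out)
    print(latents.shape, float(total))
```

Because each target layer sums decoder contributions from all earlier source layers, a single latent can account for computation spread over several layers. Per-layer decompositions that treat each layer independently cannot represent this.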
