Structural Interpretations of Protein Language Model Representations via Differentiable Graph Partitioning
Siddhant Dutta, Edward Tan Beng Wai, Soumick Sarker, Pasan Gunawardane, Jagath C. Rajapakse

TL;DR
This paper introduces SoftBlobGIN, a graph neural network framework that makes protein language model representations interpretable by projecting them onto protein contact graphs and learning coarse functional substructures, while also improving downstream task performance.
Contribution
The authors propose a plug-and-play method that projects protein language model features onto contact graphs and applies differentiable pooling for structural interpretability without retraining the language model.
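The projection step can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes per-residue embeddings are already available as a NumPy array, that contacts are defined by a distance cutoff on residue coordinates (the 8 Å default is an illustrative assumption, not a value from the paper), and that one GIN-style layer uses a single weight matrix in place of the usual MLP. The names `contact_graph` and `gin_layer` are hypothetical.

```python
import numpy as np


def contact_graph(coords, cutoff=8.0):
    """Binary adjacency: residues i != j are linked when closer than `cutoff`.

    coords: (N, 3) array of residue coordinates (e.g. C-alpha atoms).
    The cutoff value is an assumption made for this sketch.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = (dist < cutoff).astype(np.float64)
    np.fill_diagonal(adj, 0.0)  # no self-loops; GIN adds the self term itself
    return adj


def gin_layer(h, adj, w, eps=0.0):
    """One simplified GIN-style update.

    h:   (N, D) node features, e.g. frozen language-model embeddings.
    adj: (N, N) contact-graph adjacency.
    w:   (D, D_out) weight matrix standing in for the GIN MLP.
    """
    agg = (1.0 + eps) * h + adj @ h  # self term plus sum over neighbors
    return np.maximum(agg @ w, 0.0)  # ReLU
```

Because the embeddings enter only as node features `h`, the language model itself stays frozen, which is what allows the method to be applied without retraining.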
Findings
SoftBlobGIN achieves 92.8% enzyme classification accuracy.
GNNExplainer recovers biologically meaningful active-site residues.
Residue AUROC improves from 0.885 to 0.983 on binding-site detection.
Abstract
Protein language models such as ESM-2 learn rich residue representations that achieve strong performance on protein function prediction, but their features remain difficult to interpret because structural and evolutionary signals are encoded in dense latent spaces. We propose a plug-and-play framework that projects ESM-2 representations onto protein contact graphs and applies SoftBlobGIN, a lightweight Graph Isomorphism Network with differentiable Gumbel-softmax substructure pooling, to perform structure-aware message passing and learn coarse functional substructures for downstream prediction tasks. On enzyme classification, SoftBlobGIN achieves 92.8% accuracy and 0.898 macro-F1. Unlike post hoc analysis of protein language models alone, our method produces directly auditable structural explanations: GNNExplainer recovers biologically meaningful active-site residues,…
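The differentiable Gumbel-softmax substructure pooling mentioned in the abstract can be sketched roughly as follows. This is a hedged illustration, not the paper's code: it assumes each residue receives per-cluster logits, that soft assignments are drawn with the standard Gumbel-softmax relaxation, and that features and adjacency are coarsened in the DiffPool style (`S.T @ H` and `S.T @ A @ S`). The function names, the temperature `tau`, and the coarsening rule are all assumptions.

```python
import numpy as np


def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample per row: softmax((logits + Gumbel noise) / tau)."""
    u = np.clip(rng.uniform(size=logits.shape), 1e-10, 1.0 - 1e-10)
    gumbel = -np.log(-np.log(u))
    z = (logits + gumbel) / tau
    z = z - z.max(axis=-1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def soft_pool(h, adj, logits, tau=1.0, rng=None):
    """Coarsen a residue graph into K soft substructures.

    h:      (N, D) node features.
    adj:    (N, N) adjacency.
    logits: (N, K) unnormalized residue-to-substructure scores.
    Returns the assignment (N, K), pooled features (K, D), pooled adjacency (K, K).
    """
    rng = np.random.default_rng() if rng is None else rng
    s = gumbel_softmax(logits, tau, rng)
    h_pooled = s.T @ h          # each substructure sums its weighted residues
    adj_pooled = s.T @ adj @ s  # connection strength between substructures
    return s, h_pooled, adj_pooled
```

Lower values of `tau` push each row of the assignment toward a one-hot vector, so every residue is attributed mostly to a single substructure. That property is what would make the learned groups easy to inspect against known active-site residues.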
