Structural Interpretations of Protein Language Model Representations via Differentiable Graph Partitioning

Siddhant Dutta; Edward Tan Beng Wai; Soumick Sarker; Pasan Gunawardane; Jagath C. Rajapakse

arXiv:2605.10985·cs.LG·May 13, 2026

TL;DR

This paper introduces SoftBlobGIN, a lightweight graph neural network that runs on protein contact graphs carrying protein language model features, making those representations structurally interpretable while performing well on downstream prediction tasks.

Contribution

The authors propose a plug-and-play method that projects protein language model features onto contact graphs and applies differentiable pooling for structural interpretability without retraining the language model.
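The projection step described above can be pictured with a small NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the 8 Å contact cutoff, the random coordinates, the random stand-in for frozen ESM-2 embeddings, and the helper names `contact_graph` and `gin_layer` are all hypothetical. It builds a contact graph from residue coordinates, attaches per-residue features to the nodes, and runs one GIN-style sum-aggregation update.

```python
import numpy as np

def contact_graph(coords, cutoff=8.0):
    """Binary adjacency: residues i != j are linked if closer than `cutoff`.

    The 8.0 cutoff (in angstroms) is an assumed value, not one from the paper.
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = (dist < cutoff).astype(np.float64)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj

def gin_layer(adj, h, w1, w2, eps=0.0):
    """One GIN-style update: MLP((1 + eps) * h_i + sum of neighbor features)."""
    agg = (1.0 + eps) * h + adj @ h
    hidden = np.maximum(agg @ w1, 0.0)  # ReLU
    return hidden @ w2

rng = np.random.default_rng(0)
n_res, d_in, d_hid = 6, 16, 8

# Hypothetical inputs: a random-walk backbone and random "language model" features.
coords = np.cumsum(rng.normal(scale=2.5, size=(n_res, 3)), axis=0)
plm_feats = rng.normal(size=(n_res, d_in))  # stand-in for frozen ESM-2 embeddings

adj = contact_graph(coords)
w1 = rng.normal(size=(d_in, d_hid)) * 0.1
w2 = rng.normal(size=(d_hid, d_hid)) * 0.1
h_out = gin_layer(adj, plm_feats, w1, w2)
print(adj.shape, h_out.shape)
```

Because the language model features enter only as node attributes, the language model itself is never updated in this sketch, which matches the "without retraining" framing of the contribution.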

Findings

01. SoftBlobGIN achieves 92.8% enzyme classification accuracy.

02. GNNExplainer recovers biologically meaningful active-site residues.

03. Residue AUROC improves from 0.885 to 0.983 on binding-site detection.

Abstract

Protein language models such as ESM-2 learn rich residue representations that achieve strong performance on protein function prediction, but their features remain difficult to interpret as structural and evolutionary signals are encoded in dense latent spaces. We propose a plug-and-play framework that projects ESM-2 representations onto protein contact graphs and applies SoftBlobGIN, a lightweight Graph Isomorphism Network with differentiable Gumbel-softmax substructure pooling, to perform structure-aware message passing and learn coarse functional substructures for downstream prediction tasks. Across enzyme classification, SoftBlobGIN achieves 92.8% accuracy and 0.898 macro-F1. Unlike post hoc analysis of protein language models alone, our method produces directly auditable structural explanations: GNNExplainer recovers biologically meaningful active-site residues,…
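As a rough picture of Gumbel-softmax substructure pooling, the following NumPy sketch assigns each residue softly to one of K substructures and coarsens both the features and the graph. It is a minimal sketch under assumptions, not the paper's architecture: the linear assignment head `w_assign`, the temperature `tau`, the choice K = 3, and the function names are hypothetical, and NumPy shows only the forward pass (an autodiff framework would be needed to train through it).

```python
import numpy as np

def gumbel_softmax(logits, tau, rng):
    """Relaxed categorical sample: softmax((logits + Gumbel noise) / tau)."""
    u = rng.uniform(low=1e-9, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    y = (logits + gumbel) / tau
    y = y - y.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

def soft_pool(adj, h, w_assign, tau, rng):
    """Pool N residues into K soft substructures.

    s has shape (N, K); pooled features are s^T h and the pooled
    adjacency is s^T adj s.
    """
    s = gumbel_softmax(h @ w_assign, tau, rng)
    return s, s.T @ h, s.T @ adj @ s

rng = np.random.default_rng(0)
n_res, d, k = 10, 8, 3  # k substructures is an illustrative choice

h = rng.normal(size=(n_res, d))  # stand-in for node features after message passing
upper = np.triu((rng.uniform(size=(n_res, n_res)) < 0.3).astype(np.float64), 1)
adj = upper + upper.T            # random symmetric graph without self-loops

w_assign = rng.normal(size=(d, k))  # hypothetical assignment head
s, h_pool, adj_pool = soft_pool(adj, h, w_assign, tau=0.5, rng=rng)
print(s.shape, h_pool.shape, adj_pool.shape)
```

Lower values of `tau` push each row of `s` toward a one-hot assignment, so every residue lands mostly in a single substructure; higher values give softer, more overlapping groupings.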
