# PLM-eXplain: divide and conquer the protein embedding space

**Authors:** Jan van Eck, Dea Gogishvili, Wilson Silva, Sanne Abeln

PMC · DOI: 10.1093/bioinformatics/btaf631 · Bioinformatics · 2025-11-21

## TL;DR

PLM-eXplain improves the interpretability of protein language models (PLMs) by splitting their embeddings into an interpretable subspace of biochemical features and a residual subspace that keeps the remaining predictive information, without sacrificing accuracy.

## Contribution

PLM-eXplain introduces an explainable adapter layer that factors PLM embeddings into interpretable biochemical features and predictive residuals.

## Key findings

- Using embeddings from ESM2 and ProtBert, PLM-eXplain (PLM-X) maintains high predictive performance while incorporating interpretable biochemical features such as secondary structure and hydropathy.
- The approach is demonstrated on three classification tasks: extracellular vesicle association, transmembrane helix prediction, and aggregation propensity prediction.
- PLM-X offers a generalizable solution for enhancing the interpretability of protein language models across various applications.

## Abstract

Protein language models (PLMs) have revolutionized computational biology through their ability to generate powerful sequence representations for diverse prediction tasks. However, their black-box nature limits biological interpretation and translation to actionable insights. Bridging this gap requires approaches that maintain predictive performance while providing interpretable explanations of model behaviour.

We present PLM-eXplain (PLM-X), an explainable adapter layer that bridges this gap by factoring PLM embeddings into two complementary components: an interpretable subspace based on established biochemical features, and a residual subspace that retains predictive, non-interpretable information. Using embeddings from ESM2 and ProtBert, PLM-X incorporates well-established properties, including secondary structure and hydropathy, while maintaining high predictive performance. We demonstrate the effectiveness of our approach across three biologically relevant classification tasks: extracellular vesicle association, transmembrane helix prediction, and aggregation propensity prediction. PLM-X enables biological interpretation of model decisions without sacrificing accuracy, offering a generalizable solution for enhancing PLM interpretability across various downstream applications.

Source code and models are available at https://github.com/AIT4LIFE-UU/PLM-eXplain/.
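
The factorization described in the abstract can be illustrated with a small sketch. This is not taken from the authors' repository, and the abstract does not specify the adapter architecture. The example assumes a single linear adapter whose first few output dimensions are treated as named biochemical features and whose remaining dimensions form the residual. The feature names, the `adapt` and `linear_contributions` helpers, and all numeric values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Hypothetical feature names. The abstract only mentions secondary structure
# and hydropathy; this exact list is an assumption.
INTERPRETABLE_FEATURES = ["helix", "strand", "coil", "hydropathy"]


def matvec(matrix: Sequence[Sequence[float]], vector: Sequence[float]) -> List[float]:
    """Plain-Python matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


@dataclass
class SplitEmbedding:
    interpretable: List[float]  # one value per named biochemical feature
    residual: List[float]       # remaining dimensions without a fixed meaning


def adapt(embedding: Sequence[float],
          adapter_weights: Sequence[Sequence[float]],
          n_interpretable: int = len(INTERPRETABLE_FEATURES)) -> SplitEmbedding:
    """Apply an (assumed) linear adapter and split its output into two parts."""
    z = matvec(adapter_weights, embedding)
    return SplitEmbedding(z[:n_interpretable], z[n_interpretable:])


def linear_contributions(split: SplitEmbedding,
                         clf_weights: Sequence[float]) -> Dict[str, float]:
    """Per-dimension contribution (weight * value) to a linear downstream score."""
    z = split.interpretable + split.residual
    if len(clf_weights) != len(z):
        raise ValueError("classifier weights must match the adapted dimension")
    names = INTERPRETABLE_FEATURES + [f"residual_{i}" for i in range(len(split.residual))]
    return {name: w * v for name, w, v in zip(names, clf_weights, z)}


# Toy values only: a 3-dimensional "PLM embedding" and a 6 x 3 adapter.
embedding = [1.0, 2.0, 3.0]
adapter_weights = [
    [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0],  # interpretable rows
    [0, 1, 1], [1, 0, 1],                        # residual rows
]
split = adapt(embedding, adapter_weights)
contributions = linear_contributions(split, [0.5, 0.0, 0.0, 1.0, 0.0, -1.0])
for name, value in contributions.items():
    print(f"{name}: {value:+.2f}")
```

In a sketch like this, the named dimensions can be read directly when inspecting a downstream prediction, while the residual dimensions preserve information that the named features do not capture. How PLM-X actually trains and constrains the two subspaces is described in the full paper.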

## Full-text entities

- **Genes:** SS3 (Sarcoidosis, susceptibility to, 3) [NCBI Gene 100196919], FXYD1 (FXYD domain containing ion transport regulator 1) [NCBI Gene 5348] {aka PLM}, LEP (leptin) [NCBI Gene 3952] {aka LEPD, OB, OBS}, ITIH2 (inter-alpha-trypsin inhibitor heavy chain 2) [NCBI Gene 3698] {aka H2P, ITI-HC2, SHAP}
- **Diseases:** Alzheimer (MESH:D000544), amyloid (MESH:C000718787)
- **Chemicals:** Histidine (MESH:D006639), lipid (MESH:D008055), amino acid (MESH:D000596), Cysteine (MESH:D003545)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12790820/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12790820/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/PMC12790820/full.md

---
Source: https://tomesphere.com/paper/PMC12790820