Toward the Explainability of Protein Language Models

Andrea Hunklinger; Noelia Ferruz

arXiv:2506.19532 · q-bio.BM · December 8, 2025

TL;DR

This paper reviews the application of explainable AI techniques to protein language models, highlighting their potential to improve understanding, trust, and utility in protein research and design.

Contribution

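The paper divides the workflow of protein AI modeling into four information contexts (training sequences, input prompt, model architecture, and input-output pairs), identifies five roles that XAI can play in protein research, and discusses future challenges and directions.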

Findings

01. XAI can serve as Evaluator, Multitasker, Engineer, Coach, and Teacher in protein research.

02. Of these five roles, only the Evaluator has been widely adopted so far.

03. Future needs include benchmarks, open-source tools, visualizations, and wet-lab validation.

Abstract

Protein language models (pLMs) excel in a variety of tasks that range from structure prediction to the design of functional enzymes. However, these models operate as black boxes, and their underlying working principles remain unclear. Here, we survey emerging applications of explainable artificial intelligence (XAI) to pLMs and describe the potential of XAI in protein research. We divide the workflow of protein AI modeling into four information contexts: (i) training sequences, (ii) input prompt, (iii) model architecture, and (iv) input-output pairs. For each, we describe existing methods and applications of XAI. Additionally, from published studies we distil five (potential) roles that XAI can play in protein research: Evaluator, Multitasker, Engineer, Coach, and Teacher, with the Evaluator role being the only one widely adopted so far. These roles aim to help both protein scientists…

Taxonomy

Methods: Layer Normalization · Dropout · Absolute Position Encodings · Dense Connections · Byte Pair Encoding · Softmax · Label Smoothing · Transformer