Insights Into the Inner Workings of Transformer Models for Protein Function Prediction

Markus Wenzel; Erik Grüner; Nils Strodthoff

arXiv:2309.03631 · cs.LG · February 12, 2024 · 2 cites

TL;DR

This paper applies an extended explainable AI method to transformer models for protein function prediction, revealing biologically relevant amino acids and model attention patterns that correspond to known protein features.

Contribution

It extends integrated gradients so that latent representations inside transformer models can be inspected, and relates the resulting attribution maps to annotated biological features of protein sequences.

Findings

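1. Identified amino acids that are important for protein function prediction and that align with expectations from biology and chemistry.

2. Found transformer heads whose attribution maps show a statistically significant correspondence with ground truth sequence annotations (e.g. transmembrane regions, active sites).

3. Demonstrated that transformer models for biological sequence analysis can be interpreted, both in the embedding layer and inside the model.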

Abstract

Motivation: We explored how explainable artificial intelligence (XAI) can help to shed light into the inner workings of neural networks for protein function prediction, by extending the widely used XAI method of integrated gradients such that latent representations inside of transformer models, which were finetuned to Gene Ontology term and Enzyme Commission number prediction, can be inspected too.

Results: The approach enabled us to identify amino acids in the sequences that the transformers pay particular attention to, and to show that these relevant sequence parts reflect expectations from biology and chemistry, both in the embedding layer and inside of the model, where we identified transformer heads with a statistically significant correspondence of attribution maps with ground truth sequence annotations (e.g. transmembrane regions, active sites) across many proteins.

Availability…
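
To make the underlying method concrete, here is a minimal sketch of standard integrated gradients on a toy model. It is not the authors' extension to latent transformer representations and not their code: the logistic model, its weights, the all-zero baseline, and every function name below are assumptions chosen purely for illustration. The attribution of feature i is (x_i - x'_i) times the average gradient along the straight path from the baseline x' to the input x, approximated here with a midpoint Riemann sum.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def model(x, w, bias):
    # Hypothetical toy model: logistic regression on a feature vector.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + bias)


def model_grad(x, w, bias):
    # Analytic gradient of the toy model with respect to its input.
    p = model(x, w, bias)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, bias, steps=200):
    # Accumulate gradients at midpoints of the path baseline -> x.
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [x0 + alpha * (xi - x0) for xi, x0 in zip(x, baseline)]
        grad = model_grad(point, w, bias)
        total = [t + g for t, g in zip(total, grad)]
    # Scale the average gradient by the input-minus-baseline difference.
    return [(xi - x0) * t / steps for xi, x0, t in zip(x, baseline, total)]


# Illustrative values only (assumptions, not taken from the paper).
w = [1.5, -2.0, 0.5]
bias = 0.1
x = [1.0, 0.5, 1.0]
baseline = [0.0, 0.0, 0.0]

attr = integrated_gradients(x, baseline, w, bias)

# Completeness: attributions should sum to model(x) - model(baseline),
# up to the numerical error of the Riemann approximation.
gap = abs(sum(attr) - (model(x, w, bias) - model(baseline, w, bias)))
print(attr, gap)
```

In this sketch the sign of each attribution follows the sign of the weight times the input difference, and the completeness gap shrinks as `steps` grows. Applying the same idea to a transformer would mean taking gradients with respect to embeddings or intermediate activations via automatic differentiation, which this toy example does not attempt.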

Code & Models

Repositories

markuswenzel/xai-proteins (PyTorch, official)

Taxonomy

Topics: Bioinformatics and Genomic Networks · Computational Drug Discovery Methods · Biomedical Text Mining and Ontologies

Methods: Ontology