Interpreting BERT architecture predictions for peptide presentation by MHC class I proteins

Hans-Christof Gasser; Georges Bedran; Bo Ren; David Goodlett; Javier Alfaro; Ajitha Rajan

arXiv:2111.07137·q-bio.QM·November 16, 2021

TL;DR

This paper introduces ImmunoBERT, a BERT-based model for predicting peptide presentation by MHC class I proteins, and applies interpretability techniques to understand the model's decision factors, aligning with biological insights.

Contribution

The study presents a novel BERT-based model for MHC I peptide presentation prediction and demonstrates the use of SHAP and LIME interpretability methods in this domain.

Findings

01. Amino acids near the peptide termini are highly influential.

02. MHC positions in the A, B, and F pockets are among the most important factors.

03. The factors driving model predictions align with known biological structures.

Abstract

The major histocompatibility complex (MHC) class I pathway supports the detection of cancer and viruses by the immune system. It presents parts of proteins (peptides) from inside a cell on its membrane surface, enabling visiting immune cells that detect non-self peptides to terminate the cell. The ability to predict whether a peptide will be presented on MHC class I molecules helps in designing vaccines so that they can activate the immune system to destroy the invading disease protein. We designed a prediction model using a BERT-based architecture (ImmunoBERT) that takes as input a peptide and its surrounding regions (N- and C-terminals) along with a set of MHC class I (MHC-I) molecules. We present a novel application of well-known interpretability techniques, SHAP and LIME, to this domain and we use these results along with 3D structure visualizations and amino acid frequencies to…
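To make the attribution idea concrete, the following is a minimal sketch of exact Shapley values over the positions of a single peptide, the quantity that SHAP approximates for larger inputs. Everything in it is an illustrative assumption rather than the authors' code: `toy_score` is a hypothetical additive scorer standing in for a presentation model (it is not ImmunoBERT), the `X` mask token and the example peptide are made up, and the anchor preferences are chosen only so that the result is easy to check by hand.

```python
from itertools import combinations
from math import factorial

MASK = "X"  # hypothetical placeholder for a hidden residue (assumption)


def toy_score(peptide: str) -> float:
    """Hypothetical stand-in for a presentation model; NOT ImmunoBERT."""
    score = 0.0
    if peptide[1] == "L":   # made-up preference at position 2
        score += 0.5
    if peptide[-1] == "V":  # made-up preference at the C-terminal residue
        score += 0.4
    if peptide[4] == "G":   # made-up weak preference in the middle
        score += 0.1
    return score


def shapley_values(peptide: str, score_fn) -> list[float]:
    """Exact Shapley value of each position by enumerating all coalitions.

    Positions outside a coalition are replaced by MASK before scoring.
    Cost grows as 2^n, so this is only practical for short peptides.
    """
    n = len(peptide)

    def value(coalition: set[int]) -> float:
        masked = "".join(
            aa if i in coalition else MASK for i, aa in enumerate(peptide)
        )
        return score_fn(masked)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # k = size of the coalition without position i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


if __name__ == "__main__":
    peptide = "ALDKGTEYV"  # made-up 9-mer for illustration
    for pos, (aa, phi) in enumerate(zip(peptide, shapley_values(peptide, toy_score)), 1):
        print(f"P{pos} {aa}: {phi:+.3f}")
```

Because the toy scorer is additive, each position's Shapley value equals its own contribution and the values sum to the difference between the full and fully masked scores. A real model has interactions between positions, and enumerating all coalitions is infeasible for long inputs, which is why sampling-based approximations are used in practice.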

Code & Models

Repositories

hcgasser/immunobert
PyTorch · Official

Taxonomy

Topics: Vaccines and immunoinformatics approaches · Influenza Virus Research Studies · RNA and protein synthesis mechanisms

Methods: Shapley Additive Explanations · Local Interpretable Model-Agnostic Explanations