# Machine learning application to predict binding affinity between peptide containing non-canonical amino acids and HLA-A0201

**Authors:** Shan Jiang, Zhaoqian Su, Nathaniel Bloodworth, Yunchao Liu, Cristina E. Martina, David G. Harrison, Jens Meiler

PMC · DOI: 10.1371/journal.pone.0314833 · 2025-06-27

## TL;DR

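This paper introduces a machine learning tool that predicts how strongly peptides containing non-canonical amino acids (NCAAs) bind to MHC-I. Binding affinity correlates with peptide immunogenicity, so such predictions can help identify immunogenic antigens and guide the design of peptide-based therapeutics.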

## Contribution

The paper presents a machine learning model that predicts the MHC-I binding affinity of peptides containing non-canonical amino acids, a case that, according to the abstract, no currently available predictor handles explicitly.

## Key findings

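- In 5-fold cross-validation, the model achieved an R² of 0.477 and a root-mean-square error (RMSE) of 0.735.
- Its performance is compared with other commonly used regressors; according to the abstract, no currently available tool predicts MHC-I binding affinity for antigens with explicitly labeled non-canonical amino acids.
- The model is intended to support the computational design and optimization of peptide-based therapeutics that incorporate non-canonical amino acids.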
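
The two metrics quoted above can be illustrated with a minimal sketch of 5-fold cross-validation. Everything in it is an assumption made for illustration: the data are synthetic, the one-feature least-squares line is a stand-in regressor rather than the paper's model, and the helper names (`fit_line`, `k_fold_scores`) are hypothetical. The sketch pools out-of-fold predictions before scoring; the abstract does not say whether the authors pooled predictions or averaged per-fold scores.

```python
import math
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error between targets and predictions."""
    sq_err = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq_err / len(y_true))


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (stand-in regressor, not the paper's model)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def k_fold_scores(xs, ys, k=5, seed=0):
    """Shuffle indices, split into k folds, and score pooled out-of-fold predictions."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    y_true, y_pred = [], []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit only on the training folds, then predict the held-out fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            y_true.append(ys[i])
            y_pred.append(a * xs[i] + b)
    return r2_score(y_true, y_pred), rmse(y_true, y_pred)


# Synthetic placeholder data: one feature and a noisy linear target.
rng = random.Random(42)
xs = [rng.uniform(0.0, 1.0) for _ in range(200)]
ys = [2.0 * x + rng.gauss(0.0, 0.5) for x in xs]

r2, err = k_fold_scores(xs, ys, k=5)
print(f"5-fold CV: R^2 = {r2:.3f}, RMSE = {err:.3f}")
```

Because every prediction comes from a model that never saw that sample during fitting, the resulting R² and RMSE estimate performance on unseen peptides rather than fit to the training data. RMSE is expressed in the units of the target, so a value such as 0.735 is only interpretable alongside the affinity scale used.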

## Abstract

Class I major histocompatibility complexes (MHC-I), encoded by the highly polymorphic HLA-A, HLA-B, and HLA-C genes in humans, are expressed on all nucleated cells. Both self and foreign proteins are processed to peptides of 8–10 amino acids, loaded into MHC-I within the endoplasmic reticulum, and then presented on the cell surface. Foreign peptides presented in this fashion activate CD8+ T cells, and their immunogenicity correlates with their affinity for the MHC-I binding groove. Thus, predicting antigen binding affinity for MHC-I is a valuable tool for identifying potentially immunogenic antigens. While quite a few predictors for MHC-I binding exist, there are no currently available tools that can predict antigen/MHC-I binding affinity for antigens with explicitly labeled post-translational modifications or unusual/non-canonical amino acids (NCAAs). However, such modifications are increasingly recognized as critical mediators of peptide immunogenicity. In this work, we propose a machine learning application that quantifies the binding affinity of epitopes containing NCAAs to MHC-I and compares its performance with other commonly used regressors. Our model demonstrates robust performance, with 5-fold cross-validation yielding an R² value of 0.477 and a root-mean-square error (RMSE) of 0.735, indicating strong predictive capability for peptides with NCAAs. This work provides a valuable tool for the computational design and optimization of peptides incorporating NCAAs, potentially accelerating the development of novel peptide-based therapeutics with enhanced properties and efficacy.

## Linked entities

- **Genes:** HLA-A (major histocompatibility complex, class I, A) [NCBI Gene 3105], HLA-B (major histocompatibility complex, class I, B) [NCBI Gene 3106], HLA-C (major histocompatibility complex, class I, C) [NCBI Gene 3107]

## Full-text entities

- **Genes:** HLA-B (major histocompatibility complex, class I, B) [NCBI Gene 3106] {aka AS, B-4901, HLAB}, CD8A (CD8 subunit alpha) [NCBI Gene 925] {aka CD8, CD8alpha, IMD116, Leu2, p32}, HLA-C (major histocompatibility complex, class I, C) [NCBI Gene 3107] {aka D6S204, HLA-JY3, HLAC, HLC-C, MHC, PSORS1}, HLA-A (major histocompatibility complex, class I, A) [NCBI Gene 3105] {aka HLAA}
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12204577/full.md

---
Source: https://tomesphere.com/paper/PMC12204577