Accelerating MHC-II Epitope Discovery via Multi-Scale Prediction in Antigen Presentation
Yue Wan, Jiayi Yuan, Zhiwei Feng, Xiaowei Jia

TL;DR
This paper introduces a comprehensive dataset and a multi-scale evaluation framework for improving computational prediction of MHC-II antigen presentation, addressing existing challenges in data scarcity and complexity.
Contribution
It provides a curated, standardized dataset and benchmarks multiple models across three ML tasks in MHC-II epitope prediction.
Findings
New dataset with richer biological context for MHC-II
Benchmarking results of existing models across multiple tasks
Analysis of modeling strategies for improved predictions
Abstract
Antigenic epitopes presented by major histocompatibility complex II (MHC-II) proteins play an essential role in immunotherapy. However, compared with the more widely studied MHC-I in computational immunotherapy, the study of MHC-II antigenic epitopes poses significantly more challenges due to the complex binding specificity and ambiguous motif patterns of MHC-II. Consequently, existing datasets for MHC-II interactions are smaller and less standardized than those available for MHC-I. To address these challenges, we present a well-curated dataset derived from the Immune Epitope Database (IEDB) and other public sources. It not only extends and standardizes existing peptide-MHC-II datasets, but also introduces a novel antigen-MHC-II dataset with richer biological context. Leveraging this dataset, we formulate three major machine learning (ML) tasks of peptide binding, peptide presentation, and antigen…
Peer Reviews
Decision · Submitted to ICLR 2026
This manuscript introduces antigen-level prediction for MHC-II, with a novel evaluation metric (CR-AUC). In addition, it has rigorous dataset curation, extensive experiments, and thorough ablation studies. The manuscript is well organized and clearly written. Furthermore, it provides a good resource for the community and advances the modeling of MHC-II antigen presentation.
The model underperforms in peptide-binding affinity (BA) prediction compared to state-of-the-art methods like RPEMHC and NetMHCIIpan4.3, an issue the authors attribute to checkpoint selection bias but which could be addressed more systematically. Furthermore, the study is limited to single-allele data and does not handle the more complex but common real-world scenario of multi-allelic mass spectrometry samples. The antigen-level modeling is also potentially impacted by missing antigen annotation
The authors collect a large dataset of 1.2M peptides with 134 unique human MHC-II molecules, labeled as positive or negative depending on the interaction and on whether the peptide is presented. This is certainly a welcome and valuable resource. They introduce an antigen-level prediction task and evaluation framework. They enrich the dataset with annotations from multiple sources, such as ESM2 residue embeddings, predicted peptide binding motifs from MoDec, inferred structures by AlphaFold3, and ot
One issue the authors discuss is that the data for some MHC molecules are highly imbalanced. They attempt to rebalance by generating negatives with ad hoc data augmentation techniques. The paper has little representation learning content, so my only concern is whether ICLR is the best venue for this work. In fact, the model used for predictions is described only in the Appendix, with little mention in the main text.
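The review does not say which augmentation techniques the authors use to generate negatives. As an illustration only, the sketch below shows one common decoy strategy in peptide-presentation modeling: sampling length-matched windows from a source protein and discarding any that coincide with known positives. The function name `sample_decoys`, its parameters, and the example sequence are hypothetical and are not taken from the paper.

```python
import random


def sample_decoys(protein_seq, positives, n_decoys, rng):
    """Sample length-matched decoy peptides from one protein sequence.

    Assumption: this is a generic decoy scheme, not the paper's method.
    Decoys are windows of the source protein whose lengths are drawn
    from the observed positive peptides and that are not positives.
    """
    if not positives or n_decoys <= 0:
        return []
    positive_set = set(positives)
    lengths = [len(p) for p in positives]
    decoys = set()
    attempts = 0
    max_attempts = 100 * n_decoys  # guard against short proteins
    while len(decoys) < n_decoys and attempts < max_attempts:
        attempts += 1
        k = rng.choice(lengths)
        if k > len(protein_seq):
            continue
        start = rng.randrange(len(protein_seq) - k + 1)
        candidate = protein_seq[start:start + k]
        if candidate not in positive_set:
            decoys.add(candidate)
    return sorted(decoys)


# Hypothetical toy example: an arbitrary sequence with two "positive" peptides.
toy_seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV"
toy_positives = [toy_seq[5:20], toy_seq[30:44]]
toy_decoys = sample_decoys(toy_seq, toy_positives, 5, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the sampling reproducible. A real pipeline would likely also control the positive-to-negative ratio per MHC molecule, which this sketch leaves out.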
1. Modeling MHC-II antigen presentation directly from the antigen-protein perspective is an interesting and meaningful research direction.
2. The combination of MHC ESM embeddings and structural features effectively enhances predictive performance.
3. The description of the dataset and modeling details is clear, well-structured, and easy to follow.
1. The novelty of the work is limited.
   a) The proposed network architecture is highly similar to ImmuScope, with only minor modifications.
   b) The use of joint training (as in NetMHCpan-4.0), binding core prediction (NetMHCIIpan-3.2), ESM embeddings (HLAIIpolo, Pep2Vec), and structural features (NetMHCpan-4.2) for performance enhancement is already well-established in the field of MHC-II peptide presentation.
   c) Moreover, recent versions of NetMHCIIpan (≥4.0) have already incorporated peptide
Taxonomy
Topics: Vaccines and immunoinformatics approaches · Immunotherapy and Immune Responses · T-cell and B-cell Immunology
