Immunogenicity Prediction with Dual Attention Enables Vaccine Target Selection
Song Li, Yang Tan, Song Ke, Liang Hong, Bingxin Zhou

TL;DR
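VenusVaccine is a deep learning model that applies a dual attention mechanism to pre-trained protein sequence and structure representations. This improves immunogenicity prediction accuracy and aids vaccine target selection. The model is trained and evaluated on a large, newly compiled immunogenicity dataset.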
Contribution
We introduce VenusVaccine, a novel dual attention deep learning framework that integrates protein sequence and structure embeddings for enhanced immunogenicity prediction.
Findings
VenusVaccine outperforms existing methods across multiple metrics.
The comprehensive dataset improves model training and evaluation.
Post-hoc validation confirms practical utility in vaccine design.
Abstract
Immunogenicity prediction is a central topic in reverse vaccinology for finding candidate vaccines that can trigger protective immune responses. Existing approaches typically rely on highly compressed features and simple model architectures, leading to limited prediction accuracy and poor generalizability. To address these challenges, we introduce VenusVaccine, a novel deep learning solution with a dual attention mechanism that integrates pre-trained latent vector representations of protein sequences and structures. We also compile the most comprehensive immunogenicity dataset to date, encompassing over 7000 antigen sequences, structures, and immunogenicity labels from bacteria, viruses, and tumors. Extensive experiments demonstrate that VenusVaccine outperforms existing methods across a wide range of evaluation metrics. Furthermore, we establish a post-hoc validation protocol to assess…
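The abstract does not spell out how the two attention branches are wired. The sketch below is a minimal, hypothetical reading in plain Python. It assumes one attention-pooling branch over per-residue sequence embeddings and a second over per-residue structure embeddings, each scored against its own learned query vector. The two pooled vectors are then concatenated and passed to a logistic classifier head. All names (`attention_pool`, `dual_attention_score`, the query and head weights) and the toy numbers are illustrative assumptions. They are not the VenusVaccine implementation or its learned parameters.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_pool(tokens: Sequence[Vector], query: Vector) -> Vector:
    """Collapse per-residue embeddings into one protein-level vector.

    Hypothetical: `query` stands in for a learned parameter; every token
    must have the same width as `query`, and `tokens` must be non-empty.
    """
    d = len(query)
    # Scaled dot-product score of each residue embedding against the query.
    weights = softmax([dot(t, query) / math.sqrt(d) for t in tokens])
    # Attention-weighted sum over residues.
    return [sum(w * t[i] for w, t in zip(weights, tokens)) for i in range(d)]


def dual_attention_score(
    seq_tokens: Sequence[Vector],
    struct_tokens: Sequence[Vector],
    seq_query: Vector,
    struct_query: Vector,
    head_weights: Vector,
    bias: float = 0.0,
) -> float:
    """Return a probability-like immunogenicity score in (0, 1)."""
    # Branch 1 pools sequence embeddings; branch 2 pools structure embeddings.
    pooled = attention_pool(seq_tokens, seq_query) + attention_pool(struct_tokens, struct_query)
    if len(head_weights) != len(pooled):
        raise ValueError("head_weights must match the concatenated pooled width")
    logit = dot(pooled, head_weights) + bias
    # Logistic head for binary immunogen / non-immunogen classification.
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    # Toy protein of 3 residues: 2-d sequence embeddings, 3-d structure embeddings.
    seq_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    struct_tokens = [[0.5, -0.5, 0.0], [0.0, 0.5, 0.5], [1.0, 0.0, 0.0]]
    p = dual_attention_score(
        seq_tokens,
        struct_tokens,
        seq_query=[1.0, 0.0],
        struct_query=[0.0, 1.0, 0.0],
        head_weights=[0.3, -0.2, 0.1, 0.4, 0.0],
    )
    print(f"P(immunogenic) = {p:.3f}")
```

Pooling each modality separately keeps sequence and structure embeddings of different widths independent until the concatenation step. A cross-attention variant, in which one modality attends over the other, would be an equally plausible reading of "dual attention".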
Peer Reviews
Decision: ICLR 2025 Poster
- novel model architecture
- contribution of the Immuno datasets
- ProVaccine achieves mostly top performance as reported in Table 1
- It is not obvious what each of the representations (sequence, fine, coarse, and descriptors) contributes to model performance. What if only the sequence representation is used as input? Ablation studies are essential for understanding these contributions and should be included in this work.
- Figure 4 is hard to interpret; it would be easier to understand if the likelihood statistics of the 11 determined immunogens under each model were reported instead.
- The appendix is missing:
- To my knowledge, the ProVaccine tool is the first method to use all three modalities (sequence, structure, and amino-acid descriptors) to predict immunogenicity with deep learning.
- The authors have tried multiple PLMs to encode the sequences (ESM2, ANKH, and ProtBert) and structural embedding methods (ESM3, FoldSeek) to encode the structures.
- They have compiled a new dataset called IMMUNO (stratified by antigen type: viral, bacterial, and tumour), which should serve as a valuable reso
- The availability of negative data is a known problem in this field. The authors used VAXIJEN to classify sequences as non-antigens and then filtered those sequences by sequence homology to compile the negative dataset. This will bias the negative dataset toward certain features that might coincide with the features used in other tools (not just VAXIJEN; AA descriptors used for immunogenicity prediction are generally common across methods). This might overinflate th
1. This paper proposed a supervised deep learning framework with a dual attention mechanism to address key challenges of immunogenicity prediction in vaccine development.
2. This paper provided a valuable benchmark for future research in vaccine development and immunogenicity prediction.
1. The work is innovative in constructing a new benchmark dataset, but it does not show novelty in algorithm development. The authors need to evaluate their method on existing datasets in addition to their own.
2. The dual attention mechanism, while sophisticated, may significantly increase the computational cost and complexity of the model. The paper does not thoroughly compare this method against simpler architectures to show that the added complexity justifies the performance gains.
3. The
Taxonomy
Topics: vaccines and immunoinformatics approaches · Hepatitis B Virus Studies · SARS-CoV-2 and COVID-19 Research
Methods: Softmax · Attention Is All You Need
