# Foundation Protein Language Models for Influenza A Virus T-Cell Epitope Prediction: A Transformer-Based Viroinformatics Framework

**Authors:** Syed Nisar Hussain Bukhari, Kingsley A. Ogudo

PMC · DOI: 10.3390/v18030380 · Viruses · 2026-03-18

## TL;DR

This paper introduces a transformer-based framework that uses ESM-2 protein language model embeddings to predict T-cell epitopes from Influenza A virus peptide sequences, with the aim of supporting epitope-based vaccine design.

## Contribution

A novel transformer-based viroinformatics framework using ESM-2 embeddings for T-cell epitope prediction in Influenza A virus.
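
To make the idea of self-attention over per-residue embeddings concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: `toy_embed` is a hypothetical stand-in that assigns random vectors per amino acid instead of real ESM-2 embeddings, the attention uses identity projections in place of learned query/key/value weights, and the example peptide (the widely studied influenza A matrix protein 1 peptide GILGFVFTL) is chosen for illustration only and is not taken from the paper's dataset.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    """Single-head scaled dot-product self-attention.

    Assumption: identity projections (no learned Q/K/V weights).
    Returns (contextualized vectors, attention matrix).
    """
    d = len(embeddings[0])
    scale = math.sqrt(d)
    attn = []
    for q in embeddings:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in embeddings]
        attn.append(softmax(scores))
    context = [
        [sum(w * v[j] for w, v in zip(row, embeddings)) for j in range(d)]
        for row in attn
    ]
    return context, attn

def residue_importance(attn):
    """Average attention each residue receives across all query positions."""
    n = len(attn)
    return [sum(attn[i][j] for i in range(n)) / n for j in range(n)]

def toy_embed(peptide, dim=8, seed=0):
    """Hypothetical stand-in for per-residue ESM-2 embeddings."""
    rng = random.Random(seed)
    table = {aa: [rng.gauss(0, 1) for _ in range(dim)]
             for aa in "ACDEFGHIKLMNPQRSTVWY"}
    return [table[aa] for aa in peptide]

peptide = "GILGFVFTL"  # illustrative input only
emb = toy_embed(peptide)
context, attn = self_attention(emb)
importance = residue_importance(attn)

for aa, score in zip(peptide, importance):
    print(f"{aa}: {score:.3f}")
```

Each row of `attn` is a probability distribution over residues, so the per-residue scores in `importance` sum to 1 and can be read as a rough attribution profile. In a trained model, the scores would reflect learned weights rather than raw embedding similarity.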

## Key findings

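- The model attains approximately 97% accuracy and an AUC close to 0.99 under stratified cross-validation when classifying Influenza A virus peptides as T-cell epitopes or non-epitopes.
- Ablation analysis indicates that protein language model representations and self-attention account for substantial performance gains over classical machine learning baselines.
- Monte Carlo dropout at inference yields uncertainty-aware predictions that separate high-confidence from ambiguous peptide candidates, and attention-based interpretability highlights residue-level contributions to model decisions.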
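
The Monte Carlo dropout idea mentioned above can be sketched as follows. This is a simplified illustration, not the paper's classifier: it assumes a tiny logistic model with dropout applied to its inputs, and the weights, bias, and feature vectors are hypothetical values chosen only to show how repeated stochastic passes yield a mean prediction and a spread.

```python
import math
import random
import statistics

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(features, weights, bias, drop_p, rng):
    """One stochastic pass: inverted dropout on the inputs, then a logistic output."""
    keep = 1.0 - drop_p
    z = bias
    for x, w in zip(features, weights):
        if rng.random() < keep:      # keep this input with probability `keep`
            z += (x / keep) * w      # rescale so the expected activation is unchanged
    return sigmoid(z)

def mc_dropout_predict(features, weights, bias, drop_p=0.2, passes=200, seed=0):
    """Keep dropout active at inference and summarize many passes."""
    rng = random.Random(seed)
    probs = [forward(features, weights, bias, drop_p, rng) for _ in range(passes)]
    return statistics.fmean(probs), statistics.pstdev(probs)

# Hypothetical parameters and inputs, for illustration only.
weights = [1.5, -2.0, 0.7, 1.1]
bias = -0.2
confident = [3.0, -2.5, 2.0, 2.5]   # every feature pushes toward "epitope"
ambiguous = [0.9, 0.8, -0.3, 0.4]   # features pull in opposite directions

mean_c, std_c = mc_dropout_predict(confident, weights, bias)
mean_a, std_a = mc_dropout_predict(ambiguous, weights, bias)
print(f"confident: mean={mean_c:.3f} std={std_c:.3f}")
print(f"ambiguous: mean={mean_a:.3f} std={std_a:.3f}")
```

A low standard deviation across passes suggests a stable prediction, while a high one flags a candidate that may deserve manual review. Any threshold separating the two would be a design choice and is not specified here.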

## Abstract

Influenza A virus remains a major cause of respiratory disease worldwide and poses a persistent challenge to vaccine development due to its rapid genetic evolution and antigenic variability. T-cell-based immunity has therefore gained increasing importance, as it can provide broader and more durable protection by targeting conserved viral regions. Accurate identification of T-cell epitopes (TCEs) is a fundamental requirement for epitope-based vaccine design and immunological research. Although numerous computational methods have been proposed, many existing approaches rely on handcrafted physicochemical features, which offer limited ability to capture contextual sequence dependencies. In this study, a transformer-based viroinformatics framework is proposed for the binary prediction of TCEs from Influenza A virus peptide sequences. The framework employs a pretrained Evolutionary Scale Modeling-2 (ESM-2) protein language model (PLM) to generate rich, contextualized embeddings directly from raw amino acid sequences, eliminating the need for manual feature engineering. These embeddings are processed using a lightweight attention-based transformer classifier to learn epitope-specific sequence patterns. The model achieves strong and stable predictive performance, attaining an accuracy of approximately 97% and an AUC close to 0.99 under stratified cross-validation. Ablation analysis further confirms that protein language model representations and self-attention contribute substantially to performance gains over classical machine learning baselines. To enhance practical reliability, Monte Carlo dropout is incorporated during inference to provide uncertainty-aware predictions, enabling differentiation between high-confidence and ambiguous peptide candidates. In addition, attention-based interpretability is used to identify residue-level contributions to model decisions, offering biologically meaningful insights into epitope recognition. 
Overall, this study demonstrates that PLMs combined with transformer architectures provide an effective, interpretable, and promising computational framework for Influenza A TCE discovery and vaccine research.
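
As a rough illustration of the evaluation protocol named in the abstract, the sketch below builds stratified folds and computes AUC in pure Python. The fold count, the class balance of the toy labels, and the function names `stratified_folds` and `roc_auc` are assumptions made for this example; the summary does not state the paper's actual settings.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=5, seed=0):
    """Assign sample indices to k folds so each fold keeps similar class ratios."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)   # deal each class round-robin
    return folds

def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample from each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy labels: 20 epitopes (1) and 30 non-epitopes (0), an assumed class balance.
labels = [1] * 20 + [0] * 30
folds = stratified_folds(labels, k=5)

for i, fold in enumerate(folds):
    positives = sum(labels[j] for j in fold)
    print(f"fold {i}: {len(fold)} samples, {positives} positives")

print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1]))
```

In each round, one fold serves as the held-out set and the rest as training data. Averaging accuracy and AUC across rounds gives the kind of cross-validated estimate the abstract reports.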

## Full-text entities

- **Diseases:** respiratory disease (MESH:D012140)
- **Chemicals:** amino acid (MESH:D000596)
- **Species:** Influenza A virus (no rank) [taxon 11320]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13030479/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13030479/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/PMC13030479/full.md

---
Source: https://tomesphere.com/paper/PMC13030479