# EpicPred: predicting phenotypes driven by epitope-binding TCRs using attention-based multiple instance learning

**Authors:** Jaemin Jeon, Suwan Yu, Sangam Lee, Sang Cheol Kim, Hye-Yeong Jo, Inuk Jung, Kwangsoo Kim

PMC · DOI: 10.1093/bioinformatics/btaf080 · Bioinformatics · 2025-02-21

## TL;DR

EpicPred predicts which T-cell receptors (TCRs) bind known epitopes, then uses those predicted interactions to classify sample-level phenotypes such as cancer type and COVID-19 severity.

## Contribution

EpicPred combines open-set recognition (OSR), which filters out unlikely TCR–epitope interactions, with attention-based multiple instance learning to identify the interactions linked to specific phenotypes.

## Key findings

- EpicPred achieved an average AUROC of 0.80 ± 0.07 in predicting disease-related phenotypes.
- The method outperformed competing approaches in predicting phenotypes from cancer and COVID-19 TCR-seq datasets.
- Open-set recognition (OSR) was used to reduce false positives by filtering out unlikely TCR–epitope interactions.
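
The filtering idea in the last finding can be illustrated with a minimal sketch. It assumes the simplest open-set baseline, rejecting a TCR when its highest softmax probability over the known epitope classes falls below a threshold. This is not necessarily the OSR method EpicPred uses; the function names, the threshold value, the CDR3 strings, and the logits are all hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def open_set_filter(tcr_logits, threshold=0.9):
    """Keep TCRs whose top epitope probability reaches the threshold.

    tcr_logits maps a TCR identifier to per-epitope logits (hypothetical input).
    Returns a list of (tcr_id, epitope_index, confidence) for the kept TCRs;
    everything else is treated as "binds none of the known epitopes".
    """
    kept = []
    for tcr_id, logits in tcr_logits.items():
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            kept.append((tcr_id, best, probs[best]))
    return kept

# Illustrative placeholder sequences and logits, not data from the paper.
example_logits = {
    "CASSLGQAYEQYF": [6.0, 0.5, 0.2],   # one epitope clearly dominates
    "CASSPDRGNTEAFF": [1.0, 0.9, 1.1],  # ambiguous, so it is rejected
}
kept = open_set_filter(example_logits, threshold=0.9)
```

Only the first sequence survives the filter here, because its top probability is far above the threshold while the second spreads its probability almost evenly across the three epitopes.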

## Abstract

Correctly identifying epitope-binding T-cell receptors (TCRs) is important both for understanding their underlying biological mechanisms in association with a phenotype and for developing T-cell-mediated immunotherapies accordingly. Although the importance of the CDR3 region of TCRs for epitope recognition is well recognized, methods for profiling TCR–epitope interactions in association with a specific disease or phenotype remain less studied. We developed EpicPred to identify phenotype-specific TCR–epitope interactions. EpicPred first uses open-set recognition (OSR) to predict and remove unlikely TCR–epitope interactions, reducing false positives. It then uses multiple instance learning to identify TCR–epitope interactions specific to a cancer type or to severity levels in patients with COVID-19.
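
To make the multiple instance learning step concrete, here is a minimal sketch of attention-based pooling in its commonly used form: each TCR in a sample (the "bag") gets a score `w · tanh(V h)`, the scores are softmax-normalized into attention weights, and the bag representation is the weighted sum of instance features. This is a generic illustration, not EpicPred's actual architecture; the feature vectors and the parameters `V` and `w` are made-up values that would normally be learned.

```python
import math

def attention_pool(instances, V, w):
    """Attention-based MIL pooling over one bag.

    instances: list of feature vectors, one per TCR in the sample.
    V: list of rows (hidden_dim x feature_dim); w: vector of length hidden_dim.
    Returns (bag_vector, attention_weights).
    """
    scores = []
    for h in instances:
        # Hidden projection tanh(V h), then a scalar score w . tanh(V h).
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wi * ti for wi, ti in zip(w, hidden)))

    # Softmax over instances so the weights sum to 1 within the bag.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # The bag representation is the attention-weighted sum of instances.
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return bag, weights

# Toy bag of three "TCR" feature vectors with hypothetical parameters.
bag_instances = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
w = [2.0, 0.0]
bag_vector, attn = attention_pool(bag_instances, V, w)
```

The attention weights are what make this kind of model interpretable: instances with large weights are the ones that drive the sample-level phenotype prediction. In a real model a classifier would be applied to `bag_vector`, and `V` and `w` would be trained jointly with it.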

From six public TCR databases, 244 552 TCR sequences and 105 unique epitopes were used to predict epitope-binding TCRs and to filter out non-epitope-binding TCRs using the OSR method. The predicted interactions were used to further predict the phenotype groups in two cancer and four COVID-19 TCR-seq datasets of both bulk and single-cell resolution. EpicPred outperformed the competing methods in predicting the phenotypes, achieving an average AUROC of 0.80 ± 0.07.
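
For readers unfamiliar with the metric, the sketch below shows one way an "average AUROC ± spread" figure can be computed: AUROC per dataset via its rank interpretation (the probability that a positive sample scores higher than a negative one, with ties counted as half), then the mean and sample standard deviation across datasets. It assumes the ± denotes a standard deviation, which the summary does not state. The labels and scores are toy values, not results from the paper.

```python
from statistics import mean, stdev

def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Two toy "datasets" of (labels, predicted scores); purely illustrative.
toy_datasets = [
    ([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]),
    ([1, 0, 1, 0], [0.8, 0.3, 0.6, 0.2]),
]
per_dataset = [auroc(y, s) for y, s in toy_datasets]
summary = (mean(per_dataset), stdev(per_dataset))
```

The pairwise formulation is quadratic in the number of samples, which is fine for a sketch; a library routine such as scikit-learn's `roc_auc_score` is the usual choice for real evaluations.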

The EpicPred software is available at https://github.com/jaeminjj/EpicPred.

## Linked entities

- **Diseases:** cancer (MONDO:0004992), COVID-19 (MONDO:0100096)

## Full-text entities

- **Genes:** TRBV20OR9-2 (T cell receptor beta variable 20/OR9-2 (non-functional)) [NCBI Gene 6962] {aka CDR3, TCRBV20S2, TCRBV2O, TCRBV2S2O}
- **Diseases:** COVID-19 (MESH:D000086382), cancer (MESH:D009369)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11879650/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11879650/full.md

## References

39 references — full list in the complete paper: https://tomesphere.com/paper/PMC11879650/full.md

---
Source: https://tomesphere.com/paper/PMC11879650