# CNNeoPP: a large language model-enhanced deep learning pipeline for personalized neoantigen prediction and liquid biopsy applications

**Authors:** Yu Cai, Rui Chen, Mingming Song, Lei Wang, Zirong Huo, Dongyan Yang, Sitong Zhang, Shenghan Gao, Seungyong Hwang, Ling Bai, Yonggang Lv, Yali Cui, Xi Zhang

PMC · DOI: 10.3389/fimmu.2026.1722117 · Frontiers in Immunology · 2026-02-04

## TL;DR

This paper introduces CNNeoPP, a deep learning pipeline enhanced by large language models to improve neoantigen prediction for cancer immunotherapy and non-invasive tumor monitoring.

## Contribution

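The paper contributes CNNeo, a deep learning model that combines large language model-derived sequence representations with multi-modal feature integration, and CNNeoPP, an integrated pipeline built around it for neoantigen discovery, including a proof-of-concept application to plasma cell-free DNA.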

## Key findings

- CNNeo shows superior predictive performance compared to existing tools; CNNeoPP was validated on independent datasets, including the TESLA dataset, and experimentally with ELISpot T-cell assays.
- A proof-of-concept study indicates that plasma cell-free DNA is feasible for non-invasive neoantigen prediction; detectability increases with sequencing depth and is further amplified by the prioritization strategy of CNNeoPP.
- CNNeoDB was created as a public database compiling neoantigen data from multiple sources.
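
The sequencing-depth finding above can be illustrated with a simple read-sampling model. The sketch below is not the authors' method: it assumes that variant-supporting reads at a site follow a binomial distribution with the given depth and variant allele fraction, and that a variant counts as detected once a minimum number of supporting reads is seen. The function name `detection_probability`, the threshold `min_alt_reads`, and the example allele fraction are all hypothetical choices made for illustration.

```python
from math import comb


def detection_probability(depth: int, allele_fraction: float, min_alt_reads: int = 3) -> float:
    """Probability of observing at least `min_alt_reads` variant reads.

    Assumption (not from the paper): the number of variant-supporting reads
    at a site is Binomial(depth, allele_fraction).
    """
    if depth < 0 or min_alt_reads < 0:
        raise ValueError("depth and min_alt_reads must be non-negative")
    if not 0.0 <= allele_fraction <= 1.0:
        raise ValueError("allele_fraction must be in [0, 1]")

    # Sum P(X = k) for every k below the detection threshold. The upper
    # bound is capped at depth + 1 so the exponent depth - k stays >= 0.
    p_below_threshold = sum(
        comb(depth, k) * allele_fraction**k * (1.0 - allele_fraction) ** (depth - k)
        for k in range(min(min_alt_reads, depth + 1))
    )
    return 1.0 - p_below_threshold


if __name__ == "__main__":
    # Hypothetical low allele fraction, as might occur in plasma cfDNA.
    for depth in (100, 500, 2000):
        print(depth, round(detection_probability(depth, allele_fraction=0.005), 3))
```

Under these assumptions, the detection probability rises with depth for a fixed allele fraction, which is consistent with the qualitative trend reported in the paper. Real cfDNA analyses also have to account for sequencing error, duplicate reads, and variant-caller behavior, none of which this sketch models.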

## Abstract

Neoantigens have emerged as promising targets for personalized cancer immunotherapy. However, accurate identification of immunogenic neoantigens remains a challenge due to limitations in existing predictive models. Here, we present CNNeo, a novel deep learning-based neoantigen prediction model, and CNNeoPP, an integrated computational pipeline for neoantigen discovery. CNNeo employs large language model-derived sequence representations and multi-modal feature integration, demonstrating superior predictive performance compared to existing tools. CNNeoPP was rigorously validated using independent datasets, including the TESLA dataset, and experimental validation via ELISpot T-cell assays. Additionally, we conducted a proof-of-concept study utilizing plasma cell-free DNA to explore the feasibility of non-invasive neoantigen prediction. We found that increased sequencing depth enhances neoantigen detectability, an effect further amplified by the prioritization strategy of CNNeoPP. CNNeoDB, a publicly accessible database compiling neoantigen data from multiple sources, was also developed. This study establishes robust tools for neoantigen prediction, with implications for optimizing cancer immunotherapy and liquid biopsy-based tumor monitoring. CNNeoPP is available at https://github.com/AaronChen007/neoantigen.

## Linked entities

- **Diseases:** cancer (MONDO:0004992)

## Full-text entities

- **Genes:** IFNG (interferon gamma) [NCBI Gene 3458] {aka IFG, IFI, IMD69}, NXF1 (nuclear RNA export factor 1) [NCBI Gene 10482] {aka MEX67, TAP}, HLA-A (major histocompatibility complex, class I, A) [NCBI Gene 3105] {aka HLAA}, IFNG (interferon gamma) [NCBI Gene 403801] {aka IFN-G, IFN-gamma}, TRBV20OR9-2 (T cell receptor beta variable 20/OR9-2 (non-functional)) [NCBI Gene 6962] {aka CDR3, TCRBV20S2, TCRBV2O, TCRBV2S2O}, HLA-C (major histocompatibility complex, class I, C) [NCBI Gene 3107] {aka D6S204, HLA-JY3, HLAC, HLC-C, MHC, PSORS1}
- **Diseases:** BCa (MESH:D001943), MS (MESH:D009103), melanoma (MESH:D008545), A549 lung cancer (MESH:D008175), Cancer (MESH:D009369)
- **Chemicals:** CO2 (MESH:D002245), DMSO (MESH:D004121), AMPure (-), Aspartic acid (MESH:D001224), Amino acids (MESH:D000596), water (MESH:D014867), biotin (MESH:D001710), glutamic acid (MESH:D018698), paraffin (MESH:D010232), 3,3',5,5'-tetramethylbenzidine (MESH:C021758)
- **Species:** Cytomegalovirus (genus) [taxon 10358], human gammaherpesvirus 4 (Epstein Barr virus, no rank) [taxon 10376], Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** A549 — Homo sapiens (Human), Lung adenocarcinoma, Cancer cell line (CVCL_0023)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12913462/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12913462/full.md

## References

46 references — full list in the complete paper: https://tomesphere.com/paper/PMC12913462/full.md

---
Source: https://tomesphere.com/paper/PMC12913462