# IFNg_DeepKG: A Novel Model for Identifying Interferon-Gamma-Inducing Epitopes Using Knowledge Graph RAG in Biomedical Applications

**Authors:** Van The Le, Juan Peter Timothy Yuune, Yu-Yen Ou

PMC · DOI: 10.1021/acs.jcim.5c02248 · Journal of Chemical Information and Modeling · 2025-12-31

## TL;DR

IFNg_DeepKG is a deep learning model that improves the identification of interferon-gamma-inducing epitopes (immune-activating peptide fragments) by combining a protein language model with biological knowledge drawn from a knowledge graph.

## Contribution

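The model integrates a custom knowledge graph, queried with a Retrieval-Augmented Generation (RAG) approach, with the ESM2 protein language model. This enriches sequence embeddings with biological context such as an epitope's protein of origin, host, and disease association, which sequence-only models typically ignore.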
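The summary does not describe how the knowledge graph is encoded or how retrieved facts reach the network. As a minimal sketch under that uncertainty, the code below assumes a toy graph of (epitope, relation, value) triples, retrieves the facts attached to a query peptide, turns them into a multi-hot context vector, and concatenates that vector to a stand-in sequence embedding. The peptide IDs, relations, and the helpers `retrieve`, `context_vector`, and `fuse` are hypothetical; the paper's RAG pipeline is likely richer than simple multi-hot encoding.

```python
from typing import List, Tuple

# Hypothetical toy knowledge graph of (head, relation, tail) triples.
Triple = Tuple[str, str, str]
KG: List[Triple] = [
    ("PEP_A", "derived_from", "Spike glycoprotein"),
    ("PEP_A", "host", "Homo sapiens"),
    ("PEP_A", "associated_with", "COVID-19"),
    ("PEP_B", "host", "Mus musculus"),
]

# Hypothetical fixed vocabulary of (relation, tail) facts, sorted for a stable order.
VOCAB: List[Tuple[str, str]] = sorted({(r, t) for _, r, t in KG})


def retrieve(peptide_id: str, kg: List[Triple]) -> List[Tuple[str, str]]:
    """Retrieve every (relation, tail) fact attached to the query peptide node."""
    return [(r, t) for h, r, t in kg if h == peptide_id]


def context_vector(facts: List[Tuple[str, str]], vocab: List[Tuple[str, str]]) -> List[float]:
    """Encode retrieved facts as a multi-hot vector over the vocabulary."""
    present = set(facts)
    return [1.0 if item in present else 0.0 for item in vocab]


def fuse(seq_embedding: List[float], ctx: List[float]) -> List[float]:
    """Enrich the sequence embedding by concatenating the knowledge context."""
    return seq_embedding + ctx


# Stand-in for a pooled ESM2 embedding (real ESM2 embeddings are much larger).
seq_emb = [0.1, -0.2, 0.3]
fused = fuse(seq_emb, context_vector(retrieve("PEP_A", KG), VOCAB))
# fused == [0.1, -0.2, 0.3, 1.0, 1.0, 1.0, 0.0]
```

A peptide with no graph entry simply receives an all-zero context vector, so the downstream classifier still sees a fixed-length input.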

## Key findings

- On the independent test datasets H_IFNgInd1 (human) and M_IFNgInd1 (mouse), IFNg_DeepKG achieves AUCs of 0.99 and 0.95, respectively, outperforming baseline models.
- On the more challenging independent datasets H_IFNgInd2 and M_IFNgInd2, the model achieves AUCs of 0.94 and 0.93, indicating strong cross-species generalization.
- It identifies and classifies clinically relevant epitopes, including those associated with COVID-19 and Alzheimer’s disease.
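The findings above are reported as ROC AUC. For readers unfamiliar with the metric, this sketch computes it directly as the probability that a randomly chosen positive (IFN-gamma-inducing) peptide receives a higher score than a randomly chosen negative one, with ties counted as half. The function name and toy scores are illustrative only; the pairwise loop is O(P·N) and suits small examples, whereas larger evaluations would use a sorting-based method.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])` returns `0.75`: three of the four positive/negative pairs are ranked correctly. An AUC of 0.99 therefore means almost every positive epitope is scored above almost every negative one.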

## Abstract

The accurate and efficient computational identification of interferon-gamma-inducing
epitopes (IFNgIE) is a critical bottleneck in the design of next-generation
vaccines and immunotherapies. Existing computational models, while
adept at learning sequence-based patterns, frequently fail to incorporate
the rich biological context that governs an epitope’s immunogenicity,
such as its protein of origin, host, and disease association. To address
this limitation, we propose IFNg_DeepKG, a new deep learning framework
that synergistically integrates a pretrained protein language model
(ESM2), a custom knowledge graph (KG) using a Retrieval-Augmented
Generation (RAG) approach, and a multiscale convolutional neural network
(MSCNN). The model’s central innovation lies in its use of
the RAG-KG to enrich sequence embeddings with external, biologically
informed context, thereby significantly enhancing predictive performance.
IFNg_DeepKG demonstrates superior performance on independent test
data sets, achieving an AUC of 0.99 on the Human H_IFNgInd1 data set
and 0.95 on the Mouse M_IFNgInd1 data set, a substantial increase
over baseline models. On the more challenging independent data sets,
the model demonstrates strong cross-species generalization, achieving
AUCs of 0.94 (H_IFNgInd2) and 0.93 (M_IFNgInd2). The framework successfully
identifies and classifies clinically relevant epitopes, including
those associated with COVID-19 and Alzheimer’s disease. By
bridging the gap between sequence-based features and biological contexts,
IFNg_DeepKG represents a significant advancement in computational
immunology, offering a scalable and powerful platform for rational
epitope discovery and precision medicine.
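To illustrate the multiscale CNN (MSCNN) stage, here is a minimal pure-Python sketch assuming per-residue embeddings (such as those ESM2 produces) arranged as a (length, channels) matrix. Each branch convolves the sequence with a filter of a different width, applies ReLU, and global-max-pools the result; the branch outputs are concatenated into one feature vector. The kernel widths, the single filter per branch, and the function names are assumptions for illustration, not the authors' configuration.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # shape: (sequence length, channels)


def conv1d_relu(x: Matrix, kernel: Matrix, bias: float = 0.0) -> List[float]:
    """Valid 1D convolution of an (L, C) sequence with one (k, C) filter, then ReLU."""
    k = len(kernel)
    out: List[float] = []
    for start in range(len(x) - k + 1):
        acc = bias
        for offset in range(k):
            for c, w in enumerate(kernel[offset]):
                acc += w * x[start + offset][c]
        out.append(max(0.0, acc))
    return out


def multiscale_features(x: Matrix, kernels: Sequence[Matrix]) -> List[float]:
    """Run one branch per kernel width, global-max-pool each, and concatenate."""
    feats: List[float] = []
    for kernel in kernels:
        activations = conv1d_relu(x, kernel)
        # A sequence shorter than the kernel yields no windows; emit 0.0 for that branch.
        feats.append(max(activations) if activations else 0.0)
    return feats


# Toy 4-residue sequence with a single embedding channel.
x = [[1.0], [-2.0], [3.0], [0.5]]
# Hypothetical all-ones filters of widths 1, 2 and 3.
kernels = [[[1.0]], [[1.0], [1.0]], [[1.0], [1.0], [1.0]]]
features = multiscale_features(x, kernels)
# features == [3.0, 3.5, 2.0]
```

Wider kernels respond to longer sequence motifs, which is the usual motivation for running several widths in parallel over short peptides; a trained model would learn many filters per width rather than fixed all-ones weights.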

## Linked entities

- **Diseases:** COVID-19 (MONDO:0100096), Alzheimer’s disease (MONDO:0004975)
- **Species:** Homo sapiens (taxon 9606), Mus musculus (taxon 10090)

## Full-text entities

- **Genes:** IFNG (interferon gamma) [NCBI Gene 3458] {aka IFG, IFI, IMD69}
- **Diseases:** Alzheimer's disease (MESH:D000544), COVID-19 (MESH:D000086382)
- **Species:** Homo sapiens (human, species) [taxon 9606], Mus musculus (house mouse, species) [taxon 10090]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12801304/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12801304/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/PMC12801304/full.md

---
Source: https://tomesphere.com/paper/PMC12801304