# NeSyDPP-4: discovering DPP-4 inhibitors for diabetes treatment with a neuro-symbolic AI approach

**Authors:** Delower Hossain, Ehsan Saghapour, Jake Y. Chen

PMC · DOI: 10.3389/fbinf.2025.1603133 · Frontiers in Bioinformatics · 2025-07-21

## TL;DR

This paper introduces NeSyDPP-4, a neuro-symbolic QSAR model built on a Logic Tensor Network (LTN) that classifies small molecules as DPP-4 inhibitors for diabetes treatment and outperforms DNN and Transformer baselines.

## Contribution

The novel contribution is the use of a neuro-symbolic approach, the Logic Tensor Network (LTN), for DPP-4 inhibitor discovery, achieving better performance than DNN and Transformer baselines.

## Key findings

- NeSyDPP-4 achieved an accuracy of 0.9725 and outperformed DNN and Transformer baseline models.
- The model demonstrated strong generalizability with an accuracy of 0.9579 on the external DTC dataset.
- The neuro-symbolic approach offers a cost-effective alternative to traditional in vivo screening for drug discovery.

## Abstract

Diabetes Mellitus (DM) constitutes a global epidemic and is one of the top ten leading causes of mortality (WHO, 2019), projected to rank seventh by 2030. The US National Diabetes Statistics Report (2021) states that 38.4 million Americans have diabetes. Dipeptidyl Peptidase-4 (DPP-4) is an FDA-approved target for the treatment of type 2 diabetes mellitus (T2DM). However, current DPP-4 inhibitors may cause adverse effects, including gastrointestinal issues, severe joint pain (FDA safety warning), nasopharyngitis, hypersensitivity, and nausea. Moreover, the development of novel drugs and the in vivo assessment of DPP-4 inhibition are both costly and often impractical. These challenges highlight the urgent need for efficient in-silico approaches to facilitate the discovery and optimization of safer and more effective DPP-4 inhibitors.

Quantitative Structure-Activity Relationship (QSAR) modeling is a widely used computational approach for evaluating the properties of chemical substances. In this study, we employed a Neuro-symbolic (NeSy) approach, specifically the Logic Tensor Network (LTN), to develop a DPP-4 QSAR model capable of identifying potential small-molecule inhibitors and predicting bioactivity classification. For comparison, we also implemented baseline models using Deep Neural Networks (DNNs) and Transformers. A total of 6,563 bioactivity records (SMILES-based compounds with IC50 values) were collected from ChEMBL, PubChem, BindingDB, and the Guide to Pharmacology (GTP). Feature sets used for model training included descriptors (CDK Extended–PaDEL), fingerprints (Morgan), chemical language model embeddings (ChemBERTa-2), Llama 3.2 embedding features, and physicochemical properties.
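To make the LTN idea concrete, the sketch below shows how a binary "active vs. inactive" classifier can be trained by maximizing the satisfaction of two fuzzy-logic axioms: every known active satisfies `Active(x)`, and every known inactive satisfies `not Active(x)`. This follows the generic LTN binary-classification formulation and is not taken from the paper; the logistic predicate, the p-mean-error aggregator with `p=2`, the two-dimensional toy features, and all function names are assumptions made for illustration.

```python
import math

def predicate_active(features, weights, bias):
    """Toy grounding of Active(x): a logistic unit returning a truth value in [0, 1]."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def fuzzy_not(truth):
    """Standard fuzzy negation."""
    return 1.0 - truth

def forall(truths, p=2):
    """Smooth universal quantifier: 1 minus the generalized p-mean of the errors."""
    errors = [(1.0 - t) ** p for t in truths]
    return 1.0 - (sum(errors) / len(errors)) ** (1.0 / p)

def satisfaction(actives, inactives, weights, bias, p=2):
    """Aggregate truth of: forall active x, Active(x); forall inactive x, not Active(x)."""
    axiom_pos = forall([predicate_active(x, weights, bias) for x in actives], p)
    axiom_neg = forall(
        [fuzzy_not(predicate_active(x, weights, bias)) for x in inactives], p
    )
    return forall([axiom_pos, axiom_neg], p)

# Hypothetical 2-D feature vectors standing in for real molecular features.
actives = [[1.0, 0.0], [0.9, 0.2]]
inactives = [[0.0, 1.0], [0.1, 0.8]]

sat_untrained = satisfaction(actives, inactives, weights=[0.0, 0.0], bias=0.0)
sat_good = satisfaction(actives, inactives, weights=[4.0, -4.0], bias=0.0)
sat_bad = satisfaction(actives, inactives, weights=[-4.0, 4.0], bias=0.0)

# Training would minimize this loss with respect to the predicate's parameters.
loss_good = 1.0 - sat_good
```

In a real model the predicate would be a neural network over fingerprint or descriptor vectors, and `1 - satisfaction` would be minimized by gradient descent; the toy weights here only show that parameters separating the two classes yield higher satisfaction.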

Among all tested configurations, the neuro-symbolic QSAR model (NeSyDPP-4) performed best using a combination of CDK Extended and Morgan fingerprints. The model achieved an accuracy of 0.9725, an F1-score of 0.9723, an ROC AUC of 0.9719, and a Matthews correlation coefficient (MCC) of 0.9446. These results outperformed the baseline DNN and Transformer models, as well as existing state-of-the-art (SOTA) methods. To further validate the robustness of the model, we conducted an external evaluation using the Drug Target Commons (DTC) dataset, where NeSyDPP-4 also demonstrated strong performance, with an accuracy of 0.9579, an F1-score of 0.9577, an ROC AUC of 0.9565, and an MCC of 0.9171.
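For readers unfamiliar with the four reported metrics, the following minimal sketch shows how each is computed for a binary classifier. The confusion-matrix counts, labels, and scores are hypothetical and are not the paper's data; the function names are illustrative only. Accuracy, F1, and MCC depend only on the confusion matrix, whereas ROC AUC needs the ranked prediction scores.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Accuracy, F1-score, and MCC from binary confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}

def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical counts for 100 test compounds.
metrics = binary_metrics(tp=48, fp=2, tn=47, fn=3)

# Hypothetical labels (1 = active) and predicted scores.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])
```

MCC uses all four cells of the confusion matrix, so it stays informative even when the active and inactive classes are imbalanced.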

These findings suggest that the NeSyDPP-4 model not only delivered high predictive performance but also demonstrated generalizability to external datasets. This approach presents a cost-effective and reliable alternative to traditional in vivo screening, offering valuable support for the identification and classification of biologically active DPP-4 inhibitors in the treatment of type 2 diabetes mellitus (T2DM).

## Linked entities

- **Proteins:** DPP4 (dipeptidyl peptidase 4)
- **Diseases:** Diabetes Mellitus (MONDO:0005015), type 2 diabetes mellitus (MONDO:0005148), T2DM (MONDO:0005148)

## Full-text entities

- **Diseases:** T2DM (MESH:D003924), nausea (MESH:D009325), DM (MESH:D003920), nasopharyngitis (MESH:D009304), joint pain (MESH:D018771), hypersensitivity (MESH:D004342)
- **Chemicals:** NeSyDPP-4 (-), GTP (MESH:D006160)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12319772/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12319772/full.md

## References

51 references — full list in the complete paper: https://tomesphere.com/paper/PMC12319772/full.md

---
Source: https://tomesphere.com/paper/PMC12319772