# Intelligent in-silico prioritization of antimalarial peptide candidates under explicit physicochemical windows via de novo CTCM-Neo generation and conformal-gated calibrated classification

**Authors:** Muhammad Aamir, Khosro Rezaee, Maryam Saberi Anari

PMC · DOI: 10.3389/fcimb.2026.1707267 · Frontiers in Cellular and Infection Microbiology · 2026-03-04

## TL;DR

This paper introduces a computational framework to efficiently generate and prioritize antimalarial peptides using calibrated predictions and explicit physicochemical constraints.

## Contribution

A novel generate-then-classify framework with calibrated risk-aware decision rules for de novo antimalarial peptide design under explicit constraints.

## Key findings

- On a held-out evaluation drawn from the initial 322-sequence corpus, the framework achieves AUROC ≈0.93, AUPRC ≈0.80, and ECE ≈0.03.
- Two independent runs on 210 previously unseen peptides reach 92.86% and 93.33% accuracy with balanced precision and recall.
- Hyperparameter sweeps reveal stable optima, supporting reproducibility and robustness.

## Abstract

Malaria remains a major global health burden and motivates fast, reliable in silico prioritization of antimalarial (AM) peptide candidates. Designing such peptides is challenging due to the vast search space, scarce or noisy supervision, and potential out-of-distribution miscalibration of computational scores. Prior pipelines typically rank existing sequences rather than generate new candidates under explicit design constraints with calibrated, risk-aware decision rules.

We propose a constraint-guided generate-then-classify framework. A low-data generator, an optimized variant of CTCM-Neo, proposes de novo sequences within APD3-derived windows for net charge, GRAVY, and Boman index. A frozen, temperature-scaled protein language-model classifier (ConformaX-PEP) outputs calibrated probabilities for predicted antimalarial activity and hemolysis, and a split-conformal gate with risk level α = 0.1 converts these scores into accept/reject decisions at fixed operating thresholds p_act ≥ 0.78 and p_hemo ≤ 0.20.
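To make the idea of a physicochemical window concrete, the following is a minimal sketch of a window check on net charge and GRAVY. It is not the authors' implementation. The `Window` class, the function names, and the numeric bounds in the usage note are hypothetical; the summary does not give the actual APD3-derived bounds. Net charge is approximated here as (K + R) − (D + E), ignoring histidine and the termini, and the Boman index is left out because it requires a separate side-chain transfer free-energy scale.

```python
from dataclasses import dataclass

# Kyte-Doolittle hydropathy values; GRAVY is their mean over the sequence.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


@dataclass(frozen=True)
class Window:
    """Closed interval [low, high] for one physicochemical property (hypothetical helper)."""
    low: float
    high: float

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


def _validate(seq: str) -> None:
    if not seq or any(residue not in KYTE_DOOLITTLE for residue in seq):
        raise ValueError("sequence must be non-empty and use the 20 standard residues")


def gravy(seq: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    _validate(seq)
    return sum(KYTE_DOOLITTLE[residue] for residue in seq) / len(seq)


def net_charge(seq: str) -> int:
    """Simplified integer charge: (K + R) - (D + E). Assumption: His and termini ignored."""
    _validate(seq)
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    return positive - negative


def in_windows(seq: str, charge_window: Window, gravy_window: Window) -> bool:
    """True only if every checked property falls inside its window."""
    return charge_window.contains(net_charge(seq)) and gravy_window.contains(gravy(seq))
```

With placeholder windows such as `Window(2, 6)` for charge and `Window(-1.0, 0.5)` for GRAVY, the call `in_windows("KKLLKKLL", Window(2, 6), Window(-1.0, 0.5))` accepts the sequence (charge 4, GRAVY about −0.05), while `"DDDD"` is rejected on charge alone. A generator constrained this way only emits sequences for which the check passes.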

On the initial 322-sequence corpus (52 AM, 200 unlabeled, 70 positive-like), a held-out evaluation achieves AUROC ≈0.93, AUPRC ≈0.80, and ECE ≈0.03, indicating strong discrimination with low calibration error prior to external testing. The method outperforms strong baselines in convergence speed and reliability. On 210 previously unseen peptides (80 AM, 130 NM), two independent runs achieve 92.86% and 93.33% accuracy with balanced precision and recall and good calibration. Hyperparameter sweeps reveal broad, stable optima, supporting reproducibility. Template-based docking with GalaxyPepDock is used strictly as a hypothesis-generating structural sanity check and does not constitute evidence of biological binding or efficacy.
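For readers unfamiliar with the calibration metric, here is a minimal sketch of expected calibration error (ECE) with equal-width bins. It assumes the common binary formulation that compares the mean predicted positive-class probability in each bin with the observed positive fraction. The bin count and the exact ECE variant used in the paper are not stated in this summary, so `n_bins=10` is an assumption. The helper `accuracy_percent` only illustrates the arithmetic behind the reported accuracies.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-bin ECE for binary probabilities.

    probs  : predicted probabilities of the positive class, each in [0, 1]
    labels : ground-truth labels, each 0 or 1
    n_bins : number of equal-width bins (assumed value, not from the paper)
    """
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and equal in length")

    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # p == 1.0 would index one past the end, so clamp to the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        positive_rate = sum(y for _, y in members) / len(members)
        # Weight each bin's calibration gap by the share of samples it holds.
        ece += (len(members) / total) * abs(mean_prob - positive_rate)
    return ece


def accuracy_percent(n_correct, n_total):
    """Accuracy as a percentage rounded to two decimals."""
    return round(100.0 * n_correct / n_total, 2)
```

An ECE near 0.03 means that, averaged over bins, predicted probabilities differ from observed frequencies by about three percentage points. As a consistency check on the external set, 195 and 196 correct predictions out of 210 give 92.86% and 93.33% respectively. These counts are inferred from the reported percentages and are not stated in the summary.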

Overall, the framework compresses the search space into a small, risk-bounded set of computationally prioritized candidates and provides a scalable, uncertainty-aware route for downstream experimental follow-up. All results reported here are computational, and antimalarial activity remains to be confirmed experimentally.
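The risk-bounded selection described in the abstract can be sketched as temperature scaling followed by a split-conformal gate. This is an illustration under stated assumptions, not the ConformaX-PEP implementation. The nonconformity score (one minus the probability of the true label), the rule that a candidate passes only when its conformal prediction set is exactly {active}, and all function names are hypothetical choices. Only α = 0.1 and the thresholds 0.78 and 0.20 come from the source.

```python
import math


def temperature_scale(logit, temperature):
    """Calibrated probability from a raw logit: sigmoid(logit / T)."""
    return 1.0 / (1.0 + math.exp(-logit / temperature))


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal quantile of nonconformity scores on a calibration set.

    The score is 1 - p(true label): (1 - p) for label 1 and p for label 0,
    where p is the predicted probability of the positive class.
    """
    scores = sorted((1.0 - p) if y == 1 else p for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    rank = math.ceil((n + 1) * (1.0 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: keep every label.
        return 1.0
    return scores[rank - 1]


def prediction_set(p_pos, qhat):
    """Labels whose nonconformity score does not exceed the threshold."""
    labels = set()
    if 1.0 - p_pos <= qhat:
        labels.add(1)
    if p_pos <= qhat:
        labels.add(0)
    return labels


def accept(p_act, p_hemo, qhat, act_min=0.78, hemo_max=0.20):
    """Assumed gate: unambiguous 'active' set plus both fixed operating thresholds."""
    confident_active = prediction_set(p_act, qhat) == {1}
    return confident_active and p_act >= act_min and p_hemo <= hemo_max
```

Under exchangeability of calibration and test data, the prediction sets built this way contain the true label with probability at least 1 − α, which is where the risk bound comes from. The fixed thresholds then act as an additional, more conservative filter on top of that guarantee.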

## Linked entities

- **Diseases:** malaria (MONDO:0005136)

## Full-text entities

- **Diseases:** Malaria (MESH:D008288), hemolysis (MESH:D006461)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12996230/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12996230/full.md

## References

44 references — full list in the complete paper: https://tomesphere.com/paper/PMC12996230/full.md

---
Source: https://tomesphere.com/paper/PMC12996230