# Cysteine pattern barcoding-based dataset filtration enhances the machine learning-assisted interpretation of Conus venom peptide therapeutics

**Authors:** Rimsha Bibi, Noshaba Qasmi, Sajid Rashid

PMC · DOI: 10.1371/journal.pone.0327578 · 2025-07-11

## TL;DR

This study filters a dataset of cone snail venom peptides using cysteine pattern barcodes, then trains a Random Forest model to predict which peptides have therapeutic potential.

## Contribution

A novel dataset filtration method using cysteine pattern barcoding improves machine learning predictions of venom peptide therapeutic potential.

## Key findings

- Cysteine pattern barcodes were generated for 5,985 cone snail peptides across 82 species.
- A Random Forest model, evaluated on a 70:30 train-test split, achieved 90.48% accuracy in classifying peptides by therapeutic potential.
- Structural and binding pattern analysis against known targets revealed significant similarities between the binding patterns of approved and novel peptides.
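The barcoding idea in the first finding can be illustrated with a minimal sketch. It assumes a common convention for writing cysteine patterns, in which adjacent cysteines are kept together and separated cysteines are joined with a hyphen (for example `CC-C-C`). The function names `cys_pattern` and `species_barcodes`, the species labels, and the toy sequences are all hypothetical; the paper's actual barcode definition may differ.

```python
import re
from collections import defaultdict


def cys_pattern(sequence: str) -> str:
    """Collapse a peptide sequence into its cysteine pattern.

    Runs of adjacent cysteines are kept together ("CC"); runs separated by
    one or more other residues are joined with "-". A sequence without
    cysteines yields an empty string.
    """
    runs = re.findall(r"C+", sequence.upper())
    return "-".join(runs)


def species_barcodes(records):
    """Map each species to the sorted tuple of Cys patterns seen in its peptides.

    `records` is an iterable of (species, sequence) pairs. This "barcode" is
    an assumed, simplified stand-in for the species-level barcodes in the paper.
    """
    patterns = defaultdict(set)
    for species, sequence in records:
        pattern = cys_pattern(sequence)
        if pattern:  # skip cysteine-free sequences
            patterns[species].add(pattern)
    return {sp: tuple(sorted(ps)) for sp, ps in patterns.items()}


# Toy, made-up sequences (not real conotoxins).
toy_records = [
    ("species_a", "ACCGGCAAC"),
    ("species_a", "CAACAACCAACAAC"),
    ("species_b", "ACCGGCAAC"),
    ("species_b", "GGG"),
]
barcodes = species_barcodes(toy_records)
```

Sequences whose pattern does not appear in the barcode of their species could then be flagged or dropped before model training, which is one plausible reading of barcode-based filtration.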

## Abstract

Crude cone snail venom is a rich source of bioactive compounds with significant therapeutic potential. In this study, we conducted a comprehensive analysis of 5,985 cone snail peptides across 82 Conus species to identify unique cysteine (Cys) patterns and associated frameworks. The classification of these Cys patterns, based on conserved framework combinations, enabled the generation of species-level pattern barcodes. These barcodes were then evaluated to assess the species correlations of individual sequences. By analyzing 151 known Conus peptide PDB files, we computed Cys disulfide linkages to assess overall stability profiles. Incorporating barcode data allowed us to filter the dataset and prepare it for machine learning (ML) processing. Random Forest (RF) modeling, a supervised learning technique, was used to predict the therapeutic potential of venom peptides. Feature extraction was based on known venom-derived approved peptide-based drugs. The dataset was split in a 70:30 train-test ratio. A total of 6,430 peptides (5,985 from cone snails and 445 from other venomous species) were used to evaluate model prediction capability. The proposed model achieved an accuracy of 90.48% in peptide therapeutic classification. Subsequent model outputs underwent further structural and binding pattern analysis against known targets, revealing significant similarities between the binding patterns of approved and novel peptides. The model’s performance could be further enhanced by incorporating additional datasets and optimizing feature selection, potentially broadening its applicability to larger peptide datasets. Overall, this study underscores the potential of ML in advancing pharmacological research on diverse venom peptides.
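The evaluation protocol described in the abstract (a 70:30 train-test split scored by classification accuracy) can be sketched with the standard library alone. This is only an illustration of the bookkeeping: the helper names `split_70_30` and `accuracy`, the fixed seed, and the toy labels are assumptions, and the Random Forest itself is not reproduced here. In practice a library such as scikit-learn (`train_test_split`, `RandomForestClassifier`) would typically fill that role.

```python
import random


def split_70_30(items, seed=0):
    """Shuffle items reproducibly and split them 70:30 into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy example: ten placeholder peptide IDs, then a made-up prediction check.
train_ids, test_ids = split_70_30(range(10))
toy_accuracy = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
```

Accuracy alone can be misleading when therapeutic and non-therapeutic classes are imbalanced, so a per-class breakdown is a reasonable companion metric in a setup like this.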

## Linked entities

- **Species:** Conus (taxon 6490)

## Full-text entities

- **Chemicals:** disulfide (MESH:D004220), Cys (MESH:D003545)
- **Species:** Conus (taxon 2056754)

## Figures

24 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12250603/full.md

---
Source: https://tomesphere.com/paper/PMC12250603