# Prediction of bacterial protein–compound interactions with only positive samples

**Authors:** Ki-Hwa Kim, Avinash Yaganapu, Sai Kosaraju, Aashish Bhatt, Yun Lyna Luo, Sai Phani Parsa, Juyeon Park, Hyun Lee, Jun Hyuck Lee, Tae-Jin Oh, Mingon Kang

PMC · DOI: 10.1093/bioinformatics/btag067 · 2026-02-18

## TL;DR

This paper introduces BIN-PU, a Positive-Unlabeled learning method that predicts interactions between bacterial proteins and compounds from positive examples only, a setting relevant to drug discovery and biocatalysis.

## Contribution

A novel Positive-Unlabeled learning framework called BIN-PU is proposed for bacterial CPI prediction without negative samples.

## Key findings

- BIN-PU outperforms existing PU models in predicting bacterial CPIs using only positive samples.
- BIN-PU was validated on bacterial cytochrome P450 (CYP) data, and its predictions for uncurated CYP data were confirmed with biological and biophysical experiments.
- The method is reproducible and effective on uncurated and human CYP datasets.

## Abstract

Prediction of Compound–Protein Interactions (CPI) in bacteria is crucial to advance various pharmaceutical and chemical engineering fields, including biocatalysis, drug discovery, and industrial processing. However, current CPI models cannot be applied to bacterial CPI prediction due to the lack of curated negative interaction samples.

We propose a novel Positive-Unlabeled (PU) learning framework, named BIN-PU, to address this limitation. BIN-PU generates pseudo-positive and pseudo-negative labels from known positive interaction data, enabling effective training of deep learning models for CPI prediction. We also propose a weighted positive loss function that assigns weights to truly positive samples. We have validated BIN-PU coupled with multiple CPI backbone models, comparing its performance with existing PU models using bacterial cytochrome P450 (CYP) data. Extensive experiments demonstrate the superiority of BIN-PU over the benchmark models in predicting CPIs with only truly positive samples. Furthermore, we have validated BIN-PU on additional bacterial proteins obtained from a literature review, on human CYP datasets, and on uncurated data to assess its reproducibility. We have also validated the CPI predictions for the uncurated CYP data with biological and biophysical experiments. BIN-PU represents a significant advancement in CPI prediction for bacterial proteins, opening new possibilities for improving predictive models in related biological interaction tasks.

The source code and data are available at https://github.com/datax-lab/CYP.
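
The abstract mentions a weighted positive loss but does not give its form. To make the general idea concrete, the following is a minimal sketch of one way such a loss could look: a binary cross-entropy in which truly positive pairs carry a larger weight than pseudo-labeled pairs. This is an illustration under stated assumptions, not the BIN-PU loss itself; the function name `weighted_positive_bce`, the `pos_weight` parameter, and the toy values are all hypothetical.

```python
import math


def weighted_positive_bce(probs, labels, is_true_positive, pos_weight=2.0, eps=1e-7):
    """Weighted mean binary cross-entropy (illustrative assumption, not the paper's loss).

    probs            -- predicted interaction probabilities in [0, 1]
    labels           -- training labels (1 = positive, 0 = negative), possibly pseudo-labels
    is_true_positive -- True where the pair is a known (truly positive) interaction
    pos_weight       -- hypothetical weight applied to truly positive pairs
    """
    total, weight_sum = 0.0, 0.0
    for p, y, known in zip(probs, labels, is_true_positive):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        w = pos_weight if known else 1.0  # pseudo-labeled pairs keep weight 1
        total += w * loss
        weight_sum += w
    return total / weight_sum


# Toy batch: one known positive, one pseudo-negative, one pseudo-positive.
probs = [0.9, 0.2, 0.6]
labels = [1, 0, 1]
is_true_positive = [True, False, False]

plain = weighted_positive_bce(probs, labels, is_true_positive, pos_weight=1.0)
weighted = weighted_positive_bce(probs, labels, is_true_positive, pos_weight=3.0)
```

With `pos_weight=1.0` the function reduces to the ordinary mean cross-entropy. Larger values pull the loss toward the terms contributed by known positives, so errors on pseudo-labels influence training less.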

## Linked entities

- **Proteins:** CYP71B9 (cytochrome P450, family 71, subfamily B, polypeptide 9), PPIG (peptidylprolyl isomerase G)

## Full-text entities

- **Genes:** PPIG (peptidylprolyl isomerase G) [NCBI Gene 9360] {aka CARS-Cyp, CYP, SCAF10, SRCyp}, CYP4F3 (cytochrome P450 family 4 subfamily F member 3) [NCBI Gene 4051] {aka CPF3, CYP4F, CYPIVF3, LTB4H}
- **Diseases:** CPI (MESH:C563663)
- **Chemicals:** carbon (MESH:D002244), 4-androstenedione (MESH:D000735), His (MESH:D006639), nandrolone (MESH:D009277), Iodobenzene (MESH:C031905), progesterone (MESH:D011374), amino acids (MESH:D000596), HEM (MESH:D006418), BIN (-), NADH (MESH:D009243), Pdx (MESH:C418863), ATP (MESH:D000255), prednisone (MESH:D011241), steroid (MESH:D013256)
- **Species:** Paenibacillus sp. (species) [taxon 58172], Homo sapiens (human, species) [taxon 9606], Streptomyces sp. (species) [taxon 1931], Streptomyces alboniger (species) [taxon 132473], Bacteria Latreille et al. 1825 (Bacteria stick insect, genus) [taxon 629395], Dehalobacter sp. E3 (species) [taxon 307490]

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12975285/full.md

---
Source: https://tomesphere.com/paper/PMC12975285