# ProMol_Func: A Structure-Free Deep Learning Model for Virtual Screening

**Authors:** Zixuan Feng, Max Kim, Aweon Richards, Tania J. Lupoli, Yingkai Zhang

PMC · DOI: 10.1021/jacsau.6c00173 · 2026-02-24

## TL;DR

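ProMol_Func is a deep learning model that screens for drug candidates from protein sequences alone, with no protein structures required. In a zero-shot prospective test it identified inhibitors of E. coli DnaK, a target with no actives in the training set.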

## Contribution

ProMol_Func introduces a structure-free deep learning framework for virtual screening that combines graph-based encodings of small molecules with protein function embeddings derived solely from amino acid sequences.

## Key findings

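- ProMol_Func achieves an enrichment factor at 1% (EF1%) of 10.9 on the LIT-PCBA benchmark, which the authors report as state-of-the-art.
- In a zero-shot prospective application to E. coli DnaK, a chaperone with no actives in the training set, the model identified compounds that inhibit its ATPase activity or alter its thermal stability.
- Augmenting the training set with both experimentally validated inactives and randomly selected decoys improves screening power and generalization.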
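For readers unfamiliar with the EF1% metric cited above, the following is a minimal sketch of how an enrichment factor is commonly computed: the hit rate among the top-scored fraction of the library divided by the hit rate of the whole library. The function name `enrichment_factor`, its parameters, and the choice to round the cutoff up with `math.ceil` are assumptions made for illustration; the paper's own evaluation code may handle ties and cutoffs differently.

```python
import math


def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor for the top `fraction` of a ranked library.

    scores: predicted score per compound (higher means more likely active).
    labels: 1 for an experimentally confirmed active, 0 otherwise.
    Illustrative helper, not the authors' implementation.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")

    # Size of the top slice; rounding up is an assumption of this sketch.
    n_top = max(1, math.ceil(fraction * n_total))

    # Rank compounds from highest to lowest score and count actives in the slice.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])

    # Hit rate in the top slice relative to the hit rate of the whole library.
    return (hits_top / n_top) / (n_actives / n_total)
```

Under this definition, an EF1% of 10.9 would mean that the top 1% of ranked compounds contains actives at 10.9 times the rate expected from random selection.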

## Abstract

In computer-aided drug discovery, structure-based drug design models are computationally intensive and rely on protein structures, limiting their scalability and generalization. Additionally, many existing models suffer from inflated false-positive rates due to the scarcity of negative binding data for training. To overcome these challenges, we present ProMol_Func, a structure-free deep learning framework that integrates graph-based encodings of small molecules with protein function embeddings derived solely from amino acid sequences. By augmenting the training data set with both experimentally validated inactives and randomly selected decoys, ProMol_Func improves screening power and generalization. The model achieves state-of-the-art performance on the challenging LIT-PCBA (Library of Integrated Targeted-Panel of Cell-Based Assays) benchmark, with an enrichment factor (EF1%) of 10.9, demonstrating robust screening power in realistic assay settings. Furthermore, in a zero-shot prospective application to E. coli DnaK, a protein chaperone without actives in the training set, ProMol_Func successfully identified compounds that inhibit its ATPase activity or alter the protein’s thermal stability, validating the potential of ProMol_Func for discovering binders toward novel targets. These results position ProMol_Func as an efficient and scalable alternative to traditional structure-dependent approaches in early-stage hit discovery.
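The negative-data augmentation described in the abstract can be pictured with a small sketch. It is a hypothetical illustration only: the abstract does not state the sampling ratio, the compound pool, or any filtering the authors apply, so the function `build_training_pairs`, the `decoys_per_active` parameter, and the rule of drawing decoys from compounds with no recorded measurement against the protein are all assumptions.

```python
import random


def build_training_pairs(actives, inactives, compound_pool,
                         decoys_per_active=1, seed=0):
    """Assemble labeled (protein_id, compound_id, label) training examples.

    actives, inactives: lists of (protein_id, compound_id) pairs with
        experimental evidence of activity or inactivity.
    compound_pool: list of compound ids from which random decoys are drawn.
    Hypothetical helper; the ratio and sampling rule are assumptions.
    """
    rng = random.Random(seed)
    known = set(actives) | set(inactives)

    # Experimentally supported positives and negatives keep their labels.
    examples = [(p, c, 1) for p, c in actives]
    examples += [(p, c, 0) for p, c in inactives]

    # For each active, pair its protein with randomly chosen compounds that
    # have no recorded measurement against it and label them as negatives.
    for protein_id, _ in actives:
        candidates = [c for c in compound_pool if (protein_id, c) not in known]
        n_draw = min(decoys_per_active, len(candidates))
        for compound_id in rng.sample(candidates, n_draw):
            examples.append((protein_id, compound_id, 0))
            known.add((protein_id, compound_id))
    return examples
```

Random decoys are presumed rather than measured negatives, which is one reason to combine them with experimentally validated inactives as the abstract describes.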

## Linked entities

- **Proteins:** dnaK (heat shock protein 70)

## Full-text entities

- **Genes:** ATPase [NCBI Gene 3654511]
- **Species:** Escherichia coli (E. coli, species) [taxon 562]

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13014237/full.md

---
Source: https://tomesphere.com/paper/PMC13014237