# SCORCH2: A Generalized Heterogeneous Consensus Model for High‐Enrichment Interaction‐Based Virtual Screening

**Authors:** Lin Chen, Vincent Blay, Pedro J. Ballester, Douglas R. Houston

PMC · DOI: 10.1002/advs.202508318 · Advanced Science · 2025-08-20

## TL;DR

SCORCH2 is a new machine learning tool that improves the accuracy and efficiency of finding drug candidates by analyzing protein-ligand interactions.

## Contribution

SCORCH2 introduces a heterogeneous consensus model, built from two complementary XGBoost models, that improves both virtual screening performance and interpretability.
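
The consensus idea can be illustrated with a small sketch. It assumes each of the two models has already been trained and outputs, for every compound, a probability of being active. It also assumes the probabilities are merged by a weighted mean. That combination rule, the `weight_a` parameter, and the helper names `consensus_scores` and `rank_by_score` are hypothetical and chosen for illustration only. They are not the published SCORCH2 scheme.

```python
from typing import List, Sequence


def consensus_scores(p_model_a: Sequence[float],
                     p_model_b: Sequence[float],
                     weight_a: float = 0.5) -> List[float]:
    """Combine two models' per-compound active probabilities.

    Hypothetical rule: a weighted arithmetic mean. This is an illustrative
    assumption, not the published SCORCH2 combination scheme.
    """
    if len(p_model_a) != len(p_model_b):
        raise ValueError("both models must score the same compounds")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    weight_b = 1.0 - weight_a
    return [weight_a * a + weight_b * b for a, b in zip(p_model_a, p_model_b)]


def rank_by_score(compound_ids: Sequence[str], scores: Sequence[float]) -> List[str]:
    """Return compound IDs ordered from most to least likely active."""
    paired = sorted(zip(compound_ids, scores), key=lambda pair: pair[1], reverse=True)
    return [cid for cid, _ in paired]


if __name__ == "__main__":
    ids = ["lig1", "lig2", "lig3"]
    model_a = [0.9, 0.2, 0.6]  # probabilities from a first (hypothetical) classifier
    model_b = [0.7, 0.4, 0.8]  # probabilities from a second (hypothetical) classifier
    combined = consensus_scores(model_a, model_b)
    print(rank_by_score(ids, combined))  # ['lig1', 'lig3', 'lig2']
```

This example shows why a consensus can reorder compounds. `lig3` ranks below `lig1` for the first model and above it for the second, and the averaged score settles it between them. A weighted mean is only one of several ways to merge heterogeneous models. Learned stacking or rank averaging are common alternatives.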

## Key findings

- SCORCH2 outperforms previous methods in predictive accuracy and generalizability across diverse biological targets.
- The model demonstrates strong transferability by identifying hits on previously unseen targets.
- SCORCH2 eliminates the need for detailed docking pose selection, simplifying the screening workflow.
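
The "high-enrichment" goal in the title refers to how strongly a scoring function concentrates true actives at the top of a ranked compound library. A widely used measure is the enrichment factor at a fraction x of the library. It is the hit rate among the top-ranked x% divided by the hit rate of the whole library. The sketch below uses that common definition. Rounding the subset size to the nearest integer is one convention among several, and the function name is hypothetical. The paper's exact metrics and benchmark protocol are not reproduced here.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float],
                      is_active: Sequence[bool],
                      fraction: float = 0.01) -> float:
    """Enrichment factor at the top `fraction` of a score-ranked library.

    EF = (actives in top-ranked subset / size of subset)
         / (total actives / library size)
    Higher scores are assumed to mean "more likely active".
    """
    n_total = len(scores)
    if n_total == 0 or n_total != len(is_active):
        raise ValueError("scores and labels must be non-empty and the same length")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must lie in (0, 1]")
    n_actives = sum(1 for label in is_active if label)
    if n_actives == 0:
        raise ValueError("the library must contain at least one active")
    # Size of the top-ranked subset; rounding conventions differ between studies.
    n_top = max(1, round(fraction * n_total))
    order = sorted(range(n_total), key=lambda i: scores[i], reverse=True)
    hits = sum(1 for i in order[:n_top] if is_active[i])
    return (hits / n_top) / (n_actives / n_total)


if __name__ == "__main__":
    # Toy library of 100 compounds with scores 100, 99, ..., 1 (already ranked).
    toy_scores = [float(s) for s in range(100, 0, -1)]
    toy_labels = [i in {0, 1, 50, 60, 70} for i in range(100)]  # 5 actives
    print(enrichment_factor(toy_scores, toy_labels, fraction=0.10))  # 4.0
```

An EF of 1 means the ranking does no better than random selection. The maximum possible EF at a given fraction is capped by the number of actives that can fit into the top subset.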

## Abstract

The discovery of effective therapeutics remains a complex, costly, and time‐consuming endeavor, characterized by high failure rates and significant resource investments. A central bottleneck in early‐stage drug discovery is identifying suitable hit compounds with moderate affinity for known biological targets. Despite recent advances, current in silico virtual screening methods remain subject to limitations, including model overfitting, data bias, and limited interpretability of their predictions. In this study, we present SCORCH2, a machine learning‐based framework designed to simultaneously enhance the performance and interpretability of virtual screening by leveraging interaction features. Compared with its predecessor, SCORCH, SCORCH2 exhibits superior predictive accuracy and generalizability across a wide range of biological targets. Importantly, SCORCH2 demonstrates robust hit identification capabilities on previously unseen targets, indicating strong transferability. Furthermore, SCORCH2 obviates the need for meticulous docking pose selection, streamlining the screening process. These advances highlight the potential of SCORCH2 as a valuable tool for accelerating drug discovery campaigns.

Researchers developed SCORCH2, an advanced machine‐learning scoring function for virtual screening that combines two complementary XGBoost models to re‐evaluate protein‐ligand binding plausibility. The method demonstrates superior performance on standard benchmarks, shows excellent generalization to previously unseen protein targets, and provides interpretable predictions while streamlining the drug discovery workflow.

## Full-text entities

- **Genes:** QPCT (glutaminyl-peptide cyclotransferase) [NCBI Gene 25797] {aka GCT, QC, sQC}, TECR (trans-2,3-enoyl-CoA reductase) [NCBI Gene 9524] {aka GPSN2, MRT14, SC2, TER}, HSP90AA1 (heat shock protein 90 alpha family class A member 1) [NCBI Gene 3320] {aka EL52, HEL-S-65p, HSP86, HSP89A, HSP90A, HSP90N}, FNTA (farnesyltransferase, CAAX box, subunit alpha) [NCBI Gene 2339] {aka FPTA, PGGT1A, PTAR2}, CTSL (cathepsin L) [NCBI Gene 1514] {aka CATL, CTSL1, MEP}, SYK (spleen associated tyrosine kinase) [NCBI Gene 6850] {aka IMD82, p72-Syk}, SHROOM4 (shroom family member 4) [NCBI Gene 57477] {aka MRXSSDS, SHAP, shrm4}, SIRT2 (sirtuin 2) [NCBI Gene 22933] {aka SIR2, SIR2L, SIR2L2}, LCK (LCK proto-oncogene, Src family tyrosine kinase) [NCBI Gene 3932] {aka IMD22, LSK, YT16, p56lck, pp58lck}, NR3C2 (nuclear receptor subfamily 3 group C member 2) [NCBI Gene 4306] {aka MCR, MLR, MR, NR3C2VIT}, MMP13 (matrix metallopeptidase 13) [NCBI Gene 4322] {aka CLG3, MANDP1, MDST, MMP-13}
- **Chemicals:** Iridium (MESH:D007495), oxygen (MESH:D010100), indazole (MESH:D007191), hydrogen (MESH:D006859), water (MESH:D014867), ENTOSPLETINIB (MESH:C000589391), 4-[(3-{8-[(3,4-Dimethoxyphenyl)amino]imidazo[1,2-A]pyrazin-6-Yl}benzoyl)amino]benzoic Acid (-), imidazo[1,2-a]pyrazine (MESH:C556838), metal (MESH:D008670), PS (MESH:D010758)
- **Species:** Severe acute respiratory syndrome-related coronavirus (no rank) [taxon 694009]
- **Mutations:** L40S

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12622527/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12622527/full.md

## References

69 references — full list in the complete paper: https://tomesphere.com/paper/PMC12622527/full.md

---
Source: https://tomesphere.com/paper/PMC12622527