# Could statistical potential models achieve comparable or better performance than deep learning models?

**Authors:** Zhihao Wang, Sheng Wang, Jingjing Guo, Yuguang Mu, Xiangdong Liu, Liangzhen Zheng, Weifeng Li

PMC · DOI: 10.1093/bib/bbag088 · Briefings in Bioinformatics · 2026-03-02

## TL;DR

This paper shows that a well-designed statistical potential model, HybridSP, can match and in some cases surpass deep learning models in docking and virtual screening of protein-ligand complexes, while remaining interpretable.

## Contribution

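Proposes HybridSP, a hybrid statistical potential that combines distance-dependent atom–atom, atom–residue, and orientation-dependent atom–residue terms, uses an affinity-weighted scheme to correct biases in the statistical distributions, and achieves state-of-the-art performance in docking and screening.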
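This summary does not say how the three terms are combined. The sketch below assumes a simple weighted linear sum of per-term energies and uses it to pick the best of several candidate poses. The weights are hypothetical, and `TermScores`, `DEFAULT_WEIGHTS`, `hybrid_score`, and `best_pose` are illustrative names. They are not HybridSP's API or its published combination scheme.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class TermScores:
    """Per-pose energies from each component potential (lower = more favorable)."""

    atom_atom_distance: float        # distance-dependent atom-atom term
    atom_residue_distance: float     # distance-dependent atom-residue term
    atom_residue_orientation: float  # orientation-dependent atom-residue term


# Hypothetical weights for illustration only; not the values used by HybridSP.
DEFAULT_WEIGHTS = {
    "atom_atom_distance": 1.0,
    "atom_residue_distance": 0.5,
    "atom_residue_orientation": 0.5,
}


def hybrid_score(terms: TermScores, weights: Optional[Mapping[str, float]] = None) -> float:
    """Weighted linear sum of the three term energies (an assumed combination rule)."""
    w = DEFAULT_WEIGHTS if weights is None else weights
    return (
        w["atom_atom_distance"] * terms.atom_atom_distance
        + w["atom_residue_distance"] * terms.atom_residue_distance
        + w["atom_residue_orientation"] * terms.atom_residue_orientation
    )


def best_pose(poses: Mapping[str, TermScores], weights: Optional[Mapping[str, float]] = None) -> str:
    """Return the name of the pose with the lowest (most favorable) hybrid score."""
    return min(poses, key=lambda name: hybrid_score(poses[name], weights))
```

If both atom–residue weights are set to zero, the score reduces to the atom–atom term alone. Varying the weights this way is one simple approach to probing the abstract's observation that docking and screening favor different terms.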

## Key findings

- HybridSP achieves a 91.6% docking success rate on the CASF-2016 benchmark.
- On the same benchmark, it reaches an enrichment factor of 29.35 at the top 1% (EF1%) in virtual screening.
- The model rivals, and in some cases surpasses, state-of-the-art deep learning approaches while remaining interpretable.
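The two headline metrics can be illustrated with a short sketch. It assumes the common CASF-2016 conventions. A docking case counts as a success when the top-scored pose lies within 2 Å RMSD of the native pose. EF at the top 1% is the active rate among the top-ranked 1% of the library divided by the active rate in the whole library. The function names are hypothetical. Rounding the top-1% cutoff up with `math.ceil` and treating a lower score as a stronger predicted binder are also assumptions.

```python
import math
from typing import Sequence


def docking_success_rate(top1_rmsds: Sequence[float], cutoff: float = 2.0) -> float:
    """Percent of complexes whose best-scored pose is within `cutoff` angstrom RMSD of the native pose."""
    if len(top1_rmsds) == 0:
        raise ValueError("need at least one complex")
    hits = sum(1 for rmsd in top1_rmsds if rmsd <= cutoff)
    return 100.0 * hits / len(top1_rmsds)


def enrichment_factor(
    scores: Sequence[float], is_active: Sequence[bool], fraction: float = 0.01
) -> float:
    """EF at the top `fraction` of the ranking; lower score = predicted stronger binder."""
    if len(scores) != len(is_active):
        raise ValueError("scores and labels must have the same length")
    n_total = len(scores)
    n_active = sum(is_active)
    if n_active == 0:
        raise ValueError("need at least one active compound")
    n_top = max(1, math.ceil(fraction * n_total))  # size of the top-ranked subset
    ranked = sorted(range(n_total), key=lambda i: scores[i])
    actives_in_top = sum(1 for i in ranked[:n_top] if is_active[i])
    # Hit rate in the top subset divided by the hit rate in the whole library.
    return (actives_in_top / n_top) / (n_active / n_total)
```

For a 200-compound library with 4 actives, the top 1% is 2 compounds. If both are active, EF1% is (2/2)/(4/200) = 50, which is the maximum for that library.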

## Abstract

Accurately predicting protein–ligand interactions is vital for structure-based drug discovery. Although deep learning (DL) models have shown strong performance, the potential of traditional statistical potentials under data-limited conditions remains underexplored. Here, we systematically assess several statistical potential models in docking and virtual screening. We find that docking benefits from distance-dependent pairwise atom–atom potentials with clear physical meaning, while screening relies more on orientation-dependent atom–residue potentials that capture local chemical environments. Based on these findings, we propose HybridSP, a hybrid potential combining distance-dependent atom–atom, atom–residue, and orientation-dependent atom–residue terms. An affinity-weighted scheme is applied to correct biases in the statistical distributions. On the CASF-2016 benchmark, HybridSP achieves a 91.6% docking success rate and an enrichment factor of 29.35 at the top 1%, rivaling and even surpassing state-of-the-art DL models. Its strong screening ability is further validated on the Directory of Useful Decoys-Enhanced (DUD-E) and Directory of Useful Decoys-Adjusted datasets. These results demonstrate that well-designed statistical potentials can achieve high performance and interpretability without complex DL architectures, offering an efficient alternative for scoring function design. The models are available at: https://github.com/zelixirSH/HybridSP.git.
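Distance-dependent pairwise potentials of this kind are usually derived by inverse Boltzmann statistics. The energy of a pair type at distance r is u(r) = -kT ln[p_obs(r) / p_ref(r)]. The sketch below builds such a potential from weighted contact observations and sums it over a pose's contacts. It makes several illustrative assumptions: the reference state (all pair types pooled), the bin width, the cutoff, the pseudocount, and the per-contact weight standing in for an affinity-derived factor. HybridSP's actual reference state and affinity-weighted scheme are described in the full paper and are not reproduced here.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

KT = 0.593  # RT in kcal/mol at about 298 K

# (pair_type, distance in angstrom, weight). The weight is a hypothetical stand-in
# for an affinity-derived factor, e.g. a complex's pKd rescaled to mean 1.
Observation = Tuple[str, float, float]


def _bin(r: float, bin_width: float, r_max: float) -> Optional[int]:
    """Map a distance to a histogram bin, or None if it lies outside [0, r_max)."""
    if r < 0.0 or r >= r_max:
        return None
    n_bins = int(round(r_max / bin_width))
    return min(int(r / bin_width), n_bins - 1)


def build_potential(
    observations: Iterable[Observation],
    bin_width: float = 0.5,
    r_max: float = 6.0,
    pseudocount: float = 1e-3,
) -> Dict[str, List[float]]:
    """Inverse-Boltzmann potential per pair type: u(r) = -kT ln(p_obs(r) / p_ref(r))."""
    n_bins = int(round(r_max / bin_width))
    pair_counts: Dict[str, List[float]] = defaultdict(lambda: [0.0] * n_bins)
    ref_counts = [0.0] * n_bins  # assumed reference state: all pair types pooled
    for pair_type, r, weight in observations:
        b = _bin(r, bin_width, r_max)
        if b is None:
            continue
        pair_counts[pair_type][b] += weight
        ref_counts[b] += weight

    # Pseudocounts keep empty bins finite instead of producing log(0).
    ref_total = sum(ref_counts) + pseudocount * n_bins
    potential: Dict[str, List[float]] = {}
    for pair_type, counts in pair_counts.items():
        total = sum(counts) + pseudocount * n_bins
        energies = []
        for b in range(n_bins):
            p_obs = (counts[b] + pseudocount) / total
            p_ref = (ref_counts[b] + pseudocount) / ref_total
            energies.append(-KT * math.log(p_obs / p_ref))
        potential[pair_type] = energies
    return potential


def score_pose(
    contacts: Iterable[Tuple[str, float]],
    potential: Dict[str, List[float]],
    bin_width: float = 0.5,
    r_max: float = 6.0,
) -> float:
    """Sum pair energies over a pose's contacts; unknown pairs or out-of-range distances add zero."""
    total = 0.0
    for pair_type, r in contacts:
        b = _bin(r, bin_width, r_max)
        if b is None or pair_type not in potential:
            continue
        total += potential[pair_type][b]
    return total
```

Distances that are over-represented for a pair type relative to the pooled reference receive negative (favorable) energies. Lower pose totals are therefore better. The reference state is a central design choice in knowledge-based potentials, and this sketch uses the simplest option.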

## Full-text entities

- **Genes:** F2 (coagulation factor II, thrombin) [NCBI Gene 2147] {aka PT, RPRGL2, THPH1}, MAPK14 (mitogen-activated protein kinase 14) [NCBI Gene 1432] {aka CSBP, CSBP1, CSBP2, CSPB1, EXIP, Mxi2}, PRB1 (proline rich protein BstNI subfamily 1) [NCBI Gene 5542] {aka PM, PMF, PMS, PRB1L, PRB1M}, CYP4F3 (cytochrome P450 family 4 subfamily F member 3) [NCBI Gene 4051] {aka CPF3, CYP4F, CYPIVF3, LTB4H}, CDK2 (cyclin dependent kinase 2) [NCBI Gene 1017] {aka CDKN2, p33(CDK2)}, CYP2B6 (cytochrome P450 family 2 subfamily B member 6) [NCBI Gene 1555] {aka CPB6, CYP2B, CYP2B7, CYPIIB6, EFVM, IIB1}, BACE1 (beta-secretase 1) [NCBI Gene 23621] {aka ASP2, BACE, HSPC104}, TYK2 (tyrosine kinase 2) [NCBI Gene 7297] {aka IMD35, JTK1}
- **Diseases:** AD (MESH:D000544)
- **Chemicals:** C.3-GLU_OE1 (-), hydrogen (MESH:D006859), halogen (MESH:D006219), O (MESH:D010100), C (MESH:D002244), N (MESH:D009584), amide (MESH:D000577), glycine (MESH:D005998)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12951076/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12951076/full.md

## References

64 references — full list in the complete paper: https://tomesphere.com/paper/PMC12951076/full.md

---
Source: https://tomesphere.com/paper/PMC12951076