Benchmarking Single-Pose Docking, Consensus Rescoring, and Supervised ML on the LIT-PCBA Library: A Critical Evaluation of DiffDock, AutoDock-GPU, GNINA, and DiffDock-NMDN

Youssef Abo-Dahab; Xiaoiang Xiang; Joanne Chun; Liang Zhao

arXiv:2605.01681·cs.LG·May 6, 2026

TL;DR

This study evaluates docking and scoring methods on the LIT-PCBA benchmark and finds that supervised machine learning re-ranking substantially improves early enrichment, although no single method dominates across all targets.

Contribution

The paper provides a large-scale comparison of classical docking, AI-based tools, rank-based consensus strategies, and supervised ML on the same benchmark, highlighting their relative strengths and limitations.

Findings

01

AutoDock-GNINA (GNINA rescoring of AutoDock-GPU poses) achieved the highest median EF1% of any single method, at 2.14.

02

DiffDock-based approaches underperformed relative to AutoDock-GNINA, particularly on challenging targets such as OPRK1.

03

Supervised ML re-ranking doubled early enrichment performance.
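
The findings above are stated in terms of EF1%, the enrichment factor in the top 1% of a ranked list. As a minimal sketch, assuming the commonly used definition (hit rate among the top-ranked fraction divided by the hit rate in the whole library), it can be computed as follows. The function name, the "higher score is better" convention, the rounding of the cutoff, and the toy data are illustrative assumptions, not details taken from the paper.

```python
import math


def enrichment_factor(scores, labels, fraction=0.01):
    """Enrichment factor at a given top fraction (0.01 gives EF1%).

    Assumes a higher score means the compound is predicted more likely active,
    and that labels are 1 (active) or 0 (inactive).
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and equal length")
    n_total = len(scores)
    n_actives = sum(labels)
    if n_actives == 0:
        raise ValueError("at least one active is required")

    # Size of the top slice; rounding before ceil guards against float noise.
    # Other conventions (floor, round) exist, so this choice is an assumption.
    n_top = max(1, math.ceil(round(fraction * n_total, 9)))

    # Sort best-first; ties keep their input order (Python's sort is stable).
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits_top = sum(label for _, label in ranked[:n_top])

    return (hits_top / n_top) / (n_actives / n_total)


# Toy library (not data from the paper): 200 compounds, 4 actives.
scores = [float(200 - i) for i in range(200)]  # compound 0 has the best score
labels = [0] * 200
for idx in (0, 50, 120, 199):
    labels[idx] = 1

ef1 = enrichment_factor(scores, labels)  # top 1% holds 2 compounds, 1 active
```

Under this definition an EF1% of 1 corresponds to random ranking, so a value of 2.14 means the top 1% of the list contains roughly twice as many actives as expected by chance.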

Abstract

Virtual screening performance depends heavily on the chosen docking and scoring methods. Recent AI-based tools such as DiffDock and NMDN have reported strong benchmark results, but their practical utility on realistic, experimentally-derived datasets remains unclear. Here we perform a large-scale evaluation on the LIT-PCBA library (15 targets, 578,295 ligand-target pairs with experimentally confirmed actives and inactives). We compare AutoDock-GPU and DiffDock for pose generation, followed by rescoring with GNINA and NMDN. We further evaluate rank-based consensus strategies and supervised machine learning models trained on docking features. GNINA rescoring of AutoDock-GPU poses (AutoDock-GNINA) emerged as the strongest single method with a median EF1% of 2.14. DiffDock-based approaches underperformed relative to AutoDock-GNINA, particularly on challenging targets such as OPRK1.…
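
The abstract mentions rank-based consensus strategies without specifying them in the text shown here. One common variant is mean-rank consensus, in which each method's scores are converted to ranks and the ranks are averaged per compound. The sketch below illustrates that idea only; it is an assumption about the general technique, not the authors' exact procedure, and the function names and toy scores are hypothetical.

```python
def ranks_descending(scores):
    """Rank scores so that 1 is the best (highest) score; ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j over the run of compounds tied with position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mean_rank_consensus(score_lists):
    """Average each compound's rank across methods; a lower value is better."""
    n_compounds = len(score_lists[0])
    if any(len(s) != n_compounds for s in score_lists):
        raise ValueError("every method must score the same compounds")
    per_method = [ranks_descending(s) for s in score_lists]
    return [
        sum(method_ranks[i] for method_ranks in per_method) / len(per_method)
        for i in range(n_compounds)
    ]


# Toy scores from two hypothetical scoring methods (higher is better in both).
method_a = [9.1, 7.4, 8.0, 5.2]
method_b = [0.62, 0.10, 0.91, 0.75]

consensus = mean_rank_consensus([method_a, method_b])
# consensus == [2.0, 3.5, 1.5, 3.0]; the third compound ranks best overall
```

Working on ranks rather than raw scores avoids having to put different scoring functions on a common scale, at the cost of discarding how large the score differences are.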
