Benchmarking open-source tools for in silico antiviral drug discovery
Daniel C. Elton, Preston W. Estep

TL;DR
This paper benchmarks open-source computational tools for in silico antiviral drug discovery, introduces a curated viral protein-ligand dataset, and evaluates model performance for predicting antiviral binding affinities.
Contribution
The paper benchmarks 15 tools, introduces a new antiviral dataset, and shows that fine-tuning can improve model performance.
Findings
Boltz-2 and DrugFormDTA ranked highest among ML models
GNINA performed best among docking tools
Fine-tuning DrugFormDTA improved correlation from 0.5 to 0.7
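A correlation like the one in the last finding is computed from paired predicted and measured binding affinities. The following is a minimal sketch, assuming Pearson's r (this summary does not say which coefficient the authors used). The function name `pearson_r` and the example values are illustrative and are not taken from the paper.

```python
import math


def pearson_r(predicted, measured):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_p = sum(predicted) / len(predicted)
    mean_m = sum(measured) / len(measured)
    # Sum of products of deviations (unnormalized covariance).
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    # Sums of squared deviations (unnormalized variances).
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_m)


# Made-up affinity values (e.g. pKd), for illustration only.
measured_affinity = [5.1, 6.3, 7.8, 4.9, 8.2]
predicted_affinity = [5.6, 6.0, 7.1, 5.5, 7.9]
print(f"Pearson r = {pearson_r(predicted_affinity, measured_affinity):.2f}")
```

Running the same function on a model's predictions before and after fine-tuning, against the same held-out measurements, gives the kind of before/after comparison reported above.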
Abstract
Antivirals are uniquely positioned to be deployed quickly during a new outbreak, especially when repurposed from approved drugs. Yet there are no FDA-approved antivirals for the majority of viral families with pandemic potential. Here we lay out the case for investing in technologies and techniques for discovering antiviral drugs and designing antiviral combinations. We present a survey of open-source datasets and computational tools for in silico antiviral drug discovery, with a particular focus on the latest AI-based systems and docking tools. We then present our custom dataset of 43,005 viral protein-ligand binding measurements that we curated from BindingDB and other sources. Importantly, we found that 31% of viral protein binding data in BindingDB required polyprotein sequences to be carefully split before the data were suitable for training or testing ML models. Using our custom…
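The polyprotein-splitting step mentioned in the abstract can be pictured as slicing one long precursor sequence into its mature proteins at known boundaries. The following is a rough sketch under stated assumptions, not the authors' pipeline: the function name `split_polyprotein`, the 1-based inclusive coordinate convention, and the toy sequence and region names are all hypothetical. In practice the boundaries would come from curated annotations.

```python
def split_polyprotein(sequence, regions):
    """Slice a polyprotein into named mature proteins.

    regions: list of (name, start, end) tuples using 1-based,
    inclusive residue coordinates (an assumed convention).
    """
    mature = {}
    for name, start, end in regions:
        if not (1 <= start <= end <= len(sequence)):
            raise ValueError(f"region {name!r} ({start}-{end}) is out of range")
        # Convert 1-based inclusive coordinates to a Python slice.
        mature[name] = sequence[start - 1:end]
    return mature


# Toy sequence and made-up boundaries, for illustration only.
toy_polyprotein = "MKTAYIAKQRGSHLVEALYL"
toy_regions = [("protein_A", 1, 10), ("protein_B", 11, 20)]

for name, seq in split_polyprotein(toy_polyprotein, toy_regions).items():
    print(name, seq)
```

Each binding measurement would then be paired with the sequence of the mature protein the ligand actually targets, rather than with the full precursor.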
