Benchmarking open-source tools for in silico antiviral drug discovery

Daniel C. Elton; Preston W. Estep

arXiv:2605.04265 · q-bio.BM · May 7, 2026

TL;DR

This paper benchmarks open-source computational tools for in silico antiviral drug discovery, introduces a curated viral protein-ligand dataset, and evaluates model performance for predicting antiviral binding affinities.

Contribution

The paper benchmarks 15 tools, introduces a new antiviral dataset, and shows how fine-tuning can improve model performance.

Findings

1. Boltz-2 and DrugFormDTA ranked highest among the ML models.
2. GNINA performed best among the docking tools.
3. Fine-tuning DrugFormDTA improved correlation from 0.5 to 0.7.
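
The fine-tuning result above is stated as a correlation between predicted and measured binding values. This summary does not say which coefficient was used, so the sketch below assumes Pearson's r purely for illustration. The function name `pearson_r` and the numbers are hypothetical and are not taken from the paper.

```python
from math import sqrt


def pearson_r(predicted, measured):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_p * var_m)


# Made-up affinity values for illustration only (not data from the paper).
measured = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 5.8, 7.4, 7.9]
r = pearson_r(predicted, measured)
print(f"Pearson r = {r:.3f}")
```

A rank-based coefficient such as Spearman's would be a reasonable alternative when only the ordering of compounds matters; which one the authors report should be checked in the full text.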

Abstract

Antivirals are uniquely positioned to be deployed quickly during a new outbreak, especially when repurposed from approved drugs. Yet there are no FDA-approved antivirals for the majority of viral families with pandemic potential. Here we lay out the case for investing in technologies and techniques for antiviral drug discovery and designing antiviral combinations. We present a survey of open-source datasets and computational tools for in silico antiviral drug discovery, with a particular focus on the latest AI-based systems and docking tools. We then present our custom dataset of 43,005 viral protein-ligand binding measurements that we curated from BindingDB and other sources. Importantly, we found that 31% of viral protein binding data in BindingDB required polyprotein sequences to be carefully split before the data were suitable for training or testing ML models. Using our custom…
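
The abstract notes that many BindingDB entries list a whole viral polyprotein rather than the mature protein the ligand binds. The excerpt does not describe the authors' splitting procedure, so the following is only a minimal sketch of the general idea: cut a polyprotein into named pieces using 1-based, inclusive residue ranges. The `MatureProtein` class, the `split_polyprotein` function, the toy sequence, and the coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MatureProtein:
    """Hypothetical annotation: a named residue range within a polyprotein."""
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def split_polyprotein(sequence, annotations):
    """Return {name: subsequence} for each annotated mature protein."""
    pieces = {}
    for ann in annotations:
        if not (1 <= ann.start <= ann.end <= len(sequence)):
            raise ValueError(f"range for {ann.name} is outside the sequence")
        # Convert 1-based inclusive coordinates to a Python slice.
        pieces[ann.name] = sequence[ann.start - 1 : ann.end]
    return pieces


# Toy 30-residue sequence and made-up boundaries, for illustration only.
toy_polyprotein = "MSTNPKPQRK" + "GAVLIMFWPS" + "TCYNQDEKRH"
toy_annotations = [
    MatureProtein("protein_A", 1, 10),
    MatureProtein("protein_B", 11, 20),
    MatureProtein("protein_C", 21, 30),
]
pieces = split_polyprotein(toy_polyprotein, toy_annotations)
for name, subseq in pieces.items():
    print(name, subseq)
```

In practice the boundaries would come from a curated annotation source rather than being typed by hand, and each binding measurement would then be reassigned to the specific mature protein it targets.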
