# Leveraging viral genome sequences and machine learning models for identification of potentially selective antiviral agents

**Authors:** Tuan Xu, Miao Xu, Qi Zhang, Catherine Z. Chen, Wei Zheng, Ruili Huang

PMC · DOI: 10.1038/s42004-025-01583-2 · Communications Chemistry · 2025-06-20

## TL;DR

This paper combines viral genome sequences with machine learning models to virtually screen for antivirals. Compounds flagged for SARS-CoV-2 reached hit rates of 9.4% and 37% in two in vitro assays.

## Contribution

The study integrates viral genome sequences with structural data on approved and investigational antivirals to build predictive models for virtual antiviral screening.

## Key findings

- Machine learning models achieved AUC-ROC >0.72 for virus-selective and >0.79 for pan-antiviral predictions.
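- Virtual screening of ~360 K compounds identified 346 candidates, with hit rates of 9.4% (24/256) in the pseudotyped particle (PP) entry assay and 37% (47/128) in the RNA-dependent RNA polymerase (RdRp) assay.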
- Top compounds showed antiviral potencies around 1 µM.
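
The AUC-ROC thresholds quoted above can be read as the probability that a model scores a randomly chosen active compound higher than a randomly chosen inactive one. The sketch below illustrates that definition in plain Python. The function name `auc_roc` and the toy labels and scores are hypothetical and are not taken from the paper; this is not the authors' evaluation code.

```python
def auc_roc(labels, scores):
    """AUC-ROC via pairwise ranking.

    Returns the fraction of (active, inactive) pairs in which the active
    compound receives the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one active and one inactive compound")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = active, 0 = inactive, with model scores.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1]
toy_auc = auc_roc(labels, scores)
print(f"toy AUC-ROC = {toy_auc:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported >0.72 and >0.79 sit between the two.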

## Abstract

Viral genome sequencing provides valuable information for antiviral development, yet its integration with machine learning for virtual screening remains underexplored. To bridge this gap, viral genome sequences were combined with structural data of approved and investigational antivirals to identify virus-selective agents. In parallel, quantitative structure-activity relationship (QSAR) models were built to predict pan-antivirals. Robust models were generated with the area under the receiver operating characteristic curve (AUC-ROC) >0.72 for virus-selective and >0.79 for pan-antiviral predictions. These models were applied to virtually screen ~360 K compounds for anti-SARS-CoV-2 activity. The 346 compounds identified by the models were tested using two in vitro assays, yielding hit rates of 9.4% (24/256) in the pseudotyped particle (PP) entry assay and 37% (47/128) in the RNA-dependent RNA polymerase (RdRp) assay. The top compounds showed potencies around 1 µM. This study provides a framework for virtual screening of virus-selective and pan-antivirals against emerging pathogens.
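
The hit rates in the abstract follow directly from the reported counts. As a quick arithmetic check, the sketch below recomputes them. The helper `hit_rate` and the `assays` dictionary are illustrative names; only the counts (24/256 and 47/128) come from the abstract.

```python
def hit_rate(hits, tested):
    """Fraction of tested compounds that were confirmed as hits."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return hits / tested


# Counts as reported in the abstract: (hits, compounds tested) per assay.
assays = {
    "PP entry": (24, 256),
    "RdRp": (47, 128),
}

for name, (hits, tested) in assays.items():
    print(f"{name}: {hits}/{tested} = {hit_rate(hits, tested):.1%}")
```

The PP entry ratio is 9.375%, reported as 9.4%; the RdRp ratio is about 36.7%, reported as 37%.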

Leveraging viral genome sequencing for antiviral drug development remains underexplored in machine learning applications. Here, the authors integrate viral genome sequences with drug structural data to create robust predictive models, identifying potential antivirals (e.g., anti-SARS-CoV-2 compounds).

## Linked entities

- **Diseases:** SARS-CoV-2 (MONDO:0100096)

## Full-text entities

- **Species:** Severe acute respiratory syndrome coronavirus 2 (no rank) [taxon 2697049]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12181400/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12181400/full.md

## References

9 references — full list in the complete paper: https://tomesphere.com/paper/PMC12181400/full.md

---
Source: https://tomesphere.com/paper/PMC12181400