# Strategies for robust, accurate, and generalizable benchmarking of drug discovery platforms

**Authors:** Melissa Van Norden, William Mangione, Zackary Falls, Ram Samudrala

PMC · DOI: 10.1093/bioinformatics/btaf604 · Bioinformatics · 2025-11-05

## TL;DR

This paper revises the protocols used to benchmark the CANDO drug discovery platform and evaluates its performance using drug-indication associations from CTD and TTD.

## Contribution

The paper revises the CANDO benchmarking protocols to align them with best practices. It then evaluates how the revised protocols affect measured performance and how those results compare with the original protocols.

## Key findings

- Using drug-indication mappings from CTD and TTD, CANDO ranked 7.4% and 12.1% of known drugs, respectively, among the top 10 compounds for their indications.
- Performance was weakly positively correlated (Spearman coefficient > 0.3) with the number of drugs associated with an indication and moderately correlated (> 0.5) with intra-indication chemical similarity.
- On drug-indication associations present in both mappings, performance was better with TTD than with CTD.
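The headline result (the share of known drugs ranked among the top 10 compounds for their indication) can be read as a top-k recovery rate. The sketch below is a minimal illustration under stated assumptions, not CANDO's benchmarking code: it assumes each indication already has a ranked list of compound IDs, and the names `top_k_recovery` and `shared_pairs` and the toy IDs are hypothetical. The intersection helper mirrors the idea of comparing CTD and TTD only on associations that appear in both mappings.

```python
from typing import Dict, List, Set, Tuple

# A (drug, indication) association, e.g. one row of a CTD- or TTD-style mapping.
Pair = Tuple[str, str]


def top_k_recovery(
    rankings: Dict[str, List[str]],
    known_pairs: Set[Pair],
    k: int = 10,
) -> float:
    """Fraction of known (drug, indication) pairs whose drug appears in the
    top k of that indication's ranked list (hypothetical helper).

    Pairs whose indication has no ranked list are counted as misses.
    """
    if not known_pairs:
        return 0.0
    hits = 0
    for drug, indication in known_pairs:
        top = rankings.get(indication, [])[:k]  # ranked best-first
        if drug in top:
            hits += 1
    return hits / len(known_pairs)


def shared_pairs(mapping_a: Set[Pair], mapping_b: Set[Pair]) -> Set[Pair]:
    """Associations present in both mappings, for a like-for-like comparison."""
    return mapping_a & mapping_b


if __name__ == "__main__":
    # Toy, made-up rankings and associations for illustration only.
    rankings = {
        "ind_1": ["d1", "d9", "d2", "d8"],
        "ind_2": ["d7", "d5", "d3"],
    }
    known = {("d1", "ind_1"), ("d2", "ind_1"), ("d4", "ind_1"), ("d3", "ind_2")}
    print(top_k_recovery(rankings, known, k=2))  # 0.25
    print(top_k_recovery(rankings, known, k=3))  # 0.75
```

In a real benchmark the ranked lists would have to be produced without using the association being tested; how CANDO does that is described in the full paper and is not reproduced here.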

## Abstract

Benchmarking is essential for the improvement and comparison of drug discovery platforms. We revised the protocols used to benchmark our Computational Analysis of Novel Drug Opportunities (CANDO) multiscale therapeutic discovery platform to bring them into strong alignment with best practices.

CANDO ranked 7.4% and 12.1% of known drugs in the top 10 compounds for their respective diseases/indications using drug-indication mappings from the Comparative Toxicogenomics Database (CTD) and Therapeutic Targets Database (TTD), respectively. Performance was weakly positively correlated (Spearman correlation coefficient > 0.3) with the number of drugs associated with an indication and moderately correlated (coefficient > 0.5) with intra-indication chemical similarity. There was also a moderate correlation between performance on our original and new benchmarking protocols. Better performance was observed when using TTD instead of CTD when drug-indication associations appearing in both mappings were assessed.
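The correlations above are Spearman coefficients, i.e. the Pearson correlation computed on ranks rather than raw values. A dependency-free sketch follows; tied values receive the average of the ranks they span, which is the same convention `scipy.stats.spearmanr` uses. The per-indication inputs in the usage lines (a performance score and a count of associated drugs for each indication) are hypothetical placeholders, not values from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 0-based positions i..j -> mean 1-based rank
        for idx in order[i : j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rank variance is zero; rho is undefined")
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    # Hypothetical per-indication values: top-10 recovery vs. number of associated drugs.
    recovery = [0.0, 0.1, 0.1, 0.3]
    n_drugs = [2, 5, 4, 12]
    print(round(spearman(recovery, n_drugs), 3))  # 0.949
```

Ranking first makes the coefficient insensitive to monotone rescaling of either variable, which suits skewed quantities such as the number of drugs per indication.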

CANDO is available at https://github.com/ram-compbio/CANDO. The version used in this article is available at http://compbio.buffalo.edu/data/mc_cando_benchmarking2.

## Full-text entities

- **Genes:** AR (androgen receptor) [NCBI Gene 367] {aka AIS, AR8, DHTR, HPCX3, HUMARA, HYSP1}, ERBB2 (erb-b2 receptor tyrosine kinase 2) [NCBI Gene 2064] {aka CD340, HER-2, HER-2/neu, HER2, MLN 19, MLN-19}
- **Diseases:** breast cancer (MESH:D001943), castration-resistant prostate cancer (MESH:D064129), virus infection (MESH:D014777), Prostatic neoplasms (MESH:D011471), cancers (MESH:D009369), Endocrine System Diseases (MESH:D004700), Infections (MESH:D007239)
- **Chemicals:** flutamide (MESH:D005485), gemcitabine (MESH:D000093542), methotrexate (MESH:D008727), apalutamide (MESH:C572045), bicalutamide (MESH:C053541), enzalutamide (MESH:C540278), nilutamide (MESH:C021277), thalidomide (MESH:D013792)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12607264/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12607264/full.md

## References

155 references — full list in the complete paper: https://tomesphere.com/paper/PMC12607264/full.md

---
Source: https://tomesphere.com/paper/PMC12607264