Targeting SARS-CoV-2 with AI- and HPC-enabled Lead Generation: A First Data Release
Yadu Babuji, Ben Blaiszik, Tom Brettin, Kyle Chard, Ryan Chard, Austin Clyde, Ian Foster, Zhi Hong, Shantenu Jha, Zhuozhao Li, Xuefeng Liu, Arvind Ramanathan, Yi Ren, Nicholaus Saint, Marcus Schwarting, Rick Stevens, Hubertus van Dam, Rick Wagner

TL;DR
This paper presents a large-scale data release of over 4.2 billion molecules with pre-computed properties, supporting AI-driven drug discovery efforts against SARS-CoV-2.
Contribution
It provides the first extensive dataset combining molecular structures and properties to facilitate AI and HPC-based drug screening for COVID-19.
Findings
23 datasets with 4.2 billion molecules available
Pre-computed molecular fingerprints, images, and descriptors
Enables rapid AI model development for drug discovery
Abstract
Researchers across the globe are seeking to rapidly repurpose existing drugs or discover new drugs to counter the novel coronavirus disease (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). One promising approach is to train machine learning (ML) and artificial intelligence (AI) tools to screen large numbers of small molecules. As a contribution to that effort, we are aggregating numerous small molecules from a variety of sources, using high-performance computing (HPC) to compute diverse properties of those molecules, using the computed properties to train ML/AI models, and then using the resulting models for screening. In this first data release, we make available 23 datasets collected from community sources representing over 4.2 B molecules enriched with pre-computed: 1) molecular fingerprints to aid similarity searches, 2) 2D images of molecules…
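To illustrate how pre-computed fingerprints can aid similarity searches, here is a minimal sketch that ranks a small library against a query using the Tanimoto coefficient. It is not taken from the paper's pipeline: the function names, the representation of a fingerprint as a set of on-bit indices, and the toy bit values are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both are empty
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_similarity(
    query: FrozenSet[int],
    library: Dict[str, FrozenSet[int]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k library entries most similar to the query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy fingerprints: hypothetical on-bit indices, not real molecules.
toy_library = {
    "mol_a": frozenset({1, 4, 9, 16}),
    "mol_b": frozenset({1, 4, 25}),
    "mol_c": frozenset({2, 3, 5}),
}
query_fp = frozenset({1, 4, 9, 16, 25})
ranked = rank_by_similarity(query_fp, toy_library)
for name, score in ranked:
    print(f"{name}: {score:.2f}")
```

At the scale of billions of molecules, a linear scan like this would need to be replaced by a parallel or indexed search, but the scoring step stays the same.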
Taxonomy
Topics: Computational Drug Discovery Methods · COVID-19 diagnosis using AI · Cell Image Analysis Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
