Synthetic Data from Diffusion Models Improve Drug Discovery Prediction

Bing Hu; Ashish Saragadam; Anita Layton; Helen Chen

arXiv:2405.03799 · cs.LG · May 8, 2024 · 2 cites

TL;DR

This paper introduces Syngand, a diffusion GNN model that generates synthetic drug data to address data sparsity in drug discovery, enhancing the ability to perform cross-dataset analyses.
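To make the data-sparsity problem concrete: when two properties are measured on largely different sets of ligands, a question that needs both values can only use their intersection. The toy sketch below uses hypothetical ligand IDs and hypothetical dataset names (not taken from the paper) and counts how many ligands have values in both datasets.

```python
# Hypothetical ligand IDs measured in two independently collected datasets
# (names and IDs are illustrative only, not from the paper).
solubility_ids = {"lig_001", "lig_002", "lig_003", "lig_004", "lig_005"}
clearance_ids = {"lig_004", "lig_005", "lig_006", "lig_007", "lig_008", "lig_009"}


def overlap_report(a: set[str], b: set[str]) -> dict:
    """Summarize how many ligands have values in both datasets."""
    union = a | b   # every ligand that appears in at least one dataset
    both = a & b    # ligands usable for a question needing both properties
    return {
        "ligands_total": len(union),
        "ligands_with_both": len(both),
        "fraction_with_both": len(both) / len(union) if union else 0.0,
    }


report = overlap_report(solubility_ids, clearance_ids)
print(report)
```

Here only 2 of the 9 ligands carry both values; the remaining 7 rows of a joined table would each have a gap, which is the kind of missing entry that synthetic data from a model such as Syngand is proposed to fill.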

Contribution

The paper presents a novel diffusion GNN model, Syngand, capable of generating ligand and pharmacokinetic data end-to-end for drug discovery applications.
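The summary names the model class (a diffusion GNN) but not its noise process. For orientation only, the standard denoising diffusion probabilistic model (DDPM) formulation is shown below, assuming that the clean sample $x_0$ stacks ligand graph features with pharmacokinetic values and that the noise predictor $\epsilon_\theta$ is a GNN. Syngand's actual schedule, its treatment of discrete atom and bond types, and its loss may differ.

```latex
% Forward (noising) process with variance schedule \beta_1, \dots, \beta_T
q(x_t \mid x_{t-1}) = \mathcal{N}\!\left(x_t;\ \sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I\right)

% Closed form for jumping directly from x_0 to step t
\alpha_t = 1 - \beta_t, \qquad \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s, \qquad
q(x_t \mid x_0) = \mathcal{N}\!\left(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1-\bar{\alpha}_t) I\right)

% Simplified training objective: predict the injected noise \epsilon
\mathcal{L} = \mathbb{E}_{t,\, x_0,\, \epsilon \sim \mathcal{N}(0, I)}
  \left\| \epsilon - \epsilon_\theta\!\left(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\, \epsilon,\ t\right) \right\|^2
```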

Findings

1. Synthetic data improves downstream regression performance

2. Syngand effectively samples pharmacokinetic data for existing ligands

3. Initial results show promising efficacy in drug property prediction

Abstract

Artificial intelligence (AI) is increasingly used in every stage of drug development. Continuing breakthroughs in AI-based methods for drug discovery require the creation, improvement, and refinement of drug discovery data. We posit a new data challenge that slows the advancement of drug discovery AI: datasets are often collected independently from each other, often with little overlap, creating data sparsity. Data sparsity makes data curation difficult for researchers looking to answer key research questions that require values drawn from multiple datasets. We propose a novel diffusion GNN model, Syngand, capable of generating ligand and pharmacokinetic data end-to-end. We show and provide a methodology for sampling pharmacokinetic data for existing ligands using our Syngand model. We show initial promising results on the efficacy of the Syngand-generated synthetic target property…
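The abstract does not say how Syngand conditions on an existing ligand when sampling its pharmacokinetic values. One common way to fill in missing values with a diffusion model is replacement-based conditioning, as used in inpainting: at every reverse step, the known coordinates are overwritten with a correctly noised copy of their true values, so only the unknown coordinates are actually generated. The sketch below assumes a flat vector that concatenates a ligand representation with pharmacokinetic targets, a linear beta schedule, and a hypothetical `placeholder_denoiser` standing in for a trained noise predictor. It is not Syngand's method, which is a GNN and may condition differently.

```python
import numpy as np


def make_schedule(T: int = 50, beta_start: float = 1e-4, beta_end: float = 0.02):
    """Linear beta schedule (an assumption; the paper's schedule is not stated)."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def placeholder_denoiser(x_t: np.ndarray, t: int) -> np.ndarray:
    """Hypothetical stand-in for a trained noise predictor eps_theta(x_t, t).
    A real model (e.g. a GNN over the ligand graph) would go here."""
    return np.zeros_like(x_t)


def sample_properties(known, known_mask, denoiser, T: int = 50, seed: int = 0):
    """Replacement-based conditional sampling: coordinates where known_mask is
    True are clamped to noised known values each step; the rest are generated."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(T)
    x = rng.standard_normal(known.shape)  # start from pure noise
    for t in reversed(range(T)):
        eps_hat = denoiser(x, t)
        # DDPM reverse mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
            # Noise the known values to the same level as the new x, then overwrite.
            noise = rng.standard_normal(known.shape)
            known_t = np.sqrt(alpha_bars[t - 1]) * known + np.sqrt(1.0 - alpha_bars[t - 1]) * noise
        else:
            x = mean  # final step: no added noise
            known_t = known
        x = np.where(known_mask, known_t, x)
    return x


# Hypothetical layout: first 4 dims = fixed ligand features, last 2 = PK targets to sample.
ligand = np.array([0.3, -1.2, 0.8, 0.0, 0.0, 0.0])
mask = np.array([True, True, True, True, False, False])
sample = sample_properties(ligand, mask, placeholder_denoiser)
print(sample[:4])  # the fixed ligand features, returned unchanged
print(sample[4:])  # the sampled property values
```

Replacement conditioning needs no retraining, but the generated coordinates can end up only weakly coupled to the clamped ones; resampling schemes, or a model trained with explicit conditioning on the ligand, are common alternatives.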

Taxonomy

Topics: Computational Drug Discovery Methods · Analytical Chemistry and Chromatography

Methods: Diffusion