Synthetic Data from Diffusion Models Improve Drug Discovery Prediction
Bing Hu, Ashish Saragadam, Anita Layton, Helen Chen

TL;DR
This paper introduces Syngand, a diffusion GNN model that generates synthetic drug data to address data sparsity in drug discovery, enhancing the ability to perform cross-dataset analyses.
Contribution
The paper presents a novel diffusion GNN model, Syngand, capable of generating ligand and pharmacokinetic data end-to-end for drug discovery applications.
Findings
Synthetic data improves downstream regression performance
Syngand effectively samples pharmacokinetic data for existing ligands
Initial results show promising efficacy in drug property prediction
Abstract
Artificial intelligence (AI) is increasingly used in every stage of drug development. Continuing breakthroughs in AI-based methods for drug discovery require the creation, improvement, and refinement of drug discovery data. We posit a new data challenge that slows the advancement of drug discovery AI: datasets are often collected independently from each other, often with little overlap, creating data sparsity. Data sparsity makes data curation difficult for researchers looking to answer key research questions requiring values posed across multiple datasets. We propose a novel diffusion GNN model Syngand capable of generating ligand and pharmacokinetic data end-to-end. We show and provide a methodology for sampling pharmacokinetic data for existing ligands using our Syngand model. We show the initial promising results on the efficacy of the Syngand-generated synthetic target property…
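The data sparsity described in the abstract can be illustrated with a minimal sketch. It does not implement Syngand; it only shows how joining independently collected property datasets on ligand identity leaves most cells empty. The ligands, property names, values, and helper functions (`merge_sparse`, `fill_rate`, `complete_rows`) are all hypothetical and chosen for illustration.

```python
# Hypothetical toy datasets: ligand SMILES -> measured property value.
# The ligands and numbers are made up for illustration only.
solubility = {"CCO": -0.77, "c1ccccc1": -1.64, "CC(=O)O": 0.5}
clearance = {"CCO": 12.0, "CCN": 30.5}
half_life = {"CC(=O)O": 1.5, "CCCC": 4.0}


def merge_sparse(datasets):
    """Outer-join property tables on ligand; missing values become None."""
    ligands = sorted(set().union(*(d.keys() for d in datasets.values())))
    return {
        lig: {name: d.get(lig) for name, d in datasets.items()}
        for lig in ligands
    }


def fill_rate(table):
    """Fraction of (ligand, property) cells that hold a measured value."""
    cells = [v for row in table.values() for v in row.values()]
    return sum(v is not None for v in cells) / len(cells)


def complete_rows(table):
    """Ligands that have a value for every property."""
    return [
        lig for lig, row in table.items()
        if all(v is not None for v in row.values())
    ]


table = merge_sparse(
    {"solubility": solubility, "clearance": clearance, "half_life": half_life}
)
print(f"ligands: {len(table)}, fill rate: {fill_rate(table):.2f}")
print("ligands with all properties:", complete_rows(table))
```

In this toy case no ligand has all three properties measured, so a question that needs all of them at once cannot be answered from the merged table. The abstract proposes filling such gaps with generated values.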
Taxonomy
Topics: Computational Drug Discovery Methods · Analytical Chemistry and Chromatography
Methods: Diffusion
