Deep Generative Models for Discrete Genotype Simulation
Sihan Xie (GABI), Thierry Tribout (GABI), Didier Boichard (GABI), Blaise Hanczar (IBISC), Julien Chiquet (MIA Paris-Saclay), Eric Barrey (GABI)

TL;DR
This paper evaluates deep generative models such as VAEs, Diffusion Models, and GANs for simulating discrete genotype data, demonstrating their ability to capture genetic patterns and genotype-phenotype associations.
Contribution
It introduces adaptations of these models specifically for discrete genotype data and provides a comprehensive comparison and practical guidelines for future research.
Findings
Models effectively capture genetic patterns
Preserve genotype-phenotype associations
Offer practical guidelines for genotype simulation
Abstract
Deep generative models open new avenues for simulating realistic genomic data while preserving privacy and addressing data accessibility constraints. While previous studies have primarily focused on generating gene expression or haplotype data, this study explores generating genotype data in both unconditioned and phenotype-conditioned settings, which is inherently more challenging due to the discrete nature of genotype data. In this work, we developed and evaluated commonly used generative models, including Variational Autoencoders (VAEs), Diffusion Models, and Generative Adversarial Networks (GANs), and proposed adaptations tailored to discrete genotype data. We conducted extensive experiments on large-scale datasets, including all chromosomes from cow and multiple chromosomes from human. Model performance was assessed using a well-established set of metrics drawn from both deep…
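The abstract does not say what the proposed adaptations are, so the sketch below is only an illustration of why discreteness matters, not the authors' method. It assumes the common additive coding in which each SNP genotype is an allele count in {0, 1, 2}. The function names `one_hot_genotypes` and `decode_genotypes` are hypothetical, and the "model output" is a stand-in built from uniform noise.

```python
import numpy as np

N_CLASSES = 3  # assumption: genotype states are 0, 1 or 2 copies of the alternate allele


def one_hot_genotypes(genotypes: np.ndarray) -> np.ndarray:
    """Map an (n_samples, n_snps) integer matrix to (n_samples, n_snps, 3) one-hot."""
    return np.eye(N_CLASSES, dtype=np.float32)[genotypes]


def decode_genotypes(scores: np.ndarray) -> np.ndarray:
    """Turn per-class scores of shape (n_samples, n_snps, 3) back into allele counts."""
    return scores.argmax(axis=-1)


# Toy genotype matrix: 2 individuals, 3 SNPs.
genotypes = np.array([[0, 1, 2], [2, 2, 0]])
encoded = one_hot_genotypes(genotypes)

# Stand-in for a generative model's continuous output: the one-hot target
# plus bounded noise (not a trained model).
rng = np.random.default_rng(0)
scores = encoded + rng.uniform(0.0, 0.4, size=encoded.shape)

# A continuous output has to be mapped back to a valid discrete genotype.
decoded = decode_genotypes(scores)
```

Here the argmax step recovers the original allele counts because the noise is bounded well below the gap between classes. A real model's scores would carry no such guarantee.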
