Absorb & Escape: Overcoming Single Model Limitations in Generating Genomic Sequences
Zehui Li, Yuhao Ni, Guoxuan Xia, William Beardall, Akashaditya Das, Guy-Bart Stan, Yiren Zhao

TL;DR
This paper introduces Absorb & Escape (A&E), a method that combines autoregressive (AR) models and Diffusion Models (DMs) to generate more accurate, higher-quality synthetic genomic sequences, addressing the challenge posed by heterogeneous sequence regions.
Contribution
The paper proposes a post-training sampling technique that refines generative models by alternating between AR and Diffusion Model outputs, enhancing sequence realism.
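The summary does not spell out the switching rule, so the following is only an illustrative sketch of what alternating between a diffusion draft and an AR model could look like. It assumes a per-position confidence from the DM and a hypothetical `ar_step` callable; the names `absorb_escape_sketch`, `ar_step`, and `absorb_threshold`, and the threshold rule itself, are assumptions for illustration and not the authors' published algorithm.

```python
from typing import Callable, List, Sequence, Tuple


def absorb_escape_sketch(
    dm_tokens: Sequence[str],
    dm_probs: Sequence[float],
    ar_step: Callable[[List[str]], Tuple[str, float]],
    absorb_threshold: float = 0.5,
) -> List[str]:
    """Toy alternation between a diffusion draft and an AR model.

    dm_tokens / dm_probs: draft sequence and the DM's confidence per position.
    ar_step(prefix) -> (token, prob): hypothetical AR proposal for the next token.
    absorb_threshold: assumed cutoff below which the DM token is not trusted.
    """
    out: List[str] = []
    i, n = 0, len(dm_tokens)
    while i < n:
        if dm_probs[i] >= absorb_threshold:
            # DM is confident here: keep its token.
            out.append(dm_tokens[i])
            i += 1
            continue
        # "Absorb": the DM is unsure, so let the AR model write from here.
        while i < n:
            token, p_ar = ar_step(out)
            if p_ar < dm_probs[i]:
                # "Escape": the AR model is now less confident than the DM,
                # so keep the DM token and hand control back to the draft.
                out.append(dm_tokens[i])
                i += 1
                break
            out.append(token)
            i += 1
    return out


if __name__ == "__main__":
    # Stand-in AR model that always proposes "G" with confidence 0.6.
    def toy_ar(prefix: List[str]) -> Tuple[str, float]:
        return ("G", 0.6)

    draft = list("ACGTAC")
    confidence = [0.9, 0.2, 0.3, 0.8, 0.9, 0.9]
    print("".join(absorb_escape_sketch(draft, confidence, toy_ar)))
```

In this toy run the two low-confidence positions are rewritten by the stand-in AR model, and control returns to the draft once the DM's confidence exceeds the AR model's. The output always has the same length as the draft, since each step consumes exactly one position.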
Findings
A&E outperforms existing models in motif distribution accuracy.
The method improves diversity and genome integration of generated sequences.
Extensive experiments across 15 species validate the approach.
Abstract
Recent advances in immunology and synthetic biology have accelerated the development of deep generative methods for DNA sequence design. Two dominant approaches in this field are AutoRegressive (AR) models and Diffusion Models (DMs). However, genomic sequences are functionally heterogeneous, consisting of multiple connected regions (e.g., Promoter Regions, Exons, and Introns) where elements within each region come from the same probability distribution, but the overall sequence is non-homogeneous. This heterogeneous nature presents challenges for a single model to accurately generate genomic sequences. In this paper, we analyze the properties of AR models and DMs in heterogeneous genomic sequence generation, pointing out crucial limitations in both methods: (i) AR models capture the underlying distribution of data by factorizing and learning the transition probability but fail…
Taxonomy
Topics: Genomics and Phylogenetic Studies · Algorithms and Data Compression
Methods: Diffusion · Balanced Selection
