Continuous Diffusion Transformers for Designing Synthetic Regulatory Elements

Jonathan Liu; Kia Ghods

arXiv:2603.10885·cs.LG·March 12, 2026

Open Access

TL;DR

This paper introduces a parameter-efficient Diffusion Transformer for generating 200 bp cell-type-specific regulatory DNA sequences. It matches the best validation loss of the U-Net-based DNA-Diffusion model in 60× fewer epochs and memorizes less of the training data (1.7% versus 5.3% of generated sequences).

Contribution

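The authors replace the U-Net backbone of DNA-Diffusion with a transformer denoiser that has a 2D CNN input encoder. The model matches the U-Net's best validation loss in far fewer epochs, and DDPO fine-tuning with Enformer as the reward model raises the predicted regulatory activity of the generated sequences.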

Findings

01

The model matches the U-Net's best validation loss in 13 epochs, 60× fewer than the U-Net required, and converges to a 39% lower validation loss.

02

Memorization drops from 5.3% to 1.7% of generated sequences that align to training data under BLAT.

03

DDPO fine-tuning with Enformer as the reward model yields a 38× improvement in predicted regulatory activity.
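The headline numbers are easier to compare once the ratios are unpacked. The short calculation below is only a reader's sanity check, not something taken from the paper: it derives the U-Net epoch count implied by "60× fewer" (approximate, since the ratio is presumably rounded) and restates the memorization change in absolute and relative terms. All variable names are illustrative.

```python
# Figures quoted in the findings above.
dit_epochs = 13                 # epochs for the DiT to match the U-Net's best validation loss
epoch_ratio = 60                # "60x fewer" epochs
mem_unet, mem_dit = 5.3, 1.7    # % of generated sequences aligning to training data

# Implied U-Net epoch count (an inference from the rounded ratio, not a reported value).
implied_unet_epochs = dit_epochs * epoch_ratio

# Memorization change expressed three ways.
absolute_drop = mem_unet - mem_dit           # percentage points
relative_drop = absolute_drop / mem_unet     # fraction of the U-Net rate
fold_reduction = mem_unet / mem_dit          # how many times lower

print(f"implied U-Net epochs: ~{implied_unet_epochs}")
print(f"memorization: -{absolute_drop:.1f} points, "
      f"{relative_drop:.0%} relative, {fold_reduction:.1f}x lower")
```

Reading the memorization result as a roughly threefold reduction, rather than a 3.6-point drop, makes it easier to set beside the 60× and 38× figures.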

Abstract

We present a parameter-efficient Diffusion Transformer (DiT) for generating 200 bp cell-type-specific regulatory DNA sequences. By replacing the U-Net backbone of DNA-Diffusion with a transformer denoiser equipped with a 2D CNN input encoder, our model matches the U-Net's best validation loss in 13 epochs ($60\times$ fewer) and converges to a 39% lower loss, while reducing memorization from 5.3% to 1.7% of generated sequences that align to training data via BLAT. Ablations show the CNN encoder is essential: without it, validation loss increases 70% regardless of positional embedding choice. We further apply DDPO fine-tuning using Enformer as a reward model, achieving a $38\times$ improvement in predicted regulatory activity. Cross-validation against DRAKES on an independent prediction task confirms that improvements reflect genuine regulatory signal rather than reward model overfitting.
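To make "continuous diffusion" on DNA concrete, here is a minimal sketch of how a 200 bp sequence could be turned into a continuous 4 × 200 input and noised with the standard DDPM forward process, $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$. It is not the authors' implementation. The {-1, +1} one-hot scaling, the linear beta schedule, the 1000-step horizon, and every function name are assumptions chosen for illustration; the abstract does not specify them.

```python
import math
import random

ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a 4 x L matrix with entries in {-1.0, +1.0} (assumed scaling)."""
    return [[1.0 if base == row else -1.0 for base in seq] for row in ALPHABET]

def alpha_bar(t, num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 0..t under a linear beta schedule (assumed)."""
    prod = 1.0
    for s in range(t + 1):
        beta = beta_start + (beta_end - beta_start) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod

def add_noise(x0, t, num_steps, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    abar = alpha_bar(t, num_steps)
    signal, sigma = math.sqrt(abar), math.sqrt(1.0 - abar)
    return [[signal * v + sigma * rng.gauss(0.0, 1.0) for v in row] for row in x0]

def decode(x):
    """Map a (possibly noisy) 4 x L matrix back to a DNA string by per-column argmax."""
    length = len(x[0])
    return "".join(ALPHABET[max(range(4), key=lambda r: x[r][c])] for c in range(length))

rng = random.Random(0)
seq = "".join(rng.choice(ALPHABET) for _ in range(200))  # random stand-in for a 200 bp element
x0 = one_hot(seq)
x_early = add_noise(x0, t=0, num_steps=1000, rng=rng)    # almost clean
x_late = add_noise(x0, t=999, num_steps=1000, rng=rng)   # nearly pure Gaussian noise
```

A denoiser, whether U-Net or transformer, is trained to recover the noise (or the clean matrix) from inputs like `x_late`. Treating the 4 × 200 matrix as a small single-channel image is one plausible reason a 2D CNN input encoder fits naturally in front of the transformer, though the abstract does not spell out that design.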

Taxonomy

Topics: Genomics and Chromatin Dynamics · RNA Research and Splicing · RNA and protein synthesis mechanisms