Latent Diffusion Models for Controllable RNA Sequence Generation
Kaixuan Huang, Yukang Yang, Kaidi Fu, Yanyi Chu, Le Cong, Mengdi Wang

TL;DR
RNAdiffusion is a latent diffusion framework for controllable RNA sequence generation. It generates variable-length sequences and steers them toward desired functional properties using gradients from reward models.
Contribution
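This work develops RNAdiffusion, which combines a pretrained BERT-type encoder, a Query Transformer that compresses token-level representations into fixed-length latent vectors, an autoregressive decoder, and a continuous diffusion model over the latent space. Reward-model gradients are integrated into the backward diffusion process to optimize functional properties.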
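The reward-guided optimization mentioned here can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a standard DDPM-style reverse step, replaces the trained denoiser with an analytic noise predictor that is exact for standard-normal data, and uses a hypothetical quadratic reward whose gradient is known in closed form. All names (`toy_noise_predictor`, `toy_reward`, `guided_sample`, `guidance_scale`) are illustrative assumptions.

```python
import numpy as np

def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.05):
    """Linear noise schedule (an illustrative choice, not from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def toy_noise_predictor(z_t, alpha_bar_t):
    # Hypothetical stand-in for a trained denoiser. For data drawn from
    # N(0, I), the optimal noise prediction is sqrt(1 - alpha_bar_t) * z_t.
    return np.sqrt(1.0 - alpha_bar_t) * z_t

def toy_reward(z, target):
    # Hypothetical differentiable reward: larger when z is closer to target.
    return -0.5 * np.sum((z - target) ** 2, axis=-1)

def toy_reward_grad(z, target):
    # Closed-form gradient of toy_reward with respect to z.
    return target - z

def guided_sample(n, target, guidance_scale, num_steps=50, seed=0):
    """Run the reverse process, nudging each step along the reward gradient."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    z = rng.standard_normal((n, target.shape[0]))  # start from pure noise
    for t in range(num_steps - 1, -1, -1):
        eps_hat = toy_noise_predictor(z, alpha_bars[t])
        # Unguided DDPM posterior mean.
        mean = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        # Guidance: shift the mean along the reward gradient, scaled by the
        # step variance (betas[t]) and a user-chosen guidance strength.
        mean = mean + guidance_scale * betas[t] * toy_reward_grad(z, target)
        noise = rng.standard_normal(z.shape) if t > 0 else 0.0
        z = mean + np.sqrt(betas[t]) * noise
    return z
```

With `guidance_scale=0.0` the loop reduces to ordinary unguided sampling; larger values trade fidelity to the learned distribution for higher reward, which is the balance the findings below refer to.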
Findings
Generates RNA sequences matching natural distributions across biological metrics.
Fine-tunes the diffusion model on 5'-UTRs and optimizes generated sequences for high translation efficiency.
Outperforms baselines in balancing reward maximization and structural stability.
Abstract
This work presents RNAdiffusion, a latent diffusion model for generating and optimizing discrete RNA sequences of variable lengths. RNA is a key intermediary between DNA and protein, exhibiting high sequence diversity and complex three-dimensional structures to support a wide range of functions. We utilize pretrained BERT-type models to encode raw RNA sequences into token-level, biologically meaningful representations. A Query Transformer is employed to compress such representations into a set of fixed-length latent vectors, with an autoregressive decoder trained to reconstruct RNA sequences from these latent variables. We then develop a continuous diffusion model within this latent space. To enable optimization, we integrate the gradients of reward models--surrogates for RNA functional properties--into the backward diffusion process, thereby generating RNAs with high reward scores.…
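The compression step in the abstract, where a Query Transformer maps variable-length token representations to a fixed-length set of latent vectors, can be sketched with a single cross-attention layer. This is a minimal illustration under stated assumptions, not the paper's architecture: it uses one attention head, random weights, no feed-forward or normalization layers, and hypothetical names (`init_params`, `compress_to_latents`). The key idea it shows is that a fixed number of learned query vectors attend over however many tokens the sequence has, so the output shape does not depend on sequence length.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init_params(dim, num_latents, seed=0):
    """Random parameters standing in for learned ones (assumption)."""
    rng = np.random.default_rng(seed)
    scale = 1.0 / np.sqrt(dim)
    return {
        "queries": rng.standard_normal((num_latents, dim)),  # learned queries
        "W_k": rng.standard_normal((dim, dim)) * scale,      # key projection
        "W_v": rng.standard_normal((dim, dim)) * scale,      # value projection
    }

def compress_to_latents(token_emb, mask, params):
    """Cross-attend fixed queries over token embeddings.

    token_emb: (L, dim) token-level embeddings, e.g. from a pretrained encoder.
    mask:      (L,) boolean, True for real tokens and False for padding.
    Returns:   (num_latents, dim) regardless of L.
    """
    keys = token_emb @ params["W_k"]
    values = token_emb @ params["W_v"]
    scores = params["queries"] @ keys.T / np.sqrt(keys.shape[-1])  # (K, L)
    scores = np.where(mask[None, :], scores, -np.inf)  # ignore padding
    attn = softmax(scores, axis=-1)
    return attn @ values
```

Because the latent set has a fixed shape, a continuous diffusion model can operate on it directly, while the autoregressive decoder handles the mapping back to sequences of varying length.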
Taxonomy
Methods: Attention Is All You Need · Sparse Evolutionary Training · Linear Layer · Multi-Head Attention · Layer Normalization · Dense Connections · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing
