Designing RNAs with Language Models
Milan Gautam, Ning Dai, Tianshuo Zhou, Bowen Xie, David Mathews, Liang Huang

TL;DR
This paper introduces an approach to RNA design based on autoregressive language models, framing design as conditional sequence generation (target structure in, sequence out). The resulting method outperforms traditional optimization-based approaches in both efficiency and accuracy.
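Treating design as conditional generation means a model emits the sequence one nucleotide at a time. Each choice is conditioned on the target structure and on the prefix generated so far. The sketch below shows only that decoding loop and rests on several assumptions. Structures are given in dot-bracket notation. The trained model is replaced by a hypothetical uniform `toy_next_token_probs`. At each closing bracket, the sampler is restricted to nucleotides that can pair with the already-emitted partner (A-U, G-C, G-U). This masking is an illustrative choice, not necessarily what the authors do.

```python
import random

# Canonical RNA base pairs: Watson-Crick (A-U, G-C) plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
NUCS = "ACGU"


def pair_table(structure: str) -> list[int]:
    """Map each position of a dot-bracket string to its partner (-1 if unpaired)."""
    table = [-1] * len(structure)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            table[i], table[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError("unmatched '(' in structure")
    return table


def toy_next_token_probs(structure: str, prefix: list[str], i: int) -> dict[str, float]:
    """Hypothetical stand-in for a trained LM: uniform over nucleotides.

    A real model would condition on the whole target structure and the prefix.
    """
    return {n: 0.25 for n in NUCS}


def sample_sequence(structure: str, seed: int = 0) -> str:
    """Left-to-right sampling with pairing-compatibility masking at ')' positions."""
    rng = random.Random(seed)
    table = pair_table(structure)
    seq: list[str] = []
    for i in range(len(structure)):
        probs = toy_next_token_probs(structure, seq, i)
        j = table[i]
        if 0 <= j < i:
            # Partner already emitted: keep only nucleotides that can pair with it.
            probs = {n: p for n, p in probs.items() if (seq[j], n) in PAIRS}
        tokens = list(probs)
        # random.choices treats weights as relative, so the masked dict needs no rescaling.
        seq.append(rng.choices(tokens, weights=[probs[t] for t in tokens])[0])
    return "".join(seq)
```

Replacing `toy_next_token_probs` with a real structure-conditioned model would leave the loop unchanged. The mask only guarantees that paired positions are compatible. It does not guarantee that the sequence actually folds into the target, which is what the paper's end-to-end metrics measure.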
Contribution
The authors propose a neural language model for RNA design. It is first trained with supervision on structure-sequence pairs and then fine-tuned with reinforcement learning, which improves both performance and scalability.
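The reinforcement-learning stage can be illustrated with REINFORCE, a standard policy-gradient method. The summary does not name the RL algorithm the authors use, so REINFORCE here is a stand-in. To keep the example self-contained, two further simplifications are made. First, the "policy" is a table of independent per-position logits rather than a language model. Second, the reward is a hypothetical proxy: the fraction of target base pairs that are canonical. The paper instead optimizes end-to-end folding metrics, which would require a folding engine. All function names are illustrative.

```python
import math
import random

# Canonical RNA base pairs: Watson-Crick plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
NUCS = "ACGU"


def pair_list(structure):
    """Return (i, j) index pairs for a balanced dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs


def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    z = sum(exps)
    return [e / z for e in exps]


def toy_reward(seq, pairs):
    """Hypothetical proxy reward: fraction of target pairs that are canonical."""
    if not pairs:
        return 1.0
    return sum((seq[i], seq[j]) in PAIRS for i, j in pairs) / len(pairs)


def expected_reward(logits, pairs):
    """Exact expectation of toy_reward under the factorized per-position policy."""
    if not pairs:
        return 1.0
    probs = [softmax(row) for row in logits]
    total = 0.0
    for i, j in pairs:
        total += sum(probs[i][NUCS.index(a)] * probs[j][NUCS.index(b)] for a, b in PAIRS)
    return total / len(pairs)


def reinforce(structure, steps=300, batch=32, lr=1.0, seed=0):
    """REINFORCE with a batch-mean baseline on per-position categorical logits."""
    rng = random.Random(seed)
    pairs = pair_list(structure)
    logits = [[0.0] * 4 for _ in structure]
    for _ in range(steps):
        # Freeze the policy for this step; updates below use these probabilities.
        probs = [softmax(row) for row in logits]
        samples = [[rng.choices(range(4), weights=p)[0] for p in probs]
                   for _ in range(batch)]
        rewards = [toy_reward("".join(NUCS[k] for k in s), pairs) for s in samples]
        baseline = sum(rewards) / batch  # variance reduction
        for s, r in zip(samples, rewards):
            adv = r - baseline
            for pos, k in enumerate(s):
                for a in range(4):
                    # Gradient of log softmax w.r.t. logit a: 1[a == k] - p_a.
                    logits[pos][a] += lr * adv * ((a == k) - probs[pos][a]) / batch
    return logits
```

Under a uniform policy the expected proxy reward is 0.375, since 6 of the 16 ordered nucleotide pairs are canonical. Training should push it upward. With a real model, the same update would be applied to the log-probabilities of autoregressively sampled sequences rather than to a logit table.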
Findings
Outperforms the state of the art on key metrics
Achieves a 1.7x faster design process
Effective across four datasets
Abstract
RNA design, the task of finding a sequence that folds into a target secondary structure, has broad biological and biomedical impact but remains computationally challenging due to the exponentially large sequence space and exponentially many competing folds. Traditional approaches treat it as an optimization problem, relying on per-instance heuristics or constraint-based search. We instead reframe RNA design as conditional sequence generation and introduce a reusable neural approximator, instantiated as an autoregressive language model (LM), that maps target structures directly to sequences. We first train our model in a supervised setting on random-induced structure-sequence pairs, and then use reinforcement learning (RL) to optimize end-to-end metrics. We also propose methods to select a small subset for RL that greatly improves RL efficiency and quality. Across four datasets, our…
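One plausible reading of "random-induced structure-sequence pairs" is this: sample random sequences, fold each one, and use the folded structure as the model input and the sequence as the target. The sketch below assumes that reading. It uses Nussinov maximum base-pair folding, with a minimum hairpin loop of 3, as a toy stand-in for a thermodynamic folding engine. `nussinov` and `random_induced_pairs` are hypothetical names, not the authors' code.

```python
import random

# Canonical RNA base pairs: Watson-Crick plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> str:
    """Fold by maximizing canonical base pairs (Nussinov); returns dot-bracket."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]  # dp[i][j] = max pairs in seq[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: j is unpaired
            for k in range(i, j - min_loop):  # case: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    # Traceback: recover one optimal structure.
    struct = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if left + 1 + dp[k + 1][j - 1] == dp[i][j]:
                    struct[k], struct[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return "".join(struct)


def random_induced_pairs(num: int, length: int, seed: int = 0):
    """Sample random sequences and fold them into (structure, sequence) pairs."""
    rng = random.Random(seed)
    data = []
    for _ in range(num):
        seq = "".join(rng.choice("ACGU") for _ in range(length))
        data.append((nussinov(seq), seq))
    return data
```

For example, `nussinov("GGGAAACCC")` returns `"(((...)))"`. Nussinov counts pairs and ignores stacking and loop energies, so a realistic pipeline would substitute an energy-based folder. The appeal of this data-generation scheme is that every training pair is consistent with the folding model by construction.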
Taxonomy
Topics: RNA and protein synthesis mechanisms · Machine Learning in Materials Science · RNA Interference and Gene Delivery
