GoForth: Language Models for RNA Design under Structure, Sequence, and Coding Constraints
Michael Lindsey

TL;DR
GoForth is a forward-trained conditional RNA language model that efficiently generates sequences satisfying combined structural, sequence, and coding constraints.
Contribution
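The paper presents a forward-trained conditional RNA language model whose formulation separates three ingredients of RNA design (a sequence prior, a forward folding sampler, and a reward or likelihood oracle) and performs well on constrained inverse folding tasks.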
Findings
Achieves fast, high-quality candidate generation for complex RNA design tasks.
Provides semantic embeddings and a learned notion of designability.
Validates on inverse-folding benchmarks and constrained design tasks.
Abstract
RNA inverse sequence design has broad biological and engineering applications, but computational methods for practical design queries remain limited. Such queries may impose several constraints at once, including target folds or motifs, fixed bases, and coding restrictions, while leaving arbitrary sequence and structure in unspecified regions. Because these constraints may permit many acceptable sequences, we study RNA design as a conditional generative modeling problem. The basic object is a conditional law over RNA sequences given a user-specified condition, with full inverse folding as a special case. We introduce GoForth, a forward-trained RNA language model that conditions on structure, sequence, and coding targets. The formulation separates three ingredients that are often entangled in RNA design: a sequence prior, a forward folding sampler, and a reward or likelihood oracle. We…
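To make the kind of design query described in the abstract concrete, here is a minimal Python sketch of a query that combines a structure target, fixed bases, and a coding restriction, together with a check of whether a candidate sequence is compatible with it. Everything in it is an illustrative assumption rather than GoForth's interface: the function names, the use of `N` for a free base, the treatment of non-bracket structure characters as unspecified, and the deliberately tiny codon table. The structure check only tests that target pairs can form (Watson-Crick or G-U wobble); it does not fold the sequence, which is the role the abstract assigns to a forward folding sampler.

```python
# Hypothetical sketch of a constrained RNA design query; not the paper's API.

# Canonical Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

# Small subset of the standard genetic code, for illustration only.
CODONS = {"AUG": "M", "GCU": "A", "GCC": "A", "GCA": "A", "GCG": "A", "UAA": "*"}


def paired_positions(structure):
    """Return (i, j) index pairs from a dot-bracket string.

    Any character other than '(' or ')' is ignored here, so it can stand
    for an unpaired or an unspecified position.
    """
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    return pairs


def is_compatible(seq, structure, fixed, protein=None, cds_start=0):
    """Check a candidate sequence against a design query.

    seq       -- candidate RNA sequence over A, C, G, U
    structure -- target pairs in dot-bracket notation
    fixed     -- required bases, with 'N' marking a free position
    protein   -- optional amino-acid string the region must encode
    cds_start -- index in seq where the coding region begins
    """
    if not (len(seq) == len(structure) == len(fixed)):
        return False
    # Sequence constraint: every non-'N' position must match exactly.
    if any(f != "N" and f != s for s, f in zip(seq, fixed)):
        return False
    # Structure constraint (necessary condition only): target pairs can form.
    if any((seq[i], seq[j]) not in PAIRS for i, j in paired_positions(structure)):
        return False
    # Coding constraint: each codon must translate to the requested residue.
    if protein is not None:
        for k, aa in enumerate(protein):
            codon = seq[cds_start + 3 * k : cds_start + 3 * k + 3]
            if CODONS.get(codon) != aa:
                return False
    return True


hairpin_ok = is_compatible("GGGAAAUCC", "(((...)))", "NNNAAANNN")
coding_ok = is_compatible("AUGGCUUAA", ".........", "NNNNNNNNN", protein="MA*")
```

Because many sequences can pass such a check for a single query, a filter of this kind pairs naturally with a sampler that proposes candidates, which is consistent with the abstract's framing of design as conditional generation rather than a single-answer search.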
