Bridging Sequence-Structure Alignment in RNA Foundation Models
Heng Yang, Renzhi Chen, Ke Li

TL;DR
This paper introduces OmniGenome, an RNA foundation model that aligns RNA sequences with their secondary structures, enabling bidirectional mapping between the two and achieving state-of-the-art results on RNA and DNA tasks.
Contribution
OmniGenome is the first RNA foundation model to support flexible sequence-structure alignment and bidirectional mapping, improving RNA design and structure prediction capabilities.
Findings
OmniGenome solved 74% of the puzzles in the EternaV2 benchmark
Existing models solved at most 3% of the puzzles
Achieved state-of-the-art performance on RNA and DNA benchmarks
Abstract
The alignment between RNA sequences and structures in foundation models (FMs) has yet to be thoroughly investigated. Existing FMs have struggled to establish sequence-structure alignment, hindering the free flow of genomic information between RNA sequences and structures. In this study, we introduce OmniGenome, an RNA FM trained to align RNA sequences with respect to secondary structures based on structure-contextualised modelling. The alignment enables free and bidirectional mappings between sequences and structures by utilising the flexible RNA modelling paradigm that supports versatile input and output modalities, i.e., sequence and/or structure as input/output. We implement RNA design and zero-shot secondary structure prediction as case studies to evaluate the Seq2Str and Str2Seq mapping capacity of OmniGenome. Results on the EternaV2 benchmark show that OmniGenome solved 74% of…
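To make the RNA design (Str2Seq) setting more concrete, the sketch below checks whether a candidate sequence is compatible with a target secondary structure. It assumes the structure is written in dot-bracket notation, which the abstract does not specify, and the helper names `parse_dot_bracket` and `is_compatible` are hypothetical and not part of OmniGenome. Compatibility is only a necessary condition: a design puzzle counts as solved when the sequence actually folds into the target structure, which requires a folding tool and is not attempted here.

```python
# Illustrative sketch only; these helpers are hypothetical, not OmniGenome's API.
# Assumption: secondary structures are given in dot-bracket notation without pseudoknots.

# Watson-Crick pairs plus the G-U wobble pair.
CANONICAL_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def parse_dot_bracket(structure):
    """Return the sorted list of (i, j) base-pair indices encoded by a dot-bracket string."""
    stack = []
    pairs = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def is_compatible(sequence, structure):
    """Check that every paired position in the structure holds a canonical base pair."""
    if len(sequence) != len(structure):
        return False
    seq = sequence.upper()
    return all((seq[i], seq[j]) in CANONICAL_PAIRS
               for i, j in parse_dot_bracket(structure))


if __name__ == "__main__":
    target = "((...))"  # a two-base-pair hairpin, chosen for illustration
    print(is_compatible("GCAAAGC", target))  # candidate whose paired positions are G-C and C-G
    print(is_compatible("GAAAAGC", target))  # candidate with an A-G mismatch at one pair
```

A check like this can filter candidate sequences cheaply before a more expensive folding step decides whether the design really matches the target.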
Taxonomy
Topics: RNA and protein synthesis mechanisms · Genomics and Phylogenetic Studies
Methods: Sparse Evolutionary Training · ALIGN
