RiNALMo: General-Purpose RNA Language Models Can Generalize Well on Structure Prediction Tasks
Rafael Josip Peni\'c, Tin Vla\v{s}i\'c, Roland G. Huber, Yue Wan, Mile \v{S}iki\'c

TL;DR
RiNALMo is a large-scale RNA language model that effectively captures RNA structural information and generalizes well to unseen RNA families, advancing RNA understanding and structure prediction.
Contribution
The paper introduces RiNALMo, the largest RNA language model to date, demonstrating its ability to implicitly learn RNA structure and outperform existing methods on various tasks.
Findings
Achieves state-of-the-art results on multiple downstream tasks.
Generalizes well to unseen RNA families.
Implicitly captures RNA structural information.
Abstract
While RNA has recently been recognized as an interesting small-molecule drug target, many challenges remain to be addressed before we take full advantage of it. This emphasizes the necessity to improve our understanding of its structures and functions. Over the years, sequencing technologies have produced an enormous amount of unlabeled RNA data, which hides a huge potential. Motivated by the successes of protein language models, we introduce RiboNucleic Acid Language Model (RiNALMo) to unveil the hidden code of RNA. RiNALMo is the largest RNA language model to date, with 650M parameters pre-trained on 36M non-coding RNA sequences from several databases. It can extract hidden knowledge and capture the underlying structure information implicitly embedded within the RNA sequences. RiNALMo achieves state-of-the-art results on several downstream tasks. Notably, we show that its…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsRNA and protein synthesis mechanisms · Machine Learning in Bioinformatics · Genomics and Phylogenetic Studies
