Character-level Tokenizations as Powerful Inductive Biases for RNA Foundational Models
Adrián Morales-Pastor, Raquel Vázquez-Reza, Miłosz Wieczór, Clàudia Valverde, Manel Gil-Sorribes, Bertran Miquel-Oliver, Álvaro Ciudad, Alexis Molina

TL;DR
This paper introduces ChaRNABERT, an RNA foundational model built on learnable tokenization that achieves state-of-the-art results on several RNA-related tasks, addressing a significant gap in computational biology.
Contribution
The paper presents ChaRNABERT, a sample- and parameter-efficient RNA foundational model with learnable tokenization, improving performance on multiple RNA benchmarks and interaction prediction tasks.
Findings
Achieved state-of-the-art performance on RNA benchmarks.
Effective in RNA-protein and aptamer-protein interaction prediction.
Models are sample- and parameter-efficient.
Abstract
RNA is a vital biomolecule with numerous roles and functions within cells, and interest in targeting it for therapeutic purposes has grown significantly in recent years. However, fully understanding and predicting RNA behavior, particularly for applications in drug discovery, remains a challenge due to the complexity of RNA structures and interactions. While foundational models in biology have demonstrated success in modeling several biomolecules, especially proteins, achieving similar breakthroughs for RNA has proven more difficult. Current RNA models have yet to match the performance observed in the protein domain, leaving an important gap in computational biology. In this work, we present ChaRNABERT, a suite of sample- and parameter-efficient RNA foundational models that, through a learnable tokenization process, reach state-of-the-art performance on several tasks in…
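As the title indicates, the model starts from character-level input: each nucleotide is its own symbol before any learned tokenization is applied. The following is a minimal sketch of that input stage only, not the authors' implementation. The vocabulary, the special tokens, the T-to-U normalization, and the names `VOCAB`, `encode`, and `decode` are all hypothetical choices made for illustration, and the learnable grouping of characters into tokens is not shown.

```python
from typing import List, Optional

# Hypothetical character-level vocabulary (an assumption, not from the paper).
VOCAB = {"<pad>": 0, "<unk>": 1, "A": 2, "C": 3, "G": 4, "U": 5}
INV_VOCAB = {idx: sym for sym, idx in VOCAB.items()}


def encode(seq: str, max_len: Optional[int] = None) -> List[int]:
    """Map an RNA sequence to integer IDs, one ID per nucleotide."""
    # Assumption: upper-case the input and treat DNA-style T as U.
    seq = seq.upper().replace("T", "U")
    ids = [VOCAB.get(ch, VOCAB["<unk>"]) for ch in seq]
    if max_len is not None:
        # Truncate to max_len, then right-pad with <pad> if the sequence is shorter.
        ids = ids[:max_len] + [VOCAB["<pad>"]] * (max_len - len(ids))
    return ids


def decode(ids: List[int]) -> str:
    """Invert encode, dropping padding positions."""
    return "".join(INV_VOCAB[i] for i in ids if i != VOCAB["<pad>"])
```

With this scheme the sequence length in tokens equals the sequence length in nucleotides, so any coarser units would have to be formed by a downstream module rather than fixed in advance by the vocabulary.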
Taxonomy
Topics: RNA and protein synthesis mechanisms · DNA and Nucleic Acid Chemistry · Genomics and Chromatin Dynamics
