A Comparative Review of RNA Language Models
He Wang, Yikun Zhang, Jie Chen, Jian Zhan, Yaoqi Zhou

TL;DR
This paper compares 13 RNA language models from three classes on zero-shot RNA secondary structure prediction and functional classification, highlighting their strengths and weaknesses in each task and arguing that more balanced unsupervised training is needed.
Contribution
It categorizes RNA LMs into three classes and evaluates them against a common standard, using 3 DNA LMs and 1 protein LM as controls, which reveals how their performance varies across structure prediction and functional classification.
Findings
Models that do well on secondary structure prediction often perform worse in function classification, and vice versa.
This trade-off suggests that more balanced unsupervised training is needed for improved overall performance.
RNA LMs show diverse capabilities depending on their training focus.
Abstract
Given the usefulness of protein language models (LMs) in structure and functional inference, RNA LMs have received increased attention in the last few years. However, these RNA models are often not compared against the same standard. Here, we divided RNA LMs into three classes (LMs pretrained on multiple RNA types (especially noncoding RNAs), LMs pretrained on specific-purpose RNAs, and LMs that unify RNA with DNA or proteins or both) and compared 13 RNA LMs, along with 3 DNA LMs and 1 protein LM as controls, in zero-shot prediction of RNA secondary structure and functional classification. Results show that the models doing well on secondary structure prediction often perform worse in function classification or vice versa, suggesting that more balanced unsupervised training is needed.
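As an illustration of what evaluating frozen representations without fine-tuning can look like, the following is a minimal sketch of nearest-centroid functional classification over sequence embeddings. It is not the paper's protocol: the abstract does not describe the evaluation procedure in detail. Everything here is an assumption made for illustration. `kmer_embedding` is a hypothetical stand-in for a frozen RNA LM encoder, and the sequences and the labels `gc_rich` and `au_rich` are toy values, not real RNA families.

```python
from collections import Counter
from itertools import product
import math


def kmer_embedding(seq, k=2):
    """Hypothetical stand-in for a frozen LM encoder: normalized k-mer counts."""
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(sum(counts[m] for m in kmers), 1)  # avoid division by zero
    return [counts[m] / total for m in kmers]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_centroid_predict(embed, labeled, query):
    """Assign `query` the label whose mean embedding is most similar to it.

    The encoder `embed` is never updated; only class centroids are computed.
    """
    groups = {}
    for seq, label in labeled:
        groups.setdefault(label, []).append(embed(seq))
    centroids = {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in groups.items()
    }
    q = embed(query)
    return max(centroids, key=lambda label: cosine(q, centroids[label]))


# Toy labeled sequences (made up for illustration only).
toy_labeled = [
    ("GCGCGGCC", "gc_rich"),
    ("CCGGCGCG", "gc_rich"),
    ("AUAUUAAU", "au_rich"),
    ("UUAAUAUA", "au_rich"),
]

if __name__ == "__main__":
    print(nearest_centroid_predict(kmer_embedding, toy_labeled, "GGCGCCGC"))
```

Swapping `kmer_embedding` for a function that returns a pooled embedding from an actual LM would let the same routine compare models on equal footing, since the classifier itself has no trainable parameters.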
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning in Bioinformatics
