RNA-GPT: Multimodal Generative System for RNA Sequence Understanding
Yijia Xiao, Edward Sun, Yiqiao Jin, Wei Wang

TL;DR
RNA-GPT is a multimodal generative system that leverages large language models and extensive RNA literature to facilitate RNA sequence understanding and research, offering a scalable and automated approach.
Contribution
This paper introduces RNA-GPT, a novel multimodal RNA chat model that integrates RNA sequence encoders with LLMs and a new RNA-QA dataset for improved RNA research tools.
Findings
RNA-GPT effectively handles complex RNA queries.
The RNA-QA dataset contains 407,616 RNA samples for training.
The system streamlines RNA discovery and research processes.
Abstract
RNAs are essential molecules that carry genetic information vital for life, with profound implications for drug development and biotechnology. Despite this importance, RNA research is often hindered by the vast literature available on the topic. To streamline this process, we introduce RNA-GPT, a multi-modal RNA chat model designed to simplify RNA discovery by leveraging extensive RNA literature. RNA-GPT integrates RNA sequence encoders with linear projection layers and state-of-the-art large language models (LLMs) for precise representation alignment, enabling it to process user-uploaded RNA sequences and deliver concise, accurate responses. Built on a scalable training pipeline, RNA-GPT utilizes RNA-QA, an automated system that gathers RNA annotations from RNACentral using a divide-and-conquer approach with GPT-4o and latent Dirichlet allocation (LDA) to efficiently handle large…
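The alignment step named in the abstract (a sequence encoder joined to an LLM through a linear projection layer) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: the toy dimensions, the random weights standing in for trained parameters, the helper names `make_projection`, `project`, and `build_llm_input`, and the choice to prepend the projected RNA embeddings to the prompt embeddings.

```python
import random


def make_projection(d_enc, d_llm, seed=0):
    """Return hypothetical projection parameters: weights (d_llm x d_enc) and a zero bias.

    Random values stand in for parameters that would normally be learned.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(d_enc)] for _ in range(d_llm)]
    bias = [0.0] * d_llm
    return weights, bias


def project(embeddings, weights, bias):
    """Apply the linear map y = W x + b to each encoder embedding."""
    return [
        [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]
        for vec in embeddings
    ]


def build_llm_input(rna_embeddings, text_embeddings, weights, bias):
    """Project RNA embeddings into the LLM width and prepend them to the text embeddings.

    Prepending is an assumed ordering chosen for this sketch.
    """
    return project(rna_embeddings, weights, bias) + text_embeddings


# Toy sizes (assumptions): encoder width 4, LLM width 6.
D_ENC, D_LLM = 4, 6
weights, bias = make_projection(D_ENC, D_LLM)

# Three RNA positions from a hypothetical encoder, five prompt-token embeddings.
rna_embeddings = [[0.1 * (i + j) for j in range(D_ENC)] for i in range(3)]
text_embeddings = [[0.0] * D_LLM for _ in range(5)]

llm_input = build_llm_input(rna_embeddings, text_embeddings, weights, bias)
print(len(llm_input), len(llm_input[0]))  # 8 6
```

The point of the projection is only that the encoder and the LLM need not share an embedding width: after the linear map, every RNA position has the same dimensionality as a text token, so the two can be concatenated into one input sequence.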
Taxonomy
Topics: RNA and protein synthesis mechanisms
