RNAMunin: A Deep Machine Learning Model for Non-coding RNA Discovery
Lauren Lui, Torben Nielsen

TL;DR
RNAMunin is a scalable, fast deep learning model that identifies non-coding RNAs directly from genomic sequences without transcriptomics data, aiding comprehensive genome annotation.
Contribution
The paper introduces RNAMunin, an efficient ML model that detects ncRNAs in large-scale genomic data from sequence information alone.
Findings
Processes large metagenomic datasets efficiently
Detects ncRNAs without requiring transcription data
Operates with approximately 1 million parameters
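
To make "sequence alone" concrete, the sketch below shows one common way a contig could be turned into fixed-size numeric input for a sequence model: slide a window along the contig and one-hot encode each window. This is an illustration only. The source does not describe RNAMunin's actual input encoding or architecture, and the names `one_hot` and `windows`, the `ACGT` column order, and the window size and stride are all assumptions.

```python
from typing import Iterator, List, Tuple

# Column order for the one-hot encoding (an assumption for illustration).
ALPHABET = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA string as rows of 4 ints; non-ACGT bases (e.g. N) become all zeros."""
    rows = []
    for base in seq.upper():
        rows.append([1 if base == letter else 0 for letter in ALPHABET])
    return rows


def windows(contig: str, size: int, stride: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) for each full-length window along the contig."""
    for start in range(0, len(contig) - size + 1, stride):
        yield start, contig[start:start + size]


if __name__ == "__main__":
    # Toy contig; hypothetical window size and stride, not values from the paper.
    contig = "ACGTNACGTA"
    for start, sub in windows(contig, size=4, stride=3):
        print(start, sub, one_hot(sub))
```

Because each window is encoded independently, a scan of this kind can be batched and run over contigs of any length, which is the property that matters for multi-Gbp assemblies.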
Abstract
Functional annotation of microbial genomes is often biased toward protein-coding genes, leaving a vast, unexplored landscape of non-coding RNAs (ncRNAs) that are critical for regulating bacterial and archaeal physiology, stress response, and metabolism. Identifying ncRNAs directly from genomic sequence is a paramount challenge in bioinformatics and biology, essential for understanding the complete regulatory potential of an organism. This paper presents RNAMunin, a machine learning (ML) model that is capable of finding ncRNAs using genomic sequence alone. It is also computationally viable for large sequence datasets such as long-read metagenomic assemblies with contigs totaling multiple Gbp. RNAMunin is trained on Rfam sequences extracted from approximately 60 Gbp of long-read metagenomes from 16 San Francisco Estuary samples. We know of no other model that can detect ncRNAs based solely…
Taxonomy
Topics: RNA and protein synthesis mechanisms · Cancer-related molecular mechanisms research · RNA modifications and cancer
