A 5' UTR Language Model for Decoding Untranslated Regions of mRNA and Function Predictions
Yanyi Chu, Dan Yu, Yupeng Li, Kaixuan Huang, Yue Shen, Le Cong, Jason Zhang, Mengdi Wang

TL;DR
This paper introduces UTR-LM, a specialized language model trained on 5' UTR sequences that significantly improves predictions of translation-related metrics and aids in designing more efficient therapeutic mRNA sequences.
Contribution
The study develops and fine-tunes a novel language model for 5' UTRs, incorporating structural data, and demonstrates its superior performance in predicting translation efficiency and identifying functional regions.
Findings
Outperforms the best-known benchmark by up to 42% for Mean Ribosome Loading prediction and by up to 60% for Translation Efficiency and mRNA Expression Level prediction
Identifies unannotated internal ribosome entry sites with higher accuracy
Designs 5' UTRs that increase protein production by 32.5% in experiments
Abstract
The 5' UTR, a regulatory region at the beginning of an mRNA molecule, plays a crucial role in regulating the translation process and impacts the protein expression level. Language models have showcased their effectiveness in decoding the functions of protein and genome sequences. Here, we introduced a language model for 5' UTR, which we refer to as the UTR-LM. The UTR-LM is pre-trained on endogenous 5' UTRs from multiple species and is further augmented with supervised information including secondary structure and minimum free energy. We fine-tuned the UTR-LM in a variety of downstream tasks. The model outperformed the best-known benchmark by up to 42% for predicting the Mean Ribosome Loading, and by up to 60% for predicting the Translation Efficiency and the mRNA Expression Level. The model also applies to identifying unannotated Internal Ribosome Entry Sites within the untranslated…
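As a rough illustration of what pre-training a language model on 5' UTR sequences can involve, the sketch below prepares one sequence for a masked-nucleotide objective. The abstract does not spell out the pre-training objective, so the masking approach, the vocabulary, the 15% mask rate, the `-100` ignore label, and the function names `tokenize` and `mask_tokens` are all assumptions made for illustration, not the authors' implementation.

```python
import random

# Hypothetical vocabulary: two special tokens plus the four RNA nucleotides.
VOCAB = {"<pad>": 0, "<mask>": 1, "A": 2, "C": 3, "G": 4, "U": 5}
IGNORE_LABEL = -100  # assumed convention: positions excluded from the loss


def tokenize(seq):
    """Map a nucleotide string to token ids, treating DNA 'T' as RNA 'U'."""
    seq = seq.upper().replace("T", "U")
    return [VOCAB[nt] for nt in seq]


def mask_tokens(token_ids, mask_rate=0.15, seed=0):
    """Replace a random subset of tokens with <mask>.

    Returns (inputs, labels): labels hold the original token id at masked
    positions and IGNORE_LABEL everywhere else.
    """
    rng = random.Random(seed)
    n = len(token_ids)
    n_mask = max(1, round(n * mask_rate))
    positions = rng.sample(range(n), n_mask)

    inputs = list(token_ids)
    labels = [IGNORE_LABEL] * n
    for p in positions:
        labels[p] = token_ids[p]
        inputs[p] = VOCAB["<mask>"]
    return inputs, labels


if __name__ == "__main__":
    tokens = tokenize("GGGACATCGTAGAGAGTCGT")
    inputs, labels = mask_tokens(tokens)
    print(inputs)
    print(labels)
```

A model trained this way would be asked to recover the original nucleotide at each masked position. The supervised signals mentioned in the abstract (secondary structure and minimum free energy) would be additional training targets and are not covered by this sketch.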
Taxonomy
Topics: RNA and protein synthesis mechanisms · RNA modifications and cancer · Chemical Synthesis and Analysis
