Pre-trained protein language model for codon optimization

Shashank Pathak; Guohui Lin

arXiv:2412.10411 · q-bio.QM · December 17, 2024

TL;DR

This paper explores fine-tuning a pre-trained protein language model to optimize the codon sequences of mRNA open reading frames (ORFs), improving stability and expression metrics for vaccine applications.

Contribution

It introduces a novel approach that fine-tunes a pre-trained protein language model for codon optimization, with the aim of improving mRNA vaccine design.
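
The summary does not say how the fine-tuned model turns residues into codons. A minimal sketch, assuming the model outputs one score per codon (64 in total) at each residue, is greedy decoding restricted to synonymous codons, which guarantees the generated ORF still encodes the input protein. This is an illustration, not the authors' method: the score matrix stands in for a hypothetical classification head on top of the PPLM embeddings, and `decode_orf` and `translate` are hypothetical names.

```python
# Standard genetic code (NCBI table 1); codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TO_AA = dict(zip(CODONS, AMINO_ACIDS))
CODON_INDEX = {codon: i for i, codon in enumerate(CODONS)}

# Synonymous codon groups: amino acid -> codons that encode it (stop codons excluded).
SYNONYMS: dict[str, list[str]] = {}
for codon, aa in CODON_TO_AA.items():
    if aa != "*":
        SYNONYMS.setdefault(aa, []).append(codon)


def decode_orf(protein: str, logits: list[list[float]]) -> str:
    """Greedy synonymous-codon decoding (hypothetical helper).

    logits[i][j] is an assumed model score for codon CODONS[j] at residue i.
    Codons that do not encode protein[i] are ignored, so the returned ORF
    always translates back to `protein`.
    """
    if len(protein) != len(logits):
        raise ValueError("need exactly one score vector per residue")
    chosen = []
    for aa, scores in zip(protein, logits):
        if aa not in SYNONYMS:
            raise ValueError(f"unsupported residue: {aa!r}")
        # Highest-scoring codon among the synonyms of this residue only.
        chosen.append(max(SYNONYMS[aa], key=lambda c: scores[CODON_INDEX[c]]))
    return "".join(chosen)


def translate(orf: str) -> str:
    """Translate an ORF (DNA alphabet) back to its amino-acid sequence."""
    return "".join(CODON_TO_AA[orf[i:i + 3]] for i in range(0, len(orf), 3))


# Toy scores (hypothetical): position 1 favours GCC among the Ala codons and
# gives an even higher score to TTT (Phe), which the synonym mask must ignore.
scores = [[0.0] * 64 for _ in range(2)]
scores[1][CODON_INDEX["GCC"]] = 2.0
scores[1][CODON_INDEX["TTT"]] = 5.0
orf = decode_orf("MA", scores)
print(orf, translate(orf))  # ATGGCC MA
```

Restricting the argmax to synonymous codons makes protein identity a hard constraint rather than something the model must learn; a sampling or beam-search decoder could replace the greedy `max` under the same mask.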

Findings

1. Generated ORFs outperform their natural counterparts on computational stability and expression metrics.
2. They also outperform the benchmark ORFs used in mRNA vaccines for the SARS-CoV-2 spike protein and the varicella-zoster virus (VZV).
3. The results demonstrate potential for tailored ORF design in mRNA vaccines.
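
The summary does not name the computational metrics. As an assumption, the Codon Adaptation Index (CAI), a widely used proxy for expression, illustrates what such a metric computes: each codon gets a relative adaptiveness w, its usage divided by that of the most-used synonymous codon in a reference set of highly expressed genes, and CAI is the geometric mean of w along the ORF. The reference set, the 0.5 pseudocount, and the function names below are hypothetical; stability metrics such as predicted folding free energy require an RNA-folding tool and are not shown.

```python
import math
from collections import Counter

# Standard genetic code (NCBI table 1); codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TO_AA = dict(zip(CODONS, AMINO_ACIDS))

# Synonymous codon groups (stop codons excluded).
SYNONYMS: dict[str, list[str]] = {}
for codon, aa in CODON_TO_AA.items():
    if aa != "*":
        SYNONYMS.setdefault(aa, []).append(codon)


def split_codons(orf: str) -> list[str]:
    """Split an ORF into codons; accepts RNA (U) or DNA (T) alphabets."""
    seq = orf.upper().replace("U", "T")
    if len(seq) % 3:
        raise ValueError("ORF length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def relative_adaptiveness(reference_orfs: list[str], pseudocount: float = 0.5) -> dict[str, float]:
    """w(codon) = usage of the codon / usage of its most-used synonym in the reference.

    The pseudocount is an illustrative assumption that keeps unseen codons from w = 0.
    """
    counts = Counter()
    for orf in reference_orfs:
        counts.update(split_codons(orf))
    w = {}
    for codons in SYNONYMS.values():
        totals = {c: counts[c] + pseudocount for c in codons}
        best = max(totals.values())
        for c in codons:
            w[c] = totals[c] / best
    return w


def cai(orf: str, w: dict[str, float]) -> float:
    """Geometric mean of w, skipping stop codons and single-codon residues (Met, Trp)."""
    logs = [
        math.log(w[c])
        for c in split_codons(orf)
        if CODON_TO_AA[c] != "*" and len(SYNONYMS[CODON_TO_AA[c]]) > 1
    ]
    if not logs:
        raise ValueError("no informative codons in ORF")
    return math.exp(sum(logs) / len(logs))


w = relative_adaptiveness(["GCTGCTGCC"])  # toy "highly expressed" reference set
print(round(w["GCC"], 2))                 # 0.6
print(cai("AUGGCU", w))                   # 1.0 (Met is skipped)
print(round(cai("GCTGCC", w), 4))         # 0.7746
```

A CAI near 1 means the ORF uses the codons most favoured by the reference set; comparing a generated ORF with the natural one for the same protein under the same w is one way such "outperforms" claims can be checked.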

Abstract

Motivation: Codon optimization of Open Reading Frame (ORF) sequences is essential for enhancing mRNA stability and expression in applications such as mRNA vaccines, where codon choice can significantly affect protein yield and, in turn, the strength of the immune response. In this work, we investigate the use of a pre-trained protein language model (PPLM) to obtain rich amino-acid representations that can be used for codon optimization. This reduces ORF optimization to a simpler fine-tuning task on top of the PPLM.

Results: The ORFs generated by our proposed models outperformed their natural counterparts encoding the same proteins on computational metrics for stability and expression. They also outperformed the benchmark ORFs used in mRNA vaccines for the SARS-CoV-2 spike protein and the varicella-zoster virus (VZV). These results highlight the…

Taxonomy

Topics: Machine Learning in Bioinformatics · RNA and protein synthesis mechanisms · Genomics and Phylogenetic Studies