TL;DR
CodonMPNN generates organism-specific, codon-optimized DNA sequences conditioned on protein structures, aiming to improve expression yields while maintaining inverse folding performance.
Contribution
The paper introduces CodonMPNN, an approach for generating codon sequences conditioned on protein structure and organism, with the aim of improving expression efficiency in protein engineering.
Findings
CodonMPNN outperforms baselines in recovering wild-type codons.
It generates higher-fitness codon sequences more frequently.
It maintains the performance of previous inverse folding methods.
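The wild-type codon recovery mentioned in the findings can be illustrated with a minimal sketch. It assumes recovery is the fraction of positions where a predicted codon exactly matches the wild-type codon; the paper's exact evaluation protocol is not given here, and the function names `split_codons` and `codon_recovery` are hypothetical.

```python
def split_codons(dna: str) -> list[str]:
    """Split a coding DNA sequence into consecutive 3-nucleotide codons."""
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [dna[i:i + 3].upper() for i in range(0, len(dna), 3)]


def codon_recovery(predicted: str, wild_type: str) -> float:
    """Fraction of positions where the predicted codon equals the wild-type codon.

    Assumption: both sequences encode the same number of residues, so codons
    can be compared position by position.
    """
    pred, wt = split_codons(predicted), split_codons(wild_type)
    if len(pred) != len(wt):
        raise ValueError("sequences must contain the same number of codons")
    if not wt:
        raise ValueError("sequences must not be empty")
    matches = sum(p == w for p, w in zip(pred, wt))
    return matches / len(wt)
```

Note that two sequences can encode the same protein and still score below 1.0 here, because synonymous codons (for example GCT and GCC, both alanine) count as mismatches. That is what distinguishes codon recovery from amino acid recovery.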
Abstract
Generating protein sequences conditioned on protein structures is an impactful technique for protein engineering. To synthesize an engineered protein, its amino acid sequence is commonly translated into DNA and expressed in an organism such as yeast. One difficulty in this process is that expression rates can be low when the codon sequence is suboptimal for expressing the protein in the host organism. We propose CodonMPNN, which generates a codon sequence conditioned on a protein backbone structure and an organism label. If naturally occurring DNA sequences are close to codon optimality, CodonMPNN could learn to generate codon sequences with higher expression yields than heuristic codon choices for generated amino acid sequences. Experiments show that CodonMPNN retains the performance of previous inverse folding approaches and recovers wild-type codons more frequently than baselines. Furthermore, CodonMPNN…
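For contrast with the learned approach, one common kind of heuristic codon choice is to pick, for each amino acid, the codon used most often in the host organism. The sketch below illustrates that idea only; it is not CodonMPNN and is not necessarily the baseline used in the paper. The table `TOY_USAGE`, the organism labels, and the function `heuristic_back_translate` are hypothetical, and the counts are made-up placeholder values, not real codon usage data.

```python
# Hypothetical placeholder counts for illustration only (NOT real organism data).
# Layout: organism label -> amino acid -> {codon: usage count}.
TOY_USAGE = {
    "toy_host_a": {
        "M": {"ATG": 10},
        "A": {"GCT": 7, "GCC": 3},
        "K": {"AAA": 4, "AAG": 6},
    },
    "toy_host_b": {
        "M": {"ATG": 10},
        "A": {"GCT": 2, "GCC": 8},
        "K": {"AAA": 9, "AAG": 1},
    },
}


def heuristic_back_translate(protein: str, organism: str, usage=TOY_USAGE) -> str:
    """Back-translate a protein by choosing the most frequent codon per residue."""
    table = usage[organism]
    codons = []
    for aa in protein.upper():
        if aa not in table:
            raise KeyError(f"no codon usage entry for amino acid {aa!r}")
        counts = table[aa]
        # Pick the codon with the highest usage count for this amino acid.
        codons.append(max(counts, key=counts.get))
    return "".join(codons)
```

With these placeholder tables, the same protein `"MAK"` back-translates to different DNA for the two toy hosts, which is the organism dependence that the organism label is meant to capture. A per-residue heuristic like this ignores sequence and structural context, whereas the model described above conditions on the backbone structure.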
