De novo generation of functional terpene synthases using TpsGPT
Hamsini Ramanathan, Roman Bushuiev, Matouš Soldát, Jirí Kohout, Téo Hebra, Joshua David Smith, Josef Sivic, Tomáš Pluskal

TL;DR
This paper presents TpsGPT, a novel protein language model fine-tuned on TPS sequences, capable of generating functional terpene synthases de novo, validated through computational metrics and experimental activity assays.
Contribution
Introduces TpsGPT, the first scalable generative model for de novo TPS enzyme design, combining protein language modeling with rigorous validation and experimental confirmation.
Findings
Generated 28k candidate sequences with diverse structures.
Identified 7 promising TPS candidates satisfying all validation metrics.
Confirmed enzymatic activity in at least 2 de novo designed sequences.
Abstract
Terpene synthases (TPS) are a key family of enzymes responsible for generating the diverse terpene scaffolds that underpin many natural products, including front-line anticancer drugs such as Taxol. However, de novo TPS design through directed evolution is costly and slow. We introduce TpsGPT, a generative model for scalable TPS protein design, built by fine-tuning the protein language model ProtGPT2 on 79k TPS sequences mined from UniProt. TpsGPT generated de novo enzyme candidates in silico and we evaluated them using multiple validation metrics, including EnzymeExplorer classification, ESMFold structural confidence (pLDDT), sequence diversity, CLEAN classification, InterPro domain detection, and Foldseek structure alignment. From an initial pool of 28k generated sequences, we identified seven putative TPS enzymes that satisfied all validation criteria. Experimental validation…
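The screening step described in the abstract, where a candidate is kept only if it satisfies every validation criterion, can be sketched as a simple conjunction of checks. This is a minimal illustration, not the authors' pipeline: the `Candidate` fields, the function names, the toy records, and all three numeric thresholds are hypothetical, and the abstract does not state the cutoffs actually used. It assumes the per-sequence results from each tool (EnzymeExplorer, ESMFold, CLEAN, InterPro, Foldseek) have already been computed and collected.

```python
from dataclasses import dataclass

# Hypothetical cutoffs for illustration only; the abstract does not give the real values.
MIN_PLDDT = 70.0            # assumed minimum ESMFold structural confidence
MAX_TRAIN_IDENTITY = 0.90   # assumed ceiling on identity to known sequences (diversity check)
MIN_TM_SCORE = 0.50         # assumed minimum Foldseek alignment score to a known TPS structure


@dataclass(frozen=True)
class Candidate:
    """Precomputed validation results for one generated sequence (hypothetical schema)."""
    seq_id: str
    enzyme_explorer_tps: bool   # EnzymeExplorer classifies the sequence as a TPS
    plddt: float                # mean ESMFold pLDDT
    max_train_identity: float   # highest identity to any known sequence, in [0, 1]
    clean_tps: bool             # CLEAN assigns a TPS-compatible enzyme class
    interpro_tps_domain: bool   # InterPro detects a TPS domain
    foldseek_tm_score: float    # best Foldseek structural alignment score


def passes_all(c: Candidate) -> bool:
    """Return True only if the candidate satisfies every criterion."""
    return (
        c.enzyme_explorer_tps
        and c.plddt >= MIN_PLDDT
        and c.max_train_identity <= MAX_TRAIN_IDENTITY
        and c.clean_tps
        and c.interpro_tps_domain
        and c.foldseek_tm_score >= MIN_TM_SCORE
    )


def filter_candidates(candidates: list[Candidate]) -> list[Candidate]:
    """Keep the candidates that pass all validation checks, preserving order."""
    return [c for c in candidates if passes_all(c)]


# Toy records, invented purely to exercise the filter.
candidates = [
    Candidate("gen_001", True, 82.5, 0.61, True, True, 0.74),   # passes everything
    Candidate("gen_002", True, 55.0, 0.58, True, True, 0.70),   # fails the pLDDT check
    Candidate("gen_003", True, 88.0, 0.97, True, True, 0.81),   # fails the diversity check
]

shortlist = filter_candidates(candidates)
print([c.seq_id for c in shortlist])
```

Requiring every check to pass makes the filter strict by construction, which is consistent with the reported reduction from 28k generated sequences to seven putative enzymes, although the order in which the checks were applied is not stated in the abstract.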
Taxonomy
Topics: Plant biochemistry and biosynthesis · Microbial Natural Products and Biosynthesis · Computational Drug Discovery Methods
