# PHA synthase variant design using a conditional variational autoencoder

**Authors:** Tuula Tenkanen, Anna Ylinen, Paula Jouhten, Merja Penttilä, Sandra Castillo

PMC · DOI: 10.1371/journal.pcbi.1014087 · PLOS Computational Biology · 2026-03-19

## TL;DR

Researchers used a deep generative model (a conditional variational autoencoder) to design new PHA synthase enzymes. Two of the 16 designs tested were active and produced the bioplastic poly(hydroxybutyrate) in yeast.

## Contribution

This is the first use of a conditional variational autoencoder to design novel PHA synthase enzymes with functional activity.
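For readers unfamiliar with this model class, the generic conditional VAE training objective is sketched below. Here $x$ stands for an encoded protein sequence of length $L$, $c$ for the conditioning label, $q_\phi$ for the encoder and $p_\theta$ for the decoder. This is the textbook form, shown for illustration only. It assumes a standard normal prior $p(z) = \mathcal{N}(0, I)$ (the general formulation allows a condition-dependent prior $p(z \mid c)$) and a decoder that factorizes over sequence positions. The conditioning variable, architecture and any loss weighting actually used for the PHA synthase model are not given in this summary.

$$
\begin{aligned}
\mathcal{L}(\theta, \phi; x, c) &= \mathbb{E}_{q_\phi(z \mid x, c)}\big[\log p_\theta(x \mid z, c)\big] - D_{\mathrm{KL}}\big(q_\phi(z \mid x, c)\,\|\,p(z)\big), \\
\log p_\theta(x \mid z, c) &= \sum_{i=1}^{L} \log p_\theta(x_i \mid z, c).
\end{aligned}
$$

Training maximizes $\mathcal{L}$ over the native sequences. New variants are then generated by sampling $z \sim p(z)$ and decoding $p_\theta(x \mid z, c)$ under the desired condition $c$.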

## Key findings

- A conditional variational autoencoder was used to generate approximately 10 000 PHA synthase enzyme variants.
- Two of the 16 enzymes selected for in vivo validation were confirmed active and produced poly(hydroxybutyrate) (PHB) in the yeast Saccharomyces cerevisiae.
- The two active enzymes had 87 and 98 amino acid substitutions relative to the most similar native enzymes.
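The substitution counts above are measured against the most similar native enzyme. Below is a minimal sketch of how such a count could be computed. It assumes the designed and native sequences have already been aligned to equal length (for example by a global pairwise alignment), with `-` marking gaps. Gapped positions are simply skipped rather than scored. The function names, identifiers and toy sequences are hypothetical, and the paper's own alignment and similarity procedure may differ.

```python
from __future__ import annotations


def count_substitutions(designed: str, native: str, gap: str = "-") -> int:
    """Count aligned positions where both sequences have a residue and they differ.

    Assumes `designed` and `native` are pre-aligned to the same length.
    """
    if len(designed) != len(native):
        raise ValueError("sequences must be aligned to equal length")
    return sum(
        1
        for a, b in zip(designed, native)
        if a != gap and b != gap and a != b  # skip gaps, count mismatches
    )


def closest_native(designed: str, natives: dict[str, str]) -> tuple[str, int]:
    """Return (identifier, substitutions) for the native sequence with the fewest substitutions.

    `natives` maps a hypothetical identifier to a sequence aligned to `designed`.
    """
    best_id = min(natives, key=lambda k: count_substitutions(designed, natives[k]))
    return best_id, count_substitutions(designed, natives[best_id])


# Toy example with short, made-up aligned sequences.
variant = "MATGKAC-LW"
natives = {"native_A": "MATGRAC-LW", "native_B": "MSTGRVCHLW"}
print(closest_native(variant, natives))  # ('native_A', 1)
```

Ranking by fewest substitutions is a simplification. Sequence identity over the full alignment, which also penalizes gaps, is a common alternative.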

## Abstract

Polyhydroxyalkanoate (PHA) synthases are a group of complex, dimeric enzymes that catalyze the polymerization of R-hydroxyacids into PHAs. The properties of PHAs depend on their monomer composition, but naturally occurring enzymes are specific to only certain R-hydroxyacids. In this study, a conditional variational autoencoder was used for the first time to design novel PHA synthases. The model was trained with native protein sequences obtained from UniProt and was used to create approximately 10 000 new PHA synthase enzymes. Of these, 16 sequences were selected for in vivo validation. The selection criteria included the presence of conserved residues, such as catalytic amino acids and amino acids at the dimer interface, and structural features, such as the number of α-helices in the N-terminal part of the enzyme. Two of the 16 novel PHA synthases, which carried substantial numbers of amino acid substitutions (87 and 98) with respect to the most similar native enzymes, were confirmed active and produced poly(hydroxybutyrate) (PHB) when expressed in the yeast S. cerevisiae. The results show the power of AI-based methods to create active variants of highly complex dimeric enzymes.
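The selection step described in the abstract can be pictured as a filter over generated sequences. The sketch below makes three assumptions. First, each candidate is aligned to a reference alignment. Second, the required residues are given as a mapping from alignment column to amino acid. Third, a per-residue secondary-structure string (DSSP-style `H`/`E`/`C` labels from some external predictor) is available for counting N-terminal α-helices. The column indices, helix-length threshold and region length are hypothetical placeholders, not the catalytic or dimer-interface positions used in the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Candidate:
    name: str         # hypothetical identifier of a generated sequence
    aligned_seq: str  # sequence aligned to a reference alignment ('-' = gap)
    ss: str           # per-residue secondary structure: 'H' helix, 'E' strand, 'C' coil


def has_conserved_residues(aligned_seq: str, required: dict[int, str]) -> bool:
    """True if every required alignment column carries the expected residue."""
    return all(col < len(aligned_seq) and aligned_seq[col] == aa for col, aa in required.items())


def count_helices(ss: str, region_len: int, min_helix_len: int = 4) -> int:
    """Count runs of 'H' of at least `min_helix_len` in the first `region_len` positions."""
    helices, run = 0, 0
    for state in ss[:region_len] + "C":  # trailing sentinel closes a final helix run
        if state == "H":
            run += 1
        else:
            if run >= min_helix_len:
                helices += 1
            run = 0
    return helices


def passes_filters(c: Candidate, required: dict[int, str], region_len: int, expected_helices: int) -> bool:
    """Keep a candidate only if it has the conserved residues and the expected N-terminal helix count."""
    return has_conserved_residues(c.aligned_seq, required) and count_helices(c.ss, region_len) == expected_helices


# Hypothetical columns and toy sequences, for illustration only.
required = {3: "C", 7: "D", 9: "H"}
cands = [
    Candidate("v1", "MAKCLGSDAH", "CHHHHCHHHH"),
    Candidate("v2", "MAKSLGSDAH", "CHHHHCHHHH"),  # lacks the required 'C' at column 3
]
kept = [c.name for c in cands if passes_filters(c, required, region_len=10, expected_helices=2)]
print(kept)  # ['v1']
```

Keeping the two checks as separate functions makes it easy to add further criteria, such as dimer-interface residues, as additional entries in `required`.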

Enzymes found in nature are limited to those that have been beneficial for life during evolution. However, enzymes, as proteins whose function arises from their structure, are not inherently limited to those that exist in nature. Protein design therefore calls for intelligent methods that generate proteins that are expressed, fold, and are active. In this work we developed a deep generative model for PHA synthase variant design. Deep generative models generate new data that resembles their training data; we trained our model on natural polyhydroxyalkanoate (PHA) synthases to generate novel PHA synthase variants. PHA synthases polymerize various monomers into PHA, a material with potential as a replacement for oil-based plastics. We analyzed the activity of 16 novel PHA synthases we designed and found two of them to be active. The two active enzymes contained 87 and 98 amino acid substitutions compared with the closest native PHA synthases. Our work paves the way for the design of novel PHA synthase variants and other enzymes of application interest.
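To make "generate new data that resembles the training data" concrete, the sketch below shows the generation step of a conditional VAE. It draws a latent vector from a standard normal prior and concatenates it with a condition vector. A decoder then maps this input to per-position amino-acid probabilities, from which one residue per position is sampled. The decoder here is a single linear layer with random weights standing in for a trained network, and the one-hot condition is a hypothetical label. The dimensions, names and condition are assumptions, not the authors' model, so the output sequences are meaningless; only the control flow is illustrative.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues; gap symbol omitted for simplicity


def make_toy_decoder(latent_dim, cond_dim, seq_len, rng):
    """Random weights [position][amino acid][input] standing in for a trained decoder (hypothetical)."""
    in_dim = latent_dim + cond_dim
    return [[[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in AMINO_ACIDS] for _ in range(seq_len)]


def decode(z, cond, weights, rng):
    """Map [z, cond] to per-position logits, then sample one residue per position."""
    inp = z + cond  # concatenate latent vector and condition vector
    residues = []
    for position_weights in weights:
        logits = [sum(w * x for w, x in zip(aa_w, inp)) for aa_w in position_weights]
        top = max(logits)
        probs = [math.exp(l - top) for l in logits]  # unnormalized softmax, numerically stable
        residues.append(rng.choices(AMINO_ACIDS, weights=probs, k=1)[0])
    return "".join(residues)


def generate(n, cond, weights, latent_dim, rng):
    """Draw z ~ N(0, I) from the prior and decode each sample under the chosen condition."""
    return [decode([rng.gauss(0.0, 1.0) for _ in range(latent_dim)], cond, weights, rng) for _ in range(n)]


rng = random.Random(0)
latent_dim, cond_dim, seq_len = 8, 3, 30
weights = make_toy_decoder(latent_dim, cond_dim, seq_len, rng)
cond = [1.0, 0.0, 0.0]  # one-hot label for a hypothetical class
for seq in generate(3, cond, weights, latent_dim, rng):
    print(seq)
```

Sampling residues from the softmax, rather than taking the most likely residue at each position, is one of several possible decoding choices. Greedy decoding yields less diverse variants from the same latent samples.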

## Linked entities

- **Chemicals:** PHB (PubChem CID 135)

## Full-text entities

- **Genes:** LBR (lamin B receptor) [NCBI Gene 3930] {aka C14SR, DHCR14B, LMN2R, PHA, PHASK, TDRD18}, PGK1 (phosphoglycerate kinase) [NCBI Gene 850370], PAH (phenylalanine hydroxylase) [NCBI Gene 5053] {aka PH, PKU, PKU1}, EGH1 (hydrolase) [NCBI Gene 854824]
- **Chemicals:** Nile red (MESH:C044808), cellobiose (MESH:D002475), water (MESH:D014867), PHAs (MESH:D054813), glucose (MESH:D005947), lipids (MESH:D008055), TEG (MESH:C028914), LiAc (MESH:C488804), sulfuric acid (MESH:C033158), alanine (MESH:D000409), PHB (MESH:C000720856), NADH (MESH:D009243), xylose (MESH:D014994), 3-hydroxybutyric acid 1,3-13C2 (-), 3-hydroxybutyrate (MESH:D020155), carbon (MESH:D002244), methanol (MESH:D000432), oil (MESH:D009821), DMSO (MESH:D004121), amino acid (MESH:D000596), acid (MESH:D000143), 3-hydroxybutyryl-CoA (MESH:C030372), Chloroform (MESH:D002725)
- **Species:** Homo sapiens (human, species) [taxon 9606], Chromobacterium sp. (species) [taxon 306190], Saccharomyces cerevisiae (baker's yeast, species) [taxon 4932], Legionella shakespearei (species) [taxon 45075], Janthinobacterium lividum (species) [taxon 29581], Janthinobacterium sp. (species) [taxon 1871054], Cupriavidus necator (species) [taxon 106590], Brevundimonas sp. (species) [taxon 1871086], Legionella sp. (species) [taxon 459], Chromobacterium sp. USM2 (species) [taxon 611307]
- **Mutations:** C149S, serine is replaced with cysteine, E2611S

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13020758/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13020758/full.md

## References

64 references — full list in the complete paper: https://tomesphere.com/paper/PMC13020758/full.md

---
Source: https://tomesphere.com/paper/PMC13020758