Engineering of plant-derived P450scc for de novo biosynthesis of pregnenolone
Zikai Chao, Qihang Chen, Wenbao Zhao, Jianhong He, Qi Li, Tianqi Gao, Wenqian Wei, Song Liu, Jingwen Zhou, Weizhu Zeng, Sha Xu

TL;DR
This study engineered plant enzymes to efficiently produce pregnenolone, a steroid precursor, in yeast, achieving a major breakthrough in scalable biosynthesis.
Contribution
The study introduces a novel enzyme engineering strategy for plant-derived P450scc to enable high-efficiency pregnenolone biosynthesis in yeast.
Findings
Structure-guided engineering of DlCYP87A improved catalytic efficiency in microbial systems.
A 1.46 g/L pregnenolone titer was achieved in a 5-liter fermentation system.
This is the first gram-scale de novo biosynthesis of pregnenolone using engineered yeast.
Abstract
In contrast to the extensively researched animal CYP11A1 system, the catalytic mechanism of sterol side-chain cleavage by plant-derived cytochrome P450scc enzymes remains poorly understood. Through the integration of computational structural biology and enzyme channel engineering, this study successfully elucidated the key intermediates in the stepwise hydroxylation-cleavage catalytic process of Digitalis lanata-derived DlCYP87A enzyme. Building on this foundation, we implemented structure-guided rational design to precisely engineer the substrate channel and catalytic pocket, systematically delineating their structure-activity relationships, which ultimately overcame the critical catalytic bottleneck of low conversion efficiency in heterologous microbial systems expressing plant-derived P450scc. This study established an efficient steroid synthesis system in Saccharomyces cerevisiae…
1. Introduction
Steroid hormones such as estrogens and androgens are central regulators of reproduction, development, and systemic physiology [1,2]. They influence processes ranging from sexual differentiation and fertility to bone homeostasis, immune balance, and neural function [3]. Given these broad activities, steroid hormones are also major therapeutic agents in endocrinology, oncology, and hormone replacement therapy [4]. A unifying feature of their biosynthesis is the conversion of sterol substrates into pregnenolone (Prn), which is the universal precursor of all downstream steroid products [5,6]. Considering the difficulty in this transformation, side-chain cleavage is widely regarded as the rate-limiting step in steroid hormone biosynthesis [7]. Consequently, improving the efficiency of this process is essential for not only understanding steroidogenesis but also enabling large-scale industrial production of steroid intermediates and drugs [8,9].
Previous efforts to reconstruct Prn biosynthesis have focused almost exclusively on animal-derived P450scc systems [10]. In vertebrates, the side-chain cleavage complex consists of CYP11A1 together with its redox partners adrenodoxin (ADX) and adrenodoxin reductase (ADR) [11]. Mechanistic studies have provided detailed descriptions of substrate binding, oxygen activation, and formation of the highly reactive Compound I intermediate. Despite this abundant biochemical knowledge, translating the vertebrate system into microbial hosts such as Saccharomyces cerevisiae has proven difficult [12]. The need for mitochondrial targeting signals, proper electron transfer through [2Fe–2S] clusters, and balanced cofactor supply severely constrains activity in non-native environments [13]. Reported titers of Prn remain low, with previous studies achieving only tens of milligrams per liter and even optimized systems yielding less than one gram per liter under bioreactor conditions [11]. These limitations reflect both the structural complexity of CYP11A1 and the metabolic burden imposed on engineered microbial cells. As a result, vertebrate-derived pathways have struggled to meet the demands of industrial-scale production, thereby highlighting the need for alternative catalytic systems [14,15].
The discovery of plant-derived side-chain cleavage enzymes has opened promising new avenues [16]. Enzymes from the CYP87A family, which were first identified in cardiac glycoside–producing plants such as Digitalis, can cleave sterol side chains to yield Prn [17]. Unlike vertebrate CYP11A1, plant CYP87A enzymes operate with cytochrome P450 reductase (CPR) and localize to the endoplasmic reticulum rather than mitochondria [18]. This simpler architecture circumvents some of the challenges associated with electron transfer in microbial hosts. Despite these advantages of plant CYP87A enzymes, research publications and rational engineering efforts in this field have remained substantially underdeveloped compared to the extensive exploration of animal-derived P450scc systems since the initial discovery of CYP87A activity in 1972 [19]. Therefore, elucidating the heterologous adaptation mechanisms of plant P450scc in microbial systems is of critical importance for developing efficient steroid biosynthesis technology platforms [20].
In this study, we significantly enhanced the catalytic efficiency of DlCYP87A, a plant-derived P450scc from Digitalis lanata, through systematic substrate channel reconstruction and precision engineering of the catalytic pocket. By combining this with transcriptomics-guided optimization of yeast organelles, we achieved a pregnenolone (Prn) titer of 1.46 g/L in a 5-L bioreactor. This work represents the first gram-scale biosynthesis of Prn via a plant-derived P450scc pathway and establishes a key technological foundation for the industrial biomanufacturing of steroid compounds.
2. Materials and methods
2.1. Chemicals
Standards of pregnenolone, cholesterol, and campesterol were purchased from Macklin Reagent (Shanghai, China). Sodium sulfate (Na_2_SO_4_), sodium chloride (NaCl), dipotassium phosphate (K_2_HPO_4_), monopotassium phosphate (KH_2_PO_4_), tryptone, glucose, yeast extract, glycerol, isopropyl β-d-1-thiogalactopyranoside (IPTG), 5-aminolevulinic acid hydrochloride, hydrochloric acid (HCl), sulfuric acid (H_2_SO_4_), potassium hydroxide (KOH), and sodium hydroxide (NaOH) (all of analytical reagent grade) were purchased from Sangon Biotech (Shanghai, China).
2.2. Gene synthesis and plasmid construction
A Gibson Assembly kit (TransGen, Beijing, China) was used to insert the required gene fragments into the vector. Detailed information on the primers is provided in Supplementary Material 1. All primers and constructed plasmids were verified by Sanger sequencing (Sangon Biotechnology Co., Ltd., Shanghai, China). Codon-optimized sequences of DHCR7, DHCR24, CYP11A1, ADR, ADX, CYP87A, and CPR, along with the corresponding primers, were synthesized by Sangon Biotechnology Co., Ltd. (Tables S1 and S2). Escherichia coli JM109 (TransGen, Beijing, China) was used as the host for gene cloning and protein expression.
2.3. Culturing S. cerevisiae for Prn production
S. cerevisiae fermentation was categorized into two types: using free plasmids and genes integrated into the genome. When S. cerevisiae was fermented using the free plasmid pY26-1 with a uracil marker (Ura), Ura-deficient yeast nitrogen base (YNB) medium was used. The medium contained 20 g/L glucose, 1.74 g/L amino acid–free YNB, 5 g/L ammonium sulfate, 0.05 g/L leucine, 0.05 g/L histidine, 0.05 g/L tryptophan, and 0.05 g/L uracil. When S. cerevisiae was fermented by integrating genes into the genome, a yeast extract peptone dextrose (YPD) medium was used, consisting of 20 g/L peptone, 10 g/L yeast extract, and 20 g/L glucose. Shake flask cultivation experiments were performed using 250-mL flasks, with each flask containing 25 mL of YPD or YNB medium. The seeds cultivated for 17 h were inoculated at 1% (v/v) into 25 mL of YPD or YNB medium and then incubated at 220 rpm and 30 °C for 72 h.
For fermentation in a 5-L bioreactor, 2.5 L of YPD medium was used, containing 40 g/L glucose, 10 g/L yeast extract, 20 g/L peptone, and 1.2 g of FeSO_4_·7H_2_O (sterilized by filtration and added at the time of inoculation). An antifoaming agent was added at 0.1‰ (150 μL) before the medium was sterilized at 115 °C for 20 min, which prevented foaming during fermentation. The pH was maintained between 5.4 and 5.6 using 5 M NaOH throughout the process.
The feed medium (1.5 L) comprised 800 g/L glucose, 18 g/L KH_2_PO_4_, 10.24 g/L MgSO_4_·7H_2_O, 7 g/L K_2_SO_4_, and 0.56 g/L Na_2_SO_4_, and was sterilized at 115 °C for 20 min. Trace element solution A (metal ions) was added at 20 mL/L and trace element solution B (vitamins) at 24 mL/L, both sterilized by filtration. Additionally, 800 mL of nitrogen source medium containing 160 g of peptone and 80 g of yeast extract was prepared.
The seed culture for the fermentation tank was prepared as follows. A few fresh yeast colonies were picked with an inoculation loop and inoculated into a flask containing 10 mL of YPD medium, which was incubated at 220 rpm and 30 °C for 17–20 h to give the primary seed culture. The primary seed culture was then transferred at 1.5% into a 500-mL flask containing 200 mL of YPD medium and incubated at 220 rpm and 30 °C for an additional 17–20 h to give the secondary seed culture, which was used to inoculate the fermentation tank. The medium for the seed cultures was kept consistent with that used in the fermentation tank.
The fermentation process of Prn biosynthesis in the 5-L bioreactor was divided into three stages. In stage 1, the medium contained 40 g/L glucose, 10 g/L yeast extract, and 20 g/L tryptone, and fermentation was conducted at 220 rpm and 30 °C for about 16 h. In stage 2, after the initial 40 g/L glucose was consumed, the 800 g/L glucose feed was added at a rate of 10 mL/h from 16 to 72 h. In stage 3, the glucose feed rate was reduced to 2 mL/h from 72 to 144 h so that ethanol accumulation in the fermentation system did not impair cell growth. Nitrogen sources were added after 72 h to promote cell growth and product accumulation.
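The feeding schedule above can be turned into a quick bookkeeping check. The following is a minimal sketch, assuming the stage boundaries are exactly 16, 72, and 144 h and that each feed rate is constant within its stage; the function and variable names are illustrative and not taken from the study.

```python
# Fed-batch glucose feed schedule, built only from the rates and times in the text.
# Assumptions: stage boundaries are exact (16, 72, 144 h) and rates are constant.
FEED_GLUCOSE_G_PER_L = 800.0  # glucose concentration of the feed medium

# (start_h, end_h, rate_mL_per_h); stage 1 (0-16 h) is the batch phase with no feed
FEED_STAGES = [
    (16, 72, 10.0),   # stage 2
    (72, 144, 2.0),   # stage 3
]


def feed_volume_ml(stages):
    """Total feed volume (mL) delivered over all stages."""
    return sum((end - start) * rate for start, end, rate in stages)


def glucose_fed_g(stages, conc_g_per_l=FEED_GLUCOSE_G_PER_L):
    """Total glucose (g) delivered by the feed."""
    return feed_volume_ml(stages) / 1000.0 * conc_g_per_l


total_ml = feed_volume_ml(FEED_STAGES)
total_g = glucose_fed_g(FEED_STAGES)
print(f"feed volume: {total_ml:.0f} mL, glucose fed: {total_g:.1f} g")
```

Under these assumptions the schedule delivers roughly 704 mL of feed, which stays within the 1.5 L of feed medium prepared.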
2.4. Extraction of pregnenolone from S. cerevisiae
The culture broth (1 mL) was sampled, transferred to a 2-mL crushing tube, and centrifuged at 4000 rpm for 20 min, after which the supernatant was discarded. A saponification solution of 30% potassium hydroxide in 90% ethanol was prepared, and 1 mL of this solution was added to the 2-mL MP crushing tube containing the yeast cells. The mixture was heated at 88 °C for 3 h to allow reflux saponification. The saponified mixture was then transferred into a 5-mL centrifuge tube, and 1 mL of water and 1 mL of ether were added. The mixture was vigorously shaken for 10 min and centrifuged at 12,000 rpm and 4 °C for 20 min. Then, 500 μL of the supernatant was collected for subsequent analysis using gas chromatography, HPLC, and HPLC coupled with mass spectrometry (HPLC-MS). Methanol-d4 was used as the solvent for nuclear magnetic resonance (NMR) analysis, whereas all other conditions remained the same.
2.5. Phylogenetic analysis of CYP87A
All amino acid sequences of CYP87A homologs were downloaded from the NCBI database and saved in FASTA format. Multiple-sequence alignment of CYP87A from various species was conducted using ClustalX 2.0. Phylogenetic trees were constructed by the neighbor-joining method in Molecular Evolutionary Genetics Analysis (MEGA) 11.0.
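Neighbor-joining starts from a matrix of pairwise distances between aligned sequences. The sketch below shows one simple way such a distance can be computed; the text does not state which substitution model was used, so the p-distance (fraction of differing sites, ignoring gapped columns) is an assumption, and the two short sequences are hypothetical placeholders rather than CYP87A data.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of differing residues between two aligned sequences.

    Columns where either sequence has a gap are skipped (pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    differences = sum(1 for x, y in pairs if x != y)
    return differences / len(pairs)


# Hypothetical aligned fragments, for illustration only
aligned_a = "MKT-LLA"
aligned_b = "MKTALLV"
distance = p_distance(aligned_a, aligned_b)
print(f"p-distance: {distance:.3f}")
```

A full tree would require this distance for every pair of homologs, followed by the neighbor-joining agglomeration that MEGA performs internally.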
2.6. Molecular docking and molecular dynamics simulation
The structure of DlCYP87A was obtained using AlphaFold3. Moreover, all DlCYP87A variant structures were rebuilt in SWISS-MODEL using the wild-type protein structure as the template. The molecular model of campesterol was generated using ChemSketch. Docking of the substrate with DlCYP87A was performed using AutoDock Vina. Campesterol was docked into the active site of DlCYP87A by defining the docking box within a range of 7 Å above the heme center. A total of 50,000 snapshots were uniformly extracted from the 100-ns MD simulations at 2-ps time intervals and grouped into 10 clusters by a bottom-up hierarchical agglomerative method. The optimized substrate was docked into the active site of representative snapshots from each cluster to simulate the ligand–protein complex. Molecular docking was performed using the Lamarckian genetic algorithm with a local search, keeping the receptor rigid while allowing the substrate to rotate freely around all its rotatable bonds. A total of 500 independent docking runs were conducted. The resulting 500 conformations were clustered at an RMSD of 2.0 Å and ranked using the energy scoring function. Possible catalytically active binding modes were selected as the initial conformations for protein–substrate complex MD simulations based on the scoring function and reasonable conformations. Generally, the conformation with the lowest binding free energy (kcal/mol) was selected as the optimal docking result.
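The pose-clustering step described above (conformations grouped at an RMSD of 2.0 Å and ranked by energy) can be illustrated with a simplified sketch. This is a leader-style clustering written for illustration, not the docking program's own routine; the three two-atom poses and their energies are hypothetical, and no structural superposition is applied.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two poses given as equal-length lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must have the same non-zero number of atoms")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def cluster_poses(poses, cutoff=2.0):
    """Group poses by RMSD to each cluster's lowest-energy seed.

    poses: list of (energy_kcal_per_mol, coords). Returns lists of pose
    indices; clusters and their members are ordered by increasing energy.
    """
    order = sorted(range(len(poses)), key=lambda i: poses[i][0])
    clusters = []
    for i in order:
        for members in clusters:
            seed = members[0]
            if rmsd(poses[i][1], poses[seed][1]) <= cutoff:
                members.append(i)
                break
        else:
            clusters.append([i])  # pose starts a new cluster
    return clusters


# Hypothetical poses (energy, coordinates in angstroms)
poses = [
    (-7.1, [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]),
    (-8.3, [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)]),
    (-6.0, [(5.0, 0.0, 0.0), (6.0, 0.0, 0.0)]),
]
clusters = cluster_poses(poses)
print(clusters)
```

The first index of the first cluster is the lowest-energy pose, which corresponds to the selection rule stated in the text.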
The ligand coordinates were optimized in Gaussian 09 at the B3LYP level of theory with the 6-31G∗ basis set. The MD parameters of the ligand were generated using the Antechamber tool under the General Amber Force Field, with charges fitted using the RESP method. The protein topology was described by the Amber 99SB-ILDN all-atom force field. After energy minimization and NVT and NPT equilibration, each protein–ligand system was subjected to a 100-ns MD simulation in GROMACS 2021.5 under periodic boundary conditions. The system temperature was maintained at 298 K using the Nosé–Hoover thermostat. Nonbonded van der Waals interactions were treated with a switching function from 1.2 to 1.35 nm. Long-range electrostatic interactions were calculated using the Particle Mesh Ewald method with a cutoff of 1.2 nm separating real and reciprocal space. Bond length constraints were applied using the Linear Constraint Solver (LINCS) algorithm. The protein–ligand complexes were solvated in an SPC water box containing sodium and chloride ions. The integration step was set to 2 fs, and data were saved every 4 ps. Simulation snapshots were visualized using VMD software. Atomic distances were calculated for the 0- to 100-ns trajectories by selecting atoms through index files and using the “gmx distance” command. Residue binding free energies of CYP87A between 80 and 100 ns were calculated using the gmx_MMPBSA toolkit, and the averaged values were taken as the final results.
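The simulation settings above imply specific step and frame counts, which can be checked with simple integer arithmetic. This is a minimal sketch using only the values quoted in the text (100 ns, 2 fs step, output every 4 ps, 80 to 100 ns analysis window, and the 2-ps spacing quoted for the clustering step); the helper names are illustrative.

```python
FS_PER_PS = 1_000
PS_PER_NS = 1_000


def n_steps(length_ns, timestep_fs):
    """Number of integration steps for a run of the given length."""
    return length_ns * PS_PER_NS * FS_PER_PS // timestep_fs


def n_frames(length_ns, interval_ps):
    """Number of output intervals in a span (the t = 0 frame is not counted)."""
    return length_ns * PS_PER_NS // interval_ps


steps = n_steps(100, 2)                # 100-ns run with a 2-fs step
frames_total = n_frames(100, 4)        # output written every 4 ps
frames_window = n_frames(100 - 80, 4)  # 80-100 ns window used for MM/PBSA
frames_2ps = n_frames(100, 2)          # 2-ps spacing quoted for clustering

print(steps, frames_total, frames_window, frames_2ps)
```

With a 4-ps output interval the 80 to 100 ns window contains 5,000 frames, which is the set over which the binding free energies would be averaged.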
2.7. Gas chromatography and HPLC-MS analysis for fermentation products
Fermentation products, including campesterol and Prn, were quantified using gas chromatography (GC). GC-MS analysis was performed on a Shimadzu TQ8050 NX instrument (Shimadzu, Japan) equipped with a flame ionization detector (FID) and an AI1310 autosampler, using an HP-5 MS column (30 m length, 0.25 mm internal diameter, 0.25 μm film thickness). Helium was used as the carrier gas at a flow rate of 1 mL/min. The injector temperature was set at 300 °C, and the FID temperature was 280 °C. The oven temperature program was as follows: held at 220 °C for 1 min, increased at a rate of 20 °C/min to 300 °C, and then held for 5 min.
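The oven program above fixes the GC run time per injection. The following is a minimal sketch of that arithmetic, using only the hold times, ramp rate, and temperatures given in the text; the function names are illustrative.

```python
def oven_runtime_min(initial_hold, start_c, ramp_c_per_min, end_c, final_hold):
    """Total oven program time in minutes: hold + linear ramp + hold."""
    ramp_time = (end_c - start_c) / ramp_c_per_min
    return initial_hold + ramp_time + final_hold


def oven_temp_c(t_min, initial_hold=1.0, start_c=220.0,
                ramp_c_per_min=20.0, end_c=300.0):
    """Oven temperature (deg C) at time t_min for the program in the text."""
    if t_min <= initial_hold:
        return start_c
    return min(end_c, start_c + ramp_c_per_min * (t_min - initial_hold))


runtime = oven_runtime_min(1.0, 220.0, 20.0, 300.0, 5.0)
print(f"run time: {runtime:.1f} min, oven at 3 min: {oven_temp_c(3.0):.0f} C")
```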
HPLC-MS analysis was carried out on an Agilent 1290II-6460 system using a C18 column (Agilent EP-C18, 2.1 × 50 mm, 1.8 μm). Mobile phase A was acetonitrile, and mobile phase B was a 0.5% trifluoroacetic acid aqueous solution. The liquid chromatography conditions were as follows: flow rate, 1 mL/min; column temperature, 30 °C; and detection wavelengths, 241 and 254 nm. The gradient elution program was as follows: 0–15 min, 5% A/95% B → 80% A/20% B; 15–20 min, 80% A/20% B; 20–30 min, 80% A/20% B → 5% A/95% B. The total runtime for each sample was 30 min.
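The gradient program above gives only the endpoints of each segment. The sketch below interpolates the mobile-phase composition at any time point, assuming each ramp is linear (the text does not state the ramp shape); the table and function names are illustrative.

```python
# (time_min, percent_A) breakpoints taken from the gradient program in the text
GRADIENT = [(0.0, 5.0), (15.0, 80.0), (20.0, 80.0), (30.0, 5.0)]


def percent_a(t_min, table=GRADIENT):
    """Percent mobile phase A at t_min, assuming linear ramps between breakpoints."""
    if not table[0][0] <= t_min <= table[-1][0]:
        raise ValueError("time is outside the gradient program")
    for (t0, a0), (t1, a1) in zip(table, table[1:]):
        if t0 <= t_min <= t1:
            return a0 + (a1 - a0) * (t_min - t0) / (t1 - t0)


def percent_b(t_min, table=GRADIENT):
    """Percent mobile phase B; A and B always sum to 100%."""
    return 100.0 - percent_a(t_min, table)


for t in (0.0, 7.5, 17.0, 25.0, 30.0):
    print(f"{t:5.1f} min: {percent_a(t):5.1f}% A / {percent_b(t):5.1f}% B")
```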
The MS acquisition parameters were as follows: ion source, ESI mode; MS2 scan polarity, positive; m/z range, 50–600; ion source temperature, 350 °C; nebulizer gas flow rate, 10 L/min; pressure, 45 psi; capillary voltage, 4000 V; fragmentor voltage, 100 V. Purified samples (20 mg/mL) were dissolved in methanol-d4 and analyzed by NMR spectroscopy using a Bruker Avance III 600 MHz spectrometer (Bruker BioSpin, Karlsruhe, Germany). 1H NMR spectra were recorded at 500 MHz, and 13C NMR spectra were recorded at 126 MHz, with methanol-d4 as the solvent. The experimental data were processed and analyzed using MestReNova software (version 14.0).
2.8. RNA-seq analysis
Total RNA was extracted from S. cerevisiae strain C800 (CEN.PK2-1D; MATα; ura3-52; his3Δ1; trp1-289; leu2-3,112; MAL2-8C; SUC2; gal80::KanMX) and its engineered derivatives using the TRIzol reagent (Invitrogen, USA) following the manufacturer's protocols. The RNA integrity and purity were assessed by agarose gel electrophoresis and with a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific, USA). RNA integrity numbers (RINs) were determined using an Agilent 2100 Bioanalyzer (Agilent Technologies, USA). Only RNA samples with RIN values greater than 7.0 were used for subsequent library preparation.
Sequencing libraries were generated using the Illumina TruSeq RNA sample preparation kit (Illumina, USA) following the standard protocol. Briefly, poly(A)-mRNA was enriched with oligo(dT) magnetic beads, fragmented into ∼200-bp fragments, and reverse-transcribed into cDNA. After end repair, A-tailing, and adaptor ligation, the cDNA fragments were amplified using PCR to complete the library construction.
Transcriptome sequencing of three representative samples (C800, M-4, and CA-26-10) was performed on the Illumina NovaSeq 6000 platform at Tsingke Biotechnology Co., Ltd. (Beijing, China). A total of 122,816,864 raw reads were generated, averaging 40,938,955 raw reads per sample. Following quality control with FastQC and Trimmomatic, an average of ∼40.25 million clean reads were retained for each sample group. The proportion of low-quality bases and adaptor contamination was negligible. Q30 values (base error rate <10^−3^) exceeded 98% across all datasets, confirming that the sequencing results were highly accurate and reliable. Clean reads were mapped to the reference genome of S. cerevisiae CEN.PK2-1D using HISAT2 (version 2.2.1). The gene expression levels were quantified in fragments per kilobase of transcript per million mapped reads (FPKM) using StringTie (version 2.1.4). The resulting high-quality transcriptome datasets provided a reliable foundation for downstream analyses, including differential expression analysis, functional annotation, and pathway enrichment.
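The read statistics and the FPKM unit mentioned above can be made concrete with a short calculation. This is a minimal sketch: the read totals come from the text, the FPKM function follows the standard definition (fragments per kilobase of transcript per million mapped fragments), and the example gene counts are hypothetical.

```python
TOTAL_RAW_READS = 122_816_864    # from the text
N_SAMPLES = 3                    # from the text
CLEAN_READS_PER_SAMPLE = 40.25e6 # approximate value quoted in the text

mean_raw = TOTAL_RAW_READS / N_SAMPLES
clean_fraction = CLEAN_READS_PER_SAMPLE / mean_raw


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Hypothetical gene: 500 fragments on a 1,500-bp transcript, 40 million mapped
example_fpkm = fpkm(500, 1_500, 40_000_000)

print(f"mean raw reads per sample: {mean_raw:,.0f}")
print(f"fraction retained after QC: {clean_fraction:.1%}")
print(f"example FPKM: {example_fpkm:.2f}")
```

StringTie applies its own assembly and abundance model before reporting FPKM, so this function only illustrates the unit, not the tool's internal computation.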
3. Results
3.1. Screening P450scc for constructing the optimal synthesis pathway of Prn
We employed engineered strains D7-8 [21] and CA-26 [22] (constructed in Prof. Jingwen Zhou's laboratory) to systematically evaluate the substrate specificity of animal- and plant-derived P450scc systems toward cholesterol and campesterol pathways (Fig. 1a). These strains achieved de novo biosynthesis of 438.28 mg/L cholesterol and 425 mg/L campesterol, respectively, at the shake-flask level. In our previous study, we identified 26 animal-derived CYP11A1 variants and two plant-derived CYP87A enzymes. Among these, RgCYP87A exhibited the highest catalytic activity toward cholesterol, yielding a Prn titer of 97 mg/L. However, since plant-derived enzymes have been less explored, we performed a phylogenetic analysis of CYP87A homologs retrieved from the NCBI database using RgCYP87A as a query (Fig. 1b). Eight homologs from sesame, Digitalis lanata, Olea europaea var., Agrocybe aegerita, Andrographis paniculata, Flammulina velutipes, Nicotiana tomentosiformis, and Solanum torvum were selected for functional verification. These CYP87A homologs were co-expressed with Arabidopsis cytochrome P450 reductase 2 (ATR2) to reconstruct Prn biosynthesis from sterol substrates (Fig. 1c and d). The results demonstrated that DlCYP87A exhibited the highest activity toward campesterol, producing 127.6 mg/L Prn (Figs. S1, S2, and S3). This superior performance might be attributed to the shorter biosynthetic route of campesterol, which allows more efficient carbon flux delivery to Prn (Figs. S4 and S5).
Fig. 1. Introducing plant- and animal-derived P450scc for complete biosynthesis of Prn. (a) Two principal biosynthetic routes for de novo Prn production were established: one via cholesterol and the other via campesterol, herein referred to as the cholesterol and campesterol pathways, respectively. (b) Among the 100 genes identified, 8 representative genes were selected from various clades of the phylogenetic tree (Fig. S15). (c) Screening of both animal- and plant-derived P450scc enzymes in the campesterol pathway using strain CA-26 as the host. (d) Screening of animal- and plant-derived P450scc enzymes in the cholesterol pathway using strain D7-8 as the host. (e) N-terminal truncation of mitochondrial targeting peptides from representative P450scc candidates. Constructs labeled with “or” represent the full-length proteins containing their native targeting sequences, whereas constructs without “or” indicate the truncated versions lacking the N-terminal signal peptides. (f) Gas chromatography (GC) chromatogram.
Given that S. cerevisiae lacks endogenous sterol transport into mitochondria, the N-terminal mitochondrial targeting sequences of both animal- and plant-derived P450scc enzymes were truncated. Blocking the ADX signal peptide moderately enhanced Prn production mediated by BmCYP11A1 (39.8 mg/L). In contrast, truncation of the targeting sequence had minimal effect on CYP87A activity, suggesting that its intrinsic sterol-binding or translocation capacity might compensate for the absence of the signal peptide. This observation also indicated that further improvements via signal peptide truncation were limited. Finally, comparative analysis of sterol utilization confirmed that strains harboring DlCYP87A displayed superior catalytic efficiency when supplied with campesterol versus cholesterol (Fig. 1e). Also, high-performance liquid chromatography (HPLC) analysis verified the accumulation of Prn along with expected sterol substrates and intermediates (Fig. 1f). Taken together, these results identified CYP87A from Digitalis lanata as the most promising plant-derived P450scc candidate for Prn biosynthesis. Therefore, it was selected for subsequent engineering optimization.
3.2. Rationally engineering the substrate entry channel of CYP87A
To further optimize the catalytic performance of DlCYP87A, we first modeled its structure using AlphaFold3 and mapped the heme prosthetic group. Tunnel analysis with CAVER 3.0 revealed a pronounced bottleneck between 12 and 16 Å along the substrate channel (Fig. 2a). To alleviate this constraint, alanine scanning was performed on residues within 5 Å of the bottleneck. Among these, I206A (M1) displayed a marked increase in catalytic efficiency, producing 153.7 mg/L Prn, higher than the wild-type enzyme (Fig. 2b). Building on M1, we next reengineered residues adjacent to the tunnel. Several positions that initially reduced activity upon alanine substitution were replaced with functionally similar residues (Fig. 2d). However, none of these modifications enhanced catalytic output, indicating that this region of the substrate channel might already be structurally optimized and that further diversification was deleterious. Molecular dynamics (MD) simulations were carried out on M1 to identify additional promising sites. Most residues lining the channel displayed stable RMSF values, but P213 and T458 exhibited elevated flexibility (0.29 and 0.33 nm, respectively), suggesting dynamic bottlenecks (Fig. 2c and e). Saturation mutagenesis of these sites generated new variants. P213S (M2) exhibited the highest Prn titer at 189.7 mg/L (Fig. S16), accompanied by reduced RMSF and RMSD values (Fig. 2f and S9), implying enhanced tunnel rigidity and structural integrity. Although mutations at the T458 site did not improve Prn yield (Fig. S14), it is noteworthy that the introduction of serine in the T458S (M3) variant resulted in superior catalytic efficiency compared to other amino acid substitutions at this position. MD simulations revealed that M3 exhibited higher overall RMSD values than M2 (Fig. 2f), suggesting that serine may modulate the catalytic process by introducing moderate conformational rigidity, though its stabilizing effect was weaker than the structural reinforcement conferred by the P213S (M2) mutation. Following the systematic engineering of the substrate channel, we shifted our focus to the design and optimization of the substrate-binding pocket.
Fig. 2. Engineering the substrate entry channel of CYP87A. (a) Substrate tunnel analysis using CAVER revealed a bottleneck at ∼13–14 Å, which guided tunnel engineering. (b) Alanine scanning within 5 Å of the tunnel identified I206A as a beneficial mutation. (c) Further tunnel engineering guided by RMSF analysis revealed P213 and T458 as mutation hotspots. (d) Saturation mutagenesis results of P213 in the M1 background. (e) Saturation mutagenesis results of T458 in the M1 background. (f) Comparative analysis of RMSF values between M1 and M2, and RMSD values between M2 and M3 from molecular dynamics simulations (Figs. S6, S7, and S8).
3.3. Engineering a functional switch in P450scc between dihydroxylation and side-chain cleavage
After reengineering the substrate access channel, we next focused on remodeling the catalytic pocket of DlCYP87A to further improve its catalytic efficiency. The goal was to optimize the spatial arrangement of the active site and facilitate proton transfer between the enzyme and the substrate. Structural inspection revealed two key microenvironments adjacent to the catalytic pocket: a hydrophobic region influencing substrate orientation and a hydrogen-bonding region stabilizing the transition state (Fig. 3a). Alanine scanning was performed on positions within 6 Å of the bound substrate to probe functional residues. Residues T123, D278, S287, and P354 emerged as essential contributors to catalytic performance (Fig. 3b). Guided by this analysis, we first performed hydrophobic remodeling. Introduction of the S111L mutation (M4) significantly improved Prn production to 204.9 mg/L, representing the highest titer observed thus far (Fig. 3c). MD simulations further demonstrated that M4 exhibited greater structural stability than M2, with reduced RMSD fluctuations and more stable substrate binding (Fig. 3d). We subsequently attempted to strengthen the hydrogen-bonding network at the catalytic site by introducing amino acid residues prone to forming strong hydrogen bonds (serine, threonine, tyrosine, and tryptophan). However, this strategy did not improve pregnenolone production (Fig. 3e). To investigate this outcome, we performed MD simulations on the M5 (F281Y) mutant. The results revealed that the predicted number of hydrogen bonds increased relative to the wild-type (Fig. 3g), but the distance between the substrate and the Fe–O catalytic center concurrently increased (Fig. 3f). These findings demonstrate that hydrophobic optimization effectively enhanced catalytic efficiency by reshaping the catalytic pocket, whereas M5, despite strengthening hydrogen-bonding interactions between the enzyme and the substrate, adversely affected the distance between the substrate and the Fe–O center.
Fig. 3. Structure-guided engineering and functional analysis of enzyme variants. (a) Structural model of the catalytic pocket showing substrate binding (blue stick) and highlighting three functional regions: hydrogen-bonding region (blue), hydrophobic region (red), and catalytic pocket (green). Insets illustrate representative residues involved in hydrogen-bond optimization (right: F-281, F-285, L-284, and L-357) and hydrophobic modification (left: S-111, S-123, S-358, and C-460). (b) Production levels of Prn (mg/L, purple bars) and corresponding cell density (OD_600_, gray bars) of single-point mutants targeting key residues. (c) Further analysis of Prn production and cell growth for additional mutants in the catalytic pocket. (d) Molecular dynamics (MD) simulations revealing RMSD trajectories of M2 and M4 variants (Figs. S10 and S11), reflecting structural stability. (e) Analysis of Prn production and cell growth through hydrogen bond network reinforcement in the catalytic pocket. (f) Comparison of substrate-iron center distances between M4 and M5 variants. (g) Time-resolved analysis of hydrogen bond formation during molecular dynamics simulations, comparing M4 and M5 variants (Figs. S12 and S13) and revealing variant-specific differences in hydrogen bonding patterns.
3.4. Optimizing organelle function through transcriptome-based circuit design
S. cerevisiae is a long-established and widely used host in the biotechnology industry. Its versatile adaptive mechanisms render it an attractive chassis for diverse bioprocesses; however, this flexibility often comes at a cost. To address the inherent functional constraints of yeast organelles, we proposed a systematic organelle engineering approach, targeting mitochondria, the endoplasmic reticulum (ER), and lipid droplets, to enhance the metabolic and regulatory capacities of yeast cell factories. We performed transcriptomic analysis on S. cerevisiae C800 (CEN.PK2-1D; MATα; ura3-52; his3Δ1; trp1-289; leu2-3,112; MAL2-8C; SUC2; gal80::KanMX) and its engineered derivatives. RNA sequencing of three representative samples (C800, M-3, and D7-8) yielded 122,816,864 raw reads, averaging 40,938,955 per sample. After quality control, ∼40.25 million clean reads remained per group, with Q30 scores exceeding 98% across all datasets, confirming sufficient integrity and quality for downstream analysis (Fig. 4a). Guided by these results, we systematically optimized Prn biosynthesis by targeting mitochondria, ER, and lipid droplets (Fig. 4b and c). Genes associated with mitochondrial complex III assembly and energy metabolism (COR1, RIP1, QCR6, and QCR7) were significantly downregulated. Independent expression of COR1 elevated the Prn titer to 236.2 mg/L (Fig. 4d), and the resulting strain was designated P4. Similarly, genes related to ER protein translocation and folding (SEC61, SEC62, SEC63, SEC23, SEC24, SEC13, SEC31, KAR2, ERO1, PDI1, and IRE1) were consistently repressed, a condition known to activate the unfolded protein response. The ectopic expression of these genes identified ERO1 as the most effective candidate, improving Prn production to 340.7 mg/L while restoring cell density. The resulting strain was designated P5 (OD_600_ = 6.2) (Fig. 4e). In contrast, the overexpression of lipid droplet-associated genes (SEI1, LDB16, DGA1, LRO1, ARE2, and PLN1) failed to enhance Prn accumulation and even reduced cell density (Fig. 4f), indicating a limited role for lipid droplet remodeling under the current conditions. Transmission electron microscopy revealed that organelle engineering led to a marked expansion of the endoplasmic reticulum and a significant increase in the overall size of cellular organelles (Fig. 4g).
Fig. 4. Multi-organelle engineering strategies for improved Prn production. (a) Transcriptome analysis of engineered strains displaying differential gene expression. Left: volcano plot of differentially expressed genes; middle: MA plot of expression distribution; right: Venn diagram highlighting overlaps between comparisons. (b) Schematic representation of subcellular engineering approaches, including mitochondrial engineering, endoplasmic reticulum (ER) engineering, and lipid droplet (LD) engineering, with localization of cytochrome P450. (c) Heatmap of selected differentially expressed genes across strains (C800, CA-26-10, and M3), normalized by row Z-score. (d) Functional validation of mitochondrial engineering targets, showing Prn titers (mg/L, purple bars) and corresponding cell density (OD_600_, gray bars) for gene variants including COX and RIP1. (e) Functional validation of ER engineering targets, with production levels and growth profiles across gene variants involved in ER-associated functions. (f) Functional validation of lipid droplet engineering targets, including DGA1, ARE1, and PLIN1, showing effects on Prn accumulation and growth. (g) Electron microscopy images showed a clear enlargement of the endoplasmic reticulum and other organelles after organelle engineering.
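The shake-flask titers reported across Sections 3.1 to 3.4 can be summarized as fold changes. The sketch below uses only titers quoted in the text; treating the variants and strains as one cumulative lineage, in the order they are reported, is an assumption made for illustration.

```python
# Shake-flask Prn titers (mg/L) quoted in the text, in reporting order.
# Assumption: each entry builds on the previous one.
TITERS_MG_PER_L = [
    ("DlCYP87A wild type", 127.6),
    ("M1 (I206A)", 153.7),
    ("M2 (P213S)", 189.7),
    ("M4 (S111L)", 204.9),
    ("P4 (COR1)", 236.2),
    ("P5 (ERO1)", 340.7),
]


def fold_changes(titers):
    """Return (name, fold vs. first entry, fold vs. previous entry) per row."""
    base = titers[0][1]
    previous = base
    rows = []
    for name, titer in titers:
        rows.append((name, titer / base, titer / previous))
        previous = titer
    return rows


rows = fold_changes(TITERS_MG_PER_L)
for name, vs_base, vs_previous in rows:
    print(f"{name:20s} {vs_base:5.2f}x vs. wild type, {vs_previous:5.2f}x vs. previous")
```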
3.5. Modulating CYP87A function via pH in a 5-liter fermenter
Because the key P450-catalyzed steps in pregnenolone synthesis depend strongly on NADPH (Fig. 5a), any factor that affects the supply of reducing power directly limits yield. We hypothesized that a lower pH might redirect more reducing power toward the pregnenolone pathway by suppressing competing NADPH-consuming metabolism, such as the oxidative stress response, whereas an excessively low pH would induce acid stress and broadly inhibit metabolism. To test this hypothesis and identify the optimal pH window, we compared three pH conditions (5.0, 5.5, and 6.0) and examined how pH tunes metabolic flux toward pregnenolone by balancing the allocation of reducing power against cellular stress tolerance (Fig. 5b). At pH 5.5, P4 displayed the most robust physiological performance, characterized by rapid glucose utilization, efficient ethanol reassimilation, and strong biomass accumulation, ultimately yielding the highest Prn titer (1.46 g/L) at 144 h. In contrast, pH 5.0 moderately slowed carbon conversion and Prn formation (Fig. 5c), whereas pH 6.0 caused pronounced metabolic disturbances, including prolonged ethanol accumulation (Fig. 5d), impaired OD600 increase, and markedly reduced Prn production. Overall, a mildly acidic environment (pH 5.5) optimally supported metabolic homeostasis and steroid biosynthesis, while deviations toward either higher or lower pH disrupted carbon flux distribution and diminished catalytic efficiency.
Fig. 5. Effect of culture pH on fermentation performance of engineered strain P4. (a) Biosynthetic pathway of pregnenolone and the reactions catalyzed by P450scc. (b) Batch fermentation profile of strain P4 at pH 6.0. (c) Batch fermentation profile of strain P4 at pH 5.5. (d) Batch fermentation profile of strain P4 at pH 5.0.
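To put the fermenter result in perspective, the reported titer can be converted into an average volumetric productivity and compared with the earlier titer reported for strain P4. This is a rough sketch using only figures quoted in the text; the productivity is an average over the whole run and says nothing about the rate in any particular phase.

```python
# Figures quoted in the text.
final_titer_g_per_l = 1.46         # Prn titer at pH 5.5 in the 5-liter fermenter
fermentation_time_h = 144          # time at which the titer was reported
earlier_p4_titer_mg_per_l = 236.2  # titer reported for P4 after COR1 expression

final_titer_mg_per_l = final_titer_g_per_l * 1000

# Average volumetric productivity over the whole run (mg per liter per hour).
productivity = final_titer_mg_per_l / fermentation_time_h

# Fold increase of the fermenter titer over the earlier P4 titer.
fold_increase = final_titer_mg_per_l / earlier_p4_titer_mg_per_l

print(f"average productivity: {productivity:.2f} mg/L/h")
print(f"fold increase over earlier P4 titer: {fold_increase:.2f}x")
```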
4. Discussion
Prn biosynthesis in microbial systems represents a long-standing challenge due to the complex biochemistry of sterol side-chain cleavage and the fragile compatibility between eukaryotic P450 enzymes and heterologous hosts. Given the multi-layered and dynamic nature of cellular metabolism, indiscriminate engineering of upstream pathway genes risks disrupting metabolic homeostasis and imposing a cellular burden. We therefore adopted a targeted approach that focused on rate-limiting steps, combining selective overexpression of key genes with rational engineering of specific enzymes, to optimize flux toward product formation without compromising host fitness. The resulting strategy integrates plant P450scc enzyme engineering, mechanistic elucidation, and organelle optimization in S. cerevisiae. Our findings not only enhance the practical production of Prn but also provide conceptual advances in exploiting plant P450s for steroid biosynthesis.
Previous studies on microbial Prn biosynthesis primarily relied on mammalian CYP11A1 [23], which required strict mitochondrial electron transfer partners (ADX and ADR) and suffered from poor expression and stability in yeast [24]. In contrast, plant CYP87A exhibited superior compatibility with the yeast host, demonstrating higher activity and reduced dependence on specialized partner proteins [25]. This difference likely stems from the evolutionary adaptation of plant P450scc enzymes to plastidial environments, where redox versatility and membrane association are less constrained than in mammalian mitochondria [26]. These features make plant P450scc enzymes particularly attractive for synthetic biology, where cross-kingdom compatibility is often a bottleneck [27]. Mutations such as I206A and P213S relieved steric hindrance and improved enzyme activity, underscoring the utility of tunnel engineering in P450 catalysis. These insights enhance our understanding of plant-derived P450 catalysis and offer opportunities for rational enzyme redesign [28]. Through combined pocket engineering, we aimed to optimize the side-chain cleavage function of P450scc by enhancing hydrophobic interactions and hydrogen bond networks to stabilize substrate binding [29]. Modifications to the hydrophobic pocket improved catalytic efficiency [30], but only moderately.
Beyond enzyme-centric improvements, we demonstrated that host organelle function was crucial for Prn biosynthesis [31]. Transcriptomic analysis revealed stress responses in mitochondria and the endoplasmic reticulum, including reduced respiratory activity, redox imbalance, and protein-folding stress [32]. COR1 encodes a core subunit of mitochondrial respiratory chain complex III, and its overexpression enhances mitochondrial electron transfer efficiency [33], increases ATP production, and alleviates the accumulation of reducing equivalents, thereby indirectly optimizing the energetic and redox conditions required for P450scc [34]. Targeted restoration of mitochondrial electron transport and ER protein-processing functions markedly improved enzyme stability and metabolic flux, increasing titers to 340.6 mg/L. This highlights the importance of considering organelle homeostasis when reconstructing complex eukaryotic pathways in yeast. Attempts to engineer lipid droplets were less effective, suggesting that precursor storage was not the limiting factor under our conditions [35]. Instead, redox management and protein quality control emerged as the dominant bottlenecks [36]. Future metabolic engineering strategies for complex eukaryotic pathways should therefore not only optimize the activity of key enzymes but also take into account the overall homeostasis of the host cell, including organelle function, redox balance, and protein quality control [37,38].
The effect of extracellular pH on enzyme activity and metabolic flux was investigated by adjusting the fermentation pH to 5.0, 5.5, or 6.0 after 72 h of cultivation [39]. Under moderately acidic conditions (pH 5.5), the key P450 enzyme systems appeared to maintain optimal activity, coupled with a more robust supply of NADPH reducing equivalents, which together contributed to a significant enhancement in steroid product synthesis [40]. When the pH dropped to 5.0, however, mild acid stress might have been induced, resulting in a decrease in the specific growth rate of the cells [41]. At pH 6.0, the cells exhibited lower sugar consumption and a pronounced decrease in OD, indicating that both basal metabolism and growth were significantly inhibited [42]. This is likely associated with impaired enzyme activity and the activation of cellular stress responses. These observations are consistent with the inherent pH homeostasis mechanisms and regulatory patterns in yeast cells [43].
In summary, this study established plant-derived P450scc as a powerful catalyst for microbial Prn biosynthesis, which was supported by rational enzyme engineering, mechanistic validation, and organelle optimization. The insights gained in this study not only address long-standing challenges in steroid biosynthesis but also broaden the synthetic biology toolbox with new strategies for harnessing plant P450s. We bridged mechanistic enzymology with systems-level host engineering to achieve Prn titers that set a new benchmark for yeast-based production. This integrated framework has broad applicability for the biosynthesis of complex natural products and holds promise for advancing industrial-scale production of steroidal precursors and derivatives.
CRediT authorship contribution statement
Zikai Chao: Writing – review & editing, Writing – original draft, Visualization, Validation, Supervision, Software, Resources, Project administration, Methodology, Investigation, Funding acquisition, Formal analysis, Data curation, Conceptualization. Qihang Chen: Writing – review & editing, Writing – original draft, Visualization, Validation, Supervision, Resources, Project administration, Methodology, Investigation, Funding acquisition, Formal analysis, Conceptualization. Wenbao Zhao: Writing – review & editing, Formal analysis, Data curation. Jianhong He: Investigation, Data curation. Qi Li: Data curation. Tianqi Gao: Data curation. Wenqian Wei: Writing – review & editing, Supervision, Methodology. Song Liu: Writing – review & editing. Jingwen Zhou: Writing – review & editing, Writing – original draft, Supervision, Resources, Methodology, Formal analysis, Data curation. Weizhu Zeng: Writing – review & editing, Supervision, Resources, Investigation, Funding acquisition, Conceptualization. Sha Xu: Writing – review & editing, Writing – original draft, Supervision, Resources, Investigation, Funding acquisition, Formal analysis, Data curation, Conceptualization.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References
- 1. Morohashi K., Baba T., Tanaka M. Steroid hormones and the development of reproductive organs. Sex Dev 7 (2012) 61–79. doi:10.1159/000342272. PMID: 22986257.
- 2. Al-Suhaimi E.A., Khan F.A., Homeida A.M. Regulation of male and female reproductive functions. In: Al-Suhaimi E.A. (Ed.), Emerging concepts in endocrine structure and functions. Springer Nature Singapore, Singapore, 2022, pp. 287–347.
- 3. Auchus M.L., Auchus R.J. Human steroid biosynthesis for the oncologist. J Invest Med 60 (2012) 495–503. doi:10.2310/JIM.0b013e3182408567. PMCID: PMC3653186. PMID: 22222232.
- 4. Deli T., Orosz M., Jakab A. Hormone replacement therapy in cancer survivors – review of the literature. Pathol Oncol Res 26 (2020) 63–78. doi:10.1007/s12253-018-00569-x. PMID: 30617760. PMCID: PMC7109141.
- 5. Strauss J.F. III, Martinez F., Kiriakidou M. Placental steroid hormone synthesis: unique features and unanswered questions. Biol Reprod 54 (1996) 303–311. doi:10.1095/biolreprod54.2.303. PMID: 8788180.
- 6. Arukwe A. Steroidogenic acute regulatory (StAR) protein and cholesterol side-chain cleavage (P450scc)-regulated steroidogenesis as an organ-specific molecular and cellular target for endocrine disrupting chemicals in fish. Cell Biol Toxicol 24 (2008) 527–540. doi:10.1007/s10565-008-9069-7. PMID: 18398688.
- 7. Szentirmai A. Microbial physiology of sidechain degradation of sterols. J Ind Microbiol 6 (1990) 101–115. doi:10.1007/BF01576429.
- 8. Chakraborty S., Pramanik J., Mahata B. Revisiting steroidogenesis and its role in immune regulation with the advanced tools and technologies. Genes Immun 22 (2021) 125–140. doi:10.1038/s41435-021-00139-3. PMCID: PMC8277576. PMID: 34127827.
