Synthetic G-quadruplex components for predictable, precise two-level control of mammalian recombinant protein expression

Melinda Pohle; Edward Curry; Ryan Holden; Suzanne Gibson; Adam Brown

PMC · DOI:10.1093/nar/gkaf732·July 30, 2025

Synthetic G-quadruplex components for predictable, precise two-level control of mammalian recombinant protein expression

Melinda Pohle, Edward Curry, Ryan Holden, Suzanne Gibson, Adam Brown

PDF

Open Access

TL;DR

This paper introduces synthetic G-quadruplex components that allow precise and predictable control of protein expression in mammalian cells, useful for biopharmaceuticals and gene therapy.

Contribution

The novel contribution is the development of synthetic G-quadruplex elements for two-level, tunable control of recombinant protein expression.

Findings

01

Synthetic G-quadruplex elements were designed to regulate protein expression over two orders of magnitude.

02

These components function predictably in different cell hosts and in both DNA and RNA contexts.

03

A vector design platform was created to optimize multichain protein expression ratios efficiently.

Abstract

Control of mammalian recombinant protein expression underpins the in vitro manufacture and in vivo performance of all biopharmaceutical products. However, routine optimization of protein expression levels in these applications is hampered by a paucity of genetic elements that function predictably across varying molecular formats and host cell contexts. Herein, we describe synthetic genetic components that are specifically built to simplify bioindustrial expression cassette design processes. Synthetic G-quadruplex elements with varying sequence feature compositions were systematically designed to exhibit a wide range of regulatory activities and inserted into identified optimal positions within a standardized, bioindustry compatible core promoter-5′UTR control unit. The resulting library tuned protein production rates over two orders of magnitude, where DNA and RNA G-quadruplexes could…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Figures5

Click any figure to enlarge with its caption.

Endogenous G4 elements negatively regulate recombinant protein expression rates in a position-dependent manner. (A) Schematic presentation of general position-dependent mechanisms by which G4s in core promoter and 5′ UTR elements can regulate protein expression rates by disrupting the binding [1, 4], movement [3, 5], and activity [2, 6] of transcription and translation initiation complexes. (B) G4 components were inserted into various positions in a BPCU in eGFP-reporter plasmids harboring a CMV-IE1 proximal promoter. CHO cells were transfected with (C) reporter vectors containing c-MYC (DNA) and Nras (RNA) G4 elements at different sites within the BPCU and (D) eGFP-reporters containing wild-type or mutated versions (see Supplementary Table S2) of c-MYC and Nras G4s in positions −1 and +6, respectively. eGFP expression was quantified 48 h post-transfection. Data are expressed as a percentage of the production exhibited by the control CMV-IE1 proximal promoter-BPCU (ΔG4) construct (dotted line). Values represent the mean + SD of three independent experiments (n = 3, each performed in triplicate). TSS—transcriptional start site.

Synthetic DNA and RNA G4 components enable precision tuning of recombinant protein expression levels. (A) Core G4 structure, highlighting four key input design parameters. (B,C) Synthetic G4 components (listed in Supplementary Table S1) with varying sequence feature compositions were inserted into positions −1 (DNA) and +6 (RNA), respectively, in the BPCU (see Fig. 1) downstream of a CMV-IE1 proximal promoter in eGFP-reporter vectors. (D) The loop sequence within G4 motif 13 ((G4N3)3G4) was varied at both the DNA and RNA levels. In panels (B–D), CHO cells were transfected with eGFP-reporter plasmids prior to protein quantification after 48 h. Data are expressed as a percentage of the production exhibited by the control CMV-IE1 proximal promoter-BPCU (ΔG4) construct (dotted line). Values represent the mean + SD of three independent experiments (n = 3, each performed in triplicate). (E) The library of synthetic G4 components encodes a wide range of recombinant protein expression set points; the gray-shaded area represents the generic insertional effects observed when mutated G4s are inserted into the BPCU.

Synthetic G4s function predictably when deployed in combinations, mRNA molecules, and stable expression systems. CD spectra of (A) DNA oligonucleotides containing varying G4 motifs (all G4 motif sequences listed in Supplementary Table S1) and (B) DNA.10REU in varying salt conditions. CHO cells were transiently transfected with (C) synthetic eGFP-encoding mRNA molecules harboring different RNA G4 components,and (D) eGFP reporter plasmids containing varying combinations of DNA and RNA G4 elements. eGFP levels were quantified 48 h post-transfection and compared to those driven by (C) the same G4s in a DNA plasmid context and (D) predicted values, assuming constituent G4 pairs act synergistically (see Fig. 2). (E) CHO cells were stably transfected with G4-reporter plasmids coexpressing a puromycin selection marker gene. Stable cell pools were created and subsequently exposed to varying chemical ligands for 48 h prior to eGFP quantification. (F) Correlation between the regulatory activity of 24 synthetic G4 components in transient and stable expression systems (stable data represent cell pools measured in exponential phase growth). In panels (C) and (D), data are expressed as a percentage of the production exhibited by the control CMV-IE1 proximal promoter-BPCU (ΔG4) construct (dotted line). Values represent the mean + SD of three independent experiments (n = 3, each performed in triplicate). In panel (E), data are expressed as a percentage of the production achieved in control media (Δligand, containing DMSO or water), normalized to the effect of ligand addition on G4 mutant (white bar). Values represent the mean + SD of three independent experiments (n = 3, each performed in duplicate). In panel (F), data are expressed as a percentage of the production achieved using the control BPCU construct, values represent a single cell pool, and linear regression was conducted excluding the single outlier identified by analysis of standardized residuals (red outline).

Synthetic G4 components predictably titrate recombinant protein expression levels in different cell and product contexts. (A) eGFP-reporter plasmids containing DNA or RNA G4-components with varying activities were transfected into CHO, HEK, and HepG2 cells, prior to protein quantification 48 h later. (B) CHO cells were transfected with reporter plasmids encoding eGFP or SEAP production under the control of varying-strength DNA or RNA G4 elements. Data are expressed as a percentage of the production exhibited by the control CMV-IE1 proximal promoter-BPCU (ΔG4) construct (dotted line). Values represent the mean + SD of three independent experiments (n = 3, each performed in triplicate).

Synthetic G4 toolboxes facilitate simplified optimization of polypeptide expression ratios for multichain protein products. (A) Vector design platform to optimize polypeptide expression ratios for multichain biopharmaceutical products. (B) CHO cells were co-transfected with separate plasmids (at an equimolar ratio) encoding mAbT HC) and LC expression under the control of varying-strength G4 components (sequences listed in Supplementary Table S1). Cells were incubated for 5 days, including addition of feed 48 h post-transfection, prior to quantification of mAb titer. Data are shown as a percentage of the production achieved using the unengineered BPCU (ΔG4) construct to drive expression of both HC and LC (dotted line). Values represent the mean + SD of three independent experiments (n = 3, each performed in triplicate). (C) Vector solution space identifying mAbT product-specific design rules.

Equations1

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} \begin{eqnarray*} {\rm RE}{{{\rm U}}_{{\rm combined}}} = \ \frac{{({\rm RE}{{{\rm U}}_{{\rm DNA}}}\ \times \ {\rm RE}{{{\rm U}}_{{\rm RNA}}})\ }}{{100}} \end{eqnarray*}\end{document}

Funding3

—AstraZeneca10.13039/100004325
—Engineering and Physical Sciences Research Council10.13039/501100000266
—University of Sheffield10.13039/501100000858

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsRNA Interference and Gene Delivery · RNA and protein synthesis mechanisms · Bacteriophages and microbial interactions

Full text

Introduction

Recombinant protein expression in mammalian cells is underpinned by genetic technologies that control flux through the key steps in product biosynthesis, namely gene transcription, messenger RNA (mRNA) translation, and polypeptide translocation [1]. In most cases, these processes are still controlled by assemblies of viral and endogenous elements that are generally known to encode reasonably high biosynthetic rates, such as the CMV-IE1 promoter, human alpha-globin 5′ UTR, and SV40 3′UTR [2–4]. However, increasingly complex protein expression objectives have created a requirement for more sophisticated gene expression control technologies that can precisely tailor transcription and translation rates in molecule- and context-specific manners. These include (i) advanced mammalian cell factory engineering strategies, where multiple genes must be expressed at correct stoichiometric ratios to achieve designed phenotypes [5]; (ii) gene therapy, where target proteins need to be expressed at levels that provide therapeutic efficacy without causing off-target effects [6]; and (iii) protein biopharmaceutical manufacturing, where the expression levels of multiple polypeptide chains must be optimally balanced to facilitate correct protein folding and assembly [7]. Indeed, in the latter example, many promising product molecules are currently considered difficult or impossible to manufacture due to improperly assembled proteins inducing cellular stress responses [8, 9].

Predictable and precise control of recombinant protein expression levels requires design and validation of synthetic genetic component libraries. While synthetic signal peptides can be used to tailor translocation levels, their activity is highly polypeptide-specific, and this lack of orthogonality significantly restricts their routine use in vector design spaces owing to the associated in vitro screening burden [10]. Conversely, synthetic promoter and UTR elements have been developed to customize transcription and translation levels, but their performance is typically cell-type and cell-context specific, dependent on relative abundances of cognate nucleic-acid binding proteins [11, 12]. This necessitates design of cell-host specific genetic component libraries, which are only currently available for a handful of mammalian cells [13–15]. Accordingly, for most mammalian protein expression applications, including the vast majority of cell-type-specific gene therapies, it is currently intractable to design genetic construct assemblies that encode optimal biosynthetic rates, owing to a paucity of appropriate validated regulatory components.

Secondary structure-based control elements represent a promising and relatively unexplored means to achieve sophisticated mammalian protein expression control. Nucleic acid strands can fold into a variety of stable structures, including loops, hairpins, helices, and pseudoknots. When present in appropriate positions within promoters and UTRs, their formation can sterically hinder the binding and progression of transcriptional and translational apparatus, thereby facilitating biosynthetic rate tuning by selecting elements with varying inhibitory activities (e.g. varying thermodynamic stabilities) [16]. An elegant example of designing secondary structure components to tailor mammalian protein expression for bioindustrial applications is the study by Eisenhut et al., where 25 distinct RNA hairpins were created by varying GC content, minimum free energy, and positioning relative to the 5′ cap [17]. This library was used to vary translation rates over two orders of magnitude in common mammalian bioproduction cell hosts, exemplifying the utility of secondary structure elements to solve complex protein expression challenges. However, the available regulatory component toolbox remains relatively sparsely populated and is particularly lacking technologies that can (i) predictably tune transcription levels and, accordingly, (ii) enact two-level protein expression control by facilitating simultaneous titration of both transcription and translation rates.

G-quadruplexes (G4s) are four-stranded secondary structures that form from guanine-rich nucleic acid sequences, comprising multiple G-tetrads (a square planar arrangement of four guanine bases interconnected through Hoogsteen hydrogen bonding) [18]. Prevalent in both the human genome and transcriptome, they form in DNA and RNA molecules to regulate transcription and translation rates [19–21]. While some G4s have been shown to upregulate protein expression, e.g. via transcription factor recruitment [22, 23], when appropriately positioned within promoters and 5′ UTRs, they can act to reduce protein biosynthesis by disrupting the function of general transcriptional or translational machinery [24–26]. Although hundreds of discrete G4s have been experimentally validated, mainly those that regulate expression of oncogenes such as c-MYC and NRAS [27–29], these endogenous components cannot be easily applied to bioindustrial applications as their functionality has only been profiled within the context of natural human promoters and UTRs (e.g. [30–33]), which typically display unpredictable or undesirable expression dynamics [16, 34, 35]. Moreover, they are usually studied individually, where information on their relative activities is not generally available, preventing their use as amalgamated libraries where elements can be rationally selected and combined to achieve user-defined protein expression rates. Accordingly, harnessing the potential of G4s for cell-agnostic two-level precision protein expression control requires bottom-up creation of built-for-purpose synthetic libraries, comprising well-characterized elements that are designed and validated to function within standardized industrially relevant promoter-UTR contexts.

In this study, we develop the first synthetic G4 element library designed specifically for use in high-value bioindustrial applications. Comprising complementary DNA and RNA G4 components operating at optimized positions within a standardized genetic architecture, this technology enables precise two-level control of mammalian protein expression over two orders of magnitude. Validated to perform predictably in varying cell types and molecular formats (i.e. plasmid DNA, synthetic mRNA, multi-chain products), these synthetic G4 controllers could be utilized to predictably fine-tune recombinant protein expression levels in in vitro biomanufacturing and in vivo biomedical applications.

Materials and methods

Expression cassette and vector construction

The bioindustrial protein expression control unit (BPCU) was constructed by fusing a minimal 41 bp hCMV-IE1 core promoter (GAN M60321.1, nucleotides 1110–1150) to a 57 bp proprietary AstraZeneca 5′ UTR [36]. BPCU was inserted upstream of the gene of interest [containing an SV40 polyA 3′ UTR (GAN LT727517.1, nucleotides 1449–1676)] and downstream of the hCMV-IE1 proximal promoter (GAN M60321.1, nucleotides 517–1109) in reporter vectors encoding secreted alkaline phosphatase (SEAP), eGFP, mAbT HC, and mAbT LC expression [reporter vector de novo synthesized by GeneArt (Regensburg, Germany)]. G4 elements (listed in Supplementary Table S1) were inserted into BPCU by site-directed mutagenesis using the Q5® Site-Directed Mutagenesis Kit (New England Biolabs, USA). Plasmid templates for in vitro transcription (IVT) reactions were generated by cloning appropriate RNA G4-eGFP sequences into pRNA128A [37] using NheI and NotI (New England Biolabs, USA) restriction sites. Vectors for stable transfections were created by inserting appropriate DNA G4–eGFP sequences into pMCS–Gaussia Luc (Thermo Fisher Scientific, USA) using KpnI and NotI restriction sites. All constructed plasmids used in this study were validated via DNA sequencing and purified to transfection-grade quality using the QIAGEN Plasmid Plus Midi Kit (QIAGEN, USA).

Cell culture

CHO-K1 derived host cells (AstraZeneca, UK), hereby referred to as Chinese hamster ovary (CHO) cells, and CHO-S (Thermo Fisher Scientific, USA) cells were routinely cultivated in CD-CHO medium (Thermo Fisher Scientific, USA) supplemented with 6 mM L-glutamine and 8 mM L-glutamine, respectively. Expi293F cells (Thermo Fisher Scientific, USA), hereby referred to as HEK cells, were routinely cultivated in Freestyle™ 293 expression medium (Thermo Fisher Scientific, USA). Cultures were maintained at 37°C, 5% CO_2_ with 140 rpm orbital shaking. CHO cells were passaged every 3–4 days at a seeding density of 0.2 × 10^6^ cells/ml, CHO-S cells were passaged every 2–3 days at a seeding density of 0.2 × 10^6^ cells/ml, and HEK cells were passaged every 3–4 days at a seeding density of 0.3–0.4 × 10^6^ cells/ml. HepG2 cells (ATCC HB-8065) were cultured in RPMI 1640 medium (Thermo Fisher Scientific, USA) supplemented with 10% fetal bovine serum and maintained at 37°C under 5% CO_2_. Cell concentration and viability were measured with the VI-CELL BLU automated cell counter and cell viability analyzer (Beckman Coulter, USA).

Transient transfection

CHO cells were transiently transfected by electroporation using the Amaxa Nucleofector system in combination with the SG Cell Line 96-well Nucleofector® Kit (Lonza, Switzerland). 1.86 × 10^6^ cells per well were transfected with 750 ng plasmid DNA or 400 ng mRNA following manufacturer’s instructions. Transfected cells were transferred to a 24 shallow-well plate and incubated in a total volume of 750 μl for 48 h at 37°C, 5% CO_2_ at 240 rpm orbital shaking. IgG1 (mAbT) LC- and HC- encoding plasmids were transfected at ratios maintaining equal HC:LC gene copy numbers, and incubated for 5 days with a 10% (v/v) feed added 48 h post-transfection containing 1:1 (v/v) Efficient Feed A (Thermo Fisher Scientific, USA) and Efficient Feed B (Thermo Fisher Scientific, USA). HEK cells were transiently transfected with PEIpro® (Polyplus, France), whereby 1.5 × 10^6^ cells were transfected with 600 ng plasmid DNA mixed with 1.8 μl PEIpro® (1:3 ratio of DNA to PEI) following manufacturer’s instructions. Cells were incubated in a 24-shallow-well plate in a total volume of 750 μl for 48 h at 37°C, 5% CO_2_ at 240 rpm orbital shaking. HepG2 cells were transfected with Lipofectamine™ 3000 Transfection Reagent (Thermo Fisher Scientific, USA) according to manufacturer’s instructions in a 24-well plate scale and incubated for 48 h.

Stable transfection

CHO-S cells were stably transfected by electroporation using the Amaxa Nucleofector system in combination with the Cell Line Nucleofector® Kit V (Lonza, Switzerland). 5.0 × 10^6^ cells per cuvette were transfected with 2 μg plasmid and incubated in T75 flasks at 37°C under 5% CO_2_ in 13 ml CD-CHO medium supplemented with 8 mM L-glutamine. Forty-eight hours post-transfection, the medium was changed to CD-CHO selection medium including 8 mM L-glutamine, 10 μl/ml Puromycin, and Anti-Clumping agent (Thermo Fisher Scientific, USA) at a 1:250 (v/v) dilution. Once cultures reached >80%, they were transferred from static to shaking conditions and passaged every 3 days until they reached a viability of >95%, at which point they were cryopreserved.

Recombinant protein quantification

eGFP protein expression was quantified 48 h post-transfection using the SpectraMax iD5 Microplate Reader (Molecular Devices, USA) with 485 nm excitation and 535 nm emission. Transfection efficiency was determined using the Countess II FL Automated Cell Counter (Thermo Fisher Scientific, USA). SEAP expression was quantified 48 h post-transfection using the Secreted Alkaline Phosphatase Reporter Gene Assay Kit (Cayman Chemical, USA) according to manufacturer’s instructions. For IgG1 mAb protein titer quantification, the ELISA quantification kit for human immunoglobulins G (RD-Biotech, France) was used in accordance with the manufacturer’s protocol 5 days post-transfection. All assays were performed using the SpectraMax iD5 Microplate Reader. The synergistic effect of combining two G4 elements within a single construct was predicted using the following equation, where REU is the relative protein expression units (REUs) they encode.

\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{upgreek} \usepackage{mathrsfs} \setlength{\oddsidemargin}{-69pt} \begin{document} \begin{eqnarray*} {\rm RE}{{{\rm U}}_{{\rm combined}}} = \ \frac{{({\rm RE}{{{\rm U}}_{{\rm DNA}}}\ \times \ {\rm RE}{{{\rm U}}_{{\rm RNA}}})\ }}{{100}} \end{eqnarray*}\end{document}

Circular dichroism

DNA oligos (Integrated DNA Technologies, USA) were prepared at 5 μM in 50 mM Tris–HCl (pH 7.5) with 100 mM KCl. The samples were denatured at 95°C for 5 min and then cooleddown 0.2°C/min to 20°C. Circular dichroism (CD) experiments were performed at 25°C using a JASCO J-810 spectropolarimeter (JASCO, USA) in 1-mm quartz cuvettes. CD scans were taken from 320 to 200 nm at 100 nm/min. Melting curves were obtained by heating the samples from 20 to 100°C at a controlled rate of 2°C/min. Changes in ellipticity were monitored at the wavelength of the highest peak for each oligo every 0.5°C.

In vitro transcription

Plasmid templates for IVT were linearized with BsaXI and purified by ethanol precipitation. IVT reactions used the HiScribe® T7 High Yield RNA Synthesis Kit (New England Biolabs, USA) following the manufacturer’s instructions and 3′-O-Me-m^7^G(5′)ppp(5′)G RNA cap (New England Biolabs, USA) was used as the cap structure analog. Samples were digested with DNase I (Thermo Fisher Scientific, USA) for 15 min at 37°C and immediately purified using the Monarch® RNA Cleanup Kit (New England Biolabs, USA). RNA samples were diluted to the desired working concentration and stored at −80°C prior to use.

Chemical ligands

CHO-S cell pools were seeded at 1 × 10^6^ cells/ml in 24 shallow-well plates prior to ligand addition. Phen-DC3 (Selleck Chemicals, USA), Pyridostatin (Merck, Germany), TMPyP4 (Selleck Chemicals, USA), and 360A (ApexBio Technology, USA) were added to a final concentration of 25, 5, 25, and 20 μM, respectively (concentrations optimized to maintain cell viabilities >90%), and cells were maintained for 48 h at 37°C, 5% CO_2_ with 140 rpm orbital shaking. Phen-DC3 and 360A were dissolved in DMSO and Pyridostatin and TMPyP4 in water.

Statistical analysis

A one-way ANOVA was performed to evaluate statistical differences with statistical significance being defined as P ≤ .05 (* = P ≤ .05, ** = P ≤ .01, *** = P ≤ .001, **** = P ≤ .0001). Tukey’s post-hoc test was used to determine statistical differences to other groups and the Dunnett’s post-hoc test was used to determine statistical differences to a specific control. Statistical tests were performed using GraphPad Prism 10 software.

Results and discussion

Optimization of DNA and RNA G4 element positioning within a standardized genetic chassis

The human CMV-IE1 core promoter is the most widely utilized regulatory element for controlling mammalian recombinant gene transcription in high-value applications, including gene therapy and biopharmaceutical production, and has been shown to function in conjunction with varying proximal promoter partners [38–40]. While a range of human and mouse sequences are available for use as translational controllers, to facilitate deployment in biomedical settings, we combined the CMV-IE1 core with a minimized 5′ UTR that has been specifically developed and validated for use in bioproduction processes (AZ 5′ UTR). This 98 bp BPCU is an ideal genetic chassis to house regulatory G4s that are designed to control transcription and translation rates by interfering with the activity of general machinery components. Given that G4 function is position-dependent [29, 41], as illustrated in Fig. 1A, we first sought to identify the optimal locations to place DNA and RNA G4s within this sequence context. As shown in Fig. 1B, model DNA (c-MYC) and RNA (NRAS) G4s were each individually inserted into 7 and 6 distinct sites, respectively, within the BPCU in eGFP-reporter plasmids harboring a CMV-IE1 proximal promoter, selected to drive a high protein expression set-point that can be tuned down by G4 activity [40, 42, 43]. Positions were chosen whereby G4 formation should theoretically interfere with binding and movement of biosynthetic machinery, including near key regulatory binding sites such as the TATA box, INR, and Kozak sequence [44, 45]. DNA and RNA G4-encoding sequences were inserted into the template and coding strand, respectively, to facilitate formation of secondary structures at the DNA and RNA level (i.e. RNA G4s form in the 5′ UTR of eGFP-mRNA molecules).

Endogenous G4 elements negatively regulate recombinant protein expression rates in a position-dependent manner. (A) Schematic presentation of general position-dependent mechanisms by which G4s in core promoter and 5′ UTR elements can regulate protein expression rates by disrupting the binding [1, 4], movement [3, 5], and activity [2, 6] of transcription and translation initiation complexes. (B) G4 components were inserted into various positions in a BPCU in eGFP-reporter plasmids harboring a CMV-IE1 proximal promoter. CHO cells were transfected with (C) reporter vectors containing c-MYC (DNA) and Nras (RNA) G4 elements at different sites within the BPCU and (D) eGFP-reporters containing wild-type or mutated versions (see Supplementary Table S2) of c-MYC and Nras G4s in positions −1 and +6, respectively. eGFP expression was quantified 48 h post-transfection. Data are expressed as a percentage of the production exhibited by the control CMV-IE1 proximal promoter-BPCU (ΔG4) construct (dotted line). Values represent the mean + SD of three independent experiments (n = 3, each performed in triplicate). TSS—transcriptional start site.

G4-containing reporter plasmids were independently transfected into CHO cells, selected as a model mammalian cell line due to it being the dominant cell factory for biopharmaceutical production, and eGFP expression levels were quantified 48 h later. Transfected cells were analyzed using the Countess II FL Automated Cell Counter to confirm consistently high transfection efficiency (>90%) and normal distribution of fluorescence intensity across the population, enabling quantification of relative G4 activities by measuring average protein expression levels. The data show that G4s are functional within the context of the BPCU, where protein expression was substantially downregulated by G4s acting at both the DNA and RNA level, as compared to the unengineered control construct (i.e. BPCU without G4 insertion, Fig. 1C). Indeed, G4 insertion had a negative regulatory effect on eGFP expression in 12 of 13 tested sequence positions, only being non-functional in the DNA position that was furthest from the transcriptional start site. G4s exerted greatest inhibitory effect when located directly upstream of the transcriptional and translational start sites, in both cases reducing eGFP expression by ∼45% compared to the standard BPCU. These findings are consistent with computational analyses showing that endogenous G4s are preferentially located near transcription start sites [46]. Previous studies have found that positional effects on RNA G4 function are context-specific, depending on the 5′ UTR sequence [29, 41]. Ultimately, position-optimization studies should be conducted for any new genetic chassis given that G4 inhibitory activity may be impacted by both the relative location of other regulatory elements, and the nucleotide-composition of flanking sequences either side of the insertion site that can affect G4 formation kinetics [47, 48]. Within the context of the BPCU, we concluded that positions −1 and +6 were optimal locations for DNA and RNA elements, respectively, where inhibitory effects of ∼45% facilitate subsequent design of varying strength G4 components that can titrate protein expression levels both above and below this set-point.

To validate that decreases in eGFP expression were caused by secondary structure-based biosynthesis inhibition, we inserted mutated versions of the c-MYC and NRAS G4 elements into positions −1 and +6 in the BPCU. Mutants were designed to maintain high sequence similarity and equal length to originator elements while substantially decreasing their propensity to form G4 structures (predicted using QGSR mapper [49]—see Supplementary Table S2). As shown in Fig. 1D, mutated motifs reduced eGFP expression by 5%–20% compared to the standard BPCU. A generic insertional effect is unsurprising, as previous studies have shown that (i) mutated G4 motifs have a similar impact in other genetic contexts [50], and (ii) small changes to mammalian core promoters and 5′ UTRs, including the CMV-IE1 core, typically reduce protein expression levels [51]. However, given that wild-type G4 sequences had a substantially greater inhibitory effect than disabled versions containing a minimal number of G-to-A mutations [9.5× (P = < .0001), 2.4× (P = .0005), 2.3× (P = .001), and 2.0× (P = .003) compared to c-MYC mut1, c-MYC mut2, NRAS mut1, and NRAS mut2, respectively), we inferred that these elements were functioning via secondary structure-based interference of transcription and translation processes. Accordingly, we concluded that this genetic architecture (i.e. the BPCU with optimized DNA and RNA G4 insertion points) could be employed to create a synthetic G4 component library for two-level precision control of recombinant protein expression.

Synthetic G4 components facilitate predictable, tunable control of recombinant protein expression

G4s have a simple core structure, consisting of alternating runs of guanine tracts (G-tracts) and separating loops, with four possible key input design parameters, namely loop length, G-tract length, loop nucleotide composition, and G-tract number (Fig. 2A). Using a small number of exemplar elements, it has previously been shown that functional synthetic RNA G4s with different regulatory properties can be created by varying one or more of these design features [41]. Accordingly, we rationalized that a component library capable of precisely tailoring protein expression levels from 0% to 100% relative to the standard BPCU could be derived by designing DNA and RNA G4s with varying input parameter combinations. Such G4 variants should exhibit variable propensities to both form and maintain secondary structures within the standardized BPCU genetic chassis, facilitating development of regulatory elements with different inhibitory activities (i.e. varying abilities for steric hindrance-mediated downregulation of transcription and translation rates). Based on previous studies showing loop nucleotide composition to have a relatively minimal impact on G4 thermal stability [52], we adopted two distinct design strategies, namely (i) keeping G-tract number constant (N = 4) and varying G-tract length (N = 2–6) and loop length (N = 1–5), and (ii) keeping loop length constant (N = 2) and varying G-tract length (N = 2–6) and number (N = 4–7). In both cases we applied a full-factorial design and tested each element at the DNA and RNA level, resulting in a library of 80 distinct G4 components containing loop sequences that maintained equal A:T:C nucleotide ratios within each construct (all synthetic G4 sequences are provided in Supplementary Table S1).

Synthetic DNA and RNA G4 components enable precision tuning of recombinant protein expression levels. (A) Core G4 structure, highlighting four key input design parameters. (B,C) Synthetic G4 components (listed in Supplementary Table S1) with varying sequence feature compositions were inserted into positions −1 (DNA) and +6 (RNA), respectively, in the BPCU (see Fig. 1) downstream of a CMV-IE1 proximal promoter in eGFP-reporter vectors. (D) The loop sequence within G4 motif 13 ((G4N3)3G4) was varied at both the DNA and RNA levels. In panels (B–D), CHO cells were transfected with eGFP-reporter plasmids prior to protein quantification after 48 h. Data are expressed as a percentage of the production exhibited by the control CMV-IE1 proximal promoter-BPCU (ΔG4) construct (dotted line). Values represent the mean + SD of three independent experiments (n = 3, each performed in triplicate). (E) The library of synthetic G4 components encodes a wide range of recombinant protein expression set points; the gray-shaded area represents the generic insertional effects observed when mutated G4s are inserted into the BPCU.

Synthetic G4s were individually inserted in the identified optimal positions in the BPCU (DNA elements in position −1, RNA elements in position +6, Fig. 1) downstream of a CMV-IE1 proximal promoter in eGFP-reporter vectors and transfected into CHO cells. Quantification of eGFP expression levels 48 h post-transfection showed that engineered G4s displayed a wide range of inhibitory activities, driving expression levels between 11%–107% and 4%–90% at the DNA and RNA levels, respectively, compared to the standard BPCU control (Fig. 2B and C). We note that four new mutant variants of varying sizes exhibited the same generic insertional effect as seen previously, reducing eGFP expression by 5%–20% (data not shown). Accordingly, we concluded that only those components that reduced expression by >20%, 57 synthetic G4s in total, were exerting secondary structure-based control of protein biosynthesis. The inhibitory activity of G4 sequences was highly correlated at the DNA and RNA levels (Pearson’s correlation coefficient, r = 0.85, P = < .0001, Supplementary Fig. S1), identifying that discrete designs behave similarly when utilized as transcriptional or translational controllers. As depicted in Fig. 2B and C, G4 performance could be predictably tailored by varying any of the selected input parameters, where inhibitory activities were particularly correlated with increasing G-tract length and number. Indeed, multiple linear regression modeling confirmed that G4 inhibitory activity could be accurately explained as a function of these three features (DNA elements model: R^2^ = 0.85 P = < .0001; RNA elements model: R^2^ = 0.88 P = < .0001). These findings correlate with previous oligonucleotide-based studies that have shown G-tract length, G-tract number, and loop length to be critical determinants of G4 thermal stabilities [41, 52–54]. To our knowledge, this represents the first model linking synthetic G4 sequence feature composition to protein expression regulatory function. We reasoned that the ability to accurately explain the performance of these synthetic components as a function of their sequence parameters will enhance both the confidence with which they can be applied to new applications (i.e. as unknown features are unlikely to cause unpredictable changes to activities) and the ability to (re)engineer future elements with defined characteristics. While the model can accurately predict the regulatory activity of our synthetic G4s as a function of three input design parameters (model summary data shown in Supplementary Fig. S2), full understanding of their performance in cellular environments would require molecular dynamic simulations. In particular, to deploy them predictably in vivo, it would be useful to determine how their ability to form machinery-impeding secondary structures (i.e. the balance of factors including stacking interactions, hydrogen bonding, etc.) is affected by changes in intracellular environment parameters such as ion concentrations [55].

We predicted that loop nucleotide content would have a negligible impact on G4 performance [52] and accordingly that these “non-functional sections” may provide opportunities to achieve other design objectives such as enhanced sequence efficiency (e.g. by incorporating other regulatory motifs), tailored immunostimulatory effects [56, 57], and optimized GC-content [58]. To evaluate this, we varied loop sequence within a standard G4 architecture (motif 13 - (G_4_N_3_)3_G_4), where all 27 possible 3-nucleotide permutations (using A, T, and C nucleotides only) were individually tested (e.g. ATC: GGGATCGGGATCGGGATCGGG). Quantitative evaluation in CHO cells confirmed that loop composition generally has minimal impact on component inhibitory activity, where 17 of 27 DNA and 23 of 27 RNA elements drove eGFP expression levels within one standard deviation of the mean (Fig. 2D). Notably, the three motifs that had the greatest impact on eGFP levels, RNA loops of AAU, UAU, and CAU, all introduced alternative start codons, highlighting the requirement for multi-parameter/objective optimization approaches when designing new synthetic nucleotide parts [59]. Discounting these three sequences (and any components that reduced eGFP expression by <20% relative to the standard BPCU) based on the assumption that they encode undesirable variation in translational start sites, our final library comprises 101 distinct regulatory elements (by far the largest library of synthetic G4 controllers described to date) that facilitate predictable two-level precision control of recombinant protein expression over two orders of magnitude within a standardized, industrially relevant genetic chassis (Fig. 2E).

Mechanistic dissection of synthetic G4 component functionality and performance

To facilitate their adoption in bioindustrial applications, we next sought to obtain detailed mechanistic information on the function and performance of our synthetic G4 components. First, we validated their ability to form secondary structures by CD spectroscopy, a standard technique for G4 structure analysis [60]. DNA G4 components with varying inhibitory activities (see Fig. 2) were analyzed to reveal their structural conformations, namely DNA13, DNA15, DNA30, and DNA40. Henceforth, all synthetic G4 components will be named according to the nucleic acid level at which they operate and the REUs they encode, e.g. DNA.70REU (DNA13), DNA.50REU (DNA15), DNA.30REU (DNA30), and DNA.10REU (DNA40; see Supplementary Table S1 for all G4 nucleic acid sequences and names). As shown in Fig. 3A, all four components produced classical distinctive G4 CD spectra [61], where the mutated control sequence did not show characteristic G4 peaks and troughs. We note that G4s with the highest inhibitory activities (DNA.10REU and DNA.30REU) formed parallel structures (positive peak at 260 nm, negative peak at 240 nm), while medium (DNA.50REU) and low (DNA.70REU) strength components adopted hybrid and anti-parallel (positive peaks at 295 and 240 nm, negative peak at 260 nm) conformations, respectively [62]; the mutated sequence displayed the spectrum typical of unstructured single-stranded DNA [63]. To further assess the stability of the G4 with the strongest inhibitory activity (DNA.10REU), we profiled its conformation in an extended range of salt conditions (Fig. 3B) and performed a melting temperature analysis (Supplementary Fig. S3), which together confirmed this element forms a highly stable, consistent secondary structure. Based on these data and our previous finding that regulatory activities were correlated with sequence feature compositions, we concluded that synthetic components’ relative performances were underpinned by their varying propensities to form and maintain G4 structures. Although we note that some in vivo applications, such as gene therapy, may require a more detailed analysis of intracellular secondary structure formation dynamics using G4-specific fluorescent probes [64] or antibodies [65].

Synthetic G4s function predictably when deployed in combinations, mRNA molecules, and stable expression systems. CD spectra of (A) DNA oligonucleotides containing varying G4 motifs (all G4 motif sequences listed in Supplementary Table S1) and (B) DNA.10REU in varying salt conditions. CHO cells were transiently transfected with (C) synthetic eGFP-encoding mRNA molecules harboring different RNA G4 components,and (D) eGFP reporter plasmids containing varying combinations of DNA and RNA G4 elements. eGFP levels were quantified 48 h post-transfection and compared to those driven by (C) the same G4s in a DNA plasmid context and (D) predicted values, assuming constituent G4 pairs act synergistically (see Fig. 2). (E) CHO cells were stably transfected with G4-reporter plasmids coexpressing a puromycin selection marker gene. Stable cell pools were created and subsequently exposed to varying chemical ligands for 48 h prior to eGFP quantification. (F) Correlation between the regulatory activity of 24 synthetic G4 components in transient and stable expression systems (stable data represent cell pools measured in exponential phase growth). In panels (C) and (D), data are expressed as a percentage of the production exhibited by the control CMV-IE1 proximal promoter-BPCU (ΔG4) construct (dotted line). Values represent the mean + SD of three independent experiments (n = 3, each performed in triplicate). In panel (E), data are expressed as a percentage of the production achieved in control media (Δligand, containing DMSO or water), normalized to the effect of ligand addition on G4 mutant (white bar). Values represent the mean + SD of three independent experiments (n = 3, each performed in duplicate). In panel (F), data are expressed as a percentage of the production achieved using the control BPCU construct, values represent a single cell pool, and linear regression was conducted excluding the single outlier identified by analysis of standardized residuals (red outline).

Given the dramatic rise in applications utilizing synthetic mRNA, including protein replacement therapy, cancer immunotherapy, and vaccines against infectious diseases [4, 66], we next sought to validate that our RNA components display predictable performance when employed directly in mRNA molecular contexts. eGFP-mRNA molecules harboring RNA elements of varying inhibitory strengths were produced by IVT reactions, where purified transcripts were identical to those created intracellularly from DNA reporter plasmids (i.e. comprising the AZ 5′ UTR containing a single G4 element in position +6, upstream of the eGFP CDS; see Fig. 1). Synthetic mRNAs were transfected into CHO cells prior to quantification of eGFP protein levels after 48 h. These data show that synthetic element performance in DNA vector and mRNA transcript contexts is highly correlated (Pearson’s correlation coefficient, r = 0.99, P = .008; Supplementary Fig. S4), where RNA.90REU, RNA.70REU, RNA.50REU, and RNA35.REU drove eGFP expression levels of 75%, 55%, 35%, and 25%, respectively, relative to the unengineered control (Fig. 3C). Accordingly, synthetic RNA G4 components from the library can be utilized predictably to precisely control recombinant protein production in both plasmid DNA (e.g. biopharmaceutical production) and synthetic mRNA (e.g. cancer vaccines) molecular formats. We note that we were unable to synthesize mRNAs containing element RNA.5REU, presumably due to secondary structure-based effects, and therefore use of very strong G4s (i.e. REUs < 20) may require optimization of mRNA manufacturing processes.

Synthetic mammalian transcriptional and translational control elements are typically developed in isolation, where a lack of standardization, such as the use of varying genetic architectures and test systems, complicates efforts to use them in combination [67, 68]. G4-based regulatory controllers operating within a standardized BPCU should permit predictable, simultaneous titration of transcription and translation rates via selection of appropriate DNA and RNA parts. To test this, we created constructs containing all possible combinations of exemplar high, medium, and low strength DNA (DNA.20REU, DNA.45REU, and DNA.75REU) and RNA (RNA.25REU, RNA.50REU, and RNA.80REU) components. As shown in Fig. 3D, dual-controllers drove eGFP expression levels in CHO cells that were highly correlated with predicted values (r = 0.96, P = < .0001; Supplementary Fig. S5), assuming synergistic effects of constituent elements (see “Materials and methods” section). Indeed, all observed activities were within ±12 REU of expected outputs, indicating that composite parts functioned as expected to reduce the rate of both recombinant gene transcription and recombinant mRNA translation. We therefore concluded that DNA and RNA elements from the library can be predictably combined to achieve user-defined, stringent two-level control of recombinant protein expression, which may be particularly advantageous for applications that necessitate product biosynthesis rates to be maintained within defined limits, such as manufacture of toxic molecules.

We next evaluated the potential to convert our G4-based controllers into inducible expression systems. Many endogenous G4 sequences interact with chemical ligands that can destabilize or stabilize their structure to increase (ON systems) or decrease (OFF systems) protein expression levels [69, 70]. Such genetic switches facilitate sophisticated control strategies, including the ability to enact data- and process-responsive titration of system outputs [71, 72]. We tested four well-known G4 ligands (Phen-DC3, Pyridostatin, TMPyP4, and 360A [70, 73]) in combination with seven distinct DNA elements that were selected to (i) have varying feature compositions, based on findings that ligands may have specific binding preferences [74, 75], and (ii) exhibit medium to low strength (30–70 REU), to permit observation of up- and down-regulatory effects. These 28 expression systems were evaluated in an industrially relevant context, namely stable protein production in CHO cells, where there is a design objective to switch expression “ON” once maximum cell densities are reached in order to decouple growth and production phases during biomanufacturing processes [76]. As shown in Fig. 3E, eGFP expression levels driven by synthetic G4 components (DNA.30REU, DNA.50REU, DNA.50REU.2, DNA.70REU, DNA.70REU.2 DNA.70REU.3, and DNA.70REU.4) were not significantly affected by any of the tested ligands. Further tests demonstrated that RNA G4 component activity was similarly unchanged by addition of these chemicals (data not shown). Failure to create functional ON and OFF switches may be due to ligand’s inabilities to interact with unnatural synthetic G4 sequences. Alternatively, maximum useable chemical concentrations (not reducing cell viability by >10%) may have been insufficient to significantly regulate the artificially large number of G4 motifs present per cell, where stably transfected CHO cells typically contain a high number of integrated gene copies [77, 78]. These findings, coupled with broad-acting ligands potentially causing off-target effects on host cell function by binding to multiple endogenous structures, indicate that development of inducible systems for bioindustrial contexts will require design of bespoke, highly specific synthetic G4 modulators [75, 79]. Such genetic switches would also have potential use in gene therapy by facilitating dynamic titration of therapeutic protein levels in response to patient data profiles.

Finally, to evaluate the ability to predictably deploy synthetic G4 components in the industrially-relevant context of stable producer cell lines, we compared their relative activities in transient and stable expression systems. Twenty-four elements, including 12 DNA and 12 RNA components of varying strength, were used to create stable eGFP-producing pools by selecting cells with genomically integrated expression cassettes using puromycin. As shown in Fig. 3F, observed activities in a stable expression context were highly correlated with those seen in transient systems (Pearson’s correlation coefficient, r = 0.72; r = 0.86 with single outlier removed). We concluded that G4 elements behave similarly when positioned in episomal and genomic DNA settings, as previously observed [80], enabling their use in stable biopharmaceutical production systems. However, further work will be required to confirm compatibility with individual industrial platforms, given the wide variety of genome integration and clonal selection procedures employed [81].

Validating synthetic G4 component performance for bioindustrial applications

As G4 controllers function via interference with the general cellular biosynthetic machinery, rather than site-specific regulators such as transcription factors, they should exhibit modular performances where activities are not affected by the chosen mammalian cell host. Indeed, the ability to predictably deploy these components in new bioindustrial cell contexts is a potential key advantage of this technology, particularly for in vivo therapeutic applications, which may target any one of hundreds of distinct human cell types [82, 83]. To test this, we profiled the performance of our G4 elements in two additional bioindustrially-relevant cellular environments, namely (i) human embryonic kidney (HEK) cells, a commonly utilized biomanufacturing host for recombinant protein and viral vector production [84], and (ii) HepG2 cells, a model cell line representing a common target tissue for gene therapy [85, 86].

Reporter plasmids harboring DNA and RNA G4 components of varying activities were individually transfected into each cell type and eGFP expression was quantified 48 h later. As illustrated in Fig. 4A, DNA and RNA components performed as expected in both new cell contexts, each driving five distinct predictable levels of protein production ranging from very low to high. Indeed, previously observed activities in CHO cells were highly correlated with those seen in HEK (Pearson’s correlation coefficient, r = 0.93, P = < .0001; Supplementary Fig. S6A) and HepG2 (r = 0.97, P = < .0001; Supplementary Fig. S6B). We note that although discrete encoded protein production levels were maintained, construct inhibitory activities were generally slightly weaker in these cell types, as compared to CHO cells, resulting in higher mean expression outputs in both HEK (+7 REU) and HepG2 (+17 REU). For example, RNA components RNA.65REU, RNA.40REU, RNA.15REU, and RNA.5REU drove expression levels of 84%, 57%, 42%, and 16% in HEK and 91%, 57%, 31%, and 15% in HepG2, relative to the unengineered control. This aligns with previous studies showing relative G4 regulatory activities are maintained between cell lines, but overall suppressive effects are somewhat different [41], potentially due to varying ion compositions [87, 88]. While G4s were able to predictably tailor protein expression in both cell types, the maximum inhibitory activity observed was lower in HepG2 (15% relative to control) and HEK (11%) than CHO (5%), potentially due to varying transcriptional/translational machinery compositions [89], indicating that stronger G4s may need to be employed in certain cellular contexts. Nevertheless, our findings confirm that these synthetic G4s can be predictably utilized as “off-the-shelf” components to optimally titrate recombinant protein expression levels in new cell types of interest.

Synthetic G4 components predictably titrate recombinant protein expression levels in different cell and product contexts. (A) eGFP-reporter plasmids containing DNA or RNA G4-components with varying activities were transfected into CHO, HEK, and HepG2 cells, prior to protein quantification 48 h later. (B) CHO cells were transfected with reporter plasmids encoding eGFP or SEAP production under the control of varying-strength DNA or RNA G4 elements. Data are expressed as a percentage of the production exhibited by the control CMV-IE1 proximal promoter-BPCU (ΔG4) construct (dotted line). Values represent the mean + SD of three independent experiments (n = 3, each performed in triplicate).

We next evaluated the ability to utilize synthetic G4 constructs to routinely tailor expression of new target recombinant proteins, using SEAP as an exemplar secreted, glycosylated molecule. Reporter vectors encoding SEAP expression driven by G4 controllers of varying strength were transfected into CHO cells prior to product quantification after 48 h. SEAP outputs from G4 elements were highly correlated with expected values (r = 0.91, P = .0002; Supplementary Fig. S7), although product expression levels relative to the standard BPCU were higher than those observed for eGFP (Fig. 4B). The latter observation is likely due to product-specific biosynthetic processing kinetics (i.e. impact of post-translational processing steps), where relative mature protein levels do not directly correlate with relative mRNA transcript and nascent polypeptide quantities [90–92]. Accordingly, where very low protein levels are required (i.e. <10% of that driven by the standard expression cassette lacking a G4 element), we recommend creating vector-optimization spaces that include dual-controllers encoding minimal expression outputs (e.g. DNA.10REU and RNA.5REU utilized in combination, see Fig. 3D). However, given the observed overall increase in SEAP expression levels relative to those achieved with eGFP, data using a wider range of sequence-diverse proteins will likely be required to finalize panels of G4-containing vectors that can predictably optimize product transcription/translation rates. Nevertheless, the selected G4 component mini library performed as designed by facilitating six discrete expression set points for a new protein of interest (i.e. 20, 30, 50, 70, 85, and 100 REUs), validating their product-agnostic functionality.

Finally, we incorporated our G4 controller technology into a current high-value bioindustrial process, namely cell line development for protein biopharmaceutical production, which involves derivation of optimized CHO cell host DNA expression vector combinations. The dominant class of biopharmaceuticals, monoclonal antibodies (mAb), consists of separate heavy chain (HC) and light chain (LC) polypeptides, where it is well known that the optimal HC:LC expression ratio is highly product specific, ranging from 1:1 to 1:10 depending on antibody folding and assembly dynamics [93–95]. We utilized four DNA G4 components of varying strength, DNA.REU70, DNA.REU50, DNA.REU30, and DNA.REU10, alongside the unengineered BPCU construct (i.e. 100 REU) to independently control LC and HC expression for an exemplar AZ mAb (mAbT; Fig. 5A). All possible component combinations were tested in transient fed-batch production processes, where the 25 unique encoded HC:LC expression ratios facilitated a wide range of mAbT titers (Fig. 5B). As illustrated in Fig. 5C, rapidly testing a standardized vector solution space revealed that (i) an encoded HC:LC expression ratio of ∼1:2 enabled highest product titer, (ii) mAbT yields were positively correlated with increasing LC quantities, and (iii) HC levels needed to be precisely controlled to achieve an optimal set-point, presumably due to excess intracellular HC causing adverse effects on cell factory performance [93, 96]. While polypeptide-specific processing dynamics will likely result in variable absolute expression set points for each G4 component pair when combined with new test mAbs, our previous data (see Fig. 4) indicate that this toolbox approach will routinely encode a wide range of predictable, relevant HC:LC ratios to facilitate expression cassette optimization. We therefore concluded that a G4 element-based screening platform could derive improved expression vector designs for new proteins and could also be utilized as a diagnostic tool to interrogate the cell’s relative ability to synthesize a given polypeptide, guiding product-specific cell engineering solutions [97–99]. We anticipate that this technology will be particularly useful for the development of difficult-to-express proteins, where suboptimal polypeptide biosynthesis kinetics dramatically restricts product yield and quality, and next-generation molecules that require expression of ≥3 composite chains to be precisely stoichiometrically balanced [100–102]. We note that titer optimization is not the only objective for biopharmaceutical expression vector design, and future work will focus on evaluating the ability of this platform to enhance product quality characteristics, such as aggregation and glycosylation profiles, in industrially relevant scaled-up manufacturing processes [103].

Synthetic G4 toolboxes facilitate simplified optimization of polypeptide expression ratios for multichain protein products. (A) Vector design platform to optimize polypeptide expression ratios for multichain biopharmaceutical products. (B) CHO cells were co-transfected with separate plasmids (at an equimolar ratio) encoding mAbT HC) and LC expression under the control of varying-strength G4 components (sequences listed in Supplementary Table S1). Cells were incubated for 5 days, including addition of feed 48 h post-transfection, prior to quantification of mAb titer. Data are shown as a percentage of the production achieved using the unengineered BPCU (ΔG4) construct to drive expression of both HC and LC (dotted line). Values represent the mean + SD of three independent experiments (n = 3, each performed in triplicate). (C) Vector solution space identifying mAbT product-specific design rules.

Conclusion

The synthetic G4 components presented in this study can be used predictably to precisely tune mammalian recombinant protein expression levels. Specifically created to function within a bioindustry-compatible genetic chassis, these elements can provide simple off-the-shelf solutions to current protein production challenges. Shown to reliably encode a wide range of discrete protein biosynthesis set points in both CHO and HEK cells, they can be harnessed to (i) optimize polypeptide expression ratios for difficult-to-express multi-chain molecules, such as trispecific antibodies, and adeno-associated viral vectors, and (ii) enact complex multigene cell factory engineering strategies [5, 102, 104]. We have validated a toolbox approach to achieve this, employing G4s with varying regulatory activities; however, given polypeptide-specific processing dynamics, the element composition of vector design platforms may need to be refined dependent on individual applications or product portfolios. Having demonstrated the technology’s performance predominantly for transient production contexts [105], we note that implementation in stable expression systems will require more comprehensive validation in relevant industrial manufacturing platforms (e.g. using standard genome integration methodologies, clonal selection procedures, etc [81].

Demonstrated to titrate protein expression rates predictably in varying molecular formats (i.e. mRNA and DNA) and mammalian cell types, synthetic G4s could be deployed in gene therapy applications to coordinate therapeutic efficacy with product safety by driving optimized expression levels that achieve intended medical outcomes without causing off-target effects on cell behavior [106]. Moreover, for the growing number of products that target multigenic disorders, they could be used to precisely balance the intracellular levels of multiple therapeutic proteins in order to achieve required, physiologically relevant expression ratios [107]. Given that in in vivo applications, genetic regulatory elements become “part of the product” introduced into patients, their successful incorporation into such therapeutics will require extensive testing in animal models. It will also be beneficial to characterize the long-term interaction between G4 elements and target host cells to assess potential off-target effects. While the components were designed to function via steric hindrance-based mechanisms, ChIP-seq studies would be useful to quantitatively profile their ability to bind to and potentially sequester host cell proteins [108]. Transcriptomic and proteomic analyses would confirm if impeding movement of key transcriptional and translational machinery leads to unintended changes in global gene expression patterns, and comet assays could be employed to assess whether secondary structures induce local DNA damage, e.g. by causing breakage of DNA strands [109, 110]. Depending on the intended application, G4-host cell relationships may need to be studied in extended animal studies or scaled-up bioreactor processes. Finally, to deploy this technology predictably across the wide range of different potential host cells, it would be beneficial to fully characterize how synthetic G4 structures are affected by varying cellular parameters, e.g. by conducting CD analysis in a wide range of salt conditions [111] and performing molecular dynamic simulations [112].

As they enable simultaneous precision control of both transcription and translation rates, G4 components can also be utilized in combination with other gene expression modulation technologies, including cell-type-specific promoters [11] and exogenous additives that act as ON or OFF switches [113]. Indeed, the ability to enact two-level protein expression control with a single motif (i.e. using the same G4 sequence in the core promoter and 5′ UTR) provides an ideal basis for constructing non-leaky inducible expression systems that can coordinate product biosynthesis kinetics with bioreactor processes, cell line development workflows, and patient requirements [76, 114, 115]. However, this will require the design of cognate regulatory-compliant chemical ligands that exhibit highly G4-specific stabilization or destabilization activities without inducing off-target effects that disrupt other required host cell functionalities [116]. Indeed, the absence of such ligands is a primary obstacle currently limiting the full potential of this technology for in vivo applications. Identifying optimal chemicals is challenging given the complex design criteria required, including the need to (i) develop/identify manufacturing processes that can produce the ligand at requisite quality, and (ii) comprehensively determine their activity and safety profiles in animal models. However, the initial obstacle of designing bespoke highly-specific ligands will be aided by recent progress in this area, including CRISPR-guided techniques [117] and methods that conjugate chemicals with oligonucleotides recognizing sequences adjacent to target G4s [79]. In conclusion, the synthetic genetic elements introduced in this study provide a new toolkit to precisely tailor mammalian recombinant protein expression in high-value bioindustrial applications, where future work will focus on the development of next-generation small molecule-inducible ON and OFF switches.

Supplementary Material

gkaf732_Supplemental_File

Bibliography117

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Brown AJ, Gibson SJ, Hatton D et al. Whole synthetic pathway engineering of recombinant protein production. Nat Cell Biol. 2019; 116:1212–24.10.1002/bit.26855.30345502 · doi ↗ · pubmed ↗
2Wang TY, Guo X Expression vector cassette engineering for recombinant therapeutic production in mammalian cell systems. Appl Microbiol Biotechnol. 2020; 104:5673–88.10.1007/s 00253-020-10640-w.32372203 · doi ↗ · pubmed ↗
3Poletti V, Mavilio F Designing lentiviral vectors for gene therapy of genetic diseases. Viruses. 2021; 13:152610.3390/v 13081526.34452394 PMC 8402868 · doi ↗ · pubmed ↗
4Qin S, Tang X, Chen Y et al. m RNA-based therapeutics: powerful and versatile tools to combat diseases. Signal Transduct Target Ther. 2022; 7:166.35597779 10.1038/s 41392-022-01007-w PMC 9123296 · doi ↗ · pubmed ↗
5Eisenhut P, Marx N, Borsi G et al. Manipulating gene expression levels in mammalian cell factories: an outline of synthetic molecular toolboxes to achieve multiplexed control. N Biotechnol. 2024; 79:1–19.10.1016/j.nbt.2023.11.003.38040288 · doi ↗ · pubmed ↗
6Cring MR, Sheffield VC Gene therapy and gene correction: targets, progress, and challenges for treating human diseases. Gene Ther. 2022; 29:3–12.10.1038/s 41434-020-00197-8.33037407 · doi ↗ · pubmed ↗
7Sou SN, Harris CL, Williams R et al. CHO synthetic promoters improve expression and product quality of biotherapeutic proteins. Biotechnol Prog. 2023; 39:e 334810.1002/btpr.3348.37114854 · doi ↗ · pubmed ↗
8Li ZM, Fan ZL, Wang XY et al. Factors affecting the expression of recombinant protein and improvement strategies in Chinese hamster ovary cells. Front Bioeng Biotechnol. 2022; 10:880155.35860329 10.3389/fbioe.2022.880155 PMC 9289362 · doi ↗ · pubmed ↗