The Coli Toolkit (CTK): An Extension of the Modular Yeast Toolkit for Use in E. coli
Jacob Mejlsted, Erik Kubaczka, Sebastian Wirth, Heinz Koeppl

TL;DR
The Coli Toolkit (CTK) is a new cloning system for E. coli that improves the speed and efficiency of building genetic circuits.
Contribution
CTK adapts the Yeast Toolkit for E. coli with modular parts and calibrated models for rapid circuit assembly.
Findings
CTK introduces subdivided promoter parts for better control of transcription and translation.
Characterization data and calibrated models are provided for 20 NOT gates from the Cello library.
Preassembled parts enable faster construction of complex genetic circuits in E. coli.
Abstract
Genetic circuits are a cornerstone of synthetic biology, enabling programmable control of cellular behavior for applications in health, sustainability, and biotechnology. While Genetic Design Automation (GDA) tools have optimized and streamlined the design of such circuits, rapid and efficient assembly of DNA remains a bottleneck in the Design-Build-Test-Learn (DBTL) cycle. Here, we present the Coli Toolkit (CTK), a modular Golden Gate-based cloning system, adapted from the Yeast Toolkit (YTK) for use in Escherichia coli. The CTK expands on the original YTK architecture by introducing a more flexible control of transcription and translation through subdividing the former promoter part into subparts: promoter, insulating ribozyme, and ribosome binding site (RBS). We provide a range of basic parts that enable the assembly of a wide range of constructs as well as characterization data for…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
1
2
3
4
5
6- —European Research Council10.13039/501100000781
- —Deutscher Akademischer Austauschdienst10.13039/501100001655
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsRNA and protein synthesis mechanisms · Bacterial Genetics and Biotechnology · Gene Regulatory Network Analysis
Introduction
Synthetic biology aims to apply engineering principles and frameworks to living materials such as organisms and cells. One aspect of this is genetic circuits, which have great potential to address many of the challenges that humankind faces, ranging from human health through diagnostics and treatments, to sustainability through adaptability and responsiveness. ?,? Within the design-build-test-learn (DBTL) cycle, Genetic Design Automation (GDA) tools like Cello ?,? and ARCTIC ?−? ? have revolutionized the design stage through the determination of optimal circuit configurations based on experimental characterization data. However, the limiting step in the DBTL cycle has moved to the build stage, where the DNA encoding these circuits needs to be assembled. Standardized toolkits can enable fast, standardized, and reproducible assembly of complex functions from basic parts.? These beneficial aspects have already been shown for other subfields within synthetic biology, like metabolic engineering, but are still lacking for genetic circuits.?
The Gram-negative bacterium Escherichia coli is one of the most used organisms within synthetic biology, both for basic research and for various applications. It is therefore natural that multiple cloning standards already exist for it, including MoClo,? CIDAR MoClo,? GoldenBraid, ?−? ? and EcoFlex,? among others. However, to achieve predictable circuit function, research has shown that insulating ribozymes between the promoter and RBS is essential.? The aforementioned cloning standards for E. coli do not have the ability to include an insulating ribozyme between promoter and RBS without drastically reshaping the architecture and thereby removing backward-compatibility. An alternative is the highly efficient and well-characterized MoClo-Yeast Toolkit (YTK).? As the name states, this was originally developed for the yeast Saccharomyces cerevisiae, but has since been expanded by various research groups to include functions such as multiplex integration (MYT),? GPCR sensors,? optogenetics (yOTK),? polycistronic expression,? and secretion and display.? Additionally, the YTK has also been the basis for toolkits for other organisms such as for fission yeast, Schizosaccharomyces pombe (POMBOX),? Kluyveromyces marxianus,? Pichia pastoris (now Kromatogella phaffi),? Candida glabrata,? mammalian cells (MTK),? and in the bee gut microbiome (BTK).?
The YTK is a hierarchical cloning system in the style of MoClo. It starts with basic plasmids (level 0) that each carry a part with an abstract biological function, such as “promoter” or “CDS”. Each basic plasmid has a part type, numbered 1–8, with optional subtypes that determine their order. What determines the part type is the 4-base pair overhang that gets produced when the part is cut with a BsaI restriction enzyme. For example, if a part has AACG as the overhang on the 5′ end, and TATG on the 3′ end, this will be a type 2 part. The basic parts can be assembled into cassette plasmids (level 1) that typically contain one transcriptional unit. All eight types have to be present in order to assemble a functional cassette plasmid. However, to reduce the amount of separate DNA fragments cloned together in one reaction, combination parts can be used. For example, most cloning backbones are Type 678, which means that they have the type 6 overhang at the 5′ end and type 8 overhang at the 3′ end. The level 1 cassette plasmids can subsequently be combined to make multigene plasmids (level 2). The order of the different transcriptional units in the multigene plasmid is determined by the assembly connectors (type 1 and type 5) from the previous level. Up to 10 transcriptional units can be assembled together using the existing assembly connector parts. ?,?
The main advantages of the YTK over many other MoClo assembly standards for bacteria are 3-fold: First, the cloning overhangs in the YTK are very orthogonal to each other ?,? (Supplementary Figure S1). This enables the assembly of more fragments in one reaction, which, in turn, can increase the customizability without compromising the cloning efficiency. Second, the YTK contains not only well-characterized parts but also robust support for cloning infrastructure. Connectors, origins of replication, and resistance markers exist both as individual parts, but also as combination parts to ease assemblies. ?,? Third, the ability to have both split parts (like types 3a and 3b), but also combination parts greatly increase the flexibility of the system and allows for design outside the scope of the original toolkit while maintaining backward-compatibility and the general integrity of the system.
Here, we introduce the Coli Toolkit (CTK), a toolkit for work in prokaryotes that extends the existing infrastructure of the YTK and allows synthetic biologists to more easily work with genetic circuits in E. coli. To extend the capabilities of the toolkit into E. coli, we split the type 2 promoter part into four subparts, which allows for independent choice of promoter(s), insulating ribozyme, and RBS. The insulating ribozymes are important for proper functioning of the genetic circuits by standardizing the 5′ UTR, thus generating a predictable outcome.? To improve the cost efficiency of DNA synthesis, we provide software to cluster smaller DNA fragments into synthesis optimized sequences. The CTK comprises 156 parts, including 45 promoters, 12 ribozyme insulators, 21 RBSs, 20 CDSs, and 15 transcriptional terminators. Furthermore, we also provide 16 level 0 backbones for cloning and characterization with various combinations of origins, resistances, and counter-screenable markers. Additionally, CTK also includes 25 combination parts that enable fast and efficient cloning, of which 20 are characterized NOT gates ready to use in larger genetic logic circuits. The CTK is made available on Addgene.
Results
Design of the
Toolkit
The Coli ToolKit (CTK) is based on the YTK, with certain additions that make it possible to use it in the context of a prokaryotic host such as E. coli. As mentioned above, the YTK works by having eight different types of parts, numbered 1–8, which each have an abstracted function. For example, type 1 parts are assembly connectors, type 2 are promoters, and type 3 are coding sequences. To adapt this system to making genetic circuits in E. coli, the type 2 was recontextualized from only being a promoter, to transcriptional and translational control, and subsequently split into multiple subparts (Figure). Now, types 2a and 2b are promoter parts, type 2c is a ribozyme insulator part, and type 2d is an RBS part. For the promoters, it is possible to use tandem promoters with one promoter in type 2a and one in type 2b (FigureA), or using a single promoter in a type 2ab part (FigureB).
CTK type 2 part subdivision. (A) The configuration for the use of tandem promoters with type 2a and type 2b parts can be seen. (B) The configuration for a single promoter as a type 2ab part can be seen. The type 2c and 2d parts stay consistent independently of whether single or tandem promoters are used.
When splitting one part into multiple parts, suitable overhangs need to be selected to maintain high cloning efficiency. The NEB Ligase Fidelity tool ?,?,? was used to select additional overhangs that do not conflict with the existing 4-base pair overhangs employed in the YTK. The new overhangs can be seen in Figure and in Supplementary Figure S1.
Parts
Overview
Access to a wide range of parts is essential to a toolkit’s utility. A total of 156 basic and combination parts is therefore included in this toolkit, and an overview of them can be seen in Figure. These basic parts include promoters, ribozyme insulators, RBSs, fluorescent proteins, transcriptional repressors, and terminators. Many parts are sourced from the Cello collection? and the Anderson promoter library? and adapted to the context of the CTK. For the promoters, there are options for both tandem promoters and single promoters. Three fluorescent proteins and two transcriptional terminators are also supplied in type 3a and type 4b, respectively. Type 3a parts enable transcriptional fusions, here tagging with fluorescent proteins in the N-terminal of the CDS in the type 3b part. When a C-terminal tag is used instead, the tag can be placed as a type 4a part, therefore needing terminators in type 4b.? In the toolkit, there are no type 3b or 4a parts, as these are project-specific coding sequences and therefore not included in the toolkit.
Overview of the level 0 parts in the Coli Toolkit. Each part type has unique upstream and downstream overhangs, so that a full plasmid can only be assembled when all types from 1 to 8 are present. The figure shows both basic parts and combination parts that facilitate easier cloning. The newly provided backbones are not shown in this figure. Assembly connectors (type 1 and 5) and protein tags (types 3b and 4a) are supplied in the YTK (highlighted in gray). A table of all plasmids, including their type, description, and resistance gene can be seen in Supplementary Table S1.
In addition to the basic parts, CTK also contains combination parts, which are level 0 plasmids that contain multiple functional elements. These can increase the efficiency of cloning by reducing the number of parts needed for a cloning. For example, pCTK130 has the J23101 promoter (pCTK029), RiboJ ribozyme (pCTK049), and B0064 RBS (pCTK058) all in one part. Additionally, all Cello repressor units, excluding the promoter, are also included in the collection as type 2cd34 parts (pCTK134-pCTK153). With this, it is easier to build new genetic circuits as only the promoter needs to be chosen. An overview of these combination parts can also be seen in Figure.
Several backbone plasmids of type 678 have also been added to the toolbox. These 16 new backbones contain either an ampicillin, spectinomycin, or kanamycin resistance gene together with various origins of replication in low, medium, or high copy number. To improve on the original YTK, each cloning backbone exists in two variants: one with a GFP dropout and one with an RFP dropout, indicated by [GFP] and [RFP], respectively (Figure). This allows for easy screening of all transcriptional units and also those that express GFP themselves. An RFP dropout (pCTK156) has also been included to enable the cloning of level 1 backbone plasmids for multigene plasmid assembly with an RFP dropout. Furthermore, some of the new backbones have repressors that regulate the input promoters (LacI regulating PTac, AraC regulating PBAD, and TetR regulating PTet). By having these regulators as part of the backbone, cloning and characterization can be streamlined, as the needed regulators are already accounted for.
The full list of all plasmids can be seen in Supplementary Table S1. All part plasmids, including basic parts, combination parts, and backbone plasmids, will be available through Addgene (plasmid nos. 248364-248519).
Clustering of De
Novo Synthesized Parts
Adding new parts to a toolkit, such as CTK, is most commonly achieved by PCR or through de novo DNA synthesis. In the case of DNA synthesis, price is often a limiting factor, which means increases in efficiency are welcome.
When ordering DNA fragments from various synthesis providers, different requirements are set. During this work, the de novo DNA synthesis provider we ordered with, required a size of the gene fragments of at least 300 bases. Smaller parts like E. coli promoters, RBSs, ribozymes, and terminators are often shorter than 300 bases, even with the added overhangs for cloning into the entry plasmid, and therefore cannot be synthesized as-is. A naïve approach would be to add random DNA at the end to pad the sequence to 300 bases before ordering, but that wastes DNA synthesis potential (FigureA). Instead, if multiple smaller sequences are concatenated, more parts can be ordered for the same price. To clone the parts into the entry vector without ambiguity, we exchange the BsmBI cut site with other type IIs restriction enzymes, in our case BbsI and BspMI (while maintaining the overhangs for cloning into the entry vector). From a concatenated fragment, we can then choose which of the individual parts to clone by using one of the three restriction enzymes. We can thereby package multiple fragments into one synthesis order, thus saving money and synthesis power (FigureB).
Clustering of de novo synthesized parts to increase efficiency. (A) Three individual small DNA parts can be ordered through a synthesis provider by adding filler nucleotides to obtain the minimum length required by the provider. However, this includes wasted filler nucleotides that are simply thrown out. The arrows indicate the cutting direction of the restriction enzyme. (B) Three smaller parts can be concatenated together and have their unloading restriction enzyme altered to minimize the number of nucleotides that need to be synthesized. To avoid errors and repeats, the software clusters the small fragments to avoid similarities, thus increasing the chance of a successful de novo DNA synthesis. The arrows indicate the cutting direction of the restriction enzymes.
However, simply concatenating parts together can cause problems during synthesis due to repeats. To avoid this, one can purposefully group the parts together to avoid similarities, such as concatenating a promoter together with an RBS and a terminator instead of two other promoters.
To streamline and optimize ordering de novo synthesized parts, we have developed a software package. It can take in DNA sequences (with the needed entry adaptors) and group them together in a way that avoids similarities, which decreases costs and increases the chances of successful DNA synthesis reactions. Through our benchmarking, we were able to decrease the costs associated with synthesizing all CTK parts to 46% of the naïve approach (Supplementary Table S5). The software can be accessed on GitHub (See Code and Data Availability), and a benchmark comparing the software to the naïve approach and to random clustering can be seen in the Supplementary Text.
Characterization
of Toolkit Promoters
Constitutive promoters, like the Anderson collection,? offer a constant level of expression that can be useful for applications. In the CTK, the Anderson promoters are included as type 2ab parts for easy use in constructs. All have been characterized in the expression of GFP through the standardized GFP expression combination part (pCTK154). The constitutive promoters have expression strengths stretching across more than 2 orders of magnitude (FigureA). When compared to previous characterized performance, our results generally align well, taking into account different growth conditions and E. coli strains? (Supplementary Table S3). The expression from all constitutive promoters was also tested for the addition of chemical inducers used in this study, and we found that all showed unaffected behavior (FigureB).
Characterization of constitutive promoters. (A) The strength of all Anderson promoters is normalized to relative promoter units (RPUs). Numbers in parentheses indicate the median expression strength of the constitutive promoter in RPU. (B) Promoter strength of Anderson promoters across different chemical inducers normalized to RPU. Error bars indicate the standard error of the mean (SEM) for the triplicate measurements.
Inducible promoters allow for controlled expressions based on the addition or removal of external stimuli. The most common stimulus is the addition of chemical inducers,? but there are many other options like light,? temperature,? and magnetic fields.? To construct inputs for the genetic circuits, we are employing three promoters, which respond to chemicals through their cognate transcription factors: PTac (responds to IPTG), PTet (responds to anhydrotetracycline (aTc)), and PBAD (responds to arabinose). These inducible promotes were sourced from the Cello collection.?
The inducible input promoters were tested with not only their respective chemical inducers (FigureA), but also the other inducers to measure their orthogonality (FigureB). From this, we can see that all input promoters act orthogonally depending on the input of chemical inducers.
Characterization of inducible promoters. (A) Dose–response curve of the three inducible promoters in relative promoter units (RPUs). PTac is induced by IPTG (left), PBAD is induced by l-arabinose (middle), and PTet is induced by aTc (right). Blue points are medians of the three replicates in black. Gray dots are outliers that were not considered further (too high deviation between replicates, see Materials and Methods). Population distributions for all input measurements can be seen in Supplementary Figure S2. Parameters for dose–response curves can be seen in Supplementary Table S4. (B) Cross-reactivity of inducible promoters. Numbers indicate promoter strength in RPU. The inducible promoters activate the expression only when exposed to their cognate chemical inducer. Otherwise, the expression is low.
Characterization of Cello NOT Gates
The basic components of many genetic circuits are NOT and NOR gates based on protein repressors. In the Cello library, there are a total of 20 NOT gates that can be combined in various ways to create the desired genetic circuits. For the GDA software to work, detailed characterization is required for all individual NOT gates.
All 20 NOT gates were assembled from the basic parts in the CTK and subsequently characterized by flow cytometry. Each gate consists of the PTac promoter, driving the expression of the protein repressor in the first transcriptional unit. In the second transcriptional unit, the corresponding promoter is driving the expression of sfGFP (FigureA). For each gate, 12 different concentrations of IPTG were used for characterization (FigureB), from which a Hill curve was fit (FigureC). The fit takes into account the spread and density of the measured populations, which provide a more accurate response function. The parameters of the response functions of all NOT gates can be seen in Supplementary Table S5, and the individual response functions for each NOT gate can be seen in Supplementary Figure S4. The five NOT gates with the highest dynamic range are shown in FigureD to highlight their interoperability in future genetic circuits.
Characterization of NOT gates in the CTK. (A) The structure of the NOT gates have lacI expressed from the plasmid backbone with the NOT gate itself being composed of two transcriptional units. The first is the Repressor protein being expressed from the PTac promoter. In the second transcriptional unit, the corresponding promoter drives the expression of GFP. (B) Individual populations of the S2-SrpR across 12 different concentrations of IPTG with increasing concentrations going to the right. The individual replicates can be seen in yellow, orange, and pink. The larger peak in light blue is the merged population of the replicates. Output normalized to RPU. (C) Fitting of a Hill curve to the median of the merged three biological replicates from separate days. Black points are medians for the individual replicates; blue points are the medians of the merged populations of all replicates. (D) Fitted Hill curves of the five NOT gates with the highest dynamic range, normalized to input in RPU and output in RPU. Cell populations as measured by flow cytometry for all 20 NOT gates can be found in Supplementary Figure S3, and fitted curves for all 20 NOT gates can be found in Supplementary Figure S4.
Conclusions
In this paper, we have presented an expansion of the Yeast Toolkit to prokaryote E. coli. To adapt the Yeast Toolkit to E. coli, we have expanded the scope of the type 2 parts from promoters to the control of transcription and translation. By splitting type 2 parts into four subtypes, promoters, ribozymes, and RBSs can be chosen independently, thus expanding the scope of possible constructs. A toolkit also needs useful parts, so we have characterized a collection of constitutive promoters, ranging 2 orders of magnitude, and a set of chemically inducible input promoters that all show low leakage, a high dynamic range, and no cross-talk between them. Additionally, we characterized all 20 NOT gates from the Cello collection in the context of the CTK. The NOT gates are also provided as combination parts with ribozyme, RBS, coding sequence, and terminator in one part, to facilitate faster cloning of larger genetic circuits. Lastly, we also provide a software package to increase the efficiency of de novo DNA synthesis orders. By using the clustering package, smaller DNA fragments can be grouped together to not waste synthesis potential. This software tool can help users of CTK, and any other Golden Gate toolkit, to greatly decrease their synthesis costs of small DNA parts.
In this work, we have applied CTK to genetic circuits, but it is not limited to only one topic within synthetic biology. Both applied and basic research is being conducted within our research group in E. coli using the CTK. We therefore believe all of these contributions will be useful to the wider synthetic biology community, both for making and using genetic circuits, but also for projects where the modular structure of the CTK can be used to easily and efficiently clone the desired constructs. The CTK is available on Addgene.
Materials
and Methods
Strains and Growth Media
Characterization experiments were performed in DH10B cells (NEB). DH10B and TOP10 cells (ThermoScientific) were used for routine transformations and cloning of plasmids. Routine bacterial growth was performed in LB media (Carl Roth), and characterization experiments were performed in Hi-Def Azure Media (Teknova), supplemented with 1% glucose. All cells were made chemically competent using the Mix & Go! kit (Zymo Research). For antibiotic selection, the following concentrations were used: ampicillin (Amp, 100 μg/mL), kanamycin (Kan, 50 μg/mL), chloramphenicol (Cm, 25 μg/mL), and spectinomycin (Spec, 50 μg/mL).
Cloning of CTK Parts
Creation of level 0 CTK parts was done in three ways: Most parts were synthesized de novo by Twist Biosciences with the appropriate overhangs. For the smaller of the parts, multiple were concatenated within one linear fragment, and the BsmBI cut sites were exchanged with BbsI and BspMI to allow for targeted entry into predigested pYTK001, as described above. Additionally, promoter PlacI was created through oligo annealing, where overhangs were already present, and the fragment could be cloned directly into pYTK001.? Construction of the Type 678 cloning backbones was performed by PCR amplification from pAN1717 and pAN3938,? using a Q5 polymerase (NEB), and was combined by Hi-Fi assembly (NEB) with the BsmBI overhangs and GFP expression unit from pYTK001, using the manufacturers’ instructions. pAN1717 and pAN3938 were gifts from Christopher Voigt (Addgene plasmid nos. 74696 and 74697, respectively).
Cloning of level 1 and level 2 plasmids was performed as described in the original YTK paper.? For assemblies with 4 or more parts, and for assemblies that exhibited low efficiencies, inspiration was taken from the CIDAR MoClo protocol,? where reactions were incubated in a thermocycler for 30–60 cycles of 37 or 42 °C (5 min), depending on whether the assembly uses BsaI or BsmBI, respectively, and 16 °C (5 min), followed by final digestion (37 or 55 °C, 20 min) and enzyme inactivation (80 °C, 10 min). Constructs were checked by colony PCR using DreamTaq PCR (ThermoScientific), and sequences were verified by Sanger sequencing (Microsynth) and Nanopore sequencing (Microsynth).
Clustering
Algorithm for De Novo Synthesis of DNA Parts
The clustering software uses the Levenshtein distance to compute the differences among the various fragments that the user wants to synthesize. Using affinity propagation,? the software defines clusters with high sequence similarity. From this, groups are made of up to three sequences from distinct clusters to obtain a low degree of sequence similarity in the final DNA sequence sent for synthesis. If the aggressive clustering option is selected, groups containing only one sequence are concatenated together to minimize the amount of DNA needed to be synthesized. Following the grouping, the DNA sequences are concatenated, and the restriction sites for BsmBI are exchanged to BbsI and BspMI for the second and third occurrences, respectively, using a search and replace function.
To run the software, download the Python script from Github: https://github.com/Self-Organizing-Systems-TU-Darmstadt/CTK-ColiToolKit, and run it using your preferred Python handler. The input file should be a .csv file with at least one column with the first row being “Name” and one being “Sequence”, containing the name of the fragment and the sequence of the fragment, respectively. If the aggressive clustering is desired, check the box marked “Aggressive clustering (combine singletons)”. After pressing “Run”, the final sequence is then outputted as a .csv file to the same folder from which the input file was chosen from.
Benchmarking was performed by running the sets of basic parts through the clustering software, on both aggressive clustering and normal clustering. The random and naïve groupings were generated with a modified version of the software that bypasses the clustering. All outputs were uploaded to the website of Twist Biosciences, and the ability to synthesize and price was recorded.
Flow Cytometry
Measurements of GFP Expression
To measure the fluorescence of the individual cells, colonies were added to 200 μL of Hi-Def Azure Media (Teknova), supplemented with 1% glucose and appropriate antibiotics in a 96 DeepWell plate (ThermoScientific) and grown overnight at 37 °C and 1000 rpm shaking. The culture was diluted twice by mixing 15 μL of culture to 185 μL of media, to a total dilution of 1:152. To this, inducers were added, and the cultures were incubated at 37 °C for 5 h at 1000 rpm.
The inducers used were IPTG (ThermoScientific), l-arabinose (Carl Roth), and anhydrotetracycline (aTc) (ThermoScientific). For IPTG, the following final concentrations were used: 0, 5, 10, 20, 30, 40, 50, 70, 100, 150, 200, and 1000 μM. For l-arabinose, the following final concentrations were used: 0, 10, 20, 50, 75, 100, 200, 500, 750, 1000, 2500, and 5000 μM. For aTc, the following final concentrations were used: 0, 1, 2, 2.5, 5, 7.5, 10, 15, 20, 50, 100, and 2000 pg/μL. Flow cytometry was performed on a CytoFLEX S cytometer (Beckman Coulter). For the measurements, the culture was diluted 1:20 by adding 10 μL of culture to 190 μL of PBS. The flow cytometer was set to record 3 · 10^5^ cells for all samples.
Flow Cytometry Data Preprocessing and Cleaning
Following acquisition of flow cytometry data, gating and data analysis was performed using a Python script (See Code and Data Availability). In particular, cytometry data was gated with the FlowCal library? by density gating defined on forward scatter area and height to preserve 95% of the cell events.
For data cleaning (outlier detection), we only consider experimental conditions with small deviations between replicates. In particular, we consider the pairwise ratios of the replicates’ median values and pool the data only in case all ratios are smaller than or equal to eight. Otherwise, we discard all three replicates for the respective experimental conditions (a particular inducer concentration). In addition, we consider only replicates with at least 1000 cell events. For later model calibration, we preferred to not introduce any bias and instead discard the experimental condition as a whole as the model curvature can be inferred from neighboring experimental conditions. The ratio of eight proved robust for discarding high deviation replicates and being tolerant to natural deviations. Both procedures were chosen to ensure robust data cleaning.
Further preprocessing includes conversion to relative promoter units (RPUs) to quantify the relative promoter activity in comparison to the reference promoter J23101. ?,? The conversion factor γ to convert raw fluorescence intensity values into RPU was determined by using the following formula.?
Here, [GFP] is the median fluorescence of the sample, [GFP]0 is the median fluorescence of the autofluorescence of the control, and [GFP]* RPU
- is the median fluorescence of the cells containing the reference plasmid with the J23101 promoter. ?,? All raw values then were rescaled by multiplication with γ to yield RPU values.
Model Calibration
To represent the dose–response curves of input sensors and gates analytically, we calibrated models to the median RPU dose–response. In particular, this means that model calibration uses the data set D = {x _ i _, y _ i _}. Here, the either represent the inducer concentrations in case of the input sensors or the corresponding input sensor’s median output in RPU, representing the gate’s input. In both cases, the are the corresponding median outputs in RPU.
To model the dose-response curves, we use the activatory Hill equation defined as
for the input sensors, and the inhibitory Hill equation
for the gates. Parameters y min, y max, and k are in RPU and define the dynamic range of the output (y min and y max) as well as the location of the transition region (k). n is the Hill coefficient and defines the steepness and, in turn, the length of the transition region. θ = (y max, y min, n, and k) represents the model’s parametrization.
The algorithm employed for model calibration is parallel tempering, ?,? a Markov chain Monte Carlo algorithm. Parallel tempering uses Markov chains at different temperatures to draw samples from the posterior distribution. This allows one to explore multimodal distributions through sample exchange between chains, and was successfully applied to model calibration of chemical reaction networks previously.? As the experimental values as well as the parameters θ span multiple orders of magnitude, we will consider, in both cases, the logarithmic domain for the calculation of differences. For the experimental values, this ensures that the model’s deviation to the data is treated in dependence to the order of magnitude, while in case of the parameters, the algorithmic behavior is improved.
The posterior distribution over the parameters θ in dependence to the median dose–response data D is defined in terms of the prior and the likelihood as
The prior encodes our initial assumptions on the parameters. We define the prior to be
and assume that for y max and y min values in the range [ŷ max,2 ŷ max] and [0.5 ŷ min, ŷ min] respectively should be most likely, where and . We encode this as
while we set , encoding no further assumptions. Please note that defining the distributions in terms of proportionalities is sufficient only for maximum a posteriori estimation and sampling from the posterior by parallel tempering.
The likelihood characterizes how well the data matches the model with parameters θ. As both models (f θ ^(a)^ and f θ ^(i)^) have four parameters each, they can be treated identically wherefore we introduce f θ as a placeholder representing either of f θ ^(a)^ and f θ ^(i)^. We assume independence of data points and therefore define the factorized likelihood.
The calibrated parameter configuration θ̂ is then defined as the maximum a posteriori (MAP) estimate
derived by sampling from the posterior distribution with parallel tempering. In particular, we define the initial parameter configuration to be
with n 0 = 2 and k 0 = 1 in the case of input sensor calibration or k 0 = 0.01 in the case of gate calibration. Then, we executed parallel tempering with 10 walkers, each featuring 10 chains at different temperatures, for 10,000 steps. From the 10^7^ posterior evaluations, we select θ̂ as the one maximizing the posterior. This process is performed independently for each input sensor and each gate.
The full model calibration pipeline is part of the Python script for data processing (Code and Data Availability).
Supplementary Material
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Brophy J. A. N.Voigt C. A.Principles of genetic circuit design Nat. Methods 201411550852010.1038/nmeth.292624781324 PMC 4230274 · doi ↗ · pubmed ↗
- 2Lee S.Khalil A. S.Wong W. W.Recent Progress of Gene Circuit Designs in Immune Cell Therapies Cell Syst.2022131186487310.1016/j.cels.2022.09.00636395726 PMC 9681026 · doi ↗ · pubmed ↗
- 3Nielsen A. A. K.Genetic circuit design automation Science 20163526281 aac 734110.1126/science.aac 734127034378 · doi ↗ · pubmed ↗
- 4Chen Y.Zhang S.Young E. M.Jones T. S.Densmore D.Voigt C. A.Genetic circuit design automation for yeast Nat. Microbiol.20205111349136010.1038/s 41564-020-0757-232747797 · doi ↗ · pubmed ↗
- 5Kubaczka E.Energy Aware Technology Mapping of Genetic Logic Circuits ACS Synth. Biol.202413103295331110.1021/acssynbio.4c 0039539378113 PMC 11494706 · doi ↗ · pubmed ↗
- 6Engelmann N.Schwarz T.Kubaczka E.Hochberger C.Koeppl H.Context-Aware Technology Mapping in Genetic Design Automation ACS Synth. Biol.202312244645910.1021/acssynbio.2c 0036136693176 PMC 9942193 · doi ↗ · pubmed ↗
- 7Schladt T.Engelmann N.Kubaczka E.Hochberger C.Koeppl H.Automated Design of Robust Genetic Circuits: Structural Variants and Parameter Uncertainty ACS Synth. Biol.202110123316332910.1021/acssynbio.1c 0019334807573 PMC 8689692 · doi ↗ · pubmed ↗
- 8Weber E.Engler C.Gruetzner R.Werner S.Marillonnet S.A Modular Cloning System for Standardized Assembly of Multigene Constructs P Lo S One 201162 e 1676510.1371/journal.pone.001676521364738 PMC 3041749 · doi ↗ · pubmed ↗
