Complex HBB gene editing outcomes revealed by a fluorescent reporter cell model
Cecile L. Karsenty, Daniel Betancourth, Mingming Cao, Quoc-Khanh Pham, So Hyun Park, Gang Bao

TL;DR
A fluorescent reporter cell model helps study gene editing outcomes at the β-globin gene, revealing complex and unintended changes that affect safety and efficacy.
Contribution
A novel fluorescent reporter cell model enables allele-specific monitoring of HBB gene editing outcomes with high resolution.
Findings
The model discriminates in-frame alleles, frameshift indels, large deletions, and loss of allele events.
HDR-enhancing agents increase LOA outcomes, which are undetected by traditional bulk assays.
The model links editing genotypes to erythroid phenotypes, aiding safety evaluation of gene therapies.
Abstract
CRISPR-Cas9 gene editing offers the potential to transform the treatment of sickle cell disease by correcting the sickle mutation in β-globin gene (HBB). However, in addition to alleles with homology-directed repair (HDR), Cas9 editing at HBB generates a diverse spectrum of outcomes, including small insertions and deletions (indels), large deletions (LDs), and loss of allele (LOA) events, that can compromise genomic integrity and raise significant safety concerns. While new pharmacological modulators have been developed to increase the HDR rates, they may also elevate the risk of large gene modifications. To better understand the complex HBB gene editing outcomes, we engineered a live-cell, dual-fluorescent reporter cell model enabling allele-specific monitoring of HBB expression via GFP and blue fluorescent protein (BFP) tagging. Based on fluorescence intensities, this model can…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsCRISPR and Genetic Engineering · Virus-based gene therapy research · RNA regulation and disease
Introduction
Sickle cell disease (SCD) is one of the most prevalent and severe inherited monogenic disorders, caused by a point mutation in the β-globin gene (HBB) that results in hemoglobin S (HbS). SCD affects over 7 million individuals globally, causing significant morbidity, premature mortality, and healthcare burden.1 While matched-donor allogeneic hematopoietic stem cell transplant (HSCT) offers a curative treatment, this option is available only to less than 15% of individuals with SCD.2^,^3 Thus, there has been significant momentum in recent years toward the development of autologous gene therapies. In late 2023, the Food and Drug Administration (FDA) approved Casgevy, the first CRISPR/Cas9-based autologous gene editing therapy for SCD.4 This therapy aims to reverse the SCD phenotype by inducing fetal hemoglobin (HbF) through disruption of the BCL11A erythroid enhancer.4^,^5^,^6 Alternatively, CRISPR/Cas9-based gene editing can correct the sickle mutation in patient-derived hematopoietic stem and progenitor cells (HSPCs).7 In this approach, ribonucleoproteins (RNPs) formed by Cas9 nuclease and CRISPR guide RNA (gRNA) targeting the HBB gene are delivered into SCD HSPCs together with a corrective donor template, either packaged in adeno-associated virus vector 6 (AAV6)8 or as a single-stranded oligonucleotide (ssODN).9^,^10^,^11 Clinical trials using the gene correction approach targeting a locus near the sickle mutation with gRNA R-0212 are currently underway (NCT04774536/NCT04819841).
Cas9-induced double-strand breaks (DSBs) generate a wide spectrum of non-HDR repair outcomes, including small insertions and deletions (indels), large deletions (LDs), and complete loss of allele (LOA). However, analyses of genome editing outcomes in SCD HSPCs have largely centered on quantifying small NHEJ-mediated indels and HDR events, whereas larger structural modifications often arising through microhomology-mediated end joining (MMEJ)13^,^14 have not been carefully analyzed, and their phenotypic consequences remain poorly understood. Our prior long-read single-molecule real-time sequencing with dual unique molecular identifiers (SMRT-seq with dual UMI) revealed a high frequency and broad spectrum of LDs induced by R-02 and R-66S gRNA targeting the sickle mutation. These LD events decreased but remained appreciable when an ssODN donor was added.15 Although enrichment of HDR-repaired alleles with concomitant depletion of indel-bearing cells has been reported during erythroid maturation in a xenograft mouse,9 depletion of indels in peripheral blood does not equate to safety. If a substantial fraction of edited bone marrow progenitors fail to mature into red blood cells (RBCs) and instead undergo ineffective erythropoiesis, this could impose marrow stress and hyperplasia,16 while leaving behind long-lived HSCs harboring diverse indels and large structural modifications, the risks of which are not fully understood. A recent clinical trial using R-02 RNP and AAV6 donor for sickle mutation correction further illustrates this complexity. The treated patient exhibited a striking post-infusion shift in editing outcomes: indel frequency increased from 21.4% to 92.6%, while HDR-corrected alleles fell from 33.1% to 1.3% in peripheral blood CD15^+^ cells 1 year after infusion.17 Despite this, the patient demonstrated a favorable hemoglobin profile (<5% HbS and >78% HbF), suggesting that therapeutic benefit arose not from direct correction of the sickle mutation, but from HbF induction associated with indel alleles. The mechanism underlying this unexpected HbF upregulation and the apparent selective expansion of indel-bearing cells remains unresolved. Collectively, these observations highlight the complexity of HBB repair and the need to define how specific non-HDR genotypes influence HBB expression, HbF induction, and cell fitness throughout erythroid differentiation, as well as their impact on long-term persistence and the potential risk of clonal expansion.
To better understand the complex HBB editing outcomes, we developed a dual-fluorescent HBB reporter in sickle human-umbilical-cord-derived progenitor erythroid 2 (HUDEP-2) cells18^,^19^,^20 that enables high-resolution genotyping and phenotyping of Cas9 edits. By converting fluorescence intensity states into ∼20 genotype classes, this model distinguishes allele-specific in-frame edits, frameshift indels that introduce early versus late nonsense (NS) mutations, and LD versus LOA, defined as larger aberrations that exceed several kilobases and fail to amplify by 6-kb long-range PCR. These genotypes are simultaneously linked to HBB expression, HbF induction, cell fitness, and erythroid differentiation. Because cells remain viable, the platform supports prospective isolation, time course analysis, and testing of HDR-enhancing interventions, providing an integrated readout of both benefit (HDR gain) and risk (genome-altering byproducts). We show that our reporter cell model (1) maps fine-scale HBB genotypes, including rare classes such as LD and LOA that are difficult to resolve by next-generation sequencing (NGS), long-read sequencing, or droplet digital PCR (ddPCR) alone; (2) quantifies HbF induction and apoptosis associated with distinct editing outcomes; (3) provides a practical, scalable testbed to evaluate emerging HDR-enhancement strategies for both efficacy and safety; and (4) enables mechanistic dissection of repair pathways leading to LOA, informing strategies to minimize these events. Thus, this model offers a valuable tool for dissecting DNA repair outcomes at HBB, quantitatively linking editing genotypes to erythroid phenotypes and evaluating the safety of CRISPR/Cas9-based therapies.
Results
On-target gene editing at HBB results in ineffective erythropoiesis in SCD HSPCs
Because HDR is inefficient in HSCs,21^,^22 Cas9 editing at the sickle locus produces a mixture of repair outcomes, including indels and larger disruptive events such as LDs23 and LOA.24 Prior studies have shown that these non-HDR alleles are common and can differentially persist during erythropoiesis.10 Together with recent clinical observations suggesting unexpected expansion of certain indel-bearing clones,17 these findings underscore the need to define how individual HBB genotypes influence β-globin expression, HbF induction, and erythroid maturation. To investigate these effects, CD34^+^ HSPCs from three SCD donors (donors 1–3) were electroporated with HiFi SpCas9 RNPs targeting HBB exon 1 (R-66S or R-02),10^,^12 with or without an ssODN donor correcting the sickle mutation10^,^25^,^26 (Figure 1A). We quantified large gene modification events using a ddPCR allelic drop-off assay, which measures the fraction of HBB alleles that are amplifiable by short-range ddPCR. Alleles that fail to amplify are inferred to harbor CRISPR-induced large gene modifications, including LDs, large insertions, chromosomal rearrangements, chromosomal truncations, or chromosomal loss. Because LD is the predominant contributor to allelic drop-off, these events are referred to as “LD” in ddPCR figures throughout the paper. NGS detected HDR and small indels in the remaining intact alleles, and NGS data were normalized to ddPCR-derived total allele counts to report true allelic fractions across all HBB copies.15 Both gRNAs achieved high overall editing efficiencies. Without ssODN, R-66S generated a higher proportion of frameshift indels than in-frame indels, whereas R-02 predominantly produced a 9-bp in-frame deletion via MMEJ (Figure S1), resulting in more in-frame than frameshift indels (Figure 1B). R-66S also induced more LDs, 23.8% (5.2%), than R-02, 17.4% (2.2%). Adding ssODN shifted repair toward HDR, reaching 46% (0.97%) for R-66S and 30.8% (0.05%) for R-02,10^,^21^,^25 and correspondingly reduced both in-frame indels and LDs, with a smaller decrease in frameshift indels (Figure 1B).15^,^27^,^28^,^29^,^30Figure 1CRISPR editing of HBB delays erythroid maturation in HSPCs from SCD patientsSickle-patient-derived CD34^+^ HSPCs from two donors were edited at HBB using HiFi SpCas9 RNPs with sgRNAs R-66S or R-02, delivered by electroporation ± ssODN donor. (A) HBB exon 1 sequence showing the sickle mutation (red). Protospacers for R-66S and R-02 are shown with PAMs underlined (blue). (B) Allele frequency quantified by NGS and ddPCR. ddPCR measured allelic-drop off as a surrogate for large deletion (LD) resulting from loss of the primer/probe binding site, while NGS quantified HDR and small indels on intact alleles. NGS values were normalized to ddPCR to true allelic fractions across all HBB alleles (intact and drop-off). Efficient editing was achieved with R-66S and R-02 in n = 2 biological replicates (independent donors), each with two technical replicates (independent electroporation). (C) Erythroid maturation assays showed a significant delay after R-66S or R-02 RNP versus mock (p < 0.0001 and p = 0.0001, respecitvely using a 2-sided Welch t test, evidenced by increased CD36^+^/CD235a^+^ immature cells in donor 1 (15-day differentiation) and donor 2 (13-day differentiation). Adding the ssODN donor restored maturation to near-mock levels. (D) In a third donor, edited cells were differentiated for 10 days, sorted into immature and mature fractions, and allele frequencies were measured by LongAmp-seq. Across both gRNAs, frameshift indels were enriched in the immature fraction, whereas in-frame indels and unmodified alleles were enriched in the mature fraction, implicating frameshift indels in the observed maturation delay (n = 1 experiment).
Erythroid differentiation assays revealed a pronounced maturation delay following RNP-only editing with either gRNA (Figure 1C). This delay was evident as an increased proportion of CD36^+^/CD235a^+^ immature cells on day 15 (donor 1) or day 13 (donor 2) of differentiation31 (Figures S2 and S3), with a similar pattern observed using CD71^+^/CD235a^+^ markers (Figure S4). Addition of ssODN restored differentiation to near-mock levels. In a third donor, long amplicon sequencing (LongAmp-seq)16 was performed on sorted immature and mature fractions after 10 days of differentiation, showing frameshift indels were enriched in the immature fraction, whereas in-frame indels and unmodified alleles were enriched in the mature fraction, implicating frameshift indels in the observed maturation delay (Figure 1D; Figure S5). HPLC analysis, which reports the relative fractions of each hemoglobin species, showed that R-02 RNP editing reduced HbS from 96.5% in mock to 34.5% and produced 26% HbA via HBD conversion,10 with 40% HbF. With ssODN, HbA increased to an average of 61% (R-66S) and 53% (R-02), and HbF, although lower than in RNP-only samples, remained above mock (Figures S6 and S7). While HPLC does not quantify absolute hemoglobin per cell, the changes in relative hemoglobin composition align with the maturation phenotypes. RNP-only editing yields HbS knockout with limited HbA restoration and only partial HbF compensation, and these samples show impaired erythroid maturation compared with mock. In contrast, donor-mediated HDR produces a more favorable hemoglobin profile and restores maturation toward mock levels (Figure 1C).
Despite maturation patterns resembling mock (Figure 1C), bulk RNP+ssODN-edited cultures still contain a substantial fraction of cells with disruptive genotypes that are not sensitively captured by the population-level maturation assay. In our prior clonal genotyping, ∼20% of R-66S RNP+ssODN-edited colonies were complete HBB knockouts resulting from frameshift indels and/or LDs on both alleles.15 Mechanistic dissection of such heterogeneous genotypes is difficult in primary SCD HSPCs, which have a finite ex vivo lifespan after CD34^+^ isolation and rapidly lose self-renewal capacity, limiting longitudinal tracking, multiple perturbations, and genotype-phenotype correlation within the same culture. To address these constraints, we developed a fluorescent reporter model that enables controlled, allele-resolved analysis of editing outcomes and their functional consequences in a stable, scalable cell system.
Establishment of S-HUDEP2GFP/BFP dual-fluorescent model for allele-specific monitoring of HBB editing
We selected HUDEP-2 cells to introduce the HBB reporter because they are a well-characterized human erythroid model that recapitulates early and intermediate erythropoiesis and adult β-globin expression, while remaining highly amenable to precise genome editing.18^,^19^,^20 HUDEP-2 cells are broadly used as a first-line platform to benchmark CRISPR strategies before validation in primary HSPCs,32^,^33 dissect mechanisms of HbF induction,34 and identify new HbF regulators.35^,^36^,^37 Their robust expansion, uniform differentiation, and reproducible β-globin output provide an ideal setting for a HBB-allele-specific fluorescent reporter, enabling mechanistic genotype-phenotype studies that are impractical in primary HSPCs.
We engineered a dual-fluorescent sickle HUDEP2 cell model (SHD^GFP/BFP^) from the parental sickle HUDEP2 (SHD) line,18^,^19^,^20 which carries a homozygous sickle mutation, by introducing a P2A-GFP-pA cassette at the HBB C-terminus via HDR, generating a biallelic SHD^GFP/GFP^ clone. One GFP allele was subsequently converted to BFP through a Y66H substitution (TAC→CAT) using Cas9 RNP and ssODN,38 and dual-positive cells were sorted to establish the stable SHD^GFP/BFP^ line. Long-read sequencing confirmed allele-specific tagging of HBB with GFP and BFP, enabling direct visualization of expression from each allele (Figure 2A). HPLC, following 7 days of erythroid differentiation, detected a single HbS peak in the parental SHD line, whereas SHD^GFP/BFP^ showed a single earlier-eluting peak corresponding to HbS^2A^ (Figure 2B). Flow cytometry demonstrated stable co-expression of GFP and BFP during expansion and a strong increase in fluorescence intensity during erythroid differentiation, consistent with activation of the endogenous HBB promoter (Figure 2C). SHD^GFP/BFP^ cells displayed maturation kinetics, as measured by CD36 and CD235a expression (Figures 2D and 2E) and maintained viability comparable to the parental SHD line (Figure S8), confirming that reporter tagging did not impair erythroid differentiation capacity.Figure 2. Generation and validation of sickle HUDEP2 dual fluorescent model for allele-specific monitoring of HBB expression(A) The SHD^GFP/BFP^ model was generated from S-HUDEP2 cells by delivering Cas9 RNPs together with a double-stranded DNA donor targeting the HBB C-terminus. The donor contained left and right homology arms, a P2A-GFP-poly(A) tail, labeling each HBB allele via HDR. A biallelic SHD^GFP/GFP^ clone was then converted to a SHD^GFP/BFP^ hetero-allelic line by introducing a Y66H substitution via RNP + ssODN to convert one GFP allele to BFP. Dual-positive cells were sorted to establish the clonal SHD^GFP/BFP^ line. Long-read sequencing confirmed allele-specific labeling of HBB with GFP and BFP. (B) Hemoglobin HPLC after 7 days of differentiation showed a single HbS peak in parental SHD, whereas SHD^GFP/BFP^ showed a single peak at a shorter retention time, consistent with addition of the 21-amino acid 2A tail to HbS after P2A cleavage (HbS^2A^), which increases the protein’s negative charge. (C) The left panel shows parental SHD used for negative gating. The middle panel shows SHD^GFP/BFP^ cells with stable co-expression during expansion, and the right panel shows increased GFP and BFP mean fluorescent intensities (MFIs) by day 4 of erythroid differentiation, consistent with regulation by the endogenous HBB promoter, whose activity rises during erythropoiesis. (D and E) SHD ^GFP/BFP^ maintained erythroid maturation patterns comparable to the SHD, as measured by CD36 (D) and CD235a (E) expression over time. Flow cytometry values are the median MFI within CD36+ or CD235a+ gates (mean ± SD; n = 2 replicates). (F) Schematic of HBB exons 1–3 showing the positions of four gRNAs tested in the SHD^GFP/^^BFP^: R-66S and R-02 gRNAs targeting early exon 1, R-62 targeting distal exon 1, and Met-55 targeting the alternative start codon in early exon 2. (G) SHD^GFP/BFP^ was edited with four gRNAs; for R-66S and R-02, edits were also performed with ssODN. Efficient editing was confirmed in bulk culture for all four gRNAs. Flow cytometry at day 4 of erythroid differentiation showed distinct fluorescent clusters for each gRNA. R-66S and R-02 generated the greatest cluster diversity. With ssODN, the GFP^high^BFP^high^ cluster encompassing in-frame HDR alleles increased, while non-HDR clusters decreased. R-62 produced fewer intermediate clusters and Met-55 eliminated intermediate populations.
SHDGFP/BFP reports gRNA-specific editing outcomes for gRNAs targeting HBB exons 1 and 2
We evaluated four gRNAs spanning HBB exons 1–2 to probe how target position and indel spectra shape SHD^GFP/BFP^ fluorescence. Two guides (R-66S and R-02) are the clinically relevant early exon-1-targeting gRNAs used for sickle-mutation correction; we also tested R-62 targeting distal exon 1 and Met-55 targeting the alternative start codon in exon 2 (Figure 2F).39 All gRNAs edited efficiently and produced distinct indel profiles (Figures S9 and S10). Prior studies show that a positional boundary bisects HBB exon 1: nonsense mutations (NSs) in the 5′ portion largely escape nonsense-mediated decay (NMD), whereas more distal NS trigger efficient NMD, with mRNA levels declining gradually up to NS23 and dropping sharply by NS26.40^,^41 We therefore reasoned that gRNAs targeting early exon 1 and creating an NS across this boundary would generate a range of HBB mRNA abundances and, thus, a spectrum of GFP/BFP mean fluorescent intensities (MFIs) in SHD^GFP/BFP^. Consistent with this, R-66S and R-02 produced the broadest range of GFP/BFP clusters, including intermediate MFIs indicative of partial NMD escape, whereas R-62 yielded fewer intermediate clusters, and Met-55 largely eliminated them. For R-66S and R-02, addition of an ssODN increased the GFP^high^BFP^high^ fraction, consistent with HDR enrichment (Figure 2G).
Optimization of SHDGFP/BFP for cluster-resolved HBB editing analysis
We validated and refined the SHD^GFP/BFP^ reporter cell model by longitudinally profiling R-66S-RNP-edited cells during expansion and on differentiation days 4, 8, and 11. Cluster separation was stable on days 4 and 8 but blurred by day 11, establishing days 4–8 as the optimal window for sorting and genotype assignment (Figure S11). Using single-color SHD controls (SHD^GFP^ and SHD^BFP^) for compensation, we defined a 4 × 5 grid of GFP/BFP clusters based on MFI (Figure S12). Although we anticipated symmetric behavior of the two alleles, we resolved five GFP-defined clusters but only four for BFP. This asymmetry likely reflects GFP’s higher brightness and spectral spillover,42 which collapses BFP^dim^ and BFP^neg^. Future iterations replacing BFP with a brighter, spectrally distinct fluorophore (e.g., mCherry) should further improve cluster separation and genotype calling. Post-sorting, clusters retained their characteristic GFP/BFP MFIs after additional differentiation, confirming fluorescence stability and sorting fidelity (Figure S13).
Throughout the manuscript, we summarized % cells, allele frequencies, and cluster-level phenotypes as 4 × 5 heatmaps that preserve the spatial layout of the GFP-BFP flow cytometry plots. This format enables rapid visual mapping of cluster identity (GFP/BFP MFI), genotype class (in-frame, frameshift, LD, and LOA), and phenotype (%HbF, %Annexin V^+^), with proportions encoded by color. Main-text heatmaps display means from n = 2 replicates; companion supplemental figures provide per-replicate values with SD error bars.
SHDGFP/BFP reports allele-specific HBB genotypes and expression states in live cells
Using R-66S and R-02 gRNAs, we isolated each fluorescence-defined cluster to identify the genotypes underlying its characteristic GFP/BFP MFI profile. In SHD^GFP/BFP^ cells edited with R-66S or R-02 ± ssODN, NGS and ddPCR confirmed efficient editing for both guides (Figure 3A), following the same trends observed in SCD HSPCs (Figure 1A) and prior reports.10^,^11 After 4 days of erythroid differentiation, GFP/BFP flow cytometry resolved 20 clusters per gRNA, with the percentage of cells in each cluster shown as heatmaps (Figures 3B and 3C). With RNP alone, R-66S, whose edits are more heterogeneous and frameshift skewed, yielded fewer GFP^high^BFP^high^ cells (4.2%) (Figure 3B) than R-02 (19.5%) (Figure 3C), reflecting R-02’s predominant 9-bp MMEJ deletion. Adding ssODN decreased intermediate- and negative-clusters, and increased GFP^high^BFP^high^, consistent with enrichment for HDR/in-frame alleles. Fold changes in cluster percentages between RNP-only and RNP+ssODN are summarized in Figure S14. Next, we sorted all 20 fluorescence-defined clusters on days 4–5 of differentiation from each R-66S- or R-02-RNP-edited cells with replicates and profiled genotype composition by NGS and ddPCR (Figures 3D and 3E; Figure S15). The GFP^high^BFP^high^ was composed largely of in-frame alleles that preserve HBB expression. Clusters with high GFP or BFP on one allele (top row/right column) contained ∼50% in-frame alleles. Medium and low-MFI clusters were enriched for frameshift indels. GFP^low, mid^BFP^low,mid^ clusters were dominated by biallelic frameshift genotypes, and clusters with one allele at medium or low MFI contained ∼50% frameshift alleles. Dim/negative clusters were dominated by LDs, with GFP^neg^BFP^neg^ cluster showing 93% LD for R-66S and 83% LD for R-02 and clusters with one dim or negative allele (GFP^dim,neg^ or BFP^neg^) containing ∼50% LD alleles.Figure 3SHD^GFP/BFP^ model reports on allele-specific HBB genotypes and expression in live cells(A) SHD^GFP/BFP^ were edited with R-66S and R-02, delivered as RNP or RNP + ssODN. Efficient editing was achieved for both guides. NGS quantified HDR, in-frame, and frameshift indels; ddPCR quantified LDs. R-66S yielded higher frameshift and LD rates than R-02. (B and C) After 4 days of erythroid differentiation, identical GFP/BFP-MFI gating resolved 20 clusters for R-66S (B) and R-02 (C). The percentage of cells in each cluster is displayed as a 4 × 5 heatmap, with the spatial arrangement matching that of the flow cytometry plots. Cluster distributions reflected gRNA-specific outcomes. Addition of ssODN reduced the proportion of intermediate/negative clusters in most cases and increased the GFP^high^BFP^high^ population, corresponding to HDR or in-frame alleles. R-66S (D) or R-02 (E) RNP-treated cells were bulk sorted into the 20 clusters defined by the gating shown in the left flow cytometry panel. The three heatmaps display the percentage of in-frame alleles (green), frameshift alleles (blue), and LD (red) for each cluster. Each cluster showed a distinct indel profile, confirming that changes in GFP/BFP intensity reflect underlying genotype. Unmodified and in-frame alleles predominated in high MFI clusters; frameshift indels enriched in medium/low MFI clusters; LD was most frequent in dim/negative clusters. A similar cluster-specific editing pattern was observed for both gRNAs. The values shown within each heatmap cluster represent the average of two replicates. (F) To pinpoint which frameshift indels drive medium vs. low MFI clusters, we analyzed frameshift indels and the resulting nonsense (NS) codon position in the HBB exon. The x axis represents the frameshift indel sizes, and the “NS” labels shown above each peak indicate the codon position of the premature nonsense mutation generated by each specific indel size. β-globin codon numbering is used (sickle = codon 6). For R-66S, medium-MFI clusters (GFP^med^BFP^high^, GFP^high^BFP^med^, and GFP^med^BFP^med^) were enriched for NS12, NS15, and NS18, indicating that NS < 19 permits NMD bypass and maintains HBB^GFP/^^BFP^ expression. (G) Low-MFI clusters (GFP^low^BFP^high^, GFP^high^BFP^low^, and GFP^low^BFP^low^) mapped to NS19–21, showing that NS ≥ 19 triggers robust NMD, reducing HBB^GFP/BFP^ expression. R-02 showed a similar trend: (H) earlier NS codon is associated with medium MFI, whereas (I) later NS is associated with low MFI. For all heatmaps, individual data points and standard deviations are shown in Figure S15.
Together, these data indicate that GFP/BFP fluorescence is a reliable proxy for the underlying HBB genotypes generated by editing, even though sorted clusters are not completely genotype-pure populations. Residual impurity likely stems from both the close spacing of adjacent GFP/BFP populations during sorting and the resolution limits of our genotyping bins. For example, although GFP^high^BFP^high^ is expected to be enriched for in-frame alleles, the R-66S GFP^high^BFP^high^ population contained 77% in-frame and 11% LDs. Long-read SMRT-seq showed that these LDs were predominantly <500 bp and 76% were in-frame, explaining the high MFI despite their classification as LDs. Similarly, the GFP^low^BFP^high^ cluster in R-02-edited cells contains two visible subclusters (higher vs. lower GFP) that we analyzed together; resolving them separately would likely reveal distinct frameshift genotypes. We therefore suspect that the true underlying genotype composition of each cluster is cleaner than our current sorting and genotyping can fully resolve. Future refinements, such as tighter, cluster-centered gating and replacing BFP with a brighter, more spectrally distinct fluorophore, should further improve separation and purity.
Medium versus low MFI reflects frameshift-induced nonsense mutation position in HBB exon 1
Frameshift indels were enriched in both medium- and low-MFI clusters. Given the established relationship between NS position in HBB exon 1 and NMD efficiency,40^,^41 we mapped each unique frameshift indel to its resulting NS codon position to determine whether differences in NS placement within HBB exon 1 explain the medium versus low MFI phenotypes. In Figures 3F–3I, the x axis represents the frameshift indel sizes. The “NS” labels shown above each peak indicate the codon position of the premature nonsense mutation generated by each specific indel size. For each indel, we reconstructed the HBB coding sequence and identified the premature stop codon introduced by the frameshift, thereby assigning an NS codon position to each indel. For R-66S gRNA, frameshift indels introducing NS before codon 19 (NS < 19) showed partial expression with medium MFI (Figure 3F), consistent with NMD bypass, whereas indels introducing NS at or after codon 19 (NS ≥ 19) triggered efficient NMD and resulted in low MFI (Figure 3G). R-02 showed the same overall pattern, but NS19 alleles were found in both the medium (Figure 3H) and low MFI (Figure 3I) clusters, arising from distinct indel sizes (a 2-bp insertion in the medium cluster and an 8-bp deletion in the low cluster). This likely reflects context-dependent effects: different indels generate distinct amino acid sequences despite sharing the same stop codon position, leading to differences in MFI.
Allele-resolved long-read SMRT-seq reveals extensive LOA in GFP-negative clusters
To resolve structural events flagged as LD by the ddPCR drop-off assay, we performed long-read SMRT-seq analysis on 19 sorted clusters (all except GFP^high^BFP^neg^) from R-66S-RNP-edited SHD^GFP/BFP^(Figure S13). A ∼6 kb region spanning the R-66S cut site and the Y66H marker was PCR amplified and analyzed on PacBio SMRT-seq.43 This approach distinguished GFP vs. BFP alleles and quantified allele-specific LD or LOA (Figure 4A). Deletions extending past Y66H yet remained amplifiable were counted as “Loss of Y66H reads,” whereas non-amplifying alleles were classified as “LOA.” Untreated cells (UT) showed a 1:1 GFP:BFP read ratio. GFP^dim^ clusters with detectable BFP fluorescence showed few Loss-of-Y66H reads and retained the ∼1:1 ratio, indicating mainly sub-Y66H deletions on GFP (Figure 4B). By contrast, GFP^neg^ clusters with detectible BFP showed depleted GFP reads together with increased BFP and Loss-of-Y66H reads, consistent with frequent GFP-allele LOA that prevents PCR amplification (Figure 4B). Because detectable BFP implies an intact BFP allele (no BFP-side LOA), BFP read counts serve as an internal reference; thus, deviations from 1:1 reflect LOA on GFP. For example, in GFP^neg^BFP^high^, SMRT-seq reported 15% GFP, 68% BFP, and 17% Loss-of-Y66H reads (Figure 4B). Assigning all Loss-of-Y66H to the GFP allele, the shifted ratio solves (50−x)/(100−x) = 0.32, yielding x = 52.9% GFP LOA events invisible by SMRT-seq (Figure 4C). This approach quantifies GFP LOA in GFP-negative clusters and explains sequencing gaps observed for negative-MFI populations. Figures 4B and 4C highlight representative clusters, while Figure S16 summarizes read counts and LD frequencies across the 19 clusters analyzed by SMRT-seq.Figure 4. Allele-specific large deletions and loss of allele revealed by long-read sequencing(A) Schematic of the 6 kb long-range PCR amplicon spanning the R-66S cut-site and the Y66H mutation site, located 1,662 bp downstream. The Y66H mutation converts GFP (TAC, Tyr) to BFP (CAT, His) and was used to distinguish alleles during SMRT long-read sequencing to quantify allele-specific LDs induced by Cas9-mediated DSBs. LDs extending beyond the Y66H site but still amplifiable were defined as “Loss of Y66H reads.” Events preventing PCR amplification, consistent with gross chromosomal modifications, were defined as “LOA.” (B) Proportions of GFP reads, BFP reads, and Loss of Y66H reads were analyzed. Untreated sample (UT) showed a 1:1 GFP to BFP reads ratio. In GFP^dim^ clusters, most GFP alleles retained the Y666H site, resulting in few Loss of Y66H reads and an approximately 1:1 GFP to BFP read ratio. In contrast, GFP^neg^ clusters with detectable BFP expression showed reduced GFP reads, with an increase in BFP reads and Loss of Y66H reads. A higher proportion of BFP reads than of combined GFP and Loss of Y66H reads in GFP^neg^ clusters indicates frequent GFP LOA undetectable by long-read sequencing. (C) The percentage of GFP LOA in GFP-negative clusters was calculated from the altered GFP:BFP read ratio, confirming that LOA contributes to loss of GFP fluorescence. (D) The proportions of LDs among GFP, BFP, and Loss of Y66H reads demonstrate that loss of GFP or BFP expression correlates with allele-specific LDs. Both GFP^dim^ and GFP^neg^ clusters predominantly contain GFP alleles with LDs, but the higher frequency of Loss of Y66H reads in GFP^neg^ clusters suggests larger LDs in this group. (E–H) Size and distribution of LDs vary across clusters based on MFI, confirming that complete GFP loss corresponds to larger LDs. (E) In the bulk-edited population, LD frequency decreases with size, with most LDs <2,000 bp. (F) In GFP^high^BFP^high^ cells, LDs are rare and mostly <500 bp, with ∼76% of indels in-frame. Comparison of GFP^neg^BFP^med^ (G) versus GFP^dim^BFP^med^ (H) shows a significant enrichment of LDs >2,000 bp in the GFP^neg^ cluster. (I) Summary of GFP- and BFP-allele-specific genotype based on NGS and long-read sequencing across 20 clusters in R-66S edited samples. Alleles with no frame changes (IN), including unmodified, HDR, and small in-frame indels, show high MFI. Alleles with NS mutation before codon 19 (NS < 19) show medium MFI, while those with NS ≥ 19 show low MFI. LDs result in dim MFI, and LOA due to extensive deletions or chromosomal alterations leads to complete loss of expression.
Allele-resolved SMRT-seq reveals enrichment of larger LDs in GFPneg compared to GFPdim clusters
The frequency of large deletions (LDs >200 bp) across GFP reads, BFP reads, and Loss-of-Y66H reads demonstrated a tight relationship between fluorescence loss and allele-specific LDs. Both GFP^dim^ and GFP^neg^ clusters were dominated by GFP alleles carrying LDs, but GFP^neg^ clusters showed a higher %GFP LD and a significantly higher %Loss Y66H LD, indicating frequent deletions extending beyond the Y66H site (Figure 4D). LD-size-distribution analysis (Figures 4E–4H; Figure S17) showed that LD frequency declined with increasing size, with most LDs in bulk-edited cells <2 kb (Figure 4E). GFP^high^BFP^high^ cells contained very few LDs, mostly <500 bp, ∼76% of which were in-frame (Figure 4F). In contrast, comparison of GFP^dim^BFP^med^ (Figure 4G) and GFP^neg^BFP^med^ (Figure 4H) clusters demonstrated a marked enrichment of LDs >2 kb in GFP^neg^ cells. Overall, GFP^neg^ clusters have higher GFP LOA (Figure 4C), more Loss-of-Y66H LDs (Figure 4D), and larger GFP LD than GFP^dim^ clusters (Figures 4G and 4H). Integrating NGS, ddPCR, and SMRT-seq across sorted clusters establishes a quantitative relationship between MFI and genotype: in-frame alleles maintain high expression; NS < 19 frameshift indels yield medium expression; NS ≥ 19 produce low expression; LDs generate dim signals; and LOA from extensive deletions or chromosomal rearrangements leads to complete loss. The allele-specific genotype composition for all 20 clusters is summarized in Figure 4I.
Reporter fluorescence mirrors HBB transcript abundance and β-globin production
To confirm that the dual-fluorescent reporter faithfully reflects endogenous HBB output, we tested R-66S-RNP-edited SHD^GFP/BFP^ cells on three points: (1) GFP- and BFP-tagged alleles are transcribed evenly in unedited cells, assessed by cDNA NGS (GFP:BFP mRNA ratios); (2) total HBB mRNA matches combined GFP+BFP mRNA, and each allele’s mRNA level correlates with its corresponding GFP or BFP protein MFI by flow cytometry, assessed by qPCR (total HBB and GFP+BFP mRNA); and (3) total reporter fluorescence tracks β-globin production, assessed by western blotting for β-globin and GFP+BFP. Because DNA, RNA, and protein assays require large cell numbers, we first sorted 12 fluorescence-defined clusters during expansion, expanded each population, and then initiated differentiation. These 12 clusters remained separable after differentiation, with some spreading among low/negative MFI groups (Figure S18). Cluster IDs (1–12) shown in Figure S18 are used consistently in Figures S19 and S20 to denote the same fluorescence-defined populations. Genotyping confirmed the expected relationship between MFI and edit class, albeit with slightly lower resolution than the 20-cluster map generated during differentiation (Figure S19A). In UT and bulk RNP-edited cells, GFP:BFP mRNA remained ∼1:1, showing similar transcription and stability from both alleles. The relative proportion of GFP and BFP mRNA mirrors their flow cytometry protein MFI (Figure S19B). (GFP+BFP) mRNA matched total HBB levels in all sorted clusters, showing that reporter signals track HBB transcription (Figure S19C). Western blots showed efficient P2A cleavage, and the summed GFP+BFP signal closely paralleled β-globin abundance across clusters, indicating stoichiometric co-translation from the shared HBB-P2A-GFP/BFP transcript (Figure S20).
Diverse HBB genotypes differentially modulate HbF induction, while LOA events fail to induce HbF
CRISPR/Cas9 editing at HBB generates a broad spectrum of non-HDR outcomes, yet their phenotypic consequences remain poorly defined. This question has gained clinical urgency with the report of expanded frameshift clones accompanied by robust HbF induction via an unclear mechanism.17 These observations raised the possibility that certain non-HDR alleles may confer a selective or functional advantage, but the specific HBB mutations that drive this effect and how remain unknown. To dissect how individual genotypes influence HbF induction, we used SHD^GFP/BFP^, which is well suited for this purpose because it recapitulates adult β-globin dominance with silenced γ-globin,34^,^44 enabling sensitive quantification of HbF, with >30%–40% F-cells indicative of a robust response.32^,^45^,^46 Looking ahead, once high-HbF-associated genotypes are defined, the corresponding GFP/BFP clusters could be isolated as live cells, eliminating the fixation required for intracellular HbF staining and preserving DNA and chromatin for downstream mechanistic studies, including analysis of chromatin architecture,44 transcription factor occupancy, and transcription activity at the globin locus.47
First, we quantified HbF induction in bulk edited SHD^GFP/BFP^ cells. At day 8 of differentiation, R-66S RNP increased HbF from 17.5% (0.3%) in mock to 53.7% (0.4%) (p = 0.0001) and R-02 to 40.5% (3.9%) (p = 0.074). Co-delivery of ssODN reduced but did not abolish HbF induction (35.0% [3.5%] for R-66S; 33.8% [1.7%] for R-02), with levels remaining above mock (Figure 5A). HPLC at day 11 corroborated these trends: RNP-only editing produced near-complete loss of HbS with robust HbF induction, whereas ssODN delivery restored HbA and yielded more moderate HbF increases (Figure 5B).Figure 5SHD^GFP/BFP^ identifies genotypes that drive robust HbF induction and those that impair erythropoiesis after HBB editingSHD^GFP/BFP^ were edited with R-66S or R-02 RNP ± ssODN (allele and genotype frequencies quantified in Figure 3), differentiated, and assayed for HbF induction and viability. (A) Day-8 flow cytometry quantifies HbF^+^ cells. RNP treatment with either gRNA increases HbF relative to mock; RNP + ssODN also increases HbF, but to a lesser extent than RNP alone. HbF increase for R-66S RNP and R-02+ssODN relative to mock is statistically significant with p = 0.0001 and p = 0.042, respectively, using a 2-sided Welch t test (B). HPLC on day 11 shows near complete loss of HbS after RNP cutting at HBB with either gRNA, accompanied by significant HbF induction compared to mock. With ssODN, HDR restores HbA production and yields a more moderate HbF increase than RNP alone, mirroring the flow cytometry trend. (C and D) Genotypes underlying HbF induction. Heatmaps showing %HbF^+^ cells across 20 fluorescent-defined clusters from the same bulk-edited samples analyzed in (A): (C) R-66S RNP (left) and R-02 RNP (right) and (D) R-66S + ssODN (left) and R-02 + ssODN (right). Trends are consistent across gRNAs in both RNP and RNP + ssODN conditions. ssODN-treated samples show clearer separation, as reduced cluster crowding improved distinction and resolution of cell populations by flow cytometry. GFP^high^BFP^high^ (biallelic in-frame) is comparable to mock, indicating in-frame alleles do not induce HbF. Overall, biallelic HBB-disruptive genotypes produce the strongest HbF induction, whereas genotypes with one in-frame allele show less. The GFP^med^BFP^med^ cluster (biallelic NS < 19) shows attenuated induction relative to biallelic genotypes containing NS ≥ 19 allele, suggesting NS ≥ 19 contributes more strongly to HbF than NS < 19. However, when paired with LD on the other allele, GFP^dim^BFP^med^ (LD/NS < 19) shows the highest %HbF^+^, exceeding GFP^dim^BFP^low^ (LD/NS ≥ 19). LOA clusters showed low HbF despite biallelic HBB disruption, consistent with HbF increases arising from the frameshifted allele rather than the LOA allele. GFP^neg^/BFP^neg^ has the lowest HbF^+^ fraction, even below mock, suggesting that on-target HBB cleavage may cause collateral HBG disruption via large genomic alterations. (E) Trypan blue counts over the differentiation time course (to day 11) show a clear viability drop in RNP-treated groups by day 11, not observed in mock controls. The decline emerges post-maturation, consistent with impaired hemoglobin production and erythropoiesis. (F) Day 11 Annexin V^+^ (apoptotic) percentages were higher in RNP-treated cells than in mock controls, with a trend toward increased apoptosis for R-66S (p = 0.068) and a significant increase for R-02 (p = 0.021), based on a two-sided Welch t test. (G and H) Heatmaps showing %Annexin V^+^ cells across 20 fluorescence-defined clusters from the same bulk-edited samples as in (F). (G) R-66S-RNP- and (H) R-02-RNP-edited samples. Compared with mock and in-frame genotypes, which show similar low levels of apoptosis, biallelic HBB disruptive genotypes show increased apoptosis. Annexin V^+^ frequency inversely correlates with GFP/BFP MFI across genotypes: it is lowest in high MFI (in-frame) clusters and increases through med MFI (NS < 19), low MFI (NS ≥ 19), and dim MFI (LD). The GFP^neg^/BFP^neg^ cluster shows the highest Annexin V^+^, consistent with predominant LD/LOA and suggesting that loss of HbS without compensatory HbF induction creates a severe β-thalassemia-like state. For all heatmaps, individual data points and standard deviations are shown in Figure S21.
Using matched genotype (NGS and ddPCR; Figures 3D and 3E) and phenotype (%HbF) (Figures 5C and 5D; Figure S21) datasets across all 20 fluorescence-defined clusters, we observed distinct HbF responses associated with specific genotypes. Trends are consistent across gRNAs in both RNP (Figure 5C) and RNP + ssODN (Figure 5D) conditions. ssODN-treated samples show clearer separation, as reduced cluster crowding improved distinction and resolution of cell populations by flow cytometry. In-frame genotypes (GFP^high^BFP^high^) showed HbF levels comparable to mock (Figures 5C and 5D), confirming that preserved β-globin does not activate γ-globin. In contrast, biallelic frameshift genotypes induced strong HbF, with NS ≥ 19 (GFP^low^BFP^low^) producing higher levels (57% HbF^+^ for R-66S; 58% for R-02) and NS < 19 (GFP^med^BFP^med^) producing a more modest response (36% for R-66S; 28% for R-02), consistent with more complete β-globin loss driving stronger HbF induction (Figure 5D). These findings align with prior observations that R-02-induced frameshift indels upregulate γ-globin in erythroid colonies9 and in a treated patient.17
LD-containing alleles paired with frameshifts also induced high HbF (GFP^dim^BFP^low^: 47% for R-66S, 55% for R-02; GFP^dim^BFP^med^: 66% for R-66S, 59% for R-02) (Figure 5D). Because these LDs delete variable segments around the Cas9 cut site, they can remove HBB coding sequence and/or promoter. Prior work shows HBB promoter deletion can relieve promoter competition and reactivate γ-globin more strongly than loss of coding sequence alone,48 suggesting a potential mechanism for the robust HbF induction we observe. Future studies sorting HbF^−^, HbF^low^, and HbF^high^ subpopulations within LD-containing clusters and assessing HBB promoter copy number could determine whether promoter-deleting LDs specifically drive this effect.
In striking contrast, LOA genotypes did not induce HbF, despite being biallelic disruptive events (Figures 5C and 5D). With the exception of a GFP^neg^BFP^med^, LOA-enriched clusters showed substantially lower HbF than clusters with the same second allele but enriched for LD or frameshift. The GFP^neg^BFP^neg^ cluster—most enriched for LOA—showed the lowest HbF (4% for R-66S; 5% for R-02), even below mock (Figure 5D). These findings indicate that LOA lesions are qualitatively distinct from frameshift or LD with respect to globin regulation. One plausible explanation is that LOA events extend beyond HBB and disrupt elements required for γ-globin expression—such as the HBG1/2 gene located ∼21 kb upstream of the cut site or the local chromatin architecture (Figure S22)—thereby abolishing HbF induction even when β-globin is absent. Alternatively, severe β-globin loss may drive ineffective erythropoiesis, preventing cells from reaching maturation stages where HbF is expressed, even if HBG remains intact. Together, these observations highlight LOA as a distinct and potentially more deleterious outcome than other disruptive alleles. HBG copy number and maturation analyses will further define how LOA suppresses HbF.
Multiple clusters showed non-additive genotype-phenotype behavior. For example, NS < 19 alleles paired with NS ≥ 19 (GFP^med^BFP^low^ and GFP^low^BFP^med^) produced HbF levels comparable to biallelic NS ≥ 19 (GFP^low^BFP^low^). Similarly, NS < 19 paired with LD or LOA (GFP^dim^BFP^med^ or GFP^neg^BFP^med^) drove higher HbF than the corresponding NS ≥ 19 paired with LD or LOA (GFP^dim^BFP^low^ or GFP^neg^BFP^low^), even though NS < 19 is a weaker HbF driver than NS ≥ 19 in the biallelic setting. We speculate that these patterns reflect interactions between qualitatively different HBB lesions. NS < 19 alleles may bypass NMD and produce truncated, unstable β-globin that adds proteotoxic stress to the profound β-globin deficiency caused by NS ≥ 19, LD, or LOA alleles. The combined burden could push erythroid cells past a threshold for γ-globin activation, consistent with known effects of unstable globin peptides,49 threshold-like HbF induction seen in deletional HPFH and δβ-thalassemia,50 and stress-mediated HbF upregulation.51 A second, non-exclusive explanation is allelic non-independence; because Cas9 cleaves both alleles within a shared repair window, genotype combinations may reflect cell-intrinsic repair biases rather than random pairing. Under this model, NS < 19 alleles could preferentially co-occur with a subset of LD/LOA that yield unexpectedly high HbF in NS < 19/LOA clusters, even though NS < 19 or LOA alone is a weak HbF driver in other settings. The SHD^GFP/BFP^ enables direct testing of this hypothesis by allowing cells to be sorted from NS < 19/LOA and NS ≥ 19/LOA clusters and comparing their LOA breakpoint spectra to determine whether specific lesions are enriched in HbF^high^ cells.
Collectively, these findings define a genotype-phenotype hierarchy: (1) preserved β-globin (in-frame) does not activate γ-globin; (2) biallelic frameshift/NMD and LD genotypes induce progressively stronger HbF in proportion to functional β-globin depletion; (3) LOA events uniquely fail to induce HbF due to likely disruption of γ-globin expression or maturation failure; and (4) specific allele combinations exhibit emergent, non-additive HbF behavior reflecting interactions between β-globin dosage, proteostasis burden, and locus structure. This allele-resolved framework provides mechanistic insight into how distinct non-HDR repair outcomes shape fetal globin compensation and identifies LOA as a quantitatively and qualitatively different class of lesions with important safety implications.
Genotype-dependent apoptosis highlights risks of LOA
To assess how distinct HBB genotypes influence erythropoiesis and cellular fitness, we examined viability and apoptosis across edited populations. Trypan blue staining showed similar percentages of live cells during early differentiation, but viability in both R-66S- and R-02-RNP-treated cultures began to decline relative to mock by day 9, consistent with globin-chain-imbalance-driven toxicity (Figure 5E). This loss of viability was accompanied by increased Annexin V^+^ cells (Figure 5F). On day 11 of differentiation, mapping Annexin V positivity onto the 20 fluorescence-defined clusters (Figures 5G and 5H; Figure S21) showed that cells with intact or in-frame alleles (GFP^high^BFP^high^) exhibited low apoptosis, similar to mock, whereas GFP^neg^BFP^neg^ cells, enriched for biallelic LD/LOA, displayed the highest Annexin V positivity, followed by GFP^dim^BFP^neg^ and GFP^neg^BFP^low^, showing clusters unable to induce HbF despite complete HBB disruption show the strongest apoptosis. Early erythroid precursor apoptosis due to α/β chain imbalance is a hallmark of β-thalassemia,52 and similar mechanisms likely underlie the severe apoptosis observed in LOA-enriched clusters. Clinically, if biallelic LOA/LD-edited cells were to engraft, they could contribute little to mature red blood cell (RBC) production while provoking compensatory responses reminiscent of β-thalassemia syndromes, including marrow expansion, chronic anemia, excessive iron absorption, and skeletal abnormalities.53
In SHD^GFP/BFP^, reduced GFP/BFP MFI is expected to reflect disruptive HBB genotypes, and we observe an inverse relationship between MFI and apoptosis, with lower MFI clusters highly enriched for Annexin V^+^ cells (Figures 5G and 5H). This pattern indicates that β-globin deficiency is a major contributor to apoptosis, even across clusters with differing levels of HbF induction (Figures 5C and 5D). Because mock-treated cells maintain high viability (Figure 5F), the excess apoptosis and loss of viability are attributable to gene editing. However, apoptotic cells can also downregulate or lose reporter signal, so some unhealthy cells without LOA may be mis-assigned to GFP^neg^BFP^neg^ as their fluorescence declines. As a result, our current measurements may overestimate apoptosis within the GFP^neg^BFP^neg^ population. Future work using this cell model will leverage GFP/BFP-based presorting of live Annexin V^−^ clusters, followed by longitudinal Annexin V profiling, to determine whether specific genotype-defined populations have an intrinsically higher apoptotic propensity. In parallel, sorting Annexin V^−^ versus Annexin V^+^ cells within the GFP^neg^BFP^neg^ cluster may help pinpoint which LOA subtypes most strongly impair maturation and drive cell death, illustrating how this SHD^GFP/BFP^ can be used to mechanistically dissect high-risk editing outcomes beyond the scope of the present study.
SHDGFP/BFP recapitulates HSPC responses to DNA-repair modulators that enhance HDR but increase LOA
A growing class of HDR-boosting strategies, including pharmacologic DNA-repair modulators, is rapidly entering practice. While these agents can improve HDR, they may also shift repair outcomes in ways that impact safety (e.g., increased LDs, translocations, loss of heterozygosity (LOH)).28^,^54^,^55^,^56 We therefore asked whether SHD^GFP/BFP^ can predict how DNA-repair modulators reshape editing outcomes, benchmarking against primary SCD HSPCs. SHD^GFP/BFP^ and SCD HSPCs were edited with R-66S or R-02 RNP ± ssODN in the presence of the DNA-dependent protein kinase (DNA-PKcs) NHEJ inhibitor M3814, the HDR enhancer protein (HEP), or the DNA polymerase theta (Polθ) MMEJ inhibitor ART558. Prior studies have shown that pharmacologic NHEJ inhibition can increase HDR as measured by NGS,57^,^58 yet simultaneously inflate kilobase-scale deletions, chromosome-arm loss, and translocations, many of which escape detection due to allelic drop-out,54^,^55 underscoring the need for assays beyond standard NGS. HEP is a recombinant engineered ubiquitin variant that selectively inhibits 53BP1, relieving its block on end resection and promoting HDR.59 HEP has been reported to boost HDR with minimal acute cytotoxicity and reduced off-target effects,60 but its impact on LD and LOA remains unknown. In parallel, DNA Pol θ-mediated end joining (TMEJ) has been implicated in the repair of Cas9-induced LDs, motivating Polθ inhibition (e.g., ART558/M4344) as a strategy to reduce these events,61 whereas a more recent preprint reports high LOH levels after Polθ inhibition, raising safety concerns because LOH is difficult to detect or deplete.55 Together, these data motivated a systematic evaluation of all three modulators using our SHD^GFP/BFP^, which can quantify LD and LOA at single-allele resolution and in SCD HSPCs.
DNA-PK inhibition by M3814
Following R-66S RNP + ssODN electroporation, cells were exposed to 0, 0.5, 1, and 2 μM M3814 for 24 h, then cultured without drug. M3814 caused transient toxicity relative to untreated controls, but cell viability recovered by day 4 post-electroporation (Figure S23). M3814 induced a dose-dependent increase in HDR with a corresponding reduction in NHEJ (Figure 6A). Flow cytometry (Figure S24) showed that HDR enhancement coincided with expansion of the GFP^high^BFP^high^ population (Figure 6B), and NGS confirmed selective depletion of frameshift (NHEJ-repaired) indels while maintaining in-frame edits (Figure 6C). Consistent with these findings, intermediate-MFI clusters corresponding to biallelic frameshifts were reduced (Figure 6D), whereas LOA-associated populations (GFP^neg^BFP^neg^ and GFP^neg^BFP^high^; Figures 6E and 6F) increased in a dose-dependent manner. These data indicate that although M3814 boosts HDR, it does not suppress it and may even increase LOA formation. Validation in SCD HSPCs showed similar responses for both gRNAs: 0–4 μM M3814 caused transient cytotoxicity, but viability recovered to near-mock levels (Figure S25). HDR increased with dose up to 2 μM and then plateaued, while frameshift indels decreased and in-frame edits were preserved (Figure S26). These parallels demonstrate that SHD^GFP/BFP^ recapitulates NHEJ-inhibitor responses in SCD HSPCs, supporting its use as a surrogate platform for screening DNA repair modulators. Because LOA increased with M3814 in the RNP+ssODN condition, we next tested M3814 with RNP only to isolate its effects in the absence of HDR and to determine whether LD and LOA formation are NHEJ-dependent (Figure 6F; Figures S27 and S28). Fold change analysis for R-66S RNP (0 μM vs. 2 μM M3814) showed selective enrichment of clusters associated with LD/LOA (GFP^neg^BFP^neg^ and GFP^dim^BFP^neg^ clusters increased 2.7- to 2.8-fold, and GFP^dim^BFP^high^ and GFP^neg^BFP^high^ clusters increased 1.4- to 1.5-fold) (Figure S28A). Conversely, intermediate-MFI clusters composed primarily of frameshift indels were depleted.Figure 6SHD^GFP/BFP^ model reports both HDR enhancement and LOA risk from DNA repair modulators, mirroring responses in SCD HSPCsSHD^GFP/BFP^ and SCD HSPCs were edited with R-66S or R-02 gRNAs ± the HDR enhancers M3814 and Alt-R HDR enhancer protein (HEP). Editing outcomes were quantified by flow cytometry and NGS/ddPCR in SHD^GFP/BFP^ and by NGS/ddPCR in SCD HSPCs. In both systems, HDR enhancers increased HDR but also elevated LOA. (A–E) SHD^GFP/BFP^ edited with R-66S RNP + ssODN and treated with M3814 at 0, 0.5, 1, or 2 μM. (A) NGS shows a dose-dependent HDR increase with a concomitant NHEJ decrease. (B) After 4 days of erythroid differentiation, flow cytometry shows a dose-dependent rise in GFP^high^BFP^high^ cells, matching NGS trends and directly reflecting HDR enhancement. (C) M3814 reduces frameshift indels while maintaining in-frame indels by NGS, consistent with selective NHEJ inhibition. (D) In line with the NGS data, flow cytometry shows a dose-dependent reduction in intermediate MFI clusters (GFP^med^BFP^med^, GFP^low^BFP^low^, GFP^med^BFP^low^, and GFP^low^BFP^med^), which represent biallelic frameshift indels. (E) M3814 caused a dose-dependent increase in LOA populations, shown as GFP^neg^BFP^neg^ cells (left panel) and GFP^neg^BFP^high^ cells (right panel), although the increase was not statistically significant. (F) Heatmap of cluster percentages in SHD^GFP/BFP^ edited with R-66S RNP and treated with M3814 at 0 μM (left panel) vs. 2 μM (right panel) shows a significant increase in GFP^neg^BFP^neg^ and GFP^dim^BFP^neg^, indicating NHEJ inhibition does not suppress LOA or LD formation. (G–L) Effect of HEP in SHD^GFP/BFP^ and SCD HSPCs edited with R-66S or R-02 RNP ± ssODN. (G) Allele frequencies in SHD^GFP/BFP^ electroporated with R-66S (left panel) and R-02 (right panel) RNP ± ssODN, ±25 μM HEP, quantified by NGS/ddPCR. With RNP alone, HEP increased LD and decreased frameshift indels. With ssODN, HEP increased HDR while decreasing frameshift indels. (H) SCD HSPCs treated and analyzed as in (G) showed similar trends for both R-66S (left panel) and R-02 (right panel). With ssODN, HEP produced smaller HDR gains than in SHD^GFP/BFP^ and increased LD under both RNP-only and ssODN conditions, indicating that enhanced end resection in HSPCs favors LD formation over HDR. (I–L) Heatmap shows the percentages of cells in each of 20 fluorescent-defined clusters in SHD^GFP/BFP^ edited without (left panel) and with 25 μM HEP (right panel), measured by flow cytometry 4 days differentiation post editing: (I) R-66S RNP + ssODN, (J) R-02 RNP + ssODN, (K) R-66S RNP, and (L) R-02 RNP. With ssODN, HEP significantly increases HDR (more GFP^high^BFP^high^ cells) and reduces frameshift indels (fewer intermediate MFI cells). As with M3814, HEP also increased GFP^neg^BFP^neg^ and GFP^neg^BFP^high^ populations, indicating enrichment of large genomic events. With RNP only, promoting end resection enriched LOA populations, as evidenced by increases in GFP^neg^BFP^neg, low, med, high^ and GFP^dim^BFP^neg^ clusters. The same trend was observed for both gRNAs. (A–L) Data represent the mean of n = 2 independent editing reactions. For all heatmaps, individual data points and standard deviations are shown in Figures S27 and S29.
HDR enhancer protein
To directly compare outcomes, we electroporated 25 μM HEP with R-66S or R-02 RNP ± ssODN in SHD^GFP/BFP^ and SCD HSPCs. With RNP only, HEP increased LD and decreased frameshift indel for both gRNAs. With ssODN, HEP increased HDR and reduced small indels in SHD^GFP/BFP^ for both gRNAs (Figure 6G; Figure S29). In SCD HSPCs, HEP produced comparable but smaller HDR gains and a marked increase in LDs under both RNP-only and ssODN conditions (Figure 6H), suggesting that enhanced end resection in primary cells favors LD formation over HDR. In SHD^GFP/BFP^, HEP had concordant effects across both gRNAs. With ssODN, HEP increased the GFP^high^BFP^high^ (HDR) population and reduced intermediate-MFI clusters but also expanded GFP^neg^BFP^neg^ (1.7-fold for R-66S, 1.3-fold for R-02) and GFP^neg^BFP^high^ (1.9-fold for R-66S, 2.0-fold for R-02) populations (Figures 6I and 6J; Figure S29A), mirroring the pattern observed with M3814+ssODN (Figure S23C). Without donor, HEP produced even stronger effects on LOA-enriched populations, with GFP^neg^BFP^neg^ increasing 3.6-fold (R-66S) and 4.5-fold (R-02) (Figures 6K and 6L; Figure S30B), exceeding the 2.7-fold rise seen with 2 μM M3814 for R-66S RNP (Figure S28).
Polθ inhibition by ART558
In SHD^GFP/BFP^ cells treated with R-66S RNP and ART558 (0, 2, 5 μM), we observed a dose-dependent decrease in GFP^dim^ clusters enriched for LD but a 1.2- to 1.5-fold increase in GFP^neg^ clusters enriched for LOA (Figure S31). Thus, Polθ/MMEJ inhibition reduces detectable LDs yet paradoxically increases LOA.
Implications for LOA mechanisms and detection
Across all three modulators, LOA consistently increased, raising important questions about the repair pathways that generate LOA. Because LOA alleles are not amplifiable by 6-kb SMRT-seq, their structures (e.g., multi-kb/MB deletions, chromosomal truncations, chromosome loss, or LOH) cannot yet be resolved. Additional assays, such as fluorescence in situ hybridization (FISH), long-read whole genome sequencing (WGS), or chromosomal aberrations analysis by single targeted linker-mediated PCR sequencing (CAST-seq),62 will be required to resolve their architecture. As shown in Figure 4, 6-kb long-read sequencing misses LOA, and bulk ddPCR underestimates modest LOA increases, underscoring the risk of under-calling LOA with conventional methods and demonstrating the value of SHD^GFP/BFP^ for detecting rare, large, and potentially deleterious outcomes.
Discussion
CRISPR/Cas9 editing has opened a path toward curative therapies for SCD, but DSBs at HBB can generate a wide spectrum of unintended outcomes, including frameshift indels, LDs, and LOA, whose biological consequences remain incompletely understood. Reflecting this concern, the FDA now emphasizes the need for systematic assessment of on-target genotoxicity in gene-editing products.63 However, HSPCs are short-lived ex vivo and poorly suited for high-resolution genotype-phenotype analysis. To address this challenge, we developed an allele-resolved fluorescent reporter cell model (SHD^GFP/BFP^) that converts HBB editing outcomes into ∼20 discrete GFP/BFP clusters in live cells, enabling systematic mapping of editing genotypes, transcriptional consequences, HbF induction, and erythroid cell fitness. Our results reveal a highly structured and predictable genotype-phenotype landscape at HBB, inducing genotypes that promote HbF induction and others that produce deleterious β-thalassemia-like phenotypes.
SHD^GFP/BFP^ recapitulates key features of HBB DSB repair. In-frame alleles yield high GFP/BFP; NS < 19 alleles partially evade NMD and generate medium fluorescence; NS ≥ 19 alleles undergo stronger NMD and low fluorescence; LDs reduce fluorescence to dim levels; and LOA eliminates the signal entirely. Long-read-allele-specific sequencing and ddPCR validated these assignments, revealing that GFP^neg^ clusters are highly enriched for LDs extending hundreds to thousands of base pairs and for extensive LOA not detectable by 6-kb long-read sequencing. Thus, SHD^GFP/BFP^ provides a sensitive live-cell platform for detecting and quantifying large on-target genomic alterations that are routinely under-called in standard NGS, SMRT-seq, and ddPCR.
CRISPR/Cas9 editing at HBB yields a wide range of non-HDR outcomes whose functional consequences are difficult to resolve in bulk. The SHD^GFP/BFP^-allele-resolved reporter uncovers subtle, clinically relevant patterns hidden in bulk measurements and reveals a genotype-phenotype hierarchy that explains both therapeutic and deleterious behaviors. Going forward, this model can be leveraged to presort live, genotype-defined populations and longitudinally track outcomes such as apoptosis, HbF induction, and maturation, enabling mechanistic dissection of why some HBB lesions promote fetal-globin compensation, whereas others drive erythroid failure. These insights will be essential for the safe design and translation of HBB-targeted editing strategies.
Using SHD^GFP/BFP^, we evaluated three classes of DNA-repair modulators: NHEJ inhibition (M3814), 53BP1 inhibition (HEP), and MMEJ inhibition (ART558) and found that all three increased LOA. These findings mirror recent reports that NHEJ or MMEJ inhibition can artificially inflate HDR in short-read assays while simultaneously generating kilobase- to megabase-scale deletions, terminal chromosomal truncations, or LOH events that escape detection.27^,^64 SHD^GFP/BFP^ directly visualizes these changes as expansion of GFP^neg^ clusters, providing a readout of genotoxicity that sequencing alone cannot reliably capture. These results underscore the need for caution when modulating DNA-repair pathways in HSC.
SHD^GFP/BFP^ offers several advantages beyond sequencing-based assays. It is live-cell based, allowing prospective sorting of rare genotypes, including LOA-enriched clusters, that would otherwise be lost due to fitness disadvantages in bulk culture. The allele-specific GFP/BFP readout is stable over time, supporting longitudinal analysis of cell fitness, chromatin architecture, and transcriptional regulation. Once HbF-inducing genotypes are defined, their corresponding clusters can be isolated without fixation for HbF staining, preserving native DNA and chromatin for mechanistic studies. These capabilities are not feasible with single-cell sequencing, which captures terminal states but cannot prospectively recover live subclones for functional interrogation.
Several limitations should be acknowledged. GFP and BFP differ in brightness, constraining cluster symmetry; next-generation versions using GFP/mCherry could achieve balanced 5 × 5 resolution. HUDEP-2 cells do not enucleate, although complementary use of BEL-A cells could extend analysis to terminal maturation.65 Our 6-kb SMRT-seq cannot resolve the full architecture of LOA alleles; whole-genome long-read sequencing or FISH will be required to define their structural complexity. Finally, while the current configuration cannot always distinguish LD from LOA on a single-cell basis, the genomic position of the reporter provides unique advantages: because HBB lies ∼21 kb telomeric to HBG, SHD^GFP/BFP^ can function as a live sentinel for genotoxic outcomes arising from HBG-targeted editing using SpCas9,66^,^67 AsCas12a,68 and adenine base editor (ABE),69^,^70 noting that even base editors can induce LDs, albeit at reduced frequencies.61 Any >21-kb telomeric LDs or chromosomal truncation originating from the HBG locus would eliminate the reporter signal, enabling prospective enrichment and detailed analysis of affected cells.
The SHD^GFP/BFP^ cell model provides a powerful, scalable, allele-resolved platform to quantify the full landscape of on-target editing outcomes, define their functional consequences, and evaluate DNA-repair modulators with a level of resolution not achievable in primary HSPCs. As such, it fills a critical gap for the safe translation of HBB-targeted editing strategies. More broadly, the principles we define for DSB repair architecture, the effects of DNA-repair pathway modulation, and genotype-phenotype relationships are generalizable to other therapeutic loci and will inform the wider gene-editing field.
Materials and methods
Cell culture
S-HUDEP2 and CD34+ cells were cultured at 37°C and 5% CO_2_. In the expansion phase, S-HUDEP2 cells were cultured in SFEM (Stemcell, Cat No. 09650) supplemented with 50 ng/mL of hSCF (Peprotech), 20 ng/mL of EPO (Peprotech) 1 μg/mL of DOX (Sigma), 1 μM of DEX (Sigma), and 100 units/mL of pen/strep (Gibco). In the differentiation phase, cells were cultured in IMDM supplemented with 100 ng/mL of hSCF, 20 ng/mL of EPO, 331.25 μg/mL of Holo-HTF, 4 μg/mL of Heparin, 10 μg/mL of Insulin, 5% hPlasma, 200 units/mL Pen/strep, and 1 μg/mL of DOX. Fresh medium was added every 1–2 days, and cells were cultured at a density of under 10^6^ live cells/mL for up to 14 days and analyzed throughout differentiation.20 DOX was removed on day 7 of differentiation, and cells were cultured for additional days to continue differentiation, as described for each experiment.
Peripheral blood CD34+ cells were obtained from patients with SCD undergoing therapeutic red cell exchange at the Texas Children’s Hospital Cancer & Hematology Centers (Houston, TX), under the approved IRB protocol H-33997. CD34+ cells were extracted from the mononuclear fraction by immunomagnetic separation using CD34 Microbeads Kit (Miltenyi Biotech, CD34 MicroBead Kit UltraPure, human) according to the manufacturer’s instructions. CD34+ purity was assessed at 48 and 72 h after extraction by flow cytometry, and cells with purity above 90% were used for experiments. These patients’ derived CD34+ HSPCs were cultured and differentiated to erythroblasts using a two-phase primary erythroid culture system. In the expansion phase, cells were cultured in SFEMII (Stemcell Cat. No. 09655) supplemented with 300 ng/mL hSCF (Peprotech), 100 ng/mL TPO (Peprotech), 300 ng/mL Flt3 ligand (Peprotech), and 60 ng/mL IL3 (Peprotech). Cells were then differentiated using two phase erythroid differentiation culture. In phase 1, HSPCs were differentiated in IMDM supplemented with 10 ng/mL SCF, 20 ng/mL EPO, 1 ng/mL IL-3 (PeproTech), 200 μg/mL holo-transferrin human, 10 μg/mL insulin human, 3% human serum (Sigma-Aldrich), 6 μg/mL heparin (Stem Cell Technologies), 2% human plasma (Innovative Research), and 1% penicillin-streptomycin-glutamine (Gibco) for 7 days. Cells were then moved to phase 2 for an additional 7 days, where interleukin-3 (IL-3) was removed. Cells were incubated at 37°C and 5% CO_2_. CD34+ cells were cultured for 3 days in the expansion phase before electroporation. Seventy-two h after electroporation, cells were transferred to differentiation media. Fresh medium was added every 1–2 days, and cells were cultured at a density of under 10^6^ live cells/mL for 5 days before analysis. Cell count and viability were measured using a 0.4% Trypan Blue solution (Bio-Rad) and a T20 Automated Cell Counter (Bio-Rad).
RNP and ssODN delivery using electroporation
According to the manufacturer’s instructions, 1–2 × 10^5^ S-HUDEP2 and CD34+ cells (program CA-137, solution P3) were electroporated on a Lonza Nucleofector 4D. In S-HUDEP2 and CD34+ cells, 5 μg of HiFi SpCas9 protein (Integrated DNA Technologies) complexed with 2.5 μg of chemically synthesized gRNAs (Integrated DNA Technologies) as RNP with or without 100 pmol of ssODN were electroporated. For mock-treated S-HUDEP2 and CD34+ cells, the same number of cells was electroporated without RNP or ssODN.
S-HUDEP2 and CD34+ cell drug treatment
HEP: S-HUDEP2 and patient-derived CD34+ were electroporated with RNP + ssODN corrective donor with either 0 μM or 25 μM of HEP (Integrated DNA Technologies Cat # 10029790), based on the manufacturer’s recommendations. After electroporation, cells were maintained in expansion culture for 72 h until editing was complete, before transitioning to erythroid differentiation media.
M3814 and ART 558: both M3814 (Selleckchem #S8586) and ART 558 (TargetMol #T9275) were resuspended in DMSO according to the manufacturers’ recommendations and solubility guidelines. S-HUDEP2 and patient-derived CD34+ cells were then electroporated with RNP or RNP+ssODN donor, then plated into media containing either M3814 or ART558 at the indicated concentrations. Vehicle control cultures received an equivalent final concentration of DMSO to match the drug-treated conditions. Drug-containing media were removed after 24 h, and cells were maintained in expansion culture for an additional 48 h to allow editing to complete before DNA harvest for downstream analyses.
Flow cytometry analysis
SONY MA900 and BD FACSMelody instruments were used for flow cytometry analysis and cell sorting. For all analyses performed on the SONY MA900, compensation was set using unstained controls, single-color controls, and fluorescence-minus-one (FMO) controls, when applicable. This approach optimized separation of GFP and BFP signals in SHDGFP/BFP cells and enabled sorting of 20 distinct fluorescence-defined clusters. Twenty-way SHDGFP/BFP sorting based on GFP and BFP was performed on the SONY MA900 on days 4 and 5 of erythroid differentiation. Following sorting, cells were maintained in differentiation culture for an additional 24–48 h to allow expansion and ensure adequate DNA yield for downstream analyses. For flow cytometry analysis of erythroid markers in CD34+ cells, the following antibodies were used for analysis: FITC Mouse Anti-Human CD36 (BD 555454), APC-H7 Mouse Anti-Human CD71 (BD Biosciences 563671), and APC Mouse Anti-Human CD235a (BD Biosciences 561775). Erythroid differentiation was assessed between days 10 and 15, with exact time points varying by donor as described in the main text. This window was chosen based on prior time course staining, which showed that differentiation differences based on cell-surface markers were not apparent before day 10 and that cellular health declined substantially after day 15, limiting reliable analysis. The following antibodies were used to assess cellular health, fetal hemoglobin, and erythroid surface markers in S-HUDEP2: APC Mouse Anti-Human Cd36 (BD Biosciences 550956), BV786 Mouse Anti-Human CD235a (BD Biosciences 740984), APC-Fetal Hemoglobin Monoclonal antibody (HBF-1) (Thermo Fisher Scientific MHFH05), and Alexa Fluor 647 Annexin V Apoptosis Detection (Thermo Fisher Scientific A23204). Fetal hemoglobin analysis in S-HUDEP2 was performed on day 8 of erythroid differentiation, based on prior time course data showing that differences in HbF induction across treatments are most pronounced at this time point, while GFP/BFP-defined clusters remain well separated and cell viability is high. Annexin V analysis was conducted on day 10 of erythroid differentiation, as this was the earliest time point at which treatment-dependent differences in Annexin V staining became evident.
Library preparation for targeted amplicon next-generation sequencing
The first PCR (PCR1) amplified a ∼300 bp region surrounding the Cas9 cut site. A second PCR (PCR2) was performed to append dual-index barcodes and Illumina P5/P7 adapters. Equimolar amounts of PCR2 products from each sample were pooled and sequenced on the Illumina NextSeq 2000 using a NextSeq1000/2000 P1 Reagents (600cycles). Demultiplexed FASTQ files were analyzed using CRISPResso2.71
LongAmp-seq
LongAmp-seq was performed as previously described.15 In short, 100 ng of L-R PCR products were used for LongAmp-seq library preparation, which consists of on-bead tagmentation, posttagmentation clean up, 5-cycle PCR to add index adaptors, double-sided bead purification, library pooling, and quantification according to the Nextera DNA Flex Library Prep Reference Guide (Nextera DNA Flex Library Prep Kit [Illumina, 20018704] and Nextera DNA CD Indexes [Illumina, 20018707]). Equimolar amounts of each sample were pooled and sequenced on the Illumina NextSeq 2000 using a NextSeq1000/2000 P1 Reagents (600 cycles) and analyzed using LV_caller15 and CRISPResso2.71
SMRT-seq
The first PCR reaction (PCR1) was used to target a 6-kb region around the Cas9 cut site and tag the template molecule with an adapter using a tailed primer pair. The PCR1 reaction contained 100 ng of gDNA, 200 nM of each tailed primer in 50 μL of reaction (LongAmp Hot Start Taq 2× Master Mix, NEB). The PCR1 program consisted of initial denaturation (2 min at 94°C) and 25 cycles of denaturation (30 s at 94°C), annealing (30 s at 60°C), and extension (6 min at 65°C). After completion of PCR1, the PCR1 product was purified using SPRIselect (Beckman Coulter, B23317) and eluted in 30 μL of water. In second PCR reaction (PCR2), barcodes are incorporated by using universal sequences tailed with 16-bp PacBio barcode sequences (Sequal_RSII_96_barcodes_v1). The PCR2 program consisted of initial denaturation (2 min at 94°C), 5 to 10 cycles of denaturation (15 s at 94°C), annealing (30 s at 60°C), and extension (6 min at 65°C) followed by the final extension (5 min at 65°C). The minimum cycle number (10 cycles) was used to obtain sufficient PCR product (>100 ng) for library preparation. The PCR2 product was purified using SPRIselect and eluted in 30 μL of water. One hundred nanograms of barcoded amplicon from PCR2 was pooled and used for PacBio library preparation, which consists of DNA damage repair, end repair/A-tail, SMRTbell adaptor ligation (SMRTbell Express Template Prep Kit 2.0), nuclease treatment (SMRTbell Enzyme Clean Up Kit), and AMPure bead purification following the standard protocol. The SMRTbell library was sequenced on a PacBio Sequel II 8M flow cell in CCS mode following the standard protocol with 1 h of preextension and 30 h of collection time (PacBio). The PacBio subreads were converted to HiFi reads, and Q20 CCS reads were used for analysis. GFP and BFP allele specific editing rate was quantified by a custom python script (available upon request).
Droplet digital PCR
Probe-based ddPCR assays quantifying allelic drop-off at HBB (ROX) relative to the diploid reference gene RPP30 (VIC) were used. Reactions contained 15 ng genomic DNA, 1× ddPCR Supermix for Probes (Bio-Rad), 900 nM target primers, 250 nM target probes (Eurofins Genomics), and 10 U HindIII-HF restriction enzyme in a 20 μL reaction. Thermal cycling was performed using the standard ddPCR protocol recommended by the manufacturer.
qRT-PCR
RNA extraction was done using Qiagen RNeasy Mini Kit, followed by cDNA synthesis using Bio-Rad iScript cDNA Synthesis Kit. GFP- and BFP-specific primers were designed and used to quantify relative levels of GFP and BFP mRNA according to the manufacturer’s protocol (Bio-Rad, iTaq Universal SYBR Green Supermix).
Western blot
Protein analysis of individual globin chains (alpha sub-unit, beta sub-unit, and gamma sub-unit), GFP/BFP, and P2a peptide was done using specific antibodies. The 7-day differentiated cells were lysed in cold RIPA buffer supplemented with 1× protease and phosphatase inhibitor cocktail. Samples were vortexed and incubated on ice for 5 min before centrifugation of cell lysates at 10,000 r.p.m. at 4°C for 20 min. Supernatants were then collected. Samples were mixed with cold running buffer and loaded onto 10% SDS-PAGE, and proteins were separated at 60 V for 30 min followed by 110 V for 1.5–2 h. Gel was then transferred to a PVDF membrane (25 V, 1.3 A, 4 min), followed by 30 min incubation at RT in 0.4% PFA/PBS. Blocked with 5% nonfat dry milk powder (NFDM) in 0.1% TBST for 1 h at RT then probed with desired antibody overnight at 4C. Primary antibodies used were GFP Polyclonal Antibody, Rabbit (GenScript, A01388-40), P Histone H2A, Rabbit (Cell Signaling Technologies, 9718S), Hemoglobin Alpha Mouse monoclonal IgG1 (SantaCruz Biotechnology, sc-514378), Hemoglobin Gamma Mouse monoclonal IgG1 (SantaCruz Biotechnology, sc-21756), and Hemoglobin Beta mouse monoclonal IgG1 (SantaCruz Biotechnology, sc-21757). Membranes were washed in 0.1% TBST 3× at room temperature and secondary antibody Goat pAB to Rabbit IgG (HRP) (Abcam, ab6721) or Rabbit pAB to mouse IgG (HRP) (Abcam, ab6728) in 1:10,000 dilution in 5% NFDM. Membranes were washed 3× in 0.1% TBST, then treated with West Pico PLUS Chemiluminescent substrate (Thermo Fisher Scientific, cat. 34577) for 5 min prior to imaging on a Chemidoc MP imaging system (Bio-Rad). Relative expressions were normalized to values of hemoglobin alpha.
HPLC
Cell pellets were collected after in vitro erythroid differentiation of S-HUDEP2 and SCD HSPCs. For S-HUDEP2, pellets for HPLC were harvested on day 11 of erythroid differentiation. For SCD HSPCs, HPLC was performed on day 14, at the end of phase 2 of the erythroid differentiation protocol. Cells were lysed in water containing 1× phosphatase inhibitor by three freeze-thaw cycles. Lysates were cleared by centrifugation at 10,000 r.p.m. for 30 min at 4°C to remove cell membranes. Native hemoglobins were analyzed on a SmartLifeLC portable high-performance liquid chromatography system (POLYLC Inc.) using a cation-exchange PolyCAT A column. Analyte retention times were compared with AFSC hemo control (Helena Laboratories). Areas under the peaks were used to quantify hemoglobin fractions, and the summed areas of HbA, HbF, and HbS were used for ratio comparisons.
Data availability
PacBio and Illumina sequencing data have been deposited in the NCBI Sequence Read Archive (SRA) under BioProject accession number: PRJNA1260711. All other original data are available from the corresponding author upon request.
Acknowledgments
This work was supported by the 10.13039/100000002National Institutes of Health (R01HL169761 to G.B. and R225HL106365 to Pace B.S. with C.L.K. as a Trainee). We are grateful to the patients with sickle cell disease for permitting the use of discarded red cell exchange samples for HSPC isolation. Patients were enrolled on a protocol reviewed and approved by the Institutional Review Board at Baylor College of Medicine. NextSeq2000 sequencing was performed by the Genetic Design and Engineering Center (GDEC) at 10.13039/100007863Rice University, funded by the 10.13039/100004917Cancer Prevention & Research Institute of Texas (RP210116 to G.B.).
Author contributions
G.B. and S.P. conceived and directed the project; C.L.K. and S.P. planned and performed the experiments; D.B. helped with PCR, and Q.K.P. helped with flow cytometry; M.C. processed SMRT-seq data; C.L.K., S.P., and G.B. wrote the manuscript.
Declaration of interests
The authors declare that they have no competing interests.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Thomson A.M.Mc Hugh T.A.Oron A.P.Teply C.Lonberg N.Vilchis Tella V.Wilner L.B.Fuller K.Hagins H.Aboagye R.G.Global, regional, and national prevalence and mortality burden of sickle cell disease, 2000–2021: a systematic analysis from the Global Burden of Disease Study 2021 Lancet Haematol.102023 e 585e 59910.1016/s 2352-3026(23)00118-737331373 PMC 10390339 · doi ↗ · pubmed ↗
- 2Furstenau D.K.Tisdale J.F.Allogenic hematopoietic stem cell transplantation in sickle cell disease Transfus. Apher. Sci.60202110305710.1016/j.transci.2021.10305733485798 · doi ↗ · pubmed ↗
- 3Walters M.C.Patience M.Leisenring W.Eckman J.R.Scott J.P.Mentzer W.C.Davies S.C.Ohene-Frempong K.Bernaudin F.Matthews D.C.Bone Marrow Transplantation for Sickle Cell Disease N. Engl. J. Med.335199636937610.1056/nejm 1996080833506018663884 · doi ↗ · pubmed ↗
- 4Leonard A.Tisdale J.F.A new frontier: FDA approvals for gene therapy in sickle cell disease Mol. Ther.32202426426710.1016/j.ymthe.2024.01.01538246166 PMC 10862012 · doi ↗ · pubmed ↗
- 5Frangoul H.Altshuler D.Cappellini M.D.Chen Y.S.Domm J.Eustace B.K.Foell J.de la Fuente J.Grupp S.Handgretinger R.CRISPR-Cas 9 Gene Editing for Sickle Cell Disease and β-Thalassemia N. Engl. J. Med.384202125226010.1056/NEJ Moa 203105433283989 · doi ↗ · pubmed ↗
- 6Frangoul H.Bobruff Y.Cappellini M.D.Corbacioglu S.Fernandez C.M.de la Fuente J.Grupp S.A.Handgretinger R.Ho T.W.Imren S.Safety and Efficacy of CTX 001 in Patients with Transfusion-Dependent β-Thalassemia and Sickle Cell Disease: Early Results from the Climb THAL-111 and Climb SCD-121 Studies of Autologous CRISPR-CAS 9-Modified CD 34+ Hematopoietic Stem and Progenitor Cells Blood 13620203410.1182/blood-2020-139575 · doi ↗
- 7Park S.H.Bao G.CRISPR/Cas 9 gene editing for curing sickle cell disease Transfus. Apher. Sci.60202110306010.1016/j.transci.2021.103060 PMC 804944733455878 · doi ↗ · pubmed ↗
- 8Kanter J.Di Persio J.F.Leavey P.Shyr D.C.Thompson A.A.Porteus M.H.Intondi A.Lahiri P.Dever D.P.Petrusich A.Lehrer-Graiwer J.Cedar Trial in Progress: A First in Human, Phase 1/2 Study of the Correction of a Single Nucleotide Mutation in Autologous HS Cs (GPH 101) to Convert Hb S to Hb A for Treating Severe SCD Blood 1382021186410.1182/blood-2021-152892 · doi ↗
