Multiple strategies to resolve the conflict between replication and transcription in mammalian cells
Yajie Huang, Xiao He, Yuhan Ying, Junzheng Peng, Lizhong Du, Hongyu Luo, Jiangping Wu

TL;DR
This study explores how DNA replication and transcription avoid conflicts in mammalian cells during the S phase.
Contribution
The paper reveals multiple mechanisms that help resolve the transcription-replication conflict in mouse embryonic fibroblasts.
Findings
Pol II density decreases in S-phase mouse embryonic fibroblasts.
Replication forks tend to pass transcription start sites with some difficulty.
Forks travel faster in regions with more Pol II, possibly due to negative supercoiling.
Abstract
RNA polymerase II (Pol II) and the replisome use the same DNA template during the S phase. How these two machineries pass each other is a quintessential biological question. In this study, we demonstrated that the de novo transcriptomes of G1- and S-phase mouse embryonic fibroblasts (MEFs) showed extensive overlap. Pol II density decreased somewhat in the S-phase-enriched MEFs according to ChIP-seq. Based on long-read sequencing, a minor fraction of replication forks showed codirectional bias. A quarter of the forks were aborted near the Pol II-dense transcription start site (TSS), but the majority of forks could still pass it. The speed of the passing forks was slower if their progressing tips were closer to the TSS, suggesting that the passing was not uneventful. Paradoxically, the forks tended to travel faster in regions with more Pol IIs, likely due to greater negative supercoiling.…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6- —Canadian Institutes of Health Research10.13039/501100000024
- —Arthritis Society of Canada
- —Natural Sciences and Engineering Research Council of Canada10.13039/501100000038
- —Canadian Rare Disease Models and Mechanisms Network
- —Canadian Institutes of Health Research10.13039/501100000024
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDNA Repair Mechanisms · Bacterial Genetics and Biotechnology · Genomics and Chromatin Dynamics
Introduction
RNA transcription and DNA replication use the same DNA template. Gene transcription in the S phase is the norm in mammalian cells [1–11], raising the theoretical possibility of a head-on or codirectional collision between RNA polymerase II (Pol II) and the replisome. The fact that mammalian cells can grow and function well under physiological conditions indicates that the transcription-replication conflict (TRC) can be mitigated and resolved. The mechanisms of TRC resolution have been an active area of research for the past 70 years since the discovery of the DNA double helix. However, we are far from understanding how the TRC is smoothly resolved or avoided in mammalian cells.
There are several strategies that cells can use to avoid TRC in mammalian cells. Temporal separation of transcription from replication is one. For example, some highly transcribed genes are replicated late during the S phase [6]. However, some genes encoding proteins required for replication, including histones, are hypertranscribed during S phase [1, 3, 4]. Therefore, a general and complete temporal separation between the two processes is unlikely.
In mammalian cells, the replisome speed varies stochastically but falls within ~1–4 kb/min [12], which is close to that of RNA Pol II (1–3 kb/min) [13]. In theory, the codirectional movements of the replisome and Pol II can reduce the chance of collision. Indeed, recent TRIP-seq results suggest that some replication origins are located at or near the transcription start sites (TSSs) of highly transcribed genes [8, 14, 15]. Half of the replication forks initiated at these regions are codirectional with the transcription complex. The exact degree of such bias genome-wide is not known. Genome-wide mapping showed that replication origins are fired stochastically [16], suggesting that a bias in favor of codirectional replication is limited. Moreover, Pol IIs in a codirectional orientation with the replisomes remain detrimental due to the codirectional collision [17]. Therefore, a large number of TRC, whether in head-on or codirectional orientation, need to be resolved by other means.
In theory, there can only be four fundamental strategies to resolve the TRC, as depicted in Supplementary Fig. S1, and discussed in detail in the Supplementary Text S1. (i) The replisome and Pol II both partially open their grips on the template, exposing the latter to allow the other to glide through. (ii) The replisome yields to Pol II. (iii) Pol II yields to the replisome. (iv) The worst scenario is that after a collision between Pol II and the replisome, both of them disengage from the template, and both the transcription and replication at this location are terminated. There are various reports supporting these models [18–22]. These strategies are not mutually exclusive, and they might all contribute to resolving the TRC. If so, an evaluation of the relative contributions of these strategies to TRC resolution in mammalian cells is needed.
Recently, we discovered a novel multiple-subunit ring finger ubiquitin ligase (E3) that uses ARMC5 as its substrate-recognition subunit, CUL3 as a scaffold, and RBX1 for its ligase activity [23]. We call this novel E3 “ARMC5-E3” for simplicity. All 12 subunits of Pol II are substrates under physiological conditions (i.e. in the absence of exogenously induced DNA damage) [23, 24]. ARMC5 gene knock-out (KO) leads to a significant increase in the protein levels of all the Pol II subunits. These KO cells provide a model of elevated Pol II levels in live cells.
In this study, taking advantage of several new tools available to us, i.e. ARMC5 KO cells, EdU/BrdU pulse-labeling followed by Nanopore sequencing (DNAscent) to detect replication fork speed and direction in a single molecule in vivo [25], a psoralen-binding-based sequencing (bTMP-seq) [26] to assess genome-wide DNA negative supercoiling, a.k.a., unwound, as well as some other well-established methods such as nascent RNA-sequencing (nsRNA-seq) and RPB1-chromatin immunoprecipitation sequencing (ChIP-Seq), we investigated ways and means by which the mammalian cells used to resolve the TRC. It is worth noting that although DNAscent can determine fork speed and direction within a single DNA molecule, our findings and conclusions in this project are primarily cell-population-based, given the use of multiple next-generation sequencing technologies. These cell-population-based findings need to be tested by future single-molecule experiments. That said, these findings still offer some insights into TRC resolution.
Materials and methods
Ethics statement
All mice were housed and handled in accordance with a protocol approved by the Institutional Animal Protection Committees (Comité institutionnel d’intégration de la protection des animaux) of the CRCHUM (Protocol IP24007JWs). The mice were housed in a specific pathogen-free facility with 12-h-on and 12-h-off light cycles and an ambient temperature of 22°C in an enriched environment. Mice were euthanized if one of the following symptoms was observed: weight loss exceeding 20%, diarrhea, dehydration, prostration, neurological symptoms such as ataxia or paralysis, or reaching 20 months of age. Mouse sacrifices were conducted by cervical dislocation under general anesthesia with ketamine (120 mg/kg) plus xylazine (10 mg/kg).
ARMC5 KO mice
ARMC5 KO mice were generated previously in our laboratory, and their phenotypes are described in detail in our publications [23, 24, 27]. The KO mice and their wild-type (WT) littermates used to generate mouse embryonic fibroblasts (MEFs) were in the C57Bl/6 x CD1 background.
Generation of MEFs
Mouse embryos at embryonic day 13.5 (E13.5) were harvested and dissected under sterile conditions in a biosafety cabinet. Heads and internal organs were removed, leaving only the carcasses, which were finely minced using razor blades. Tissue digestion was performed with prewarmed 0.025% Trypsin–ethylenediaminetetraacetic acid (EDTA). The cell suspension was passed through a 100-μm cell strainer to remove undissociated tissue fragments, and a fresh culture medium was added to quench the trypsin activity. Following centrifugation, cells were resuspended in fresh culture medium [Dulbecco’s modified Eagle medium (DMEM) supplemented with 10% fetal bovine serum] and immediately plated onto tissue culture dishes, with media changes every other day. MEFs at passage 3 were used for all downstream experiments.
MEF proliferation assay
An equal number of MEFs from WT and KO mice were seeded into 96-well plates containing 100 µl medium per well. Samples were in triplicate. After 0, 24, 48, and 72 h of culture, 20 µl of CellTiter 96^®^ Aqueous One Solution Reagent (Promega, G3582) was added to each well. The optical absorbance of the wells at 490 nm was measured after a 2-h incubation period using a microplate reader (BIO-RAD Model 3550). The background absorbance was determined from wells containing only blank medium and CellTiter 96^®^ Aqueous One Solution, and subtracted from the signals of test samples.
MEF synchronization
MEFs were plated in 24-well plates to achieve ~60% confluence for the following day. Aphidicolin (Sigma–Aldrich, 178273) was added to reach a final concentration of 12 µM, as described by Thwaites et al. [28]. These cells were cultured for 18 h to achieve a G1/S-phase block. They were then released into the S phase by washing three times with phosphate-buffered saline (PBS) and culturing in fresh medium. Cell cycle distribution was analyzed by flow cytometry at different time points after the release. Cells were fixed and stained with propidium iodide for DNA content. Flow cytometric analysis revealed that the S-phase population peaked at 4 h post-release; therefore, this time point was selected for S-phase cell sorting in all subsequent experiments.
4-thiouridine labeling and nascent RNA-sequencing
WT and KO MEF samples were in triplicate. The MEFs (20 × 10^6^ cells/sample) were synchronized at the G1/S boundary with aphidicolin and then released into the S phase for 1.5 h. The cells were then incubated with 500 µM 4-thiouridine (4-sU; final concentration; Sigma, T4509). After an additional 1.5-h culture, the cells were stained with 10 µM Vybrant™ DyeCycle™ Violet (Invitrogen, V35003) in the culture medium for 75 min at 37°C. Approximately 1 × 10^6^ G1 cells (containing 2N DNA) and 1 × 10^6^ S-phase cells (with DNA content between 2N and 4N) were sorted by fluorescence-activated cell sorting (FACS; BD FACSAria™ III). The total RNA from the sorted cells was extracted using TRIzol in phase-lock tubes. The 4-sU-labeled RNA was enriched according to a protocol by Biasini et al., with some modifications [29]. Briefly, the total RNA was precipitated with isopropanol. The pellets were washed in 75% ethanol, resuspended in DEPC-treated H_2_O, and treated with DNase I (Invitrogen™ TURBO DNA-free™ Kit) to eliminate genomic DNA. 4-sU-labeled RNA was biotinylated with EZ-link HPDP-Biotin (Thermo Fisher, 21341) and subsequently purified using phenol: chloroform: isoamyl alcohol (25:24:1, v/v). RNA was precipitated with isopropanol and NaCl, and the pellets were washed twice with 75% ethanol. Biotinylated RNA was captured on streptavidin magnetic beads (New England Biolabs, S1420S), which were washed sequentially with binding (0.5 M NaCl, 20 mM Tris–HCl, pH 7.5, 1 mM EDTA) and ice-cold low-salt washing buffer (0.15 M NaCl, 20 mM Tris–HCl, pH 7.5, 1 mM EDTA). Beads were resuspended in 100 µl water containing freshly prepared 100 mM DTT (Dithiothreitol) and incubated at room temperature for 1 min to elute the biotinylated RNA. The eluted RNA was then purified with Qiagen RNeasy Mini Kit according to the manufacturer’s instructions and finally eluted in nuclease-free water. This 4-sU-labeled RNA was ready for RNA sequencing.
The KAPA RNA HyperPrep Kit with RiboErase (HMR) module (Roche Diagnostics) was used for ribodepletion and library preparation for 123 ng 4-sU-labeled RNA. Library distribution was assessed using a 2100 Bioanalyzer (Agilent Technologies), and the libraries were quantified by quantitative polymerase chain reaction (qPCR). Libraries were sequenced as paired-end (PE100) reads on a Novaseq 6000 (Illumina) using an S4 flow cell, with a depth of 50 million reads per library.
Raw RNA sequencing data were processed using a standardized bioinformatics workflow. Paired-end raw fastq files were initially quality-assessed and preprocessed using fastp [30], which implemented stringent quality filtering to remove low-quality reads. Adapter sequences were subsequently removed using Cutadapt [31]. Cleaned reads were aligned to the mouse reference genome (GRCm39) using the STAR aligner [32] with a two-pass mapping strategy to improve mapping precision. Alignment files were generated in BAM format and processed through Picard MarkDuplicates (http://broadinstitute.github.io/picard/) to identify and remove duplicate reads, thereby ensuring data quality and reducing potential PCR amplification bias.
Gene-level quantification was performed using FeatureCounts [33], which precisely counted unique reads mapped to genomic features. Differential expression analysis was conducted using edgeR [34]. Unless specified otherwise, genes with counts per million (CPM) of ≥1 in at least three samples were retained for downstream analysis. In additional analyses, we also used a CPM ≥ 5 filter in at least three samples.
To account for potential variations arising from Pol II levels or other experimental manipulations, we employed Rn7sk-based normalization in lieu of an RNA spike-in (Supplementary Text S2). Normalization factors for each sample were calculated as the ratios of sample-specific Rn7sk counts to the geometric mean of Rn7sk counts across all samples. Since Rn7sk is transcribed by Pol III, it serves as an independent reference unaffected by alterations in Pol II transcription dynamics. Similar Rn7sk expression levels were observed across all experimental conditions (Supplementary Fig. 2A and C–E), including G1- versus S-phase MEFs, validating its suitability as an internal control. These sample-specific normalization factors were applied within the edgeR framework, replacing the default trimmed mean of M-values (TMM) normalization.
Functional annotation was performed using GOseq [35] to conduct Gene Ontology (GO) enrichment analysis. Visualization of results, including heatmaps, volcano plots, and pie charts, was accomplished using an R programming environment (https://www.R-project.org/; R version 4.3.1, Packages: pheatmap https://github.com/raivokolde/pheatmap and ggplot2 v3.4.4 https://ggplot2.tidyverse.org).
RPB1 chromatin immunoprecipitation followed by sequencing of asynchronous and S-phase-enriched MEFs
WT MEF samples were in triplicate. Approximately 18 × 10^6^ WT MEFs synchronized at the G1/S boundary by aphidicolin were allowed to progress into the S phase for 4 h, then harvested and washed with PBS. Asynchronous WT MEFs were used for comparison. The asynchronous and S-phase-enriched cells were crosslinked with 1% formaldehyde for 10 min at room temperature, then quenched with 125 mM glycine for 10 min while rotating. The cells were pelleted, washed with cold PBS, and resuspended in cell swelling buffer (25 mM HEPES, pH 7.9, 1.5 mM MgCl_2_, 10 mM KCl, 0.1% NP-40, supplemented with protease and phosphatase inhibitors) on ice for 10 min to release nuclei. Nuclei were pelleted, resuspended in ChIP sonication buffer [50 mM HEPES pH 7.9, 140 mM NaCl, 1 mM EDTA, 1% Triton X-100, 0.1% Na-deoxycholate, 0.1% sodium dodecyl sulfate (SDS)] supplemented with protease and phosphatase inhibitors, and sonicated using a probe-based sonicator (FB120 with a CL-18 probe; Thermo Fisher) at 25% amplitude with 30-s pulses at 30-s intervals for a total of 3 min. Sonicated chromatin was stored until immunoprecipitation.
Chromatin fragmentation was assessed by retrieving 5% of sonicated nuclei and treating them with 10 µg RNase A for 45 min at 37°C, followed by 20 µg proteinase K for 1 h at 65°C. Purified DNA (QIAquick PCR Purification Kit) was quantified using a Nanodrop 1000 Spectrometer (Thermo Fisher), and fragment length was determined by electrophoresis.
For immunoprecipitation, 50 µl SureBeads™ Protein G Magnetic Beads (Bio-Rad) per sample were incubated with 1 µg anti-RPB1 N-terminal domain mAb (clone D8L4Y, Cell Signaling Technology; 1:100) per sample at 4°C for 4 h. The beads were washed three times with PBS containing 0.1% tween 20, then incubated overnight at 4°C with sonicated MEF chromatin, supplemented with 5% sonicated Drosophila Schneider 2 (S2) chromatin as a spike-in control. Beads were washed sequentially with ChIP sonication buffer, LiCl wash buffer (20 mM Tris, pH 8.0, 1 mM EDTA, 250 mM LiCl, 0.5% NP-40, 0.5% Na-deoxycholate), and TE buffer (10 mM Tris, pH 8, 0.1 mM EDTA). Chromatin was eluted with the ChIP elution buffer (50 mM Tris, pH 8.0, 10 mM EDTA, 1% SDS) with agitation at 1100 rpm for 15 min.
Immunoprecipitated chromatin was de-crosslinked at 65°C overnight, treated with RNase A (20 µg per sample) at 37°C for 1 h and proteinase K (200 µg per sample) for 2 h at 65°C. DNA was purified with the QIAquick PCR Purification Kit (QIAGEN). Size distribution and concentration of immunoprecipitated and input samples were evaluated on a 2100 bioanalyzer (Agilent Technologies) using a High-Sensitivity DNA Kit. ChIP-seq libraries were prepared using the KAPA Hyperprep Library Kit (Roche Diagnostics). Normalization of the sample quantities was performed before final amplification based on qPCR quantification of the ligation products. Library size distribution was assessed on a 2100 bioanalyzer (Agilent Technologies) using a High-Sensitivity DNA Kit, and libraries were quantified by qPCR. Equimolar libraries were sequenced in paired-end reads (PE100) on a Novaseq system (Illumina) at 50 million reads per library.
Raw sequencing reads were processed using the McGill University and Genome Quebec Innovation Centre (MUGQIC) ChIP-seq pipeline [36]. Adapter sequences were trimmed using Trimmomatic [37] (v0.39), and reads were aligned to the mouse reference genome (GRCm38) or Drosophila BDGP6 genome using BWA MEM [38] (v0.7.17) with default parameters. Sequence alignment files were processed using Sambamba [39] (v0.8.1) to sort, index, and mark duplicate reads. ChIP-seq peak calling was performed using MACS2 (https://pypi.org/project/MACS2/; v2.2.7.1) with standard parameters. Peak annotation and motif discovery were conducted using Homer (http://homer.ucsd.edu/homer/; v4.11). Normalized coverage tracks (bigwig files) were generated using DeepTools bamCoverage (bin size = 10) with sample-specific scaling factors calculated based on S2 DNA spike-in reads. Specifically, the lowest total uniquely mapped S2 DNA reads in a sample across all samples in a given experiment were arbitrarily designated as the internal control. The ratio of the control to the uniquely mapped S2 DNA reads in a given sample was calculated as the scaling factor for that sample. These factors were used to scale the mouse genome-mapped reads by random downsampling [40]. Scores for metaplots were calculated using DeepTools computeMatrix with the --missingDataAsZero option. Profile plots and heatmaps were generated using plotProfile and plotHeatmap utilities, respectively.
To compare the average peak density between groups and genomic regions, the log_2_-transformed FPKM + 1 value was first calculated as the overall mean across all genes for each biological replicate (n = 3) within a specified region (TSS, gene body, or intergenic). This overall sample mean served as the statistical unit for all subsequent analyses. Prior to parametric testing, the normality of the replicate (n = 3) means for each group was assessed using the Shapiro–Wilk test; however, due to the small sample size (n = 3), which limits the power of the test, this test may not be reliable. To compare the log_2_(FPKM + 1) means across genomic regions (TSS versus genebody, TSS versus intergenic region, and genebody versus intergenic region) within the same group (asynchronized or S-phase-enriched), a two-way paired t-test was used. To compare the log_2_(FPKM + 1) means between the asynchronized and S-phase-enriched groups within the same genomic region, a one-tailed Welch’s t-test (μS-phase-enriched > μAsynchronized) was used.
Gene length distributions between significantly downregulated versus nonsignificant genes in the S-phase-enriched cells [log_2_FC < -1, false discovery rate (FDR) <0.05] were compared using the Wilcoxon rank-sum test. Genes were also classified into three groups based on their length: short (<30 kb), medium (from 30 to <100 kb), and long (≥100 kb), and compared similarly but with Bonferroni correction for the P-values.
All statistical analyses were performed in R. A P-value (multiple-testing penalty-adjusted when necessary) < .05 was considered statistically significant. Data visualization was performed using ggplot2.
EdU/BrdU pulse-labeling followed by Nanopore sequencing (DNAscent)
WT and KO MEF samples were in duplicates. The MEFs synchronized at the G1/S boundary were released to the S phase for 3.5 h and then pulse-labeled with EdU (5-ethynyl-2′-deoxyuridine; Sigma–Aldrich, 900584) followed by BrdU (5-bromo-2′-deoxyuridine; Sigma–Aldrich, B5002) as described by Totanes et al. [25]. Briefly, cells were treated with 50 µM EdU for 5 min, and then washed twice with PBS. They were pulsed with 50 µM BrdU for 10 min. After two PBS washes, the cells were chased with 100 µM thymidine (Sigma–Aldrich, T9250) for 20 min to terminate the labeling. The cells were harvested from the plates using trypsin–EDTA, fixed in 70% ice-cold ethanol on ice for 1 h, thoroughly washed with ice-cold PBS, and stained with 4′,6-diamidino-2-phenylindole (DAPI) at a concentration of 3 µg/ml for 30 min. Approximately 1.5 × 10^6^ WT and KO S-phase cells were sorted by FACS from the stained cells. They were immediately snap-frozen in liquid nitrogen and stored at −80°C until further use.
High-molecular-weight DNA was extracted from WT and KO S-phase MEFs using the Circulomics Nanobind Big DNA kit (cat: NB-900-701-01) according to the DNA extraction protocol for mammalian cells. DNA yield was assessed using the Qubit High Sensitivity dsDNA Assay (Invitrogen). The DNA was also profiled using the Femto Pulse System (Agilent). Nanopore library construction was performed using the nanopore ligation kit (Oxford Nanopore SQK-LSK109), followed by library quality control using Qubit. Libraries were sequenced on a PromethION P24 system (Oxford Nanopore) using R9 pore flow cells. The flow cells were reloaded with the library every 24 h for a total of 72 h of run.
Replication fork calling and annotation of fork speed and genome feature
Raw fast5 files from Oxford Nanopore sequencing were base-called using Guppy (version 5.0.16 with configuration file dna_r9.4.1_450bps_hac_prom.cfg), followed by alignment to the murine reference genome GRCm39 (release GCF_000001635.27) with Minimap2 [41] (version 2.24). Next, the Minimap2 output files (sam format) were converted to bam format, sorted, and indexed using samtools [42] (version 1.14). The DNAscent pipeline was then run as previously described [25]. In the DNAscent “Detect” subprogramme, we set minimum quality criteria of >20 kb read length and >20 mapping quality (using flags -l 20000 -q 20). All the forks, whether containing both EdU and BrdU segments, EdU-only segments, BrdU-only segments, or truncated forks that were analyzed in this work, met the above-mentioned quality criteria. The software associated with DNAscent [25] was used to detect the locations, lengths, and orientations of EdU- and BrdU-labeled forks within single DNA molecules.
The forkSense output from all datasets (2 for WT replicates and 2 for KO replicates) was annotated with fork velocities for all replication forks, requiring a minimum distance of 2 kb between the fork boundaries and the end of the mapped read, using a custom Python script, which has been deposited on GitHub (https://github.com/Pfuderer/mouse_fibroblast_replication_dev/blob/main/scripts/stats_v01.py). The distance to the closest TSS was annotated for each fork using the gffutils package (https://github.com/daler/gffutils) to annotate gene locations based on chromosome coordinates. To differentiate between forks moving towards a TSS or away from a TSS, we annotate orientation towards the closest TSS using a custom Python script deposited in GitHub (https://github.com/Pfuderer/mouse_fibroblast_replication_dev/blob/main/scripts/stats_v01.py). Visualizations and downstream statistical analyses were performed using R programs (version 4.3.1).
For the analysis of the speed of all the forks in each category (codirectional and head-on) about the fork tip position (the final position, i.e. the 3′end, of the BrdU-labeled segment), R’s lm() function was performed (model: fork speed ~ distance to TSS), yielding slope angles (θ_2_, θ_4_) and R^2^ values (R2^2^, R4^2^). For linear regression of dots at the lower limit of the V-shaped pattern, all the forks were divided into 50 bins across the distance range, and the fork with the lowest speed in each bin was selected. These values were subjected to linear regression, generating slopes (θ^1^, θ^3^) and R^2^ values (R1^2^, R3^2^) for these border forks.
The length of the TSS region for each gene was always 500 bp, from −400 to +100 bp relative to the TSS. The length of the genebody of each gene was obtained from the NCBI database. The TES region was always 2.1 kb, from −100 bp to +2 kb.
Genomic negative supercoiling mapping using biotinylated trimethylpsoralen followed by single-strand DNA sequencing (bTMP-seq)
The mapping of genomic DNA negative supercoiling was conducted according to the methods described by Naughton et al. [43]. WT and KO MEF samples were in triplicate. Each MEF sample was cultured in three 10-cm dishes and synchronized in G1/S as described above. The cells were released into the S phase for 4 h. They were rinsed twice with PBS and incubated with 500 µg/ml biotinylated trimethylpsoralen (bTMP) in 1 ml PBS in the dark on ice for 20 min. The cells were carefully covered with parafilm during the incubation to ensure uniform reagent distribution. The plates with parafilm-covered cells were irradiated at a nominal 12 joules/cm^2^, as measured by the instrument, in a UV crosslinker (UVP, Model CL-1000 Ultraviolet Crosslinker, Upland, CA), whose lamps were replaced with five 8-W T5 365 nm UVA tubes (Coospider-Repta, Jinyun, China). This UV energy level was pre-tested in a dot blot assay (described below) to determine an optimal level of DNA bTMP labeling for MEFs. After the irradiation, the parafilm was removed. The cells were rinsed twice with ice-cold PBS, scraped into 1 ml PBS supplemented with protease inhibitors, and pelleted by centrifugation. The cell pellets were treated with 300 µl of lysis buffer (50 mM Tris–HCl, pH 8.1; 10 mM EDTA; 1% SDS) containing protease inhibitors on ice for 10 min. The cell lysates were sonicated using a probe-based sonicator (FB120 with CL-18 probe; Thermo Fisher) at 25% amplitude, with 15-s pulses separated by 15-s intervals for a total of 3.5 min. Proteinase K (60 µg) was added to the sonicated sample, which was then incubated at 55°C for 1 h. Total bTMP-labelled DNA was extracted from the lysate using an equal volume of phenol:chloroform: isoamyl alcohol, precipitated with 0.3 M sodium acetate and two volumes of absolute ethanol. The pellet was washed twice with 70% ethanol and resuspended in 50 µl TE buffer (10 mM Tris, 1 mM EDTA, pH 8.0). In parallel, genomic DNA was extracted from Drosophila S2 cells using the same procedure as described above. Twenty-three micrograms S2 cell genomic DNA in 50 μl TE was placed in a well of a 96-well plate and reacted with 50 μl bTMP (1 μg/μl) on ice in the dark for 20 min. The reaction mix in the wells was irradiated with 365 nm UVA at 2.25 joules/cm^2^ in the UV crosslinker, as described above. The crosslinked DNA was diluted with 400 µl lysis buffer (50 mM Tris–HCl, pH 8.1, 10 mM EDTA; 1% SDS) and sonicated using the probe-based sonicator at 20% amplitude with 30-s pulses and 30-s intervals for a total duration of 2 min. The DNA was then purified, resuspended in TE buffer, and stored at −20°C until use.
Dot blotting was used to optimize irradiation energy for MEFs and naked S2 cell DNA. Briefly, a nitrocellulose membrane (0.45 μm pore size, Hybond ECL, RPN303D, Amersham) was soaked in 20X SSC (3 M NaCl, 0.3 M sodium citrate, pH 7.0) for 20 min at room temperature, then air-dried. For each sample, 270 ng bTMP-labeled MEF DNA or bTMP-labeled naked S2 cell DNA was blotted onto the membrane. Samples were in duplicate. After blotting, the membrane was exposed to 254 nm UVC radiation using a UV crosslinker (Model CL-1000, UVP) equipped with five 8-watt UVC bulbs at 0.15 joules/cm^2^. The membrane was briefly rinsed in Buffer 1 (0.1 M Tris–HCl, pH 7.5; 0.15 M NaCl) and shaken in blocking buffer (3% bovine serum albumin in Buffer 1) for 60 min at room temperature. The membrane was reacted with streptavidin-conjugated horseradish peroxidase (1:200 dilution in Buffer 1; R&D Systems, DY998) for 1 h at room temperature, followed by four sequential washes: twice with TBS containing 0.1% tween 20 and twice with Buffer 1. Dot signals in the membrane were detected by chemiluminescence using SuperSignal ELISA Pico substrate (Thermo Scientific, 37070). The dot intensity was visually inspected to assess the optimal irradiation energy for MEFs and to adjust the irradiation energy for naked S2 cell genomic DNA to achieve a level of bTMP-labeling similar to that in MEF DNA.
Approximately 45 µg bTMP-labeled DNA extracted from MEFs in 600 μl of TE buffer was spiked with 5% (i.e. 2.25 µg) of bTMP-labeled naked DNA from the S2 cells. Fifty microliters of prewashed magnetic streptavidin beads (Dynabeads MyOne Streptavidin, Invitrogen, 65001) were washed twice with TE buffer and then added to the DNA mix in a 1.5 ml Eppendorf tube, which was rotated at 4°C overnight. The beads were then sequentially washed with the following buffers: TSE I buffer (20 mM Tris–HCl, pH8.1, 2 mM EDTA, 150 mM NaCl, 1% triton, 0.1% SDS) followed by a TE buffer rinse, TSE II (20 mM Tris–HCl, pH 8.1, 2 mM EDTA, 500 mM NaCl, 1% triton, 0.1% SDS) followed by a TE buffer rinse, and TSE III (10 mM Tris–HCl, pH 8.1, 1 mM EDTA, 0.25 M LiCl, 1% NP40, 1% sodium deoxycholate) followed by a final TE buffer rinse. Each washing step was 5 min, with the Eppendorf tube rotating at room temperature. DNA was then eluted from the beads by 50 µl of 10 mM EDTA/95% formamide at 90°C for 10 min. The eluate was collected and supplemented with 350 µl nuclease-free water. DNA was precipitated by the addition of 40 µl 3 M sodium acetate, 1 ml 100% ethanol, and 20 µg glycogen in 1 μl. The sample was incubated at −70°C overnight, and then centrifuged to pellet the enriched single-strand bTMP-labelled DNA. The pellet was subsequently washed with 70% ethanol and resuspended in 15 µl low TE buffer (10 mM Tris, 0.1 mM EDTA, pH 8.0).
Input single-stranded DNA (ssDNA) quantity was evaluated using a High-Sensitivity DNA Kit (150–1500 pg total). Libraries were prepared using xGenTM ssDNA and low-input Library Prep Kit (IDT) with 11–14 PCR cycles for final library amplification. Library size distribution was assessed on a 2100 bioanalyzer (Agilent Technologies) using a High-Sensitivity DNA Kit, and libraries were quantified by qPCR. Equimolar libraries were sequenced in paired-end reads (PE100) on a Novaseq system (Illumina) at 35 million reads per library.
Raw sequencing data were processed using the MUGQIC ChIP-seq pipeline v4.6.0 [36] developed by McGill University and Genome Quebec Innovation Centre. Adapter trimming was performed with Trimmomatic [37] (v0.39), and reads were aligned to either the mouse reference genome (GRCm38) or the Drosophila melanogaster BDGP6 genome using BWA MEM [38] (v0.7.17) with default parameters. Aligned reads were sorted, indexed, and duplicate-marked using Sambamba [39] (v0.8.1). Peak quantification and annotation were carried out with HOMER (http://homer.ucsd.edu/homer/; v4.11). Normalized coverage tracks (bigWig files) were generated using bamCoverage (DeepTools) with a bin size of 10 bp and sample-specific scaling factors derived from S2 DNA spike-in reads.
Signal quantification for metaplots was conducted using DeepTools computeMatrix in scale-regions mode (gene bodies: 6 kb; upstream/downstream flanks: −2 and +6 kb, respectively) with a bin size of 10 bp, using the --missingDataAsZero flag. The output matrix was exported in tab-delimited format via the --outFileNameData option to retrieve normalized read densities per bin. We defined three regions based on bin indices: TSS (bins 161–210); gene body (bins 211–800); and intergenic region (bins 801–1400), with bin 1 corresponding to −2 kb upstream of the TSS. Heatmaps and profile plots were generated using the DeepTools plotHeatmap and plotProfile utilities, respectively.
Results
Slower growth and progress to the S phase of KO MEFs
ARMC5 KO MEFs proliferated more slowly than their WT counterparts (Supplementary Text S3 and Supplementary Fig. S3). This seems to be a general characteristic of the KO cells, as we previously reported that KO T lymphocytes [27] and KO neural progenitor cells [24] also have slower growth rates.
Overwhelming overlap of genes transcribed in G1 and S-phase cells based on nsRNA-seq
To assess the transcriptome in MEFs without proliferation or with active proliferation, we conducted nsRNA-seq of sorted G1 and S-phase MEFs. The 1.5 h 4-sU labeling time was optimized based on a better 4-sU labeled RNA yield in our pilot study. Extended 4-sU labeling and pulse-chase approaches (0.5–2 h) have been widely used to study transcription kinetics and RNA stability [44, 45]. With this labeling period, we saw some exonic reads in the dataset due to RNA splicing. Accordingly, our nsRNA results should be interpreted as newly synthesized transcriptomes rather than strictly unspliced nsRNA. This served well for the purpose of our study, which was to assess the overlap of transcriptome between the G1- and S-phase cells.
In both the WT and KO cells, there was >98% overlap between the two transcriptomes (Fig. 1A). Example tracks of nsRNA-seq can be found in Supplementary Fig. 2B.
The majority of genes were actively transcribed in both the G1 and S phases according to nsRNA-seq. (A) Only a fraction of genes is differentially transcribed in the S versus G1 phase. FACS-sorted S- and G1-phase MEFs (left panel, WT; right panel, KO) were subject to nsRNA-seq. The number and percentage (in parentheses) of differentially expressed genes (S versus G1 phase; absolute FC > 1.5 and FDR < 0.05) are depicted in the pie charts. (B) Bar graphs of down- and upregulated genes in the S versus G1 phase. Fold changes of genes with FDR < 0.05 and absolute FC > 1.5 in the S versus G1 comparison in WT (left panel) and KO (right panel) MEFs. The percentage of genes downregulated in the S phase was indicated. (C) Volcano plots illustrating genome-wide transcriptional changes between S and G1 phases in WT (left panel) and KO (right panel) MEFs. Noteworthy genes are annotated, and dashed horizontal and vertical lines represent FDR = 0.05 and FC = ±1.5, respectively. (D) GO analysis (in terms of biological processes) of S-phase upregulated genes (FDR < 0.05, FC > 0); WT (n = 153), left panel; KO (n = 11), right panel. The percentages of differentially expressed genes in each category, their absolute number, the associated biological process terms, and the adjusted P-value of the association are indicated.
Genes with differential expression between the G1 and S phases in the WT and KO MEFs according to nsRNA-seq were listed in Supplementary Tables S1 and S2. Heatmaps reflecting such differential expression in G1- and S-phase WT and KO cells were presented in Supplementary Text S4 and Supplementary Fig. S4. Bar graphs in Fig. 1B depict the up- and downregulation of the differentially expressed genes in the WT (left panel) and KO MEFs (right panel) for the S versus G1 phase comparison. Volcano plots comparing all S versus G1 phase genes in WT (left panel) and KO (right panel) MEFs are shown in Fig. 1C. The majority of differentially expressed genes were the downregulated ones in the S phase, and this was true in both WT and KO cells. However, compared to the number of all the expressed genes (e.g. ~16 091 in WT MEFs), only a small fraction (307 genes out of the 16 091 expressed genes) downregulated their expression in the S phase. Moreover, such downregulation did not represent a complete absence of transcription. Therefore, on a cell population level, there is only a low degree of temporal separation between replication and transcription.
In both WT and KO cells, some genes were upregulated during S phase. We therefore used looser criteria, i.e. FDR < 0.05 and fold change (FC) > 0, to select the S-phase upregulated genes and subjected them to GO analysis for biological processes (Fig. 1D; left panel, WT cells; right panel, KO cells). Interestingly, the S-phase upregulated genes in both the WT and KO cells were mostly related to various replication processes, such as mitosis, nuclear division, chromosome segregation, sister-chromatid segregation, etc., suggesting that molecules for these processes need to be produced de novo, at least to some extent, in each cell cycle.
In the above analysis, we used a CPM > 1 filter. If a more stringent filter of CPM > 5 was used, ~35% fewer expressed genes were obtained. However, the results are similar to those with CPM > 1, and the interpretation and conclusions remain the same (Supplementary Text S5 and Supplementary Fig. S5).
The major focus of this work was on the TRC. However, since nsRNA-seq was performed on both WT and KO MEFs, we compared their nsRNA transcriptomes. The results and data interpretation are presented in Supplementary Text S6, Supplementary Fig. S6, and Supplementary Tables S3 and S4.
Segregation between transcription and replication in G1-phase- and S-phase-enriched MEFs based on RPB1 ChIP-seq
The possible temporal segregation of transcription and replication on a cell population basis can be assessed by RPB1 ChIP-seq in S-phase-enriched versus asynchronous cells. Although close to a hundred RPB1 ChIP-seqs have been carried out using different antibodies and different cells [46], there is a paucity of detailed comparisons of RPB1 ChIP-seq read density between G1-phase and S-phase mammalian cells. We thus conducted RPB1 ChIP-seq on asynchronous (mostly G1) and S-phase-enriched WT MEFs.
The metagene profiles of the Pol II density (i.e. RPB1 peak density) showed that in both types of cells, there was a major Pol II peak near the TSS region and then a minor peak near the TES region (Fig. 2A), not unlike those using different types of asynchronous cells [23, 24, 47]. The Pol II density in S-phase-enriched cells was lower than in asynchronous cells across all regions plotted, including the TSS, genebody, and intergenic region (the region after the TES; only 5 kb was shown). The heatmap of differential Pol II density of the S-phase-enriched versus the asynchronous cells lists 40 genes that had FDR < 0.05 and are ordered from the highest to the lowest absolute FC values in the TSS region (Supplementary Fig. S7A) and genebody (Supplementary Fig. S7B). The full datasets of these genes with differential read density in the TSS, genebody, and intergenic regions are presented in Supplementary Tables S5–S7. In the volcano plots in Fig. 2B, we can clearly see that, in both TSS and genebody regions, almost all genes with FDR < 0.05 and FC > 2 show decreased Pol II density in S-phase-enriched cells. A numeric summary is illustrated in Fig. 2C. A Venn diagram shows moderate overlap between genes with significantly reduced read density in the TSS region and those with decreased read density in the genebody (right panel) in S-phase-enriched cells. It is known that most Pol IIs in the TSS region do not move to the genebody [48], which can explain the lack of a high degree of overlap between genes with RPB1 read density downregulation in both the TSS region and the genebody. Thus, among the 33 421 genes (i.e. 20%) with valid ChIP signals, 6679 genes showed decreased Pol II density in the S-phase-enriched cells. Such a decrease, particularly the decrease in the genebody of 4562 genes, likely resulted in reduced transcription.
*RPB1 ChIP-seq analysis of asynchronous (mostly at G1 phase) and S-phase-enriched WT MEFs. (A) Metagene profile of RPB1 (Pol II) density in the S-phase-enriched and asynchronous MEFs for the region between 2 kb upstream of TSS and 5 kb downstream of TES. The means of three biological replicates were used to construct the metagene profiles. (B) Volcano plots of genes with differential Pol II density between asynchronous and S-phase-enriched MEFs. Vertical dashed lines indicate fold changes at −2 and 2, while the horizontal dashed line indicates an FDR threshold of 0.05. Thirty representative genes with fold change >2 and the lowest FDR values are annotated. (C) A diagram showing the numbers of S-phase downregulated or upregulated genes at the TSS or genebody regions (left panel) and a Venn diagram (right panel) illustrating the number of genes with overlapping downregulation in the TSS region and genebody in the S-phase. (D) Representative Pol II tracks of four genes spanning the TSS region, genebody, and intergenic regions. (E, F) Violin plots showing log2 (FPKM + 1; +1 is used to avoid the log20 logical error during the software operation) of Pol II per FPKM for the TSS region, genebody, and intergenic regions of asynchronous and S-phase-enriched MEFs. The dots represent the median values. Panel (E) focused on comparisons among the three genomic regions, without distinguishing between asynchronous and S-phase-enriched cells. Panel (F) focused on the comparison between asynchronous and S-phase-enriched cells, without comparing different gene regions. Values were calculated from three independent biological replications. (G, H) Comparison of gene lengths between those with their Pol II density reduced versus unchanged (based on FPKM) in S-phase-enriched cells. The horizontal bars represent the median values. All genes with and without Pol II density reduction were compared in panel (H), while genes were stratified by length (short: <30 kb; medium: 30–100 kb; long: ≥100 kb) and then compared. *P < .05, **P < .01; **P < .001. ns: not significant.
Figure 2D provides examples of four genes with their Pol II signal tracks in asynchronous cells and S-phase-enriched cells. The violin plots in Fig. 2E show quantitative and statistical analyses of Pol II density in the TSS, genebody, and intergenic region. The density was significantly higher in the TSS region than in the genebody and intergenic regions in both asynchronous and S-phase-enriched cells. For clarity, the difference between the asynchronous and S-phase-enriched cells was compared in Fig. 2F. The Pol II density in the TSS region and the genebody was lower in S-phase-enriched cells than in their asynchronous counterparts, confirming the visual impression of the metagene in Fig. 2A and indicating some segregation between transcription and replication. However, there is no complete segregation.
We wondered whether there was a difference in gene lengths between genes with downregulated Pol II density in S-phase-enriched cells and those without. As shown in Fig. 2G, when genes of all sizes were compared, the genes with Pol II density downregulation in the S-phase-enriched cells were shorter. We stratified these genes into three size categories: short (<30 kb), medium-sized (30–100 kb), and long (>100 kb) (Fig. 2H). The short genes were the majority among all genes, and their proportions in the downregulated group (71.1%) and the unchanged group (70.2%) did not differ significantly. The short genes contribute to the observed smaller overall sizes of downregulated genes in S-phase-enriched cells. Maybe, to achieve some degree of transcription-replication segregation, evolution decides that acting on shorter genes is more numerically advantageous in terms of energy efficiency. In the long gene category, which accounts for ~10% of all genes, the downregulated genes are actually larger. The meaning of this finding is unclear.
Based on these ChIP-seq results, we conclude that, on a cell population basis, Pol II occupancy in chromatin is reduced during S phase. However, such a reduction can result from the loss of all Pol II in a subpopulation of cells, a global partial reduction in Pol II traffic across all cells, or both. In any of these three scenarios, it is safe to say that there is no complete segregation between transcription and replication in any cell.
Faster fork speed in genome regions dotted with more Pol II, according to DNAscent
Using DNAscent, we examined replication fork speed across different genomic regions, which vary in Pol II density on a population basis. The workflow of this experiment is depicted in Fig. 3A. The quality parameters and fork characteristics of the sequencing data are presented in Supplementary Text S7. Four representative types of forks are illustrated in Fig. 3B.
*Replication fork speed in different regions of the genome according to DNAscent All the forks from two biological replicates were pooled unless specified otherwise. (A) Experimental workflow for measuring replication fork speed in MEFs using DNAscent. MEFs were synchronized at the G1/S phase and then released. Three and a half hours after the release, the MEFs were sequentially pulse-labeled with EdU (5 min) and BrdU (10 min), followed by a 20-min thymidine chase. The cells were stained with DNA-staining dye DAPI. S-phase cells were sorted by FACS. High molecular weight DNA was extracted from the sorted cells and subjected to Nanopore sequencing. The sequencing data were analyzed by DNAscent to detect EdU- and BrdU-labeled segments. Duplicate biological samples were employed for the WT and KO MEFs. (B) Tracks illustrating representative replication forks with EdU (pink) and BrdU (blue) incorporation patterns. Top panel: Two forks meeting with each other (collision) in the genebody of Kcnh1 shortly after BrdU labeling. Second panel: A leftward fork in the intergenic region downstream of Ptprt. Third panel: A leftward replication fork originating from the Lamp5 gene region and extending into the intergenic region. Bottom panel: A head-on rightward fork started in the middle of the Sirpb1c genebody. The black line before the red arrow represents an unlabeled DNA sequence synthesized during the 20-min thymidine chase following BrdU pulsing. (C) Faster fork speed in KO MEFs than WT MEFs. Box-and-whisker plots depicting the replication fork speed. (D) KO MEFs have fewer forks per mb. The bar graph (left panel) shows the fork number/mb (mean ± SE) based on duplicate biological samples of WT and KO MEFs. The right panel depicts all types of forks (i.e. initiating forks with their EdU starting from the same location, colliding forks with their BrdU meeting each other, EdU-only forks, BrdU-only forks, completely labeled forks with both EdU and BrdU segments, and truncated forks with the unlabeled and labeled sequence terminated at the same place). All these forks are included in the fork number calculation. (E) A small but significant bias in favor of the codirectional forks over the head-on forks with regard to transcription direction. Fork counts were normalized per mb of sequenced length. (F) Faster fork speed in genic regions compared to intergenic regions in both WT and KO MEFs. Fork speed in the genic and intergenic regions of the KO cells was also compared to their counterparts in WT MEFs. The colors of the box-and-whiskers correspond to the colors of forks (arrows) on the top of the plots, where the location of the forks relative to the gene regions is indicated. (G) The TSS-overlapping forks had a faster speed than the nonoverlapping forks in both WT and KO MEFs. Forks in the genic region were divided into TSS-overlapping and nonoverlapping forks, and their speed was compared. Fork speed in the TSS-overlapping and nonoverlapping regions of the KO MEFs was also compared with their counterparts in WT MEFs. (H, I) Higher incidence of fork truncation in regions with more Pol IIs. The genes harboring the truncated forks were first identified. The TSS region is always 500 bp, from −400 to +100, surrounding the TSS. The TES region was always 2.1 kb, from −100 to +2 kb. The number of truncated forks per kb in each of these regions was then calculated. The mean ± SEM of two biological replicates was presented. The number of truncated complete forks is shown in panel (H), and the number of truncated EdU-only and BrdU-only forks (the sum of the two types is used) is shown in panel (I). For panles (C) and (F–I), only complete forks containing both EdU and BrdU signals were used for analysis. For panels (D) and (E), all types of forks, as illustrated on the right panel, were included for analysis. Wilcoxon rank-sum tests with continuity correction were conducted for panels (C) and (F–H). For panels (D) and (E), the Welch two-sample t-test was performed. For panels (H) and (I), paired t-tests were conducted to assess differences between KO and WT conditions within each genomic region. To evaluate differences among genomic regions within each genotype, we performed a one-way ANOVA followed by Tukey’s HSD post hoc tests for multiple comparisons. For panels (C), (F), and (G), the P-values were adjusted for multiple testing using the Benjamini–Hochberg procedure. For panels (D) and (E), since this is a single a priori hypothesis test, it is not subject to multiple testing penalties. In panels (F–I), all the possible comparisons between two different columns were conducted. If they are marked with ns (not significant) or without a horizontal bar between the two columns, the difference is not significant. *P < .05; **P < .01; **P < .001. ns: not significant.
Our original hypothesis was that, since more numerous Pol IIs were dotted across the KO cell genome, this would slow the progression of the replication fork in these cells. To our surprise and contrary to our hypothesis, we found that the fork speed in KO cells was faster than that in WT cells (Fig. 3C). This surprising finding will be discussed later.
In the KO cells during the 15-min labeling period, the total number of all types of forks per mb (including initiation forks, collision forks, EdU-only forks, BrdU-only forks, complete forks, and truncated forks, as illustrated on the right) was smaller than that in the WT cells (Fig. 3D). There is no significant difference between KO and WT samples for numbers of reads across any defined length category, from 20–30 kb to over 100 kb (Supplementary Fig. S8), excluding the possibility that the difference of the smaller fork number in the KO cells was due to a lack of longer reads, which might favor the fork calls by software.
Using DNAscent, we were able to directly determine the fork direction in relation to the transcription direction. Although both types of orientation (codirectional and head-on) existed, in the WT MEFs, the average number of codirectional forks per mb was higher than that of the head-on ones, representing a bias in favor of the codirectional ones (Fig. 3E). The tendency was the same in the KO MEFs. However, it did not reach statistical significance.
The fork speed was faster in the genic region than in the intergenic region (Fig. 3F). For forks in the genic region, those that overlapped with TSS moved faster than the ones that did not (Fig. 3G). These two findings are both contrary to our initial hypothesis that forks should move more slowly in regions with higher Pol II levels on a cell-population basis. Consistent with the finding that the general fork speed in the KO cells was faster than that in the WT cells (Fig. 3C), the fork speed in selected regions of the KO cells was always faster than that of their WT counterparts (Fig. 3F and G). The possible explanations for this observation will be elaborated in the Discussion.
For some forks, the EdU/BrdU-labeled tracks terminate at the same place where sequencing also ends. These forks are called “truncated forks.” Such truncated forks comprise forks that have both EdU and BrdU segments, as shown in Fig. 3D. However, they can occur in EdU-only or BrdU-only forks, and they are called truncated EdU-only and BrdU-only forks (not depicted). Such truncation is not due to interruption of the fork progression by cell harvest, because after BrdU labeling (EdU has been washed away at the BrdU stage), we always added a 20-min thymidine chase before harvesting the cells. All the normal forks end with a stretch of unlabeled sequence due to this 20-min chase. Thus, the truncation occurred when the cells were receiving EdU or BrdU, long before the fork progression was interrupted by cell harvest.
When normalized by the length on a per kb basis, there was a drastically and significantly larger number of truncated complete tracks in the TSS region, followed by the TES region, while the number of such forks in the genebody was the smallest (Fig. 3H). This was true for both WT and KO MEFs. A similar observation could also be seen for the truncated EdU-only and BrdU-only forks (Fig. 3I) the number of forks in the panel represents the sum of truncated EdU-only and BrdU-only forks.
Our results show that, on a cell-population basis, truncation occurred more frequently at the location with the higher Pol II density (i.e. the TSS region followed by the TES). The truncation rate in the TSS region of the KO cells appears to be higher than that of their WT counterparts, although the difference is not statistically significant. Probably, the Pol II density difference between KO and WT cells was not large enough to affect the truncation rates.
Based on the number of truncated forks in the TSS region and the number of forks that passed this region, we calculated the frequency of truncation in the TSS region (the calculation is detailed in Supplementary Text S8). In WT cells, 25.55% of complete forks were truncated upon entering the Pol II-dense TSS region. The percentage of truncated forks in KO cells was similar but slightly higher at 31.01%, likely due to the generally higher Pol II density in KO cells [23, 24]. Again, this is a cell population-based rather than a single-molecule-based frequency.
Forks that passed the TSS but with tips closer to the TSS moved more slowly
The forks overlapping with TSS traveled faster than those that did not (Fig. 3G). Next, we analyzed the speed of this overlapping group based on the distance between their fork tips (the position of the last BrdU-labeled nucleotide, indicated as the tip of the red arrowhead in Fig. 3B) and TSS. Interestingly, the forks with their tips within 500 bp of TSS had a slower speed than those with their tips farther away from TSS (Fig. 4A). This was the case for both the KO and WT cells. The threshold could be set at 1 kb (Fig. 4B) or 2 kb (Fig. 4C) with similar results. This suggests that a continuum might exist, i.e. the number of slower forks increased as their tips approached the TSS.
*Slower speed for forks with their tip closer to the TSS. (A–C) The speed was slower for forks with their tips closer to the TSS. TSS-overlapping forks from KO and WT cells were stratified according to the distance of their tips to the nearest TSS. Demarcation positions were set at 500 bp (A), 1000 bp (B), or 2000 bp (C) from the TSS. The speed of the forks on one side of the demarcation position was compared to that on the other side. The KO forks were also compared with their WT counterparts in each category. Wilcoxon rank-sum tests were conducted with P-values adjusted for multiple comparisons using the Benjamini–Hochberg FDR method. **P < .001. Scatter plots show an increased number of low-speed forks as a function of the closer distance of their tips to the TSS in KO (D) and WT (E) MEFs. Distances were shown as positive or negative, depending on the fork orientation with regard to transcription direction (positive values for codirectional forks and negative values for head-on forks). Two linear regression analyses were conducted as described in the methods. Scatter plots of the relationship between speed and distance from fork tips to the TSS for non-TSS-overlapping forks in KO (F) and WT (G) MEFs.
To assess the existence of such a continuum and explore whether such a continuum was fork orientation-dependent, we stratified the forks based on their orientation with regard to transcription and plotted their speed versus their tip distance to TSS (Fig. 4D, KO; 4E, WT). Each dot in the plot represented the tip position of a complete fork that passed TSS, and no truncated forks were included in this analysis. Based on linear regression (solid lines), there was only a weak positive correlation between the fork speed and the fork’s tip distance to TSS in the range of R^2^ at 0.2–0.3 in the KO and WT cells (Fig. 4D and 4E).
Interestingly, the distribution of the forks followed a V-shaped pattern with a defined bottom border. We conducted a linear regression using the speed of the slowest forks with their tips at different positions (i.e. those forks at the bottom border of the V-shaped scatter blot) and obtained highly correlated slopes with R^2 ^> 0.95 for all the four types of border slopes (R1^2^ and R3^2^ in the KO cells (Fig. 4D) and R1^2^ and R3^2^ in the WT cells (Fig. 4E). There was no significant difference in the slope angles (θ_1_ and θ_3_) between codirectional and head-on forks in either KO cells or WT cells.
This fork speed distribution pattern is not an artifact generated during sequencing or data processing. For forks that did not overlap with TSS, the same analysis did not reveal a V-shaped pattern, and their speeds scattered randomly, regardless of where their tips were located (Fig. 4F and 4G; KO and WT, respectively).
Interpretations of this interesting phenomenon will be presented later.
Chromatin-negative supercoiling correlated with faster fork speed according to bTMP-seq
We hypothesized that the varied fork speed across genomic regions might be related to chromatin openness, i.e. negative supercoiling, which would favor faster fork progression. We employed bTMP-seq to fine-map chromatin-negative supercoiling in the S-phase-enriched MEFs.
The bTMP-seq metagene profiles of all the genes in the WT and KO MEFs are presented in Fig. 5A, and the profiles with two outliers, Gm10800 and Gm10801, removed are shown in Fig. 5B. A discussion of the technical issue related to the effect of outliers in metagene construction can be found in Supplementary Text S9. In all the following analyses, these two outliers were excluded. The KO cells presented generally higher negative supercoiling, and this is more obviously presented when the WT signal is subtracted from the KO signal (Fig. 5C). Figure 5D shows heatmaps of genes with bTMP-seq read density in different genomic regions of the KO (left panel) and WT (right panel) MEFs. Based on the metagene profiles (Fig. 5B) and heatmaps (Fig. 5D), the genebody showed higher read density than the intergenic regions. Figure 5E shows a heatmap in which the WT signals were subtracted from the KO ones. This heatmap also confirms the impression that KO chromatin is overall more negatively supercoiled, particularly in the genebody. Representative bTMP-seq tracks spanning the TSS region, genebody, and intergenic regions are illustrated in Fig. 5F in support of the metagene and heatmap impression that the genebody has the highest negative supercoiling of the three regions, and the KO cell chromatin is more negatively supercoiled than its WT counterpart. The bTMP-seq read density in the TSS, genebody, and intergenic regions was statistically compared (Fig. 5G). The results showed that the read density was higher in the genebody, followed by the TSS. The intergenic region had the lowest density. For better data presentation and comprehension, the comparison of the read density between WT and KO MEFs in these three regions is shown in Fig. 5H. The signals in these regions in the KO MEFs are always higher than those of the WT MEFs. The datasets related to all these comparisons are presented in Supplementary Tables S8–S10.
*bTMP-seq analysis of genomic DNA negative supercoiling in S-phase-enriched WT and KO MEFs. (A) The metagene profile of all the genes. (B) The metagene profile with two outliers excluded. (C) The metagene profile of KO cells after subtraction of WT signals. Two outliers were excluded. (D) Heatmaps of genes with bTMP-seq read density in the KO (left panel) and WT (right panel) MEFs in the different genomic regions. The two outliers were excluded. (E) Heatmap of genes with bTMP-seq read density in the KO cells after the WT cell signals were subtracted. The two outliers were excluded. (F) Representative bTMP-seq tracks spanning the TSS region, genebody, and intergenic regions. (G, H) Box plots comparing the read density across three genomic regions: the TSS region, genebody, and intergenic region in the WT MEFs (left panel) and KO MEFs (right panel). While the data presented in panel (G) compared the three genomic regions, the data in panel (H) compared the WT and KO MEFs within these regions. For statistical comparisons of readcount density among different genomic regions (i.e. TSS versus gene body, TSS versus intergenic region, and genebody versus intergenic region), unpaired Wilcoxon rank-sum tests were employed. For comparison between the KO and WT samples, the Wilcoxon signed rank test was conducted on paired bin values within each region. All P-values were corrected for multiple testing using the Bonferroni method. **P < .001.
Thus, based on bTMP-seq, the highest negative supercoiling occurred in the genebody, followed by the TSS region, with the lowest read density in intergenic regions. KO MEFs had higher negative supercoiling in all three regions than their WT counterparts. These different degrees of negative supercoiling in these regions and cells are, in general, correlated with fork speed, except at the TSS, where fork speed is highest, but the degree of negative supercoiling is lower than in the genebody. Possible reasons for such a discrepancy will be discussed later.
Discussion
Understanding how the TRC is resolved is a titanic endeavor within the current technical limitations. In this study, we investigated what happened when the replisome met Pol II on a cell population basis. The major new findings, some of which are quite unexpected, are visually summarized in Fig. 6 and are discussed below.
A visual summary of the major findings in this study and a new model for TRC resolution.
Temporal segregation between transcription and replication exists, but it is not the only mechanism for TRC resolution
Some earlier studies suggest that the timing of transcription and replication differs across the cell cycle, thereby enabling temporal segregation of the two processes to avoid TRC [6]. The validity and generalization of such a claim are controversial. Many previous and recent studies have revealed that many genes are actively transcribed during S phase [1, 3–8]. It has also been shown that Pol II is present on or near S-phase cell DNA [7–11]. Our nsRNA-seq profiled the transcriptome of de novo synthesized mRNA in the G1 and S phases and provided complementary evidence that transcription is not completely repressed in the S phase. In fact, it is only mildly repressed in the S phase.
Our RPB1 ChIP-seq of S-phase cells showed that 6981 genes out of ~30 000 genes had reduced Pol II density in S-phase. Since this is a cell population-based result, we cannot determine whether the reduction is due to a complete loss of Pol II in a subpopulation of S-phase cells in these genes, or to a modest decrease in Pol II density across all S-phase cells. If it is the former, many cells and many genes retain Pol II during S phase. If it is the latter, then in all S-phase cells, Pol II levels remain lower in the chromatin. In either case, there is no complete disappearance of Pol II in all cells and across all genes during S-phase.
All in all, the combined nsRNA-seq and RPB1-ChIP-seq results indicate the existence of some degree of temporal segregation. Still, they are not sufficient to fully resolve the TRC, because unless all genes in all S-phase cells exhibit such temporal segregation, this strategy will not fundamentally resolve the TRC. Other efficient mechanisms must exist to resolve the TRC fully.
We noticed a discrepancy between the reduction in Pol II density and the reduction in transcription. Despite 6679 genes exhibiting significantly reduced RPB1 density in the S-phase-enriched cells, only 243 genes were significantly downregulated in their transcription. Pol II has typical promoter-proximal pausing after initiation. It often pauses near the promoter, 20–60 bp downstream of the TSS, waiting to be released for elongation [49]. Pol II remains bound to chromatin in the TES region for some time after transcription ends [50, 51]. It has also been shown to slow down its elongation after the Poly A site [52]. Such stagnation/slowdown will obviously increase the relative Pol II density in the TES region.
In the S phase, on a cell population basis, the density of Pol IIs at the TSS and TES regions in many genes were reduced. Since these Pol IIs are not actively or slowly elongating, their presence will result in no transcripts or shorter transcripts, which might not be detectable by RNA-seq. This can explain that there are more genes with lower Pol II peak density but fewer genes with reduced transcription in the S-phase. For 4562 genes with significantly reduced Pol II density in their genebody in the S phase, their transcription must be reduced accordingly, but we did not see such a large number of genes with reduced transcripts in nsRNA-seq. There can be a few technical reasons to explain such a discrepancy. (i) ChIP-seq and nsRNA-seq are two different methods with different degrees of inter-group variation and different sensitivity to multiple-test penalty, which can affect the gene number reaching statistical significance. (ii) The percentage of S cells in the two experiments was different: one was enriched by synchronization, and the other by sorting. (iii) The normalization methods were different, one by spiking in S2 DNA, while the other by the internal Rn7SK. Regardless of the discrepancy, the conclusion remains the same, i.e. the temporal segregation of transcription and replication does exist, but it is not complete.
A small but significant bias in favor of codirectional forks over head-on forks
A few studies previously reported that replication origins tended to occur before the TSS, suggesting that one of the two forks emanating from the origin is in the same transcriptional direction [53, 54]. Using novel DNAscent, we were able to directly assess, genome-wide, whether codirectional fork progression is a significant means of avoiding the TRC. We found a small but statistically significant bias in favor of codirectional forks over head-on forks (Fig. 3E), indicating that cells use this approach to mitigate the TRC. However, this is not a major mechanism for TRC resolution because there is only a 17% bias toward codirectional replication. Furthermore, Pol II remains an obstacle even for codirectional fork progression [17].
About a quarter of replication forks were truncated when they were near the TSS region
During the long-read sequencing of the EdU and BrdU-labeled genomic DNA, if the replisome is off the template and never returns to resume the replication from the same location by itself or by a replacement replisome, the fork will appear as “truncated” as depicted in Fig. 3D. These forks have their EdU-BrdU-labeled tracks terminated at the same position as the unlabeled sequence. Such truncation is not due to the interruption of the fork labeling by cell harvest, because BrdU was already thoroughly washed out 20 min before the cell harvest, the forks were extending using regular thymidine during the 20-min thymidine chase. We compared the number of truncated forks in the TSS, TES, and genebody regions. According to our RPB1 ChIP-seq of S-phase-enriched cells, the TSS had the highest RPB1 read density, followed by the TES and then the genebody (Fig. 2A). This pattern of Pol II density distribution has also been documented in all previous Pol II ChIP-seq literature [23, 24, 47]. When normalized to these regions’ lengths, the highest incidence of fork truncation occurred at the TSS region, followed by the TES region, and then the genebody, which is the same order as RPB1 density in these regions. This finding is consistent with a previous report that the presence of Pol II in the TSS hinders replication in the S phase [7]. In that same study, the authors also report that the TSS region replication is biased in the last S/G2 phase. In our study, we sorted the cells 4 h after synchronization release (Supplementary Fig. S3B). The DNA content of these cells was evenly distributed between 2N and 4N, and the cells were not enriched for late S-phase cells with higher DNA content near 4N. Moreover, the normal, untruncated forks showed only a minor (~20%) increase in frequency near the TSS region (Fig. 3G), whereas the truncated ones showed a more than 35-fold increase (Fig. 3H). This is a clear indication that the high frequency of truncated forks near the TSS is not by chance.
Such fork abortion strongly suggests that when the replisomes encounter Pol II, some of them yield to Pol II, as depicted in the upper panel of Supplementary Fig. S1B. This is one way to resolve the TRC. Alternatively, the collision evicted both the replisome and Pol II from the DNA template, as depicted in Supplementary Fig. S1D. Such an eviction or stall serves to resolve the TRC. This result is consistent with reports by others that Pol IIs at the TSS reduce the ability of replisomes to cross this region [7, 8]. With that said, such fork abortion occurs to a limited degree, with the majority of forks unscathed, suggesting the possible existence of modus operandi as depicted in Supplementary Fig. S1A (Pol II and the replisome glide through each other) or S1C (Pol II is evicted while the replisome moves on).
Biphasic fork speeds with regard to Pol II density
Some interesting and quite unexpected findings were revealed by DNAscent in this study. Based on literature [55, 56], our prior RPB1 ChIP-seq data [23, 24], and the RPB1 ChIP-seq in this study (Fig. 2A), there is a huge accumulation of Pol II at the TSS region, while the Pol II density in the gene body and intergenic region is both much lower (Fig. 2F). As ARMC5-E3 is required for the degradation of all 12 Pol II subunits under physiological conditions, we have shown that more DNA-bound Pol II in the KO cells than in their WT counterparts [23, 24]. This is largely confirmed by Cacioppo et al. [57] in an ARMC5 KO cell line with respect to the TSS and genebody. The augmented Pol II density in the KO cells in the TSS region is also reported by Blears et al. [58], although the increase in the other part of the genome is not obvious in their study, probably due to a different method used to estimate the Pol II density (RPB1 ChIP-seq in our and Cacioppo’s study versus PRO-seq in Blears’ study).
More Pol IIs on the DNA template should, in theory, hinder the progression of the replication forks and, hence, slow the fork speed. However, to our surprise, the forks that spanned TSS were faster than those that did not (Fig. 3G). The forks of any type, i.e. overlapped with TSS or not (Fig. 3G), and those residing in the genic or intergenic region (Fig. 3F) in the KO cells were always faster than those of their WT counterparts. Therefore, it seems that forks move faster and more easily in regions with higher Pol II levels. This phenomenon has been documented by others: replication is faster in actively transcribed genes [10, 11, 59]. How do we explain this finding? Pol II is known to leave negatively supercoiled DNA in its wake [60]. The replisomes may exploit such negative supercoiling, allowing the forks to move faster.
The bTMP-seq results, which detect negative supercoiling, partially support this possibility. Consistent with the higher Pol II density and faster fork speed in all regions in the genome in KO cells than their WT counterparts (Figs 2A and 3C, F, and G), the bTMP-seq showed that DNA was more negatively supercoiled in all the regions of the genome in the KO cells (Fig. 5H). It is noteworthy that in KO cells, the number of forks was reduced. Therefore, we cannot exclude the possibility that fewer forks might allow them to move faster in KO cells through unknown mechanisms.
However, some observations did not fit well with the hypothesis. While the highest Pol II density was located in the TSS region, forks spanning the TSS region moved the fastest, and the TSS region was less negatively supercoiled than the genebody according to bTMP-seq (Fig. 5G). A possible explanation for this discrepancy is that the heavy Pol II loading at the TSS region obstructs psoralen binding to DNA. The actual degree of negative supercoiling in the TSS region might be higher. Another discrepancy concerns the Pol II density in the genebody. Fork speed was faster in the genebody than in the intergenic region (Fig. 3F), consistent with a higher degree of negative supercoiling in the former than in the latter (Fig. 5G). However, there was no significant difference in Pol II density between these two regions (Fig. 2F). It is to be noted that Pol II is not evenly distributed outside the genes, and therefore, averaging the peak density in the intergenic region is not very meaningful, and so is the comparison between the average negative supercoiling between the genebody and intergenic region. A caveat is that our findings are cell population-based, and whether this occurs at the single-molecule level remains to be determined.
The Pol II-dense TSS region is an obstacle for replication forks
To identify the location with consistently high Pol II density, we used the TSS region, which was laden with a large number of Pol II, both according to landmarks reported in literature [23, 24, 47, 61, 62] and our current RPB1 ChIP-seq (Fig. 2A and F). The profiles in both cases were identical. We used genomic landmarks from the literature in our calculations.
On a cell population basis, forks that overlapped with TSS moved faster than those that did not (Fig. 3G). However, among the forks overlapping with TSS, the closer their fork tip (the last BrdU-labeling nucleotide) to TSS, the slower their average speed was (Fig. 4A–C). This change of fork speed indicates that the forks do not pass Pol IIs smoothly, and Pol IIs is an obstacle to their passage. This change in fork speed is consistent with previous findings that increased transcription-replication interaction sites are enriched in the TSS region and that such sites cause replication stress [7, 8].
We dissected this overlapping fork population into individual orientations and tip positions, and a strange V-shaped pattern emerged (Fig. 4D and E). This V-shaped speed distribution indicates several things. (i) This pattern is independent of the fork orientation. (ii) Since all these overlapping forks are not truncated ones, it means they can pass the Pol II dense region unscathed. (iii) However, such a passage is not eventless but creates more slow forks when the fork tips are closer to the TSS. (iv) These TSS-overlapping forks behaved like trains leaving or arriving at a station, accelerating or decelerating, respectively, on a population basis based on their tip distance to the TSS. (v) All overlapping forks observe a strict minimal speed limit. (vi) At any position, there are always some high-speed forks with a maximal speed of ~4 kb/min, which is probably the maximal speed of replisomes.
It will be fascinating to investigate why and how these TSS-overlapping forks observe the strict minimal speed limit, and why and how their speed is higher when their tips are farther from the TSS.
Limitations of this study and future directions
Most of our conclusions in this study are cell-population-based and cannot fully support or refute any of the four modus operandi depicted in Supplementary Fig. S1.
A new technique that can detect the speed and direction of both Pol II and the replisome on a single DNA molecule in vivo will need to be developed to assess these models with certainty.
Nature is rarely simplistic but often employs multiple ways to overcome a problem. Based on the findings of this study and existing literature [18–22, 63], at least on a cell-population basis, temporal segregation, codirectional preference, and fork abortion all contribute to resolving the TRC, but only to a limited extent. The majority of the forks could pass Pol IIs unscathed, albeit with some difficulty, indicating the existence of a so-far unidentified, highly efficient conflict-resolution mechanism.
If we are allowed to speculate, the model in which the Pol II and replisome glide passing each other is most energy-efficient (model A in Supplementary Fig. S1). In support of this model, an earlier study showed that the replisome can pass a stalled Pol II without affecting the latter’s ability to continue transcription [22]. This sounds like a mission impossible if we view the Pol II and replisome as two trains on the same track.
However, if we imagine the two trains traveling in a zero-gravity 3D world, one running on the top surface of the track while the other running upside down on the lower surface of the track (Supplementary Fig. S9), this model becomes less far-fetched. Of course, both Pol II and the replisome need to unclench the template somewhat to share it with the other. The new technology mentioned above can be used to assess the validity of this model.
Supplementary Material
gkag235_Supplemental_Files
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1van der Meijden CM, Lapointe DS, Luong MX et al. Gene profiling of cell cycle progression through S-phase reveals sequential expression of genes required for DNA replication and nucleosome assembly. Cancer Res. 2002;62:3233–43.12036939 · pubmed ↗
- 2Whitfield ML, Sherlock G, Saldanha AJ et al. Identification of genes periodically expressed in the human cell cycle and their expression in tumors. M Bo C. 2002;13:1977–2000. 10.1091/mbc.02-02-0030.12058064 PMC 117619 · doi ↗ · pubmed ↗
- 3Cho RJ, Huang M, Campbell MJ et al. Transcriptional regulation and function during the human cell cycle. Nat Genet. 2001;27:48–54. 10.1038/83751.11137997 · doi ↗ · pubmed ↗
- 4Stein GS, Park WD, Thrall CL et al. Cell cycle stage-specific transcription of histone genes. Biochem Biophys Res Commun. 1975;63:945–9. 10.1016/0006-291X(75)90660-9.48381 · doi ↗ · pubmed ↗
- 5Bertoli C, Skotheim JM, de Bruin RA. Control of cell cycle transcription during G 1 and S phases. Nat Rev Mol Cell Biol. 2013;14:518–28. 10.1038/nrm 3629.23877564 PMC 4569015 · doi ↗ · pubmed ↗
- 6Meryet-Figuiere M, Alaei-Mahabadi B, Ali MM et al. Temporal separation of replication and transcription during S-phase progression. Cell Cycle. 2014;13:3241–8. 10.4161/15384101.2014.953876.25485504 PMC 4615114 · doi ↗ · pubmed ↗
- 7Wang J, Rojas P, Mao J et al. Persistence of RNA transcription during DNA replication delays duplication of transcription start sites until G 2/M. Cell Rep. 2021;34:108759. 10.1016/j.celrep.2021.108759.33596418 PMC 7900609 · doi ↗ · pubmed ↗
- 8St Germain CP, Zhao H, Sinha V et al. Genomic patterns of transcription-replication interactions in mouse primary B cells. Nucleic Acids Res. 2022;50:2051–73. 10.1093/nar/gkac 035.35100392 PMC 8887484 · doi ↗ · pubmed ↗
