Divergent Fates of Kidney-Resident Polyomaviruses: Stable Shedding Versus Near-Silent Persistence

Anik Mojumder; Kimin W. Nguyen; Christopher S. Sullivan

PMC · DOI:10.3390/v18030359·March 15, 2026

Divergent Fates of Kidney-Resident Polyomaviruses: Stable Shedding Versus Near-Silent Persistence

Anik Mojumder, Kimin W. Nguyen, Christopher S. Sullivan

PDF

Open Access

TL;DR

This study shows that kidney-resident polyomaviruses have two distinct fates: most remain dormant, while a small subset consistently sheds in urine over time.

Contribution

The study reveals structured viral dynamics in persistent kidney infections using barcoded viruses and longitudinal tracking.

Findings

01

Kidney-resident polyomaviruses resolve into two stable populations: near-silent persistence and shedding.

02

Only a small subset of viral barcodes contributes to late-stage urinary shedding, showing stable longitudinal behavior.

03

Shedding fate is established early in infection and is not explained by input abundance or other tested factors.

Abstract

Polyomaviruses establish long-term infection in the kidney and are intermittently shed in urine. However, the relationship between kidney-resident viral genomes and urinary shedding during persistent infection remains poorly defined. Using a genetically barcoded murine polyomavirus library, we tracked thousands of viral lineages in vivo by pairing longitudinal urine sampling with endpoint barcode sequencing of kidney tissue in four mice. Across all animals, kidney infection consistently resolved into two stable viral populations, with near-silent persistence as the dominant fate. Most kidney-resident barcodes were never detected in late urine at late stages of infection, even though many reached substantial abundance within the kidney, demonstrating that kidney viral genome levels alone do not predict urinary shedding. In contrast, only a small minority of kidney barcodes contributed…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Species2

Polyomavirus sp.(species)Mus musculus(house mouse · species)

Diseases2

kidney infection infection

Figures10

Click any figure to enlarge with its caption.

Funding1

—National Institutes of Health (NIH)

Keywords

polyomavirus persistenceurinary sheddingviral barcodingkidney viral reservoirslong-term viral persistencehost–virus interactions

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsPolyomavirus and related diseases · Animal Virus Infections Studies · Virus-based gene therapy research

Full text

1. Introduction

Polyomaviruses (PyVs) are among the best studied tumor viruses and have served for decades as foundational model systems for understanding viral oncogenesis, DNA replication, and host–virus interactions [1,2,3,4,5,6,7]. Consistent with their evolutionary success, 50–90% of adults are seropositive for one or more PyVs, reflecting widespread lifelong exposure [8]. Despite this, a fundamental aspect of PyV biology remains poorly understood: how do these simple viruses, with DNA genomes of only ~5 kb, persist within their hosts for decades? Clinically, this question carries major consequences. PyV persistence and reactivation underlie severe disease in immunocompromised settings, including BK polyomavirus (BKPyV)-associated nephropathy and graft failure following kidney transplantation, as well as JC polyomavirus (JCPyV)-driven progressive multifocal leukoencephalopathy [9,10,11,12,13]. Although Merkel cell polyomavirus (MCPyV) is the only human PyV definitively established as a direct cause of cancer, through its role in Merkel cell carcinoma of the skin, emerging evidence suggests that persistent BKPyV infection contributes to bladder carcinogenesis [14,15,16,17,18,19]. Therefore, from both clinical and basic biology perspectives, there is a critical need to understand how PyVs establish and maintain long-term infection in vivo.

Urinary shedding is a defining feature of kidney-tropic PyVs and a clinically relevant non-invasive readout of viral replication activity during persistent infection [20,21,22]. In humans, BKPyV and JCPyV preferentially persist in the kidney and are shed into the urine, where longitudinal studies reveal typically episodic high-level shedding interspersed with periods of minimal or undetectable viral output [23,24,25,26]. In this context, because it recapitulates kidney tropism, long-term persistence, and urinary shedding in a tractable experimental setting, murine polyomavirus (muPyV) provides a powerful in vivo system to interrogate the mechanisms underlying these shedding patterns [27,28,29,30].

With the advent of next-generation sequencing technologies, we developed a genetically barcoded murine polyomavirus (muPyV) library that preserves viral fitness and enables longitudinal non-invasive simultaneous tracking of thousands of distinct viral lineages in vivo [30,31]. Using this barcoded muPyV system in our previous work, we discovered at least two distinct modes of urinary shedding during persistent infection: a continuous, low-level shedding (“smoldering”) involving initially thousands and then hundreds of viral barcodes and episodic high-level shedding events dominated by one or a few barcodes at later times of persistent infection [30]. However, our original study did not answer several fundamental questions: During long-term infection, which viral genomes persisting in the kidney ultimately give rise to urinary shedding? Do most kidney-resident viral genomes contribute to urine virus loads, or do many persist without shedding? Among those that do shed, is long-term persistent shedding driven by a stable subset of kidney-resident viral lineages or by transient contributions from different viral genomes over time? Does kidney viral abundance predict long-term shedding fate? These open questions formed the basis of the present study.

Current understanding of PyV shedding patterns exhibits hallmarks consistent with potential latency and reactivation [23]. PyV persistence has long been connected to immune control [28,32,33,34,35]. Only recently has PyV persistence been discussed in the context of basic host cell-cycle-mediated regulation [23,36,37,38,39,40,41]. Although classical models proposed that large T antigen (LTAg) actively drives infected cells into S phase, recent work on BKPyV instead demonstrates that entry into S phase is required for robust LTAg expression [23,36,37,39,41,42,43]. Consistent with this model, BKPyV genomes remain largely transcriptionally quiescent in G_0_-arrested cells, while progression from G_1_ into S phase coincides with activation of early viral gene expression, possibly in part through release of E2F transcription factors from RB and their engagement with binding sites in the viral non-coding control region (NCCR) [23,36,37]. Once activated, BKPyV may further reinforce a replication-competent state by engaging DNA-damage response pathways and reinforcing derepressed “free” E2F that prolong S phase [38,44,45].

In parallel, host immune signaling further shapes persistent infection. BKPyV infection induces strong IRF3-driven interferon responses in endothelial cells but not in renal epithelial cells, creating a cellular environment permissive for long-term persistence in the kidney [39]. Single-cell transcriptomic studies reinforce this view, revealing a consistent host signature of productive infection characterized by S/G2M cell-cycle programs, DNA-damage and replication-stress pathways, MAPK activation, mitochondrial stress, and suppression of interferon-stimulated and antigen-presentation genes [40,46,47,48,49,50,51]. Importantly, only a minority of infected cells become high producers with robust late capsid transcription, while the majority remain in low or restricted transcriptional states despite similar viral exposure [48]. Together, these observations led us to hypothesize a subset of virus lineage reservoirs are primed for continuous/frequent shedding with other viral genomes persisting in the kidney in deeply restricted states that infrequently reactivate contributing to the urine virome.

Here, we tested this hypothesis using our longitudinal in vivo muPyV barcode sequencing data from four individual mice (FL, FR, ML, and MR) [30] applying new analyses to classify kidney-resident viral genomes according to their long-term shedding behavior. By partitioning kidney barcodes into shedding and non-shedding populations at latest times of infection within each mouse and tracking their dynamics over time from initial infection, we determined whether shedding fate is associated with kidney abundance, barcode sequence features, or ongoing reseeding from other tissues, and assessed when during infection these fates emerge within individual hosts. Across all four mice, we find that persistent kidney infection is consistently structured into two stable viral populations: a minority of genomes that drive sustained and episodic urinary shedding and a majority that persist without contributing to long-term bulk viral genome urine output. These findings demonstrate a large population of kidney-resident PyV lineages that rarely or never give rise to shed viruses and that shedding fate is established early during infection and maintained over time during long-term persistent infection in vivo.

2. Materials and Methods

2.1. Generation of a Barcoded Virus Library

We previously generated a complex library of genetically barcoded muPyV (PTA strain) in which each viral genome carries a unique 12-nucleotide random barcode inserted at a neutral position between the two polyadenylation signals [30,31]. A schematic overview of the experimental design is shown in Figure 1. This design allows individual viral barcodes to be tracked across tissues and over time without perturbing viral replication or fitness. Barcoded muPyV stocks were generated by excising viral genomes from plasmids, re-circularizing them, and propagating virus in NMuMG cells (ATCC, # CRL-1636; a gift from Prof. Dr. Aron Lukacher, Pennsylvania State University College of Medicine), as described previously [30,31].

2.2. In Vivo Infection and Longitudinal Sampling

All animal procedures were previously published [30]. Animal procedures were conducted in accordance with protocols approved by The University of Texas at Austin Institutional Animal Care and Use Committee (AUP-2022–00305) and were subject to veterinary approval. All animal experiments were conducted in the semi-barrier mouse facility of the Animal Resources Center at The University of Texas at Austin. Parental mice for breeding were originally obtained from Dr. Yuri Blednov, The University of Texas at Austin. To study longitudinal shedding dynamics and organ-resident viral populations, we infected two female (“FL” and “FR”) and two male (“ML” and “MR”) healthy FVB/NJ mice (9–10 weeks old) with a single intraperitoneal injection of 1 × 10^6^ infectious units of the barcoded muPyV library. We collected urine non-invasively at multiple timepoints (two to three times a week) over the course of infection and stored samples for downstream barcode sequencing and viral load quantification. For urine collection, mice were placed in individual microisolator cages fitted with wire-bottom inserts and lined with plastic wrap for 2–4 h. Collected urine samples were stored at −80 °C until analysis. At the end of the study (59 days post-infection for mouse “MR” and 99 days post-infection for the remaining three mice), we harvested a broad panel of organs to capture the full tissue distribution of viral barcodes. Mouse “MR” was euthanized earlier due to the development of an abscess that was determined to be unrelated to muPyV infection.

This study consisted of four independent experimental units (n = 4 mice; two female, two male), with each mouse serving as an internal control and comparator for the others in assessing longitudinal barcode competition and shedding dynamics. Therefore, no separate uninfected control group was included, as all mice were infected with the same barcoded virus stock and compared across lineage-resolved outcomes within and between animals. This study was originally intended as a pilot experiment but all mice demonstrated highly consistent shedding patterns [30]; therefore, four mice were considered the minimal number necessary to gain insight into the shedding dynamics. No predefined inclusion or exclusion criteria were applied, and all animals were included in the final analyses. All four infected mice were included in the longitudinal urine and organ analyses; we note, however, mouse “MR” was euthanized earlier due to an unrelated abscess but was included in all analyses up to the time of sacrifice. Due to high number of urine samples collected, not all samples collected were analyzed via barcode sequencing, with excluded timepoints being selected at random. All mice received the same inoculum from a single virus stock preparation, so no randomization in which virus stocks were used was applied. Experiments were not conducted in a blinded manner—all mice described in this work received the same treatment and were compared internally. The primary outcome measures were quantification of viral genome barcodes in urine over time and in organs at sacrifice, determined by qPCR and next-generation sequencing.

2.3. DNA Extraction, Real Time-qPCR Assay, and Illumina NextSeq Library Preparation

We purified DNA from urine, tissues, and blood and quantified total viral genome abundance in each sample by real-time qPCR. Barcode identities and relative abundances were determined by targeted amplification of the barcode-containing region followed by Illumina sequencing. Briefly, barcode amplicon libraries were generated using enrichment and indexing PCRs and sequenced on an Illumina NextSeq platform. All experimental procedures have been described in detail in our prior work [30].

For urine samples, DNA was purified from 50 μL of urine (or the maximum available volume when lower volumes were collected) using the QIAamp Viral RNA Mini Kit (Qiagen, Hilden, Germany) according to the manufacturer’s instructions and eluted in 60 μL of Tris-EDTA buffer. For tissue samples, DNA was extracted from approximately 25 mg of organ (or 10 mg for small spleen samples) using mechanical disruption with a Bead Mill 4 homogenizer (Thermo Fisher Scientific, Waltham, MA, USA) followed by purification with the QIAamp Fast DNA Tissue Kit (Qiagen, Hilden, Germany). DNA from 140 μL of whole blood was purified using the QIAamp DNA Mini Kit (Qiagen, Hilden, Germany), with an additional RNase A digestion step. Tissue and blood DNA were eluted in 200 μL of AE buffer. DNA concentrations were measured by a NanoDrop ND-1000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA), and integrity was assessed by agarose gel electrophoresis. All DNA samples were stored at −20 °C until use.

Total muPyV genome copies were quantified by SYBR Green-based real-time qPCR using the primer pair PTA/PTA-dl1013 (sense: 5′-GATGAGCTGGGGTACTTGT-3′; antisense: 5′-TGTATCCAGAAAGCGACCAAG-3′). Each 20 μL reaction contained 10 μL PerfeCTa SYBR Green FastMix, ROX (Quantabio, Beverly, MA, USA), 8 pmol of each primer, and 2 μL of template DNA. Thermal cycling consisted of an initial denaturation at 95 °C for 10 min followed by 40 cycles of 95 °C for 15 s and 60 °C for 30 s, with fluorescence acquisition at each extension step. Specificity was confirmed by melt curve analysis. Standard curves were generated using ten-fold serial dilutions of a pBluescript-sk+PTA barcoded plasmid. Viral genome copies were normalized per μL of sample for urine and blood or per μg of total DNA for tissue samples. Reactions were performed in duplicate, and the mean value was used for quantification. The limit of detection was 10 copies per reaction.

For barcode sequencing, enrichment PCR was performed using KAPA HiFi HotStart Ready Mix (Kapa Biosystems, Wilmington, MA, USA). For urine samples, typically 20 μL of eluted DNA was used per reaction. For blood, tissue, and virus stock samples, up to 4.77 × 10^5^ muPyV genome copies were used per reaction, with tissue DNA additionally limited to ≤3.1 μg total DNA per reaction to avoid PCR inhibition. Amplification was performed in 50 μL reactions containing 0.3 μM of each primer under the following conditions: 95 °C for 3 min; 22–35 cycles of 98 °C for 20 s, 56 °C for 15 s, and 72 °C for 15 s; followed by a final extension at 72 °C for 2 min. Products were verified on a 2% agarose gel.

Indexing PCR was performed as previously described [30], generating 75 bp amplicons encompassing the barcode region using staggered indexing primers. Duplicate indexing reactions per sample were pooled, gel purified, quantified, normalized, and combined to generate the final sequencing library. Libraries were quality-checked by Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) at The University of Texas at Austin Genomic Sequencing and Analysis Facility and sequenced on an Illumina NextSeq SR75 platform (Illumina, San Diego, CA, USA). PhiX control DNA was included to increase library diversity.

2.4. Extracting Barcode Sequences

We extracted barcode sequences from FASTQ files as previously described [30]. Briefly, we used Cutadapt to identify barcodes, allowing the default error rate of 10% and requiring both flanking adapters to be present via linked-adapter matching [52]. We then aggregated raw barcode counts for each sample using custom R scripts built on tidyverse packages [53].

2.5. Clustering and Defining the Stock Barcode Reference

To define a reference set of input barcodes, we clustered barcodes from the stock virus using Starcode message-passing clustering, with a maximum Levenshtein distance of 3 and the default cluster ratio of 5, as in our prior work [30,31,54]. We then restricted downstream analyses to the subset of barcodes accounting for the top 99% of cumulative abundance in the stock, excluding low-complexity artifacts. This yielded an estimated 4012 unique barcodes in the initial barcoded virus stock, which served as the reference set for all barcode matching and quantification (Table S1). Unless otherwise noted, we performed all downstream processing and analyses in R (version 4.3.3) using tidyverse (v2.0.0) [53] and associated packages (including patchwork (v1.3.2) [55], ggridges (v0.5.6) [56], ComplexUpset (v1.3.3) [57], pheatmap (v1.0.13) [58], corrplot (v0.95) [59], kableExtra (v1.4.0) [60], stringdist (v0.9.15) [61], purrr (v1.0.4) [62], reshape2 (v1.4.4) [63], and gridExtra (v2.3) [64]).

2.6. Assigning Sample Barcodes to Stock Barcodes

To quantify barcode abundance in urine and tissue samples, we mapped every observed sample barcode to the closest stock barcode based on Levenshtein distance. We discarded sample barcodes that did not fall within a distance of 3 from any stock barcode. When a sample barcode matched multiple stock barcodes at the same minimum distance, we divided its count evenly among those stock barcodes. We then summed these fractional assignments within each sample to obtain a weighted count for each stock barcode, reflecting its estimated contribution to that biological sample. To reduce the influence of extremely low-abundance signals, we removed weighted barcode counts below 10. This cutoff served as our detection threshold for subsequent barcode-level analyses.

2.7. Normalizing Relative Barcode Abundance to Viral Genome Load

Because total viral genome abundance varied across biological samples, we normalized barcode counts to independent qPCR-based measurements of viral load (approximated by viral genome amounts detected). For each sample, we first calculated the fraction of total barcode signal (defined as weighted barcode counts after barcode-to-stock assignment) contributed by each stock barcode. We then scaled this fraction by the measured viral genome concentration in that sample (barcode fractions x genome level) to obtain an estimated barcode-specific viral load. We refer to this scaled quantity as the “barcode level”, which represents the estimated abundance of an individual viral barcode after accounting for both its relative representation within a sample and the absolute viral genome load of that sample. For urine samples, we multiplied each barcode’s fractional abundance by the qPCR-estimated number of viral genomes per microliter of urine, yielding barcode-specific genome concentrations in urine (Table S2). For tissue samples, we multiplied fractional barcode abundance by qPCR-estimated viral genomes per microgram of input DNA (Table S3). For blood samples, we scaled fractional barcode abundance by viral genome concentration per microliter of blood and converted values to a per-milliliter basis to maintain consistency across sample types (Table S3).

2.8. Definition of Late Urine Detection and Kidney Barcode Classification

For each animal, we defined the late window as the final two urine collection timepoints prior to sacrifice (FL: days 90 and 98; FR: days 94 and 98; ML: days 94 and 98; MR: days 55 and 59 post-infection). We classified a kidney barcode as “late-shed” if detectable in urine at either of these timepoints and as “late non-shed” if it was not detected during this window. We identified kidney-resident barcodes from tissue sequencing data by selecting barcodes detected in kidney samples. We assigned late-shed and late non-shed status independently for each mouse based on detection in urine.

2.9. Quantification of Kidney Barcode Counts, Total Load, and Per-Barcode Abundance

For each mouse, we quantified the number of distinct kidney barcodes classified as late-shed or late non-shed (see “Section 2.8”). To quantify the total kidney barcode load for each group, we summed the barcode-level (scaled quantity estimate of total barcode genome abundance) values across all barcodes belonging to the late-shed group and separately across all barcodes belonging to the late non-shed group within each mouse. Note that each barcode was associated with a barcode level, defined as the qPCR-scaled estimate of viral genome abundance for that barcode (see “Section 2.7”), expressed as viral genomes per microgram of input kidney DNA. To test whether kidney barcode abundance alone predicted late urinary shedding, we compared the distributions of kidney barcode levels per-barcode between late-shed and late non-shed groups within each mouse. Here, per-barcode kidney barcode level refers to the individual barcode-level values (qPCR-scaled estimate of viral genomes per microgram of kidney DNA) for each barcode, analyzed as distributions of kidney barcode levels per-barcode rather than summed totals. This analysis directly tests whether individual viral genomes that contribute to late urine shedding are distinguished simply by higher abundance in the kidney.

2.10. Urine Barcode Composition and Contribution to Total Urine Viral Load

We analyzed longitudinal urine barcode composition using two complementary approaches. First, for each urine sample, we calculated barcode fractional abundance, defined as the weighted barcode count for a given barcode divided by the sum of the weighted barcode counts across all barcodes detected in that urine sample (fractional abundance for barcode i = weighted barcode count for barcode i/total weighted barcode counts in that sample). Note, weighted barcode counts reflect post-clustering, tie-resolved barcode counts after removal of low-abundance signals (see “Section 2.6”). Then, the fractional abundances were normalized within each urine sample and were used to generate donut plots visualizing barcode composition at each timepoint. In these plots, we highlighted the top 50 kidney barcodes, defined as the 50 barcodes with the highest kidney barcode levels within the specified group (late-shed or late non-shed) for each individual mouse. These highlighted barcodes were assigned unique colors, while all remaining barcodes detected in urine were displayed in alternating grayscale to preserve rank ordering while minimizing overplotting. Second, to quantify how much of the urine signal was contributed by these kidney barcode sets, we performed a urine contribution analysis, in which we summed urine barcode levels across the selected barcodes at each timepoint. Urine barcode level refers to the qPCR-scaled estimate of barcode abundance in urine, expressed as viral genomes per microliter of urine (see “Section 2.7”). For each timepoint, we summed urine barcode levels for the top 50 kidney barcodes and overlaid this value on the total urine viral signal, defined as the sum of urine barcode levels across all barcodes detected in that urine sample.

2.11. Barcode-Resolved Longitudinal Urine Trajectory Analysis

To determine whether late urinary shedding reflected consistent, barcode-specific shedding behavior over time (i.e., repeated detection of the same viral lineages across multiple urine collections, rather than transient or sporadic appearances), we analyzed longitudinal urine trajectories at the level of individual viral barcodes tracked independently over time. We examined three barcode groups. These included (i) the top 50 most abundant late-shed kidney barcodes, defined as the 50 kidney-resident barcodes with the highest kidney barcode levels (qPCR-scaled estimate of viral genomes per microgram of kidney DNA) among barcodes classified as late-shed, (ii) the top 50 late non-shed kidney barcodes defined as the 50 kidney-resident barcodes with the highest kidney barcode levels among barcodes classified as late non-shed, and (iii) a random barcode control group of equal size per mouse, drawn from the set of all barcodes detected in urine after excluding barcodes belonging to the first two groups. For each barcode, we quantified its longitudinal urine shedding trajectory by tracking its urine barcode levels (qPCR-scaled estimate of viral genomes per microliter of urine) at each urine collection timepoint. For visualization only, we capped barcode level values at an upper limit of 25 to improve interpretation of the trajectories.

2.12. Input Virus Stock Relative Barcode Rank Analysis

To assess whether late shedding behavior reflected differences in input abundance, we ranked viral barcodes by their abundance in the input virus stock, defined as the weighted barcode count obtained from sequencing of the barcoded virus stock prior to infection. We derived stock ranks from the set of barcodes comprising the top 99% of cumulative abundance in the input library, ordering barcodes from most abundant (rank 1) to least abundant. For each mouse, we examined stock rank distributions for the top 50 late-shed and top 50 late non-shed kidney barcodes, where “top 50” refers to the 50 barcodes within each group with the highest kidney barcode levels (qPCR-scaled estimate of viral genomes per microgram of kidney DNA) in that mouse and summarized central tendencies by calculating median stock ranks for each group.

2.13. Peak Urine Barcode Abundance Analysis

To identify barcodes associated with high-amplitude urinary shedding, we calculated the peak urine abundance for each barcode, defined as the maximum urine barcode level observed for that barcode across all urine collection timepoints. Here, urine barcode level refers to the estimated abundance of a barcode in urine after normalizing its sequencing representation to the total viral genome concentration measured by qPCR, expressed as viral genomes per microliter of urine (see “Section 2.7”). We performed this analysis for the top 50 late-shed kidney barcodes, the top 50 late non-shed kidney barcodes, and a size-matched random barcode control group, with all groups defined independently for each mouse. Here, “top 50” refers to the 50 barcodes within each group with the highest kidney barcode levels (qPCR-scaled estimate of viral genomes per microgram of kidney DNA). We log_10_-transformed peak urine abundance values for visualization. We performed comparisons within individual mice, such that peak urine abundance distributions for different barcode groups were compared only among barcodes originating from the same animal, thereby accounting for mouse-to-mouse differences in overall viral shedding levels.

2.14. Definition and Enrichment Analysis of Smoldering Barcodes

To identify barcodes exhibiting sustained shedding (“smoldering”), we quantified barcode detection frequency across urine timepoints, defined as the proportion of urine collection timepoints at which a given barcode was detected above the barcode-level detection threshold. Detection was defined as a urine barcode level corresponding to a weighted barcode count ≥ 10 after barcode-to-stock assignment, consistent with thresholds used throughout the study (see “Section 2.6”). For each barcode within a given mouse, we calculated detection frequency as the number of urine timepoints with detection divided by the total number of urine timepoints sampled for that mouse. Within each mouse, we defined the top 50 barcodes with the highest detection frequency as “top smolderers”. We compared the fraction of top smoldering barcodes belonging to the late-shed or late non-shed kidney barcode groups against a size-matched random barcode control.

2.15. Early Versus Late Urine Detection Frequency Analysis

To test whether barcodes that dominate late urine already exhibit elevated activity earlier in infection, we partitioned urine time courses into early and late phases. For each mouse, we defined the late phase as the final quartile of urine sampling timepoints closest to the time of sacrifice (i.e., ≥day 74 for mice FL, FR, and ML; ≥day 45 for mouse MR), with all earlier timepoints assigned to the early phase. We then grouped barcodes into three sets. These included top late smolderers, defined as the 50 barcodes with the highest detection frequency during the late phase (detection frequency defined as the number of urine timepoints at which a barcode was detected above the barcode-level detection threshold divided by the total number of late-phase urine timepoints for that mouse); top late high-shedders, defined as the 50 barcodes with the greatest cumulative qPCR-scaled urine barcode levels (viral genomes per microliter of urine) across all late-phase urine timepoints and a size-matched random barcode control group, defined independently for each mouse. For each barcode, we calculated the fraction of early-phase detection frequency (defined as the number of early-phase urine timepoints at which a barcode was detected above the barcode-level detection threshold divided by the total number of early-phase urine timepoints for that mouse).

2.16. Correlation Analysis Between Kidney Barcode Groups and Other Tissues

To examine relationships between kidney barcode populations and other tissues, we computed Spearman correlation matrices using barcode relative abundance values, defined as the fractional abundance of each barcode within a given tissue sample (weighted barcode count for a barcode divided by the sum of weighted barcode counts across all barcodes in that tissue sample). For this analysis, we split kidney barcodes into late-shed and late non-shed groups using the same classification criteria described earlier, where late-shed kidney barcodes were those detected in urine during the final two urine timepoints for a given mouse, and late non-shed kidney barcodes were those not detected during this window (see “Section 2.8”). We treated these two groups as separate pseudo-organs. We computed Spearman correlations using pairwise complete observations and visualized correlation matrices using mixed circle-and-number correlograms. We also examined Spearman correlations across mice using “kidney-only” barcode data, defined as comparisons of relative barcode abundance profiles in kidney tissue between different mice, restricted to barcodes detected in kidney samples. This analysis tested whether the composition of late-shed or late non-shed kidney barcode populations was conserved across animals.

2.17. Barcode Sequence Feature Analyses

To test whether intrinsic barcode sequence features contributed to shedding fate, we analyzed GC content, barcode length, and predicted miRNA targeting. For each mouse, we computed GC content and barcode length distributions for the top 50 most abundant late-shed and top 50 late non-shed kidney barcodes, where “top 50” refers to the 50 barcodes within each group with the highest kidney barcode levels (qPCR-scaled viral genome abundance in kidney tissue) for that mouse. To assess potential miRNA-mediated effects, we tested whether kidney barcodes were enriched for predicted miRNA seed matches. We defined a miRNA-hit barcode as any barcode whose sequence contained a canonical miRNA seed match, defined as perfect Watson–Crick complementarity to nucleotides 2–8 from the 5′ end of a mature miRNA. This region corresponds to the miRNA seed sequence, which is widely recognized as a primary determinant of miRNA target recognition and binding specificity in animals [65,66,67]. For each miRNA, we searched for perfect matches to nucleotides 2–8 within both the forward barcode sequence and a corresponding reverse-orientation barcode sequence generated for each barcode using custom Python scripts (Python v3.12.2). Seed matches identified in the reverse orientation were collapsed back to their corresponding forward barcode identity. We performed this analysis separately for host murine miRNAs curated in miRbase [68] and for the muPyV-encoded miRNAs described previously [27], restricting the murine analysis to the top 100 kidney-abundant miRNAs extracted from the published small RNA sequencing atlas [69]. We compiled the set of miRNA-hit barcodes by retaining unique forward barcode sequences associated with at least one seed match and intersected this set with kidney-resident barcodes for each mouse. For each mouse, miRNA targeting was quantified using two complementary metrics: an unweighted metric corresponding to the fraction of kidney barcodes classified as late-shed or late non-shed that contained at least one miRNA seed match, and an abundance-weighted metric corresponding to the fraction of total kidney viral genome abundance (barcode level) attributable to miRNA-hit barcodes within each group. To determine whether observed differences in miRNA targeting between late-shed and late non-shed groups could be explained by intrinsic barcode sequence properties, we generated matched random expectations independently for each mouse using a length- and GC-matched resampling strategy. Within each mouse, kidney barcodes were stratified by exact barcode length and by GC content, with GC content divided into five quantile-based bins to ensure approximately equal numbers of barcodes per bin. For each shedding group, the observed barcode set was characterized by its joint length and GC-bin composition, and a matched random set of equal size was generated by sampling barcodes from the full kidney barcode pool of the same mouse while preserving the exact counts within each length × GC-bin stratum. Unweighted and abundance-weighted miRNA targeting metrics were recalculated for each resampled set, and this procedure was repeated 2000 times per mouse and shedding group to generate empirical null distributions representing the expected range of miRNA targeting under random assignment of shedding fate while preserving barcode sequence composition.

All custom scripts used in this study have been made publicly available at https://github.com/ChrisSullivanLab/Shedding-dynamics-of-a-DNA-virus on 12 February 2026. Independent analyses validating trends of plots presented in this work, including confirmatory plots and accompanying documentation generated by one of the authors (K.W.N.), are included in the same repository. Tables of analyses of raw sequencing making up the plots in this work are provided in the Supplementary Tables File.

3. Results

3.1. Late Urinary Shedding Arises from a Minority of Kidney Barcodes, Revealing a Large Non-Shedding Reservoir

To determine how kidney-resident viral barcodes contribute to late urinary output (Figure 1), we classified kidney barcodes based on their detection during the final phase of infection. Because kidneys were harvested at the experimental endpoint, we focused on the final two urine timepoints as the most proximal readout of kidney-derived viral shedding. Kidney barcodes detected in either of these timepoints were classified as “late-shed”, whereas kidney barcodes never detected during this window were classified as “late non-shed”.

We found that late non-shed barcodes greatly outnumbered late-shed barcodes across all four mice (Figure 2A; Table S4). In FL, we identified 1305 late non-shed distinct kidney barcodes compared with 188 distinct late-shed barcodes. We observed similar patterns in FR (913 late non-shed vs. 516 late-shed), ML (1333 vs. 241), and MR (1067 vs. 78) (Table S4). Thus, in every mouse, late urinary shedding arose from only a minority of kidney barcode diversity, indicating that most viral genomes persisting in the kidney do not contribute detectably to late urine.

We next asked whether late non-shed barcodes represent rare kidney residents or instead form a major quantitative reservoir of the virus genomes (bulk total viral DNA) within the kidney. For each mouse, we summed kidney barcode levels separately for late-shed and late non-shed barcode groups (Figure 2B; Table S5). Barcode levels are expressed as qPCR-scaled viral genome abundance per microgram of kidney DNA for each barcode (see “Section 2”). In three of four mice (FL, ML, and MR), late non-shed barcodes contributed the majority of total kidney viral genome abundance (Table S5). FR differed, with higher total kidney barcode levels among late-shed barcodes than late non-shed barcodes (Table S5). Importantly, even in FR, late-shed barcodes comprised fewer distinct kidney barcodes than late non-shed barcodes, indicating that absence from late urine is not simply a consequence of low abundance in the kidney.

We further tested whether kidney barcode abundance alone predicts late urinary shedding. For each barcode, we quantified its abundance by calculating its barcode level within the kidney of a given mouse, yielding an estimate of the total kidney-associated viral genome load contributed by that individual barcode. We then compared these per-barcode kidney abundance values between late-shed and late non-shed groups within each mouse (Figure 2C; Table S6). Across all animals, we observed wide dynamic ranges and substantial overlap between groups. While some high-abundance kidney barcodes were late-shed, many barcodes with comparable kidney levels remained late non-shed. Thus, kidney barcode abundance alone is insufficient to explain whether a viral genome contributes to late urinary shedding.

3.2. Late-Shed but Not Late Non-Shed Kidney Barcodes Contribute Substantially to Persistent Urinary Shedding

In our previous work, population-level analyses showed that although many viral genomes are detectable in urine, shedding during long-term persistent infection is dominated by a small subset of barcodes [30]. Here, we asked whether this principle persists when kidney-resident viral genomes are stratified by late shedding behavior. To address this, we examined longitudinal urine shedding patterns for kidney barcodes classified as late-shed or late non-shed, focusing on the most abundant kidney barcodes within each group (top 50 per mouse). For each urine sample, we calculated relative abundance of each barcode, defined as the fraction of total barcode-derived abundance within that urine sample (i.e., barcode-specific weighted counts normalized per timepoint). These relative compositions were visualized using donut plots (Figure 3A,B). In parallel, we quantified absolute contributions to urine shedding by summing barcode levels and visualized these values over time using sum-overlay plots (Figure 3C,D).

We observed that urine shedding during the persistent phase of infection (post-day 27) increasingly reflected contributions from a restricted number of late-shed kidney barcodes across all four mice (Figure 3A,B; Tables S7 and S8). These barcodes appeared as colored segments corresponding to the top 50 most abundant barcodes. In contrast, donut plots for late non-shed kidney barcodes showed little to no contribution from colored barcodes at late timepoints, reflecting their consistently low fractional representation in urine. Thus, while many barcodes remained detectable, only late-shed kidney barcodes rose to high fractional abundance within the urine barcode population throughout persistent infection.

Using sum-overlay analyses, we found that late-shed kidney barcodes contributed a substantially larger share of total urine barcode levels during the persistent phase (post-day 27) of infection compared to late non-shed kidney barcodes in all four mice (Figure 3C,D; Tables S9 and S10). Late non-shed kidney barcodes contributed minimally to cumulative urine shedding over time, even when they were abundant in kidney, indicating that kidney residence alone does not predict contribution to sustained urinary shedding.

3.3. Late-Shed Barcodes Are Longitudinally Consistent Throughout Persistent Infection

Population-level dominance of late-shed kidney barcodes raised the question of whether this pattern reflects consistent longitudinal behavior of individual viral lineages or transient effects at late times of persistence. To address this, we analyzed urine shedding trajectories at per-barcode resolution for kidney late-shed barcodes, kidney late non-shed barcodes, and a random barcode control group matched in number (top 50 per mouse; Figure 4; Tables S11 and S12). We visualized longitudinal urine detection for individual barcodes across all timepoints to assess persistence and recurrence of shedding behavior.

We observed that individual late-shed kidney barcodes were repeatedly detected above the barcode-level detection threshold (weighted barcode count ≥ 10) across multiple urine timepoints in all four mice (Figure 4). Many barcodes formed extended shedding trajectories spanning successive or nonconsecutive urine samples rather than isolated detection events. This sustained activity was consistent across animals, indicating that late shedding reflects stable, barcode-specific longitudinal behavior rather than dominance driven by a single transient peak.

In contrast, late non-shed kidney barcodes and randomly selected barcode controls were rarely detected in urine across time (Figure 4). When late non-shed barcodes were detected, urine barcode levels were typically low and restricted to isolated timepoints. Although occasional late non-shed barcodes reached higher urine barcode levels, these events were sporadic and non-recurrent, failing to persist across subsequent urine collections. The longitudinal patterns of late non-shed barcodes closely resembled those of random controls (except in mouse “ML,” where late non-shed barcodes were still far less recurrent and sustained than late-shed barcodes), indicating that persistent urine shedding is specifically enriched among late-shed kidney barcodes rather than a generic consequence of barcode sampling or kidney residence.

3.4. Divergence Between Late-Shed and Late Non-Shed Kidney Barcodes Is Not Due to Differences in Input Abundance

Stable, barcode-resolved differences in urine shedding raised the possibility that late-shed and late non-shed behaviors reflect differences already present in the viral inoculum. We tested this by examining the relative abundance of the top 50 kidney late-shed and top 50 late non-shed barcodes in the input virus stock. Because barcodes in the muPyV library are not evenly distributed in the input stock, as we have shown previously [30,31], we ranked barcodes by abundance in the inoculum and compared stock rank distributions for kidney late-shed and late non-shed barcodes across all four mice (Figure 5; Tables S13 and S14).

We found that kidney late-shed and late non-shed barcodes spanned broad and overlapping ranges of input stock ranks (Figure 5). Across all four mice, the median input ranks of both groups fell within the top 25% of the inoculum, indicating that neither group was preferentially overrepresented at the time of infection (Table S15 for kidney late-shed and Table S16 for kidney late non-shed barcodes).

The absence of systematic stock-rank differences indicates that late shedding behavior emerges during infection rather than being predetermined by input abundance. If late-shed barcodes were preferentially selected simply due to higher input abundance, they would be expected to rank consistently much higher in the virus stock. Instead, late-shed and late non-shed kidney barcodes entered infection with comparable input abundance profiles, supporting a model in which early host–virus interactions establish divergent long-term shedding trajectories.

3.5. Late-Shed Kidney Barcodes Reach Higher Peak Urine Levels than Late Non-Shed and Random Barcode Controls

Sustained longitudinal shedding could arise either from repeated low-level activity or from occasional high-amplitude shedding events. We distinguished between these possibilities by comparing the maximum single-timepoint urine abundance reached by each barcode. For each mouse, we measured the highest urine barcode level observed across all timepoints for the top 50 late-shed kidney barcodes, the top 50 late non-shed kidney barcodes, and 50 randomly selected urine barcode controls, and compared these values across groups (Figure 6; Table S17).

We found that late-shed kidney barcodes consistently reached higher peak urine levels than both late non-shed kidney barcodes and random urine barcode controls across all four mice (Figure 6). The distributions for late-shed kidney barcodes were shifted upward on a log scale, indicating a greater propensity to achieve high-amplitude urine detection at least once during infection.

In contrast, late non-shed kidney barcodes rarely reached high urine levels, even when considering their single largest detection event. Random urine barcode controls showed comparable distributions, with occasional low-level peaks but little evidence of high-amplitude shedding. These results indicate that high-amplitude urine shedding is a selective property of late-shed viruses, not a generic consequence of detection in urine.

3.6. Top Smoldering Barcodes Are Selectively Enriched Among Late-Shed Barcodes

Persistent, recurrent urine detection (“smoldering”) could reflect either stochastic barcode appearance or preferential activity of viral lineages associated with late shedding. We operationally defined smoldering based on detection frequency across urine timepoints. For each mouse and barcode, we calculated the fraction of urine timepoints in which that barcode was detected above the barcode-level detection threshold (weighted barcode count ≥10, where the weighted barcode count reflects the barcode’s read count after assignment to the closest matching stock barcode, with counts divided evenly among stock barcodes when a sample barcode matched multiple stock barcodes at the same minimum edit distance; see “Section 2.6”). Using this metric, we defined the top smoldering barcodes as the 50 barcodes per mouse with the highest detection frequency and tested whether these barcodes were enriched among late-shed or late non-shed kidney barcodes compared with a size-matched random urine barcode baseline (Figure 7; Table S18).

Across all four mice, we found that a substantially larger fraction of top smoldering barcodes belonged to the late-shed group than to the random urine control group (Figure 7). In contrast, we observed little to no enrichment of top smoldering barcodes within the late non-shed group relative to random controls. This demonstrates that smoldering behavior is not randomly distributed across viral barcodes but is instead selectively enriched among kidney barcodes that ultimately contribute to late urinary shedding.

3.7. Late-Shed Barcodes Already Exhibit Elevated Smoldering During Early Infection

While the analysis above tested whether barcodes with frequent detection across urine timepoints (“smoldering”) are enriched among late-shed kidney barcodes, it did not address whether this behavior is already apparent early after infection. To distinguish between early-established versus late-emerging shedding behavior, we next asked whether barcodes that ultimately dominate late urine already exhibit elevated detection during the early phase of infection. We partitioned each mouse’s urine time course by day, defining the late phase as the final quartile of urine sampling timepoints closest to the time of sacrifice (≥day 74 for mice FL, FR, and ML; ≥day 45 for mouse MR) and the early phase as all preceding.

For each mouse, we quantified early smoldering for three barcode groups: (i) the top 50 late smoldering barcodes, ranked by detection frequency above the barcode-level detection threshold (weighted barcode count ≥ 10; see “Section 2.6”) during the late phase; (ii) the top 50 late high-shedding barcodes, ranked by total urine barcode level during the late phase; and (iii) a size-matched random urine barcode control group. For each barcode, we calculated the fraction of early-phase days in which it was detected above the detection threshold (Figure 8; Table S19).

We found that both late smoldering and late high-shedding barcodes exhibited substantially higher early detection frequencies than random urine barcodes across all four mice (Figure 8). These results indicate that viral barcodes that ultimately dominate late urine, whether defined by persistent detection across urine timepoints or by high cumulative barcode level in urine, are already disproportionately detected in urine during early infection. Thus, by multiple criteria, we conclude that urinary shedding fate is established early during infection.

3.8. The Late Non-Shed Kidney Barcode Population Shows Weak Association with Blood or Other Tissues

In our previous work, barcode relative abundance showed poor correlation across organs within the same mouse, indicating that productive viral populations are largely tissue-localized rather than freely exchanged [30]. In this study, we asked whether this principle extends to functionally distinct kidney barcode populations. To address this, we treated late-shed and late non-shed kidney barcodes as two pseudo-organ compartments and examined correlations in barcode relative abundance between these kidney-defined groups and other tissues within each mouse (Figure 9).

Across all four mice, we observed weak correlations between barcode relative abundance in both late-shed and late non-shed kidney groups and abundance in non-kidney tissues (Figure 9). Although some barcodes detected at high levels in one tissue were also detectable elsewhere, the identities of the most abundant barcodes differed between compartments, indicating that kidney-resident viral genomes segregate into functionally distinct populations whose abundance patterns are largely decoupled from those of other tissues.

Among the three mice for which whole-blood samples were available (FL, FR, and ML), we found little to no correlation between blood barcode abundance and late non-shed kidney barcodes, arguing against continuous hematogenous seeding of the silent kidney reservoir. Barcode relative abundance within both late-shed and late non-shed kidney populations also showed low correlation between animals, indicating that kidney barcode fates are independently established within each host (Figure S1).

3.9. Shedding and Non-Shedding Fates Arise Independently of Barcode Sequence Composition

Differences in late shedding behavior could arise from intrinsic features of the barcode sequence itself. We tested this possibility by examining multiple sequence features, including GC content (Figure S2; Table S20 for late-shed and Table S21 for late non-shed barcodes), barcode length (Figure S3; Table S22), and predicted host and viral miRNA seed matches (Figure S4; Tables S23 and S24).

For GC content and barcode length, we analyzed the top 50 kidney late-shed and top 50 kidney late non-shed barcodes per mouse. Across all mice, the GC-content distributions of late-shed and late non-shed barcodes closely overlapped with the global barcode distribution, indicating no enrichment for unusually high or low GC content (Figure S2). Barcode length distributions were also nearly identical between late-shed and late non-shed barcodes across all four mice, with no enrichment for specific barcode lengths (Figure S3).

Since we designed our barcodes to be incorporated into both early and late mRNAs in addition to tagging individual episomes, we determined whether host or viral miRNAs could be responsible for the different late-shed and late non-shed kidney virus populations. To determine whether miRNA-mediated targeting could bias shedding fate, we examined predicted seed matches for both host murine miRNAs and the muPyV-encoded miRNA across kidney-resident barcodes classified as late-shed or late non-shed. Specifically, miRNA seed matches were defined as perfect Watson–Crick complementarity to nucleotides 2–8 from the 5′ end of the mature miRNA. For host top-100 kidney abundant miRNAs [69], we quantified the fraction of kidney barcodes containing at least one predicted miRNA seed match, as well as the fraction of total kidney viral abundance attributable to miRNA-targeted barcodes, and compared these values between late-shed and late non-shed groups on a mouse-by-mouse basis. Across all four mice, both unweighted and abundance-weighted measures of host miRNA targeting were highly similar between late-shed and late non-shed barcode populations and fell within null distributions generated by length- and GC-matched resampling controls (see “Section 2.17”), indicating no enrichment or depletion beyond that expected from barcode sequence composition alone (Figure S4). In contrast, predicted targeting by the muPyV-encoded miRNA was extremely sparse, with only a very small number of kidney barcodes (0 to 4) exhibiting a predicted seed match across mice and groups (Table S24).

These results indicate that barcode GC content, predicted miRNA targeting, and barcode length do not explain the divergence in shedding fate, supporting the conclusion that shedding and non-shedding behaviors arise from host- and infection-dependent processes rather than intrinsic properties of the barcode sequence.

4. Discussion

PyV persistence is often framed at the level of the host, yet our barcode-resolved analyses show that within the kidney, individual viral genomes follow distinct and stable long-term trajectories. Across four mice, we consistently observe two kidney-resident barcode populations: a minority that repeatedly contributes to urine over long-term infection and a majority that persists in kidney without detectable contribution to late urine. This organization is not a trivial consequence of kidney barcode abundance, since late-shed and late non-shed barcodes span overlapping abundance ranges and many high-abundance kidney barcodes remain non-shedding. Nor is it explained by initial inoculum representation, as late-shed and late non-shed barcodes exhibit broad, overlapping stock-rank distributions. These findings indicate that long-term shedding behavior emerges during infection in vivo, rather than reflecting input abundance or kidney copy number.

A central observation of this study is that long-term urinary shedding reflects stable, barcode-resolved behavior rather than transient aggregate effects. Late-shed kidney barcodes were not simply detected at the final timepoints used for classification; instead, they tended to recur across multiple longitudinal urine samples, often forming extended shedding trajectories that were clearly distinguishable from the sparse, sporadic, near-random appearance of late non-shed kidney barcodes. Consistent with this, late-shed barcodes reached substantially higher peak urine levels than late non-shed or random urine controls. These results refine the population-level “smoldering plus bursts” framework from our previous study by demonstrating that persistent detection is not evenly distributed across kidney-resident genomes [30]. Rather, smoldering is selectively enriched among the kidney barcodes that also drive long-term urine output. The simplest interpretation is that a subset of kidney genomes occupies a high-shedding-competent trajectory that is established early and then maintained, producing both recurrent low-level detection and the capacity for high-amplitude episodes.

Support for early establishment comes from our finding that viral lineages dominating the late phase already exhibit elevated activity early after infection. Barcodes defined by frequent late-phase shedding (late smolderers) and those defined by late-phase total genome output (late high shedders) both showed increased detection frequencies early relative to random urine barcodes. This pattern is inconsistent with models in which the majority of high-output late shedding arises from late, stochastic activation of previously low-activity genomes [70,71,72]. Instead, it supports a framework in which early host–virus interactions set a long-term propensity to shed that becomes evident as persistent infection unfolds (Figure 10). Importantly, this stability does not imply continuous productivity. Rather, it suggests that the probability of crossing a threshold into measurable urine output differs systematically among kidney-resident genomes.

What biological mechanisms could generate a large, stable low/non-shedding reservoir alongside a smaller, repeatedly shedding-competent minority? A compelling explanation invokes latency-like or deeply restricted infection states gated by host cell state. PyV persistence has long been discussed in terms of both smoldering and latent/lytic–like infection programs, even though definitive evidence for true latency remains limited [32,73,74,75,76]. Recent work on BKPyV has shifted the field away from a purely virus-driven view of cell-cycle control toward a host-gated model, in which entry into S phase is required for robust LTAg expression and early transcription is shaped by Non-Coding Control Region (NCCR) responsiveness to host transcription factors including RB-E2F axis [36,37]. In this framework, the large late non-shed reservoir we observed is compatible with genomes persisting predominantly in host environments that rarely enter replication-competent states, whereas the late-shed minority resides in niches that intermittently permit early activation and can further reinforce replication competence through engagement of DNA-damage response pathways that prolong S phase [38,45].

Host immune signaling likely acts in parallel with cell-cycle gating to stabilize virus shedding patterns, serving as another determinant of viral fate. In particular, cell-type–specific antiviral programs, including type I interferon responses, may bias which kidney-resident environments remain deeply restricted versus permissive for activation [77]. Type I interferon signaling is especially intriguing in this regard, as it is known to stabilize latent or persistent states in other DNA viruses and has been shown to modulate PyV infection outcomes in a cell-type-dependent manner, suggesting a role in reinforcing restricted, non-shedding states rather than directly triggering reactivation [48,77,78,79,80,81]. Our data are consistent with a latency-like gating mechanism but do not establish true latency in the strict sense of reversible switching within the same cell. What we can conclude with confidence is that kidney-resident genomes behave as if partitioned into long-lived states with sharply different probabilities of producing urine-detectable output.

A second model, which our data more directly constrain, is hematogenous reseeding. Hematogenous dissemination has been proposed for human PyVs, particularly for JCPyV, where transient viremia and leukocyte-associated spread are thought to contribute to dissemination between peripheral reservoirs and the central nervous system [82,83,84,85]. If the non-shedding kidney reservoir were continuously replenished from blood, or if long-term urine contributors were repeatedly reseeded from systemic compartments, stronger coupling between barcode abundance in blood and kidney-defined populations would be expected. Instead, barcode compositions are largely tissue-local, likely due to robust IgG response [86], and blood shows little to no correlation with the late non-shed kidney population in the mice where blood was sampled. This argues against ongoing hematogenous input as a dominant driver of either the silent reservoir or the long-term shedders. Rather than being continually repopulated by incoming genomes, kidney-resident viral populations appear to be locally established and stably maintained within each host.

A third possibility is spatial or anatomical sequestration within the kidney [84,87,88,89]. In this model, many viral genomes are genuinely present but reside in regions poorly coupled to the urinary outflow, such that virions or viral nucleic acids rarely reach urine even if low-level replication occurs. Our data cannot exclude this explanation, as kidney abundance alone does not predict urinary contribution and the late non-shed population constitutes a major quantitative reservoir in most animals. Differences in PyV receptor distribution across kidney tissue types may also contribute to divergent infection outcomes within the kidney [90,91,92]. This idea is consistent with recent single-cell analyses of human BKPyV, which demonstrate ranked renal cell-type tropism rather than uniform infection across the kidney, indicating that different infected compartments may differ substantially in their likelihood of contributing to urine [47]. Lineage-resolved studies similar to ours, using barcoded influenza A virus, have demonstrated strong within-host bottlenecks and compartmentalized “islands” of replication across anatomical sites, in which local founder effects dominate viral population structure [93]. More broadly, it aligns with the idea that PyV shedding may reflect local tissue and cellular context, and that not all infected niches within a kidney-tropic infection are equally positioned to contribute to urine.

A fourth model is abortive infection at the level of individual genomes. Under this scenario, the late non-shed reservoir could be enriched for genomes that are genetically or functionally compromised, allowing persistence as DNA but rarely supporting productive replication. Our analyses argue against barcode-intrinsic explanations such as GC content, length, or predicted miRNA targeting, but they do not rule out defects elsewhere in the viral genome. Prior studies of PyVs have documented persistent viral populations containing rearranged or non-functional genomes [94,95,96,97]. Abortive-genome models therefore remain plausible, particularly if early immune pressure or cellular restriction favors survival of defective genomes. Any such model, however, must still account for the striking contrast between a small subset of barcodes that repeatedly achieves high peak urine levels and persistent detection and a much larger population that does not.

Our findings support a framework in which persistent kidney infection is not a homogeneous pool of equivalent genomes, but a structured ecosystem composed of at least two long-lived behavioral classes. The large late non-shed reservoir provides a plausible substrate for cryptic persistence, while the high-shedding-competent minority supplies sustained and episodic urinary output. This organization bridges classical descriptions of intermittent viruria with emerging host-gated models of PyV activity: persistent infection can appear intermittent at the host level because only a small, stable subset of genomes repeatedly reaches detectable shedding, while most remain abortive, spatially sequestered or silenced.

5. Limitations of This Study

This study is limited by sample size and by the indirect nature of urine as a readout of kidney biology. Although the four-mouse dataset provides internal replication of the core qualitative conclusions, larger cohorts will be required to assess sex effects, inter-host variability, and the generality of the observed proportions of shedding versus non-shedding reservoirs. This is particularly relevant given extensive evidence that sex hormones and endocrine-immune interactions can shape viral persistence and tissue-specific infection dynamics, including in muPyV models [98,99,100]. In addition, our classifications rely on urine sampling density and on defining “late shedding” using final timepoints proximal to harvest; more frequent urine sampling and time-resolved kidney measurements would improve sensitivity to rare events and better distinguish persistent low-level output from sporadic bursts. Our study attempted to model PyV persistent infection in immunocompetent hosts, but since PyV infections are associated with serious diseases in immunosuppressed populations [6,82,87,101,102], it will be important to probe persistence and shedding dynamics in immunocompromised hosts and other models of muPyV disease models [103,104,105]. In this context, a future direction will be to explore how clinically relevant immunosuppression, such as that used in transplantation settings, alters these lineage-resolved dynamics, and to determine how this viral ecosystem behaves if an infected kidney is transplanted into a naïve immunosuppressed host or placed into ex vivo culture.

In clinical BKPyV-Associated Nephropathy (also known as “BKVAN”), particularly in immunosuppressed transplant recipients, infection is often focal and characterized by tubular epithelial cells with “ground-glass” enlarged nuclei due to extensive viral replication, and such infected cells (decoy cells) can be shed into the urine [106]. Consequently, a portion of the PyV DNA detected in urine may derive from whole infected cells rather than exclusively from free virions, although at least some of the DNA detected is associated with infectious viruses [30]. For our barcode urine analysis, we do not currently distinguish between cell-associated and virion-associated viral genomes in urine samples, and determining the fraction of barcode-resolved genomes present within shed cells versus free virions will be important for future mechanistic interpretation of urinary shedding dynamics. Finally, our barcode sequencing captures lineage identity and abundance but does not reveal full genome sequence, infected cell type, kidney microanatomical location, host cell state, or viral genome integrity. As a result, we cannot definitively discriminate among latency-like restriction, spatial sequestration, and abortive-genome models, which may coexist. Future studies linking barcode identity to cell state and spatial location, together with full-genome sequencing of shedding-competent and non-shedding reservoirs, will be essential for identifying the host and viral determinants that place a genome on a long-term shedding trajectory.

Bibliography106

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1White M.K. Khalili K. Polyomaviruses and Human Cancer: Molecular Mechanisms Underlying Patterns of Tumorigenesis Virology 200432411610.1016/j.virol.2004.03.02515183048 · doi ↗ · pubmed ↗
2Di Maio D. Small Size, Big Impact: How Studies of Small DNA Tumour Viruses Revolutionized Biology Philos. Trans. R. Soc. Lond. B Biol. Sci.20193742018030010.1098/rstb.2018.030030955494 PMC 6501907 · doi ↗ · pubmed ↗
3Pipas J.M. DNA Tumor Viruses and Their Contributions to Molecular Biology J. Virol.201993 e 01524-1810.1128/JVI.01524-1830814278 PMC 6475781 · doi ↗ · pubmed ↗
4Prado J.C.M. Monezi T.A. Amorim A.T. Lino V. Paladino A. Boccardo E. Human Polyomaviruses and Cancer: An Overview Clinics 201873 e 558s 10.6061/clinics/2018/e 558s 30328951 PMC 6157077 · doi ↗ · pubmed ↗
5Fanning E. Zhao X. Jiang X. Polyomavirus Life Cycle DNA Tumor Viruses Damania B. Pipas J.M. Springer New York, NY, USA 2009124
6Chang Y. Moore P.S. Merkel Cell Carcinoma: A Virus-Induced Human Cancer Annu. Rev. Pathol. Mech. Dis.2012712314410.1146/annurev-pathol-011110-13022721942528 PMC 3732449 · doi ↗ · pubmed ↗
7Arora R. Chang Y. Moore P.S. MCV and Merkel Cell Carcinoma: A Molecular Success Story Curr. Opin. Virol.2012248949810.1016/j.coviro.2012.05.00722710026 PMC 3422445 · doi ↗ · pubmed ↗
8Kamminga S. van der Meijden E. de Brouwer C. Feltkamp M. Zaaijer H. Prevalence of DNA of Fourteen Human Polyomaviruses Determined in Blood Donors Transfusion 2019593689369710.1111/trf.1555731633816 PMC 6916541 · doi ↗ · pubmed ↗