Drivers of Viral Diversity and Sharing in Marine Mammals
Matthew J. Arnold, Laura M. Bergner, Haris Malik, Mariel ten Doeschate, Nicholas J. Davison, Andrew Brownlow, Nardus Mollentze, Simon A. Babayan, Daniel G. Streicker

TL;DR
This study explores the diversity and sharing of viruses in marine mammals, revealing that age and ecological interactions influence viral transmission more than host taxonomy.
Contribution
The study introduces a large-scale metatranscriptomic analysis of viral communities in marine mammals, highlighting the role of age and ecology in viral sharing.
Findings
Viral sequences were detected in nearly all sampled pools, representing over 120 distinct viral taxonomic units.
Juvenile marine mammals had significantly higher viral diversity compared to adults and neonates.
Viral sharing between species mirrored ecological interactions, including cross-order sharing between seals and cetaceans.
Abstract
Knowledge of viral infection in marine mammals, a group severely threatened by human activity, is largely limited to the pathology and epidemiology of a few endemic viruses. The recent emergence in marine mammals of high‐consequence viruses, such as H5N1 avian influenza and rabies, underscores the importance of understanding the ecology of viral transmission in these species. Metatranscriptomic approaches now enable relatively unbiased characterisation of full viral communities that can reveal ecological and evolutionary drivers of infection. We sequenced RNA from 15 marine mammal species (42 pools, 237 tissues, 128 animals) sampled in Scotland through the Scottish Marine Animal Strandings Scheme. Viral sequences were detected in 41 of 42 pools, representing more than 120 distinct viral taxonomic units (vOTUs). Virus–host network analysis showed that viral communities were partly…
Funding
- Wellcome Trust (10.13039/100010269)
- Biotechnology and Biological Sciences Research Council (10.13039/501100000268)
- Leverhulme Trust (10.13039/501100000275)
- Natural Environment Research Council (10.13039/501100000270)
- Bill and Melinda Gates Foundation (10.13039/100000865)
- Medical Research Council (10.13039/501100000265)
1 Introduction
Beyond their more obvious and immediate impacts, climate change and habitat destruction have altered the ecology of viral transmission in wild species. Evidence for this has emerged through the study of viral communities in wild plants and animals (Susi and Laine 2021; Campbell et al. 2020; Hermanns et al. 2023). In turn, these data can also highlight parameters useful for understanding how newly introduced viruses may spread within species or ecosystems. Metagenomic sequencing now offers a powerful tool to broaden the range of viruses detected in a specific ecosystem and understand the processes underlying viral diversity and community composition. As well as identifying viruses as potential sources of disease (e.g., Rubio‐Guerri et al. 2015), viral diversity characterised by metagenomics has revealed demographic drivers of infection including age (Bergner et al. 2020; Hill et al. 2023) and sex (Feng et al. 2022), environmental factors (e.g., habitat preference (Bergner et al. 2020; Geoghegan et al. 2021)) and seasonal variation (Raghwani et al. 2023). However, these studies are labour‐ and cost‐intensive, partly due to the difficulty of collecting large numbers of samples from free‐living animals.
Studying these ecological drivers may allow the identification of commonalities between systems: for example, juvenile animals appear to have higher viral diversity than adults in birds (Hill et al. 2023; Wille et al. 2021) and bats (Bergner et al. 2020). However, there is also growing evidence that drivers of viral diversity and community composition vary between systems. For example, the degree of cross‐species transmission measured by multi‐host metagenomic sequencing studies varied from extensive in some ecosystems (French et al. 2023; Costa et al. 2024) to almost non‐existent in another (Costa et al. 2023). Furthermore, habitat disturbance can affect pathogen diversity, although the direction of this influence varies depending on the system in question (Campbell et al. 2020; Hermanns et al. 2023). This underscores the role of host ecology in determining the emergence and prevalence of viruses within wild animal populations and highlights the importance of characterising these properties across a diverse range of systems to identify generalisable principles. Previous studies have focused primarily on taxa typically associated with viral spillover to humans and economically important taxa (Kwok et al. 2020). Ecosystem‐ and community‐level approaches have been employed in some systems (French et al. 2023; Costa et al. 2023), but are rare, and large gaps remain.
In general, large free‐living mammals have been overlooked in the field of viral community ecology, likely due in part to the difficulty of acquiring suitable samples. Novel methods for surveying difficult‐to‐access species are emerging (Geoghegan et al. 2018; Massey et al. 2022; Kocher et al. 2017; Drinkwater et al. 2021; Mwakasungula et al. 2022) but have significant practical limitations. For example, some studies have collected biological samples using drones, but this is costly, has a low success rate and may cause stress in the target animals (O'Mahony et al. 2024). This means that the viral ecology of animals in this niche, with large bodies, long lives, long gestation and nursing periods and often feeding at high trophic levels, remains poorly understood. Marine mammals represent a particularly interesting case because, as well as fitting firmly into this niche, they also exhibit a number of unusual characteristics compared to other species studied for viral ecology. First, they display a gradient of habitat preferences from partially terrestrial to entirely oceanic, manifesting as a gradient of separation from phylogenetically related terrestrial species. Second, all feed at high trophic levels (Morissette et al. 2006; Blanchet et al. 2019; Rhodes‐Reese et al. 2021; Rupil et al. 2022), meaning that their populations are small but highly connected to prey species. Prey vary by predator species and population, from other marine mammals to fish and cephalopods, representing a range from high to low likelihood of trophic viral transmission. Third, many species are highly social, forming large groups, sometimes seasonally, likely accelerating the transmission of viruses. Furthermore, these groups often encompass individuals from multiple species (Syme et al. 2021), presenting an unusual opportunity for viral cross‐species transmission. Finally, they exhibit a range of movement ecologies, from year‐round stationary populations to annual transoceanic migrations. Combined with the lack of hard barriers in the ocean, this provides opportunities for movement and long‐distance viral transmission on a scale not seen among terrestrial mammals.
Furthermore, because marine mammals occupy the highest trophic level in ocean food webs, marine biodiversity and ecosystem health depend on them. In turn, these ecosystems provide food and livelihoods for human populations globally (FAO 2022). Through this pivotal role in food webs, marine mammals also act as sentinels for environmental change, providing a barometer for the health of entire marine ecosystems (Bossart 2011). A growing body of evidence supports a pronounced threat to these species from human activity, from pressures on food supplies by fisheries (Jusufovski et al. 2019; Rupil et al. 2022), to increased contaminant burdens in seawater (Jepson et al. 2016), to rising sea temperatures (Simmonds and Isaac 2007). These processes impact the susceptibility of populations to infection, directly (in the case of chemical contaminants (Williams et al. 2025)) and indirectly, in the form of increased stress. Threatened populations may also have an elevated risk from infectious disease (Pedersen et al. 2007; Heard et al. 2013). Future conservation efforts for these species may therefore rely ever more heavily on an understanding of infectious disease dynamics in these populations (Gulland et al. 2022).
Detailed knowledge of viral ecology in marine mammals is restricted to epidemiology of specific viruses causing severe disease and thus leading to conservation concerns. For example, phocine distemper virus (PDV; genus: Morbillivirus, family: Paramyxoviridae) has been the subject of extensive PCR and serosurveillance leading to an ecological model of introduction and transmission in one ecosystem (VanWormer et al. 2019), enabled by the fact that affected species spend time on land, allowing sample collection. In contrast, cetacean morbillivirus (CeMV; genus: Morbillivirus, family: Paramyxoviridae) causes similar unusual mortality events in cetaceans (Vigil et al. 2024), but the epidemiology of this virus is extremely poorly characterised (Jo et al. 2018) due to the practical difficulties of surveillance in cetacean populations. This focus on high‐consequence pathogens stems from the fact that marine mammal species are legally protected and often endangered. However, focussing on single viruses, often limited in host range, may overlook broader trends that would provide context for novel pathogens emerging for the first time in these systems. The risks of viral spillover in marine mammals are exemplified in recent outbreaks of highly pathogenic avian influenza (HPAI; genus: Alphainfluenzavirus, family: Orthomyxoviridae, Leguia et al. 2023; Uhart et al. 2024) and rabies virus (RABV; genus: Lyssavirus, family: Rhabdoviridae; Department: Agriculture, Land Reform and Rural Development 2024).
Viral communities have been characterised for marine mammals, but these studies are limited to single animals (Rubio‐Guerri et al. 2015; Mifsud et al. 2024), single species (Li et al. 2011; Rosales and Vega Thurber 2015; Geoghegan et al. 2018; Martínez‐Puchol et al. 2022; Butkovic et al. 2023; Zhao et al. 2023; Karamendin et al. 2024; Prado et al. 2025; Holdsworth et al. 2025) or, in one case, a small number of individuals from 2 closely related species (Kluge et al. 2016), and have focused more on characterising the viruses present, often as a response to the presentation of clinical signs in monitored populations. Additionally, samples have often been collected from captive animals, limiting their usefulness for characterising population or community‐level processes. By broadening this approach and assessing viral communities across a range of species of marine mammals, it may be possible to identify high‐level host evolutionary and demographic drivers of community diversity, composition and sharing. This, in turn, could provide insights into the possible role of demographic groups or taxa in transmitting emerging viruses and how changes in population structure and species composition might affect the diversity and sharing of viruses in marine ecosystems.
To address this knowledge gap, we undertook metatranscriptomic sequencing to characterise viral communities of 237 tissue samples from 15 marine mammal species. Collaborators responsible for investigating stranded marine mammals collected these samples as part of their routine investigations, allowing access to a wide range of species over a 4‐year period. Working with these samples allowed unparalleled access to protected species while maximising the value of previously conducted fieldwork. Using the viral sequence data acquired, we investigated (a) how host species and demography shape the diversity and composition of viral communities and (b) evidence for cross‐species transmission of viruses between marine mammal species, families and orders.
2 Materials and Methods
2.1 Sample Selection
Samples were provided by the Scottish Marine Animal Strandings Scheme (SMASS), the body responsible for necropsy of stranded marine mammals in Scotland, which has been archiving suitable samples since 2016. Due to biosafety constraints, this study used samples collected before 2020, when the first recorded case of highly pathogenic avian influenza (HPAI; clade 2.3.4.4b) was diagnosed in seals in Scotland (Bird Flu (Avian Influenza) 2024). These factors combined resulted in a study period of 2016–2019, inclusive.
Archived samples were selected to maximise coverage of species and demographic classes (age and sex). All samples from carcasses in decomposition category 2a (freshly dead at time of necropsy), 2b (slight decomposition) and 3 (moderate decomposition; see Mazzariol and Centelleghe 2017) were considered. From among these, all code 2a and 2b cases were selected for all species. In species which remained under‐represented, code 3 cases were also included, resulting in a total of 128 individuals from 15 species. To maximise diversity of viruses sampled, we selected spleen (n = 116) and lung (n = 122) samples from all cases fulfilling the above criteria. Lung was chosen due to the key role of this organ in infection with clinically significant viruses in this system (influenza, morbillivirus, adenovirus and herpesvirus). Spleen has also been shown to contain detectable levels of a number of viruses (including morbillivirus) by PCR and metagenomics (Kane et al. 2024) and was the next most commonly collected tissue in this sampling period. All samples used were negative by PCR for CeMV, PDV and herpesvirus spp.
Additionally, we obtained a positive control sample from outside of the study period from the Moredun Research Institute. This consisted of homogenised brain tissue from a PCR‐positive CeMV case in stranded cetaceans from the United Kingdom.
2.2 Sample Preparation
2.2.1 Nucleic Acid Extraction and Pooling
Sub‐samples of each tissue were provided by SMASS in aliquots of DNA/RNA Shield (1×, Zymo Research) and extracted separately. Positive controls provided in RNAlater (ThermoFisher) were mixed with an equal volume of DNA/RNA Shield. The samples were homogenised by vortex agitation (Vortex Genie 2, Scientific Industries) at approximately 1200 rpm in 0.5 mm ceramic bead‐bashing tubes (Zymo ZR BashingBead Lysis Tubes) for 15 min. Following homogenisation, proteinase K (Zymo Research) digestion was carried out overnight at 25°C. Digested samples then had RNA and DNA extracted in tandem using the Zymo Research Quick DNA/RNA Miniprep Plus Kit, following the manufacturer's protocol for tissue samples. Before library preparation, quality assurance of extracted RNA was performed. Concentration was measured using a Qubit fluorometer (Invitrogen) and Qubit RNA High Sensitivity dye assay (ThermoFisher). For samples included in the pilot run, fragment size analysis was also carried out by electrophoresis on an Agilent TapeStation using the High Sensitivity RNA ScreenTape kit (Agilent). Extracts were pooled maintaining demographic and taxonomic separations of interest (see Supporting Information S1).
2.2.2 Library Preparation and Sequencing
Pilot run pools were sequenced by Novogene using an Illumina NovaSeq X‐Plus sequencer, generating paired‐end (PE) 150 nucleotide reads to a target depth of 6 gigabases (20 million reads) per pool. Based on detection rates of spike‐in CeMV controls (manifested as numerous short contigs with moderate coverage) in the positive control pools and the results of fragment size analysis, we decided to sequence to higher depth for the main experimental sequencing run. Thus, the main run was prepared and sequenced as described, except for adjusting sequencing depth to 24 gigabases (80 million reads per pool) to improve detection of low‐abundance and possibly degraded viral material in these pools.
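The relationship between the read counts and gigabase targets quoted above can be checked with a short calculation. This is a minimal sketch that assumes "reads" refers to read pairs (each pair contributing two 150 nucleotide reads); that interpretation is an assumption made here because it reproduces the stated figures, and the function name is illustrative.

```python
# Illustrative arithmetic only: relates read-pair counts to sequenced bases.
READ_LENGTH_NT = 150   # paired-end 150 nt reads, as stated in the text
READS_PER_PAIR = 2     # assumption: the quoted "reads" are read pairs


def gigabases(read_pairs: int) -> float:
    """Total sequenced bases, in gigabases, for a given number of read pairs."""
    return read_pairs * READS_PER_PAIR * READ_LENGTH_NT / 1e9


pilot_depth = gigabases(20_000_000)   # pilot run target per pool
main_depth = gigabases(80_000_000)    # main run target per pool
print(pilot_depth, main_depth)
```

Under this assumption the main run corresponds to a four-fold increase in depth per pool relative to the pilot run.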
2.3 Bioinformatic Analysis
The data were processed using a bioinformatics pipeline we developed for this project (MetagenOmic BIoinformatics—Diamond Informed Classification and Krona plotting; MOBI‐DICK; https://github.com/mattarnoldbio/MOBI-DICK). First, adapters and low‐quality base‐calls were removed from the raw sequencing data using trim_galore v0.6.10 (Krueger et al. 2023; a wrapper calling cutadapt (Martin 2011) and FastQC (Andrews 2010)) with default settings apart from a relatively lenient quality score threshold of 25. Next, prinseq v0.20.4 (Schmieder and Edwards 2011) was used to mask low complexity regions and remove PCR duplicates (settings: low complexity masking using dust, threshold 7; complete deduplication options 1–5) to improve efficiency and fidelity during de novo assembly.
After data cleaning, host‐associated reads were removed by mapping all reads against an indexed reference genome using bowtie2 v2.4.4 (Langmead and Salzberg 2012), using local alignment with default parameters. As reference genomes were not available for all host species, representative reference genomes were assigned to each species: baleen whale libraries used the blue whale (Balaenoptera musculus) genome (National Center for Biotechnology Information (NCBI) accession GCF_009873245.2); toothed whale, dolphin and porpoise libraries used the common bottlenose dolphin (Tursiops truncatus) genome (NCBI accession GCF_011762595.1) and seal libraries used the Hawaiian monk seal (Neomonachus schauinslandi) genome (NCBI accession GCF_002201575.2). These reference genomes were selected as the most complete representative of each taxonomic group. The reads which did not map to the host genome were then assembled de novo using megahit v1.2.9 (Li et al. 2015) with the ‘–meta‐sensitive’ flag.
The taxonomy of contigs was estimated by performing a search against the NCBI non‐redundant (nr) protein sequence database (accessed November 2023) using diamond BLASTX v2.0.8 (Buchfink et al. 2021) with default parameters to search using protein sequences for all six possible reading frames for each contig. No filtering by e‐value or bit score was imposed at this stage, to avoid incorrectly discarding viral contigs. Next, visualisations were created using kronaTools v2.8.1 (Ondov et al. 2011) for each library, and taxonomic assignments were filtered to separate viral and non‐viral hits. Following this, a further database query was conducted in an attempt to eliminate false positives arising from poorly annotated entries in the nr database. For this search, contigs which were identified as viral material by diamond BLASTX were queried against the NCBI nucleotide (nt) database (accessed November 2023) using blastn (Altschul et al. 1990) with default parameters to search for hits to the forward and reverse nucleotide sequences. All contigs which matched a non‐viral sequence at this stage were discarded from further analysis. During this filtering step, contigs identified as viral were also separated into candidate vertebrate‐infecting and non‐vertebrate‐infecting viruses using a curated list of viral families based on the VIRION database (Carlson, Gibb, et al. 2022), a curated and maintained database of virus‐host associations. Finally, the contigs were filtered at three different length cut‐offs (150 nucleotides, 250 nucleotides and 400 nucleotides), in order to assess the sensitivity of downstream analyses to any possible false detections introduced by contig length.
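The filtering logic described above can be sketched as follows. This is a minimal illustration, not the MOBI‐DICK implementation: the `Contig` record, its field names and the `VERTEBRATE_FAMILIES` set are hypothetical stand-ins (the real family list is derived from VIRION), and treating the length cut-offs as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Contig:
    name: str
    length: int            # contig length in nucleotides
    blastx_viral: bool     # protein-level search assigned the contig to a virus
    blastn_nonviral: bool  # nucleotide-level search matched a non-viral sequence
    family: str            # assigned viral family


# Hypothetical stand-in for the curated vertebrate-infecting family list.
VERTEBRATE_FAMILIES = {"Picornaviridae", "Anelloviridae"}
CUTOFFS_NT = (150, 250, 400)  # the three length cut-offs named in the text


def classify(contigs: List[Contig], min_len: int) -> Tuple[List[str], List[str]]:
    """Return (vertebrate, non-vertebrate) contig names passing all filters."""
    kept = [c for c in contigs
            if c.blastx_viral and not c.blastn_nonviral and c.length >= min_len]
    vert = [c.name for c in kept if c.family in VERTEBRATE_FAMILIES]
    nonvert = [c.name for c in kept if c.family not in VERTEBRATE_FAMILIES]
    return vert, nonvert


contigs = [
    Contig("c1", 300, True, False, "Picornaviridae"),
    Contig("c2", 200, True, False, "Picornaviridae"),
    Contig("c3", 500, True, True, "Anelloviridae"),   # discarded: non-viral nt hit
    Contig("c4", 450, True, False, "Microviridae"),   # bacteriophage family
]
counts = {cut: tuple(len(group) for group in classify(contigs, cut))
          for cut in CUTOFFS_NT}
print(counts)
```

Repeating the classification at each cut-off makes it easy to see how many detections depend on short contigs alone.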
2.4 Assignment to Operational Taxonomic Units and Assessment of Viral Sharing
In order to identify viruses infecting individuals across multiple pools and species, viruses were assigned to finer‐grained viral operational taxonomic units (vOTUs; Figure S1). To do this, we separated contigs by assigned viral genus, treating all contigs from the same pool assigned to a viral family but not to a genus as the same artificial genus. These genus‐grouped contigs were then clustered based on nucleotide sequence identity using the cascaded clustering algorithm implemented in MMSeqs2 easy‐cluster (Steinegger and Söding 2017) with a range of percentage identity thresholds (80%, 82.5%, 85%, 87.5%, 90%, 92.5%, 95%). This range was selected based on literature suggesting that although no single threshold can partition sequences from the same viral species together, values for approximately‐species level clustering of all families tested fell within this range (Tian et al. 2024). After clustering contigs, clusters were merged for each viral genus such that if a single pool fell into separate clusters with other pools, all of those clusters were considered to represent the same vOTU (Figure S1). This strategy assumes that contigs from the same pool assigned to the same viral genus are derived from the same source, even if no overlapping sequence was present to assess percentage identity. While this may not always represent the truth (e.g., in the case that a circulating virus is found in multiple conspecifics in the same period, so multiple infected individuals contribute extracts to the same pool), there is no way to assign reads below the pool level, so this assumption represents the most cautious reading of the data, as it is unlikely to lead to an overestimate of viral diversity. This strategy also ensures that contigs from a segmented virus (e.g., genera Rotavirus, Betanodavirus) present in the same pool are assigned to the same vOTU. This was repeated for all three contig length cut‐offs (see Section 2.3). 
As clustering showed little variation between percentage identity thresholds on this dataset, subsequent analyses were performed using the threshold of 95%. To assess the similarity of viruses in pools which had been clustered together at this threshold, we then performed pairwise alignments of all member contig sequences separately for each vOTU using MMSeqs2 easy‐align at the nucleotide level and extracted the highest percentage identity match for each pairwise combination of pools within a single vOTU.
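The cluster-merging rule described above (clusters within a genus that share any pool are treated as one vOTU) amounts to finding connected components. The sketch below uses a small union-find structure to illustrate the idea; the function name and the input layout (one set of pool identifiers per sequence cluster) are assumptions for illustration, not the authors' code.

```python
from typing import Dict, List, Set


def merge_clusters(clusters: List[Set[str]]) -> List[List[str]]:
    """Merge sequence clusters (within one viral genus) that share a pool.

    Each input cluster is the set of pools contributing contigs to it.
    Clusters linked through any common pool become a single vOTU.
    """
    parent = list(range(len(clusters)))

    def find(i: int) -> int:
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_cluster_for_pool: Dict[str, int] = {}
    for idx, pools in enumerate(clusters):
        for pool in pools:
            if pool in first_cluster_for_pool:
                root_a = find(first_cluster_for_pool[pool])
                root_b = find(idx)
                if root_a != root_b:
                    parent[root_b] = root_a
            else:
                first_cluster_for_pool[pool] = idx

    merged: Dict[int, Set[str]] = {}
    for idx, pools in enumerate(clusters):
        merged.setdefault(find(idx), set()).update(pools)
    return sorted(sorted(pools) for pools in merged.values())


# Pool P2 appears in two clusters, so those clusters collapse into one vOTU.
votus = merge_clusters([{"P1", "P2"}, {"P2", "P3"}, {"P4"}])
print(votus)
```

Because merging can only reduce the number of units, this rule errs towards underestimating rather than overestimating viral diversity, in line with the reasoning given in the text.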
Unipartite viral sharing networks were visualised using the ‘network’ function from the sna R package (version 2.8). In these networks, nodes represent individual pools, with edges showing vOTUs shared between pools, and edge thickness showing the number of vOTUs shared.
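The network construction described above can be illustrated by deriving weighted edges from the vOTUs detected in each pool. This is a minimal sketch with invented pool and vOTU labels; the study itself used the sna R package for visualisation.

```python
from itertools import combinations
from typing import Dict, Set, Tuple


def sharing_edges(pool_votus: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Edges between pools, weighted by the number of vOTUs they share."""
    edges = {}
    for a, b in combinations(sorted(pool_votus), 2):
        shared = len(pool_votus[a] & pool_votus[b])
        if shared:  # pools with no shared vOTU are left unconnected
            edges[(a, b)] = shared
    return edges


# Hypothetical example data.
example = {
    "pool_A": {"v1", "v2"},
    "pool_B": {"v1", "v2", "v3"},
    "pool_C": {"v3"},
    "pool_D": {"v4"},
}
edges = sharing_edges(example)
print(edges)
```

The edge weights correspond to the edge thickness in the plotted networks.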
2.5 Statistical Analysis of Viral Communities
All of the following analyses were performed in R (version 4.4.1).
2.5.1 Rarefaction and Accumulation Curves
Rarefaction and accumulation curves were plotted to assess the completeness of vertebrate viral communities characterised at the sample and study level. All analyses were performed at the viral family, viral genus and vOTU level. For the purposes of rarefaction, we considered a single viral read mapping to contigs from a viral taxon as a detection. As the downstream analyses all consider vertebrate infecting viral taxa only, rarefaction and accumulation analyses consider only these taxa. Furthermore, samples with only one vertebrate viral taxon detected were excluded as these are not valid targets for rarefaction. To perform this analysis, contingency tables mapping reads from all contigs in each sequencing pool to viral taxa (families or genera) were constructed. These tables were then used to plot rarefaction and accumulation curves using, respectively, the rarefy and specaccum functions of the R package vegan (v. 2.6‐6.1).
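To make the rarefaction step concrete, the sketch below computes the expected number of taxa in a random subsample of reads using Hurlbert's formulation, on which vegan's rarefy function is based. The read counts are invented for illustration and the function name is an assumption.

```python
from math import comb
from typing import Sequence


def expected_richness(read_counts: Sequence[int], subsample: int) -> float:
    """Expected number of taxa detected in a random subsample of reads.

    For each taxon i with N_i reads out of N in total, the probability of
    missing it entirely in a subsample of size n is C(N - N_i, n) / C(N, n).
    """
    total = sum(read_counts)
    if subsample > total:
        raise ValueError("subsample size exceeds the number of reads")
    denom = comb(total, subsample)
    return sum(1 - comb(total - n_i, subsample) / denom
               for n_i in read_counts if n_i > 0)


# Hypothetical pool: reads assigned to three viral taxa.
counts = [120, 30, 2]
curve = [expected_richness(counts, n) for n in (1, 10, 50, 152)]
print(curve)
```

A curve that flattens well before the full read depth is reached suggests that additional sequencing of that pool would be unlikely to reveal further taxa.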
2.5.2 Community Composition
To assess the degree to which experimental, demographic and host evolutionary variables impacted which viruses comprised viral communities of sequencing pools, a permutational analysis of variance (PERMANOVA) test was used to investigate association. First, we extracted a presence/absence matrix with rows corresponding to sequencing pools and a column for each vOTU in the dataset, and cells therefore corresponding to whether each vOTU occurred in each pool. Community composition was assessed using this matrix. A distance matrix was calculated using vegdist (vegan version 2.6‐6.1) using the ‘raup’ method for binary presence/absence data. The sex variable was transformed to a proportion of samples in a pool coming from female individuals. The number of raw sequencing reads was scaled using the scale function of base R (version 4.3.1). Associations between experimental (number of reads from the pool, number of samples in the pool), demographic (proportion of female animals, life stage) and host evolutionary (summarised as host species, or host family) variables and the composition of viral communities were tested using PERMANOVA (adonis2 function of the vegan package) with 10,000 iterations. Clustering of community composition was visualised using principal coordinate analysis (PCoA; ape version 5.8), with kernel density estimation applied to the ordination. These analyses were also performed separately for non‐vertebrate‐infecting vOTUs (i.e., vOTUs from viral families that do not infect vertebrates). The presence of non‐vertebrate‐infecting viruses (e.g., bacteriophages and viruses of fungi and plants) is not dependent on their ability to directly infect the marine mammal host. As such, these viruses provide a baseline for distinguishing factors specific to virus–marine mammal interactions from background patterns driven by environmental exposure and other stochastic processes in this and subsequent analyses.
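For readers unfamiliar with the distance measure used above, the following sketch shows one hypergeometric formulation of a Raup–Crick dissimilarity for presence/absence data: the probability of observing at least as many shared taxa as were actually seen if taxa were drawn at random. It is written from a general understanding of the index and is an assumption rather than a reproduction of vegan's code; the exact definition used by vegdist should be checked against the vegan documentation.

```python
from math import comb
from typing import Sequence


def hypergeom_cdf(k: int, population: int, successes: int, draws: int) -> float:
    """P(X <= k) for a hypergeometric variable (sampling without replacement)."""
    if k < 0:
        return 0.0
    upper = min(k, draws, successes)
    total = comb(population, draws)
    return sum(comb(successes, x) * comb(population - successes, draws - x)
               for x in range(upper + 1)) / total


def raup_crick(a: Sequence[int], b: Sequence[int]) -> float:
    """Probabilistic dissimilarity between two presence/absence vectors."""
    n_taxa = len(a)
    shared = sum(1 for x, y in zip(a, b) if x and y)
    rich_a = sum(1 for x in a if x)
    rich_b = sum(1 for y in b if y)
    small, large = min(rich_a, rich_b), max(rich_a, rich_b)
    # 1 - P(fewer shared taxa than observed) = P(at least as many shared taxa)
    return 1.0 - hypergeom_cdf(shared - 1, n_taxa, small, large)


identical = raup_crick([1, 1, 0, 0], [1, 1, 0, 0])
disjoint = raup_crick([1, 1, 0, 0], [0, 0, 1, 1])
print(identical, disjoint)
```

Pools that share more vOTUs than expected by chance receive values near 0, while pools sharing none receive a value of 1.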
The viral community distance matrix was also compared to estimated host phylogenetic distance to test whether more closely related hosts had more similar viral communities. The host phylogeny was approximated using Timetree (Kumar et al. 2017) and phylogenetic distances calculated using the cophenetic.phylo function of R package ape. These distances were then standardised by dividing by twice the standard deviation and transformed to a pairwise distance matrix. The correlation of viral community composition and host phylogenetic distance matrices was then calculated using a Mantel test (function mantel, R package vegan). Correlations were also calculated separately for host orders, that is, all cetacean (order: Artiodactyla) and all seal (order: Carnivora) pools.
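The Mantel test mentioned above correlates the off-diagonal entries of two distance matrices and assesses significance by permuting the rows and columns of one of them. The pure-Python sketch below illustrates the procedure with Pearson correlation and an invented distance matrix; the permutation count, seed and one-sided p-value convention are assumptions, and the study used vegan's mantel function.

```python
import random
from math import sqrt
from typing import List, Tuple

Matrix = List[List[float]]


def upper_triangle(m: Matrix) -> List[float]:
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x: List[float], y: List[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mantel(d1: Matrix, d2: Matrix, permutations: int = 999,
           seed: int = 0) -> Tuple[float, float]:
    """Return (correlation, one-sided permutation p-value)."""
    rng = random.Random(seed)
    n = len(d1)
    x = upper_triangle(d1)
    observed = pearson(x, upper_triangle(d2))
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns of d2 together to keep it a valid matrix.
        permuted = [[d2[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(permuted)) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical distances among four pools, compared with themselves.
dist = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
r, p = mantel(dist, dist, permutations=99)
print(r, p)
```

Permuting whole rows and columns, rather than individual cells, preserves the dependence among distances that involve the same pool.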
2.5.3 Community Richness
Generalised linear models (GLMs) were used to test for association between experimental, demographic, host evolutionary and ecological covariates and viral richness. Viral richness was defined as the total number of unique vOTUs present in a pool, derived from the row sums of the presence/absence matrix (see Section 2.5.2).
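As a small illustration of the richness definition above, the sketch below takes a presence/absence matrix (pools as rows, vOTUs as columns) and sums each row. The pool names and values are invented.

```python
# Hypothetical presence/absence matrix: one row per pool, one column per vOTU.
presence_absence = {
    "pool_A": [1, 0, 1, 1],
    "pool_B": [0, 0, 0, 0],
    "pool_C": [1, 1, 0, 0],
}

# Viral richness is the number of vOTUs present in each pool (the row sum).
richness = {pool: sum(row) for pool, row in presence_absence.items()}
print(richness)
```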
Data were then filtered to exclude mixed age class pools for all analyses considering age class as a variable. We fitted Poisson distributed GLMs for total viral richness using the glm function from the stats package (version 4.4.1) in R. In these models we included all covariates combined, with effects for demographic covariates (life stage, proportion of female individuals in the pool), host taxonomy and ecological covariates (social group size and interspecies interaction), together with experimental covariates as controls (number of samples in a pool, number of raw reads in a pool). We also built models including interaction effects for host taxonomy with life stage, and sex with life stage. To minimise the risk of overfitting, we summarised host taxonomy at the coarse level of host family. In addition to the above covariates, we also trialled two species‐wise summaries of behaviour, derived from discussions of behaviour with marine mammal‐specialist colleagues: whether the average group size is greater or less than 10; and whether the species was commonly sighted interacting with other marine mammal species. Testing for multicollinearity by calculating variance inflation factors using the performance package (version 0.12.3) showed a problematic degree of variance inflation between these additional ecological covariates and any summary of host taxonomy, so these were excluded from all subsequent models. Using Pearson Chi‐squared tests of residual dispersion performed using the DHARMa (version 0.4.6) package in R, the final models showed no problematic residual over‐ or under‐dispersion. We then assessed the significance of model covariates using analysis of variance (ANOVA) from the car package (version 3.1‐2) to perform likelihood ratio χ² tests. We also tested the association between the above covariates and richness of non‐vertebrate infecting vOTUs. Residual diagnostics using DHARMa suggested overdispersion in models using a Poisson error distribution, so a negative binomial distribution was selected for this model. Post hoc testing was conducted as described above.
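The dispersion check referred to above can be illustrated with the classical Pearson statistic: the sum of squared Pearson residuals divided by the residual degrees of freedom, which should be close to 1 for a well-specified Poisson model. The sketch below applies it to an intercept-only Poisson model, for which every fitted value equals the sample mean. The data and function name are invented, and this is not a reproduction of DHARMa's procedure.

```python
from typing import Sequence


def pearson_dispersion(observed: Sequence[float], fitted: Sequence[float],
                       n_params: int) -> float:
    """Pearson chi-squared statistic divided by residual degrees of freedom."""
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / (len(observed) - n_params)


# Hypothetical richness counts for five pools.
richness = [1, 2, 3, 2, 2]
# An intercept-only Poisson model fits the sample mean to every pool.
mean_richness = sum(richness) / len(richness)
fitted = [mean_richness] * len(richness)

dispersion = pearson_dispersion(richness, fitted, n_params=1)
print(dispersion)
```

Values well above 1 indicate overdispersion, the situation in which the text describes switching to a negative binomial error distribution.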
To test for associations between demographic, taxonomic, ecological and experimental factors and the degree of viral sharing, GLMs were also constructed based on properties of virus sharing networks to determine if some pools were more connected than others. Node degree (i.e., the number of edges connected to a node, here representing the number of pools with which the pool in question shares viruses) was used as the response variable, with fixed effects for demographic and methodological covariates as described above.
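Node degree as defined above can be computed directly from the vOTUs detected in each pool. The sketch below uses invented pool and vOTU labels and counts, for each pool, how many other pools share at least one vOTU with it.

```python
from itertools import combinations
from typing import Dict, Set


def node_degrees(pool_votus: Dict[str, Set[str]]) -> Dict[str, int]:
    """Number of other pools with which each pool shares at least one vOTU."""
    degrees = {pool: 0 for pool in pool_votus}
    for a, b in combinations(sorted(pool_votus), 2):
        if pool_votus[a] & pool_votus[b]:
            degrees[a] += 1
            degrees[b] += 1
    return degrees


# Hypothetical example data.
example = {
    "pool_A": {"v1", "v2"},
    "pool_B": {"v2"},
    "pool_C": {"v2", "v3"},
    "pool_D": {"v4"},
}
degrees = node_degrees(example)
print(degrees)
```

Degree counts connected pools rather than shared vOTUs, so a pool sharing several viruses with a single partner still has a degree of 1.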
In addition to performing these analyses using vOTUs (defined above), we also conducted the same analyses using viral genera. For these purposes, all contigs where genus was not assigned were excluded. In all analyses, the effect sizes and significance estimates varied little between vOTU and viral genus, so the results are reported for vOTUs as the more high‐resolution measure of taxonomy.
3 Results
3.1 Sequencing Dataset: Read Distribution and Viral Richness
42 metatranscriptomic libraries were sequenced, generating a total of 3,835,460,326 raw reads (52,713,560 to 130,963,975 reads per experimental pool). Two positive control (CeMV positive tissue spike‐in) pools and two negative control pools were also sequenced, generating an additional 416,325,344 raw reads. After quality control and removal of reads mapping to host reference genomes, a median of 4.7% of the raw reads (range 0.61%–63%) remained for further analysis (Figure S2). Positive controls were relatively low in this range, with 1.7% and 1.16% high quality unmapped reads remaining, respectively. Assembling unmapped reads de novo yielded 21,485,277 contigs, of which homology to the nr (non‐redundant amino acid sequence) database identified 13,625 as viral. Length filtering for contigs of > 250 nucleotides reduced this to 4252, after which 645 false positives were identified by homology at the nucleotide sequence level to the NCBI nt database, an exhaustive database of nucleic acid sequences. Finally, the contigs were separated into likely vertebrate infecting viruses and likely non‐vertebrate infecting viruses based on viral family level associations derived from host labels in the VIRION database. All sequencing pools contained material associated with both vertebrate and non‐vertebrate infecting viruses, except one, which contained only non‐vertebrate viral hits. Non‐vertebrate viral genetic material was more common, with a total of 888 probable vertebrate‐associated contigs (median length after filtering: 358 nt, range: 250–11,641 nt) versus 2652 non‐vertebrate associated contigs (median length after filtering: 331 nt, range: 250–73,030 nt). Each sequencing pool contained a median of 43 non‐vertebrate infecting viral contigs (range: 9–268 contigs); all bar one contained possible vertebrate viral material at a lower level (median: 16 contigs, range 1–85). 
We detected a total of 41 likely vertebrate‐infecting viral genera across all libraries, as well as a number of contigs for which analysis suggested viral origin but genus was ambiguous (excluded from all following quantitative analysis). Pools contained a mean of 2.3 viral genera (range: 0–6). These viruses partitioned into a mean of 122.1 (range: 121–124) viral operational taxonomic units (vOTUs), based on clustering on nucleotide sequence identity at 7 evenly spaced thresholds between 80% and 95% (see Section 2.4). Positive control pools both contained contigs assigned to CeMV, as expected. Negative controls contained reads attributable to background contamination, and although some short contigs (< 250 nucleotides in length) were assembled from these pools which could be assigned to viral taxa, these taxa did not appear in any experimental pools. Additionally, negative controls contained no reads mapping to vertebrate‐associated viral taxa.
Saturation of Viral Communities
3.2
Rarefaction curves for vertebrate viral families, genera and vOTUs plateaued for all samples where analysis was possible (Figure S4a–c). This indicates that the methods used here are likely to have captured the full richness of vertebrate viruses present in these sequencing pools. This strengthens conclusions based on comparative analysis of richness between pools, as it reduces the likelihood that model results are influenced by systematic biases in the completeness of sampling. In contrast, accumulation curves did not plateau at any level of viral taxonomy (Figure S4d–f). This suggests that the level of sampling in this study was not sufficient to characterise the full diversity of viruses in this system, which is in line with expectations given the relatively small sample size and the opportunistic and uneven sampling of different species and life stages.
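The idea of a rarefaction curve reaching a plateau can be sketched with the classical expected‐richness formula for sampling reads without replacement. This is a minimal illustration under stated assumptions, not the procedure used in this study (which may, for example, have relied on repeated random subsampling); the read counts per taxon are hypothetical.

```python
from math import comb

def rarefied_richness(counts, depth):
    """Expected number of taxa seen when `depth` reads are drawn without replacement."""
    total = sum(counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must be between 0 and the total read count")
    all_draws = comb(total, depth)
    # A taxon with c reads is missed only if every drawn read
    # comes from the other (total - c) reads.
    return sum(1 - comb(total - c, depth) / all_draws for c in counts)

# Hypothetical reads per viral taxon in one sequencing pool.
reads_per_taxon = [50, 30, 15, 5]
depths = [1, 5, 10, 25, 50, 100]
curve = [rarefied_richness(reads_per_taxon, d) for d in depths]

for d, s in zip(depths, curve):
    print(f"{d:>4} reads -> {s:.2f} expected taxa")
```

A curve that flattens well before the full read depth suggests that additional sequencing of the same pool would add few new taxa.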
Composition of Viral Communities and Shared Viruses
3.3
Among the viruses detected, we found a profile of viral taxonomy broadly consistent with other mammal virome studies (Figure 1), comprising mainly viruses from common taxa. These include RNA viruses from the family Picornaviridae and members of the likely non‐pathogenic genus Pegivirus (family: Flaviviridae), as well as DNA viruses from the families Parvoviridae, Anelloviridae and Circoviridae. We also found examples of genera of known pathogenic marine mammal viruses, including new members of the genera Varicellovirus (family: Orthoherpesviridae), established agents of disease in cetaceans and pinnipeds (Bento et al. 2019), Gammacoronavirus (family: Coronaviridae), which has previously been associated with a range of clinical signs in cetaceans (Mihindukulasuriya et al. 2008), and Vesivirus (family: Caliciviridae), which contains the well‐studied San Miguel sea lion virus (SMSV; also called vesicular exanthema of swine virus (VESV)), which causes epizootics in otariid seals (Smith et al. 1981). In the latter two cases, these viruses were detected in species not previously reported as hosts. We identified some vOTUs representing possible spillover events from terrestrial animals, including rotavirus A (genus: Rotavirus, family: Sedoreoviridae) in a common bottlenose dolphin (Tursiops truncatus) pool and two contigs mapping to avian influenza A virus (subtype H5N1; genus: Alphainfluenzavirus, family: Orthomyxoviridae) in samples from 2019. Rotavirus A is common in multiple terrestrial ungulates, close relatives of cetaceans, but to our knowledge has not previously been detected in whales and dolphins. Influenza A infection has been commonly reported in marine mammals (especially pinnipeds) during the ongoing panzootic, although this is most common in otariids (fur seals and sea lions, which are not resident in Europe).
Vertebrate viral families present in sequencing pools. Heatmap columns represent individual pools, with separations in the grey and multi‐coloured x‐axes denoting pools from the same host species and family, respectively. Vertebrate‐infecting viral families are shown on the y‐axis. Cell colour represents the abundance of reads in the pool, shown as the logarithm of the number of reads per million non‐host reads.
Most vOTUs were only found in a single pool (mean: 103.14 vOTUs, range: 102–105). However, regardless of the sequence identity threshold used for clustering, we found 19 vOTUs in multiple sequencing pools, suggesting closely related viruses shared between individuals across these pools (Figure 2b). Alignment of sequences within vOTUs showed a high degree of sequence similarity, with more than three quarters of pairwise combinations of vOTUs having % nucleotide identity between the closest hits (Figure S3). Approximately three quarters of shared vOTUs (14/19 viruses across all sequence identity thresholds) were also found in multiple host species (Table S2, Figure 2b). Furthermore, pairwise identity values for combinations of representative sequences from different host species in the same vOTU had a high median pairwise identity (median pairwise identity = 99.75%) and showed no clear difference from pairs from the same host species (median pairwise identity = 99.40%; compared in Figure S3a). Expressing vOTU sharing relationships as a network graph, with each sequencing pool representing a node, revealed a clear separation between cetaceans and pinnipeds with only one vOTU shared between these two host orders (Artiodactyla and Carnivora; highlighted with star in Table S2; Figure 2b). Within host orders, there was less clear separation by host taxonomy. Although only a small proportion of vOTUs were shared between hosts, a mean of 28.4 out of 42 pools (67%; range 27–29) shared at least one vOTU with another pool, of which 26 shared a vOTU with a pool from another host species. 
Investigating the influence of demographic factors on sharing (expressed as network node degree) using GLMs showed confident negative effects for some species of pelagic delphinids (short‐beaked common dolphin (Delphinus delphis), striped dolphin (Stenella coeruloalba) and Risso's dolphin (Grampus griseus)) relative to the reference level (minke whale (Balaenoptera acutorostrata)), implying that pools from these species may be less likely to share viruses with other pools. Pools of neonate samples (vs reference level adult) were also less connected to other pools in some models. However, none of these effects were stable at different clustering thresholds. Although sharing of non‐vertebrate infecting vOTUs was observed (analysed separately, Figure S5), both clustering by PCoA and sharing networks showed little sign of structure by host taxonomy, with pools from the same host clade often distant from one another.
Composition of and sharing between viral communities. (a) Principal coordinate analysis (PCoA) shows clustering of viral community composition with some structure by host taxonomy. Each point represents a single pool, coloured by host family and plotted according to the first two coordinates of a principal coordinate analysis. Contours show kernel density estimates. (b) A unipartite network graph of viral sharing shows similar structure, with sharing largely constrained to within host families. Nodes represent sequencing pools, sized according to the total number of vOTUs detected in that pool. Edges in the network show vOTUs shared between pools, with edge thickness defined by the number of shared vOTUs (range: 1–2). Sequencing pools with no shared viruses were excluded from the network plot. In both panels, points are coloured by host family (Delphinidae (dolphins) = yellow; Ziphiidae (beaked whales) = pale blue; Phocidae (true seals) = dark blue; Phocoenidae (porpoises) = red; Balaenopteridae (baleen whales) = magenta and Physeteridae (sperm whales) = grey) and with node shape denoting life stage.
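The unipartite sharing network and the node degree used as a measure of sharing can be sketched as follows. This is an illustrative reconstruction only: the pool names and vOTU assignments are hypothetical, and degree is taken here to mean the number of other pools with which a pool shares at least one vOTU.

```python
from itertools import combinations

# Hypothetical pools and the vOTUs detected in each.
pools = {
    "seal_pool_1": {"vOTU_1", "vOTU_2"},
    "seal_pool_2": {"vOTU_2", "vOTU_3"},
    "dolphin_pool_1": {"vOTU_3"},
    "dolphin_pool_2": {"vOTU_4"},
}

def sharing_network(pools):
    """Return {(pool_a, pool_b): number of shared vOTUs} for pools sharing >= 1 vOTU."""
    edges = {}
    for a, b in combinations(sorted(pools), 2):
        shared = pools[a] & pools[b]
        if shared:
            edges[(a, b)] = len(shared)  # edge weight = edge thickness in the plot
    return edges

def node_degree(pools, edges):
    """Number of other pools each pool is connected to."""
    degree = {name: 0 for name in pools}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree

edges = sharing_network(pools)
degree = node_degree(pools, edges)
print(edges)
print(degree)
```

Pools with a degree of zero, such as the hypothetical `dolphin_pool_2` above, correspond to those excluded from the network plot.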
Shared vOTUs were from 7 families and 8 genera, with an additional 5 vOTUs from three families that could not be assigned to genera based on DIAMOND BLASTX results (Table S2). These viruses covered a range of viral taxonomy, with single‐ and double‐stranded DNA viruses (families: Anelloviridae and Parvoviridae, and Orthoherpesviridae, respectively), positive sense single‐stranded RNA viruses (families: Flaviviridae, Nodaviridae, Picornaviridae) and negative sense single‐stranded RNA viruses (family: Rhabdoviridae). With the exception of Varicellovirus (family: Orthoherpesviridae), these taxa have not usually been detected in association with severe clinical disease and are associated with a range of transmission routes (Erythroparvovirus, Varicellovirus: respiratory; Betanodavirus, Hepatovirus: faecal‐oral; Pegivirus: sexual) (Bodewes et al. 2014; Spezia et al. 2023; Rodrigues et al. 2019; Conceição‐Neto et al. 2015; Elbashir et al. 2018).
Drivers of Viral Community Composition
3.4
Across the 15 species represented in this study, we found that host taxon was a key factor in shaping viral community composition of all pools (n = 42). In PERMANOVA analyses of the presence/absence matrix of vOTUs from vertebrate‐infecting families in each pool, summarising host taxonomy at the family level explained 26% of the variance (R² = 0.26, p < 0.0001; Table S3), but the higher taxonomic resolution provided by host species explained community composition more completely (R² = 0.47, p < 0.0005; Table S4). Additionally, host life stage explained around 10% of variation in community composition in all PERMANOVA tests (Tables S3 and S4), although this effect was not statistically significant (p = 0.065 and p = 0.063, for host family and host species level tests, respectively). Other covariates, including experimental control covariates (sampling time period, number of samples per sequencing pool, number of sequencing reads per pool), had no effect on viral community composition (Tables S3 and S4). Repeating the same analyses for vOTUs representing non‐vertebrate infecting taxa showed weaker effects for host taxonomy, which were also less strongly supported (host family: R² = 0.16, p = 0.016; host species: R² = 0.39, p = 0.015; Tables S7 and S8, respectively). There was also no meaningful effect for life stage, or any of the other covariates tested in these models.
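For readers unfamiliar with how a PERMANOVA R² is obtained from a presence/absence matrix, the following is a minimal one‐factor sketch. It is an assumption‐laden simplification: the analyses reported here included several covariates, whereas this example uses a single grouping factor, Jaccard distances, and hypothetical pools and vOTUs. Because the number of pools and groups is fixed, permuting on R² is equivalent to permuting on the pseudo‐F statistic.

```python
import random
from itertools import combinations

def jaccard(a, b):
    """Jaccard distance between two sets of vOTUs (presence/absence)."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)

def r_squared(dist, groups):
    """Share of total sum of squared distances attributable to the grouping."""
    n = len(groups)
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in set(groups):
        members = [i for i in range(n) if groups[i] == g]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
    return (ss_total - ss_within) / ss_total

def permanova(communities, groups, n_perm=999, seed=0):
    """One-factor PERMANOVA returning (R squared, permutation p-value)."""
    n = len(communities)
    dist = [[jaccard(communities[i], communities[j]) for j in range(n)] for i in range(n)]
    observed = r_squared(dist, groups)
    rng = random.Random(seed)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break the link between pools and group labels
        if r_squared(dist, labels) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical pools: three from seals, three from dolphins.
communities = [
    {"v1", "v2"}, {"v1", "v2", "v3"}, {"v1", "v2"},
    {"v4", "v5"}, {"v4", "v5", "v6"}, {"v4"},
]
groups = ["seal", "seal", "seal", "dolphin", "dolphin", "dolphin"]

r2, p = permanova(communities, groups)
print(f"R2 = {r2:.3f}, p = {p:.3f}")
```

With only six pools the smallest achievable p‐value is limited by the number of distinct label arrangements, which is one reason small datasets yield weakly supported effects.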
Visualising similarity of viral communities using principal coordinate analysis yielded a summary in which the first two coordinates explained 34.5% of the total variation. Plotting pools on these coordinates separates most seal pools from a diffuse cluster containing the cetacean pools (Figure 2a). Explicitly testing the association between vertebrate‐infecting viral community dissimilarity and host phylogenetic distance using Mantel tests showed a well‐supported but small effect (10,000 permutations; Mantel's r = 0.274, p < 0.005) when summarising viral community using vOTUs. The correlation decreased to r = 0.142 (p < 0.005) when using viral genera to summarise community composition. No significant correlation was found between host phylogenetic distance and community composition of vOTUs when considering host orders separately (cetacean pools only: r = 0.086, p = 0.149; seal pools only: r = 0.030, p = 0.381). Furthermore, composition of non‐vertebrate infecting vOTUs showed a small and poorly supported correlation with host phylogeny (r = 0.121, p = 0.02). Together, these findings show that viral community composition is partially determined by host taxonomy, but similarity in viral communities is not linearly correlated with host phylogenetic distance. This means that closely related hosts have more similar viral communities, but this relationship is not sufficient to explain relationships between viral communities, especially within host orders. Differences between the results for vertebrate‐ and non‐vertebrate infecting vOTUs suggest a role for host‐specificity in establishing the limited differentiation that is present.
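The logic of a Mantel test can be sketched as below: correlate the off‐diagonal entries of two distance matrices, then compare the observed correlation with correlations obtained after jointly permuting the rows and columns of one matrix. This is an illustrative one‐sided version under stated assumptions; the two toy distance matrices (derived from hypothetical positions on a line) stand in for host phylogenetic distance and viral community dissimilarity.

```python
import random
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def mantel(d1, d2, n_perm=9999, seed=0):
    """Return (Mantel's r, one-sided permutation p-value) for two distance matrices."""
    n = len(d1)
    pairs = list(combinations(range(n), 2))
    x = [d1[i][j] for i, j in pairs]
    observed = pearson(x, [d2[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # relabel the pools in the second matrix
        y = [d2[order[i]][order[j]] for i, j in pairs]
        if pearson(x, y) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

def line_distances(positions):
    """Toy distance matrix: absolute differences between positions on a line."""
    return [[abs(a - b) for b in positions] for a in positions]

# Hypothetical stand-ins for five pools.
host_distance = line_distances([0, 1, 2, 5, 6])
virome_distance = line_distances([0, 1.5, 2, 5, 7])

r, p = mantel(host_distance, virome_distance)
print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

Mantel's r is a correlation coefficient between distances, so it should be read as the strength of association rather than as a proportion of variance explained.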
Factors Affecting Viral Richness
3.5
Models of viral diversity (quantified here as richness of vOTUs from known vertebrate‐infecting viral families) explained approximately half the variation in the data (null deviance explained = 49.8%). Analysis of variance showed a dominant role for host life stage (χ² = 11, p = 0.0040; Table S6) when comparing between all pools containing individuals from one life stage only (n = 38). Linear models with effects for demographic and host evolutionary covariates and controlling for experimental variation showed higher viral diversity in juveniles compared to adult animals, with juveniles predicted to have 1.88 times the number of viruses in an equivalent adult pool (95% CI = 1.3–3.2, p = 0.0068; Figure 3, Table S5). Viral diversity was also lower in pools containing samples from neonate animals compared to juveniles (2.84 times fewer vOTUs in an equivalent neonate pool; 95% CI = 2.29–10.4, p = 0.010; Figure 3, Table S5). Other than an increased diversity in beaked whales (Ziphiidae; estimated 2.98 times the number of viruses in the equivalent pool from Balaenopteridae, 95% CI = 1.01–8.60, p = 0.048), models suggested no effect for host family (Figure 3, Table S5). There was also no effect of sex on the pool's viral richness (χ² = 0.98, p = 0.32; Table S6). Group size and interspecies interaction showed no effects in preliminary models and were excluded from final models due to variance inflation (see Section 2.5). We also found no significant effect of the number of sequencing reads in a pool (χ² = 0.017, p = 0.89; Table S6), the number of samples included in the pool (χ² = 0.40, p = 0.53; Table S6), or the sampling time period (χ² = 1.31, p = 0.52; Table S6). Furthermore, interaction effects showed no explanatory power or statistical significance and were excluded from final models to avoid overspecification. Models were also tested at minimum contig length thresholds both higher (400 nucleotides) and lower (150 nucleotides) than the 250 nucleotide limit used for other analyses. 
Approximate sizes of coefficient estimates, directionality of effects and significance were stable across all thresholds, so all data presented hereafter consider a length threshold of 250 nucleotides. Models of non‐vertebrate infecting vOTU richness showed a much weaker effect of life stage (χ² = 9.0, p = 0.011; Table S10) driven by lower numbers of non‐vertebrate infecting vOTUs in pools from neonates (0.28 times the number of non‐vertebrate vOTUs of the equivalent juvenile pool; 95% CI = 0.118–0.655, p = 0.0033; Table S9). In contrast, juvenile pools were not significantly richer in non‐vertebrate infecting vOTUs than adult pools. Of other tested covariates, only the number of samples in the pool was associated with a significant difference in the number of vOTUs from non‐vertebrate infecting viral families (χ² = 10.5, p = 0.0013; Table S10).
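The multiplicative effects quoted above are incidence rate ratios, which in a count model with a log link are obtained by exponentiating coefficients. The sketch below shows this conversion and how a statement such as "2.84 times fewer" maps onto a ratio below 1. It is illustrative only: the standard error is invented, and the Wald interval used here is an assumption that need not match how the reported confidence intervals were computed.

```python
from math import exp, log

def irr_with_ci(beta, se, z=1.96):
    """Convert a log-scale coefficient and standard error to an IRR with a Wald 95% CI."""
    return exp(beta), exp(beta - z * se), exp(beta + z * se)

def flip_reference(irr, lower, upper):
    """Re-express a ratio with the comparison reversed (interval bounds swap)."""
    return 1 / irr, 1 / upper, 1 / lower

# Coefficient chosen so that exp(beta) equals the reported 1.88-fold
# juvenile:adult ratio; the standard error of 0.2 is hypothetical.
beta, se = log(1.88), 0.2
irr, lo, hi = irr_with_ci(beta, se)
print(f"IRR = {irr:.2f} (Wald 95% CI {lo:.2f}-{hi:.2f})")

# "2.84 times fewer vOTUs" in neonates is a neonate:juvenile ratio of 1 / 2.84.
neonate_vs_juvenile = 1 / 2.84
print(f"neonate:juvenile IRR = {neonate_vs_juvenile:.3f}")
```

An interval that excludes 1 on the ratio scale corresponds to a coefficient interval that excludes 0 on the log scale, which is the criterion used for significance in Figure 3.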
Results of GLMs modelling viral richness. Agreement between predicted and observed numbers of vOTUs in pool shows model fit (a). Standardised effect sizes (b) show the multiplicative effect of a difference from the reference level (categorical variables) or of an increment in the variable (continuous variables) for covariates included in GLMs of vertebrate‐infecting viral richness, with bars representing a 95% CI. The dashed line shows an incidence rate ratio of 1 (i.e., multiplication by 1, meaning no change). Covariates whose 95% CI overlapped 1 are not considered significant predictors of viral richness. Reference levels are as follows: Host family—Balaenopteridae, life stage—juvenile, time period—1 (time period 1 corresponds to 2016–2017, time period 2 to 2018–2019 and time period 1&2 to pools with a mixture of samples from both time periods). (c) Shows viral richness of pools from different life stages, with a pronounced increase in the number of vOTUs seen in pools of juvenile animals. Host families are denoted by colour: Delphinidae (dolphins) = yellow; Ziphiidae (beaked whales) = pale blue; Phocidae (true seals) = dark blue; Phocoenidae (porpoises) = red; Balaenopteridae (baleen whales) = magenta; Physeteridae (sperm whales) = grey.
Discussion
4
Studies of metagenomic viral diversity can identify patterns in virus transmission and sharing in wildlife, but remain challenging for difficult‐to‐sample taxa such as marine mammals. Here, we present a novel approach making use of repurposed samples to answer questions about viral diversity in a hard‐to‐access wildlife system comprising protected keystone species of large carnivorous mammals. Using metatranscriptome sequencing to characterise viral communities, we show patterns of viral transmission structured by host family in marine mammals and demonstrate elevated viral richness in juvenile animals across all sampled taxa. Furthermore, we find no effect of sex on virus diversity or community composition.
Clustering into vOTUs on sequence identity, we show sharing of highly similar viruses between pools, both within and between species. Cross‐species transmission is known to be an important process in establishing viruses in new host taxa (Geoghegan et al. 2017). In the limited samples available to this study, we are able to capture multiple examples of viruses crossing species boundaries. Although the precise nature of these data does not allow us to distinguish between isolated cross‐species transmission events (i.e., spillovers) and recent host shifts establishing stable transmission of closely related viral lineages in different species, these results suggest that regular interactions between marine mammal species facilitate viral infection, especially given the high degree of nucleotide similarity between sequences from different host species. Clustering and network analysis show that within host orders (Artiodactyla and Carnivora), vOTUs are frequently shared, including between pools from different cetacean families. Interestingly, however, we find only a single vOTU shared between cetaceans and seals. This pattern of separation at the level of host taxonomic order, but relatively free exchange within host orders, is further supported by the observation that host phylogenetic distance correlates with viral community dissimilarity in Mantel tests across all sequencing pools (Mantel's r = 0.274), whereas community composition within orders does not correlate with host phylogenetic distance. Together, these results suggest that transmission among marine mammals is structured by phylogeny, with phylogenetic distance between seals and cetaceans preventing some transmission due to virus host‐specificity, an observation strengthened by the limited host‐phylogenetic structure in non‐vertebrate viral communities. 
Existing high‐level models of global virus sharing across a broad range of host taxa support the observation that viruses are more likely to be shared between members of the same order (Albery et al. 2020). However, these models also suggest more frequent sharing between species with overlapping home ranges and that artiodactyls and carnivores are the orders with the most frequent out‐of‐order sharing. Ultimately, robustly establishing the patterns of viral sharing in a community of cohabiting organisms requires comprehensive sampling (French et al. 2023) which is difficult in free‐living organisms. Enriching the picture of sharing among marine mammals we present here would require new sampling techniques and significant investment.
Multispecies aggregations surrounding areas of abundant food are common among marine wildlife (birds, fish and mammals) and provide a likely theatre for interspecies transmission among marine mammals. Dietary analysis and behavioural surveys show overlapping foraging environments and food sources for many species in this study. For example, harbour porpoises ( Phocoena phocoena ) have been observed interacting off the coast of Scotland with minke whales (Dolman et al. 2014), short‐beaked common dolphins (Ryan et al. 2017) and grey seals ( Halichoerus grypus ) (Leopold et al. 2015), while stable isotope studies of feeding ecology link multiple dolphin species through common prey species and foraging areas (Plint et al. 2023) and the two seal species (harbour seal, Phoca vitulina and grey seal) are linked through shared colonies and by grey seal predation on harbour seals. All of these connections are mirrored in shared vOTUs. Disconcertingly, rising sea temperatures are inducing significant shifts in home ranges of cetaceans due to shortages and movement of prey and preferred temperatures (Plint et al. 2023; van Weelden et al. 2021; IJsseldijk et al. 2020, 2018; Williamson et al. 2021; O'Callaghan et al. 2024), leading to new overlaps between species, or an increased overlapping area. Based on these findings, it seems likely that new interactions between marine mammal species will lead to introductions of viruses into naive populations and changes in prey availability may drive increased cross‐species transmission as populations are forced closer together, presenting a second order threat to these populations from climate change. Modelling studies have begun to quantify this risk in terrestrial ecosystems (Carlson, Albery, et al. 2022), but this work suggests the need for similar appraisals of marine ecosystems.
A particularly striking finding from the analysis of viral richness presented here is the increased diversity of vertebrate‐infecting vOTUs in pools from juvenile animals. Models estimate that pools of juveniles have 1.88 times the number of vOTUs as adult pools, a finding consistent both with other metagenomic studies in bats (Bergner et al. 2020) and birds (Wille et al. 2021; Hill et al. 2023) and with the dynamics of transmission in many specific host‐pathogen systems (Van Bressem et al. 2009; van Dijk et al. 2014; Ashby and Bruns 2018). The explanation provided for this in viral community data (and more generally) is based on the premise that younger individuals have been exposed to fewer infections, and as such are more susceptible due to a smaller adaptive immune repertoire (Ashby and Bruns 2018; Bergner et al. 2020; Hill et al. 2023). That this effect is not also visible in the richness of vOTUs from non‐vertebrate infecting taxa strengthens the evidence that this difference is driven by a difference in susceptibility to viral infection between juveniles and adults. Models also identify a significant increase in viral diversity for juveniles compared to neonate pools (incidence rate ratio of 2.8×, p < 0.05). This suggests that exposure history is not the only determinant of viral diversity, as neonates should have an even more restricted exposure history (and therefore adaptive immune repertoire) than juveniles. This difference may be partially explained by maternal antibodies protecting neonates against viruses circulating in cohabiting conspecifics. Social behaviour in marine mammals develops over the course of the transition out of maternal dependency (Ham et al. 2022; Kovacs 1987; Hill et al. 2013; Gibson and Mann 2008; Delfour et al. 2021), meaning that exposure is likely to increase as individuals mature. 
Furthermore, based on increased cortisol levels in juvenile harbour porpoises, Kershaw et al. (2017) speculate that young animals may also experience greater levels of physiological stress due to difficulties balancing foraging with nutritional requirements. While similar observations are lacking for other species in this study, if this trend is consistent between species, it could also explain some of the variation in viral diversity driven by age. It should be noted that neonate pools also contained significantly fewer non‐vertebrate infecting viruses than juveniles. This may be explained by the fact that neonates are likely to have restricted diversity in their microbiome (as observed in human neonates; Sanidad and Zeng 2020), which will contribute many of the non‐vertebrate infecting viruses detected, although this hypothesis would require explicit testing by comparing microbiome and virome diversity. It is also possible that some phages were present due to post‐mortem colonisation with decomposers. Vertebrate‐infecting viral community composition may also be partly determined by host life stage (R² approximately 0.1, Tables S3 and S4), suggesting that as well as having more diverse viral communities, young animals were infected by different viruses from those infecting adults. However, this hypothesis requires further testing on a larger dataset, as the p‐values associated with these estimates were marginally higher than the threshold for significance (p = 0.065 and p = 0.063, in tests summarising host taxonomy at two levels). Further investigation might reveal some viral taxa or viruses with particular infection strategies or life‐history traits that preferentially infect animals of different ages. Future studies might also benefit from investigating individual viral communities as opposed to pooled samples. Although every effort was made to control for pool size in this study, individual level data would provide more conclusive evidence of these effects, while explicitly accounting for the possible presence of outlying individuals. 
Altogether, these results suggest marine mammals of different ages experience viral infection differently. It follows that changes in age structure in marine mammal populations are likely to impact the diversity and abundance of viral infection at the population level. This is consistent with a model of viral infection whereby individuals accumulate lifelong immunity to some viruses after acute infections, but are sporadically subject to new infections due to novel viruses, antibody waning, or antigenic evolution. Where assessed, age structure in marine mammal populations appears highly labile, manifested in variable birth rates and juvenile survival rates (Holmes et al. 2007; Holmes and York 2003; Hostetler et al. 2021; Wickens and York 1997; Whitehead and Shin 2022). While the factors responsible for this are not always clear, excessive historic harvesting of some species has resulted in a complex mixture of co‐occurring forces acting on modern populations. This is exemplified by the sperm whale, examined in depth by Whitehead and Shin (2022). Although most sperm whale populations globally are recovering from historic whaling, populations in ecoregions most affected by anthropogenic disturbance are recovering more slowly, with these populations more stubbornly retaining the higher proportion of juveniles entrenched by preferential hunting of mature animals. Furthermore, commercial exploitation disproportionately impacted adult males. For these reasons recovery is leading to a change in population demographics. Ongoing monitoring of these populations, and others like them, will reveal whether these changes lead to differing forces of viral infection and changes in the burden of viral disease.
We found no effect of sex on viral diversity. To our knowledge, this is largely in line with other virome studies (although Feng et al. 2022 show an association between sex and community composition in mosquitoes). However, we might have expected to see increased viral infection in male marine mammals, due to sex differences in contaminant burden. Persistent pollutants (e.g., heavy metals and persistent organic pollutants (POPs) such as polychlorinated biphenyls (PCBs)) accumulate in marine mammals as top predators of food webs which have high contaminant burdens. The multifactorial immunosuppressive effect of these contaminants has now been extensively characterised (reviewed in Desforges et al. 2016), including in the same geographical areas as this work or areas overlapping with it. Studies of contaminant burden in marine mammals in UK waters have consistently reported high levels (Williams et al. 2023; Madgett et al. 2022; Kershaw and Hall 2019), above the critical thresholds for immune suppression. Crucially, levels of contaminants show pronounced variation by sex, as reproductively active females pass on contaminants to offspring, both transplacentally and in milk (Binnington and Wania 2014), decreasing their own burden in the process. It would therefore be reasonable to suggest that adult male marine mammals might have higher burdens of viral infection due to a lifetime of bioaccumulation of potential immunosuppressants, while adult females might be expected to have lower viral diversity. However, the well‐characterised contaminant bioaccumulation process does not appear to directly affect viral diversity, as neither sex nor the interaction of sex with age has an estimated effect in the models presented here. Current data do not allow us to comment on the viral load or clinical severity of viral infections observed, which are likely to also be affected downstream of diversity. 
This relationship might also be complicated by a role for the modified immune state induced by pregnancy seen in other mammals (e.g., Behringer et al. 2024).
Other than in the case of one host family, host taxonomy also showed no significant effect on viral diversity. The family in question, Ziphiidae (beaked whales), was represented by only one sequencing pool in GLMs of viral richness, as the other pool from this family contained a mixture of life stages and was excluded. This means this effect must be treated with extreme caution and further surveillance is required to investigate viral diversity in this taxon. Disregarding this, the taxonomy of hosts, even between mammalian orders, seems to have no hand in shaping the richness of marine mammal viral communities. This is at odds with findings from other systems (Chen et al. 2023; Costa et al. 2023; Pan et al. 2023), where taxonomy affects diversity, possibly as a result of differing life history traits. Species of marine mammal included in this work cover a remarkable range of life histories, including a roughly 300‐fold difference in body weight between the smallest and largest species, a 100‐fold difference in group size, a 10‐fold difference in life expectancy, partially terrestrial versus fully marine lifestyles, and dramatically different foraging environments and behaviour. This makes the lack of any effect of host taxonomy on diversity particularly surprising, although this lack of effect probably explains the inability of ecological variables included in the model to determine diversity. However, robust validation with a larger dataset would be required to confirm this. If this pattern holds, it may lend weight to the hypothesis that observed taxonomic effects on viral richness reflect uneven sampling effort across taxa with different life histories rather than a true biological effect (Mollentze and Streicker 2020). Further work encompassing taxa with differing life histories in the same study is necessary to reach any conclusion on this point, as some evidence does exist supporting this phenomenon in one system (Costa et al. 2023). 
In any case, this finding cautions in favour of a system‐specific approach to understanding determinants of viral diversity, despite the seeming universality of life stage effects.
Although rarefaction analysis suggests that viral communities in sequencing pools are completely characterised, it is likely that this does not represent a complete picture of viral infection in this ecosystem. One key limitation of metagenomics as a measure of viral diversity is that it only captures a snapshot of viruses actively infecting the small section of tissue sampled, in this case at time of death. Although viruses have developed many ways to protect their RNA from degradation (Dickson and Wilusz 2011), it is likely that the difficulty of acquiring samples immediately post mortem and transporting them with adequate cold chain will have led to reduced RNA integrity. Pooling also reduces the effective concentration of viral RNA from each individual sample in the final sequencing pool, although it increases the search space of the experiment. The crucial question for detection is whether, after all of these processes, sufficient viral material remains in the pool to be detected among the noise of host and bacterial RNA, given that the relative amount of viral RNA in the pool is very low to begin with; less than 1% of total reads in any pool mapped to vertebrate viruses. Both positive controls and experimental results show that the method used here is capable of detecting viral material, but it is possible that lower abundance RNA may have gone undetected. This observation is supported by the fact that not all viruses present have complete genome coverage. Beyond physical processes, sampling processes may also have an impact on viral detections. First, while this study is by far the most comprehensive attempt to quantify the viral communities of marine mammals, the sample size remains small (128 individuals in 42 pools). Indeed, unsaturated accumulation curves suggest that further viral diversity remains in this system that was not sampled by this study. 
Second, detection using sequence identity relies on moderately related viruses being present in the databases used for detection thanks to previous sampling. While viromes of terrestrial artiodactyls and carnivores have been extensively explored, marine mammal viruses are less well‐represented in databases. These results must therefore be regarded as a conservative estimate of viral diversity in marine mammals and might be expanded by more extensive sampling, possibly using more sensitive methods (e.g., PCR or viral bait capture sequencing). However, we have no reason to suspect that possible undersampling of diversity differs systematically between sequencing pools or affects conclusions from models, as we see saturation of rarefaction curves across pools. Previous viral community studies (e.g., Hill et al. 2023; Raghwani et al. 2023) have improved the resolution and power of their statistical inferences by considering not only presence or absence of viral taxa, but also the abundance of individual taxa, expressed as the number of reads mapping to each taxon. However, the degradation processes we outline above are likely to have acted heterogeneously on different viruses and different individual hosts, due to differences between the environmental resilience of viral genomes and the differing levels of decomposition and times between collection and proper storage experienced by samples from different individuals. Given that samples from different hosts experiencing a range of these conditions were pooled, and pools contained different numbers of individuals, we refrained from drawing conclusions about relative virus abundance within or between pools without a robust method to verify differential abundance.
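To illustrate the distinction drawn above between saturation within a pool and accumulation across pools, the following is a minimal Python sketch of a sample‐based accumulation curve computed from presence or absence data. It is not the analysis code used in this study: the function name `accumulation_curve`, the toy pools and the vOTU labels are all hypothetical, and averaging over random pool orderings is simply one common way to build such a curve, assumed here for illustration.

```python
import random


def accumulation_curve(pools, n_permutations=200, seed=0):
    """Mean number of distinct taxa seen after adding 1..N pools in random order.

    `pools` is a list of sets, one per sequencing pool, each holding the
    taxa detected in that pool (presence/absence only, no read counts).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    totals = [0.0] * len(pools)
    for _ in range(n_permutations):
        order = pools[:]
        rng.shuffle(order)  # random order in which pools are "sampled"
        seen = set()
        for i, taxa in enumerate(order):
            seen |= taxa  # add any taxa not observed in earlier pools
            totals[i] += len(seen)
    return [t / n_permutations for t in totals]


# Hypothetical toy data: five pools and five made-up taxa.
toy_pools = [
    {"vOTU_A", "vOTU_B"},
    {"vOTU_B"},
    {"vOTU_C", "vOTU_D"},
    {"vOTU_A", "vOTU_E"},
    set(),  # a pool in which no virus was detected
]

curve = accumulation_curve(toy_pools)
# Average gain in taxa contributed by the last pool added.
final_slope = curve[-1] - curve[-2]
print(curve, final_slope)
```

A curve that is still rising when the last pool is added, as in this toy example, indicates that sampling additional pools would be expected to reveal further taxa, even if each individual pool is itself well characterised.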
A further consideration is that population‐level inferences based on data and samples taken from stranded individuals should be made with caution. A number of possible biases are exerted through the process of stranding, resulting from a combination of biotic and abiotic factors acting at the individual and population levels (Williams et al. 2011; IJsseldijk et al. 2020), and also through the reporting of strandings (ten Doeschate et al. 2018; Coombs et al. 2019). In this work, we did not consider any animals for which viral infection was established as a cause of death. Nevertheless, other health factors leading to stranding may indirectly affect viral diversity, along with factors such as ocean currents, which mean that not all mortality is observed, and seasonal variation in strandings reporting. It is therefore possible that the estimates of diversity presented here do not fully reflect the viral communities of living animals. However, exposure history, and thus community assemblage and cross‐species transmission, is unlikely to be affected by this process, with any effect most likely to manifest as an underestimate, as described above. Furthermore, assessing viral communities of living marine mammals, and cetaceans in particular, on this scale is beyond the current scope of field sampling techniques. Repurposing these samples from routine surveillance therefore adds clear value by contributing to the body of knowledge about marine mammal population health. As new sampling techniques develop further (Geoghegan et al. 2018; O'Mahony et al. 2024; Apprill et al. 2017; Centelleghe et al. 2016), it may be possible to validate these results using samples from living animals.
In this work, we provide the first multispecies exploration of marine mammal viral community ecology, demonstrating that opportunistic repurposing of existing samples can be used to address questions about viral diversity. In doing so, we show that patterns of viral diversity seen in terrestrial mammals hold true in mammals inhabiting inaccessible marine ecosystems and present the first clear evidence of reduced viral diversity in neonatal wild animals. The patterns of virus sharing in this dataset reveal a complex and connected web of transmission events between species and, in one case, between host orders, reflecting previously observed ecological interactions. Combined with global introductions of high‐consequence viral pathogens in marine mammals and the increasing pressures on these protected species, the highly connected nature of these populations paints a picture of vulnerability to introductions of disease‐causing viruses.
Author Contributions
Matthew J. Arnold: funding acquisition, conceptualisation, data curation, formal analysis, investigation, methodology, software, visualisation, writing – original draft, writing – review and editing. Laura M. Bergner: methodology (advice on experimental design), supervision. Haris Malik: investigation (laboratory training and assistance). Mariel ten Doeschate: funding acquisition, supervision, resources, writing – review and editing. Nicholas J. Davison: funding acquisition, supervision, resources, writing – review and editing. Andrew Brownlow: funding acquisition, supervision, resources. Nardus Mollentze: methodology, supervision, writing – review and editing. Simon A. Babayan: methodology, supervision, writing – review and editing. Daniel G. Streicker: methodology, supervision, writing – review and editing.
Funding
This work was supported by the Wellcome Trust (217221/Z/19/Z, 218518/Z/19/Z), the Biotechnology and Biological Sciences Research Council (BB/V003798/1, DEB 2011069), the Leverhulme Trust (PLP‐2020‐362), the Natural Environment Research Council (NE/X01424X/1), the Bill and Melinda Gates Foundation (INV‐003079, INV‐030025), the Scottish Government and the Medical Research Council (MC_UU_00034/3).
Conflicts of Interest
The authors declare no conflicts of interest.
Supporting information
Data S1: mec70294‐sup‐0001‐Supinfo01.pdf.
References
- Albery, G. F., E. A. Eskew, N. Ross, and K. J. Olival. 2020. “Predicting the Global Mammalian Viral Sharing Network Using Phylogeography.” Nature Communications 11, no. 1: 2260. https://doi.org/10.1038/s41467-020-16153-4. PMID: 32385239. PMCID: PMC7210981.
- Altschul, S. F., W. Gish, W. Miller, E. W. Myers, and D. J. Lipman. 1990. “Basic Local Alignment Search Tool.” Journal of Molecular Biology 215, no. 3: 403–410. https://doi.org/10.1016/S0022-2836(05)80360-2. PMID: 2231712.
- Andrews, S. 2010. “FastQC: A Quality Control Tool for High Throughput Sequence Data.” https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
- Apprill, A., C. A. Miller, M. J. Moore, J. W. Durban, H. Fearnbach, and L. G. Barrett‐Lennard. 2017. “Extensive Core Microbiome in Drone‐Captured Whale Blow Supports a Framework for Health Monitoring.” mSystems 2, no. 5. https://doi.org/10.1128/mSystems.00119-17. PMID: 29034331. PMCID: PMC5634792.
- Ashby, B., and E. Bruns. 2018. “The Evolution of Juvenile Susceptibility to Infectious Disease.” Proceedings of the Royal Society B: Biological Sciences 285, no. 1881: 20180844. https://doi.org/10.1098/rspb.2018.0844. PMID: 29925619. PMCID: PMC6030539.
- Behringer, V., C. Deimel, J. Ostner, B. Fruth, and R. Sonnweber. 2024. “Modulation of Cell‐Mediated Immunity During Pregnancy in Wild Bonobos.” Biology Letters 20, no. 3: 20230548. https://doi.org/10.1098/rsbl.2023.0548. PMID: 38471567. PMCID: PMC10932712.
- Bento, M. C., R. Canha, C. Eira, et al. 2019. “Herpesvirus Infection in Marine Mammals: A Retrospective Molecular Survey of Stranded Cetaceans in the Portuguese Coastline.” Infection, Genetics and Evolution 67: 222–233. https://doi.org/10.1016/j.meegid.2018.11.013. PMID: 30445114.
- Bergner, L. M., R. J. Orton, J. A. Benavides, et al. 2020. “Demographic and Environmental Drivers of Metagenomic Viral Diversity in Vampire Bats.” Molecular Ecology 29, no. 1: 26–39. https://doi.org/10.1111/mec.15250. PMID: 31561274. PMCID: PMC7004108.
