Single-cell analysis of a salmonid immune system (river brown trout Salmo trutta fario) reveals evolutionary divergence and hatchery-induced transcriptional reprogramming
James Ord, Helena Saura Martinez, Monica Hongroe Solbakken, Anastasiia Berezenko, Simone Oberhaensli, Stephanie Talker, Heike Schmidt-Posthaus, Irene Adrian-Kalchhauser

TL;DR
This study uses single-cell analysis to explore the immune system of river brown trout, revealing evolutionary changes and how hatchery rearing affects immune gene expression.
Contribution
The study provides the first single-cell transcriptome of immune cells in river brown trout and identifies hatchery-induced transcriptional changes.
Findings
The study identified 34 distinct immune cell populations and novel markers in neutrophils, macrophages, T-cells, and B-cells.
Transcriptional divergence was observed between WGD-derived ohnologue pairs, suggesting sub- and neofunctionalization.
Hatchery-reared fish showed altered immune gene expression, indicating rearing history shapes immune cell identity.
Abstract
Vertebrate immune systems exhibit striking evolutionary diversity, yet our understanding remains biased toward mammalian models. Here, we generate a single-cell transcriptome of immune cells from the ecologically and economically important salmonid Salmo trutta fario (river brown trout), a lineage characterized by an ancestral whole-genome duplication (WGD). Profiling over 83,000 kidney-derived immune cells, we resolved 34 transcriptionally distinct populations, identified core immune lineages, and uncovered novel markers in neutrophils, macrophages, T-cells, and B-cells. We detected pervasive transcriptional divergence between WGD-derived ohnologue pairs, indicating putative sub- and neofunctionalization in immune gene regulation. We further show that the transcriptional identity of immune cells is shaped by rearing history: fish raised in hatcheries—whether for one or multiple…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 10
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8
Figure 9- —Institut für Fisch- und Wildtiergesundheit, Universität Bern
- —https://doi.org/10.13039/501100001711Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung
- —University of Helsinki (including Helsinki University Central Hospital)
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSingle-cell and spatial transcriptomics · Aquaculture disease management and microbiota · Zebrafish Biomedical Research Applications
Background
Vertebrate immune systems comprise some of the most complex systems in nature and exhibit extraordinary diversity between species [1]. Differences range from the structural level (e.g., in teleost fish, the kidneys are the principal source of immune cells, not bone marrow as in mammals) through the level of cell types (distinct cell types found in some species but not others) to the genetic level (in terms of enzymes and gene repertoire [2]). Single-cell RNA sequencing of immune cells offers the opportunity to map this diversity in unprecedented detail beyond model organisms, yields insights into evolutionary processes, and provides useful data for veterinary and human medicine.
Among vertebrates, fish immune systems display particularly interesting evolutionary trajectories and have expanded our understanding of immunity in the past. Teleost immune gene repertoires display species-specific non-canonical features such as loss of MHCII (major histocompatibility complex II, highly conserved protein essential for exogenous antigen presentation in vertebrates), novel subfamilies within NLR (nucleotide-binding domain and leucine-rich repeat) and TLR (toll-like receptor) pattern recognition receptor families, and loss of antibodies [3–7]. Therefore, the characterization of immune system setups in fish can be challenging from genome sequence alone, where gene models may be incomplete or lack gene identification due to sequence divergence, or be misannotated due to highly variable gene families.
River brown trout (Salmo trutta fario) are an ecologically and economically relevant fish species from the salmonid family [8]. As well as their economic importance as a game fishing species, they also play a crucial ecological role in their native range as apex predators in lotic alpine streams by controlling populations of smaller fish and insects and influence the overall structure and health of freshwater ecosystems through cascading effects. Brown trout in their native range are especially susceptible to pollution [9], habitat degradation [10], and changes in water temperature [11]. As such, they serve as sentinel species for riverine health in their native range and populations may be especially susceptible to climate change impacts [12]. Their ability to mount appropriate immune responses depends on water temperature [13, 14] and is additionally impacted by water quality parameters [15, 16]. Large stocking programs are in place to support wild populations, but the stocked fish fail to thrive [17], may be less resistant to pathogens, and the conditions experienced during raising offspring in artificial environments before release affect disease incidence [18]. This renders the immune system of brown trout an important area of study.
Phylogenetically, brown trout is located at an intermediate position between whitefishes (Coregonidae) and trouts (Oncorhynchidae) [19]. Despite the suggestive name, they are closer relatives of salmon (common ancestor around 12 Mya) than of rainbow trout (common ancestor around 30 Mya; [19, 20]). Their history of hybridization and isolation, and even a potential multispecies taxonomy for brown trout populations are currently being discussed [21–23]. As salmonids, brown trout have experienced a whole-genome duplication event approx. 80 Mya, followed by rediploidization and loss or retention of the resulting gene copies [24–26]. The high-quality reference genome of brown trout [27] suggests that the brown trout immune system largely follows the standard vertebrate build in terms of genes present, but features extensive gene duplications in many pathways involved in immune cell differentiation and proliferation [28], with unclear functional consequences. Among genes retained as duplicates, fates may include unchanged retention of both copies, pseudogenization (because loss of function mutations are less detrimental), neofunctionalization (divergence of function), or subfunctionalization (division of roles in the same function) between the retained copies. In the related rainbow trout, three quarters of such WGD-derived gene duplicates (“ohnologues”) diverge in expression patterns and levels across bulk samples of various organs [29]. Understanding patterns of neofunctionalization specifically in brown trout immune genes thus has the potential to complement existing data and provide perspective with regard to species-specific versus universal effects of WGD.
This study provides baseline data for the healthy brown trout cellular immune system using single-cell RNA sequencing of kidney-derived immune cells from nine S. trutta individuals. We used brown trout from the same catchment, but three different developmental and parental backgrounds. Three each were wild-caught animals derived from wild parents (“wild”), farm-raised animals derived from wild parents (“mix”), and farm-raised animals derived from farm-raised parents (“farm”). This setup allowed us to (a) describe the immune cell types of brown trout, (b) identify novel markers for particular cell types, (c) investigate expression patterns of immune gene ohnologues to identify neofunctionalizations, and (d) describe the effects of rearing environment on cellular immunity. Specifically, we identify cell types based on previously established marker genes, explore the identity of cell types with non-canonical gene expression, and identify cell-specific marker genes for future use in, e.g., qPCR-based analyses of immune functions. We explore evolutionary peculiarities in brown trout by analyzing expression patterns of salmonid-specific ohnologues and propose functional diversification patterns for some of them. To investigate the impact of short-term environmental challenges on immune function, we compare three groups of brown trout with natural vs farmed rearing histories in terms of gene expression profiles at the cluster level.
Results
Cell lineages and subtypes
Single-cell RNA sequencing from 83,847 hematopoietic cells, enriched from kidneys of nine individual fish (Fig. 1A), was used to identify 34 putative immune cell types using data from all individuals pooled. An initial set of 29 clusters (Additional file 1: Fig. S1) obtained from the top 2000 variably expressed genes was further refined to 34 final clusters based on the expression of lineage markers. Clusters were visualized by Uniform Manifold Approximation and Projection (UMAP, Fig. 1B). Thirty-one of these clusters could be assigned to the major hematopoietic cell lineages based on expression modules of “prior markers” (Fig. 2A, Table 1; see also Additional file 2: Table S1 for full list of prior markers). The majority of cells were putative neutrophils and B-cells, respectively (Fig. 2B).Fig. 1. Overview of the brown trout cellular immune system. A Analysis setup. Kidney tissue was dissected from 9 juvenile Salmo trutta fario of 7 months of age from three different backgrounds: three each were wild-caught animals derived from wild parents (“wild”), farm-raised animals derived from wild parents (“mix”), and farm-raised animals derived from farm-raised parents (“farm”). Hematopoietic cells were extracted and used for single-cell RNAseq (scRNAseq). Cluster identity was assigned based on prior markers, newly identified cluster markers, GO terms, and pseudotime analysis. B 34 immune cell clusters. UMAP projection of 83,847 cells (data pooled from all nine individuals) grouped into 34 clusters according to shared gene expression variation. Of the 34 clusters, 31 were assigned putative identities, while 3 remain unassignedFig. 2Cluster characteristics. A Module expression scores for neutrophil markers (n = 9), monocyte/macrophages (n = 7), T-cells (n = 22), and B-cells (n = 12) in the 34 clusters. B Cluster identities and sizes, the latter in terms of total cell counts across all nine individualsTable 1Prior markers of four major immune cell lineages whose S. trutta homologues were variably expressed in the scRNAseq datasetCell lineageProtein name (alternate name)S. trutta homologue gene name (ENSEMBL)S. trutta homologue ENSEMBL gene IDNeutrophilsCD182 (CXCR2)ENSSTUG00000013662CPA5CPA1ENSSTUG00000021190HCE^a^ENSSTUG00000011854LCE^a^c6ast1ENSSTUG00000007607LYZ^a^LYZENSSTUG00000025205MMP9mmp9ENSSTUG00000015832MMP9mmp9ENSSTUG00000038105Macrophages/monocytesCD206 (MRC1)MRC1ENSSTUG00000000974CD74ENSSTUG00000006024CD74ENSSTUG00000008754COX2 (PTGS2A)ptgs2aENSSTUG00000026620IRF8irf8ENSSTUG00000021023MARCOMARCOENSSTUG00000016758MPEG1.1MPEG1ENSSTUG00000031848MPO (MPX)mpxENSSTUG00000006763MPO (MPX)mpxENSSTUG00000007163T-cellsBCL11BENSSTUG00000025217CD28ENSSTUG00000013913CD28ENSSTUG00000041050CD3DENSSTUG00000034782CD3EENSSTUG00000020148CD4CD4-2aENSSTUG00000038814CD4CD4-1ENSSTUG00000038833CD8AENSSTUG00000018765CD8BENSSTUG00000037440FOXP3ENSSTUG00000024213GATA3gata3ENSSTUG00000023944IL17A (a/f in fish)^a^il17a/f1ENSSTUG00000016782IL21il21ENSSTUG00000042553IL7RENSSTUG00000021260TBX21 (T-bet)tbx21ENSSTUG00000037670TCF7ENSSTUG00000002790TCF7tcf7ENSSTUG00000023812TRBC1 TCR beta constant 1ENSSTUG00000037152TRGC1 TCR gamma constant 1ENSSTUG00000007088TRGC1 TCR gamma constant 1ENSSTUG00000007103TRGC1 TCR gamma constant 1ENSSTUG00000007109ZAP70zap70ENSSTUG00000023942B-cellsAICDAaicdaENSSTUG00000016342BCR heavy IgDENSSTUG00000016267BCR heavy IgMENSSTUG00000010446BCR light IgL kappa 1 IGKC1IGKCENSSTUG00000027283BCR light IgL kappa 2 IGKC2IGKCENSSTUG00000028030BCR light IgL lambda IGLC1ENSSTUG00000001951BCR light IgL lambda IGLC1ENSSTUG00000015148CD37ENSSTUG00000006678CD79Acd79aENSSTUG00000031393CD79BENSSTUG00000027061CXCR5 (CD185)CXCR5ENSSTUG00000007343TNFRSF9tnfrsf9aENSSTUG00000031635^a^Fish-specific
At a finer scale, putative cluster identities were inferred from genes that were overexpressed in the respective cluster at least twofold (i.e., log2FC >/= 1 relative to cells of other clusters) and were expressed in 60% of the cells. These genes are hereafter referred to as “cluster markers” (see Additional file 2: Table S2). Of 623 genes meeting these criteria, 600 were ENSEMBL genes and 23 were previously unannotated gene loci identified from bulk RNAseq data (see Methods). A total of 295 cluster markers were characteristic for a single cluster, and 328 cluster markers characterized more than one cluster. Prior markers and cluster markers together were sufficient to assign a putative cell type to 31 of 34 clusters (80,166 out of 83,847 cells) (Fig. 1B), including two non-immune clusters (RN1 and RN2, renal cells). Three remaining unidentified clusters (NA1, NA2, and NA3) could not be identified either due to lack of prior marker expression, lack of cluster marker expression in a high enough proportion of cells, or ambiguous expression patterns (e.g., markers of multiple cell types). Cluster NA2 co-expressed both B-cell and neutrophil markers and thus could not be assigned to either lineage with certainty. We further employed overrepresentation tests to identify enriched Gene Ontology (GO) terms among cluster markers to further support fine-scale cluster characterization, although these results were largely uninformative: 23 out of 34 clusters yielded enriched GO terms, most of which comprised only 3–4 cluster markers (Additional file 2: Table S3).
Proliferation markers identify several progenitor populations
Actively proliferating cell populations were identified by their expression of two prior marker genes associated with proliferation: pcna (proliferating cell nuclear antigen; two homologues) and mki67 (one homologue). Proliferation markers were widely expressed in neutrophil cluster N3, T-cell cluster T6 and B-cell cluster B4, and within subpopulations of macrophage cluster M1 and putative myeloid progenitor cluster MP (Fig. 3). The identification of these (sub)clusters as proliferating progenitor cells was further supported by their expression of H2AZ (histone variant H2AZ) and pclaf (PCNA clamp associated factor; Fig. 3B).Fig. 3. Proliferating cells. A Module expression scores for proliferation markers (n = 3) across all clusters. Clusters N3, M1, T6, B4, and MP contain substantial proliferating (sub)populations. B UMAP projections highlighting cells expressing two prior markers of proliferation (pcna and mki67) and two further genes (H2AZ and pclaf) with similar expression patterns and cell cycle functions. Clusters showing high proportions of inferred proliferating cells are circled. Gene names are as given in the ENSEMBL annotation (without altering case). *Denotes gene is in a list of prior markers used to infer putative cell type or state
Neutrophils
We identify two major neutrophil subpopulations with specialized roles: one characterized by high expression of HCE1-like and another by abundant metallopeptidase gene expression. Overall, neutrophil clusters (N1 to N6, Fig. 4A) formed a large somewhat unstructured agglomeration, consistent with neutrophils being (a) the most abundant immune cell type and (b) a rather homogeneous population compared to other immune cell types. Neutrophil heterogeneity is however increasingly appreciated [30], and indeed, 78 high-abundance cluster markers (Additional file 2: Table S4) were heterogeneously expressed between the six subtypes (Fig. 4B).Fig. 4. Neutrophils. A Distinguishing features of neutrophil clusters N1–N6. B Unique vs shared high-abundance cluster markers (expressed in at least 60% of cells in the cluster). Markers shared with non-neutrophil clusters were excluded. C–E Examples of cluster marker expression patterns. Normalized expression values of markers C shared by the major neutrophil clusters N1 and N2, D biased toward N2, and E expressed in N1. Gene names are as given in the ENSEMBL annotation (without altering case) except for entries with no gene names, in which case they are named according to protein homology (e.g., FBP1-like). Clusters for which the gene is a high-abundance marker (expressed in at least 60% of cells) are listed. Where relevant, clusters for which the gene is a non-high-abundance marker (expressed in < 60% of cells) are also indicated. *Denotes genes that are also prior markers
Most neutrophils were collected in N1 and N2, and these two clusters also shared the largest number of markers, suggesting that markers of N1 and N2 can serve as a proxy for neutrophil fate. This set of “pan-neutrophil” markers includes homologues of human taldo1, tktb, and FBP1 (Fig. 4C), which play a role in neutrophil responses to LPS exposure [31, 32]. Markers of N1 and N2 were generally enriched for the GO term “carbohydrate metabolic processes,” indicating a high energetic demand consistent with pro-inflammatory cells.
A few markers were more restricted in their expression and able to differentiate between N1 and N2. A striking example of an N2 marker is HCE1-like (high choriolytic enzyme 1-like, Fig. 4D). HCE is responsible for breaking down the chorion during hatching of teleost fish [33] but has more recently emerged as a marker of teleost granulocytes [34, 35]. Other N2-biased markers include CFD1-like (complement-factor D-like), the human homologue of which inhibits leukocyte degranulation and thus prevents host tissue damage [36], and cpa1, a prior neutrophil marker and metallopeptidase.
N1 markers were consistently enzymes including the metalloproteases mmp9, adam28, and mmp13, and glutamate decarboxylase GADL1 (Fig. 4E). The GO term “metallopeptidase activity” was significantly enriched among N1 markers. Lysozyme (lyz), previously identified as a neutrophil-specific marker in zebrafish [37], was active in both neutrophils and macrophages (see Additional file 1: Fig. S3; in line with [38]). Pseudotime analysis of neutrophil clusters (Additional file 1: Fig. S4A) rooted at the putative myeloid progenitor cluster (MP, see Fig. 8) implied that N1 cells are terminally differentiated.
N3 was identified as a pool of actively proliferating cells (see Fig. 3) while N4 and N5 lacked unique markers and may comprise intermediate or immature cells. N6 may represent dead or dying cells given a pattern of high mitochondrial gene expression (Additional file 1: Fig. S2), fewer expressed genes, and low overall gene expression [39].
Macrophages
Five clusters expressed prior markers of macrophages and/or monocytes and were termed M1 (a large cluster) and M2–M5 (four small, fragmented clusters) (Fig. 5A). Prior markers of macrophages detected in our dataset included MPEG1 and irf8, and two copies of CD74, which has been identified as a monocyte lineage marker in zebrafish [40]. Another 25 high-abundance cluster markers were unique to subsets of macrophages (Additional file 2: Table S5), mostly to the largest cluster M1 (Fig. 5B). Antigen-presenting cell markers (Fig. 5C) consistent with canonical macrophage and dendritic cell function were broadly expressed.Fig. 5. Monocyte/macrophages. A Distinguishing features of monocyte/macrophage clusters N1–N6. In the case of M4, 'APC+' is a shorthand for 'antigen-presenting cell-containing'. B Unique vs shared high-abundance cluster markers (expressed in at least 60% of cells in the cluster). Markers shared with non-macrophage clusters were excluded. C Expression of antigen-presenting cell markers. Expression of a gene module comprised CD40, CD74 (2 homologues), MHCII beta-like, and MPEG1. D–E Examples of cluster marker expression patterns. Normalized expression values of markers C shared by multiple clusters and D expressed principally in one cluster. Gene names are as given in the ENSEMBL annotation (without altering case) except for entries with no gene names, in which case they are named according to protein homology (e.g., BPI-like). Clusters for which the gene is a high-abundance marker (expressed in at least 60% of cells) are listed. Where relevant, clusters for which the gene is a non-high-abundance marker (expressed in < 60% of cells) are also indicated. *Denotes genes that are also prior markers
M1 was the largest and most distinct cluster with the highest number of unique high-abundance markers including CD206-like (macrophage mannose receptor 1 homologue) (Fig. 5E) and FCGR1-like (ENSSTUG00000008409, not shown). In mammals, both have roles in phagocytosis [41, 42].
M2, although containing cells expressing macrophage/antigen-presenting cell markers (e.g., MPEG1 in 33% of cells and IRF8 in 28% of cells, see Additional file 2: Table S2), did not possess any high-abundance markers after filtering for markers of non- “M” clusters (and therefore does not feature in Fig. 5B). However, its expression of monocyte marker CD74 in 69% of cells (Additional file 2: Table S2) suggests a candidate cluster for undifferentiated monocytes.
M3 shared four high-abundance markers with M1: three paralogues of complement component 1q (c1qa, c1qb, and c1qc) and OLFM4-like (Fig. 5D). Similarly, GO term enrichment analysis showed markers of M1 and M3 to be enriched for “immune response” and “antigen processing and presentation” (Additional file 2: Table S3). Unique M3 markers include FAPB7 (fatty acid binding protein 7) and epdl1 (ependymin). Fatty acid metabolism is important in macrophages to produce inflammatory mediators [43], while ependymins are a class of purportedly teleost-specific glycoproteins found in cerebrospinal fluid and involved in the central nervous system [44]. Around half of M3 cells also expressed the classical macrophage marker MARCO.
M4 expressed two homologues of monocyte marker CD74 in > 90% of cells and macrophage markers in lower proportion of cells (Additional file 2: Table S2), implying undifferentiated or differentiating monocytes similar to M2. However, it also expressed the unique marker flt3, the murine homologue of which is a major dendritic cell growth factor [45], thus implying a dendritic fate for M4. Cells with dendritic morphology and antigen-presenting capacity have been described in several teleost species (e.g., [46, 47]), but the existence of a true dendritic cell lineage homologous to that of mammals remains under debate due to differences in markers, ontogeny, and evolutionary divergence [48]. Although dendritic cells are thought to derive from a distinct cell lineage than monocytes, they may be functionally similar and thus may resemble monocytes transcriptionally [49].
Uniquely, M5 contained cells expressing classical macrophage marker mrc1a (< 60% of cells), despite not showing elevated antigen-presenting cell (APC) marker expression (Fig. 5E). Generally, the relative scarcity of high-abundance markers in putative macrophage clusters (25 high-abundance markers compared to 75 non-high-abundance markers after excluding markers of non-macrophage clusters) suggests that the level of clustering in this study does not fully grasp macrophage heterogeneity.
T-cells
Seven clusters were identified as T-cells based on prior markers and were designated T1–T7 (Fig. 6A). Interestingly, cluster T7 was represented almost exclusively by fish from the wild-caught group (Additional file 2: Fig. S6). Given the potential of confounding of markers by genetic, environmental, or genotype-by-environment effects, we focused on clusters T1–T6 in our characterization of T-cell subpopulations.Fig. 6T-cells. A Distinguishing features of T-cell clusters T1–T7. B Unique vs shared high-abundance cluster markers (expressed in at least 60% of cells in the cluster). Markers shared with non-T-cell clusters were excluded. C-D Examples of cluster marker expression patterns. Normalized expression values of markers C shared by multiple clusters and D expressed principally in one cluster. Gene names are as given in the ENSEMBL annotation (without altering case) except for entries with no gene names, in which case they are named according to protein homology (e.g., SLAM5-like). Clusters for which the gene is a high-abundance marker (expressed in at least 60% of cells) are listed. Where relevant, clusters for which the gene is a non-high-abundance marker (expressed in < 60% of cells) are also indicated. *Denotes genes that are also prior markers
Of 53 genes identified as T-cell markers (Fig. 6B; Additional file 2: Table S6), only two can be considered “pan” T-cell markers (Fig. 6C): CD3E-like and TRBC1-like (TCR beta constant 1-like), both of which were also prior markers and represent well-characterized T-cell surface receptors. A further five markers were shared by five clusters: STK10-like, def6b, il7r (interleukin 7 receptor), sh2d1ab, and fynb. STK10 is a putative immunoregulator expressed in human T-cells (but also in dendritic cells and NK cells; [50]). Human DEF6 is important for T-cell homeostasis [51], while human SH2D1A is involved with SLAM-signaling (important for T-cell proliferation) and indeed has known interactions with FYN [52].
T1, T4, and T5 could be largely distinguished by expressed homologues of canonical T-cell markers, namely CD2, CD4, CD5, and two homologues of each of CD8 and CD28 (Fig. 6D; only one CD8 homologue shown). T1 represents CD8 + T-cells, as it expressed two homologues of CD8, cd8a and cd8b (a surface protein characteristic of memory and cytotoxic T-cells [53]), in high abundance, indicating their expression as a heterodimer. Of two CD28 homologues, only one was expressed in T1, suggesting some degree of functional divergence between CD28 orthologs. T1 markers were enriched for the GO term “integrin complex” in line with the role of integrins in T-cell activation by promoting stable contact with antigen-presenting cells [54]. Meanwhile, T4 represents CD4 + cells and could be distinguished by expression of both CD28 homologues along with the T-helper cell marker cd4-2b. T5 was the most distinct T-cell cluster with 10 high-abundance markers not shared by other clusters and was characterized by a homologue of canonical T-cell marker CD2 along with SLAM5-like (human homologue also known as CD84) and il2rb, which are markers of regulatory T-cells [55, 56].
T2 expressed only one unique marker expressed in high abundance (likely reflecting a cluster composed of more than one distinct subpopulation which were not completely resolved) but expressed plk3 in combination with ilr2b, the latter suggesting this cluster also contained regulatory T-cells. T2 also uniquely contained cells expressing two unnamed homologues of TRGC1 (TCR gamma constant 1; Additional file 1: Fig. S6), suggesting that a subpopulation of T2 represents the less common gamma-delta T-cells, supporting the previous sequenced-based notion that these cells exist in salmonids [57].
T6 markers were consistent with proliferating cells (Fig. 3), while T3 did not express any unique high-abundance markers and showed signatures of dead or dying cells (high mitochondrial gene expression, few expressed genes, and low overall expression; Additional file 1: Fig. S2).
B-cells
Seven B-cell clusters were clearly separated on the UMAP (Fig. 7A). After excluding markers of non-B-cell clusters, 113 genes were identified as B-cell markers (Fig. 7B; Additional file 2: Table S7). Several largely shared high-abundance markers were identified, but all seven clusters also possessed at least one unique marker (Fig. 7B). A marker shared by at least five B-cell clusters is cd79a (Fig. 7C), while cd79b showed similar expression patterns but was less expressed in the larger clusters B1 and B2. Other “pan” B-cell markers included mef2cb [58], hipk2 [59], and one homologue of EBP41 [60] (genes whose mammalian homologues have been demonstrated to be involved in B-cell proliferation or differentiation) as well as canonical B-cell marker homologue CD37-like and at least six immunoglobulin-like genes (e.g., first two panels of Fig. 7E). In addition, B-cell markers included several genes homologous to Ig proteins, some lacking annotation but containing predicted IG domains (see Additional file 2: Table S7). These tended to show variable expression between and within clusters (Fig. 7E) and included an IGLL1-like gene which was uniquely overexpressed in B3.Fig. 7B-cells. A Distinguishing features of B-cell clusters B1–B6. B Unique vs shared high-abundance cluster markers (expressed in at least 60% of cells in the cluster). Markers shared with non-B-cell clusters were excluded. C–E Examples of cluster marker expression patterns. Normalized expression values of markers C shared by multiple clusters, D expressed principally in one cluster, and E immunoglobulin homologues or markers containing Ig domains. Gene names are as given in the ENSEMBL annotation (without altering case) except for entries with no gene names, in which case they are named according to protein homology (e.g., IGLL-like). Clusters for which the gene is a high-abundance marker (expressed in at least 60% of cells) are listed. Where relevant, clusters for which the gene is a non-high-abundance marker (expressed in < 60% of cells) are also indicated. *Denotes genes that are also prior markers
B1 and B2 (putatively mature B-cells) display overlapping expression patterns. High-abundance markers of B1 are often also highly expressed in B2 (though not necessarily in high abundance), and vice versa (Fig. 7D). cd79b was only weakly expressed in both, which in zebrafish is a sign for differentiation to the Ig-secreting stage [61]. Both B1 and B2 also expressed arhgap24, which is enriched in naïve and memory B-cells in humans (Human Protein Atlas version 24; https://www.proteinatlas.org/ENSG00000138639-ARHGAP24/single+cell; [62]) and inhibits cell proliferation [63]. Ig-like genes were abundant in B1 and B2 (Fig. 7E), and B1 and B2 markers were enriched for the GO term “actin binding”; indeed, reassembly of the B-cell actin cytoskeleton allows expansion of the cell surface to increase contact with antigen-presenting cells [64].
Most specific to B1 was zgc:194,275, a homologue of a zebrafish B-cell marker [65] also known as PLAAT1, which regulates antiviral responses in zebrafish [66]. Pseudotime analysis of B-cell clusters with B3 assigned as the starting point (Additional file 1: Fig. S4B) supported the designation of B1 as terminally differentiated B-cells.
Specific B2 markers include the second homologue of the pan-B-cell marker EBP4, implying functional divergence between the orthologs, and a previously unannotated gene (mikado.19G1247, not shown) with homologies to NHSL2, a protein involved in cell migration—a process critical for patrolling immune cells [67].
B3 unique markers support a putative pre- or pro-B-cell identity. Most notably, B3 highly expressed rag1 (recombination activating gene 1), a gene well known for its crucial role in B-cell development [68] (see also [69]). Other markers of B3 with roles in B-cell or pre-B-cell development include a kap12 [70], itga5 [71], and tek [72], and markers of B3 and B4 were both enriched for the GO term “structural constituent of chromatin” suggestive of chromatin remodeling and differentiation.
B4 was a highly proliferative cluster as evidenced by expression of proliferation markers (see Fig. 3). B5 was characterized by cxcr5, a marker associated with B-cell migration [73] and with B helper T cells [74]. B5 and B6 shared xpa (associated with DNA repair; [75]) as a marker with the pro/pre-B-cell cluster B3, suggestive of an intermediate stage between pro/pre-B-cells and mature B-cells.
B6 represents putative regulatory B-cells based on expression of pkdr2 [76]. Also, B6 markers were enriched for actin binding similar to B1 and B2, suggesting similar interactions with antigen-presenting cells.
Finally, B7 was a small cluster of seemingly highly differentiated cells as indicated by lack of expression of both CD79 genes [61] and intense expression of Ig-like genes (see Fig. 7E). Unique high-abundance markers of B7 included DKK4-like, a similar gene to which in mammals, DKK3, is a modulator of B-cell fate and function [77].
Other distinct cell types resolved by marker genes
Natural killer-like cells (cluster NK) were identified by prior markers such as prf1 (perforin) and characterized by four unique cluster markers which also included a previously unannotated homologue of NK-lysin (Fig. 8A).Fig. 8. Other cell types. Examples of cluster marker expression patterns. Normalized expression values of markers for A natural killer cells, B thrombocytes, C red blood cells, D myeloid progenitor cells, and E renal cells. Gene names are as given in the ENSEMBL annotation (without altering case) except for entries with no gene names, in which case they are named according to protein homology (e.g., NK-lysin-like). Where relevant, clusters for which the gene is a non-high-abundance marker (expressed in < 60% of cells) are also indicated. *Denotes genes that are also prior markers
A small but highly distinct cluster with 41 unique markers corresponded to thrombocytes (cluster TH) according to expression of prior markers such as mpl (thrombopoietin receptor-like) and thbs1b (thrombospondin 1-like) (Fig. 8B), both of which have mammalian homologues involved in platelet formation and/or aggregation [78, 79].
Red blood cells (cluster RBC) were identified based on expression of hemoglobin homologues such as hbaa2 (Fig. 8C). Eleven genes were markers uniquely of the RBC cluster, including alas2 (5-aminolevulinate synthase, erythroid-specific, mitochondrial-like), thus comprising markers of mature RBCs. Hemoglobin homologues were also sporadically, but highly expressed in the central unclassified cluster NA1, suggesting it contained erythroid progenitors.
Another small cluster was identified as putative myeloid progenitor cells (cluster MP) based on unique markers kita and egr1 (Fig. 8D), homologues of which are implicated in early myeloid cell differentiation [80, 81]. Other markers of MP were three fos genes and a homologue of transcription factor JUND, which are also implicated in early myeloid function [82].
Finally, two clusters with similar expression patterns were identified as putative renal (kidney) cells (clusters RN1 and RN2) based on markers such as ly97.3 (CD59-like; Fig. 8E), si:dkey-194e6.1 (collectrin-like), and gstr (glutathione S-transferase A-like), which are highly expressed in mammalian kidney [83–85]. RN1 and RN2 were also the only clusters expressing gapdh (Fig. 8E).
Functional divergence of ohnologues in the immune system
Several cluster marker genes were present in duplicate and displayed differential expression between clusters, for example, CD28 (Fig. 6D) and EBP41 (Fig. 7D). We therefore systematically investigated the expression patterns of ohnologues (gene pairs arising from the salmonid-specific “SSR4” whole-genome duplication event) among immune cell clusters, based on the assumption that differential expression patterns could signify subfunctionalization (i.e., division of labor) or neofunctionalization (i.e., one copy gains a new function). Among the 2000 variably expressed genes, we identified 175 putative ohnologue pairs based on the presence of a single homologue in the northern pike (Esox lucius) (Additional file 2: Table S8).
Eighty-six (49%) show identical or near-identical expression patterns across clusters (see examples in Fig. 9A), suggesting that both copies have either retained the same function or have functionally diverged only on a subcellular level. Sixty-eight gene pairs (39%) display “gain or loss” patterns, where ohnologues have similar expression but expression of one ohnologue is missing from certain clusters (see examples in Fig. 9B), which may suggest either subfunctionalization or a transition to neofunctionalization. Six gene pairs (3%) display “partially contradicting” patterns, where each ohnologue is to some extent expressed in mutually exclusive patterns while retaining some overlap, suggesting the beginning of neofunctionalization. Fifteen gene pairs (8%) display “contradicting” patterns, where ohnologues display mutually exclusive patterns, suggesting neofunctionalization or subfunctionalization from a more broadly expressed ancestral state (see examples in Fig. 9C).Fig. 9. Expression patterns of putative ohnologue gene pairs identified based on homology with northern pike (Esox lucius) genes. Ohnologue pairs with A identical expression patterns, B one ohnologue missing from certain clusters, and C asymmetric expression patterns. ENSELUGXXXX: ENSEMBL gene ID and gene names of the E. lucius single copy homolog. Red dotted lines: clusters that lost expression compared to the other ohnologue
A striking example of contradictory expression patterns between ohnologues is dachd. The Drosophila Dach transcription factor (dachd homologue) is a master developmental regulator, forming complexes with chromatin to induce differentiation [86]. In brown trout, one ohnologue was expressed widely in neutrophils and the other in pre-B-cells. The ohnologues of SCIN, an actin-severing protein [87], were expressed in B-cells versus NK-cells (Fig. 9C). Plots of all examined ohnologue pairs can be found in Additional file 3.
Environment of current or prior generations affects immune cell transcriptomes
Given our use of material derived from fish of different origins, we were able to show that rearing environment impacts immune gene expression and even the presence of immune cell clusters. Briefly, “wild” fish (N = 3) were captured directly from a local stream in the canton of St. Gallen, Switzerland, and had an intact fat fin indicating that they were born and raised in the wild. The “farm” fish (N = 3) were reared for at least two generations in an aquaculture facility but were originally derived from the same stream, while “mix” fish (N = 3) were reared in a facility but derived from wild parents from the same stream as the “wild” fish.
Impacts of rearing environment were apparent at two levels. Firstly, T-cell cluster T7 displayed an origin-specific abundance pattern and was present only in fish from the “wild” group (Additional file 2: Fig. S5). The wild group also had noticeably higher proportions of cells in clusters M4 and T5, although we refrained from formal statistical analyses due to small sample sizes and large number of clusters. Secondly, clusters from all major lineages displayed an origin-specific gene expression pattern. We selected a cluster from each of the four main lineages (N1, M1, T5, and B3 for neutrophils, monocytes/macrophages, T-cells, and B-cells, respectively) and performed between-group differential expression analyses using the pseudobulk approach [88]. Of the selected cell types, macrophages (represented by M1) showed the most abundant differences in gene expression between groups (235 DEGs at FDR < 0.05, of which 157 were DE in mix and 78 in farm; Additional file 2: Table S9), followed by neutrophils (represented by N1; 178 DEGs at FDR < 0.05, of which 95 were DE in mix and 83 in farm; Additional file 2: Table S10) and B-cells (represented by B3; 121 DEGs at FDR < 0.05, of which 65 were DE in mix and 56 in farm; Additional file 2: Table S11). PCA plots of all three showed separation between rearing conditions along PC1 and/or PC2 (Fig. 10). T-cells (represented by T5) meanwhile showed more modest differences (74 DEGs at FDR < 0.05, of which 35 were DE in mix and 39 in farm; Additional file 2: Table S12) and less clear sample separation on PCA (Additional file 1: Fig. S7).Fig. 10. Impact of raising environment. A, C, E Terminally differentiated neutrophils, macrophages, and pre-/pro-B-cells highlighted on UMAP plot, and PCA plot of all samples stratified by raising environment (point color) and experimental batch (point shape). B, D, F Heatmaps of normalized expression values of genes differentially expressed (FDR < 0.01) between “farm” and/or “mix” compared to the “wild” group. Dendrograms represent hierarchical clustering of genes and samples according to the Z-scores of the Log2(CPM) values. Cell color indicates low (blue) versus high (yellow) Z-score. Sample IDs at the top of each heatmap indicate raising environment (wild = blue, mix = pink, farm = green). Genes are indicated by gene name where available from the ENSEMBL database, or by ENSEMBL gene ID. Genes corresponding to GO terms enriched among DEGs are highlighted (red = “heme binding” or “iron ion transport,” blue = “G protein-coupled receptor activity,” green = “calcium ion binding”). Note that genes DE with FDR > 0.01 are not shown but were used for GO term analysis
GO terms associated with the differentially regulated genes (Additional file 2: Table S13) included “G protein-coupled receptor activity,” which was enriched amongst genes upregulated in the farm group in N1 and B3, and downregulated in the mix group in M1, and “calcium ion binding,” which was enriched among genes upregulated in B3 in the farm group. No enriched GO terms were detected for genes differentially expressed in T5. Finally, The GO terms ‘heme binding’ and ‘iron ion binding’ were enriched amongst genes upregulated in the mix group in N1, M1, and B3. It is plausible however, that the latter GO term enrichments result of contamination of target immune cell libraries by RNA from red blood cells in the processed material. Indeed, the expression of hemoglobin homologues (e.g. hbaa2, HBE1) in N1, M1 and B3 was highest in the ‘mix’ sample B6 which also had the highest proportion of RBCs (around 16%, see Additional file 2: Fig. S6).
Discussion
Our data suggests a conserved immune system setup for brown trout, given that expression modules derived from other species’ immune cell markers readily identified core cell types. This is noteworthy because an increasing number of fish immune systems have been analyzed in detail, e.g., through single-cell RNA sequencing [65, 89, 90], and some were found to notably differ from the vertebrate core configuration [91, 92]. Broadly, our results together with previous scRNAseq studies [34, 91] therefore suggest a conserved immune system repertoire in salmonids.
Our data reveal a range of novel cell type markers and marker combinations for distinct sub-lineages, which will facilitate future studies. Due to extensive evolutionary divergence between salmonids and model organisms (from which most immune cell markers are derived), identifying cell subtypes in non-models generally and fish specifically is challenging. Only a few cell type-specific antibodies exist, and qPCR markers require extensive verification due to WGD and functional divergence. Our data suggest that several well-established markers can be used to identify broad cell lineages in trout, for example, the CD79 genes are useful markers of the B-cell lineage. In addition, we found numerous novel markers, some with no reported immune function, that were able to distinguish both broad immune cell lineages and sub-lineages. These include high choriolytic enzyme (HCE), an enzyme otherwise involved in digestion of the chorion during hatching, marking non-terminally differentiated neutrophils, or epdl1 (ependymin) as a new marker of a specific macrophage subpopulation (M3), which was previously reported expressed from the teleost central nervous system [44]. These markers, reported in the supplementary data, will be of great value in research and diagnostic settings and could potentially be used to produce better and species-/salmonid-specific antibody reagents for FACS. Importantly, once thoroughly validated across a number of brown trout and ideally other salmonids, they will facilitate further immunological investigations without the need for single-cell sequencing.
In addition to the immune cell conserved lineages, our data may suggest the existence of phagocytic B cells [93]: cluster NA2 concomitantly expresses B-cell markers and neutrophil markers, clusters with B-cells, and may represent this cell type. However, an alternative explanation for such a pattern is the presence of doublets (two cells within the same droplet), given that B-cells physically interact with phagocytes during antigen presentation and may be co-selected during 10X library preparation.
Our analysis provides strong leads for in-depth evolutionary studies of gene duplications and their functional consequences. Salmonid genomes have retained intricate patterns of ohnologue deletion, pseudogenization, maintenance, and divergence [24, 94–96] after the SSR4 whole-genome duplication event, which are understudied in relation to the immune system. The fate of SSR4-derived duplicates has been debated, with Sandve et al. [97] concluding that the prevailing fate of ohnologue pairs is differential purifying selection. The copy with more relaxed selection, in turn, may undergo neofunctionalization or pseudogenization (indeed, this has been posited as the terminal fate of duplicated genes [98]). Asymmetric expression patterns could signify such a transition, and indeed, half of 175 ohnologue pairs expressed in immune cells showed divergent expression patterns. A “gain or loss” pattern is the most common, but also mutually exclusive patterns were observed. Interestingly, several cases involve partition of expression between fundamentally distinct lineages (e.g., B-cells versus neutrophils, or B-cells versus natural killer cells). This is consistent with subfunctionalization, but could also reflect a transition state and, in millions of years, lead to the emergence of new function or to pseudogenization. We did not further investigate the underlying sequence of coding regions or promotors and also cannot distinguish divergence of ohnologue function from differential isoform usage. Our data are however an excellent starting point for in-depth cross-salmonid follow-up studies.
Finally, we identify transcriptome variation corresponding to rearing environment. This is of both conservatory and economic relevance. Commercial fish rearing environments have profound effects on fish health and disease resistance, with little knowledge regarding the underlying molecular mechanisms [99]. Yet, millions of fish are annually farm-raised and released into the wild for stocking purposes and are expected to support declining wild populations. Our data align with previous observations on the lower fitness of fish from more artificial rearing environments [98, 100–103] and epigenetic traces of hatchery rearing [103], and reveal that both the presence and the expression patterns of immune cells can be associated with the raising environment. Terminally differentiated neutrophils, most macrophages, and naïve B-cells all differ depending on whether fish were raised for at least two generations in a farm (farm), for a single generation on a farm (mix), or exclusively in natural streams (wild). In particular, genes involved in G protein-coupled receptor activity—a function closely linked with immune modulation, for example, interleukin production [104] and TLR signaling [105]—were differentially expressed in two of the four tested clusters. G protein-coupled receptors have also been suggested as “domestication genes” that were repeatedly under selection during the domestication of pets and farm animals [106]. The recurrence of similar gene sets across multiple cell types could imply that fixed to-be-discovered polymorphisms at cis- or trans-regulatory elements may affect the expression of these genes in a systemic fashion depending on the rearing environment. In addition, our setup revealed that an entire T-cell cluster (T7) was present exclusively in fish of the “wild” group. These individuals come from rivers harboring fish pathogens (for example, Tetracapsuloides bryosalmonae). The origin-specific cell cluster T7 therefore may reflect prior infection history (for example, they may represent memory T-cells) and could in the future serve as an indicator for previous exposures to this or other pathogens. In summary, these findings warrant continued research on the origin of the observed transcriptional differences: Do they arise from environmental influences during early growth, or are they a consequence of natural selection in a farm rearing environment? The functional significance of these differences remains to be explored in the context of infection experiments.
Conclusions
Our study provides the first single-cell atlas of the immune system of brown trout. We identify a conserved immune system repertoire, novel cell type markers, and impacts of genome duplication on immune gene evolution. By additionally including short-term anthropogenic pressures in the form of captive rearing, we expose how early-life environment impacts immune transcriptional programs in adults, which raises relevant concerns, but also monitoring opportunities for fish conservation and reintroduction strategies. We find that a single generation of hatchery rearing leaves a detectable molecular imprint on the immune systems of this flagship species. This adds to an ongoing debate about the fitness and resilience of stocked fish, which are routinely released into the wild to support declining populations. As climate change and environmental degradation continue to erode natural habitats, we thus propose that stocking strategies should evolve to include molecular methods to assess immune status. We advocate for the integration of immunological parameters when designing hatchery protocols and reintroduction programs, e.g., by promoting practices that mimic natural conditions, to enhance the long-term success of conservation and fisheries efforts. In this sense, our findings provide a valuable framework for future studies on functional immunity, aquaculture practices, and vertebrate genome evolution.
Methods
Study design, sampling
Brown trout from different rearing histories and environments were used: a wild group (wild), a mix group (mix), and a farm group (farm). The wild group consisted of young-of-the-year (≤ 1 year old) brown trout that were fished during spring 2021 in two Swiss rivers (Brübach, coordinates between 47.471195°N/9.136465°E and 47.473303°N/9.140526°E; and Rörlibadbach between 47.473019°N/9.136563°E and 47.478731°N/9.134755°E). We are confident that these fish were born and raised in the wild due to their lacking fin clips. Furthermore, an ongoing study of the river Wyna, Canton Aargau, indicates that stocked fish fail to reach reproductive age in these rivers (Schmidt-Posthaus, unpublished data), and thus their parents are also highly likely to have been born and raised in the wild. Animals in the “wild” group are therefore considered as those born to wild parents and raised in the wild. A relevant note is that the source waterbodies are positive for proliferative kidney disease, caused by a parasite Tetracapsuloides bryosalmonae (own observation, see also Stelzer et al. [107]). Although all animals used in the study tested negative for T. bryosalmonae according to Bettge et al. [108], we cannot exclude that the “wild” group faced an early infection or has been in contact with T. bryosalmonae.
The mix group (mix) originated from wild parents, female and male adult brown trout, which were fished in the same river system (Brübach) and spawned in captivity in autumn 2020. Fertilized eggs were transferred to a farm facility where they (F1 generation) were raised until spring 2021. Animals in the mix group are thus born to wild parents but raised in captivity.
The farm group (farm) originated from a Swiss aquaculture farm. Parents raised in captivity were spawned and their offspring was again raised in captivity. Animals in the farm group are thus born to captive parents and raised in captivity. Both mix and farm F1 generations were raised in the same hatchery facility (Fischereizentrum Steinach, Switzerland) from the stage of fertilized eggs until approximately 3 months of age-old fry. All animals used had the same genetic origin in that they were derived from the Brübach/Rörlibadbach populations.
All F1 generations were transferred to the Institute for Fish and Wildlife Health (FIWI), University of Bern, Switzerland, in May 2021 when fish were approximately 3 months of age. In the context of a larger experimental design, 162 fish from each group were distributed into six group-specific tanks with equal fish numbers (n = 27 per tank; a total of 3 × 6 tanks); three of the six tanks per group were later exposed to a pathogen in a treatment/control setting not discussed in this study. The fish used for this study came from the nine control tanks (three control tanks per group).
At arrival, two fish per tank were subjected to a health check; no pathogens were detected. Fish were maintained at constant 16 °C with a flow-through water system and 12:12-h light:dark photoperiod. The fish were fed a commercial diet twice a day with 2.5% of their body weight and additionally with live artemia in the first month after arrival. The fish were acclimatized to laboratory conditions for 4 months prior to sampling.
Cell enrichment, library preparation, and sequencing
Immune cells were isolated from the brown trout head and trunk kidneys when they reached the age of approximately 8 months. Both head and trunk kidneys are centers of hematopoietic cell production, and therefore material from both organs were pooled for each biological replicate. Fish were euthanized with an immersion bath in overdosed MS222 (150 µg/l tricaine methanesulfonate; MS 222®, Pharmaq). The kidneys were then dissected and processed for RNA sequencing as follows.
The kidney tissue was passed through a 105-µm pore size mesh filter with Leibovitz’s (L-15) medium (Thermofischer, Reinach, Switzerland) containing 10% fetal calf serum (FCS). The resulting cell suspension was then layered onto a sterile, isotonic Ficoll gradient (Ficoll-paque Plus, GE Healthcare Bio-Sciences AB, Sweden) with a density of 1.077 g/ml and spun at 750 × g for 40 min at 4 °C to dissociate red blood cells from white blood cells. Leukocytes at the Ficoll/medium interphase were aspirated, washed in L-15 medium, and centrifuged at 290 × g at 4 °C for 10 min. The cell suspension was aspirated and resuspended in PBS and kept on ice.
Cell suspensions were checked for viability and concentration using an automated cell counter (DeNovix® CellDrop Automated Cell Counter, NC, USA). The samples were diluted to a concentration ranging between 700 and 1900 cells/µl, depending on the sample. The cell suspensions were sent to the Bern Genomics Facility for 10 × Chromium sequencing library preparation. All samples were prepared to ensure a target cell recovery of 10,000 cells and were subjected to cell isolation on the 10 × Genomics Chromium Controller instrument. To minimize cell death, all steps, including PBL isolation, sorting, and Chromium™ Single Cell isolation, were completed on the same morning.
Per sample, 700–1500 cells/µl were loaded into the chips of the Chromium™ Single Cell 3′ Gel Beads Kit (Single Cell 3′ v3.1 Dual Indexing) (10 × Genomics) and processed on the Chromium Controller instrument to generate single-cell Gel Bead-In Emulsions (GEMs) following the manufacturer’s instructions.
Next, GEMs were subjected to library construction using the Chromium™ Single Cell 3′ Library Kit v3.1 (10 × Genomics). In the first step, reverse transcription was performed, generating cDNA tagged with a cell-specific barcode and a unique molecular index (UMI) per transcript. Fragments were size-selected using SPRIselect magnetic beads (Beckman Coulter, Brea, CA, USA). Illumina sequencing adapters were then ligated to the size-selected fragments and cleaned up using SPRIselect magnetic beads. The quality of the final library was assessed using an Agilent 2100 Bioanalyzer (Agilent technologies, Amstelveen, The Netherlands). The samples were subsequently sequenced using a NextSeq 550 instrument (Illumina, CA, US) with 150PE chemistry.
Augmented gene annotation
To maximize the amount of usable scRNAseq reads, which depends on the proportion of reads that can be confidently aligned to the transcriptome, we augmented the existing fSalTru1.1.104 ENSEMBL annotation using two published bulk RNAseq datasets derived from S. trutta kidney tissue [109, 110]. Transcripts were assembled using a combination of de novo and genome guided approaches (Trinity, RNAbloom, and Stringtie), collapsed using EvidentialGene [111], and further processed using the Mikado pipeline version 2.3.4 [112] to identify a nonredundant set of transcripts not represented in the prior annotation. This approach enabled us to add 15,095 additional transcripts and 1200 additional gene loci, with functional annotation derived using PANNZER2 [113]. For more details, please see the supplementary methods in Additional file 4. The additional transcripts and gene loci (“Mikado-derived genes”) were appended to a modified version of the ENSEMBL gtf (fSalTru1.1.104) which had been filtered to retain only protein-coding genes, lncRNAs, and the IG − and TR − genes that undergo somatic recombination before transcription.
scRNAseq read alignment and quantification
Raw FASTQ files from each of the nine samples were processed using CellRanger version 6.0.1 (10 × Genomics, Inc.). The genome (fSalTru1.1, GCF_901001165.1) was indexed using “cellranger mkref” with the augmented annotation file (see above), and reads were aligned to the genome using “cellranger count” using non-default parameters –nosecondary and –include-introns, and otherwise default parameters.
Processing of count data and clustering
The filtered barcode matrices from CellRanger-count were processed in R version 4.2.2 with the Seurat package version 4.3.0 [114], using the Reciprocal Principal Components Analysis (RPCA) approach to normalize expression estimates across the three sample collection batches. In the RPCA approach, normalization (by SCTransform) and PCA were initially performed separately on samples from each of three sampling batches before integrating according to anchor features. To avoid subsequent clustering being overly influenced by cell cycle stage, the anchor features upon which PCA was performed were filtered to remove genes with annotation related to “cell cycle” (using information from ENSEMBL). Similarly, given that red blood cells likely contributed a significant source of variation, hemoglobin genes were also excluded from the PCA. However, genes excluded from PCA were retained in the final set of 2000 variable genes. We did not explicitly filter according to mitochondrial gene expression which is liable to have retained dying cells in the dataset [39], although we note that mitochondrial filters may also exclude viable cells [115], particularly given the role of mitochondrial oxidative burst in some immune responses [116]. For clustering, the FindNeighbors() function was run using the first 30 principal components. We then tested a series of resolution options for the FindClusters() step, from 0.1 to 1 in intervals of 0.1. Using clusters from all resolutions, we then obtained a clustering tree using the clustree() function from the clustree package version 0.5.0 [117] and obtained the SC3 index, an indicator of cluster stability, for each resolution. A resolution of 0.3 was found to have the highest SC3 value (Additional file 1: Fig. S8) and we therefore chose this resolution for initial clusters (Additional file 1: Fig. S1), but later performed additional subclustering of specific clusters in cases where subpopulations could be clearly distinguished from the expression of lineage markers (Additional file 1: Fig. S9). No clear batch effects were evident from UMAP projections (Additional file 1: Fig. S10).
Differential expression between clusters was performed by running the FindAllMarkers() function using the raw (non-integrated) expression values (given that the integrated values are statistically interdependent), producing a list of expression markers for each cluster (hereafter “cluster markers”). For further analyses, we principally considered cluster markers with log2FC >/= 1 (relative to cells in all other clusters) and which were detected in at least 60% of cells of a given cluster (“high-abundance” markers), except for when clusters could be notably distinguished by expression patterns restricted to a subset of cells in the cluster.
Assigning putative cluster identities
For assignment of clusters to broad immune cell classes, we examined the cluster-specific expression of putative cell type marker genes, a list of which was compiled a priori. These genes (hereafter “prior markers”) were obtained by compiling a list of putative cell lineage markers from other species (humans in addition to some fish-specific markers), derived from the literature (Additional file 2: Table S1). Putative brown trout homologues of these prior markers were identified by either (1) queries to the ENSEMBL database using the biomaRt R package version 2.52.0 [118] to obtain putative brown trout homologues, or (2) reverse blast searches (ENSEMBL blast tool, TblastN, default parameters) toward the brown trout genome using human queries and manually annotated fish queries from the literature. Reverse BLAST hits were aligned toward human and mouse reference sequences and query sequences to evaluate the overall degree of amino acid conservation (MEGA 7.0, clustalw aligner). A simple neighbor-joining tree (MEGA 7.0, Poisson substitution model, pairwise deletion) was constructed for markers with more pronounced sequence divergence (data not shown). In cases where homology could not be determined, the marker was not used. This yielded in a list of 283 homologues of 129 prior markers of major cell lineage groups (e.g., B-cells, neutrophils) and functional classes (e.g., antigen-presenting cells, cytotoxic cells, proliferating cells). For prior marker homologues which were present among the 2000 variable genes, expression values were summarized across markers belonging to the same lineage group or functional class using the AddModuleScore() function in Seurat, deriving expression modules corresponding with each major lineage group or functional class. We note that the list of prior marker genes was intended as a starting point and was not intended to be an exhaustive list of putative markers.
After grouping the clusters into broad immune (or other) cell types (i.e., neutrophils, monocytes/macrophages, T-cells, B-cells, and others), we used the cluster markers to further identify distinguishing gene expression features and more specific putative function. For this, we considered only cluster markers that were unique to the broad cell type under investigation (e.g., markers of both T-cell and B-cell clusters were not used, although markers of ambiguous cluster N2 were allowed to be considered as either neutrophil or B-cell markers). We also performed GO term enrichment analyses on the cluster markers (all markers with log2FC >/= 1 and 60% detection in the cluster, with no criteria for uniqueness to a broad cell type) using the enricher() function from the ClusterProfiler package version 4.4.4 [119] and GO terms of both ENSEMBL and Mikado-derived genes. From the latter, we excluded clusters with fewer than 10 cluster markers and excluded enriched GO terms represented by fewer than three cluster markers.
Orthology analyses
To identify ohnologues among the 2000 variably expressed genes, S. trutta gene IDs were queried against the ENSEMBL database for homologues in the northern pike (E. lucius) using the biomaRt package version 2.52.0 [118]. For genes with pike homologues, we filtered these to retain only those with a 2:1 ratio (trout:pike) and for which each member of the pair resided on a different trout chromosome. This resulted in a list of 175 putative ohnologue pairs among the variably expressed genes. UMAP plots for each pair were then visually assessed and assigned to one of the following four categories (similar to [29], but visually): (1) same pattern same levels (identical); (2) same pattern different levels; (3) gain/loss (with one ortholog expressed across a wider area than the other); (4) swap or partial swap (where orthologs have, at least partly, mutually exclusive expression patterns). Importantly, these assignments should be considered suggestive rather than final, particularly because gain/loss patterns may also result from differences in sequencing depth or sampling imbalances.
Pseudobulk analyses
A pseudobulk sample is formed by aggregating the expression values from a group of cells (i.e. a cluster) from the same individual. As the identities of individual cells are discarded, the sample can be treated as having been generated from a bulk RNAseq experiment. Clusters to use for pseudobulk differential expression analyses were chosen based on the following criteria: that they were clearly distinguishable from other clusters (i.e., abundant unique high-abundance markers and not overlapping other clusters on a UMAP plot), and that they contained a reasonable cell count in all individuals (minimum 45). This led to the selection of clusters N1, M1, T5, and B3. The choice of B3 was further motivated by its identification as the pro/pre-B-cell cluster and thus alterations in this cluster could potentially have knock-on effects on subsequently differentiated B-cells. After pooling non-normalized read counts from each cluster/individual combination, the resulting pseudobulk count matrices were processed for differential expression analyses using the edgeR package version 3.38.4 [120]. Counts matrices were first filtered using the filterByExpr() function to retain only those with at least 10 counts in at least three out of nine individuals. Following TMM-normalization, PCA was calculated using the plotMDS() function which uses the top 500 most variably expressed genes by default. Differential expression was calculated using the estimateDisp() and glmFit() functions with a model design matrix that included group as the main factor while also accounting for experimental batch [model.matrix(~ batch + group)]. Genes were considered differentially expressed between groups at FDR < 0.05. GO term enrichment analysis (using the domain “molecular function”) was carried out using the enricher() function from the ClusterProfiler package version 4.4.4 [119] and GO terms of both ENSEMBL and Mikado-derived genes. For heatmaps, log_2_CPM values of differentially expressed genes were Z-score normalized and plotted using the Heatmap() function from the ComplexHeatmap package [121].
Pseudotime analyses
Pseudotime is a computational construct inferred from ordering cells along a trajectory based on similarities in their gene expression profiles and provides an estimate of consecutive developmental stages, e.g., along a differentiation gradient. The integrated, normalized data (SCT assay data with 2000 features for 83,847 cells) was converted into a “cell_data_set”-object with monocle (monocle3_1.3.4). After preprocessing the data (preprocess_cds) using the first 50 dimensions of the PCA, a UMAP dimensionality reduction was performed. Cells were clustered with a resolution of 0.00001 which resulted in 24 clusters. This resolution was chosen because the number of cell clusters was close to the original 28. After running the “learn_graph” function, subsets of clusters and starting points for trajectories were interactively selected based on prior biological knowledge. Two separate pseudotime analyses were performed: one on neutrophil clusters using the cluster corresponding with putative myeloid progenitors (cluster MP in main analysis) as the starting point and another on B-cells using the cluster corresponding with pre-/pro-B-cells (cluster B3 in the main analysis) as the starting point.
Other software
Figures were created initially with ggplot2 version 3.4.0 and further processed with Adobe Illustrator version 29.6.1 and Adobe Photoshop version 26.8.1. Text sections were backchecked for spelling, grammar, and flow with ChatGPT-4o. ChatGPT was not used to create content.
Supplementary Information
Additional file 1. Figures S1–S10. Fig. S1 UMAP and TSNE projections of initial 29 Seurat clusters. Fig. S2 Gene expression indicators of dead/dying cells (mitochondrial gene expression, number of expressed genes, and total gene expression). Fig. S3 UMAP plots showing expression patterns of genes found to be markers of multiple cell types. Fig. S4 UMAP projections of Monocle clusters for pseudotime analysis. Fig. S5 Proportions of cells assigned to each cluster per individual. Fig. S6 UMAP plot showing expression of TRGC1 homologues. Fig. S7 Pseudobulk differential expression between fish of different origins for cluster T5. Fig. S8 SC3 stability index values for different Seurat clustering resolutions. Fig. S9 Detail of finer-scale subclustering of three initial Seurat clusters. Fig. S10 PCA, UMAP, and TSNE projections of cells marked according to sampling batch. Full figure legends are provided on the first page of the document, followed by each labeled supplementary figure on separate pages.Additional file 2. Table S1 List of genes used as “prior markers” for putative cell type assignments of clusters and associated notes. Table S2 List of marker genes for all clusters as identified using the FindAllMarkers() function in Seurat. Additional information includes gene names and descriptions and whether the gene is a “prior marker.” Table S3 Compiled output from GO term enrichment of high-abundance markers (pct.1 >/= 0.6 and log2FC >/= 1) for each of 23 clusters. Tables S4–S7 Genes found to be markers exclusively of neutrophil (Table S4), macrophage (Table S5), T-cell (Table S6), and B-cell clusters (Table S7), respectively, i.e., genes that were markers of that specific putative cell type and not markers of other putative cell types. Clusters of which the gene is a high-abundance marker (pct.1 >/= 0.6) and those of which the gene is a marker regardless of abundance are listed in columns 2 and 7, respectively. In Table S7, an additional column denotes whether the gene encodes a putative immunoglobulin. Table S8 Categorization of 175 ohnologue pairs among variably expressed genes according to differences in or similarity of expression patterns across clusters. Pairs are labeled according to the gene ID of the corresponding Esox lucius homologue. Tables S9–S12 Genes differentially expressed in pseudobulk in clusters M1 (Table S9), N1 (Table S10), B3 (Table S11), and T5 (Table S12) in fish of “mix” or “farm” origin relative to “wild” origin. Table S13 Enriched GO terms (molecular function) among genes differentially expressed in pseudobulk in “mix” or “farm” origin fish relative to “wild”.Additional file 3. A collection of 175 UMAP projection plots showing integrated expression values of ohnologue gene pairs (one pair per plot). Pairs are labeled according to the gene ID of the corresponding Esox lucius homologue. Complements Table S8.Additional file 4. Supplementary methods providing further details of the augmented gene annotation using the Mikado pipeline.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Swann JB, Holland SJ, Petersen M, Pietsch TW, Boehm T. The immunogenetics of sexual parasitism. Science. 2020;369(6511):1608–15.10.1126/science.aaz 944532732279 · doi ↗ · pubmed ↗
- 2Lobón-Cerviá J, Sanz N. Brown trout: life history, ecology and management. Brown trout: life history, ecology and management. Wiley; 2017. https://onlinelibrary.wiley.com/doi/book/10.1002/9781119268352.
- 3Rehberger K, Wernicke von Siebenthal E, Bailey C, Bregy P, Fasel M, Herzog EL, et al. Long-term exposure to low 17α-ethinylestradiol (EE 2) concentrations disrupts both the reproductive and the immune system of juvenile rainbow trout, Oncorhynchus mykiss. Environ Int. 2020;142:105836.10.1016/j.envint.2020.10583632563011 · doi ↗ · pubmed ↗
