Genomic Divergence of Sympatric Lineages Within Stichopus cf. horrens (Echinodermata: Stichopodidae): Insights on Reproductive Isolation Inferred From SNP Markers
Kenneth M. Kim, Apollo Marco D. Lizano, Robert J. Toonen, Rachel Ravago‐Gotanco

TL;DR
This study explores how different groups of sea cucumbers in the same area become reproductively isolated, using genetic markers to identify three distinct clusters and potential genes involved in reproductive barriers.
Contribution
The study confirms three reproductively isolated genotype clusters in Stichopus cf. horrens and identifies genes potentially involved in reproductive isolation.
Findings
Three reproductively isolated genotype clusters were confirmed in Stichopus cf. horrens.
Outlier SNP loci related to reproduction, such as rhodopsin and tachykinin receptor signaling, were identified.
Limited gene flow was inferred among the genotype clusters.
Abstract
How reproductive barriers arise in early stages of divergence among broadcast spawning organisms that exist in sympatry remains poorly understood. Reproductively isolated lineages (Clade A and B) of Stichopus cf. horrens were previously reported across the western Pacific, with an additional putative cryptic species detected within the Clade B lineage warranting further examination. The present study further examines the hypothesis that the two mitochondrial lineages (Clade A and Clade B) of Stichopus cf. horrens represent putative cryptic species and whether another cryptic species within the Clade B lineage exists using a reduced representation genomic approach. Using double‐digest RAD (ddRAD) sequencing, a total of 9788 single nucleotide polymorphism (SNP) markers were used to examine divergence among Stichopus cf. horrens lineages (n = 82). Individuals grouped into three SNP…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
FIGURE 1
FIGURE 2
FIGURE 3
FIGURE 4| Location | Site code | Lat (°N) | Long (°E) | mtDNA | SSR | SNP | SNP Clusters | ||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Clade A | Clade B | SA‐1 | SA‐2 | SA‐3 | SA‐A | ||||||
| Sta. Ana, Cagayan | STA | 18.511 | 122.152 | 4 | 22 | 17 | 37 | 2 | 17 | 14 | 4 |
| Mabini, Davao de Oro | MAB | 7.297 | 125.836 | 16 | 3 | 18 | 21 | 20 | 1 | ||
| Pujada Bay, Davao Oriental | PUJ | 6.844 | 126.228 | 17 | 7 | 20 | 24 | 18 | 6 | ||
| Total | 37 | 32 | 55 | 82 | 40 | 24 | 14 | 4 | |||
| mtDNA clade | SSR cluster | SA‐1 | SA‐2 | SA‐3 | SA‐A | Total |
|---|---|---|---|---|---|---|
| A | SSR‐1 | 26 | 26 | |||
| A | SSR‐3 | 2 | 2 | |||
| A | SSR‐A | 4 | 4 | |||
| A | ND | 3 | 1 | 4 | ||
| B | SSR‐1 | 1 | 1 | |||
| B | SSR‐2 | 2 | 8 | 1 | 11 | |
| B | SSR‐3 | 2 | 5 | 7 | ||
| B | Admixed | 1 | 2 | 3 | ||
| B | ND | 7 | 2 | 9 | ||
| ND | ND | 2 | 6 | 5 | 2 | 15 |
| Total | 40 | 24 | 14 | 4 | 82 |
- —Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung 10.13039/501100001711
- —Philippine Council for Agriculture, Aquatic and Natural Resources Research and Development 10.13039/501100014166
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsEchinoderm biology and ecology · Marine Bivalve and Aquaculture Studies · Coral and Marine Ecosystems Studies
Introduction
1
With the use of molecular approaches for species delimitation, cryptic species are now more readily detected and appear to be common in the marine environment (Bickford et al. 2007; Knowlton 1993, 2000; Nygren 2014). While allopatric speciation can be easily explained by physical barriers when cryptic species have disconnected distribution ranges, species with parapatric, overlapping, or sympatric ranges likely have a more complex evolutionary history (Bird et al. 2012; Faria et al. 2021). This complexity is often seen in the marine environment, where barriers to gene flow are less obvious and divergences along ecological boundaries are thought to occur more frequently than in the terrestrial environment (Bowen et al. 2013). In the marine environment, where external fertilization is common, it is more challenging for extrinsic ecological barriers to arise, leading to increased opportunities for hybridization and decreased opportunities for divergence. Mechanisms like gametic incompatibilities and variations in the timing and spatial variation of gamete release are considered key factors maintaining reproductive isolation in broadcast spawning species occurring in sympatry (Palumbi 1994; Swanson and Vacquier 2002; Bird et al. 2011). The evolution of these mechanisms to limit hybridization is acquired in various orders throughout the “speciation continuum” (De Queiroz 2005, 2007; Kulmuni et al. 2020; Nosil and Feder 2012; Seehausen et al. 2014; Roux et al. 2016; Stankowski and Ravinet 2021) and the existence of cryptic species offers an interesting toolbox to examine how reproductive barriers arise in early stages of divergence among broadcast spawning organisms that exist in sympatry.
The tropical sea cucumber Stichopus cf. horrens exhibits a high degree of morphological variability, making species identification challenging. A recent phylogenetic study revealed two divergent lineages (Clade A and B) within S. cf. horrens, where identification based on molecular data are incongruent with spicule morphology (Lizano et al. 2024). These lineages were further revealed to be reproductively isolated, with limited contemporary gene flow among sympatric populations. Moreover, an additional putative cryptic species within the Clade B lineage was also detected, warranting further examination. The low frequency of mitonuclear discordance, coupled with the absence of F1 hybrids between lineages, raises the question: how is reproductive isolation maintained between broadcast spawning cryptic species despite occurring in sympatry? Prezygotic isolation, particularly through the accumulation of gamete incompatibilities due to selection, is thought to be a primary mechanism reinforcing isolation in broadcast spawning marine organisms in sympatry (Levitan et al. 2004; Metz and Palumbi 1996; Landry et al. 2003; Coyne and Orr 2004; Bird et al. 2012). Positive selection on gamete recognition proteins (GRPs) has been reported in several echinoderm species (Lessios 2011; Sunday and Hart 2013; Patiño et al. 2016) and these proteins have been shown to establish reproductive barriers between closely related broadcast spawning individuals in sympatry (Palumbi 2009; Vacquier and Swanson 2011; Kosman and Levitan 2014). The role of GRPs in forming prezygotic reproductive barriers in sea cucumbers; however, is not yet known, and it remains unclear whether divergence in GRPs contributes similarly to speciation in other broadcast spawning organisms. Levels of polymorphism and patterns of divergence in GRPs vary across different groups. For example, in Ciona intestinalis , although GRPs are found to evolve more rapidly than proteins with other functions, the rates of evolution between allopatric and sympatric populations of the two reproductively isolated forms were not significantly different (Nydam and Harrison 2011). Similarly, in the sea stars Cryptasterina hystera and C. pentagona, little evidence of selection on gamete recognition genes was observed. Instead, signatures of selection were found in genes related to environmental interactions (Guerra et al. 2021). In addition, in Ophioderma longicauda , there is no evidence of positive selection on gamete recognition proteins; rather, positive selection was found on two genes encoding sperm ion channels involved in the signal transduction cascade in response to pheromones (Weber et al. 2017). Alternative mechanisms, such as habitat preference and synchronized spawning triggered by hormonal cues (Bierne et al. 2003; Hamel and Mercier 1995, 1999; Mercier and Hamel 2010), have been proposed. This implies that other genes involved in reproduction, potentially those upstream in the reproductive cascade, may also play a role in prezygotic isolation in sympatric broadcast spawning species.
In this study, we utilized single nucleotide polymorphisms (SNPs) analyzed via double‐digest restriction‐site‐associated DNA sequencing (ddRADseq; Peterson et al. 2012) to test whether similar patterns of genetic differentiation and clustering previously found within Stichopus cf. horrens using mitochondrial and microsatellite markers will be recovered. Specifically, we expect genotype clusters of Stichopus cf. horrens recovered using SNP loci to be concordant with mitochondrial lineages and microsatellite genotype clusters, and further exhibit significant genetic differentiation resulting from limited gene flow. Additionally, this study will use SNP outlier analysis to examine the hypothesis that higher levels of genetic divergence will be detected in gene regions or loci related to reproduction (e.g., GRPs). Genome scans are now commonly used to identify potential outlier loci by examining molecular signatures of selection (Ravinet et al. 2017). These techniques involve assessing genetic divergence (e.g., based on distance metrics such as F ST or d _ xy _) or admixture proportions (Martin et al. 2015) across thousands to millions of loci to pinpoint regions of the genome where variation deviates from neutral divergence models.
A previous study reported the existence of two cryptic species and suggested an additional third species but with weak support due to small sample sizes and limited resolution from six microsatellite markers (Lizano et al. 2024). Here in this study, using high resolution SNP markers, we recovered three genotype clusters exhibiting limited gene flow, providing evidence of the existence of three cryptic species within Stichopus cf. horrens. In addition, we did not recover highly divergent SNP loci with significant homology to GRPs; instead, we found other proteins that are related to receptor signaling, neurotransmitter receptor activity, response to testosterone, peptide hormone processing, and sperm motility. Findings from this study will provide valuable insights into the possible drivers of reproductive isolation in broadcast spawning organisms occurring in sympatry.
Materials and Methods
2
Sample Collection and DNA Extraction
2.1
This study used Stichopus cf. horrens individuals collected from three sites in the Philippines where cryptic species have been reported and occur in sympatry (Lizano et al. 2024): Sta. Ana, Cagayan (STA) and two sites in the Davao Gulf: Mabini, Davao de Oro (MAB), and Pujada Bay, Davao Oriental (PUJ) (Table 1) (Figure 1). Genomic DNA was extracted from preserved tissues (body wall or papillae in 96% Ethanol or RNAShield) using a QIAGEN DNeasy Blood and Tissue kit. DNA concentration was quantified using a fluorometer, and quality assessed by agarose gel electrophoresis. Of the 96 samples used in this study, some samples have been previously genotyped at the mitochondrial COI region (n = 69) and at six microsatellite loci (n = 55) for assignment to mitochondrial lineage and genotype clusters (Lizano et al. 2024, Table S1).
Sampling locations for Stichopus cf. horrens. Samples were collected from three sites in the Philippines (left) where three putative pseudocryptic species were reported to be sympatric: Sta. Ana, Cagayan (STA) (top middle), Mabini, Davao de Oro (MAB), and Pujada Bay, Davao Oriental (PUJ) (bottom middle). Photographs of two cryptic species within S. cf. horrens, designated as Clade A and Clade B (Lizano et al. 2024) are shown (right). Maps were generated in R using the sf, naturalearth, and ggplot2 packages.
Double‐Digest Restriction Associated DNA Sequencing and Quality Filtering
2.2
Library preparation and sequencing were performed at the Genomics Core Lab, Texas A&M University Corpus Christi. Libraries were prepared following the double digest RADseq (ddRADseq) protocol adapted from Peterson et al. (2012) using restriction enzymes PaeI and TasI for digestion of genomic DNA. Individually barcoded libraries were pooled, size‐selected, and sequenced on one lane of Illumina HiSeq 4000 (paired‐end, 150 bp). Demultiplexed raw reads were quality assessed using FASTQC (Andrews 2010). Reads were initially filtered for quality using the “process_radtags.pl” pipeline in STACKS version 1.48 (Catchen et al. 2011). Individual reads with phred scores below 10 or with ambiguous barcodes were discarded. Quality filtered reads were then filtered for contaminants using BBMap and BBSplit (https://github.com/BioInfoTools/BBMap) by mapping against bacterial and viroid sequences.
De Novo Assembly and SNP Filtering
2.3
RADseq loci were de novo assembled using the “denovo_map.pl” pipeline in STACKS. Different parameters were explored and evaluated based on the number of generated loci and the depth of coverage per sample. The final parameters selected for this study that yielded a suitable number of loci and minimum depth of coverage (10×) were a minimum read depth to create a stack (−m = 3), number of mismatches allowed between loci within individuals (−M = 7), and number of mismatches allowed between loci within catalog (−n = 8). The parameters used followed recommendations by Paris et al. (2017) where −M should be high for populations with high divergence and −n is either = M, M−1 or M + 1.
After de novo mapping, the “populations” module of STACKS was used to filter loci. We used a relaxed filtering criterion, retaining SNPs shared by 60% percent of individuals across all populations (−r = 60), retaining only one SNP per locus (−write_single_snp). SNPs with a minor allele frequency (MAF) < 0.1 were excluded to reduce the number of false polymorphic loci due to sequencing error. The resulting SNPs were exported in GenePop format for further analysis. As missing data can influence genetic clustering and diversity estimates, we applied three different thresholds for missing data: 40%, 20%, and 10%, to compare for congruence across analyses. Filtering was performed using dartR v2.9.2 (Gruber et al. 2018) to exclude loci and individuals with missing data greater than the specified thresholds.
Genotype Cluster Analysis
2.4
To examine patterns of genetic clustering of SNP genotypes, we used two clustering approaches: multivariate analysis and model‐based assignment methods. Multivariate analysis was performed using a principal component analysis (PCA) implemented in dartR::gl.pcoa. A model‐based assignment method implemented in the software ADMIXTURE v1.3.0 (Alexander et al. 2009) was used to infer the number of genetic clusters K and estimate individual ancestry coefficients (q) using a maximum likelihood approach. The optimal K was identified based on the lowest cross‐validation error estimated across 10 independent runs (for K = 1 to 10). Individual ancestry coefficients (q) were then assessed at the optimal K with 100 independent runs. Results were examined using “pophelper” v2.3.1 (Francis 2017). Individuals were assigned to a SNP genotype cluster when q > 0.9950 corresponding to operationally “pure” individuals and identified as admixed if otherwise (Caniglia et al. 2020).
We also visualized genetic relationships among SNP genotypes by calculating a Euclidean distance matrix among individuals using the dartR::gl.dist.ind function and performing hierarchical clustering using the hclust function in the R package stats v4.3.1 (R Core Team 2023). A dendrogram was generated using dendextend v1.17.1 (Galili 2015).
To compare genotype cluster assignments between SNP and microsatellite (SSR) markers, published SSR genotype data (https://doi.org/10.5281/zenodo.8273196) was re‐analyzed for 55 samples genotyped at six SSR loci and with matching SNP genotypes. PCA performed using the R package adegenet v2.1.3 (Jombart 2008) revealed three emergent SSR genotype clusters (SSR‐1, SSR‐2, SSR‐3; Figure S1). A Bayesian model‐based assignment approach implemented in the software STRUCTURE v.2.3.4 (Pritchard et al. 2000) was then used to estimate individual ancestry coefficients (q), at K = 3 to assign individuals to genotype clusters revealed by PCA. Ten replicate MCMC simulations were performed for each K value using an admixture model with correlated allele frequencies. Each run was carried out for 1 × 10^6^ iterations with an initial burn‐in of 100,000 steps. Individuals were assigned to an SSR cluster based on a threshold value of q > = 0.9 (Vähä and Primmer 2006), while individuals having q values between 0.10 and 0.9 were categorized as admixed (SSR‐A; Table S1).
Genetic Diversity and Differentiation
2.5
Genetic differentiation estimators were calculated for SSR and SNP genotype clusters. For SSR genotypes, measures of genetic differentiation between genotype clusters were estimated using F ST (Weir and Cockerham 1984) and a standardized G ST (G' ST; Hedrick 2005) corrected for highly polymorphic SSR loci, calculated using the package “diveRsity” v.1.9.9 (Keenan et al. 2013). The significance of F ST and G' ST (null hypothesis of genetic homogeneity, H o: F ST = 0) was evaluated by estimating the bootstrapped 95% confidence interval (95% CI). For SNP genotypes, we used the dartR package to calculate Weir and Cockerham's F ST and associated p‐values.
Outlier Loci Analysis and Functional Annotation
2.6
To identify SNP loci underlying divergence between genotype clusters, outlier loci were identified using three approaches: BayeScan v2.1 (Foll and Gaggiotti 2008), Arlequin v3.5.2.1 (Excoffier and Lischer 2010), and pcadapt v4.3.5 (Luu et al. 2017). BayeScan and Arlequin are F ST‐based methods that identify loci that exhibit higher genetic differentiation among defined groups than expected under a neutral model (Ahrens et al. 2018). BayeScan analysis was performed using 20 pilot runs, each consisting of 5000 iterations, followed by 100,000 iterations with a burn‐in of 50,000 iterations. We used a posterior odds (PO) threshold > 10 and q‐value < 0.05 after correction for false discovery rate (FDR; Benjamini and Hochberg 1995). Arlequin, meanwhile, accounts for hierarchical genetic structure in parameter estimations, and outlier analysis was performed using 200,000 simulations and 100 demes with default expected heterozygosity settings. In contrast, pcadapt identifies outliers as loci that contribute significantly to population structure following a principal component analysis (Luu et al. 2017). We used an alpha value of 0.05 and used the package qvalue v2.1.12 (Storey et al. 2024) to calculate FDR corrected q‐values, setting a threshold of q < 0.05 for outlier loci identification. All loci identified by any of the three methods were considered as candidate outliers.
To characterize the outlier loci, contigs were queried against the NCBI nonredundant database using BLASTx v 2.15.0 (Altschul et al. 1990). The BLAST search was restricted to Echinodermata with an e‐value cutoff of 10^−5^. Gene ontology (GO) annotation terms of the outlier loci were retrieved from the Gene Ontology database (Gene Ontology Consortium 2023) through UniProt (UniProt Consortium 2023). GO terms were summarized using REVIGO (Supek et al. 2011), a clustering algorithm that relies on semantic similarity measures, and were clustered using the simRel score for functional similarity, allowing for redundancy in similar terms up to a value of 0.7 before removal. The resulting GO clusters were visualized using a tree dendrogram, where hierarchical relationships between terms were inferred based on their principal component (PC) similarity. Hierarchical clustering was performed using the hclust function from the R package stats v4.3.1 (R Core Team 2023) and the dendrogram was generated using the R package ape v5.8 (Paradis and Schliep 2019).
Results
3
ddRADseq Data Filtering and SNP Genotyping
3.1
A total of 644,788,232 raw reads consisting of forward reads (146 bp) and reverse reads (151 bp) were obtained from 96 individuals collected from three sites (STA, MAB and PUJ). Eleven of the 96 samples were excluded due to low read recovery accounting for only 0.13% (892,096) of the total reads. Quality control and contaminant filtering yielded 459,329,688 reads consisting of sequences from 85 samples: 40 from STA, 21 from MAB, and 24 from PUJ, with a total of 1,941,591 variable sites and 4,296,347 heterozygous SNPs identified among individuals. The depth of coverage per individual ranged from 9×–23× with an average read depth of 14x across all samples. A total of 9788 SNPs were identified from the remaining 85 individuals. Three individuals with a high percentage of missing data (> 60% missing data) were excluded before filtering was conducted at varying thresholds of missing data. Applying three filtering thresholds for maximum percentage of missing data across loci and individual genotypes yielded the following datasets (Table S2): 40% (9788 loci, 82 individuals), () 30% (5011 loci, 80 individuals) and () 10% (2410 loci, 75 individuals). Results from the 40% missing dataset are presented unless indicated otherwise, based on concordance of results across the three SNP filtering thresholds.
Genotype Cluster Analysis
3.2
Cluster analysis of SNP genotypes using various approaches consistently recovered three groups that broadly corresponded with mitochondrial lineage and microsatellite genotype clusters. ADMIXTURE analysis identified the optimal number of clusters at K = 3 across all datasets with varying missing data thresholds. The majority of the samples were assigned to one of three clusters, hereafter designated as clusters SA‐1, SA‐2, and SA‐3, based on a threshold of q > 0.9950. A small number of samples exhibited mixed ancestry (0.0050 < q < 0.9950) and were identified as admixed (SA‐A; Table 2, Figure 2a), with the number of admixed individuals varying across missing data thresholds (n = 4–7, Table S3). For admixed individuals, the highest q‐values ranged from q = 0.800 to q = 0.974. Assignment to clusters was broadly concordant across all missing data thresholds, except for three samples identified as parental or pure categories at 40% missing data but identified as admixed at 20% and 10% missing data (Figure S2; Table S3). All admixed individuals occurred in one population (STA) and have ancestry from clusters SA‐1 or SA‐3 (Figure 2a).
Cluster analysis of Stichopus cf. horrens SNP genotypes: (a) Barplot of individual ancestry coefficients (q) from ADMIXTURE analysis of 82 Stichopus cf. horrens samples genotyped at 9788 SNP loci. Each bar on the x‐axis represents an individual, the y‐axis is the proportion of ancestry in each of three identified clusters, K: SA‐1, SA‐2, SA‐3; (b) Principal component analysis of the same individuals showing three genetic lineages (SP‐1, SP‐2, and SP‐3). Each point represents an individual, with lineages identified by shape (mtDNA Clade) and color (SSR genotype clusters). Individuals that were not typed at mtDNA and SSRs (ND = no data) are indicated as crosses (+); (c) Dendrogram of the same individuals based on genetic distance. Lineages are colored according to ADMIXTURE assignments (colored bars beside the dendrogram), with admixed individuals colored yellow.
The PCA plot segregates samples into three distinct, non‐overlapping groups, designated as SP‐1, SP‐2, and SP‐3 (Figure 2b). Excluding admixed individuals (SA‐A), cluster assignments were congruent between PCA and ADMIXTURE, that is, SP‐1 = SA‐1, SP‐2 = SA‐2, and SP‐3 = SA‐3 (Table S4). The primary axis (PC1 = 28.64% of the total variance) separated the 82 samples into two groups broadly concordant with mitochondrial lineages: one group consists predominantly of Clade A individuals (SP‐1; n = 34 of 38 samples), while the second group consists exclusively of Clade B individuals (SP‐2, SP‐3; n = 27). The second PCA axis (PC2 = 10.7% of the total variance) further segregates Clade B individuals into two groups broadly corresponding to microsatellite genotype clusters (SSR Cluster), with SP‐2 consisting mostly of SSR Cluster 2 individuals (n = 11 of 18 individuals), and SP‐3 consisting mostly of SSR Cluster 3 individuals (n = 5 of 9 individuals).
Hierarchical clustering of pairwise individual genetic distances likewise reveals three clusters broadly concordant with PCA and ADMIXTURE assignments (Figure 2c). Clusters SA‐2 and SA‐3 are genetically more similar, with a smaller mean pairwise Euclidean distance (mean = 88.7, range = 78.9–97), than either SA‐2 or SA‐3 is to SA‐1 (SA‐1 vs. SA‐2, mean = 98.0, range = 82.6–110.0; SA‐1 vs. SA‐3, mean = 92.3, range 78.2–101). Genotype clusters SA‐1 and SA‐2 occurred in all three sites, with SA‐1 being more abundant than SA‐2 (48.7% and 29.2% of samples, respectively). Cluster SA‐3, accounting for 17% of the samples, was restricted to STA.
Genetic Differentiation
3.3
The three SNP clusters exhibit significant genetic differentiation. Overall F ST (global F ST = 0.458, p < 0.001, n = 78 excluding admixed individuals) and pairwise comparisons between SNP clusters indicate significant differentiation (F ST > 0): SA‐1 and SA‐2 (F ST = 0.510), SA‐1 and SA‐3 (F ST = 0.434), SA‐2 and SA‐3 (F ST = 0.423). Differentiation between clusters is maintained even in sympatry, that is, in PUJ (SA1 and SA‐2, F ST = 0.5057, p < 0.001) and STA (SA‐2 and SA‐3, F ST = 0.422, p < 0.001), and exhibits much higher F ST values compared to within‐cluster comparisons for different sites. Cluster SA‐1 PUJ and MAB populations and Cluster SA‐2 PUJ and STA populations exhibit much lower F ST values than sympatric clusters (F ST = 0.015 and F ST = 0.004; p < 0.001; Figure S3a), despite their geographic separation (200 km and 1400 km, respectively).
The three microsatellite clusters likewise exhibit genetic differentiation (overall F ST = 0.177, G' ST = 0.591). All pairwise F ST comparisons reject the null hypothesis of homogeneity (F ST > 0): SSR Cluster 1 and SSR Cluster 2 (F ST = 0.226), SSR Cluster 2 and SSR Cluster 3 (F ST = 0.035), SSR Cluster 1 and SSR Cluster 3 (F ST = 0.156). Differentiation between clusters is significant even in sympatry in MAB (SSR Cluster 1 and Cluster 2, F ST = 0.231, p < 0.001) and STA (Cluster 2 and Cluster 3 STA, F ST = 0.176, p < 0.001; Figure S3b). SSR Cluster 2 STA and MAB populations exhibit lower F ST values (F ST = 0.156) compared to sympatric lineages at these sites. SSR Cluster 1 populations in MAB and PUJ are not significantly differentiated (F ST = 0), in contrast with differentiation detected by SNP markers.
Comparison of Lineage Assignments Across Mitochondrial, Microsatellite, and SNP Markers
3.4
Comparing lineage assignments across multiple marker types reveals broad concordance of SNP clusters with mtDNA and SSR lineage assignments for 54 individuals with data across all three marker types (Table 2, Figure 3). While mtDNA lineages recover only two lineages, these broadly correspond with SNP clusters, that is, SA‐1 consists predominantly of Clade A (n = 32 of 35 SA‐1 individuals), while SA‐2 and SA‐3 both belong to Clade B (18 SA‐2 and SA‐3 individuals). Meanwhile, individual assignments based on PCA and STRUCTURE analysis of SSR genotypes at K = 3 groups generally correspond with SNP clusters (Table 2): SA‐1 consists mostly of SSR‐1 (n = 26 of 35 individuals), SA‐2 is mostly SSR‐2 (n = 8 of 11 individuals), and SA‐3 is predominantly SSR‐3 (n = 5 of 8 individuals).
Comparison of lineage assignment barplots for 54 Stichopus cf. horrens individuals for mtDNA, SSR (at K = 2 and K = 3), and SNP markers. Each individual is represented by a vertical bar where the proportion of ancestry (q) in a cluster is indicated by colors corresponding to mitochondrial lineages (Clade A, Clade B), microsatellite genotype clusters inferred from STRUCTURE analysis at K = 2 and K = 3, and SNP genotype clusters inferred from ADMIXTURE analysis at K = 3. The horizontal dashed line represents q‐value thresholds for identifying admixed individuals for SSR data (0.1 < q < 0.9).
Excluding one SNP genotype with mixed ancestry (SA‐A), discordant assignments were observed between mtDNA and SNP lineages in three of 53 samples (SA‐1 genotypes with Clade B lineage observed at two sites, MAB and PUJ), and between SSR and SNP lineages in 13 of 53 samples. The majority of the mismatched SSR cluster assignments were identified as admixed by SSRs, but not by SNPs (7 of 13 assignments), while the remaining mismatches were accounted for by SA‐1 identified as SSR Cluster 2 or SSR Cluster 3 (n = 4), and SA‐2 identified as SSR‐3 (n = 2) (Table 2, Figure 3).
SNP Outlier Analysis and Annotation
3.5
A total of 108 outlier SNP loci were identified by BayeScan with statistically significant patterns of genetic differentiation. In addition, a total of 384 outlier loci were detected by Arlequin, and 1162 outlier loci were detected by pcadapt (Figure S4). All 1507 loci detected using the three methods were considered candidate loci potentially under selection and were subjected to annotation. A total of 157 outlier loci were successfully annotated from the pool of outliers detected with an e‐value cutoff of 10^−5^ and were mostly mapped against * Apostichopus japonicus and Holothuria leucospilota *. Functional annotation revealed GO terms related to cell‐matrix adhesion, response to testosterone, neuromuscular process, transmembrane transport, peptide hormone binding, and receptor signaling (Figure 4).
The tree dendrogram represents the hierarchical clustering of GO terms ((a) Molecular function, (b) Biological process) based on their semantic similarity. GO terms were grouped into clusters, with nodes colored according to the number of terms in each cluster with representative GO terms displayed per cluster. Clustering was performed using the principal component values derived from dimensionality reduction of the semantic similarity matrix implemented in REVIGO.
Discussion
4
The present study provides genomic evidence of further cryptic species within Stichopus cf. horrens and confirms the existence of an additional cryptic species within the Clade B lineage of Stichopus cf. horrens, first reported by Lizano et al. (2024). Using high‐resolution SNP markers, we confirm the presence of three divergent lineages within Stichopus cf. horrens that maintain reproductive isolation despite occurring in sympatry. These SNP genotype lineages are broadly concordant with mitochondrial lineage and microsatellite genotype assignments. We also identify several outlier loci underlying genomic divergence which provide insight into putative genes contributing to the development and maintenance of reproductive barriers among sympatrically‐occurring cryptic lineages of Stichopus cf. horrens.
SNP Loci Reveal Reproductive Isolation Among Stichopus cf. horrens Lineages
4.1
Using double‐digest RAD sequencing to interrogate genetic divergence among previously described Stichopus cf. horrens lineages provided unparalleled resolution for identifying genetic clusters, with 9788 SNP loci recovering three distinct groups (SA‐1, SA‐2, and SA‐3). Individual assignments were consistent across various clustering approaches and were also relatively robust to the proportion of missing data. While a small proportion of individuals (3 of 82 genotypes) exhibited discordant assignments across three missing data thresholds, this involved identification of pure individuals as admixed or vice versa, and there were no misassignments of individuals between pure clusters.
Genetic differentiation of sympatric SNP clusters, coupled with broad concordance with mitochondrial and SSR lineages, provides further evidence for reproductive isolation. Genetic differentiation among S. cf. horrens SNP lineages (F ST = 0.458) is similar to values for other cryptic echinoderm species also delineated using SNP markers, such as Stronglyocentrotus (F ST range = 0.467–0.497, Addison and Kim 2018) and Ophioderma (F ST range = 0.191–0.472; Weber et al. 2017). SNP‐based F ST estimates are also higher than microsatellite‐based F ST values, indicating greater resolution of SNP loci at detecting genetic divergence (Morin et al. 2004). Previous studies suggest that SNP allele lineages are less prone to homoplasy than microsatellites (Coates et al. 2009), and in the case of this study, the small number of microsatellite markers likely accounts for weaker resolving power (Osborne et al. 2023). The low proportion of admixed SNP genotypes (4 of 82 samples) points to limited contemporary gene flow. Moreover, ancestry coefficients of admixed individuals (maximum q value range = 0.8–0.97) are consistent with older‐generation hybrids, and are not typical of recent‐generation hybrids, that is, F1 or F2 where q values are expected to be between 0.35–0.70 (Caniglia et al. 2020). Thus, these results strongly suggest that the three SNP lineages of Stichopus cf. horrens represent different species.
The apparent mitonuclear discordance observed in a few samples belonging to the SA‐1 genotype but belonging to Clade B instead of Clade A lineage may be due to two scenarios: incomplete lineage sorting or hybridization. Past hybridization is likely, particularly considering the observed admixture between SA‐1 and SA‐3 genotypes (associated with Clade A and Clade B, respectively), with four admixed individuals found in STA. However, the discordant mitonuclear combinations were detected in MAB and PUJ, and not STA, which may also reflect past hybridization between SA‐1 and SA‐2 genotypes. Further analysis with larger sample sizes is recommended to test hypotheses of historical gene flow pointing to past hybridization or introgression events.
Outlier Analysis Reveals SNP Loci Related to Reproductive Processes
4.2
Fertilization in broadcast‐spawning invertebrates progresses through a series of interactions between sperm and egg, including sperm chemotaxis, which is the activation and attraction of sperm mediated by chemoattractants released by the egg, acrosome reaction, penetration, and membrane fusion (Vacquier 1998). We detected outlier loci with putative functions related to the fertilization process in broadcast spawning invertebrates. Outlier loci detected had relevant BLAST hits with proteins encompassing different functions such as rhodopsin and tachykinin receptor signaling, neurotransmitter receptor activity, response to testosterone, peptide hormone processing, cell‐matrix adhesion, sperm head, and sperm motility. Most of the outlier loci detected have functions involved in regulating the reproductive process by acting as receptors or by regulating receptor activity. High differentiation at these regions suggests the potential role of these outlier loci in the formation of pre‐zygotic barriers among divergent lineages of Stichopus cf. horrens.
Several outliers identified in this study are involved in cell signaling, a key process in cellular development, and its basic machinery involves a receptor that perceives signals such as light, hormones, or neurotransmitters (Trewavas and Malho 1997). Many of these outliers mapped to G protein‐coupled receptors (GPCRs), known for their roles in chemosensory reception. Although GPCRs are functionally pleiotropic, previous studies have shown how they can negatively or positively regulate reproduction, i.e., cannabinoid receptors (Rhodopsin type GPCR) found on sea urchin sperm are receptors of endocannabinoids that prevent polyspermy by blocking the acrosome reaction (Chang et al. 1993; Schuel et al. 1994) or the binding of gonadotropic neuropeptides to GPCRs to induce spawning (Iwakoshi 1995; Kato et al. 2009; Ohtani et al. 2002; Fujiwara et al. 2010; Yamano et al. 2013). In addition, putative tachykinin‐like receptors are GPCRs found to stimulate sperm motility (Satake et al. 2013) and are also identified to be involved with hormone reception in zebrafish (Biran et al. 2012). The outlier loci detected in this study suggest the potential role of chemosensory signaling in spawning, and previous studies have already shown how hormonal reception influences spawning behavior in sea cucumbers (Hamel and Mercier 1999; Kato et al. 2009; Marquet et al. 2018).
Aggregation and synchrony in spawning play a huge part in the reproductive success of broadcast spawners by offsetting sperm dilution due to water currents (Pennington 1985; Levitan 2000) and is the mechanism by which sympatric Hawaiian limpet species have diverged (Bird et al. 2011). Stichopus cf. horrens is known to display aggregative behavior during spawning, and in situ spawning observations have shown that Clade A and Clade B individuals spawn at different times. Clade A individuals spawned 3–4 days before the new moon, between 22:00 and 02:00 h of the next day, while Clade B individuals spawned 1–4 days after the new moon, at an earlier time, between 19:00 and 23:00 h (Juinio‐Meñez et al. 2024). How chemosensory reception influences the synchrony of spawning in Stichopus cf. horrens is unknown. However, earlier studies may provide valuable evidence on how chemical communication affects aggregation and synchrony in the spawning of sea cucumbers. While photoperiod is a major trigger of gametogenesis in marine invertebrates (Pearse and Cameron 1991; McClintock and Watts 1990), chemical communication plays a key role in fine‐tuning these processes in broadcast spawning invertebrates (Hamel and Mercier 1995; Soong et al. 2005; Mercier and Hamel 2010; Marquet et al. 2018). Mercier and Hamel (2010) have shown that gametogenesis was highly asynchronous in Cucumaria frondosa individuals kept separately even under natural conditions (Hamel and Mercier 1995) and that the mucus secreted by C. frondosa was found to induce gametogenic synchrony (Hamel and Mercier 1999). Holothuria arguinensis male sea cucumbers were also found to release chemicals that induce aggregation and spawning (Marquet et al. 2018). Male‐first spawning behavior is common among echinoderms, and it is generally observed that male sea cucumbers are more likely to respond to environmental cues than females (Mercier and Hamel 2010). Therefore, one possible explanation would be that males are triggered to spawn in response to certain environmental variables such as lunar phase, which will subsequently synchronize spawning using biological cues such as pheromones. Stichopus cf. horrens males spawn first during a specific lunar phase, followed afterward by the spawning of females (Juinio‐Meñez et al. 2024). This is similar to the previous observation in the crown‐of‐thorns starfish where males are found to be more responsive to environmental cues and subsequently synchronize spawning by using pheromones to induce spawning in females and other males (Caballes and Pratchett 2017). Divergence in pheromone receptors may thus exhibit variation in response to cues and ultimately in synchronicity and spawning success. Currently, there is no direct evidence on how divergence in hormone receptors can form a reproductive barrier in sea cucumbers, but a previous study on brittle stars found compelling evidence of positive selection on sperm‐expressed genes that encode the sodium–proton exchanger (NHE) and the tetrameric potassium‐selective cyclic nucleotide‐gated channel (TetraKCNG). These two genes form part of the signal transduction cascade within the sperm in response to pheromones (Weber et al. 2017). Moreover, numerous other studies in chemosensory systems have likewise shown how chemosensory receptors can be key players in speciation (Smadja and Butlin 2009; Van Schooten et al. 2020).
In echinoderms, where visual capabilities are largely confined to photoreceptors (Millott 1976; Yamamoto and Yoshida 1978; Blevins and Johnsen 2004) and where predator avoidance, localization of food sources and species recognition are mostly mediated through chemical signaling (McClintock and Vernon 1990; Campbell et al. 2001; Morishita and Barreto 2011; Lawrence 2013), the role of chemical receptors in survival is undeniably important. Variation in receptors vital to reproduction may thus affect synchrony of broadcast spawning species, and temporal differences in gamete release are one of the primary mechanisms contributing to prezygotic reproductive isolation between closely related marine species in sympatry (Palumbi 1994). The widespread occurrence of synchronous spawning among marine organisms suggests that this trait is strongly favored by natural selection. However, among potentially hybridizing species in sympatry and where hybrids have lower fitness than parental lineages, selection may thus favor asynchrony in gamete release (Geyer and Palumbi 2003; Wolstenholme 2004; Fogarty et al. 2012; Monteiro et al. 2016). The evolution of pre‐zygotic isolating mechanisms to limit hybridization is crucial in maintaining species cohesion and may be key factors driving speciation in sympatry (Palumbi 1994; Gardner 1997; Fukami et al. 2003; Levitan et al. 2004; Coyne and Orr 2004). How divergence in putative hormone receptors can facilitate aggregation and spawning synchrony in sympatric species of Stichopus cf. horrens remains to be uncovered.
Implications for Species Identification
4.3
Mitochondrial and microsatellite markers can diagnose two lineages, Clade A‐Cluster1 and Clade B‐Cluster 2, which match gross morphological differences in papillae distribution and density, but not spicule morphology (Lizano et al. 2024) and are further characterized by asynchronous spawning (Juinio‐Meñez et al. 2024). Genome‐wide SNP markers provide improved resolution over mitochondrial and microsatellite markers, clearly recovering a third lineage, previously detected as a sub‐lineage within Stichopus cf. horrens Clade B‐Cluster 2. Interrogating genomic variation using SNP genotyping methods addresses challenges in species identification such as morphological similarity and hybridization, and is expected to be pivotal in increasing the discovery and reporting of cryptic species.
Apart from the genetic data reported here, there is limited information on the morphology, ecology, and reproductive biology of the third cryptic species. These results call for a more comprehensive reassessment of the morphology, genetic variation, and ecology of Stichopus cf. horrens across its distributional range. The SNP markers reported in this study represent novel genomic resources that can be leveraged to develop sequence‐based methods for species identification, which can accelerate species assessments and monitoring efforts to support broader‐scale studies on the biology, ecology, and taxonomy of this species complex.
Conclusions
5
Findings from this study provide strong evidence that the three genotype clusters of Stichopus cf. horrens recovered using genomic SNPs are reproductively isolated and represent cryptic species. The recovery of the third genotype cluster thus confirms the existence of the third species within the Clade B lineage. Further investigation into the morphology, ecology, and reproductive biology of the third cryptic species, as well as broader reassessments of Stichopus cf. horrens across its distribution range, are still needed. F _ ST _ outlier analysis from this study revealed a set of highly divergent SNP loci that are mapped to putative genes involved in reproductive processes, such as G‐protein coupled receptors (GPCRs). The detection of outlier SNPs in gene regions related to receptor signaling and hormone response, and where a growing body of evidence points to hormone receptors as key players in spawning synchrony (Hamel and Mercier 1999; Mercier and Hamel 2010; Marquet et al. 2018) suggests that these pathways may play a key role in the reproductive isolation in sympatric populations of Stichopus cf. horrens. Findings from the present work provide a basis for further examination of the role of divergence in hormone receptors in the formation of pre‐zygotic isolating mechanisms among sympatric individuals with external fertilization.
Author Contributions
Kenneth M. Kim: conceptualization (equal), data curation (lead), formal analysis (lead), investigation (lead), methodology (lead), writing – original draft (lead), writing – review and editing (lead). Apollo Marco D. Lizano: methodology (supporting), writing – review and editing (supporting). Robert J. Toonen: resources (supporting), writing – review and editing (supporting), writing – review and editing (supporting). Rachel Ravago‐Gotanco: conceptualization (equal), formal analysis (equal), funding acquisition (lead), project administration (equal), supervision (lead), writing – original draft (supporting), writing – review and editing (equal).
Ethics Statement
No ethical considerations to declare.
Conflicts of Interest
The authors declare no conflicts of interest.
Supporting information
Figures S1–S4.
Tables S1–S4.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Addison, J. A. , and J.‐H. Kim . 2018. “Cryptic Species Diversity and Reproductive Isolation Among Sympatric Lineages of Strongylocentrotus Sea Urchins in the Northwest Atlantic.” Facets 3, no. 1: 61–78. 10.1139/facets-2017-0081. · doi ↗
- 2Ahrens, C. W. , P. D. Rymer , A. Stow , et al. 2018. “The Search for Loci Under Selection: Trends, Biases and Progress.” Molecular Ecology 27, no. 6: 1342–1356. 10.1111/mec.14549.29524276 · doi ↗ · pubmed ↗
- 3Alexander, D. H. , J. Novembre , and K. Lange . 2009. “Fast Model‐Based Estimation of Ancestry in Unrelated Individuals.” Genome Research 19, no. 9: 1655–1664. 10.1101/gr.094052.109.19648217 PMC 2752134 · doi ↗ · pubmed ↗
- 4Altschul, S. F. , W. Gish , W. Miller , E. W. Myers , and D. J. Lipman . 1990. “Basic Local Alignment Search Tool.” Journal of Molecular Biology 215, no. 3: 403–410. 10.1016/S 0022-2836(05)80360-2.2231712 · doi ↗ · pubmed ↗
- 5Andrews, S. 2010. “FASTQC. A Quality Control Tool for High Throughput Sequence Data.”
- 6Benjamini, Y. , and Y. Hochberg . 1995. “Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing.” Journal of the Royal Statistical Society, Series B: Statistical Methodology 57, no. 1: 289–300. 10.1111/j.2517-6161.1995.tb 02031.x. · doi ↗
- 7Bickford, D. , D. J. Lohman , N. S. Sodhi , et al. 2007. “Cryptic Species as a Window on Diversity and Conservation.” Trends in Ecology & Evolution 22, no. 3: 148–155. 10.1016/j.tree.2006.11.004.17129636 · doi ↗ · pubmed ↗
- 8Bierne, N. , F. Bonhomme , and P. David . 2003. “Habitat Preference and the Marine‐Speciation Paradox.” Proceedings of the Royal Society of London. Series B: Biological Sciences 270, no. 1522: 1399–1406. 10.1098/rspb.2003.2404.PMC 169138012965032 · doi ↗ · pubmed ↗
