The Genomics of Convergent Adaptation to Intertidal Gravel Beaches in Mediterranean Clingfishes
Maximilian Wagner, Philipp Resl, Nadine Klar, Jonathan M Huie, Iliana Bista, Shane McCarthy, Michelle Smith, Richard Durbin, Stephan Koblmüller, Hannes Svardal

TL;DR
This study explores how clingfish species in the Mediterranean have evolved similar traits in response to gravel beach habitats, using genomics and morphology.
Contribution
The paper introduces a novel convergence score statistic and integrates multi-level genomic and morphological data to study convergent adaptation in clingfishes.
Findings
No large-scale genomic or protein-level convergence was detected.
Candidate regions at the level of single variants, genes, and pathways were identified, including a haplotype around the gene adam12.
The complex genetic basis of traits like eye size and skull shape may explain the lack of simple genomic convergence patterns.
Abstract
Understanding the genetic basis of widespread phenotypic convergence, particularly for complex morphological traits, remains a major challenge in evolutionary biology. The Mediterranean gravel beach clingfishes of the genus Gouania provide an excellent system to study this phenomenon. Within this genus, two distinct morphotypes, “slender” and “stout,” have repeatedly evolved, adapting to different microhabitats. These morphotypes differ in multiple complex traits, including body elongation, head compression, vertebral number, eye size, and the structure of the adhesive disc. First, to scrutinize phylogenetic convergence, we combined 3D morphometrics of the pelvic girdle and skull, with molecular species delimitation based on >660 DNA barcodes, and a phylogenomic framework based on more than 3,400 single-copy orthologs. Second, by employing whole-genome resequencing and a novel…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Fig. 1
Fig. 2
Fig. 3
Fig. 4- —Austrian Academy of Science
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsIchthyology and Marine Biology · Developmental Biology and Gene Regulation · Genetic diversity and population structure
Introduction
The independent evolution of similar phenotypic adaptations in nonsister lineages (convergent or parallel evolution) is one of the most intriguing biological phenomena and can help provide answers to fundamental questions such as the contingency of evolution and the molecular underpinnings of adaptation. Teleost fishes hold some of the most compelling examples of convergent evolution in the animal kingdom, eg in the context of marine–freshwater transitions (Thacker and Gkenas 2019; Magalhaes et al. 2021; Friedman et al. 2022), or with respect to adaptations to extreme environments (Wang and Guo 2019) such as caves (Powers et al. 2020), sulfuric springs (Tobler et al. 2011), high altitudes (Yang et al. 2021), or the deep sea (Shen et al. 2019). The accumulation of cases of convergent evolution in the context of adaptation to unstable environments is not a coincidence, but likely reflects the strength of selection and constraints of evolutionary outcomes under such extreme environmental conditions (Losos 2011; Xu et al. 2020).
Among the most demanding environments on the planet are the interstitial spaces of marine gravel beaches. The prevailing dynamic and hostile conditions, including tidal fluctuations, salinity changes, and especially intense mechanical forces, allow only highly adapted species to survive. Among the few vertebrates that have evolved to thrive in extreme interstitial gravel ecosystems are Japanese wormgobies (Luciogobius, Gobiidae) and the blunt-snouted clingfishes (Gouania, Gobiesocidae) endemic to the Mediterranean (Yamada et al. 2009; Wagner et al. 2019). Both gobies and clingfishes independently evolved a ventral sucking disc, but in clingfishes, the disc is also supported by the pelvic girdle, potentially enhancing evolutionary modularity and adaptive potential (Huie et al. 2022).
Compared with other species in their families, both genera have independently evolved an extreme worm-like and partly fossorial appearance, which is primarily driven by an increase in vertebral numbers and a reduction of fins and eye size (Briggs 1955; Yamada et al. 2009; Wagner et al. 2019). Furthermore, Mediterranean Gouania clingfishes and Japanese wormgobies have further diversified in these environments along an axis of body elongation, presumably adapting to distinct microhabitats (Yamada et al. 2009; Wagner et al. 2023).
Specifically, within Gouania, the five recently discovered species have been assigned to “stout” and “slender” morphotypes, with slender species having an increased number of vertebrae, a more compressed head shape, and smaller eyes, fins, and sucking discs compared with their stout counterparts (Wagner et al. 2019, 2021). While there is only a single stout to intermediate type in the western Mediterranean, separate pairs of stout and slender species co-exist in the Adriatic and eastern Mediterranean, respectively (Fig. 1a). Where sympatric, the two morphotypes occupy different microhabitats, with slender species more likely to be found in beach areas with smaller gravel sizes, compared with their stout congenerics, consistent with divergent adaptation to distinct interstitial space profiles (Wagner et al. 2023). A previous multigene phylogeny (Fig. 1b) suggested that the different morphotypes evolved repeatedly in different Gouania lineages (Wagner et al. 2019). However, evidence based on the analysis of whole-genome data is lacking.
Morphological convergence, distribution, and phylogenetic relationships of the blunt-snouted clingfishes (Gouania). (a) Currently known distribution ranges and sampling sites for diversity screening (see Supplementary Results/Discussion). The map has been adapted after Wagner et al. (2021). (b) Previous research (Wagner et al. 2019) suggested that Gouania morphotypes, slender and stout evolved convergently. (c and d) 3D geometric morphometric analysis of the (c) pelvic girdle and the (d) neurocranium clearly separate two morphotypes, slender and stout, along the first two principal components but generally not species within morphotypes.
Convergent evolution at the molecular level is a complex question and can unfold in various ways. First, since the genetic basis of complex phenotypes is most often polygenic, some underlying loci might be shared but not others (Cerca 2023). Second, qualitatively different levels of sharing of a molecular basis can be distinguished. At the lowest level, adaptive alleles can be shared between convergently evolving lineages because they appeared once in a shared common ancestor. Such shared mutations from a single origin—referred to as “identity by descent” (IBD) in population genetics or “collateral evolution” sensu (Stern 2013)—may be the result from introgression between lineages undergoing convergent evolution, or ancestral polymorphisms that were maintained over time and later independently selected in diverging species pairs. As an alternative to IBD, mutations could appear independently in different convergently evolving lineages, either affecting the same nucleotide positions, the same genes but different nucleotides, or different genes within the same or different molecular pathways (Elmer and Meyer 2011). Distinguishing between these different alternatives is challenging (Stern 2013; Alaei Kakhki et al. 2023), but critical for our understanding of the processes behind evolutionary convergence.
In this study, we combine 3D morphometrics, molecular species delimitation based on DNA barcodes, and genomics to scrutinize convergent evolution in Gouania and investigate its molecular underpinnings. We specifically focus on the different levels of molecular convergence, testing for signatures at the level of nucleotides, sequences, genes, and pathways.
Results/Discussion
Convergent Phenotypic Divergence Among Gouania Lineages Confirmed by 3D Geometric Morphometrics and Phylogenomics
A number of different terms have been employed to describe the evolution of similar phenotypic adaptations in nonsister lineages (Arendt and Reznick 2008; Stern 2013; Alaei Kakhki et al. 2023; Cerca 2023). The most prominent—convergent and parallel evolution—have variably been used to distinguish between different phylogenetic levels, molecular bases, and trajectories of phenotypic change (Arendt and Reznick 2008; Elmer and Meyer 2011; Stern 2013; Alaei Kakhki et al. 2023; Cerca 2023). In this study, we will not distinguish between the terms (following Arendt and Reznick 2008), but focus on questions about the (shared) molecular basis of phenotypic convergence.
To confirm the previously suggested phenotypic dichotomy of slender and stout morphotypes of Gouania (Wagner et al. 2019), we investigated 26 linear measurements (Fig. S1), as well as 3D shape differences of the pelvic girdle and the neurocranium (Fig. 1c and d) in a sample of 27 individuals of the five recognized species. Both the 3D geometric morphometric and the linear analyses clearly separated the five species into the two main morphotypes, slender and stout (Figs S1 and 1c and d). Overall, the first two principal components (PCs) explained 49.16% and 64.39% of shape variation for the pelvic girdle and the neurocranium, respectively. Shape differences of the pelvic girdle reflected changes in the width of the girdle with slender types showing an overall wider pelvis compared with stout morphotypes (Fig. 1c). Notably, Gouania willdenowi, the stout species not co-occurring in sympatry with a slender species, occupied a slightly more intermediate area of the morphospace (Fig. 1c). For the neurocranium, the first principal component mainly loaded on differences representing a narrower dorsolateral, but a wider dorsoventral shape in slender compared with stout morphotypes (Fig. 1d). Finally, since the adhesion capacity of clingfishes correlates with the area or size of the sucking disc (Huie et al. 2022), we also compared disc sizes among species and found them to be significantly smaller in slender morphs (Fig. S2).
Our morphological results nicely align with previous findings that show a more compact, but elongated body shape is preferred by slender Gouania inhabiting smaller interstitial spaces (Wagner et al. 2019, 2021, 2023). Respectively, the overall smaller but wider pelvic girdle in slender Gouania species might indicate adaptations to narrower interstitial spaces, where suction ability becomes less important due to smaller grain sizes compared with those occupied by stout Gouania species (Wagner et al. 2023).
The interpretation of morphological convergence in the genus might be flawed by overlooked (cryptic) diversity. To address this, we analyzed mitochondrial cytochrome-c-oxidase I (COI) DNA barcodes from 668 individuals of 23 populations of all described species across previously recorded sites (Fig. 1a). The results strongly support the validity of the applied five species taxonomic model for Gouania (Fig. S3; see Supplementary Results/Discussion).
Previous phylogenetic analysis based on nine nuclear loci and one mitochondrial marker suggested that slender–stout divergence happened at least two times in parallel in Gouania (Wagner et al. 2019). To scrutinize this result, we assembled draft genomes for four Gouania species and the shore clingfish (Lepadogaster lepadogaster), the presumed sister-lineage to Gouania (Conway et al. 2020), and used those genomes together with recent reference genomes of Gouania adriatica and the jeweled blenny, Salarias fasciatus (Rhie et al. 2021) in the phylogenomic analysis, based on 3,406 single-copy orthologous BUSCO genes (Tables S1 and S2).
Both a concatenated maximum likelihood (ML) phylogeny (Fig. 2a) and an ASTRAL species tree inferred from individual gene trees (Fig. S4) suggested that the slender Gouania hofrichteri is the sister group of the Adriatic and Eastern Mediterranean stout species, G. adriatica and Gouania orientalis, while the Western Mediterranean stout G. willdenowi and the Adriatic slender Gouania pigra form a sister group basal to those lineages. However, despite obtaining high bootstrap support (and quartet scores from the ASTRAL tree) for all nodes, we observed low site- and gene-wide concordance factor values, especially in deeper nodes close to the root (Fig. 2a). Furthermore, a neighbor-joining tree of pairwise differences showed different phylogenetic relationships, placing G. hofrichteri as the global outgroup to all other species (Figs S5 and 2b). Overall similar topological incongruences were observed in a previous study based on a concatenated tree using nine nuclear loci and one mitochondrial marker, which, only after inclusion of the mitochondrial marker, resolved G. hofrichteri as sister to all other Gouania species (Wagner et al. 2019).
Phylogenetic incongruence at the base is most likely explained by ancient gene-flow. (a) The “best” ML tree estimation from a concatenated dataset of 3,406 single-copy BUSCO genes. (b) ML-based constrained tree to the neighbor-joining topology (Fig. S5), revealed a slightly different topology with G. hofrichteri being the global outgroup to all other Gouania species, albeit with lower overall log-Likelihood values. For each node, the bootstrap support values, the gene- and site-concordance factors (ie the fraction of genes or single nucleotides that support a certain node) are given. The trees are further annotated according to slender (circles) and stout (triangles) morphotypes. (c) D-statistics (Dmin) show an excess allele sharing between different Gouania lineages (all values are significant after correction for multiple testing; Table S3). (d) The best-scoring phylogenetic network (restricted to one hybrid edge) calculated with PhyloNetworks based on all ML trees inferred from 3,406 BUSCO loci. The network shows a 38.3% introgression from the ancestor of G. hofrichteri into the ancestor of G. adriatica and G. orientalis.
Since low concordance factors and inconsistencies between tree building methods can be a sign of complex (network-like) phylogenetic relationships, we next investigated whether gene flow contributed to the evolution of Gouania. Indeed, ABBA-BABA tests of excess allele sharing revealed strong signals of network-like evolution, consistent with G. hofrichteri constituting an outgroup lineage to other Gouania species that had genetic exchange with an ancestor of G. adriatica and G. orientalis (Fig. 2c and Table S3). The directionality of such a gene flow event cannot be distinguished in ABBA-BABA tests, but a reconstruction based on phylonetworks confirmed a contribution from a lineage related to the ancestor of G. hofrichteri into the ancestor of G. adriatica and G. orientalis. We thus consider this the most likely evolutionary scenario (Fig. 2d). Importantly, all phylogenetic inference and gene flow results as well as phylogenetic hypothesis testing support at least two slender–stout transitions within Gouania and thus the presence of convergent evolution (Fig. S6 and Tables S3 and S4; see Supplementary Results/Discussion).
Given the unresolved phylogenetic relationships, the ancestral state (slender or stout) of the Gouania radiation remains uncertain. Throughout this study, we interpret the stout morphotype as the ancestral state based on ecological and evolutionary evidence. First, the immediate outgroup lineages to Gouania, including the sister genus Lepadogaster, are uniformly stout-bodied and occupy similar gravel (and partially intertidal) microhabitats (Briggs 1955; Conway et al. 2020). For example, G. willdenowi occurs in sympatry with L. lepadogaster across much of its distribution range and inhabits gravel as well as boulder fields (Hofrichter and Patzner 2000; Wagner et al. 2021). Second, the slender morphotype is characterized by several reductive traits, such as reduced eye size and fins. It is widely accepted that the reduction or loss of complex traits is more parsimonious than their complete re-evolution from a reduced state.
No Evidence for Accelerated Protein Evolution
To investigate the genomic basis of phenotypic convergence, we analyzed population level whole-genome sequencing data for 58 specimens (10 to 13 individuals per species). Our final variant call-set included 22 million biallelic variants that passed filtering criteria (19 million single nucleotide polymorphisms [SNPs], 3 million indels) with most variation observed in noncoding parts of the genome such as introns, and intergenic regions (Fig. S7). To test whether ecomorph divergence was accompanied by accelerated protein evolution, we computed the genome-wide neutrality index for all species comparisons (Stoletzki and Eyre-Walker 2011). Out of six slender–stout comparisons, two showed a neutrality index significantly smaller than 1.0 consistent with a genome-wide excess of positive selection. However, the neutrality indices for slender–stout comparisons were not significantly lower compared with within ecotype comparisons (Fig. S8; NI_TG_, P = 0.7542). Consistent with this, only a maximum 23 out of 22,687 genes showed significant evidence for positive selection having acted on them, with the largest numbers consistently found in comparisons that involved G. hofrichteri (Fig. S9).
No Signals of Convergent Evolution at Large Genomics Scales
To test whether larger genomic regions show signals of convergent evolution between slender–stout species pairs, we investigated summary statistics in genomic windows of 20,000 bp. The cluster separation score (CSS), a measure of convergent evolution (Jones et al. 2012; Miller et al. 2019), did not show any significant outliers. This suggests the absence of longer shared haplotypes between convergently evolving species. The presence of longer shared haplotypes would have been expected if convergence were due to relatively recent genetic introgression between species pairs.
Even without a recent shared origin, the same genomic regions could have responded to selection in the different slender–stout species pairs independently, thus leading to shared signals of genetic divergence across diverging ecomorphs. To test for this, we calculated net nucleotide divergence (Da) between slender–stout species pairs (Fig. S10) and investigated the overlap of signals of high divergence across comparisons. We got 1,998 and 2,058 outlier windows for the two pairwise comparisons, respectively. In total, 145 outlier windows, corresponding to 294 genes, were shared between the two comparisons, which is similar to what would be expected simply by chance (ie 117 windows at P > 0.05; Fig. S10), suggesting that genomic signals of slender–stout divergence are not shared at larger genomic scales.
Convergent phenotypes can arise through multiple molecular routes, ranging from shared variants at homologous genomic positions to independent changes in different genes that participate in the same functional pathways (Elmer and Meyer 2011; Stern 2013; Cerca 2023). Consequently, different analytical approaches are expected to capture complementary rather than identical signals of convergence. In the following sections, we therefore explore convergence at increasing molecular resolution, from genomic windows to individual variants and pathway-level patterns.
Single-Nucleotide Changes Point Towards Convergent Adaptation in Morphological and Neurological Developmental Processes
To investigate whether single genomic variants reflected convergent evolutionary patterns, we developed the “convergence score” (Fig. 3a), a statistic that measures whether allele frequency shifts at SNPs happened convergently (ie increased frequency of the same allele in same-morph comparisons, convergence score > 0) or divergently (increased frequency of the same allele in opposite-morph comparisons, ie divergence, convergence score < 0). Unlike window-based approaches, which are sensitive to shared haplotypes or extended regions of reduced divergence, single-variant analyses can detect convergent allele frequency shifts even when they occur on otherwise independent genetic backgrounds. These approaches therefore capture distinct aspects of molecular convergence operating at different evolutionary scales.
Homologous changes (snps and indels) and associated genes and functional enrichments based on convergent allelic frequency shifts (convergence score). (a) Convergence score values for each variant (snps and indels) along the 23 chromosomes of the Gouania genome. Negative values indicate divergent allele frequency shifts whereas positive values show convergent allele frequency shifts (ie consistent with morph assignment). Values higher than 0.25 and lower than −0.25 are highlighted in green and yellow, respectively (see the Materials and Methods section). Note that in the figure only the downstream names of genes are given. (b) Venn diagram of intersection between genes associated with outlier regions for negative and positive scores and (c) overlap between biological functional enrichment categories obtained for the unique 519 and 481 genes. (d) GO enrichment network showing the relationships among the 44 unique GO categories associated with positive changes. (e) Chromosome 15 details and highlights of the adam12 haplotype. (f) Genotype plot of all significant positive outliers in the adam12 region (green dots shown in (e); convergence scores ≥ 0.25). The missense mutation on adam12 exon 3 is highlighted in bold (position 10,988,596).
Overall, of the 22 million variants, 1.4% showed a nonzero convergence score, corresponding to allele frequency shifts in both ecomorph comparisons. Among those, there was a small but significant excess of convergent changes (156,640 vs. 154,608, binomial test P = 0.000272).
Among convergently fixed variants (convergence score = 1), there was a mutation on chromosome 2 upstream of slitrk6, a gene expressed early in the development of the nervous system of zebrafish and throughout adulthood, whereby expression has been mainly detected in the otic vesicle (Round et al. 2014). The gene has been found to impact retinal development and has been associated with myopia in mice and humans (Tekin et al. 2013). Convergently fixed changes also included an intron variant on chromosome 3 in the gene adgrg1, important in brain development in humans (Singh and Lin 2021), a variant on chromosome 4 upstream of the gene agbl4—a paralog of abgl5 that is a promising candidate to cause retinitis pigmentosa, a sickness that could lead to the progressive loss of vision up to blindness (Kastner et al. 2015)—and an intergenic variant on chromosome 16 between mbpb and znf236.
To identify further candidate regions, we focused on the variants with convergence scores of at least 0.25 in absolute value, corresponding to a total allele frequency shift of at least 0.5 (Fig. 3a). The resulting 994 outlier sites again showed a small but significant excess of convergent changes (535 vs. 459, binomial P = 0.01732). Associated with these outliers, we detected 519 and 481 genes that only showed positive and negative outliers, respectively (Fig. 3b). For the 519 genes associated with positive convergence scores, we found 44 uniquely enriched gene ontology (GO) terms (ie significantly associated with genes of positive convergence score outliers but not with negative ones; Fig. 3c). These included neurological terms such as “regulation and development of the central nervous system”, “cell- or synaptic-signalling” and “embryonic camera-type-eye formation”, but also one larger term representing “T-cell differentiation” (Fig. 3d and Table S5).
The appearance of neurological and visual-system-related development candidate genes as well as the enrichment of associated gene ontologies (eg “embryonic camera-type-eye formation”) among convergence outliers is not unexpected given the significant differences in head and eye sizes between slender and stout morphotypes (Wagner et al. 2019, 2021). Compared with stout species, slender Gouania, which inhabit narrower and darker interstitial spaces, encounter a lower light environment and relatively stronger mechanical forces (Wagner et al. 2023). Consequently, convergently evolved retardations of eye development in slender species might follow a similar trajectory as eyesight loss in cave fishes compared with their surface-dwelling relatives (Jeffery 2005). Such changes might constitute an energy-saving mechanism, but might also be an adaptation to reduce risk of mechanical injury, as open wounds in fishes quickly become infected by fungi or bacteria. Indeed, throughout the sampling period, we often found adult individuals that lost one or even both eyes (Fig. S11).
In line with this, it is worth noting that Gouania (and potentially also other clingfishes) lack immunoglobulin genes, which are essential for the adaptive immune response in other organisms (Mirete-Bachiller et al. 2021). The engagement of T-cells could reflect an alternative immune response to the immunological challenges faced by Gouania and could reflect prevailing pathogen pressures in divergent microhabitats where slender and stout morphotypes coexist. Consistent with this interpretation, another convergently fixed variant corresponds to a premature stop codon on il15ra chromosome 6, a gene that plays a central role in T-cell-mediated immune responses in fishes (Jiang et al. 2024).
Nonetheless, the predominance of noncoding variants among convergently fixed and convergence-outlier sites suggests an important role for regulatory convergence in Gouania, although this enrichment may partly reflect the larger mutational target size of noncoding regions.
Extended Region of Convergent Evolution Around adam12
Consistent with the absence of large windows of genomic convergence seen above, genetic variants with high convergence scores generally did not cluster in specific genomic regions (Fig. 3a). However, a notable exception was a peak of several convergent variants on chromosome 15, which were mainly within and downstream of the gene adam12 (Fig. 3e). This gene encodes a transmembrane protein with various functions such as cell-to-cell interactions, cell adhesion, and intracellular signaling (Kveiborg et al. 2008). Most of the variation in this region (again) corresponded to noncoding nucleotides (Fig. S12) but we found a single missense mutation in the third exon of adam12 (chromosome 15: 10,988,596 or adam12 exon 3: 43 AA position), which affected the two slender morphs (Fig. S13). In this exon, we detected two and one more private missense mutations for the slender G. pigra and G. hofrichteri, respectively (Fig. S13). Additionally, the region also included several noncoding RNAs (lncRNA) downstream of adam12 and a gene upstream of adam12 (ENSGWIG00000018693; adam12-like) that was annotated as lncRNA but showed high sequence similarity to adam12 (Fig. 3e) consistent with a relatively complex evolutionary history of these genomic regions that involves gene duplication. However, we found no substantial difference in relative sequencing coverage among species in this region, suggesting that the observed signal is not due to copy number variation (Fig. S14).
The striking accumulation of convergently evolved genetic variants (convergence score ≥ 0.25) in the adam12 region makes it unlikely that convergent changes evolved independently among species pairs, but rather indicates common ancestry of the mutations present in species of the same morphotype (Fig. 3f). However, the fact that haplotypes are overall still relatively divergent (Fig. S15) speaks against recent transfer through introgression, but rather suggests a relatively old shared origin, potentially accompanied by some form of recombination suppression that helped to link variants together.
In zebrafish, adam12 plays a role in juvenile growth, particularly by controlling genes affecting bone and cartilage development. Adam12-deficient fish exhibit reduced body growth without major developmental defects (Tokumasu et al. 2016). In cichlid fishes, this gene is linked to soft tissue development involved in hypertrophy and craniofacial development, including parts of the neurocranium (Conith et al. 2018, 2024). Furthermore, adam12-deficient zebrafish exhibit smaller ligaments compared with wild types and show truncated coronoid processes of the mandible (Conith et al. 2018). Overall, this makes adam12 a strong candidate gene for convergent adaptation of Gouania morphotypes.
Nonhomologous Fixed Changes Associated With Developmental Processes
The convergence score investigated above does not capture nonhomologous molecular convergence, that is, changes leading to convergent phenotypic adaptation that affect different parts of the same gene or different genes in the same pathway (Elmer and Meyer 2011). To investigate the potential presence of such changes, we identified genes with independently fixed genetic variants (ie sites corresponding to a fixed difference between one species and all others) and tested for overlap within morphotypes in terms of the gene content and function (Fig. 4). We focused on one pair each of convergently evolved slender and stout species (ie excluding G. adriatica from the analysis), but an analysis including all species yielded similar results (Fig. S16; see Supplementary Results/Discussion).
Nonhomologous fixed variation involved in convergent evolution of the genus Gouania. (a) Number of sites where one allele is fixed in the given species but absent in all others, as well as the associated number of genes. (b) Venn diagram of number of genes with differentially fixed mutations. Highlighted are the 42 and 1,850 genes associated with differentially fixed mutations that overlap between the stout (G. willdenowi and G. orientalis) and slender (G. hofrichteri and G. pigra) morphotypes, respectively. (c) After conducting a GO enrichment analysis of random comparisons of gene sets of morphotypes, only 31 unique GO terms remain for convergent comparisons. (d) Network showing the 31 GO terms unique to convergent patterns.
The number of differentially fixed variants and corresponding genes (“differential fixation candidate genes”) were substantially different between species (Fig. 4a), with G. hofrichteri showing by far the largest number of fixed differences consistent with its inferred basal phylogenetic position (Figs 2a and 4a). We found that 42 genes showed fixed differences in both stout morphotypes but in neither of the slender morphs, while 1,808 genes showed the opposite pattern (Fig. 4b). The larger number in the latter category can be explained by the overall larger number of genes identified in G. hofrichteri. Overall, there was no significant over-representation of genes shared between within-morphotype comparisons compared with between morphotype comparison (permutation; P = 0.33).
For the total 1,850 morphotype-specific differential fixation candidate genes (1,808 slender + 42 stout), we detected 59 significantly enriched biological process categories, 31 of which were not found in any permutation of slender and stout labels (see the Materials and Methods section; Fig. 4c). Several of these unique GO terms involved embryonic organ and skeletal development (eg “skeletal system development, embryonic cranial skeleton morphogenesis”) (Fig. 4d and Table S6).
Finally, to scrutinize that the identified candidate genes do not just correspond to genes under strong purifying selection, which are expected to show an accelerated fixation of linked neutral variants, we investigated their Direction of Selection (DoS) scores, but found no excess of purifying selection compared with the genomic background (Fig. S17, also when including G. adriatica, Fig. S18).
Conclusion: The Importance of Considering Genomic Convergence at Different Molecular Levels
Convergently evolved morphological traits often correspond to highly complex biological structures with a complex genetic basis expected to be highly polygenic (Cakan et al. 2012), involving gene interactions (epistasis) (Conith et al. 2018) and pleiotropic effects (Morris et al. 2019; Rennison and Peichel 2022). This is also expected for convergent traits in Gouania, such as the shape of the neurocranium or other complex morphological features, such as the number of vertebrae.
Our analysis yielded evidence supporting convergent evolution at the level of single nucleotides, with identified genes strongly enriched for relevant developmental biological pathways involved in slender and stout morphotypes of Gouania. By contrast, window-based analyses such as CSS—an established measure of parallel evolution—did not reveal any signatures of genomic convergence. This statistic, originally developed to study convergence in recently diverged stickleback populations, likely failed in our study, because of the relatively old age of the Gouania radiation (ie several million years vs. thousands of years) and concomitant old signals of genetic admixture (Marques et al. 2019; Wagner et al. 2019). This means that recombination had sufficient time in Gouania to unlink adaptive alleles inherited from a common ancestor from their genetic background.
Although we cannot conclusively address what proportion of convergently evolved single nucleotide variants are identical by descent (as opposed to corresponding to recurrent mutations), at least the signals of convergent evolution around adam12 are not expected to have arisen independently. They rather point toward a single mutational origin of this putatively adaptive allele which further underlines the importance of old genetic variation in ecological diversification (Marques et al. 2019).
Finally, although homologous and nonhomologous approaches highlighted partly distinct candidate gene sets, both consistently implicate developmental and regulatory processes relevant to morphological differentiation in the Gouania radiation. This concordance at the functional level strengthens the inference of convergent molecular mechanisms underlying the slender and stout morphotypes and highlights the importance of considering layered genomic convergence, which may reflect different underlying processes and timescales of convergent evolution.
Materials and Methods
Diversity Screening (DNA-barcoding Analyses) and Species Delimitation Methods
To assess the geographic variability and genetic diversity in the Gouania genus, we analyzed a total set of 668 DNA barcodes (partial sequences of the mitochondrial cytochrome-c-oxidase subunit 1 [COI] gene) from 23 populations. A total list of all barcodes downloaded for this study (Gouania spp.: N = 122 from (Wagner et al. 2017, 2019); Lepadogaster ssp.: N = 11 from (Wagner et al. 2017) and newly created barcodes [N = 544, GenBank accession numbers OL839338 to OL944591] can be found in Table S7). We generated the DNA barcodes following the procedures and protocols outlined in (Wagner et al. 2019) using the primers CO1_Fish_F1: 5' TCAACCAACCACAAAGACATTGGCAC 3' and CO1_Fish_R1: 5' TAGACTTCTGGGTGGCCAAAGAATCA 3' (Ward et al. 2005).
All sequences were aligned using MUSCLE (Edgar 2004) in MEGA v7.0 (Kumar et al. 2016) and we tested for the best fitting substitution model according to BIC (K2 + G). Then we conducted a phylogenetic analysis based on three datasets including all specimens (I), collapsed identical haplotypes (II), and species-specific datasets (III). For some of the analyses that require outgroups, we used sequences of the genus Lepadogaster downloaded from GenBank. We used the collapsed dataset mainly for reducing the computational load for the Bayesian barcoding clustering analyses (see below). Then we calculated an ML tree in IQtree v2.1.4-beta (Minh et al. 2020) using the automatic evolutionary model search function (Kalyaanamoorthy et al. 2017) and applying a resampling of 1,000 ultrafast bootstraps (Hoang et al. 2018). Additionally, we calculated a maximum clade credibility tree (MCC tree) for the reduced dataset in BEAST v2.6.6 (Bouckaert et al. 2019), using a Birth-Death prior, applying the best model inferred in MEGA v7.0 (Kumar et al. 2016: 201) and a uniformly distributed substitution rate (0.0323 [0.022, 0.0424]; obtained from Conway et al. (2017) for COI; see Wagner et al. (2019). The analysis was run for 200 million generations, with log- and tree-files stored every 20,000 generations. Finally, we assessed the convergence of the MCMC runs in Tracer v1.7.1 (Rambaut et al. 2014) and visualized the trees in FigTree v1.4.4 (available at http://tree.bio.ed.ac.uk/software/figtree/).
To assess species richness within the genus, we employed a variety of single locus species-delimitation methods. In the first step, we applied the distance-based DNA clustering algorithms Assemble Species by Automatic Partitioning (ASAP; Puillandre et al. 2021) and Automatic Barcode Gap Discovery (ABGD; Puillandre et al. 2012) by applying the Kimura-2-parameter model. We selected two thresholds 0.011 and 0.063 for ABGD because they represent the highest intraspecific divergence values in two plateau phases in the estimated number of groups (Fig. S19). On the reduced haplotype dataset (II), we calculated the Bayesian Poisson Tree Processes (bPTP) model (Zhang et al. 2013), using specimens of Lepadogaster as an outgroup. We discarded a burn-in fraction of 0.25, ran the analysis for 5,000,000 MCMC iterations using a sampling frequency of 100, and visualized the values obtained from the heuristic and ML searches. Furthermore, we applied the Mixed Yule Coalescent (GMYC) model (Fujisawa and Barraclough 2013) on the collapsed MCC tree using the “gmyc” function from the SPecies LImits by Threshold Statistics (splits) library in R.
Classical and 3D Geometric Morphometrics
Based on 26 linear measurements obtained from (Wagner et al. 2021), we conducted a Principal Coordinates Analysis (PCoA) using the ape package (Paradis and Schliep 2019) based on a dissimilarity matrix using bray-curtis distance indices from the vegan package (Dixon 2003) in R v4.2.1 (R Core Team 2021). The individual measurements were standardized by standard length. Additionally, we used 3D geometric morphometrics to compare cranial and pelvic girdle morphology across Gouania. Micro-CT scans representing the five Gouania species (n = 5 to 6 individuals per species) were obtained from a previous study (Wagner et al. 2021) and MorphoSource.org. We generated skeletal models of the neurocrania and basipterygia and placed landmarks on them using the SlicerMorph toolkit in 3D Slicer (Kikinis et al. 2014; Rolfe et al. 2021). Thirty-three fixed landmarks were placed on each neurocranium, and 18 fixed landmarks and 6 semi-landmarks on each basipterygium. We performed a Generalized Procrustes Superimposition on each of the landmark datasets in R with the “geomorph” package (Adams and Otárola-Castillo 2013). To visualize the major axes of shape change, we conducted principal component analyses and plotted shape variation in morphospace. We warped representative neurocranial and pelvic girdle meshes to estimate shapes associated with extreme ends of the morphospace using the “Morpho” R package (Schlager 2017).
Whole-Genome Resequencing, Draft-genome Assemblies, and Variant Calling/Filtering
We re-sequenced whole genomes of a total of 59 fish for this study based on Illumina short read 150 bp paired-end data. The obtained sequencing coverage ranged from 5 to 30× and we aimed for a balanced sample for each of the five Gouania species and morphotypes, “slender” and “stout” (ie 10 to 13 individuals per species). Additionally, we re-sequenced one specimen representing an outgroup of the genus, Lepadogaster. All raw data are stored at European Nucleotide Archive (ENA) under the project accession number PRJEB49819 (Table S8).
In addition to the existing Gouania reference genome (derived from a G. adriatica individual), we assembled draft genomes from our short-read data for the other four Gouania species and the outgroup species L. lepadogaster de novo. For the genome assemblies, we initially filtered raw reads with trimmomatic v0.39 (Bolger et al. 2014). Read quality for raw and filtered reads was assessed with FastQC v0.11.9 (Anders 2010) and summarized using MultiQC v1.9 (Ewels et al. 2016). Next, we used DISCOVAR de novo (Weisenfeld et al. 2014), SOAPdenovo v2.40 (Luo et al. 2012), and ABySS v2.0.2 (Simpson et al. 2009) to assemble filtered reads into contigs. Since ABySS requires the kmer size as input, we assessed the optimal size using kmergenie 1.70 (Chikhi and Medvedev 2014) along a range of kmer sizes (21 to 121) with a step-size of 6.0. For each “best” obtained kmer size, we added and subtracted a value of 4 and assembled contigs for each specimen using all three kmer sizes (referred to as “optimal k” = best k, “lower k” = best k—4, and “upper k” = best k + 4) in ABySS. We assembled for each individual independently contigs in DISCOVAR de novo and SOAPdenovo. We assessed the quality and contiguity of the contigs in quast v5.0.2 (Gurevich et al. 2013). Contigs assembled from the above-mentioned three input kmer sizes were filtered according to highest N50 values (Table S1). Finally, the selected draft assemblies were scaffolded using a reference-guided approach in RagTag v1.0.1 (Alonge et al. 2022) using the Gouania v. fGouWil2.1 (Rhie et al. 2021) reference genome as backbone. For assessing the “best” scaffolded assemblies, we ran BUSCO v4.1.3 (Waterhouse et al. 2018) using the set actinopterygii_odb10. We selected draft genomes with (first) the highest number of single-copy BUSCO genes, (second) highest N50, and (third) lowest numbers of N and for the subsequent phylogenetic analyses (Table S2).
We then aligned raw reads from re-sequencing data of 59 fishes (10 to 13 individuals per species) against fGouWil2.1 (Rhie et al. 2021) using BWA v0.7.17 (Li and Durbin 2009) to create a call-set for further genomic exploration. After inspection of the mapping quality using samtools flagstat, we obtained variant sites using bcftools v1.14 (Danecek et al. 2021). We filtered/masked sites which had a mapping quality of zero for more than 10% of reads mapping to that site or which showed an overall mapping quality of <50. Additionally, we masked sites where the mapping quality between the two strands was significantly different (P < 0.001) and sites that showed extremely high (*>*97.5 percentile) or low (<2.5 percentile) depth. We filtered heterozygous sites which had a significantly biased depth of reference and alternative alleles by means of binomial test (PHRED score >20), which represents between 0.02% and 0.15% of heterozygous sites per individual. Furthermore, we masked sites showing InbreedingCoeff < 0.2 (ie excess heterozygosity) and sites with more than 20% of missing genotypes. For the subsequent investigation, only biallelic SNPs or indels were considered leading to a total of 22,348,287 sites across the 23 chromosomes. Finally, we annotated the single variants and their effects using SnpEff (Cingolani et al. 2012b) and the gene annotation version fGouWil2.1.99.
Phylogenomic Analyses
For inferring the phylogenetic relationships within the genus Gouania we used the phylogenomic pipeline phylociraptor (commit #6a6c0eb) (Resl and Hahn 2023). Phylociraptor extracts BUSCO genes from (draft) genomes and calculates a concatenated ML tree in IQtree v2.0.7 (Minh et al. 2020), a neighbor-joining tree and a species tree in Astral v5.7.1 (Zhang et al. 2018) based on the corresponding set of single-copy BUSCO genes. In addition to our draft assemblies above, representing the species G. willdenowi, G. pigra, G. hofrichteri, G. orientalis, and L. lepadogaster, we downloaded the latest reference genomes of G. adriatica (fGouWil2.2) and another outgroup of the family of Blenniidae, S. fasciatus (fSalaFa1.1), from NCBI (Rhie et al. 2021). Orthologous genes were identified with phylociraptor using BUSCO v3.0.2 and the gene set actinopterygii_odb9. We filtered out multicopy orthologs, orthologs with missing data in more than one individual (minsp = 6) and less than 11 parsimony sites to retain only phylogenetically informative single-copy orthologs. Alignment and trimming of the individual sequences for each locus was done using MAFFT v7.464 (Katoh and Standley 2013) and trimAl v1.4.1 (Capella-Gutiérrez et al. 2009). We performed phylogenomic inference on the remaining trimmed alignments of 3,406 loci. For each locus, we applied an automatic search for the best substitution model in IQtree (Kalyaanamoorthy et al. 2017) with 1,000 ultrafast bootstrap resampling replicates (Hoang et al. 2018).
From the concatenated data, we inferred an ML tree in IQtree with 1,000 ultrafast bootstrap replicates and a species tree in Astral using the default parameters in phylociraptor. Furthermore, for the tree based on concatenated data, we computed site and gene wide concordance factors (Mo et al. 2023). We then tested three different constrained tree topologies: (i) a topology without convergence (ie two main clusters comprised of slender and stout morphotypes); (ii) G. willdenowi as sister to all other Gouania species (as a scenario explaining a refilling from the Mediterranean after the Messinian salinity crisis); and (iii) G. hofrichteri as the global outgroup (ie the neighbor-joining topology). We compared them with an approximately unbiased test of phylogenetic tree selection (Shimodaira 2002) using 10,000 RELL replications (Kishino et al. 1990) in IQtree. We then calculated the gene-wise phylogenetic signal (dGLS) (Shen et al. 2017) to assess the contribution of single loci to the overall phylogeny.
Finally, to test for potential introgression events, we constructed an explicit phylogenetic network from all trees inferred for all 3,406 loci with the SNaQ (Species Networks applying Quartets) method (Solís-Lemus and Ané 2016). We used the Phylonetworks v.0.14 (Solís-Lemus et al. 2017), implemented in the Julia package using the code and procedures following (Lescroart et al. 2023).
Window and Site-wise Investigation for Searching Candidate Genomic Regions
We conducted several analyses in fixed-length windows along the genome to detect regions under convergent differentiation between groups. For the window-based analyses, we excluded indels, but not for the site-based analyses (see below).
In a first step, we used the “popgenWindows.py” script from https://github.com/simonhmartin/genomics_general to estimate the genetic diversity (pi) of each of the five Gouania species and to calculate average net between-group divergence (Da):
where D_xy_ is the average genetic distance between two species and π_A_ and π_B_ represent the intraspecific diversity within each species A and B. We calculated D_a_ in sliding windows of 20,000 bp and a step size of 10,000 bp for each of the 23 chromosomes for all species combinations. Furthermore, we calculated the CSS (Jones et al. 2012; Miller et al. 2019) between the slender and stout Gouania morphotypes, mainly following the procedures described in De-Kayne et al. (2022) using a window size of 20,000. For D_a_ in windows, we considered as outliers the upper five percent of all data points and for CSS a permutation test was applied as in De-Kayne et al. (2022). We then extracted all gene information within the window boundaries of outlier coordinates using SnpSift (Cingolani et al. 2012a).
Site-wise Investigation for Searching Candidate Genomic Regions
We assessed the extent of convergent evolution on single sites. We first estimated the allele frequency at each site for each of the five Gouania species using bcftools. Then, we developed an allele frequency-based statistic to assess the extent of convergent evolution for two sets of groups (in our case “slender” and “stout”) belonging to four species using the formula below:
AF corresponds to the allele frequency and A and B to the two convergent slender and stout species pairs. In our concrete case, we tested the species G. hofrichteri (slenderA) and G. orientalis (stoutA) as well as G. pigra (slenderB) and G. willdenowi (stoutB). We chose this species combination because G. orientalis is the sister species of G. adriatica and therefore would not add independent genetic information in terms of convergent evolution in the genus. We considered sites with convergence scores <−0.25 or >0.25 as outlier since they correspond to an average allele frequency change of 0.5 between the morphotypes.
Since we found an interesting region with many positive outliers in and around the gene adam12 on chromosome 15 (position 10,900,000 to 11,111,000), we further investigated this region in greater detail. We also extracted an IUPAC-coded multiple sequence alignment from vcf files of exon 3 of adam12 and visualized the translated sequence based on the reverse strand (the direction of expression of adam12) as sequence logos using WebLogo3 (Crooks et al. 2004).
Then, to test for convergent fixed mutations identical by state, we checked for alleles that were fixed in either stout or slender morphotypes but not between any stout or slender. Specifically, we checked for SNPs or indels that are fixed for one allele in either of the Gouania species but fixed for the other allele in all other species (fixed differences). We then extracted a list of genes associated with these variants from the SnpEff annotation and removed duplicated gene names. Next, we checked for candidate genes using the formulas below (note that the all species comparison is only available in the Supplementary Results/Discussion).
A two-species comparison (excluding G. adriatica):
and B all-species comparison:
Since fixed differences might reflect regions under strong purifying and not divergent selection, due to a lower effective population size and increased drift in these regions, we did a McDonald–Kreitman test (McDonald and Kreitman 1991) based on the SnpEff annotation. Because mean NI values can be biased, we calculated genome-wide NI_TG_ and to test whether NI_TG_ values significantly differed from 1 (the expectation under neutrality), we performed a parametric bootstrap with 1,000 iterations. We then calculated the DoS for single genes following (Stoletzki and Eyre-Walker 2011) and applied a Fisher's exact test to determine significant positive selection (ie NI < 1) from all available genes and species pairs.
To assess potential functional convergence, we performed GO enrichment analyses using g:Profiler (Reimand et al. 2007) on all gene sets identified in the convergence score and fixed allele analyses. All the GO analyses were run in the same way using the candidate gene lists as input, zebrafish as organism, a custom background gene list (ie all genes obtained from the SnpEff annotation) and applying a significance threshold of 0.05 based on Benjamini–Hochberg FDR (Benjamini and Hochberg 1995).
For the candidate genes obtained from the fixed AF differences, we furthermore randomly reshuffled the formulas (3 and 6) above and conducted an independent GO analyses for each obtained gene set. Then we checked for overlaps and unique GO terms of the random sets and the obtained biologically meaningful (slender vs. stout) comparisons. Finally, we also checked for genes enriched for certain transcription factor binding sites from the input gene sets in g:Profiler following the same procedure described above for the GO analysis.
Supplementary Material
evag031_Supplementary_Data
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Adams DC, Otárola-Castillo E. Geomorph: an r package for the collection and analysis of geometric morphometric shape data. Methods Ecol Evol. 2013:4:393–399. 10.1111/2041-210X.12035. · doi ↗
- 2Alaei Kakhki N, et al A phylogenomic assessment of processes underpinning convergent evolution in open-habitat chats. Mol Biol Evol. 2023:40:msac 278. 10.1093/molbev/msac 278.36578177 PMC 10161543 · doi ↗ · pubmed ↗
- 3Alonge M, et al Automated assembly scaffolding using Rag Tag elevates a new tomato system for high-throughput genome editing. Genome Biol. 2022:23:258. 10.1186/s 13059-022-02823-7.36522651 PMC 9753292 · doi ↗ · pubmed ↗
- 4Anders S . Babraham bioinformatics—Fast QC a quality control tool for high throughput sequence data. Soil 2010:5. http://www.bioinformatics.babraham.ac.uk/projects/.
- 5Arendt J, Reznick D. Convergence and parallelism reconsidered: what have we learned about the genetics of adaptation? Trends Ecol Evol. 2008:23:26–32. 10.1016/j.tree.2007.09.011.18022278 · doi ↗ · pubmed ↗
- 6Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B Methodol. 1995:57:289–300. 10.1111/j.2517-6161.1995.tb 02031.x. · doi ↗
- 7Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for illumina sequence data. Bioinformatics. 2014:30:2114–2120. 10.1093/bioinformatics/btu 170.24695404 PMC 4103590 · doi ↗ · pubmed ↗
- 8Bouckaert R, et al BEAST 2.5: an advanced software platform for Bayesian evolutionary analysis. P Lo S Comput Biol. 2019:15:e 1006650. 10.1371/journal.pcbi.1006650.30958812 PMC 6472827 · doi ↗ · pubmed ↗
