Microsatellite Markers Developed Based on Transcriptomic Data Reveal the Genetic Diversity and Population Genetic Structure of Angulyagra polyzonata in Guangxi, China
Shengjie Zhang, Dapeng Wang, Kangqi Zhou, Yong Lin, Zhong Chen, Junqi Qin, Xuesong Du, Liuping Long, Caiqun Zhang, Xianhui Pan, Wenhong Li

TL;DR
Researchers developed new genetic markers to study the declining freshwater snail Angulyagra polyzonata in China, revealing low genetic diversity and population structure.
Contribution
Nine novel microsatellite markers were developed from transcriptomic data to assess genetic diversity and population structure of A. polyzonata.
Findings
Genetic diversity in A. polyzonata populations has declined, with 73% of variation within populations and 27% between populations.
UPGMA clustering and structure analysis divided the 12 populations into two subgroups, indicating significant genetic differentiation.
The LA population shows high genetic diversity and is suggested for prioritized protection.
Abstract
This study focused on Angulyagra polyzonata, an economically important freshwater snail in Guangxi, China, whose wild populations have declined sharply due to overharvesting. To assess its genetic status, we developed nine novel microsatellite markers via transcriptomic analysis following the screening of a total of 798,244 SSR loci. These markers were then used to analyze 360 individuals from 12 wild populations across the region. This study provides crucial baseline data for conserving A. polyzonata and highlights the value of integrating whole-genome data into future research to refine management strategies. Additionally, the developed microsatellite markers represent valuable tools for the ongoing monitoring of this ecologically and economically important species. Angulyagra polyzonata is a significant freshwater snail species in southern China. However, its wild resources have…
- National Key R&D Program of China
- Guangxi Science and Technology Program
- China Agriculture Research System Guangxi Innovation Team
- Project of Financial Funds of Ministry of Agriculture and Rural Affairs: Investigation of Fishery Resources and Habitat in the Pearl River Basin
Taxonomy
Topics: Genetic diversity and population structure · Identification and Quantification in Food · Genomics and Phylogenetic Studies
1. Introduction
Angulyagra polyzonata, a freshwater snail of the class Gastropoda, family Viviparidae, and genus Angulyagra, is widely distributed across Southern China and several Southeast Asian countries [1]. In China, it is found in freshwater lakes, rivers, streams, and ditches. Renowned for its high nutritional value and distinct flavor [2], this snail has long been a favorite among consumers. Moreover, its appealing appearance and distinct edge patterns have recently made it a rising star in the ornamental market. In recent years, Liuzhou Luosifen from Guangxi has gained global popularity, standing out as one of China’s geographically representative products [3]. In 2024, the total output value of the Luosifen industry exceeded RMB 75 billion. Snails are a core raw material in this booming industry, and market demand is enormous, with annual consumption surpassing 1.5 million tons. However, China’s farmed output of economically important Viviparidae snails is only approximately 96,900 tons, leaving the vast majority to be sourced from wild harvest [4]. This heavy reliance on wild populations, coupled with the impacts of human activities, has significantly reduced the population size of A. polyzonata in the wild. Under such circumstances, establishing a method to assess the current status of genetic resources in China’s A. polyzonata populations has become a critical and urgent task.
Microsatellite markers, also known as simple sequence repeats (SSRs), are widely used in population genetic analyses due to their convenience and practicality [5,6]. Commonly used methods for developing microsatellite markers fall into five types: traditional gene library construction, microsatellite enrichment, homologous transfer, public database search, and transcriptome sequencing [7]. Broadly speaking, the transcriptome refers to the collection of all RNAs transcribed within a cell under a specific condition; narrowly speaking, it typically refers to the collection of all messenger RNA (mRNA) transcripts. Methods for characterizing the transcriptome fall mainly into two categories: one is based on hybridization techniques, including microarray and gene chip technology; the other is based on sequencing, including expressed sequence tag (EST) technology, serial analysis of gene expression (SAGE), and RNA sequencing [8]. RNA sequencing (RNA-seq) is a typical representative of “next-generation” sequencing technology, which can sequence millions of DNA or RNA molecules at once [9]. Compared to other sequencing technologies, RNA-seq offers several advantages, including high throughput, high resolution, high sensitivity, no species restrictions, a wide dynamic range, and good repeatability. This technology can also identify positional genes and discover new transcripts in a species [10]. Since the advent of microsatellite markers, their application has extended to various freshwater snail species, including Cipangopaludina chinensis [4], C. cathayensis [11], Bellamya purificata [12,13], and Promenetus exacuous [14].
Current research on A. polyzonata primarily focuses on its nutritional components [2], its role in parasite transmission [15], and its habitat distribution [16]. In genomic studies, Zhang et al. assembled the mitochondrial genome of A. polyzonata using second- and third-generation sequencing and explored its phylogenetic relationships within the Viviparidae family [17]. Additionally, Zhu et al. analyzed SSR molecular markers in A. polyzonata from Hunan, China, using the Microsatellite Identification Tool (MISA), revealing high heterozygosity and a low number of repetitive sequences in the species [18]. However, investigations into the population genetic diversity and genetic structure of A. polyzonata remain limited. In this study, we used RNA-seq transcriptome data to design nine pairs of specific microsatellite primers for A. polyzonata in the Guangxi region. The findings are expected to provide new insights for the conservation of wild germplasm resources and the artificial selective breeding of A. polyzonata.
2. Materials and Methods
2.1. Experimental Materials and DNA Extraction
The materials used for the experiment were sourced from 360 wild A. polyzonata specimens collected across 12 regions of Guangxi between June and July 2024 (Figure S1). The specific locations included Yongning (YN), Tiandeng (TD), Long’an (LA), Longzhou (LZ), Luchuan (LC), Fangcheng (FC), Qinnan (QN), Yinhai (YH), Xingdao (XD), Shatian (ST), Hezhou (HZ), and Liunan (LN). Thirty samples were taken from each location. Details, including the latitude and longitude of sampling sites, as well as the number of individuals, are presented in Figure 1 and Table 1.
All samples were transported under low-temperature conditions to the Guangxi Academy of Fishery Sciences, where they were temporarily housed in stepped tanks to allow them to recover activity. Following anesthesia with MS-222, the snails were dissected (approved by the Guangxi Institutional Animal Care and Use Committee; GACUC number 201703021; date: 30 September 2024), and fresh muscle tissue was harvested, placed into 1.5 mL centrifuge tubes, and preserved in an appropriate volume of 95% anhydrous ethanol at −20 °C in a medical refrigerator. Genomic DNA of A. polyzonata was extracted using the Omega Animal DNA Extraction Kit (REF: D3396-02, Omega Bio-Tek Inc., Norcross, GA, USA). The quality and integrity of extracted DNA were assessed via 1% agarose gel electrophoresis, while its concentration and purity were determined using a NanoDrop ONE spectrophotometer (ThermoFisher, Waltham, MA, USA).
2.2. Library Construction and SSR Search
One DNA sample was selected from each of three sampling sites: Fangcheng (FC), Yinhai (YH), and Shatian (ST). Libraries with an insert fragment size of 400 bp were constructed for each sample, followed by paired-end (PE) sequencing using the Illumina NovaSeq platform based on next-generation sequencing (NGS) technology. Initial sequencing data statistics are detailed in Table S1. Raw sequencing data were filtered using fastp (v0.20.0; https://github.com/OpenGene/fastp, accessed on 15 June 2025) to remove 3′ end adapter contamination and retain high-quality sequences [19]. Quality filtering employed a sliding window approach with a 5 bp window, which was slid from the 3′ end to the 5′ end to calculate base quality (Q) values. Bases within the current window were truncated if the Q value was below 20; otherwise, sliding was terminated, and PE reads were retained or discarded based on their length.
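The 3′-to-5′ sliding-window trimming described above can be illustrated with a minimal sketch. This is not fastp's own code: the function name `trim_3prime` is hypothetical, the "Q value" of a window is assumed to be its mean Phred score, and the window is assumed to advance one base per step.

```python
def trim_3prime(quals, window=5, min_q=20):
    """Return the number of bases kept after 3'-end sliding-window trimming.

    quals  : list of per-base Phred quality scores, 5' -> 3'
    window : window size in bases (5 bp in the text)
    min_q  : quality threshold (Q20 in the text)
    """
    end = len(quals)
    while end >= window:
        # Assumption: the window's Q value is the mean of its base qualities.
        mean_q = sum(quals[end - window:end]) / window
        if mean_q >= min_q:
            break          # window passes: sliding terminates
        end -= 1           # assumption: drop one 3' base and slide again
    return end


# Ten good bases followed by five poor ones at the 3' end.
quals = [30] * 10 + [10] * 5
kept = trim_3prime(quals)
print(kept)  # number of bases retained
```

A read would then be retained or discarded by comparing `kept` against a minimum length, which the text mentions but does not specify.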
Given the PE sequencing mode, data from all samples were merged into a combined dataset (designated as popA). High-quality reads were then obtained by overlapping PE sequences using FLASH (v1.2.11; https://ccb.jhu.edu/software/FLASH/, accessed on 16 June 2025) [20] with the following parameters: minimum overlap length = 10 bp, maximum mismatch density = 0.2, and “outie” pair allowance = false. Detailed information for popA is provided in Table S2. Microsatellite (SSR) loci were identified using MISA (Microsatellite Identification Tool; http://pgrc.ipk-gatersleben.de/misa/, accessed on 17 June 2025) [21] with the following parameters: ≥10 mononucleotide repeats; ≥6 dinucleotide repeats; and ≥5 tri-, tetra-, penta-, and hexanucleotide repeats. The maximum interval between two SSRs was set to 100 bp, and reverse complement sequences as well as shifted permutations were treated as identical SSR types.
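The repeat-number thresholds above can be made concrete with a small regular-expression sketch. It is an illustration only, not MISA: the names `MIN_REPEATS`, `is_reducible`, and `find_ssrs` are hypothetical, and the sketch does not merge compound SSRs within 100 bp or canonicalize reverse complements and shifted permutations.

```python
import re

# Minimum repeat counts per motif length, taken from the thresholds in the text.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_reducible(motif):
    """True if the motif is itself a repetition of a shorter unit (e.g. 'AA', 'ACAC')."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # One motif of length k followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_reducible(motif):
                continue  # e.g. skip 'AA' runs when scanning dinucleotides
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


example = "TTG" + "AC" * 6 + "GGT" + "A" * 10 + "C"
print(find_ssrs(example))
```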
2.3. SSR Clustering and Polymorphism Assessment
Repeat regions within the sequences were masked using a Perl script (replaced with the letter “R”), and SSRs with flanking sequences shorter than 20 bp were filtered out. The filtered sequences were clustered using cd-hit (v4.5.7; https://github.com/weizhongli/cdhit, accessed on 18 June 2025) [22] with the following parameters: nucleotide sequence similarity set to 90%, coverage to 70%, and gap penalties specified as -gap 1 -gap-ext 0. Sequences containing two or more SSRs were counted and clustered separately. The statistical results of SSR clustering are presented in Table S3. Clustering results were further analyzed using a Perl script, with each cluster categorized based on SSR length. The polymorphism value of each cluster was defined as the number of distinct SSR lengths it contained: a value of 1 was assigned if all SSRs in the cluster had identical lengths, a value of 2 if two distinct lengths were present, and so on. The statistical results of SSR polymorphism for each cluster are shown in Table S4.
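As a small illustration of the polymorphism value defined above, the sketch below counts distinct SSR lengths per cluster. The cluster names and lengths are invented for the example, and the `> 2` cutoff is the one stated in Section 2.4 for primer design.

```python
def polymorphism_value(ssr_lengths):
    """Number of distinct SSR lengths observed within one cluster."""
    return len(set(ssr_lengths))


# Hypothetical clusters: SSR lengths (bp) of the member sequences.
clusters = {
    "cluster_1": [12, 12, 12],      # all identical -> value 1
    "cluster_2": [12, 14],          # two lengths   -> value 2
    "cluster_3": [15, 18, 21, 18],  # three lengths -> value 3
}

values = {name: polymorphism_value(lengths) for name, lengths in clusters.items()}
candidates = [name for name, value in values.items() if value > 2]
print(values, candidates)
```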
2.4. Design of SSR Primers
Primer3 (version 2.3.6; https://sourceforge.net/projects/primer3/files/primer3/2.3.6/, accessed on 20 June 2025) [23] was used to design primers for SSR sequences within clusters with polymorphism values > 2, with primer binding sites located at both ends of the sequences. The 5′ end of the upstream primer was modified to include the M13 universal primer sequence (TGTAAAACGACGGCCAGT), and M13 primer sequences labeled with different fluorescent groups were synthesized. The length of the target amplified fragment was controlled within 100–400 bp, and the amplification range was set from the first base upstream of the repetitive sequence to the fifth base downstream of the repetitive sequence. A total of 144 pairs of primers were generated; primers with flanking sequence lengths less than 20 bp and target amplification fragment lengths exceeding the range of 100–400 bp were excluded. Through evaluation using the Primer3 software (version 2.3.6), primers with a high probability of self-complementary sequences were excluded. Initially, 33 pairs of primers with potential polymorphisms were selected.
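The M13 tailing and amplicon-size filter can be sketched as follows. The M13 sequence and the 100–400 bp range come from the text; the function names and the example primer are hypothetical, and whether the size range is measured with or without the 18 bp tail is not stated, so the check below is applied to whatever length is passed in.

```python
M13_TAIL = "TGTAAAACGACGGCCAGT"  # M13 universal primer sequence from the text


def tail_forward_primer(forward_primer):
    """Prepend the M13 tail to the 5' end of a locus-specific forward primer."""
    return M13_TAIL + forward_primer.upper()


def amplicon_in_range(length_bp, lo=100, hi=400):
    """True if the target fragment length falls within the accepted range."""
    return lo <= length_bp <= hi


# Hypothetical locus-specific forward primer, for illustration only.
tailed = tail_forward_primer("acgtacgtacgtacgtacgt")
print(tailed, len(tailed), amplicon_in_range(250))
```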
2.5. Verification of SSR Loci and Screening of Polymorphisms
To validate the SSR loci and screen for polymorphic markers, the 12 aforementioned geographical populations of A. polyzonata from Guangxi were used. A simplex PCR strategy was employed for the 9 loci, meaning that each reaction amplified only 1 locus. The forward primer of each pair was labeled at its 5′ end with a fluorescent dye (TAMRA, HEX, ROX, or FAM). Reactions were run on a Veriti 384 PCR instrument (Applied Biosystems, Waltham, MA, USA) as follows. First, 5.0 μL of 2× Taq PCR Master Mix and 1.0 μL of DNA were combined and pre-denatured at 95 °C for 5 min. This was followed by 10 cycles of denaturation for 30 s, gradient annealing for 30 s (from 62 to 52 °C), and extension at 72 °C for 30 s. Next, 0.5 μL each of the upstream and downstream primers (10 pmol/μL) and 3.0 μL of ddH2O were added, followed by 25 cycles of denaturation at 95 °C for 30 s, annealing at 52 °C for 30 s, and extension at 72 °C for 30 s. A final extension was performed at 72 °C for 20 min, and products were stored at 4 °C. Each individual’s DNA sample underwent 9 independent single-round PCR reactions, yielding amplification products at the 9 corresponding loci. After all amplifications were completed, the 9 products from the same individual were mixed at equal molar concentrations in a single centrifuge tube and analyzed using fluorescence capillary electrophoresis (Figure 2).
2.6. Data Processing and Analysis
Raw data were acquired from the ABI 3730xl platform and exported as .fsa files. After classification by locus, the data were imported into GeneMarker (v3.0.0; https://softgenetics.com/products/genemarker/, accessed on 21 June 2025) [24] to generate and export Excel-formatted genotype data and PDF files of genotyping peak profiles.
Genetic diversity indices for both SSR loci and populations were calculated using GenAlEx (v6.501; https://biology-assets.anu.edu.au/GenAlEx/Welcome.html, accessed on 23 June 2025) [25], including the observed number of alleles (Na), the effective number of alleles (Ne), Shannon’s information index (I), polymorphism information content (PIC), observed heterozygosity (Ho), expected heterozygosity (He), and inbreeding coefficient (F). The inbreeding coefficient was computed using the formula F = 1 − Ho/He.
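To make these indices concrete, the sketch below computes them for a single locus from diploid genotypes. It is an illustration under stated assumptions, not GenAlEx's implementation: `locus_diversity` is a hypothetical name, He is taken as 1 − Σp² (no small-sample correction), Shannon's index uses the natural logarithm, and PIC follows the standard Botstein formula. F uses the formula given in the text.

```python
import math
from collections import Counter
from itertools import combinations


def locus_diversity(genotypes):
    """Per-locus diversity indices from a list of diploid genotypes (allele_a, allele_b)."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    freqs = [c / total for c in counts.values()]

    sum_p2 = sum(p * p for p in freqs)
    he = 1.0 - sum_p2                                    # expected heterozygosity
    ho = sum(1 for a, b in genotypes if a != b) / len(genotypes)
    # PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    pic = he - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))

    return {
        "Na": len(freqs),                                # observed number of alleles
        "Ne": 1.0 / sum_p2,                              # effective number of alleles
        "I": -sum(p * math.log(p) for p in freqs),       # Shannon's information index
        "Ho": ho,
        "He": he,
        "F": 1.0 - ho / he if he > 0 else float("nan"),  # F = 1 - Ho/He
        "PIC": pic,
    }


# Hypothetical allele sizes (bp) for four individuals at one locus.
stats = locus_diversity([(200, 200), (200, 204), (204, 204), (200, 204)])
print(stats)
```

A positive F, as reported for all nine loci in Section 3.3, corresponds to Ho falling below He.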
Genetic distances between populations were calculated with PowerMarker (v3.25; https://en.freedownloadmanager.org/Windows-PC/PowerMarker-FREE.html, accessed on 25 June 2025) [26]. In the PowerMarker software, cluster analysis is conducted using the unweighted pair-group method with arithmetic mean (UPGMA) based on Nei’s genetic distance, and a tree diagram is generated.
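The distance-then-cluster workflow can be sketched in plain Python. This is not PowerMarker's code: the text does not say which of Nei's distances was used, so Nei's (1972) standard distance is assumed here, and `nei_distance` and `upgma` are hypothetical names. The tree is returned as nested tuples rather than a drawn dendrogram.

```python
import math
from itertools import combinations


def nei_distance(freqs_x, freqs_y):
    """Nei's (1972) standard distance; inputs are per-locus {allele: frequency} dicts."""
    n_loci = len(freqs_x)
    jx = sum(sum(p * p for p in locus.values()) for locus in freqs_x) / n_loci
    jy = sum(sum(p * p for p in locus.values()) for locus in freqs_y) / n_loci
    jxy = sum(
        sum(p * fy.get(allele, 0.0) for allele, p in fx.items())
        for fx, fy in zip(freqs_x, freqs_y)
    ) / n_loci
    return -math.log(jxy / math.sqrt(jx * jy))


def upgma(names, dist):
    """UPGMA clustering. dist maps frozenset({a, b}) -> distance; returns nested tuples."""
    sizes = {name: 1 for name in names}   # cluster label -> number of leaves
    d = dict(dist)
    while len(sizes) > 1:
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        for other in sizes:
            # Size-weighted average keeps every leaf equally weighted.
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Toy distances between three hypothetical populations.
toy = {frozenset(("A", "B")): 0.1, frozenset(("A", "C")): 0.5, frozenset(("B", "C")): 0.7}
tree = upgma(["A", "B", "C"], toy)
print(tree)
```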
Population structure of the 360 samples was analyzed using STRUCTURE (v2.3.4; https://web.stanford.edu/group/pritchardlab/structure_software/release_versions/v2.3.4/html/structure.html, accessed on 26 June 2025). The parameter K (number of hypothetical populations) was set from 1 to 20, with a burn-in period of 10,000 and 100,000 Markov Chain Monte Carlo (MCMC) iterations. Each K value was run 20 times, and the optimal ΔK value (indicating the best population stratification) was determined [27]. Visualization was generated based on results from the optimal K.
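The ΔK criterion can be sketched as below, assuming the usual Evanno-style definition in which the second-order change in mean log-likelihood is divided by the standard deviation across replicate runs. The function name and the toy log-likelihoods are hypothetical. This variant works from per-K means, which is one common way the statistic is computed; it is not necessarily the exact procedure used in the study.

```python
from statistics import mean, stdev


def evanno_delta_k(ln_prob):
    """Delta K per K from replicate-run log-likelihoods.

    ln_prob maps K -> list of LnP(D) values, one per replicate run.
    Delta K(K) = |mean L(K+1) - 2 * mean L(K) + mean L(K-1)| / sd(L(K)).
    """
    means = {k: mean(values) for k, values in ln_prob.items()}
    delta = {}
    for k in sorted(means):
        if k - 1 in means and k + 1 in means:      # undefined at the smallest and largest K
            sd = stdev(ln_prob[k])
            if sd > 0:                             # skip K with identical replicates
                delta[k] = abs(means[k + 1] - 2 * means[k] + means[k - 1]) / sd
    return delta


# Toy values with two replicates per K (the study used 20 runs for K = 1 to 20).
toy_runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-890.0, -892.0],
    4: [-885.0, -887.0],
}
delta_k = evanno_delta_k(toy_runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

Because ΔK needs both neighbors, it cannot be evaluated at K = 1, so this criterion alone cannot indicate a single undivided population.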
Based on population genetic structure analyses, GenAlEx was further used to assess genetic variation and differentiation within and between populations. The fixation index (Fst) and gene flow (Nm) were calculated, with gene flow determined by the formula Nm = 0.25(1 − Fst)/Fst.
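As a worked check of the gene flow formula, the short sketch below applies Nm = 0.25(1 − Fst)/Fst to the two extreme pairwise Fst values reported in Section 3.5. The function name `gene_flow` is hypothetical.

```python
def gene_flow(fst):
    """Gene flow estimate Nm = 0.25 * (1 - Fst) / Fst, as defined in the text."""
    if not 0 < fst <= 1:
        raise ValueError("Fst must be in (0, 1] for Nm to be defined")
    return 0.25 * (1.0 - fst) / fst


# Smallest and largest pairwise Fst values reported in Section 3.5.
for fst in (0.017, 0.409):
    print(fst, round(gene_flow(fst), 3))
```

These inputs give 14.456 and 0.361, matching the range of Nm values reported for the 12 populations.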
3. Results
3.1. The Number and Distribution of SSR Loci
Using the MISA software, we identified SSRs in the merged popA dataset of A. polyzonata. A total of 798,244 SSR loci were found in 664,946 SSR-containing sequences. The occurrence frequency (the proportion of sequences containing SSRs among the total sequences) was 9.44%, and the frequency of SSR loci (the number of SSR loci relative to the total number of sequences) was 11.33%. Among the SSR loci, 126,494 were compound SSR loci, accounting for 15.85% of the total. Of the SSR-containing sequences, 430,110 (64.68%) contained a single SSR locus and 108,342 (16.29%) contained more than one SSR locus (Table 2).
3.2. SSR Repetitive Type and Characteristics
The SSR repeat types in A. polyzonata exhibit considerable diversity, encompassing one to six nucleotide repeat motifs, though the number of SSR loci varies significantly across different nucleotide repeat types. Among these, dinucleotide repeats are the most prevalent, accounting for 47.64% of the total, followed by mononucleotide repeats at 33.34%. Tetranucleotide and trinucleotide repeats constitute 9.42% and 9.36%, respectively, while pentanucleotide and hexanucleotide repeats are relatively rare, representing only 0.20% and 0.04% of the total SSR loci (Table 3).
In terms of the repeat units within A. polyzonata SSR sequences, mononucleotide repeats are dominated by (A/T)n, with 235,187 loci accounting for 29.46% of the total. Dinucleotide repeats are primarily (AC/GT)n, comprising 212,705 loci (26.65% of the total). Trinucleotide repeats are mostly (AAT/ATT)n, with 27,379 loci making up 3.42%. Tetranucleotide repeats are predominantly (AGAT/ATCT)n, totaling 34,416 loci (4.31%). Pentanucleotide repeats are mainly (AATAT/ATATT)n, with 293 loci (0.04%). Hexanucleotide repeats are dominated by two motifs: (AAGAAT/ATTCTT)n and (ACACAG/CTGTGT)n, containing 69 and 72 loci, respectively, each accounting for approximately 0.009% of the total SSR loci (Figure 3).
Among the SSR loci in A. polyzonata, microsatellite loci with 5 to 20+ repeats are the most abundant. With the exception of mononucleotide repeats (which have a minimum of 10 repeats) and hexanucleotide repeats (which are mostly in the 5–7 repeat range), dinucleotide, trinucleotide, tetranucleotide, and pentanucleotide repeats are predominantly distributed in the 5–20+ repeat range, with dinucleotide repeats showing the highest distribution frequency (Table 4).
3.3. Primer Polymorphism Analysis
Using a mixture of DNA from 12 different geographical populations of A. polyzonata as the template (performing operations of equal-sized mixing within the same population and equal-proportion merging between different populations for the extracted DNA), 33 primers were subjected to PCR amplification, and the amplification products were analyzed using fluorescence capillary electrophoresis (Figure 4). Primers with low peak intensity, non-target interferences, and overlapping peaks were excluded. Eventually, nine pairs of SSR primers with high polymorphism and stability were selected (Table 5).
Among 360 A. polyzonata samples, the 9 primer pairs detected a total of 119 observed alleles (Na), with an average of 13.222 alleles per locus. The effective number of alleles (Ne) ranged from 3.388 to 7.856, with a mean value of 5.131. Shannon’s information index (I) varied between 1.391 and 2.371, averaging 1.867. For the nine loci, the observed heterozygosity (Ho) ranged from 0.356 to 0.598 (mean = 0.480), while the expected heterozygosity (He) spanned 0.705 to 0.873 (mean = 0.787). The polymorphism information content (PIC) values ranged from 0.662 to 0.861, with an average of 0.761, indicating that all nine selected SSR loci possess high polymorphism (PIC > 0.500). The inbreeding coefficient (F) varied from 0.310 to 0.519, with a mean of 0.390, suggesting a deficiency of heterozygotes at these loci. Hardy–Weinberg equilibrium tests revealed that all nine loci significantly deviated from Hardy–Weinberg equilibrium (p < 0.001) (Table 6).
3.4. Genetic Diversity Analysis
Quantitative parameters of genetic diversity among the 12 A. polyzonata populations are summarized in Table 7. Regarding the average number of observed alleles per population, the LA population exhibited the highest value (6.667), whereas the FC population showed the lowest (3.000). Similarly, the LA population had the highest average effective number of alleles (4.405), and the YH population had the lowest (1.663). These two parameters collectively indicate that the LA population has the highest level of genetic variation. The average observed heterozygosity across populations ranged from 0.285 to 0.635, with the ST population having the highest value (0.635) and the YH population the lowest (0.285). Average expected heterozygosity ranged from 0.299 to 0.749; the LA population ranked highest (0.749), while the YH population was again the lowest (0.299). The average inbreeding coefficient was 0.148, with the LA population having the highest value (0.311) and the QN population the lowest (0.028). These results suggest that the ST and TD populations exhibit the highest heterozygosity and genetic variation, whereas the QN and YH populations show the opposite trend. The ranking of the average Shannon index was consistent with that of the average effective number of alleles, further confirming that the LA population has the highest level of genetic diversity among all populations, while the YH population has the lowest.
3.5. Genetic Differentiation
As shown in Table 8, the Nm values among the 12 A. polyzonata populations ranged from 0.361 to 14.456. Gene exchange was relatively high between the TD and LA populations (14.456) and between the LZ and LC populations (11.655). In contrast, gene flow between the QN and YH populations, as well as most other populations, was relatively low, with average values of 0.246 and 0.268, respectively. The fixation index (Fst) ranged from 0.017 to 0.409. The smallest genetic differentiation was observed between the TD and LA populations (0.017), while the largest was found between the QN and YH populations (0.409). The degree of genetic differentiation among populations was relatively high (mean Fst = 0.179). The analysis of molecular variance (AMOVA) revealed that genetic variation between populations accounted for 27% of the total, whereas that within populations accounted for 73% (Table 9), indicating that the total genetic variation is primarily driven by variation within populations. The genetic distances among the 12 populations ranged from 0.145 to 0.733. The closest genetic distance was between the LZ and LC populations (0.145), and the farthest was between the YH and LZ populations (0.733) (Table 10).
3.6. Population Genetic Structure
Cluster analysis results showed that the QN and FC populations, as well as the YH and XD populations, initially formed independent clusters, which then merged with the ST population. In addition, the TD and LA populations clustered together, while the HZ and LN populations remained independent initially before merging with the LC and YN populations (Figure 5). As determined from the PCoA results (Figure 6), the YH and QN populations have relatively independent genetic characteristics and are quite distinct from other groups. The LN, HZ, YN, and TD populations are clustered together, while the remaining population sample points are relatively scattered, with many overlapping areas between different populations. Structural analysis of 360 samples using nine microsatellite markers revealed that the optimal K value was 2 (Figure 7), suggesting that the snail populations in Guangxi could be divided into two subpopulations (Table S5, Figure 8 and Figure S2).
4. Discussion
In recent years, with the continuous advancement of molecular research techniques, high-throughput sequencing has been widely applied to identify polymorphic SSR loci. In this study, using next-generation sequencing (NGS) on the Illumina NovaSeq platform, a total of 798,244 SSR loci were identified from 664,946 sequences, with an occurrence frequency of 9.44%. This frequency is lower than that reported for C. cathayensis (23.92%), B. aeruginosa (13.77%), V. tricinctus (23.46%) [18], and C. chinensis (20.12%) [28], but slightly higher than that of Babylonia lutosa (6.86%) [29] and Thais luteostoma (6.45%) [30]. Variations in SSR occurrence frequency are attributed not only to species differences and sampling locations, but also to factors such as sequencing platforms, SSR mining tools, search criteria, and database richness [31,32,33]. In terms of SSR repeat types, A. polyzonata is dominated by dinucleotide repeats, accounting for 47.64%. This is consistent with findings in C. cathayensis (47.50%), B. aeruginosa (48.40%), V. tricinctus (50.40%) [18], C. chinensis (57.67%) [28], and Scapharca subcrenata (58.06%) [34], but differs from Potamocorbula ustulata (the mononucleotide repeats accounted for the highest proportion, with a value of 47.64%, and the same applies below) [35] and S. subcrenata (33.89%) [36]. Such differences may be related to species specificity, locus mutation rates, and selective evolutionary mechanisms [28,37]. Regarding SSR repeat units, the dinucleotide repeats of A. polyzonata are primarily (AC/GT)n, which accounts for 26.65% of the total SSR loci (Section 3.2). This aligns with previous studies on the SSR repeat sequence characteristics of A. polyzonata and four other gastropod species in Hunan Province, China [18]. The dinucleotide repeats of A. polyzonata have the highest number of SSR loci, and their repeat counts fall primarily within the range of 6 to 9. This differs slightly from the repeat counts reported for C. chinensis (47 repeats) [28] and Hemifusus ternatanus (730 repeats) [38]. Thus, the pattern of SSR repeat counts may be influenced by differences in the coding and non-coding regions of species [39]. Currently, most microsatellite marker studies focus on dinucleotide repeats; however, the small size differences between alleles often lead to severe peak interference. In contrast, the nine SSR loci selected in this study include five tetranucleotide repeats and four trinucleotide repeats, which can effectively maintain locus polymorphism while improving genetic stability and the accuracy of allele calling [36,40,41].
At present, the polymorphic information content (PIC) is widely employed to evaluate the capacity of microsatellite markers in detecting population polymorphism [42]. In the present study, the PIC values of 9 microsatellite loci across 12 populations of A. polyzonata ranged from 0.662 to 0.861. This indicates that the developed microsatellite loci exhibit high polymorphism (PIC > 0.5) and are capable of accurately assessing genetic differences among populations [43]. The inbreeding coefficient (F) serves as a tool to measure the deviation between the observed heterozygosity and the expected heterozygosity within a population [44,45]. In this study, the F values for each locus in A. polyzonata varied from 0.310 to 0.519, suggesting a deficiency of heterozygotes (F > 0) among the 12 A. polyzonata populations. This phenomenon may be attributed to inbreeding within these populations. Notably, all nine loci of A. polyzonata showed deviations from Hardy–Weinberg equilibrium. Such deviations could arise from factors including inbreeding, natural selection, and the presence of null alleles [46]. Similar deviations of microsatellite loci from Hardy–Weinberg equilibrium have also been reported in C. cathayensis [11] and B. purificata [47].
The genetic diversity of a species is the result of the evolutionary process of the previous generations. To a certain extent, the adaptability of species to their environment is positively correlated with genetic diversity; specifically, higher genetic diversity enables species to adapt more readily to complex environmental changes [48]. Compared with fish, shrimp, and crabs, the protection, evaluation, and development of snail germplasm resources in China have received relatively less attention. Thus, further research on the genetic diversity of snail populations such as A. polyzonata is essential for understanding the current status of snail resources in Guangxi. Microsatellite markers are commonly used to assess the genetic diversity of organisms by examining parameters such as the number of alleles (Na), the effective number of alleles (Ne), Shannon’s information index (I), observed heterozygosity (Ho), and expected heterozygosity (He) [49]. The discrepancy between Ne and Na reflects the uniformity of allele distribution within a population: a larger difference indicates a more uneven distribution of allele frequencies, with a small number of alleles occurring at high frequencies. Conversely, a smaller discrepancy indicates a more uniform distribution of alleles [50]. In this study, Na exceeded Ne, indicating that allele frequencies within the populations of this species are unevenly distributed. Furthermore, the highest Na was in LA and the lowest in FC, yet FC is surrounded by several nearby sampling locations (QN, XD, and YH), and the nearest location to LA is TD. This situation may occur because the two populations do not share the same water system: LA belongs to the Xijiang River system, while FC belongs to a system that flows directly into the sea. The low Na in the FC population might be due to the invasion of alien species and habitat destruction, which lead to a reduction in habitat, a decrease in the size of the native population, intensified inbreeding, and ultimately the loss of alleles [4,12]. Observed heterozygosity (Ho) and expected heterozygosity (He) can be used to quantify the deviation of a population’s actual heterozygosity from its theoretical state, thereby reflecting the population’s genetic stability [51]. In the A. polyzonata populations examined, He values were consistently higher than Ho values, indicating potential mating among individuals with identical genotypes within the populations, leading to a certain degree of genetic similarity.
The Shannon index can, to some extent, reflect the level of genetic diversity in a species, with higher values corresponding to greater genetic diversity [18,28]. The mean Shannon index of the A. polyzonata populations in this study was 1.116. Compared with other snail species, such as C. cathayensis (1.513) [11] and Babylonia areolata (1.914) [52], the 12 A. polyzonata populations exhibited lower genetic diversity. It is speculated that inbreeding within A. polyzonata populations contributed to this reduced genetic diversity. Possible underlying causes are as follows: (1) A sharp decline in wild populations. For instance, the Luosifen industry relies heavily on harvesting wild resources, which may accelerate the depletion of snail populations [53]. (2) The invasion of P. canaliculata, which has occupied the ecological niche of native snails [54]. Currently, the analysis of the influence of these two factors is primarily based on speculative inferences derived from existing ecological knowledge and regional actual conditions, lacking direct quantitative data and statistical analysis support. Therefore, in the future, we will focus on collecting environmental data, such as fishing intensity and the distribution density of P. canaliculata, and analyze their correlations with genetic parameters through methods like Mantel tests in order to more rigorously reveal the driving factors influencing the genetic diversity of A. polyzonata. (3) A. polyzonata is primarily distributed in Southern Guangxi, including regions such as Beihai, Fangchenggang, Nanning, and Qinzhou, which lie south of the Tropic of Cancer and experience prolonged high-temperature periods [2]. Its germplasm resources have a relatively restricted distribution, and their restoration may be a slow process. Therefore, it is imperative to enhance awareness of A. polyzonata conservation in Guangxi to prevent the degradation of its germplasm resources.
In a specific ecological context, an in-depth investigation into genetic differentiation and population genetic structure is crucial for understanding the adaptability of organisms and the mechanisms underlying their persistence. The extent of gene exchange between populations is a key factor influencing the degree of genetic differentiation among them. Numerous studies have demonstrated that when gene flow (Nm) > 1, frequent gene exchange occurs between populations, resulting in relatively low genetic differentiation. In such cases, the impact of gene flow on genetic differentiation between populations outweighs that of genetic drift. Conversely, when Nm < 1, gene exchange between populations decreases significantly and genetic differentiation increases, suggesting that genetic drift exerts the greater influence under this scenario [55,56]. The Nm values among A. polyzonata populations range from 0.361 to 14.456, with most population pairs exhibiting Nm values below 1. This suggests that in A. polyzonata, genetic drift and other stochastic factors have a more substantial impact on inter-population genetic differentiation than gene flow.
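The link between Nm and differentiation can be made concrete with a short Python sketch. It assumes Wright's island-model approximation, Nm ≈ (1 − Fst) / (4 Fst), which is a common way to derive Nm from Fst. This section does not state which estimator the authors used, so the conversion is illustrative only.

```python
def nm_from_fst(fst):
    """Gene flow Nm from Fst under Wright's island-model approximation."""
    if not 0.0 < fst < 1.0:
        raise ValueError("Fst must lie strictly between 0 and 1")
    return (1.0 - fst) / (4.0 * fst)

def fst_from_nm(nm):
    """Inverse relation: Fst = 1 / (4 * Nm + 1)."""
    if nm < 0:
        raise ValueError("Nm must be non-negative")
    return 1.0 / (4.0 * nm + 1.0)

# Under this approximation the Nm = 1 threshold corresponds to Fst = 0.2.
threshold_fst = fst_from_nm(1.0)

# Fst values implied by the reported Nm extremes, if this formula applies.
fst_at_lowest_nm = fst_from_nm(0.361)
fst_at_highest_nm = fst_from_nm(14.456)
```

If this approximation applies, the reported Nm range of 0.361 to 14.456 would correspond to Fst values of roughly 0.41 down to roughly 0.017, and any pair with Nm below 1 would have Fst above 0.2.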
Genetic differentiation between populations is assessed using the fixation index (Fst). Fst values between 0 and 0.05 indicate weak genetic differentiation, values between 0.05 and 0.15 indicate moderate differentiation, values between 0.15 and 0.25 indicate high differentiation, and values greater than 0.25 indicate extremely high genetic differentiation [57]. The degree of genetic differentiation varies among A. polyzonata populations: differentiation between the TD and LA populations is relatively weak, as is that between the LC and LZ populations (Fst < 0.05). In contrast, the YH and HZ populations exhibit extremely high genetic differentiation from most other populations (Fst > 0.25). Although the LC and LZ populations are geographically distant, their genetic differentiation is minimal, which may be attributed to “long-distance convergent evolution” driven by directional selection pressures [58]. Conversely, the QN and YH populations are geographically close yet exhibit the highest level of genetic differentiation, which may be due to habitat fragmentation (artificial barriers) preventing gene flow between them [59].
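The Fst thresholds above translate directly into a small classification helper, sketched here in Python. The handling of values that fall exactly on a boundary is an assumption, since the text gives the ranges without specifying it. The function name is hypothetical.

```python
def classify_fst(fst):
    """Map a pairwise Fst value to the differentiation classes cited in the text."""
    if fst < 0:
        raise ValueError("Fst is expected to be non-negative")
    if fst < 0.05:
        return "weak"
    if fst < 0.15:
        return "moderate"
    if fst <= 0.25:  # boundary assignment is an assumption
        return "high"
    return "extremely high"

# Illustrative values only, not taken from the pairwise Fst matrix of the study.
examples = {value: classify_fst(value) for value in (0.02, 0.10, 0.20, 0.40)}
```

Applied to a full pairwise Fst matrix, a helper of this kind makes it easy to count how many population pairs fall into each class.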
Results from the analysis of molecular variance (AMOVA) indicate that variation within A. polyzonata populations constitutes the primary source of total variation (73%), with the remaining 27% occurring among populations. This pattern is also observed in other Viviparidae snail species, such as C. chinensis [4] and C. cathayensis [11]. It further suggests a lack of individual migration or hybridization between these populations, meaning that genetic variation cannot be homogenized through gene flow. Instead, divergent environmental selection pressures (climate and food resources) drive population differentiation, so that adaptive variation is concentrated mainly among populations [60].
Genetic distance serves as an indicator of the genetic relationships among distinct biological populations [61]. The UPGMA dendrogram, which illustrates hierarchical clustering, reveals two primary clades, suggesting that the populations within each clade may share a relatively recent common ancestor while retaining a certain level of genetic diversity [62]. Furthermore, the two main branches of the UPGMA dendrogram do not follow the geographical pattern of the sampling sites; instead, they correspond to the coastal water systems (QN, FC, ST, YH, and XD) and the inland water systems (LA, TD, LZ, YN, LC, LN, and HZ).
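How UPGMA builds such a dendrogram from a genetic distance matrix can be sketched in a few lines of Python. This is a minimal illustration of average-linkage clustering, not the software used in the study. The population labels and distances are hypothetical placeholders, and branch lengths are omitted for brevity.

```python
def upgma(labels, dist):
    """Cluster with UPGMA and return the tree as nested tuples.

    labels: list of distinct population names.
    dist: dict mapping frozenset({a, b}) -> distance for every pair of labels.
    """
    sizes = {name: 1 for name in labels}  # active clusters and their leaf counts
    d = dict(dist)
    while len(sizes) > 1:
        # Merge the closest pair of active clusters.
        pair = min(d, key=d.get)
        a, b = tuple(pair)
        merged = (a, b)
        na, nb = sizes.pop(a), sizes.pop(b)
        del d[pair]
        # Distance to every remaining cluster is the size-weighted mean
        # of the two old distances (unweighted pair-group average).
        for c in sizes:
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            d[frozenset((merged, c))] = (na * dac + nb * dbc) / (na + nb)
        sizes[merged] = na + nb
    return next(iter(sizes))

# Hypothetical distances between four placeholder populations.
pairs = {("A", "B"): 0.10, ("C", "D"): 0.20, ("A", "C"): 0.50,
         ("A", "D"): 0.70, ("B", "C"): 0.60, ("B", "D"): 0.80}
tree = upgma(["A", "B", "C", "D"], {frozenset(k): v for k, v in pairs.items()})
```

With these placeholder distances the tree joins A with B and C with D before uniting the two pairs, which mirrors how two primary clades can emerge from a pairwise distance matrix.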
The results of principal coordinate analysis (PCoA) indicate that the distribution of the samples in the multi-dimensional space shows a certain similarity to the UPGMA clustering (samples such as LN, HZ, YN, and TD are clustered together). The clear separation of certain populations in the PCoA plot, such as YH and QN populations, indicates measurable genetic differentiation, whereas overlapping clusters reflect varying degrees of genetic connectivity between populations [63].
Population structure analysis revealed that the optimal solution corresponds to K = 2, with significantly weaker support for more complex population structures. This suggests that A. polyzonata in the Guangxi region may comprise two distinct evolutionary lineages. The bar chart from the structure analysis illustrates this differentiation more intuitively and highlights the varying proportions of the two genetic clusters. We hypothesize that several factors may have contributed to the formation of this population structure: (1) Geographical barriers, such as the distribution of water systems and terrain in the Guangxi region, have restricted gene flow between populations [64], leading to the gradual accumulation of genetic differences. (2) Variations in habitat characteristics, including hydrological features, may have exerted selective pressures [65], driving genetic differentiation among populations. (3) Human activities, such as agricultural practices and urban construction, may have altered natural dispersal patterns [66,67]. Although this study, based on nine transcriptome-derived microsatellite markers, provides a preliminary analysis of the genetic diversity and differentiation of the different populations, further research is needed to explore the influence of geographical micro-environments (altitude, water quality, and climate gradient) and temporal dynamics (breeding and non-breeding periods) on the genetic characteristics of the populations. In subsequent studies, we will build on this foundation and address these gaps systematically.
5. Conclusions
The findings of this study reveal that A. polyzonata is facing a decline in genetic resources, characterized by low population genetic diversity, widespread inbreeding, significant heterozygote deficiency, high genetic differentiation among populations, and limited gene exchange. Measures to protect its wild resources are therefore of great significance. Breeding of this freshwater snail species should be expanded to replace the use of wild resources. The LA population of A. polyzonata has relatively high genetic diversity, making it the preferred candidate for germplasm resource breeding in the Guangxi region and a suitable basis for establishing a wild germplasm resource reserve. Overall, this study provides a scientific basis for the protection and sustainable utilization of A. polyzonata germplasm resources in the Guangxi region. In the future, however, mitochondrial genes and single-nucleotide polymorphisms should be combined to analyze its genetic diversity and genetic structure and to further optimize its protection strategy.
References
- 1. Zhang L.J., Yen Y.H., Chen Z.Y., Du L.N., Ng T.H., von Rintelen T. A new genus of river snails, Bakyietaia (Mollusca, Viviparidae), from South China and the Indochinese Peninsula. Eur. J. Taxon. 2025, 1005, 1–64. doi:10.5852/ejt.2025.1005.2985
- 2. Zhang S., Zhou K., Pan X., Yang Y., Zhang C., Peng J., Li W., Wang D. Analysis of nutrient constituents and flavor substances in gastropod and visceral mass of Angulyagra polyzonata. Freshw. Fish. 2024, 54, 87–95.
- 3. Yao J., Wen L., Ling L., Wan P., Wang R., Guan W., Wang Q., Chen D.-W. Analysis of key aroma-active compounds in cooked river snail (Sinotaia quadrata) meat. Appl. Food Res. 2025, 5, 100736. doi:10.1016/j.afres.2025.100736
- 4. Wei X., Zhou K., Zou X., Zhang X., Li Y., Luo H., Huang Y., Du X., Qin J., Chen Z. Microsatellite analyses reveal genetic diversity and population structure of Cipangopaludina chinensis in Guangxi, China. Aquac. Rep. 2025, 40, 102645. doi:10.1016/j.aqrep.2025.102645
- 5. Duan B., Kang T., Wan H., Liu W., Zhang F., Mu S., Guan Y., Li Z., Tian Y., Kang X. Microsatellite markers reveal genetic diversity and population structure of Portunus trituberculatus in the Bohai Sea, China. Sci. Rep. 2023, 13, 8668. doi:10.1038/s41598-023-35902-1. PMID: 37248314. PMCID: PMC10227030
- 6. Liu F., Qu Y.-K., Geng C., Wang A.-M., Zhang J.-H., Li J.-F., Chen K.-J., Liu B., Tian H.-Y., Yang W.-P. Analysis of the population structure and genetic diversity of the red swamp crayfish (Procambarus clarkii) in China using SSR markers. Electron. J. Biotechnol. 2020, 47, 59–71. doi:10.1016/j.ejbt.2020.06.007
- 7. Carneiro Vieira M.L., Santini L., Diniz A.L., Munhoz C.d.F. Microsatellite markers: What they mean and why they are so useful. Genet. Mol. Biol. 2016, 39, 312–328. doi:10.1590/1678-4685-GMB-2016-0027. PMID: 27561112. PMCID: PMC5004837
- 8. Liu W., Guo G., Mi C. An Overview of Transcriptionomics Research Techniques and Their Applications. Biol. Teach. 2019, 44, 2–5.
