Whole-genome simple sequence repeat development and genetic diversity analysis of sponge gourd (Luffa cylindrica)
Rongjing Cui, Zhu Wang, Li Jia, Yongmei Miao, Congsheng Yan, Ming Qian, Yingjie Shu, Kaijing Zhang

TL;DR
This study develops genome-wide SSR markers for sponge gourd and analyzes its genetic diversity, revealing high diversity and identifying India as a key genetic center.
Contribution
The first comprehensive whole-genome SSR marker development and genetic diversity analysis for Luffa cylindrica using the P93075 genome.
Findings
128,557 SSR loci identified in the sponge gourd genome with a 75.32% polymorphism rate in tested markers.
High genetic diversity observed in 67 global germplasms, with India identified as the core genetic diversity center.
Three distinct genetic groups were identified, correlated with geographical origin.
Abstract
Sponge gourd (Luffa cylindrica) is a versatile economic crop with nutritional, medicinal, and industrial value, but its genetic research has long been limited by a lack of stable molecular markers. Although high-quality genomic data of the sponge gourd line P93075 is available, no comprehensive whole-genome simple sequence repeat (SSR) marker development and systematic genetic diversity analysis based on this genome have been reported to date. This study first developed SSR markers using the high-quality L. cylindrica P93075 genome as reference. Using a microsatellite identification tool (MISA), 128,557 genome-wide SSR loci were identified, with a density of 195.91 SSRs/Mb; dinucleotide repeats (41.25% of total loci) were the dominant type. Candidate polymorphic SSR markers were initially screened via TBtools, resulting in 8,557 potential polymorphic markers. These markers showed ≥3 bp…
Funding
- National Natural Science Foundation of China
- Anhui Provincial Outstanding Youth Research Project
- Key Discipline Construction Fund for Crop Science of Anhui Science and Technology University
- College Students’ Innovative Entrepreneurial Training Plan Program
- Talent Foundation of Anhui Science and Technology University
Taxonomy
Topics: Advances in Cucurbitaceae Research · Chromosomal and Genetic Variations · Seed and Plant Biochemistry
Introduction
Luffa is an economically significant cucurbit crop distributed worldwide, with India widely recognized as its primary center of domestication based on historical records and existing research. As a thermophilic vegetable, it has uniquely lobed leaves and produces slender green fruits. Its cultivation is mainly concentrated in India, China, Thailand, Central America, and Africa (Rabei, Rizk & Khedr, 2013; Wu et al., 2014; Sheikh, Islam & Himel, 2024). The genus includes two primary cultivated species: the ridged gourd Luffa acutangula (L.) Roxb. (Marr, Bhattarai & Xia, 2005) and the sponge gourd Luffa cylindrica (L.) Roem. (Marr, Xia & Bhattarai, 2005). Among these, sponge gourd is a highly versatile economic crop with a chromosome number of 2n = 2x = 26, boasting notable nutritional, medicinal, and industrial value. Its fruit is particularly rich in dietary fiber and proteins, endowing it with significant nutraceutical potential (Prakash et al., 2014b). The tender green fruits of sponge gourd can be eaten raw like cucumbers or cooked for consumption like pumpkins (Maamoun, El-Akkad & Farag, 2021). Moreover, extracts derived from the fruit have been demonstrated to exhibit pharmacological properties such as anticancer and anti-inflammatory activities, which provide a robust scientific basis for its application in the pharmaceutical industry (Sharma, Rawat & Goel, 2015; Jadhav et al., 2013). Beyond its health-related benefits, sponge gourd also plays a vital role in industry. When mature fruits are peeled and deseeded, they can be processed into luffa sponges, which are materials valued for their eco-friendly characteristics. These porous fibrous structures are widely utilized in cleaning, precision filtration, and the development of bio-based composite materials (Anastopoulos & Pashalidis, 2020).
Simple sequence repeats (SSRs, also known as microsatellites) are tandemly repeated DNA sequences composed of 1–6 nucleotide units, widely distributed in prokaryotic and eukaryotic genomes. Compared with other molecular markers such as random amplified polymorphic DNA (RAPD) and inter-simple sequence repeats (ISSR), SSRs offer significant advantages: uniform genomic distribution, high specificity, high polymorphism information content (PIC), and codominant inheritance that conforms to Mendel’s laws of inheritance. These characteristics make them highly valuable for DNA fingerprint construction (Zhang et al., 2024), crop variety purity identification (Selvakumar et al., 2010), and gene mapping (Hamwieh et al., 2005). To date, SSR markers have been widely applied in various crops, including garlic (Li et al., 2022), sunflower (Ahmed et al., 2022), pomegranate (Wang et al., 2023), and mango (Kassa et al., 2025).
In cucurbit crops, genomic resources have been reported for cucumber (Huang et al., 2009), watermelon (Guo et al., 2013), melon (Garcia-Mas et al., 2012), and wax gourd (Xie et al., 2019), facilitating SSR marker development (Zhu et al., 2016b, 2016a; Pandey et al., 2021; Hu et al., 2022), and subsequent genetic diversity analysis and gene mapping (Lu et al., 2018; Alhariri et al., 2021; Wang et al., 2020). In contrast, genome-wide SSR marker development remains unexplored in sponge gourd, resulting in a lack of species-specific SSR resources and relative lag in molecular breeding progress. With growing market demand for sponge gourd, genetic improvement has become a critical priority in breeding programs. In the field of luffa genetic research, existing molecular marker resources have provided a preliminary foundation. For example, RAPD (Hoque & Rabbani, 2009), ISSR (Prakash et al., 2014a), directed amplification of minisatellite-region DNA (DAMD) (Misra et al., 2017), and single nucleotide polymorphisms (SNPs) (Perez et al., 2021) have all been applied in population structure analysis and genetic diversity assessment. However, these markers have inherent limitations: ISSR exhibits relatively low reproducibility and reliability, RAPD and DAMD lack stable reproducibility, and SNPs require high development and genotyping thresholds (Amom et al., 2020; Bidyananda et al., 2024). In addition, most markers used in these studies are not developed based on the whole genome, and some are directly adopted from previously published reports, resulting in a limited number of available species-specific markers. Moreover, previous genetic diversity studies on sponge gourd have yielded contradictory results due to variations in germplasm populations or incomplete species-specific datasets (Prakash et al., 2014a; Kumar, Pandit & Pathy, 2019), highlighting the need for more reliable and comprehensive molecular tools. 
Given that genome-wide SSRs are ideal for constructing high-density genetic maps, mining functional genes, and conducting genome-wide association studies (GWAS), filling this genome-wide SSR marker development gap is critical to advancing sponge gourd molecular breeding and resolving inconsistencies in its genetic diversity studies.
Against this backdrop, driven by the rapid advancement of genome sequencing technologies, studies on sponge gourd genomics have progressed by leaps and bounds, with three high-quality sponge gourd genome assemblies successively released in recent years (Zhang et al., 2020; Wu et al., 2020; Pootakham et al., 2021). Among these published genomes, the assembly of sponge gourd accession P93075 stands out for its exceptional quality and is widely recognized as a high-quality reference genome for related research. Despite this progress, the development of SSR molecular markers based on the P93075 genome, as well as associated genetic diversity studies using these markers, have not yet been reported. Therefore, this study aims to develop SSR markers using the P93075 genome as a reference and to evaluate the genetic diversity of 67 globally sourced sponge gourd germplasm accessions. The results of this research are expected to advance sponge gourd molecular breeding, lay a solid foundation for the improvement and utilization of sponge gourd varieties, and hold significant implications for both fundamental genetic research and practical breeding applications in this crop.
Materials and Methods
Materials
A total of 67 sponge gourd germplasm accessions were used for the development and validation of SSR molecular markers. Among these, 62 accessions belonged to the PI (Plant Introduction) series, which were sourced from the United States Department of Agriculture (USDA) National Plant Germplasm System (NPGS) (official query link: https://npgsweb.ars-grin.gov/gringlobal/search). Corresponding information of these PI accessions, including taxonomic classification and geographic origin, can be retrieved by searching their respective PI numbers through this official database. These PI accessions originated from diverse geographical regions, including China, India, the United States, and Canada. The remaining five accessions, namely S1, S10, S58, SP55, and SP48, were provided by the Horticultural Crop Breeding, Cultivation, and Comprehensive Utilization Research Team of Anhui Science and Technology University. Detailed information on germplasm accessions, taxonomic classification, and geographic origins (with city/province-level precision where available) is presented in Table 1, while Table S1 summarizes the number of accessions per source country and their corresponding ecological backgrounds.
Table 1: Geographic origin and taxonomic information of 67 sponge gourd germplasm accessions.
SSR locus mining
The genome assembly file of sponge gourd accession P93075 was downloaded from the Cucurbit Genomics Database version 2 (Yu et al., 2023). SSR loci across the 13 chromosomes of sponge gourd were identified using the MIcroSAtellite identification tool (MISA) (Beier et al., 2017) with the following minimum repeat thresholds: ≥20 repeat units for mononucleotide repeats, ≥6 repeat units for dinucleotide repeats, and ≥5 repeat units each for trinucleotide, tetranucleotide, pentanucleotide, and hexanucleotide repeats. Interrupted (compound) SSRs, in which adjacent repeats were separated by <100 base pairs (bp), were included in the analysis. The identified SSR loci were collated and recorded using Microsoft Excel.
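The repeat thresholds above can be illustrated with a small regular-expression scan. This is a minimal sketch and not MISA itself: it reports only perfect repeats, ignores compound SSRs, and the names `MIN_REPEATS`, `is_primitive`, and `find_ssrs` are hypothetical.

```python
import re

# Minimum repeat counts per motif length, copied from the criteria in the text.
MIN_REPEATS = {1: 20, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not a repetition of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples; start is 0-based."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


example = "GGC" + "AT" * 7 + "CCGTA" + "AAG" * 5 + "TTGC" + "A" * 12
print(find_ssrs(example))
```

In the example, the run of 12 adenines is not reported because it falls below the mononucleotide threshold of 20 repeat units.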
SSR primer design
For SSR primer design, 150 bp sequences upstream and downstream of each identified SSR locus were retrieved as flanking regions, with the expected length of polymerase chain reaction (PCR) amplification products set to 100–300 bp. Primers were batch-designed using Primer3 software (Rozen & Skaletsky, 2000) with the following parameters: primer length 18–25 bp (optimal: 23 bp), annealing temperature 57–63 °C (optimal: 60 °C), GC content 40–60%, and limits on self-complementarity (≤8 bp) and primer pair complementarity (≤12 bp). To enrich for polymorphic SSR markers, additional screening was performed using the Primer Check plugin in TBtools software (Chen et al., 2023) combined with electronic PCR (e-PCR) analysis based on the genomic sequences of two published sponge gourd genomes (P93075 and SG2019) (Wu et al., 2020; Zhang et al., 2020). Detailed procedures were as follows: genome files of P93075 and SG2019 were added to the project management module; primers were input in FASTA format; the GelImage option was selected, and e-PCR images were output in PNG format. Primers producing identical amplification products or non-specific amplification across the two genomes were excluded. Finally, SSR molecular markers were selected for subsequent experiments based on two criteria: (1) amplification fragment length difference ≥15 bp between the two genomes, and (2) even distribution across the 13 chromosomes of sponge gourd.
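The logic of the in silico screen can be sketched as follows. This is an illustrative approximation under stated assumptions, not the TBtools Primer Check implementation: it requires exact, single-copy primer matches, and the function names, the toy primers, and the toy templates are all hypothetical. The size-difference threshold is left as a parameter because the text uses different cut-offs at different stages.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_percent(primer):
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def passes_primer_filters(primer):
    """Length and GC filters from the text (18-25 bp, 40-60% GC)."""
    return 18 <= len(primer) <= 25 and 40.0 <= gc_percent(primer) <= 60.0


def amplicon_length(template, fwd, rev):
    """Product length for exact single-copy primer sites, else None."""
    rev_site = revcomp(rev)
    if template.count(fwd) != 1 or template.count(rev_site) != 1:
        return None  # missing or non-specific binding site
    start = template.find(fwd)
    end = template.find(rev_site) + len(rev_site)
    return end - start if end > start else None


def is_candidate_polymorphic(template_a, template_b, fwd, rev, min_diff):
    """True if both genomes give one product and the sizes differ by >= min_diff."""
    len_a = amplicon_length(template_a, fwd, rev)
    len_b = amplicon_length(template_b, fwd, rev)
    if len_a is None or len_b is None:
        return False
    return abs(len_a - len_b) >= min_diff


# Toy data (hypothetical): the same locus with 10 vs 18 AG repeats.
fwd = "ACGTTGCAAGCTTGGATCCA"
rev = "TGCATGCCTGCAGGTCGACT"
genome_a = "TT" + fwd + "AG" * 10 + revcomp(rev) + "AA"
genome_b = "TT" + fwd + "AG" * 18 + revcomp(rev) + "AA"
print(amplicon_length(genome_a, fwd, rev), amplicon_length(genome_b, fwd, rev))
```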
DNA extraction
Young leaves of each sponge gourd accession were placed into 2 mL sterile centrifuge tubes (1/3 volume) with one 5 mm zirconia bead, immediately frozen in liquid nitrogen, and ground using a pre-cooled 48-well plate on an LC-TC-24 tissue grinder (Shanghai Lichen Bangxi, China) at 35 Hz for 40 s. Right after grinding, 1 mL pre-heated (65 °C) 2% CTAB buffer (containing 2% PVP40) and 20 μL β-mercaptoethanol were added, vortexed, and incubated at 65 °C for 1 h (inverted every 10 min). After cooling to room temperature, an equal volume (≈1 mL) of phenol:chloroform:isoamyl alcohol (25:24:1, v/v/v) was added, gently inverted for 5 min, and centrifuged at 12,000 rpm for 5 min. The upper aqueous phase was transferred to a new tube, re-extracted with an equal volume of chloroform:isoamyl alcohol (24:1, v/v), and centrifuged under the same conditions. DNA was precipitated by filling the tube with pre-cooled (−20 °C) absolute ethanol, incubated at −20 °C for ≥30 min, and collected by centrifugation at 8,000 rpm (4 °C) for 5 min. The pellet was washed with 500 μL pre-cooled 70% ethanol, centrifuged at 10,000 rpm (4 °C), air-dried for 10 min, and dissolved in 40 μL sterile ddH₂O. DNA was stored at 4 °C (short-term) or −80 °C (long-term) (Doyle & Doyle, 1987).
SSR primer polymorphism screening
To screen for polymorphic SSR primers, genomic DNA from six sponge gourd accessions (S1, S10, S58, SP55, SP48, and PI 163295) was subjected to PCR amplification. These accessions were chosen to capture broad genetic variation within the species. The 10 μL SSR PCR reaction system consisted of 1.0 μL genomic DNA (50 ng/μL), 1.0 μL each of forward and reverse primers (10 μmol/L), 5.0 μL PCR master mix (containing Taq DNA polymerase, dNTPs, and buffer), and 2.0 μL nuclease-free ddH₂O. PCR amplification conditions were: pre-denaturation at 95 °C for 5 min; 34 cycles of denaturation at 95 °C for 30 s, annealing at 55–60 °C for 30 s (adjusted per primer), and extension at 72 °C for 30 s; final extension at 72 °C for 10 min. PCR products were stored at 4 °C and subjected to polyacrylamide gel electrophoresis (PAGE) within 24 h.
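As a quick arithmetic check, the component volumes listed above sum to the stated 10 μL. The sketch below also scales the shared components for a batch of reactions; the 10% pipetting overage and the names `PER_REACTION_UL` and `shared_mix` are assumptions for illustration and are not taken from the protocol.

```python
# Per-reaction volumes in microlitres, as listed in the text.
TEMPLATE_DNA_UL = 1.0  # added separately because it differs per sample
PER_REACTION_UL = {
    "forward primer (10 umol/L)": 1.0,
    "reverse primer (10 umol/L)": 1.0,
    "PCR master mix": 5.0,
    "nuclease-free ddH2O": 2.0,
}


def shared_mix(n_reactions, overage=0.10):
    """Volumes of the shared components for n reactions.

    The overage fraction is a hypothetical allowance for pipetting loss.
    """
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in PER_REACTION_UL.items()}


total_ul = TEMPLATE_DNA_UL + sum(PER_REACTION_UL.values())
print(total_ul)       # total reaction volume per tube
print(shared_mix(6))  # six accessions, one primer pair
```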
Amplified products were separated and detected by non-denaturing PAGE. Briefly, a non-denaturing polyacrylamide gel solution (final concentration 9%) was prepared with 30 mL distilled water, 4.5 mL 10 × TBE buffer, 11.3 mL 30% polyacrylamide (acrylamide:bisacrylamide = 29:1), 60 μL N,N,N′,N′-tetramethylethylenediamine (TEMED), and 250 μL 10% ammonium persulfate (APS), then poured into assembled glass plates for polymerization at room temperature for 30 min. After submerging the gel in 1 × TBE buffer, 2.5 μL of 50 bp DNA Ladder (for PAGE) was loaded into the first well, and 3.5 μL of PCR products were added to the remaining wells sequentially. Electrophoresis was conducted at a constant voltage of 200 V for 90 min. The gel was stained with silver nitrate for 8 min, rinsed briefly with distilled water, and developed in a sodium hydroxide-formaldehyde solution until clear bands appeared, followed by observation and photography using a gel imaging system (Xu et al., 2025). Only SSR primers that produced clear, distinct amplification bands, exhibited high polymorphism, and showed good stability in repeated amplifications were selected for subsequent analysis. To construct a physical map, the physical positions of the SSR markers developed in this study and the length of each chromosome in the sponge gourd P93075 genome were imported into MapChart 2.32 (Voorrips, 2002).
Genetic diversity analysis
The amplification band patterns of SSR primers were scored manually. Clear and reproducible bands at the same electrophoretic position were recorded as “1”, absent bands as “0”, and ambiguous results as “9”. An initial “0/1” binary data matrix was constructed using Microsoft Excel. Genetic diversity analysis was performed with PopGene32 software (Yeh, Yang & Boyle, 1999) to calculate parameters including Gene Diversity, Number of observed alleles (Na), Number of effective alleles (Ne), Observed heterozygosity (Ho), Expected heterozygosity (He), and Shannon-Weaver diversity index (I). PIC was computed using PowerMarker 3.25 (Liu & Muse, 2005), and genetic distances were determined via Nei’s method. Cluster analysis was conducted using the Unweighted Pair Group Method with Arithmetic Mean (UPGMA), and Molecular Evolutionary Genetics Analysis (MEGA) software (Tamura, Stecher & Kumar, 2021) was used to construct the corresponding dendrogram. Population structure of the 67 sponge gourd germplasms was analyzed using Structure 2.3.4 software (Pritchard, Stephens & Donnelly, 2000), with K values set from 1 to 10 (10 independent runs each) and Markov Chain Monte Carlo (MCMC) parameters: 100,000 burn-in iterations followed by 200,000 additional iterations. The optimal K value was determined via the online platform StructureSelector (https://lmme.ac.cn/StructureSelector/index.html), and replicate run results corresponding to this optimal K value were integrated using CLUMPP software (Jakobsson & Rosenberg, 2007) for final visualization of the population genetic structure plot in R. GenAlEx 6.5 (Smouse & Peakall, 2012) was used for data format conversion, and Principal Coordinates Analysis (PCoA) was performed in R to further explore genetic structure and differentiation of the germplasm population.
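For readers unfamiliar with these parameters, the sketch below computes them for a single codominant locus from diploid genotypes. It uses the standard textbook definitions and is not the PopGene32 or PowerMarker code; in particular, the gene diversity here is the uncorrected 1 − Σp², whereas the He reported by software may include a sample-size correction. The function name `locus_stats` and the toy genotypes are hypothetical.

```python
from collections import Counter
from math import log


def locus_stats(genotypes):
    """Diversity statistics for one locus.

    genotypes: list of (allele, allele) tuples, one per diploid individual.
    """
    alleles = [allele for genotype in genotypes for allele in genotype]
    n = len(alleles)
    freqs = [count / n for count in Counter(alleles).values()]
    sum_p2 = sum(p * p for p in freqs)
    # PIC (Botstein et al. form): 1 - sum(p_i^2) - sum_{i<j} 2 p_i^2 p_j^2
    cross = sum(
        2 * p * p * q * q for i, p in enumerate(freqs) for q in freqs[i + 1:]
    )
    return {
        "Na": len(freqs),                        # observed alleles
        "Ne": 1.0 / sum_p2,                      # effective alleles
        "Ho": sum(a != b for a, b in genotypes) / len(genotypes),
        "H": 1.0 - sum_p2,                       # gene diversity (uncorrected)
        "I": -sum(p * log(p) for p in freqs),    # Shannon index
        "PIC": 1.0 - sum_p2 - cross,
    }


toy = [("a", "a"), ("a", "b"), ("b", "b"), ("a", "b")]
print(locus_stats(toy))
```

With two equally frequent alleles, as in the toy data, Ne equals Na and PIC is 0.375, which is the maximum for a biallelic locus.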
Results
Comparison of three published sponge gourd genomes for SSR marker development
As shown in Table 2, a comprehensive comparison of three published sponge gourd genomes was conducted: P93075 (Wu et al., 2020), SG2019 (Zhang et al., 2020), and SO-3 (Pootakham et al., 2021). Among these, the P93075 genome exhibited distinct advantages across multiple key metrics. In terms of sequencing technology, P93075 adopted a multi-platform strategy integrating Illumina, 10× Genomics, PacBio, and Hi-C. This combination is more conducive to capturing comprehensive genomic information compared to the Illumina + PacBio + Hi-C approach used for the SG2019 genome and the single PacBio technology applied for the SO-3 genome. The diverse data sources of P93075’s sequencing strategy potentially enhance the resolution of complex genomic regions (e.g., repetitive or heterozygous regions). Regarding assembly size, the P93075 genome has an assembly size of 656.19 Mb, compared to 669.71 Mb for the SG2019 genome and 689.87 Mb for the SO-3 genome. Differences in assembly size among the three genomes may stem from variations in sequencing depth, assembly algorithms, or handling of repetitive sequences. The moderate assembly size of P93075 may reflect a balanced sequencing and assembly approach, reducing the risk of over-assembly or missing segments. In terms of assembly continuity, P93075 contains only 332 scaffolds, far fewer than the 798 scaffolds of the SG2019 genome and the 3,570 scaffolds of the SO-3 genome. Fewer scaffolds directly indicate higher assembly continuity. Additionally, P93075 achieved a Contig N50 of 8.80 Mb, much higher than the 4.82 Mb of the SG2019 genome; its Scaffold N50 (48.76 Mb) is comparable to that of the SG2019 genome (48.66 Mb), and both far exceed the 578.62 kb of the SO-3 genome. Taken together, these metrics indicate the superior assembly continuity of P93075.
For annotation quality, P93075 obtained the highest BUSCO completeness score (95.5%), outperforming the SG2019 genome (91.6%) and SO-3 genome (93.0%), which demonstrated more complete annotation of the core eukaryotic gene set. Although the number of predicted protein-coding genes in P93075 (25,508) is lower than in the other two genomes, it has the highest repeat sequence content (63.81%), which surpasses that of the SG2019 genome (62.18%) and the SO-3 genome (56.78%). Higher repeat sequence content in P93075 suggests more comprehensive coverage and annotation of genomic structural elements, which is critical for identifying SSR loci (often derived from repetitive sequences).
Table 2: Comparison of key characteristics of three published sponge gourd genomes.
In summary, considering its advanced sequencing strategy, moderate and reliable assembly size, high assembly continuity and precision, and excellent annotation quality, the P93075 genome was identified as the most suitable reference for SSR molecular marker development in sponge gourd.
Identification and characterization of SSR loci in the whole genome of sponge gourd P93075
In this study, a comprehensive search of the sponge gourd P93075 genome identified a total of 128,557 SSR loci distributed across 13 chromosomes, with a density of 195.91 SSRs per megabase (Mb). Among these loci, 13,415 contained more than one SSR (i.e., compound SSRs), and 11,361 complex SSRs were detected. The minimum distance between adjacent SSR loci was 100 bp (Table S2). SSR repeat motifs ranged from mononucleotides to hexanucleotides, with the following distribution: 40,572 mononucleotides, 53,028 dinucleotides, 28,322 trinucleotides, 4,886 tetranucleotides, 1,202 pentanucleotides, and 547 hexanucleotides, where dinucleotide repeats accounted for the highest proportion of total SSR loci (41.25%), followed by mononucleotide repeats (31.56%) and trinucleotide repeats (22.03%), while tetranucleotide, pentanucleotide, and hexanucleotide repeats were relatively rare, representing 3.8%, 0.93%, and 0.43% of the total SSR loci, respectively. Analysis of the repeat copy number distribution revealed distinct patterns across motif types: mononucleotide repeats exclusively occurred with >12 copies (all 40,572 loci); dinucleotide repeats were most abundant at six copies (18,171 loci); trinucleotide repeats were predominantly concentrated in five copies (15,055 loci); tetranucleotide, pentanucleotide, and hexanucleotide repeats were most frequent at five copies (3,492, 897, and 380 loci, respectively) with sparse distribution in higher copy numbers. Overall, >12 copies were the most common (39.39%), followed by six copies (20.03%) and five copies (15.42%), while 7–11 and 12 copies accounted for 1.31–9.65% (Table 3).
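The headline figures in this paragraph can be reproduced from the counts given in the text. The short check below assumes that the density was computed against the 656.19 Mb assembly size reported for P93075; the variable names are illustrative.

```python
# SSR counts per motif length, as reported in the text.
counts = {
    "mono": 40572,
    "di": 53028,
    "tri": 28322,
    "tetra": 4886,
    "penta": 1202,
    "hexa": 547,
}
ASSEMBLY_MB = 656.19  # P93075 assembly size (assumed denominator)

total = sum(counts.values())
density = total / ASSEMBLY_MB
shares = {name: 100.0 * n / total for name, n in counts.items()}

# Loci with exactly five repeat units (tri-, tetra-, penta-, hexanucleotide).
five_copy = 15055 + 3492 + 897 + 380
five_copy_share = 100.0 * five_copy / total

print(total, round(density, 2))
print({name: round(pct, 2) for name, pct in shares.items()})
print(round(five_copy_share, 2))
```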
Table 3: Distribution of SSR repeat units and repeat number in sponge gourd P93075.
Based on the physical positions of the SSR loci, we mapped the distribution of SSR loci with different repeat types across the P93075 genome of sponge gourd (Fig. 1A). No apparent correlation was observed between chromosome physical length and SSR locus density. The 13 chromosomes ranged in size from 42.17 to 55.64 Mb, with SSR densities varying between 183.37 and 224.47 SSRs/Mb. Although the longest chromosome (Chr04, 55.64 Mb) contained the most SSR loci, its density (212.03 SSRs/Mb) only ranked fourth. In contrast, the shortest chromosome (Chr01, 42.17 Mb) showed the highest SSR density (224.47 SSRs/Mb). These results indicated that SSR distribution density was not directly related to chromosome size. Moreover, the proportion of dinucleotide SSR loci was highest across all chromosomes, followed by mononucleotide and trinucleotide repeats, while tetranucleotide, pentanucleotide, and hexanucleotide repeats were found in lower frequencies. Chromosome 4 (Chr04) contained the highest number of SSR loci (11,798). Chromosome 11 (Chr11) ranked second with 10,531 SSR loci, showing comparable numbers to Chromosome 6 (Chr06, 10,509) and Chromosome 3 (Chr03, 10,260). The SSR counts were similar between Chromosome 5 (Chr05, 9,922) and Chromosome 9 (Chr09, 9,968), while Chromosome 13 (Chr13, 8,676) contained the fewest SSR loci (Fig. 1B; Table S3).
Distribution of SSR motifs in the reference genome of sponge gourd P93075.(A) Circos plot showing the genome-wide distribution of SSR motifs across the chromosomes of sponge gourd. Tracks from outer to inner represent: (a) sponge gourd chromosomes; (b) gene density; (c–h) distributions of mono-, di-, tri-, tetra-, penta-, and hexa-nucleotide SSR repeats, respectively. (B) Bar chart depicting the number of SSR loci in each chromosome of sponge gourd.
Types and occurrence frequencies of SSR repeat motifs in the whole genome of sponge gourd P93075
A total of 218 distinct SSR repeat motifs were detected in the whole genome of sponge gourd P93075. These motifs covered all six repeat unit types, ranging from mononucleotides to hexanucleotides (Table 4). Among these, mononucleotide repeats exhibited the lowest diversity, with only two motif types. The A/T motif was dominant, accounting for 31.18% of the total SSR loci (40,078 loci), whereas the C/G motif was relatively rare and represented merely 0.38% (494 loci). Dinucleotide repeats included four motif types, and the AT/AT motif had the highest proportion (25.24%, 32,446 loci). This value was much higher than the sum of the other three dinucleotide motifs: AC/GT (2.34%, 3,012 loci), AG/CT (13.61%, 17,491 loci), and CG/CG (0.06%, 79 loci). Trinucleotide repeats consisted of 10 motif types, among which AAT/ATT (12.55%, 16,142 loci) and AAG/CTT (6.57%, 8,444 loci) were the most frequent. The remaining eight trinucleotide motifs (AAC/GTT, ACC/GGT, ACG/CGT, ACT/AGT, AGC/CTG, AGG/CCT, ATC/ATG, and CCG/CGG) each had a frequency of less than 1% (ranging from 0.06% to 0.72%), with corresponding locus counts between 72 and 931. Thirty tetranucleotide repeat motifs were identified, and AAAT/ATTT was the most abundant (2.19%, 2,812 loci) and represented the dominant tetranucleotide type. The remaining 27 tetranucleotide motifs collectively accounted for 1.03% of total SSRs (1,327 loci). Fifty-four pentanucleotide repeat motifs were detected, among which AAAAT/ATTTT was the most frequent (0.25%, 323 loci). Two other relatively common pentanucleotide motifs were AAATG/ATTTC (0.11%, 146 loci) and AAAAG/CTTTT (0.19%, 238 loci). The remaining 51 pentanucleotide motifs together contributed 0.38% of total SSRs (495 loci), and each individual motif accounted for less than 0.1%. Hexanucleotides had the highest diversity, with 118 distinct motifs, but all exhibited low frequencies. 
The two most common hexanucleotide motifs were AAAAAG/CTTTTT (0.07%, 96 loci) and AAAAAT/ATTTTT (0.05%, 61 loci). The remaining 116 motifs collectively represented only 0.30% of total SSRs (390 loci). The combined frequency of all hexanucleotide motifs was less than 0.5%. Overall, with increasing repeat unit length from mononucleotides to hexanucleotides, the number of SSR motif types increased markedly (from 2 to 118), whereas the frequency of each individual motif decreased substantially.
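Motif labels such as AG/CT or AAG/CTT group a motif with its cyclic rotations and its reverse complement, which is why only four dinucleotide and ten trinucleotide classes are possible. The sketch below illustrates one way to derive such labels; it is an assumption about the grouping convention inferred from the labels in the text, and the function names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rotations(motif):
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def revcomp(motif):
    return motif.translate(COMPLEMENT)[::-1]


def canonical(motif):
    """Smallest string among all rotations of the motif and its reverse complement."""
    return min(rotations(motif) + rotations(revcomp(motif)))


def label(motif):
    """Class label in the 'motif/reverse-complement' style used in the text."""
    c = canonical(motif)
    return c + "/" + min(rotations(revcomp(c)))


def is_primitive(motif):
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def motif_classes(k):
    """All canonical classes of length k, excluding repeats of shorter units."""
    kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted({canonical(m) for m in kmers if is_primitive(m)})


print(label("TC"), label("TTC"), label("CATTT"))
print(len(motif_classes(2)), len(motif_classes(3)))
```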
Table 4: SSR repeat motif types and their frequency in the whole genome of sponge gourd P93075.
Genome-wide SSR marker development and polymorphism validation in sponge gourd P93075
Based on the genome-wide SSR locus information of sponge gourd P93075, SSR primers were designed in batches using Primer3 software, resulting in a total of 115,588 SSR primer pairs. Following e-PCR screening, 69,857 SSR primer pairs with potential polymorphism were initially identified. The Primer Check plugin in TBtools software was then used to evaluate these candidate primers against two previously published sponge gourd genomes, P93075 and SG2019 (Wu et al., 2020; Zhang et al., 2020) (Fig. 2A). Primers that produced identical amplification products or exhibited non-specific amplification were excluded. This yielded 8,557 polymorphic, locus-specific SSR markers, which exhibited a minimum difference of ≥3 bp in amplified fragment length between the two sponge gourd genomes (Wu et al., 2020; Zhang et al., 2020) (Table S4). To evaluate marker performance, a total of 308 SSR markers were evenly selected from 13 chromosomes of the P93075 genome. The results of physical map construction confirmed that these markers exhibited a uniform distribution pattern across the whole genome (Fig. S1). PCR amplification was conducted on six sponge gourd accessions, and amplification products were separated and detected via PAGE. Of these 308 primer pairs, 232 yielded clear amplification bands and demonstrated good polymorphism, corresponding to a polymorphism rate of 75.32%. Fifteen markers with distinct bands and stable polymorphism were further randomly selected from these 232 validated primer pairs (Table S5). These 15 SSR markers were subsequently used for amplification in a larger panel of 67 sponge gourd accessions. All 15 markers produced clear polymorphic bands, which confirms their reliability for application in genetic diversity analyses (Fig. 2B).
Screening of polymorphic SSR molecular markers in sponge gourd.(A) Screening of primers with good specificity and polymorphism in two sponge gourd genomes using Primer Check plugin in TBtools software. (B) PAGE test of some SSR primers in sponge gourd materials.
Genetic diversity analysis of 67 sponge gourd germplasms
Genetic diversity of 67 sponge gourd germplasms was analyzed using the 15 validated SSR markers, which resulted in the detection of 83 total alleles. For the 15 markers, Na ranged from 3 to 7 with an average of 5.5333, and Ne varied from 2.2845 to 6.5172 with an average of 3.9810. The C2-2064 locus exhibited the highest Ne (6.5172), followed by C1-4705 (5.2911) and C6-4054 (5.1127). This indicated that these loci possessed relatively high genetic variation and were therefore more efficient at capturing diversity. Additional genetic parameters were calculated and summarized in Table 5. Specifically, Ho spanned from 0.0462 (C10-1184) to 1.0000 (e.g., C1-4705, C5-3356, C6-4054) with an average of 0.5586. He ranged from 0.5665 (C3-1009) to 0.8533 (C2-2064) with an average of 0.7323. Gene diversity varied from 0.5623 (C3-1009) to 0.8466 (C2-2064) with an average of 0.7267. The I ranged from 0.9533 (C3-1009) to 1.9073 (C2-2064) with an average of 1.4618. PIC values ranged from 0.5010 (C3-1009) to 0.8272 (C2-2064) with an average of 0.6838. The average PIC value of 0.6838 exceeds the commonly used threshold for high polymorphism (PIC > 0.5), confirming that the SSR markers developed in this study were highly polymorphic and suitable for evaluating the genetic diversity of the 67 sponge gourd germplasms.
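Two of the reported values can be cross-checked with simple arithmetic. The sketch below assumes that the reported gene diversity equals 1 − Σp², so that Ne = 1 / (1 − gene diversity); small differences from the reported Ne are expected because the inputs are rounded to four decimals.

```python
# Mean number of observed alleles: 83 alleles across 15 markers.
mean_na = 83 / 15


def effective_alleles(gene_diversity):
    """Ne implied by gene diversity, assuming gene diversity = 1 - sum(p_i^2)."""
    return 1.0 / (1.0 - gene_diversity)


# (gene diversity, reported Ne) for the two extreme loci named in the text.
reported = {"C2-2064": (0.8466, 6.5172), "C3-1009": (0.5623, 2.2845)}

print(round(mean_na, 4))
for locus, (diversity, ne_reported) in reported.items():
    print(locus, round(effective_alleles(diversity), 4), ne_reported)
```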
Table 5: Genetic diversity parameters of 67 sponge gourd germplasms based on 15 SSR markers.
Population genetic structure analysis of 67 sponge gourd germplasms
The genetic distances among the 67 sponge gourd germplasm samples were calculated using Nei’s method (Table S6). The genetic distances ranged from 0.1281 to 0.9286, with an average of 0.6009. The lowest genetic distance (0.1281) was observed between PI 438851 and PI 438852, suggesting that these two samples were genetically closely related. Conversely, the highest genetic distance (0.9286) was found between PI 163295 and PI 381897, as well as between PI 250160 and PI 381875, indicating that these pairs of genotypes were more genetically divergent. Using the UPGMA, the 67 sponge gourd germplasms were grouped into three main clusters. Cluster I contained 17 sponge gourd germplasms from five different countries, including China, Zambia, Mexico, Costa Rica, and the United States. Cluster II consisted of 27 sponge gourd germplasms, predominantly from India, including PI 381888, PI 381889, and PI 381891. Cluster III included 23 sponge gourd germplasms, with 17 germplasms originating from India and six germplasms from other countries: PI 167337 and PI 171706 from Turkey, PI 250160 from Pakistan, PI 286425 from Nepal, PI 483322 from Australia, and PI 419141 from China (Fig. 3).
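The UPGMA step can be illustrated with a compact pure-Python sketch that repeatedly merges the closest pair of clusters and averages distances weighted by cluster size. It is not the MEGA implementation used in the study, and the four-taxon distance matrix is hypothetical toy data rather than values from Table S6.

```python
def upgma(names, dist):
    """Return the list of merges as (cluster, distance_at_merge).

    names: iterable of leaf labels.
    dist:  dict mapping (label_a, label_b) tuples to pairwise distances.
    Clusters are represented as nested tuples of labels.
    """
    sizes = {name: 1 for name in names}
    d = {frozenset(pair): value for pair, value in dist.items()}
    merges = []
    while len(sizes) > 1:
        pair = min(d, key=d.get)            # closest pair of clusters
        a, b = sorted(pair, key=str)        # deterministic ordering
        merged = (a, b)
        merges.append((merged, d.pop(pair)))
        na, nb = sizes.pop(a), sizes.pop(b)
        for c in sizes:
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            # Size-weighted arithmetic mean of the two old distances.
            d[frozenset((merged, c))] = (na * dac + nb * dbc) / (na + nb)
        sizes[merged] = na + nb
    return merges


toy_dist = {
    ("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 8,
    ("B", "C"): 6, ("B", "D"): 8, ("C", "D"): 4,
}
for cluster, distance in upgma("ABCD", toy_dist):
    print(cluster, distance)
```

In a dendrogram, each merge is conventionally drawn at half the merge distance.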
UPGMA dendrogram showing clustering of 67 sponge gourd germplasm accessions.
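To make the UPGMA step concrete, the following is a minimal pure-Python sketch of the algorithm applied to a toy distance matrix. The four labels and their distances are hypothetical and are not taken from Table S6; the study's own software for tree construction is not named in this section.

```python
def upgma(labels, pair_dist):
    """Cluster by UPGMA; return merges as (cluster_a, cluster_b, node_height)."""
    sizes = {label: 1 for label in labels}            # cluster id -> number of leaves
    dist = {frozenset(pair): d for pair, d in pair_dist.items()}
    merges = []
    while len(sizes) > 1:
        closest = min(dist, key=dist.get)             # pair with the smallest distance
        a, b = sorted(closest, key=str)               # fixed order for reproducibility
        merges.append((a, b, dist[closest] / 2.0))    # node height is half the distance
        merged = (a, b)
        merged_size = sizes[a] + sizes[b]
        for other in sizes:
            if other == a or other == b:
                continue
            # Size-weighted average of the distances from each child to `other`.
            dist[frozenset((merged, other))] = (
                sizes[a] * dist[frozenset((a, other))]
                + sizes[b] * dist[frozenset((b, other))]
            ) / merged_size
        # Drop every distance that still refers to the two absorbed clusters.
        dist = {k: v for k, v in dist.items() if a not in k and b not in k}
        del sizes[a], sizes[b]
        sizes[merged] = merged_size
    return merges


# Hypothetical accessions A-D with made-up pairwise genetic distances.
toy = {
    ("A", "B"): 0.2, ("A", "C"): 0.6, ("A", "D"): 0.8,
    ("B", "C"): 0.6, ("B", "D"): 0.8, ("C", "D"): 0.4,
}
for left, right, height in upgma(["A", "B", "C", "D"], toy):
    print(left, right, round(height, 3))
```

Because each merge averages distances in proportion to cluster size, every original accession carries equal weight in the final tree, which is the defining property of UPGMA.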
Based on the genotypic data from the 15 SSR molecular markers, the genetic population structure of the 67 sponge gourd germplasms was further analyzed using Structure 2.3.4 software. The analysis of K values ranging from 1 to 10 revealed that the ∆K value was maximized when K = 3 (Fig. 4A). This indicated that the 67 sponge gourd germplasm samples could be classified into three distinct clusters. Although no clear ecological differentiation pattern was observed, it can be inferred that there was some degree of diffusion and intermingling of subpopulations (Fig. 4B).
Population genetic structure analysis of 67 sponge gourd germplasms using Structure software. (A) Variation of ∆K value with the number of subpopulations (K). (B) Genetic structure of 67 sponge gourd germplasms at K = 3.
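The ∆K criterion used to choose K can be sketched as follows. This is a minimal illustration of the commonly used mean-based form, ∆K = |L(K+1) − 2L(K) + L(K−1)| / sd(L(K)), where L(K) is the mean log-likelihood over replicate runs at a given K. The replicate values below are hypothetical and are not output from the study's Structure runs, and the number of replicates is likewise an assumption.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute delta-K for each K that has both neighbours K-1 and K+1."""
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue                                   # undefined at the ends of the range
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        spread = stdev(lnp_by_k[k])                    # sd across replicates at this K
        if spread > 0:
            result[k] = second_diff / spread
    return result


# Hypothetical ln P(D) values from three replicate runs per K.
runs = {
    1: [-3000.0, -3002.0, -2998.0],
    2: [-2800.0, -2805.0, -2795.0],
    3: [-2600.0, -2602.0, -2598.0],
    4: [-2590.0, -2600.0, -2580.0],
    5: [-2585.0, -2605.0, -2565.0],
}
scores = delta_k(runs)
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

∆K peaks where the log-likelihood stops improving sharply and the replicates still agree closely, so it cannot be evaluated at the smallest or largest K tested.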
PCoA was performed on the SSR data to further assess the phylogenetic relationships among the sponge gourd varieties. In the two-dimensional PCoA plot, the 67 sponge gourd germplasms were similarly grouped into three clusters, which corresponded with the clustering observed in both the UPGMA dendrogram and the population structure analysis conducted using Structure 2.3.4. The horizontal and vertical axes of the PCoA explained 10.48% and 8.14%, respectively, of the variance in the data, offering an alternative view of the genetic diversity and relationships among the samples (Fig. 5). Notably, the consistent clustering results obtained via UPGMA, Structure, and PCoA analyses are derived from the polymorphism of the 15 SSR markers employed, thus reflecting marker-based genetic resolution rather than comprehensive genome-wide divergence.
Principal coordinate analysis (PCoA) of 67 sponge gourd germplasms based on 15 polymorphic SSR markers.
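The percentage of variance attributed to each PCoA axis comes from the eigenvalues of the double-centred distance matrix. The sketch below shows that calculation in pure Python on four hypothetical points; it assumes Euclidean-embeddable distances (no negative eigenvalues) and uses simple power iteration rather than whatever routine the authors' software employs.

```python
import math


def double_center(dist):
    """Gower-centre a symmetric distance matrix: B = -1/2 * J * D^2 * J."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]
    grand_mean = sum(row_mean) / n
    return [
        [a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def leading_eigenpair(mat, iterations=500):
    """Power iteration for the dominant eigenvalue of a symmetric matrix."""
    n = len(mat)
    vec = [float(i + 1) for i in range(n)]     # non-constant start vector
    for _ in range(iterations):
        nxt = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm < 1e-12:
            return 0.0, vec                     # nothing left to extract
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * mat[i][j] * vec[j] for i in range(n) for j in range(n))
    return value, vec


def pcoa_explained(dist, n_axes=2):
    """Fraction of total variance carried by the first n_axes principal coordinates."""
    b = double_center(dist)
    n = len(b)
    total = sum(b[i][i] for i in range(n))      # trace equals the sum of eigenvalues
    fractions = []
    for _ in range(n_axes):
        value, vec = leading_eigenpair(b)
        fractions.append(value / total)
        # Deflate so the next pass finds the next-largest eigenvalue.
        b = [[b[i][j] - value * vec[i] * vec[j] for j in range(n)] for i in range(n)]
    return fractions


# Hypothetical samples at the corners of a 4 x 2 rectangle.
points = [(0, 0), (4, 0), (0, 2), (4, 2)]
toy_dist = [[math.dist(p, q) for q in points] for p in points]
axis1, axis2 = pcoa_explained(toy_dist)
print(round(axis1, 4), round(axis2, 4))
```

In this toy case two axes capture all of the variance; in a data set with many alleles the variance is spread over many axes, so modest percentages for the first two axes, such as the 10.48% and 8.14% reported here, are not unusual.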
Discussion
SSR molecular markers are among the most widely used tools in genetic research and have been applied in diverse fields, including germplasm resource identification (Zhao et al., 2019), molecular breeding (Gonzaga et al., 2015), and genetic diversity analysis (Mukuze et al., 2020). Unlike morphological and physiological markers, SSR markers are not affected by environmental factors, making them a more reliable approach for accelerating the identification of optimal progeny (Ovesná, Poláková & Leišová, 2002). Their polymorphism, stability, and reproducibility are critical for the success of genetic studies (Liu et al., 2017; Selkoe & Toonen, 2006). To date, researchers have used SSR markers to investigate genetic variation across varieties of multiple species, such as wheat (Triticum aestivum L.) (Farhangian-Kashani et al., 2021), pumpkin (Cucurbita spp.) (Nyabera et al., 2021), apple (Malus Mill.) (Wang et al., 2024), chayote (Sechium edule) (Cheng et al., 2024), and walnut (Juglans regia L.) (Xue et al., 2025).
In the early stages of sponge gourd research, the lack of whole-genome sequence information limited the development of molecular markers, so progress in this area lagged. Notably, this study is the first to develop genome-wide SSR markers for Luffa cylindrica based on a high-quality reference genome, filling a gap left by previous transcriptome-based marker development efforts. The Luffa cylindrica P93075 genome was selected as the reference for SSR marker development because it is the best sponge gourd genome currently available. As supported by our results (Table 2), the P93075 genome exhibits superior quality characteristics, including high assembly continuity (such as a longer scaffold N50), low sequence redundancy, and comprehensive coverage of both coding and non-coding regions, compared to previously reported genomes such as SG2019 (Zhang et al., 2020) and SO3 (Pootakham et al., 2021). This genome quality ensured that the SSR loci identified herein were more accurate and representative of the entire sponge gourd genome, reducing the risk of false-positive or incomplete SSR locus prediction that may occur with lower-quality reference genomes.
Based on this high-quality P93075 genome assembly (656.19 Mb, Contig N50 = 8.80 Mb; Table 2), we identified a total of 128,557 SSR loci using MISA software, far more than the number reported in previous transcriptome-based studies (Wu et al., 2014). The sponge gourd genome harbors abundant SSRs in both quantity and repeat motif diversity, though their distribution across repeat motif categories varies substantially. The total also substantially exceeds the 21,249 and 21,303 SSR loci reported for the SG2019 and SO3 genomes, respectively, by Cui et al. (2022), a discrepancy that primarily reflects the superior assembly completeness and sequence accuracy of the P93075 reference genome. Importantly, all three genomes share a conserved SSR distribution pattern: AT/AT dinucleotide repeats are particularly abundant, indicating their evolutionary conservation as dominant repeat motifs in Luffa cylindrica. This dominance of dinucleotide repeats is a conserved characteristic across many Cucurbitaceae crops, as reported in cucumber and watermelon. In terms of quantity, the 128,557 SSR loci identified in sponge gourd are slightly more numerous than those in cucumber (112,073 loci) (Cavagnaro et al., 2010) and substantially more numerous than those in watermelon (39,523 loci) (Zhu et al., 2016b), whereas the SSR density in sponge gourd (195.91 SSRs/Mb) is intermediate between that of cucumber (551.9 SSRs/Mb) and watermelon (111 SSRs/Mb). This difference may be attributed to variations in genome size and repeat sequence composition among cucurbit species.
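The density figure follows directly from the locus count and the assembly size, as the short calculation below shows. The sponge gourd inputs are the values given in the text; the cucumber and watermelon assembly sizes are back-calculated here from the reported counts and densities and are not stated in the source, so they should be read as rough implied values only.

```python
def ssr_density(n_loci, assembly_mb):
    """SSR loci per megabase of assembled sequence."""
    return n_loci / assembly_mb


def implied_assembly_mb(n_loci, density_per_mb):
    """Assembly size (Mb) implied by a reported locus count and density."""
    return n_loci / density_per_mb


# Sponge gourd P93075: 128,557 loci in a 656.19 Mb assembly (values from the text).
sponge_gourd_density = ssr_density(128_557, 656.19)

# Reported (loci, SSRs/Mb) pairs for the two comparison species.
reported = {"cucumber": (112_073, 551.9), "watermelon": (39_523, 111.0)}
implied_sizes = {
    species: implied_assembly_mb(loci, density)
    for species, (loci, density) in reported.items()
}

print(round(sponge_gourd_density, 2))
for species, size_mb in implied_sizes.items():
    print(species, round(size_mb, 1))
```

The back-calculation illustrates the point made above: cucumber reaches a similar locus count in a much smaller sequence space, which is why its density is far higher despite a lower total.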
PIC is a key metric for evaluating the polymorphism level of genetic markers (Chesnokov & Artemyeva, 2015). According to the standard PIC classification criterion, marker polymorphism is categorized as high (PIC > 0.5), moderate (0.25 < PIC < 0.5), or low (PIC < 0.25) (Vaiman et al., 1994). In this study, 232 polymorphic SSR markers were initially validated from 308 evenly distributed candidates, and 15 markers were selected for subsequent genetic diversity analysis because of their distinct bands, high stability, and excellent polymorphism. These 15 SSR markers had PIC values ranging from 0.5010 to 0.8272, with an average of 0.6838, confirming their high polymorphism. This result is consistent with those reported by An et al. (2017) (average PIC = 0.5281) and Pandey et al. (2018) (average PIC = 0.550), but markedly higher than the average PIC of 0.31 reported by Cui et al. (2022). The polymorphism rate of sponge gourd SSR markers in this study (75.32%) is higher than that reported for melon (70.8%) (Liang et al., 2025), and the average PIC value (0.6838) and average number of alleles (5.5333) also indicate excellent polymorphism. In addition, e-PCR screening was performed based on the published genomes of sponge gourd P93075 and SG2019. The sequence differences between the two genomes were exploited to improve the polymorphism of the SSR markers, and this efficient marker development strategy can serve as a reference for other crops. Among the 15 markers, C2-2064 and C1-4705 showed the highest PIC values, making them particularly valuable for applications such as DNA fingerprinting, variety authentication, and genotype conservation. Additionally, all validated SSR markers amplified fragments shorter than 300 bp, a characteristic that allows them to perform well even with low-quality DNA samples. Together, these findings confirm the high reliability of the SSR markers developed in this study for polymorphism detection.
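The classification thresholds and the polymorphism rate quoted above can be checked with a few lines of code. This is only an illustrative sketch: the function names are hypothetical, and because the cited criterion does not say how values of exactly 0.25 or 0.5 are treated, assigning them to the moderate class is an assumption made here.

```python
def classify_pic(pic):
    """Map a PIC value to the high / moderate / low classes described in the text."""
    if pic > 0.5:
        return "high"
    if pic < 0.25:
        return "low"
    return "moderate"   # assumption: the boundary values 0.25 and 0.5 fall here


def polymorphism_rate(polymorphic, tested):
    """Percentage of tested markers that were polymorphic."""
    return 100.0 * polymorphic / tested


# Average PIC values quoted in the text for this study and for earlier work.
averages = {
    "this study": 0.6838,
    "An et al. (2017)": 0.5281,
    "Pandey et al. (2018)": 0.550,
    "Cui et al. (2022)": 0.31,
}
for source, pic in averages.items():
    print(source, classify_pic(pic))

# 232 polymorphic markers out of 308 candidates tested.
print(round(polymorphism_rate(232, 308), 2))
```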
Assessing genetic diversity and genetic relationships among plant populations is a critical foundation for crop breeding programs. Previous studies on sponge gourd germplasm have reported substantial genetic diversity, which can be leveraged to advance breeding efforts. In the present study, the average genetic diversity indices, namely gene diversity (0.7267) and I (1.4618), further confirm high genetic diversity within the 67 sponge gourd germplasms analyzed. This result aligns with those of Misra et al. (2017) and Perez et al. (2021), who also documented relatively high genetic diversity in sponge gourd germplasms. However, our findings contrast with studies by Perez et al. (2022) and Tyagi et al. (2020), which reported low genetic diversity in sponge gourd. This discrepancy may arise from differences in experimental design, such as the type of molecular markers used, the number of primers employed, or the scope of germplasms sampled (e.g., local vs. global collections) (Hajibarat et al., 2015). Genetic distance is an intuitive quantitative indicator of the degree of genetic differentiation among genotypes. A larger genetic distance indicates greater differences in allele composition at the tested SSR loci between two genotypes, suggesting a higher level of genetic divergence between them (Türkoğlu et al., 2023). Notably, the highest genetic distance (0.9286) in this study was observed between PI 163295 and PI 381897, as well as between PI 250160 and PI 381875. Specifically, PI 250160 (originating from Pakistan) and PI 381875 (from India) differ in geographic origin and ecological background, while PI 163295 (India) and PI 381897 (India) exhibit distinct allele distributions across 13 of the 15 SSR loci. These large distances could be attributed to distinct geographic origins or to long-term independent domestication and selection histories.
Although both PI 163295 and PI 381897 originate from India, a recognized diversity center for sponge gourd, their distinct allele profiles may reflect regional adaptation to different ecological niches within the country, such as variations in precipitation, temperature, or cultivation practices.
The 67 sponge gourd germplasms in this study were collected from 10 different countries. Based on SSR allele data, a combined clustering analysis was performed using three methods, namely STRUCTURE, UPGMA, and PCoA. All germplasms were clearly divided into three main clusters, with consistent clustering outcomes across the three methods. These results indicate that the genetic relationships of sponge gourd germplasms are closely associated with their geographic origins and ecological distributions, with frequent gene flow occurring between regions (Garzón-Martínez et al., 2015); this pattern is particularly evident in Indian sponge gourd germplasms. In India, the center of origin of sponge gourd (Chandra, 1995), the crop has undergone long-term natural evolution and domestication, accumulating and retaining abundant genetic variation while avoiding the genetic bottlenecks caused by interregional dissemination. It is worth noting that similar genetic complexity related to geographic origin and gene flow has also been reported in cucumber, a fellow member of the Cucurbitaceae family (Iftikhar et al., 2024). Previous studies have shown that Indian cucumber genotypes are mostly clustered into a single group with a relatively narrow genetic base (Dar et al., 2017). In sharp contrast, the Indian sponge gourd germplasms in this study are stably distributed across two clusters and form complex subclade structures, reflecting a higher level of genetic diversity.
Conclusions
This study is the first to comprehensively identify SSR loci based on the high-quality Luffa cylindrica P93075 genome, providing a novel set of molecular markers for sponge gourd genetic research. Through rigorous screening and experimental validation, we obtained a set of polymorphic SSR markers with high stability and reliability, which proved effective in evaluating genetic diversity. Population genetic analysis of 67 globally sourced sponge gourd germplasms revealed rich genetic variation and a clear population structure, with clustering patterns closely tied to geographic origins, particularly highlighting India as a key diversity center for this crop. These newly developed SSR markers and insights into genetic diversity will significantly advance sponge gourd molecular breeding, germplasm utilization, and genetic research.
Supplemental Information
Supplemental Information 1: Physical map of 308 pairs of SSR primers across 13 chromosomes in Luffa cylindrica. DOI: 10.7717/peerj.20934/supp-1
Supplemental Information 2: Number of accessions and dominant ecological backgrounds of sponge gourd (Luffa cylindrica) germplasm resources by country. DOI: 10.7717/peerj.20934/supp-2
Supplemental Information 3: Results of SSR locus identification in the whole genome of sponge gourd P93075. DOI: 10.7717/peerj.20934/supp-3
Supplemental Information 4: First round response to reviewers in table format. DOI: 10.7717/peerj.20934/supp-4
Supplemental Information 5: Distribution of nucleotide repeat types of SSR loci on different chromosomes of sponge gourd P93075. DOI: 10.7717/peerj.20934/supp-5
Supplemental Information 6: Genome-wide SSR primers developed from the sponge gourd genome P93075. DOI: 10.7717/peerj.20934/supp-6
Supplemental Information 7: The information of 15 polymorphic SSR molecular markers. DOI: 10.7717/peerj.20934/supp-7
Supplemental Information 8: Pairwise Nei’s genetic distance matrix of 67 sponge gourd germplasms. DOI: 10.7717/peerj.20934/supp-8
Supplemental Information 9: Uncropped image of Fig. 2. DOI: 10.7717/peerj.20934/supp-9
Supplemental Information 10: Original unprocessed images of Fig. 2. DOI: 10.7717/peerj.20934/supp-10
Supplemental Information 11: Raw polyacrylamide gel electrophoresis images. DOI: 10.7717/peerj.20934/supp-11
References
- 1. Ahmed HGM-D, Rizwan M, Naeem M, Khan MA, Baloch FS, Sun S, Chung G. 2022. Molecular characterization and validation of sunflower (Helianthus annuus L.) hybrids through SSR markers. PLOS ONE 17(5):e0267383. DOI 10.1371/journal.pone.0267383. PMID 35588423. PMC9119457.
- 2. Alhariri A, Behera TK, Jat GS, Devi MB, Boopalakrishnan G, Hemeda NF, Teleb AA, Ismail E, Elkordy A. 2021. Analysis of genetic diversity and population structure in bitter gourd (Momordica charantia L.) using morphological and SSR markers. Plants 10(9):1860. DOI 10.3390/plants10091860. PMID 34579393. PMC8466607.
- 3. Amom T, Tikendra L, Apana N, Goutam M, Sonia P, Koijam AS, Potshangbam AM, Rahaman H, Nongdam P. 2020. Efficiency of RAPD, ISSR, iPBS, SCoT and phytochemical markers in the genetic relationship study of five native and economical important bamboos of North-East India. Phytochemistry 174:112330. DOI 10.1016/j.phytochem.2020.112330. PMID 32146386.
- 4. An J, Yin M, Zhang Q, Gong D, Jia X, Guan Y, Hu J. 2017. Genome survey sequencing of Luffa cylindrica L. and microsatellite high resolution melting (SSR-HRM) analysis for genetic relationship of Luffa genotypes. International Journal of Molecular Sciences 18(9):1942. DOI 10.3390/ijms18091942. PMID 28891982. PMC5618591.
- 5. Anastopoulos I, Pashalidis I. 2020. Environmental applications of Luffa cylindrica-based adsorbents. Journal of Molecular Liquids 319:114127. DOI 10.1016/j.molliq.2020.114127.
- 6. Beier S, Thiel T, Münch T, Scholz U, Mascher M. 2017. MISA-web: a web server for microsatellite prediction. Bioinformatics 33(16):2583-2585. DOI 10.1093/bioinformatics/btx198. PMID 28398459. PMC5870701.
- 7. Bidyananda N, Jamir I, Nowakowska K, Varte V, Vendrame WA, Devi RS, Nongdam P. 2024. Plant genetic diversity studies: insights from DNA marker analyses. International Journal of Plant Biology 15(3):607-640. DOI 10.3390/ijpb15030046.
- 8. Cavagnaro PF, Senalik DA, Yang L, Simon PW, Harkins TT, Kodira CD, Huang S, Weng Y. 2010. Genome-wide characterization of simple sequence repeats in cucumber (Cucumis sativus L.). BMC Genomics 11(1):569. DOI 10.1186/1471-2164-11-569. PMID 20950470. PMC3091718.
