Genetic Diversity and Population Structure of Hainan Indigenous Pig Breeds Revealed by Microsatellite and mtDNA D-Loop Analysis
Yushan Cui, Maosong Wu, Xiaolei Ding, Jiayu Yan, Jing Chen, Shidao Zhao, Lifan Zhang, Wei Wei, Jie Chen

TL;DR
This study reveals the genetic diversity and population structure of Hainan indigenous pigs using microsatellite and mtDNA analysis to guide their conservation and breeding.
Contribution
The study provides new insights into the genetic uniqueness of Duntou and Wuzhishan pigs and highlights the need for conservation of Lingao pigs.
Findings
Duntou and Wuzhishan pigs exhibit the highest genetic diversity and unique genetic information.
Lingao pigs show no genetic variation and require focused conservation efforts.
Hainan pigs form a distinct subclade closely related to Luchuan pigs, indicating their evolutionary significance.
Abstract
Local pig breeds in Hainan are unique to China’s tropical region, but their distinct genetic characteristics have not been fully clarified, potentially affecting their protection and proper use. This study aimed to determine the genetic traits and differences among seven local pig breeds in Hainan. We used two common biological research methods to analyze the genes of 147 pigs. The results show that these local pigs have rich genetic diversity, but the genetic characteristics of different breeds are quite different—among them, Duntou and Wuzhishan pigs have the most unique genetic information, while Lingao pigs require more effort to maintain their genetic diversity. This study can help relevant departments formulate more scientific protection plans for Hainan local pigs, thereby ensuring that these unique pig breeds are not lost, while also providing a basis for breeding pigs with…
Funding
- the Guidance Foundation, the Sanya Institute of Nanjing Agricultural University
- the National Natural Science Foundation of China
Taxonomy
Topics: Genetic and phenotypic traits in livestock · Genetic Mapping and Diversity in Plants and Animals · Genetic diversity and population structure
1. Introduction
Genetic diversity, the material basis of species and ecosystem diversity, is primarily reflected in molecular-level genetic variations among individuals. These variations are driven by differences in chromosomal DNA nucleotide sequence [1]. Therefore, the essence of studying genetic diversity lies in deciphering these DNA nucleotide sequence differences among different species. Species with high genetic diversity, characterized by abundant genetic variation, exhibit greater adaptability to new environments, thereby facilitating population expansion and extension of their distribution ranges [2].
Hainan indigenous pigs, the dominant porcine germplasm on Hainan Island, feature desirable traits such as low adiposity, tender meat, and abundant flavor-associated fatty acids [3]. However, their poor production performance (slow growth, low feed efficiency, and suboptimal input–output ratios) reduces farmers’ breeding willingness [3]. Extensive crossbreeding with commercial lines, outdated breeding practices, and inadequate conservation/protection measures have depleted purebred populations and hindered large-scale rearing, undermining their market competitiveness [3].
Microsatellites, also known as simple sequence repeats (SSRs), are widely distributed in eukaryotic genomes. Each microsatellite sequence is composed of repetitive units, typically 1–6 nucleotides in length, and is repeated 15–65 times across the genome [4]. Owing to their codominant inheritance and high polymorphism, microsatellites have been extensively used to analyze the genetic profile of various species, particularly in evaluating genetic diversity and genetic uniqueness among pig breeds [5,6].
Mitochondrial DNA (mtDNA) is a widely used molecular marker in genetic studies and is characterized by rapid evolution, maternal inheritance, and a simple molecular structure [7]. Within the mtDNA genome, the D-loop region, a non-coding region located between the tRNA-Pro and tRNA-Phe genes, exhibits an evolutionary rate 5–10 times faster than that of other mtDNA regions. This high evolutionary rate renders it particularly suitable for genetic diversity analysis and for studying the phylogenetics of pigs [8,9].
Previous national-scale studies have confirmed the reliability of FAO/ISAG-recommended microsatellite markers for assessing genetic diversity in Chinese indigenous pigs. For instance, a study of 56 Chinese native pig breeds using 27 microsatellite loci showed that these markers could effectively distinguish geographical types (e.g., the North China type or the South China type), with polymorphism information content (PIC) values ranging from 0.39 to 0.86 and mean heterozygosity varying from 0.44 to 0.87 [10]. That work provided a robust and validated methodology for studying porcine genetics.
In this study, 17 microsatellite markers and mtDNA D-loop sequencing were combined to systematically analyze the nuclear genetic structure and maternal lineage of 5 Hainan indigenous pig breeds. This combined approach has been validated in autochthonous pig breed research [11], and it addresses two key gaps in current studies on Hainan indigenous pigs: the lack of integrated nuclear and mitochondrial marker analysis, and the insufficient genetic evaluation data to support effective germplasm conservation. Notably, a prior study on these pigs by Yu et al. [12] used only the mtDNA D-loop region as a single marker, which could not comprehensively reveal their nuclear genetic diversity. This work thus complements the genetic information on Hainan local pigs and provides a more holistic molecular basis for their germplasm conservation and utilization.
2. Materials and Methods
2.1. Sample Collection and DNA Extraction
Ear tissue samples were collected from 5 Hainan indigenous pig breeds: Duntou (DT, 74 initial samples: 48 from DT-DZ, 16 from DT-SJ, and 10 from DT-SG, the 3 subpopulations being classified based on their distinct geographical sampling regions across Hainan Island), Wuzhishan (WZS, n = 17), Lingao (LG, n = 6), Wenchang (WC, n = 22), and Tunchang (TC, n = 28), totaling 147 initial samples (prior to quality screening).
Genomic DNA for each pig was extracted from ear tissue samples weighing approximately 100 mg via the SDS-Proteinase K lysis method combined with phenol–chloroform extraction. Briefly, tissues were lysed in Tris-EDTA-NaCl-SDS buffer (pH 8.0) with 30–50 μL of Proteinase K at 55 °C for 14–20 h. After cooling, sequential extractions were performed using Tris-saturated phenol, a phenol/chloroform/isoamyl alcohol mixture (25:24:1), and a chloroform/isoamyl alcohol mixture (24:1). DNA was precipitated with two volumes of pre-cooled (−20 °C) absolute ethanol, washed with 75% ethanol, air-dried, and dissolved in preheated (65 °C) sterile ultrapure water.
DNA purity and concentration were quantified using a NanoDrop 2000C UV-Vis Spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). Acceptable criteria were defined as OD260/OD280 = 1.8–2.0, OD260/OD230 ≥ 2.0, and DNA concentration ≥ 50 ng/μL; samples failing to meet these criteria were excluded. Due to poor quality, a total of 43 samples were excluded, leaving 104 qualified samples for mtDNA D-loop analysis. The distribution of valid samples by breed/subpopulation after screening is as follows: Duntou pigs (DT, 30 samples: 10 from DT-DZ, 11 from DT-SJ, and 9 from DT-SG), Wuzhishan pigs (WZS, 17 samples), Wenchang pigs (WC, 22 samples), Lingao pigs (LG, 6 samples), and Tunchang pigs (TC, 29 samples). Detailed information on the distribution of the valid samples for mtDNA D-loop analysis is shown in Table S1 (see Supplementary Materials).
2.2. Primer Selection and PCR Amplification
Genotyping was performed for all samples using 17 microsatellite loci (S0155, S0226, SW240, S0097, S0005, IGF1, SW2406, SW632, SW911, SW830, S0143, S0090, S0068, SW857, S0355, S0026, and YLI04) (Table 1), which were selected to ensure maximal chromosome coverage. Among these loci, 16 primer pairs were screened from the microsatellite markers recommended by the Food and Agriculture Organization (FAO) (https://www.fao.org/3/i2413e/i2413e00.pdf) (accessed on 10 February 2026) [13], and one primer pair located on the Y chromosome was taken from Iacolina et al. (2016) [14], who validated Y-chromosome short tandem repeats in Sus scrofa and confirmed the suitability of YLI04 for porcine genetic structure analysis. The selected loci are widely validated for Sus scrofa genetic analysis: Costa et al. (2012) confirmed high polymorphism (mean PIC = 0.72) of S0155/SW240 in European wild boar [15], and Zhao et al. (2018) applied S0155/SW240 for individual identification in Chinese pigs [16], supporting the reliability of these loci for the subsequent genetic diversity and structure analyses in this study.
We performed PCR amplification in a 25 μL reaction mixture, which consisted of 12.5 μL of 2× Accurate Taq Master Mix DNA polymerase, 1 μL of the DNA template, 0.5 μL each of the forward and reverse primers, and 10.5 μL of RNase-free water. The PCR cycling conditions were set as follows: pre-denaturation at 94 °C for 30 s; 35 cycles of denaturation at 98 °C for 30 s, annealing at the locus-specific temperature for 30 s, and extension at 72 °C for 1 min; and a final extension at 72 °C for 2 min, then holding at 4 °C indefinitely.
2.3. Polyacrylamide Gel Electrophoresis (PAGE)
A 12% polyacrylamide gel was prepared with 16 mL of 30% acrylamide/bis-acrylamide (29:1, Sangon Biotech, Shanghai, China; Cat. No. P8886), 4 mL of 10× TBE buffer (Solarbio, Beijing, China; Cat. No. T1070), 20 mL of double-distilled water (ddH2O), 0.2 mL of 10% ammonium persulfate (Sigma-Aldrich, St. Louis, MO, USA; Cat. No. A3678), and 100 μL of tetramethylethylenediamine (TEMED, Sigma-Aldrich, St. Louis, MO, USA; Cat. No. T9281). The gel was polymerized at 25 ± 1 °C for 1 h, then mounted in a vertical electrophoresis system (Beijing Liuyi Instrument, Beijing, China; Model DYCZ-24EN) with 1× TBE buffer in both reservoirs.
We loaded 1 μL of PCR product per sample and conducted electrophoresis at a constant voltage of 250 V for 7 h; an ice bath maintained the tank temperature at 5–15 °C to prevent gel overheating. After electrophoresis, DNA bands were visualized via silver staining: gels were shaken (60 rpm) in 500 mL of 0.2% (w/v) silver nitrate (Sinopharm Chemical Reagent, Shanghai, China; Cat. No. 10022818) at 25 °C for 10 min, rinsed with ddH2O for 15 s, and developed in 500 mL of fresh 1.5% (w/v) NaOH (containing 0.4% v/v formaldehyde, Sinopharm Chemical Reagent, Shanghai, China; Cat. No. 10004118) until bands were clear. Development was terminated by rinsing three times with ddH2O.
A 20 bp DNA ladder (Thermo Fisher Scientific, Waltham, MA, USA; Cat. No. 10488085) was used for molecular weight calibration. Gel images were analyzed via the Tanon 2500 Automatic Digital Gel Image Analysis System (Tanon Science & Technology, Shanghai, China) for genotyping.
2.4. Microsatellite Data Analysis
All statistical tests in this section adopted a significance level of p < 0.05 unless otherwise specified. GenAlEx 9 was used to calculate genetic diversity parameters: the number of alleles (Na), the effective number of alleles (Ne), observed heterozygosity (Ho), and expected heterozygosity (He). Cervus 3.0 was used to compute the polymorphism information content (PIC) and to perform Hardy–Weinberg equilibrium (HWE) tests with Bonferroni correction. These microsatellite-derived parameters are reliable for assessing the genetic variation in Sus scrofa, consistent with studies on European wild boars showing a high correlation (r = 0.89) between microsatellite data (PIC: 0.58–0.83; H: 0.61–0.85) and genome-wide SNP data [15].
For each microsatellite locus, the polymorphism information content (PIC) and Sus scrofa chromosome (SSC) location were recorded (Table 1), with PIC > 0.6 defined as high polymorphism.
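The per-locus parameters named above can be illustrated with a minimal Python sketch. It is not the GenAlEx or Cervus implementation: it assumes diploid genotypes supplied as allele pairs, uses the uncorrected expected heterozygosity (1 minus the sum of squared allele frequencies), and computes PIC with the standard Botstein formula. The function name and the toy genotypes are hypothetical.

```python
from collections import Counter
from itertools import combinations


def locus_diversity(genotypes):
    """Summarise one diploid locus.

    genotypes: list of (allele_a, allele_b) tuples, one per individual
    (hypothetical input format chosen for this sketch).
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")
    allele_counts = Counter(a for pair in genotypes for a in pair)
    freqs = [c / (2 * n) for c in allele_counts.values()]
    homozygosity = sum(p * p for p in freqs)

    na = len(freqs)                      # number of alleles
    ne = 1.0 / homozygosity              # effective number of alleles
    ho = sum(1 for a, b in genotypes if a != b) / n  # observed heterozygosity
    he = 1.0 - homozygosity              # expected heterozygosity (uncorrected)
    # PIC = 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2
    pic = he - sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return {"Na": na, "Ne": ne, "Ho": ho, "He": he, "PIC": pic}


if __name__ == "__main__":
    toy = [(1, 2), (1, 2), (1, 1), (2, 2)]  # made-up genotypes
    print(locus_diversity(toy))
```

With the PIC > 0.6 threshold defined above, a locus would be labelled highly polymorphic when the returned "PIC" value exceeds 0.6.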
PopGene 32 (v1.32) was used to calculate Nei’s unbiased genetic distance and gene flow (Nm). MEGA X was employed to construct neighbor-joining (NJ) and Unweighted Pair Group Method with Arithmetic Mean (UPGMA) phylogenetic trees. Arlequin 3.5.2.2 was used to perform an Analysis of Molecular Variance (AMOVA) [17] to quantify inter- and intra-population variation, and genetic differentiation coefficients (Fst) were calculated with Benjamini–Yekutieli false discovery rate (BY-FDR) correction, a robust method for controlling false positives in multiple comparisons that is suitable for complex population genetic data.
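As an illustration of the BY-FDR correction mentioned above, the following Python sketch computes Benjamini–Yekutieli adjusted p-values. It is a generic textbook version rather than the routine used by any of the cited programs, and the function name and example p-values are hypothetical.

```python
def by_adjust(pvalues):
    """Return Benjamini-Yekutieli adjusted p-values in the input order."""
    m = len(pvalues)
    if m == 0:
        return []
    # Harmonic correction factor c(m) = 1 + 1/2 + ... + 1/m
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so monotonicity is enforced.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = pvalues[idx] * m * c_m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    raw = [0.01, 0.04, 0.03]  # made-up p-values
    print(by_adjust(raw))
```

A comparison would then be called significant when its adjusted value falls below the chosen level (p < 0.05 in this section).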
Genetic structure was evaluated via principal coordinate analysis (PCoA) (GenAlEx 9, based on Euclidean distance and Fst) and STRUCTURE clustering (STRUCTURE 2.3). STRUCTURE adopted a Bayesian model with the Markov Chain Monte Carlo (MCMC) method. The settings were as follows: 100,000 burn-in iterations and 100,000 sampling iterations, with K values set from 1 to 10 (10 replicates each). The optimal K value was determined using STRUCTURE HARVESTER with the ΔK algorithm (Figure S1 and Table S2).
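The ΔK criterion can be sketched in a few lines of Python. This is an illustrative reimplementation, not STRUCTURE HARVESTER itself: it assumes ΔK is the absolute second-order difference of the mean ln P(D) across replicates divided by the standard deviation of ln P(D) at that K, and the input dictionary and its values are hypothetical.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute delta-K for every K that has both neighbours K-1 and K+1.

    lnp_by_k: dict mapping K -> list of replicate ln P(D) values
    (hypothetical input format for this sketch).
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # the first and last K have no second-order difference
        if len(lnp_by_k[k]) < 2:
            continue  # a standard deviation needs at least two replicates
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # undefined when replicates are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / sd
    return result


if __name__ == "__main__":
    toy = {1: [-100.0, -102.0], 2: [-50.0, -52.0], 3: [-48.0, -50.0]}
    print(delta_k(toy))
```

The K with the largest ΔK is then taken as the most likely number of clusters; with K tested from 1 to 10, ΔK is defined only for K = 2 to 9.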
2.5. Construction and Quality Control of the mtDNA D-Loop Dataset
We retrieved the pig mtDNA D-loop reference sequence from GenBank (accession number: NC_000845.1, Sus scrofa domestic pig) for sequence alignment and variant calling. Qualified genomic DNA (OD260/OD280 = 1.8–2.0, OD260/OD230 > 2.0) from 104 individuals was sent to Beijing Berry Genomics Co., Ltd. (Beijing, China) for library construction and sequencing. DNA was enzymatically fragmented into 350 bp segments before end repair, A-tailing, and adapter ligation. Libraries were quality-validated (insert size approximately 350 bp) and PCR-amplified.
Sequencing was conducted on an Illumina platform with paired-end reads (average depth 11×). This coverage is sufficient for mtDNA D-loop region analysis, as the region is non-coding and has relatively low sequence variation compared to nuclear genomes. However, the moderate sequencing depth is a minor limitation of this study: it may restrict the detection of rare mitochondrial variants or low-frequency heteroplasmic sites, which could offer additional insights into the maternal lineage dynamics of Hainan indigenous pigs. Raw reads were filtered with Trimmomatic to remove adapter-containing reads, reads with >10% N bases, and low-quality reads, then aligned to the reference pig mitochondrial genome (GenBank: NC_000845.1, Sus scrofa domestic pig) with BWA. SNPs/indels were called with GATK, annotated using ANNOVAR v20210601, and filtered by removing variants that met any of the following criteria: QD < 2.0, FS > 200.0, SOR > 10.0, MQRankSum < −12.5, or ReadPosRankSum < −8.0. High-quality SNPs (genotype missing rate < 0.1; minor allele frequency > 0.1) were retained using PLINK.
mtDNA D-loop SNPs were further filtered: (1) excluding reads with >5% N bases, (2) removing monomorphic loci, and (3) eliminating motif loci in overlapping regions to avoid false positives.
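The variant-level and site-level thresholds listed in this section can be expressed as a small Python predicate. This is a sketch of the filtering logic only, not a replacement for GATK or PLINK: it assumes a variant is discarded when any single annotation crosses its cutoff and that a missing annotation does not trigger a filter. The dictionary layout and function names are hypothetical.

```python
# Cutoffs as stated in the text; a variant failing ANY of them is removed.
HARD_FILTERS = {
    "QD": ("<", 2.0),
    "FS": (">", 200.0),
    "SOR": (">", 10.0),
    "MQRankSum": ("<", -12.5),
    "ReadPosRankSum": ("<", -8.0),
}


def fails_hard_filter(annotations):
    """Return True if any annotation crosses its cutoff.

    annotations: dict of annotation name -> float (hypothetical format).
    Missing annotations are skipped, which is an assumption of this sketch.
    """
    for name, (direction, cutoff) in HARD_FILTERS.items():
        value = annotations.get(name)
        if value is None:
            continue
        if direction == "<" and value < cutoff:
            return True
        if direction == ">" and value > cutoff:
            return True
    return False


def passes_site_filter(missing_rate, maf):
    """Keep SNPs with missing rate < 0.1 and minor allele frequency > 0.1."""
    return missing_rate < 0.1 and maf > 0.1


if __name__ == "__main__":
    variant = {"QD": 25.0, "FS": 3.0, "SOR": 1.2}  # made-up annotation values
    print(fails_hard_filter(variant), passes_site_filter(0.05, 0.2))
```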
2.6. mtDNA D-Loop Region Data Analysis
Target sequences (screened based on SNP criteria) were verified as pig D-loop region sequences via online BLAST alignment, followed by statistical determination of nucleotide compositions, transitions, and transversions using the MEGA X software. The genetic diversity indices (the number of segregating sites [S], haplotype diversity [Hd], nucleotide diversity [π], the number of haplotypes [h], and average nucleotide differences [K]) of each Hainan local pig population’s D-loop sequences were calculated using the DnaSP6 software [18].
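For readers unfamiliar with these indices, the Python sketch below shows how haplotype diversity (Hd), the average number of nucleotide differences (K), and nucleotide diversity (π) relate to one another. It is not the DnaSP implementation: it assumes Nei's sample-size-corrected Hd, aligned sequences of equal length with no gaps, and π defined as K divided by the alignment length. The function names and toy sequences are hypothetical.

```python
from itertools import combinations


def haplotype_diversity(counts):
    """Nei's haplotype diversity from per-haplotype individual counts."""
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two sequences are required")
    return n / (n - 1) * (1.0 - sum((c / n) ** 2 for c in counts))


def pairwise_stats(sequences):
    """Return (K, pi) for a list of aligned, equal-length sequences."""
    if len(sequences) < 2:
        raise ValueError("at least two sequences are required")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    diffs = [
        sum(1 for a, b in zip(s, t) if a != b)
        for s, t in combinations(sequences, 2)
    ]
    k = sum(diffs) / len(diffs)   # average pairwise differences
    return k, k / length          # pi = K per site


if __name__ == "__main__":
    toy = ["ACGT", "ACGT", "ACGA", "TCGA"]  # made-up alignment
    print(haplotype_diversity([2, 1, 1]), pairwise_stats(toy))
```

Under these definitions, a population in which every individual carries the same haplotype gives Hd = 0, K = 0, and π = 0.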
Combined with GenBank homologous sequences, 17 outgroups were introduced, including Chinese local breeds (Luchuan: KP126954, Bama: EF590178, Meishan: AY230827, Rongchang: KM044239, and Tibetan pigs: EF590189) and Chinese wild boars (Wild Boar1: EF545585 and Wild Boar2: EF545586), as well as foreign breeds (Yorkshire: AY342481 and Berkshire: AY578045), with their corresponding GenBank accession numbers provided for reference. The neighbor-joining (NJ) phylogenetic tree of the Hainan pig D-loop haplotypes and outgroups was constructed using MEGA X (Kimura 2-parameter model) with 1000 bootstrap replicates. To partition the genetic variance in Hainan local pigs, we applied the Analysis of Molecular Variance (AMOVA), which is inferred from metric distances among DNA haplotypes and has been validated as robust for quantifying mtDNA variation [17].
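The Kimura 2-parameter model used for the NJ tree can be illustrated with a short Python sketch of the pairwise distance. This is the standard closed-form K2P distance, not MEGA X's code; it assumes pre-aligned sequences and simply skips any site that is not A, C, G, or T in both sequences. The function name and example sequences are hypothetical.

```python
import math

PURINES = frozenset("AG")
BASES = frozenset("ACGT")


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in BASES or b not in BASES:
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        # Same chemical class (purine/purine or pyrimidine/pyrimidine)
        # means a transition; otherwise it is a transversion.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    term1 = 1.0 - 2.0 * p - q
    term2 = 1.0 - 2.0 * q
    if term1 <= 0 or term2 <= 0:
        raise ValueError("sequences too divergent for the K2P formula")
    return -0.5 * math.log(term1) - 0.25 * math.log(term2)


if __name__ == "__main__":
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))  # one transition in 10 sites
```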
2.7. Haplotype Network Construction and Visualization Using RStudio
Haplotype network analysis was performed in R 4.0.3 (RStudio) with filtered mtDNA D-loop data, following the workflow described by Toparslan et al. [19]: (1) Core packages (pegas v1.3, ape v5.3, RColorBrewer v1.1–3) were loaded for haplotype analysis, phylogenetic data processing, and color scheme design. (2) The input data comprised the 12 haplotype sequences and the haplotype count matrix of the 7 pig populations, which are detailed in Section 3.5; the sequences were split using the strsplit() function and converted to DNAbin format via ape::as.DNAbin(). (3) Haplotypes were deduplicated with pegas::haplotype(), and a statistical parsimony-based network was constructed using pegas::haploNet() (node sizes were determined by total haplotype frequencies calculated via rowSums(pie_mat); branch dashes represent nucleotide mutations). (4) A haplotype–pig population distribution matrix (pie_mat) was established, and grouped color-coded pie-chart networks were generated using pegas::plot.haploNet(), with legend() used to label the population–color correspondence.
3. Results
3.1. Genetic Diversity of Microsatellite Loci
To evaluate population-level genetic diversity, the genetic parameters of seven Hainan local pig populations were calculated (Table 2). Among all populations, the Wuzhishan pig (WZS) exhibited the highest genetic diversity, characterized by the highest values for the effective number of alleles (Ne = 3.382), the Shannon diversity index (I = 1.279), observed heterozygosity (Ho = 0.768), and expected heterozygosity (Hexp = 0.666). For the three Duntou pig subgroups, DT-SJ showed slightly higher genetic diversity than DT-DZ and DT-SG, as indicated by its higher Ne (3.274 vs. 3.188 in DT-DZ and 3.095 in DT-SG), I (1.276 vs. 1.219 in both DT-DZ and DT-SG), and Hexp (0.662 vs. 0.632 in DT-DZ and 0.652 in DT-SG). In contrast, LG had the lowest genetic diversity across all measured parameters, with the smallest Na (3.294), Ne (2.523), I (0.990), Ho (0.636), and Hexp (0.578). The overall mean values of the genetic parameters across all the 7 populations were Na = 4.437, Ne = 3.126, I = 1.205, Ho = 0.736, and Hexp = 0.639, indicating moderate to high genetic diversity in Hainan local pig populations.
Genetic diversity analysis of the 17 microsatellite loci revealed that 15 loci (all except S0097 and YLI04) exhibited high polymorphism (Table 3). For these 15 loci, the polymorphism information content (PIC) ranged from 0.607 (SW240) to 0.832 (SW857), with a mean PIC of 0.6814. All of these values exceeded 0.6, the threshold for high polymorphism defined in Section 2.4. In contrast, the remaining two loci (S0097 and YLI04) showed moderate polymorphism, with PIC values of 0.484 and 0.459, respectively. Hardy–Weinberg equilibrium (HWE) testing across all 17 loci indicated that 14 loci (the majority, including S0155, SW240, and SW911) significantly deviated from HWE after Benjamini–Yekutieli false discovery rate (BY-FDR) correction (p < 0.01 for **; p < 0.001 for ***), while only 3 loci (IGF1, SW632, and S0026) were in HWE (p > 0.05). The mean observed heterozygosity (Ho = 0.736) across all 17 loci was higher than the mean expected heterozygosity (He = 0.639), suggesting a lack of inbreeding and potential outcrossing in the studied populations.
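The link between heterozygosity and inbreeding in the last sentence can be made concrete with a rough back-of-the-envelope calculation. The Python sketch below uses the common approximation F = 1 − Ho/He applied to the reported mean values; this is an illustration under that assumption, not the per-locus F-statistics computed by the software used in the study, and the function name is hypothetical.

```python
def fixation_from_heterozygosity(ho, he):
    """Approximate inbreeding coefficient F = 1 - Ho/He.

    Negative values indicate an excess of heterozygotes relative to
    Hardy-Weinberg expectations; positive values indicate a deficit.
    """
    if he <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - ho / he


if __name__ == "__main__":
    # Mean values reported in the text: Ho = 0.736, He = 0.639
    print(round(fixation_from_heterozygosity(0.736, 0.639), 3))
```

With the reported means this approximation gives a negative value of about −0.15, which matches the direction of the heterozygote excess described above.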
3.2. Genetic Variation Analysis of Seven Distinct Populations
To investigate genetic variation among Hainan local pig breeds, genetic differentiation indices (Fst) and Nei’s unbiased genetic distances were calculated for seven distinct populations using 17 microsatellite loci. All pairwise genetic differentiations (Fst) among the 7 populations reached a significant level, with Nei’s unbiased genetic distances ranging from 0.1027 to 0.5269 (Table 4). These Fst values indicate low to moderate genetic differentiation among the studied populations under the following classification: Fst < 0.05 reflects slight genetic differentiation, 0.05 ≤ Fst ≤ 0.15 represents moderate differentiation, and Fst > 0.15 indicates high genetic differentiation between populations. In Table 4, the pairwise fixation indices (Fst) are displayed above the diagonal, and Nei’s unbiased genetic distances are displayed below the diagonal.
To assess the level of genetic differentiation, F-statistics were employed, including the inbreeding coefficient (Fis), the fixation index (Fit), and the overall fixation index (Fst). For the 17 loci examined, the Fit values ranged from −0.0344 (locus SW632) to 0.4949 (locus S0097). With respect to Fst values, locus S0068 exhibited the highest value at 0.2519, while locus SW2406 displayed the lowest value at 0.0571 (Table 5).
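The Fst classification used in this section (slight below 0.05, moderate from 0.05 to 0.15, high above 0.15) can be written as a small helper. The Python sketch below applies it to the two per-locus extremes reported in Table 5; the function name is hypothetical.

```python
def classify_fst(fst):
    """Label an Fst value using the thresholds stated in the text."""
    if fst < 0.05:
        return "slight"
    if fst <= 0.15:
        return "moderate"
    return "high"


if __name__ == "__main__":
    # Per-locus extremes reported in the text
    for locus, value in (("S0068", 0.2519), ("SW2406", 0.0571)):
        print(locus, value, classify_fst(value))
```

Under these thresholds the highest per-locus value (S0068) falls in the high class and the lowest (SW2406) in the moderate class.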
3.3. Structure Analysis
A UPGMA dendrogram was constructed based on Nei’s unbiased genetic distances (Figure 1a). The results revealed that the seven Hainan local pig groups clustered into two genetic clusters. In the first clade, the DT-DZ and DT-SJ populations of Duntou pigs first grouped together, then this subcluster merged with the DT-SG population. In the second clade, the Wuzhishan pig population and the Wenchang pig population grouped together initially; this subcluster then joined with the subcluster composed of the Lingao pig and Tunchang pig populations.
Notably, this clustering pattern is consistent with a microsatellite-based phylogenetic study on 32 Chinese indigenous pig breeds, which reported that Hainan local pigs (including Lingao and Wuzhishan pigs) clustered into a subclade within South China pig breeds, with the closest genetic relationship to Luchuan pigs from Guangxi (Fst = 0.072). Such cross-regional comparative evidence confirms that the genetic clustering of Hainan local pigs observed in this study is not accidental but rather reflects the common phylogenetic characteristics of indigenous pig breeds in South China [20].
Moreover, the principal coordinate analysis (PCoA) results showed that axis 1 divided the seven Hainan local pig populations into two genetic clusters: one cluster comprising the DT-DZ, DT-SJ, and DT-SG groups, and the other including the Wuzhishan, Wenchang, Lingao, and Tunchang pigs (Figure 1b). This clustering pattern was consistent with the UPGMA clustering results.
The AMOVA results showed that the majority of genetic variation occurred within populations. Similarly, for the two genetic clusters identified, the primary genetic variation was also found to reside within the populations of each cluster (Table 6).
The clustering analysis results from STRUCTURE 2.3.4 (Figure 1c) indicated that the 3 subpopulations of Duntou pigs from different regions formed one genetic lineage, while the other four Hainan local pig breeds constituted another genetic lineage.
3.4. mtDNA D-Loop Region Sequencing Results and Base Composition
To further characterize the maternal genetic diversity of Hainan indigenous pigs following library construction and high-throughput sequencing, we analyzed the polymorphic sites and nucleotide composition of the mtDNA D-loop region for all qualified samples, with the results presented below.
This section verifies the quality of the mtDNA D-loop sequencing data and clarifies its base composition, providing foundational support for subsequent genetic diversity and phylogenetic analyses. We subjected 104 samples to whole-genome sequencing. To analyze mitochondrial genetic variation, the reference sequence of the pig mtDNA D-loop region was retrieved from the GenBank database. The results revealed that 12 polymorphic sites were identified among the 104 samples, and they were located at positions 62 bp, 241 bp, 279 bp, 301 bp, 359 bp, 405 bp, 443 bp, 452 bp, 501 bp, 553 bp, 560 bp, and 1096 bp within the D-loop region.
The full length of the mtDNA D-loop region was 1175 bp. Nucleotide composition analysis revealed that adenine (A) had the highest content, followed by cytosine (C), while guanine (G) had the lowest content. Specifically, the average contents of A, T, C, and G were 33.6%, 25.4%, 26.4%, and 14.6%, respectively, resulting in an A + T content of 59% and a C + G content of 41% (Table S1 in Supplementary File S1). This base composition pattern is consistent with the typical nucleotide proportion of mammalian mitochondrial DNA, which is characterized by a higher A + T content than C + G content.
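As a simple illustration of how such base composition figures are obtained, the Python sketch below counts A, C, G, and T in a sequence and reports percentages. MEGA X was used in the study; this is only a minimal stand-in that ignores gaps and ambiguous characters, with a hypothetical function name and a made-up sequence.

```python
from collections import Counter


def base_composition(sequence):
    """Return the percentage of A, C, G and T among unambiguous bases."""
    counts = Counter(base for base in sequence.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


if __name__ == "__main__":
    comp = base_composition("AACGTTAC")  # made-up sequence
    print(comp, "A+T =", comp["A"] + comp["T"])
```

Applied to the reported averages, A + T = 33.6% + 25.4% = 59.0% and C + G = 26.4% + 14.6% = 41.0%, in agreement with the totals given above.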
The confirmed polymorphic sites and base composition characteristics ensure the validity of downstream genetic parameter calculations and interspecific sequence comparisons.
3.5. Polymorphic Site Analysis of the mtDNA D-Loop Region
Haplotype analysis of the mtDNA D-loop region identified a total of 12 distinct haplotypes across the seven Hainan local pig populations (Table 7). These haplotypes exhibited population-specific distribution patterns. Most populations possessed unique haplotypes, except for WZS and LG. Notably, the LG population was strictly monomorphic, harbouring only Hap_6 (no genetic variation at the mtDNA D-loop region), which is a critical finding that reflects the extremely low maternal genetic diversity of this breed and has potential evolutionary and conservation implications. In contrast, the WZS population, although polymorphic, lacked population-specific haplotypes. All other populations were polymorphic, with multiple haplotypes, including unique ones. Hap_1 was the most widely shared haplotype across most populations, indicating a potential common maternal genetic origin among these groups (Table 8).
Among the 3 subpopulations of Duntou pigs (DT-DZ, DT-SJ, and DT-SG), the DT-DZ group exhibited four haplotypes (Hap_1, Hap_2, Hap_3, and Hap_4), with Hap_1 as the dominant one (five individuals); DT-SJ exhibited five haplotypes (Hap_1, Hap_3, Hap_5, Hap_6, and Hap_7), with Hap_1 also being the most frequent one (six individuals); and DT-SG possessed five haplotypes (Hap_1, Hap_6, Hap_7, Hap_8, and Hap_9), with Hap_7 being the most abundant (five individuals).
For the other 3 populations, the WZS group exhibited six haplotypes (Hap_1, Hap_3, Hap_6, Hap_7, Hap_8, and Hap_10), with Hap_3 being the dominant haplotype (9 individuals); the WC population exhibited four haplotypes (Hap_1, Hap_7, Hap_10, and Hap_12), with Hap_1 showing the highest frequency (19 individuals); and TC exhibited four haplotypes (Hap_1, Hap_7, Hap_10, and Hap_11), with Hap_1 being the most prevalent (26 individuals).
The haplotype network, constructed using the pegas package in RStudio, visualized genealogical relationships among the 12 haplotypes (Figure 2a). Node sizes were proportional to haplotype frequencies (calculated via rowSums(pie_mat)), with branch dashes denoting nucleotide mutations. Hap_1, the dominant haplotype, served as the network’s central node, with Hap_3, Hap_6, and Hap_7 radiating outward—forming a star-like structure characteristic of recent population expansion. Population-specific haplotypes (e.g., Hap_2 in DT-DZ, Hap_5 in DT-SJ, Hap_9 in DT-SG, Hap_11 in TC, Hap_12 in WC) were distributed as peripheral nodes, reflecting unique maternal lineages in these populations. Notably, LG (monomorphic for Hap_6) formed an independent small node, confirming its lack of mitochondrial genetic variation.
Furthermore, sequence variation analysis revealed that all polymorphisms among these haplotypes were transitions (C/T and A/G), that is, base substitutions between two purines or between two pyrimidines. No transversions (base substitutions between a purine and a pyrimidine) were detected. This pattern is consistent with the relatively conservative mutation pattern of the mtDNA D-loop region in mammals. Notably, a study on the porcine mtDNA D-loop region further clarified the genetic basis of these polymorphisms: it identified a tandemly repeated sequence (CGTGC GTACA) in the D-loop region, whose self-complementary property and repeated structure easily induce replication slippage and mispairing. This process generated length heteroplasmy (e.g., 14–29 repeat units in a single individual) and promoted the accumulation of transition mutations [21].
This mechanism helps explain the high haplotype diversity observed in Hainan local pigs (e.g., 12 haplotypes in total), as the repeated sequences in the D-loop region provide a genetic basis for the emergence of polymorphic sites. Overall, these findings, which are based on analyses of the mtDNA D-loop region, reflect a moderate level of genetic diversity for Hainan local pigs.
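The distinction between transitions and transversions in this subsection can be checked mechanically. The Python sketch below classifies a single-base substitution by chemical class: purine to purine (A/G) or pyrimidine to pyrimidine (C/T) is a transition, and any purine to pyrimidine change is a transversion. The function name is hypothetical.

```python
PURINES = frozenset("AG")
BASES = frozenset("ACGT")


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base change."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES:
        raise ValueError("ref and alt must each be one of A, C, G, T")
    if ref == alt:
        raise ValueError("ref and alt are identical; not a substitution")
    # Both purines or both pyrimidines -> transition; mixed -> transversion.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    for ref, alt in (("C", "T"), ("A", "G"), ("A", "C")):
        print(ref, alt, classify_substitution(ref, alt))
```

The C/T and A/G changes reported for the 12 polymorphic sites are both classified as transitions by this rule.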
3.6. Genetic Diversity of the mtDNA D-Loop Region
Genetic diversity indices, including haplotype diversity (Hd), nucleotide diversity (π), and the average number of pairwise differences (K), were calculated for 7 distinct local pig populations in Hainan. The results reveal that these populations exhibit relatively high haplotype diversity but low nucleotide diversity (Table 9).
Among the seven Hainan indigenous pig populations, DT-SG exhibited the highest nucleotide diversity (π = 0.00345) and average pairwise differences (K = 4.056), indicating abundant nucleotide-level variation; in contrast, the LG group showed the lowest diversity (Hd = 0, π = 0, and K = 0), indicating no detectable variation in the mtDNA D-loop region. This low-diversity pattern aligns with that observed in European Mangalica pigs [22], where mtDNA D-loop analysis linked reduced diversity to historical bottlenecks and limited gene flow. This supports the interpretation that LG’s monomorphism likely stems from long-term isolation and founder effects, and it emphasizes the need for targeted conservation.
Overall, Hainan indigenous pigs displayed moderate mtDNA D-loop diversity (total Hd = 0.668, π = 0.00193, and K = 2.27). This is consistent with a global study of 4434 Sus scrofa sequences [23], which found that Asian indigenous pigs (including Chinese breeds) have lower π than non-Asian populations (Southeast Asia being the only high-diversity Asian region), fitting Hainan pigs’ status as a South China representative (π = 0.00193) shaped by continental-scale evolutionary or anthropogenic factors. Additionally, Hainan pigs showed a “high Hd (0.200–0.733) but low π (0.00083–0.00345)” pattern, consistent with Laxmivandana et al.’s work on five Indian indigenous pig breeds [9]: their D-loop sequencing identified 56 unique haplotypes and higher intra- (59.1%) than inter-population (40.9%) variation, suggesting that high mtDNA D-loop Hd is common in indigenous pigs (while π varies regionally with geography and breeding history) and providing cross-continental context for Hainan pigs’ regional genetic specificity.
3.7. Phylogenetics of the mtDNA D-Loop Region
A phylogenetic analysis was conducted based on the introduction history and natural geographical distribution of different local pig breeds in China. The phylogenetic analysis initially suggested a division of the tested pig populations into two major clades, with one predominantly composed of Chinese indigenous breeds and the other expected to contain exotic breeds. However, the exotic Yorkshire breed did not cluster within the putative exotic clade but instead grouped closely with some Chinese native breeds (Figure 2b). Notably, Hainan pig populations formed an independent clade without mixing with other breeds, implying their unique maternal genetic origin and evolutionary trajectory. This local maternal origin is consistent with mtDNA studies on Chinese domestic pigs, which confirmed the upper Yangtze River as a key domestication center [24]. It supports the hypothesis that Hainan pigs likely originated from the domestication of local wild boars in South China, rather than long-distance introduction, with the Qiongzhou Strait further preserving their genetic uniqueness. This unique evolutionary trajectory aligns with the global porcine evolution framework revealed by a large-scale pig genome analysis. The study included 102 whole genomes of Sus scrofa (domestic pigs and wild boars) from 32 countries and identified three major ancestral clades of global pigs: European, East Asian, and Southeast Asian. It further confirmed that Chinese indigenous pigs (including southern breeds like Hainan pigs) belong to the East Asian clade, with a divergence time of approximately 1.2 million years from European clades [25].
4. Discussion
4.1. Genetic Variation
Allelic richness, a core and widely used indicator of genetic variation sensitive to effective population size and evolutionary history (directly reflecting long-term population genetic resilience [26]), was used for microsatellite-based diversity analysis. For the seven Hainan indigenous pig populations, mean alleles (Na = 4.437) and effective alleles (Ne = 3.126) were relatively uniform. Compared with five Guizhou indigenous pig breeds [27], Hainan pigs exhibited moderately higher allelic richness (mean Na = 4.437 vs. 3.000–3.667), and also showed higher genetic diversity than three representative Guangxi indigenous pig breeds in terms of microsatellite heterozygosity and mtDNA D-loop haplotype diversity. A detailed comparison of genetic diversity parameters between Hainan indigenous pigs and other Chinese indigenous pig breeds is summarized in Table S3 (see Supplementary Materials) [27]. All 7 Hainan populations showed higher observed heterozygosity (Ho = 0.636–0.772, mean = 0.736) than expected heterozygosity (He = 0.578–0.666, mean = 0.639)—consistent with findings in 3 Guangxi pig breeds [28], where heterozygosity ranged from 0.7 to 0.8, indicating abundant polymorphism in southern Chinese indigenous pigs. Of the 17 microsatellite loci, only S0097 (PIC = 0.484) and YLI04 (PIC = 0.459) showed moderate polymorphism; the 15 remaining loci had high polymorphism (PIC > 0.6, mean = 0.6814). Collectively, these parameters confirm relatively high genetic variation (total Hd = 0.688, mean Na = 4.437, mean Ho = 0.736) and rich genetic information in all 7 Hainan indigenous pig populations.
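The per-locus parameters discussed here can be derived from allele frequencies. The sketch below is a minimal illustration, assuming diploid genotypes stored as pairs of allele sizes, the uncorrected expected heterozygosity He = 1 − Σp², effective alleles Ne = 1/Σp², and the standard PIC formula PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j². The genotypes and function names are hypothetical, and the study's software may apply sample-size corrections not shown here.

```python
from collections import Counter
from itertools import combinations


def allele_frequencies(genotypes):
    """genotypes: list of (allele_a, allele_b) tuples for one locus."""
    counts = Counter(a for g in genotypes for a in g)
    total = sum(counts.values())
    return {allele: c / total for allele, c in counts.items()}


def locus_summary(genotypes):
    """Return Na, Ne, Ho, He (uncorrected) and PIC for a single locus."""
    p = list(allele_frequencies(genotypes).values())
    sum_p2 = sum(x * x for x in p)
    # Observed heterozygosity: share of individuals with two different alleles.
    ho = sum(a != b for a, b in genotypes) / len(genotypes)
    he = 1 - sum_p2
    ne = 1 / sum_p2
    pic = 1 - sum_p2 - sum(2 * (pi * pj) ** 2 for pi, pj in combinations(p, 2))
    return {"Na": len(p), "Ne": ne, "Ho": ho, "He": he, "PIC": pic}


# Hypothetical genotypes at one microsatellite locus (allele sizes in bp).
example = [(150, 152), (150, 150), (152, 154), (150, 154)]
summary = locus_summary(example)
```

For these four individuals the allele frequencies are 0.5, 0.25, and 0.25, so Ho (0.75) exceeds He (0.625), the same direction of difference reported for the Hainan populations.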
The mtDNA D-loop region, a core marker for maternal lineage tracing, revealed divergent genetic diversity patterns among the seven Hainan local pig populations. These populations exhibited the typical pattern of high haplotype diversity but low nucleotide diversity (total Hd = 0.688, π = 0.00193) [29,30], a genetic signature indicative of historical rapid population expansion that retains novel haplotype mutations but limits the accumulation of nucleotide-level variation. This pattern was most pronounced in Duntou and Wuzhishan pigs, implying these two breeds experienced ancestral expansion events, while Lingao, Wenchang and Tunchang pigs showed low mtDNA diversity (with Lingao pigs exhibiting complete monomorphism) due to long-term small effective population sizes and strict geographical isolation. The sample sizes of all populations (n = 6–29) are sufficient for mtDNA D-loop diversity analysis [18], with the small sample size of LG (n = 6) reflecting its actual population size in the field. This high-Hd low-π pattern is a shared evolutionary feature of southern Chinese indigenous pigs [31], underscoring the strong biogeographic and historical shaping of maternal genetic diversity in this region. Notably, the nuclear–mitochondrial diversity discrepancy in Hainan pigs aligns with East Asian pig studies [32], which reported that domestic pigs retain higher mtDNA haplotype continuity but lower nuclear variation than wild boars (maternal lineage preservation and nuclear gene selection), confirming the typical evolutionary trajectory seen in East Asian domestic pigs. Mutation accumulation partially increased Hd in Hainan pigs, but limited effective population sizes prevented significant π elevation, highlighting a key conservation challenge.
Hardy–Weinberg equilibrium (HWE) testing revealed that 14 of the 17 microsatellite loci deviated significantly from HWE. This widespread pattern in Hainan indigenous pigs can be attributed to four interrelated factors. First, population substructure (the Wahlund effect): sampling three geographically distinct Duntou subpopulations (DT-DZ, DT-SJ, DT-SG) as a single group mixed genetically differentiated lineages, which shifts genotype frequencies away from HWE expectations. Second, null alleles: loci like S0097 and YLI04 may carry null alleles due to primer binding site mutations, so that heterozygotes carrying a null allele are scored as homozygotes and heterozygosity is underestimated at those loci. Third, artificial selection: long-term unplanned breeding for traits such as meat quality and tropical adaptability altered allele frequencies of functional loci linked to microsatellites. Fourth, non-random mating: free mating within small local populations and limited inter-breed gene flow further disrupted genetic equilibrium. These interacting factors shape Hainan pigs’ genetic structure, and their identification provides a scientific basis for optimizing breeding and conservation practices.
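To make the logic of an HWE test concrete, the following sketch runs a chi-square goodness-of-fit test for a locus with exactly two alleles. This is a simplifying assumption: microsatellites are multiallelic and are often tested with exact tests, and the procedure actually used in this study is not described in this section. The function name and toy genotypes are hypothetical.

```python
import math
from collections import Counter


def hwe_chi_square_biallelic(genotypes):
    """Chi-square goodness-of-fit test against HWE for a two-allele locus.

    genotypes: list of (allele, allele) tuples. Returns (chi2, p_value).
    """
    n = len(genotypes)
    alleles = sorted({a for g in genotypes for a in g})
    if len(alleles) != 2:
        raise ValueError("this sketch handles exactly two alleles")
    a, b = alleles
    # Frequency of allele a among the 2n gene copies.
    p = sum(g.count(a) for g in genotypes) / (2 * n)
    q = 1 - p
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    expected = {(a, a): p * p * n, (a, b): 2 * p * q * n, (b, b): q * q * n}
    chi2 = sum((observed.get(k, 0) - e) ** 2 / e for k, e in expected.items())
    # With one degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Toy data: genotype counts exactly at HWE proportions (25 : 50 : 25).
in_hwe = [("A", "A")] * 25 + [("A", "B")] * 50 + [("B", "B")] * 25
# Toy data: every individual heterozygous (strong heterozygote excess).
all_het = [("A", "B")] * 20

chi2_eq, p_eq = hwe_chi_square_biallelic(in_hwe)
chi2_het, p_het = hwe_chi_square_biallelic(all_het)
```

The first toy sample fits HWE exactly, while the all-heterozygote sample gives a large statistic and a very small p-value, the kind of deviation that heterozygote excess produces.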
4.2. Genetic Structure and Phylogenetic Analysis
Gene flow—facilitating inter-population allele transmission via individual or gamete migration—had a mean value of 1.74 across all loci (values >1 indicate high genetic exchange and low differentiation). F-statistic analysis (quantifying genetic differentiation) revealed moderate differentiation between Duntou subpopulations DT-SG and DT-DZ (pairwise 0.05 ≤ Fst ≤ 0.15), while other Duntou pairs showed extremely low differentiation. Notably, LG was significantly differentiated from DT-DZ/DT-SG, and Tunchang pigs were significantly differentiated from all three Duntou subpopulations (0.15 ≤ Fst ≤ 0.25), with moderate differentiation in remaining pairs (likely due to intra-island geographical distribution).
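The relationship between Fst and gene flow can be illustrated with the island-model approximation Nm ≈ (1 − Fst)/(4·Fst). This section does not state which formula the study used, so the approximation is an assumption here, as are the function names. The band boundaries follow those quoted in the text, and the label for values above 0.25 follows Wright's commonly cited guideline.

```python
def nm_from_fst(fst):
    """Island-model estimate of migrants per generation from Fst."""
    if not 0 < fst < 1:
        raise ValueError("Fst must lie strictly between 0 and 1")
    return (1 - fst) / (4 * fst)


def differentiation_band(fst):
    """Qualitative band for a pairwise Fst value."""
    if fst < 0.05:
        return "low"
    if fst < 0.15:
        return "moderate"
    if fst < 0.25:
        return "high"
    return "very high"  # assumed label for Fst >= 0.25


# Fst reported in this paper for the Wuzhishan-Tunchang pair.
fst_wzs_tc = 0.12082
nm_wzs_tc = nm_from_fst(fst_wzs_tc)  # roughly 1.82
band_wzs_tc = differentiation_band(fst_wzs_tc)
```

Under this approximation the Wuzhishan–Tunchang value falls in the moderate band and corresponds to Nm above 1, in line with the interpretation that values greater than 1 indicate appreciable genetic exchange.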
This link between geographical isolation and differentiation is supported by global pig genome studies [25], which identified barriers (e.g., the Qiongzhou Strait separating Hainan from the mainland and intra-island central mountains) as key drivers of Sus scrofa differentiation. These barriers limit gene flow and promote population-specific mutations, explaining the unique genetic structure of Hainan pigs (e.g., an independent clade in the NJ tree).
Analysis of 17 microsatellite loci revealed positive Fis at 5 loci (partial inbreeding) but an overall negative Fis across all seven populations (reflecting high heterozygosity). Non-random mating increased heterozygotes and enhanced genetic diversity [33], with DT-DZ exhibiting the most negative Fis (that is, the greatest heterozygote excess, a conservation opportunity). The UPGMA tree (based on Nei’s unbiased genetic distances) showed the three Duntou subpopulations forming a distinct clade (ancestry traced to Guangdong–Guangxi introductions, with selective breeding preserving small-spotted pig traits), while the other four populations clustered into pairs before merging (evolved from local Hainan black pigs via selective breeding, sharing a common ancestor).
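The sign of Fis follows directly from the heterozygosity values: Fis = 1 − Ho/He, so Ho greater than He gives a negative value. The sketch below applies this to the mean Ho and He reported in this paper. It is an approximation only, because Fis computed from averaged heterozygosities need not equal the locus-by-locus or population-level estimates produced by the study's software, and the function name is hypothetical.

```python
def fis(ho, he):
    """Inbreeding coefficient within populations: Fis = 1 - Ho / He."""
    if he <= 0:
        raise ValueError("He must be positive")
    return 1 - ho / he


# Mean observed and expected heterozygosity reported for the seven populations.
mean_ho = 0.736
mean_he = 0.639
overall_fis = fis(mean_ho, mean_he)  # about -0.15, i.e. heterozygote excess
```

A value near −0.15 is consistent with the overall negative Fis described above, while an individual locus with Ho below He would yield a positive Fis.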
Molecular variance analysis (AMOVA) of the seven populations showed that 87.32% of total genetic variation originated from within-population and individual-level differences (statistically significant), consistent with UPGMA tree and PCoA results. STRUCTURE-based population structure analysis further supported their division into two major genetic lineages. Phylogenetic analysis incorporating multiple pig breeds identified two major clades, with Hainan pigs clustering within the clade containing Chinese indigenous pigs, wild boars, and some exotic breeds—reflecting their shared maternal genetic background with East Asian porcine populations. Two Chinese wild boars clustered within the Chinese domestic clade, confirming their close relationship [34].
Comparing microsatellite (nuclear DNA) and mtDNA D-loop results reveals distinct signals: microsatellites indicate moderate-to-high nuclear differentiation among Hainan pig breeds (driven by isolation and selection), while mtDNA shows high haplotype but low nucleotide diversity (reflecting shared maternal ancestry), highlighting the need to combine both markers for comprehensive genetic characterization.
Notably, the foreign Yorkshire breed clustered within the clade containing Chinese indigenous pigs and wild boars, which may be attributed to historical crossbreeding events between exotic and Chinese local pig breeds. Collectively, these results indicate that Hainan pigs are genetically closer to Chinese pigs than to foreign breeds. This is supported by whole-genome resequencing showing that Duntou pigs have the shortest genetic distance to Wuzhishan pigs (0.008) among Hainan breeds and carry 12 tropical adaptation-specific genes (e.g., HSP70), which may underpin their unique NJ tree trajectory and reflect long-term tropical evolution [3].
Furthermore, phylogenetic analysis revealed that Hainan pigs cluster closely with Luchuan pigs, a finding with important biogeographic and historical implications. Geographically, Hainan Island is adjacent to Guangxi (the native habitat of Luchuan pigs), and the Qiongzhou Strait—once a late Pleistocene land bridge—facilitated wild boar migration and human-mediated introduction of Luchuan-related domestic pigs to Hainan. Historically, ancient migrations from Beihai (Guangxi), Zhanjiang (Guangdong) and nearby regions brought pigs closely related to Luchuan pigs to Hainan, where long-term artificial selection and natural adaptation shaped Hainan indigenous pigs while preserving their genetic affinity to Luchuan pigs. Historical cross-regional trade further promoted limited gene flow, maintaining genetic similarity between the 2 populations. This affinity confirms Hainan pigs’ southern Chinese origin and provides molecular evidence for historical human–animal migration between Hainan and the mainland, consistent with studies linking South China indigenous pigs (including Hainan and Luchuan) to a distinct subclade driven by historical gene flow and geographical proximity [35].
Based on the core genetic characteristics revealed in this study—high overall diversity, distinct population differentiation, and extremely low genetic diversity in Lingao pigs—we propose targeted conservation strategies for Hainan indigenous pig germplasm: prioritize closed-core breeding of Duntou and Wuzhishan pigs to preserve their unique genetic integrity, restore Lingao pigs’ genetic variation via controlled crossbreeding and population expansion, optimize moderate gene flow between closely related breeds (e.g., Wuzhishan and Tunchang pigs, Fst = 0.12082), and establish a germplasm bank by cryopreserving resources from all 7 populations.
Additional clustering showed partial Chinese indigenous pig breeds forming geographically associated subclades, while some breeds deviated from traditional geographical classifications—consistent with a study reporting discrepancies between phylogenetic trees and traditional breed classifications [36]. These findings suggest that increased modern transportation-facilitated migration and genetic exchange have weakened correlations between breed clustering and geography, as well as between the genetic and geographical distances in some local breeds.
This study provides baseline genetic data for conserving Hainan indigenous pig germplasm and underpins the targeted conservation strategies proposed above.
4.3. Limitations and Future Directions
This study has several limitations: (1) 29% mtDNA sample loss may introduce subtle selection bias; (2) 11× sequencing depth may miss rare heteroplasmies; (3) LG’s small sample size (n = 6) restricts complex genetic analyses; (4) lack of a priori power analysis for sample size design. Future research should: optimize DNA extraction to reduce sample loss, increase mtDNA sequencing depth to ≥ 30×, expand LG’s sample size via conservation breeding, incorporate a priori power analysis, and integrate whole-genome sequencing to explore genome-wide variation. These improvements will strengthen conservation and utilization insights for Hainan indigenous pigs.
5. Conclusions
5.1. Genetic Characteristics and Phylogenetic Status
This study is the first to comprehensively assess the genetic diversity, population structure, and phylogenetic relationships of five Hainan indigenous pig breeds (seven populations) by integrating 17 FAO-recommended microsatellite markers and mtDNA D-loop sequencing—filling the gap of single-marker analysis in previous research on these breeds. The combined approach revealed that Hainan indigenous pigs exhibit overall high genetic diversity (mean observed heterozygosity Ho = 0.736, total mtDNA haplotype diversity Hd = 0.688), with no signs of inbreeding depression (likely due to historical gene flow and non-random mating). The 17 microsatellite loci proved highly informative (mean polymorphism information content PIC = 0.6814), with 15 showing high polymorphism (PIC > 0.6).
Population structure analyses (UPGMA, PCoA, STRUCTURE) consistently divided the seven populations into two distinct genetic clusters: three Duntou pig subpopulations form a genetically unique lineage, while Wuzhishan (WZS), Wenchang (WC), Lingao (LG), and Tunchang (TC) pigs cluster together. AMOVA showed that 87.32% of total genetic variation originates within populations, with significant pairwise genetic differentiation between Duntou pigs and other breeds, confirming Duntou pigs as a distinct germplasm resource.
mtDNA D-loop analysis identified a “high haplotype diversity but low nucleotide diversity” pattern, which reflects historical population expansion with retained haplotype mutations but limited recent genetic exchange. This pattern guides conservation priorities: breeds with low nucleotide diversity (e.g., TC, π = 0.00083) require measures to enhance genetic variation, while LG exhibits extremely low genetic diversity (Hd = 0.000, π = 0.000) due to historical bottlenecks, warranting urgent intervention. Phylogenetic analysis confirms Hainan pigs form an independent subclade within Chinese indigenous pigs, with the closest affinity to Luchuan pigs—supporting shared maternal ancestry and historical gene flow between Hainan and the Guangdong–Guangxi region, and no exotic genetic admixture.
5.2. Implications for Conservation and Utilization of Germplasm Resources
Based on the core findings, we propose the following prioritized conservation recommendations (ranked by urgency/importance): (1) Expand the Lingao (LG) population to ≥20 individuals (target sample size for robust genetic analysis) and introduce unrelated individuals from genetically close breeds (e.g., WC) to restore genetic variation, given LG’s complete monomorphism (Hd = 0.000, π = 0.000). (2) Maintain a closed breeding system for Duntou subpopulations (especially DT-DZ with the highest heterozygosity) to preserve their unique genetic lineage, and implement breed certification programs and standardized breeding records to prevent genetic introgression from foreign commercial breeds. (3) Promote moderate gene flow between WZS and TC, selected for their low genetic differentiation (Fst = 0.12082) and complementary genetic traits, to increase nucleotide diversity and mitigate inbreeding risk. (4) Establish a dynamic genetic database for all Hainan indigenous pig breeds, with regular assessments using microsatellite and mtDNA markers to track changes in diversity and population structure.
5.3. Future Research Directions
Future studies on Hainan indigenous pigs should expand sample sizes (≥20 for Lingao pigs, ≥30 for others) and integrate whole-genome sequencing, functional genomics, and ancient DNA analyses to decipher tropical adaptation-associated genes and drivers of population differentiation.
In conclusion, this study provides comprehensive molecular evidence for Hainan indigenous pigs’ genetic characteristics, clarifying their evolutionary history and informing targeted conservation strategies. Future research should expand sample sizes and integrate whole-genome sequencing to explore trait-related genetic mechanisms, thereby supporting genetic improvement and industrial application.
References
- 1. Wang C.F., Zeng Y.Q. Genetic diversity and conservation utilization of livestock and poultry breed resources. Contemp. Anim. Husb. 200114143 (In Chinese) 1002-2996
- 2. Zeng J.L., Peng C.E., Gao H., Cao D.R., Wang Z.G., Li K. Study on genetic diversity and conservation effect evaluation of Hu sheep. Chin. J. Anim. Sci. 2023, 59, 155–160. (In Chinese) doi:10.19556/j.0258-7033.20220324-03
- 3. Yan J., Chen J., Zhao S., Chen J., Zhang L. Whole genome resequencing reveals genetic relationships and differences between three types of Hainan local pig breeds. Front. Vet. Sci. 2025, 12, 1544321. doi:10.3389/fvets.2025.1544321. PMID 40520431. PMCID PMC12164305
- 4. Saddoud Debbabi O., Rahmani Mnasri S., Ben Amar F., Ben Naceur M., Montemurro C., Miazzi M.M. Applications of Microsatellite Markers for the Characterization of Olive Genetic Resources of Tunisia. Genes 2021, 12, 286. doi:10.3390/genes12020286. PMID 33670559. PMCID PMC7922852
- 5. Archibald A.L., Haley C.S., Brown J.F., Couperwhite S., McQueen H.A., Nicholson D., Coppieters W., Van de Weghe A., Stratil A., Winterø A.K. The PiGMaP consortium linkage map of the pig (Sus scrofa). Mamm. Genome 1995, 6, 157–175. doi:10.1007/BF00293008. PMID 7749223
- 6. Dietrich W.F., Miller J.C., Steen R.G., Merchant M., Damron D., Nahf R., Gross A., Joyce D.C., Wessel M., Dredge R.D. A genetic map of the mouse with 4006 simple sequence length polymorphisms. Nat. Genet. 1994, 7, 220–245. doi:10.1038/ng0694supp-220. PMID 7920646
- 7. Lan D., Hu Y., Zhu Q., Liu Y. Mitochondrial DNA study in domestic chicken. Mitochondrial DNA A 2017, 28, 25–29. doi:10.3109/19401736.2015.1106526. PMID 26680506
- 8. Vergara A.M.C., Martínez A.M., Bermejo J.V.D., Macri M., Nájera P.R.A., Duchi N.A.D., Vargas P.A.T. A Matrilineal Study on the Origin and Genetic Relations of the Ecuadorian Pillareño Creole Pig Population through D-Loop Mitochondrial DNA Analysis. Animals 2021, 11, 3322. doi:10.3390/ani11113322. PMID 34828053. PMCID PMC8614550
