Genomic Evaluation of the Genetic Structure and Analysis of Selective Evolutionary Signatures of Xupu Goose

Kairui Zhu; Zhenkang Ai; Yuchun Cai; Yonghao Li; Yuhang Cheng; Yang Zhang; Wenming Zhao; Guohong Chen

PMC · DOI:10.3390/biology15060479·March 17, 2026

Genomic Evaluation of the Genetic Structure and Analysis of Selective Evolutionary Signatures of Xupu Goose

Kairui Zhu, Zhenkang Ai, Yuchun Cai, Yonghao Li, Yuhang Cheng, Yang Zhang, Wenming Zhao, Guohong Chen

PDF

Open Access

TL;DR

This study uses whole-genome sequencing to analyze the genetic diversity and evolutionary traits of the Xupu goose, a Chinese breed at risk of decline.

Contribution

The study provides a genomic foundation for conservation and breeding by identifying selection signatures and candidate genes linked to fatty liver and muscle traits.

Findings

01

Xupu goose shows moderate genetic diversity but signs of historical inbreeding.

02

Candidate genes related to lipid metabolism (ACSS2, ACSS3, PECR) and muscle development (CMYA5, MTPN, LEPR) were identified.

03

Conservation efforts have prevented recent inbreeding despite lineage differentiation.

Abstract

The Xupu goose is an economically valuable indigenous breed in China, celebrated for its large body size and exceptional fatty liver capacity. However, as an “at-risk” population, it requires urgent conservation efforts. In this study, we conducted whole-genome sequencing on 15 Xupu geese to evaluate their genetic diversity and uncover the genomic basis of their unique traits. We found that while the breed retains a moderate genetic reservoir, it carries a legacy of historical inbreeding driven by past population bottlenecks. Importantly, our results indicate that current conservation practices have successfully prevented severe recent inbreeding, despite the natural emergence of distinct family lineages. Furthermore, we identified key candidate genes strongly linked to lipid metabolism and muscle development. These findings provide crucial scientific guidance for optimizing future…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Figures3

Click any figure to enlarge with its caption.

Funding1

—National Key Research and Development Program of China

Keywords

Xupu goosewhole-genome resequencingpopulation genetic structureselective sweepsIHS selection signalsgenetic diversity

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsGenetic and phenotypic traits in livestock · Genetic Mapping and Diversity in Plants and Animals · Genetic diversity and population structure

Full text

1. Introduction

China’s vast territory and diverse ecological types have nurtured abundant indigenous goose genetic resources with distinct characteristics [1,2]. These indigenous breeds are widely distributed across China and have developed strong adaptability to complex local environments through long-term natural domestication and artificial selection [3]. Among them, the Xupu goose, originating from the hilly regions of Xupu County in Hunan Province and recognized as one of the “Three Famous Geese of China”, is renowned for its large body size, exceptional fatty liver performance, high-quality down, and tolerance to coarse fodder. Studies have shown that its meat is rich in unsaturated fatty acids, and the average weight of its fatty liver can exceed 606 g [4]. Furthermore, conservation assessments by Chen et al. classify the Xupu goose as “at-risk” among Chinese indigenous waterfowl resources [5]. In recognition of its exceptional breed characteristics and urgent conservation value, the breed has been accorded high-level national protection status. Specifically, the Xupu goose was inscribed in the National List of Livestock and Poultry Genetic Resources for Conservation in February 2014 and was subsequently designated as a National Geographical Indication Protected Product in December of that year.

Whole-genome resequencing (WGS), relying on reference genomes, utilizes high-throughput technology to accurately identify single-nucleotide polymorphisms (SNPs), insertions/deletions (Indels), and structural variations (SVs) within the genome [6,7]. These genetic variation data are instrumental in revealing the genetic diversity, selective evolutionary signals, and domestication processes of goose populations [8,9]. In studies of breed origins, Wen et al. [10] confirmed the dual-origin model of domestic geese—originating from Swan Geese (Anser cygnoides) and Greylag Geese (Anser anser)—through resequencing analysis of domestic geese and their wild ancestors, and elucidated the impact of gene introgression, such as IGF-1, on body size evolution. Regarding phenotypic trait dissection, Ren et al. [11] identified 26 plumage color regulatory genes, including KITLG, using Pool-Seq technology. Sun et al. [12] mapped 88 SNP loci significantly associated with body size through GWAS, revealing the regulatory mechanisms of genes like THADA in skeletal development. For reproductive traits, Zhao et al. [13] screened 107 egg-production-related candidate genes and found them significantly enriched in signaling pathways such as PI3K-Akt. In summary, whole-genome data serves not only as the cornerstone for genetic resource conservation but also as a key driver for transitioning the goose industry toward molecular precision breeding.

The unique lineage of the Xupu goose is deeply rooted in its geographic and cultural history. As documented in the 1869 edition of the Xupu County Annals, the breed emerged through over a century of localized, closed-herd breeding, with residents rarely transferring populations outside the hilly regions of Hunan Province. While this unique breeding history has shaped its superior breed characteristics, it also exposes the population to the potential risk of reduced genetic diversity. However, a systematic genomic evaluation of the Xupu goose—particularly regarding its genetic diversity, inbreeding levels (ROH), and selective signatures—remains insufficient. This study aims to conduct a whole-genome resequencing analysis of Xupu geese to deeply explore their genetic variation characteristics, thereby providing systematic theoretical support for the conservation and molecular breeding of this breed.

2. Materials and Methods

2.1. Ethics Statement

To ensure the highest level of Animal Ethics and welfare, this study was conducted under the rigorous oversight of the committee at Yangzhou University. All protocols were officially authorized under Permit Number 2023004742, ensuring that every step of the research met established safety and humanitarian guidelines. Beyond institutional approval, the project maintained full Regulatory Compliance with the “Regulations on the Administration of Laboratory Animal Affairs” (Yangzhou University, 2012) and “the specific Management Standards for Experimental Practices” (Jiangsu, China, 2008) established in Jiangsu, China. By integrating these legal frameworks, the study guarantees that all animal management procedures were performed with professional integrity and in accordance with modern scientific standards.

2.2. Experimental Animals

In this genetic study, male Xupu geese were obtained from the National Waterfowl Gene Bank (Taizhou, China), where the conservation population is maintained under a closed management system utilizing rotational mating. To maximize breed representativeness and minimize kinship bias, experimental individuals were randomly selected from distinct, unrelated families based on comprehensive pedigree records. Venous blood samples (2 mL) were collected from the brachial vein of 15 male geese (90 days old) using sterile heparinized syringes. Blood collection was performed by licensed veterinarians. To ensure animal welfare, geese were gently restrained using a calm, manual holding technique to minimize stress. No anesthesia was required for the minimally invasive wing vein puncture, and hemostasis was immediately confirmed after sampling to prevent infection. Collected blood samples were processed for genomic DNA extraction and subsequent genetic analyses. Prior to extraction, samples were temporarily stored at −20 °C. Following extraction, the purified genomic DNA was stored at −80 °C to preserve integrity for downstream sequencing.

2.3. Genomic DNA Extraction and Quality Assessment

Genomic DNA was extracted from blood samples using a DNA extraction kit (DP304, Tiangen, China). The integrity and fragmentation of the extracted genomic DNA were verified using 1% agarose gel electrophoresis, and the precise concentration was determined using a NanoDrop-2000 micro-spectrophotometer (Menlo Park, CA, USA) to ensure all samples met the stringent requirements for library construction and sequencing. Only samples meeting stringent quality criteria (concentration > 200 ng/µL; OD260/280: 1.75–2.0) were selected for library construction and sequencing. Qualified DNA samples were shipped to Nanjing Jisihuiyuan Biotechnology Co., Ltd. (Nanjing, China) for resequencing library construction and sequenced on the Illumina NovaSeq 6000 platform. The Anser cygnoides (Swan goose) reference genome (ASM4018256v1; NCBI Accession: GCF_040182565.1) was utilized for read mapping and downstream genomic analyses. SNP calling was performed using the Genome Analysis Toolkit (GATK). After stringent filtering, the obtained SNP markers were utilized for haplotype phasing and integrated Haplotype Score (iHS) analysis [14].

2.4. Data Quality Control and Variant Calling

Raw data were acquired using a high-throughput sequencing platform (Illumina NovaSeq 6000, San Diego, CA, USA) and converted into raw reads. Subsequently, the sequencing reads were aligned to the reference genome using BWA. On this basis, PCR duplicates were removed using Picard tools (V2.17.0) (Mark Duplicates) to eliminate the effects of amplification bias. Subsequent variant detection was primarily implemented using the GATK (v4.1.4.1) toolkit. Specifically, variant calling was first performed to identify candidate SNPs and InDel loci, followed by Variant Quality Score Recalibration and stringent data filtering to construct the final set of high-confidence SNP loci.

2.5. Analysis of Population Genetic Diversity, Linkage Disequilibrium (LD), and Runs of Homozygosity (ROH)

To conduct an in-depth analysis of the genetic structure of the Xupu goose population, the --het and --freq functions in PLINK 1.9 software were employed post-filtering to accurately calculate key indices of genetic variation, including expected heterozygosity (H_E_) and observed heterozygosity (H_O_), which were subsequently used to assess the inbreeding coefficient (F) and intra-population genetic distance (D_st_) [15].

Population genetic structure is a crucial approach for investigating internal evolutionary processes and phylogenetic relationships within a species. The top 10 eigenvalues and eigenvectors were calculated using the --pca parameter in PLINK v1.9 software. Subsequently, the first two principal components (PC1 and PC2) were visualized using the ggplot2 package in R v4.4.0 to explore the clustering relationships and genetic consistency within the conserved population.

The characterization of population genetic structure is an essential approach for elucidating intraspecific evolutionary dynamics and phylogenetic relationships. To this end, Principal Component Analysis (PCA) was conducted using PLINK v1.9 (with the --pca parameter) to extract the top 10 principal components. Subsequently, the first two principal components (PC1 and PC2) were visualized using the ggplot2 package in R v4.4.0 to assess spatial clustering patterns and genetic homogeneity within the conserved population.

Linkage Disequilibrium (LD) analysis was performed using PopLDdecay (https://github.com/BGI-shenzhen/PopLDdecay, accessed on 14 February 2026) with the following specific parameters: -MaxDist 500, -Het 0.1, -Miss 0.3, and -OutPairLD 5. The chromosomal distribution of LD was visualized via LD decay plots, which characterize the relationship between the LD decay rate and physical or genetic distance [16].

To identify Runs of Homozygosity (ROH) within the Xupu goose genome, a systematic scan of the autosomes was performed using PLINK V1.9 software [17,18]. The detection process employed a sliding window approach, with stringent filtering parameters established based on the studies by Yu et al. [19] and Sun et al. [20] to ensure the reliability of the results. The detection parameters were strictly defined as follows: a sliding window of 50 SNPs was employed (--homozyg-window-snp 50), allowing for a maximum of one heterozygous genotype (--homozyg-window-het 1) and five missing genotypes (--homozyg-window-missing 5) per window, with a threshold of 0.05 (--homozyg-window-threshold 0.05). To be classified as an ROH, a segment was required to span at least 100 kb (--homozyg-kb 100), contain a minimum of 10 SNPs (--homozyg-snp 10), and maintain a minimum density of one SNP per 10 kb (--homozyg-density 10), with a maximum gap of 100 kb between adjacent markers (--homozyg-gap 100). Statistical analyses were conducted on the frequency, length, and distribution of ROH within the Xupu goose population. The genomic inbreeding coefficient based on F_ROH_ (was calculated using the following formula [21]:

[eqn]

where $[eqn]$ represents the total length of all ROH segments detected in each individual, and $[eqn]$ is the total length of the autosomal genome covered by the genotype data.

2.6. Detection of Selection Signatures Using iHS

To identify genomic regions under selection (significant signatures) within the Xupu goose population, the Integrated Haplotype Score (iHS) method was employed. The standardized iHS score is typically calculated as follows [22]:

[eqn]

where $[eqn]$ and $[eqn]$ represent the integrated Extended Haplotype Homozygosity (EHH)scores for the ancestral and derived core alleles, respectively. $[eqn]$ and $[eqn]$ denote the expected value (mean) and standard deviation calculated within allele frequency bin p. iHS scores were calculated and visualized as Manhattan plots using the rehh package (v3.2.1) in R. Candidate genes located within genomic regions exhibiting significant selection signatures were identified based on a threshold of |iHS| > 2 (p < 0.05) to filter potential false positives. Subsequently, functional enrichment analyses, including Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways, were performed using the Metascape platform [23]. Statistically significant terms were identified using a Benjamini–Hochberg corrected p-value threshold of < 0.05.

3. Results

3.1. Statistics and Quality Assessment of Sample Sequencing Data

In this study, whole-genome resequencing was performed on the Xupu goose, a distinctive indigenous breed, followed by a rigorous quality control pipeline for the acquired sequencing data. Data analysis results indicated that the average raw data volume per sample reached 6.87 Gb, with an average of 22,896,147 raw reads (Table 1). After stringent filtering, the average effective data volume was 6.79 Gb, with an average of 22,660,317 clean reads, yielding a high effective data utilization rate of 98.97% (Figure 1A). Regarding sequencing quality assessment, the average values for Q20 and Q30 reached 97.23% and 91.32%, respectively, with an average GC content of 43.77%. Furthermore, the GC content distribution across all samples was balanced, ranging from 43.31% to 44.52%. In summary, the sequencing data exhibited excellent accuracy and stability, fully meeting the technical requirements for genome-wide SNP variant detection and subsequent population genetics analysis, thus providing a solid data foundation.

3.2. Analysis of SNP Variant Detection and Annotation Results

Whole-genome SNP variant detection was performed on all samples in this study. Statistical results indicate some fluctuation in SNP counts across samples (Table 2), with an average of 4,269,107, ranging from 3,877,534 in XP.15 to 4,629,469 in XP.12. These inter-individual differences may stem from subtle variations in sequencing depth or the inherent heterogeneity of genetic backgrounds.

In terms of heterozygosity analysis, the average number of heterozygous SNPs (2,312,216) was significantly higher than that of homozygous SNPs 1,959,778, with an average Heterozygous variant proportion of approximately 54% (Figure 1A). Inter-individual analysis showed that sample XP.12 had the highest number of heterozygous loci (2,661,174), suggesting strong individual genetic polymorphism; conversely, sample XP.15 had a relatively lower number (2,008,960), suggesting this individual may have accumulated more homozygous segments.

Regarding variant types, the transition/transversion (Ti/Tv) ratio across all samples showed high consistency, ranging between 2.47 and 2.50 (mean 2.49). In molecular evolution, due to the cytosine methylation deamination effect, the biochemical probability of transitions is significantly higher than that of transversions. The ratio of 2.49 obtained in this study aligns with the typical characteristics of most vertebrate genomes, usually between 2.0 and 2.5, indicating not only the high accuracy and reliability of the variant detection but also the exclusion of potential systematic bias during sequencing. Further analysis revealed that C:G > T:A transitions were the most abundant, followed by T:A > C:G, while T:A > A:T transversions were the least frequent (Figure 1B).

SNP functional annotation results revealed the distribution patterns of variants across different genomic functional regions (Table 3). The vast majority of SNPs were enriched in non-coding regions: intronic regions contained an average of 1,821,812 SNPs (accounting for 56.4%), and intergenic regions contained an average of 1,223,203 SNPs (accounting for 37.9%). This aligns with the “Neutral Theory” of molecular evolution, which posits that non-coding regions are subject to less selection pressure and possess a greater capacity to tolerate variations. In gene regulation-related regions, upstream and downstream areas contained an average of 66,370 and 71,264 SNPs, respectively, suggesting they may participate in gene expression regulation by influencing transcription initiation or post-transcriptional modifications; meanwhile, the average number of SNPs in upstream/downstream overlapping regions was 10,517. In coding regions, synonymous mutations averaged 35,063; while not altering amino acid sequences, they may affect translation efficiency. Notably, non-synonymous mutations averaged 12,458. Although non-synonymous mutations account for a very small proportion, because they directly result in amino acid changes, they are often eliminated by strong purifying selection. Therefore, these retained non-synonymous mutations are often key potential factors driving phenotypic variation (such as growth rate and meat quality traits) and disease susceptibility, serving as important candidate loci for subsequent functional gene mining.

3.3. Demographic History and Genetic Status

In this study, the average total length of ROH in the Xupu goose genome was 230.2 Mb, ranging from 195.1 to 272.9 Mb. Analysis of the segment size distribution (Figure 1E) revealed that short fragments (0.1–0.2 Mb) were the most abundant, totaling 7056 segments and accounting for 54.13% of the total, whereas long segments (>1 Mb) comprised only 1.76% (Supplementary Table S1). This dominance of short segments suggests the influence of ancient common ancestors or historical bottleneck effects rather than recent inbreeding—a conclusion further supported by the rapid decrease in frequency as ROH length increases. Consequently, the calculated average genomic inbreeding coefficient ( $[eqn]$ ) was 0.204 (range: 0.173–0.241) (Figure 1D) (Supplementary Table S2). Collectively, these results demonstrate that while the Xupu goose population faces moderate inbreeding pressure. This moderate level of inbreeding is consistent with the breed’s history of closed-population conservation. Given the limitations of historical pedigree depth, the $[eqn]$ was calculated to provide a precise estimate of individual autozygosity and population inbreeding levels. Regarding heterozygosity and inbreeding levels (Figure 2B), the observed heterozygosity ( $[eqn]$ ) ranged from 0.194 to 0.249 (mean 0.217), whereas the expected heterozygosity ( $[eqn]$ ) remained stable, varying narrowly between 0.297 and 0.298 (mean 0.298). The average $[eqn]$ was significantly higher than the average $[eqn]$ . Furthermore, linear regression analysis (Figure 2D) revealed a distinct negative correlation between $[eqn]$ and F. Collectively, these findings indicate a degree of heterozygote deficiency and inbreeding within the population.

The genomic relationship matrix (G-matrix) heatmap (Figure 2A) visualizes the kinship structure among the 15 sequenced individuals. Visual inspection of the kinship heatmap revealed a generally moderate background relatedness across the population, characterized by predominant dark blue coloration. However, distinct clusters of elevated genomic similarity (lighter hues) were observed among specific individuals. To further validate this observed substructure, a Principal Component Analysis (PCA) was conducted (Supplementary Figure S1). The PCA results strikingly corroborated the G-matrix patterns, with the first principal component (PC1) accounting for an exceptionally high 66.23% of the total genetic variance. Rather than forming a single panmictic cloud, the 15 individuals exhibited clear stratification along the PC1 axis, aggregating into corresponding sub-clusters.

The decay pattern of Linkage Disequilibrium (LD) reflects the population’s recombination history and serves as a critical determinant of the resolution of association mapping. Analysis of LD decay (Figure 2C) reveals that the linkage disequilibrium coefficient (r^2^) exhibits a rapid decline as physical distance increases. Notably, the curve drops precipitously within the short-range interval of 0–100 kb, after which it plateaus and stabilizes at a relatively low level.

3.4. Genome-Wide Detection of Selection Signatures Using iHS

To investigate signatures of recent positive selection associated with the domestication of the Xupu goose, a genome-wide scan was performed using the Integrated Haplotype Score (iHS) method. First, the raw iHS values were standardized to facilitate comparison. The resulting genome-wide frequency distribution (Figure 3B) revealed that the bulk of the observed iHS values (blue curve) aligned closely with the theoretical standard normal distribution (red curve), exhibiting a typical bell-shaped pattern. This distributional pattern is critical for establishing statistical significance, as extreme values serve as indicators of potential targets of selection. Subsequently, a Manhattan plot (Figure 3A) was constructed to visualize the spatial distribution of these candidate selection signatures. Using a significance threshold of |iHS| > 2, multiple genomic regions enriched for selection signals were identified. Notably, distinct signal peaks were observed on autosomes 1, 2, and 4 (Chr1, Chr2, Chr4), and on the sex chromosome (ChrZ).

3.5. Gene Annotation and GO/KEGG Pathway Enrichment Analysis

To further elucidate the biological functions of these selected regions, the identified significant SNPs were mapped to the reference genome for gene annotation. A total of 3235 potential candidate genes were identified. Subsequently, functional enrichment analysis was performed on these genes using the Gene Ontology (GO) and KEGG databases.

GO functional enrichment analysis revealed that the identified candidate genes were significantly enriched across three main categories (Figure 3C): Biological Process (BP), Cellular Component (CC), and Molecular Function (MF). The top five most significantly enriched GO terms within each category are detailed (Table 4). In the BP category, the genes were predominantly involved in DNA recombination, smooth muscle contraction, and fatty acid metabolic process, along with other processes such as axonemal dynein complex assembly, transmembrane transport, and cell growth. Regarding the CC category, the candidate gene products were mainly localized in the cell junction, postsynaptic membrane, and cilium, while also showing significant enrichment in muscle-related components like myofibril and myosin filament. Within the MF category, ATP binding exhibited the highest level of significance and the largest gene count, followed by protein serine/threonine kinase activity and motor activity. Furthermore, enrichment in terms such as ionotropic glutamate receptor activity, sphingolipid transporter activity, and structural constituent of muscle. The Xupu goose population possesses specific genetic characteristics related to fat deposition, muscle development, and neural regulation. KEGG pathway enrichment analysis revealed (Figure 3D) that the candidate genes under selection were primarily involved in Glycosphingolipid biosynthesis, Neuroactive ligand-receptor interaction, Selenocompound metabolism, and ABC transporters. The top ten most significantly enriched pathways are listed in detail (Table 5).

KOG functional classification results revealed that the candidate genes were distributed across all 25 functional categories (Figure 3E). Among them, signal transduction mechanisms, General function prediction only, and Function unknown represented the three most enriched categories. Notably, the number of genes involved in Signal transduction mechanisms reached 75, indicating their predominant role in the identified selection signals. In addition, a significant number of genes were associated with Lipid transport and metabolism, Posttranslational modification, protein turnover, chaperones, and the cytoskeleton. These findings suggest that the evolutionary selection process in this population is highly concentrated on pathways related to cellular signaling, protein regulation, and structural development.

Protein–protein interaction network analysis revealed tight interactions among proteins encoded by the candidate genes (Figure 3F), forming multiple distinct functional modules. Within the network structure, the GRIA gene family exhibited high connectivity, constituting one of the core modules. This finding aligns perfectly with the significant GO term “ionotropic glutamate receptor activity”, suggesting that neural signaling and endocrine regulation play pivotal roles in defining the breed’s characteristics. Concurrently, the ATR protein emerged as another major hub node, interacting closely with FANCA and CHEK2, thereby forming a functional sub-network governing DNA repair and cell cycle surveillance.

4. Discussion

4.1. Genome-Wide Variation Patterns

Whole-genome resequencing has emerged as a pivotal tool for characterizing the genetic structure of livestock populations and evaluating germplasm resources [24]. In this study, we leveraged high-throughput WGS data to investigate the genetic diversity and signatures of selection within the Xupu goose population. It is important to note that the average sequencing depth of 7× may limit the detection of rare variants. However, stringent filtering criteria were applied to ensure that the common SNPs used for population structure and selection signature analyses are high-confidence variants. Post-quality control (QC) analysis revealed that the average GC content (~45%) and mapping rate (~98%) were consistent with those reported for other goose breeds [11,20], ensuring the reliability and accuracy of the data for subsequent analyses. Single-Nucleotide Polymorphisms (SNPs) serve as primary markers for assessing genetic diversity and are crucial for reconstructing evolutionary trajectories [25]. In the present study, statistical analysis identified an average of approximately 4.272 million SNPs per individual in the Xupu goose samples. The Ti/Tv ratio of 2.49 observed in this study falls well within the typical range reported for vertebrate genomes. This consistency not only attests to the high accuracy and reliability of the variant detection results, but also effectively rules out potential systematic biases introduced during the sequencing process. The genomic distribution of SNPs is consistent with the findings of Joanna Grzegorczyk et al. [26], being primarily concentrated in intergenic and intronic regions. Analysis of zygosity distribution reveals that the average number of heterozygous sites in Xupu goose samples generally exceeds that of homozygous sites; the observed proportion of heterozygous variants of 54% is within a reasonable range. Typically, long-term high-intensity artificial selection leads to a rapid increase in population homozygosity [27,28]; however, as a superior indigenous breed, the relatively high heterozygosity of the Xupu goose suggests that the population has not experienced a severe genetic bottleneck, thereby possessing potential for environmental adaptation and breeding plasticity. Although non-synonymous mutations account for only a minute proportion of variations, they are often subject to strong purifying selection and eliminated because such mutations can directly lead to amino acid alterations [29].

The length and frequency of ROH reflect individual relatedness; longer and more frequent ROH segments indicate a higher probability of consanguinity, with different segment lengths corresponding to ancient and recent inbreeding histories, respectively [30,31]. This study revealed that ROH in the Xupu goose genome were predominantly short segments (0.1–0.2 Mb), with their frequency decreasing rapidly as segment length increased. The extremely low proportion of long ROH segments (>1 Mb) was observed in the Xupu goose (accounting for only 1.76%). The findings suggest that the observed genomic homozygosity is likely attributable to ancient common ancestry or historical population bottlenecks [32].

Genetic diversity is the core for evaluating the value of livestock and poultry germplasm resources, with expected ( $[eqn]$ ) and observed ( $[eqn]$ ) heterozygosity serving as key indicators for measuring population variation levels [33,34]. Meanwhile, the genomic inbreeding coefficient ( $[eqn]$ ) calculated based on runs of homozygosity (ROH) sensitively reflects the loss of population heterozygosity and the depletion trend of genetic variation [35]. For context, Huang et al. [36] reported an average $[eqn]$ of 0.261 and $[eqn]$ of 0.223 in the Shitou goose population. Similarly, Zhang et al. [37] observed an average $[eqn]$ of 0.345 alongside a notably elevated $[eqn]$ of 0.352 in the Landes breed. In contrast, the Xupu goose population in this study exhibited a lower average $[eqn]$ (0.217), while $[eqn]$ remained at a higher level (0.298), with an average $[eqn]$ of 0.204. The genomic inbreeding level of the Xupu goose is comparable to that of other conserved breeds [38], yet remains significantly lower than that observed in the intensively selected Landes breed. Notably, although an $[eqn]$ of 0.204 suggests moderate inbreeding pressure, its genomic landscape is characterized by the absolute predominance of short ROH segments and a low proportion of non-synonymous mutations ( $[eqn]$ = 0.313). This unique combination of “high $[eqn]$ , predominant short segments, and low $[eqn]$ indicates that the population’s homozygosity primarily stems from long-term historical genetic drift or ancient ancestral bottleneck effects, rather than recent high-intensity inbreeding. This finding aligns with the “managed balanced population” concept proposed by Cendron et al. [39], which maintains breed characteristics while effectively mitigating the risk of recent inbreeding depression. Furthermore, the gap between the high $[eqn]$ level and $[eqn]$ in Xupu geese reveals a unique “heterozygosity potential space”, indicating for future breeding improvement. As stated by Groeneveld et al. [40], indigenous breeds often experience a phased increase in inbreeding coefficients during conservation breeding, which not only elucidates the formation mechanism of the current moderate inbreeding level in Xupu geese but also provides a scientific basis for subsequent marker-assisted selection and sustainable utilization of germplasm resources.

To further elucidate the genetic architecture underlying these metrics, population structure was assessed using a genomic relationship matrix (G-matrix) and Principal Component Analysis (PCA). The G-matrix heatmap revealed specific clusters of elevated genomic similarity. This structural stratification was strongly corroborated by the PCA projection, where the first principal component (PC1) accounted for a substantial 66.23% of the total genetic variance, grouping individuals into distinct sub-clusters. This dual line of evidence confirms the presence of cryptic family-lineage substructures within the ex situ conservation flock. Complementing these findings, Linkage Disequilibrium (LD) decay analysis demonstrated a rapid decline in $[eqn]$ over short physical distances—a pattern highly concordant with the predominance of short ROH segments. This reinforces the conclusion that while distinct family substructures exist, the overarching inbreeding signature primarily reflects historical genetic drift rather than recent, intensive consanguineous mating.

4.2. Genomic Signatures of Selection and Biological Functions

In this study, GO and KEGG enrichment analyses of genomic regions under selection, filtered by specific thresholds, revealed that genes such as GRIA1, GRIA4, LEPR and GABRA were significantly enriched in the “Neuroactive ligand-receptor interaction” pathway. AMPA receptors are tetrameric ion channels formed by the combinatorial assembly of GRIA1, GRIA2, GRIA3, and GRIA4 subunits; they function as the primary mediators of fast excitatory neurotransmission in the central nervous system (CNS), facilitating rapid synaptic signal transduction [41]. Furthermore, the trafficking and expression of AMPA receptors play pivotal roles in long-term potentiation (LTP) and long-term depression (LTD), mechanisms that are intrinsically linked to learning, memory, and environmental adaptability in animals [42]. Relevant experiments in chicks have demonstrated that the interaction between NMDA and AMPA receptors plays a significant role in the regulation of food intake [43]. The GABRA gene family encodes the primary inhibitory receptors for gamma-aminobutyric acid (GABA), which are ubiquitously expressed and play critical roles in the mammalian central nervous system (CNS) [44]; dysfunction or loss of GABRA significantly impairs neural development [45]. The leptin receptor belongs to the Class I cytokine receptor family and is primarily responsible for mediating the majority of the biological effects of leptin (LEP) in both the central nervous system and peripheral tissues [46]. Studies have demonstrated that specific haplotype blocks within the LEPR gene are significantly associated with body weight at 49 and 70 days of age, and feed intake in broiler chickens, thereby confirming its pivotal role in regulating growth and appetite in poultry [47,48]. These neural pathways likely facilitated the domestication process by modulating feeding behavior and neural regulation, thereby enabling the geese to better adapt to the environmental conditions of captivity.

Subsequently, during the identification of traits associated with fat deposition and muscle growth, the ACSS2, ACSS3, and PECR genes were found to be enriched in the fatty acid metabolism pathway. Existing studies have indicated that ACSS2 is primarily responsible for the conversion of acetate into acetyl-CoA [49]. In mouse models, ACSS2 deficiency has been shown to significantly attenuate body weight gain under high-fat diet conditions and ameliorate hepatic steatosis, confirming its role in optimizing systemic fat storage and utilization through the selective regulation of genes related to lipid metabolism [50]. Similar to ACSS2, ACSS3 is a pivotal gene in lipid metabolism regulation, functioning primarily to drive propionate catabolism by converting it into propionyl-CoA [51]. Furthermore, studies on PRKAA2 have observed that its expression is significantly upregulated in ducks as dietary energy levels increase, suggesting its potential involvement in fatty acid regulation, lipid biosynthesis, and transport processes [52]. Concurrently, other findings have demonstrated a significant correlation between this gene and meat quality traits, specifically muscle tenderness and pH value [53]. CMYA5 and MTPN, which are involved in the myosin filament pathway and the cell growth pathway, are also significant genes influencing muscle growth. MTPN promotes protein synthesis and induces cellular hypertrophy by upregulating the expression of various cardiac marker genes in cardiomyocytes. Regarding skeletal muscle, Shohei Shiraishi et al. [54] found that MTPN significantly increases the content of structural proteins in murine muscle, while a study by Wang et al. [55] demonstrated that MTPN promotes porcine myocyte differentiation and myotube hypertrophy, results which are consistent with previous reports. Concurrently, MTPN has also been identified as a crucial candidate gene regulating skeletal muscle growth and development in beef cattle. Furthermore, in studies on pork quality, CMYA5 is regarded as a key candidate gene, with research showing it is significantly correlated with drip loss and intramuscular fat content [56,57].

However, lacking a comparative control breed renders the associations of these single-population sweeps putative. Therefore, future research should prioritize expanding the sample size and conducting cross-breed comparative analyses, complemented by multi-omics approaches and functional validation to fully elucidate these complex molecular networks.

5. Conclusions

This study provides a foundational whole-genome resequencing dataset for the “protected indigenous breed” Xupu goose. Population genomics analysis indicates that while the breed retains moderate genetic potential, it currently faces significant historical inbreeding pressure. Crucially, dual evidence from the G-matrix and PCA reveals the presence of a cryptic family substructure within the ex situ conserved population. This structural stratification accurately accounts for the observed heterozygote deficiency and confirms that the population’s homozygosity stems from historical background drift and closed-herd management rather than recent extreme inbreeding. Through intra-population Integrated Haplotype Score (iHS) scanning, we identified multiple candidate genes associated with the breed’s unique phenotypic traits. Genes such as ACSS2, ACSS3 and PECR were identified as candidates for lipid metabolism and fatty liver deposition. Meanwhile, CMYA5, MTPN and LEPR were found to be associated with muscular development. While these single-population associations require future cross-breed and multi-omics validation, these foundational genomic insights provide critical resources and theoretical guidance for optimizing conservation strategies, marker-assisted selection, and sustainable breeding programs for this indispensable waterfowl resource.

Bibliography57

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Li H.-F. Chen K.-W. Yang N. Song W.-T. Tang Q.-P. Evaluation of Genetic Diversity of Chinese Native Geese Revealed by Microsatellite Markers World’s Poult. Sci. J.20076338139010.1017/S 0043933907001511 · doi ↗
2Li H.F. Zhu W.Q. Chen K.W. Xu W.J. Song W. Two Maternal Origins of Chinese Domestic Goose Poult. Sci.2011902705271010.3382/ps.2011-0142522080007 · doi ↗ · pubmed ↗
3Eda M. Itahashi Y. Kikuchi H. Sun G. Hsu K.-H. Gakuhari T. Yoneda M. Jiang L. Yang G. Nakamura S. Multiple lines of evidence of early goose domestication in a 7000-y-old rice cultivation village in the lower Yangtze River, China Proc. Natl. Acad. Sci. USA 2022119 e 211706411910.1073/pnas.211706411935254874 PMC 8944903 · doi ↗ · pubmed ↗
4Liu X. Xu M. Qu X. Guo S. Liu Y. He C. He J. Liu W. Molecular Cloning, Characterisation, and Expression Analysis of Adipocyte Fatty Acid Binding Protein Gene in Xupu Goose (Anser Cygnoides Domesticus)Br. Poult. Sci.20196065966510.1080/00071668.2019.165570931509442 · doi ↗ · pubmed ↗
5Chen G. Liu J. Xu Q. Current status and prospect of conservation and utilization of goose genetic resources in China Guide Chin. Poult.202239612
6Bentley D.R. Balasubramanian S. Swerdlow H.P. Smith G.P. Milton J. Brown C.G. Hall K.P. Evers D.J. Barnes C.L. Bignell H.R. Accurate Whole Human Genome Sequencing Using Reversible Terminator Chemistry Nature 2008456535910.1038/nature 0751718987734 PMC 2581791 · doi ↗ · pubmed ↗
71000 Genomes Project Consortium A Map of Human Genome Variation from Population Scale Sequencing Nature 20104671061107310.1038/nature 0953420981092 PMC 3042601 · doi ↗ · pubmed ↗
8Huang M. Sun J. Wang J. Ye X. Chen Z. Zhao X. Zhang K. Ma L. Xue J. Luo Y. Goose multi-omics database: A comprehensive multi-omics database for goose genomics Poult. Sci.202510410484210.1016/j.psj.2025.10484239874782 PMC 11810826 · doi ↗ · pubmed ↗