Unveiling the Genomic Landscape of Yan Goose (Anser cygnoides): Insights into Population History and Selection Signatures for Growth and Adaptation
Shangzong Qi, Zhenkang Ai, Yuchun Cai, Yang Zhang, Wenming Zhao, Guohong Chen

TL;DR
This study explores the genome of the Yan goose to understand its population history and genetic traits related to growth and adaptation.
Contribution
The study identifies genomic regions and candidate genes linked to economically important traits in the Yan goose.
Findings
Population structure analysis showed a close genetic relationship within the Yan goose population.
Genes related to growth, lipid metabolism, and immune response were identified as being under strong selection.
Functional enrichment analysis highlighted key metabolic and immune-related pathways.
Abstract
The Yan goose (Anser cygnoides) is a valuable indigenous poultry breed in China, known for its large body size, high-quality meat, and adaptability to various environments. Despite its economic importance, the genetic basis of these traits remains poorly understood. In this study, we performed whole-genome resequencing on a conserved population of Yan geese to assess their genetic diversity and evolutionary history. Our analysis reconstructed the population’s demographic trajectory, highlighting its resilience to past climatic changes. We also identified key genomic regions and candidate genes associated with growth, lipid metabolism, and immune response, which are crucial for the species’ commercial value. These findings enhance our understanding of the molecular mechanisms underlying the Yan goose’s unique characteristics and provide valuable genomic resources that can support its…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5- —National Key Research and Development Program of China
- —Jiangsu Province Modern Agriculture Key and General Projects
- —Jiangsu Provincial Seed Industry Revitalization Announcement Leading Project
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenetic Mapping and Diversity in Plants and Animals · Genetic and phenotypic traits in livestock · Animal Genetics and Reproduction
1. Introduction
The domestic goose, as one of the earliest domesticated poultry species, has a domestication history tracing back to 7000 years ago in eastern China [1]. Through natural environmental adaptation and artificial selection, the domestic goose has gradually developed a rich variety of local breeds. These breeds not only exhibit distinct characteristics in terms of body morphology and production performance but also harbor unique genetic resources and adaptive traits. As one of the countries with the richest goose genetic resources in the world, China currently includes 31 goose breeds in the “National Directory of Animal and Poultry Genetic Resources” (2021 edition), with 30 local breeds and one improved breed [2]. The Yan goose (YE, Anser cygnoides), known for its unique morphological features, excellent meat performance, and strong environmental adaptability, is a typical representative of the gray domestic goose breeds in China [3]. Specifically, the body weight of Yan geese can reach 3.5 to 4.0 kg at 70 days of age, which is significantly higher than that of other indigenous breeds such as the Wuzong goose (2.8 kg) and Baizi goose (2.6 kg), classifying it as a medium-sized meat-type breed [4]. Due to its unique germplasm characteristics and economic value, YE was included in the “National Directory of Livestock and Poultry Genetic Resources” in 2000 and was reconfirmed in 2006, becoming one of the key protected livestock and poultry genetic resources. However, with the promotion of modern intensive farming practices and the introduction of foreign breeds, many local goose breeds, including YE, face challenges such as declining population numbers, loss of genetic diversity, and increased inbreeding risks [5]. Therefore, there is an urgent need to strengthen the protection of genomic information for these species.
Genetic diversity is the cornerstone of a species’ evolutionary potential and adaptive capacity, and it is also at the core of genetic resource conservation and sustainable utilization [6]. Traditional genetic assessments often rely on pedigree records, microsatellite markers, or mitochondrial DNA fragment analysis. While these methods have provided insights into maternal lineage origins and genetic structure to some extent, their ability to resolve lineage information is limited, making it difficult to capture genomic-level variations comprehensively [7]. Wei et al. investigated the genetic diversity of the Sanzhou-Black goose and local goose breeds from Guangdong Province using mitochondrial DNA, specifically the Cytochrome C Oxidase Subunit I gene and the Displacement Loop region. While these studies revealed maternal genetic diversity in the extranuclear genome, they were unable to reflect the overall genetic background and selection history of the nuclear genome [8]. With the advancement of high-throughput sequencing technology, WGS has become a powerful tool for analyzing genetic diversity, population structure, and adaptive evolution of species. One such tool is the iHS method, a widely used statistical approach for detecting recent positive selection [9]. The iHS identifies genomic regions under strong selection in populations by comparing the extended homozygosity of ancestral versus derived alleles around core SNP sites and calculating a standardized statistic [10]. The use of selection signal detection methods to study the evolutionary and domestication history of populations plays a crucial role in uncovering potential regions under selection for important economic traits and their genetic mechanisms. This method has been successfully applied in genetic research of multiple livestock and poultry species. It has been widely used to identify candidate genes associated with economic traits in pigs [11], sheep [12], and chickens [13]. Despite significant progress in whole-genome resequencing and selection signal analysis for many local goose breeds, systematic genomic studies on YE, an important local genetic resource, remain limited.
This study aims to utilize whole-genome resequencing technology to perform high-throughput SNP detection on the conserved population of YE. By integrating genetic diversity analysis, population structure assessment, and iHS selection signal scanning, we will comprehensively reveal its genetic characteristics and germplasm potential. This research will not only provide a scientific basis for the conservation and utilization of YE germplasm resources but also offer crucial data support for future molecular breeding and functional gene discovery in geese.
2. Materials and Methods
2.1. Ethics Approval
All animal experiments were approved by the Institutional Animal Care and Use Committee of the Yangzhou University (Approval Number: 202203621). All procedures were performed in accordance with the Regulations on the Management of Laboratory Animal Affairs (Yangzhou University, 2012) and the Standards for the Management of Experimental Practices (Jiangsu, China, 2008).
2.2. Experimental Design and Facilities
This study selected the YE from the National Waterfowl Gene Bank (off-site conservation facility, Taizhou, China) as the research subject (Figure 1). The population is maintained as a closed flock for conservation breeding; however, a rotational mating system is implemented to minimize inbreeding and preserve genetic diversity. For sample collection, we utilized pedigree records from the conservation farm to guide the selection process. Individuals were randomly selected from distinct, unrelated families within the National Waterfowl Gene Bank to maximize the representation of the breed’s genetic diversity and minimize biases arising from kinship. After restraining the geese and disinfecting their wings, venous blood samples were collected from 15 male geese at 70 days of age (15♂) using a venipuncture method. The blood samples were stored in tubes containing anticoagulants and kept at −20 °C until DNA extraction.
2.3. Genomic DNA Extraction and Library Construction
Genomic DNA Extraction and Sequencing Genomic DNA was isolated using the TIANamp Genomic DNA Kit (YDP304-03, TIANGEN Biotech, Beijing, China), strictly adhering to the manufacturer’s protocol. The integrity of the DNA was assayed via 1% agarose gel electrophoresis to preclude degradation and contamination. Purity was verified using a UV spectrophotometer to ensure an A260/280 absorbance ratio of 1.8–2.0. DNA concentration was quantified using a Qubit^®^ 2.0 fluorometer (Life Technologies, Carlsbad, CA, USA), yielding a mean concentration of 227 ± 58.2 ng/µL. Library Construction and Bioinformatic Analysis. Upon obtaining high-quality genomic DNA, samples were dispatched to Jisihuiyuan Biotechnology Co., Ltd. (Nanjing, China) for library construction. Sequencing was executed on the Illumina NovaSeq 6000 platform (Illumina, San Diego, CA, USA), employing a paired-end 150 bp (PE150) strategy with an insert size of approximately 350 bp and an average sequencing depth of 6×. Subsequent to quality control of the raw reads, sequences were mapped to the reference genome using the BWA-MEM algorithm. The reference genome of the goose breed was sourced from the National Center for Biotechnology Information database, and the website is (https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_002166845.1/, accessed on 5 January 2026). The SNP calling was implemented via the Genome Analysis Toolkit [14]. Following rigorous filtering, the resultant SNP markers were utilized for haplotype phasing and integrated haplotype score analysis.
2.4. SNP Detection and Annotation
Single-nucleotide polymorphism detection was primarily conducted using the GATK v4.1 Based on the alignment results of the clean reads to the reference genome, GATK was utilized to identify SNPs and obtain the final SNP set for subsequent statistical analyses [15]. The core detection process encompassed the following steps: (1) PCR duplicates were removed from the BWA-aligned reads using Picard’s Mark Duplicates tool to mitigate their influence. (2) Variant calling, encompassing both SNPs and Indels, was performed using GATK. (3) Variant quality scores were recalibrated using GATK’s Variant Quality Score Recalibration procedure. (4) The resulting variants were subjected to rigorous filtering using GATK to retain only reliable polymorphisms. Following the initial mapping, a Bayesian approach was implemented for SNP calling in each group. Individual SNPs were detected using the “mpileup” command. To mitigate errors in SNP calling caused by misalignment and Indels, only high-quality SNPs were retained based on the following criteria: minimum coverage depth (≥4) and maximum coverage depth (≤200), root mean square (RMS) mapping quality (≥20), minimum distance between adjacent SNPs (≥5 bp), absence of an Indel within a 3 bp window, and a maximum missing rate of 50% within each group. These retained variants were subsequently annotated, and their effects predicted using SnpEff v5.0. Variant Call Format (VCF) files, compatible with community tools (https://github.com/vcftools/vcftools, accessed on 5 January 2026), served as both input and output for this process.
2.5. Characterization of Population Genetic Structure
Following the filtration process, population statistics, including SNP counts, identity-by-state (IBS) genetic distances, inbreeding coefficients (F), and linkage disequilibrium (LD), were calculated utilizing the populations module within the Stacks software suite (v2.59). The estimation of Runs of Homozygosity based on autosomal SNPs was conducted for each individual employing PLINK (v1.9). Principal component analysis (PCA) was performed using the GCTA software (v1.92.1). LD analysis was executed using PopLDdecay (https://github.com/BGI-shenzhen/PopLDdecay, accessed on 5 January 2026) with the following specific parameters: -MaxDist 500, -Het 0.1, -Miss 0.3, and -OutPairLD 5. The chromosomal distribution of LD was visualized via LD decay plots, which characterize the rate of LD decay relative to physical or genetic distance [16]. To reconstruct the demographic history, the PSMC model (v0.6.5-r67) was employed to infer historical fluctuations in effective population size (Ne) based on genomic fragments exhibiting varying densities of heterozygous sites. Furthermore, a maximum likelihood population tree was constructed using the available genetic data to delineate migration events across the phylogenetic topology (specifically for scenarios involving four or more populations).
2.6. Detection of Selection Signatures Using the iHS Method
To elucidate potential positive selection regions across the YE genome, selection signatures were assessed using the iHS method [9]. The analysis utilized SNP data generated from the whole-genome resequencing efforts. Initially, the raw SNP data underwent stringent quality control (QC). To ensure the accuracy and robustness of subsequent analyses [17], loci were filtered out if they exhibited a minor allele frequency (MAF) less than 0.05, a missing rate exceeding 10%, or a significant deviation from Hardy–Weinberg equilibrium (HWE; p < 1 × 10^−6^). The filtered high-quality data were then subjected to haplotype phase inference using Beagle v5.4 software. This inference was informed by homologous sequence alignment results from the ancestral Wild Swan goose (Anser cygnoides). Upon obtaining the phased haplotypes, the extended haplotype homozygosity (EHH) was calculated for the haplotype set carrying both the ancestral and derived alleles, with each SNP locus serving as the core reference point. Subsequently, the EHH values were integrated over distance [18] to derive the integrated haplotype homozygosity (iHH) for each respective allele, calculated according to the following formula:
Here, EHHC denotes either the ancestral or derived allele. The integration proceeds until the EHH value decays to a threshold of 0.05 or reaches the chromosomal terminus. Based on the computed iHH values for both allelic states, the unstandardized iHS statistic (uniHS) is derived as follows:
Given the intrinsic dependency of iHS values on allele frequencies, a standardization procedure was implemented to mitigate this bias. Specifically, all identified loci were stratified into 20 bins based on their derived allele frequencies (DAF). Within each frequency bin, the uniHS scores were normalized to yield the final standardized iHS statistic, calculated as follows:
where and denote the mean and standard deviation, respectively, of the uniHS values within the corresponding frequency bin. To visualize and validate the selection signatures, the distribution of iHS scores was examined, and Manhattan plots were constructed employing the rehh v3.2.1 within the R-statistical environment. Genomic regions exhibiting elevated selection signals were subsequently annotated and subjected to functional enrichment analysis. Gene Ontology (GO) functional annotation and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses were conducted for the candidate genes utilizing the metascape database [19]. Significantly enriched GO terms and KEGG pathways were identified based on a Benjamini–Hochberg-corrected p-value threshold.
3. Results
3.1. Quality Control and Assessment
3.1.1. Quality Control and Genome Alignment
In this study, we conducted whole-genome resequencing of the local goose population to characterize its genomic variations. The sequencing data exhibited high integrity and stability following rigorous quality control protocols (Table S1). The average clean data yield per sample was 7.02 Gb, with mean Q20 and Q30 scores of 97.33% and 91.04%, respectively. The GC content was uniform across samples, averaging 43.38%. Subsequent alignment to the reference genome demonstrated excellent compatibility (Table S2), achieving a mean alignment rate of 98.30% and an average sequencing depth of 6.43×. These metrics confirm that the data quality satisfies the requirements for reliable genome-wide SNP detection and population genetic analyses.
3.1.2. Identification and Annotation of SNP Variants
Genome-wide SNP variant detection was performed on the sample cohort (Table 1). The results revealed discernible variation in SNP abundance across individuals. The mean SNP count stood at 4,429,082, ranging from 4,249,848 (YE.1) to 4,868,607 (YE.5). Regarding variation types, both transitions (Ti) and transversions (Tv) were identified. The transition-to-transversion ratio (Ti/Tv) remained consistent across samples, ranging between 2.45 and 2.49 with a mean value of 2.47. This ratio conforms to the canonical patterns observed in most vertebrate species, attesting to the biological validity of the identified variants and their consistency with established genomic norms. In terms of zygosity, the average counts of heterozygous and homozygous SNPs were 2,185,710 and 2,243,372, respectively. The slight preponderance of homozygous variants over heterozygous ones aligns with the characteristic genomic stability of the YE. Notably, sample YE.5 exhibited a distinctly elevated heterozygosity count (2,517,574), potentially indicative of greater individual genetic diversity, whereas sample YE.6 displayed the lowest count (1,825,696).
3.1.3. Genomic Distribution and Functional Annotation of SNPs
Subsequent analyses were conducted based on the statistical distribution of SNP annotations across the sample cohort (Table 2). The distribution of SNPs across all samples was predominantly concentrated within intergenic regions. With a mean count of 1,167,101, these variants constituted a substantial proportion of the total SNP dataset, significantly exceeding the counts observed in other functional genomic elements. Regarding functional coding regions, the average counts of synonymous and non-synonymous mutations in exons were 33,677 and 11,735, respectively. The prevalence of synonymous mutations suggests that the majority of coding variants are likely neutral, potentially exerting a minimal impact on phenotypic expression. Conversely, intronic regions exhibited a consistently high frequency of variants, with a mean of 1,742,040 SNPs. This observation highlights the substantial genetic diversity harbored within the intronic landscape of the YE genome, which may hold implications for alternative splicing regulation. In the proximal regulatory regions, the average SNP counts for upstream and downstream regions were 62,680 and 68,101, respectively, indicating a considerable degree of variation within these gene regulatory elements. Notably, sample YE.5 displayed a significantly elevated upstream SNP count (77,569) compared to other individuals, potentially correlating with enhanced diversity within its gene regulatory sequences. In contrast, sample YE.1 presented a lower upstream SNP count of 57,391, which may imply relative stability in its gene expression regulatory mechanisms.
3.2. Genetic Structure Analysis of Population
3.2.1. Population Structure and Demographic History
To characterize the genetic diversity and population structure of the protected YE, we conducted IBD distance analysis, LD decay analysis, G-matrix visualization, and inference of Ne fluctuations using the PSMC model (Table 3 and Figure 2). The results indicate that the mean IBD genetic distance across the population was predominantly distributed between 0.22 and 0.26, suggesting a relatively close genetic relationship at the overall population level. However, samples YE.5 and YE.9 exhibited an elevated IBD distance of 0.31 relative to all other individuals, a value substantially exceeding the population mean. This suggests a potential origin from a relatively distinct or independent genetic background. Conversely, individuals such as YE.1, YE.2, YE.3, and YE.4 displayed lower inter-individual genetic distances, ranging from 0.22 to 0.24, indicating the formation of a closely related kinship unit within the population.
The LD decay curve for the YE exhibited a rapid decline, with the r^2^ value dropping below 0.3 at approximately 50 kb (Figure 2A). This observation is indicative of a low level of linkage disequilibrium within the population and a high frequency of recombination events. Consequently, this suggests a high degree of genetic diversity and sufficient gene flow among populations, implying that a relatively random mating scheme has been maintained during the artificial rearing and conservation process. Moreover, the visualization of the G-matrix (Figure 2B) revealed a largely uniform genetic relationship among the majority of individuals in the YE. Only a limited number of pairs exhibited moderate correlation, with no evidence of clear genetic differentiation or pronounced kinship clustering being observed. The PSMC model inference (Figure 2C) demonstrated that the Ne of the YE has undergone periodic fluctuations, characterized by two major bottleneck events. Following the Last Glacial Period (LGP, 2.3 × 10^4^1.1 × 10^5^ years ago), a relatively warm geological epoch, Ne initially showed a sustained decrease before gradually rebounding. However, Ne experienced a sharp decline to its minimum during the Last Glacial Maximum (LGM, 1.52.1 × 10^4^ years ago), a period of severe cold, and subsequently stabilized to a plateau around 10 thousand years ago. This demographic history is potentially attributable to a combination of factors, including artificial domestication, population migration events, or increased selection pressure during modern breeding practices.
3.2.2. Assessment of Genomic Inbreeding and Heterozygosity
To further provide a quantitative assessment of the population’s genetic diversity and inbreeding levels, we analyzed Runs of Homozygosity (ROH), heterozygosity, and inbreeding coefficients (Figure 3). The ROH analysis (Figure 3A,B) showed that the genomic burden of homozygosity is moderate, with the total ROH length per individual varying within a reasonable range. The observed heterozygosity (Ho) and expected heterozygosity (He) distributions (Figure 3C,D) indicated a balanced genetic variability within the population. Crucially, the genomic inbreeding coefficient (Fis) for most individuals was centered around zero (Figure 3E), confirming that the population does not suffer from severe inbreeding depression. These metrics collectively suggest that the current conservation strategy, utilizing a rotational mating system, has been effective in maintaining the genetic diversity of the YE.
3.3. Detection of Selection Signatures
3.3.1. Population-Level Selection Signal Detection
To elucidate the genomic footprints of recent positive selection within the YE, we conducted a genome-wide scan utilizing the iHS method. Given that the reliability of detected selection signals is contingent upon the statistical robustness of the calculated scores, we first characterized the distributional properties of the iHS values and their associated significance levels (Figure 4), The genome-wide distribution of iHS values approximated a standard normal distribution, exhibiting a mean centered near zero (μ = 1) and a standard deviation of approximately one (σ = 0). The fitted Gaussian curve (red line) demonstrated a high degree of congruence with the observed frequency histogram (Figure 4C,D). Furthermore, the quantile–quantile (Q-Q) plot (Figure 4E) indicated that the observed iHS values closely adhered to the theoretical normal distribution line. This corroborates the hypothesis that the vast majority of SNP loci represent a state of genomic neutrality. In contrast, the distribution of significance values exhibited a pronounced right-skewed pattern (skewness = 4.76) characterized by a long tail (Figure 4F). This distribution is quintessential for statistical significance metrics, where extreme values denote potential targets of selection. Leveraging the strict normality of the iHS distribution, SNPs falling at the extreme tails were identified as potential candidate loci indicative of selective sweeps (Figure 4G, Table S3). Specifically, employing a threshold of |iHS| > 2, we identified multiple candidate genes within these selection signal regions, including PLPP4, GHRL, IFT122, CDK1, TP53, STAT3, MAPK3, AKT1, LOC106040672, and LOC125182034, etc. Notably, this set includes several uncharacterized genes (LOC-prefix) that currently lack formal nomenclature in the National Center for Biotechnology Information database. Functionally, these genes are intimately associated with critical traits in the YE, including growth and development, lipid deposition, and disease resistance and stress resilience.
3.3.2. Functional Annotation and Pathway Enrichment Analysis of Candidate Genes
To delineate the biological functions of the putatively selected candidate genes in the YE, we performed GO and KEGG enrichment analyses (Figure 5). The GO analysis categorized candidate genes into Biological Process (BP), Cellular Component (CC), and Molecular Function (MF) classes (Figure 5A), with the top five most significantly enriched terms in each category summarized (Table 4). GO enrichment results within the BP category revealed that candidate genes were significantly enriched for terms related to cellular organization and immune response, such as ‘actin cytoskeleton organization’ and ‘T cell proliferation’. The chord diagram (Figure 5B) illustrates that genes such as VCL, KIT, and PHACTR1 are intimately linked to these processes. In the MF category (Figure 5C), candidate genes were primarily enriched for ‘signaling receptor activity’ and ‘molecular transducer activity’. Key genes such as ESR1, THRB, and NR2F2 were identified in these categories, suggesting that hormone regulation and signal transduction constitute important targets of selection. In the CC class (Figure 5D), significant enrichment was observed for ‘Golgi membrane’, ‘synapse’, and ‘endomembrane system’. Notably, genes such as EGFR, SH3GLB1, and TENM2 were enriched within these cellular structures, indicating that cellular signaling and structural integrity represent potential adaptive directions. The KEGG pathway enrichment analysis (Figure 5E) further revealed significant enrichment of candidate genes in metabolic and signaling pathways, with the top ten most significantly enriched pathways detailed (Table 5). The most highly enriched pathways included ‘Glycerophospholipid metabolism’ (p < 0.01), ‘Purine metabolism’ (p < 0.01), and ‘Inositol phosphate metabolism’ (Figure S1). The enrichment of genes such as PLA2G15, PLPP4 (in Glycerophospholipid metabolism), ENPP3, and ADK (in Purine metabolism) underscores the critical role of lipid and nucleotide metabolism in the adaptation of the YE. Specifically, the ‘Purine metabolism’ pathway demonstrates that the identified candidate genes (marked in red) play crucial roles within the metabolic network (Figure 5F). Furthermore, pathways related to environmental information processing, such as ‘Neuroactive ligand-receptor interaction,’ also showed enrichment, involving genes like HTR4 and GRIA2. This may confer advantages for the population’s behavioral and physiological adaptation.
4. Discussion
4.1. Genomic Landscape of Single Nucleotide Variations
Whole-genome resequencing has established itself as a pivotal instrument for elucidating the genetic architecture of livestock populations and evaluating germplasm characteristics. In the present study, we identified an average of 4.43 million SNPs across the 39 chromosome pairs of the YE. In contrast, Wen et al. [20], in their genomic investigation into the origins of domestic geese, reported 24.89 million SNPs—a figure significantly exceeding the count observed in our study. This discrepancy is likely attributable to two primary factors. First, the relatively lower sequencing depth employed in this study (6.4× < 11.2×) may have constrained the detection of low-frequency variants. Second, substantial divergence exists between the two study populations regarding their genetic backgrounds and breeding histories.
Additionally, the transversion-to-transversion (Ti/Tv) ratio in geese aligns with the average levels observed in other goose breeds (2.46 > 2.40), yet remains within the typical range for waterfowl species. This reflects a conservative preference in mutation mechanisms, where transversions are less likely than transitions [21]. The genomic distribution of SNPs parallels findings from prior studies, characterized by a predominant accumulation of variants within intergenic and intronic regions [22]. This accumulation pattern in non-coding regions is consistent with the neutral theory of molecular evolution, which predicts a higher tolerance for mutations in sequences under relaxed functional constraints. Intriguingly, we observed a ratio of synonymous (33,677) to non-synonymous (11,735) SNPs of approximately 2.87. This ratio is significantly higher than that derived from Huoyan goose data (2.35) [23]. Such a discrepancy implies that the YE goose may have undergone stronger purifying selection, effectively purging deleterious non-synonymous mutations to preserve protein structural stability. Although rigorous quality control was applied, the potential for phasing uncertainty due to limited sequencing depth remains a limitation. Future studies with higher sequencing depth are therefore recommended to verify these observations.
4.2. Analysis of Genetic Structure in the Conservation Population
Further elucidation of the target population’s genetic structure is fundamental for assessing conservation efficacy and formulating effective breeding strategies. Our IBD distance and G-matrix analyses revealed a relatively tight and uniform genetic relationship within the YE (average IBD = 0.24), with no evidence of significant family stratification. This outcome suggests that the exsitu conservation farm has been successful in maintaining population size and implementing random mating protocols, thereby efficaciously mitigating severe genetic drift or subpopulation differentiation often associated with closed, small-population breeding. Intriguingly, individuals YE.5 and YE.9 exhibited an elevated IBD distance (0.31) relative to the rest of the cohort. This finding parallels observations by Ouyang et al. [24] in their studies of Chinese indigenous goose evolution, where certain individuals displayed evidence of admixture or distinct evolutionary pathways. This difference in genetic background may stem from the retention of unique ancestral haplotypes, which should be a key focus for prioritization and utilization in future selective breeding programs.
The rate of LD decay is a crucial metric reflecting population genetic diversity and recombination history. Our study showed that the YE experienced a rapid decay, with r^2^ dropping below 0.3 at approximately 50 kb. This decay rate is comparable to that reported for other Chinese indigenous goose breeds (Anser cygnoides) [25,26], suggesting that the YE breeding history involved frequent genetic recombination, thereby sustaining a high level of genetic diversity. Furthermore, considering the potential haplotype phasing uncertainties at 6× sequencing depth, we chose to use the PSMC method to avoid biases associated with haplotype-dependent methods. We recommend that future studies utilize high-depth sequencing data and employ these newer algorithms to decipher the recent population dynamics trajectories of geese. PSMC analysis of the Ne revealed two major historical bottleneck events. Specifically, during the LGM (15 to 21 thousand years ago), Ne declined sharply to its minimum; this finding corroborates the notion that habitat loss and resource scarcity due to global cooling were the primary constraints limiting the expansion of ancient avian populations [27]. Subsequently, the Ne entered a stabilization phase (plateau) approximately 10 thousand years ago. This timing coincides with the Neolithic period, when humans initiated the domestication of poultry globally. Compared to wild counterparts, domesticated geese benefited from the stability of human protection, buffering them against intense natural selection pressures [28]. This finding not only corroborates the shared historical background of goose (Anser cygnoides) domestication in China but also provides crucial genomic evidence for tracing the ancestral origins of the YE breed.
4.3. Genomic Signatures of Positive Selection and Their Biological Implications
In the present study, we leveraged whole-genome SNP data to conduct a genome-wide scan for selection signatures within the conservation population of YE, employing the iHS method. The iHS statistic offers a distinct advantage in detecting ongoing or recent positive selection signatures that have not yet reached fixation. Consequently, it is particularly efficacious in capturing the genomic “selection footprints” imprinted during the processes of domestication and artificial breeding [29]. By applying a defined screening threshold, we successfully identified a suite of candidate genes subject to potential selection, including PLPP4, GHRL, IFT122, CDK1, TP53, STAT3, MAPK3, and AKT1. Functional analysis revealed that these genes are predominantly enriched in biological processes encompassing growth and development, lipid metabolism, and immune response. These findings align closely with the distinguishing germplasm characteristics of the YE as a superior meat-type breed, namely, its rapid growth rate, large body size, and robust stress resilience.
First, growth traits constitute a primary target of intense artificial selection during the domestication of poultry. Within the identified genomic regions harboring strong selection signatures, we detected multiple pivotal genes intimately associated with the growth axis and cellular proliferation, notably GHRL, AKT1, and MAPK3. Specifically, the GHRL encodes the ghrelin–obestatin preproprotein, a precursor to ghrelin. As the endogenous ligand for the growth hormone secretagogue receptor (GHS-R), it plays a central orchestration role in regulating feed intake behavior, maintaining energy homeostasis, and stimulating growth hormone secretion. Previous investigations by Li et al. [30] into the growth patterns of dwarf chickens demonstrated that GHRL functions as an appetite modulator, influencing nutrient absorption by regulating feed intake. Consequently, the enrichment of GHRL variants in the YE genome may be linked to the breed’s enhanced foraging capacity and somatic development. Furthermore, AKT1 and MAPK3 serve as canonical components of the PI3K-Akt and MAPK signaling pathways, respectively. These pathways are integral to cellular proliferation, differentiation, and the process of skeletal muscle myogenesis [31,32]. The identification of these key signaling members suggests that genomic regions governing muscle development in the YE have undergone significant adaptive evolution during long-term selective breeding. This evolutionary process has likely established the genetic architecture underpinning the breed’s superior meat production performance.
Second, selection signatures associated with lipid metabolism illuminate the potential genetic mechanisms underlying the meat quality traits of the Yan goose. The identified candidate genes, including PLPP4, SAMD8, and LPIN1, are integral to pathways such as glycerophospholipid metabolism and lipid synthesis, playing pivotal roles in these metabolic processes [33,34]. Waterfowl, particularly geese, possess a distinctive capacity for adipose deposition (exemplified by hepatic lipid metabolism). Consequently, the positive selection acting on these lipid-related genes is likely associated with intramuscular fat deposition or body fat distribution patterns, thereby influencing meat flavor and texture. Synthesizing these findings with the KEGG enrichment results, the significant enrichment of ‘Purine metabolism’ and ‘Inositol phosphate metabolism’ pathways further implies a distinct metabolic plasticity in energy homeostasis and substrate turnover. This adaptation likely underpins the YE’s tolerance to coarse feed and its high feed conversion efficiency.
Finally, this study substantiated the environmental adaptability and disease resistance of the YE at the genomic level. The identified TP53 and STAT3 genes serve as critical mediators of cellular stress responses and immune regulation [35]. As elucidated by Baliakas et al. [36] in their investigation of the TP53 tumor suppressor, p53 functions as a “master switch” bridging cellular stress and appropriate cellular or multicellular responses. Similarly, STAT3 acts as a central transcription factor within the JAK-STAT signaling pathway, directly facilitating cytokine signal transduction and immune cell activation [37]. The positive selection acting on these immunity-related genes likely signifies the evolution of adaptive genetic mechanisms conferred to the YE. These mechanisms have evolved to counteract pathogen invasion and cope with environmental stressors throughout their long natural history and domestication.
4.4. Limitations of the Study
While this study provides the first comprehensive genomic characterization of the Yan goose, we acknowledge certain limitations. First, the sample size (N = 15) is relatively modest, which is a common constraint in studies involving specific conserved indigenous breeds. Although the iHS method serves as a robust statistic for detecting signatures of recent positive selection, its application to smaller datasets may introduce statistical noise or reduce the power to detect low-frequency variants. To mitigate these potential biases, we implemented stringent quality control measures and conservative detection thresholds; nevertheless, the possibility of false positives cannot be entirely ruled out. Second, our sampling strategy was restricted to a single ex situ conservation population. While this population captures the core germplasm of the breed, future investigations should aim to incorporate larger cohorts from diverse agroecological zones and conduct comparative genomic analyses with other domestic goose breeds and wild Anser cygnoides (Anser cygnoides). Such endeavors will be instrumental in further validating the specificity of the identified selection signatures and refining the evolutionary trajectory of the breed.
5. Conclusions
In this study, we employed whole-genome resequencing technology to provide the first systematic elucidation of the genomic variation profiles and evolutionary history characterizing the conserved YE breed. Our findings substantiate the rich genetic diversity and distinct evolutionary trajectory of this breed, while clearly reconstructing the historical demographic sculpting exerted by climatic fluctuations during the Quaternary glaciation. Crucially, our analyses successfully pinpointed key candidate genes intimately associated with growth and development (GHRL, AKT1, and MAPK3), lipid metabolism (PLPP4, SAMD8, and LPIN1), and immune adaptation (TP53, STAT3). These discoveries decipher the molecular mechanisms underlying the YE’s distinguishing germplasm attributes, specifically its large body size, superior meat quality, and robust stress resilience at the genomic level. Furthermore, this work establishes a robust foundation for future endeavors in molecular marker-assisted selection and genomic breeding programs targeting economically important traits in domestic geese.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Eda M. Itahashi Y. Kikuchi H. Sun G. Hsu K.H. Gakuhari T. Yoneda M. Jiang L. Yang G. Nakamura S. Multiple lines of evidence of early goose domestication in a 7000-y-old rice cultivation village in the lower Yangtze River, China Proc. Natl. Acad. Sci. USA 2022119 e 211706411910.1073/pnas.211706411935254874 PMC 8944903 · doi ↗ · pubmed ↗
- 2Chen G. Liu J. Xu Q. The current situation and prospects of the protection and utilization of goose genetic resources in China China Poult. Ind. Guide 202239612(In Chinese)
- 3Li H. Tu Y. Tang Q. Chen K. Genetic Diversity of 6 Key Protected Local Goose Breeds in China J. Sichuan Agric. Univ.200523466469(In Chinese)
- 4Qi S. Wu T. Wu H. Liang Y. Zhao W. Zhang Y. Xu Q. Chen G. Whole-genome resequencing reveals the population structure and domestication processes of endemic endangered goose breeds (Anser cygnoides)Poult. Sci.202510410500410.1016/j.psj.2025.10500440088535 PMC 11957519 · doi ↗ · pubmed ↗
- 5Chen X. Lu J. Yu F. Protection of Livestock and Poultry Genetic Resources Abroad and Its Implications for China Genetics 202345545552(In Chinese)
- 6Yu S. Liu Z. Li M. Zhou D. Hua P. Cheng H. Fan W. Xu Y. Liu D. Liang S. Resequencing of a Pekin duck breeding population provides insights into the genomic response to short-term artificial selection Gigascience 2023201201610.1093/gigascience/giad 01636971291 PMC 10041536 · doi ↗ · pubmed ↗
- 7Zhang X. He Y. Zhang W. Wang Y. Liu X. Cui A. Gong Y. Lu J. Liu X. Huo X. Development of Microsatellite Marker System to Determine the Genetic Diversity of Experimental Chicken, Duck, Goose, and Pigeon Populations Biomed. Res. Int.202114885188810.1155/2021/8851888 PMC 782267033511214 · doi ↗ · pubmed ↗
- 8Zhang X. Qiu G. Huang D. Ma Z. Wei J. Zhong R. Li R. Huang M. Gou J. Ye F. Genetic assessment and Identification of genes related to characterization of Guangdong local goose breeds based on modern and historical genomes Commun. Biol.20258113210.1038/s 42003-025-08550-640738957 PMC 12310927 · doi ↗ · pubmed ↗
