Genome-Wide Association Study of Gross Hair Weight and Hair Length Traits in Different Body Regions of Tianzhu White Yak
Yicheng Liu, Xuedong Qi, Yongfu La, Xiaoming Ma, Wenwen Ren, Guowu Yang, Zhenyu Zhang, Min Chu, Xiaoyun Wu, Xian Guo, Shaobin Li, Wanzhen Qi, Chunnian Liang

TL;DR
This study identifies genetic loci and genes associated with hair weight and length in Tianzhu White Yak, providing insights into the genetic basis of hair production traits.
Contribution
The study reports genome-wide significant loci and candidate genes for hair traits in Tianzhu White Yak, enhancing understanding of their genetic mechanisms.
Findings
Chromosome 6 had the highest number of significant SNP loci linked to hair traits.
Candidate genes like FGF5, CFAP299, and PRDM8 were found to influence hair weight and length traits.
Functional analysis showed enrichment in cytoplasm and the MAPK signaling pathway.
Abstract
This study systematically measured gross hair weight and hair length traits across five body regions of 759 Tianzhu White Yak individuals. The BSL trait exhibited moderate heritability, while the BL trait demonstrated high heritability (h2 = 0.450). All other traits showed low heritability. GWASs were conducted using whole-genome resequencing data comprising 22,566,255 high-quality SNP loci. The MLM model identified 519 genome-wide significant loci and 767 chromosome-wide significant loci. Chromosome 6 harbored the highest number of significant SNP loci, while the remaining significant loci were distributed across multiple autosomes. Strong long-range linkage disequilibrium (r2 > 0.7) was observed between numerous significant SNPs on chromosome 6 associated with Gw and HL traits. A total of 73 candidate genes were annotated, including FGF5, CFAP299, and PRDM8. Functional enrichment…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8- —Central Guidance Funds for Local Science and Technology Development Projects
- —Modern Beef Yak Industry Technology System
- —Innovation Project of Chinese Academy of Agricultural Sciences
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHair Growth and Disorders · Skin and Cellular Biology Research · Wnt/β-catenin signaling in development and cancer
1. Introduction
Animal hair is a significant livestock product and vital raw material for the textile industry. The length of mammalian hair is the result of long-term natural and artificial selection, encompassing its adaptability to environmental changes. Tüfekci et al. [1] demonstrated that primary wool production traits are closely linked to climatic conditions and regulated by intrinsic factors, including breed genetics, age, sex, nutrition, and shearing frequency. Baba et al. [2] further confirmed that these traits are influenced by a combination of genetic (breed), physiological (age), management (collection site, shearing interval), and environmental factors, as well as wool color. For instance, cattle in tropical regions exhibit the slick-hair phenotype, regulated by the slick-hair gene, which shortens hair length and smooths skin to maintain lower rectal temperatures and enhance perspiration efficiency in hot, humid climates [3]. Ding et al. [4] identified a longer hair follicle growth phase in long-haired versus short-haired rabbits; via transcriptome sequencing, they characterized key candidate genes involved in follicle development, lipid metabolism, and apoptosis, providing a critical molecular basis for studying hair follicle cycle regulation.
Genome-wide association studies (GWASs) are extensively employed in livestock genetic breeding research to screen InDel markers, candidate genes, and quantitative trait loci (QTL) regions significantly linked to target traits [5,6,7]. This approach reveals relationships between phenotypes and genotypes and identifies potentially significant associated genes influencing important economic traits within populations. The findings can be effectively utilized in livestock breeding programs to enhance economic benefits and shorten breeding cycles. In recent years, numerous studies have identified key genes influencing important economic traits in yak, which have been applied in yak genetic breeding. The findings of Wang et al. [8] indicate that seven single-nucleotide polymorphism (SNP) loci exhibit significant associations with body weight in the Maowa yak. Further analysis annotated these SNPs to three functional genes associated with body weight: MFSD4, LRRC37B, and NCAM2. Liu et al. [9] conducted a GWAS analysis on 94 yak individuals from six provinces/regions, including Sichuan and Qinghai, examining traits such as height, body length, and weight. They identified six SNP loci significantly associated with height and further annotated these to four candidate genes, including FXYD6 and SOHLH2. In the study of Zebu cattle, Santana [10] utilized the Illumina Bovine SNP 50 array to genotype 720 male Zebu, identifying genes like PDE4B near the rs42518459 locus on chromosome 3, which play crucial roles in regulating various productive traits.
The yak (Bos grunniens), a species unique to the Qinghai–Tibet Plateau, has evolved to thrive in its harsh environment, characterized by high altitudes, low oxygen levels, and short grass growth cycles, allowing it to efficiently utilize the forage resources of alpine grasslands [11]. As a central animal in pastoralist livelihoods, yaks provide local herders with economic products such as milk, meat, hides, and fuel, constituting a vital source of income [12]. The Tianzhu White Yak is a unique local breed found in Tianzhu Tibetan Autonomous County, Wuwei, Gansu Province, mainly thriving in the Xidatan, Aiyangou grasslands, and Zhaqixiu Longtan regions. This breed is distinguished by its thick, plentiful fur, granting it remarkable resistance to cold. Most individuals boast pure-white coats, pinkish skin, and black spots reminiscent of dairy cattle [13]. The Tianzhu White Yak maintains a relatively large population with stable genetic traits. Its average hair yield is approximately 3.62 kg, with a maximum recorded at 6 kg. Underhair yield averages around 0.4 kg, while tail hair yields approximately 0.62 kg [14]. The Tianzhu White Yak not only provides economic benefits to herders through meat and milk production, but its hair also serves a highly practical purpose. Research reveals that its hair boasts considerable functional, economic, and cultural value. Adapted to the frigid high-altitude environment, the yak’s hair forms an effective thermal insulation barrier, minimizing heat loss and enabling the animal to thrive in extreme weather conditions [15]. The distinctive aesthetic appeal and superior fiber characteristics of Tianzhu White Yak hair confer substantial utility across industries, including cultural apparel, carpet manufacturing, and the production of cold-weather protective gear [16]. Consequently, investigating the traits of gross hair weight and hair length in Tianzhu White Yak is of paramount importance and constitutes a vital approach for genetic improvement of the breed. Although genome-wide association studies (GWASs) have been employed to identify genetic variants influencing growth traits in yak, research on hair fiber-related traits remains insufficient [17,18]. Existing studies have primarily focused on single-hair traits or have been constrained by issues such as insufficient marker density [19]. A gap exists in the field of GWAS utilizing whole-genome resequencing data to investigate hair weight and length across different body regions. Furthermore, only a limited number of studies have validated SNP loci, which has hindered their practical application in molecular breeding [20]. This study conducted genome-wide association analyses on hair length traits at five different body regions and hair weight in 759 Tianzhu White Yaks. Our objective is to identify functional candidate genes, elucidate their association with wool production traits, and establish a molecular foundation for enhancing wool production in Tianzhu White Yak through selective breeding. We hypothesize that a combination of shared genetic factors and region-specific genetic factors contributes to variations in hair weight and hair length across different body regions in the Tianzhu White Yak. Through genome-wide association studies (GWASs) based on whole-genome resequencing, we aim to identify genetic variants and candidate genes associated with these hair production traits.
2. Materials and Methods
2.1. Experimental Materials
The experimental samples for this study were collected from Xiamiangou Village, Dachaigou Town, Tianzhu Tibetan Autonomous County, Wuwei City, Gansu Province. Samples were collected in mid-June 2025. All phenotypic traits were measured in vivo by the same team of personnel on 759 healthy Tianzhu White Yak individuals from the same breeding herd. Each trait was independently measured twice per individual, with the average used for subsequent statistical analysis. These animals ranged in age from 1 to 10 years, exhibited similar body conformation, and were raised under consistent husbandry conditions. The herd comprised 564 females and 195 males. Five milliliters of whole blood were collected from each yak via venipuncture into clean, anticoagulant-treated vacuum blood collection tubes. The samples were mixed thoroughly and stored at −20 °C in a low-temperature freezer for subsequent DNA extraction.
2.2. Phenotypic Data Measurement and DNA Extraction from Blood Samples
Using a ruler and electronic scale, measure the tail hair length, skirt hair length, body hair length, head hair length, back hair length, and gross hair weight of the Tianzhu White Yak. Prior to GWASs, the overall distribution of phenotypic traits was assessed using descriptive statistics and distribution plots. Only a few trait datasets showed slight deviations from normal distribution without extreme skewness; subsequent analyses proceeded directly with raw phenotypic values. Genomic DNA was extracted from Tianzhu White Yak blood according to the manufacturer’s instructions of the TIANamp Genomic DNA Kit (Tiangen Biotech, Beijing, China). Prior to use, add anhydrous ethanol to the washing buffer PWB. Remove yak blood samples from −20 °C storage and allow to thaw at room temperature. Transfer 600 μL of blood to a centrifuge tube, add 600 μL cell lysis buffer CL, centrifuge to remove supernatant and retain pellet (repeat once), then add 200 μL buffer GS and mix thoroughly. Add 200 μL buffer GB and 20 μL Proteinase K premix. Incubate at 56 °C in a metal bath for 10 min until the solution becomes clear. Allow to stand at room temperature for 2–5 min, then add 350 μL buffer BD and mix thoroughly. Transfer the solution and pellet to the adsorption column CG2, centrifuge and discard the supernatant. Add sequentially 500 μL and 600 μL of Buffer GDB, centrifuge and discard the supernatant; centrifuge again for 2 min and discard the supernatant; air-dry at room temperature; transfer the column to a 1.5 mL centrifuge tube, add 50 μL of Elution Buffer TB dropwise, centrifuge after standing at room temperature, then add 25 μL dropwise and repeat the procedure to obtain 75 μL of DNA product. DNA purity was assessed by measuring the OD260/280 ratio using a NanoDrop spectrophotometer (Allsheng Instruments, Hangzhou, China). DNA concentration was precisely determined using a Qubit fluorometer (Thermo Fisher Scientific, Shanghai, China) (70 to 145 ng/μL).
2.3. Genotyping and Quality Control
DNA samples that pass quality control undergo library preparation through the following steps: restriction enzyme digestion, random fragmentation, end repair, A-tailing, sequencing adapter ligation, purification, and ligation. The constructed library is sequenced using the DNBSEQ-T7 high-throughput sequencer. Raw image data files obtained from high-throughput sequencing undergo base-calling analysis to convert them into raw sequencing reads. Upon acquisition of the data, to ensure the quality of subsequent information analysis, quality assessment and filtering of the data were performed using SOAPnuke software (v.2.2.1) [21]. The filtered high-quality sequences were then aligned against the reference genome (LU_Bosgru_v3.0) using the BWA software (v.0.7.17) [22] against the reference genome (LU_Bosgru_v3.0). The resulting alignments were converted into sorted BAM format files using Samtools (v.1.9) [23]. Subsequently, the bamqc module of Qualimap2 (v.2.2.2-dev) [24] was employed to perform statistical analysis and quality assessment on the BAM files. Building upon this, GATK (v.4.2.6.1) [25] was employed to detect single-nucleotide polymorphisms (SNPs) under default parameters. The resulting SNPs were functionally annotated and classified using SnpEff (v.5.0) [26]. To enhance data quality, stringent quality control was performed using Plink (v.1.90) with the following criteria: --geno 0.2 (filtering out variants with more than 20% missing genotype data), --hwe 1 × 10^−5^ (Hardy–Weinberg equilibrium test p-value > 1 × 10^−5^), --maf 0.01 (filtering out variants with minor allele frequencies below 0.01), and --mind 0.3 (excluding individuals with genotype missing rates exceeding 30%). This process yielded 22,566,255 high-quality SNP sites for subsequent analysis.
2.4. Principal Component Analysis, Phylogenetic Analysis, and Linkage Disequilibrium Analysis
To reduce the false positive rate, principal component analysis (PCA) was performed using PLINK software (v.1.90). Population stratification was assessed by examining the clustering patterns among samples. To assess DNA sequence similarity among individuals, the GCTA software (v.1.94.0) was used to construct a kinship G matrix, which was incorporated into the data processing model. Simultaneously, PopLDdecay (v.1.90) was applied to the quality-controlled genotype data to perform linkage disequilibrium (LD) decay analysis.
2.5. Estimation of Genetic Parameters
Using SNP data processed with PLINK, GCTA software (v.1.94.0) GREML module was employed to estimate heritabilities for each trait and genetic correlations between traits. A genomic-relationship matrix (GRM) was constructed based on chromosomal SNPs, and SNP heritabilities for each trait were estimated via univariate models. Subsequently, a bivariate model was employed to estimate the additive genetic correlation coefficient and residual correlation coefficient between traits. The models and calculation formulas used are as follows:
where y is the phenotype vector, Xβ is the fixed effect (age and gender), g is the SNP additive genetic effect, and e is the residual; is the genetic variance and is the residual variance; is the additive genetic covariance between two traits, and are the additive genetic variances for traits 1 and 2, respectively; is the residual covariance between two traits, and and are the residuals for traits 1 and 2, respectively.
2.6. Genome-Wide Association Study (GWAS)
This study employed mixed linear models (MLM) from the rMVP [27] package in R software (v.4.2.2) to perform trait association analysis on SNP loci retained after Plink quality control. Gender and age were incorporated into the mixed linear model as a fixed effects combination. The Bonferroni correction method reduces false positives from multiple tests in GWAS analysis. It achieves this by dividing the conventional significance threshold of 0.05 by the number of marker loci, thereby controlling the cumulative probability of Type I errors within 0.05 [28]. Therefore, this study sets the genome-wide significance threshold at 0.05/22,566,255 = 2.049 × 10^−9^ and the chromosome-wide significance threshold at 1/22,566,255 = 4.098 × 10^−8^ for screening potentially significant SNP loci. The mixed model employed is as follows:
where y denotes the phenotypic trait, X is the indicator matrix for fixed effects, and α is the vector of estimated fixed-effect parameters (including age and sex); Z is the single nucleotide polymorphism matrix, β represents the SNP effects, Q is the covariate matrix, γ denotes the covariate effect parameters corresponding to the first five principal components (PC1–PC5) of the population structure, W is the random-effects matrix (kinship matrix G), μ is the predicted random individual, and e is the random error.
Considering that population stratification may lead to spurious associations in the GWAS, the genomic inflation factor (λ) was calculated. Quantile–quantile (Q-Q) plots for six traits were generated using R software to assess population stratification. The first two principal components were extracted using R software and plotted as a principal component analysis scatter plot with the ggplot2 package to observe whether stratification existed within the cohort. The first five principal components were added as covariates to the mixed linear model to correct for spurious associations in the GWAS analysis caused by population stratification. Manhattan plots for all traits were generated using R software and visualization parameters.
2.7. Gene Annotation and GO/KEGG Enrichment Analysis
To identify candidate genes associated with target traits, the study utilized Ensembl genome annotation resources (release 109) were usedto locate potential candidate genes within a 160 kb upstream and downstream range of significant SNPs identified through association analysis based on domestic yak genes (LU_Bosgru_v3.0) annotation. Simultaneously, the DAVID database (https://davidbioinformatics.nih.gov/, accessed on 4 December 2025) was employed to perform GO functional annotation on the screened candidate genes. KEGG pathway analysis was performed using KOBAS 3.0 (http://bioinfo.org/kobas, accessed on 4 December 2025). In the enrichment analysis, p < 0.05 was considered statistically significant.
2.8. SNP Site Confirmation and Primer Design
From a population of 759 healthy Tianzhu White Yaks, six individuals were randomly selected for Sanger sequencing to validate genotyping accuracy. Based on quality control, six significant SNP sites were randomly selected for genotype validation. Information on these six loci is presented in Table 1. Using the yak genome reference Bosgu_v3.0 from the European Nucleotide Archive, the upstream and downstream reference sequences for each locus were obtained, and the corresponding fasta files were downloaded. Specific primers were designed using the NCBI online tool Primer-BLAST (https://www.ncbi.nlm.nih.gov/tools/primer-blast/, accessed on 4 December 2025), and primer synthesis was commissioned to Shanghai Sangon Biotech Co., Ltd. (Shanghai, China). Primer sequence information and annealing temperatures are shown in Table 2.
2.9. PCR Amplification and Sequencing
Perform PCR amplification experiments on the selected SNP sites. Standard PCR amplification was performed using DNA extracted from the blood of Tianzhu White Yak as the template. The reaction system had a total volume of 25 μL, comprising 6.5 μL ddH_2_O, 4 μL cDNA (10 mg/μL), 1 μL each of primers F/R (10 μmol/L), and 12.5 μL TaKaRa Taq™ Version 2.0 plus dye. PCR program: 95 °C pre-denaturation for 3 min; 35 cycles (95 °C denaturation for 30 s, 59.5 °C annealing for 30 s, 72 °C extension for 20 s); final extension at 72 °C for 5 min; store at 4 °C. Amplified products were detected via 1% agarose gel electrophoresis (110 V, 40 min). After confirming amplification fragments matched expected results, PCR products, along with upstream and downstream primers, were sent to Shanghai Sangon Biotechnology Co., Ltd. (Shanghai, China). for Sanger sequencing analysis.
2.10. Data Statistics and Analysis
We used MEGA 12.0 software to view sequencing results and complete genotyping operations, followed by organizing relevant data using Excel 2019. Based on genetic parameters such as effective number of alleles (Ne), polymorphic information content (PIC), and allele frequency, calculations were performed using Excel functions, followed by Hardy–Weinberg equilibrium (HWE) testing. Simultaneously, univariate analysis of variance (ANOVA) was performed using the general linear model (GLM) in SPSS 26.0 software to assess correlations between genotypes at selected SNP loci and corresponding phenotypes. The general linear model is
Here, yi denotes the phenotypic value (dependent variable) of the individual, μ represents the population mean for the corresponding trait, Gi indicates the SNP genotype effect for the individual, and ei signifies the random error. Multiple comparisons were performed using Duncan’s multiple range test, with results presented as “mean ± standard error.”
2.11. Linkage Disequilibrium Analysis of Significant SNPs
To further investigate the cause of the dense-clustering phenomenon of genome-wide significant SNPs for Gw and HL traits on chromosome 6 and analyze the linkage disequilibrium relationships among five SNP loci on the same chromosome as determined by Sanger sequencing, we employed Haploview software (v.4.2) to conduct linkage disequilibrium analysis on the significant SNPs of Gw and HL on chromosome 6, as well as on the five SNP sites on the same chromosome identified through Sanger sequencing.
3. Results
3.1. Phenotypic Data Statistics
This study compiled and performed descriptive statistics on the tail hair length, skirt hair length, body side hair length, head hair length, back hair length, and hair weight data measured from 759 Tianzhu White Yaks (see Table 3). The coefficients of variation for tail hair length (TL), skirt hair length (SL), body side hair length (BSL), head hair length (HL), back hair length (BL), and gross hair weight (Gw) were 24.09%, 14.61%, 38.89%, 32.03%, 45.78%, and 39.86%, respectively. The results indicate substantial phenotypic variation in these traits within the Tianzhu White Yak population. Further correlation analysis between all individual hair length traits and hair weight revealed (Figure 1) significant positive correlations among most traits, with the highest Pearson correlation coefficient of 0.41 observed between Gw and HL.
3.2. SNP Data Statistics
This study used low-depth, whole-genome sequencing (LcWGS) with an average sequencing depth of approximately 1.35×, with a mapping rate of 98.84%. This indicates that the majority of sequencing reads successfully aligned to the reference genome, yielding high-quality alignment results. The reference genome employed in this study is the Tianzhu White Yak Bosgu_v3.0 version (European Nucleotide Archive accession number: GCA_005887515.1). A total of 2.914 Tb of raw data was generated, averaging 3.84 Gb of raw data per sample. The filtered clean data totaled 2.894 Tb, averaging 3.81 Gb per sample. The GC content of the 759 samples ranged from 39.66% to 46.05%. The average Q20 score after quality control was 98.57%, and the average Q30 score was 95.85%, indicating high sequencing quality and no significant bias in library preparation or sequencing processes (Supplementary Table S1).
This study conducted whole-genome resequencing of 759 Tianzhu White Yaks to evaluate their hair production traits, including gross hair weight (Gw), tail hair length (TL), skirt hair length (SL), body side hair length (BSL), head hair length (HL), and back hair length (BL). Prior to data filtering, the raw data contained 56,389,052 SNP sites. After quality control, 22,566,255 SNP sites were obtained, specifically: (1) missing-genotype-data quality control (--geno 0.2) removed 5,638,905 SNP sites; (2) Hardy–Weinberg equilibrium testing removed 1,015,003 SNP sites; (3) failure to meet the minimum allele frequency threshold removed 27,168,889 sites. The changes in SNP site numbers after quality control are shown in Figure 2, demonstrating their uniform distribution across the 29 yak chromosomes (Figure 2).
3.3. Analysis of Population Genetic Diversity
To reduce the generation of false positives during GWAS analysis, this study performed principal component analysis (PCA) on the 22,566,255 SNP loci obtained and incorporated the top five principal components of the population as covariates into the GWAS model. Visualization was performed using the ggplot2 package in R software. As shown in Figure 3A, most individuals clustered together with only a few outliers, indicating relatively consistent genetic structure within the population and no obvious stratification. After understanding the population’s genetic relationships, a kinship matrix (G matrix) for Tianzhu White Yak was constructed based on genotype data. The results (Figure 3C) show that most individuals exhibit distant genetic relationships, indicating low levels of inbreeding within the population. Using the ggplot2 package in R software, we plotted the LD decay curve (Figure 3B). The results showed that when the LD coefficient r^2^ decayed to 0.1, the corresponding physical distance was approximately 160 kb, with the rate of decline gradually flattening. This indicates that the closer two SNP loci are on the chromosome, the stronger their correlation. Therefore, the study will identify potential key genes within a 160 kb upstream and downstream region of the significant locus.
3.4. Estimation of Genetic Parameters for Gross Hair Weight and Hair Length Traits
Table 4 indicates that heritability varies significantly among traits. The heritabilities of Gw, TL, SL, and HL were all below 0.20, indicating low-heritability traits where phenotypic variation is primarily influenced by environmental factors. BSL exhibited a heritability of 0.284, classified as a medium-heritability trait. BL demonstrated the highest heritability (h^2^ = 0.450), qualifying as a high-heritability trait with substantial potential for genetic improvement. Table 5 shows that in terms of additive genetic correlations, Gw and HL exhibited a moderate positive genetic correlation, while TL and BSL demonstrated a strong positive genetic correlation. Genetic correlations between other trait pairs were generally weak. Residual correlation analysis revealed positive residual correlations between most traits, with correlation coefficients generally higher than their corresponding genetic correlation coefficients.
3.5. Genome-Wide Association Study of Gross Hair Weight and Hair Length Traits
GWAS analysis was performed using a mixed linear model with GWAS, TL, SL, BSL, HL, and BL across 22,566,255 detected SNPs, successfully identifying SNPs significantly associated with target traits. Manhattan plots and Q-Q plots were generated using the ggplot2 package in R software (Figure 4 and Figure 5). The total PVE by significant SNPs was estimated for each trait, with values of 0.01908 for Gw, 0.22688 for BL, 0.01164 for BSL, 0.00958 for SL, and 0.00560 for HL.
Based on preset thresholds, 485 SNPs reached chromosome-wide significance in Gw, while 329 achieved genome-wide significance. These 329 genome-wide significant SNPs were distributed across eight chromosomes. Annotation using Ensembl software identified 54 genes associated with gross hair weight traits (Supplementary Table S2). A total of 73 distinct genes were identified for both gross hair weight and hair length traits. Further analysis based on linkage disequilibrium at significant SNP loci revealed that some associated genomic regions contained annotated candidate genes, while several significant haplotype blocks lacked annotated genes. This suggests that some association signals may originate from non-coding regions (Supplementary Table S3). The Q-Q plot revealed a genomic inflation factor close to 1 (λ = 0.970), indicating no genome inflation and good model fit suitability for this trait (see Figure 4 and Supplementary Table S1).
In the hair length trait analysis, no SNPs significantly associated with TL were identified. For SL, HL, and BL, 2, 179, and 9 SNPs, respectively, reached genome-wide significance. Across all six traits, 767 SNPs achieved chromosome-wide significance. For SL, two genome-wide significant SNPs were identified on two chromosomes and annotated 4 genes, such as the AFG2 ATPase homolog A (AFG2A) gene. An additional 9 SNPs reached chromosome-wide significance; HL had 179 genome-wide significant SNPs exclusively on Chr6, annotated to six genes including Anthrax Toxin Receptor 2 (ANTXR2) and Fibroblast growth factor 5 (FGF5), totaling 236 chromosome-significant SNP sites; BL identified 9 relevant SNP sites across four chromosomes, annotated to 5 genes including the Calcium voltage-gated channel subunit alpha1 B (CACNA1B) and ANTXR2 genes, with an additional 20 chromosomally significant SNP sites; BSL identified 16 chromosomally significant SNP sites across five chromosomes annotated to 33 genes, such as Glycoprotein hormone subunit beta 5 (GPHB5), Protein tyrosine phosphatase non-receptor type 21 (PTPN21), and Ras homolog family member J (RHOJ). Q-Q plots revealed genomic inflation factors of 0.995, 0.983, 1.002, 0.971, and 1.028 for TL, SL, BSL, HL, and BL, respectively, indicating good model fit without evidence of population stratification. (See Figure 5 and Supplementary Table S1)
3.6. GO Functional Annotation and KEGG Enrichment Analysis of Candidate Genes
To further explore the functions and potential regulatory relationships of these significant SNP loci and their candidate genes, systematic functional enrichment analysis was performed on annotated genes using GO and KEGG databases. GO functional enrichment analysis revealed that candidate genes were significantly enriched (p < 0.05) in eight biological processes (BP), one cellular component (CC), and two molecular functions (MF). Among these, BP was primarily enriched in pathways such as protein K11-linked ubiquitination, protein K48-linked ubiquitination, positive regulation of cell division, and glial cell differentiation; CC was significantly enriched in cytoplasm; and MF was significantly enriched in hydrolase activity, hydrolyzing O-glycosyl compounds, and hydrolase activity, acting on glycosyl bonds (see Figure 6 and Supplementary Table S4). KEGG pathway enrichment analysis (Figure 6B and Supplementary Table S5) revealed significant enrichment of candidate genes in pathways including the MAPK signaling pathway, Calcium signaling pathway, cAMP signaling pathway, and Nicotine addiction, involving genes such as FGF5, CACNA1B, and GABBR2. Although the limited number of genes participating in enrichment resulted in a small number of enriched genes, these findings still hold considerable reference value.
3.7. SNP Sequencing and Genotyping
By amplifying and sequencing six SNP loci across 759 Tianzhu White Yak genomic samples, genotype data for each SNP locus were successfully obtained from all 759 samples. Figure 7 displays peak plots of partial genotyping results for the selected SNP loci.
3.8. Analysis of Genetic Diversity in SNPs
Genotyping statistics were performed for the selected polymorphic sites, including calculation of genotype frequency, allele frequency, information content, homozygosity, and Hardy–Weinberg equilibrium. Detailed results are presented in Table 6 and Table 7. Analysis revealed that the GG, GA, and AA genotypes were present at all three loci: g.25500413 G > A, g.25650960 G > A, and g.25388433 G > A. Among these, the GG genotype was the dominant allele at both the g.25500413 G > A and g.25650960 G > A loci. Assessment indicated moderate polymorphism (0.25 < PIC < 0.5) and Hardy–Weinberg equilibrium (p > 0.05) at these loci, suggesting no strong selective pressure. At the g.25388433 G > A locus, GA is the dominant genotype with G as the dominant allele. This locus also exhibits moderate polymorphism (0.25 < PIC < 0.5) and complies with Hardy–Weinberg equilibrium (p > 0.05). At the g.25393687 C > G locus, allele C is the dominant allele, with the dominant genotype being CG. This locus exhibits moderate polymorphism (0.25 < PIC < 0.5) and is in Hardy–Weinberg equilibrium (p > 0.05). At both the g.25127272 A > T and g.64978665 T > A loci, A is the dominant allele, and the AA genotype is the dominant allele; the g.25127272 A > T locus exhibited moderate polymorphism (0.25 < PIC < 0.5) but violated Hardy–Weinberg equilibrium (p < 0.05); the g.64978665 T > A site exhibits low polymorphism (0 < PIC < 0.25) and similarly fails to meet Hardy–Weinberg equilibrium (p < 0.05).
3.9. Correlation Analysis of Gross Hair Weight and Hair Length Traits
A genetic association analysis was conducted between SNP data from six loci and corresponding hair production traits in 759 randomly selected Tianzhu White Yak individuals. The results (Table 6, Table 7 and Table 8) showed that at the g.25393687 C > G locus, individuals with the GG genotype exhibited significantly higher gross hair weight than those with CC or CG genotypes (p < 0.05), while no significant difference was observed between CC and CG genotypes (p > 0.05). At the g.25500413 G > A and g.25388433 G > A loci, AA genotype individuals exhibited significantly longer head hair than GG and GA genotypes (p < 0.05), while GA genotype individuals had significantly longer head hair than GG genotypes (p < 0.05). At the g.25127272 A > T locus, TT genotype individuals exhibited significantly higher gross hair weight than AA and AT genotypes (p < 0.05), while AT genotype individuals showed significantly higher gross hair weight than TT genotype individuals (p < 0.05). At the g.25650960 G > A locus, AA genotype individuals exhibited significantly greater gross hair weight than GG and GA genotypes (p < 0.05), while GA genotype individuals showed significantly greater gross hair weight than GG genotype individuals (p < 0.05). At the g.64978665 T > A locus, TT individuals exhibited significantly longer flank hair than AA and TA individuals (p < 0.05), while no significant difference was observed between AA and TA individuals (p > 0.05).
Sanger sequencing confirmed variants at FGF5 (g.25393687 C > G, g.25388433 G > A), PRDM8 (g.25500413 G > A), CFAP299 (g.25127272 A > T), ANTXR2 (g.25650960 G > A), GPHB5 (g.64978665 T > A). These SNP loci effectively distinguish individuals with different gross hair weight and length traits, reflecting the genetic variation patterns of these traits.
3.10. Analysis of Linkage Disequilibrium at SNP Sites
The results are shown in Supplementary Figures S1 and S2. On chromosome 6, most significant SNP loci for both Gw and HL traits exhibited strong linkage disequilibrium (r^2^ > 0.7), forming one or more contiguous LD blocks. Therefore, the observed clustering of significant SNP sites in the Manhattan plots for Gw and HL traits is primarily driven by long-range linkage disequilibrium rather than the additive effects of multiple independent association signals. Moreover, the linkage disequilibrium structures between Gw and HL traits are highly consistent, with significant SNP sites showing substantial overlap within the same LD block. Furthermore, linkage disequilibrium analysis of five selected SNP sites on the same chromosome reveals strong linkage disequilibrium (r^2^ > 0.6) among these SNPs, as shown in Figure 8A. Figure 8B shows that Block1, comprising loci 6_25388433 and 6_25393687, generates two haplotypes through linkage: H1 (GC) and H2 (AG), with haplotype frequencies of 0.590 and 0.410, respectively.
4. Discussion
The Tianzhu White Yak generates economic value for herders through meat and milk production, while its hair also holds significant economic worth. Mammalian hair growth is influenced by multiple factors, including genetics, nutrition, and environment, with genetics playing a dominant role. Recent domestic and international research has focused on identifying genes associated with hair traits and elucidating their regulatory mechanisms. By analyzing molecular markers correlated with gross hair weight and hair length, candidate genes can be screened to enhance the precision of molecular-assisted evaluation and breeding for hair production performance in the Tianzhu White Yak [29].
This study used low-depth, whole-genome sequencing (LcWGS) with an average sequencing depth of approximately 1.35×. Low-depth sequencing combined with genotype imputation technology has been widely used in large-scale association studies due to its good balance between cost and statistical power, especially when population-matched, high-depth reference panels are available [30]. In our dataset, the sequencing quality was high (average alignment rate 98.84%), with an average ≥1× genome coverage of 59.64%, consistent with expectations for low-depth sequencing. To compensate for the low coverage, we used a population-matched reference panel (sequencing depth approximately 20×) consisting of 950 Tianzhu White Yaks to impute genotypes in 759 individuals with low-depth sequencing, achieving high imputation consistency (approximately 0.93–0.95). Therefore, the sequencing depth and imputation strategy employed are suitable for genome-wide association analysis.
In this study, the coefficient of variation for tail hair length, skirt hair length, flank hair length, head hair length, back hair length, and gross hair weight among 759 Tianzhu White Yaks ranged from 14.61% to 45.78%. This indicates moderate to high levels of phenotypic variation in hair-related traits within the population. This level of variation provides substantial potential for genetic improvement of hair-related traits. This finding is consistent with previous reports indicating substantial variation in hair length among Tianzhu White Yak populations, as well as the generally high level of genetic variability observed in hair fiber livestock, such as yaks and cashmere goats [31,32]. Significant positive correlations were observed between hair length at different body sites and gross hair weight, with the strongest correlation found between head hair length and gross hair weight (Pearson’s r = 0.41). This finding aligns with previous studies on breeds such as the Colombian sheep, suggesting that increased wool length may contribute to higher gross hair weight in the Tianzhu White Yak [33]. Furthermore, Bao Q et al. [23] identified multiple genetic regions influencing hair length and volume in Tianzhu White Yak through resequencing and copy number variation analysis. Its genome-wide association study confirmed that this trait exhibits a polygenic, small-effect genetic architecture.
Regarding hair length traits, this study identified 179 significant SNPs associated with the HL trait exclusively on chromosome 6, annotated to genes such as FGF5 and Cilia- and flagella-associated protein 299 (CFAP299). FGF5 has been repeatedly reported as a key regulator of hair growth and has previously been identified on chromosome 6 in Tianzhu White Yak, suggesting that this chromosome may represent a major QTL region controlling head hair length [23]. For skirt hair length, only two SNPs with genome-wide significance were identified, both annotated to the AFG2A gene. This gene encodes a mitochondrial m-AAA metalloprotease previously implicated in alopecia-like phenotypes. [34]. This suggests that mitochondrial homeostasis and energy metabolism may indirectly influence hair follicle growth. The CACNA1B gene was annotated exclusively for the BL trait. The α1B subunit encoded by this gene participates in calcium ion signaling during the development of human skin and hair follicle-associated cells [35]. Ca^2+^ signaling is a key regulator of proliferation and differentiation in hair follicle epithelial cells [36]. Chromosomal regions significantly associated with BSL were annotated to genes, including GPHB5 and RHOJ. They participate in regulating glucose and lipid metabolism, angiogenesis, and adhesion to skin and vascular endothelial cells, indicating that the cyclic growth of hair follicles depends on precise regulation of the skin’s vascular and metabolic microenvironment [37,38]. This study did not detect GWAS SNPs in TL, potentially due to the trait’s complex interaction with numerous small-effect loci and environmental factors. Similar observations have been made in genome-wide association studies of traits such as wool, body weight, and cashmere in sheep and goats, where certain traits exhibit very few or even no significant SNP loci. This is primarily attributed to their polygenic, small-effect genetic architecture and limitations in sample size [39,40].
This study simultaneously identified dense clusters of SNPs significantly associated with Gw and HL traits on chromosome 6. Such clustering typically arises not from the direct causal effects of each significant SNP, but rather reflects linkage disequilibrium between genotyping markers and underlying functional variants [41]. Linkage disequilibrium analysis revealed strong LD relationships among most significant SNPs on chromosome 6, forming one or more contiguous LD blocks. Previous GWASs have confirmed that regions exhibiting extended LD patterns tend to generate multiple association signals that are statistically significant but not genetically independent [42]. Gabriel suggests that these haplo-blocks could have been produced through the interplay of several possible mechanisms, including domestication, population subdivision, founding events, selection, and recombination hotspots [43]. Such extended linkage disequilibrium blocks are frequently observed in livestock populations. More importantly, Saber [44] argues that limited population size is often considered a primary cause of linkage disequilibrium due to the significant reduction in effective population size in most domesticated animals. The linkage disequilibrium on chromosomes 6 and 16 in this study of Tianzhu White Yaks may be due to the relatively limited population size. Smith and Haigh [45] first proposed genetic hitchhiking: this occurs when the frequency of a haplotype carrying a dominant allele increases rapidly, leading to a corresponding increase in the frequency of neighboring loci. Wolfgang [46] showed that when a strong mutation occurs and spreads through a population via directional selection, the frequency of neutral (or weakly selected) variants linked to it inevitably increases. In this study, the HL and Gw traits showed a large number of significantly correlated SNPs on chromosome 6, and the long-distance strong linkage disequilibrium between them may be due to the genetic hitchhiking.
GO enrichment analysis revealed that at the cellular component level, the cytoplasm pathway annotated the highest number of enriched genes, including FGF5, AFG2A, CFAP299, GPHB5, PTPN21, and Zinc Finger CCCH-Type Containing 14 (ZC3H14). Research indicates that FGF5 exerts a negative regulatory role in the mammalian hair growth cycle, with its activity being closely correlated with hair length and shearing yield [47]. Xiang [37] et al. demonstrated that GPHB5 is synthesized in the cytoplasm, undergoes glycosylation processing, and is secreted as a glycoprotein hormone. It is associated with basal metabolic rate and energy homeostasis, and may indirectly influence hair follicle growth by regulating skin metabolic status. Significant enrichment of cytoplasm-related entries indicates that hair shaft formation depends on cell division and intracellular regulation [48]. Additionally, KEGG enrichment analysis shows that CACNA1B and FGF5 are significantly enriched in the MAPK signaling pathway. Chen et al. [49] indicated that MAPK is crucial for stem cell maintenance, hair follicle growth, and transitions between different hair growth cycle phases. Previous studies have demonstrated that the MAPK signaling pathway plays a pivotal role in hair follicle development. It guides cell differentiation and activates stem cells, thereby shaping hair follicle morphology, highlighting its significance in hair follicle development and regeneration [50]. KEGG results also enriched the cAMP signaling pathway and Calcium signaling pathway. Matilde et al. [51] directly demonstrated that the classical cAMP/CRE-binding protein-signaling pathway mediated by adrenergic receptors regulates hair follicle stem cell (HFSC) activation and the hair growth cycle. Relevant studies indicate that adenosine exerts an anti-hair loss effect by activating the cAMP signaling pathway. This occurs through inhibiting GSK3β activity in human dermal papilla cells and activating the Wnt/β-catenin pathway [52]. Calcium signaling has been demonstrated to play a crucial role in regulating hair follicle stem cells. Gozde et al. [53] investigated the baldness associated with mutations in the voltage-gated calcium channel (VGCC) Cav1.2 underlying Timothy syndrome (TS), indicating that it can regulate hair follicle stem cell function and hair follicle homeostasis. Wang et al. [54] have established that Ca^2+^ influx plays a role in the mechanosensing of hair follicle stem cells (HF-SCs).
Genetic polymorphism analysis shows the six SNP loci related to hair traits have moderate polymorphism (PIC 0.25–0.50), indicating the Tianzhu White Yak population has relatively rich genetic diversity. Except for two loci, most SNP sites follow Hardy–Weinberg equilibrium, suggesting these regions are not under intense selection and hold potential for molecular breeding. Qi Zengyuan et al. [55] studied two SNPs in the third intron of the SCD gene and found that they exhibited low polymorphism and were in Hardy–Weinberg equilibrium, suggesting they are suitable genetic variants for use as molecular markers. This finding is consistent with the results of the present study. Among the two loci annotated to FGF5, the g.25393687 C > G and g.25388433 G > A loci showed extremely significant correlations (p < 0.01) with the gross hair weight and head hair length of Tianzhu White Yak, respectively. Disrupting the FGF5 gene in three sheep using CRISPR/Cas9 technology resulted in wool lengths significantly longer than those of wild-type sheep [56]. Similarly, studies on three rabbit breeds, including Angora rabbits, found significant associations (p < 0.05) between SNPs encoded by the FGF5 gene and fur yield [57]. The g.25127272 A > T site in the chromosome 4 open-reading frame 22 (C4orf22 gene downstream of FGF5) was also significantly correlated with gross hair weight in this study (p < 0.01). Bao Qi [31] analyzed GO and KEGG signaling pathways encompassing the CFAP299 hotspot region, primarily involving the MAPK signaling pathway, which aligns with the KEGG enrichment results of this study. PRDM8 plays a crucial regulatory role in cell proliferation, differentiation, and maturation [58]. In this study, the g.25500413 G > A site of PRDM8 showed a highly significant correlation with head hair length (p < 0.01). Dan et al. [59] confirmed through genome-wide resequencing that multiple genes, including PRDM8, play a crucial role in goat high-altitude adaptation, cashmere production, and climate change. Although the SNPs subjected to Sanger sequencing were randomly selected across the entire chromosome rather than confined to GWAS-associated regions, several SNPs located on chromosome 6 were found to be in strong linkage disequilibrium. Due to potential linkage disequilibrium structures, regions with high-density linkage signals tend to be overrepresented in randomly sampled markers [41]. Therefore, linkage disequilibrium among validated sites reflects the distribution of correlated markers on the chromosome rather than indicating multiple independent association signals [60]. To gain a deeper understanding of the biological functions of key candidate genes and further elucidate the genetic basis of yak wool weight and wool length traits, future research should combine functional validation experiments with broader multi-population genome-wide association studies.
5. Limitations
Several limitations of this study warrant clarification. First, although genome-wide association analysis identified multiple genetic loci and candidate genes associated with hair-related traits, functional validation experiments have not yet been conducted. Therefore, the biological roles of these candidate genes in hair follicle development require confirmation through future in vitro or in vivo studies. Second, two SNP loci showed deviation from Hardy–Weinberg equilibrium, possibly due to population structure, selection, or sampling effects. Third, all individuals were sampled from a single Tianzhu White Yak population. Although this population exhibits genetic stability and representativeness within its locality, validation in other populations is warranted.
6. Conclusions
This study conducted a genome-wide association analysis of gross hair weight and hair length traits across five distinct body regions of the Tianzhu White Yak, successfully identifying 519 SNPs reaching genome-wide significance and 767 SNPs reaching chromosomal significance. On chromosome 6, the most significant SNPs associated with GW and HL traits exhibited strong long-range linkage disequilibrium (r^2^ > 0.7), possibly due to genetic hitchhiking. Gene annotation identified 73 key candidate genes (including FGF5, CFAP299, and PRDM8), which were proposed as candidate factors potentially associated with the variation in gross hair weight and hair length traits. Genetic parameter analysis indicated that BSL exhibits moderate heritability, while BL demonstrates high heritability (h^2^ = 0.450), with all other traits showing low heritability. SNP genotypes validated by Sanger sequencing revealed that mutations in the FGF5, CFAP299, PRDM8, ANTXR2, and GPHB5 genes were significantly associated with the GW, HL, and BSL traits in Tianzhu White Yak. Linkage disequilibrium analysis indicated strong linkage disequilibrium (r^2^ > 0.6) at the validated SNP loci located on the same chromosome. These results provide candidate loci and genes for investigating the genetic basis of hair length and hair weight traits in white yak. However, functional validation experiments and additional independent populations are still required to verify the biological functions of these candidate loci with the traits.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Tüfekci H. Sejian V. Stress Factors and Their Effects on Productivity in Sheep Animals 202313276910.3390/ani 1317276937685033 PMC 10486368 · doi ↗ · pubmed ↗
- 2Baba M.A. Ahanger S.A. Hamadani A. Rather M.A. Shah M.M. Factors affecting wool characteristics of sheep reared in Kashmir Trop. Anim. Health Prod.2020522129213310.1007/s 11250-020-02238-132076995 · doi ↗ · pubmed ↗
- 3Olson T.A. Lucena C. Chase C.C. Hammond A.C. Evidence of a major gene influencing hair length and heat tolerance in Bos taurus cattle J. Anim. Sci.200381809010.2527/2003.81180 x 12597376 · doi ↗ · pubmed ↗
- 4Ding H. Zhao H. Cheng G. Yang Y. Wang X. Zhao X. Qi Y. Huang D. Analyses of histological and transcriptome differences in the skin of short-hair and long-hair rabbits BMC Genom.20192014010.1186/s 12864-019-5503-x · doi ↗
- 5Song K. Gao B. Halvarsson P. Fang Y. Jiang Y.-X. Sun Y.-H. Höglund J. Genomic analysis of demographic history and ecological niche modeling in the endangered Chinese Grouse Tetrastes sewerzowi BMC Genom.20202158110.1186/s 12864-020-06957-5 · doi ↗
- 6Han M. Wang X. Du H. Cao Y. Zhao Z. Niu S. Bao X. Rong Y. Ao X. Guo F. Genome-wide association study identifies candidate genes affecting body conformation traits of Zhongwei goat BMC Genom.2025263710.1186/s 12864-024-11097-1 · doi ↗
- 7Bett R.C. Kosgey I.S. Bebe B.O. Kahi A.K. Breeding goals for the Kenya dual purpose goat. II. Estimation of economic values for production and functional traits Trop. Anim. Health Prod.20073946747510.1007/s 11250-007-9013-517969710 · doi ↗ · pubmed ↗
- 8Wang J. Li X. Peng W. Zhong J. Jiang M. Genome-Wide Association Study of Body Weight Trait in Yaks Animals 202212185510.3390/ani 1214185535883402 PMC 9311934 · doi ↗ · pubmed ↗
