Genome-Wide Association Study and Whole-Genome Resequencing Reveal Key lincRNA and Candidate Gene for Polled Phenotype in Yak
Chuang Zhong, Ziying Wang, Shujie Liu, Zhian Zhang, Zhanhong Cui

TL;DR
This study identifies a key lincRNA and candidate genes linked to the polled (hornless) trait in yaks, offering insights for breeding.
Contribution
The study reveals a conserved genetic basis for the polled trait across different yak breeds using GWAS and resequencing.
Findings
A 273.6 kb region on chromosome 1 was significantly associated with the polled trait in Xueduo yaks.
Candidate genes EPCIP, OLIG1, and PAXBP1 were identified near the associated region.
Ashdan and Xueduo yaks share the same candidate genes and some genetic variants for the polled trait.
Abstract
The polled trait enhances yak breeding and management efficiency, making it a key breeding objective aligned with the developmental goals of modern animal husbandry. In this study, horned Xueduo yaks, polled Xueduo yaks, and Ashdan yaks (a polled breed derived from Datong yaks) were selected as research subjects. Using genome-wide association study (GWAS) technology, we aimed to identify key genes regulating the polled trait in yaks and to clarify whether the same genetic loci underlie the polled phenotype across different yak breeds, thereby providing theoretical support and technical guidance for polled yak breeding. Yaks are important livestock in high-altitude regions, and their polled trait can effectively improve breeding and management efficiency. In this study, whole-genome resequencing combined with a GWAS was employed to identify a significantly associated region of…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5- —Agriculture and Rural Affairs Department of Qinghai Province, Yak Breeding Collaboration Research Project of Qinghai Province in 2024
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenetic and phenotypic traits in livestock · Genetic Mapping and Diversity in Plants and Animals · Animal Vocal Communication and Behavior
1. Introduction
China accounts for over 90% of the world’s yak population, with Qinghai Province supporting over one-third of the national herd. As a core production area for the yak industry in China and globally, Qinghai possesses diverse yak genetic resources and is regarded as a primary center for yak domestication [1]. The Xueduo yak, alongside the Datong yak and the Huanhu yak, is one of the “Three Great Yaks of Qinghai”. The Datong yak is valued for its rapid growth, high meat production capacity, and strong stress resistance, while the Xueduo yak excels in meat quality and body conformation. Breeding programs combining these two breeds offer considerable potential and advantages. Among these, the Xueduo yak possesses the largest horns, which are characterized by a thicker base, a rounded and elongated shape, and a wide inter-horn distance. The horns form a double-arched, non-closed circular pattern. The primary functions of horns include defending against predators and warding off intruding males during mating competition, reflecting long-term evolutionary adaptation [2]. However, within contemporary livestock production systems, to mitigate potential risks associated with horns, dehorning procedures are routinely applied to young animals. These methods include cauterization with heated iron, mechanical removal of horn buds, or the application of caustic ointments to inhibit subsequent horn development [3]. With the advancement of livestock industry philosophies, animal health and welfare have been incorporated into broader breeding objectives [4]. In this context, the polled trait is considered a key characteristic aligned with these objectives, as it significantly reduces the health hazards posed by horns to other livestock and caretakers in domestic settings, as well as the tissue damage caused by horn trimming during meat processing procedures [5]. Furthermore, polled populations show no significant differences in the estimated breeding values for 12 production traits (including growth and carcass traits) compared to horned populations and may even exhibit advantages in some traits. This effectively safeguards animal health and welfare while maintaining productivity [6].
Research on the genetic regulation of the polled trait in ruminants has yielded significant breakthroughs, providing crucial insights for analyzing horn phenotypes in yaks. As early as 1993, the genetic locus for polledness was mapped to the proximal centromere region of bovine chromosome 1 [7]. A horn-suppressing gene is controlled by two alleles (with polled being dominant over horned), exhibiting a dominant inheritance pattern [8]. Subsequent research further identified four DNA sequence variants associated with polledness on chromosome 1: the Celtic POLLED variant (P_C_), Friesian POLLED variant (P_F_), Mongolian POLLED variant (P_M_), and Guarani POLLED variant (P_G_) [9]. Among these, the P_C_ variant primarily regulates the polled trait in beef cattle (e.g., Simmental), which involves a complex insertion–deletion within a 212 kb interval on chromosome 1 [10,11]; the P_F_ variant primarily regulates it in dairy cattle (e.g., Holstein), with 182 candidate variants within a 932 kb region on chromosome 1, centered on an 80 kb segmental duplication [12,13]. The P_M_ variant chiefly controls polledness in Mongolian yak and Mongolian Turano cattle, dependent on the insertion of a 219 bp segmental duplication [14]. The P_G_ variant primarily governs polledness in Brazilian Nellore cattle, which is mediated by copy number variation in an approximately 110 kb repeat. As the most extensively studied mutation types, both P_C_ and P_F_ are located in non-coding intergenic regions [15]. In GWAS analyses, significant association sites within such regions are often overlooked due to their presumed lack of direct protein-coding function. However, polled research revealed higher expression of lincRNA#1 (a long intergenic non-coding RNA annotated as LOC100848368) in the horn bud region of horned fetuses compared to polled fetuses, suggesting the existence of an intergenic regulatory mechanism mediated by lincRNAs in yaks [16]. LincRNAs are evolutionarily conserved functional sequences, and their presence is essential for organismal survival or development [17]. Previous studies successfully generated polled dairy cattle by precisely introgressing the polled allele (P_C_) from beef cattle breeds into dairy cell lines, in combination with somatic cell nuclear transfer (SCNT) cloning techniques [18]. Subsequent breeding of these genome-edited polled bulls (P_C_) with horned dairy cows resulted in all calves displaying the polled trait [19]. Recent studies precisely knocked out the full sequence of lincRNA#1 in bovine embryos, confirming that this lincRNA knockout does not cause fetal death, and all offspring displayed the polled phenotype post-birth [20]. Although the precise mechanisms by which lincRNAs exert their effects remain incompletely understood [21], it is known that lincRNAs can function as cis-acting elements to regulate the expression of neighboring genes [22]. The above evidence indicates that lincRNA are not “non-functional fragments”, but possess the capacity to influence the methylation of coding genes and regulate elements such as enhancers and silencers [23], thereby potentially regulating the expression of keratin-coding genes and influencing horn presence or absence. When lincRNA is normally expressed, keratin-coding genes function correctly; however, mutations such as insertions, deletions, or duplications in lincRNA may lead to the non-expression or misexpression of adjacent keratin-coding genes, thereby causing the polled phenotype. Xu et al. [24] further demonstrated that cranial neural crest stem cells—the common cellular origin of horns in ruminants—are regulated by multiple genetic factors, including RXFP2, FOXL2, HOXD1, MTX2, ZEB2, and TWIST1. However, although a 200 kb lincRNA associated with horn presence was identified in the Datong yaks and showed signals of recent artificial selection, no structural variation identical to that of known polled alleles in cattle (e.g., P_F_ and P_C_) was found; instead, candidate genes SYNJ1, C1H21orf62, and PAXBP1 were identified, which do not correspond to known polled regulatory genes in cattle. Chai et al. demonstrated that all domestic yak populations exhibit a single domestication origin and show close genetic proximity [25]. However, in genome-wide SNP analysis, the Datong yak exhibited 18,061,823 SNPs, whereas the Xueduo yak showed only 10,394,482 SNPs [26]. Preserving polledness as a desirable trait during cross-breeding remains a critical challenge, necessitating in-depth comparative studies of genetic differences in polledness between yak breeds to elucidate the similarities and differences in genetic backgrounds and regulatory loci governing polledness. Therefore, this study aims to identify the key genes regulating the polled trait in yaks, clarify whether consistent genetic regulatory loci underlie the polled phenotype across different yak breeds, and provide theoretical support and technical guidance for the breeding of polled yaks.
2. Materials and Methods
2.1. Experimental Design and Sample Collection
In this study, 45 healthy adult female yaks were selected, comprising both horned and polled Xueduo yaks, as well as Ashdan yaks (a polled breed originating from Datong yaks). Based on horn phenotype (horned or polled) and breed, the animals were divided into three groups of 15 individuals each: horned Xueduo yaks, polled Xueduo yaks, and Ashdan yaks.
2.2. Measurement of Body Weight and Morphometric Traits
Body weight and morphometric data were collected from all animals in a single session by experienced livestock technicians. Body weight was measured with an electronic scale before morning feeding. The following morphometric traits were measured using a measuring stick: withers height (vertical distance from the highest point of the withers to the ground), body length (straight-line distance from the anterior edge of the shoulder to the posterior edge of the pin bone), chest width (horizontal distance between the outer edges of the shoulder blades at their posterior margin), and rump width (horizontal distance between the outer edges of the hip bones). Chest girth (the circumference perpendicular to the body axis at the posterior margin of the shoulder blades) was measured with a flexible tape.
2.3. Whole-Blood DNA Extraction and Whole-Genome Resequencing Analysis
Genomic DNA was isolated from thawed whole-blood samples using a magnetic bead-based universal DNA extraction kit (Tiangen Biotech (Beijing) Co., Ltd., Beijing, China; catalog No. DP705). DNA concentration and integrity were assessed using an Agilent 5400 AATI instrument (Agilent Technologies, Inc., Santa Clara, CA, USA). Sequencing libraries were constructed using the Rapid Plus DNA Lib Prep Kit for Illumina (New England Biolabs, Ipswich, MA, USA; catalog No. RK20208) according to the manufacturer’s instructions. After sample qualification, DNA was randomly fragmented using a Covaris ultrasonicator (Covaris, Inc., Woburn, MA, USA). The library preparation involved end repair, A-tailing, adapter ligation, size selection, PCR amplification, and purification to obtain the final DNA library. Following library validation, paired-end 150 bp (PE150) sequencing was performed on the Illumina NovaSeq X Plus platform (Illumina, Inc., San Diego, CA, USA).
Raw sequencing reads were quality-filtered using fastp v1.1.0 [27] with the parameters -z 6 -g -q 5 -u 50 -n 15. The filtering criteria were as follows: (1) the removal of read pairs containing adapter sequences; (2) the exclusion of paired reads in which ≥10% of bases in either read were unidentified (N); (3) the discarding of read pairs if over 50% of bases had a quality score (Q) ≤ 5. High-quality clean data were obtained for downstream analysis. Clean reads were aligned to the yak reference genome (Bos_grunniens.LU_Bosgru_v3.0.dna.toplevel.fa, downloaded from Ensembl release 114) using BWA (version 2.2.1, parameters: mem -t 4 -k 32 -M) [28]. The resulting alignment files were sorted and deduplicated using samtools (version 0.1.19) [29] to generate final BAM files for variant detection.
2.4. SNP Detection and Annotation
Population single-nucleotide polymorphism and insertion/deletion (indel) detection was performed using bcftools (version 1.21) [30]. The mpileup and call commands (parameters: -m -v) were used to identify variant sites. A total of 21,092,310 initial SNP sites were stringently filtered based on the following criteria: (1) minimum depth ≥ 3 per sample per SNP; (2) maximum missing rate per SNP ≥ 0.1; (3) minor allele frequency (MAF) ≥ 0.05. After filtering, 10,142,457 high-quality SNP loci were retained for downstream analysis.
Functional annotation of the filtered SNPs was carried out with ANNOVAR software (version 2013) [31] to determine the physical location and functional class of each SNP. Annotations included upstream regions, 3′UTR, 5′UTR, exonic regions (synonymous, non-synonymous, and stop–gain/loss variants), intronic regions, splice sites, downstream regions, and intergenic regions.
2.5. Population Structure Analysis
2.5.1. Principal Component Analysis (PCA)
PCA was performed to reveal patterns of genetic differentiation within the population. The analysis was conducted using GCTA software (version 1.24.2) [32], which is well suited for genome-wide complex trait analysis and effective for resolving population structure from SNP data. A high-quality dataset of 10,142,457 SNP loci, free of significant missing data, errors, and biases, was used as input. We set the parameter—pca 3 to calculate the first three principal components based on inter-individual SNP allele frequency differences. The three principal components calculated above capture the majority of population genetic variation, thereby reflecting individual clustering patterns. Finally, the results were visualized in a three-dimensional scatter plot to intuitively present the distribution of individuals in genetic space, assess population stratification, and infer genetic relatedness.
2.5.2. Admixture-Based Genetic Structure Analysis
Genetic structure and individual ancestry were inferred using ADMIXTURE software (version 1.23) [33], which employs a model-based Bayesian algorithm. The same high-quality SNP dataset used for PCA was analyzed to ensure consistency. We performed independent runs for potential ancestral population numbers (K) ranging from 1 to 10 using the default expectation–maximization algorithm. For each K, five-fold cross-validation (CV = 5) was conducted to calculate the cross-validation error. The optimal K (K_opt) was identified as the value corresponding to the lowest cross-validation error, as visualized in a plot of CV error versus K. This model accurately reflects the population’s true genetic structure, estimates each individual’s ancestral component proportions, quantifies genetic background differences, and reveals patterns of population admixture or genetic differentiation.
2.5.3. Construction of Population Phylogenetic Tree
To visualize evolutionary relatedness among individuals and populations, a phylogenetic tree was constructed. We used TreeBest software (version 1.92) to construct a tree rapidly from molecular sequence data (http://treesoft.sourceforge.net/treebest.shtml).
An inter-individual genetic distance matrix, calculated from the high-quality SNP dataset using the Kimura 2-parameter model (which estimates DNA base substitution rates), served as the input. The neighbor-joining (NJ) method, based on the minimum evolution criterion, was employed to efficiently generate the optimal tree topology by iteratively merging the most closely related taxa. Finally, the branching and clustering patterns of the resulting tree were used to infer evolutionary relationships and validate the population’s genetic grouping.
2.6. Linkage Disequilibrium Analysis
We used PopLDdecay [34] to calculate genome-wide linkage disequilibrium (LD) across all pairs of single-nucleotide polymorphisms (SNPs), using the squared correlation coefficient (r^2^) as the measure. The r^2^ value ranges from 0 to 1, with values closer to 1 indicating stronger genetic linkage and higher LD between loci, and values closer to 0 indicating weaker LD. The coefficient is computed as follows:
where D is the absolute linkage disequilibrium between two loci; and are the frequencies of the major alleles, and and are the frequencies of the minor alleles, with = 1 − and = 1 − .
The decay of r^2^ with increasing physical distance was visualized by plotting LD decay curves. From this analysis, the estimated LD decay distance was 10,000 bp, a key parameter for designing the subsequent GWAS.
2.7. Genome-Wide Association Study
The GWAS was conducted using the Mixed Linear Model (MLM) implemented in GEMMA software (release 0.98.5) [35]. This model incorporates population genetic structure as a fixed effect and individual kinship as a random effect, thereby correcting for their potential influence on association signals. The significance threshold for associations was defined by the Bonferroni correction. For SNPs that showed significant associations, candidate genes were screened within an LD window upstream and downstream of their physical positions and then functionally annotated.
Here, y represents the target phenotypic trait of this study, namely the polled/horned status of yaks, which was recorded as a binary variable. X denotes the design matrix for fixed effects, and α represents the estimated parameters associated with these fixed effects; together, they quantify the influence of population genetic structure on the phenotype, thereby correcting for false positive associations arising from population stratification. Z denotes the design matrix for SNPs, and β denotes the effects of SNPs; these two components characterize the strength of the association between each individual’s genotype at genome-wide SNP loci and the target phenotype, serving as the core indicator for identifying trait-associated loci. W denotes the design matrix for random effects, and μ denotes the predicted random individual-specific effects; collectively, they quantify the impact of individual kinship on the phenotype to mitigate false positive associations induced by kinship. e denotes the random residual error, encompassing the effects of non-genetic factors (e.g., environmental factors, phenotypic measurement errors) that are not accounted for by the model. The significance threshold for association analysis was set using the Bonferroni correction. For SNPs significantly associated with the polled/horned trait, candidate genes were screened within one LD window upstream and downstream of their physical positions and then functionally annotated.
2.8. Construction of Haplotype Maps
Haplotypes at significant association loci were identified based on the physical positions of trait-associated SNPs and the corresponding LD analysis results.
2.9. Data Statistics and Analysis
Statistical analysis was performed on the six morphometric traits defined in Section 2.2: body weight, withers height, body length, chest girth, chest width, and rump width. After verifying the normality (Shapiro–Wilk test) and homogeneity of variances (Levene’s test) for each trait, an independent two-tailed Student’s t-test was used to compare differences between horned (n = 15) and polled (n = 15) female Xueduo yak groups.
3. Results
3.1. Phenotypic Comparison Between Horned and Polled Individuals and Analysis of Body Weight and Morphometric Measurements in Female Xueduo Yak
Typical phenotypes of horned and polled yaks are shown in Figure 1. Horned individuals (Figure 1A,C) exhibit fully developed horn structures, in contrast to polled individuals (Figure 1B,D), which completely lack horn tissue.
Body weight and morphometric measurements of the horned and polled groups of female Xueduo yaks are shown in Table 1.
3.2. Results of SNP Distribution and Functional Annotation
In this study, whole-genome resequencing of 45 yaks yielded 10,142,457 high-quality SNPs which were then annotated and evaluated. The SNPs were widely distributed across the genome, with intergenic regions being the most prevalent (63.32%), followed by intronic regions (33.88%) (Table 2).
3.3. Population Genetic Structure Analysis of the Experimental Yak Population
To comprehensively investigate the genetic background, differentiation patterns, and their association with the horned/polled trait within the experimental population, we employed an integrated analytical approach comprising Admixture analysis (Figure 2A), cross-validation error analysis (Figure 2B), phylogenetic tree construction (Figure 2C), and three-dimensional principal component analysis (Figure 2D).
3.3.1. Analysis of Ancestral Components and the Optimal K Value Using Admixture
To infer the ancestral components of the population, we performed Admixture analysis on the high-quality SNP dataset (Figure 2A). Cross-validation error analysis indicated that the error was minimized when the assumed number of ancestral populations was set to K = 2 (Figure 2B), suggesting that the current population most likely evolved from the admixture of two ancestral genetic components. Under this optimal model (K = 2), both the Ashdan yaks (Z1–Z15) and the polled Xueduo yaks (Z22–Z42) were predominantly composed of Cluster 1 (dark blue), indicating a convergent genetic background. In contrast, the horned Xueduo yaks (Z50–Z69) exhibited a more complex and heterogeneous genetic composition, with both Cluster 1 and Cluster 2 (light blue) present in varying proportions across individuals. These results clearly demonstrate that the polled populations, shaped by directional selection, have formed a subgroup with a more homogeneous genetic background, whereas the horned population retains richer ancestral genetic diversity.
3.3.2. Phylogenetic Relationship Reconstruction
The neighbor-joining phylogenetic tree, constructed from genome-wide SNP distances (Figure 2C), further clarified the evolutionary relationships among the groups. It revealed limited branch intermingling between Xueduo and Ashdan yaks, indicating clear genetic differentiation between these breeds. Within the Xueduo population, although “horned” and “polled” individuals shared some genetic background (as seen in partial branch overlap), they formed two relatively distinct clades. This pattern suggests that despite originating from the same breed, strong artificial selection for the polled trait and phenotype-based mating have driven genomic homogenization within the polled subgroup. As a result, this subgroup has diverged evolutionarily from the more genetically diverse horned group, which retains a genetic background closer to the ancestral state.
3.3.3. Principal Component Spatial Distribution Analysis (3D-PCA)
The three-dimensional principal component analysis (Figure 2D) results provide an intuitive spatial visualization that supports the above findings. As shown in the figure, all polled individuals (including polled Xueduo yaks and Ashdan yaks) are tightly clustered in the PCA plot, indicating that they share a similar genetic composition. In contrast, horned Xueduo individuals are more dispersed in the plot, reflecting higher genetic diversity within this group. These results are consistent with the aforementioned Admixture and phylogenetic tree analyses, collectively revealing that long-term selective breeding for the polled trait has led to a systematic and convergent purification of the population’s genetic background.
3.4. Results of Genome-Wide Association Study
Through genome-wide association analysis of horned and polled Xueduo yak populations (Figure 3A), we detected association signals exceeding the genome-wide significance threshold (–log_10_(P) ≥ 5) on chromosomes 1, 3, and 4. A highly significant association peak (–log_10_(P) = 19.89) was identified on chromosome 1, exceeding the significance threshold (interval: 36,313,286–36,586,879 bp, spanning approximately 273.6 kb). This region falls within a lincRNA domain and contains a total of 1001 significant SNPs. Adjacent to this, we identified three key candidate genes: EPCIP, OLIG1, and PAXBP1. Upon incorporating the Ashdan yak into the joint analysis (Figure 3B), this region on chromosome 1 remained highly significant (interval: 36,279,403–36,586,795 bp, spanning approximately 307.3 kb; −log_10_(P) = 15.84), encompassing 994 significant SNPs and retaining proximity to the two protein-coding genes EPCIP and PAXBP1. QQ plots (Figure 3C,D) reveal that observed p-values deviate significantly from expected values at the tails of the distribution, while aligning closely with theoretical distributions across most intervals. This demonstrates that the association analysis model effectively controls for false positives, confirming that the detected signals originate from genuine genetic effects.
3.5. Identification, Haplotype Analysis, and Validation of Core Variant Clusters for the Polled Trait Using Linkage Inheritance
3.5.1. Haplotype Block Characteristics and Variant Locus Screening in the Core Associated Region
By screening horned and polled yak populations from Xueduo, we identified a 273.6 kb lincRNA region for in-depth analysis, in which 204 indels were detected. Haplotype analysis (Figure 4) revealed a large linkage block (~146.3 kb; 36,367,387–36,513,726 bp) within this region. This block contains 1280 SNPs (including 629 significant SNPs) and 118 indels.
3.5.2. Identification, Quality Control, and Genetic Characteristic Analysis for Cross-Population Shared Core SNPs/INDELs
Through comparative analysis of polled versus horned Xueduo yaks and of Ashdan versus horned Xueduo yaks, followed by integrated validation, we identified 69 SNPs and 6 key indels within the previously mentioned 146.3 kb linked region that were shared by polled Xueduo yaks and Ashdan yaks but absent in horned Xueduo yaks. These variant sites are shared by polled Xueduo yaks and Ashdan yaks but not detected in horned Xueduo yaks, making them potential candidate causal variants for the polled phenotype in yaks. After additional stringent quality control, we finalized 6 core SNPs (QUAL: 3752.93–4158.79; DP: 431–493) and 6 key indels (QUAL: 2900.75–4039.32; DP: 388–469) (Table 3). Genotypic analysis revealed that all of the aforementioned SNP sites were homozygous for the wild-type allele in horned Xueduo yaks, whereas polled Xueduo yaks and Ashdan yaks were predominantly heterozygous (Het) at these loci. Notably, Ashdan yaks, an independently bred breed developed through long-term selective breeding, exhibited a higher degree of homozygosity at these variant sites. This suggests that genetic variation in this region—whether heterozygous or homozygous—may disrupt normal lincRNA function. Notably, these variant sites cluster within a core associated region spanning 36.35–36.43 Mb and show strong linkage. LD analysis (Figure 5) revealed an LD decay distance of approximately 10 kb in this yak population. This rapid LD decay indicates that the marker density used in this study is adequate for fine-mapping association analysis.
4. Discussion
Population genetic structure analyses—including Admixture, phylogenetic tree, and PCA—consistently demonstrated that polled yaks (both Xueduo and Ashdan), subjected to long-term, high-intensity selection, have formed a subgroup with a homogeneous genetic background. Notably, Ashdan yaks, an independently bred breed developed through prolonged selective breeding, exhibit a higher degree of homozygosity at polled-related variant sites. In contrast, horned Xueduo yaks retained greater genetic diversity, exhibiting a more complex genetic composition closer to the ancestral state. This differentiation pattern stems from directional breeding practices, whereby individuals with similar genetic backgrounds were preferentially selected for reproduction, while those with substantial genetic differences were progressively eliminated. Such rapid genetic homogenization driven by intense artificial selection has also been reported in other livestock, such as Holstein cattle [36]. Collectively, this population genetic analysis provides an indispensable theoretical foundation for subsequently identifying key genomic regions controlling the horn phenotype.
GWAS of horned and polled Xueduo yaks revealed a highly significant association peak on chromosome 1 (–log_10_(P) = 19.89). When Ashdan yaks were included in the joint analysis, this genomic region still maintained extremely high significance (–log_10_(P) = 15.84) and was closely adjacent to the EPCIP and PAXBP1 genes. Compared with polled Xueduo yaks, both horned Xueduo yaks and Ashdan yaks demonstrated significant SNP and Indel variations in this core region, indicating that Xueduo and Ashdan yaks share a common genetic basis for the polled trait. Selecting for polledness across these breeds does not significantly increase the genetic complexity of this trait. Notably, similar findings emerged from genome-wide studies on the polled trait across Angus, Hereford, and Limousin cattle, as well as in 376 Brangus, Droughtmaster, and St. Gertrudis cattle (a composite breed of double-humped and humpless cattle) [14]. Analysis of the EPCIP and PAXBP1 genes revealed that EPCIP is not involved in neural crest cell development. GWAS analysis of horned versus polled populations of Datong yaks has successfully identified a 200 kb lincRNA and the PAXBP1 gene. However, the role of the PAXBP1 gene in horn formation, and whether the lincRNA influences PAXBP1 expression via long-range regulatory mechanisms to participate in controlling the horn-presence trait in yaks, remains unexplained [37]. PAXBP1 (PAX7 and PAX3 Binding Protein 1) functions as a linker protein, connecting transcription factors PAX3 and PAX7 to histone methylation mechanisms [38]. PAX3 encodes a crucial transcription factor expressed in neural crest cells [39], while PAX7 is indispensable for regulating neural crest patterning and differentiation [40,41]. Neural crest cells are pluripotent stem cells capable of differentiating into multiple cell types, including craniofacial cartilage, skeletal muscle, and smooth muscle [42]. They share a common cellular origin with the horns of ruminants, and the migration of neural crest cells plays a key role in horn development [43]. Consequently, abnormalities in the PAX3/PAX7 pathway lead to severe craniofacial defects, a conclusion validated in human disorders (e.g., Waardenburg syndrome) and mouse cleft palate models [44,45]. This clearly demonstrates the crucial role of PAXBP1 in neural crest development. Given that neural crest stem cells are the cellular origin of horns [46], we hypothesize that PAXBP1 likely regulates horn formation in ruminants by modulating neural crest cell function. Furthermore, we successfully mapped the key polled phenotype regulator PAXBP1 and its associated lncRNA to the cattle BTA1 region, which is consistent with findings from horn phenotype studies in other cattle breeds [7,16,47]. This not only demonstrates evolutionary conservation in the core genetic pathways governing horn development across ruminants, but also provides direct guidance for polled yak breeding programs.
In the yak genome, intergenic regions harbor the highest proportion of RNA editing sites (38.15%), followed by genic regions (28.38%) [48]. This distribution is consistent with the conclusion that the yak genome is enriched in intergenic regions (accounting for 63.32%) and also confirms the potential functional value of regulatory elements in intergenic regions. Notably, the 146.3 kb strong linkage block, which is within the 273.6 kb lincRNA region on chromosome 1, is the core interval regulating the polled trait in yaks. This block contains 6 core SNPs and 6 key Indels. Genotype distribution analysis shows that horned Xueduo yaks are homozygous for the wild-type allele at all these loci, while polled populations (polled Xueduo yaks and Ashdan yaks) are predominantly heterozygous for these variants. This genetic pattern conserved across breeds not only verifies the stability of the genetic basis underlying the polled trait but also further highlights the important role of the lincRNA region and the core loci it carries in regulating the polled trait. Some research labels these transcripts as lincRNAs, while other sources omit the “intergenic” definition and refer to them simply as lncRNAs [49]. LncRNAs lack protein-coding capability [50,51], and they frequently interact with one or more proteins to regulate gene expression both locally at the transcription site (cis) and at other cellular locations (trans). They participate in biological processes including RNA transcription and splicing, mRNA transport, stability and translation, as well as protein stability and post-translational modifications [52]. Furthermore, lncRNAs can form chromatin loops to interact with target gene promoters outside their immediate vicinity, enabling precise regulation of neighboring protein-coding gene expression [53]. Compared with protein-coding genes, lincRNAs typically exhibit lower expression levels and are more specific to certain tissues or cell types [54]. They participate in diverse biological processes including cell proliferation, morphogenesis, pluripotency, development, neural processes, and gametogenesis [55]; these key non-coding regulators have been extensively implicated in research concerning cancer, tumors, and innate immunity [56,57,58]. Furthermore, the core lincRNA region on chromosome 1 identified in this study is exactly adjacent to the key candidate gene PAXBP1 and exhibits a strong linkage relationship. This spatial distribution characteristic suggests that the lincRNA may affect the function of PAXBP1.
The finding of this study concerning the role of lincRNA in regulating horn formation is not the first of its kind. Aldersey et al. confirmed through Hi-C data analysis that the four polled alleles in cattle (P_C_, P_F_, P_M_, and P_G_) are all enriched within a region of approximately 300 kb on chromosome 1, and each of them resides within distinct topologically associating domains (TADs). Among these, TAD1, which harbors the P_C_ allele, contains two lincRNAs (LOC104970778 and LOC112447121), while the key lincRNA#1 (LOC100848368) is located within TAD2. These lincRNAs exhibit colocalization features with horn development-related genes such as LOC526226 and OLIG2, either within the same TADs or in their immediate genomic vicinity [59]. This implies that lincRNAs may participate in the formation of the polled trait in cattle by regulating the expression of neighboring genes through local chromatin interactions. Furthermore, studies on goat horns have also identified two key structural variants located within intergenic regions on chromosome 1 as major loci influencing the polled phenotype in goats [60]. This further confirms the critical importance of genetic variants in the non-coding region of chromosome 1 in regulating horn development in ruminants. Notably, a regulatory relationship between lincRNA and the PAXBP1 gene has been reported in relevant studies, providing a reference for understanding the functional connection between them. Specifically, Wang et al. found that in lead-exposed SH-SY5Y neuroblastoma cells, the expression of the antisense lincRNA of PAXBP1 (PAXBP1-AS1) was significantly downregulated. This lncRNA is known to regulate cell proliferation and apoptosis and may influence the expression or function of PAXBP1 through its antisense action [61]. This provides an important reference for understanding the molecular mechanism by which lincRNA regulates PAXBP1 in yaks. Furthermore, previous studies have shown that fetuses with lincRNA#1 knockout are all born with a polled phenotype [20]. This suggests that lincRNAs likely regulate horn formation by controlling key protein-coding genes, identified in this study as PAXBP1. In horned yaks, the lincRNA gene is intact and functional; it is expressed in embryonic horn bud cells, where it activates or maintains the horn development program, thereby promoting horn formation [16]. In polled yaks, however, mutations in lncRNAs—such as loss-of-function variants (e.g., the 212 kb deletion at the P_C_ locus), gain-of-function variants (e.g., the 80 kb duplication at the PF locus), or clustered indels (e.g., the 146 kb dense indel cluster)—can disrupt the function of the protein-coding genes regulated by these lncRNAs. Our study identifies PAXBP1 as a key gene affected through this mechanism. Such disruption ultimately leads to the polled phenotype.
Notably, this study has certain limitations, including a relatively small sample size and a limited number of included breeds. These constraints hinder the complete elucidation of the regulatory relationship between the identified lincRNA and PAXBP1, as well as the underlying mechanisms by which they influence the horn phenotype in a broader range of yak breeds. To strengthen the research conclusions, future studies should expand the sample size and incorporate additional polled yak breeds (e.g., Gannan yaks, Huanhu yaks, and Tianzhu white yaks) in a multi-breed joint GWAS to verify the specificity and stability of the associated genetic loci. Furthermore, the target lincRNA could be specifically knocked out using CRISPR-Cas9 technology. Combining this with molecular biology techniques such as quantitative real-time PCR (qPCR) and Western blotting would allow for the detection of subsequent changes in PAXBP1 mRNA and protein levels in horn bud tissues. This approach would elucidate the molecular mechanism by which the lincRNA regulates PAXBP1 expression, thereby providing a stronger theoretical foundation and technical basis for marker-assisted selection of polled yaks.
5. Conclusions
This study identified a 273.6-kilobase lincRNA region on chromosome 1 of Xueduo yaks which harbors a cluster of tightly linked genetic variants. This region may regulate horn development by modulating the expression of its adjacent gene, PAXBP1—a core neural crest development gene that is closely linked to the cellular origin of horns in ruminants. In addition, the genetic basis of the polled trait (including key coding genes and a subset of variant loci) is shared between Xueduo yaks and Ashdan yaks, which suggests that the genetic regulatory mechanism of the polled phenotype has a certain degree of conservation between these two yak populations. Collectively, these findings provide a valuable theoretical foundation for the genetic improvement and marker-assisted breeding of polled yaks.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Guo S. Savolainen P. Su J. Zhang Q. Qi D. Zhou J. Zhong Y. Zhao X. Liu J. Origin of mitochondrial DNA diversity of domestic yaks BMC Evol. Biol.200667310.1186/1471-2148-6-7316995938 PMC 1626082 · doi ↗ · pubmed ↗
- 2Schafberg R. Swalve H. The history of breeding for polled cattle Livest. Sci.2015179547010.1016/j.livsci.2015.05.017 · doi ↗
- 3Cozzi G. Gottardo F. Brscic M. Contiero B. Irrgang N. Knierim U. Pentelescu O. Windig J. Mirabito L. Eveillard F.K. Dehorning of cattle in the EU Member States: A quantitative survey of the current practices Livest. Sci.201517941110.1016/j.livsci.2015.05.011 · doi ↗
- 4Miglior F. Fleming A. Malchiodi F. Brito L.F. Martin P. Baes C.F. A 100-Year Review: Identification and genetic selection of economically important traits in dairy cattle J. Dairy Sci.2017100102511027110.3168/jds.2017-1296829153164 · doi ↗ · pubmed ↗
- 5Mc Connachie E. Hötzel M.J. Robbins J.A. Shriver A. Weary D.M. von Keyserlingk M.A.G. Public attitudes towards genetically modified polled cattle P Lo S ONE 201914 e 021654210.1371/journal.pone.021654231075123 PMC 6510451 · doi ↗ · pubmed ↗
- 6Randhawa I.A.S. Mc Gowan M.R. Porto-Neto L.R. Hayes B.J. Lyons R.E. Comparison of Genetic Merit for Weight and Meat Traits between the Polled and Horned Cattle in Multiple Beef Breeds Animals 20211187010.3390/ani 1103087033803763 PMC 8003249 · doi ↗ · pubmed ↗
- 7Georges M. Drinkwater R. King T. Mishra A. Moore S.S. Nielsen D. Sargeant L.S. Sorensen A. Steele M.R. Zhao X. Microsatellite mapping of a gene affecting horn development in Bos taurus Nat. Genet.1993420621010.1038/ng 0693-2068348158 · doi ↗ · pubmed ↗
- 8Harlizius B. Tammen I. Eichler K. Eggen A. Hetzel D.J. New markers on bovine chromosome 1 are closely linked to the polled gene in Simmental and Pinzgauer cattle Mamm. Genome 1997825525710.1007/s 0033599004049096105 · doi ↗ · pubmed ↗
