Genomic Analysis of Indel and SV Reveals Functional and Adaptive Signatures in Hubei Indigenous Cattle Breeds
Liangyu Shi, Pu Zhang, Bo Yu, Lei Cheng, Sha Liu, Qing Liu, Yuan Zhou, Min Xiang, Pengju Zhao, Hongbo Chen

TL;DR
This study explores genetic variations in Hubei cattle breeds, revealing insights into traits like meat quality and disease resistance through indels and structural variants.
Contribution
The study provides a comprehensive analysis of indel and SVs in Hubei cattle, highlighting their functional and adaptive significance.
Findings
Over 5 million indels were identified, many in non-coding regions linked to key traits.
Transposable elements significantly contributed to structural differences in the cattle genome.
A notable insertion in the NOTCH2 gene was validated, suggesting a role in bone remodeling and adaptation.
Abstract
Understanding genetic variation in cattle is essential for taking advantage of economically important traits such as meat quality, reproduction, and disease resistance. While most studies have focused on single nucleotide polymorphisms (SNPs), this study investigated small indel and structural variants (SVs) across five native cattle breeds from Hubei, China. Whole-genome sequencing of 98 individuals identified over 5 million insertions and deletions, many of which were located in non-coding regions but were still associated with key traits. Several variants, particularly in immune gene-rich regions, were linked to health and meat quality. Our analysis also revealed that transposable elements and simple repeats significantly contributed to these structural differences. A notable insertion in the NOTCH2 gene, which plays a role in bone remodeling by promoting osteoclast maturation and…
Funding: Key R&D Project of the Department of Science and Technology of Hubei Province
Topics: Genetic and phenotypic traits in livestock · Genomics and Phylogenetic Studies · Chromosomal and Genetic Variations
1. Introduction
Cattle are essential to rural livelihoods for meat and dairy production, as well as trade worldwide [1,2]. Indigenous cattle breeds are important for genetic resource conservation due to their unique adaptations to local environmental conditions, including disease resistance and environmental adaptation [3,4,5]. Characterizing and conserving these breeds is crucial for understanding their genetic potential and improving livestock production.
Traditionally, genetic research on cattle has focused on single nucleotide polymorphisms (SNPs) to provide insights into the genetic control of production traits [6,7,8], meat quality [9,10], and disease resistance [11]. However, small insertion-deletions (indels) and structural variants (SVs) also significantly affect phenotypes. Indels and SVs can influence gene dosage, disrupt coding sequences, or modify regulatory regions, thereby affecting gene expression and contributing to various phenotypes [12,13,14]. Moreover, compared to SNPs, indels and SVs affect more base pairs in the genome [15,16]. Indels and SVs in immune-related genes, including those in the Jak-STAT and Toll-like receptor pathways, enhance parasite and pathogen resistance [17,18]. Additionally, SVs correlate with ecological gradients such as altitude, temperature, and dry climates, influencing heat tolerance, thermoregulation, and drought resilience [19,20,21]. More importantly, indigenous breeds harbor rare SVs that are mostly absent in commercial breeds, serving as critical reservoirs of adaptive diversity [22,23].
Beyond coding regions, indels and SVs frequently intersect with gene regulatory elements (REs) [24,25,26], thereby modulating gene regulation and splicing. Additionally, transposable elements (TEs), including long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs), contribute to structural rearrangements by creating novel regulatory sites or disrupting existing ones [27,28]. TEs contribute to insertions and deletions and have shaped the evolution of ruminant interferon (IFN) responses, potentially influencing immune gene regulatory differences across modern breeds [29]. The Bov-tA1 TE has been implicated in immune response and adaptation in global cattle populations [30]. However, explicit analyses of their role in adaptation are limited.
The five Hubei indigenous breeds, situated in the center of China, display comparable production characteristics and overlapping distributions, with minor phenotypic and genetic divergence reported [31]. This study characterized the distribution of indels and SVs across five Hubei indigenous cattle breeds. We identified variation hotspots and explored their functional associations. By annotating the genome, we cataloged indels and SVs, mapped their distribution, and analyzed overlaps with gene structure, QTL, and REs. We further investigated TE-mediated changes and assessed genetic differentiation among these breeds. Our findings reveal genetic differences among Hubei indigenous cattle breeds, which may influence phenotypic traits and local adaptations.
2. Materials and Methods
2.1. Sample Collection, Genomic Resequencing Read Filtering and Alignment
Ear tissue samples were collected from 80 cattle representing four breeds from Hubei, including Dabieshan (n = 28), Wuling (n = 14), Yiling (n = 20), and Yunba (n = 18). The sampled animals were aged between 4 and 60 months. Additionally, sequencing data for the Zaobei breed (n = 18) were obtained from a previously published study [32]. All samples were sourced from five core breeding farms in Hubei Province. For each sample, paired-end sequencing libraries were prepared with an average insert size of 500 bp and a read length of 150 bp. High-throughput sequencing was performed using the BGI MGI-T7 platform (MGI Tech Co., Ltd., Shenzhen, China).
Raw sequencing reads underwent quality control using Trimmomatic (v0.39) [33] to remove adapter sequences and low-quality bases, retaining only reads longer than 50 bp with sufficient quality. The filtered reads were then aligned to the Bos taurus reference genome (ARS-UCD1.3; GCA_002263795.3) using BWA-MEM (v0.7.17) [34]. The aligned reads were sorted and indexed with Samtools (v1.10) [35], and duplicate reads were marked using GATK MarkDuplicates (v4.1.4.1) [36].
2.2. Variant Calling and Filtering
Variant calling included the detection of single nucleotide polymorphisms (SNPs), insertions (INSs), and deletions (DELs). The INSs and DELs comprise small indels and structural variants (SVs). All identified indels and SVs were categorized as deletions or insertions and further classified by size: Small (1–10 bp), Medium (11–50 bp), and Large (>50 bp) [37,38].
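The size classification can be sketched as a small function. This is an illustration only, not the authors' code: the function name is hypothetical, the thresholds are taken as 1–10 bp, 11–50 bp, and >50 bp, and variant length is assumed to be the difference between the REF and ALT allele lengths of a biallelic VCF record.

```python
def size_class(ref: str, alt: str) -> tuple[str, str]:
    """Return (variant type, size class) for a biallelic insertion or deletion.

    Assumption: length is the absolute difference between the REF and ALT
    allele lengths, as written in a VCF record.
    """
    length = abs(len(alt) - len(ref))
    if length == 0:
        raise ValueError("REF and ALT have equal length: not an INS or DEL")
    kind = "INS" if len(alt) > len(ref) else "DEL"
    if length <= 10:
        cls = "small"
    elif length <= 50:
        cls = "medium"
    else:
        cls = "large"
    return kind, cls
```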
SNP and indel calling was performed using HaplotypeCaller [39] in GATK to generate GVCF files for each sample. SNPs and indels were extracted separately using GATK SelectVariants and subjected to quality filtering with GATK VariantFiltration. SNPs were filtered out if they met any of the following criteria: QualByDepth (QD) < 2.0; Quality (QUAL) < 30.0; StrandOddsRatio (SOR) > 3.0; FisherStrand (FS) > 60.0; RMSMappingQuality (MQ) < 40.0; MappingQualityRankSumTest (MQRankSum) < −12.5; or ReadPosRankSumTest (ReadPosRankSum) < −8.0. Indels were filtered out if QD < 2.0, QUAL < 30.0, FS > 200.0, or ReadPosRankSum < −20.0. Only biallelic variants with a missing genotype rate < 0.1 were retained using Bcftools (v1.10.2). Additionally, if an indel was within 10 bp of another indel, the one with the lower QUAL score was removed [40]. These filters were implemented using a custom R script.
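The thresholds above can be written down as data, which makes it easy to see which annotation caused a variant to fail. The sketch below is an illustration in Python, not the authors' R script or the GATK implementation. The function names are hypothetical, annotations absent from a record are assumed not to trigger a failure, and the greedy left-to-right handling of neighbouring indels is only one possible reading of the 10 bp rule.

```python
import operator

# (annotation, comparison, threshold): a variant fails when
# comparison(value, threshold) is true.
SNP_RULES = [
    ("QD", operator.lt, 2.0),
    ("QUAL", operator.lt, 30.0),
    ("SOR", operator.gt, 3.0),
    ("FS", operator.gt, 60.0),
    ("MQ", operator.lt, 40.0),
    ("MQRankSum", operator.lt, -12.5),
    ("ReadPosRankSum", operator.lt, -8.0),
]
INDEL_RULES = [
    ("QD", operator.lt, 2.0),
    ("QUAL", operator.lt, 30.0),
    ("FS", operator.gt, 200.0),
    ("ReadPosRankSum", operator.lt, -20.0),
]


def failed_filters(annotations, rules):
    """Names of the annotations on which a variant fails.

    Assumption: an annotation missing from the record is skipped.
    """
    return [
        name
        for name, fails, threshold in rules
        if name in annotations and fails(annotations[name], threshold)
    ]


def drop_close_indels(indels, max_gap=10):
    """Thin indels on one chromosome given as (position, QUAL) pairs.

    Greedy sketch: walking left to right, when an indel lies within
    max_gap bp of the last kept indel, only the higher-QUAL one is kept.
    """
    kept = []
    for pos, qual in sorted(indels):
        if kept and pos - kept[-1][0] <= max_gap:
            if qual > kept[-1][1]:
                kept[-1] = (pos, qual)
        else:
            kept.append((pos, qual))
    return kept
```

For example, `failed_filters({"QD": 1.5, "QUAL": 50.0, "FS": 70.0}, SNP_RULES)` reports the QD and FS failures, whereas the same record checked against `INDEL_RULES` fails on QD only because the indel FS threshold is more permissive.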
SVs were detected using a graph-based genotyping strategy (Figure 1). Four software tools were applied with default parameters: Manta (v1.6.0) [41], Delly (v1.3.1) [42], Wham (v1.7.0) [43], and Smoove (v0.2.8) (https://github.com/brentp/smoove/, accessed on 20 April 2025). Only deletions and insertions were identified. SVs of the same type with an overlap greater than 50 bp were merged using SURVIVOR [44]. The candidate SVs were then genotyped with vg software [45,46,47] for each sample, and further filtering was applied to retain only those with a missing rate below 30% and a minor allele frequency (MAF) greater than 0.01 by Vcftools (v0.1.17) [48].
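The final site-level filter (missing rate below 30%, MAF above 0.01) can be illustrated as follows. This is a minimal sketch, not the Vcftools implementation: genotypes are assumed to be diploid and already converted to ALT allele counts, and the function names are hypothetical.

```python
def missing_rate_and_maf(alt_counts):
    """Missing rate and minor allele frequency for one SV site.

    alt_counts: per-sample ALT allele counts (0, 1 or 2), None if missing.
    """
    called = [g for g in alt_counts if g is not None]
    missing_rate = (len(alt_counts) - len(called)) / len(alt_counts)
    if not called:
        return missing_rate, 0.0
    alt_freq = sum(called) / (2 * len(called))
    return missing_rate, min(alt_freq, 1.0 - alt_freq)


def keep_sv(alt_counts, max_missing=0.30, min_maf=0.01):
    """True if the site passes both the missing-rate and MAF thresholds."""
    missing_rate, maf = missing_rate_and_maf(alt_counts)
    return missing_rate < max_missing and maf > min_maf
```

With 98 fully genotyped animals, a single heterozygous carrier gives a MAF of 1/196 (about 0.005), so under these thresholds an SV needs at least two ALT alleles in the sample to be retained.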
2.3. Identification of Insertions and Deletions Hotspots
For insertions and deletions, chromosomes were divided into non-overlapping 100 Kb bins [49]. Bins whose breakpoint counts ranked in the top 1% were classified as INS and DEL hotspots.
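The binning step can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the pipeline used in the study: breakpoints are given as (chromosome, 0-based position) pairs, only bins containing at least one breakpoint are ranked, ties at the cutoff are broken arbitrarily, and the function name is hypothetical.

```python
import math
from collections import Counter


def hotspot_bins(breakpoints, bin_size=100_000, top_fraction=0.01):
    """Rank non-overlapping bins by breakpoint count and return the top ones.

    breakpoints: iterable of (chromosome, position) pairs.
    Returns (chromosome, bin start, bin end, count) tuples, highest count first.
    """
    counts = Counter((chrom, pos // bin_size) for chrom, pos in breakpoints)
    n_top = max(1, math.ceil(len(counts) * top_fraction))
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [
        (chrom, index * bin_size, (index + 1) * bin_size, n)
        for (chrom, index), n in ranked[:n_top]
    ]
```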
2.4. Identification of Genomic Repetitive Sequences in Hubei Cattle
Genomic repetitive sequences, including transposable elements (TEs) and tandem repeats, play essential roles in genome evolution and function. Annotation of these sequences was performed using RepeatMasker (v4.1.7) (https://www.repeatmasker.org/, accessed on 20 April 2025) with two reference libraries: RepBase (v201880126) [50] and Dfam (v3.8) [51]. Various TE classes were identified, including DNA transposons, long terminal repeat (LTR) retrotransposons, short interspersed nuclear elements (SINEs), and long interspersed nuclear elements (LINEs). To ensure that repetitive sequences were the primary component of an insertion or deletion, only variants in which the TE length accounted for more than 80% of the SV length were considered in the analysis.
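The 80% criterion can be made concrete with a short sketch. It assumes, for illustration only, that the TE annotations for one variant are available as half-open (start, end) intervals in the coordinates of the variant sequence, and that overlapping annotations should be counted once. The function names are hypothetical.

```python
def covered_length(intervals):
    """Total length covered by half-open (start, end) intervals, merging overlaps."""
    total = 0
    current_start = current_end = None
    for start, end in sorted(intervals):
        if current_end is None or start > current_end:
            if current_end is not None:
                total += current_end - current_start
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)
    if current_end is not None:
        total += current_end - current_start
    return total


def is_te_driven(sv_length, te_hits, min_fraction=0.80):
    """True if TE annotations cover more than min_fraction of the variant."""
    return covered_length(te_hits) / sv_length > min_fraction
```

For a 300 bp insertion with TE hits at (0, 200) and (150, 280), the merged coverage is 280 bp (about 93%), so the variant would count as TE-driven; a 100 bp variant with exactly 80 bp of TE sequence would not, because the threshold is strict.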
2.5. Functional Annotation of Deletions and Insertions in Regulatory and Functional Genomic Regions
Variant annotation was performed using ANNOVAR (v2020Jun08) [44]. Variants were classified into six groups: exonic regions and splice sites, noncoding RNA regions, intronic regions, 5′ and 3′ untranslated regions (UTRs), upstream and downstream regulatory regions, and intergenic regions.
To examine the overlap of indels and SVs with QTLs and regulatory elements (REs), 192,336 QTLs were obtained from the Cattle Quantitative Trait Locus Database (Cattle QTLdb) [52]. The RE dataset [53] included regulatory elements across multiple tissues, such as adipose, cerebellum, cortex, hypothalamus, liver, lung, muscle, and spleen.
To evaluate whether INS and DEL variants overlapped with annotated QTLs in Cattle QTLdb and REs, we performed Z-score calculations and permutation tests using the regioneR package (v1.34.0) [54]. A total of 100 permutations were conducted to assess statistical significance.
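The logic of a permutation-based overlap test can be sketched as follows. This is a simplified illustration, not the regioneR implementation: it assumes a single linear genome, places each region uniformly at random while preserving its length, and uses the common add-one convention for the empirical p-value. All function names and the toy coordinates in the usage note are hypothetical.

```python
import random
import statistics


def count_overlaps(regions, features):
    """Number of regions overlapping at least one feature (half-open intervals)."""
    return sum(
        any(start < f_end and f_start < end for f_start, f_end in features)
        for start, end in regions
    )


def permutation_test(regions, features, genome_length, n_perm=100, seed=0):
    """Return (observed overlaps, Z-score, one-sided empirical p-value)."""
    rng = random.Random(seed)
    observed = count_overlaps(regions, features)
    permuted = []
    for _ in range(n_perm):
        shuffled = []
        for start, end in regions:
            length = end - start
            new_start = rng.randrange(genome_length - length + 1)
            shuffled.append((new_start, new_start + length))
        permuted.append(count_overlaps(shuffled, features))
    spread = statistics.stdev(permuted)
    if spread > 0:
        z = (observed - statistics.mean(permuted)) / spread
    else:
        z = float("nan")
    p_value = (sum(c >= observed for c in permuted) + 1) / (n_perm + 1)
    return observed, z, p_value
```

A positive Z-score indicates more overlap than expected under random placement and a negative one indicates depletion. Under the add-one convention, 100 permutations cannot yield a p-value below 1/101 (about 0.0099), which is worth keeping in mind when interpreting significance from a small number of permutations.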
2.6. Functional Annotation of Indels and SVs in Regulatory and Functional Genomic Regions
We performed linkage disequilibrium (LD) analysis using PLINK to evaluate r² between SNPs and indels, as well as SNPs and SVs. Variants were categorized based on r²: high LD (r² ≥ 0.8), medium LD (0.2 ≤ r² < 0.8), and low LD (r² < 0.2). To further explore regulatory associations, we examined the mapping of SNPs to expression quantitative trait loci (eQTL) and splicing quantitative trait loci (sQTL). eQTL and sQTL data were retrieved from the FarmGTEx database [55], which includes expression data from 37 tissues, such as blood, colon, embryo, kidney, leukocytes, lymph nodes, macrophages, mammary gland, multiple muscle subtypes, reproductive tissues, and various other organs.
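The LD categories can be illustrated with a small sketch that computes the squared correlation between two vectors of allele counts and assigns the class. It assumes complete genotype data coded as 0, 1, or 2 copies of the alternate allele. The function names are hypothetical, and the code is not a substitute for PLINK.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two allele-count vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # a monomorphic variant has no defined correlation
    return sxy * sxy / (sxx * syy)


def ld_class(r2):
    """Map an r-squared value to the high / medium / low LD categories."""
    if r2 >= 0.8:
        return "high"
    if r2 >= 0.2:
        return "medium"
    return "low"
```

A variant in the low class is poorly tagged by the paired SNP, so its effect would be largely invisible to SNP-only association analyses.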
2.7. Population Structure Analysis
Principal component analysis (PCA) of SNPs, indels, and SVs was carried out using Plink (v1.90) [56]. To assess the genetic relationship between each pair of breeds, pairwise genetic differentiation (Fst) was estimated using Vcftools (v0.1.17) [48]. For indels of different length classes, a sliding-window approach was used, with a 50 Kb window size and a 20 Kb step. For SVs, Fst based on SV frequencies was calculated per site within each breed pair. The top 1% of genomic regions were identified as potential selective regions.
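The window scan can be sketched as below. Two simplifications are assumed and should be read as such: per-site Fst is computed with a simple two-population heterozygosity formula, (HT − HS)/HT with equal population weights, rather than the Weir and Cockerham estimator that Vcftools reports, and each window is summarized by an unweighted mean. All function names are hypothetical.

```python
import math


def simple_fst(p1, p2):
    """Two-population Fst = (HT - HS) / HT from ALT allele frequencies."""
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)
    if h_total == 0:
        return float("nan")  # site is monomorphic in both populations
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_total - h_within) / h_total


def window_means(site_values, chrom_length, window=50_000, step=20_000):
    """Mean value per sliding window; windows without usable sites are skipped.

    site_values: (position, value) pairs on one chromosome, 0-based positions.
    """
    results = []
    for start in range(0, chrom_length, step):
        end = min(start + window, chrom_length)
        values = [v for pos, v in site_values if start <= pos < end and v == v]
        if values:
            results.append((start, end, sum(values) / len(values)))
    return results


def top_windows(windows, fraction=0.01):
    """Windows whose mean ranks in the top fraction (at least one window)."""
    n_top = max(1, math.ceil(len(windows) * fraction))
    return sorted(windows, key=lambda w: w[2], reverse=True)[:n_top]
```

Because the step (20 Kb) is smaller than the window (50 Kb), neighbouring windows overlap, so a single strongly differentiated indel can place several adjacent windows in the top 1%.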
2.8. Annotation and Enrichment Analysis of Indels and SVs
To investigate the functional enrichment of genes located in hotspots and potential selective regions, GO and KEGG pathway analyses were performed using WebGestalt [57,58] (https://www.webgestalt.org/, accessed on 20 April 2025).
2.9. PCR Validation of the NOTCH2 67 bp Insertion
To validate the presence of the 67 bp insertion identified in the fourth intron of NOTCH2, PCR genotyping was performed using genomic DNA extracted from ear tissue samples of Zaobei, Wuling, and Yunba cattle. A pair of primers flanking the insertion site was designed based on the Bos taurus reference genome (ARS-UCD1.3) (forward primer: ACCTTCCAACCAGCAGTGTA; reverse primer: TGGTTGAAGCATGGCCTCTG). The PCR amplification was carried out in a 10 μL reaction system containing 5 μL Taq DNA polymerase (Takara, Shiga, Japan), 3 μL nuclease-free water, 0.5 μL of each primer, and 1 μL of genomic DNA. The cycling conditions included an initial denaturation at 95 °C for 5 min, followed by 35 cycles of 94 °C for 30 s, 62.8 °C for 30 s, and 72 °C for 1 min, with a final extension at 72 °C for 10 min. The PCR products were separated by 2% agarose gel electrophoresis.
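Because the primers flank the insertion site, the genotype can be read from the band pattern on the gel: the insertion allele yields a product 67 bp longer than the reference allele. The sketch below illustrates the expected bands and checks that the listed reaction components add up to 10 μL. The reference amplicon length is not given in the text, so `ref_amplicon_bp` is a hypothetical parameter, as are the function and genotype labels.

```python
INSERTION_BP = 67

# Reaction components as listed in the protocol (volumes in microlitres).
REACTION_UL = {
    "Taq DNA polymerase": 5.0,
    "nuclease-free water": 3.0,
    "forward primer": 0.5,
    "reverse primer": 0.5,
    "genomic DNA": 1.0,
}


def expected_bands(genotype, ref_amplicon_bp):
    """Expected PCR product sizes (bp) for a genotype at the insertion site.

    ref_amplicon_bp is the product size from the reference allele; its value
    is an assumption supplied by the caller.
    """
    ref = ref_amplicon_bp
    ins = ref_amplicon_bp + INSERTION_BP
    patterns = {
        "ref/ref": [ref],
        "ref/ins": [ref, ins],
        "ins/ins": [ins],
    }
    return patterns[genotype]
```

A heterozygous animal would therefore show two bands 67 bp apart, a difference that a 2% agarose gel is generally suited to resolve for short amplicons.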
3. Results
3.1. Overview of Resequencing Data and Identified Variants in Hubei Indigenous Cattle
A total of 98 cattle from five indigenous breeds in Hubei Province underwent whole-genome resequencing at an average depth of ~20×, ranging from 17.8× to 28.7×. The mapping rate of reads ranged from 97.03% to 99.89%, with an average of 99.72%. The sampled individuals included 25 males and 73 females from five breeds: Dabieshan (n = 28), Wuling (n = 14), Yiling (n = 20), Yunba (n = 18), and Zaobei (n = 18).
After quality control, 31,716,252 SNPs, 5,278,767 indels, and 12,653 SVs were identified. To further investigate the distribution patterns of insertions (INSs) and deletions (DELs), indels and SVs were pooled (5,278,767 + 12,653 = 5,291,420 variants), giving 2,082,604 INSs and 3,208,816 DELs (Figure 2a). Small variants accounted for the majority of both INSs and DELs. The average length of small INSs was 2.10 bp, while small DELs averaged 2.39 bp. Large variants exhibited significantly greater lengths and variation, particularly for deletions, which had an average length of 1027.03 bp, with a maximum length of 87,101 bp (Figure 2b). The number of INSs and DELs decreased rapidly with increasing length, with DELs consistently outnumbering INSs across all length categories (Figure 2c–e).
3.2. Insertions and Deletions Overlap with Genes, Regulatory Elements and QTLs
To assess the genomic distribution of INSs and DELs, all identified variants were annotated by genomic region (Figure 3). In total, 44,844 INSs and 71,197 DELs were detected. Most variants were located in intergenic (67.62–76.12%) and intronic regions (15.69–26.44%), while only a small fraction overlapped with exonic regions (0.60–2.75%), untranslated regions (UTRs) (0.39–0.74%), and upstream/downstream regions (1.58–2.99%) (Figure 3a). INSs and DELs were strongly depleted in coding regions (CDS, exon, gene, and mRNA), with Z-scores ranging from −132.73 to −4.04 (Figure 3b). In contrast, pseudogenes and pseudogenic transcripts showed enrichment (Z-scores: 1.81 to 6.94). Small INSs and DELs displayed depletion in non-coding RNA (ncRNA) regions, with Z-scores of −1.79 and −3.02, respectively.
Overlap analysis between INSs and DELs and the reported QTLs revealed that most detected variants were located within QTL regions. By length, 1.92% of INSs and 1.69% of DELs overlapped with QTLs associated with meat and carcass traits, particularly smaller insertions. This was followed by overlaps with health-related QTLs (1.69%) and milk production traits (0.86%) (Figure 3c). Both INSs and DELs overlapped with QTLs across all major trait categories at rates significantly higher than expected by chance, with notable enrichment in QTLs related to exterior, health, meat and carcass, milk, production, and reproduction traits (Figure 3d). Health QTLs showed the strongest enrichment signals. All length classes of INSs and DELs had positive Z-scores in health QTLs (ranging from 4.99 to 28.08), with small INSs and DELs showing the highest values, indicating strong enrichment in health-related functional regions. In contrast, all variant types showed depletion in exterior, meat and carcass, and reproduction QTLs. For production traits, small and medium INSs showed depletion (Z = −2.87 and −2.16, respectively), large DELs showed weak depletion (Z = −1.83), while large INSs showed enrichment (Z = 2.28). These findings suggest that INSs and DELs may play regulatory roles in the phenotypic expression of these traits.
A total of 42.12 Kb of INSs and 81.71 Kb of DELs overlapped with candidate REs, including 23.64 Kb within gene bodies (23.64 Kb/149.77 Mb, 0.02%) and 59.36 Kb within TSS regions (59.36 Kb/133.48 Mb, 0.04%). These INSs and DELs exhibited similarly low frequencies across different tissues (Supplementary Figure S1).
3.3. Distribution of Insertions and Deletions Hotspots
To characterize the genomic distribution of regions enriched for INSs and DELs, we identified hotspots as genomic regions with a high density of insertions and deletions (Figure 4). A total of 254 hotspots were detected, encompassing 116,040 insertion and deletion variants. The insertions and deletions within these hotspots were most abundant on chromosomes 12, 23, 15, and X, with a clear clustering pattern. By comparing the hotspots with known QTLs, we identified 135 hotspots overlapping with 1594 QTLs. Meat and carcass traits had the highest hotspot count (76 hotspots), including QTLs for traits such as shear force and marbling score.
In the 69.7–72.8 Mb region of chromosome 12, 12,341 insertions and deletions were identified, with annotations for two genes: TUBGCP3 and DCUN1D2. Additionally, ENSBTAG00000026070 was annotated as ncRNA intronic. Two clusters were annotated on chromosome 23, located at 25.6–26.8 Mb and 28.5–30.0 Mb. These regions included two annotated genes: CARMIL1 and OR14J1.
To assess the potential biological implications of these hotspots, we performed GO/KEGG pathway enrichment analyses on genes located within these regions. The analysis identified a total of 70 GO terms and 26 KEGG pathways (FDR < 0.05) (Table S1).
3.4. Repeat-Driven DELs and INSs
We investigated the role of transposable elements (TEs) and simple repeats in INSs and DELs. These TEs may have influenced gene function by altering regulatory elements, disrupting coding sequences, or facilitating genomic rearrangements (Figure 5). No TEs or simple repeats were detected among small INSs and DELs. A total of 41.79% of the large DELs and 45.68% of the large INSs were driven by TEs, mainly located in intergenic regions (Figure 5a).
A total of 2.20% of the large and medium DELs and 2.92% of the large and medium INSs were associated with simple repeats. Repeat units of length 2 showed the highest frequency of INSs and DELs, with medium DELs being predominant (n = 4851). Both INS and DEL counts showed a decreasing trend with increasing repeat unit length from 3 to 10. DELs were consistently more frequent than INSs across all repeat lengths.
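To make the notion of repeat unit length concrete, the sketch below finds the shortest unit whose exact tandem repetition reproduces an inserted or deleted sequence. It is an illustration only: the repeat annotation in the study came from RepeatMasker, which also tolerates imperfect repeats, and the function name is hypothetical.

```python
def repeat_unit(sequence):
    """Shortest unit whose exact tandem repetition equals the sequence.

    Returns the whole sequence when it is not a perfect tandem repeat.
    """
    n = len(sequence)
    for k in range(1, n + 1):
        if n % k == 0 and sequence[:k] * (n // k) == sequence:
            return sequence[:k]
    return sequence
```

A 12 bp deletion of `ACACACACACAC` has the unit `AC`, so it would fall in the repeat unit length 2 class.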
LINE and SINE elements were the predominant TE categories, with LINE elements showing the highest frequency. LINE/L1 and SINE/Core-RTE elements were more frequently observed in the 25–50 bp range, likely due to the higher abundance of medium-sized INSs and DELs in this category. Notably, SINE/Core-RTE elements showed a distinct peak at 150 bp, with most fragments clustering within the 120–150 bp range. Over 98% of these SINE/Core-RTE elements were identified as BOV-A2.
The majority of these TEs and simple repeats were located in intergenic regions. A total of 3194 genes contained these elements. Among them, PRKG1 had the highest number (23). It was followed by CSMD3 (20), PCDH15 (19), and CTNNA3 (19).
3.5. LD-Tag
A total of 9,041,468 SNPs were found to be in LD with INS and DEL related to eQTLs, and 4,700,300 SNPs were in LD linked to sQTLs. Across both eQTL- and sQTL-linked INSs and DELs, small variants (≤10 bp) represented the majority, whereas large variants (>50 bp) were relatively rare. For eQTL-related variants, only five INSs showed low LD with surrounding SNPs. For sQTL-related variants, 1690 INSs and 488 DELs exhibited low LD.
Tissue-specific patterns were observed for low-LD variants, particularly in reproductive and metabolic tissues (Figure 6). Among eQTL-linked variants, higher proportions of low-LD variants were found in muscle and mammary tissues, while lower proportions were detected in blood and monocytes (Figure 6a). Large variants contributed only 104 pairs of total LD-tagged variants and were primarily found in muscle and uterus. For sQTL-linked variants, large INSs and DELs showed the highest relative proportion in the low-LD group compared to the medium- and high-LD categories. The highest counts of low-LD large variants were observed in conceptus, muscle, and pharyngeal tonsil.
3.6. Population Genetic Differentiation Based on Fst Analysis
Principal component analysis (PCA) based on SNPs, indels, and SVs revealed that Dabieshan cattle were the most genetically distinct among the five Hubei indigenous breeds (Supplementary Figure S2). To further investigate population differentiation, pairwise Fst values were calculated using small, medium, and large INSs and DELs (Supplementary Figures S3–S5). Among the comparisons, the Dabieshan vs. Wuling pair exhibited the highest Fst values. Overall, the mean pairwise Fst values indicated low genetic differentiation among the five breeds (Supplementary Figure S6). However, Wuling cattle consistently exhibited slightly higher levels of differentiation from the other breeds. Fst values for small indels ranged from 0.0040 (Yiling vs. Zaobei) to 0.0323 (Dabieshan vs. Wuling), medium indels from 0.0038 to 0.0296, and large indels from 0.0009 to 0.0208. Across all size ranges, the highest differentiation consistently occurred between Dabieshan and Wuling.
In general, large INSs and DELs showed higher Fst values compared to medium and small variants. When comparing Wuling cattle to the other breeds, larger variants tended to result in elevated Fst values. Among all breed comparisons, the Dabieshan vs. Wuling contrast yielded the highest Fst values across all INSs and DELs classes, indicating substantial genetic divergence between these two populations. Wuling cattle also showed differentiation from Yunba and Yiling breeds, whereas its comparison with Zaobei cattle resulted in relatively lower, but still noticeable, levels of genetic divergence.
To explore potential regions under selection, we identified genes located within the top 1% of Fst windows for small and medium INSs and DELs, as well as the top 1% of Fst sites for large INSs and DELs across different size classes (Table 1). Several genes were shared across multiple comparisons. For instance, UBXN2B was identified in both the Wuling vs. Yunba and Wuling vs. Yiling comparisons, while RUNX1 appeared in both the Wuling vs. Dabieshan and Wuling vs. Zaobei comparisons. Notably, the Wuling vs. Zaobei comparison yielded the largest number of shared genes.
3.7. NOTCH2 Gene
In the Fst analysis across multiple Hubei indigenous cattle populations, a significant differentiation signal was detected in the NOTCH2 gene region. A 67 bp INS located in the fourth intron of NOTCH2 showed high genetic differentiation between Zaobei and Wuling cattle. Notably, the INS was identified as a LINE/L1-derived element. This gene was also detected among the large-variant Fst outlier regions when comparing Zaobei cattle with Yunba. The insertion was present on both chromosomes (homozygous) in Zaobei cattle but appeared as a single-copy (heterozygous) insertion in Wuling and Yunba (Figure 7). To validate this variant, PCR primers were designed to flank the insertion site, and genotyping was performed across individuals from the three populations (Supplementary Figure S7). These patterns suggest that this insertion represents a population-specific variant in NOTCH2, potentially shaped by local adaptation or historical selection pressures.
4. Discussion
Structural variants and small indels are increasingly recognized as significant contributors to genetic diversity and phenotypic variation in livestock, including traits such as disease resistance and growth [59]. For example, a 108-bp insertion in SPN was linked to tuberculosis resistance in East Asian breeds [20]. Our study provides a detailed characterization of INSs and DELs in five indigenous cattle breeds in Hubei. Indigenous cattle breeds are crucial genetic reservoirs, harboring unique variations associated with adaptation to local environmental stressors such as disease challenges, climatic extremes, and resource limitations. Our analysis offers insights into the importance of these variants in shaping genetic diversity and environmental adaptation. Dabieshan cattle exhibited the highest indel frequency, predominantly deletions. As a representative Chinese indigenous breed, Dabieshan cattle inhabit the surrounding areas of the Dabie Mountains and the middle and lower reaches of the Yangtze River [60]. These elevated mutation counts might reflect specific adaptive responses to local environmental pressures, given that Dabieshan cattle are widespread across diverse geographical regions including mountainous areas and riverine environments. These unique adaptive pressures likely drive breed-specific evolutionary dynamics.
The distribution of INSs and DELs reflects strong purifying selection, as shown by their depletion in coding regions, likely due to selective pressure against disruptive mutations in essential genes [61,62]. In contrast, their enrichment in pseudogenes and pseudogenic transcripts reflects a possible role in driving pseudogenization [63]. Many pseudogenes originate from INSs and DELs that disrupt gene function [64]. Processed pseudogenes originate from mRNA that lacks regulatory elements, making them nonfunctional from the start [65]. These elements accumulate INSs and DELs faster than functional genes [66], highlighting the role of structural variants in gene inactivation. Similarly, INSs and DELs occurred at low frequencies in regulatory elements (REs), likely due to evolutionary constraints on transcription factor binding site (TFBS) spacing and motif arrangement. Compensatory mechanisms such as enhancer redundancy and TFBS turnover help maintain regulatory function despite sequence variation [67,68,69,70].
Trait-specific patterns of enrichment further support the role of INSs and DELs. Health QTLs showed consistent enrichment, especially for small and medium variants, suggesting a potential regulatory role in complex, multifactorial health traits [71]. QTLs associated with reproduction, milk production, and other economically important traits showed depletion, indicating stronger purifying selection in these regions to preserve essential functions [72,73].
The high frequency of INSs and DELs observed in dinucleotide repeats (repeat length = 2) is likely due to replication slippage, a common mechanism in short tandem repeats that promotes strand misalignment during DNA replication [74,75]. In contrast, longer repeat units (3~10 bp) exhibit increased sequence stability and are less prone to such slippage events [76]. Additionally, mismatch repair systems may more effectively recognize and correct errors in longer, more complex repeats [77]. Transposon insertions can disrupt gene function, alter gene expression, and induce chromosomal rearrangements [28]. These effects contribute to genome evolution by introducing genetic variability and structural changes [78]. Genomic hotspot analyses identified chromosomes 12, 23, 15, and X as enriched regions for INSs and DELs, with meat and carcass traits showing the strongest overlap between hotspots and QTLs. In particular, shear force and marbling score accounted for 18 and 14 hotspots, respectively, emphasizing the selective importance of these traits in Hubei beef cattle [79,80].
Variation in body conformation, reproductive performance, and immune regulation in Hubei cattle appears to be interconnected through overlapping genetic pathways. The insertions and deletions identified in this study are concentrated in growth-related genes such as TUBGCP3 [81,82], CTNNA3 [83,84,85,86], and CSMD3 [87,88]. A suite of growth- and immune-related genes further modulates reproductive traits. UBXN2B overlaps QTLs for carcass weight, intramuscular fat deposition, and age at first calving, as shown by QTL [89,90] and CNV analyses [91]. Moreover, immune-related genes, including those in the MHC region such as OR14J1, contribute to immune-reproductive interactions [92]. Functional enrichment analyses highlight key pathways, namely MHC class II complex assembly, peptide antigen binding, and T-cell differentiation, all of which are critical for embryo implantation, immune tolerance, and pregnancy maintenance. These findings underscore the complex genetic regulation of reproductive traits in cattle. Autoimmune-related pathways, such as systemic lupus erythematosus [93] and type 1 diabetes [94], can disrupt reproductive outcomes by causing immune and endocrine imbalances, potentially increasing the risk of miscarriage and pregnancy complications. The superior immune characteristics of Hubei indigenous cattle are essential for their resilience to local disease challenges. A CNV in DCUN1D2 is associated with disease resistance [95]. CARMIL1 plays a role in immune modulation, influencing IL-1-mediated ERK activation [96] and impacting neuroimmune interactions [97].
SVs and small indels that overlap coding exons, promoters, or annotated QTLs represent promising genomic markers for breed identification and selection in indigenous Hubei cattle. This study presents SVs and small indels across five indigenous breeds, providing new insights into genetic diversity. Many polymorphisms are located in loci related to immunity, reproduction, and carcass traits, offering hypotheses for potential trait-associated mechanisms. However, the functional interpretation remains preliminary. Moderate sample sizes per breed limit statistical power. Short-read sequencing may fail to detect complex or repetitive structural variants. In addition, the lack of matched transcriptomic or chromatin-accessibility data limits our ability to infer regulatory impacts in non-coding regions. As a result, many candidate variants are located in intergenic regions, where their phenotypic effects are likely context-dependent and difficult to detect without integrative data. Future studies should combine long-read sequencing and multi-omics integration. Functional validation approaches such as genome editing will also be essential to confirm causality and identify truly breed-specific loci. Despite current limitations, the dataset presented offers a valuable genomic resource that will support the dissection of adaptive variation and promote precision breeding strategies in Chinese indigenous cattle.
5. Conclusions
Genome-wide investigation into insertions and deletions in Hubei indigenous cattle provides insights into adaptation and genetic diversity. We identified 3,208,816 deletions and 2,082,604 insertions across five breeds, revealing hotspots in regions enriched with immune-related genes and pathways. Transposable elements were common and may contribute to local adaptation. Insertions and deletions were associated with traits such as meat quality, disease resistance, and reproduction. Smaller variants were linked to appearance and health, while larger variants were enriched in production-related regions. The NOTCH2 gene showed high population differentiation and is a potential candidate for adaptation in immune and reproductive pathways. These findings provide valuable genomic resources that can support future breeding strategies to improve livestock productivity and environmental adaptation.
References
1. Gilbert M., Nicolas G., Cinardi G., Van Boeckel T.P., Vanwambeke S.O., Wint G.R.W., Robinson T.P. Global distribution data for cattle, buffaloes, horses, sheep, goats, pigs, chickens and ducks in 2010. Sci. Data 2018, 5, 180227. doi:10.1038/sdata.2018.227. PMID: 30375994. PMCID: PMC6207061.
2. Latawiec A.E., Strassburg B.B., Valentim J.F., Ramos F., Alves-Pinto H.N. Intensification of cattle ranching production systems: Socioeconomic and environmental synergies and risks in Brazil. Animal 2014, 8, 1255–1263. doi:10.1017/S1751731114001566. PMID: 26263189.
3. Kim K., Kwon T., Dessie T., Yoo D., Mwai O.A., Jang J., Sung S., Lee S., Salim B., Jung J. The mosaic genome of indigenous African cattle as a unique genetic resource for African pastoralism. Nat. Genet. 2020, 52, 1099–1110. doi:10.1038/s41588-020-0694-2. PMID: 32989325.
4. Guan X., Xiang W., Qu K., Ahmed Z., Liu J., Cai M., Zhang J., Chen N., Lei C., Huang B. Whole genome insights into genetic diversity, introgression, and adaptation of Yunnan indigenous cattle of Southwestern China. BMC Genom. 2025, 26, 216. doi:10.1186/s12864-024-11033-3. PMID: 40038604. PMCID: PMC11881512.
5. Buggiotti L., Yurchenko A.A., Yudin N.S., Vander Jagt C.J., Vorobieva N.V., Kusliy M.A., Vasiliev S.K., Rodionov A.N., Boronetskaya O.I., Zinovieva N.A. Demographic History, Adaptation, and NRAP Convergent Evolution at Amino Acid Residue 100 in the World Northernmost Cattle from Siberia. Mol. Biol. Evol. 2021, 38, 3093–3110. doi:10.1093/molbev/msab078. PMID: 33784744. PMCID: PMC8321547.
6. Gualdron Duarte J.L., Yuan C., Gori A.S., Moreira G.C.M., Takeda H., Coppieters W., Charlier C., Georges M., Druet T. Sequenced-based GWAS for linear classification traits in Belgian Blue beef cattle reveals new coding variants in genes regulating body size in mammals. Genet. Sel. Evol. 2023, 55, 83. doi:10.1186/s12711-023-00857-4. PMID: 38017417. PMCID: PMC10683324.
7. Niu Q., Zhang T., Xu L., Wang T., Wang Z., Zhu B., Zhang L., Gao H., Song J., Li J. Integration of selection signatures and multi-trait GWAS reveals polygenic genetic architecture of carcass traits in beef cattle. Genomics 2021, 113, 3325–3336. doi:10.1016/j.ygeno.2021.07.025. PMID: 34314829.
8. Sanchez M.P., Tribout T., Kadri N.K., Chitneedi P.K., Maak S., Hoze C., Boussaha M., Croiseau P., Philippe R., Spengeler M. Sequence-based GWAS meta-analyses for beef production traits. Genet. Sel. Evol. 2023, 55, 70. doi:10.1186/s12711-023-00848-5. PMID: 37828440. PMCID: PMC10568825.
