Genomics Insights Into High‐Latitude Adaptation of Tibetan Macaques
Rusong Zhang, Ying Hu, Yang Teng, Jiwei Qi, Yangzhen Ciren, Lin Zhang, Qiao Du, Wencai Xu, Liang Zhou, Zhenxin Fan, Jinchuan Xing, Ming Li, Jing Li

TL;DR
Tibetan macaques have evolved shortened tails and increased fat storage to survive in cold, high-latitude regions, with genetic evidence pointing to specific mutations and adaptations.
Contribution
Discovery of a TBX6 mutation linked to tail morphology and genomic evidence of lipid metabolism adaptations in Tibetan macaques.
Findings
A homozygous Pro71Thr mutation in TBX6 causes reduced caudal vertebrae in Tibetan macaques.
Genomic evidence shows enhanced lipid metabolism and fat storage adaptations in Tibetan macaques.
Tibetan macaques have 9.3-fold more abdominal fat than rhesus macaques, aiding thermoregulation.
Abstract
Few nonhuman primates inhabit high‐latitude regions that pose significant adaptive challenges. The Tibetan macaque (Macaca thibetana) represents a rare primate species entirely distributed north of the Tropic of Cancer. To investigate the genetic basis underlying its adaptation to high latitudes, we generated a refined Tibetan macaque reference genome (99.41% completeness). Genomic analyses identified a species‐specific homozygous mutation (Pro71Thr) in the TBX6 gene, which potentially explains their characteristic shortened tail morphology. Functional validation using CRISPR‐Cas9‐edited mice demonstrated that this mutation reduces caudal vertebrae count, providing a mechanistic basis for the shortened tail. Quantitative CT revealed that Tibetan macaques accumulated approximately 9.3‐fold more abdominal fat than rhesus macaques. Genomic analysis uncovered enhanced lipid metabolic…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
FIGURE 1
FIGURE 2
FIGURE 3
FIGURE 4| Tibetan macaque (old) | Tibetan macaque (new) | Rhesus macaque | Crab‐eating macaque | Pig‐tailed macaque | |
|---|---|---|---|---|---|
| BUSCO (protein mode) | 81.8% | 99.4% | 99.6% | 98.6% | 99.0% |
| Average gene length (bp) | 39883 | 69527 | 42086 | 46811 | 45893 |
| Average CDS length (bp) | 1505 | 2007 | 2094 | 2237 | 2037 |
| transcripts per gene | 1 | 2.57 | 2.62 | 2.97 | 2.65 |
| exons per transcript | 8.53 | 11.71 | 11.84 | 12.92 | 10.94 |
- —National Natural Science Foundation of China10.13039/501100001809
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsPrimate Behavior and Ecology · Retinoids in leukemia and cellular processes · Adipose Tissue and Metabolism
Introduction
1
Mammals inhabiting high latitudes encounter distinct environmental challenges relative to their low‐latitude counterparts, primarily prolonged cold stress and pronounced seasonal resource scarcity. These critical ecological factors impose significant selective pressures, driving the evolution of diverse morphological, physiological, behavioral, and genetic adaptations to mitigate cold exposure and seasonal constraints. Following the Bergmann's rule [1] and Allen's rule [2], the morphological adaptation of mammals at high latitudes tends to develop larger body sizes and shorter appendages or tails, enhancing thermoregulatory efficiency and minimizing heat loss. Moreover, thick fur is also favored by high latitude mammals for effective heat retention. Such physiological and genetic changes are vital for mammalian success in high‐latitude environments. For example, the polar bear (Ursus maritimus), an archetypal high‐latitude mammal, stores extensive adipose in its body (constituting up to 50% of body mass) for energy storage or thermoregulation [3, 4]. Correspondingly, genes associated with adipose tissue development (e.g., APOB) are under strong positive selection in polar bear genome [5]. In snub‐nosed monkeys (Rhinopithecus spp.), cold high‐latitude environments may have promoted group aggregation from one‐male, multifemale groups into large multilevel societies. Meanwhile, both the positively selected genes and the rapidly evolving genes in snub‐nosed monkey genome were significantly enriched in Gene Ontology terms and pathways related to lipid metabolism and adipocyte differentiation [6]. In humans, the TRPM8 gene encoding a core receptor for moderate cold perception shows a strong latitudinal cline in allele frequency driven by positive selection, fine‐tuning thermosensation and response to cold environments [7]. Therefore, the molecular mechanisms underpinning high‐latitude adaptation vary substantially among mammals, and their characterization is essential for understanding the evolutionary processes shaping these critical adaptive phenotypes.
Despite the wide distribution, primates as an order exhibit comparatively lower cold tolerance than many other mammalian orders, such as Carnivora, bats, and rodents. Although primates were historically widespread and diverse at high latitudes, reaching approximately 50°N in North America and Europe during the Eocene, their distributions dramatically retreated due to global climatic events, glacial cycles, and geological changes [8]. Extant nonhuman primates are predominantly associated with tropical or subtropical climates, with most taxa confined to regions between 30°N and 30°S [8, 9, 10]. In contrast, the Tibetan macaque (Macaca thibetana), an endemic species to China, occupies relatively high‐latitude montane forests spanning 23°48'N to 33°00'N from southwestern to southeastern China [11]. Within Macaca, only the Japanese macaque (M. fuscata) and a subspecies of rhesus macaque (M. mulatta tcheliensis) surpass the Tibetan macaque in latitudinal distribution (Figure 1A). Notably, the Tibetan macaque is one of the few primate species distributed entirely north of the Tropic of Cancer. This biogeographic specialization raises a question: through what mechanisms has this species adapted to colder environments at higher latitudes?
Distribution map of macaque species and phylogenetic relationships among macaque species. (A) Distribution map of macaque species. (B) Field photograph of a Tibetan macaque (Macaca thibetana). (C) Phylogenetic tree constructed for 14 primates. Numbers near branches indicate estimated divergence times, with 95% confidence interval of divergence time in parentheses. Gene family significant expansion (+) and significant contraction (−) are shown in green and red above branches, respectively. The adult male body mass, relative tail length, and caudal vertebrae number for each species are annotated adjacent to the corresponding scientific names and macaque illustrations. All macaque illustrations copyright 2013 Stephen D. Nash / IUCN SSC Primate Specialist Group. Used with permission.
Tibetan macaques (Figure 1B) exhibit morphological adaptations characterized by a shortened tail relative to other Macaca species, consistent with Allen's rule. The average tail length is only 8.5 cm, with a relative tail length (tail length/head‐body length) of approximately 12%. This is significantly shorter than that of the closely‐related Assamese macaque (M. assamensis) (38%), and distinct from the M. radiata and M. sinica, which exhibit relative tail lengths of 108% and 124%, respectively (Figure 1C; Table S1). Anatomical investigations reveal that M. thibetana possesses only 12 caudal vertebrae, significantly fewer than the 26 found in the M. fascicularis and M. radiata (Figure 1C; Table S1). Other phenotypic adaptations in M. thibetana include its large body size and thick fur even covering its face (Figure 1B). Adult males average over 15 kg, with maximum weights exceeding 30 kg, making M. thibetana the largest species in the genus (Figure 1C; Table S1) [11, 12, 13]. In contrast, male M. fascicularis, distributed exclusively in tropical regions, average only 5.4 kg [11, 12]. Despite such distinct morphological adaptions, the physiological and genetic adaptations associated with high‐latitude environment and the evolutionary history during which the Tibetan macaques adapted to such environment remains poorly understood.
Numerous studies have been carried out on the ecology and behaviors in M. thibetana [13, 14, 15]. In addition, we previously reported the first Tibetan macaque genome based on the next‐generation technology [16] and a chromosome‐level genome using PacBio long‐read sequencing and Hi‐C technology [17]. Although the reported genome showed high assemble quality, annotation of the genome was less comprehensive, with a BUSCO (Benchmarking Universal Single‐Copy Orthologs) score of only 81.8% (Table 1). This knowledge gap significantly hinders our ability to uncover the molecular mechanisms underlying the adaption of M. thibetana to high‐latitude environments and our understanding of its population evolutionary history. In this study, we present a more complete functional annotation of the Tibetan macaque genome. Combining comparative genomics and population genomics analysis, we aim to 1) characterize the metabolic adaptation and genetic adaptation in Tibetan macaque, 2) identify genetic mutations associated with high‐latitude environments and validate functional effects of candidate adaptive mutation, 3) investigate evolutionary history and assess the genetic diversity based on population genome resequencing data from different distributions of Tibetan macaque.
Results
2
Tibetan Macaque Genome Refinement and Phylogeny
2.1
We annotated a total of 29,108 genes using the NCBI Eukaryotic Genome Annotation Pipeline. Compared to our prior release [17], the average gene length has significantly increased from 39,883 bp to 69,527 bp (Table 1, S2 and S3). The average coding sequence (CDS) length of the 21,402 protein‐coding genes has increased from 1,505 bp to 2,007 bp. Alternative splicing predictions increased the average transcripts per gene from 1 to 2.57, and the exons per transcript increased from 8.53 to 11.71 (Table 1). 21,319 genes (99.61%) are located on specific chromosomes and only 83 genes (0.39%) are on unplaced scaffolds (Table S4). Of the 13,780 primate orthologous genes, 13,699 (99.41%) were identified to be complete BUSCO genes, with only 0.44% BUSCO genes missing and 0.15% were partial BUSCO genes. The final gene set displays considerable completeness, indicative of high‐quality annotation among sequenced macaque genomes (Table 1; Table S5 and Figures S1 and S2).
We identified a total of 22,053 orthogroups in the 14 primate genomes, containing 9,647 one‐to‐one orthologous genes. The 4‐fold degenerate sites (4d sites) in all one‐to‐one orthologous genes were used to construct a phylogenetic tree, which indicated seven well‐supported species groups in the genus Macaca and Tibetan macaque was closely related to Assamese macaque (M. assamensis). Using MCMCtree analysis under fossil calibrations, Tibetan macaque and Assamese macaque diverged approximately 2.06 Mya (Figure 1C). The phylogenetic relationships were consistent with previous studies based on nuclear genes or resequencing data [18, 19, 20, 21, 22, 23, 24].
Genomic Variations in the Tibetan Macaque Genome
2.2
We identified the expansion/contraction of gene families and the positively selected genes (PSGs) in the Tibetan macaque genome. The analysis revealed 448 significantly expanded gene families (encompassing 1,975 genes) and 161 significantly contracted gene families in Tibetan macaque (Figure 1C). Functional annotation of the significantly expanded gene families revealed they were predominantly enriched in biological pathways associated with energy metabolism, fat generation or storage, and thermoregulation (Figure S3 and Table S6), including Oxidative phosphorylation (mcc00190), Glycolysis / Gluconeogenesis (mcc00010), Thermogenesis (mcc04714), and Non‐alcoholic fatty liver disease (mcc04932). Branch‐site model analysis in PAML v4.8 (FDR Q‐value < 0.05) identified 345 one‐to‐one orthologous genes under lineage‐specific positive selection in the Tibetan macaque. These PSGs were functional enriched in protein binding (GO:0005515), lipid storage (GO:0019915), and negative regulation of bone resorption (GO:0045779) (Figure S4 and Table S7).
Pairwise comparison of the Tibetan macaque genome with genomes of other macaque species identified six classes of structural variations (SVs), including insertions, deletions, contractions, duplications, inversions, and translocations, with the insertions and deletions as the predominant categories of SV (Table S8). The numbers of SV varied significantly among species, with the highest SV counts in Tibetan macaque against M. tonkeana and the lowest SV counts against its closest relative M. assamensis. Iterative integration of cross‐species SV datasets identified 1,175 Tibetan macaque specific SVs (TMSSVs) (1,086 insertions, 86 deletions, 3 translocations) spanning a cumulative genome length of 1,144,511 bp (0.04% of the Tibetan macaque genome). Including more macaque species in the comparison led to a progressive decline in novel TMSSV (Figure S5), suggesting saturation in SV detection in the current 12‐species dataset. Through comparison of SVs overlapping annotated gene regions, we identified 661 TMSSVs distributed across 694 protein‐coding genes, which encompassed 612 insertions, 47 deletions, and 2 translocations. Notably, 672 genes exhibited intronic SVs, while exonic SVs were detected in 22 genes. Functional annotation of protein‐coding genes overlapping TMSSVs revealed that they were mainly enriched in metabolic pathways (mcc01100), Wnt signaling pathway (mcc04310), and pyruvate metabolism (mcc00620) (Figure S6 and Table S9).
Mutation in the TBX6 gene leads to shortened tail in Tibetan macaque
2.3
One of the Tibetan macaque PSGs, TBX6 (T‐box transcription factor 6), harbored a nonsynonymous mutation. TBX6 is a key developmental regulator conserved across vertebrates, orchestrating paraxial mesoderm differentiation into somitic precursors that ultimately form skeletal musculature, dermal tissues, and axial skeletal structures. This transcription factor critically governs rostro‐caudal somite patterning through spatiotemporal expression modulation. We identified a mutation in Tibetan macaque TBX6 (c.211C>A; exon 2), inducing a proline‐to‐threonine substitution (CCT→ACT; p.Pro71Thr) at a conserved residue within a low‐complexity region adjacent to the DNA‐binding T‐box domain (Figure 2A), which may alter protein‐DNA interaction dynamics. Computational prediction of the potential function impact using CADD (Combined Annotation Dependent Depletion, v1.7) based on the human TBX6 sequence (GRCh38) received a score of 18.55, suggesting a deleterious effect. We validated the mutation was Tibetan macaque‐specific with samples from other species in the genus (Table S10). In addition, with Sanger sequencing of 18 individuals from four geographic populations of Tibetan macaque, we confirmed the mutation in TBX6 gene was homozygous and fixed in the sampled individuals (Table S11).
*Mutation in TBX6 leads to shortened tail in the Tibetan macaque. (A) Allelic information of sequence variants in TBX6 for the eleven macaques. (B) Whole‐body images of TBX6‐mutated mice and wild‐type mice. (C) The tail length of 60‐day‐old TBX6‐mutated mice and wild‐type mice (n = 10 males and n = 10 females per group; unpaired Student's t‐test was performed for each sex separately; **p < 0.01, ***p < 0.001). (D) The number of caudal vertebrae of 35‐day‐old TBX6‐mutated mice and wild‐type mice (n = 3 for each group; unpaired Student's t‐test; ***p < 0.001). (E) The mRNA expression changes of TBX6 and TBX6‐regulated genes in wild‐type mice and TBX6‐mutated mice at E10.5 and E11.5. (n = 3 per group; unpaired Student's t‐test; ns, not significant, *p < 0.05, **p < 0.001). (F) Alcian Blue and Alizarin Red stained skeletons in TBX6‐mutated mouse and wild‐type mouse.
To investigate the functional impacts of this variant, we used the CRISPR‐Cas9 technology to edit the mouse TBX6 gene, generating mice with a threonine substitution in TBX6 (referred to as TBX6‐mutated mice). Sanger sequencing of TBX6 gene (Figure 2A) and its transcripts in CRISPR‐generated mutant mice, compared with wild‐type (WT) mice, confirmed the mutation at both genomic and transcriptional levels. Statistical analysis of mice at 30 days‐, 60 days‐ and 90 days‐old found no significant difference in body weight and body length (from the midpoint between the ears to the junction of the body and tail) between the TBX6‐mutated mice and WT mice. However, the TBX6‐mutated mice had significantly shorter tail length than WT mice at all the ages, with the change observed in both males and females (Figures 2B,C; Figure S7). Furthermore, we found there were more caudal vertebrae in WT mice (28–29) than in TBX6‐mutated mice (24–25) (Figures 2D,F). It is evident that the mutation in TBX6 gene contributes to the shortened tail phenotype in Tibetan macaque. Given the critical role of TBX6 as a core transcriptional regulator during somitogenesis, we examined the expression of TBX6 and downstream target genes (MESP2, DLL1, RIPPLY1, FGF8) in WT versus TBX6‐mutant mice embryos at two somitogenesis stages (E10.5 and E11.5) (Figure 2E; Figure S8). At both stages, expression of TBX6, MESP2 (Mesoderm posterior bhlh transcription factor 2), and DLL1 (Delta like canonical notch ligand 1) exhibited no significant difference between the WT and the mutated mice (p > 0.05) (Figure S8). In contrast, RIPPLY1 (Ripply transcriptional repressor 1) and FGF8 (Fibroblast Growth Factor 8) were significantly upregulated (Figure 2E). RIPPLY1 displayed 6‐fold upregulation in mutated mice at E10.5 and 1.5‐fold upregulation at E11.5 (Student's t‐test, p < 0.05), while FGF8 was 1.5‐fold higher in mutated mice at E10.5, and 4‐fold higher at E11.5 (Student's t‐test, p < 0.05).
Potential Genetic Variations Contribution to Short Tail Phenotype
2.4
In addition to Tibetan macaque, several other macaque species exhibit shortened tail phenotype, such as Celebes crested macaque (M. nigra) and Japanese macaque (M. fuscata) (Figure 1C). Interestingly, despite no shared mutation among Tibetan macaque and other short‐tailed macaques, we found they all had nonsynonymous mutations in TBX6 gene, including p.Pro71Thr in Tibetan macaque, p.Gly351Ser in Japanese macaque, and p.Ala409Val in Celebes crested macaque (Figure 2A). Functional prediction based on CADD indicated that p.Ala409Val in Celebes crested macaque received a score of 16.97, suggesting a deleterious effect, while p.Gly351Ser in Japanese macaque had a score of 3.75, implying a neutral effect. Structural domain analysis using SMART revealed that p.Ala409Val is located in a low‐complexity region, while p.Gly342Ser resided near the C‐terminal end of the T‐box domain. To assess whether these mutations were fixed within their respective species, we examined population‐level resequencing data including 19 macaque species (n = 165 individuals) mapped to the Tibetan macaque reference genome (Table S12). The p.Ala409Val mutation was fixed (homozygous derived allele, 1/1) in the examined M. nigra (n = 4), M. nigrescens (n = 1), and M. hecki (n = 3) individuals, all of which exhibit extremely short tails and are species from Sulawesi island (Figure 1C; Table S1). In contrast, the p.Gly351Ser mutation showed low frequency in M. fuscata, with only one heterozygous individual (0/1) among eight individuals. Our findings suggest that polymorphism in the TBX6 gene may be associated with shortened tail phenotype in different macaque species.
Given that only a small number [4, 5] of caudal vertebrae are lost in the *TBX6‐*mutated mice compared to the WT mice, other potential genetic variations might have contributed to the significantly shortened tail in Tibetan macaques. We identified a nonsynonymous substitution in another PSG, FGFR1 (fibroblast growth factor receptor 1) in Tibetan macaque. FGFR1 encodes a transmembrane tyrosine kinase receptor that mediates ligand‐dependent activation of extracellular signal‐regulated kinases (ERKs), which is critical for embryonic development, cellular proliferation, and morphogenetic patterning. Notably, murine models with FGFR1 mutations exhibit comparable caudal truncation phenotypes [25]. We found the substitution occurred in the second exon of FGFR1, where a GTA→ATA transition results in a valine‐to‐isoleucine substitution at position 102 (p.Val102Ile). CADD prediction obtained a score of 24.7, ranking within the top 1% of deleterious missense substitutions genome‐wide (ProtVar: https://www.ebi.ac.uk/protvar/). Bioinformatic mapping localizes this mutation to the extracellular Ig‐like C2‐type 1 domain, a region indispensable for FGF ligand binding and receptor dimerization.
In addition, we identified a 300‐bp novel insertion in the second intron of WNT3A (Wnt family member 3A) (Figure S9), a secreted glycoprotein that activates TCF/LEF transcription factors through canonical Wnt signaling. This pathway is essential for embryonic mesoderm specification, caudal somite segmentation, and neural tube patterning [26]. The insertion site falls within a potent regulatory element exhibiting characteristic epigenetic features of an active enhancer, and harbors binding sites for multiple transcription factors, including ZNF320 and ZNF449. Thus, although this insertion does not alter the protein coding sequence, its location within a defined cis‐regulatory element suggests it may disrupt enhancer function and perturb the transcriptional output of WNT3A. Previous study demonstrated that complete knockout of this gene resulted in total absence of embryonic tails in mice, and reduced WNT3A activity has been linked with premature termination of somite formation, indicating its critical role in tail development and somitogenesis [27].
Lipid Metabolism Adaptation in Tibetan Macaque
2.5
According to Bergmann's rule, mammals inhabiting high‐latitude cold climates tend to develop larger body sizes to enhance thermoregulatory efficiency. Due to the largest body size of Tibetan macaque within the genus Macaca, we performed quantitative computed tomography (QCT) analyses on captive individuals of Tibetan macaque and rhesus macaque. Both subjects were 13‐year‐old males. Abdominal cavity fat volume estimates revealed that the Tibetan macaque (1190.68 cm^3^) possessed approximately 9.3‐fold greater abdominal fat than the rhesus macaque (127.99 cm^3^) (Figure 3A,B), highlighting the lipid metabolic adaptations in Tibetan macaques. This pronounced divergence in fat accumulation is further supported by multiple lines of genomic evidence in our study. First, we identified three lipid storage‐related genes under strong positive selection: DGAT2 (Diacylglycerol o‐acyltransferase 2), DYSF (Dysferlin), and CAV1 (Caveolin 1) (Figure 3C). The DGAT2 gene encodes an integral membrane enzyme that catalyzes the synthesis of triglycerides, a primary form of energy storage in most organisms, by transferring fatty acids onto diacylglycerol, and it plays a crucial role in lipid droplet formation and expansion [28]. Tibetan macaques exhibit a unique two‐residue insertion in the C‐terminal catalytic domain of DGAT2 that is predicted to induce significant conformational changes in the C‐terminal region (Figure S10A), potentially enhancing substrate binding affinity and catalytic efficiency. The DYSF gene, implicated in adipocyte differentiation and lipid storage [29], harbors a guanine‐to‐adenine transition (CGA→CAA) in Tibetan macaque causing an p.Arg1972Gln substitution in its dysferlin domain, which was predicted to be functionally deleterious (CADD score = 17.04). Similarly, CAV1, encoding caveolin‐1 that modulates cellular cholesterol trafficking via cholesterol‐binding scaffolding domains [30], shows a TGC→AGC transversion in Tibetan macaque resulting in a p.Ser89Cys substitution within its membrane‐proximal oligomerization domain.
*Enhanced lipid metabolism adaptation in the Tibetan macaque. (A), (B) Quantitative computed tomography (QCT) analysis of fat distribution in Rhesus macaque and Tibetan macaque. (C) Information of sequence variants in CPE, LEPR, PRKD1, DGAT2, CAV1 and DYSF for 13 macaques. (D) Validation of gene expression differences by qRT‐PCR in subcutaneous white adipose tissue. Expression levels of DGAT2, DYSF, CAV1, PRKD1, LEPR, and CPE were compared between Tibetan macaque (n = 3) and Rhesus macaque (n = 3). Data are presented as mean ± SEM (Standard Error of the Mean). **p < 0.01, **p < 0.001 (unpaired Student's t‐test).
In addition, an adipogenesis‐related gene (PRKD1, Protein kinase D1) also exhibited signatures of positive selection. As a key member of the protein kinase D family, PRKD1 promotes lipid accumulation through two mechanisms: 1) to enhance expression of fatty acid synthase and acetyl‐CoA carboxylase; and 2) suppression of AMPK (AMP‐activated protein kinase) activity to facilitate adipocyte lipid synthesis. Experimental evidence demonstrates that PRKD1‐overexpressing mice exhibit increased susceptibility to diet‐induced obesity, whereas PRKD1 knockout mice show reduced adipogenesis [31]. We identified two nonsynonymous mutations in the PRKD1 gene of Tibetan macaques, locating in exons 8 and 11 respectively (Figure 3C). These mutations induced a transition converting a hydrophobic alanine to hydrophilic threonine (GCG→ACG, p.Ala381Thr; CADD = 17.28, deleterious), and a transition substituting valine with methionine (GTG→ATG, p.Val558Met; CADD = 10.12, moderately deleterious), respectively.
Furthermore, we revealed two appetite regulation‐associated genes exhibiting specific mutations in Tibetan macaques. The LEPR (Leptin receptor) gene plays pivotal roles in energy homeostasis, appetite regulation, body weight maintenance, and endocrine functions. Impaired LEPR signaling can result in hyperphagia, diminished energy expenditure, and consequent obesity. Previous studies have established that mutations in this gene promote excessive fat deposition across mammalian species [32, 33]. A cytosine to thymine transition was identified in the last exon of the LEPR gene of Tibetan macaques, resulting in a p.Pro1011Ser (CCC→TCC) substitution that is predicted to have minimal functional impact (CADD score = 0.771). We identified a 390‐bp species‐specific deletion variant in the CPE (carboxypeptidase E) gene of Tibetan macaques (Figure 3C), resulting in premature termination of protein translation which is predicted to cause significant functional impairment of the encoded enzyme. Structural analysis revealed that this deletion causes significant disruption of the protein's tertiary structure (Figure S10B). The CPE gene encodes carboxypeptidase E, a peptide‐processing enzyme involved in the maturation of neuropeptides and hormones critical for appetite regulation and glucose metabolism [34]. Meanwhile, CPE is recognized as an obesity‐susceptibility gene, and CPE mutant mice exhibited increased total fat mass leading to obesity and hyperglycemia [35]. Notably, fixed SVs were found in the CPE gene across all extant great ape lineages compared with small body‐sized primates, suggesting a potential role of the SV in CPE in driving the marked body size divergence of this clade from other primates [36].
We further investigated the transcriptional impacts of these genomic variants by quantitative real‐time PCR (qRT‐PCR) analysis of their expression in subcutaneous white adipose tissue from Tibetan macaque and rhesus macaque. The results demonstrated significantly elevated transcript levels of three lipid storage‐related genes in Tibetan macaques, including DGAT2 (2.03‐fold, p < 0.01), DYSF (4.53‐fold, p < 0.001), and CAV1 (5.38‐fold, p < 0.001) (Figure 3D). PRKD1 expression was also markedly upregulated (3.93‐fold, p < 0.001) (Figure 3D), supporting enhanced activity in promoting adipogenesis. For appetite regulation genes, we observed a significant downregulation of LEPR (0.19‐fold, p < 0.001) but an upregulation of CPE (1.73‐fold, p < 0.01) in Tibetan macaques (Figure 3D), consistently indicating potential modulation of energy intake and storage. Meanwhile, we revealed significant expansions of gene families involved in lipid mobilization and energy transduction pathways in Tibetan macaques. The lactate dehydrogenase (LDH) gene family, which encodes enzymes that catalyze the interconversion of lactate and pyruvate during anaerobic glycolysis and hepatic gluconeogenesis [37], exhibited notable expansion in the genome of Tibetan macaques (a total of seven LDH genes), which is higher than that in most other macaque species (3–6) (Figure S11 and Table S13). This genomic expansion may enhance metabolic flexibility by accelerating redox balance maintenance and substrate shuttling between carbohydrate and lipid metabolism, thereby supporting high‐altitude energetic demands through improved fuel‐switching efficiency. Notably, we identified coordinated expansions in gene families related to oxidative phosphorylation and thermogenesis pathways. The NADH dehydrogenase gene family showed 12 significantly expanded gene subfamilies in Tibetan macaques (Figure S11 and Table S14), and the ATP synthase family demonstrated expansions in ATP5MC1 and ATP5MC2 subfamilies (Figure S11 and Table S15). These two enzyme complexes (Complex I and V of the respiratory chain) play crucial roles in energy metabolism [38, 39]. These coordinated genomic expansions likely enhance lipid‐catabolic capacity by sustaining high rates of β‐oxidation‐derived NADH turnover, while simultaneously optimizing carbohydrate‐to‐lipid energy substrate transitions essential for thermogenesis and sustained activity in cold environments. Collectively, positive selection in genes related to lipid storage and adipogenesis, a deletion in an obesity‐susceptibility gene, and the expansion of gluconeogenesis‐related gene families constitute genetic adaptations underlying enhanced lipid metabolism in Tibetan macaque for high‐latitude cold environments.
To figure out whether the enhanced lipid metabolism is favored by other primates at high latitudes, we compared Tibetan macaque and another high latitude macaque, the Barbary macaque (M. sylvanus), a species divergence earliest from all Asian macaques, and it is disjunctly distributed at northern Africa with a latitudinal range of 31°15'N to 36°45'N (Figure 1A,C) [40]. The similar biogeographic pattern suggests that they have faced comparable selective pressures from colder and more seasonal environments. Interestingly, both macaques exhibit similar morphological adaptations, including relatively large body size, short tail and very thick furs (Figure 1C; Table S1). We conducted a genome‐wide scan for parallel amino acid substitutions between Tibetan macaque and Barbary macaque genome using site‐wise likelihood ratio tests. We identified a total of 24 genes exhibiting significant parallel evolution, which were enriched in lipid metabolic pathways, specifically including biosynthesis of unsaturated fatty acids (mcc01040), fatty acid elongation (mcc00062) and fatty acid metabolic process (GO:0006631) (Figure S12 and Table S16).
Population Genetics Analysis of Tibetan Macaque
2.6
To determine the population structure of Tibetan macaques, population genetic analyses were conducted based on genomes of 26 wild individuals from seven geographic populations across their main distribution range (Figure 4A). Our analysis of the genome‐wide independent SNPs revealed varying levels of genetic differentiation among these populations. Principal component analysis (PCA), admixture analysis, and phylogenetic analysis indicated clear divergence between the eastern China clade (Huangshan (HS), Wuyanling (WYL) and Mangshan (MS)) and western China clade (Chuanxi (CX), Wumengshan (WMS), Baishuijiang (BSJ) and Fanjingshan (FJS)), while gene flow probably existed in several populations (Figure 4B,C,D). We then conducted ABBA‐BABA tests and Treemix analysis to reveal gene flow occurred between populations. We identified 26 trios with significant D values (P<0.01, Table S17), with significant gene exchanges between FJS populations and populations from eastern China, as well as between FJS and other western populations (Figure S13A). Given the geographic location of FJS is between eastern Tibetan macaque populations and other western populations, genetic exchanges between populations are more likely to happen here. Treemix analysis suggested the direction of these gene flow was from the ancient eastern group to the FJS population (Figure S13B).
Population genetic studies of the Tibetan macaque. (A) The sampling locations of Tibetan macaque samples used in this study. BSJ, Baishuijiang population; CX, Chuanxi population; WMS, Wumengshan population; FJS, Fanjingshan population; WYL, Wuyanling population; HS, Huangshan population; MS, Mangshan population. (B, C, D) Population genetic structure based on autosomal SNPs. (B) PCA results based on autosomal SNPs. (C) ADMIXTURE results with K values 2–3 based on autosomal SNPs. (D) ML tree based on autosomal SNPs. (E) The divergence and speciation route of Tibetan macaques in China. (F and G) Selective sweep signals of the gene region.
Next, we estimate the divergence history of Tibetan macaque populations using BEAST (Figure 4E; Figure S14). After diverging from other macaque species, Tibetan macaques in China divided into western and eastern clades around ∼240 Kya (thousand years ago). In the western clade, the FJS population diverged early as ∼123 Kya, and WMS and BSJ populations diverged from the CX population ∼99 Kya and ∼89 Kya, respectively. Within in the eastern clade, the MS population diverged first ∼89 Kya, followed by the divergence between the HS and WYL populations ∼60 Kya. Demographic history estimated by Pairwise Sequentially Markovian coalescent (PSMC) demonstrated similar demographic history among different populations, with two population expansions and one obvious population bottleneck (Figure S15). The population decline resulted in the bottleneck ∼300 Kya, most likely caused by penultimate glaciation (300–130 Kya). After the glaciation, the populations gradually began to recover, coinciding with the last interglacial period. However, the effect population size of Tibetan macaque kept very low until recently.
Our genome‐wide pairwise genetic differentiation (*F_ST_ *) indicated strong genetic differences among Tibetan macaque populations (Figure S16A), with *F_ST_
- varying from 0.099 (between the BSJ and the CX) to 0.783 (between the BSJ and the WYL). We further estimated the whole‐genome genetic diversity for each population by calculating nucleotide diversity (π) and Watterson's estimator (θw) values. The FJS population showed the highest nucleotide diversity (π = 9.05×10^−4^, θw = 1.30×10^−3^), whereas the HS population had the lowest nucleotide diversity (π = 3.49×10^−4^, θw = 4.07×10^−4^). Individual genome‐wide heterozygosity estimates (Figure S16B,C and Table S18) were significantly different among populations (ANOVA: F[6,19] = 28.26, P<0.0001), with the lowest heterozygosity in HS population and the highest heterozygosity in FJS population. We also estimated levels of inbreeding by calculating the length of homozygous regions, known as runs of homozygosity (ROH) and fraction of the genome covered by runs of homozygosity (FROH) (Figure S16D,E). The FROH was inversely related to genome‐wide heterozygosity (R = ‐0.83, p = 1.77e‐07), lowest in the FJS population and highest in the HS population.
Finally, to detect genomic signatures of local adaptation between the eastern and western clades, we conducted a selective sweep analysis based on *F_ST_
- and π ratio. Genomic regions were defined as under potential selective sweeps if they fell within the top 5% of both *F_ST_
- and π ratio distributions (Figure S17). The eastern clade (HS, WYL, MS) exhibited stronger and more frequent selective signals related to lipid metabolism and energy homeostasis. Specifically, multiple key genes involved in fat synthesis, storage, and regulation showed strong selective signals in eastern populations, including ELOVL6 (critical for fatty acid elongation) (Figure 4F), COMMD9 (involved in cholesterol homeostasis) (Figure 4G), HMGA2 (a regulator of diet‐induced obesity and body size) (Figure S18A), METAP2 (a key regulator of energy metabolism and lipid storage) (Figure S18B), NEGR1 (facilitates lipid droplet formation) (Figure S18C), and SLC39A10 (regulates lipid metabolism) (Figure S18D). This genomic evidence indicates that severe winters and greater fluctuation in resources have driven intense adaptive evolution toward enhanced fat storage and thermogenic capacity [41], consistent with observed phenotypic trends such as larger body size in eastern populations [42]. In contrast, western clade showed selective signals primarily in genes involved in ketone body metabolism and fatty acid β‐oxidation, including BDH1 (Figure S18E) and BDH2 (Figure S18F).
Discussion
3
Genetic Basis of Shortened tail of Tibetan Macaque
3.1
Tail length displays extensive variation among primate species, and such phenotype diversity may arise from multifactorial interactions among genetic, ecological, and evolutionary forces, positioning tail length as a paradigmatic adaptive trait in primate evolution [43, 44]. Long‐tailed species, predominantly the Platyrrhini, exhibit smaller body sizes and longer tails optimized for arboreal locomotion and balance, whereas shorter‐tailed or tailless morphologies are typically associated with larger‐bodied terrestrial primates [45]. Meanwhile, length of primate tails may be shaped by selection from cold climate according to Allen's rule. For instance, Japanese macaques, the primate species at the highest distribution latitude, possess an extremely short tail [46]. Given the ecological and evolutionary significance of primate tails, elucidating the genetic basis underpinning variations on primate tail length has always been a hot topic in the adaptive evolution field. Emerging evidence suggests these mechanisms vary among primate lineages, and multiple genes and various regulatory pathways may drive this phenotype diversity. Studies implicate SVs or transposable elements may have driven tail loss in apes and humans [36, 47], whereas diverse regulatory elements in HOXD13 gene may mediate tail length differences between M. fascicularis and M. mulatta [48].
By combining comparative genomics and functional validation, we report a genetic mechanism whereby a TBX6 mutation can cause the reduction of caudal vertebrae numbers and the shortened tail in Tibetan macaque, providing new insights into our understanding on the phenotype diversity in primates. TBX6 is a member of the T‐box gene family, a family that is important for the formation of the basic vertebrate body plan, and for the cell differentiation and organogenesis [49]. A recent study found that an Alu insertion in the intron of TBXT, another member of the T‐box family, led to alternative splicing, potentially contributing to tail loss in apes relative to monkeys [47]. We found TBX6 is under strong positive selection, carrying a species‐specific nonsynonymous mutation in Tibetan macaques. The p.Pro71Thr mutation has been fixed as a homozygote across Tibetan macaque populations. Functional validation based on gene‐edited mice confirmed that the DNA mutation alters RNA transcripts level and leads to a reduction of caudal vertebrae numbers in mutant mice compared to wild‐type mice. Although TBX6 expression showed no significant difference from the wild‐type mice, expression of the downstream targets, particularly RIPPLY1 and FGF8, were significantly upregulated. Our result thus suggests that the mutation affects TBX6 protein through post‐translational modification or altered protein‐protein interactions, rather than affecting the transcriptional output. Like many other quantitative traits, tail length modulation probably involves a multigenic regulatory network. In addition to the mutation in TBX6, we also identified other two mutations potentially contribute to the shortened tail trait. A 300‐bp insertion within an enhancer region of WNT3A may attenuate Wnt signaling critical for axial elongation; and the p.Val102Ile mutation in FGFR1, predicted to have a pathogenic CADD score (24.7), likely compromises FGF‐mediated progenitor maintenance. These three genes (FGFR1, WNT3A, and TBX6) are critical regulators of paraxial mesoderm formation and somitogenesis pathways (Figure S19). The coordinated action of Wnt signaling, FGF signaling and TBX6 drives somitogenesis [50, 51]. Consistently, significantly upregulated expression of FGF8 gene was detected in the TBX6 mutant mice in our results. The exact impacts of these multiple mutations on the shortened tail phenotype warrant of further functional validation.
Given the relatively lower latitude distribution ranges and longer tails in other close related species of sinica group, the shortened tail trait in Tibetan macaque must have occurred after its species divergence 2.06 Mya (Figure 1C), representing an adaptive feature to high‐latitude cold climates. We hypothesis that the strong selection from the cold environment at high latitude promoted the smaller and arboreal Tibetan macaque ancestor to evolve adaptive traits of the shortened tail and the increased body size for more efficiently heat retention. The heavy build further made the macaques to be more inclined to terrestrial activities, which have reduced the need for tail balance enhancing their mobility on the ground, therefore contributing to the shortened tail trait. Field observations also support Tibetan macaques use terrestrial substrates in nearly two‐thirds of the recorded instances [14].
Notably, other Macaca species including the M. arctoides, M. fuscata, and M. nigra exhibit short‐tailed phenotypes despite belonging to distinct species groups in phylogenetic relationships (Table S1). This pattern suggests that the short‐tailed phenotype may have evolved independently in different lineages after speciation, rather than representing an ancestral trait inherited from a common progenitor. Interestingly, we identified two additional nonsynonymous mutations in TBX6 gene of M. fuscata and M. nigra, both species has significantly short tails in the Macaca. The p.Ala409Val in Celebes crested macaque was predicted to be deleterious and located in the functional domain. Moreover, population genetics analysis found this mutation was shared by three Sulawesi macaque species and seemed to be fixed across populations. It is well‐known that all the Sulawesi macaques exhibit distinct shortened tails, given the key role of TBX6 in caudal vertebrae development, our study suggests that polymorphism in the TBX6 of Macaca species may be critically associated with shortened tail trait, whereas the contributions of these mutations need further functional validation.
Adaptation on Lipid Metabolism in Tibetan Macaques for High‐Latitude Habitats
3.2
Compared to most primates predominantly clustering in tropical and subtropical regions [8, 9, 10], Tibetan macaques mainly inhabit in colder temperate zones, making it one of few primate species whose entire distribution range falls north of the Tropic of Cancer (Figure 1A). Our integrative analysis reveals for the first time the physiological and genetic adaptations associated with high‐latitude environment in Tibetan macaques. Quantitative CT scans demonstrated that abdominal fat content in Tibetan macaque was 9.3‐fold greater than in rhesus macaque (Figure 3A,B), underlining lipid storage as a metabolic adaptation to high‐latitude. Such lipid accumulation was also observed in polar bears [3, 4, 5] and in arctic human populations such as Northeast Siberians [52] and Inuit from Greenland [53]. Large amounts of lipids could be derived from the high fat diet observed in polar bears and arctic human populations, whereas the diet of Tibetan macaques is dominantly low‐fat plants [54]. Through what mechanisms do the Tibetan macaques enhance lipid accumulation like polar bears and arctic humans?
We further revealed genomic variation associated with the enhanced lipid metabolism in Tibetan macaque. First, mutations in appetite‐regulating genes (LEPR and CPE) are suggested to promote feeding behavior and enhance nutrient intake. We confirmed that the expression of LEPR showed a dramatic downregulation (0.19‐fold) in adipose tissue of Tibetan macaque compared to rhesus macaque, indicating an impairment in leptin signaling that would predispose to hyperphagia and reduced energy expenditure. A previous study found variation in LEPR gene was associated with cold adaptation in modern East Asians, ancient Neandertal and Denisovan populations. Notably, both ancient people experienced the last glacial period [55]. Concurrently, a deletion found in CPE caused 1.7‐fold upregulation of gene expression in Tibetan macaque adipose tissue, which likely disrupts neuropeptide processing and further contributes to a better appetite. Second, genes related to lipid storage (DGAT2, DYSF, CAV1) and adipogenesis (PRKD1) are under strong positive selection in Tibetan macaques (Figure 3C). qRT‐PCR confirmed substantial upregulation of expression of these genes (2.03 to 5.38‐fold) in Tibetan macaques, strongly supporting their roles in lipid accumulation. In line with our results, comparison of blood transcriptomes of Tibetan macaque and rhesus macaque demonstrated that differentially expressed genes were significantly enriched in fatty acid metabolism pathways, with many fatty acid metabolism‐related genes showed significantly higher expression in Tibetan macaques [56]. Third, Tibetan macaques have expanded gene families related to oxidative phosphorylation and glycolysis/gluconeogenesis pathways, which suggests evolved metabolic flexibility to balance lipid storage and utilization. Consequently, rapid evolution of these genes and gene families, coupled with their significantly changed gene expression, might have enhanced lipid metabolism, representing successful metabolic adaptations of Tibetan macaques. These genetic adaptations, provide several advantages: 1) stored lipids serve as energy reserves mitigating winter food scarcity; 2) fat accumulation reduces surface‐to‐volume ratio minimizing heat loss in cold environment; 3) fat provides insulating barriers against cold.
In addition, comparison of genome of Tibetan macaque and another high latitude macaque, the Barbary macaque identified 24 genes exhibiting significant parallel evolution, which were enriched in lipid metabolic pathways, including biosynthesis of unsaturated fatty acids, fatty acid elongation and fatty acid metabolic process (Figure S12 and Table S16). These findings suggest lipid storage enhancement is a fundamental adaptive strategy for macaques in colder environments. However, except the high‐latitude cold environment, both macaque species exhibit diverse climate regimes, whether the shared amino acid substitutions are induced by convergent evolution requires further investigation. Moreover, genomic studies on human populations revealed strong signatures of positive selection in genes associated with lipid metabolism for survival in the Arctic, such as TBX15, WARS2 and FADS in Greenland Inuit [53], CPT1A in Northeast Siberians [52] and LEPR in modern East Asians [55]. In particular, the human brown adipose tissue, which are the most specialized cells in non‐shivering thermogenesis, appears to be more metabolically active in response to cold exposure [55]. These findings suggest that the enhanced capacity to use lipids as primary metabolic fuel source was widespread in high‐latitude primates despite their omnivorous diets. This study not only deepens our understanding on metabolic and evolutionary strategies of primates for adaptation to high latitudes but also offers new insights on the local adaptations to extreme environmental conditions and the origination of metabolic disease such as obesity.
Population Genetic Studies of Tibetan Macaque
3.3
Previous studies on the genetic diversity and evolutionary history of the Tibetan macaque have largely relied on limited genetic markers, such as microsatellites and mitochondrial DNA fragments [57, 58]. Here, we carried out genome‐wide population analyses based on individuals across the range of Tibetan macaque distribution. Our results indicate that after diverging from other sinica group species, Tibetan macaque populations expanded from west China to east China, splitting into the eastern and the western clades approximately 240 Kya. Genetic differentiation was detected not only between the two clades, but also between populations within each clade. These results suggest significant historical isolation among populations, potentially due to geographical barriers or interference by human activities. The western populations diverged earlier and exhibited relatively higher genetic diversity than the eastern populations, suggesting Tibetan macaques originated in western populations. In contrast, the eastern populations diverged recently with substantial genetic differentiation between populations and exhibited lower genetic diversity than the western populations. The Huangshan population of eastern clade, in particular, shows the most recent divergence, lowest genetic diversity, and evident inbreeding, highlighting it as a critical population in need of urgent conservation attention. While the FJS population, the most eastern population of the western clade, exhibited significant ancient gene flow with different eastern populations, implying the location used to be a genetic exchange hotspot of different Tibetan macaque populations. Notably, our population genomics analysis revealed distinct local adaptation patterns between the eastern and western clades. The eastern clade exhibited stronger selective signals in genes related to fat synthesis and storage, which is probably an adaptive response to the more severe winters and greater resource fluctuation in the environmental conditions of eastern populations [40]. Correspondingly, a relatively larger body size in eastern populations might be an adaptation for better thermogenesis regulation [41]. These findings refine our understanding of local metabolic adaptations to cold environments within the species.
In summary, this study has significantly improved the quality of the Tibetan macaque reference genome, providing a comprehensive genomic resource for future primate genome‐wide research. In contrast to most Macaca species that generally dispersed southward to lower latitudes following speciation, the Tibetan macaque undergone a distinct westward‐to‐eastward population expansion, maintaining a relatively high‐latitude distribution. Under strong natural selection from the cold environment, Tibetan macaques evolved the enhanced lipid metabolism characterized by substantial fat storage capacity and relatively large body size. Concurrently, cold environment selection also promoted rapid evolution of a shortened tail to minimize heat loss. The reduced tail length and increased body size may have driven a terrestrial lifestyle of the Tibetan macaque. Although these adaptive characteristics allow Tibetan macaques to succeed in the cold environment at high latitudes, under the current global warming climate conditions, they will also make Tibetan macaques face more adaptive challenges than other primates at low latitudes. These challenges are more severe in the HS population, which exhibits the lowest genetic diversity and the highest inbreeding coefficient across all populations.
Materials and Methods
4
Sample Collection and Sequencing
4.1
Detailed information on the samples used for genome assembly, as well as the sequencing strategies and assembly methods, can be found in reference [17]. For Tibetan macaque whole‐genome resequencing, we collected muscle and peripheral blood samples of 26 wild Tibetan macaques from 7 populations, which covered most of the species’ range in China (Figure 4A; Table S19). The DNA of all samples was extracted using the DNeasy Blood & Tissue Kit (Qiagen, Valencia, CA, USA). Then, we constructed genomic libraries with insert sizes of 200–500 bp and performed genome resequencing at an average depth of 34.20× for each individual using different platforms due to the protracted process of collecting samples. All samples were collected following the Regulations for the Implementation of the Protection of Terrestrial Wildlife of China (State Council Decree [1992] No. 13) and approved by the Ethics Committee of the College of Life Science, Sichuan University, China (Permit No. 20200327009).
Genome Annotation
4.2
Gene locations and structure were predicted using NCBI Eukaryotic Genome Annotation Pipeline [59]. This method involves integrating evidence from multiple sources to predict gene locations and structure within the genomic sequence. Existing gene models from closely related species were utilized as references to guide predictions. Evidence from RNA sequencing (RNA‐seq) data, which offers insights into the transcriptome, was used to help identify expressed genes and their splice variants. Additionally, protein homology information supported gene predictions, particularly for conserved genes across species. The annotation process was facilitated by automated annotation software, which was subsequently verified and refined through manual curation.
In order to perform gene functional annotation, the protein‐coding genes were aligned against several known public databases, including Swiss‐Prot [60], TrEMBL (TRanslation of EMBL) [60], InterPro [61], gene ontology (GO) [62] and Kyoto Encyclopedia of Genes and Genomes (KEGG) [63]. To evaluate the annotation quality of the Tibetan macaque genome, the completeness of the final gene set was assessed by the Benchmarking Universal Single‐Copy Orthologs (BUSCO) (v5.7.0) in the protein mode using primates_odb10 dataset [64].
Gene Family Identification
4.3
Gene sequences from 13 well‐assembled macaque genomes and an olive baboon genome (GCF_008728515.1) were downloaded from the NCBI database for comparative genomic analysis. The gene sequences of M. cyclopis were predicted using miniport [65] based on the protein sequences of M. mulatta. For genes with numerous transcript isoforms, we only kept the longest for downstream analyses. Orthologous clusters among these 14 primates were identified using OrthoFinder (v2.3.11) [66] with default settings.
Phylogenetic Analysis and Divergence Time Estimation
4.4
One‐to‐one orthologous genes were first aligned using MAFFT (v7.475) [67] with default parameters, and the corresponding coding sequence (CDS) alignments were back‐translated from corresponding protein alignments using PAL2NAL (v14) [68]. We extracted the 4d sites of each single‐copy gene using an in‐house Perl script, Gblocks (v0.91b) [69] was used to remove ambiguously aligned blocks, and the remaining alignments of each family were concatenated and used to build the maximal likelihood (ML) phylogenetic tree with IQ‐TREE (v2.3.1) software [70] with the parameters “‐m MFP ‐bb 10000 ‐bnni”. The divergence time between each species was estimated by MCMCtree from the PAML (v4.8) [71] package. We used two calibrations for this analysis: [1] the split between African (M. sylvanus) and Asian macaques: 4.5‐6.5 Mya; [2] the split between Macacina and Papionina: 5.8‐8.0 Mya.
Gene Family Expansion and Contraction Analysis
4.5
Based on the identified gene families and the constructed phylogenetic tree with predicted divergence time of those species, CAFÉ (v4.2) [72] was used to predict gene family expansions and contractions. In CAFÉ, a random birth and death model is proposed to study gene gain or loss in gene families across a phylogenetic tree. Finally, we conducted KEGG and GO enrichment analysis of the significant expanded gene families in Tibetan macaque using KOBAS‐i [73].
Positive Selection Analysis
4.6
Based on the alignment results of 1:1 orthologous genes and the phylogenetic tree, we employed the codeml program within PAML (v4.8) to estimate the ratio of nonsynonymous (dN) to synonymous (dS) substitutions for each gene. The branch‐site model was applied, setting the branch corresponding to the Tibetan macaque as the foreground branch and the other species as the background branches. The alternative model assumed positive selection on the foreground branch, while the null model represented either purifying selection or neutral evolution. A likelihood ratio test (LRT) was performed to compare the two models, and the significance of the differences was assessed using a chi‐square test. The P‐values were adjusted for multiple testing using the false discovery rate (FDR) method, with a threshold of Q‐value <0.05 for identifying positively selected genes (PSGs). We conducted KEGG and GO enrichment analysis of the PSGs using KOBAS‐i. The PSGs related to tail development and body size were searched from published literature [36, 74, 75] and MGI databases (http://www.informatics.jax.org/) based on their known functions.
To further investigate the genetic basis of short‐tail phenotype in Tibetan macaques, we performed selection pressure analysis on 137 candidate genes associated with caudal development (Table S20). We downloaded reference genomes and annotation information for 29 primate species from NCBI. The phylogenetic relationships of these species were determined using TimeTree (http://www.timetree.org/) (Figure S20). Using procedures described above, we identified positively selected genes related to tail development in the Tibetan macaque.
Comparative Fat Content in Tibetan Macaque and Rhesus Macaque
4.7
Quantitative computed tomography (QCT) was employed to compare the fat content between a 13‐year‐old male Tibetan macaque and a 13‐year‐old male rhesus macaque, with the following acquisition parameters: tube voltage: 120 kV, tube current: 150 mAs, slice thickness: 0.6 mm. Multiplanar reconstructions (MPR) were used to generate axial, sagittal, and coronal views for comprehensive analysis. The fat tissue was classified based on CT attenuation values ranging from ‐200 HU to ‐40 HU, and the total volume of fat tissue was extracted from the segmented region.
Detection of Tibetan Macaque Specific SVs
4.8
We performed pairwise comparisons between the Tibetan macaque genome and twelve other Macaca species genomes using MUMandCo [76], identifying structural variation between Tibetan macaque and each comparative species. To identify Tibetan macaque‐specific structural variants (TMSSVs), we implemented an iterative merging approach that progressively integrated SV datasets from successive species comparisons while calculating overlapping regions. A custom script was developed to determine overlapping regions based on chromosomal coordinates, with the following criteria: when two genomic intervals overlapped and the overlapping region covered ≥50% of both intervals' lengths, the overlapping coordinates were recorded as consensus variants. Furthermore, we conducted KEGG and GO enrichment analysis of protein‐coding genes overlapping TMSSVs using KOBAS‐i. Finally, COBALT (https://www.ncbi.nlm.nih.gov/tools/cobalt/) was employed to perform multiple sequence alignment of the identified TMSSVs.
To evaluate the potential functional impact of the 300‐bp insertion located in the second intron of the Tibetan macaque WNT3A gene, we performed a bioinformatic analysis using publicly available genomic databases. The insertion site, corresponding to the human genomic coordinates chr1:228,027,005 (GRCh38/hg38) based on synteny alignment, was analyzed using the UCSC Genome Browser. We systematically examined relevant annotation tracks from the ENCODE project, with particular focus on histone modification marks such as H3K27ac, H3K4me1, and H3K4me3 across various cell lines to identify potential regulatory elements, in addition to investigating Transcription Factor ChIP‐seq peaks to characterize transcription factor binding sites at this locus.
Polymerase Chain Reaction (PCR) Validation and Functional Experiment of the TBX6 Gene
4.9
The TBX6 gene sequence was first extracted and aligned based on the sequences and annotation files from 13 well‐assembled macaque genomes to identify the positive selection site, and then the positive selected site of TBX6 gene was validated by PCR and sequenced using the following primer pairs: F1, 5'‐CTTGGCTGGCTCCTCATTTT‐3'; R1, 5'‐CTCCTCAGGGACTCTGCTTTT‐3'. Validation was performed across seven species within the Macaca genus (Table S10), as well as in 18 individuals from four geographic populations of Tibetan macaques (Table S11) to further confirm the site's specificity.
Point mutated mice with S71T mutations of murine TBX6 were designed and generated by the Cyagen Biosciences Inc. (Suzhou, China). All mouse experiments were approved by the Ethics Committee of the College of Life Science, Sichuan University (Permit No. SCU250106001).
Briefly, Cas9 mRNA was in vitro transcribed using the mMESSAGE mMACHINE T7 Ultra Kit (Ambion, TX, USA) according to the manufacturer's instructions; the mRNA was then purified using the MEGAclearTM Kit (ThermoFisher, USA). The 5’‐ GCCCCTTCTCCCATCTGCTCTGG ‐3’ sequence was selected as the Cas9‐targeted guide RNA (sgRNA), in vitro transcribed using the MEGAshortscript Kit (ThermoFisher, USA), and subsequently purified using the MEGAclearTM Kit. The transcribed Cas9 mRNA and sgRNA, as well as a 120‐bp single‐stranded oligodeoxynucleotide (ssODN) were co‐injected into zygotes of C57BL/6J mice. Obtained mice were validated by PCR and sequenced using the following primer pairs: F1, 5’‐ACTACAACATGTACCATCCACGA‐3’; R1, 5’‐ TAGTTAGATTCTGCTGTCTGGGTT‐3’, which confirmed the successful introduction of the TBX6 mutation. To confirm the presence of the TBX6 mutation at the transcript level, total RNA was extracted from E10.5 embryos of three wild‐type and three homozygous mutant mice. The TBX6 transcript was subsequently amplified by RT‐PCR using gene‐specific primers (Table S21), and the products were subjected to sanger sequencing. The resulting sequences were aligned using MAFFT (v7.475) confirmed the mutation in all homozygous mutant embryos. 30‐day‐old WT mice and TBX6‐mutated mice were randomly selected, with 10 males and 10 females from each group. The tail length (from the base of the tail, where it connects to the body, to the tip of the tail) of WT mice and TBX6‐mutated mice were measured. Independent Student's t‐tests were conducted to compare the values between the WT mice and the TBX6‐mutated mice. The length of the two groups of 60‐day‐old and 90‐day‐old mice was also measured using the same method. After sacrificing the three WT and three TBX6‐mutated 35‐day‐old mice, skeletons were stained with Alcian Blue (staining the cartilage) and Alizarin Red (staining the bone) as described by Rigueur et al. [77]. After staining, skeletons were cleared in 1% KOH and stored in 1% KOH:glycerol (90:10).
To investigate TBX6‐mediated transcriptional regulation during somitogenesis, we performed comparative expression analysis of TBX6 and TBX6‐regulated genes in embryonic day 10.5 (E10.5) and 11.5 (E11.5) mouse embryos between WT mice and TBX6‐mutated mice. These developmental timepoints correspond to Theiler stages TS16‐17, spanning the peak expression window of TBX6 (E9‐E12.25). Three gravid F3‐generation WT and TBX6‐mutated dams per genotype were euthanized via cervical dislocation. Uterine horns were aseptically excised in a laminar flow cabinet, rinsed in ice‐cold PBS (pH 7.4), and embryos were microdissected under 8× magnification using a Leica M205 FA stereomicroscope. Collected embryos were flash‐frozen in liquid nitrogen and stored at −80°C until RNA extraction. Total RNA was isolated using a Column‐based RNA Extraction Kit (RC101, Vazyme, China) and quantified spectrophotometrically (A260/A280: 1.74–1.97; concentration: 306.93–1231.08 ng/µL). Reverse transcription employed HiScript III RT SuperMix (R333, Vazyme, China), with cDNA stored at −20°C. TBX6‐regulated genes (RIPPLY1, FGF8, DLL1, MESP2) were selected based on their roles in somitogenesis boundary formation. Gene‐specific primers (Table S21) were designed using Primer Premier (v 5.0) and validated via NCBI Primer‐BLAST, ensuring amplicon specificity. Quantitative real‐time PCR (qRT‐PCR) was used to examine the transcription profiles of TBX6 and other TBX6‐regulated genes. The qRT‐PCR experiments were performed on an ABI StepOnePlusTM Real‐Time PCR System with Tower using Tap Pro Universal SYBR qPCR master mix. The 2^−∆∆Ct^ method [78] of relative quantification was used to examine the qRT‐PCR results. To normalize the mRNA expression levels, the housekeeping gene β‐actin was used as internal reference. Samples were analyzed in triplicate.
RNA Extraction and qRT‐PCR of Adipose Tissue From Macaques
4.10
To validate the expression patterns of lipid metabolism genes identified under selection, we performed qRT‐PCR analysis on three subcutaneous white adipose tissue (sWAT) samples from an adult Tibetan macaque and an adult rhesus macaque. All animals were captive‐born males of comparable age (13 years old) to minimize physiological variation.
Total RNA extraction, cDNA synthesis, and qRT‐PCR were performed as described in Section 4.9. We designed specific primers for six candidate genes (DGAT2, DYSF, CAV1, PRKD1, LEPR, CPE) and a housekeeping gene (PPIA), with primer sequences detailed in Supplementary Table S22.
The relative expression levels of target genes were calculated using the comparative 2^−∆∆Ct^ method, with PPIA as the endogenous control and the mean values of rhesus macaque sample as the calibrator. Statistical significance of expression differences between species was determined using unpaired Student's t‐tests on ΔCt values, with p < 0.05 considered statistically significant.
Identification of Convergent Evolution Between Tibetan and Barbary Macaques
4.11
Tibetan macaques and Barbary macaques (M. sylvanus) both inhabit relatively high‐latitude environments and exhibit larger body sizes compared to their close relatives. To identify genes underlying these potential convergent phenotypic adaptations, we conducted site‐wise likelihood ratio tests for parallel amino acid substitutions based on orthologous gene alignments. We estimated site‐wise log‐likelihood support (SSLS) using RAxML (v8.2.12) [79]. We compared two phylogenetic hypotheses: the null model (H_0_) reflecting the accepted species phylogeny, and an alternative model (H_a_) constraining Tibetan and Barbary macaques as monophyletic. The likelihood difference (ΔSSLS = lnL(H_0_)—lnL(H_a_)) was computed for each site; negative ΔSSLS values indicate support for parallel substitutions, whereas positive values support the species tree. Sites were considered to exhibit significant parallel substitutions if they satisfied all following criteria: (i) same amino acid present in both Tibetan and Barbary macaques, (ii) other macaque species and outgroups displayed a different ancestral amino acid, and (iii) a negative ΔSSLS value. Genes identified as undergoing convergent evolution through this process were subsequently subjected to KEGG and GO enrichment analysis using KOBAS‐i.
Variant Calling and Filtering of Whole‐Genome Resequencing Data
4.12
For the raw whole‐genome resequencing data of 26 Tibetan macaques, we first used Trimmomatic (v0.39) [80] to filter low‐quality reads with the default parameters, and high‐quality sequence reads were further mapped against the Tibetan macaque genome using BWA‐mem (v0.7.17) [81] with the default parameters. We then used SAMtools (v1.6) [82] to sort BAM files and create index files. Potential PCR duplications were filtered using “MarkDuplicates” in Picard (https://broadinstitute.github.io/picard/). The Genome Analysis Toolkit (GATK v4.1.2.0) [83] was used for SNP calling with the analysis type of HaplotypeCaller with the “‐ERC GVCF” option across 26 individuals. All the individual GVCF files were merged using “CombineGVCFs” in GATK. We called and selected candidate SNPs from the combined GVCF file using “GenotypeGVCFs” and “SelectVariants”, respectively. To obtain reliable candidate SNPs, hard filters were applied to the raw SNPs using GATK and VCFtools (v0.1.16) [84] according to the following criteria: QD <2.0, MQ <40.0, FS >60.0, SOR >3.0, MQRankSum ←12.5, ReadPosRankSum ←8.0, minDP <4, maf <0.05 and max‐missing >0.2. Autosomal SNPs and X chromosome SNPs were extracted using VCFtools.
Population Genetic Structure and Phylogenetic Analyses
4.13
We first performed LD‐based pruning for the genotype data using PLINK (v1.9) [85] with the “‐indep‐pairwise 50 5 0.2” option for the autosome‐SNP data set. Principal component analysis (PCA) was performed using PLINK with default parameters. To determine the genetic ancestry and the admixture pattern, further population structure was investigated using ADMIXTURE (v1.23) [86], in which the number of assumed genetic clusters K changed progressively from 2 to 4, with 10,000 iterations for each run. Cross‐validation (CV) was performed with “–cv”, and K = 2 was chosen based on the lowest CV error.
For phylogenetic analysis, ML tree of the autosomal SNPs was constructed using IQ‐TREE (v2.3.1) with the parameters “‐m MFP ‐bb 10000 ‐bnni”. The NWK format tree files were further visualized using online tree illustration website (https://itol.embl.de/).
Gene flow analysis
4.14
We first used the Dsuite (v.0.5r52) [87] to perform D‐statistic (ABBA‐BABA tests) for detecting introgression. This method assigns evidence of gene flow to specific, potentially internal, branches on a phylogeny while taking into account incomplete lineage sorting. The test involves four populations or taxa in the form (((P1, P2), P3), P4) and assesses potential gene flow between P3 and P1 or P2 based on the relative site patterns of ABBA and BABA. Initially, the f4‐ratio was calculated on the basis of the ML phylogenetic tree for all species. This calculation was performed using the ‘Dtrios’ program, with Papio anubis serving as the outgroup. The ML tree was derived from the phylogenetic analysis based on autosomal SNPs. Subsequently, the f‐branch statistic values for each phylogenetic branch were estimated using the ‘Fbranch’ program with the parameter ‘‐p 0.01’. Finally, the obtained f‐branch statistics were visualized using the ‘dtools.py’ script. We also performed population‐level admixture analysis for detecting gene flow among genetic populations and its directions using the TreeMix (v1.13) [88] method with the following running parameters: treemix ‐bootstrap 100 ‐k 1000 ‐noss ‐m 1–5.
Genomic Diversity Analysis
4.15
VCFtools was used to calculate *F_ST_ *, nucleotide diversity (π and θw) in 50 kb sliding windows. We examined site heterozygosity in non‐overlapping 100 kb windows across the genome of each individual. We defined heterozygosity as the number of heterozygous genotypes divided by the total number of sites that were called. The total genotypes called within each window included the sum of heterozygous, homozygote derived, and homozygote reference genotypes. We then quantified the extent of runs of homozygosity (ROH) in each individual using PLINK with the following options: –homozyg –homozyg‐density 50 –homozyg‐gap 1000 –homozyg‐kb 100 –homozyg‐snp 25 –homozyg‐window‐het 1 –homozyg‐window‐snp 100 –homozyg‐window‐threshold 0.05. We binned these segments into three different size categories. The categories were short ROH [0 Mb–1 Mb], medium ROH [1 Mb–10 Mb], and long ROH [>10 Mb].
Demographic and Divergence Inference
4.16
We randomly selected 20,000 SNPs from autosomal SNPs and concatenated these SNPs to form an alignment to estimate divergence time with BEAST (v2.7.7) [89]. The best substitution model of GTR was selected by ModelGenerator v0.85. A total of 1,000,000 iterations were implemented with 10% burn‐ins.
To reconstruct the detailed demographic history of each Tibetan macaque population, we called the consensus sequences using Samtools mpileup by applying: “samtools mpileup ‐q 1 ‐C 50 ‐S ‐D ‐m 2 ‐F 0.002 ‐u ‐f ∗.fa (genome) ∗.bam | bcftools view ‐c | vcfutils.pl vcf2fq ‐d 10 ‐D 100 ‐Q 20 – > ∗.psmc.fq” and “fq2psmcfa ‐q10 ‐s 100 ∗.psmc.fq >∗.psmc.fa.” The PSMC model was used to estimate the population histories from the individual genomes with the following parameters: ‐N 30 ‐t 15 ‐r 5 ‐p 4+25×2+4+6 [90]. We chose a generation length of 11 years and a mutation rate per generation (µ) of 4.67 × 10^−9^ [91].
Selective sweep Analysis of Tibetan Macaque Populations
4.17
To identify genomic regions under selection between the eastern and western clades of Tibetan macaques, we performed selective sweep analysis between eastern and western clades using VCFtools with 50‐kb sliding windows and 25‐kb steps to calculate *F_ST_
- and π ratio. Putative selective regions of each clade were identified as those simultaneously falling within the top 5% of *F_ST_
- values and either top 5% (indicating selective sweeps in the eastern clade) or bottom 5% (indicating selective sweeps in the western clade) of π ratio distributions. Genes located in these regions are expected to represent targets of selection.
Statistical Analysis
4.18
All data are presented as mean ± standard Error of the Mean (SEM) unless otherwise specified. Phenotypic comparisons between TBX6‐mutated and wild‐type mice were analyzed using two‐tailed unpaired Student's t‐tests. For qRT‐PCR, relative gene expression was calculated using the 2^−∆∆Ct^ method, and differences were assessed using unpaired Student's t‐tests. All analyses were performed using R and custom scripts. Significance levels were defined as *p < 0.05, **p < 0.01, ***p < 0.001.
Author Contributions
Conceptualization: J.L., M.L., Z. F., J.Q., Y.T., and R.Z. Methodology: R. Z., Y. H., Y. C., Q. D., L. Z., and W. X. Investigation: R. Z., Y. H., and J. L. Data curation: R. Z., Y.T., Y. C., L. Z., and Q. D. Visualization: R. Z., Y. H., and L. Z. Writing—original draft: R. Z., and Y. H. Writing—review & editing: R. Z., J. Q., J. X., M. L., and J. L. Supervision: M. L., and J. L.
Funding
National Natural Science Foundation of China (Grants No. 32171607 and 32371696).
Conflict of Interest
The authors declare no conflicts of interest.
Supporting information
Supporting File 1: advs73197‐sup‐0001‐SuppMat.pdf.
Supporting File 2: advs73197‐sup‐0002‐Table S6 S7 S9 S12 S16.xlsx.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1C. Bergmann , "Über die Verhältnisse der Wärmeökonomie der Thiere zu ihrer Größe,"Gottinger Studien 3 (1848): 595‐708.
- 2J. A. Allen , “The Influence of Physical Conditions in the Genesis of Species,” Scientific American 63 (1907): 26217–26219, 10.1038/scientificamerican 05111907-26217 csupp. · doi ↗
- 3S. N. Atkinson and M. A. Ramsay , “The Effects of Prolonged Fasting of the Body Composition and Reproductive Success of Female Polar Bears (Ursus maritimus),” Functional Ecology 9 (1995): 559–567.
- 4S. N. Atkinson , R. A. Nelson , and M. A. Ramsay , “Changes in the Body Composition of Fasting Polar Bears (Ursus maritimus): The Effect of Relative Fatness on Protein Conservation,” Physiological Zoology 69 (1996): 304–316, 10.1086/physzool.69.2.30164186. · doi ↗
- 5S. Liu , E. D. Lorenzen , M. Fumagalli , et al., “Population Genomics Reveal Recent Speciation and Rapid Evolutionary Adaptation in Polar Bears,” Cell 157 (2014): 785–794, 10.1016/j.cell.2014.03.054.24813606 PMC 4089990 · doi ↗ · pubmed ↗
- 6X.‐G. Qi , J. Wu , L. Zhao , et al., “Adaptations to a Cold Climate Promoted Social Evolution in Asian Colobine Primates,” Science 380 (2023): abl 8621, 10.1126/science.abl 8621.37262163 · doi ↗ · pubmed ↗
- 7F. M. Key , M. A. Abdul‐Aziz , R. Mundry , et al., “Human Local Adaptation of the TRPM 8 Cold Receptor Along a Latitudinal Cline,” PLOS Genetics 14 (2018): 1007298, 10.1371/journal.pgen.1007298.PMC 593370629723195 · doi ↗ · pubmed ↗
- 8S. M. Lehman and J. G. Fleagle , Primate Biogeography (Springer Science & Business Media, 2006).
