Inferring Domestic Goat Demographic History Through Ancient Genome Imputation

Jolijn A M Erven; Alice Etourneau; Marjan Mashkour; Mahesh Neupane; Phillipe Bardou; Alessandra Stella; Andrea Talenti; Clet Wandui Masiga; Curtis P Van Tassell; Emily Clark; François Pompanon; Licia Colli; Marcel Amills; Marco Milanesi; Paola Crepaldi; Bertrand Servin; Benjamin D Rosen; Gwenola Tosser-Klopp; Kevin G Daly

PMC · DOI:10.1093/gbe/evaf181·November 10, 2025

Inferring Domestic Goat Demographic History Through Ancient Genome Imputation

Jolijn A M Erven, Alice Etourneau, Marjan Mashkour, Mahesh Neupane, Phillipe Bardou, Alessandra Stella, Andrea Talenti, Clet Wandui Masiga, Curtis P Van Tassell, Emily Clark, François Pompanon, Licia Colli, Marcel Amills, Marco Milanesi, Paola Crepaldi, Bertrand Servin

PDF

Open Access

TL;DR

This paper shows how ancient goat genomes can be analyzed using imputation to study genetic changes during domestication and dispersal.

Contribution

The study introduces a novel approach for analyzing low-coverage ancient genomes using imputation and ROH/IBD patterns to infer demographic history.

Findings

01

Imputation of low-coverage goat genomes achieves high concordance with high-coverage genomes after filtering.

02

ROH patterns suggest increased inbreeding in Neolithic goats farther from the Zagros Mountains.

03

IBD analysis shows less relatedness in early herding sites compared to later dispersal periods.

Abstract

Goats were among the earliest managed animals, making them a natural model to explore the genetic consequences of domestication. However, a challenge in ancient genomic analysis is the relatively low genome coverage for most samples, limiting analysis to pseudohaploid genotypes. Genotype imputation offers potential to alleviate this limitation by improving information content and accuracy in low coverage genomes. To test this, we used published high coverage (>8✕) goat palaeogenomes, imputing downsampled genomes using the VarGoats dataset (1,372 individuals) as a reference panel. Measuring concordance between imputed and high coverage genotypes, we find high concordance after filtering for common (>5%), high confidence variants, with 0.5✕ genomes reaching >0.97 concordance. There is a trade-off between coverage, genotype probability (GP) thresholds, and genotype recovery, where higher…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Species1

Capra hircus(domestic goat · species)

Figures6

Click any figure to enlarge with its caption.

Fig. 1 — Overview of high coverage goat palaeogenomes and downsampling approach. Goat silhouettes from phylopic.org.

Fig. 2 — Non-reference concordance across MAF tranches for four samples, with downsampled coverages ranging from 0.1 to 4✕. Concordance values are calculated for transitions and transversions combined. Colors indicate different genotype probability (GP) thresholds. 0.1–1✕ is shown in Fig. S1.

Fig. 3 — Non-reference concordance across MAF tranches for four samples, with downsampled coverages ranging from 0.1 to 1✕. Concordance values are calculated for transitions and transversions separately, depicted by type. Colors indicate different genotype probability (GP) thresholds. 0.1–4✕ is shown in Fig. S2.

Fig. 4 — Patterns of imputed heterozygous veracity. a) false-positive rate (FPR) and b) false-negative rate (FNR), at the 5% MAF threshold. “True” heterozygotes are defined using the validation nonimputed genotypes calculated using the entirety of sequencing data available for each test sample.

Fig. 5 — Analysis of ROH for imputed ancient goat genomes with ≥0.5✕ mean coverage. Key indicates the general archeological period the samples derived from. a) Map of archeological settlements from which imputed genotypes are analyzed. b) ROH length bin profiles for ≥0.5× imputed ancient goats. ROH was calculated with each ancient sample individually, using transversions-only and downsampling to 1.6 million sites, which corresponds to the lowest number of imputed sites in the 0.5✕ imputed test samples. ROH profiles are the result of combining plink ROH and long ROH (>4 Mb) from bcftools.

Fig. 6 — IBD sharing a) within-settlement, normalizing relatedness for assemblages with ≥3 imputed genomes with ≥0.5✕ and b) between-settlement cumulative shared IBD (cM). For Tappeh Sang-e Chakhmaq, we excluded a single ∼7,000 cal BCE individual from the West Mound, Semnan1-2. Between-settlement IBD is represented by lines between-settlement pairs and does not imply direct cultural connectivity or dispersal patterns.

Funding3

—Taighde Éireann—Research Ireland
—BBSRC Awards
—France Génomique ‘Call for high impact projects’”

Keywords

ImputationaDNAGoat domestication

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsGenetic and phenotypic traits in livestock · Forensic and Genetic Research · Genetic diversity and population structure

Full text

Introduction

The domestic goat (Capra hircus) was one of the earliest animals brought under human management, likely beginning before 8,200 year BCE (Before Common Era) in Southwest Asia (Zeder 2008). Faunal assemblages from the Central Zagros Mountains of Iran show demographic indicators of young slaughtering consistent with a preference for kid- and milk-producing females (Zeder and Hesse 2000). Genomic data from these assemblages support this conclusion, showing differential uniparental diversity patterns and signs of recent inbreeding [homozygosity tracts >5 Mb, tracing back to the last 4–8 generations or more recently (Stoffel et al. 2021)], expected within a more limited breeding population (Daly et al. 2021). Control of goats is posited at other regions during this Aceramic/Pre-Pottery Neolithic period. For example, at Aşıklı Höyük in Central Anatolia, there is evidence for a gradual development of goats and sheep being kept within enclosed spaces of the settlement, and with tethering of animals likely practiced (Zimmermann et al. 2018; Abell et al. 2019). Across the Southern Levant, there are indications of fodder usage at Abu Ghosh ∼8,000 BCE, and a widespread increase in the occurrence of goat remains relative to wild animals (Makarewicz and Tuross 2012; Munro et al. 2018). Goat management practices—and managed goats themselves—dispersed throughout neighboring environs in the following millennia. Targeted slaughtering of young, grown males became ubiquitous in Southwest Asia by the mid-7th millennium BCE (Arbuckle and Atici 2013), while managed goat herds reached Northeast Iran in the mid-to-late 8th millennium BCE (Roustaei et al. 2015) and Europe by the late 7th millennium BCE (Zeder 2008; Arbuckle et al. 2014).

The consequences of these initial phases of management and diffusion have been broadly characterized using palaeogenomes, with evidence of recent inbreeding among early managed goats in the Zagros Mountains (Daly et al. 2018, 2021). However, genetic data from ancient remains is defined by its relatively poor quality, due to postmortem degradation and associated low sequencing depth. Consequently, most palaeogenomic analyses are limited to pseudohaploid genotypes, and lack the richer demographic inferences possible from diploid genotypes. Similarly, missing genotypes restrict the accurate detection of shared haplotypes between chromosomes, both within-individual [runs-of-homozygosity (ROH)] and between-individual [identity-by-descent (IBD)] regions (Browning and Browning 2012; Peripolli et al. 2017; Blondeau Da Silva et al. 2024). This excludes many palaeogenomes from analyses such as IBD-based demographic inference (Fournier et al. 2023), or their incorporation into ancestral recombination graphs (Speidel et al. 2021).

A long-standing model of domestication is the domestication bottleneck, where a small number of animals are used to found managed herds and interbreeding with wild populations (Zeder et al. 2006). Under this standard model, the resulting managed population is expected to show reduced genetic diversity due to the imposed genetic drift and lack of gene flow (Frantz et al. 2020). A concomitant prediction is an increase of inbreeding events and ROHs, due to the reduced effective population size. Genetic data have required increasingly nuanced modifications to this domestication-diversity model. Genomes from horses (Schubert et al. 2014), dogs (Marsden et al. 2016), and chickens (Wang et al. 2021) have revealed a greater burden of deleterious variation in domestic populations compared to wild, suggestive of a fixation of variants which may otherwise be purged in a larger population.

However, in both sheep (Sandoval-Castellanos et al. 2024) and goat (Daly et al. 2021) mitochondrial diversity is high in the earliest millenia of herding, but subsequently declines. Data from sorghum, maize, and barley suggest a postdomestication of erosion of diversity rather than an initial sharp decrease (Allaby et al. 2019). In contrast, ancient horse populations show several distinct population declines, linked to episodes of intensified management and breeding practices (Librado et al. 2024). Nuclear genomes from early herded goats also show a degree of inbreeding, but paradoxically higher estimated heterozygosity values than later domestics (Daly et al. 2021). These apparently contradictory findings necessitate a reassessment of the standard domestication model (Frantz et al. 2020), informed by species-specific biological and cultural histories. A meaningful reassessment would require ancient samples with sufficient coverage (information content) to provide the power needed to address these standard domestication models.

A powerful and cost-effective way to increase the information content of ancient samples is genotype imputation. Genotype imputation is the statistical inference of genotypes at unknown or missing sites, using known haplotype variation in a set of reference individuals (Browning and Browning 2016). This method facilitates the inference of diploid genotypes from low coverage ancient genomes, enabling haplotype-aware and genealogical analyses. Genotype imputation is widely employed in ancient humans (Gamba et al. 2014; Martiniano et al. 2017; Cassidy et al. 2020; Hui et al. 2020; Ariano et al. 2022; Ausmees et al. 2022; Ausmees and Nettelblad 2023; Sousa da Mota et al. 2023), ancient pigs (Erven et al. 2022), ancient bovids (Erven et al. 2024; Rossi et al. 2024), ancient canids (Bougiouri et al. 2024), historic-era horses (Todd et al. 2023), and modern livestock (Wang et al. 2022; Ding et al. 2023; Lloret-Villas et al. 2023; Zhang et al. 2023). The method has proven to be a step-change in our increasingly fine-scale understanding of human population history, offering insights into prehistoric social structures at the level of shared haplotypes (Cassidy et al. 2020; Allentoft et al. 2024; Lazaridis et al. 2025), demographic bottlenecks in the establishment of insular populations (Ariano et al. 2022), and the trajectory of selection within different ancient ancestries (Irving-Pease et al. 2024; Vaughn and Nielsen 2024).

The VarGoats project has surveyed commercial and local breeds, the progenitor species Capra aegagrus (the bezoar ibex), and other Capra species to synthesize a dataset of over 1,100 novel or publicly available genomes (Denoyelle et al. 2021), now expanded to 1,372 individual genomes representing 155 country-breed groupings with a worldwide distribution. This global-scale genomic dataset represents an opportunity to employ imputation on goat palaeogenomes. With the aim to test the long-standing assumption of an early domestication-related population bottleneck in livestock species, we first performed downsampling experiments on high quality goat palaeogenomes to ensure imputation accuracy and then utilized the imputed genotypes to directly assess changing inbreeding patterns through time.

Results

We systematically evaluated genotype imputation of ancient low coverage goat genomes, following the GLIMPSE2 pipeline (Rubinacci et al. 2023), schematically described in Fig. 1. The accuracy of imputation—the frequency at which imputed genotypes matched nonimputed genotypes for the same sample—was tested by downsampling four ancient high coverage goat genomes. These four genomes comprise different ancestries across domestic and wild goats which are variably represented within our reference panel (Fig. 1, Tables S1 to S2). Heterozygous and homozygous alternative genotypes are expected to impute less effectively than homozygous reference (Zhang et al. 2021), so we particularly focused on these former variants in the assessment of accuracy. We additionally assessed the number of variant sites recovered per-sample in the imputed data, relative to the number of variant sites covered by the sequencing data alone.

Overview of high coverage goat palaeogenomes and downsampling approach. Goat silhouettes from phylopic.org.

Imputation Accuracy

The imputation accuracy of ancient goats, measured by the genotype concordance of non-reference genotypes between high coverage and imputed genotypes (heterozygous and homozygous alternative) showed a clear dosage dependency (Fig. 2, Tables S5 to S6). Consistent with previous findings in ancient humans (Sousa da Mota et al. 2023), cattle (Erven et al. 2024), and canids (Bougiouri et al. 2024), higher coverage led to greater imputation concordance. This effect of coverage is more pronounced in rare alleles (<2% MAF), which show a greater decline in genotype concordance as coverage decreases compared to more common alleles (Fig. 2). Genotype concordance plateaus between the 2% to 5% and 5% to 10% MAF tranches (Figs. 2, S1, and S2), where it stabilizes from 0.5✕ coverage onwards (>0.90 concordance; Figs. S1 and S2, Table S5). We find concordance gains at higher coverage in both heterozygous and homozygous alternative genotypes, with sample-specific effects likely due to uneven representation of ancient ancestries in the reference panel (Supplementary Note S1, Figs. S3 and S4, Tables S7 and S8).

Non-reference concordance across MAF tranches for four samples, with downsampled coverages ranging from 0.1 to 4✕. Concordance values are calculated for transitions and transversions combined. Colors indicate different genotype probability (GP) thresholds. 0.1–1✕ is shown in Fig. S1.

Genotype concordance was additionally sensitive to the minimum genotype probability (GP) threshold, which was applied to the imputed data as a quality filter (Methods—Validation of genotype imputation). A more stringent GP threshold resulted in higher genotype concordance; this gain in genotype concordance was greater in the 0.1✕ imputed data compared to the higher coverage imputed data (>1✕; Figs. 2 and S1), as observed in humans and cattle (e.g. Sousa da Mota et al. 2023; Erven et al. 2024). In general, we observed genotype concordance for common alleles (MAF tranches >5%) exceeding 0.69 and 0.87 at a minimum coverage of 0.1✕ for GP70 and GP99; 0.89 and 0.95 for a minimum coverage of 0.5✕; and 0.96 and 0.98 for a minimum coverage of 2✕ (Table S5).

We observed differences in genotype concordance between transversions and transitions, where transitions showed a lower concordance than transversions (Figs. 3 and S2, Table S6), particularly at MAF tranches below 2%. Heterozygous genotypes specifically appear to be the primary driver of this divide between transversion and transition concordance rates (Figs. S3 and S4, Table S6). Differences between transversions and transitions in more common alleles (MAF tranches >5%) are minimal (Figs. S5 to S7): transversions have on average a 0.019, 0.014, and 0.011 higher genotype concordance at 0.1✕, 0.5✕, and 2✕, respectively (Table S6). Finally, we observe variation in the difference of transition/transversion concordance rates between individuals (Fig. 3). This does not appear to be driven by residual postmortem deamination error, as we see little evidence of systematic differences in error at CpG sites (Table S2), and may be instead due to a combination of sample-specific error and dissimilarity to the reference panel.

Non-reference concordance across MAF tranches for four samples, with downsampled coverages ranging from 0.1 to 1✕. Concordance values are calculated for transitions and transversions separately, depicted by type. Colors indicate different genotype probability (GP) thresholds. 0.1–4✕ is shown in Fig. S2.

We then examined heterozygous sites, known to be more challenging to impute (Fig. S3; (Hui et al. 2020; Sousa da Mota et al. 2023; Erven et al. 2024; Escobar-Rodríguez and Veeramah 2024). We calculated the heterozygous false-positive rate (FPR), measured as number of imputed heterozygous called as homozygous in the validation dataset (FP) divided by the sum of the number of FP and correctly imputed homozygous sites (TN). The heterozygous FPR was sensitive to coverage and GP, with larger decreases in FPR seen at stricter GP thresholds (Figs. 4 and S8, Tables S7 and S8). We also see that variation between GP thresholds shows sample-specific patterns (Supplementary Note S2, Fig. S8, Tables S7 and S8). We finally explored the false-negative rate (FNR) of heterozygous sites and found a large effect of the GP filter (Supplementary Note S2).

Patterns of imputed heterozygous veracity. a) false-positive rate (FPR) and b) false-negative rate (FNR), at the 5% MAF threshold. “True” heterozygotes are defined using the validation nonimputed genotypes calculated using the entirety of sequencing data available for each test sample.

Given the concordance rates observed across individuals (including the Epipalaeolithic bezoar Direkli1-2) at 0.5✕ and associated suppression of false heterozygous genotypes (FPR), we chose to focus analyses at this coverage level and primarily use a GP99 filter in the following sections.

Imputation Recovery

There is a clear positive trend between coverage, GP threshold, and site recovery; higher coverage and more lenient GP thresholds lead to a greater number of imputed sites (Figs. S9 and S10, Table S9). The total number of non-reference sites recovered for the test samples at 0.5✕, 5% MAF, and the strictest GP (0.99) are 1.9 M (Acem2), 2.1 M (Blagotin3), 1.5 M (Direkli1-2), and 1.9 M (Semnan3) (Table S9). Lowering the GP threshold to 0.95 increases the number of recovered sites to a greater degree than subsequent lower GP thresholds (Figs. S9 and S10). On average, the number of recovered heterozygous sites was greater than the recovered homozygous alternative sites, with low coverages (<0.4✕) an exception (Fig. S10), possibly due to a higher heterozygous FPR at these coverage levels (Fig. 4).

We compared the recovery rate before and after imputation by comparing pseudohaploid (i.e. the genetic data which would be considered if no imputation was performed) and imputed calls, focusing on common alleles (MAF ≥5%). Imputation improves recovery rates, particularly for low coverage samples (<1✕) (Fig. S11). Moreover, imputation also enables us to call heterozygotes variants which show a low error rate (∼1.5% FPR for 0.5✕, Fig. 4). Finally, we compared different GLIMPSE software to evaluate which pipeline would be most suitable for our data (Tables S10 and S11). GLIMPSE1 (Rubinacci et al. 2021) showed a slightly higher genotype concordance (Fig. S12), while GLIMPSE2 showed higher recovery rates (Fig. S13). We opted for GLIMPSE2, due to the substantially higher recovery in imputed sites (e.g. ∼12% gain at MAF 5%, GP99 at 0.5✕).

Genome-wide Concordance and Heterozygous FPR

We next explored differences in chromosomal region imputation accuracy. We calculated non-reference concordance for our 0.5✕ downsampled genomes with a 5% MAF and GP99 threshold across the genome by utilizing a sliding window approach (500 kb window, 100 kb step). Non-reference concordance was relatively stable in all four samples, with the lowest observed for the Epipaleolithic wild bezoar Direkli1-2 (Figs. S14 to S17). The bottom 1st percentile of windows were considered as outlier regions; the lowest concordance was observed in Semnan3 on chromosome 27 (Fig. S17). The number of individual outlier regions ranged from 91 to 102, with Semnan3 exhibiting the fewest and Direkli1-2 the most. Among the identified outlier regions, 12 were detected in three or more samples (represented as black bars in Figs. S14 to S17).

Notably, four of the imputation outlier regions are located within or overlap with the first and last 2 Mb of chromosomes, likely reflecting the inaccurate imputation of variants mapping to telomeric regions (Table S12). Additionally, three regions contain olfactory receptor or immunity-related genes (GO terms are in Table S12; pooled regions show no specific GO enrichment), consistent with previous findings of difficult to impute regions in both ancient and modern cattle (Zhang et al. 2023; Erven et al. 2024). Finally, we characterized chromosomal distributions of the heterozygous FPR and found a greater degree of sample-specific variation than compared to concordance rates (Supplementary Note S4, Figs. S18 to S21, Table S13). The outlier regions present in the majority of the samples (at least three; Tables S12 and S13) for both non-reference concordance and heterozygous FPR were excluded from subsequent analysis.

Downstream Analyses

To assess and quantify potential biases introduced by imputation already known to occur (Erven et al. 2022), we performed comparisons between high coverage samples and their imputed counterparts. Both genotype-based analyses, such as principal component analysis (PCA), ROH analysis, and the haplotype-based IBD analysis were performed to assess imputation accuracy. Reduced concordance and high FPR outlier regions (Tables S12 and S13) were excluded from downstream analyses.

As a preliminary assessment of the imputed genotypes we performed PCA, comparing high coverage and pseudohaploid data (Figs. S22 to S24). We found consistency with previously reported affinities but also observed a large effect of missingness in PCA placement, suggesting that using lower GP thresholds on imputed data may be justified to ensure adequate site number (Supplementary Note S5).

Outgroup f3 Statistics

As a test case of potential applications of using imputed goat palaeogenomes in concert with globally distributed modern genotypes, we assessed the genetic history of breeds today and their relationship to ancient populations. Evidence from modern goat breed genotype data suggests that the continental gene pool of Europe, Asia, and Africa were primarily established by the dispersal of managed herds from Southwest Asia, with some subsequent admixture events (Colli et al. 2018). To address this, we measured shared genetic drift between ancient and modern genomes in the Vargoats dataset by imputing 36 previously published ancient goat palaeogenomes with ≥0.5✕ coverage (Table S3), and subsequently filtering genotypes (GP99). We then computed outgroup f_3_ statistics (Patterson et al. 2012), calculated between pairs of ancient genomes and also with modern breed groups (Table S14). We plotted these values on a world map indicating the provenance of each genome or breed grouping; we present a subset of these in Figs. S25 to S31 and the remainder at 10.17605/osf.io/av4f9.

The shared drift measurements with modern domestic goats allowed a broad description of how ancient goat genomes relate to breeds today. We observe for the first time that African breeds, particularly those in East Africa, showed greater affinity with a Bronze Age goat from Israel (Yoqneam2; Fig. S25) and Central Türkiye (Acem2; Fig. S26), likely reflecting their initial dispersal from Southwest Asia (Marshall and Hildebrand 2002; Prendergast et al. 2019). Among African breeds, there are differential affinities to Neolithic genomes. Goats from North and West Africa show greater affinity with Neolithic European genomes, while breeds from East Africa show more affinity with ancient Iranian and Turkish goats, similar to patterns observed from modern data (Colli et al. 2018). Asian goats today, particularly those from Bangladesh, showed high affinity with the Neolithic East Iranian genomes (e.g. Semnan3; Fig. S27); goats from South Asia may therefore better represent the ancestry of animals during the initial dispersal of livestock herds into Central and eastern Asia.

There are hints in our data that this may be confounded by admixture with wild bezoar, with modern Asia breeds having low but varying affinity with a predomestication genome from Armenia (Hovk1; Fig. S28); there are some evidence of admixture from other Capra species into domestic goat (Zheng et al. 2020; Li et al. 2022). Breeds from Europe or having European origin showed affinity with the Neolithic Serbian genomes (e.g. Blagotin3; Fig. S29), while a Late Bronze Age goat from Potterne, Wiltshire (UK) showed highest affinity with Old Irish Goat (Fig. S30). Finally, ancient bezoar showed higher affinity with European goats and least with those from East Asia (Fig. S31), in line with previous inferences of Turkish bezoar admixture in West Eurasian domestic genomes (Daly et al. 2018, 2022). Shared drift between pairs of ancient genomes recapitulates other indicators of affinity, including broad affinity among post-Neolithic Iranian genomes and notable affinity of the Bronze Age goat from Israel (Yoqneam2) to post-Neolithic genomes across Southwest Asia.

Testing Runs-of-Homozygosity Inference

The recovery of imputed diploid genotypes permits the direct testing of the domestication bottleneck hypothesis by tracking the burden of ROHs through time. To do this, we first optimized ROH inference by assessing the parameters affecting their estimation in our imputed genotypes. We computed ROHs with no missingness across test samples simultaneously using transitions and transversions (“all sites”) or with transversions-only. We additionally computed ROHs on an individual basis using transitions and transversions or transversions-only. For each dataset, we further downsampled to 1 M, 750, and 500 K sites to determine a minimal number of SNPs necessary for accurate ROH calculation. The nonimputed high coverage samples were used as validation of ROH profiles and were computed without downsampling on an individual basis using either transitions and transversions or transversions-only. Exploring the effect of site number of ROHs (Fig. S32, Supplementary Note S6), we opted to calculate ROHs on datasets >750 K sites with a minimum of 200 SNPs per ROH, using both the combined datasets with no missingness and individual-based ROH calculation.

We tested the consistency between imputed and high coverage genotypes, when no missingness is allowed within the dataset (i.e. when the same sites are analyzed in high coverage and imputed genotypes). We find high consistency between the imputed and high coverage ROH profiles, indicating that false-positive and false-negative heterozygotes have less effect on ROH profiles at a GP threshold of 0.99 (Fig. S33). A similar pattern is observed when using bcftools to calculate ROHs, which produce similar profiles to plink for high coverage and 0.5✕ imputed genomes assessed together with no missingness (Fig. S34), although we find instances of erroneously fragmented long ROHs (Figs. S35 to S38, other sample figures available at 10.17605/osf.io/av4f9).

Given the robust parameter space identified, we computed ROHs using imputed ≥5% MAF genotypes of goat palaeogenomes with ≥0.5✕ coverage, applying a GP99 filter and removing sites with any missingness across samples (Fig. 5, Table S3 and S15). We compared these combined ROH profiles to ROH profiles calculated from transversions-only, with ROHs calculated on a per-individual basis (Fig. S32, Supplementary Note S6), downsampling each individual to a random 1.2 and 1.6 M sites (matching the lowest number of sites across samples for 0.4✕ and 0.5✕, respectively—Table S15). We observed large differences between ROH profiles generated using plink and bcftools in the combined dataset (Fig. S39). Using ROHan to infer reference ROH profiles (Renaud et al. 2019), plink consistently detects fewer ROHs, especially long ROHs (>8 Mb). In contrast, bcftools overestimated short ROHs, but correctly called longer ROHs based on comparison to heterozygosity profiles and ROHan profiles (Figs. S35 to S39). These discrepancies were diminished when using higher SNP density datasets (1.6 M sites; Fig. S40). Analysis of local ROHs and heterozygosity profiles across the different datasets suggests that ROH fragmentation in the combined dataset is likely due to lower density of SNPs rather than an increase of heterozygous SNPs (Figs. S36 to S38), leading to the lower number of long ROHs called by plink. These findings highlight the importance of SNP density in ROH estimation, the potential underestimation of long ROHs when using plink, and overestimation in small ROHs using bcftools.

Analysis of ROH for imputed ancient goat genomes with ≥0.5✕ mean coverage. Key indicates the general archeological period the samples derived from. a) Map of archeological settlements from which imputed genotypes are analyzed. b) ROH length bin profiles for ≥0.5× imputed ancient goats. ROH was calculated with each ancient sample individually, using transversions-only and downsampling to 1.6 million sites, which corresponds to the lowest number of imputed sites in the 0.5✕ imputed test samples. ROH profiles are the result of combining plink ROH and long ROH (>4 Mb) from bcftools.

To mitigate these opposing biases we combined approaches, merging bcftools-identified long ROHs (>4 Mb) with plink-computed ROHs. This integrative approach resulted in similar ROH profiles for individual-based transversions (Fig. 5b) and “combined” transition and transversions datasets (Fig. S41). However, it is important to note that bcftools when applied on limited sites (combined dataset: 897 K sites) may overcall long ROHs (Figs. 5 and S35), and results with lower site numbers should be interpreted with caution.

Spatio-temporal Patterns of Runs-of-Homozygosity in Goats

Having established a pipeline for ROH inference, we investigated how these genomic signatures of inbreeding varied during and after the initial phases of goat domestication. Assessing computed ROH patterns, goats from Ganj Dareh, a settlement with some of the earliest known evidence of livestock management ∼8,200–7,600 cal BCE (Zeder and Hesse 2000; Daly et al. 2021), typically show a total ROH amount of <100 Mb, with individual ROHs occurring across length categories (Fig. 5b). In contrast, two genomes were outliers in having longer ROH stretches (Ganjdareh18 and 22). The long stretches of ROHs in these samples were previously identified (Daly et al. 2021) and linked to potential kin matings within a management-induced small breeding population. In contrast, the overall genetic diversity in Ganj Dareh has been reported as high relative to other ancient goats (Daly et al. 2021). Other Ganj Dareh goats (Ganjdareh3, 20, 26, and 34) showed a low overall cumulative sum of ROHs (Fig. 5b). Genomes (Abdul4 and Lur12) from the nearby Tepe Abdul Hosein (inhabited ∼8,250–7,800 and ∼7,600–7,550 cal BCE; Nishiaki 2022), show a comparable ROH profile to these lower ROH Ganj Dareh samples.

Goats from the Northeast Iranian settlement of Tappeh Sang-e Chakhmaq (Mashkour et al. 2014; Roustaei et al. 2015), with both Aceramic (∼7,000 cal BCE; sample Semnan1-2 in Fig. 5b) and Ceramic (∼6,000 cal BCE; remaining Semnan samples) phases, showed significantly elevated ROH profiles compared to those from the earlier site of Ganj Dareh (Mann–Whitney U test; U = 7, one-tailed P = 0.01465). Furthermore, no outliers with large cumulative sums of long ROHs were present, with analyzed genomes showing an inflation of ROHs across size categories. These consistent elevated ROH profiles are indicative of a bottleneck during the dispersal from a region of domestication and beyond the range of their wild progenitor C. aegagrus (Zheng et al. 2020; Daly et al. 2021; Petretto et al. 2024; Zeder 2024).

Following the Neolithic period, goats from archeological assemblages across Iran, Türkiye, Israel, Armenia, Georgia, Turkmenistan, and Uzbekistan exhibited a reduced cumulative sum of ROHs. For instance, at Monjukli Depe in southern Turkmenistan, the ROH profile of Monjukli8, a sample from Neolithic layers (6,400–5,800 BCE), closely resembled that of contemporaneous goats from Tappeh Sang-e Chakhmaq. This pattern shifts in the subsequent Chalcolithic period (5,100–4,500 BCE; also known as Aeneolithic) as evidenced by a later genome from Monjukli Depe (Monjukli4), which displayed a lower cumulative sum of ROHs. This pattern, observed across Southwest Asia, indicates that herd genetics were shaped by increased outbreeding or a sustained lack of inbreeding over the intervening millennia. Alternatively, bezoar brought under human control may already carry inbreeding signals, due to their nonpanmictic polygynous reproduction system (i.e. isolated populations with male competition prior to reproduction). Notably, contrasting ROH profiles can be observed within goats in Bronze Age Uzbekistan (Tilla Bulak, ∼2,000–1,700 BCE), where Bulak1 shows an excess of long ROHs compared to Bulak2. Additional samples from Central Asia are required to draw broader regional conclusions.

ROH profiles from the westward dispersal of managed goats to Europe also suggested a population bottleneck. At Blagotin-Poljna (Serbia) ∼6,000 cal BCE, we saw one outlier (Blagotin16) with a high amount of long ROHs, indicative of recent inbreeding. While this is not observed in the other Blagotin-Poljna goats, their overall summed ROHs (average ∼197 Mb) was larger than those from the Central Zagros Mountains (8,200–7,600 cal BCE; on average ∼100 Mb), however this was not statistically significant, likely due to the small sample size (Mann–Whitney U test; U = 6, one-tailed P = 0.05455). Our dataset contained one European sample postdating the Neolithic in Europe, Potterne1 (Potterne, Wiltshire, UK; ∼800 cal BCE), which showed a notable excess of long ROHs, suggestive of recent inbreeding.

Identity-by-descent

To assess our ability to recover shared chromosomal segments we identified IBD regions (Supplementary Note S7, Table S16). Aware of limitations in our IBD recovery for ancestries not well represented in the phasing panel (e.g. Direkli1-2), we analyzed pairwise IBD sharing among imputed genotypes of goat palaeogenomes with ≥0.5✕ (Table S3), applying a GP99 and MAF 5% filter, to compare within-settlement relatedness and to evaluate IBD sharing between-settlements. We restricted this to the Neolithic period due to the low sample size available for other time periods (Table S3). Normalized relatedness was calculated for settlements with a minimum of three samples. Individuals were considered related if they shared at least three IBD segments >3 cM, with a cumulative IBD length of at least 21 cM.

Normalized relatedness was calculated for Ganj Dareh (Iran, ∼8,200–7,600 cal BCE), Tappeh Sang-e Chakhmaq (Iran ∼7,000–6,000 cal BCE), and Blagotin-Poljna (Serbia ∼6,000 cal BCE), shown in Fig. 6a. Among these settlements, Tappeh Sang-e Chakhmaq showed the highest normalized relatedness (1), indicating that all of the tested individuals were related (Table S17). We excluded from these group comparisons the single genome from the Aceramic West Mound (∼7,000 cal BCE) at Tappeh Sang-e Chakhmaq, Semnan1-2, which incidentally has short IBD with the ∼6,000 cal BCE East Mount goats (Table S18). The Blagotin-Poljna assemblage also showed a relatively high normalized relatedness (0.55), indicating that roughly half of the individuals were related (Table S17). In contrast, goats from Ganj Dareh showed a normalized relatedness of 0, indicating no detectable genetic relatedness among individuals.

IBD sharing a) within-settlement, normalizing relatedness for assemblages with ≥3 imputed genomes with ≥0.5✕ and b) between-settlement cumulative shared IBD (cM). For Tappeh Sang-e Chakhmaq, we excluded a single ∼7,000 cal BCE individual from the West Mound, Semnan1-2. Between-settlement IBD is represented by lines between-settlement pairs and does not imply direct cultural connectivity or dispersal patterns.

Between-settlement IBD broadly matched geographic proximity (Fig. 6b), with little long-range IBD detected. We saw notable IBD sharing in Neolithic East Iran, specifically between Tappeh Sang-e Chakhmaq and Monjukli Depe in nearby Turkmenistan (Fig. 6b; the older Semnan1-2 genome also showed IBD sharing with Monjukli8, see Table S18), sites showing cultural affinities (Roustaei et al. 2015). IBD sharing between Tappeh Sang-e Chakhmaq and Ganj Dareh was notably lower, suggesting a weaker connection at least partially driven by the temporal difference (∼2,200–600 years) between the assemblages.

Discussion

We report the successful imputation of ancient goat genomes. Using the large-scale VarGoats reference datasets and GLIMPSE2 (Rubinacci et al. 2023), non-reference concordance rates for both transition and transversion variants reach 0.7 for the lowest tested coverage and GP (0.1✕, at GP70—Figs. 2 to 3, Table S7). Once stratified for common variants (MAF ≥5%) and GP99, concordance increases to >0.9 at the lowest tested imputation coverage (0.1✕). Higher MAF thresholds (i.e. 10%) produce similar concordance values to those seen at 5% (Fig. 2, Table S7). For genomes imputed here, concordance is >0.99 at 0.5✕ for common variants. Notably, we imputed common variants in an Epipaleolithic wild bezoar (Direkli1-2) with a non-reference genotype concordance of 0.97 at 0.4✕ and 0.5✕, demonstrating that variation shared between wild and domestic populations can be accurately imputed even with a largely domestic reference panel. This suggests that ancient domestic populations not studied here will impute with similar accuracy using the VarGoats reference genotypes, when restricted to common variants (≥5% MAF).

We note some limitations in our imputation approach. Variants below the frequency threshold of 5% impute less accurately (Fig. 2), particularly at lower coverages; studies across species have observed that alleles less common in a reference panel are more difficult to impute (Sousa da Mota et al. 2023; Bougiouri et al. 2024; Erven et al. 2024; Yao et al. 2025). As a consequence, variants unique to ancient populations or rare today (i.e. having a MAF below 5%) are not recovered accurately (Figs. 2 and 3). We additionally observe a difference between transitions and transversions in rare alleles (Fig. 2), also reported in ancient humans and dogs (Sousa da Mota et al. 2023; Bougiouri et al. 2024) but to a lesser extent. A potential reason for this could be differences in how genotype concordance was calculated—comparing all genotypes versus only heterozygous and homozygous alternatives in this study. Alternatively, those imputation reference panels may have better representation of the ancestry of tested samples compared to the present study, including at rarer alleles. Future work should consider incorporating high quality ancient genomes into imputation reference panels, thus making the panel more representative of the divergent ancestry to be imputed, potentially improving accuracy (Escobar-Rodríguez and Veeramah 2024).

Filtering criteria for imputed genotypes necessitate a trade-off between genotype concordance and the number of genotypes available for analysis. Heterozygous errors—both spurious (FPR) and missed (FNR)—are apparent in our data (Fig. 4). We find stringent GP thresholds can mitigate these errors, at the cost of genotype number. As such, we recommend that for this and comparable datasets, the GP threshold be tailored to the downstream analysis. For analyses in which accuracy is paramount, such as ROHs or IBD, a high GP filter is necessary to detect shared haplotypes or regions depleted of variants. For PCA analyses, we find missingness to be a larger source of error than imputation (Supplementary Note S4); a lower GP filter such GP80 or GP90 may therefore be reasonable.

We demonstrate the potential of imputed genotypes to infer ROH profiles accurately and report broad trends of inbreeding in this ancient livestock. Our exploration of plink-based ROH parameters suggest individual-level ROH calculation, restricted to transversion genotypes and matching variant numbers by random downsampling to match the sample with the lowest variant count, captures many of the same patterns as a “combined” dataset of all individuals. This may circumvent an issue of “combined” ROH calculation, which is limited by decreasing site number when a “no missingness” filter is used and the number of individuals increases. In general, our testing highlights the sensitivity of plink-calculated ROH to variant density and to the parameters used (Supplementary Note S5). Contrasting this are ROH profiles inferred using bcftools, which appear to accurately recover longer ROHs but can over-call short homozygosity stretches (Figs. S36 to S38). We therefore suggest a novel approach to ROH estimation, by merging long (≥4 Mb) bcftools-computed ROHs with plink-computed ROHs, mitigating the opposing bias of both tools.

Our initial description of the inbreeding landscape of ancient goat populations (Fig. 5) reveals distinct patterns during the Neolithic period: increased inbreeding and total ROHs in genomes from locations which managed goats were introduced to, as observed in ancient cattle (Rossi et al. 2024). This finding is consistent with a growing body of palaeogenomic-informed research (Allaby et al. 2019; Frantz et al. 2020) indicating that domestication of many species involved multiple wild populations, geographically and temporally dispersed (Daly et al. 2018; Frantz et al. 2019; Verdugo et al. 2019; Bergström et al. 2022; Kaptan et al. 2024; Erven et al. 2025). Some signals of inbreeding are detected within Aceramic Neolithic assemblages in the Central Zagros Mountains as previously reported (Daly et al. 2021), indicating a degree of management-induced or standing inbreeding level. In contrast, ROH lengths rise as managed herds are transferred to more distant locales, such as Southeast Europe (Blagotin-Poljna) and Northeast Iran (Tappeh Sang-e Chakhmaq), providing parallel direct evidence for the accumulation of inbreeding in a livestock species during its human-induced dispersal in the Neolithic period. This may have been further exacerbated by the reduction or lack of gene flow with wild bezoar populations, particularly in Europe where C. aegagrus was not present.

Today, goat breeds from more isolated regions show elevated ROH signals (Bertolini et al. 2018), illustrating the general consequences of limited gene flow. For example, goats today from Ireland and Britain (and including a single Bronze Age genome from Potterne, Wiltshire, Fig. 5) are among the least genetically diverse of breeds surveyed (Petretto et al. 2024), suggesting prolonged genetic isolation. Inbreeding signals appear notably reduced—with some exceptions—from the Chalcolithic onwards. This may be due to a combination of many factors: improved husbandry practices, growing effective population sizes, and developed animal trade networks. Additional genomes appropriate for imputation, from regions and time periods not represented in this study, will shed light on the overall trend and regional variation of this observation.

Our application of IBD analysis using imputed genotypes (Fig. 6) echoes the ROH findings and together develop a more complete picture of the early demographic history of managed goats. By inferring shared haplotypes between genomes, IBD provides extra information about shared relatedness between genomes: within a population or settlement, or between those population pairs. The length of these shared haplotypes allows the inference of the age since a shared ancestor, identify cryptic relatedness otherwise missed using summary statistics, construct pedigrees, track ancestral haplotypes, and reconstruct ancestral migration paths.

The absence of detectable relatedness in the Aceramic Neolithic settlement of Ganj Dareh, or between Ganj Dareh and the nearby (∼60 km) contemporaneous Tepe Abdul Hosein (Fig. 6b) is in itself notable, given the demographic and genetic evidence of herd control (Zeder and Hesse 2000; Daly et al. 2021). This may be attributed to an extended initial phase of domestication, during which managed stock were more freely interbreeding with the unmanaged, genetically distinct animals (Daly et al. 2021). The long inhabitation period of both settlements (Ganj Dareh: 200–600 years, Tepe Abdul Hosein, ∼700 years (Meiklejohn et al. 2017; Daly et al. 2021; Nishiaki 2022)) may also explain the lack of within- and between-settlements IBD detection, although we do find IBD between pairs at Tappeh Sang-e Chakhmaq separated by ∼1,000 years (Tables S3 and S18). The absence of detected Ganj Dareh-Tepe Abdul Hosein IBD could be taken as evidence for limited animal exchange between the two settlements, or that such exchange was small relative to the herd sizes, potentially inflated with recruitment of local wild goats.

In contrast, herds later in the Neolithic period, in regions into which herded goat were introduced such as Europe (Blagotin-Poljna) or Northeast Iran (Tappeh Sang-e Chakhmaq), show greater within-settlement IBD, and indications of regional between-settlement IBD (Monjukli Depe, Fig. 6). These may reflect “founder herd” effects, i.e. the establishment of local Neolithic herds from a narrow pool of animals, during the centuries-long expansion of agriculture in Eurasia. Such herds may have limited outbreeding opportunities, with nearby settlements' flocks also established from the same founders. Indeed, both sheep and goat breeds today show an inverse relationship between genetic connectivity and IBD (Blondeau Da Silva et al. 2024).

Conclusion

Our analyses confirm the potential of imputation methods to leverage emerging large-scale nonhuman genome datasets and maximize genotype recovery in palaeogenomes for common alleles (MAF ≥5%). Rare alleles (MAF <5%) showed notably reduced imputation accuracy, highlighting the importance of haplotype and ancestry diversity within the reference panel being at least partially representative of the target genomes. Future work could explore how ancient genomes can be included within imputation panels, possibly through joint imputation of ancient samples (while circumventing batch effects). Such efforts will be crucial in broadening the scope of which species ancient imputation can be applied. The observed trade-off between genotype number and genotype accuracy must also be grappled with depending on the intended question and analysis.

With stringent filtering and a minimum coverage of 0.5✕, we show that ROH and IBD can be recovered from genomes ∼10,000 years old, and ROHs in wild bezoar genomes preceding evidence of goat husbandry. Imputed genotypes here have facilitated an initial picture of the demographic processes of the first millennia of goat management. Notably, parallel founder events are inferred in goats later in the Neolithic process; demographic costs of domestication may be more ascribed to dispersal rather than initial herd management. Imputed genotypes will likely facilitate other haplotype-based analysis of ancient livestock, such as Local Ancestry Inference, to build an increasingly detailed picture of how human activity has impacted the genetic trajectory of partner species.

Methods

Modern Genotypes

The VarGoats project dataset is currently composed of 1,372 animals across 155 country-breed groups and 7 wild Capra species (Table S1), including bezoar. Genotypes were called across all samples as per (Denoyelle et al. 2021), resulting in a set of 77 280,295 SNPs. This set of SNPs was further filtered, following three steps: (i) 85 animals were removed if their whole genome sequencing parameters were harboring MeanDP <6, MeanGQ <19, and %Missing >10 or their 60k genotyping data identified them as outliers in PCA and Admixture analysis (i.e. mislabeled animals, data not shown). (ii) Only biallelic variants with GQ > 20 and subsequent CR > 0.9 were retained. (iii) The remaining 28,645,747 high quality SNPs were recovered for the entire dataset. Details about this procedure can be found in Lazzari et al. (2025). Phasing was performed using Beagle 5.3 (Browning et al. 2021). Details about the methodology and relevant codes are available at https://github.com/goatimpuation/GoatWGSimputation.

To account for individuals with high genomic relatedness, prior to imputation individuals with pairwise π^HAT^ >0.4 with more than two individuals were removed using plink v1.07 (Purcell et al. 2007). For any remaining pair with π^HAT^ >0.4, a random member of the pair was removed, leaving 1,155 individuals in the imputation reference panel (Table S1). VCF files were then processed following the GLIMPSE2 documentation (Rubinacci and Delaneau).

Ancient Genomes

To evaluate the potential of GLIMPSE in the imputation of ancient goats, we performed a downsampling experiment on four high coverage goat genomes (Daly et al. 2018). These represented distinct time periods and ancestries, offering different perspectives to the reference panel's representation of past diversity (Tables S1 and S2). Direkli1-2 (mean genome coverage 11.21✕), a wild C. aegagrus from the Epipalaeolithic faunal assemblage at Direkli Cave in the Taurus Mountains of southern Türkiye (Arbuckle and Erek 2012), dates to the late 12th millennium BCE, prior to evidence of goat herd management. Two genomes represent herds early in their dispersal beyond their initial domestication region: Semnan3 (14.44✕), a ∼6,100 BCE goat from the West mound of Tappeh Sang-e Chakhmaq (Mashkour et al. 2014; Roustaei et al. 2015) in Northeast Iran; and Blagotin3 (11.40✕), a goat dating to ∼6,000 BCE from Blagotin-Poljna in Serbia (Greenfield and Jongsma-Greenfield 2014). Finally, Acem2 (8.63✕) represents an admixed or heterogenous ancestry from the Central Turkish settlement of Acemhöyük, and dates to ∼1,900 cal BCE.

The four high coverage goat bams, aligned to the ARS1 reference genome (Bickhart et al. 2017) following steps described previously (Daly et al. 2021), were individually downsampled to a range of coverages using samtools view: 0.1✕, 0.15✕, 0.2✕, 0.25✕, 0.3✕, 0.4✕, 0.5✕, 0.75✕, 1✕, 2✕, and 4✕. All bam files of ancient goats were soft-clipped 4 bp at either end of reads using a python script prior to analyses, as per (Daly et al. 2021). Mpileup files were generated for the high coverage and downsampled bams for the 28,242,942 autosomal variants in the VarGoats call set using samtools mpileup (v1.13, parameters -B -q 30 -Q 30, (Li et al. 2009)) and bcftools call (v1.13, parameters “-mO -C alleles”, (Danecek et al. 2021)). The high coverage validation dataset (i.e. full coverage) was further filtered for genotype quality (GQ) of 30 and for sites with ≥6 and ≤40 covering reads using a custom python script, producing diploid genotypes against which imputed genotypes were compared.

The range of downsampled coverages were pseudohaploidized using ANGSD version 0.938 (Korneliussen et al. 2014) doHaploCall, with the following parameters: doHaploCall 1, doCounts 1, dumpCounts 1, minimum base quality of 30 (-minQ 30), minimum mapping quality of 25 (-minMapQ 25), retain only uniquely mapped reads (-UniqueOnly 1), remove reads flagged as bad (-remove_bads), remove triallelic sites (-rmTriallelic 1e−4), downscale mapping quality of reads with excessive mismatches (-C 50), discarding 5 bases of both ends of the read (-trim 5), and remove transitions (-rmTrans 1). The abovementioned sites in the VarGoats reference panel were used as input for ANGSD using the parameter –sites. As a sanity check, the major/minor alleles of the low coverage ancient were compared to the modern reference panel and were removed if there were any discrepancies. ANGSD haplo files were transformed to plink tped files with the haploToPlink function from ANGSD version 0.938 and recoded into ped files with PLINK v.1.90 (Chang et al. 2015). Transitions were removed because transitions, unlike transversions, are affected by postmortem deamination of DNA, which might increase individual error rates.

Previously published goat palaeogenomes with ≥0.5✕ coverage (Table S3) were aligned to the ARS1 reference genome (Bickhart et al. 2017) as described previously (Daly et al. 2021). All bam files of ancient goats were soft-clipped 4 bp at either end of reads using a python script prior to analyses, as per (Daly et al. 2021).

GLIMPSE Imputation

Prior to imputation with GLIMPSE2 v2.0.0 (Rubinacci et al. 2023), 4 Mb chunk files were generated for ARS1 using GLIMPSE2 chunk (−window-mb 4 –buffer-mb 0.2), utilizing a sex-averaged megabase-scale recombination map (Etourneau et al. 2025). The ARS1 reference fasta was split using GLIMPSE2_split_reference and the recombination map. Each downsample bam file was then separately imputed using GLIMPSE2 using the chunk files and recombination map described above. The resulting bcf files were ligated using GLIMPSE2_ligate and converted to vcf files using bcftools convert. Previously published goat palaeogenomes (Table S3) were also imputed separately.

The same steps and parameters were used for GLIMPSE1 (Rubinacci et al. 2021) following their guideline (https://odelaneau.github.io/GLIMPSE/glimpse1/), GLIMPSE1 was performed individually on the downsampled bcftools generated genotypes.

Sheep/Outgroup Genotypes

For outgroup f3 analyses, an outgroup population was created by aligning domestic sheep to the goat reference ARS1. Ten domestic sheep or wild Asiatic mouflon (Table S4) were aligned to ARS1 using bwa mem (Li 2013). Genotypes were called using the same pipeline as used for the validation of ancient genotypes. Genotypes were then filtered for GQ30 and for sites covered by 6-40 reads. VCF files containing the VarGoats reference panel and sheep genotypes were then merged with bcftools.

Validation of Genotype Imputation

To evaluate the effect of postimputation quality filtering, a range of genotype probability (GP) filters (0.7, 0.8, 0.9, 0.95, and 0.99) were applied to the downsampled-imputed genotypes using a custom python script, with genotypes failing the filtering set to missing (“./.”). For each of the four higher coverage individuals, the number of matching sites and concordance rates between validation genotypes and the GP-filtered imputed genotypes of varying coverages and GP filters were calculated using a custom R script; only heterozygous and homozygous-alternative sites (non-reference) in the validation dataset. To assess the effect of minor allele frequency (MAF, here representing the frequency in the Vargoats 1372 individual panel) on imputation accuracy, concordance was calculated at different minimum MAF thresholds: 0%, 1%, 5%, and 10% and in MAF tranches: 0% to 1%, 1% to 2%, 2% to 5%, 5% to 10%, 10% to 20%, and 20% to 50%.

Non-reference concordance was calculated by dividing the total number of correctly imputed heterozygous and homozygous alternative genotypes by the total number of heterozygous and homozygous alternative genotypes in the validation dataset. Heterozygous false-positive rate (FPR) was calculated by dividing the number of imputed heterozygous called as homozygous in the validation dataset (FP) by the sum of FP and the number of correctly imputed homozygous sites (TN). The false-negative rate (FNR) was calculated by dividing the number of imputed homozygous called as heterozygous in the validation dataset (FP) by the total sum of called heterozygotes in the validation dataset.

Local Concordance and FPR

Non-reference concordance and FPR throughout the genome were calculated in sliding windows of 500 kb with a step size of 100 kb; sliding windows were created with BEDTools (Quinlan and Hall 2010) makewindows. Sliding windows consisting of correctly imputed and total number of genotypes were created with BEDTools map, using the sliding windows (500 kb with a step of 100 kb) created earlier. This approach was repeated for the FPR. The outlier non-reference concordance and FPR regions were extracted by filtering for the bottom and upper 1 percentile, respectively. These outlier regions were then merged with BEDTools merge. Genes in outlier regions were determined using RefSeq annotation GCF_001704415.1_ARS1, downloaded from UCSC genome browser (last accessed 29-05-2018), using BEDTools closest. To determine regions that were outliers in the majority of the samples, BEDTools merge was used on the individual bed files, regions that overlapped were counted, with the parameter -o sum, and these regions were subsequently filtered to only be present in a majority of the samples (at least three). A gene ontology (GO) enrichment analysis was performed to identify overrepresented biological processes and molecular functions of the genes in each individual outlier region. GO enrichment analysis was done using both g:Profiler (version: e112_eg59_p19_25aa4782 –https://biit.cs.ut.ee/gprofiler/gost) and STRING-DB (Version 12.0—https://string-db.org/) both accessed on 13-03-2025.

Principal component analysis

The downsampled pseudohaploid ancient samples were merged with the data set containing the imputed and high coverage genotypes and VarGoats reference panel, filtered for MAF 5% with a missingness filter of 1.5%; this was done for imputed samples filtered for GP99, GP95, and GP80, separately. Smartpca version 16000 was used with default parameters (Patterson et al. 2006; Price et al. 2006). The first 10 principal components were calculated using the modern reference panel; the pseudohaploid, imputed, and high quality genotypes were projected (lsqproject:yes, autoshrink:yes).

Outgroup f3

To estimate the degree of shared genetic drift between modern goat breeds and ancient genomes, outgroup f_3_ statistics (Patterson et al. 2012) were calculated using ADMIXTOOLS version 7.0.2 between imputed genotypes of each ancient genome with ≥0.5✕ mean coverage, and each domestic breed grouping in the VarGoats dataset. We used more granular groupings (i.e. “Distinct_split-off_breed”) where available, and breed groupings otherwise (Table S1). Domestic sheep were used as the outgroup (Table S4).

Runs-of-homozygosity

We performed ROH exploration on our downsampled 0.5✕ imputed genomes using four datasets filtered for MAF ≥5% and GP99. ROH were computed without missing data across test samples using either transitions and transversions (“all sites”) or with transversions-only. Additionally, individual-based ROH calculations (i.e. one ancient sample at a time, using sites passing filtering steps for that sample) were conducted using either all sites or transversions-only.

For each dataset, we further downsampled to 1 M, 750, and 500 K sites to determine a minimal number of SNPs required for accurate ROH estimation. The nonimputed high coverage samples were used as validation ROH profiles and were computed on an individual basis using either all sites or transversions-only, without additional downsampling. ROHs were estimated with PLINK v1.90 (Chang et al. 2015) with the parameters –homozyg –homozyg-density 50 –homozyg-gap 100 –homozyg-kb 500 –homozyg-snp 50 –homozyg-window-het 1 –homozyg-window-snp 50 –homolog-window-threshold 0.05, according to earlier studies (Gamba et al. 2014; Cassidy et al. 2016; Martiniano et al. 2017; Sousa da Mota et al. 2023). The minimum SNP threshold (−homozyg-snp; default of 50) was changed to 150, 200, 250, and 300, to test the effect on small ROHs within our dataset. We further explored ROH profiles for 0.25✕, 0.5✕, 0.75✕, and 1✕ imputed genomes on a per-sample basis, using the parameters described above, with –homozyg-snp set to 200.

Additionally, ROHs were computed using imputed genotypes from published goat palaeogenomes ≥0.5✕ coverage, applying a MAF ≥5% and GP99 filter. Two approaches were tested: one combined dataset with no missing data and another using transversions-only on a per-individual basis, with each individual downsampled to 1.2 and 1.6 million sites (matching the lowest number of sites for the 0.4✕ and 0.5✕ imputed genomes, Table S3) to ensure consistent SNP density.

Bcftools ROH was used as a complementary method for ROH detection and run with standard parameters aside from -G 30 –AF-dflt 0.4. The output files were filtered to retain regions containing at least 200 SNPs, a quality score >10, and a minimum length of 500 kb. Long ROHs identified by bcftools (>4 Mb) were merged with the plink ROH profiles using mergeBed, counting the number and keeping the sizes of merged ROHs with the parameters -c and -o count, collapse. Finally, ROHan (Renaud et al. 2019) was also used as an alternative method for ROH detection for medium to high coverage ancient genomes. The pipeline for ancient samples was followed, by first detecting possible damage, using the estimateDamage.pl script provided with ROHan. ROHan was run on the autosomes with a heterozygosity cut-off similar to the heterozygosity cut-off in Daly et al. (2021). The parameters used for ROHan were –rohmu 5.7e−5, and the deamination profiles from the estimateDamage.pl script.

Heterozygosity was calculated using 50-SNP windows by segmenting the VCF file into consecutive windows of 50 SNPs and counting the number of heterozygous sites within each window. This analysis was performed using a custom AWK script.

Identity-by-descent

To identify IBD segments between individuals, we used RefinedIBD (Browning and Browning 2013), which assumes no genotype errors and relies on phased haplotypes. Since phasing accuracy is sensitive to missing data, imputed genomes were filtered for GP >0.99 and individual genomes with >20% missingness were removed. The remaining imputed samples were imputed again and phased together as a single “batch” (i.e. not individually) using Beagle5 with standard parameters aside from setting impute = true and specifying a random seed. The second round of imputation with Beagle5 was performed to impute and phase individuals jointly, to minimize missingness while optimizing SNP count within the dataset. Genotypes below GP99 were set to missing, and SNPs were further filtered to keep only biallelic SNPs without missing data. We further filtered variants which were MAF 5%, transversions and had no missing data in imputed samples >2✕, resulting in 1,037,536 SNPs. This dataset included imputed 0.5✕ test genomes and previously published ancient palaeogenomes (>0.5✕).

RefinedIBD was run with default settings. To correct for breaks or short gaps in IBD segments, segments were merged with merge-ibd-segments.17Jan20.102.jar with a sex-averaged megabase-scale recombination map (Etourneau et al. 2025), a maximum of one discordant homozygote, and gaps that are <0.6 cM in length. The phasing, refinedIBD and merge-ibd were repeated a total of 3 times to account for potential variance in phasing. The first 2 Mb of chromosome 18 was excluded due to a very large estimated recombination rate region on the start of the chromosome, most likely indicative of an assembly error (Etourneau et al. 2025). The repeated runs were combined and filtered for a minimum LOD score of 3 and a length threshold of 3 cM. To obtain IBD segments for a pair of individuals, IBD segments for pairs of individuals were extracted from the IBD files and merged with bedtools MergeBed. For IBD detection of imputed 0.5✕ test genomes and their high coverage counterparts both IBD1 and IBD2 segments were considered (indicating that both haplotypes are in IBD), merging IBD segments within 4cM. For previously published ancient palaeogenomes, only IBD1 segments were taken into account without merging IBD segments. Close relatives up to the 8th degree were identified using a threshold of at least 3 shared IBD segments with a cumulative length of at least 21 cM. For within- and between-population comparisons, we excluded the Abdul4 genome from the Tepe Abdul Hosein assemblage, due to its wild genomic affinity (Daly et al. 2021), and Semnan1-2 from the Tappeh Sang-e Chakhmaq assemblage (deriving from the West Mound, (Roustaei et al. 2015)), due to the ∼1,000 yr gap between the remaining, younger genomes (from the East Mount).

Supplementary Material

evaf181_Supplementary_Data

Bibliography90

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Abell JT, et al Urine salts elucidate Early Neolithic animal management at Aşıklı Höyük, Turkey. Sci Adv. 2019:5:eaaw 0038. 10.1126/sciadv.aaw 0038.31001590 PMC 6469938 · doi ↗ · pubmed ↗
2Allaby RG, Ware RL, Kistler L. A re-evaluation of the domestication bottleneck from archaeogenomic evidence. Evol Appl. 2019:12:29–37. 10.1111/eva.12680.30622633 PMC 6304682 · doi ↗ · pubmed ↗
3Allentoft ME, et al Population genomics of post-glacial western Eurasia. Nature. 2024:625:301–311. 10.1038/s 41586-023-06865-0.38200295 PMC 10781627 · doi ↗ · pubmed ↗
4Arbuckle B, et al Data sharing reveals complexity in the westward spread of domestic animals across Neolithic Turkey. P Lo S One. 2014:9:e 99845. 10.1371/journal.pone.0099845.24927173 PMC 4057358 · doi ↗ · pubmed ↗
5Arbuckle BS, Atici L. Initial diversity in sheep and goat management in Neolithic south-Western Asia. Levant. 2013:45:219–235. 10.1179/0075891413 Z.00000000026. · doi ↗
6Arbuckle BS, Erek CM. Late Epipaleolithic hunters of the central Taurus: faunal remains from Direkli Cave, Kahramanmaraş, Turkey. Int J Osteoarchaeol. 2012:22:694–707. 10.1002/oa.1230. · doi ↗
7Ariano B, et al Ancient Maltese genomes and the genetic geography of neolithic Europe. Curr Biol. 2022:32:2668–2680.e 6. 10.1016/j.cub.2022.04.069.35588742 PMC 9245899 · doi ↗ · pubmed ↗
8Ausmees K, Nettelblad C. Achieving improved accuracy for imputation of ancient DNA. Bioinformatics. 2023:39:btac 738. 10.1093/bioinformatics/btac 738.36377787 PMC 9805568 · doi ↗ · pubmed ↗