The evolutionary history of the field vole species complex revealed by modern and ancient genomes
Mateusz Baca, Barbara Bujalska, Danijela Popović, Michał Golubiński, Paulo C. Alves, Edouard Bard, Claudio Berto, Gloria Cuenca-Bescós, Love Dalén, Helen Fewlass, Tatyana Fadeeva, Jeremy Herman, Ivan Horáček, Magdalena Krajcarz, Matthew Law, Anna Lemanik

TL;DR
This study uses ancient and modern genomes to uncover the evolutionary history of field vole species, revealing divergence times and genetic exchanges over the last 75,000 years.
Contribution
The study provides new divergence time estimates and evidence of gene flow among field vole lineages using ancient genomic data.
Findings
Portuguese field voles diverged from others around 220,000 years ago.
Hybrid origin of the Mediterranean field vole lineage is suggested.
Ancient specimens reveal a previously unknown lineage in the Italian Peninsula.
Abstract
The field vole, an abundant and widespread microtine rodent, is a complex comprised of three cryptic species: the short-tailed field vole (Microtus agrestis) which is present over much of Eurasia, the Mediterranean field vole (Microtus lavernedii) in southern Europe, and the Portuguese field vole (Microtus rozianus) in western Spain and Portugal. Previous research has shown high genomic differentiation of these three lineages. However, the details of the process underlying their divergence remain unknown. We analyse 70 mitogenomes and 16 nuclear genomes of modern specimens, and 83 mitogenomes and 12 nuclear genomes of ancient specimens spanning the last 75 thousand years (ka). We estimate the divergence of Portuguese from short-tailed and Mediterranean field voles to be ca. 220 ka ago and of the latter two species to be ca. 110 ka ago, earlier than previous estimates involving only…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6- —https://doi.org/10.13039/501100004281Narodowe Centrum Nauki
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAnimal Ecology and Behavior Studies · Genetic diversity and population structure · Evolution and Paleontology Studies
Background
Voles are muroid rodents belonging to the subfamily Arvicolinae (within the family Cricetidae) that arose in the Late Miocene of Asia and North America. The appearance of the Arvicolinae in the Northern Hemisphere and their subsequent rapid diversification was associated with the gradual cooling of the climate and the spread of open and steppe environments starting from the Miocene/Pliocene boundary, and especially in the Quaternary [1, 2]. Phylogenetic relationships of Arvicolinae have been explored using both morphological and genetic methods [3–6]. According to the most up-to-date taxonomy, voles and their close relatives constitute a distinct tribe Arvicolini with 16 genera [7]. Among them the genus Microtus is the most speciose and rapidly evolving mammalian lineage comprising over 40 extant species and several subgenera [7].
Field voles (subgenus Euarvicola within the Microtus genus) are among the most abundant small mammals in many parts of Europe and their current populations form a complex of three allopatric cryptic species. The short-tailed field vole (Microtus agrestis) is present over much of Eurasia, the Mediterranean field vole (Microtus lavernedii) is found in southern Europe, and the Portuguese field vole (Microtus rozianus) is limited to western Spain and Portugal (Fig. 1A) [8]. Subfossil remains attributed to the field voles first appeared in Central Europe approximately 0.7–0.6 million years ago during the Marine Isotope Stage (MIS) 17–16 [9] and reached northern Spain and Ukraine by MIS 11 [10]. Their remains have been found across Europe from Britain to Poland, including at high latitudes, through the Last Glaciation (115–11.7 ka ago) and the Holocene [11–13]. Unlike many mammals which are commonly associated with the Late Pleistocene of mainland Europe, but became extinct there prior to the Holocene, such as cave bears (Ursus ingressus) [14], mammoths (Mammuthus primigenius) [15], collared lemmings (Dicrostonyx torquatus) [16] and narrow-headed voles (Stenocranius anglicus) [17], field voles survived in Europe throughout the dynamic climate changes of the Late Pleistocene and Holocene [18], making them particularly informative to investigate the impact of past climate changes on species demographic history. However, field vole subfossil remains are sometimes difficult to distinguish from another small mammal that survived this period, the common vole (M. arvalis) [19–21], therefore the actual extent of their occurrence in the Late Pleistocene deposits remains uncertain.Fig. 1. Sampling information and mitogenomic phylogeny of the field vole species complex. A Present-day ranges of field vole species and sampling localities. Circles mark modern samples and squares mark ancient samples. B Hypsometric map with sampling localities coloured according to mtDNA lineage assignment (for the Portuguese, Italian and Mediterranean lineages). All other samples are short-tailed field voles and we have coloured their sampling localities according to region of occurrence (Western, Central, Eastern and Scandinavia). Modern localities are denoted by filled circles, whereas other symbols, filled triangles, squares and pentagons, represent ancient localities. The shape and colour used for a given locality matches symbol used at a tip in panel C. Names of paleontological sites are indicated for reference. C Bayesian phylogeny of field vole mitogenomes calibrated using radiocarbon-dated specimens. Filled circles at nodes indicate a posterior probability ≥ 0.95. Grey bars denote 95% highest posterior density intervals of node ages. The tip symbols match the sampling locality coding in B. Labels include the sample ID and age, with the prefix ‘e’ indicating the age was estimated using a molecular dating approach. Directly radiocarbon-dated samples are indicated in red. Relation to the mtDNA lineages distinguished previously within short-tailed field voles based on cyt-b (Eastern, Western, Central, North Britain, France, Scandinavia) is indicated. Blue stripes mark the Last Glacial Maximum (LGM) and Younger Dryas (YD) periods
The three modern field vole lineages exhibit high genomic differentiation, with little evidence of gene flow based on analysis of whole-genome restriction site-associated DNA [22]. Limited and asymmetric gene flow has been detected only in the relatively narrow contact zones of short-tailed and Mediterranean field voles [23]. Despite this, the divergence of field vole lineages was previously estimated to be relatively recent; the divergence of Portuguese (M. rozianus) from short-tailed (M. agrestis) and Mediterranean (M. lavernedii) field voles was estimated to be ca. 70 ka ago, and that of the latter pair was ca. 25 ka ago [8]. It has been suggested that survival in different refugial areas during cold episodes, especially the Last Glacial Maximum (LGM; ca. 25–19 ka ago), was among the main factors causing genetic differentiation of these populations [22].
Based on variation in mitochondrial DNA (mtDNA) cytochrome b (cyt-b) sequences, modern short-tailed field voles (M. agrestis) comprise six parapatric lineages [24]. Calibration of the cyt-b phylogeny with biogeographic constraints suggested that the lineages coalesced ca. 22 ka ago and revealed a strong signal of population expansion starting at the onset of the Holocene [24, 25], suggesting a severe bottleneck in short-tailed field vole populations in northern Eurasia during the Younger Dryas (YD; 12.5–11.7 ka ago). However, the nature of this shallow cyt-b phylogeny limits our ability to infer the precise evolutionary history of the species prior to the Holocene, potentially compromising the between-species divergence estimates of Paupério et al. [8] that made use of within-species calibrations.
This limitation, as well as the problem with morphological identification of subfossil remains, may be overcome by studying DNA from paleontological specimens, which allows for direct observation of genetic changes in past populations. Recent studies combining the use of ancient DNA and direct radiocarbon dating have improved upon previous reconstructions of the Late Pleistocene and Holocene histories of several small and large mammals [26–29].
Here, we analysed 153 mitochondrial and 28 nuclear genomes of field voles spanning the last ca. 75 thousand years and included each of the three main field vole evolutionary lineages. We obtained new divergence estimates and reconstructions of past demography and gene flow using directly dated specimens to constrain the timing of inferred events and interpreted these estimates in the context of the Late Pleistocene palaeoclimatic records.
Results
Mitochondrial and nuclear genome sequencing
We generated mtDNA sequences comprising at least 70% of the mitogenome from 83 ancient and 68 modern field vole specimens. These specimens were obtained from 32 palaeontological sites and 48 localities, predominantly across Europe (Additional file 1: Tables S1–S2, Fig. 1A, B). We generated low- to high-coverage nuclear genomes (range: 0.73–36.7 ×; median 6.77 ×) for 14 modern and historical field vole specimens and low-coverage genomes (range: 0.3–2.57 ×; median 0.78 ×) for 12 ancient specimens (Additional file 1: Tables S3–S4, Additional file 2: Fig. S1). We also included genomic data from two field voles downloaded from the European Nucleotide Archive (ENA) database and generated high-coverage genomes for the three outgroup species, namely European pine vole (Microtus (Terricola) subterraneus, 45.3 ×), social vole (Microtus socialis, 42.6 ×), and Tatra vole (Microtus (Terricola) tatricus, 10.7 ×). Consequently, our dataset comprised at least two high-coverage genomes from each field-vole cryptic species and genomic data from samples spanning the last 75 thousand years.
Mitochondrial phylogeny
Ten samples selected for radiocarbon dating yielded high-quality collagen, which was subsequently dated using an accelerator mass spectrophotometer in the form of either graphite or CO_2_, yielding dates between 27 and 3.6 ka cal BP generally concordant with their stratigraphic position (Additional file 1: Table S5). The mtDNA sequences of these specimens, along with sequences of 67 modern specimens, served as a directly dated dataset used to estimate the age of undated specimens and the divergence times of mtDNA lineages. BETS analysis showed that this dataset contained sufficient temporal signal to calibrate the molecular clock (Additional file 2: Text S2.7; Additional file 1: Table S6). Leave-one-out analysis of the radiocarbon-dated specimens showed that the dataset enabled a relatively accurate estimation of the specimen ages. The difference in the median molecular age estimates and radiocarbon ages ranged between 2.6 and 89% (median 13.5%) of the radiocarbon ages (Additional file 2: Fig. S2). The largest error in the estimated age compared to the radiocarbon age was for the youngest specimen (MI057). For one sample, MI722, from the Portuguese lineage, the 95% HPD interval of the estimated age did not overlap with the radiocarbon age. The estimated ages of the samples that were not directly dated ranged between 80 and 0.8 ka and were broadly congruent with the stratigraphic position of the specimens (Additional file 1: Table S1, Additional file 2: Text S2.8).
The mtDNA phylogeny (Fig. 1C) revealed four main lineages; three were identified previously [30] and corresponded to the cryptic species, short-tailed field vole (ST), Mediterranean field vole (MED), and Portuguese field vole (POR) [8]. The fourth lineage, Italian (ITA), sister to the Mediterranean field vole, was found only in Late Pleistocene samples from the Italian Peninsula. The estimated divergence of these lineages ranged from 216 ka BP1 (95% HPD: 243–200 ka BP) for the split of ST and POR + MED + ITA to 110 ka BP (95% HPD: 118–101 ka BP) and 97 ka BP (95% HPD: 104–89 ka BP) for the splits of POR and MED + ITA and MED, and ITA, respectively (Fig. 1). The youngest sample from the Italian lineage was estimated to be 21 ka and overlapped with another sub-lineage found on the Italian Peninsula which diverged from the remainder of the Mediterranean lineage approximately 37 ka BP (95% HPD: 43–32 ka BP).
Within the M. agrestis mtDNA clade, we identified six lineages that corresponded to those previously identified in modern populations [24, 25]. These lineages all coalesced ca. 27 ka BP (95% HPD: 29–24 ka BP) and their division largely reflects their geographical locations. Among the ancient samples from Central Europe, we found representatives of all six of these mtDNA lineages already in the Late Glacial period. At least two lineages were present simultaneously at three sites (Shelter in Smoleń III, Nad Tunelem, and Býčí skala).
Bayesian demographic modelling based on mitogenomes of Central European ST samples revealed a rapid, 10-fold increase in the effective female population size (Nef) starting at the end of the LGM around 19 ka ago; then, the Nef remained largely constant until present. A slight increase in Nef was observed around the Middle Holocene (8.2–4.2 ka ago) (Fig. 2).Fig. 2. Demographic reconstruction of the Central European field vole population. The Bayesian Skyline plot was reconstructed using BEAST 1.10.5. The green line shows the median estimate of Nef and the light green area shows the 95% HPD interval*.* Blue stripes mark the Last Glacial Maximum (LGM) and Younger Dryas (YD) periods
Population structure, gene flow and demography based on nuclear data
The multidimensional scaling plot (MDS) revealed three main clusters corresponding to the three mtDNA lineages: M. agrestis, M. lavernedii, and M. rozianus (Fig. 3A). The most ancient samples within each lineage were shifted towards the centre of the plot. A single sample from the Italian mtDNA lineage (MI605) was located close to the M. lavernedii cluster, particularly to the Late Pleistocene specimen from the Italian site Grotta della Ferrovia (MI2950). Within the short-tailed field voles (ST), there was a clear division into two clusters, eastern and western, with the border running roughly across Central Europe. Both clusters included ancient samples younger than 20 ka, whereas the two pre-LGM samples (MI1081 and MI3076) were located outside these clusters, providing initial information on the age of the formation of these clusters. One sample, WM563, sampled in Tyrol, Austria, revealed a cytonuclear discordance, as it clustered with ST based on nuclear data but carried MED mtDNA. The Maximum Likelihood tree reconstructed using TreeMix without migration edges supports the structure revealed by the MDS (Fig. 3B); however, its topology was different from that recovered on mtDNA phylogeny, with the Portuguese field vole lineage being sister to short-tailed and Mediterranean field voles. Similar incongruence between trees based on mtDNA cyt-b and a set of nuclear genes has been reported previously [8]. The ADMIXTURE analysis, with the lowest cross-validation error for K = 2 (Additional file 2: Fig. S3), suggested the existence of two ancestral components corresponding to M. rozianus (POR) and M. agrestis (ST), whereas individuals belonging to M. lavernedii (MED) and Italian mtDNA lineages were an equal mixture of these components. However, at higher K values, particularly K = 4, the same overall structure was reproduced (Additional file 2: Fig. S4).Fig. 3. Structure and gene flow in field vole populations. A Multidimensional scaling plot of field voles based on IBS matrix calculated using ANGSD. B Maximum Likelihood tree obtained using TreeMix with one migration edge. C Branch-specific fb statistic heatmap. Fbranch statistics capture the excess of derived allele sharing between pairs of individuals and lineages. Darker colours in the heatmap depicts increasing evidence for gene flow. Grey cells in the heatmap correspond to combinations of branches and populations for which Fbranch cannot be calculated given the topology or the low number of informative sites. D Top admixture graph of the main field vole lineages obtained using AdmixtureBayes. On all panels the sample colours are as in Fig. 1
TreeMix analysis with one migration edge, selected as optimal using the Evanno approach (Additional file 2: Fig. S5), showed a strong signal of admixture from the base of the tree to the Italian lineage (MI605; Fig. 3B). This was further confirmed by reconstructed admixture graphs where the Italian lineage was modelled as a mixture with ca. 70% of ancestry coming from the MED lineage and up to 30% of ancestry from the base of the tree (Fig. 3D).
The fb analysis suggested several instances of gene flow between field vole lineages, some of which were also supported by TreeMix (Fig. 3B and Additional file 2: Fig. S6). In particular, it suggested gene flow from the base of the POR to the lineage leading to MED and ST, and a strong signal from the base of the MED lineage to the base of the ST lineage (Fig. 3C). Both signals involved all ST samples, setting the minimum age of this event to ca. 75 ka BP. Another instance was the gene flow between samples from the Italian Peninsula, both of which diverged from the MED lineage at different times, ca. 90 ka BP (MI605) and 35 ka BP (MI2950; Fig. 1).
There was also a signal of gene flow from MED to POR, involving only modern MED samples from Iberia. Since this signal does not include other modern representatives of the MED lineage (WM436 from Hungary), the event must have taken place after the post-LGM expansion of MED across the Mediterranean region. This is also supported by the coalescence of all modern MED samples around 30 ka ago (Fig. 1), which suggests a bottleneck around the LGM followed by a subsequent expansion. All TreeMix analyses with m > 1 also suggested recent gene flow between MED and ST, particularly an individual WM563 from Tyrol, which revealed cytonuclear discordance congruent with recent gene flow. This instance was not revealed by fb analysis, but most D-statistics in the form D(STeast,WM563,MED,OUT) were positive and significant (Z > 3), suggesting an excess of allele sharing between MI563 and MED. More generally, most D-statistics in the form D(STeast,STwest,MED,OUT) were positive and significant, suggesting a gene flow between MED and STwest, which occurred after the split of ST into western and eastern populations (Additional file 1: Table S7). Both fb and Treemix also suggested recent gene flow between STwest and STeast populations.
Demographic reconstruction based on the pairwise sequentially Markovian coalescent (PSMC) model suggested different demographic trajectories for the POR, ST, and MED lineages (Fig. 4A). The peak in the effective population size (Ne) of ST and MED between 200 and 100 ka ago was accompanied by a decrease in Ne in POR, and conversely, an increase in Ne in POR around MIS 4 was accompanied by a decrease in ST and MED. The final decrease of* N_e_ in the POR started in MIS 3 at ca. 60 ka, continued until ca. 30 ka, and N_e*_ remained stable after then. The latter decrease was accompanied by a range reduction, as evidenced by the presence of samples with POR mtDNA well outside its current range in the Artazu VII and El Mirón sites in Central Northern Spain dated to ca. 50 and 15 ka, respectively. The heterozygosity of POR individuals was also consistently lower than that of the MED and most of ST individuals.Fig. 4. Demography of field voles. A Demographic reconstruction of field voles using the pairwise sequentially Markovian coalescent (PSMC) model. The PSMC curve was scaled using a mutation rate of 5 × 10^−9^ substitutions per site per generation, and assuming two generations per year. The y-axis shows the effective population size and the x-axis shows the time in years on the log scale. Lighter coloured lines represent 100 bootstrap replicates. Blue stripes mark the Last Glacial Maximum (LGM) and Younger Dryas (YD) periods, boundaries of MIS periods are presented on the upper x-axis. B Individual heterozygosity calculated using ANGSD in 20 Mb windows for individuals with a mean coverage higher than 4 ×. The y-axis represents the fraction of heterozygous positions. The box extends from the 1^st^ to 3^rd^ quartile, while the whiskers extend to the data points within 1.5 times the interquartile range (IQR) from the ends of the box; values outside this range are denoted by circles
The PSMC suggests that both M. agrestis and M. lavernedii experienced a bottleneck at the end of the Late Pleistocene. This is supported by the mtDNA phylogeny, where both populations coalesce around the same time during the LGM, and by the genomic divergence of the western and eastern populations of ST field voles estimated to the time of the LGM. In addition, in both ST samples a large increase in Ne was inferred during the Holocene. However, PSMC has limited resolution close to the present because few coalescent events occur in this interval, so such recent peaks are generally regarded as potential artifacts [31].
Divergence dating
We calculated the divergence times between pairs of four samples with high-coverage genomes using the TT-method [32]. The estimates based on the two alternative mutation rates were visibly different (Additional file 2: Fig. S7, Additional file 1: Table S8). Those based on the mutation rate from mice yielded divergence similar to mitochondrial estimates (Fig. 3A). The divergence of POR (WM581) from MED (WM330) and the two ST samples (FV396 and WM009) was estimated to be ca. 246, 268, and 244 ka ago, respectively (means from two estimates for each pair). Similar divergence times were observed in the PSMC plots, where the effective population size curve of POR diverged from ST and MED between 300 and 200 ka ago (Fig. 4A). The divergence of MED and ST was estimated to be 125 ka ago and 114 ka ago, respectively, and that between the two lineages identified within ST was 25.2 ka BP. The use of the mutation rate estimated for the common vole resulted in estimates 40% more recent (Additional file 2: Figs. S7–S8, Additional file 1: Table S8).
Discussion
In this study, we inferred a model of the evolutionary history of the field vole species complex based on ancient and modern mitochondrial and nuclear genomes (Fig. 5). We found three main genomic clusters corresponding to the three cryptic field vole species, corroborating previous results [8, 22].Fig. 5. Reconstructed evolutionary history of the field vole species complex. The width of each branch represents the inferred effective population size (Ne) through time, with broader segments indicating larger Ne. Arrows denote identified instances of gene flow. On the left axis are dates of particular events based on the mtDNA phylogeny (median estimates with 95% HPDs), and on the right axis are corresponding estimates based on nuclear genome data (range of extreme values). In the box on the left the approximate boundaries of MIS periods are presented
Divergence times inferred from nuclear genomes varied with the mutation rate applied, but in all cases indicated that the three lineages formed earlier than previously estimated [8]. Models using the house mouse mutation rate fit the data better than those using mutation rates inferred for the phylogenetically closer common vole [8]. The divergence of Mediterranean (MED) and short-tailed (ST) field voles was estimated to be 68–60 ka ago using a common vole mutation rate, post-dating the sample MI3076, estimated to be 75 ka old, and assigned to ST based on both mtDNA and nuclear genome data. Similarly, the divergence of the two main genomic clusters identified within short-tailed field voles, STwest and STeast, estimated using this rate (14–11.8 ka), is more recent than the ages of some ancient samples that have already differentiated into these two groups. The mitochondrial phylogeny provides independent estimates that match the nuclear genome estimates, based on mouse mutation rates (Fig. 5). The split of POR and ST was estimated to be between 243 and 200 ka ago, and the Time to the Most Recent Common Ancestor (TMRCA) of all post-glacial and modern ST samples estimated to be ca. 27 ka corresponds well with the nuclear genome-based divergence estimate of the western and eastern populations of ST (ca. 25.2 ka).
The estimated divergence times of the main field vole lineages coincide with the MIS 7 (243–191 ka ago) and MIS 5 (130–71 ka ago) and are similar to the estimates obtained recently based on tip-calibrated mitogenomic datasets of a range of rodents, including collared lemmings, narrow-headed voles, and common voles (Fig. 6) [17, 27]. The latter species inhabit open landscapes, such as tundra, steppes or dry meadows and it was hypothesized that the expansion of forests throughout Eurasia during the MIS 7 and MIS 5 interglacials was the main cause of their population fragmentation and divergence [17]. The field vole is an opportunistic graminivore which prefers open and semiopen habitats with high surface moisture such as moist grasslands and meadows, rich in herbaceous vegetation, including early succession stages of forest patches. Yet, woodlands are considered marginal secondary habitats [33], and field vole density has been shown to decrease with forest succession [34]. Thus, reforestation and loss of surface humidity during interstadial periods may also affect the population structure of this species.Fig. 6. Divergence times of the main mtDNA lineages in various Arvicolids. Each point represents the median estimate of the mtDNA divergence, and whiskers denote the 95% HPD intervals. The estimates are derived from mitogenomic phylogenies calibrated using directly radiocarbon-dated specimens [17, 27]. These estimates are shown alongside the benthic foraminiferal oxygen isotopes ratio (δ^18^O) record (LR04 stack [35]) a proxy of the past global climate. Interstadial periods (MIS 7, 5, and 1) are highlighted, with the warmest intervals shown in gray and moderately warm intervals in light gray
We detected recurring gene flow events between and within major field vole lineages. The incongruent position of MED in mtDNA and nuclear phylogenies suggests that this lineage may have arisen as a result of the hybridisation of POR and ST lineages. This is indicated by similar divergence estimates of MED from POR (mtDNA) and MED from ST (nuclear genome) and by ADMIXTURE analysis, where for the best supported K = 2, MED is considered a mixture of POR and ST. Moreover, on each of the three top admixture graphs, the MED lineage was modelled as a mixture of POR and ST, although the two top graphs seem to record only the recent gene flow between MED and ST which occurred after separation of the two ST populations (FV396 = STwest and WM009 = STeast; Fig. 3C), while on the third one, MED was modelled with two gene flow events from ST, pre- and post-dating the split into STwest and STeast (Fig. 3D and Additional file 1: Fig. S9). Similar evolutionary histories have been recorded in other small mammals, for example the common vole [36], and in large mammals, such as giraffes [37], Grant’s gazelles [38] or hyenas [39], suggesting that reticulate evolution is widespread across mammals.
The pattern of divergence with gene flow and hybridisation may be explained by recurring episodes of allopatry and connectivity of populations in a framework referred to as mixing-isolation-mixing [40]. In the case of field voles, this may be related to the phases of landscape opening and reforestation connected with stadial–interstadial cycles. The history of the field vole population on the Italian Peninsula may represent a good example of this phenomenon.
At present, the field vole distribution in the Italian Peninsula is limited to the south-eastern Alps [41], but their presence during the Pleistocene has been recorded at numerous palaeontological sites, ranging at least from the Middle Pleistocene (MIS 7) to the Pleistocene/Holocene transition [42, 43]. Our data suggest that starting approximately 95 ka ago, a population separate from continental Europe existed on the peninsula. This population was modelled as an admixture of MED and an unknown/unsampled population (Fig. 3B, D). This unsampled population may have been a population which inhabited the peninsula earlier and was also isolated from the rest of Europe. After this episode, the Italian population remained isolated until at least ca. 35–30 ka, when another population derived from the MED entered the peninsula. This arriving population again admixed with the local population, as evidenced by the high fb values between MI605 (‘local’) and MI2950 (‘colonising population’; Fig. 3C). Taken together, the Italian population of field voles remained relatively isolated on the peninsula from at least MIS 5 and probably from MIS 6, with subsequent episodes of gene flow from continental populations until extirpation of this species from the Italian Peninsula. The older episode dated based on the divergence of mtDNA to ca. 95 ka BP (95% HPD: 104–89 ka BP), may be related to the short-term demise of closed broad-leaved deciduous forests and expansion of steppe vegetation in Italy during the MIS 5b stadial (ca. 86.6–84.2 ka BP), as respectively recorded in palynological records from Fimon Lake (pollen zone 14) [44] in the north-east of Italy and Monticchio Lake in the southern part of the peninsula (pollen zone 18) [45]. Although starting from ca. 75 ka BP, the landscape became more open throughout Southern Europe, the second episode, ca. 37 ka BP, may be related to the stadial with demise of temperate forests and emergence of grasslands recorded in both the Fimon Lake sequence (pollen zone 3; 39–31 ka ago) [46] and the Monticchio lake sequences (pollen zone 6; 34.9–31.8 ka ago) [45]. Field voles might have accessed the Italian Peninsula from the Northern Balkans through the Sava River basin and the Trieste Karst, and then through the Great Adriatic Po Plain, a vast alluvial landscape of the Po River plain and the Adriatic Plain which emerged as the level of the Adriatic Sea decreased during the cold oscillations [47].
Short-tailed field voles (M. agrestis) currently inhabit Northern Europe. It is considered a rare but constant element of the Late Pleistocene fauna at European high latitudes [11, 48]. Our results suggest its presence throughout MIS 4–2 (Fig. 1C); however, the sparse records come from only four sites and cannot confirm its continuous or widespread presence. Evidently, field voles were much less frequent than typical representatives of steppe-tundra communities, such as common voles, narrow-headed voles or collared lemmings [13]. Our data showed that it became much more numerous after the LGM, particularly during Late Glacial warming (Fig. 1C, Table S1). Both the nuclear and mtDNA results suggest a bottleneck in short-tailed field vole populations around the LGM (Figs. 2 and 4A). Demographic reconstruction based on mtDNA from Central European samples showed rapid expansion starting ca. 19 ka ago, similar to the common vole population in Southern Poland [49]. This may show a recovery from the bottleneck caused by severe cooling around 19–18 ka BP (Heinrich I stadial) revealed by new palaeotemperature records from the Iberian Margin and reanalysis of the thermal bipolar seesaw [50]. Our reconstruction did not recover the Early Holocene expansion previously inferred based on mtDNA cyt-b data from modern field vole populations [24, 25]. This is most likely a consequence of the lower mutation rate inferred here compared to previous estimates, which also resulted in older divergence estimates of the main mtDNA lineages. However, the plausible scenario previously proposed for the post-glacial expansion of field voles assumes two subsequent bottlenecks around both the LGM and YD [25] to explain the formation of the phylogeographic pattern named the Celtic fringe in Britain [51]. Most of our ancient samples were dated or estimated to the Late Glacial prior to the YD, and our sampling of modern specimens is limited even in Central Europe; therefore, we cannot exclude the possibility that our dataset did not have enough resolution to recover the population decrease caused by YD cooling.
We found that all mitochondrial lineages observed in modern ST populations were present in Central Europe in the Late Glacial period (Fig. 1C). Although the lack of samples from other regions limits our inference, such an accumulation of mtDNA diversity suggests that Central Europe, particularly the Carpathian area, served as a refugial zone for European field vole populations during the LGM. Divergence of mtDNA lineages may result from the survival of field vole populations in the pockets of suitable habitats. Ecological Niche Modelling indicates the south in the Balkans as a possible LGM refugial zone [22], but the Carpathian area has been shown to serve as a refugial zone for other temperate species, such as bank voles Clethrionomys glareolus [52]. The highly diversified geomorphological and geological conditions of Central Europe provide multiple areas that enable the long-term survival of diverse species throughout the glacial period [53]. In the case of M. agrestis, a species particularly susceptible to rarity dynamics, a mosaic of local refugia in that region leading to the formation of diverse clades can be expected. However, Central Europe has been shown to be a suture zone of mtDNA lineages of several mammalian species, and a similar pattern may have potentially arisen from post-glacial expansion from different refugia [29, 54]. Further high-resolution climatological, paleontological and genomic data would be desirable to investigate further the localisation of glacial refugia of the field vole species.
Conclusions
By integrating ancient and modern mitochondrial and nuclear genomes, we provide a temporally explicit reconstruction of the evolutionary history of the field vole species complex. We show that during the Late Pleistocene, the Italian Peninsula was inhabited by an endemic field vole population that nevertheless experienced recurrent gene flow from continental populations. We further demonstrate that the divergence of the major field vole lineages, now regarded as cryptic species, occurred primarily during interglacial periods, a pattern previously observed in other open-habitat rodents. Our analyses reveal repeated episodes of gene flow both between and within these major lineages. Finally, we detect signatures of population bottleneck associated with the Last Glacial Maximum in short-tailed and Mediterranean field voles. Overall, this pattern reflects recurrent cycles of population isolation and connectivity driven by changes in habitat availability related to stadial-interstadial climate oscillations.
Methods
Samples
We collected subfossil Arvicolinae samples dated to the Late Pleistocene from numerous localities in Europe. For this study, we selected a molar tooth or mandible, with the molar tooth assigned as Microtus arvalis/agrestis based on the occlusal surface of the first lower molar (Additional file 2: Table S1). In addition to ancient samples, tissue samples of modern and historical specimens were obtained from the collections of the Mammal Research Institute of the Polish Academy of Sciences (PAS), the Museum and Institute of Zoology PAS, the Hungarian Natural History Museum and the Centro de Investigação em Biodiversidade e Recursos Genéticos at the University of Porto. Additional tissue samples preserved in ethanol were collected and processed at the Mammal Research Institute, PAS (MRI PAS). We also used DNA from modern specimens used for previous studies and held at MRI PAS and National Museums of Scotland (Table S2) [25, 55].
DNA extraction, library preparation, enrichment and sequencing
The initial processing of ancient samples, ancient DNA extraction, and all pre-polymerase chain reaction (PCR) steps of library preparation were performed in a clean laboratory at the Laboratory of Paleogenetics and Conservation Genetics at the Centre of New Technologies of the University of Warsaw (CeNT UW). Vole teeth were thoroughly cleaned with ultra-pure water and crushed with a pipette tip. DNA was extracted using protocols optimised for retrieving ultrashort DNA molecules, either column- [56] or silica-based [57]. The fraction of the DNA extract was converted into double-indexed and either double-stranded or single-stranded sequencing libraries following the protocols described by Baca et al. [58] and Gansauge et al. [59], respectively. Some of the libraries, including all shotgun sequenced to low-genome coverage, were subjected to partial USER treatment [60] to remove uracil residues resulting from cytosine deamination, which is typical of ancient DNA. Sequencing libraries were enriched for vole mtDNA following the protocol of Horn [61]. Up to five libraries were pooled for each enrichment reaction and two subsequent rounds of hybridisation were performed. Enriched library pools were quantified using a Denovix spectrophotometer, pooled together, and sequenced on a NextSeq550 instrument (Illumina) in the PE75 mode. A subset of libraries was screened for endogenous DNA content using low-depth shotgun sequencing on NextSeq550 instrument in PE75, and selected libraries with the highest endogenous DNA content were subjected to deeper sequencing on NovaSeq6000 instrument in PE50 mode at the NGS Core Facility at CeNT UW.
DNA from modern and historical specimens was extracted using various methods, either in MRI PAS or CeNT UW (Additional file 2: Table S2). The quality of the extracted DNA was assessed by electrophoresis on 1% agarose gels. Extracts with high molecular weight DNA were sonicated to a mean length of 400 bp using Covaris M220 focused-ultrasonicator and size selected using magnetic beads. All extracts were transformed into double-stranded sequencing libraries, enriched for mtDNA, and sequenced on a NextSeq550 instrument or shotgun sequenced on a NovaSeq6000 instrument in PE150 mode. For a comprehensive account of the laboratory procedures, please refer to the Additional file 2: Text S2.1–2.5.
Sequencing data processing
Raw sequencing reads were trimmed, filtered and collapsed using AdapterRemoval v.2 [62]. To generate mtDNA consensus sequences, we mapped filtered reads resulting from mtDNA-enriched libraries or a subset of 20 million reads from shotgun sequenced libraries and field vole mtDNA references using bwa mem [63] with default parameters. We tested three different references, based on the de novo assembled mitogenomes of M. agrestis and M. rozianus. The resulting alignments were filtered for mapping quality (-q 30) and de-duplicated using samtools v.1.9 [64]. Consensus mtDNA sequences were called using a custom script based on the bcftools mpileup and ivar call [65]. We called a base if it was supported by 75% of the reads and masked positions with a coverage lower than three.
To generate nuclear genomic alignments, we mapped the filtered reads to the prairie vole (Microtus ochrogaster) reference genome MicOch1.0 (GCF_000317375.1) and the short-tailed field vole (M. agrestis) reference genome (GCA_902806775.1). The sequencing reads of modern samples were mapped using bwa mem with default parameters, filtered for mapping quality (-q 25), and de-duplicated using samtools v. 1.9. Filtered reads of ancient samples were mapped using bwa aln with relaxed settings suitable for ancient DNA (− l 16,500 − n 0.01 − o 2). Potential duplicates were removed using samremovedup.py script (https://github.com/pontussk/samremovedup). Sequences shorter than 30 bp were removed with reformat.sh from the BBtools suite (sourceforge.net/projects/bbmap/), and 2 bp from each end of the read were clipped with trimBam from the bamUtil suite [66]. Indel realignment was performed using GATK v.3.8 IndelRealigner [67].
In analyses involving only high-coverage samples, that is, PSMC, TT-dating, and heterozygosity calculations, we used data aligned to the ingroup field vole reference genome, whereas for analyses including both high- and low-coverage samples, that is, D-statistics, TreeMix, F-branch and AdmixtureBayes, we used data mapped to the outgroup prairie vole reference genome to avoid reference bias. In analyses involving data mapped to the field vole reference genome, we excluded putative sex-linked contigs. First, we identified sex-linked contigs using the SeXY approach [68] and aligned all the contigs to the X and Y chromosomes of Chionomys nivalis (mChiNiv1.1; GCF_950005125.1) using satsuma [69]. We verified the selected contigs by comparing the coverage of putative sex-linked contigs with the mean genome coverage. We noticed that many contigs mapped to C. nivalis Y or X chromosomes did not show the expected coverage ratio. Therefore, we additionally blasted the genome against C. nivalis X and Y chromosomes using BLAST + 2.14.0 (-evalue 1E − 10, -word_size 15, -max_target_seqs 1000) and flagged the contig as sex-linked when indicated by at least two of these three approaches (Additional file 1: Table S9).
Radiocarbon dating
We selected ten vole mandibles that yielded mtDNA genome sequences for radiocarbon dating. Collagen extraction and quality assessment were performed in the Department of Human Evolution at the Max Planck Institute for Evolutionary Anthropology (Leipzig, Germany) following the protocol for less than 100 mg bone samples described in Fewlass et al. [70]. Extracted collagen was dated using a MICADAS Accelerator Mass Spectrometer either in ETH-Zurich, Switzerland, or in Aix-en-Provence, France. A detailed description of the radiocarbon dating procedure is provided in the Additional file 2: Text S2.6.
Phylogenetic analyses based on mtDNA
Phylogenetic analyses of the mtDNA sequences were performed in BEAST 1.10.5 [71] using a dataset composed of mitogenomes of 70 modern and 83 ancient samples including 10 directly radiocarbon-dated specimens. The mtDNA sequences were aligned using MAFFT v7.407 [72]. The best partitioning scheme and appropriate substitution models were selected using PartitionFinder v.2 (Additional file 1: Table S10) [73]. We performed a Bayesian evaluation of the temporal signal (BETS) [74] to test whether our dated dataset, consisting of all modern and directly dated sequences (n = 80), was appropriate for calibrating the molecular clock. Next, we performed a leave-one-out analysis on the directly dated specimens to evaluate the accuracy of the molecular estimation of sample age. Then, we estimated the age of each undated specimen in a separate BEAST run. In each of these analyses, we set the gamma prior (shape = 2; scale = 30,000) on the age of the undated specimen. Finally, we performed a joint analysis using all sequences. We used the posterior from the individual age estimates as priors for the ages of the undated samples.
We performed a demographic reconstruction based on mtDNA sequences of field voles from Central Europe (Table S1), the only region with a sufficient number of modern and ancient specimens. We executed the analysis in BEAST using the Bayesian Skyline plot tree prior and visualised the demographic reconstruction in Tracer 1.7 [75]. Please refer to the Supplementary Information Text S2.7 for a detailed account of the phylogenetic analyses.
Generation of genomic SNP dataset
To generate an SNP dataset with alignments mapped to the MicOch1.0 genome, we used a two-step approach to minimise the impact of low- and unequal coverage and ancient DNA damage. First, we called the genotype likelihoods using ANGSD (v.0.930; htslib v.1.9) [76] and a dataset of all modern samples with a coverage above 3 ×. We filtered out transitions and positions in regions above two times the global mean coverage (600) and below the third of the global mean coverage (90) and filtered for a minor allele frequency of 0.05. We estimated the error rates using ANGSD’s perfect sample approach (-doAncError 1), with sample WM009 selected as a “perfect” sample since this had the highest coverage among short-tailed field voles. This analysis revealed that despite the use of partial USER treatment and clipping of 2 bp from read ends, some ancient samples exhibited elevated error rates in the C → T and G → A categories (Additional file 2: Fig. S10). Therefore, we used a custom Python script to remove all the transitions and the non-biallelic positions from our SNP dataset, resulting in 21,242,450 filtered transversion sites. We then sampled a single high-quality base per sample at each of these positions using ANGSD -doHaplocall 1. We allowed missing data for three of the 29 samples. Furthermore, we used PLINK 1.9 to prune our dataset for linkage disequilibrium (LD) using –indep-pairwise 10 kb 1 0.5. This resulted in a dataset of 860,300 filtered LD-pruned transversions.
Population structure
To reconstruct the affinities between modern and ancient field voles, we computed an IBS (identity-by-state) matrix in ANGSD. We used the following filtering criteria: -minMapQ 25 -minQ 25 -uniqueOnly 1 -remove_bads 1 -minFreq 0.08 -rmTrans 1 -minInd 25 -GL 2 -doGlf 2 -doMajorMinor 1 -doMaf 2 -SNP_pval 1e-6 -doIBS 1 -doCounts 1 -makeMatrix 1 -doCov 1. We limited the analysis to scaffolds longer than 1 Mb (-rf) and excluded the X chromosome from the MicOch 1.0 assembly. The resulting IBS matrix based on 1,286,183 variable positions was transformed into a Multidimensional Scaling Plot using cmdscale function in R. We used ADMIXTURE [77] to investigate the ancestry composition in the dataset using a set of 860,300 LD-pruned transversions. For each K = 2–7, we ran 10 replicates and selected the replicate with the lowest cross-validation (CV) error. CV errors were lowest at K = 2 (Additional file 2: Fig. S3), indicating two ancestral components as the optimal model.
Demographic reconstruction and divergence dating
We used the PSMC model [31] to reconstruct the past demography of each vole lineage. We chose two individuals per lineage and downsampled them to 20 × mean coverage to limit the impact of variable coverage on the analysis. Whole genome vcf files were called using bcftools v.1.9 mpileup and call commands, and vcfutils.pl was used to convert vcf files into diploid fastq sequences. We filtered the positions above twice and below one-third of the mean depth of coverage. The PSMC was run with default parameters (-N25 -t15 -r5 -d -p "4 + 252 + 4 + 6″). We performed 100 bootstrap replicates to assess the accuracy of the PSMC reconstruction. We scaled the output using a generation time of half a year and a mutation rate estimated for mice with 5.0E − 9 substitutions per site per generation [78]. We also tested the mutation rate recently estimated for the common vole, 8.7E − 9 substitutions per site per generation [36]. To some extent, the divergence time of the two populations can be inferred from the PSMC plots, although this is not always evident. To further assess the divergence of the three field vole lineages, we used the TT method [32]. This coalescence-based method relies on genomic data of a single diploid individual from each population. We used the scripts provided by the authors (https://github.com/jammc313/TT-method) to perform calculations. To generate the ancestral state at each position of the genome we generated genome sequencing data for three outgroup species, Tatra vole (Microtus tatricus; ERS17697868), social vole (M. socialis, ERS17697869)*, and European pine vole (M. (Terricola) subterraneus; ERS17697870) and mapped them to the short-tailed field vole reference genome. Whole-genome FASTA sequences were then generated using a consensus call (ANGSD -doFasta2) and included only sites with coverage higher than four (-minIndDepth 5). We limited the analysis to 57 scaffolds larger than 5 Mb. We called the ancestral state in a given position if the same base was present across all three outgroup species, using a custom Python script. We used the vcfs generated by the bcftools v. 1.9 call command and filtered for depth, as in the case of PSMC, but also explored more straightened filtering and removed positions with a quality below 10 and SNPs within 5-bp from the indels.
Individual genome-wide heterozygosity was estimated for all individuals with a mean genome coverage above three using ANGSD. First, the site allele frequency was calculated (− doSaf 1) using only positions with coverage higher than four. The folded site frequency spectrum was subsequently estimated in 20 Mb windows using realSFS, providing the reference genome as an ancestral state.
Gene flow between the field vole lineages
To investigate patterns of gene flow between the field vole lineages, we used TreeMix 1.13 [79] to estimate the number and intensity of historical migrations. We converted the dataset of 860,300 filtered transversions from transposed genotype files to the TreeMix format using PLINK 1.9 [80] and the plink2treemix script from the TreeMix distribution. We performed five iterations of TreeMix runs with each zero to eight migration edges. For each iteration, we subsampled 80% of the SNP positions to introduce variations in likelihood. To choose the optimal number of migration events we used the ‘Evanno’ approach implemented in the optM R package [81] (Additional file 2: Fig. S5). To assess node support of the Maximum Likelihood tree produced by TreeMix 1.13 we ran 100 bootstrap replicates (-bootstrap) of TreeMix analyses with zero and one migration edge. We computed node supports using the ape package in R. We further used Dsuite [82] to calculate Fbranch (fb) statistics based on classic D-statistics [83]. We ran the Dtrio program with default parameters and used the tree topology inferred by TreeMix with no migration edges to calculate the Fbranch statistic. To additionally evaluate the relationships between the main evolutionary lineages we reconstructed an admixture graph using AdmixtureBayes [84]. We limited this analysis to six individuals, outgroup (WM007), one each from the Portuguese (WM581), Mediterranean (WM330) and our new Italian (MI605) field vole lineages and two from the short-tailed (FV396 and WM009) field vole lineage. From our SNP dataset, we excluded sites with missing data, which resulted in 491,973 SNPs. We performed three independent analyses for 30 million iterations with 32 MCMC chains (–n 600,000 –MCMC_chains 32). We evaluated the convergence using the R script provided by the authors (https://github.com/avaughn271/AdmixtureBayes) and plotted three topologies with the highest posterior probabilities and best estimates of the branch lengths and admixture proportions.
Supplementary Information
Additional file 1: Supplementary Tables S1–S10.Additional file 2: Supplementary Figures S1–S10 and supplementary Methods and Results including additional references [85–198].
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Fejfar O, Heinrich WD, Kordos L, Maul LC. Microtoid cricetids and the early history of arvicolids (Mammalia, Rodentia). Paleont Electr. 2011;14. http://palaeo-electronica.org/2011_3/6_fejfar/index.html.
- 2Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. ar Xiv:1303.3997 v 2. 2013;00:1–3. http://arxiv.org/abs/1303.3997.
- 3Luzi E. Morphological and morphometric variations in Middle and Late Pleistocene Microtus arvalis and Microtus agrestis populations: chronological insight, evolutionary trends and palaeoclimatic and palaeoenvironmental inferences. Ph D thesis. Tarragona: Universitat Rovira i Virgili; 2018.
- 4Baca M, Bujalska B, Popović D, Golubiński M, Alves PC, Bard E, Berto C, Cuenca-Bescós G, Dalén L, Fewlass H, Fadeeva T, Herman J, Horáček I, Krajcarz M, Law M, Lemanik A, López-García JM, Luzi L, Murelaga X, Mahmoudi A, Peresani M, Parfitt S, Pauperio J, Pavlova SV, Pazonyi P, Rodríguez IR, Searle JB, Stojak J, Strukova T, Wójcik JM, Nadachowski A. Evolutionary history of field voles. 2024. European Nucleotide Archive. 2024. https://www.ebi.ac.uk/ena/browser/view/PRJEB 67915.
- 5Liu S, Zhou C, Meng G, et al. Whole genome sequencing data of genus Neodon and related species in Arvicolinae in China. European Nucleotide Archive. 2020. https://www.ebi.ac.uk/ena/browser/view/PRJNA 564473 PRJNA 290426.
- 6University of Liverpool. De novo assembly of the field vole (Microtusagrestis) genome; produced as part of a study into the diversity and evolution of immune genes in wild rodents. European Nucleotide Archive. 2015. https://www.ebi.ac.uk/ena/browser/view/PRJNA 290426.
- 7Baca M, Bujalska B, Popović D, Golubiński M, Alves PC, Bard E, Berto C, Cuenca-Bescós G, Dalén L, Fewlass H, Fadeeva T, Herman J, Horáček I, Krajcarz M, Law M, Lemanik A, López-García JM, Luzi L, Murelaga X, Mahmoudi A, Peresani M, Parfitt S, Pauperio J, Pavlova SV, Pazonyi P, Rodríguez IR, Searle JB, Stojak J, Strukova T, Wójcik JM, Nadachowski A. Git Hub. 2024. https://github.com/mateuszbaca/Field Voles.
