On the robustness of Bayesian inference of gene flow to intragenic recombination and natural selection
Yuttapong Thawornwattana, Bruce Rannala, Ziheng Yang

TL;DR
This study examines how accurately Bayesian methods can detect gene flow between species, even when factors like recombination and natural selection are present.
Contribution
The study reveals that Bayesian inference of gene flow is generally robust, but may produce false positives under specific conditions involving recent divergence and high recombination.
Findings
The Bayesian test of gene flow has very low false positives in most scenarios.
High recombination rates and recent species divergence can lead to false positives in sister lineage gene flow tests.
The test of gene flow between nonsister lineages is robust to recombination at all divergence levels.
Abstract
The multispecies coalescent (MSC) model provides a framework for detecting gene flow using genomic data, including between sister species. However, the robustness of the inference to violations of model assumptions are poorly understood. Here, we use simulation to study the false positive rate of a Bayesian test of gene flow under the MSC with multiple influencing factors including recombination, natural selection, discrete versus continuous gene flow, variable species divergence time, and gene flow involving sister versus nonsister lineages. We find that in almost all scenarios examined the test has very low false positives. However, the test of gene flow between sister lineages may be prone to high false positives in cases of very recent species divergence and very high recombination rate. At low recombination rates, the test is robust to selective sweeps, background selection and…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8| Segment | Parameters of interest | Indeterminate |
|---|---|---|
| ( | ||
| (a) |
| ( |
| (b) |
| ( |
| (c) |
| ( |
| (d) |
| ( |
| (e) |
| ( |
| ( | ||
| (a) |
| ( |
| (b) |
| ( |
| (c) |
| ( |
- —Biotechnology and Biological Sciences Research Council10.13039/501100000268
- —Natural Environment Research Council10.13039/501100000270
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenetic diversity and population structure · Evolution and Genetic Dynamics · Genetic Mapping and Diversity in Plants and Animals
Introduction
With recent advances in methodological and software development (Jiao et al. 2021; Mirarab et al. 2021; Hibbins and Hahn 2022) and increased availability of genomic data, the multispecies coalescent (MSC) provides a powerful statistical framework for inferring historical demography and gene flow from genomic data. The MSC method makes several modeling assumptions, including no intra-locus recombination, high inter-locus recombination, neutral evolution, and random mating (no population structure) (Jiao et al. 2021). Although the MSC could be extended to model recombination within population, this is a serious theoretical and computation challenge and existing implementations of the MSC model therefore do not explicitly account for recombination. When the MSC model is applied to multi-locus sequence data, a single genealogical tree is assumed to apply to all sites at a locus while different loci are assumed to have independent histories (Nielsen and Wakeley 2001; Rannala and Yang 2003). Intra-locus recombination is expected to be a more serious model violation than tightly linked loci, because gene tree correlations among loci due to linkage should influence the information content in the data only (similar to the use of a composite likelihood), whereas intra-locus recombination causes a violation of the likelihood model. Several simulation studies suggest that inference under the MSC model is largely robust to realistic levels of intra-locus recombination (Lanier and Knowles 2012; Zhu et al. 2022; Yan et al. 2023). The population recombination rate ( per site, where N is the effective population size and r is the recombination rate per site per generation) used in these studies is up to 0.02 (Lanier and Knowles 2012), 0.005 (Zhu et al. 2022) and 0.008 (Yan et al. 2023). These values are much larger than the genome average for humans of about 0.0005 (Reich et al. 2002; Jensen-Seaman et al. 2004). However, ρ in organisms such as insects can be orders of magnitude larger, due to more frequent recombination per site per generation (r) and larger effective population sizes (N) (DeLory et al. 2024). For example, in Drosophila melanogaster, ρ will be about 0.1, if per site per generation (Singh et al. 2005) and . The impact of such high recombination rates on inference under the MSC has not been explored. Moreover, the impact of recombination in combination with other factors such as species divergence time, the mode of gene flow, and selection remains largely unknown.
Current implementations of the MSC model also assume neutral sequence evolution. Natural selection may alter the pattern of shared polymorphism within and between species, potentially creating false signals of gene flow (Cruickshank and Hahn 2014; Smith and Hahn 2024). For instance, loci under balancing selection (frequency-dependent selection) that originated in the ancestral population and have persisted to present-day populations can mimic a signal of gene flow due to shared ancestral polymorphism. Background selection tends to reduce polymorphism at linked neutral loci across the genome (Charlesworth et al. 1993; Hudson and Kaplan 1995; Nordborg et al. 1996). Thus its effect on the MSC inference should be primarily a reduction in estimates of the effective population size and species divergence times (Shi and Yang 2018). Selective sweeps can also lead to reduced polymorphism around selected sites (Maynard Smith and Haigh 1974). However, selective sweeps may not have a large impact on inference under the MSC because beneficial mutations should be rarer than deleterious ones and their effects are transient (Przeworski 2002). Indeed in simulations MSC-based inference was found to be robust to positive selection (Adams et al. 2018; Wascher and Kubatko 2023).
In contrast to the findings outlined above a recent study by Smith and Hahn (2024) found that different types of natural selection caused excessive false positives when data were simulated under different schemes of selection without gene flow and analyzed using three methods to detect introgression between sister populations: Bpp (Yang 2015; Flouri et al. 2018, 2020), a i (Gutenkunst et al. 2009), and Fastsimcoal2 (Excoffier et al. 2013). Bpp is a full-likelihood method under the MSC applied to multilocus sequence data, whereas a i and Fastsimcoal2 use a composite likelihood of site frequency spectrum (SFS) data assuming independent sites. The authors concluded that by ignoring selection, all three methods generate false-positive signals of introgression in certain parameter regimes. However, the study design is flawed and their interpretations of the simulation results are misleading (see Discussion). The authors considered a Bayesian test to be significant if the 95% highest posterior density (HPD) credibility interval (CI) for the introgression probability excludes zero, an approach that is sensitive to the prior on parameters. While the authors ascribed the false positives to selection, their results showed an overwhelming effect of the species split time, whereas differences among the selection schemes were much less important (Smith and Hahn 2024, Figs. 1, 3, and 5 for a i , Fastsimcoal2, and Bpp, respectively). In particular, Bpp had 100% false positive rates at a shallow divergence time ( generations) regardless of the selection scheme (including neutral evolution) whereas the false positive rates were low at deeper divergence ( and ). As a result it is unclear what may have caused the high false positives reported by Smith and Hahn (2024).
Gene flow between sister populations is expected to be harder to detect than between nonsisters (Hibbins and Hahn 2022). For example, gene flow between nonsisters may cause asymmetries in the gene tree distribution and in site-pattern counts in genomic data and can thus be detected by summary quartet methods such as D (or Hyde) (Green et al. 2010; Blischak et al. 2018) and Snaq (Solis-Lemus and Ane 2016). In contrast, gene flow between sisters is undetectable by those methods. Methods that detect gene flow between sister species typically use multiple samples from the same species and use more information than the gene-tree frequencies or genome-wide site-pattern counts. However, Cruickshank and Hahn (2014, Fig. 4) found that inference of gene flow between two species using the Ima2 program (Hey 2010) generated excessive false positives at very shallow divergence, in particular in small datasets with a few loci. This was confirmed by Hey et al. (2015) but no practical solution has been proposed. The high false positives are surprising because in those studies, data were simulated assuming no recombination and no selection, and all assumptions of the MSC model were satisfied.
Recent research also suggests possible differences in the test of gene flow conducted under the discrete MSC-introgression (MSC-I) model versus the continuous MSC-migration (MSC-M) model (Huang et al. 2022; Thawornwattana et al. 2025; Yang et al. 2025). For example, when the rate of gene flow between sisters is very low, the estimated migration rate under MSC-M is close to 0 (Thawornwattana et al. 2025, Fig. 3), whereas the estimated introgression probability under MSC-I may not be close to 0, but rather fluctuate over the whole range ( ) (Huang et al. 2022, Fig. 3). Those results hint at possible differences depending on whether the MSC-I or MSC-M model is used to conduct the test. Indeed the test of introgression between sisters under MSC-I has a number of “nonstandard conditions” (e.g. Self and Liang 1987, Fig. 3), such as boundary problems, unidentifiability issues, and multiple routes from the alternative to the null hypotheses (Yang et al. 2025). In comparison, the corresponding test under the MSC-M model appears not to involve most of those irregularities.
Here, we conduct an extended simulation to examine the false positive rate of a Bayesian test of gene flow, influenced by several aspects of the data and model. We consider a variety of recombination rates, different schemes of natural selection, the size and shape of the species phylogeny, the species divergence time, the size of the dataset, the mode of gene flow (MSC-I versus MSC-M) with either unidirectional or bidirectional gene flow (Fig. 1b–e), and gene flow involving either sister or nonsister populations. Smith and Hahn (2024) used the bidirectional introgression model (Fig. 1b) only. Instead of the HPD CI used in the introgression test of Smith and Hahn (2024), we apply the Bayes factor to conduct the Bayesian test of gene flow, which is the standard device for hypothesis testing in the Bayesian framework (Jeffreys 1935).
a) Simulation and b)–e) analysis models. Data are simulated under the MSC model with no gene flow a) and analyzed under the discrete introgression model (MSC-I, b&c) or the continuous migration model (MSC-M, d&e) with either bidirectional (b&d) or unidirectional (c&e) gene flow, to calculate the Bayes factor to conduct the Bayesian test.
We find that excessive false positives for the Bayesian test of introgression reported by Smith and Hahn (2024) are due to a combination of very high recombination rate and very shallow species divergence, rather than natural selection as the authors claimed. We characterize the performance of the Bayesian test in multiple settings of parameters and data and in particular assess the effects of recombination, selection, and species divergence time. Overall the Bayesian test appears to be highly robust to recombination and selection. Even with excessive recombination, false positives are rare at high divergences between sister species and at all species divergences between nonsisters. We demonstrate that the high false positives for the likelihood ratio test (LRT) of gene flow reported by Cruickshank and Hahn (2014) and Hey et al. (2015) are due to inappropriate use of integrated likelihood and misinformative uniform priors in the Ima program. We show that the so-called “small data-low divergence” (SDLD) problem (Hey et al. 2015) is not a problem for a conventional Bayesian test using the Bayes factor. Our results should serve as useful guides for applying coalescent-based tests of gene flow using genomic data.
Results and discussion
Here, we summarize the results of simulations examining the false positive rate of the Bayesian test of gene flow under both the MSC-I and MSC-M models. We simulated data under the null model of no gene flow, with no or variable amounts of recombination and with neutral evolution or several modes and intensities of natural selection (see the summary in Table S1 and the Materials and Methods section). We also consider a range of population divergence times and data sizes (in the number of sequences sampled per population). Our main focus in this study is the robustness of the Bayesian test of gene flow in the MSC modeling framework. In all cases, the Bayes factor measures the evidence in support of the alternative hypothesis of gene flow (either continuous migration or discrete introgression) against the null hypothesis of no gene flow. Since we are concerned with false positives rather than the power of the test to detect gene flow, all datasets are simulated under the null model (with no migration or introgression) but other factors such as selection and recombination are varied among simulations. We examined biases in estimates of parameters as well (including rates of gene flow), but note that the true rates are zero in datasets simulated in this study.
Performance of Bayesian test: no recombination or selection
In our standard simulation, we evaluated the false positive rate of the Bayesian test of gene flow between sister species when the data are generated with either intralocus recombination or selection but no gene flow. We simulated data using the phylogeny for two species of Fig. 1a, with three species divergence times, at , and generations (or , θ and ), and then analyzed them under both the discrete MSC-I model and the continuous MSC-M model, assuming either unidirectional or bidirectional gene flow (BDI, UDI, BDM, and UDM, Fig. 1b–e). Under the MSC-I models (BDI and UDI), we also considered model specifications that placed constraints on population sizes (θ) on the phylogeny, reducing the number of parameters. As the data were simulated with no gene flow, any inferred gene flow will be a false positive.
As a reference for comparison we first examine the behavior of the Bayesian test when there is no recombination or selection, i.e. when there is no model violation (Fig. 2a, or Fig. 2b, Neutral). Note that each of the four models of gene flow of Fig. 1b–e contains the null MSC model ( , Fig. 1a) as a special case and can be used as the alternative hypothesis ( ) in the test. The asymptotic theory of Bayesian model comparison then predicts that when the number of loci , the null hypothesis dominates and the Bayes factor in support of the alternative hypothesis , (e.g. Dawid 2011). With absent or uninformative data, . In our simulation the test of gene flow using each of the four alternative hypotheses has false positive rates close to 0% (Figures S1–S8, ). We also note that in none of the datasets was . At the shallow divergence, is around 1 in most datasets, although at the high divergence in most datasets (Figures S1a, S2a, S3a, for ). There is more information about gene flow or its absence in datasets simulated at the high divergence, but overall the information content is low in datasets of this size (relative to the expectation for infinite data of ).
Bayes factors (B10) for testing gene flow under the BDI model in analysis of 20 replicate datasets simulated under a) neutral evolution with different recombination rates and b) under different selection schemes without recombination. B10 is calculated using the Savage-Dickey density ratio with ϵ=0.001 (Ji et al. 2023); those calculated using different ϵ as well as parameter estimates are shown in Figures S1 & S9. Columns correspond to three species divergence times: T=N,4N, 16N (or τ=14θ,θ,4θ). Horizontal dashed lines indicate the significance cutoffs at 100 and 0.01. Bayes factors larger than 1,000 are shown as 1,000. False positive rates are shown at the top of each subplot.
To interpret the parameter estimates, note that the test under the MSC-I models (BDI and UDI) involves several nonstandard features (Yang et al. 2025). For example the null MSC model does not correspond to one point in the parameter space of BDI ( ) but instead is represented by five lines or planes (Table 1(i)). The posterior CIs for are very wide and cover nearly the whole range ( ) (Figures S1b, S2b, S3b, for ). Furthermore, the BDI model without constraints on θs is affected by an unidentifiability of the label-switching type, involving the four parameters , , and (Yang and Flouri 2022). This type of unidentifiability occurs often in mixture models. Suppose there are two groups in proportions and with means . Then the parameter vectors and are unidentifiable as they involve switching labels for groups 1 and 2. Models involving unidentifiability of the label-switching type can still be used in inference. The unidentifiability of the BDI model caused (as well as ) to have large CIs (Figure S1). With constraints on θ, this unidentifiability disappears so that and are well estimated (even though are not) (Figures S2b & S3b ).
The analysis under the UDI model with A-to-B introgression (Fig. 1c) is similar (Figures S4–S6, ). is represented by three lines or line segments in the parameter space of (Table 1(ii)). As a result, in data simulated with no gene flow, the posterior of had wide CIs (Figures S4b, S5b, S6b, ). Again, irrespective of possible constraints on θ, tends to vary between 0.3-0.5 at the three divergence levels. In none of the datasets was . The information content about is relatively low, in sharp contrast with the very narrow CIs for some other parameters such as θ (Figure S6b, ).
The test under the MSC-M models (BDM and UDM, Fig. 1d&e) does not have the irregularities discussed above for the MSC-I models, and the results are simpler to interpret (Figures S7 & S8, ). With no gene flow in the data, the estimated migration rates ( ) are close to 0, and the Bayes factor is around 1 for the shallow tree (indicating lack of information), and close to or less than 0.01 for medium or deep trees. In other words, these datasets are informative enough that the Bayesian test rejects strongly the alternative hypothesis of gene flow.
In summary, the results of the Bayesian test when there is no model violation are consistent with standard theory (O’Hagan and Forster 2004). It is also interesting to note the differences between the discrete introgression models (BDI and UDI) and the continuous migration models (BDM and UDM). Under BDI and UDI, the biological scenario of no gene flow is represented not only by (segment a in Table 1(ii), say), but also by (segment c for UDI), as well as by any arbitrary value over ( ) (segment b for UDI, say). Datasets simulated in this study are not large enough for the Bayesian test to strongly reject the model of gene flow ( ) when the test is conducted under the MSC-I models. In contrast, data simulated at medium or high divergences are informative enough to reject when the test is conducted under the MSC-M models. Note that strong rejection of the alternative hypothesis is possible with the Bayesian test but impossible with Frequentist tests such as the LRT.
Performance of Bayesian test: sister populations with recombination
When data were simulated under the null MSC model (Fig. 1a) with recombination ( ) but no selection, and analyzed under the BDI model (Fig. 1b), the Bayesian test is conducted under misspecified models. The false positive rate for detecting introgression was zero at high species divergence ( ), mostly zero at medium divergence (40% at and at other recombination rates, all at ), but reached 100% at high recombination rates ( or higher) and shallow divergence ( ) (Fig. 2a). Using different cutoff values ( ) in the Savage–Dickey density ratio for estimating Bayes factors yielded similar results (Figure S1a). Constraining the population size parameters such that and or using the same size for all populations yielded similar results although the false positive rate was slightly lower in some cases (Figures S2a, S3a).
These results under the BDI model, in the absence of any selection, mimic those obtained by Smith and Hahn (2024, Fig. 5). Furthermore, our reanalysis of datasets generated by Smith and Hahn (2024) (with ) yielded similar results of high false positives at shallow divergence regardless of selection scheme used (Figure S17a). We suggest that the high false positives reported by the authors are mostly caused by high recombination rates combined with very shallow divergence, rather than by selection.
We then analyzed the data using the simpler unidirectional model with A-to-B introgression (UDI, Fig. 1c). The results were similar to those obtained under the BDI model (Figures S4–S6). At deep or medium divergences ( or ), there were no false positives. At shallow divergence ( ), the false positive rate was low at , about 20% or less, but reached 100% at the two highest values of ρ (Figure S4a). Again, constraints on the population size parameters led to similar results but with slightly lower false positive rates, with no false positives at medium and deep divergences (Figures S5a, S6a).
Lastly, we conducted Bayesian test of gene flow using the continuous migration model (BDM or UDM, Fig. 1d&e) as the alternative hypothesis (Figures S7a & S8a). At medium or deep divergences, no false positives were observed when either BDM or UDM was used in the test. At shallow divergence, we obtained high false positives under the BDM model at high recombination rates (with or higher, Figure S7a). At shallow divergence the test under UDM yielded lower false positives than under BDM at but 100% false positives at higher ρ (Figure S8a).
In summary, use of the unidirectional models leads to slightly lower false positives than the bidirectional models (BDI and BDM versus UDI and UDM), the constraints on θ did not have large effects, but the species divergence time had large effects, with the combination of very recent divergence and very high recombination rates producing excessive false positives in the test.
Effect of recombination on Bayesian parameter estimation
When there is no gene flow in the data, we expect estimates of the introgression probabilities ( and ) and the introgression time ( ) under the BDI model to fall into five regions of the parameter space: (i) with both and close to 0 (or 1) and matching the true split time (τ) and with unidentifiable; (ii) with one of and close to 0 and the other close to 1, and with matching the true split time (τ) and being unidentifiable; 3) with both and matching the true split time, with and unidentifiable (see Table 1(i)). Thus in datasets with no signal of gene flow, we expect and to vary over the whole range of ( ) with wide posterior CIs. Our results from the runs that did not detect gene flow support these predictions (Figure S1b).
When there is a false positive, the signal of intralocus recombination is misinterpreted by the model as signals of introgression, and the introgression probabilities ( and ) tend to have intermediate posterior means and narrow HPD CIs. For example, at and , the estimates of and were about 0.2 with tight CIs, leading to strong support for introgression (Figure S1b).
We find that the effective population sizes and species divergence time are always overestimated in the presence of recombination, regardless of whether the test falsely detects gene flow (Figure S1b), with higher recombination rates leading to larger biases in parameter estimates. For example, at , the estimates of and are nearly twice the true values while at , the estimates are nearly five times the true values. The bias in the estimate of the species split time ( ) is not as extreme. The ancestral population size ( ) is underestimated because it is negatively correlated with the species divergence time. We obtained the same pattern of biases when the model was constrained to have and (Figure S2b) or to have a single population size for all populations (Figure S3b). We observed similar biases in parameter estimates under the UDI model when the test detected introgression (Figures S4b, S5b, S6b).
Under the migration models (BDM and UDM) estimated migration rates ( ) were close to 0 when the test did not detect gene flow, but were nonzero, sometimes with narrow intervals, when the test detected gene flow at shallow divergence (Figures S7b & S8b). At very high recombination rates, all parameters (θ, τ) involved large biases even if the test did not detect gene flow.
In summary, intralocus recombination leads to an overestimation of the present-day population size parameters, and to a lesser extent, an overestimation of the species divergence time. However, only at shallow divergence do high recombination rates result in consistent nonzero estimates of the introgression probability in the MSC-I models or the migration rate in the MSC-M models, leading to false signals of gene flow.
Performance of Bayesian test: selection
We examine the effects of selection on the Bayesian test of gene flow by simulating data with no gene flow and no recombination under four different selection schemes: background selection, selective sweep in A, selective sweep in R and balancing selection. Again the test is conducted under all four models of gene flow (BDI, UDI, BDM and UDM; Fig. 1b–e), with different variants constraining the population sizes (θ) under the BDI and UDI models.
We find the false positive rate for the test of gene flow under the BDI model to be low across selection schemes (Fig. 2b; see also Figures S9–S16). An exception is balancing selection at deep divergence, which had a false positive rate of 40%. This pattern was robust to different cutoff values ( ) used in the Bayes factor calculation (Figure S9a). We note that in Smith and Hahn (2024), where the data were simulated with recombination, balancing selection also led to high false positive rates of 30–40% (their Figs. 5, S30–S32) but at the medium divergence ( ). We confirm this finding in our reanalysis of the data of Smith and Hahn (2024) (Figure S17a).
Unlike the neutral case, constraining the population size parameters such that and led to 100% false positives in the case of background selection and balancing selection at deep divergence ( ; Figure S10a). Similarly, for these two selection schemes, using the same population size parameter for all populations led to 100% false positives at deep divergence and 80–100% false positives at medium divergence (Figure S11a).
When we analyzed the data under the UDI model (Fig. 1c), the false positive rates are generally lower, with no false positives at shallow and medium divergence (Figures S12a, S13a, S14a). At deep divergence, balancing selection has about 30% false positive rate for the model without constraints on θ (Figure S12a) while the false positive rate is 80–100% for the two models with constraints on θ (Figures S13a, S14a). Background selection caused a high false positive rate of 100% only at deep divergence when all θ are constrained to be equal (Figure S14a).
Using the migration models, in particular, the UDM model (Fig. 1d&e) to analyze the data led to almost no false positives (Figures S15a, S16a).
In summary, our results demonstrate that the Bayesian test of gene flow is robust to selective sweeps and background selection while prolonged balancing selection can lead to false signals of gene flow under the discrete MSC-I models. The test under the continuous MSC-M model is highly robust to selection, with virtually no false positives.
Effect of selection on parameter estimation
When data are simulated under selection with no gene flow and no recombination but analyzed assuming gene flow with no selection, the model is seriously misspecified. If the test of gene flow is significant, examining the estimates may help us to understand the cause of the false positive error. If the test supports no gene flow, the estimates are expected to be close to those under the MSC model, which are still expected to be biased due to impacts of selection.
Under the BDI model, strong background selection (mean selection coefficient ) led to highly reduced estimates of the present-day effective population sizes ( and ) and species divergence time ( ) compared to the neutral case (Figure S9b). The effect on and was comparable across the three divergence levels while was most severely underestimated at deep divergence. The ancestral population size ( ) was overestimated since it was negatively correlated with . The two selective sweep schemes show no noticeable impact on parameter estimation (Figure S9b). Lastly, balancing selection led to a slight overestimation of and , and a slight underestimation of , with the strongest effect at the deep divergence (Figure S9b), corresponding to cases with high false positive rates of 20–40% (Figure S9a).
Constraining the population size parameters (either and or using the same size for all populations) led to more severe biases in parameter estimates (and high false positives) for background and balancing selection (Figures S10b, S11b). Similar patterns were obtained when the data were analyzed under the UDI model (Figures S12b, S13b, S14b) and under the migration models (BDM and UDM, Figures S15b, S16b).
In summary, Bayesian parameter estimation is robust to selective sweeps. Strong background selection leads to underestimation of , and while balancing selection leads to overestimation of and but reduced , agreeing with known effects of these selective forces (reviewed in Cutter and Payseur 2013). In particular, even if the Bayesian test under the BDM and UDM models is robust to selection and correctly reject gene flow, background selection and balancing selection may cause biased estimates of the species divergence time and population sizes.
Performance of Bayesian test: number of sequences and sequence length
The results presented above (Fig. 2a) are based on data of 40 sequences (20 from each species). We explored the impact of the number of sequences in the data on the false positive rate for the BDI model (Fig. 1b). We find that using fewer sequences leads to reduced false positives at , but at high rates of ρ (0.025 and 0.25) the number of sequences has little impact (Fig. 3).
Impact of the total number of sampled sequences (with equal number from each species) on the false positive rate of Bayesian test of gene flow (Ji et al. 2023) under the BDI model (Fig. 1b). Data were simulated under neutral evolution using Ms and Seq-gen. The “linked theta” model enforces the constraints θX=θA and θY=θB (Fig. 1b) while “same theta” refers to the assumption that all populations share the same θ.
We also examined the impact of sequence length. Having shorter sequences with sites helped to reduce the false positive rate on the shallow tree with recombination (Fig. 4). However for medium and deep divergence, the test appears to become more sensitive to selection and produced even higher false positive rates than at under some selection schemes.
Bayes factors for testing gene flow when the sequence length is 100 bps. Other settings are the same as in Fig. 2 where n=500. See legend to Fig. 2.
Performance of Bayesian test: nonsister populations with recombination
The results discussed above concern gene flow between sister populations (Fig. 1). We hypothesize that the false positives for high recombination and shallow divergence reflect the inherent difficulty of detecting gene flow between recently divergent sister populations. We expect the Bayes factor to be more robust for tests of gene flow between nonsister populations. We tested this hypothesis by simulating data under the MSC model of no gene flow on the phylogeny of Fig. 5a, with different recombination rates and no selection. We considered two levels of divergence between A and B: shallow ( and ) and medium ( and ), with in both cases, and with . The data were analyzed under the UDI model with introgression of C into B (inflow; Fig. 5b).
a) Phylogenies for three species (and an outgroup) with no gene flow used to simulate data at different recombination rates with no selection. Parameter values used in the simulation are τT=14θ, τS=12θ, τR=5θ for shallow divergence (T=N); and τT=θ, τS=2θ, τR=5θ for medium divergence (T=4N), with θ=0.005. b) An introgression model with C→B (or x→y) introgression used to analyze the data to assess the false positive of the Bayesian test in presence of recombination. c) Bayes factors for testing the C→B introgression in data simulated under the model of panel a.
No false positives were observed at any divergence level or recombination rate (Fig. 5c). While there is variation among datasets at the shallow divergence ( ), the test is able to strongly reject gene flow, with , in most datasets, especially at the high divergence ( ).
The robustness of the Bayesian test of gene flow to excessive recombination (in particular when involving nonsisters) is not well-understood. It appears to be widely accepted that if the per-base recombination rate (r) and per-base mutation rate (μ) are similar, recombination should confound genealogy-based inferences that ignore intra-locus recombination (e.g. Nielsen et al. 2025). Several recombination rates used in our simulations are much higher than the mutation rate ( and 0.25 correspond to and 50). The perceived large impact of recombination on inference in genealogy-based models is thus contradicted by our simulation results and is a misconception.
In the case of one single population, Hein et al. (2005, pp. 148–150) discussed the impacts of recombination on gene genealogies and derived the probabilities that a recombination event may (i) have no effect, (ii) change the branch lengths (coalescent times) but not the gene tree topology, or (iii) change the tree topology. The analysis suggests that a large proportion of recombination events do not change the gene tree topology (e.g. with a sample of ; Hein et al. 2005, Table 5.1). The case appears to be similar under the MSC model, as examination of the simulated gene trees suggests that most recombination events cause little or no differences between the gene trees of neighboring segments at the same locus (Zhu and Yang 2021). One may also expect changes of “within-species portions” of the gene tree to have little impact on inference under the MSC. Consider the case of two species (A and B). If sequences from each species are reciprocally monophyletic on the gene tree, there is no need to invoke migration of sequences for the likelihood model to explain data at the locus, and minor changes to the subtrees within species may not change that interpretation and are unlikely to impact a test of gene flow. This interpretation appears to agree with our finding that the test of gene flow between sisters is far more robust to recombination at deep divergence than at shallow divergence. It may also explain why the test of gene flow between nonsisters is least affected by recombination. We note that our discussions here are speculative, and leave it to future research to investigate the impact of recombination on the distribution of gene-tree topologies and branch lengths under the MSC model and its impact on different inference problems (including detection of gene flow, inference of species trees and estimation of species split times, etc.).
Potential causes of false positives and biased parameter estimates
Here, we study features of the data and the fitted gene trees to understand how excessive recombination combined with shallow divergence can cause biased parameter estimation and false positives in the Bayesian test of gene flow between sister populations. We note that at the recombination rate of , there are about 70–80 recombination events on a gene tree for a 500-bp locus at shallow divergence, although recombination may not always change the gene tree topology or branch lengths (Hein et al. 2005, pp. 148–150; Zhu et al. 2022). Recombination can generate chimeric sequences at each locus by combining sequence fragments from multiple ancestors, resulting in an increased time for all sites to reach the common ancestor (Griffiths and Marjoram 1996, 1997). As the MSC model (either with or without gene flow) ignores recombination and assumes a single gene tree for all sites at the locus, one expects recombination to cause elevated coalescent waiting times or branch lengths in the gene tree.
We consider gene trees for a locus with four sequences (two from each species) inferred under the MSC model with no gene flow. First we confirm that parameter estimates at shallow divergence agree with our results from the MSC-I or MSC-M models (Figures S1–S3, S5–S8): , and become increasingly overestimated as the recombination rate increases (Fig. 6a). At deep divergence, we see the same effect on while and appear to be unaffected by recombination, because using fewer sampled sequences per species leads to less biased parameter estimates (Figure S18). With 20 sequences per species, and were overestimated at all three divergence levels (Figures S1–S3, S4–S8). We calculate the frequencies of four types of gene trees for a locus with four sequences (two from each species), and , depending on whether the two sampled sequences from each species coalesce within that species. For example means the two sequences from A do not coalesce in A and instead coalesce in the common ancestor (R) while the two B sequences coalesce in B (Fig. 1a). The gene tree here may be considered the best fit under the MSC model which assumes one tree for all sites at the locus. We also calculate four statistics of gene-tree branch lengths from the posterior samples under the MSC model, averaged across loci: the tree height (H), the tree length (L), the sum of terminal branch lengths (S) and the sum of the branch lengths around the root (B) (Fig. 6). Analytical expressions for the expected value of these statistics under the simple model of no recombination and no gene flow are in Appendix A.
a) Parameter estimates (posterior means and 95% HPD CIs) under the MSC model without introgression (Fig. 1a) using four sequences (two from each species) from 20 simulation replicates. Horizontal lines indicate the true parameter values. b) Probabilities of four types of gene trees classified according to whether the coalescent event of the two sequences from the same species occurs after species divergence. For example, g00 means both coalescent events occur in the ancestral population and g01 means the two sequences from A do not coalesce in A while the two sequences from B coalesce in B. Horizontal dashed lines indicated the expected values when there is no recombination (ρ=0). Analytical expressions of these expected values are in Appendix A. c) Four gene tree statistics, averaged across posterior samples and loci: H (height; time to the most recent common ancestor), L (length; sum of total branch lengths), S (sum of terminal branch lengths) and B (average length of the two branches below the root). Horizontal dashed lines indicate the expected values when there is no recombination.
As the recombination rate increases, the terminal branches become longer and the two branches around the root become shorter without much impact on the tree height and tree length (Fig. 6c), making the gene trees more star-like, apparently because the model is using a single gene tree to explain a mixtures of gene trees. These patterns are consistent with previous findings in a single-population setting (Schierup and Hein 2000). We suggest that those effects on tree shapes with elongated external branches may explain the positive biases in the estimates of , and in the Bpp analysis, with larger biases at higher recombination rates (e.g. Figure S1b). Such biases occur at all three divergence levels and under all models of gene flow (Figures S1–S3, S4–S8).
Note that when recombination is present and ignored, both the null hypothesis of no gene flow ( ) and the alternative hypothesis of gene flow ( ) are misspecified. To understand why fits the data better than (thus generating high false positives) at high recombination and shallow divergence, we calculated the means and variances of three pairwise distances ( and within species and between species) across loci predicted under and , in comparison with the observed values that are easily calculated using the sequence data. We are able to derive the expected values under the UDM, UDI and BDI models, but not under BDM (see Appendix B). Here, we focus on the UDM model as parameter estimation do not suffer from identifiability issues (unlike BDI and UDI) (Fig. 7). Results for the BDI model are presented in Figure S19, but note that parameters may be poorly estimated in settings where fits the data well.
a) Means and b) standard deviations (SDs) in the observed and predicted pairwise sequence distances at different recombination rates for neutral evolution. Observed distances (daa and dbb within species and dab between species) are calculated by using a pair of sequences per locus. Predicted distances, under the MSC model with no gene flow and the MSC-M model with unidirectional migration (UDM; Fig. 1e), are calculated using the posterior means of parameters obtained from the Bpp analyses of the data (Figure S8b); see Appendix B.
We calculated the observed means and variances of pairwise distances, and used parameter estimates from the Bpp analysis of data generated with intralocus recombination to calculate the expected values under the MSC and UDM models. Regarding the observed means and variances, note that recombination reduces the variance of pairwise sequence distances without altering the mean (Hudson 1983). The combination of high recombination and shallow divergence leads to the largest reduction in the variance of across loci. At shallow divergence ( ), at is 36.7% that at . At deeper divergences ( and ), the reduction in is 24.5% and 10.2%, respectively. Recombination has a greater effect at more recent divergence. Thus including migration ( ) in the UDM model brings the variance of closer to the observed values. Those comparisons suggest that the UDM model of gene flow fits the means and variances of pairwise sequence distances better than the MSC model ( ) at recent divergence ( ) and high recombination rate ( ), cases where high false positives were observed (Figure S8b).
Bayesian test of gene flow solves the small-data low divergence problem
In the case where there is no gene flow, no recombination and no selection, the false positive rate of the Bayesian test is ∼0 at all three species divergence levels when the test is conducted under any of the four models of gene flow (BDI, UDI, BDM, UDM; Fig. 1b–e) including the three variants on constraining θ under the BDI and UDI models (Figure S1–S7, or Figure S15, Neutral; Figure S8, or Figure S16, Neutral). At medium and deep divergences, use of the BDM and UDM models to conduct the test leads to strong rejection of the alternative hypothesis of gene flow. Shallow species divergence alone does not cause false positives for the Bayesian test of gene flow.
Note that when data are simulated under , the results for the Bayesian test are known for two extremes of the number of loci (L): when , and when . In small or uninformative datasets, the posterior of parameters (e.g. ) should be similar to the prior, and should be ∼ 1. When , posterior estimates of converge to the true values (0), and (as has fewer parameters than ), leading to strong rejection of gene flow. There exists no theory to expect to decrease monotonically from 1 toward 0 when L increases from 0 to . It is sufficient if large occurs rarely (i.e. if in no more than of datasets). Judged by this criterion, the Bayesian test is often found to have good Frequentist properties in various testing situations, with the false positive rate below the nominal significance value. This is the case in our simulation examining the Bayesian test of gene flow. Also the Bayesian test is capable of strongly rejecting the alternative hypothesis of gene flow.
It is thus surprising that Cruickshank and Hahn (2014, Fig. 4) found excessive false positives at very shallow divergence, with error rates of up to in small datasets of a few loci, when they used the Ima2 program (Hey 2010) to test for gene flow (see also Hey et al. 2015). Note that both Ima and Bpp implement the same MSC-M model (i.e. the isolation-with-migration or IM model), and those simulations are conducted under the MSC model of no gene flow, with no violation of assumptions (cf: Cruickshank and Hahn 2014, p. 3,147). Sethuraman et al. (2025) described this SDLD problem (Hey et al. 2015) as an issue of identifiability, but all parameters in the model are identifiable. Hey et al. (2015) suggested that the false positives may be caused by the use of the integrated likelihood in the LRT implemented in the pioneering work of Nielsen and Wakeley (2001) and Hey and Nielsen (2007).
Our analysis suggests that the high false positives with the LRT implemented in Ima reported by Cruickshank and Hahn (2014) and Hey et al. (2015) are due to inappropriate use of the integrated likelihood, the poor choice of priors, and the general difficulty of using a Bayesian high-dimensional MCMC program to conduct maximum likelihood (ML) analysis. To see the difference between the (traditional) LRT and the LRT* based on integrated likelihood, let the parameter vector for the alternative hypothesis be , where ω are the parameters of interest and λ are the nuisance parameters. In the case of the MSC-M model for two species, and (Fig. 1d). The null hypothesis is represented by the parameters of interest taking special values: . Given the parameters , sequence data at multiple loci are i.i.d., and constitute data points in the MSC model.
In so-called “stable-estimation” problems, where each data point depends on the whole parameter vector , and the number of data points far exceeds the number of parameters, the standard practice is to optimize nuisance parameters in calculation of the likelihood values under both the null and alternative hypotheses (Stuart et al. 1999, Chapters 21 & 22, in particular § 22.7). In other words, the likelihood ratio test statistic is defined as
where the likelihood values under both models, with , are calculated at their maximum likelihood estimates (MLEs). This is the approach taken by Zhu and Yang (2012) and Dalquen et al. (2017) in their LRT for migration under the MSC-M model (see also Lohse et al. 2011). The approach integrates over the gene tree topologies and branch lengths at the loci numerically to calculate the likelihood function, but parameters ( ) are optimized via a numerical optimization algorithm. The program (3s), however, has the limitation of being able to handle at most three sequences at each locus. Note that gene trees at multiple loci are random variables and always integrated out.
The Ima3 program (Hey 2010; Hey et al. 2015, 2018) implements the MSC-M or IM model in the Bayesian framework, and generates samples from the joint posterior . The authors chose to summarize while ignoring , thus estimating the marginal distribution , effectively integrating out the nuisance parameters (λ). Note that even if no explicit step of integration is taken, the sampled values of ω (with the sampled values of λ ignored) are a sample from the marginal distribution, . Then with uniform priors,
the estimated marginal posterior density is treated as the integrated likelihood
where is the traditional likelihood. Calculation of the likelihood value under the alternative hypothesis, , thus involves estimating the density and finding its optimizer . Similarly the likelihood under the null hypothesis is an integral over the nuisance parameters (λ),
The (integrated) likelihood ratio test (LRT*) is then based on the statistic
While some statisticians argue for the advantage of the integrated likelihood as an approach to dealing with nuisance parameters (Berger et al. 1999), the idea appears to be proposed mostly to deal with complex or irregular situations. For example, in all examples of Berger et al. (1999) for illustrating the usefulness of the integrated likelihood, there exist either as many nuisance parameters as data points, or only one sampling replicate from the probability distribution indexed by the nuisance parameter.
Besides the use of integrated likelihood, a few other implementation features of the Ima program may have contributed to the problem of excessive false positives in the LRT*. First, uniform priors on parameters ( , and ) used in Ima are highly informative but implausible. With the uniform prior, a large upper bound is often used to avoid excluding unlikely but not impossible values, so that the prior mean may be too large. The integrated likelihood is well-known to be sensitive to the upper bound in the uniform prior (see Fig. 6.6 in Yang 2014 for an example). In the case of improper priors, marginal likelihood values may even be infinite (Dawid et al. 1973). Thus uniform priors, once described as “uninformative,” are here misinformative. Second, the predefined binning and estimation of multidimensional densities using empirical histograms may not be reliable. The bins for parameters are determined before the MCMC by using the priors, and the program collects sampled parameter values into those bins. If the prior and the posterior are very different, the binning strategy may be inadequate so that all sampled values may be lumped into one bin, and the run is wasted. A simple fix may be to save the sampled values and summarize the sample after the MCMC. Third, estimating the multi-dimensional density and finding its maximum using the MCMC sample appears to be too challenging a task to be practical. Fourth, Ima defines parameters θ and τ using a per-locus mutation rate. This parametrization means that the priors assigned on θ and τ assumes higher per-base mutation rates for shorter loci, which does not appear plausible in analysis of genomic sequence data. This is an issue whether or not mutation rates are allowed to vary among loci. In programs such as G-phocs (Gronau et al. 2011) and Bpp, the per-base mutation rate is used, so that the prior assumes comparable per-base mutation rates among loci. Finally, there may be issues with the MCMC algorithm implemented in Ima3, an improved version of Ima2 (Flouri et al. 2023, Fig. 4).
In Appendix C, we use the one-sample test (which compares against under the normal model with the unknown variance being the nuisance parameter) and the two-sample test ( and with known variance) to illustrate the differences among the Bayesian test, the (traditional) LRT, and the integrated likelihood ratio test (LRT*) (Fig. 8). Note that conducted at the 5% significance level, the type-I error rate for the LRT is exactly 5%, but differs considerably from 5% for the LRT* and the Bayesian test. When the data size , the false positive rate approaches 5% for the LRT and LRT*, and 0 for the Bayesian test. In small datasets with poorly chosen priors, the LRT* can generate excessive false positives, while the Bayesian test in general has false positive rate below 5%.
False positive rate at the 5% level for three tests (Bayes factor, LRT, and LRT) applied to a) the one-sample test and b) the two-sample test of Appendix C, calculated by simulating 106 datasets under the null hypothesis. A false-positive error occurs if the LRT statistic (equations (C5), (C7), (C13) & (C17)) is >3.84, or if B10>19 in the Bayesian test (equations (C4) & (C12)). For the one-sample test, we used k=10, and α=10,β=0.5 in the prior, while for the two-sample test, we used μ=1 for the data, and τ=10 in the prior.*
In summary the high false positives at shallow divergence for the test using Ima discussed by Cruickshank and Hahn (2014) and Hey et al. (2015) do not apply to the Bayesian test (Flouri et al. 2023; Ji et al. 2023) or the (traditional) LRT (Zhu and Yang 2012; Dalquen et al. 2017). They are a consequence of the use of the integrated likelihood and misinformative uniform priors in the Ima program (Nielsen and Wakeley 2001; Hey and Nielsen 2007). The so-called small data-low divergence (SDLD) problem (Hey et al. 2015) can be solved by using a Bayesian test (Ji et al. 2023) and more plausible priors.
Impact of parameter values used in simulation and the procedure of Frequentist simulation
The performance of a hypothesis test such as the Bayesian test of gene flow (Ji et al. 2023) may depend on the values of parameters. For example, the species split time appears to have a large impact on the false positive rate and power of the test of gene flow between sister populations with recombination (Yang et al. 2025 and this study). Here, we discuss the biological realism of parameter values used in this study (see Smith and Hahn 2024).
The population size parameter may vary among species by orders of magnitude, with to be a relatively small value (typical of humans, Patterson et al. 2012) and 0.01 to be a large value (typical of insects such as mosquitoes and butterflies, Thawornwattana et al. 2018, 2023a). The value used here ( ) lies between those extremes. Similarly the divergence time for a pair of species may vary hugely among biological systems. Note that the average coalescent waiting time for two sequences sampled from the same population of size N is . The shallow divergence used ( ), with the species split time to be half the average coalescent time, represents very recent divergence. The population recombination rate depends on the effective population size N and the recombination rate per base per generation, both varying considerably among species. The low rate ( ) is typical for humans. The high rate ( ) may be realistic for some social insects with very large population sizes and high recombination rates (DeLory et al. 2024) but is too large for mammals (see Zhu et al. 2022; Yan et al. 2023 and references therein). The selection coefficient is expected to vary hugely among species and across the genome. Background selection simulated in this study, at (Smith and Hahn 2024), appears to be very strong as a new mutant with is effectively “dead on arrival.” This was chosen based on estimated selection coefficients for nonsynonymous substitutions in Drosophila (Huber et al. 2017) and may not be representative of organisms with much smaller population sizes or of background selection due to loose linkage to other functional elements in the genome (Charlesworth and Jensen 2021).
It should be pointed out that Smith and Hahn (2024) did not conduct the Frequentist simulation correctly. In their simulation under models of introgression (which we did not address as all our simulations were conducted under the null model of no gene flow), the authors generated loci and replicate datasets using random values of introgression time and introgression probability ( and of Fig. 1b). Here, both and are parameters in the model and should be fixed when replicate datasets are generated. Frequentist properties of a hypothesis test (such as the false positive rate and power) or of an estimation method (such as the bias and variance) are defined with the values of parameters in the model fixed. For example, the false positive rate is the probability of rejecting the null hypothesis in replicate datasets, all of which are generated under the null model with the same set of parameter values, even though a respectful test should have the false positive rate under control whatever the parameter values are. Second, when simulating selective sweeps and background selection, Smith and Hahn (2024) sampled the middle 500 base pairs in each 10kb region, with the site under selection to be in the center of the region. This way the selected site is always included in the 500-bp sampled segment. When multilocus sequence data are compiled for analysis under the MSC model, we expect short segments to be sampled at random from the genome, and may not include the selected site. For example, a noncoding segment sampled at random from the genome may not necessarily be close to an exon. Unfortunately both sampling parameter values from a distribution for each data replicate and fixing random variables at particular values for all replicates are very common mistakes in simulations in molecular phylogenomics. Finally, Smith and Hahn (2024) conducted the test by examining whether the 95% HPD CIs for introgression probabilities include the null value 0. This approach may be very sensitive to the prior on parameters. If the dataset is uninformative, the posterior 95% CI will be very close to the prior CI, which excludes 0, so that the test will have a false positive rate of when the data are uninformative. The standard approach is to use the Bayes factor (Ji et al. 2023 ).
Nevertheless, we note that incorrect simulation and incorrect application of the Bayesian test are not the main causes for the high false positives for the Bayesian test of gene flow between sister species reported by Smith and Hahn (2024). In our simulations we find similar high false positives for the test of gene flow between sister populations in the setting of shallow divergence and high recombination.
Conclusions
In this study, we have examined a number of factors that may influence the Bayesian test of gene flow, including the size of the dataset, the assumptions about the population sizes on the phylogeny, the assumed mode of gene flow (MSC-I versus MSC-M), the direction of gene flow, as well as intralocus recombination and natural selection. Below we provide a summary of our major findings.
Recombination causes false positives in the Bayesian test of gene flow between sister species only when species divergence is shallow (with divergence time shorter than the expected coalescent time) and the population recombination rate (ρ) is very high. Otherwise, the test is robust (Fig. 2a, Figures S1a–S8a). Ignoring recombination inflates contemporary population-size and divergence-time estimates while reducing ancestral population sizes estimates and can mimic weak introgression (Figures S1b–S8b).Bayesian test of gene flow between nonsister species has very low false positives even in presence of excessive recombination (Fig. 5c).Sensitivity to high recombination for the test of gene flow between sisters at shallow divergence may be ameliorated by using a smaller number of sequences for each species (Fig. 3 & Figure S18) or by having shorter sequences at each locus (Fig. 4). One should however note that larger datasets with more samples per species and with longer sequences improve the precision of parameter estimation. It may be useful to examine whether parts of the genome under different influence from recombination, such as coding and noncoding parts of the genome, provide consistent signals of gene flow (Thawornwattana et al. 2022, 2023b). If possible, including an additional population that transforms a sister into a nonsister population to make gene flow become between nonsister populations will be effective at reducing false positives in gene flow tests.Natural selection at reasonable strengths have minimal effects on the test of gene flow between sister populations, except for prolonged balancing or background selection at deep divergence under the discrete introgression models (Fig. 2b, Figures S9–S16), possibly because persistent ancestral polymorphism might be misinterpreted by the Bayesian test as signals of gene flow. These results contradict the claim of Smith and Hahn (2024), that selection causes false positives in tests for gene flow between sister species. Instead, the high false positive rates they reported are primarily caused by high recombination rates combined with very shallow divergence.High recombination makes gene trees appear more star-like (Fig. 6) and reduces variance among loci (Fig. 7 & Figure S19), biasing parameter estimates and producing false signals of gene flow, especially at shallow divergence.Without recombination or selection, shallow divergence alone does not lead to false positives for the Bayesian test of gene flow between sister species (Figs. 2–4, Figure S1–S8, ). Excessive false positives of the LRT using Ima3 in such scenarios reported by Cruickshank and Hahn (2014) and Hey et al. (2015) are due to use of the integrated likelihood and misinformative uniform priors in the likelihood ratio test (Fig. 8 and Appendix C).The other factors examined in our simulation, such as different constraints among population-size parameters on the phylogeny, the assumed mode of gene flow (the discrete MSC-I versus continuous MSC-M models), the direction of gene flow (unidirectional versus bidirectional models) have less important impacts on the false positives of the Bayesian test of gene flow. The different model specifications may be useful when one examines the robustness of the test in analysis of empirical data.
Overall, we find that in most scenarios examined the Bayesian test has very low false-positive rates. The test is particularly robust to selection and recombination when conducted under the unidirectional continuous migration model (UDM), or if gene flow being tested involves nonsister lineages on the phylogeny.
This study has focused on false positive rates of the Bayesian test under the null model of no gene flow, but a number of recent studies have also examined the power of the test to detect introgression/migration, by simulating data under alternative models that include gene flow of various forms. Huang et al. (2020) examined the information content in multilocus datasets for inference under the MSC model either with or without gene flow, and found that among many factors that affect the information content in the data (such as the number of loci, the number of sequences sampled per species, the sequence length, and the mutation rate), the number of loci is often the most important, although to infer gene flow the use of multiple sequences per species, in particular, from the hybridizing species, is important. Thawornwattana et al. (2023a) examined the inference of gene flow with the direction misspecified (e.g. with introgression assumed when introgression is in fact from ) and found that the use of the bidirectional model can lead to reliable inference of the direction of gene flow. Huang et al. (2022) and Thawornwattana et al. (2025) examined the inference of gene flow when the mode of gene flow is misspecified (e.g. when the assumed model is the discrete MSC-I model when in fact gene flow is continuous as specified under the MSC-M model), finding that reliable inference of gene flow and estimation of parameters such as species split times is often possible despite the misspecified mode. Ji et al. (2025) assessed the impact of genotyping errors at low read depths and found that inference of gene flow is reliable at read depths of 10x or higher using today’s sequencing technologies. These simulations have improved our understanding of the statistical properties and behaviors of full-likelihood methods for inferring gene flow under a variety of conditions. Notably recent advancements in MCMC algorithms (Flouri et al. 2020, 2023) have made it possible to apply the MSC models with gene flow to datasets of ∼10,000 loci (albeit on a small phylogeny with no more than a dozen species) which are large and informative enough to allow meaningful inference of gene flow (e.g. Flouri et al. 2023; Thawornwattana et al. 2023a). We conclude that full likelihood methods for testing gene flow and estimating its rate as implemented in Bpp can reliably be applied to infer gene flow both between sister populations and between nonsister populations using genomic data.
Materials and methods
Simulation of multilocus data under the MSC with recombination or selection
In the first set of simulations, we used different recombination rates but no selection (and no gene flow): , 0.00025, 0.0025, 0.025, and 0.25. The highest rate may be plausible for some species of social insects (e.g. DeLory et al. 2024). Note that Smith and Hahn (2024) used a single high recombination rate of .
We simulated multilocus sequence data under a two-species model with no gene flow (Fig. 1a). The model has three populations (present-day populations and their common ancestor R) and two parameters: a species divergence time (τ) and a population size parameter (θ) assumed to be identical for all three populations. Here, and , where T is the divergence time in generations, μ is the neutral mutation rate per site per generation and N is the effective population size. Both parameters are in units of the expected number of mutations per site. We fixed and used , θ and (or , and generations) to represent shallow, medium and deep divergence, respectively. The sequence data consist of 500 independent loci. Each locus contains 40 sequences (20 from each species), with the sequence length to be 500 sites. There were 20 replicates for each parameter setting. In total 300 datasets were simulated (for 5 recombination rates, 3 divergence times, and 20 replicates).
For some settings both forward simulation using Slim v4.0.1 (Haller and Messer 2023) and backward coalescent simulation using Ms (Hudson 2002) and Seq-gen (Rambaut and Grassly 1997) were used and noted to produce nearly identical results. Then Ms was used as it was much faster than forward simulation by Slim.
In the second set of simulations, we used Slim to perform forward simulation under five selection schemes but no recombination (and no gene flow): neutral evolution, background selection, selective sweep in A, selective sweep in R and balancing selection. We followed the simulation procedure of Smith and Hahn (2024) with two small modifications (see below). The simulated history at the selected site was recorded as a tree sequence for sampled individuals. Neutral mutations at linked sites at the locus were subsequently added using Pyslim v1.0.3 (Haller et al. 2019) and Tskit v0.5.5 (Kelleher et al. 2018). The length of the region in Slim simulation was 10 kilobases (kb) but only the middle 500 positions were used in analysis (which may and may not include the site under selection). We followed Smith and Hahn (2024) to rescale the population size by 100 to reduce the computational load of forward simulation, while rescaling mutation rate (μ), recombination rate (r), and selection coefficient (s) so that population parameters , Nr, and Ns remain unchanged. Similarly, species divergence times and migration times were scaled down by a factor of 100. For background selection, 75% of the new mutations are deleterious with a selection coefficient (s) drawn from a gamma distribution with mean and shape parameter , and a dominance coefficient . We used individuals in the simulation (i.e. before the scaling), which gives . For selective sweeps, a beneficial mutation has a selection coefficient (s) drawn uniformly from the interval and a dominance coefficient . The simulation was discarded if the beneficial mutation was lost. For balancing selection (i.e. frequency-dependent selection), a mutation is beneficial when it is rare and deleterious when it is common. Let p be the frequency of allele A. Then the fitness values of the three genotypes , and aa were , , and 1, respectively. When , the fitness values were 0.5, 0.75, and 1.
We modified the simulation procedure of Smith and Hahn (2024) (with scripts available at https://github.com/meganlsmith/selectionandmigration) as follows. First, in the two selective-sweep schemes, we sampled a site under selection at random along the sequence rather than fixing it to be in the middle of the sequence. Second, under balancing selection (frequency-dependent selection), we assume no dominance ( ) whereas Smith and Hahn (2024) assumed complete dominance. For the two selective-sweep schemes and balancing selection, we fixed the proportion of loci under selection to 10% while background selection affected all loci. In total 240 datasets were simulated (for 4 selection schemes excluding neutral evolution, 3 divergence times, and 20 replicates).
As variations to our standard simulation, we simulated data under neutral evolution at the five different recombination rates, but the number of sampled sequences per locus is and 40 (with an equal number per species). There were 300 datasets in total (5 sample sizes, 3 divergence times, 20 replicates). We also included a set of simulations with the sequence length of 100 sites (cf: 500 sites in the standard simulation).
MCMC analysis of simulated data using Bpp
The simulated datasets were analyzed under both MSC-I and MSC-M models with either bidirectional or unidirectional gene flow (Fig. 1b–e), to calculate the Bayes factor to conduct the Bayesian test of gene flow (Ji et al. 2023). The bidirectional introgression (BDI, Fig. 1b) model involves nine parameters: five population sizes ( , and ), one species split time ( ), one introgression time ( ) and two introgression probabilities ( for introgression and for introgression). We also analyzed the data using two variants of this BDI model, assuming either and or a single θ for all populations. Additionally, we analyzed the data using a unidirectional model (UDI, Fig. 1c), with introgression from A to B forward in time, using the same sets of constraints on θs. Finally, we analyzed the data under the continuous migration model (MSC-M), assuming either bidirectional or unidirectional migration (BDM and UDM, Fig. 1d–e). The BDM model has six parameters: three population sizes ( , and ), one species split time ( ) and two migration rates ( for migration and for migration) while the unidirectional model has five parameters (no ). Here, is the mutation-scaled migration rate (i.e. when one time unit is the expected amount of time needed to accumulate one mutation per site), where is the proportion of immigrants in the recipient population B from A per generation (with time running forward) and is the expected number of migrants per generation.
We assigned gamma priors on θ and with prior means matching the true values: for all populations (mean 0.005), when the true divergence time is (mean 0.00125), when (mean 0.005), and when (mean 0.02). For the introgression model (Fig. 1b&c), both introgression probabilities ( and ) were assigned the uniform prior . The age of the root and the introgression time ( ) were assigned a uniform-Dirichlet prior (Yang and Rannala 2010, eq. 2), with . For the migration model (Fig. 1d&e), both migration rates ( ) were assigned the gamma prior with mean 2.
We used Bpp v4.8.2 (Flouri et al. 2020, 2023) to perform posterior inference. For each dataset, we ran two chains of MCMC, each with initial 64,000 iterations of burn-in followed by iterations of the main chain, recording samples every 100 iterations. Convergence was assessed by comparing posterior estimates from the two runs. Samples from convergent runs were combined to produce final posterior summaries and to calculate the Bayes factor. Each MCMC run took about 70 hours for the MSC-I model (Fig. 1b&c) and about 120 hours for the MSC-M model (Fig. 1d&e).
Unlike Smith and Hahn (2024), we did not use the locusrate and heredity options in Bpp (which allow for variation in the rate and ploidy among loci), to avoid computational issues from fitting too many parameters in the model when data were from only two species (Fig. 1a). We reanalyzed all datasets generated by Smith and Hahn (2024) (assuming the high recombination rate of with different selection schemes) under the BDI model using the prior and MCMC setting as above to confirm that these two options did not impact their results. There were 660 datasets in total (5 selection schemes, 3 divergence times, 20 replicates, with three of the five selection schemes having three proportions of loci under selection).
Calculation of the Bayes factor for testing gene flow under four different MSC models of gene flow
A false positive occurs when the Bayes factor in support of gene flow so that the null hypothesis of no gene flow ( ) is strongly rejected. The false positive rate is thus estimated by the proportion of simulated replicate datasets in which . We calculated for each simulated dataset using the Savage–Dickey density ratio (Dickey 1971; Ji et al. 2023).
We note that the test comparing the null hypothesis of no introgression ( , Fig. 1a) against the alternative hypothesis of BDI ( , Fig. 1b) has a number of nonstandard features; see Yang et al. (2025) for details. Here, we provide an overview. In regular cases, should correspond to one point (called the ‘null point’) in the parameter space for . However, here in the test of MSC with no gene flow (Fig. 1a) against the BDI model (Fig. 1b), the null point is not one point but instead consists of five lines or planes in the parameter space of (Table 1(i)). For example, while the line specified by and with introgression time being indeterminate represents the biological scenario of no gene flow, the line with indeterminate also represents no gene flow. We define a “null region” ( ) around the null point in the parameter space of , within which introgression is negligible, using two small cutoff values (for ) and (for ),
(Yang et al. 2025). is then given by the Savage-Dickey density ratio, approximated by
that is, the ratio of the prior and posterior probabilities that the parameters are in the null region. We used and 0.01, to confirm that the result is not sensitive to . As we used the uniform prior on and β, the prior probability is simply , while the posterior probability was calculated as the proportion of the posterior MCMC samples in which equation (3) holds.
For the UDI model (Fig. 1c, Table 1(ii)), the null region is defined as
Then , with and with estimated from the posterior MCMC samples.
We also conducted the Bayesian test of gene flow under the continuous migration models (BDM and UDM, Fig. 1d&e) (Ji et al. 2023). In case of BDM, the hypotheses are and and and . We then define the null region as
In the case of UDM model (Fig. 1e), the null region is
We used , and 0.1486, corresponding to the tail probabilities , 0.005 and 0.01, where is the cumulative distribution function (CDF) of the gamma prior . The prior probability is then for the BDM model and α for the UDM model (Fig. 1d&e). For example, for the test under BDM, , with estimated by the proportion of MCMC samples in which and .
We note that Smith and Hahn (2024) calculated the false positive rate for Bpp using the 95% HPD CI, with a false positive if the CIs of or did not include 0. It may be possible to generate useful test results using the HPD CI, but care is needed concerning the prior on (Ji et al. 2023). The prior has the 95% CI 0.025–0.975 so that the test based on the HPD CI has a false positive rate of 100% when the data are uninformative.
Simulation to test gene flow between nonsister species
We simulated data under the MSC model using a phylogeny for three species (with an outgroup) (Fig. 5a), using different recombination rates but assuming no gene flow. Each dataset consists of 500 independent loci, with 31 sequences per locus (10 for each ingroup species and 1 for the outgroup), and with the sequence length to be 500 sites. The number of replicates was 20 at each recombination rate. Other settings are the same as in the case of two species (Fig. 2a).
Each dataset was analyzed under the model of Fig. 5b with (or ) introgression to test for gene flow. The null hypothesis of no introgression (Fig. 5a) is represented by the parameter of interest taking the null value in the alternative hypothesis (Fig. 5b). While the null value is at the boundary of the parameter space for , and furthermore when , the introgression time is unidentifiable. Those irregularities do not pose a problem to the use of the Savage-Dickey density ratio to compute the Bayes factor (Ji et al. 2023). We define the null region as . Then , with estimated from the posterior MCMC sample.
Supplementary Material
msaf327_Supplementary_Data
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Adams RH, Schield DR, Card DC, Castoe TA. Assessing the impacts of positive selection on coalescent-based species tree estimation and species delimitation. Syst Biol. 2018:67:1076–1090. 10.1093/sysbio/syy 034.29757422 · doi ↗ · pubmed ↗
- 2Berger J, Liseo B, Wolpert R. Integrated likelihood methods for eliminating nuisance parameters. Stat Sci. 1999:14:1–28. 10.1214/ss/1009211803. · doi ↗
- 3Blischak PD, Chifman J, Wolfe AD, Kubatko LS. Hy De: a Python package for genome-scale hybridization detection. Syst Biol. 2018:67:821–829. 10.1093/sysbio/syy 023.29562307 PMC 6454532 · doi ↗ · pubmed ↗
- 4Charlesworth B, Jensen JD. Effects of selection at linked sites on patterns of genetic variability. Annu Rev Ecol Evol Syst. 2021:52:177–197. 10.1146/ecolsys.2021.52.issue-1.37089401 PMC 10120885 · doi ↗ · pubmed ↗
- 5Charlesworth B, Morgan MT, Charlesworth D. The effect of deleterious mutations on neutral molecular variation. Genetics. 1993:134:1289–1303. 10.1093/genetics/134.4.1289.8375663 PMC 1205596 · doi ↗ · pubmed ↗
- 6Cruickshank TE, Hahn MW. Reanalysis suggests that genomic islands of speciation are due to reduced diversity, not reduced gene flow. Mol Ecol. 2014:23:3133–3157. 10.1111/mec.2014.23.issue-13.24845075 · doi ↗ · pubmed ↗
- 7Cutter AD, Payseur BA. Genomic signatures of selection at linked sites: unifying the disparity among species. Nat Rev Genet. 2013:14:262–274. 10.1038/nrg 3425.23478346 PMC 4066956 · doi ↗ · pubmed ↗
- 8Dalquen D, Zhu T, Yang Z. Maximum likelihood implementation of an isolation-with-migration model for three species. Syst Biol. 2017:66:379–398. 10.1093/sysbio/syw 063.27486180 · doi ↗ · pubmed ↗
