Viral Simulation Reveals Overestimation Bias in Within-Host Phylodynamic Migration Rate Estimates Under Selection

Nicolas Ochsner; Judith Bouman; Timothy Vaughan; Tanja Stadler; Sebastian Bonhoeffer; Roland Regoes

PMC · DOI:10.1093/molbev/msag014·February 2, 2026

Viral Simulation Reveals Overestimation Bias in Within-Host Phylodynamic Migration Rate Estimates Under Selection

Nicolas Ochsner, Judith Bouman, Timothy Vaughan, Tanja Stadler, Sebastian Bonhoeffer, Roland Regoes

PDF

Open Access

TL;DR

This study shows that selection pressures can cause overestimation of migration rates in viral phylodynamic analyses within a host.

Contribution

A novel agent-based simulation tool, virolution, was developed to assess the impact of selection on migration rate estimates.

Findings

01

Migration rates are overestimated under selection using BEAST2 models.

02

Phylogeographic methods assuming neutrality are biased in selective regimes.

03

Selection regimes significantly affect phylodynamic migration rate estimates.

Abstract

Phylodynamic methods are widely used to infer the population dynamics of viruses between and within hosts. For HIV-1, these methods have been used to estimate migration rates between different anatomical compartments within a host. These methods typically assume that the genomic regions used for reconstruction are evolving without selective pressure, even though other parts of the viral genome are known to experience strong selection. In this study, we investigate how selection affects phylodynamic migration rate estimates. To this end, we developed a novel agent-based simulation tool, virolution, to simulate the evolution of virus within two anatomical compartments of a host. Using this tool, we generated viral sequences and genealogies assuming both, neutral evolution and selection governed by an empirically-supported distribution of fitness effects that is concordant in both…

Linked entities

Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.

Species1

Human immunodeficiency virus 1(no rank)

Figures3

Click any figure to enlarge with its caption.

Schematic representation of the simulation. (a) shows the simulation of a single generation of viral evolution. The simulated viruses mutate, replicates (based on the reproductive fitness), and is sub-sampled to a maximum population size. (b) evolution and migration in the two compartments (blue and orange). The viral population (white circles) is simulated independently in both compartments as in (a) before migration occurs. (See Methods for more details of the simulation.)

Selection Bias in Phylogenetic Estimates of HIV-1 Migration Rates Using Relaxed Clock Models. Average mean migration rate estimate with DTA and MASCOT in BEAST2. For each migration rate, 50 simulations were conducted, maintaining a constant population of 100 virions in each compartment, with a mutation rate of 2.16×10−5 mutations bp−1 generation−1, and spanning a total of 1,000 generations. All sequences were sampled and analyzed in BEAST2 with a relaxed clock model. The left figure illustrates the inference while sampling trees, while the right figure displays the migration rate estimates based on the true–simulated–genealogy. The lighter colored bands illustrate the 95% confidence interval.

Increased Overestimation of HIV-1 Migration Rates with Increased Mutation Rates with Relaxed Clock. Average mean migration rate estimate with DTA and MASCOT in BEAST2. For each migration rate, 50 simulations were conducted, maintaining a constant population of 100 virions in each compartment, with a mutation rate of 2.16×10−4 mutations bp−1 generation−1, and spanning a total of 1,000 generations. All sequences were sampled and analyzed in BEAST2 with a relaxed clock model. The left figure illustrates the inference while sampling trees, while the right figure displays the migration rate estimates based on the true–simulated–genealogy. The lighter colored bands illustrate the 95% confidence interval.

Tables1

Table 1. Probability of Overestimation of Migration Rates with Selection using the Relaxed Clock Model.

				migration rate	0e+00	1e-03	2e-03	3e-03	4e-03	5e-03	6e-03	7e-03	8e-03	9e-03	1e-02
method	tree	population size	n samples	mutation rate bp⁻¹ generation⁻¹
DTA	reconstructed	100	200	2.16e-5	0.69	0.67	0.64	0.62	0.67	0.62	0.64	0.62	0.63	0.59	0.66
			200	2.16e-4	0.71	0.76	0.71	0.74	0.75	0.70	0.73	0.71	0.65	0.72	0.67
			100	2.16e-5	0.70	0.67	0.65	0.65	0.66	0.62	0.62	0.60	0.61	0.61	0.66
			100	2.16e-4	0.71	0.76	0.74	0.73	0.74	0.70	0.73	0.71	0.66	0.69	0.63
			50	2.16e-5	0.72	0.68	0.67	0.67	0.69	0.65	0.66	0.61	0.60	0.59	0.65
			50	2.16e-4	0.70	0.77	0.71	0.73	0.72	0.67	0.70	0.66	0.63	0.61	0.63
		1,000	200	2.16e-5	0.76	0.75	0.75	0.80	0.75	0.77	0.80	0.81	0.74	0.76	0.72
		1,000	200	2.16e-4	0.87	0.82	0.85	0.86	0.82	0.89	0.86	0.82	0.83	0.84	0.83
	true	100	200	2.16e-5	0.53	0.57	0.56	0.50	0.54	0.49	0.54	0.49	0.56	0.46	0.52
			200	2.16e-4	0.59	0.65	0.58	0.60	0.58	0.56	0.56	0.55	0.48	0.54	0.50
			100	2.16e-5	0.49	0.54	0.52	0.44	0.57	0.48	0.46	0.43	0.44	0.38	0.44
			100	2.16e-4	0.56	0.69	0.60	0.55	0.59	0.61	0.65	0.59	0.54	0.60	0.52
			50	2.16e-5	0.54	0.51	0.50	0.44	0.58	0.53	0.56	0.56	0.46	0.42	0.48
			50	2.16e-4	0.49	0.70	0.59	0.65	0.63	0.58	0.64	0.58	0.54	0.58	0.53
		1,000	200	2.16e-5	0.51	0.55	0.51	0.53	0.56	0.48	0.57	0.45	0.46	0.41	0.43
		1,000	200	2.16e-4	0.63	0.66	0.66	0.68	0.63	0.75	0.66	0.59	0.65	0.57	0.58
MASCOT	reconstructed	100	200	2.16e-5	0.74	0.71	0.67	0.64	0.70	0.65	0.65	0.64	0.65	0.60	0.68
			200	2.16e-4	0.74	0.77	0.73	0.75	0.76	0.71	0.74	0.71	0.67	0.74	0.68
			100	2.16e-5	0.77	0.74	0.67	0.66	0.72	0.61	0.66	0.65	0.67	0.64	0.69
			100	2.16e-4	0.75	0.79	0.77	0.74	0.76	0.72	0.73	0.71	0.68	0.70	0.66
			50	2.16e-5	0.80	0.75	0.74	0.67	0.73	0.68	0.67	0.69	0.64	0.62	0.73
			50	2.16e-4	0.77	0.81	0.76	0.73	0.77	0.70	0.70	0.74	0.66	0.72	0.62
		1,000	200	2.16e-5	0.76	0.75	0.76	0.81	0.76	0.78	0.81	0.82	0.75	0.77	0.73
		1,000	200	2.16e-4	0.87	0.82	0.85	0.86	0.82	0.89	0.87	0.83	0.83	0.85	0.84
	true	100	200	2.16e-5	0.74	0.70	0.67	0.58	0.63	0.59	0.62	0.58	0.65	0.53	0.62
			200	2.16e-4	0.74	0.77	0.74	0.74	0.75	0.71	0.74	0.71	0.67	0.75	0.69
			100	2.16e-5	0.78	0.70	0.66	0.60	0.59	0.51	0.55	0.54	0.61	0.54	0.61
			100	2.16e-4	0.74	0.73	0.69	0.67	0.69	0.64	0.66	0.58	0.67	0.69	0.60
			50	2.16e-5	0.80	0.69	0.67	0.58	0.61	0.59	0.52	0.55	0.58	0.53	0.53
			50	2.16e-4	0.77	0.75	0.70	0.65	0.66	0.65	0.62	0.66	0.67	0.59	0.60
		1,000	200	2.16e-5	0.72	0.61	0.67	0.65	0.67	0.67	0.66	0.59	0.54	0.51	0.37
		1,000	200	2.16e-4	0.83	0.79	0.78	0.83	0.72	0.86	0.79	0.71	0.69	0.70	0.66

Equations3

Funding1

—Swiss National Science Foundation10.13039/501100001711

Keywords

PhylodynamicsSimulationSelectionMigrationVirus Evolution

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsHIV Research and Treatment · Evolution and Genetic Dynamics · Genomics and Phylogenetic Studies

Full text

Introduction

Phylodynamic methods have transformed pathogen research by enabling the estimation of population dynamic parameters from genetic data alone. These methods leverage the principle that phylogenetic tree topology is shaped by the underlying population dynamics (Grenfell et al. 2004; Volz et al. 2013). Phylodynamic approaches simultaneously estimate the population dynamical parameters and the phylogeny from viral sequences. The parameters commonly inferred include population sizes and their change over time (Strimmer and Pybus 2001; Ewing et al. 2004; Volz et al. 2009; Stadler et al. 2013; Popinga et al. 2015; Müller et al. 2025), the basic reproduction number, $[eqn]$ , characterizing an outbreak (Fraser et al. 2009; Rasmussen et al. 2011; Stadler et al. 2014; Vaughan et al. 2024), and the origin of an epidemic as well as the transmission rates between geographic locations (Lemey et al. 2009, 2010; Faria et al. 2014; Vaughan et al. 2014; Müller et al. 2018). In addition to their application to epidemics, phylodynamic methods have also been used to quantify the within-host dynamics of virus, such as HIV-1 (Lorenzo-Redondo et al. 2016).

However, despite their widespread application, the reliability of phylodynamic inference under realistically simulated scenarios has rarely been assessed (see Ratmann et al. 2017 for a noteworthy exception).

One application of phylodynamic methods focuses on the migration of pathogens between different subpopulations (Suchard and Rambaut 2009). For example, Lemey et al. (2009) used phylodynamic inference to study the spatial dispersion patterns of Avian influenza A-H5N1 in Eurasia (Lemey et al. 2009).

These phylogeographic approaches can also be applied to investigate the spread and differentiation of viruses between the anatomical compartments within a single infected host (Poss et al. 1998; Salemi and Rife 2016; Bons and Regoes 2018). For instance, the brain constitutes a sanctuary site, where the blood-brain barrier limits the exchange of viruses between anatomical compartments (Nickle et al. 2003; Zárate et al. 2007; Salemi and Rife 2016).

Beyond the brain, other tissues such as the genital tract (Poss et al. 1998), the spleen, the intestinal mucosa, CD4 $[eqn]$ T cells and follicular dendritic cells (Salemi and Rife 2016) have been identified as compartments between which exchange of viruses can be limited. The limited exchange between compartments can lead to genetic differentiation between the viral populations because of local adaptation or genetic drift. To study the exchange/migration of viruses between different anatomical compartments a new field of within-host phylogeographic analysis has emerged, applying phylodynamic methods (Ewing et al. 2004; Cybis et al. 2013; Chaillon et al. 2014; Lorenzo-Redondo et al. 2016). This new field is sometimes referred to as phyloanatomy (Shirreff et al. 2013; Lorenzo-Redondo et al. 2016; Salemi and Rife 2016; Bons and Regoes 2018).

To analyze within-host migration dynamics, phylodynamic methods rely on population genetic models that allow for subpopulations at various locations with migration between them. A variety of such models have been developed, including discrete traits with character mappings (Lemey et al. 2009), isolation-with-migration models (Hey 2010), structured coalescent models (Kingman 1982; Vaughan et al. 2014; Müller et al. 2017), and multi-type birth–death models (Kühnert et al. 2016). While each of these models has its own strengths and weaknesses, all of them impose assumptions on the underlying processes that shape the phylogeny. With few exceptions (e.g. Rasmussen and Stadler 2019), these models assume that both the population dynamic process and the evolutionary process are independent and neutral (unless specifically accounted for). In other words, the accumulation of genomic differences only depends on the time since divergence and the mutation rate.

The inference of these models is typically performed using Bayesian inference. A popular software framework that implements Bayesian inference for phylodynamic methods is BEAST2 (Bouckaert et al. 2019). This framework samples many possible phylogenetic histories and simultaneously estimates the posterior probability of tree and model parameters. While this approach allows the estimation of complex models on limited data, it also scales poorly with the number of sequences. Alternative approaches based on maximum likelihood estimation (MLE) exist, but rely on effective heuristics to maximize the likelihood of a timed tree (Sagulenko et al. 2018; Prillo et al. 2023), which can then be used to estimate the parameters of a population dynamic model. While these methods are generally faster than Bayesian inference, they may also miss important features of the underlying process.

In contrast to the assumptions, the dynamics of many viruses are subject to strong selection. For instance, in SIV infections Cytotoxic T cells (CTL) have been shown to exert very strong selection pressures (Fernandez et al. 2005; Mandl et al. 2007). HIV in humans is under weaker, but still strong, selection pressure from CTL (Asquith et al. 2006). Additionally, there is evidence for purifying and positive selection (Edwards et al. 2006; Zanini et al. 2017). This is further supported by evidence for parallel evolution in HIV-infected individuals (Keele et al. 2008; Bertels et al. 2021) and even in cell culture (Bertels et al. 2019) in the absence of the immune system. The in-vivo HIV sequence data of Keele et al. (2008) and its patterns of parallel evolution (Bertels et al. 2021) have been used to infer the mutational fitness effect distribution (MFED) of the env gene. Specifically, they found that 4.5 TextHighlightpercent of mutations are beneficial (Bons et al. 2018). Thus, HIV dynamics is subject to strong purifying and diversifying selection.

Even though non-neutral evolution is common for HIV-1 and other pathogens, it is not accounted for in phylodynamic methods that are applied to within-host dynamics (Roje 2014). Some studies do mention that selection could potentially affect their results, but argue that it is either compensated for by using several different genes to build the genealogy or captured by the effective populations size (Bedford et al. 2011). Roje (2014) and Castoe et al. (2009) illustrate that including multiple genes into the analysis can help, but that phylogenetic techniques are, despite this multiplicity, vulnerable to selection (Castoe et al. 2009; Roje 2014). Another method that is frequently used to counterbalance the effect of selection allows for varying mutation rates along tree branches, i.e. an uncorrelated log-normal relaxed molecular clock model (Drummond et al. 2006; Chaillon et al. 2014). However, we are not aware of any studies that tested to what extent this model actually accounts for the effects of non-neutral evolution.

In this study, we test the effect of selection on phylodynamic inference of two popular one-step estimators of the migration rate between two within-host compartments. For this purpose, we simulate an infection, mutation and replication cycle within a host using a detailed agent-based simulation. The simulation relies on the mutational fitness effects distribution (MFED) estimated from the HIV-1 sequence data of Keele et al. (2008) (Keele et al. 2008; Lee et al. 2009; Bons et al. 2018) to account for the effects of selection. We simulate the evolution of a viral genome in two compartments with migration. We assume that the MFEDs in the two compartments are identical. The sequences and genealogies obtained from the simulation are then analyzed with BEAST2 using DTA as in Lemey et al. (2009) and MASCOT as in Müller et al. (2018). Our results indicate that selection leads to an overestimation of migration rates and that the relaxed molecular clock model—which has been suggested as a potential solution to account for the effects of selection (Drummond et al. 2006)—is particularly susceptible to this effect.

Results

Simulation of within-host viral evolution in two compartments with selection

We simulated the evolution of viral populations using a newly-developed agent-based simulation tool (virolution). The simulation features viral genomes (agents) that replicate and mutate in cells in two distinct anatomical compartments as illustrated in Fig. 1. The genomes are assumed to migrate between the two compartments at the same rates.

Schematic representation of the simulation. (a) shows the simulation of a single generation of viral evolution. The simulated viruses mutate, replicates (based on the reproductive fitness), and is sub-sampled to a maximum population size. (b) evolution and migration in the two compartments (blue and orange). The viral population (white circles) is simulated independently in both compartments as in (a) before migration occurs. (See Methods for more details of the simulation.)

We considered two selection scenarios: neutral evolution as the baseline, and a selection scenario in which the replicative fitness differs between genotypes. The fitness differences are calculated on the basis of an empirically-supported mutational fitness effects distribution (MFED) for HIV-1 (Bons et al. 2018). We assumed that the MFED is identical in both compartments.

The output of these simulations are the sequences and genealogy of the viral genomes. In our simulations, we assumed a population size of 100 or 1,000 virions per compartment and varied the migration rate from 0 to 0.01 (which corresponds to a migration of up to 1% of the virions per generation). Given the computational constraints of inferring large trees in BEAST2, we used 200 samples from each simulation, independent of the simulated population size. We simulated the evolution of the virus for 1,000 generations. A detailed description of the simulation can be found in the Methods.

We then used DTA and MASCOT in BEAST2 to estimate the migration rate for all simulated scenarios from a fixed number of samples. Each method was applied in two ways: once with the true genealogy from the simulation and once with the sequence data, where the genealogy was reconstructed during the analysis. Finally, we compared these estimates with the true migration rate, i.e. the migration rate that has been used in the simulation. We expected a bias in the migration rate estimates with selection caused by errors in the phylogenetic reconstruction.

Selection-driven overestimation of migration rates in phylogenetic reconstructions

In both phylogeographic methods, DTA and MASCOT, we find evidence of structural overestimation biases in the inferred migration rates. In Fig. 2, we show the inferred migration rates based on the phylogenetic reconstruction and the true genealogy. The estimates are inferred from simulations with a population size of 100 virions per compartment and a mutation rate of 2.16×10^−5^ mutations bp^−1^ generation^−1^ (estimated for HIV-1). In many of the scenarios with and without selection, the migration rates are significantly overestimated (see Table S2). We therefore distinguish between two types of biases: biases that affect both regimes (with / without selection) and biases that only affect the selection regime.

Selection Bias in Phylogenetic Estimates of HIV-1 Migration Rates Using Relaxed Clock Models. Average mean migration rate estimate with DTA and MASCOT in BEAST2. For each migration rate, 50 simulations were conducted, maintaining a constant population of 100 virions in each compartment, with a mutation rate of 2.16×10−5 mutations bp−1 generation−1, and spanning a total of 1,000 generations. All sequences were sampled and analyzed in BEAST2 with a relaxed clock model. The left figure illustrates the inference while sampling trees, while the right figure displays the migration rate estimates based on the true–simulated–genealogy. The lighter colored bands illustrate the 95% confidence interval.

In both regimes the compartments are initiated with the same wildtype sequence. This choice is motivated by the initial dynamics of an HIV-1 infection, where the majority of infections are initiated by a single genotype (Keele et al. 2008). Because estimation of migration rates relies on a joint genealogy annotated with compartment states, a single state change is required to join the otherwise independent genealogies, hence causing a structural bias in absence of migration. This effect—caused by a discrepancy between the inference and simulation model—does not limit itself to the absence of migration, but maintains a consistent bias through all migration rates (see Fig. 2).

Additionally, we find that the estimates based on the phylogenetic reconstruction are increasingly overestimated in the presence of selection (see Fig. 2). In Table 1, we show the average posterior probability of overestimation caused by selection for each migration rate for 50 simulations. The posterior probability of overestimation caused by selection is calculated as the proportion of samples where the inferred migration rate of a simulation with selection is higher than a sample from a neutral simulation (see Fig. S27 for the distributions). Values higher than 0.5 indicate that the migration rate is more likely to be overestimated in the presence of selection than estimates without selection.

We find that for the simulation parameters used in Fig. 2, selection introduces a significant overestimation bias to the migration rate estimates based on the phylogenetic reconstruction, which is reflected in the average posterior probability of overestimation averaging at 0.642 for DTA and 0.666 for MASCOT.

While this effect does not limit itself to the estimates based on the phylogenetic reconstruction, it is more pronounced in these compared to the estimates based on the true genealogy, where the average posterior probability of overestimation is 0.524 for DTA and 0.628 for MASCOT.

Indeed, during phylogenetic reconstruction, the evolutionary model assumes neutral evolution. Selection may compromise the phylogenetic reconstruction either by skewing the reconstructed topology when mutations occur independently in both compartments, or by purifying selection which leads to removal of deleterious mutations that can reduce inferred branch lengths. Both effects may contribute to the overestimation of migration rates with phylogenetic reconstruction.

Increased bias with selection when assuming relaxed clock models

Relaxed clock models have been suggested to be more robust and potentially subsume the effect of selection (Drummond et al. 2006; Chaillon et al. 2014). Here, we cannot confirm this. The migration rate estimates shown in Fig. 2 have been obtained using a relaxed clock model. In fact, when using a strict clock model, we find less evidence for structural overestimation biases for the same simulated data as the posterior probability of overestimation is lower for both DTA (0.566) and MASCOT (0.631) (see Figs. S1, S27, S29, and Table S1). Evidently, the flexibility of the relaxed clock model does not subsume, but instead reinforce the effect of selection on the migration rates.

Elevated mutation rates amplify overestimation bias in migration inference with selection

We also investigated the robustness of the inference to changes in mutation rate. In Fig. 3, we show the results of simulations with a 10-fold higher than empirical estimates. Without selection, there is no significant difference in the migration rate estimates to the simulation with lower mutation rate (2.16×10^−5^ mutations bp^−1^ generation^−1^). However, with selection, we observe that the overestimation bias increases with both methods with the relaxed clock model.

Increased Overestimation of HIV-1 Migration Rates with Increased Mutation Rates with Relaxed Clock. Average mean migration rate estimate with DTA and MASCOT in BEAST2. For each migration rate, 50 simulations were conducted, maintaining a constant population of 100 virions in each compartment, with a mutation rate of 2.16×10−4 mutations bp−1 generation−1, and spanning a total of 1,000 generations. All sequences were sampled and analyzed in BEAST2 with a relaxed clock model. The left figure illustrates the inference while sampling trees, while the right figure displays the migration rate estimates based on the true–simulated–genealogy. The lighter colored bands illustrate the 95% confidence interval.

Here, both the inference including the phylogenetic reconstructions and the true genealogy are affected. Specifically, we start to see a strong bias with MASCOT even when the tree is fixed to the true genealogy. This observation suggests that selection directly affects the population dynamic inference beyond the phylogenetic reconstruction.

It should be noted that, with increased mutation rate, the estimates obtained with the strict clock do not show the same increase in overestimation bias due to selection with the increased mutation rate (Fig. S2). However, using the strict clock model, estimates will show a general trend towards underestimation with higher migration rates, which is not observed with the relaxed clock model (see Table S3).

Increased population size further decreases the accuracy of phylodynamic inference

Given that the overestimation bias was strongly affected by the mutation rate, it is expected that an increase in population size would have a similar effect on the estimates of migration rates. Figures S3 and S4 show the results of simulations with a population size of 1,000 virions per compartment. When compared to the migration rate estimates for simulations with a smaller population size of 100 virions per compartment, both methods performed significantly worse.

Figure S3 shows inference on simulations with a mutation rate of 2.16×10^−5^ mutations bp^−1^ generation^−1^ and a population size of 1,000 virions per compartment. In this scenario, there is a similar supply of mutations as in the simulations with a smaller population size, and higher mutation rate of 2.16×10^−4^ mutations bp^−1^ generation^−1^. Therefore, one might expect the estimates to show similar biases. However, this is not the case. In fact, we observe that the estimates based on the phylogenetic reconstruction perform better than the estimates based on the true genealogy for both, DTA and MASCOT (Figs. S3 and S5).

This finding contrasts sharply with the results from simulations with smaller population size of 100 virions per compartment, where estimates based on the true genealogy outperformed those based on the phylogenetic reconstruction (see Figs. 2 and 3, and Table S2). This suggests that the increased population size and the ensuing need for sub-sampling are major factors contributing to the decreased accuracy of the estimates. To assess the role of sub-sampling, we tested the inference performance on 50 sampled sequences from the simulations with a population size of 100 virions per compartment. Again, sub-sampling increased overall overestimation bias with inference based on the true genealogy showing worse performance than inference based on the phylogenetic reconstruction (see Figs. S7–S10).

Indeed, sub-sampling can reduce the representativeness of the sample and affect the inferred phylogenetic relationships, leading to increased uncertainty in the estimates. Specifically, the decreased performance of both methods on the true genealogy compared to the reconstructed phylogeny may be explained by the sampling bias introduced when sub-sampling the true genealogy. In this case, when phylogenetic uncertainty is not accounted for, branch length may not be representative of the underlying evolutionary process decreasing accuracy of the derived estimates. This could be supporting evidence, that the ability to account for phylogenetic uncertainty is crucial for accurate phylodynamic inference.

Overall, the conflicting results and variety in biases between different simulation parameters, inference methods, and clock models, do not permit conclusive explanations for all observed biases and more detailed investigations are necessary to understand when phylodynamic inference results are reliable.

Discussion

In this study, we have investigated the performance of phylogeographic methods, DTA and MASCOT, to estimate migration rates between anatomical compartments with realistic selection effects. Although phylodynamic methods have been used to study the compartmentalization of HIV-1 (Chaillon et al. 2014; Lorenzo-Redondo et al. 2016), a systematic evaluation of the methods specific to the task of estimating within-host migration rates has not been performed. There is a discrepancy between the assumptions of the inference models and characteristics of the within-host dynamic of HIV: while the inference models, especially the phylogenetic reconstruction on which they rely, assume neutral evolution, HIV-1 is subject to strong selection pressures within its host. To study how this discrepancy affects phylodynamic analysis, we have created a simulation of HIV-1 diversification in two within-host compartments, in which we assume concordant selection. Our results show that both methods are sensitive to selection and parallel evolution, leading to an overestimation of the migration rates.

In scenarios in which migration occurs, we found that migration rates are overestimated under selection. This can be explained by two mechanisms: first, mutations may independently arise and fixate in both compartments, leading to homoplastic mutations that are wrongly interpreted as migration events; and second, selection may reduce the available genetic diversity in both compartments, which reduces branch lengths of reconstructed phylogenies (Williamson and Orive 2002; Koelle and Rasmussen 2025). While both of these effects are expected to lead to an overestimation of the migration rates, we also found that the relaxed clock model (Drummond et al. 2006), which is frequently used to account for selection effects (Chaillon et al. 2014), does not decrease the bias, but instead increases it for both methods, DTA and MASCOT. This model allows variable clock rates across branches, which may indicate that the overestimation largely stems from deleterious mutations shortening the reconstructed branches. We consider this overestimation bias that is caused by selection to be a major concern for phylodynamic analysis. In our analysis, we have found an overestimation bias that is caused by the specific implementation of selection: we assumed identical directed selection in both compartments. However, other forms of selection—e.g. divergent evolution—may also lead to biases in the estimation of migration rates, not only overestimation biases (Bons and Regoes 2018).

Moreover, we have observed a consistent overestimation bias for migration rates at zero, i.e. in scenarios in which migration does not occur. This bias is caused by a discrepancy between the model used for the inference and the one we used for the simulation. In our simulation, the initial population in both compartments is identical, a modeling choice motivated by the initial dynamics of HIV-1 infection where the majority of infections are initiated by a single genotype (Keele et al. 2008). Here, even in absence of migration, the two compartments initiated with the same ancestor will be connected through the genealogy. This cannot be accurately modeled by any of the two methods that we have tested in this study, however may be addressed by other population dynamic models, such as Isolation-with-Migration models (Hey 2010). We do not consider this to be a major concern, as the methods are designed to estimate migration rates, and not to detect the absence of migration. However, it is important to note, that even when data does not violate the assumption of neutral evolution, such effects can still create a significant structural bias on inference of migration rates.

In our phylodynamic analyses of the simulated data, rather than estimating all parameters, we have fixed all population biological and evolutionary parameters, except for the migration rate. We chose this approach to obtain tight migration rate estimates and to isolate the effect of selection and phylogenetic reconstruction on the estimate. Interestingly, we found that restricting the inference to the true genealogy does not necessarily improve the performance of the methods. In Section 2, both methods fail to recover the true migration rate if we fix the genealogy, while maintaining better performance when the genealogy is inferred. This suggests that accounting for phylogenetic uncertainty may be necessary to reduce biases in sampling to translate into the derived estimates, but also indicates that inference methods based on maximum likelihood estimation may be better suited when population sizes are large. And while not having to know the genealogy a priori for reliable migration estimation may sound like good news, it adds a note of caution to how prior knowledge affects phylodynamic inference, especially when large data sets require sub-sampling.

In this study, we assessed phylodynamic inference using a detailed simulation model that explicitly incorporates details on the replication, diversification, and selection of viruses and outputs genetic sequences. This approach is rarely adopted. More commonly, phylodynamic methods are tested by simulating trees using the exact same models that are inverted for the inference (see for example Stadler 2011; Vaughan and Drummond 2013; Louca and Doebeli 2018). Our study clearly shows that the common approach falls short in capturing potential structural biases that arise from more realistic settings such as non-neutral evolution, and simulations that depart from the inference assumptions in meaningful ways are needed. To our knowledge, only one study has taken a similar approach based on a detailed simulation of an HIV-1 epidemic to evaluate phylodynamic methods (Ratmann et al. 2017), also identifying significant estimating biases.

Phylodynamic methods that are based on a posterior distribution largely rely on Markov chain Monte Carlo (MCMC) algorithms to explore the parameter space. While such methods generally allows us to characterize complex, even non-convex spaces (such as tree space), it requires relatively large computation times to achieve convergence. To allow an analysis with DTA and MASCOT in BEAST2, we had to restrict the number viral sequences we used for inference. For example, when approaching larger population sizes ( $[eqn]$ ), we had to resort to sub-sampling the data. The performance of the phylodynamic methods we have tested degraded as we increased the population sizes beyond 1,000 until eventually, we were unable to recover the migration rates. Additionally, to achieve convergence, we were forced to limit the biological realism of the simulation: we chose to limit the number of compartments, the maximum population size, and excluded processes such as coinfection and recombination. While the present study focused on the effect of directed selection on Bayesian phylodynamic inference, our simulation can integrate a wide range of complex, biologically relevant mechanisms. A thorough evaluation of how these mechanisms impact any phylodynamic inference is a promising avenue for future research.

In conclusion, we have established biases of phylodynamic inference in selective regimes that are amplified when sub-sampling the population. While we only tested a limited set of inference methods, the assumptions violated in our simulations are shared among a wide range of phylodynamic methods and therefore similar biases will likely be found across different approaches. Additionally, although we specifically focused on the estimation of migration rates between compartments within HIV-infected hosts to achieve an ecologically relevant simulation scenario, we argue that these results also apply in epidemiological setting in which selection is also known to operate. Our study illustrates the value of detailed, empirically-supported simulations that are not necessarily consistent with the assumptions of the inference method for its assessment.

Methods

Data simulation

Evolution and migration of virions is simulated using the tool “virolution”. “virolution” is a simulation tool written in the programming language Rust. It is designed to simulate large virus populations in a multitude of complex scenarios. Within its simulation loop, “virolution” will simulate the infection of host cells by virions, the replication and mutation of virions, and the sampling process that determines which virions will be present in the next generation. “virolution” is open-source and available on GitHub: https://github.com/sirno/virolution.

Here, we have used “virolution” to simulate the evolution within two compartments. The compartments start with identical populations of virions, which evolve largely independent of each other. The rate of replication of the virions is determined by the fitness associated with their genomic sequence. To keep the population size constant, the virions are sub-sampled at the end of each generation. We simulated migration between the compartments by symmetrically exchanging a fraction of the virions at each generation. When the number of virions migrating is not an integer, we use a Bernoulli-distributed variable to determine whether the remaining virion should migrate or not.

Figure S35 and S36 show the genetic distance between the two compartments for selected parameters and illustrate the compartmentalized evolution of the virions in the simulation under different conditions.

The simulations rely on an empirically-supported DFE derived from HIV-1 env sequences sampled early in infection Bons et al. (2018). For reasons of consistency with this DFE, we initiate each simulation with a randomly generated reference sequence of length $[eqn]$ nucleotides with nucleotide frequencies that agree with those of the HIV-1 env gene.

Mutation rates and substitution model

All simulations use the same GTR substitution model with a fixed substitution matrix. However, we ran simulations with two different mutation rates, one which resembles a typical mutation rate for HIV-1 (2.16×10^−5^ mutations bp^−1^ generation^−1^) and one that maintains a 10-fold higher mutation rate (2.16×10^−4^ mutations bp^−1^ generation^−1^). The slightly higher mutation rate, was intended to increase the evolutionary signal in the trees and data, and therefore potentially improve the predictive power of phylodynamic methods.

Selection based on Mutational Fitness Effects Distribution

Selection is simulated using variation in reproductive fitness of the virions. We simulate reproduction by sampling from a Poisson-distribution with $[eqn]$ , which is the basic reproduction rate of the wild-type virus. We distinguish between simulations with and without selection effects. When selection effects are enabled, we use a mutation fitness effects distribution (MFED) that has been estimated for HIV-1 in Bons et al. (2018). The MFED is a piece-wise log-normal distribution, with $[eqn]$ lethal mutations, and otherwise log-normal distributed fitness values ( $[eqn]$ , $[eqn]$ ). From this distribution, we assign a specific fitness value to any possible mutation $[eqn]$ at position $[eqn]$ and nucleotide $[eqn]$ . If $[eqn]$ is the nucleotide of a genotype h and $[eqn]$ is the nucleotide of the reference at position i, then the fitness $[eqn]$ of a genotype h is defined as:

[eqn]

where

[eqn]

We sample the number of offspring for each genotype from a Poisson-distribution with $[eqn]$ , where $[eqn]$ is the basic reproduction rate of the wild-type virus. The variation in reproductive fitness with subsequent sub-sampling will cause selective pressure on the virions.

In this study, we have used the same assignment of fitness values to mutations for both compartments.

Fixed constant population sizes

We enforce fixed population sizes in each compartment by sub-sampling each compartment after reproduction. We simulate 100 and 1,000 virions per compartment.

Migration between compartments

After sub-sampling, virions can migrate between the two compartments at rates $[eqn]$ . We choose the number of migrants by multiplying the number of virions in the compartment by the migration rate. We exchange the integer value of this product and add the result of a Bernoulli-distributed variable based on the remainder.

Repeated runs to account for stochastic variation

For each combination of settings that has been described in this section, we run the simulation 50 times, leading to a total of 4400 independent simulations. Each simulation will produce a full sample of all sequences that are present after 1,000 generation, including the exact genealogy of the final sequences as they evolved during the simulation.

Phylodynamic analysis

To infer migration rates from the simulated data, we use DTA as described in Lemey et al. (2009) and MASCOT as described in Müller et al. (2018). For both analyses we fix the population size and mutation rate to the values that were used for simulation. Further, values of the GTR substitution model were fixed to the known rates that were used during the simulations. The strict clock was uniformly distributed and unbounded. The relaxed clock was exponentially distributed with a mean $[eqn]$ .

For DTA we set the non-zero rate prior to a Poisson-distribution with $[eqn]$ and the relative migration rates prior to a Gamma-distribution with $[eqn]$ and $[eqn]$ . For MASCOT we chose the per compartment effective population size equal to the number of simulated virions in each compartment and migration rates prior uniformly distributed between 0 and 1.

The inference was done based either on the sequences (where phylogenetic reconstruction is necessary) or the true genealogy. In both cases, we used 100 individuals from each compartment, which in the case of the larger population size of $[eqn]$ individuals per compartment were chosen randomly. In simulations without phylogenetic reconstruction we provided the true genealogy as a full timed tree without any type (i.e. compartment) annotations and removed tree operators that change branch lengths or topology while allowing operators that change type annotations.

All inference runs used a chain length of $[eqn]$ . The resulting ESS values all achieved values well above $[eqn]$ .

Evaluation of structured biases

We distinguish between two types of biases that we have observed in the inferred migration rates: (1) biases that are confounded by the simulation design and (2) biases that are related to selection effects.

To evaluate the first type of bias we compared the mean inferred migration rates of 50 simulations to the true migration rates. To this end, we calculated the posterior probability of overestimation for each posterior sample after an initial burn-in of 10 % of the samples, counting the number of samples that overestimate the true migration rate.

For the second type of bias, we compared the mean inferred migration rates between 50 simulations with and without selection effects. Here, we calculated the posterior probability of overestimation for pairs of posterior samples after an initial burn-in of 10 % of the samples. For each pair of samples, we counted the number of samples, where the sample with selection is larger than the sample without selection.

Additionally, error bars on the figures represent the 95% confidence interval of the mean inferred migration rates.

Distance between compartments

To quantify the genetic distance between the two compartments in the model, the following relation is used:

[eqn]

In this equation, $[eqn]$ stands for the frequency of nucleotide p at position k in compartment i and n is the sequence length. Practically, the distance is thus defined as the sum over all absolute differences in nucleotide frequencies over all locations in the genotype. The definition is from Bertels et al. (2021). The distance is calculated within the virolution simulation tool.

Supplementary Material

msag014_Supplementary_Data

Bibliography55

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Asquith B, Edwards CT, Lipsitch M, Mc Lean AR. Inefficient cytotoxic t lymphocyte–mediated killing of HIV-1–infected cells in vivo. P Lo S Biol. 2006:4:e 90. 10.1371/journal.pbio.0040090.16515366 PMC 1395353 · doi ↗ · pubmed ↗
2Bedford T, Cobey S, Pascual M. Strength and tempo of selection revealed in viral gene genealogies. BMC Evol Biol. 2011:11:220. 10.1186/1471-2148-11-220.21787390 PMC 3199772 · doi ↗ · pubmed ↗
3Bertels F, Leemann C, Metzner KJ, Regoes RR. Parallel evolution of HIV-1 in a long-term experiment. Mol Biol Evol. 2019:36:2400–2414. 10.1093/molbev/msz 155.31251344 PMC 6805227 · doi ↗ · pubmed ↗
4Bertels F, Metzner KJ, Regoes RR. Convergent evolution as an indicator for selection during acute HIV-1 infection. Peer Community J. 2021:1:e 4. 10.24072/pcjournal.6. · doi ↗
5Bons E, Bertels F, Regoes RR. Estimating the mutational fitness effects distribution during early HIV infection. Virus Evol. 2018:4:vey 029. 10.1093/ve/vey 029.30310682 PMC 6172364 · doi ↗ · pubmed ↗
6Bons E, Regoes RR. Virus dynamics and phyloanatomy: merging population dynamic and phylogenetic approaches. Immunol Rev. 2018:285:134–146. 10.1111/imr.2018.285.issue-1.30129202 · doi ↗ · pubmed ↗
7Bouckaert R et al BEAST 2.5: an advanced software platform for Bayesian evolutionary analysis. P Lo S Comput Biol. 2019:15:e 1006650. Publisher: Public Library of Science. 10.1371/journal.pcbi.1006650.30958812 PMC 6472827 · doi ↗ · pubmed ↗
8Castoe TA et al Evidence for an ancient adaptive episode of convergent molecular evolution. Proc Natl Acad Sci U S A. 2009:106:8986–8991. 10.1073/pnas.0900233106.19416880 PMC 2690048 · doi ↗ · pubmed ↗