The role of receptor binding and immunity in SARS-CoV-2 fitness landscape: A modeling study
Zhaojun Ding, Hsiang-Yu Yuan

TL;DR
This study models how SARS-CoV-2 evolves by analyzing receptor binding and immune escape in relation to population immunity.
Contribution
A novel model decomposes the effective reproduction number to explore how viral traits and immunity shape SARS-CoV-2 fitness.
Findings
SARS-CoV-2 fitness peaks correspond to variants of concern like alpha, delta, and omicron.
BA.2 achieved optimal fitness through high immune escape despite lower ACE2 binding than BA.1.1.
Population immunity and viral traits together create a rugged fitness landscape for SARS-CoV-2.
Abstract
Despite extensive research on SARS-CoV-2 ACE2 binding and transmissibility, their relationship concerning varying immunity remains unclear. SARS-CoV-2 receptor-binding domain (RBD) sequences from Italy in GISAID, combined with multiple deep mutational scanning data, were used to calculate ACE2 binding score and immune escape during the pandemic. We developed a COVID-19 transmission model that decomposed the effective reproduction number into three time-varying components: viral infectiousness (representing fitness), host susceptibility, and contact rate. After model fitting, a rugged fitness landscape, spanned by ACE2 binding score and virus-perceived effective immunity (adjusted for viral immune escape), was observed with peaks corresponding to individual variants of concern (VOCs) (alpha, delta, and omicron (BA.1∗ and BA.2∗)). Increasing effective immunity with reduced ACE2 binding…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSARS-CoV-2 and COVID-19 Research · vaccines and immunoinformatics approaches · COVID-19 epidemiological studies
Introduction
The COVID-19 pandemic was characterized by repeated infection waves, driven by SARS-CoV-2 variants with new and advantageous genomic mutations. These mutations enhanced fitness in variants of concern (VOCs) via a complex interplay between viral intrinsic traits and population immunity, shaped by vaccinations and prior infections.1^,^2 Therefore, understanding the SARS-CoV-2 fitness landscape can provide insights into evolutionary virus dynamics, knowing whether the virus undergoes gradual or punctuated evolution, which is important for predicting and preparing for future outbreaks.
The SARS-CoV-2 fitness is associated with the virus’s ability to bind to ACE2 receptors and evade immune defenses.1 Upon entering the human body, the virus uses its spike protein to bind to the ACE2 receptor on the surface of host cells. This interaction occurs primarily through the receptor-binding domain (RBD). Simultaneously, the host immune system produces neutralizing antibodies, which prevent the virus from binding to ACE2. However, SARS-CoV-2 can evade immune recognition by mutating amino acid residues within the RBD, a phenomenon known as immune escape.1^,^3 However, how ACE2 binding and immune escape interact to influence virus fitness under varying immunity levels among infected hosts remains unclear.
The accumulated mutations in the RBD of SARS-CoV-2 spike protein can confer selective advantages toward a higher fitness, resulting in antigenic drift.2 For example, the N501Y mutation in the alpha (B.1.1.7) sub-lineage may have altered spike protein binding affinity to ACE2, resulting in higher transmission rates.4^,^5 This mutation appeared in the omicron sub-lineages. Such evolutionary changes help sustain the virus’s presence in the population, facilitating COVID-19’s transition from a pandemic to an endemic state characterized by periodic surges.
The fitness landscape has been developed to represent the relationships between genetic variations and their corresponding fitness values, providing insights into the potential adaptive trajectories of populations.2^,^6 A rugged landscape occurs due to interactions between mutations (epistasis).6^,^7 Increasing immunity can also exert high selection pressure on viruses, contributing to antigenic drift.2^,^8 When a virus evolves through fitness valleys toward a new peak, it generates boom-and-bust patterns, likely resulting in repeated outbreaks.9 However, when SARS-CoV-2 evolutionary and transmission dynamics are intertwined, reconstructing the fitness landscape becomes challenging. For example, not only virus traits but also host immunity (e.g., vaccination) and host behaviors (e.g., social distancing measures and public awareness) can influence transmissibility (or effective reproduction number).10 Hence, quantifying viral fitness and finding its relationship with viral traits presents a challenge.
In this study, we aimed to assess the relationship between viral fitness and traits derived from deep mutational scanning (DMS) during the pandemic. Sequence and epidemiological data from Italy, a country with high viral incidence rates and high vaccine coverage, were used. To quantify and thereby separate intervention effects (i.e., vaccinations and non-pharmaceutical interventions (NPIs)) on virus fitness, we incorporated mobility and vaccination data in an infectious disease transmission model to calibrate fitness estimates (Figure 1). From simulated results, we constructed a fitness landscape after mapping the virus’s evolutionary trajectory traits (i.e., ACE2 additive sum of binding score and immune escape) over different periods across different immunity levels. This study was previously made available as a preprint version.11Figure 1. Study flow and model design(A) Schematic showing the extended SEIR model integrating multiple data sources to estimate the SARS-CoV-2 fitness landscape. Data inputs include mobility, immune escape, and vaccination coverage data. These data feed into the extended SEIR model, which was calibrated using reported case data. The model outputs include virus fitness combined with ACE2 binding data and effective immunity and show how these factors shape virus evolution patterns.(B) Detailed structure of the SEIR compartment model. The model partitions the population based on vaccination status and infection stage, highlighting transitions between susceptible (S), exposed (E), infectious (I), reported infectious cases (IR) and recovered (R) states, with an emphasis on the impact of vaccination doses on these dynamics. S_i, Ei, Ii, IRi, and Ri_ subscripts denote compartments with different vaccine doses.
Results
Recurrent COVID-19 surges in Italy are jointly driven by viral fitness, population immunity, and mobility
A large variety in daily incidence and mobility was observed (Figures 2A, 2B, and S2). Despite ongoing vaccination efforts (Figure 2C), viral outbreaks reoccurred and cumulative cases increased rapidly during the omicron period (Figure 2A). The effective reproduction number was likely influenced by both virus infectiousness and susceptibility. In our model, we considered susceptibility (determined by population immunity and immune escape) and contact rate (determined by mobility) as extrinsic factors while virus infectiousness as an intrinsic factor (i.e., viral fitness; see STAR Methods for the definition).Figure 2. Factors influencing daily COVID-19 incidences and virus fitness(A) Daily COVID-19 incidences in Italy. New daily COVID-19 cases in Italy are plotted on a log10 scale. VOC emergence is highlighted by vertical dashed lines, labeled as non-VOC, alpha, delta, and omicron strains.(B) Vaccination coverage in Italy from the first to the third vaccine doses.(C) Mobility index in Italy. Population movement levels relative to pre-pandemic levels (horizontal dashed line) are shown.
Temporal dynamics of virus ACE2 additive sum of binding score and immune escape
We observed distinct patterns between the two viral traits across main VOCs, including alpha, delta, and omicron (Figure 3A). ACE2 additive binding score decreased from a high level in alpha (B.1.1.7) to delta (B.1.617.2), with early omicron lineages (BA.1^∗^) showing binding scores comparable to the wild type (Figure 3B). Binding scores then slightly fluctuated among omicron sub-lineages, with BA.1.1 showing an increase relative to BA.1, and BA.1# showing a decrease. The emerging BA.2^∗^ lineages subsequently showed ACE2 binding scores between BA.1 and BA.1.1. On the other hand, immune escape showed a gradual rise from alpha to delta, followed by a sharp rise with the emergence of omicron, whose sub-lineages exhibited similar levels (Figure 3C).Figure 3. Comprehensive analysis of SARS-CoV-2 VOC with ACE2 binding affinity and immune escape(A) Variant prevalence over time. The bar chart shows the prevalence of different SARS-CoV-2 VOCs (alpha, beta, delta, gamma, and omicron strains) from early 2020 through early 2022.(B) Additive binding scores of viral lineages under different VOC backgrounds. The value in parentheses after each VOC denotes the mean ACE2 binding scores.(C) The immune escape scores over time. Scatterplot showing ACE2 binding and alpha, delta, and omicron variant immune escape across the timeline. BA.1∗ and BA.2∗ including BA.1, BA.2, and their sub-lineages. BA.1# means BA.1∗ without BA.1 and BA.1.1. BA.2# means BA.2∗ without BA.2. The value in parentheses after each VOC denotes the mean immune escape scores.
SARS-CoV-2 transmission dynamics
To understand how ACE2 receptor binding might influence virus fitness, we incorporated vaccine coverage, mobility, and virus immune escape data into the model to calibrate model fitting for daily reported cases (Figure 1B). Our model was able to reproduce the observed transmission dynamics, including non-VOC and VOC-associated epidemic waves (Figure 4A). The relative virus fitness was increasing from the late delta to the omicron periods (Figure 4B). Both vaccine-induced immunity (Figure 4C, green line) and total natural infection combined with vaccine-induced immunity (Figure 4C, red line) increased due to the initiation of the vaccination campaign (beginning in December 2020) in Italy around the same time.12 A similar trend was observed for effective immunity (i.e., immunity that takes into account the waning immunity and immune escape of circulating VOCs) against VOCs during alpha and delta periods, but a significant decrease was recorded in the omicron period (Figure 4D). This decline reflects the strong immune escape capabilities of omicron, despite the ongoing vaccination campaign.Figure 4. Epidemic indicator and immunological response trajectories during SARS-CoV-2 variant transition in Italy(A) Model fitting for daily confirmed cases. The red line represents the estimated daily incidences. Gray region indicates 95% confidence interval (95% CI).(B) Virus fitness and effective reproduction number. The red line shows estimated virus fitness over time, which is relative to the wild-type virus. The dashed gray line shows the effective reproduction number.(C) Population immunity against wild type. The green curve represents the average proportion of the population protected by vaccine-induced immunity, while the red curve reflects total immunity, including vaccine and infection-acquired immunity.(D) The curve represents the immune protection in the total population against circulating VOCs (referred to as effective immunity), considering virus immune escape properties.
SARS-CoV-2’s fitness landscape
The phenotype-fitness landscape spanned by the additive sum of ACE2 binding score and host effective immunity showed rugged patterns with four major peaks, corresponding to B.1.1.7 (alpha), B.1.617.2 (delta), BA.1^∗^ (first omicron peak), and BA.2^∗^ (second omicron peak); one minor peak, corresponding to the transition period (mixed with delta and original omicron (B.1.1.529) lineages) between delta and omicron; and four valleys between these peaks. A higher effective immunity was associated with a lower fitness peak from alpha to delta. During the omicron period, a reduction in effective immunity from BA.1^∗^ to BA.2^∗^ together with similar ACE2 binding scores corresponded with increasing fitness peaks (Figure 5A). Notably, BA.2^∗^ maintained high immune escape scores comparable to BA.1.1 but showed slightly lower binding scores.Figure 5. Interplay between ACE2 additive binding score, effective immunity, and virus’s fitness in SARS-CoV-2 evolution(A) Observed relationship between ACE2 additive binding score, effective immunity, and fitness. Gray dots represent variant fitness levels during a 2-week transition period following the initial outbreak of each VOC pandemic.(B) Fitness landscape of VOC. Surface topography shows the virus’s fitness landscape, which was reconstructed based on multivariate normal distribution. Peaks denote maximum variant fitness values in predominant periods.(C) Mutation prevalence associated with VOC transitions. Shadowed regions corresponding to valleys in (B) are marked by different colors. Some cluster mutations with similar prevalence were termed “Set1”, “Set2,” and “Set3.”
To evaluate the robustness of the evolutionary pattern (Figure 5A), we varied vaccine waning from an optimistic baseline to moderate and fast scenarios (50% loss at 6 and 3 months, respectively, see Figure S7). The shape and key features of the phenotype-fitness landscape were maintained, with no material shifts in variant ordering or trend direction. To help visualize the rugged patterns, a smoothed fitness landscape was constructed after fitting the fitness data (Figure 5B; see STAR Methods).
The sequential changes in mutations, characterized by partial replacement, progressing through the fitness valleys toward the peaks, are described as follows.
Transition from alpha to delta
During the transition between the fitness peaks at alpha and delta, the prevalence of N501Y (ACE2 binding changes ((Wuhan-Hu-1 background, denoted as Δbind): 0.96, immune escape changes (denoted as Δescape): 0.01, see Figure S5), a mutation with high ACE2 binding, began to decline after passing through the first fitness valley (Figures S5B and S6). Meanwhile, the prevalence of L452R (Δbind: 0.16, Δescape: 0.18) and T478K (Δbind: 0.04, Δescape: 0.004) started to increase during this valley, gradually replacing N501Y as the dominant mutations. Some rare mutations, like E484K (Δbind: 0.11, Δescape:0.27), showed fluctuations during the transition between the fitness peaks at alpha and delta. More details about the rare mutations across each VOC can be found in Figures S5 and S6.
Transition from delta to omicron (BA.1∗)
After the delta variant, the virus’s evolutionary pathway moved through two fitness valleys (i.e., valley 2 and valley 3) with a local peak (i.e., a minor peak during the transition period including a mixture of omicron and delta) and then reached BA.1^∗^, the next peak. Meanwhile, only N501Y and T478K were present in the BA.1^∗^ without L452R (Figure 5C). Next, the set 1 mutations (G339D, S373P, S375F, K417N, S477N, E484A, Q498R, and Y505H), set 2 mutations (S371L, G446S, and G496S), and few other mutations (N440K, Q493R, and N501Y, see Figure S5) gradually increased as the BA.1 lineage predominated. During this period, effective immunity increased from around 0.2 to 0.4 and then returned to the original level, while binding values remained similar, corresponding to the suboptimal fitness.
Transition from omicron (BA.1∗) to omicron (BA.2∗)
The virus continued to evolve through valley 4 toward the optimal fitness peak associated with BA.2^∗^. During this transition, the set 1 and other additional mutations (N440K, Q493R, and N501Y) were retained while set 2 gradually decreased as transient mutations (Figure S6). In the end, set 3 mutations (S371F, T376A, D405N, and R408S) gradually increased and became dominant. The mutational effects in set 3 showed either unchanged or reduced binding scores using BA.1 as background (Figure S5). As the virus progressed toward BA.2 dominance, optimal fitness emerged, coinciding with reduced effective immunity and a moderate change in ACE2 binding (Figure 5B).
Discussion
We introduced a mathematical framework to estimate virus fitness while reproducing the transmission dynamics of different SARS-CoV-2 variants in Italy. The fitness landscape was constructed for the virus from alpha to omicron. This was achieved by incorporating both intrinsic viral factors (e.g., infectiousness) and extrinsic factors (e.g., host susceptibility and contact rates) into a susceptible-exposed-infectious-recovered (SEIR)-based model, which helps quantify viral fitness while controlling for extrinsic factors. The results provide valuable insights into the real-world evolutionary constraints of SARS-CoV-2. DMS data are used to provide a proxy of virus traits for sequences from GISAID. While ACE2 binding scores derived from DMS were used to characterize receptor affinity, additional binding results can be incorporated to further refine the fitness landscape.
Rugged fitness landscapes that were observed in our research (Figures 5B and S7) suggest that certain factors may play important roles in the evolutionary pathway via fitness valleys, facilitating the transition to new fitness peaks. Epistasis can play a crucial role, as interactions between mutations are likely to lead to non-linear effects, forming rugged fitness landscapes, as we observed.6^,^13^,^14 Epistatic interactions remain uncertain as previous studies have reported conflicting findings on the trend of SARS-CoV-2 ACE2 receptor binding, and the available experimental data are limited.15^,^16^,^17 Hence, we adopted multiple DMS datasets, each using a different variant or subvariant template. This potentially reduces the biases caused by epistatic effects, as the dissociation constant for each template sequence was directly measured rather than inferred by summing mutational effects from the wild type.18^,^19
Furthermore, the fitness valley may also be related to deleterious effects among immune escape mutations or their linked mutations.20^,^21 Such mutations are likely to hinder SARS-CoV-2’s evolution, requiring additional time to reach a fitness peak when compensatory mutations occur. In addition, recombination can be a potential driver of evolutionary change in SARS-CoV-2, especially when multiple variants co-circulate.22 This event can also result in viral fitness changes.
A possible compromise between receptor binding affinity and immune escape during RBD evolution of omicron sub-variants from BA.2 to BA.4/5 has been shown in a recent study,23 where the authors found that in some cases, reduced immune escape was compensated by stronger cell binding. On the other hand, a strong immune escape mutation can occur with a reduced cell binding level. Early omicron evolution (from BA.1^∗^ to BA.2^∗^) showed that once a suboptimal fitness peak is reached by strong immune escape, further adaptation occurred with fluctuating ACE2 binding score. Furthermore, the constantly changing population immunity may play an important role in evolutionary shifts between different fitness peaks. In a previous study, the effects of immune escape mutations on viral fitness were compared between the presence and absence of antibodies, showing that these mutations can also affect replication.24 In our analysis, we considered that immune escape can influence both infectiousness and host susceptibility through changes in effective immunity. The observed fitness landscape suggests that ACE2 binding may influence viral fitness differently depending on the level of effective immunity among hosts.
Implications
Firstly, continuous surveillance of SARS-CoV-2 remains essential. Even within the same VOC, such as omicron, the virus still experienced a rugged fitness landscape, leading to repeated epidemics of varying sizes. This is exemplified by the transition from omicron BA.1^∗^ to BA.2^∗^, where the virus’s evolutionary path traversed a fitness valley before reaching a new peak. Knowing the change in traits will be important to understand why a new outbreak occurs.
Secondly, forecasting when a large outbreak will occur is still very challenging. An optimal virus fitness is likely to be reached after a set of mutations confer strong immune escape with a required adjustment in ACE2 binding.
Moreover, the results indicate a complex relationship between virus fitness and host immunity, modulated by ACE2 binding. To better understand the complex interplay between multiple viral traits and fitness, more advanced artificial intelligence (AI) models, such as spiking neural networks (SNNs), may provide valuable insights. Such modeling approaches, together with the phenotype-fitness landscape, can help illuminate the dynamic evolutionary landscape of SARS-CoV-2, which in turn underscores the ongoing need for regular updates to vaccines to maintain their effectiveness against emerging variants.
In summary, the present study showed that SARS-CoV-2 mutations in spike RBD, together with changing immunity, were linked to a rugged fitness landscape, which shaped viral evolutionary trajectories in Italy. Knowing the landscape can help provide a better understanding of the complex dynamics of viral evolution and adaptation, particularly in drug resistance and vaccine development.
Limitations of the study
There are certain limitations in this study. Firstly, the study only focused on RBD region in the spike protein, and we assumed the impact of each amino acid on viral traits is additive. Nonetheless, several experimental studies that consider the epistasis effect showed a similar pattern in the order with our results of binding changes among each VOC.16^,^25 Secondly, the population mobility index was an NPI proxy as it assumed social contacts were proportional to population mobility, which did not consider the awareness-driven behavior changes on the dynamic of mobility. There is a risk that changes in βv(t) may reflect changes in NPIs or awareness, which could cause the actual detection rate to vary over time and potentially influence the estimation of βv(t). Thirdly, our model categorized the population based on administered numbers of vaccine doses, without considering age and gender heterogeneity in the population. This simplification overlooked potential differences in vaccine responses and exposure risks across different demographic groups. Fourthly, we acknowledge that viral fitness may be influenced by within-host replication, survival, and other viral or host factors. While modeling simplifies the complexity, it allows us to understand the mechanism through quantifying and, therefore, separating individual effects. Fifthly, GISAID may contain some noisy data. We have done initial screening to remove data with apparent errors. Sixthly, the seasonal or climatic factors were not considered. Finally, our framework was not intended as a long-term forecasting tool. The time-varying fitness (i.e., transmission rate βv(t)) was inferred from past incidence together with observed changes in vaccination, immune escape, and mobility. Accurate forecasts would require specifying future trajectories for different drivers, as well as viral traits of newly emerging viruses, which should be addressed in future studies.
Resource availability
Lead contact
Requests for further information and resources should be directed to and will be fulfilled by the lead contact, Hsiang-Yu Yuan ([email protected]).
Materials availability
This study did not generate new unique reagents.
Data and code availability
- •Daily reported COVID-19 cases in Italy were collected from “Our World in Data”, which was deposited at https://github.com/owid/covid-19-data/tree/master/public/data.
- •Deep mutational scanning data for ACE2 binding were deposited at https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS_Omicron/blob/main/results/final_variant_scores/.
- •Deep mutational scanning data for immune escape were deposited at https://jbloomlab.github.io/SARS2_RBD_Ab_escape_maps/.
- •Italy COVID-19 vaccination data were deposited at https://github.com/italia/covid19-opendata-vaccini.
- •Google mobility data were deposited at https://www.google.com/covid19/mobility/.
- •Apple mobility data were deposited at https://covid19.apple.com/mobility.
- •All original code has been deposited at GitHub at https://github.com/Laplace3210/bind_escape_code and is publicly available.
- •Any additional information required to reanalyze the data reported in this paper is available from the lead contact upon request.
Acknowledgments
We gratefully acknowledge all researchers for contributing to, generating, and openly sharing genome data via the Global Initiative on Sharing All Influenza Data (GISAID), as most of our analyses were based on this invaluable resource. We thank Prof. Haogao Gu, Dr. Jingbo Liang, and Mr. Hongsen Liang for helpful discussions and support and Ms. Fatema Khairunnasa for useful editing suggestions. We gratefully acknowledge Prof. Joshua Weitz at 10.13039/100008510University of Maryland, Prof. Xiangjun Du at 10.13039/501100002402Sun Yat-sen University, and Prof. Daihai He at 10.13039/501100004377Hong Kong Polytechnic University for their valuable suggestions on modeling, as well as Prof. Erik Volz and Mr. Vinicius Franceschi at 10.13039/501100000761Imperial College London for their insightful comments on the calculation of immune escape. The authors also acknowledge support from grants funded by the City University of Hong Kong under grant #9610416, General Research Fund under grant #9043514 (RGC Ref. No. 11203823), and the Institute of Digital Medicine at the City University of Hong Kong under grant #9229503.
Author contributions
Z.D. and H.-Y.Y. designed the study. Z.D. collected, analyzed, and modeled the data. Z.D. and H.-Y.Y. wrote the paper. H.-Y.Y. gave final approval for publication. All authors read and approved the final manuscript.
Declaration of interests
The authors declare no competing interests.
Declaration of generative AI and AI-assisted technologies in the writing process
During the preparation of this work, the authors used ChatGPT-5.1 in order to check the grammar and spelling of the sentence. After using this tool or service, the authors reviewed and edited the content as needed and take full responsibility for the content of the publication.
Star★Methods
Key resources table
REAGENT or RESOURCESOURCEIDENTIFIERDeposited dataItaly daily reported COVID-19 casesOur World in Datahttps://github.com/owid/covid-19-data/tree/master/public/dataDeep mutational scanning data for SARS-CoV-2 ACE2 binding (Backgrounds: Wuhan-Hu-1, Alpha and Delta)Starr et al.19 (2022)https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS_variants/blob/main/results/final_variant_scores/Deep mutational scanning data for SARS-CoV-2 ACE2 binding (Backgrounds: BA.1 and BA.2)Starr et al.18 (2022)https://github.com/jbloomlab/SARS-CoV-2-RBD_DMS_Omicron/blob/main/results/final_variant_scores/Deep mutational scanning data for SARS-CoV-2 immune escapeGreaney et al.26 (2022)https://jbloomlab.github.io/SARS2_RBD_Ab_escape_maps/Italy COVID-19 vaccination dataItalian Ministry of Health27 (2022)https://github.com/italia/covid19-opendata-vacciniGoogle mobility dataGoogle COVID-19 Community Mobility Report28 (2022)https://www.google.com/covid19/mobility/Apple mobility dataApple Map Mobility Trends Reports29 (2022)https://covid19.apple.com/mobilitySoftware and algorithmsRR Core Teamhttps://www.r-project.org/RStudioRStudio Teamhttp://www.rstudio.com/odin.dust (R-package)Open sourcehttps://mrc-ide.github.io/odin.dust/mcstate (R-package)Open sourcehttps://mrc-ide.github.io/mcstate/dplyr (R-package)Open sourcehttps://dplyr.tidyverse.org/tidyverse (R-package)Open sourcehttps://tidyverse.org/ggplot2 (R-package)Open sourcehttps://ggplot2.tidyverse.org/lubridate (R-package)Open sourcehttps://rstudio.github.io/cheatsheets/html/lubridate.html
Method details
Sequence data
Italian SARS-CoV-2 spike protein sequence data (31st January 2020–17th October 2022) were downloaded from the GISAID database.30 To ensure data quality control, two exclusion criteria were applied: (1) sequences with >5% ambiguous amino acids31 and (2) sequences <1,263 bp or >1,273 bp were filtered out. Ultimately, 112,011 sequences were included for virus ACE2 receptor binding and immune escape measurements. The proportions of beta and gamma sequences were very low, only 2.07% and 0.12%, respectively (Figure S1). Therefore, only alpha, delta, and omicron variants were reported as the main VOCs in subsequent analyses.
Measuring virus ACE2 additive sum of binding score and immune escape
DMS data were used to calculate the virus ACE2 additive sum of binding score (i.e., the sum of individual mutational effects) by mapping binding affinities between ACE2 and SARS-CoV-2 RBD amino acid mutation sites.32 To reduce the confounding effects of epistasis in cell binding, we employed multiple DMS datasets generated from five variants as backgrounds, namely Wuhan-Hu-1, Alpha, Delta, BA.1, and BA.2.18^,^19 For each viral sequence, we selected the DMS dataset whose reference background most closely matched the sequence’s ancestral lineage and used it to compute the ACE2 additive binding score (e.g., wild-type → Wuhan-Hu-1; Alpha and its sub-lineages → Alpha; Delta and its sub-lineages → Delta; BA.1 and its sub-lineages → BA.1; BA.2 and its sub-lineages → BA.2).
Escape fraction data were used to determine viral immune escape by mapping reduced binding levels between the virus and antibodies, using datasets integrated from 33 monoclonal antibodies and polyclonal sera, as available from the SARS-CoV-2 RBD antibody escape maps (https://jbloomlab.github.io/SARS2_RBD_Ab_escape_maps/).26 First, we used the Clustal Omega tool to align all sequences.33 Next, to calculate the virus ACE2 additive sum of binding score for each sequence, binding changes across individual amino acid mutations of the RBD region (residues from N331 to T531) were summed (Figure S8).34 For immune escape, we calculated the mean reduction in antibody binding for each amino acid mutation site across 33 monoclonal antibodies. Immune escape across individual amino acid mutations was summed, like the calculation of ACE2 binding. Finally, we computed daily average values of the additive ACE2 binding score and the additive immune escape score among all daily samples.
Mobility data and calculating the mobility index
The mobility index, which indicated the degree of population contact, was estimated from Google Mobility and Apple Mobility data.28^,^29 Specifically, three Apple mobility data subtypes (driving, walking, and transit movement) and four Google subtypes (grocery and pharmacy stores, transit, retail and recreation, and workplaces) were used to calculate mobility indices based on previous research.35
Vaccination data
COVID-19 vaccine administration data provided by the Italian Ministry of Health were used to summarize daily vaccination numbers for first, second, and third doses.27 During the study period (i.e., model fitting period, 31st January 2020–5th March 2022), 134,611,131 vaccine doses were recorded and sourced from five major vaccine manufacturers: Janssen, Moderna, Novavax, Pfizer, and AstraZeneca (Table S1). Messenger RNA (mRNA) vaccines (i.e., Pfizer and Moderna) and viral vector vaccines (i.e., Janssen and AstraZeneca) accounted for almost 99% of the Italian vaccination campaign (Figure S3). Consequently, we only considered mRNA (Pfizer and Moderna) and viral vector vaccines (Janssen and AstraZeneca).
Calculating virus fitness
A modeling approach was adopted to quantify viral infectiousness, which we refer to as viral fitness throughout this study. We extended a stochastic Susceptible - Exposed - Infectious - Recovered (SEIR) model, coupled with vaccination, mobility, and immune escape data, to reproduce COVID-19 transmission dynamics and estimate virus fitness across different VOC periods (Figures 1A and 1B). The model was used to estimate a time-varying transmission rate component (denoted as βv(t)), considered to be associated with ACE2 receptor binding under different immunity levels, representing changes in infectiousness. The relative virus fitness was defined as (βv(t)/β0, where β0 is the average value of βv(t) during the first month of the pandemic, representing the wild-type.
The effective reproduction number (Re) was represented as two components (Figure 2D): an intrinsic factor (infectiousness), and extrinsic factors (susceptibility and contact rate), as below.
Where Tg is the generation time, which is 4 days36^,^37^,^38; refers to vaccine efficacy from different doses (i) against the wild-type; represents protection against circulating strains, accounting for their immune escape. immune_escape(t) is the daily average immune escape of viruses; c
is the scaling factor for daily immune escape; Si(t)/N represents the fraction of individuals at each susceptibility level (i.e., S0, S1, S2 or S3), based on vaccine dose i received; M(t) is the daily mobility index, representing the effect of the change in contact rate on reducing overall fitness, Re.27 represents the transmission rate among susceptible groups with different vaccine efficacy (i.e., in Figure 1B). The effective reproduction number Re was obtained after fitting the transmission model (Figure 1B) to the daily reported cases, when daily vaccine doses were considered.
Exposed (Ei) individuals who came into contact with infected individuals will become infectious (Ii) after 1/σ days. As not all cases were reported correctly and immediately, reported infectious cases (IR) were considered based on the detection rate (d) and the confirmation delay (ρt) in the transition between Ei and Ii, which was used to fit the daily reported cases. Individuals with an infectious status would subsequently enter a removed status (i.e., R0, R1, R2 or R3) due to self-isolation or recovery after 1/γ(t) days. Considering waning immunity among these individuals, we assumed that recovered and vaccinated individuals lose their immunity on average after ω days (ω = log(0.85)/182.5, corresponding to exponential waning with a 15% loss in protection after 6 months), which is the same as the studies by Barnard et al. and Lau et al.’s settings.39^,^40 In addition, we also used a fast vaccine waning rate (ω = log(0.5)/90, 50% loss of protection after 3 months) and a moderate waning rate (ω = log(0.5)/182.5, 50% loss of protection after 6 months) as the sensitivity analysis to check the stability of fitness patterns under different waning scenarios.
Population immunity and effective immunity
Population immunity ( ) refers to the average level of immunity to a specific infectious agent (e.g., Wuhan-Hu-1) within a population, achieved through both natural infection and vaccination. It is commonly expressed as the proportion of individuals in the population who are immune, either through previous infection or through vaccination, but partial protection and waning immunity should be considered.41^,^42 These factors were incorporated into our analysis. We further calculated the virus-perceived immunity, termed effective immunity ( ) as , which accounts for immune escape and represents protection against currently circulating strains. More details about the parameters set and population immunity calculations can be found in the Methods S1: 2.2.3 Population immunity.
Quantification and statistical analysis
Model calibration and uncertainty quantification
The number of individuals in compartment IR (i.e., reported infectious cases), as the modeling output, was used to fit the daily numbers of reported cases. The particle Markov chain Monte Carlo method (pMCMC),43^,^44 which combines particle filtering and Markov chain Monte Carlo approaches, was used to calibrate model output using daily reported cases and then estimate model parameters, with a total of 100,000 iterations. Parameter uncertainty was summarized using the posterior mean and 95% confidence interval (CI) by sampling from the fitted parameter distribution. Convergence and sampling adequacy were assessed using effective sample size (>150).
Software and computational environment
All statistical analyses were conducted in R (version 4.1.3) using the packages described in method details and the key resources table. The mechanistic model parameter estimations and pMCMC implemented by using the “mcstate” package in R.43 A fitness landscape was fitted with virus fitness and assumed a multivariate normal distribution, which helped visualize SARS-CoV-2 adaptive evolutionary trajectories. More technical details about fitness landscape reconstruction can be found in the Methods S1: 2.2.4 Reconstruction of the fitness landscape.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Carabelli A.M.Peacock T.P.Thorne L.G.Harvey W.T.Hughes J.COVID-19 Genomics UK Consortium Peacock S.J.Barclay W.S.de Silva T.I.Towers G.J.Robertson D.L.SARS-Co V-2 variant biology: immune escape, transmission and fitness Nat. Rev. Microbiol.21202316217710.1038/s 41579-022-00841-736653446 PMC 9847462 · doi ↗ · pubmed ↗
- 2Yewdell J.W.Antigenic drift: Understanding COVID-19Immunity 5420212681268710.1016/j.immuni.2021.11.01634910934 PMC 8669911 · doi ↗ · pubmed ↗
- 3Greaney A.J.Starr T.N.Gilchuk P.Zost S.J.Binshtein E.Loes A.N.Hilton S.K.Huddleston J.Eguia R.Crawford K.H.D.Complete Mapping of Mutations to the SARS-Co V-2 Spike Receptor-Binding Domain that Escape Antibody Recognition Cell Host Microbe 2920214457.e 910.1016/j.chom.2020.11.00733259788 PMC 7676316 · doi ↗ · pubmed ↗
- 4Fiorentini S.Messali S.Zani A.Caccuri F.Giovanetti M.Ciccozzi M.Caruso A.First detection of SARS-Co V-2 spike protein N 501 mutation in Italy in August 2020 Lancet Infect. Dis.212021 e 14710.1016/S 1473-3099(21)00007-433450180 PMC 7836831 · doi ↗ · pubmed ↗
- 5Tian F.Tong B.Sun L.Shi S.Zheng B.Wang Z.Dong X.Zheng P.N 501Y mutation of spike protein in SARS-Co V-2 strengthens its binding to receptor ACE 2e Life 1020216909110.7554/e Life.69091 PMC 845513034414884 · doi ↗ · pubmed ↗
- 6Wright S.The roles of mutation, inbreeding, crossbreeding and selection in evolution Sixth International Congress on Genetics 11932356366
- 7Kauffman S.Levin S.Towards a general theory of adaptive walks on rugged landscapes J. Theor. Biol.1281987114510.1016/s 0022-5193(87)80029-23431131 · doi ↗ · pubmed ↗
- 8Grenfell B.T.Pybus O.G.Gog J.R.Wood J.L.N.Daly J.M.Mumford J.A.Holmes E.C.Unifying the Epidemiological and Evolutionary Dynamics of Pathogens Science 303200432733210.1126/science.109072714726583 · doi ↗ · pubmed ↗
