A simplified drift-diffusion model for pandemic propagation

Clara Bender; Abhimanyu Ghosh; Hamed Vakili; Preetam Ghosh; Avik W. Ghosh

arXiv:2302.13361·physics.bio-ph·February 28, 2023

TL;DR

This paper introduces a simplified, analytical drift-diffusion model for pandemic propagation based on the SIR framework, providing intuitive visualization and potential policy utility.

Contribution

It offers a quasi-analytical solution to the SIR model, mapping pandemic dynamics onto a drift-diffusion process for better interpretability and application.

Findings

1. Model agrees well with COVID-19 data across countries.

2. Provides an intuitive visualization of epidemic evolution.

3. Discusses error sources and uncertainty growth over time.

Equations

\frac{dS}{dt}

\frac{dI}{dt}

\frac{dR}{dt}

\frac{ds}{dt}

\frac{df}{dt}

\frac{df}{dt} = \frac{f(1-f)}{\tau_{0}}, \quad \frac{ds}{dt} = -\frac{s(1-s)}{\tau_{0}}, \quad s + f = 1

\tau_{0} \frac{df}{dt}

U(f)

\frac{ds}{dt}

U_{SIR}(s)

1 - s^{*} = -\beta \ln s^{*}

s^{*} = -\beta W\left(\frac{-e^{-1/\beta}}{\beta}\right)

t'

\frac{ds}{dt'}

U_{SIR}(s)

\frac{df}{dt} = \frac{f_{0} e^{t/\tau}}{\tau (f_{0} e^{t/\tau} + 1)^{2}}, \quad \tau = \tau_{0} + \alpha t

\alpha = -\frac{\tau_{0}}{t_{m}} + \frac{1}{\ln(1/f_{0})}

\frac{df}{dt} = \frac{f_{0} e^{t/\tau}}{\tau (f_{0} e^{t/\tau} + 1)^{2}} \times \prod_{n=1}^{N} \left(1 - \frac{\beta_{n}}{e^{(t - t_{\beta_{n}})/\tau_{\beta_{n}}} + 1}\right)

A(t) = A_{0} \left[1 + \frac{\beta_{0}}{1 + e^{-(t - t_{\beta})/\tau_{\beta}}}\right] = \begin{cases} A_{0}, & t \ll t_{\beta} \\ A_{0}(1 + \beta_{0}), & t \gg t_{\beta} \end{cases}

R^{2} = 1 - \frac{\sum_{i} (z_{i} - y_{i})^{2}}{\sum_{i} (y_{i} - \langle y \rangle)^{2}}

\frac{dS}{dt}

\frac{dI}{dt}

s(t) = \left[\frac{f_{0}\beta + (1 - f_{0} - \beta) e^{-(1-\beta)t/\tau_{0}}}{f_{0} + (1 - f_{0} - \beta) e^{-(1-\beta)t/\tau_{0}}}\right]

U_{SIS}(s) = \frac{s^{2}(1 - 2s/3)}{2} + \beta \left[\frac{s(s-2)}{2}\right]

\frac{df}{dt} = \sum_{i=1}^{N} \frac{N_{0i} e^{(t - t_{\beta_{i}})/\tau_{\beta_{i}}}}{(1 + f_{i} e^{(t - t_{\beta_{i}})/\tau_{\beta_{i}}})^{2}}, \quad \tau_{\beta_{i}} = \tau_{0i} + \alpha_{i} (t - t_{\beta_{i}})

\frac{df}{dt'} = -\frac{1}{\tau_{0}} \frac{\partial U(f)}{\partial f} + I_{0}(t)

\langle I_{0}(t) \rangle

\langle I_{0}(t) I_{0}(t') \rangle

P(f, t) = \int dI_{0} \, \Pi(I_{0}) \, \delta(f - f_{0}(t))

\frac{\partial P}{\partial t}

J

P_{0}(f) = P(f, t = \infty) = P_{0} e^{-U(f)/\tau_{0} D}

P(f, t) = C e^{-(f - f_{0} - v_{0} t)^{2} / 2(\sigma_{0}^{2} + 2Dt)}

\langle f(t) \rangle = \int f P(f) \, df

D_{A}(s) = \frac{[\lambda_{A} N_{p} s(1 - s)]^{2}}{2}, \quad D_{B}(s) = \frac{[\lambda_{B} N_{p} s \ln s]^{2}}{2}

Taxonomy

Topics: COVID-19 epidemiological studies · Mathematical and Theoretical Epidemiology and Ecology Models · Computational Physics and Python Applications

Full text

A simplified drift-diffusion model for pandemic propagation

Clara Bender

Dept of Mechanical and Aerospace Engineering, University of Virginia, Charlottesville VA

Abhimanyu Ghosh

Poolesville High School, Poolesville, MD

Hamed Vakili

Dept of Physics, University of Virginia, Charlottesville VA

Preetam Ghosh

Dept of Computer Science, Virginia Commonwealth University, Richmond VA

Avik W. Ghosh

Department of Physics, University of Virginia, Charlottesville, Virginia 22903, USA

School of Electrical and Computer Engineering, University of Virginia, Charlottesville, Virginia 22903, USA

Abstract

Predicting pandemic evolution involves complex modeling challenges, often requiring detailed discrete mathematics executed on large volumes of epidemiological data. Differential equations have the advantage of offering smooth, well-behaved solutions that try to capture overall predictive trends and averages. We further simplify one of those equations, the SIR model, by offering quasi-analytical solutions and fitting functions that agree well with the numerics, as well as COVID-19 data across a few countries. The equations provide an elegant way to visualize the evolution, by mapping onto the dynamics of an overdamped classical particle moving in the SIR configuration space, drifting down the gradient of a potential whose shape is set by the model and parameters in hand. We discuss potential sources of errors in our analysis and their growth over time, and map those uncertainties into a diffusive jitter that tends to push the particle away from its minimum. The combined physical understanding and analytical expressions offered by such an intuitive drift-diffusion model could be particularly useful in making policy decisions going forward.

I Introduction

Numerical modeling of pandemic propagation has a rich and varied history Huang and Qiao (2020); Braca et al. (2021); Adiga et al. (2020a, b); Bertozzi et al. (2020); Craig et al. (2021); Cao and Liu (2021); Shakeel et al. (2021); Gnanvi et al. (2021); Lalmuanawmaa et al. (2020); Roda et al. (2020); Toda (2020); Tolles and Luong (2020); Vespignani et al. (2020), ranging from Monte Carlo simulations that create histograms out of stochastic events, to Machine Learning/AI models trained on emerging data, curve fitting or structural models, graph theoretical approaches, to solving smooth differential equations such as predator-prey (Lotka-Volterra) and Susceptible-Infected-Recovered (SIR) models that simulate their average trends. Many of these models are highly detailed with several retro-fitted parameters. The ability of these models to predict accurate trends is often compromised by various sources of errors, as well as unpredictable uncertainties associated with geopolitics. What is critical to understand in this context are the simplified physical intuitions that may arise from these models, a minimal set of features and ways to capture them effectively, as well as the impact of various errors on their long term predictability.

The purpose of this paper is multi-fold.

(a) We revisit the SIR model and relate its mathematical parameters with key epidemiological constants, such as reproduction number $R_{0}$ , herd immunity fraction $I_{0}$ , incubation period $\tau$ and serial interval $S_{I}$ between events N. C. Achaiah and Setty (2020). The long-term behavior of its solutions can be expressed in terms of these parameters when they are time-independent constants. For instance, the long time single event fractional susceptibility $s^{*}$ can be expressed as a Lambert W function involving $R_{0}$ , and further simplified to a power law over a range of $R_{0}$ values.

(b) We introduce an elegant physical picture underlying the dynamics, in particular, phase transitions associated with parameter tuning. The dynamics can be mapped onto the equation for an overdamped classical particle drifting down the gradient of a potential profile $U(s)$, whose shape depends on the epidemiological constants as well as the specific model in hand. Damping makes the particle settle at the bottom instead of rolling through it, taking $s$ to its long-term value $s^{*}$.

(c) To make this solution quasi-analytical, especially for multi-events, we parametrically connect a simple model developed by Shur (2021) with the SIR model, relating causal inputs (infection and recovery rates and their time-dependences) with observable consequences (stretch parameters, pandemic rise and fall times).

(d) We apply this model across multiple countries over a much wider time period than originally explored in the Shur model. In the process, we identify an inherent problem with a multiplicative model - namely, difficulty of fitting valleys without making the parameters unphysically large - an issue that argues towards an additive theory. A multiplicative model implicitly assumes independence of events, which makes it hard to pull out a new peak from a deep valley. We show that an analogous additive equation typically gives a larger $R^{2}$ fitting parameter.

(e) Finally, we point out some of the sources of sensitivity in the model fitting parameters and their long-term impact on evolution. We introduce Langevin noise in the dynamics through both a Monte Carlo approach as well as an equivalent drift-diffusion approach for the underlying, smooth probability distribution function (PDF), and interpret the evolution of the PDF by adding this diffusive jitter to the overdamped classical particle in an external potential. Just as a strong noise can kick a Brownian particle out of its global energy minimum to a shallower metastable state, strong uncertainty in parameters can limit the predictive time and lead to incorrect conclusions - not just evolutionary (continuously growing errors) but abrupt jumps (leaving one well and settling in another). We outline the impact of various uncertainties in the system that contribute to the diffusive spreading of the PDF, from initial reporting errors to parameter uncertainties to finite sampling size effects.

II Simplifying the SIR Model

Let us start by introducing the SIR model. The SIR model is a standard set of coupled differential equations used to study the evolution of an interacting population, such as one infected by a pandemic. The model is one of many approaches that have been invoked to study the spread of the COVID-19 pandemic. While elaborate Monte Carlo and AI models can delve into details, the model benefits from simplicity and the presence of a few, physically meaningful, lumped parameters. More importantly, the smoothness of the differential equations and their underlying solutions helps considerably in extracting quasi-analytical approximations (like we do here) and building physical intuition. The simplest version governing the time evolution of susceptible (S), infected (I) and recovered (R) population, where $S+I+R=N_{p}$ = constant (infected includes deceased population, in other words, the number of infections dead or alive), reads

$$\frac{dS}{dt}=-ASI,\qquad \frac{dI}{dt}=ASI-BI,\qquad \frac{dR}{dt}=BI \tag{1}$$

where $A$ is the average infection rate set by the strength of the interaction between infected and susceptible population, and $B$ is the average recovery rate. Here $N_{p}$ denotes the population of an infection cluster, i.e., one with spatially near-constant $A$, $B$ across the population (defined by a certain upper limit on the fractional variation $|d(A,B)/(A,B)|$ in the parameters, with small interaction parameters $A$ between clusters in a multi-patch model). Both parameters $A$ and $B$ depend sensitively on space (population density varies, governments act differently) and time (virus mutates, population grows herd immunity, medication improves, masking and quarantine policies evolve). By tweaking these parameters, we can build in effects such as different kinds of intervention Sourav Chowdhury and Chaudhuri (2021), for example ‘stretching the curve’ to avoid a breakdown of the healthcare system - in this case by imposing a simple threshold on $I$ that makes $B$ plummet when $I$ goes above a critical number $I_{C}$. The curves can then develop abrupt transitions or multiple modes. Later we show an example with a jump in $A$ leading to multiple peaks.

We can normalize the equation in terms of fractional populations, $s=S/N_{p}$ and $f=I/N_{p}$ (the fractional recovered is unity minus the sum of $s$ and $f$ ). Dividing both sides by $N_{p}$ , we get

$$\frac{ds}{dt}=-\frac{sf}{\tau_{0}},\qquad \frac{df}{dt}=\frac{(s-\beta)f}{\tau_{0}} \tag{2}$$

where $\tau_{0}=1/AN_{p}$ and $\beta=B/AN_{p}$ are respectively, positive definite and positive semi-definite constants.

For non-zero $\beta$, the set of equations has a fixed point at $f=0$ where all the time-derivatives on the left vanish, giving a long-term asymptotic value of $f(t=\infty)=0$ and $s(t=\infty)=s^{*}$. The intermediate values $s(t)$, $f(t)$ can be numerically obtained in Matlab using a straightforward ode23 solver, with an initial condition set by the initial fraction of infected population $f_{0}\approx 1/N_{p}$, meaning we started the dynamics with a single infected individual (this in itself is an approximation, because a sizeable $(A,B)$ pair may only settle in after a few infected people are initiated). The corresponding initial condition on the susceptible fraction is $s_{0}=1-f_{0}=1-1/N_{p}\approx 1$, since $N_{p}\gg 1$.
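As a minimal sketch of this numerical step (not the authors' Matlab code), the normalized equations can be integrated with a hand-written fourth-order Runge-Kutta loop in Python. It assumes the normalized SIR equations take the standard form $ds/dt=-sf/\tau_{0}$, $df/dt=(s-\beta)f/\tau_{0}$; the function names, the step size and the choice $f_{0}=10^{-3}$ are illustrative assumptions.

```python
def sir_rhs(s, f, beta, tau0=1.0):
    """Right-hand side of the normalized SIR equations (assumed standard form)."""
    ds = -s * f / tau0            # susceptibles are lost to infection
    df = (s - beta) * f / tau0    # infections grow if s > beta, decay otherwise
    return ds, df


def integrate_sir(beta, f0=1e-3, tau0=1.0, t_end=100.0, dt=0.01):
    """Integrate with a fixed-step fourth-order Runge-Kutta scheme.

    Returns three lists: times, susceptible fraction s, infected fraction f.
    """
    s, f = 1.0 - f0, f0           # initial condition s0 = 1 - f0
    ts, ss, fs = [0.0], [s], [f]
    for n in range(1, int(round(t_end / dt)) + 1):
        k1s, k1f = sir_rhs(s, f, beta, tau0)
        k2s, k2f = sir_rhs(s + 0.5 * dt * k1s, f + 0.5 * dt * k1f, beta, tau0)
        k3s, k3f = sir_rhs(s + 0.5 * dt * k2s, f + 0.5 * dt * k2f, beta, tau0)
        k4s, k4f = sir_rhs(s + dt * k3s, f + dt * k3f, beta, tau0)
        s += dt * (k1s + 2 * k2s + 2 * k3s + k4s) / 6.0
        f += dt * (k1f + 2 * k2f + 2 * k3f + k4f) / 6.0
        ts.append(n * dt)
        ss.append(s)
        fs.append(f)
    return ts, ss, fs


if __name__ == "__main__":
    ts, ss, fs = integrate_sir(beta=0.556)
    print(f"peak infected fraction: {max(fs):.4f}")
    print(f"long-time susceptible fraction: {ss[-1]:.4f}")
```

Times are measured in units of $\tau_{0}$ when `tau0=1.0`, so a single run per $\beta$ value is enough to trace out a family of curves in $t/\tau_{0}$.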

Fig. 1 shows the results for various $\beta=B/AN_{p}$ values. The quantity $AN_{p}=1/\tau_{0}$ sets the overall incubation period (rise time), so that a plot of $f=I/N_{p}$ vs $t/\tau_{0}$ is controlled by the single parameter $\beta$ whose inverse $R_{0}=1/\beta$ is the reproduction number describing the average number of victims each infected person in turn infects.

From the second equation, we see that if at the outset $s_{0}<\beta$ , we have a negative slope in $f$ meaning the infection dwindles. In other words, for a recovery to infection ratio below a transition point $\beta^{*}=1/R_{0}^{*}=s_{0}\approx 1$ , there is a non-zero steady state (i.e., $t\rightarrow\infty$ ) population $s^{*}$ of susceptible but uninfected population given by Eq. 7, while the rest are all recovered.

Finally, we have herd immunity $i_{0}$ that describes the fraction of the population that must be immunized ( $1-s$ , leaving a fraction $s$ susceptible) before the pandemic starts to dwindle. We see that this happens when $i_{0}=1-\beta/s_{0}$ . For the example in Fig. 1(b), we have $s_{0}=1-1/N_{p}=0.999$ and $\beta=0.556$ (contact number $R_{0}=1.79$ ), so that herd immunity sets in when the immunization rate is greater than $0.4434$ , i.e., about $44\%$ of the susceptible population is immunized, and the susceptible fraction has dropped to $0.556=\beta/s_{0}$ .
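The arithmetic of this example can be checked in a few lines of Python. This is only an illustrative sketch: the helper name is hypothetical, $N_{p}=1000$ is inferred from $s_{0}=0.999$, and returning zero when $\beta\geq s_{0}$ is an added convention for the case where the infection already dwindles.

```python
def herd_immunity_fraction(beta, s0=1.0):
    """Fraction of the initial susceptibles to immunize: i0 = 1 - beta / s0."""
    if beta >= s0:
        return 0.0  # assumption: infection already dwindles, nothing to immunize
    return 1.0 - beta / s0


n_p = 1000                    # inferred from s0 = 0.999 in the example
s0 = 1.0 - 1.0 / n_p
beta = 0.556
i0 = herd_immunity_fraction(beta, s0)
print(round(i0, 4))           # 0.4434
```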

III Interpreting Pandemic Dynamics as an Overdamped Particle Drifting down a Potential

We will now reinterpret the SIR equations to give them a physical picture. If, for instance, we ignore recovery, then $s=1-f$ so that the equation for $f=I/N_{p}$ becomes a single variable one

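$$\frac{df}{dt}=\frac{f(1-f)}{\tau_{0}},\qquad \frac{ds}{dt}=-\frac{s(1-s)}{\tau_{0}},\qquad s+f=1 \tag{3}$$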

with ${f}_{0}=1/N_{p}\ll 1$ the initial fraction of infected people. The solution is a modification of the well-known inverse Fermi-Dirac distribution $f=f_{0}/({f}_{0}+e^{-t/\tau_{0}})$ describing the equilibrium population of holes in a non-degenerate semiconductor (in neural net language it is the sigmoid/logistic function). It is notable that in the absence of any recovery and with an initial condition $f_{0}>0$ , $f$ can only grow over time, meaning the long-term solution is $f=1$ , i.e., the entire population gets infected. We can provide an elegant physical interpretation of the evolutionary dynamics of the infected fraction $f$ , if we interpret $f$ as a generalized configurational coordinate between 0 (uninfected) and 1 (fully infected). Since the total population does not change, the equation can be interpreted as a conservative picture of an overdamped classical particle whose distribution in time follows the down gradient of a potential

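$$\tau_{0}\frac{df}{dt}=-\frac{\partial U(f)}{\partial f},\qquad U(f)=-\frac{f^{2}}{2}\left(1-\frac{2f}{3}\right) \tag{4}$$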

where $U(f)$ represents the potential, $0<f<1$ . Here $df/dt$ is the speed of the particle, the inverse incubation time $\gamma=1/\tau_{0}$ becomes its dynamic friction coefficient, and the acceleration term $md^{2}f/dt^{2}$ has dropped out as the particle has already reached terminal velocity where the frictional force $\tau_{0}df/dt$ matches the driving force $F=-\partial U/\partial f$ .

Fig. 2 shows the potential in question. It has a clear minimum at $f=1$ , meaning that in the absence of recovery the inevitable eventuality is for the entire population to get infected over time.
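The particle picture can be sketched numerically as follows. The potential $U(f)=-f^{2}/2+f^{3}/3$ used here is an assumption, chosen because $-\partial U/\partial f=f(1-f)$ reproduces the no-recovery equation; it is fixed only up to an additive constant. The exact logistic solution carries a factor $(1-f_{0})$ that reduces to the expression in the text for $f_{0}\ll 1$. All function names are illustrative.

```python
import math


def potential(f):
    """Assumed potential U(f) = -f^2/2 + f^3/3 (defined up to a constant)."""
    return -0.5 * f**2 + f**3 / 3.0


def force(f):
    """Driving force F = -dU/df = f (1 - f)."""
    return f * (1.0 - f)


def drift(f0, t_end, tau0=1.0, dt=0.01):
    """Overdamped motion tau0 * df/dt = F(f), integrated with RK4."""
    f = f0
    for _ in range(int(round(t_end / dt))):
        k1 = force(f) / tau0
        k2 = force(f + 0.5 * dt * k1) / tau0
        k3 = force(f + 0.5 * dt * k2) / tau0
        k4 = force(f + dt * k3) / tau0
        f += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return f


def logistic(t, f0, tau0=1.0):
    """Exact solution of df/dt = f (1 - f) / tau0 with f(0) = f0."""
    return f0 / (f0 + (1.0 - f0) * math.exp(-t / tau0))


if __name__ == "__main__":
    for t in (0.0, 5.0, 10.0, 20.0):
        print(t, drift(1e-3, t), logistic(t, 1e-3))
```

The particle starts near the top of the slope at $f_{0}\ll 1$, where the force is weak, accelerates through the steepest part near $f=1/2$, and comes to rest at the minimum $f=1$.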

In the presence of recovery $B$, the physics gets richer as we reach a steady state $s^{*}$. Let us rewrite the differential equation for $s$ starting from Eq. 2. We define $d^{2}s/dt^{2}=pdp/ds$, where $p=ds/dt$, and use an integrating factor $1/s$ for $dp/dt$, in order to set up a differential equation in the ($s,p$) phase space. From there, we can get a differential equation involving the single variable $s$

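$$\tau_{0}\frac{ds}{dt}=-s\left(1-s+\beta\ln s\right)=-\frac{\partial U_{SIR}(s)}{\partial s},\qquad U_{SIR}(s)=\frac{s^{2}(1-2s/3)}{2}+\beta\left[\frac{s^{2}(2\ln s-1)}{4}\right] \tag{5}$$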

up to an additive constant (similar to the choice of a ground in an electrical potential), where once again $\beta=B/AN_{p}=1/R_{0}$, $R_{0}$ being the reproduction number. The evolution of the potential $U_{SIR}(s)$ for various recovery to infection rate ratios $B/A$, and the corresponding force $F_{s}=-\partial U/\partial s=\tau_{0}ds/dt$ are shown in Fig. 3.

It is prudent at this stage to find the fixed points of the equation where $ds/dt=0$ . We get two fixed points, $s=0$ and $s=1$ for $\beta>1$ , and more interestingly for $\beta<1$ , an intermediate number $s^{*}$ satisfying

$$1-s^{*}=-\beta\ln s^{*} \tag{6}$$

The right side of this transcendental equation starts from 0 at $s=1$ with initial slope $-\beta$ and rises monotonically with decreasing $s$ (Fig. 4), so it can only intersect the left part of the equation if its slope is smaller in absolute magnitude than the fixed slope $-1$ of the left side, i.e., $\beta<1$ . This means there is a steady-state fraction of recovered population as well, set at $1-s^{*}$ . The exact solution is given by

$$s^{*}=-\beta W\left(\frac{-e^{-1/\beta}}{\beta}\right) \tag{7}$$

where $W$ is the Lambert W function (the inverse of $xe^{x}$), in this case extended along the negative branch Lehtonen (2016); Wang (2010). Over a reasonable range of $\beta$ values shown above, the solution roughly follows a power law $s^{*}\approx\beta^{5/2}$, for $\beta<1$ (inset in Fig. 4). Indeed, using the parameters from Fig. 1, we see that for $\beta=0.556$, $s^{*}N_{0}=S_{0}=268$, while the approximation gives $\beta^{2.5}N_{0}=230$, in decent agreement with the plot. For $\beta=0.111$, $S_{0}=0.1$, i.e., it reaches zero. The overall message is simple - that reducing $\beta$ (increasing the $R_{0}$ number) reduces the fraction of the population that stays uninfected over time.
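These numbers can be reproduced with a short standard-library sketch. It solves the transcendental fixed-point condition $1-s^{*}=-\beta\ln s^{*}$ by bisection, and cross-checks against $s^{*}=-\beta W(-e^{-1/\beta}/\beta)$ using a hand-written Newton iteration for $W$. Two assumptions are made here: the nontrivial root is taken from the branch with $-1<W<0$ (the other real branch returns the trivial root $s=1$), and a population of 1000 is used to convert fractions to head counts. Function names are illustrative.

```python
import math


def s_star_bisection(beta, tol=1e-12):
    """Nontrivial root of 1 - s + beta*ln(s) = 0 on (0, beta), for 0 < beta < 1."""
    if not 0.0 < beta < 1.0:
        raise ValueError("an intermediate fixed point exists only for 0 < beta < 1")
    lo, hi = 1e-300, beta  # g(beta) > 0; g(lo) < 0 unless the root is ~0 anyway
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 1.0 - mid + beta * math.log(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lambert_w_upper(x, tol=1e-14, max_iter=200):
    """Solve w*exp(w) = x for -1/e < x <= 0 on the branch -1 < w <= 0 (Newton)."""
    w = 0.0  # starting to the right of the root gives monotone convergence
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (1.0 + w))
        w -= step
        if abs(step) < tol:
            break
    return w


def s_star_lambert(beta):
    """Closed form s* = -beta * W(-exp(-1/beta) / beta)."""
    return -beta * lambert_w_upper(-math.exp(-1.0 / beta) / beta)


if __name__ == "__main__":
    n = 1000  # assumed population used to quote head counts
    for beta in (0.556, 0.111):
        print(beta, n * s_star_bisection(beta), n * s_star_lambert(beta), n * beta**2.5)
```

For $\beta=0.556$ the two routes agree on roughly 268 susceptibles per 1000, against roughly 230 from the power law, consistent with the values quoted above.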

The equation above admits no analytical solution for $s(t)$ and thus $f(t)$ beyond the fixed point. However, it has the general shape of a Fermi-Dirac distribution and can be solved to arbitrary accuracy numerically, using the evolution of a classical overdamped particle in an evolving potential. The situation can be complicated by the emergence of multiple infection and recovery events, a complex geopolitical situation with evolving knowledge, healthcare, government decisions, test taking etc., which can in their totality make the parameter estimation necessary for prediction near impossible beyond a certain time frame (not to mention that severe nonlinearity can make prediction very short ranged even if the parameters were somehow known to reasonable accuracy). The focus therefore is to look at the generic structure of the solutions, and qualitative wisdom arising from them.

One problem with this equation is the fact that at large $t$, $I=0$, which means everyone is either uninfected or fully recovered. This is not consistent with multiple events, and is in fact a consequence of the exponential drop in $I$, especially around the fixed point of $s$, i.e., $df/dt\approx-(\beta-s^{*})f/\tau_{0}$ with $s^{*}<\beta$. To counter this drop, we can make $\tau$ a stretchable time $\tau=\tau_{0}+\alpha t$, which gives a slower decay $f\sim 1/t$ for long times, at which point a change in $A$ can allow a re-emergence to occur. Indeed, we can modify our SIR equations to accommodate the stretch parameter $\alpha$, using a revision of the time axis that stretches from linear to logarithmic

[TABLE]

whereupon we get

[TABLE]

In the analysis of a simple phenomenological model that we will borrow from Shur (2021), the infected and recovered are both captured within one count $f+r=1-s$ . As can be seen in the adjoining plots, this fraction amounts to a stretch out of the susceptible population from $\sim 0$ to a finite value $1-s^{*}$ , created by the recovery rate that endows the Shur analysis with a phenomenological stretch parameter $\alpha$ :

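$$\frac{df}{dt}=\frac{f_{0}e^{t/\tau}}{\tau\left(f_{0}e^{t/\tau}+1\right)^{2}},\qquad \tau=\tau_{0}+\alpha t \tag{10}$$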

where $f=1-s$ , and $\alpha$ is adjusted to the peak position $t_{m}$ as,

$$\alpha=-\frac{\tau_{0}}{t_{m}}+\frac{1}{\ln(1/f_{0})} \tag{11}$$

Fig. 5 plots the SIR evaluated infected + recovered (i.e., once infected) population for various relative recovery rates $\beta$, and shows that the effect of increasing recovery is to endow the bell curve with a larger stretch out character. We fitted the numerical SIR results with the Shur single peak equation (Eq. 10), and show how the Shur parameters relate to the SIR parameters (Fig. 6). We see that the extracted stretch parameter $\alpha$ increases roughly logarithmically with $B$, accompanied by a corresponding reduction in the initial rise time $\tau_{0}$ (to keep the area under the curve meaningful). The effective susceptible population at $t=0$, $N_{0}\approx N_{p}=1/f_{0}$, is a fitting parameter setting the maximum height of the $d(I+R)/dt$ curve, and it also varies weakly with $B$.
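The single peak expression and the relation between $\alpha$ and the peak position $t_{m}$ can be written down directly. The sketch below transcribes $df/dt=f_{0}e^{t/\tau}/[\tau(f_{0}e^{t/\tau}+1)^{2}]$ with $\tau=\tau_{0}+\alpha t$ and $\alpha=-\tau_{0}/t_{m}+1/\ln(1/f_{0})$; the function names and the sample values of $f_{0}$, $\tau_{0}$ and $t_{m}$ are illustrative assumptions, not fitted numbers from the paper.

```python
import math


def alpha_from_peak(tau0, t_m, f0):
    """Stretch parameter from the peak position: alpha = 1/ln(1/f0) - tau0/t_m."""
    return 1.0 / math.log(1.0 / f0) - tau0 / t_m


def shur_rate(t, f0, tau0, alpha):
    """Single-peak rate df/dt with a stretched time constant tau = tau0 + alpha*t."""
    tau = tau0 + alpha * t
    x = f0 * math.exp(t / tau)
    return x / (tau * (x + 1.0) ** 2)


if __name__ == "__main__":
    f0, tau0, t_m = 1e-3, 5.0, 50.0   # hypothetical values for illustration
    alpha = alpha_from_peak(tau0, t_m, f0)
    print("alpha =", alpha)
    for t in (10.0, 30.0, 50.0, 100.0, 200.0):
        print(t, shur_rate(t, f0, tau0, alpha))
```

With this choice of $\alpha$ the logistic factor reaches its maximum of $1/4$ exactly at $t=t_{m}$, where $f_{0}e^{t/\tau}=1$, so the rate there equals $1/(4\tau)$ evaluated at $t_{m}$.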

IV Multiple Events

To accommodate multiple resurgence events, Shur’s paper multiplies the analytical first peak result with other Fermi/anti-Fermi distributions piecemeal, keeping in mind that the final answer is not the sum of separate peak distributions, but a product of independent factors (meaning that separate fits of each peak, as conventional in Lorentzian fits to peaked data, will not work). We start with the following expression from Shur with phenomenological mitigation parameters $\beta_{n}$, peak times $t_{\beta_{n}}$ and peak widths $\tau_{\beta_{n}}$.

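$$\frac{df}{dt}=\frac{f_{0}e^{t/\tau}}{\tau\left(f_{0}e^{t/\tau}+1\right)^{2}}\times\prod_{n=1}^{N}\left(1-\frac{\beta_{n}}{e^{(t-t_{\beta_{n}})/\tau_{\beta_{n}}}+1}\right) \tag{12}$$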

where $\beta_{n}$ describes mitigation when positive (demitigation when negative) for the $n$ th event, $t_{\beta_{n}}$ describes the onset of the event (roughly turning on at $t_{\beta_{n}}-3\tau_{\beta_{n}}$ ) and $\tau_{\beta_{n}}$ is the recovery time over which the event persists.

In keeping with the tone of this paper, let us try to justify this fitting form from the SIR equation. We can capture the same physics numerically with the SIR model brute force, by assuming in Eq. 1 a time-dependent infection rate

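$$A(t)=A_{0}\left[1+\frac{\beta_{0}}{1+e^{-(t-t_{\beta})/\tau_{\beta}}}\right]=\begin{cases}A_{0}, & t\ll t_{\beta}\\ A_{0}(1+\beta_{0}), & t\gg t_{\beta}\end{cases} \tag{13}$$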

A good agreement between the SIR result with this varying $A$ and the Shur multi-peak equation above is showcased in Fig. 7, correlating the parameters $\beta_{0}$ , $\tau_{\beta}$ and $t_{\beta}$ in Eq. 13 with the first mitigation peak parameters $\beta_{1}$ , $t_{\beta_{1}}$ , $\tau_{\beta_{1}}$ in Eq. 12. The interpeak separation $t_{\beta_{i+1}}-t_{\beta_{i}}$ is related to the serial interval $S_{I}$ between events.
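A small sketch of the time-dependent infection rate, $A(t)=A_{0}[1+\beta_{0}/(1+e^{-(t-t_{\beta})/\tau_{\beta}})]$, is given below. The logistic step is evaluated in a form that avoids floating-point overflow far from $t_{\beta}$; the function names and the sample parameter values are illustrative assumptions.

```python
import math


def logistic_step(x):
    """Numerically stable 1 / (1 + exp(-x))."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)          # x < 0, so exp(x) cannot overflow
    return ex / (1.0 + ex)


def infection_rate(t, a0, beta0, t_beta, tau_beta):
    """Smooth jump from A0 (t << t_beta) to A0 * (1 + beta0) (t >> t_beta)."""
    return a0 * (1.0 + beta0 * logistic_step((t - t_beta) / tau_beta))


if __name__ == "__main__":
    a0, beta0, t_beta, tau_beta = 2.0, 0.5, 100.0, 5.0   # hypothetical values
    for t in (0.0, 85.0, 100.0, 115.0, 200.0):
        print(t, infection_rate(t, a0, beta0, t_beta, tau_beta))
```

Passing `infection_rate` into a numerical SIR integrator in place of a constant $A$ is one way to generate a second peak; the step is essentially complete within about $\pm 3\tau_{\beta}$ of $t_{\beta}$.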

The original work introducing these equations worked with data fits over only one or two peaks across a short time period. We show now that this can be extended across much larger time scales in multiple populations. However, there are some prices to pay. As peak heights vary substantially, the $\beta_{n}$ values are sometimes much bigger (Table 2) than originally proposed (tens of thousands instead of between -1 and +3). They are on the one hand restrictive (small adjustments to even such large $\beta$ values change the later peaks substantially) and on the other hand sensitive to other parameters, primarily the stretch function $\alpha$. This is not altogether unexpected. The job of $\alpha$ is to sustain a background infected population that allows resurgence down the road (i.e., it creates a floor that the later peaks ride on). This makes it hard to remove any stretch features from later peaks, which necessarily become asymmetric, and makes it hard to capture deep valleys in the data. The fit also depends sensitively on the floor value that the first peak subsides to - while data on the floor is harder to gather, its magnitude can affect subsequent parameter values sensitively. Simply put, we need a large negative $\beta$ to pull a peak out of a very low valley, so errors in estimating the valley floor affect $\beta$ values quite sensitively.

Fig. 8 and the accompanying table show an attempted fit for data owi across multiple countries over several months. Let us briefly discuss the fitting protocol, as suggested by Shur (2021). Alternate methods for SIR fits exist Lounis and Bagai (2020), Bagai et al. (2020). We fit the rise time of the first peak with $\tau_{0}$ and its height with $f_{0}$. The peak position $t_{m}\approx\tau_{0}\ln(1/f_{0})/[1-\alpha\ln(1/f_{0})]$ then gives us the stretch parameter $\alpha$. For subsequent peaks, the onset of a rise is roughly $t_{\beta}-3\tau_{\beta}$, the peak width $\sim\tau_{\beta}$ and the height controlled by $\beta$ itself (a positive $\beta$ gives a drop while a negative $\beta$, seen commonly here, gives a rise).

The goodness of fit can be quantified by the $R^{2}$ number listed on the figures. For a fitting function $z(t)$ (the phenomenological equation) compared to a target function $y(t)$ (the smoothed data), it is given by

$$R^{2}=1-\frac{\sum_{i}(z_{i}-y_{i})^{2}}{\sum_{i}(y_{i}-\langle y\rangle)^{2}} \tag{14}$$

where $\langle y\rangle$ is the time-averaged value of $y(t)$. Note that for truly bad fits where the predicted regression curve $z$ departs further from $y$ than does the mean, $R^{2}$ can in fact be negative; however, for a reasonable fit we expect it to lie between 0 and 1, and venture closer to one as the fit gets better and better. Also, when the data values are close to zero, the denominator could numerically vanish faster than the numerator, so we will need to manually prune any Matlab outputs with NaN near zero.
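A plain-Python sketch of this goodness-of-fit measure, $R^{2}=1-\sum_{i}(z_{i}-y_{i})^{2}/\sum_{i}(y_{i}-\langle y\rangle)^{2}$, is shown below. The function name is illustrative, and returning NaN for a vanishing denominator is an assumed convention that mirrors the pruning step mentioned above.

```python
def r_squared(y, z):
    """Coefficient of determination of the fit z against the target data y."""
    if len(y) != len(z) or len(y) == 0:
        raise ValueError("y and z must be non-empty and of equal length")
    mean_y = sum(y) / len(y)
    ss_res = sum((zi - yi) ** 2 for yi, zi in zip(y, z))   # residual sum of squares
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)           # spread of the data about its mean
    if ss_tot == 0.0:
        return float("nan")   # assumption: flag flat data instead of dividing by zero
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0]
    print(r_squared(y, [1.0, 2.0, 3.0]))   # 1.0, perfect fit
    print(r_squared(y, [2.0, 2.0, 2.0]))   # 0.0, no better than the mean
    print(r_squared(y, [3.0, 2.0, 1.0]))   # -3.0, worse than the mean
```

The three cases illustrate the range of the measure: one for a perfect fit, zero for a fit that does no better than the mean of the data, and negative values for a fit that does worse.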

It is worth emphasizing that in spectroscopic analyses, fitting functions for multipeaked experimental data often decompose naturally into sums of Lorentzians. Such a sum in effect allows us to fit the peaks easily, including the intervening valleys. A multiplicative model, in effect, treats the probabilities independently, which becomes a problem because the initial value for each peak is set by the valley floor (and thus the stretch parameter) of the first peak. An example of such a fitting anomaly is seen in the data table. The fitted populations $N_{p}=f_{0}^{-1}$ follow the expected sequence across the countries. The $\beta_{1}$ values for India, South Korea and New Zealand are very high compared to the US, suggesting an aggressive initial mitigation strategy (quarantine, masking). However, the exact number is probably unphysical, as small variations in the fitted valley can alter $\beta$ in a hyper-sensitive fashion.

One way to address the valley effect is to assume that the recovered population goes back to being susceptible, giving us in effect, an SIS model and creating a robust residue of susceptibles for further infection, restoring the possibility of a moderate, physically meaningful $\beta_{n}$ .

$$\frac{dS}{dt}=-ASI+BI,\qquad \frac{dI}{dt}=ASI-BI \tag{15}$$

assuming an instant reintroduction of a recovered population back into susceptible (we can also build delays as incubation periods/temporary immunity post infection). The solution to $s=S/N_{p}$ is straightforward. Once again, we differentiate the first equation with time, substitute the expression for $f=I/N_{p}$ from the first equation and replace the derivative $df/dt=-ds/dt$ to get a differential equation involving $s$ alone. We then replace $d^{2}s/dt^{2}=pdp/ds$, where $p=ds/dt$, solve for $p$ using an integrating factor, and then solve the first order differential equation involving $p=ds/dt$, each time keeping track of initial conditions $p_{0}=(\beta-s_{0})f_{0}$, with $f_{0}=1-s_{0}$. The result is

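$$s(t)=\left[\frac{f_{0}\beta+(1-f_{0}-\beta)e^{-(1-\beta)t/\tau_{0}}}{f_{0}+(1-f_{0}-\beta)e^{-(1-\beta)t/\tau_{0}}}\right] \tag{16}$$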

where at $t=0$ , $s=s_{0}=1-f_{0}$ . It is plain to see that for $\beta<1$ , $s$ approaches $s^{*}=\beta$ at long times, while for $\beta>1$ , $s$ approaches $1$ , qualitatively consistent with the results of the single peak $SIR$ model in Fig. 1. Once again, we can interpret this evolution as an overdamped particle in a potential, except in this case (compare with Eq. 5)

$$U_{SIS}(s)=\frac{s^{2}(1-2s/3)}{2}+\beta\left[\frac{s(s-2)}{2}\right] \tag{17}$$

with $AN_{p}=1/\tau_{0}$, $\beta=B/AN_{p}$. As before, we are ignoring an overall constant vertical shift in $U(s)$ related to $s_{0}$ that has no bearing on the dynamics and amounts to choosing the (arbitrary) ground of the potential.
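The SIS closed form can be checked against a direct integration. The sketch below assumes that, with $s+f=1$, the SIS dynamics reduce to $\tau_{0}\,ds/dt=(\beta-s)(1-s)$, which is also $-\partial U_{SIS}/\partial s$ for $U_{SIS}(s)=s^{2}(1-2s/3)/2+\beta s(s-2)/2$. Function names and parameter values are illustrative.

```python
import math


def sis_closed_form(t, beta, f0, tau0=1.0):
    """Closed-form susceptible fraction s(t) for the SIS model, with s(0) = 1 - f0."""
    c = 1.0 - f0 - beta
    e = math.exp(-(1.0 - beta) * t / tau0)
    return (f0 * beta + c * e) / (f0 + c * e)


def potential_sis(s, beta):
    """U_SIS(s) = s^2 (1 - 2s/3) / 2 + beta * s (s - 2) / 2."""
    return 0.5 * s**2 * (1.0 - 2.0 * s / 3.0) + 0.5 * beta * s * (s - 2.0)


def sis_rk4(t_end, beta, f0, tau0=1.0, dt=0.01):
    """Integrate tau0 * ds/dt = (beta - s)(1 - s) with fixed-step RK4."""
    rhs = lambda s: (beta - s) * (1.0 - s) / tau0
    s = 1.0 - f0
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(s)
        k2 = rhs(s + 0.5 * dt * k1)
        k3 = rhs(s + 0.5 * dt * k2)
        k4 = rhs(s + dt * k3)
        s += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return s


if __name__ == "__main__":
    for t in (0.0, 10.0, 20.0, 40.0):
        print(t, sis_closed_form(t, 0.5, 1e-3), sis_rk4(t, 0.5, 1e-3))
```

For $\beta<1$ both routes settle at $s^{*}=\beta$, the minimum of $U_{SIS}$ on $0<s<1$, while for $\beta>1$ the closed form returns to $s=1$.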

There are many variants of the SIR model such as the SISV model S. Alonso-Quesada and Nistal (2018), where a part of the susceptible population gets vaccinated, while a fraction of the vaccinated go back to being susceptible, or the SIRS model, where the infected population gets split between a recovered population and a newly susceptible population. There are other acronyms such as SIVR (SIR + virus variant), SIQR (SIR + Quarantine) De la Manuel Sen and Nistal (2017), SIAR (SIR + symptomatic vs asymptomatic) Nikhil Anand and Somanath (2020), SIR-S (SIR + stratification), SIXR (SIR + vaccination), P2SIR (SIR + travel) models Shannon Connolly and Heiner (2022), con. They can also have added features such as deaths, maternally derived immunity, exposure period, etc. The result in the previous paragraph implies that in many cases we can identify a suitable hybrid between SIR and SIS models, which have different asymptotic behaviors $s^{*}_{SIR}\sim\beta^{2.5}$ and $s^{*}_{SIS}\sim\beta$. In fact, we can invoke a fraction that is re-inserted from the infected population back into susceptible (the rest to recovery), to fit an experimentally measured $s^{*}$ vs $\beta=1/R_{0}$ in a controlled experimental environment.

Within the SIR model itself, one can avoid the issue with poor valley fitting by going back to an additive decomposition of the form

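$$\frac{df}{dt}=\sum_{i=1}^{N}\frac{N_{0i}\,e^{(t-t_{\beta_{i}})/\tau_{\beta_{i}}}}{\left(1+f_{i}e^{(t-t_{\beta_{i}})/\tau_{\beta_{i}}}\right)^{2}},\qquad \tau_{\beta_{i}}=\tau_{0i}+\alpha_{i}(t-t_{\beta_{i}}) \tag{18}$$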

Fig. 9 shows the impact of an additive fitted equation (Eq. 18) on the infection rate. We use $f_{i}=N_{0i}^{-1}$ . The rest of the parameters are tabulated in Table 2. The calculated $R^{2}$ values are higher as shown in the figure, suggesting that we may get a better fit with an additive model. Further work will need to be done to connect these parameters with the mitigation parameters in the multiplicative model.

Note that standard device models for electron flow include a drift-diffusion component (drift is the sliding down the potential, and diffusion is an uncertainty related jitter that will be discussed shortly) and also a recombination-generation component. In this case, recombination would be a part of every SISV population that dies through natural causes and part of the infected population dying through infection related complications, while generation would be new births. Over the duration of a pandemic, spanning a few months, we can ignore birth and death rates and focus on a near constant population.

We now discuss the sensitivity of the parameters and the overall dynamics of error propagation, which have implications both for long-term predictability and for effective strategies for the frequency of data collection.

V Error Propagation

While the above equations provide a simple fitting protocol for the spread of a pandemic, they do not carry any inherent predictive value as the relevant parameters are retrofitted. Predicting the parameters requires extensive data, insights into the underlying dynamics (e.g. linear vs nonlinear equations, time-dependence of parameters), or typically both. To carry this forward we will need to generalize the SIR model to a spatio-temporal gradient diffusion equation, which is beyond the scope of this paper. While detailed epidemiological models can relate SIR parameters such as $\tau_{0}$ , $\alpha$ , $\beta_{i}$ etc. to constants such as the reproduction number $R_{0}$ based on contact-tracing and cumulative incidence data, it is worth dwelling on the challenges of reliable prediction based on these numbers and equations alone.

We identify three sources of error in our fitting protocols: (a) reporting error $\sigma_{0}$ , which has to do with initial uncertainty in data collection (known unknowns); (b) parametric uncertainty $D$ , which has to do with oversimplification in our evolution equations in the face of more complex and unpredictable microscopic and macroscopic interactions (unknown unknowns, e.g. governments enact lockdowns, a breakthrough happens in vaccine technology), as well as uncertainty in the parameters that evolve (known unknowns, e.g. the virus mutates, people congregate at popular venues such as festivals); and (c) measurement error $(\epsilon,\delta)$ arising inherently from the finite-sized and noisy nature of the data itself. Of these three, the first two belong to a common category ( $D$ grows the initial uncertainty $\sigma_{0}$ linearly at first, later slowing it down to a sublinear function of time).

V.1 Reporting Error $\sigma_{0}$ and parametric uncertainty $D$

Let us start by discussing how to add random noise to the evolution of the probability distribution function (PDF). In the presence of additive white noise $I_{0}$ the overdamped Newtonian evolution equation becomes the celebrated Langevin equation

[TABLE]

with stretch $\alpha$ subsumed in $t^{\prime}$ , where the noise has the following average moments

[TABLE]

where the diffusion constant $D$ is usually proportional to the mobility of the particle (velocity over force, set by the potential gradient and damping) and temperature which controls repulsive particle-particle interaction. The equation is generally solved using stochastic Monte Carlo techniques, where we use a random number generator to repeatedly construct $I_{0}$ values sampled from a given probability distribution. We then solve $f_{0}(t)=f(t,I_{0})$ for each given $I_{0}$ (e.g. Fig. 10) and extract a histogram. Indeed, this is one of the most popular ways of solving the pandemic equation, i.e., tossing coins to generate random values of $I_{0}$ from a given distribution and then numerically solving for $f$ . Very often, this is done using a discretized (algebraic) deconstruction of the ODE onto a large grid of population elements, from which the fraction $f$ is numerically extracted. It is however convenient, invoking the law of large numbers and ultimately the central limit theorem, to simplify the analysis (at least for intuitive reasoning) by directly estimating the PDF using the Fokker-Planck equation that follows from the Langevin equation.
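The Monte Carlo procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the code behind Fig. 10: the drift is taken to be the logistic form $f(1-f)/\tau_{0}$, the noise is normalized as $\langle I_{0}(t)I_{0}(t^{\prime})\rangle=2D\delta(t-t^{\prime})$, out-of-range values are clipped to $[0,1]$, and the function name and all numerical values are hypothetical.

```python
import numpy as np

def simulate_langevin(f0=0.1, tau0=10.0, D=1e-5, t_end=100.0, dt=0.05,
                      n_paths=2000, seed=0):
    """Euler-Maruyama integration of df/dt = f(1-f)/tau0 + I0(t).

    Assumption: I0 is Gaussian white noise with
    <I0(t) I0(t')> = 2 D delta(t - t'), so every time step adds a
    normal increment of variance 2*D*dt to each path.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(t_end / dt))
    f = np.full(n_paths, f0, dtype=float)
    for _ in range(n_steps):
        drift = f * (1.0 - f) / tau0                      # deterministic slide
        kick = np.sqrt(2.0 * D * dt) * rng.standard_normal(n_paths)
        # Clipping keeps f physical; this is one of several possible choices.
        f = np.clip(f + drift * dt + kick, 0.0, 1.0)
    return f

# One noisy trajectory per path; the histogram approximates P(f, t_end).
final_f = simulate_langevin()
counts, edges = np.histogram(final_f, bins=20, range=(0.0, 1.0))
```

Each path plays the role of one coin-tossing run, and the histogram of end points is the empirical PDF that a Fokker-Planck treatment would estimate directly.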

Since $I_{0}$ is extracted from a probability distribution $\Pi(I_{0})$ , typically Gaussian white noise with variance $D$ , the probability $P(f,t)$ for the output $f$ can be written as

[TABLE]

We can then calculate $dP/dt$ , using the property of a Gaussian $I_{0}\Pi(I_{0})=-D\partial\Pi/\partial I_{0}$ and the underlying Markov approximation, to derive the corresponding Fokker-Planck equation

[TABLE]

where the first term on the right of the probability current density $J$ shows the deterministic drift of the PDF towards the local minima of $U$ , while the second term shows the stochastic diffusion that tries to spread out and homogenize $P$ across the set of available $f$ values.

For the steady-state solution ( $t\rightarrow\infty,\partial P/\partial t=0$ ), the value of $J$ is independent of $f$ (Kirchhoff’s Law) and set by boundary conditions for a constant $D$ . This solution is the Boltzmann distribution of the form

[TABLE]

and the initial distribution will show a combination of drift (sliding downhill and narrowing) and diffusion (spreading symmetrically and broadening) to transition over time until it maximizes at steady state to the value where $U$ is the lowest.

The transient behavior of the Fokker-Planck equation is not easy to solve analytically, but we can account for its dominant components. For instance, if we start with a Gaussian initial distribution $P(f,0)=Ce^{-(f-f_{0})^{2}/2\sigma_{0}^{2}}$ , then solving the FPE in the Fourier domain, we can show that over time it will tend to spread as

[TABLE]

where $v_{0}(f)=-(1/\tau_{0})dU/df$ (this only works if we assume the potential varies slowly over the width of the PDF so that the linear expansion of $U$ around the peak of the PDF suffices). The distribution would move us back to $f=0$ if $B>A$ or to $f=1$ if $B<A$ (Fig. 3). The constant $C$ must integrate to the total population, so that $C=N_{p}/\sqrt{2\pi\sigma_{0}^{2}}$ . We can see in this solution both the drift component $v_{0}$ and the diffusion component $D$ playing their respective roles.

This equation shows us two sources of error inherent in the system - the first is the initial reporting error $\sigma_{0}$ which originates at the outset, such as through faulty data gathering, reporting, testing inaccuracies etc. The second is the parametric uncertainty, characterized by $D$ , where over time there is added uncertainty due to the very nature of pandemic spreading and our convoluted response to it. We can thereafter calculate the evolving mean

[TABLE]

which should track the peak at $f_{0}+v_{0}t$ , but with a growing variance $\sigma_{0}^{2}+2Dt$ that will make prediction harder beyond $t_{max}\approx(\epsilon^{2}-\sigma_{0}^{2})/2D$ , where $\epsilon$ is the maximum acceptable error in $f$ .
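The drift and spread quoted above can be checked numerically in the simplest setting. The sketch below assumes a constant drift $v_{0}$ and constant $D$ (the slowly varying potential limit), with noise increments of variance $2D\,dt$ per step; the function name and parameter values are hypothetical. The sample mean should land near $f_{0}+v_{0}t$ and the sample variance near $\sigma_{0}^{2}+2Dt$.

```python
import numpy as np

def spread_gaussian(f0, sigma0, v0, D, t, n_paths=200_000, n_steps=50, seed=1):
    """Propagate a Gaussian cloud under constant drift v0 and diffusion D.

    Assumption: v0 and D do not depend on f, so the cloud stays Gaussian.
    Returns the sample mean and sample variance at time t.
    """
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    # Initial reporting error: Gaussian of width sigma0 around f0.
    f = f0 + sigma0 * rng.standard_normal(n_paths)
    for _ in range(n_steps):
        f += v0 * dt + np.sqrt(2.0 * D * dt) * rng.standard_normal(n_paths)
    return f.mean(), f.var()

mean, var = spread_gaussian(f0=0.3, sigma0=0.05, v0=0.002, D=1e-4, t=50.0)
expected_mean = 0.3 + 0.002 * 50.0          # f0 + v0 t
expected_var = 0.05**2 + 2.0 * 1e-4 * 50.0  # sigma0^2 + 2 D t
```

No clipping to $(0,1)$ is applied here, since the chosen parameters keep the cloud well inside the physical range.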

Let us now connect this uncertainty with the pandemic equation. The easiest way is to introduce Gaussian white noise in the parameters, $A,B\rightarrow A,B+\lambda_{A,B}\eta(t)$ , where $\eta(t)$ is drawn from a normal distribution to mimic the noise ( $\langle\eta(t)\eta(t^{\prime})\rangle=\delta(t-t^{\prime})$ , meaning $\eta$ and $\lambda$ both have units of days$^{-1/2}$ ). For the SIR potential (Eq. 5) this gives an added stochastic force and a corresponding diffusion constant by mapping Eq. 5 to Eqs. 19, 20.

[TABLE]

As expected, the uncertainties at the two fixed points $s=0,1$ are zero, seen also in Fig. 10.

Note that we assumed Gaussian noise for simplicity, but that distribution has infinite support ( $f$ values are unrestricted), while we need to operate within the range $(0,1)$ for $f$ . For a tight standard deviation, and $\langle f\rangle$ lying between $\sim(\sigma,1-\sigma)$ , with $\sigma$ being the standard deviation, we will for the most part see physically meaningful $f$ values, but on occasion we will see unphysical $f$ values that venture out of this range. We can either choose to eliminate those $f$ values, average over them, or switch to a uniform distribution over the range $(0,1)$ , for which a physically intuitive Fokker-Planck equation, however, can be challenging to derive.

In Fig. 11, we apply the Fokker-Planck (FP) equation to Eq. 15 with noise in $A$ . We take $A\rightarrow A+\lambda\eta(t)$ , where $\eta(t)$ is drawn from a normal distribution to mimic the noise. The Fokker-Planck solution is compared to the Monte Carlo results with a random distribution for $A$ . As can be seen from the figure, while diffusion tends to broaden the PDF, because the diffusion constant itself is $s$ dependent, there is a tightening of the distribution around the equilibrium value $s^{*}$ , near which the diffusion constant approaches $[\lambda N_{p}s^{*}(1-s^{*})]^{2}/2$ . This means that the uncertainty initially grows, but beyond peak infection it shrinks as we approach the fixed point.

As an illustration, suppose we have $\beta=0.556$ and take $\tau_{0}=10$ days as the rise time for an infection. We begin with an initial reporting uncertainty $\sigma_{0}=0.1$ (i.e., $10\%$ ). We also assume a parameter uncertainty in $1/\tau_{0}=AN_{p}$ equal to a fraction $\Delta=0.1$ ( $10\%$ again). We can map this uncertainty onto the standard deviation, meaning $\lambda^{2}=\Delta/\tau_{0}$ . The steady state is $s^{*}=0.27$ , so the long-time diffusion constant is $D=[\lambda s^{*}(1-s^{*})]^{2}/2=0.0002$ /day. This means that for the initial uncertainty to balloon up to, say, $\epsilon=0.2$ (20 $\%$ ) will take $t_{max}\approx(\epsilon^{2}-\sigma_{0}^{2})\tau_{0}/\Delta[s^{*}(1-s^{*})]^{2}\approx 75$ days. This analysis is admittedly over-simplified because we start from $s_{0}\approx 1$ where the diffusion constant is also low, and the infection time $\tau$ has a time-dependent stretch that this back-of-the-envelope treatment ignores. However, we have outlined above the tools to calculate $s(t)$ , $\tau(t)$ and do a more rigorous projection, should the need arise. Our estimated $D$ puts a lower bound on the data validity period, since $s(1-s)$ becomes maximum when $s$ reaches $0.5$ , at which point $D\approx 0.0003$ /day.
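The arithmetic in this illustration can be retraced with a short script. One ingredient is an assumption not taken from this section: $s^{*}$ is obtained from the textbook SIR final-size relation $s^{*}=e^{-(1-s^{*})/\beta}$ with $s_{0}\approx 1$, solved by bisection. The function and variable names are hypothetical.

```python
import math

def sir_final_susceptible(beta, tol=1e-12):
    """Nontrivial root of s = exp(-(1 - s)/beta) for 0 < beta < 1.

    Assumption: standard SIR final-size relation with s0 ~ 1. The root
    lies in (0, beta), where g(s) = s - exp(-(1 - s)/beta) changes sign.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta = 1/R0 must lie in (0, 1)")
    lo, hi = 0.0, beta
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid - math.exp(-(1.0 - mid) / beta) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

beta, tau0 = 0.556, 10.0      # values quoted in the text
sigma0, delta, eps = 0.1, 0.1, 0.2

lam_sq = delta / tau0                                 # lambda^2 = Delta / tau0
s_star = sir_final_susceptible(beta)                  # about 0.27
D_star = lam_sq * (s_star * (1.0 - s_star))**2 / 2.0  # about 0.0002 per day
t_max_days = (eps**2 - sigma0**2) / (2.0 * D_star)    # several tens of days
D_peak = lam_sq * (0.5 * 0.5)**2 / 2.0                # D at s = 0.5
```

With unrounded intermediate values the horizon comes out a few days longer than the figure quoted in the text, which uses $D$ rounded to $0.0002$ /day.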

For a multi-peaked solution, we can go a few steps further to estimate the time after which the Brownian particle can jump over the barriers of height $\Delta U$ in the $U(s)$ landscape, following an Arrhenius law $t_{jump}^{-1}=\nu e^{-\Delta U/\tau_{0}D}$ , where the attempt frequency $\nu$ is set by the dynamics in the valleys. Such a jump could transfer the configuration coordinates $s,f$ between metastable states (local minima) until subsequent noisy events can rescue them. We leave such analyses for future publications.

V.2 Measurement uncertainty $\delta$

It is also worth emphasizing that there is an error associated with fitting the solutions of the Fokker-Planck equation to stochastic data over a finite dataset of sample size $N_{s}$ . Based on Gaussian statistics, we can estimate that for an error margin $\epsilon$ (i.e., accuracy probability $1-\epsilon$ ) the acceptable margin of error $(-\delta,\delta)$ around the running average for a finitely sampled set of size $N_{s}$ is given by Manohar (2015)

[TABLE]

For a 5-day averaging period, $N_{s}=5$ , we can say with 95 $\%$ confidence that the data swing is $\delta=1.95$ , meaning there is almost a $200\%$ potential swing in the smoothed data due to finite sample size errors. On the other hand, making the sampling window $N_{s}$ too large has its own problems, as that tends to average over and wash out significant events. Fig. 12 shows the error that builds up if the sampling time runs into hundreds of days. Naturally, we expect an optimal sampling rate between these two limits.
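The trade-off between the two limits can be illustrated on synthetic data. The sketch below does not reproduce the bound cited from Manohar (2015) or the data of Fig. 12; it applies a plain running average of window $N_{s}$ to a hypothetical Gaussian-shaped infection wave and, separately, to white noise, to show that a longer window suppresses noise but also flattens the wave. All names and values are assumptions.

```python
import numpy as np

def moving_average(x, n_s):
    """Running average over a window of n_s consecutive samples."""
    kernel = np.ones(n_s) / n_s
    return np.convolve(x, kernel, mode="valid")

rng = np.random.default_rng(2)
days = np.arange(2000)
# Hypothetical wave of unit height, about 15 days wide, peaking at day 1000.
signal = np.exp(-0.5 * ((days - 1000) / 15.0) ** 2)
noise = 0.2 * rng.standard_normal(days.size)

results = {}
for n_s in (5, 25, 100):
    smoothed_noise = moving_average(noise, n_s)
    residual_rms = float(np.sqrt(np.mean(smoothed_noise**2)))  # leftover jitter
    peak_height = float(moving_average(signal, n_s).max())     # surviving signal
    results[n_s] = (residual_rms, peak_height)
```

In `results`, the residual noise falls as the window grows while the recovered peak height drops well below its true value of 1 for the longest window, which is the washing out of significant events described above.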

VI Conclusions

Predicting pandemics is highly involved, as our knowledge of the underlying causes is often evolving in real time. Simple models provide broad insights, especially if we can find a way to visualize the evolution, develop phenomenological models with epidemiologically meaningful parameters, and have an accompanying error estimate. We have shown how we can visualize the pandemic response as the drift of an overdamped classical Brownian particle in a potential towards its local minima, along with uncertainty related diffusion away from those minima (strong enough uncertainty can diffuse the particle over a barrier into a metastable state). The shape of the potential is controlled partly by intrinsic epidemiology, and partly by the various mitigation strategies and sociological constants at play. Simple multiplicative (Eq. 12) and additive (Eq. 18) model fitting equations have been offered and their fits quantified across various countries. A part of the error arises from initial reporting uncertainty, which is amplified over time by parametric uncertainty except near fixed points where the parameters play a minimal role in the dynamics. The quality of the data itself depends on the sampling time, where our ability to separate signal from noise (slower frequency evolution vs higher frequency random wiggles) poses a limit to the fitting equations and their overall predictability. With simple models and their uncertainties in place, we can focus on behavioral trends with respect to variation of these parameters such as cyclical vs abrupt quarantine measures. While these equations are the equivalent of a macrospin model in magnets (no spatial or geographical variation included), they need to eventually be extended to account for multi-patch solutions. Nonetheless, the quasi-analytical and easily visualizable cluster averages, and in particular the error thresholds, could be of potential use in predicting simple trends and evaluating policy decisions.

VII Acknowledgments

We acknowledge initial discussions with Prof. Keith Williams (UVA, ECE) who suggested the use of Lotka-Volterra and SIR approaches, and later discussions with Prof. Anil Vullikanti (UVA, Biocomplexity Institute). This project was funded by the UG internship program within the SRC-CRISP center, the NSF-REU supplement to the NSF-IUCRC center for Multifunctional Integrated Systems Technology (MIST), and the NSF grant CBET 1802588.

Bibliography


1. Huang and Qiao (2020): N. E. Huang and F. Qiao, Sci Bull (Beijing) 65, 425 (2020).
2. Braca et al. (2021): P. Braca, D. Gaglione, S. Marano, L. M. Millefiori, P. Willett, and K. Pattipati, IEEE Signal Process Lett. 28, 683 (2021).
3. Adiga et al. (2020a): A. Adiga, J. Chen, M. Marathe, H. Mortveit, S. Venkatramanan, and A. Vullikanti, J Indian Inst Sci 100, 900 (2020).
4. Adiga et al. (2020b): A. Adiga, D. Dubhashi, B. Lewis, M. Marathe, S. Venkatraman, and A. Vullikanti, J Indian Inst Sci 100, 793 (2020).
5. Bertozzi et al. (2020): A. L. Bertozzi, E. Franco, G. Mohler, and D. Sledge, PNAS 117, 16732 (2020).
6. Craig et al. (2021): B. R. Craig, T. Phelan, J.-P. Siedlarek, and J. Steinberg, Economic Commentary 2021-10, 1 (2021).
7. Cao and Liu (2021): L. Cao and Q. Liu, arXiv:2104.12556v3 (2021).
8. Shakeel et al. (2021): S. M. Shakeel, N. S. Kumar, P. P. Madalli, R. Srinivasaiah, and D. R. Swamy, Osong Public Health Res Perspect 12, 215 (2021).
