More than presence-absence; modelling (e)DNA concentration across time and space from qPCR survey data
Milly Jones, Eleni Matechou, Diana Cole, Alex Diana, Jim Griffin, Sara Peixoto, Lori Lawson Handley, Andrew Buxton

TL;DR
This paper introduces a new model to estimate DNA concentration from qPCR data, improving accuracy in ecological monitoring.
Contribution
A novel modeling framework that estimates DNA concentration while accounting for contamination, inhibition, and data variability.
Findings
The model improves accuracy in DNA concentration estimation compared to traditional methods.
It effectively handles contamination and inhibition in qPCR data.
The framework was validated through simulations and applied to real-world ecological case studies.
Abstract
Environmental DNA (eDNA) surveys offer a revolutionary approach to species monitoring by detecting DNA traces left by organisms in environmental samples, such as water and soil. These surveys provide a cost-effective, non-invasive, and highly sensitive alternative to traditional methods that rely on direct observation of species, especially for protected or invasive species. Quantitative PCR (qPCR) is a technique used to amplify and quantify a targeted DNA molecule, making it a popular tool for monitoring focal species. Modelling of qPCR data has so far focused on inferring species presence/absence at surveyed sites. However, qPCR output is also informative regarding DNA concentration of the species in the sample, and hence, with the appropriate modelling approach, in the environment. In this paper, we introduce a modelling framework that infers DNA concentration at surveyed sites…
Genes, proteins, chemicals, diseases, species, mutations and cell lines named across the full text — each resolved to its canonical identifier and authoritative record.
Click any figure to enlarge with its caption.
Figure 1
Figure 2
Figure 3
Figure 4
Figure 5
Figure 6
Figure 7
Figure 8- —http://dx.doi.org/10.13039/501100000266Engineering and Physical Sciences Research Council
- —http://dx.doi.org/10.13039/501100000270Natural Environment Research Council
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsEnvironmental DNA in Biodiversity Studies · Identification and Quantification in Food · Genomics and Phylogenetic Studies
Introduction
Environmental DNA (eDNA) is DNA that individuals of a species leave behind in the environment. Therefore, eDNA surveys allow monitoring of species in the wild by targeting detection of their DNA in corresponding physical samples, such as water or soil [45]. eDNA is increasingly becoming a standard application in bio-monitoring, both alongside and independently of traditional survey methodologies [35]. This is particularly the case for protected [4] or invasive [44] species, as eDNA surveys can be more cost effective [13, 37] and provide high probabilities of species detection [26, 31] in an inexpensive, and non-invasive survey approach. Therefore, eDNA surveys are quickly becoming a widely employed sampling method for wildlife populations [35], and models for the corresponding data are increasingly being developed [7, 12, 17, 39].
eDNA surveys comprise of three stages: DNA availability in the environment, DNA collection in environmental samples, and DNA analysis of the samples in the laboratory [17]. The amount of DNA available for collection from the environment is expected to vary spatially and/or temporally, and as a function of landscape and site covariates, with additional stochasticity at the individual site level [19]. During DNA collection, across the surveyed site(s), and at one or more time points, a number of samples are collected from the environment. DNA concentrations in collected samples are noisy observations of the DNA concentration in the environment and can be functions of environmental covariates (such as temperature, rainfall, or pH [19]) or technical covariates (such as collection method [5]). DNA analysis typically relies on PCR (Polymerase Chain Reaction), during which the physical samples are divided into technical replicates, and the DNA in each replicate is amplified using appropriate primers. In the quantitative PCR (qPCR) protocol, DNA copies in a sample are successively amplified through several fluorescence-based PCR cycles. The qPCR process results in an exponential amplification curve that measures the fluorescence signal against the PCR cycle number. The threshold cycle (CT) value is then the fractional cycle number at which point the fluorescence of a sample crosses a threshold, which is set by the corresponding software as the point where the fluorescence signal exceeds the background noise but is still within the exponential growth phase. Should a sample’s fluorescence signal surpass the threshold, then the PCR run is said to be successful (positive), and the sample has amplified. In general, samples with higher amounts of DNA concentration are expected to amplify faster (i.e. in an earlier PCR cycle), and hence have lower CT values [41].
qPCR is a widely used method for monitoring targeted species as it can be tailored to the species of interest by designing species-specific primers [35]. When modelling qPCR data, focus is often on inferring and reporting DNA presence/absence at surveyed sites [2, 17] by only using the information on whether each PCR replicate was positive or not. However, the link between CT and the initial DNA concentration in the sample can be ascertained using standards (samples of known concentration run alongside samples collected from the environment). Modelling log-concentration in the standard as the covariate and the CT value as the response gives a straight regression line with negative slope [30]. Comparing the CT values between standards, which have a known DNA concentration, and CT values from collected physical samples allows inferring DNA concentrations for the latter. Indeed, more recently, there has been a greater effort to infer DNA concentration rather than just presence/absence from qPCR data [39]. Internally the CT values are transformed to give estimates of DNA concentration in samples based on the regression line generated by the standards. These values are often then used to fit models investigating effects of covariates on DNA concentrations in the environment in a two stage design rather than a single model propagating uncertainty through all analysis [6, 29]. One stage models linking CT values to DNA concentrations in the environment include Espe et al. [12] and Shelton et al. [39], though these do not account for all error and noise in the data-generating process discussed below.
Despite best and continuously improving field and lab practices, DNA-based surveys will always lead to noisy and error-prone data. In addition to the variation in DNA concentrations at availability and collection stages, CT values themselves are noisy indicators of the amount of DNA in the sample, as results from PCR runs on the same sample and under the same protocols vary. The regression line between CT values and DNA concentration is expected to vary slightly across PCR assays [41]. Additionally, the dispersion of CT values for a given DNA concentration increases as the concentration of DNA in the sample decreases [14, 27] (in other words, CT values are heteroscedastic). Both the variation in the regression line across plates and the CT heteroscedasticity can be seen in Figure 3 for the standards from one of the case studies presented in this paper, but the pattern is expected in all cases. Furthermore, qPCR analyses are only run for a maximum number of cycles, CT.max. Samples with low concentrations of DNA often therefore fail to amplify despite presence of DNA as their fluorescence signal failed to pass the threshold before CT.max elapsed. In this way, qPCR analyses can experience false negative errors at low concentrations of DNA in samples due to this right censoring of CT values. In addition to the natural variation in CT values described above, PCR analyses may suffer from contamination or inhibition. Contamination may occur in lab settings due to the presence of target species DNA outside of collected samples that enter into replicates on PCR plates. Inhibition occurs when PCR inhibitors interact with the PCR amplification process to reduce the efficiency of the reaction, and in extreme circumstances can prevent amplification even if the target sequence of DNA is present [21]. Failure to account for contamination may lead to biased inferences, such as false positive errors (incorrectly inferring that presence of DNA in the sample comes from the environment) or high DNA concentrations in the environment. Similarly, failure to account for inhibition could lead to biased inferences about low DNA concentrations [18] or a false negative error (incorrectly inferring absence of DNA in the sample).
Currently, different ad-hoc measures are taken to deal with suspected contamination or inhibition in samples, PCRs, or plates, but these differ between labs or research protocols, are arbitrary, and result in data loss. Currently, a CT value shift of over 2 cycles [42], 3 cycles [20], or 5 cycles [40], in the IPC (internal positive controls) of the environmental sample can be considered evidence of inhibition. When potential inhibition is identified, the affected sample is often diluted and re-analysed with a correction factor to account for the dilution, however the dilution may also result in a failure of samples to amplify [18]. To account for contamination, often, if there is a detection of DNA in negative controls (field blanks, extraction, or no template controls), then any positive detections associated with the sampling occasion or plate are discarded [18, 22, 37]. This may present a loss of information and wasted effort. Alternatively, rather than discard samples, the maximum average concentration of DNA associated with negative field controls can be used as a baseline amount of DNA to subtract from samples collected in the environment [28]. It has been suggested that increasing the number of samples taken or technical replicates analysed may help make inferences more robust, as in occupancy studies [7, 18], but there is a need for a single framework to estimate DNA concentration across surveyed sites, accounting for contamination and inhibition, without discarding samples or resorting to ad-hoc rules of thumb.
Previous work on modelling false negative errors due to low DNA concentrations includes using right censoring at CT.max [12], or using a hurdle model (modelling the probability of amplification and then distribution of CT values conditional on amplification [39]). Within an occupancy framework, Guillera-Arroita et al. [15] and Griffin et al. [17] account for both false positive and false negative errors by treating the data as binary successes and failures. However, little attention has been paid to accounting for contamination and inhibition while estimating DNA concentrations without discarding samples. Further, whilst the relationship between CT values and log-DNA has been established [30] and modelled [12, 39], few models account for the heteroscedasticity in CT values across log-DNA (Matz et al. [27] do so for qRT-PCR with a Poisson log-normal model). In particular, if accounting for contamination or inhibition in samples, it becomes necessary to understand whether differences in CT values are due to the increased variation at lower concentrations or due to error.
We present a model that links CT to DNA concentration (building from one stage models as in Espe et al. [12]; Shelton et al. [39]), and include a collection stage to model variation in collected DNA in samples across a site, allowing for covariates at both the site and sample collection stages [39]. Additionally, we include a temporal model on the available DNA across sites. Similar models at this stage include work by Shelton et al.[39] who consider a spatially smooth function on log-DNA over their coastal site. Finally, we account for contamination or inhibition of technical replicates at the PCR analysis stage, and allow for these to be identified and incorporated into the model in such a way as to mitigate potential biases in inferred DNA concentration (particularly when DNA concentration is low and so these effects are more keenly felt). We also account and correct for the heteroscedasticity in the distribution of CT values across DNA concentrations.
The paper is structured as follows. The model is presented in Section 2. A simulation study is used in Section 3 to compare, over a range of survey designs, the full model to one model ignoring contamination/inhibition and another ignoring CT variance heterogeneity. We illustrate how the model can be utilised via three case studies in Section 4. The first surveys zebra mussels (Dreissena polymorpha) in aquatic systems across the UK. This survey covers twenty sites (five sites across four different aquatic environments - lakes, rivers, canals, and reservoirs), with each site being visited only once between July and August 2021. The second surveys zebra mussels in the River Hull and Eccup Reservoir, with repeat visits to each site once a month from December 2020 to November 2021. The final study surveys great crested newts (Triturus cristatus) in eight ponds at the University of Kent campus, with repeat fortnightly visits from February to October 2015.
Model
The data consist of DNA sampled from the environment at n sites and across T time points. Let \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$M_{it}$$\end{document} be the number of samples taken from the environment at site i, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$i = 1, \cdots , n$$\end{document} , and time t, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t = 1, \cdots , T$$\end{document} . We collect site-specific covariates \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X^b$$\end{document} , and sample-specific covariates \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X^w$$\end{document} . In the lab, the m-th sample from site i and time t is divided into \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$K_{imt}$$\end{document} PCR replicates (also called technical replicates). Each replicates k is then analysed on some PCR plate p, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p = 1, \cdots , P$$\end{document} , during a PCR run. We denote by \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C_{imtk}$$\end{document} the cycles to threshold (CT) value for replicate k. PCR runs have a maximum CT, which we denote CT.max, after which the run is ended. In what follows, i indexes the site, t the time, m the sample, k the replicate, and p the plate.
Our model (see Figure 1) is divided into 3 stages: DNA availability, DNA collection, and DNA analysis, as is standard for models for DNA-based data of this type [39]. The first stage models the log-DNA concentration at each site and time point, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l_{i,t},$$\end{document} as a function of site-specific covariates and the DNA concentration at the previous time point. The second stage models the log-DNA concentration in each sample, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$v_{imt}$$\end{document} , as a function of the amount of DNA available and of sample-specific covariates. The last stage models \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C_{imtk}$$\end{document} for each replicate. The expected CT values, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu _{imtk}$$\end{document} , are a function of the DNA concentration in the sample, where replicates with greater concentrations of DNA are expected to amplify faster. The variability of CT about \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu _{imtk}$$\end{document} is also dependent on the DNA concentration in the sample, so that the distribution of CT is heteroscedastic. The variation in CT values decreases as the concentration of CT increases (see for example Figure 3b). A replicate may also experience contamination or inhibition. In the case of contamination, the CT value is expected to be smaller than \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu _{imtk}$$\end{document} due to the presence of additional DNA. For inhibition, the CT value is expected to be larger than \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu _{imtk}$$\end{document} due to factors interfering with the amplification of DNA. Some replicates may fail to amplify before CT.max elapses, either due to low concentration of DNA or inhibition, and these result in values we denote by NA. The corresponding model for each stage is described below.
DNA availability We model \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l_{i,t}$$\end{document} , as a latent AR(1) process plus exogenous predictors [34]:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} l_{i,t} \sim \text {N} (\rho _i (l_{i,t-1} - X^b_{i, t-1} {\varvec{\beta }}_b) + X^b_{i,t}{\varvec{\beta }}_b, \tau ^2), \end{aligned}$$\end{document}where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\beta }}_b$$\end{document} is the vector of coefficients for the covariates, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X^b_{i,t}$$\end{document} , of the corresponding sampling occasion, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau ^2$$\end{document} models the noise across time (assumed to be constant across sites), and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho _{i}$$\end{document} is the AR(1) rate coefficient for site i. Therefore, we model \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l_{i,t}$$\end{document} as the sum of a latent AR(1) process and the effect of predictors at the time of sampling. The latent AR(1) process is independent of the predictors; in other words \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho _i$$\end{document} is the growth term of the latent process, which is obtained by subtracting the predictors from \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l_{i,t}$$\end{document} at a sampling occasion. In order to estimate the temporal terms \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho _i$$\end{document} , we borrow information across sites using a hierarchical model so that:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \rho _i&\sim \text {N}(\rho _0, \sigma ^2_\rho ), \hspace{2cm} \text {for } i = 1, \cdots , n, \end{aligned}$$\end{document}for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho _0$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma ^2_\rho $$\end{document} the mean and noise across sites, respectively.
For \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t=1$$\end{document} (or when \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T=1$$\end{document} so that we only have a single time point), we let:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} l_{i,1} \sim \text {N}({\varvec{\beta }}_{b,0} + X^b_{i,1} {\varvec{\beta }}_b, \tau _1^2), \end{aligned}$$\end{document}where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau ^2_1$$\end{document} models the variation across sites at \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t=1$$\end{document} (rather than across time) and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\beta }}_{b,0}$$\end{document} is the mean log-DNA across sites at \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$t=1$$\end{document} .
DNA collection Given \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l_{i,t}$$\end{document} , we then model \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$v_{imt}$$\end{document} as follows:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} v_{imt} \sim \text {N}(l_{i,t} + X^w_{imt}{\varvec{\beta }}_w, \sigma ^2), \end{aligned}$$\end{document}where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\beta }}_w$$\end{document} is the vector of covariate coefficients for the sample-specific covariates \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$X^w_{imt}$$\end{document} , and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma ^2$$\end{document} is the noise across samples. Unlike Diana et al. [10], we do not explicitly model false negative (inhibition) or positive (contamination) errors at the sample collection stage, and assume that the noise term \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma ^2$$\end{document} adequately accounts for these events. We return to this assumption in Section 5.
DNA analysis We assume that DNA in the sample is uniformly distributed, so that each replicate \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$k = 1, \cdots , K_{imt}$$\end{document} from the same sample has concentration \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$w_{imt} = \exp (v_{imt})$$\end{document} . To account for the replicates that fail to amplify before CT.max, we use a right censoring model for the CT values [12]. We first present a model for the uncensored CT values, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tilde{C}_{imtk}$$\end{document} , and then censor these at CT.max to get the observed CT values \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C_{imtk}$$\end{document} .
We model \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tilde{C}_{imtk}$$\end{document} as linear in log-DNA, with plate-specific regression coefficients, with log-variance as linear in log-DNA, and allow for contamination and inhibition with a mixture model. For the mixture model, we introduce a latent indicator variable, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma _{imtk}$$\end{document} , such that:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \gamma _{imtk} = {\left\{ \begin{array}{ll} 1 \hspace{1cm} \text {replicate contaminated with probability } p_c,\\ 2 \hspace{1cm} \text {replicate inhibited with probability } p_h,\\ 0 \hspace{1cm} \text {neither with probability } 1- p_c-p_h, \end{array}\right. } \end{aligned}$$\end{document}where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c$$\end{document} is the probability a replicate is contaminated and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_h$$\end{document} is the probability a replicate is inhibited. We assume that a replicate cannot be simultaneously contaminated and inhibited, and that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_h$$\end{document} are equal across all PCR runs.
Conditional on \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma _{imtk}$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tilde{C}_{imtk}$$\end{document} is then modelled as:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \tilde{C}_{imtk}&\sim {\left\{ \begin{array}{ll} \textrm{N}(\mu _{imtk}, \sigma ^2_y(w_{imt})) & \text {if } \gamma _{imtk} = 0,\\ \textrm{TN}_{0, \mu _{imtk}}(\mu _{imtk}, \sigma ^2_c) & \text {if } \gamma _{imtk} = 1,\\ \textrm{TN}_{\mu _{imtk}, \infty }(\mu _{imtk}, \sigma ^2_c) & \text {if } \gamma _{imtk} = 2,\\ \end{array}\right. }\\ \mu _{imtk}&= \alpha ^1_{p} + \alpha ^2_{p} \log (w_{imt}), \end{aligned}$$\end{document}where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hbox {TN}_{a,b}(\mu , \sigma ^2)$$\end{document} is the normal distribution with mean \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu $$\end{document} and variance \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma ^2$$\end{document} , truncated between a and b. The plate-specific regression coefficients, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha ^1_p$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha ^2_p$$\end{document} , share information across the P plates using a hierarchical model:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \alpha ^1_p&\sim \text {N}(\alpha ^1_0, \sigma ^2_{\alpha }) \hspace{1cm} \text {for } p = 1, \cdots , P, \\ \alpha ^2_p&\sim \text {N}(\alpha ^2_0, \sigma ^2_{\alpha }) \hspace{1cm} \text {for } p = 1, \cdots , P. \end{aligned}$$\end{document}When \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma _{imtk} = 0$$\end{document} , to account for the heteroscedasticity of CT values, we follow the method of Cook and Weisberg [9] and write:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \log (\sigma _y^2(w_{imt})) = a_1 + a_2 \log (w_{imt}), \end{aligned}$$\end{document}where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_1$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_2$$\end{document} are the regression coefficients for log-variance against log-DNA.
The truncated normal distribution is used to capture the expected behaviour of contaminated ( \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma _{imtk} = 1$$\end{document} ) and inhibited ( \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\gamma _{imtk}=2$$\end{document} ) replicates. For example, as discussed earlier, we expect contaminated replicates to amplify faster than the expected value \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu _{imtk}$$\end{document} (the expected value when no contamination is present). In both cases, we take some large \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma _c^2$$\end{document} (in practice we take \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma _c$$\end{document} about the order of CT.max) which enables \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tilde{C}_{imtk}$$\end{document} to take a wide range of values within the appropriate interval. We expect contamination or inhibition to be rare, or for the distributions of affected samples to vary widely across sampling occasions and PCR runs. This approach (similar to a variance-inflation method [25]) means that we need not learn the distribution of contaminated or inhibited replicates as in a mean-shift method. The mixture model then allows for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_h$$\end{document} to be estimated separately.
Finally, given \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tilde{C}_{imtk}$$\end{document} and CT.max, the observed \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C_{imtk}$$\end{document} is modelled as:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} C_{imtk} = {\left\{ \begin{array}{ll} \tilde{C}_{imtk} \hspace{1cm} & \text {if } \tilde{C}_{imtk} < \text {CT.max},\\ \text {NA} \hspace{1cm} & \text {otherwise.} \end{array}\right. } \end{aligned}$$\end{document}On each plate p, alongside replicates from samples collected in the environment, are standards (replicates of known amount of log-DNA). Analogous to the samples collected during sampling occasions, we denote by \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$w^{\star }_{s}$$\end{document} the amount of DNA, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$C^{\star }_{s}$$\end{document} the CT, and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p^{\star }_s$$\end{document} the plate in which standard s is analysed. The DNA analysis model for standards is the same as for the environmental replicates, except \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$w^{\star }_s$$\end{document} is known and does not need to be learnt. In this way, the standards help inform the model PCR analysis parameters (CT regression, CT heteroscedasticity, and probability of contamination and inhibition parameters).
We summarise the full model in Figure 1, highlighting the different analysis stages for samples collected from the environment. In Figure 2, we present the directed acyclic graph (DAG) of the full model (including the analysis of standards), showing the relationships between variables.Fig. 1. The full model highlighting the three modelling stages: DNA availability, DNA collection, and DNA analysisFig. 2Directed acyclic graph representing the relationships between the variables in the model
We implemented the model in NIMBLE [11] in R [32], and all results presented in the paper are obtained using NIMBLE. Model code is available on https://github.com/millyljones/Spatio-temporal-eDNA/tree/main. We include a constraint within the MCMC that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c + p_h < 1 - (p_c + p_h)$$\end{document} . In other words, the probability a replicate is contaminated or inhibited cannot be greater than the probability that it is not so. At each iteration of the MCMC, if proposed probabilities \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_h$$\end{document} violate this constraint, then both the proposed probabilities are rejected.
Simulation Study
Ignoring false positive and false negative errors can have an impact on inference of model parameters. For example, Buxton et al. [7] found that ignoring false positive errors in an occupancy study resulted in overestimation of occupancy probabilities. They also found that increasing replication (both in terms of number of samples M and technical replicates K) reduced bias and posterior credible interval (PCI) width for model parameters in their occupancy study. In this section, we present a simulation study that investigates the effect of ignoring contamination and inhibition at the PCR analysis stage for DNA concentration studies. We consider a range of study designs (varying the number of samples M and replicates K) to compare their effect on the estimation of model parameters. We also investigate the effect of ignoring CT heteroscedasticity, as this has not been considered by previous models.
Let Model 1 be the full model described in Figure 1. Let Model 2 be as Model 1, except that instead of modelling variance of CT values using \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma _y^2(w_{imt})$$\end{document} , the variance of CT values, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma ^2_P$$\end{document} , is held constant across the plate the replicates are analysed on. Let Model 3 be as Model 1, except that we take \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c = p_h = 0$$\end{document} , so that we ignore contamination and inhibition.
Across the simulations, we assume that there are \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$n=10$$\end{document} sites, each visited \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$T = 20$$\end{document} times. All samples from a single sampling occasion are analysed on a single, distinct plate. For the standards, on each plate, we take \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$K^{\star } = 3$$\end{document} replicates of seven concentrations, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$3\times 10^z$$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$z = 1, \cdots , 7$$\end{document} . We consider a range of sample designs. We consider taking \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$M = 1, 2, 5,$$\end{document} or 10 samples at each site, and then consider using \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$K = 1, 2, 5, \text {or} 10$$\end{document} replicates per sample in the analysis. At each site we observe 2 covariates, one continuous ( \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sim $$\end{document} N(0,1)) and one binary ( \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sim $$\end{document} Bern(0.5)), and set \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\beta }}_b = (1, -1)$$\end{document} respectively. With each sample we also observe 2 covariates, one continuous ( \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sim $$\end{document} N(0,1)) and one binary ( \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sim $$\end{document} Bern(0.5)), and set \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\beta }}_w = (1, -1)$$\end{document} respectively. For each contaminated replicate, we model the amount of contamination, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\lambda $$\end{document} , using a normal distribution with mean \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$3\times 10^3$$\end{document} and standard deviation 100. \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\lambda $$\end{document} is then added to the amount of DNA in the replicate, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$w_{imt}$$\end{document} . For inhibited replicates, we delay the amplification process proportionately to the amount of DNA in the sample. We let the expected CT value for inhibited samples indicate that the DNA concentration is 90% lower than the true amount. We discuss the choice of distributions for contamination and inhibition in Section 5. For the first time point at each site, we let \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\beta }}_{b,0} = 6$$\end{document} , and then let \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho _i = 1$$\end{document} for all sites. The variances \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau ^2, \tau ^2_1, \sigma ^2 = 1$$\end{document} , and the CT variance parameters are set to \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_1 = 0.2, a_2 = -0.25$$\end{document} . The plate regression parameters \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha ^1_p$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\alpha ^2_p$$\end{document} are drawn from \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\text {N}(44, 0.1)$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\text {N}(-1.7, 0.01)$$\end{document} respectively. The maximum cycle number CT.max is set to 40.
In the first set of simulations, we let \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(p_c, p_h) = (0.05, 0.1)$$\end{document} , and in the second set we let \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(p_c, p_h) = (0.01, 0.02)$$\end{document} . These two cases compare the model’s performance under substantial and then very small contamination and inhibition. We take the probability of inhibition to be greater than contamination as this is what is more commonly observed in practice. For example Griffin et al. [17] found false negative and false positive error probabilities of 19% and 5% during the laboratory analysis stage. We also find that \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_h$$\end{document} is generally greater than or similar to (with overlapping PCIs) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c$$\end{document} in our case studies presented in Section 4.
For each simulation we generate and analyse \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$N = 100$$\end{document} data sets. In Section 3.1, we compare the posterior summaries of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l_{i,t}$$\end{document} and other model parameters across the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$N = 100$$\end{document} data sets. Details about prior distributions and MCMC parameters for each simulation can be found in Section S1.
Simulation results
We denote by \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l_{i,t, j}$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tilde{l}_{i,t, j}$$\end{document} the true and posterior mean log-DNA concentrations respectively for site i, at time t, in simulation j. For N simulations, we compute the mean square error (MSE) as:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \text {MSE} = \frac{1}{NnT} \sum _{j=1}^N \sum _{i=1}^n \sum _{t=1}^T (l_{i,t,j} - \tilde{l}_{i,t,j})^2. \end{aligned}$$\end{document}We denote by \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\theta _j$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tilde{\theta _j}$$\end{document} the true value and the posterior mean of some parameter in the model for simulation j. Then the mean bias (MB) is:
\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\begin{aligned} \text {MB} = \frac{1}{N} \sum _{j=1}^N (\tilde{\theta _j} - \theta _j) \end{aligned}$$\end{document}The 95% PCIs are computed given the 2.5% and 97.5% quantiles of the posterior distribution.Table 1. Mean square error (MSE), mean range of 95% PCIs (R), and mean coverage of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l_{i,t}$$\end{document} (C) across Model 1 (full model), Model 2 (constant CT variance), and Model 3 (ignoring contamination and inhibition) under different sampling designs. Probability of contamination and inhibition \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(p_c, p_h) = (0.05, 0.1)$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(p_c, p_h) = (0.01, 0.02)$$\end{document} \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(p_c, p_h) = (0.05, 0.1)$$\end{document} Model 1Model 2Model 3M = 1MSERCMSERCMSERCK = 11.2123.7260.9301.2193.6360.9221.4933.9760.928K = 20.8583.3120.9400.8423.1830.9311.0713.5100.933K = 50.7763.0310.9330.7902.8920.9231.0123.1820.924K = 100.7463.0130.9420.7832.8360.9201.5773.2610.938M = 2MSERCMSERCMSERCK = 10.7152.9530.9360.7112.8750.9320.8983.150.934K = 20.6052.6360.9390.6122.5350.9310.9002.7970.925K = 50.4422.4270.9480.5022.3200.9300.5872.5790.936K = 100.4982.4020.9430.5892.2630.9194.8552.8980.931M = 5MSERCMSERCMSERCK = 10.4242.2160.9360.4322.1550.9290.6732.4270.923K = 20.3121.9600.9460.3301.8650.9330.4922.1460.935K = 50.2491.7900.9500.3041.7090.9321.9382.1790.940K = 100.2611.7500.9460.3611.6510.92112.4392.5770.895M = 10MSERCMSERCMSERCK = 10.2681.7610.9370.2841.6960.9300.6011.9750.921K = 20.2051.5060.9450.2431.4300.9300.4441.7810.946K = 50.1741.4080.9460.2651.3380.91910.2362.3480.896K = 100.1931.3800.9440.3301.3010.91018.4752.3850.817 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(p_c, p_h) = (0.01, 0.02)$$\end{document} Model 1Model 2Model 3M = 1MSERCMSERCMSERCK = 10.9143.3090.9340.9153.2530.9310.9753.3910.937K = 20.7323.1560.9450.7093.0570.9410.7983.2030.944K = 50.6822.9620.9350.7222.8400.9220.7733.0010.932K = 100.6482.9250.9410.7062.7740.9220.7302.9580.936M = 2MSERCMSERCMSERCK = 10.5602.5990.9440.5602.5550.9400.6112.6750.944K = 20.4952.4970.9450.5142.4140.9370.5682.5620.942K = 50.5102.4210.9440.5432.3020.9290.5872.4670.939K = 100.4862.3670.9400.5452.2380.9170.9662.4710.937M = 5MSERCMSERCMSERCK = 10.3231.9400.9430.3411.8860.9360.4312.0160.937K = 20.3151.8690.9440.3291.7880.9320.4411.9270.939K = 50.2681.7560.9480.3501.6730.9250.7471.8590.942K = 100.2541.7410.9430.3711.6380.9092.3822.0990.937M = 10MSERCMSERCMSERCK = 10.2081.5340.9480.2281.4800.9380.3411.6140.936K = 20.2051.4360.9480.2311.3720.9300.3311.5170.941K = 50.1781.3820.9450.2431.3120.9191.2241.5990.938K = 100.1851.3500.9440.3201.2760.9077.4192.0550.909
Table 1 shows results for MSE, mean width of 95% PCIs, and corresponding mean coverage for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l_{i,t}$$\end{document} in the cases \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(p_c, p_h) = (0.05, 0.1)$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(p_c, p_h) = (0.01, 0.02)$$\end{document} . Tables S2 and S3 show results for MB, mean width of 95% PCIs, and mean proportion of PCIs containing zero for the site and sampling coefficients \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\beta }}_b$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\beta }}_w$$\end{document} for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(p_c, p_h) = (0.05, 0.1)$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(p_c, p_h) = (0.01, 0.02)$$\end{document} respectively. Tables S4 and S5 show the same for the log-variance parameters \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_1, a_2$$\end{document} , and the probabilities \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_h$$\end{document} .
In Table 1 we can see that for Model 1 the larger improvements in MSE and PCIs come when increasing either M or K from 1 to 2. In other words, replication in either collection of samples or in the PCR analysis yields improvements in both the posterior means and credible intervals of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l_{i,t}$$\end{document} . Increasing M or K beyond 2 decreases MSE and narrows PCIs, but with diminishing returns. Increasing the number of samples M has a more considerable effect on the MSE than increasing K, but comes with an increased cost of effort in the field. Models 2 and 3 have lower coverage on average than Model 1. Model 2 underestimates the variability in the CT values, and so does not account for the full uncertainty in the data-generating process, leading to narrower PCIs that have smaller coverage. In Table 1, where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(p_c, p_h) = (0.05, 0.1)$$\end{document} , Model 3 does not account for errors in the PCR stage of analysis, and so fails to correctly account for contamination and inhibition, and so has much higher MSE, wider PCIs, and lower coverage. In fact for Model 3 the MSE increases for increasing K as there is more chance for samples to have a replicate experience either contamination or inhibition, leading to more false positive and negative errors, and greater bias in the analysis. Under the simulation parameters we have investigated, the effect of ignoring contamination and inhibition leads to worse outcomes than the effect of ignoring CT heteroscedasticity. We return to this in Section 5 as other simulation parameters may lead to different conclusions.
In Tables S2 and S3, we can see that increasing M and K reduces mean bias in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\beta }}_b$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\beta }}_w$$\end{document} , and reduces the width of PCIs, increasing power to detect important covariate effects. As with log-DNA, the improvement when increasing M is greater than when increasing K, but at the cost of greater effort in the field. Under our simulation parameters, where binary coefficients ( \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\beta }}_b[2], {\varvec{\beta }}_w[2]$$\end{document} ) are drawn from a Bernoulli distribution with probability 0.5, then a greater amount of replication is needed to detect covariate effects than for the continuous covariates. Model 1 generally has the smallest bias in the mean values of the covariate coefficients, and Model 3 generally has the largest bias and widest credible intervals (though when \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_h$$\end{document} are small then this difference is smaller).
In Tables S4 and S5, for both the high and low contamination/inhibition cases, and across the survey designs, there is underestimation of the contamination probability \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c$$\end{document} in Model 1. The model’s ability to detect contaminated replicates relies on the concentration of added DNA being high enough to considerably increase the CT value of that replicate. The higher the concentration of the DNA in a sample, the smaller the effect of contamination, and so in our simulation not all contamination is detectable. So this underestimation was to be expected. There is similarly often a small negative bias in the probability of inhibition \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_h$$\end{document} , though due to the way these replicates were simulated, the model was able to detect these more often. As a consequence of the small underestimation of probabilities \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_h$$\end{document} , the intercept on the log-variance, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_1$$\end{document} , is slightly overestimated, as replicates that were contaminated or inhibited, but not labelled as such, have pushed up the variance slightly. Increasing the number of technical replicates K does help to reduce the positive bias in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_1$$\end{document} , and generally also improves the posterior means of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_2, p_c,$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_h$$\end{document} whilst reducing the width of the 95% PCIs. In Table S5, where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_h$$\end{document} are negligible, then Model 1 still provides reasonable posterior means for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_h$$\end{document} , so the model still performs well if levels of contamination and inhibition are very small. Model 2 generally has comparable posterior means and PCIs to Model 1 for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_h$$\end{document} . Model 3 has large biases in estimates of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_1$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_2$$\end{document} , even in the case where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_h$$\end{document} are low.
Case studies
We consider three case studies: zebra mussels (Dreissena polymorpha) across multiple sites but a single time point in Section 4.1; zebra mussels across two sites and multiple time points in Section 4.2; great crested newts (Triturus cristatus) at a single site and multiple time points in Section 4.3.
Zebra mussels: single time point
eDNA samples were collected from n = 20 sites within England, during July and August of 2021, with zebra mussels (Dreissena polymorpha) as the target species. At each site, M = 10 samples were collected and sampling locations were chosen based on safety and accessibility to the water. For running waters, like canals and rivers, samples were collected over a 1 km stretch and evenly spaced, when possible, while for standing waters, such as lakes and reservoirs, sampling was conducted around the perimeter of the site. A full list of the sampled sites can be found in Table S6.
Water was filtered through an enclosed NatureMetrics filter using a 100 mL luer-lock syringe until the filter clogged, and DNA was preserved with Longmire’s buffer. The samples were extracted using a modified DNeasy Blood & Tissue kit (Qiagen) and tested for inhibition with the TaqMan Exogenous Internal Positive Control (Fisher Scientific). These tests did not indicate evidence of inhibition in the samples. Species-specific qPCRs were conducted using, with minor modifications, the cytochrome b assay described in Gingera et al. [16] on a StepOnePlus Real-Time PCR machine. Samples were considered positive if their signal intersected the threshold line defined by the software, and the cycle at which that intersection occurred corresponded to the CT value of the sample.
At each site and sampling location, water chemistry data such as water temperature, pH, turbidity and conductivity were recorded using a portable meter (HI-98130, Hanna Instruments), and calcium levels were obtained using a calcium meter (LAQUATwin Calcium Ion Ca-11 meter, Camlab). In addition, the depth at each sampling point was also recorded, as well as information on substrate type, which was divided into four categories - boulders (B), gravel (G), silt (S), and sand (SA).
Each sample was analysed in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$K = 6$$\end{document} replicates. All replicates from the same site were analysed in the same PCR plate, therefore \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p = i$$\end{document} . For the standards, on each plate, we take \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$K^{\star } = 3$$\end{document} replicates of seven concentrations, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$3\times 10^z$$\end{document} copies/ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu $$\end{document} L, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$z = 1, \cdots , 7$$\end{document} .
Figure 3a demonstrates the linear relationship between log-concentration and CT for the standards by fitting a simple regression for each of the 20 plates (linear modelling using dglm() [36]). The fitted lines show that the PCR efficiency varies slightly between PCR runs, motivating the hierarchical model on the plate-specific regression coefficients. Figure 3b shows that the spread of the residuals of the fitted CT values increases as the log-concentration in the standards decreases, and we account for this heteroscedasticity by modelling CT variance conditionally on log-DNA, as described in Section 2.Fig. 3. Zebra mussels: single time point. (a) CT values from standards against log-DNA in the standard. Each line is the linear fit of CT against log-DNA for each of the 20 plates (with each plate corresponding to a single site) used to analyse the samples. (b) Residuals of the linear regression with CT as the response and log-DNA concentration as the covariate
The 20 sites were taken from 4 different environments: river, reservoir, lake and canal (with 5 sites for each environment). We use the environment as a covariate for DNA concentration at the site. The covariates included in the model for DNA collection were volume, temperature, calcium, depth, and substrate. Details of model implementation can be found in Section S2.1.2 and details of prior distributions are included in Table S7.
Results for the posterior distributions of covariate coefficients \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\beta }}_b$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\beta }}_w$$\end{document} are reported in Figure 4. Figure 4b shows that, as expected, volume of water filtered has a positive effect on the collection rate of DNA. For the environment types shown in Figure 4a and the other environmental factors shown in Figure 4b, the inclusion of zero within the 95% PCIs of coefficients suggest that the data do not provide strong evidence that these covariates have a non-zero effect. The results for the log-DNA at each site are shown in Figure 5. There is substantial variability in inferred DNA concentrations between sites of the same environmental type (though the canals show the smallest between-site variation). The difference between these DNA concentrations is likely due to site-specific environmental characteristics that were not recorded or used in the model. Canals were the most consistent in terms of habitat (for example in substrate type) amongst the environment types, which may explain the smaller amount of variation between canals. For lakes L1 and L2 (Eight Acre Lake and Farnham Lake respectively), the sampling area was smaller than the other lakes (due to accessibility restrictions), and for L2 in particular, almost all sampling was near pontoons where recreational activities occur, which is a known vector for dispersal for this species. As a result, the area from which the samples were collected likely contained high biomass of species, resulting in higher DNA availability when compared to the other lakes. This was further evidenced by the observation of numerous live and deceased organisms at the sampling locations at Farnham Lake (L2). Similarly, DNA concentrations for Ri3 (River Thames) are much higher than the other rivers in the study likely because the Thames has a higher presence of boat traffic and recreational activities.Fig. 4. Zebra mussels: single time point. Posterior means (circles) and 95% PCIs (bars) of (a) site-specific covariate coefficients and (b) sample-specific covariate coefficients. (a) Covariates are: environment types (river, reservoir (Res), and lake). (b) Covariates are: substrate types (boulders (B), gravel (G), and silt (S)), depth, calcium (calc), temperature (temp), and volume (vol)Fig. 5. Zebra mussels: single time point. Posterior means (circles) and 95% PCIs (bars) of the log-DNA (copies/ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu $$\end{document} L) concentration across the 20 sites. The environments for each site are shown on the right label. See Table S6 for the names of sites corresponding to the labels on the left
Table 2 shows the posterior means and 95% PCIs for the CT variance parameters, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_1, a_2$$\end{document} , the probabilities of contamination and inhibition, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c$$\end{document} , and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_h$$\end{document} , and the DNA availability and collection standard deviations, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau $$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma $$\end{document} .Table 2. Zebra mussels: single time point. Posterior means and 95% PCIs of variance parameters, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_1, a_2$$\end{document} , probabilities of contamination and inhibition \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c, p_h$$\end{document} , and DNA availability and collection standard deviations \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau $$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma .$$\end{document} ParameterMean95% PCI \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_1$$\end{document} 0.535(0.308, 0.756) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_2$$\end{document} -0.398(-0.419, -0.377) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c$$\end{document} 0.019(0.013, 0.027) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_h$$\end{document} 0.011(0.006, 0.018) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau _1$$\end{document} 2.090(1.492, 2.940) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma $$\end{document} 1.399(1.260, 1.554)
The posterior means for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c$$\end{document} (0.019 [0.013, 0.027]) and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_h$$\end{document} (0.011 [0.006, 0.018]) are very small. These probabilities are similar to the settings used in the second set of simulations where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(p_c, p_h) = (0.01, 0.02)$$\end{document} . The results in Table 1 show that ignoring contamination and inhibition (even when these are small), or ignoring CT heteroscedasticity leads to an increase in MSE and lower nominal coverage particularly when M and K are large, as we have in this case study.
Zebra mussels: multiple time points
eDNA samples were collected from n = 2 sites in Yorkshire (England), every month, between December 2020 and November 2021, where the target species was the zebra mussel (Dreissena polymorpha), as in Section 4.1. At each time point (i.e. month) and each site, M = 10 samples were collected from the shoreline, with sampling locations chosen based on safety and accessibility to the water. At Eccup Reservoir, sampling locations were selected to maximise the perimeter of the reservoir sampled, while at the River Hull samples were collected over a 1 kilometre stretch (i.e. collecting one sample approximately every 100 meters) in a publicly accessible area. Sampling locations were always the same for each time point at both sites.
At each sampling occasion, 2L water samples were collected into a sterile plastic bottle, stored in a sterile cool box with ice packs and transported to the laboratory on the same day of collection. All water samples were vacuum-filtered within 24h of collection in a dedicated laboratory, using two 0.45 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu $$\end{document} m cellulose filters (47 mm, Cytiva Whatman Mixed Cellulose Ester Membranes; Fisher Scientific, UK) per sample. The volume filtered for each sample was recorded, and filters were stored at - \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$20^\circ $$\end{document} C until DNA extraction. Samples were extracted using the water protocol described in Sellers et al. [38], and tested for inhibition with the TaqMan Exogenous Internal Positive Control (Fisher Scientific). This test did not indicate evidence of inhibition in the samples. Species-specific qPCR reactions were performed following the protocol described in Section 4.1.
At each time point and at each of the ten sampling locations at both sites, water chemistry information (temperature, pH, turbidity, conductivity and calcium) was recorded as described in Section 4.1. Calcium data in December, and turbidity and conductivity data in February, are missing due to technical problems with the probes in those months. Water levels were also recorded each month at both sites, by checking a reverse water depth gauge board installed at Eccup Reservoir, and by retrieving data from a monitoring station close to the sampling locations at the River Hull.
Each sample was analysed in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$K=6$$\end{document} technical replicates. All the replicates from the same site and same time point were analysed in the same PCR plate for a total of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$P = 24$$\end{document} plates. For the standards, on each plate, we take \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$K^{\star } = 3$$\end{document} replicates of seven concentrations, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$3\times 10^z$$\end{document} copies/ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu $$\end{document} L, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$z = 1, \cdots , 7$$\end{document} . Figures exploring the linear fit between log-DNA in standards and CT value, and the residuals, can be found in Figure S1. The covariates included in the model for DNA collection were volume, pH, and calcium. Details of model implementation can be found in S2.2.2, and details of prior distributions are included in Table S9.
Results for the posterior distributions of covariate coefficients \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\beta }}_w$$\end{document} are reported in Table 3. We can see that, as in Section 4.1, volume has a positive impact on DNA collection rate. The inclusion of zero within the 95% PCIs of the other covariate coefficients suggests the data do not provide enough evidence these covariates have a non-zero effect on the amount of DNA collected given the amount of DNA available. Figure 6 shows the log-DNA means and 95% PCIs for the two sites. Eccup Reservoir on average has higher log-DNA than the River Hull. In flowing waters, such as rivers, there is a larger dispersion and dilution effect of DNA compared to standing water, such as in reservoirs, which generally leads to lower DNA concentrations. In both sites, the log-DNA experiences a peak in the summer months before dropping over the autumn period. The end of spring and the summer period corresponds to the species’ reproductive season, and as such DNA availability increases.Table 3. Zebra mussels: multiple time points. Mean and 95% PCIs for the sample covariate coefficients (volume, pH, and calcium)CovariateMean95% PCIvolume0.401(-0.034, 0.844)pH-0.032(-0.298, 0.235)calcium0.160(-0.445, 0.787)
Fig. 6. Zebra mussels: multiple time points. Posterior means (circles) and 95% PCIs (bars) of the log-DNA (copies/ \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu $$\end{document} L) concentrations available at Eccup Reservoir and River Hull between December 2020 and November 2021
Table 4 shows the posterior means and 95% PCIs for the CT variance parameters, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_1, a_2$$\end{document} , the probabilities of contamination and inhibition, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c, p_h$$\end{document} , the DNA availability AR(1) terms \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho _1, \rho _2$$\end{document} , and the DNA availability and collection standard deviations, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau $$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau _1,$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma $$\end{document} . As with the previous case study (Section 4.1), the posterior means for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c$$\end{document} (0.004 [0.002, 0.008]) and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_h$$\end{document} (0.028 [0.018, 0.039]) are very small and are similar to the settings used in the second set of simulations where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(p_c, p_h) = (0.01, 0.02)$$\end{document} . Table 1 shows that ignoring contamination and inhibition (even when these are small), or ignoring CT heteroscedasticity leads to an increase in MSE and lower nominal coverage particularly when M and K are large, as we have in this case study.Table 4. Zebra mussels: multiple time points. Posterior means and 95% PCIs of variance parameters, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_1, a_2$$\end{document} , probabilities of contamination and inhibition \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c, p_h$$\end{document} , DNA availability AR(1) terms \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho _1, \rho _2$$\end{document} , and DNA availability and collection standard deviations \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau $$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau _1$$\end{document} , and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma .$$\end{document} ParameterMean95% PCI \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_1$$\end{document} 1.328(1.150, 1.505) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_2$$\end{document} -0.406(-0.424, -0.387) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c$$\end{document} 0.004(0.002, 0.008) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_h$$\end{document} 0.028(0.018, 0.039) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho _1$$\end{document} 0.973(0.787, 1.155) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho _2$$\end{document} 0.900(0.613, 1.177) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau $$\end{document} 2.279(1.665, 3.134) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau _1$$\end{document} 1.113(0.536, 2.386) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma $$\end{document} 1.242(1.122, 1.381)
Great crested newts
eDNA samples were collected from a single site comprising of 8 ponds in close proximity to each other at the University of Kent campus every 14 days, between 26 February and 22 October 2015. The target species was great crested newts (Triturus cristatus). At each sampling occasion a number of samples were taken (M varying between 8 and 24 samples on each occasion), and then analysed in \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$K=8$$\end{document} replicates.
Data collection is described in detail in Buxton et al. [6]. Ethanol precipitation eDNA collection method was used alongside 0.7 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu $$\end{document} m glass-microfiber syringe filters and 0.7 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu $$\end{document} m cellulose acetate syringe filters. Ethanol precipitation followed methodologies outlined in Biggs et al. [4], collecting 0.09L of sample water, while the two filter methods up to 1L of water was filtered stopping at the point a filter became blocked.
DNA was extracted using Qiagen (R) DNeasy Blood and tissue kits following the protocols outlined in Buxton et al. [6], with qPCR conducted on a Biorad CFX connect Real-Time PCR machine using the primers and hydrolysis probe published in Thomsen et al. [43] and PCR assay and cycle conditions published by Biggs et al. [4]. qPCR standards were created from a serial dilution of a great crested newt tissue extract, quantified using a Qubit® 2.0 flurometer (Life TechnologiesTM, Carlsbad, California, USA) with Double Stranded DNA High Sensitivity Kit following manufacturers’ instructions, qPCR negative controls were also included in each run.
A total of 28 plates were used, where each plate consisted only of samples collected on the same sampling occasion, though samples from that occasion may take up several plates. On each plate, there were \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$K^\star = 3$$\end{document} replicates of 3 concentrations, ranging between 0.03 and 7.9 \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu $$\end{document} g/mL. Figures exploring the linear fit between log-DNA in standards and CT value, and the residuals of the corresponding linear regression, can be found in Figure S2.
The covariates included for DNA availability were mean weekly temperature ( \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$^\circ $$\end{document} C) and mean weekly rainfall (mm). The three DNA collection methods were included as covariates on collection: Ethanol (E) and Glass Microfiber (GF), using Cellulose (C) as the reference level. The pond number of each sample was also included as a covariate on DNA collection (with pond 1 as the reference level). Details of model implementation can be found in S2.3.2, and details of prior distributions are included in Table S11.
Figure 7b shows that, using pond 1 as a reference level, ponds 4 and 7 have a negative impact on DNA collection. For the DNA availability covariates shown in Figure 7a and for the other pond effects and DNA collection methods shown in Figure 7b, the inclusion of zero within the 95% PCIs of coefficients suggests the data do not provide strong evidence that these covariates have a non-zero effect.
We can see the results of the amount of log-DNA available at the site in Figure 8 (we use base 10 here for comparison with Buxton et al. [6]). DNA concentration increases over the summer months and then quickly decreases over autumn, concurrent with larval metamorphosis and emergence from the ponds into the terrestrial environment.Fig. 7. Great crested newts: Posterior means (circles) and 95% PCIs (bars) of (a) site-specific covariate coefficients and (b) sample-specific covariate coefficients. (a) Covariates are mean weekly temperature (Temp), mean weekly rainfall (mm) (Rain). (b) Covariates are: indicators for ponds 2 to 8 (P2 - P8) with P1 as the reference level, and DNA collection method, (Glass Microfiber (GF) and Ethanol (E)) with Cellulose as the reference levelFig. 8Great crested newts: Posterior means (circles) and 95% PCIs (bars) of the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\hbox {log}_{10}$$\end{document} -DNA ( \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mu $$\end{document} g/mL) concentrations between February 2015 and October 2015
Table 5 shows the posterior means and 95% PCIs for the CT variance parameters, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_1$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_2$$\end{document} , the probabilities of contamination and inhibition, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_h$$\end{document} , the DNA availability AR(1) term \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho $$\end{document} , and the DNA availability and collection standard deviations, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau $$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau _1,$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma $$\end{document} . The posterior means for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c$$\end{document} (0.006 [0.003, 0.011]) and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_h$$\end{document} (0.031 [0.019, 0.044]) are very small, and are similar to the simulation parameters in which \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$(p_c, p_h) = (0.01, 0.02)$$\end{document} . Table 1 shows that the effect of ignoring contamination, inhibition, or heteroscedasticity leads to an increase in MSE and a reduction in nominal coverage, particularly when M and K were large.Table 5. Great crested newts: Posterior means and 95% PCIs of variance parameters, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_1, a_2$$\end{document} , probabilities of contamination and inhibition \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c, p_h$$\end{document} , DNA availability AR(1) terms \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho $$\end{document} , and DNA availability and collection standard deviations \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau $$\end{document} , \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau _1$$\end{document} , and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma .$$\end{document} ParameterMean95% PCI \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_1$$\end{document} -3.046(-3.249, -2.833) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$a_2$$\end{document} -0.316(-0.341, -0.290) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_c$$\end{document} 0.006(0.003, 0.011) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$p_h$$\end{document} 0.031(0.019, 0.044) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho $$\end{document} 1.009(0.934, 1.080) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau $$\end{document} 1.251(0.817, 1.876) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau _1$$\end{document} 0.886(0.421, 2.048) \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma $$\end{document} 1.332(1.186, 1.499)
Discussion
qPCR methods are widely used for monitoring species distributions. This paper provides a statistical framework for the corresponding data, linking CT values to the concentration of species’ DNA in the environment across time and space whilst accounting for covariates. The model allows for contamination and inhibition of replicates during the PCR stage, and accounts for heteroscedasticity in CT values over log-DNA concentration. This is a unifying framework, propagating uncertainty through all stages of analysis, DNA availability, collection, and PCR analysis. Whilst analyses can be conducted on the back-transformed CT values directly [6], these often fail to account for the full data-generating process (see Section S3 for comparison with linear mixed effects models). We use extensive simulation studies, under different survey designs, to show that ignoring contamination and inhibition can lead to biased inferences about DNA concentration, and that ignoring CT heteroscedasticity leads to over-confident inferences that do not have the nominal coverage. These simulations also highlight the need for replication at both the sample and replicate level [7]. We apply the model to three case studies, one for a protected species and two for an invasive species.
Our model relies on a number of justifiable, and in some cases necessary, assumptions. Specifically, we assume that replicates from the same sample contain the same concentration of DNA; any variation between replicates in reality is then absorbed into the variation of CT values. Sharing the CT variation regression coefficients across plates helps to borrow information, though this assumption could be relaxed to model plate effects in a hierarchical model (as we do for the CT mean regression coefficients). We assume that no replicate can be simultaneously contaminated and inhibited, and this assumption is necessary for the model to be identifiable. Currently we also only allow for contamination or inhibition in individual replicates (rather than at the sample level), where contaminated or inhibited replicates are identified by being outliers with respect to other CT values associated with that sample. The other potential source of contamination in these studies is during the collection of samples at the site [3], or during the processing of samples [8, 23]. Unless samples are taken from the same location at the same time, or there are grounds for sensible assumptions about the distribution of (log-)DNA across a site, it may be difficult to identify contaminated samples under this method (as samples with vastly different amounts of log-DNA may be due to some un-modelled heterogeneity in the distribution of log-DNA across the site). Because we do not model contamination or inhibition at the collection stage, we rely on the noise term \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma ^2$$\end{document} to capture any resulting shifts in the CT values associated with that sample. However this modelling framework relies on good practice in the field for that to be a reasonable assumption. Large amounts of contamination or inhibition would lead to significant shifts in CT values and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma ^2$$\end{document} would be insufficient to cover the increased uncertainty. For the case study of zebra mussels at a single time point, during sampling at each site, negative field controls were taken and analysed to test for contamination at the collection stage. For six sites, the negative field controls contained replicates that amplified, indicating potential contamination of all samples collected at those sites. Our model does not account for this, however the amount of contamination in the negative field controls was negligible when compared to the amount of DNA in the collected samples. For very small amounts of contamination at this stage, the noise term for DNA collection \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma ^2$$\end{document} is thought to be enough to account for this presence of additional DNA. A simulation study of the effect of unaccounted sample contamination and inhibition can be found in Section S5. Tables S21 and S22 show that posterior distributions for \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l_{i,t}$$\end{document} and covariate coefficients \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\beta }}_b$$\end{document} and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\beta }}_w$$\end{document} are fairly robust to small levels of inhibition and contamination at the collection stage, but that bias and uncertainty increase as these levels increase. To relax these assumptions, the inclusion of results from negative field controls in the model could help us extend the model to account for more sources of contamination, as the probability and distribution of contamination during the sample collection stage would be better understood. However, negative field controls are not always conducted: the zebra mussels case studies both used negative field controls (only the single time point survey indicated contamination), but the great crested newt study did not.
Our model takes the concentration of DNA \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l_{i,t}$$\end{document} to be constant throughout the site. The amount of DNA in the collected sample, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$v_{imt}$$\end{document} , is conditional on \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l_{i,t}$$\end{document} , and stochasticity can be explained through the noise term \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma ^2$$\end{document} or sample-specific covariates. If however \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l_{i,t}$$\end{document} varies significantly throughout the site, and in particular over the sampling locations, then this variation will appear through the noise term \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma ^2$$\end{document} . If the variation of DNA concentration throughout the site can be explained by covariates, and these are included in the model for DNA collection, then the \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${\varvec{\beta }}_w$$\end{document} coefficients will indicate that the DNA in samples varies according to those covariates. Therefore in the case where \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l_{i,t}$$\end{document} varies considerably throughout the site, then inferred \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$l_{i,t}$$\end{document} values can be interpreted as the average DNA concentration across sampling locations at the site and over the sampling covariates. For example, in the great crested newt case study (Section 4.3), we used pond as an effect on collection rate, and inferred that DNA in collected samples varied by pond number. However, if DNA concentrations in each pond differ, then these pond coefficients may in fact be indicating heterogeneity in DNA across the site, rather than an effect on collection rate. Currently, this model cannot differentiate between these two sources of variation in DNA concentration in samples.
In our simulation study, the mean concentration of contaminant \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\lambda $$\end{document} was such that it would not be detectable across all contaminated replicates. This is because the amount of contamination is only relevant depending on the amount of DNA already in the sample. For example a small concentration of contaminant DNA in a replicate that already has a high concentration of DNA is not going to significantly affect the CT values associated with that replicate. This was chosen to more closely mimic how contamination may affect analysis in real-world applications - where contamination is primarily a concern for low DNA concentration samples rather than abundant samples. Similarly, inhibition was modelled in such a way as to delay the amplification process proportionately to the amount of DNA in the sample, so that inhibition is more detectable in DNA abundant samples compared to DNA sparse samples. Despite the small underestimation of probabilities of contamination and inhibition in the model, our simulation study shows that not accounting for these processes leads to worse outcomes.
This model only estimates the amount of DNA present in the environment at time of sampling, rather than an estimate of species abundance at a site. Linking inferred DNA concentration to species abundance would require knowledge about DNA shedding rates, estimates of how long DNA persists in the environment, the effects of environmental factors, species habitat use, etc. These would then need expert knowledge to interpret how DNA availability links back to changes in species abundance at a site. Species detection/non-detection or counts can be integrated into this model in order to investigate the relationship between DNA concentration in the environment and species abundance. For example, Buxton et al. [6] show that contributions to DNA in the ponds vary seasonally with the different life stages of great crested newts. During the breeding season, the adult population changes very little, but DNA concentration increases due to breeding activities. Post breeding activity, DNA concentration increases as larval abundance increases, but the adult population decreases as individuals return to land. Therefore linking DNA concentration to adult population abundance is complicated by the seasonal behaviour of the species.
We had a small number of covariates available in our case studies, and a subset of the covariates that were not strongly correlated with each other were included in the model in each case study, but no formal model selection was carried out. In case studies with large numbers of covariates, covariate choice and combinations may be best implemented within a Bayesian variable selection approach [17]. We also assume that log-DNA concentrations are a linear function of covariates, but could consider extensions to more flexible models or interactions subject to data availability and quality. The model currently treats \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta _{b,0}$$\end{document} as an initial condition rather than a long term mean value, meaning that log-DNA concentrations over time are primarily determined by covariates. We also have not enforced \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$|\rho | < 1$$\end{document} , and do not assume stationarity, as eDNA can rapidly accumulate or decay under certain conditions. In our case studies, posterior means of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\rho $$\end{document} are close to 1, indicating strong temporal dependence. Alternative parametrisations for the temporal model could incorporate an explicit intercept at each time step, imposing a long term mean structure on log-DNA concentrations. This could be useful if prior knowledge suggests DNA levels fluctuate around some baseline rather than being driven by covariates. Additionally, enforcing stationarity may be appropriate in cases where long term persistence is expected.
Our case studies also each only considered a small number of sites, and so our model did not consider any spatial correlation between the sampled sites, and instead only focused on accounting for fixed effects. qPCR methods are used on a much larger scale (Buxton et al. [2] analyses qPCR results for great crested newts on a national scale in England), where more sophisticated spatial models may be considered for the DNA availability stage.
Our simulation study considered a wide range of survey designs (different levels of replication in M and K), three models (where one model ignores CT heteroscedasticity and one ignores contamination and inhibition), and two levels of contamination and inhibition probabilities (one high and one low). Further simulations could be considered to compare the effects of increasing or decreasing noise at all stages of the analysis, or by increasing the slope of the log-variance of CT values to see if this increases the bias of results from Model 2. We could also vary the concentration of the contaminant DNA, or change the effect of inhibition to see how these affect the model’s ability to detect these affected replicates.
This paper provides a general framework for inferring DNA concentration and quantifying covariate effects whilst accounting for both potential contamination or inhibition of replicates and the heteroscedasticity in CT values obtained from qPCR analyses. Simulation results highlight the importance of replication in both the number of samples M and the number of PCR replicates K. qPCR continues to be the preferred monitoring tool for single species monitoring, being used for large scale monitoring of elusive species such as great crested newts [33], invasive species such as zebra mussels [1], and even for investigating ancient DNA spanning centuries [24].
Supplementary information
Supplementary material gives additional information on prior distributions and posterior summaries for simulations and all case studies presented in this paper. It also contains a comparison of Model 1 to a linear mixed effects model, a prior sensitivity analysis, and a simulation study with sample contamination.
Supplementary Information
Below is the link to the electronic supplementary material.Supplementary file 1 (pdf 547 KB)
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Buxton AS, Groombridge JJ, Zakaria NB, Griffiths RA (2017) Seasonal variation in environmental DNA in relation to population size and environmental factors. Sci Rep 7(1):46294. 10.1038/srep 4629410.1038/srep 46294 PMC 538549228393885 · doi ↗ · pubmed ↗
- 2R Core Team (2018) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. R Foundation for Statistical Computing. https://www.R-project.org/
- 3Rees HC (2023) An evidence review for great crested newt e DNA monitoring protocols. https://publications.naturalengland.org.uk/publication/6200035650568192
- 4Smyth G, Dunn PK, Corty RW (2023) Dglm: Double Generalized Linear Models. R package version 1.8.6. https://CRAN.R-project.org/package=dglm
- 5Thermo Fisher Scientific (2016) Real-Time PCR: Understanding Ct. https://www.thermofisher.com/content/dam/Life Tech/Documents/PD Fs/PG 1503-PJ 9169-CO 019879-Re-brand-Real-Time-PCR-Understanding-Ct-Value-Americas-FHR.pdf. Accessed: 2024-07-28
- 6Turner CR, Miller DJ, Coyne KJ, Corush J (2014) Improved methods for capture, extraction, and quantitative assay of environmental DNA from Asian bigheaded carp (Hypophthalmichthys spp.). PLOS ONE 9(12), 114329. 10.1371/journal.pone.011432910.1371/journal.pone.0114329 PMC 425625425474207 · doi ↗ · pubmed ↗
