The Jensen Effect and Functional Single Index Models: Estimating the   Ecological Implications of Nonlinear Reaction Norms

Zi Ye; Giles Hooker; Stephen Ellner

arXiv:1901.01864·stat.ME·December 17, 2019

The Jensen Effect and Functional Single Index Models: Estimating the Ecological Implications of Nonlinear Reaction Norms

Zi Ye, Giles Hooker, Stephen Ellner

PDF

Open Access

TL;DR

This paper introduces a new method to estimate and test the ecological impact of environmental variability on species using a functional single index model, focusing on the Jensen Effect to understand nonlinear responses.

Contribution

It develops a novel approach for estimating the Jensen Effect directly from observational data, enhancing understanding of nonlinear environmental responses in ecology.

Findings

01

Method performs well in simulations

02

Effective on real ecological data

03

Provides insights into species coexistence mechanisms

Abstract

This paper develops tools to characterize how species are affected by environmental variability, based on a functional single index model relating a response such as growth rate or survival to environmental conditions. In ecology, the curvature of such responses are used, via Jensen's inequality, to determine whether environmental variability is harmful or beneficial, and differing nonlinear responses to environmental variability can contribute to the coexistence of competing species. Here, we address estimation and inference for these models with observational data on individual responses to environmental conditions. Because nonparametric estimation of the curvature (second derivative) in a nonparametric functional single index model requires unrealistic sample sizes, we instead focus on directly estimating the effect of the nonlinearity, by comparing the average response to a…

Tables1

Table 1. Table 1: Simulation results with ( λ g , λ β ) subscript 𝜆 𝑔 subscript 𝜆 𝛽 (\lambda_{g},\lambda_{\beta}) selected by GCV. Values in the Table are averages over 100 simulations.

	g1		g2		g3
Initial	True	equal	True	equal	True	equal
RSE	1.1213	0.6417	0.5385	0.6980	0.7608	0.7024
RASE(1)	0.0921	0.0800	0.0490	0.0730	0.0706	0.0764
RASE(2)	5.2517	3.0516	2.9079	4.1328	5.4393	1.2170

Equations74

G = g (E) + ϵ,

G = g (E) + ϵ,

E = \int X (t) β (t) d t,

E = \int X (t) β (t) d t,

Y = g (\int X (t) β (t) d t) + ϵ .

Y = g (\int X (t) β (t) d t) + ϵ .

δ = g [E (E)] - E [g (E)]

δ = g [E (E)] - E [g (E)]

Y = g (\int X (t) β (t) d t) + ϵ

Y = g (\int X (t) β (t) d t) + ϵ

g (s) = ϕ^{⊤} (s) d,

g (s) = ϕ^{⊤} (s) d,

β (t) = ψ^{⊤} (t) c,

β (t) = ψ^{⊤} (t) c,

2 PLS ≐

2 PLS ≐

=

\hat{β} (t) = ψ^{⊤} (t) \hat{c},

\hat{β} (t) = ψ^{⊤} (t) \hat{c},

\overset{g}{^}_{i} ≐ \overset{g}{^} (\int X_{i} \hat{β}) = ϕ^{⊤} [(\int X_{i} (t) ψ^{⊤} (t) d t) \hat{c}] \hat{d},

\overset{g}{^}_{i} ≐ \overset{g}{^} (\int X_{i} \hat{β}) = ϕ^{⊤} [(\int X_{i} (t) ψ^{⊤} (t) d t) \hat{c}] \hat{d},

GCV (λ) ≐ \frac{\frac{1}{n} ∥ ( I - S ( λ ) ) Y ∥ ^{2}}{[ \frac{1}{n} tr ( I - S ( λ ) ) ] ^{2}},

GCV (λ) ≐ \frac{\frac{1}{n} ∥ ( I - S ( λ ) ) Y ∥ ^{2}}{[ \frac{1}{n} tr ( I - S ( λ ) ) ] ^{2}},

δ = \frac{1}{n} i = 1 \sum n g (\int X_{i} β) - g (\int \overset{ˉ}{X} β),

δ = \frac{1}{n} i = 1 \sum n g (\int X_{i} β) - g (\int \overset{ˉ}{X} β),

\overset{g}{^}_{λ} (e) = ϕ (e)^{⊤} (Φ^{⊤} Φ + λ P)^{- 1} Φ^{⊤} Y .

\overset{g}{^}_{λ} (e) = ϕ (e)^{⊤} (Φ^{⊤} Φ + λ P)^{- 1} Φ^{⊤} Y .

δ_{λ} = \frac{1}{n} \sum \overset{g}{^}_{λ} (E_{i}) - \overset{g}{^}_{λ} (\overset{ˉ}{E}) = a^{⊤} Φ^{+} (Φ^{⊤} Φ + λ P)^{- 1} Φ^{⊤} Y = u_{λ} Y,

δ_{λ} = \frac{1}{n} \sum \overset{g}{^}_{λ} (E_{i}) - \overset{g}{^}_{λ} (\overset{ˉ}{E}) = a^{⊤} Φ^{+} (Φ^{⊤} Φ + λ P)^{- 1} Φ^{⊤} Y = u_{λ} Y,

t_{λ} = \frac{u _{λ} Y}{σ ^ u _{λ} u _{λ}^{⊤}}

t_{λ} = \frac{u _{λ} Y}{σ ^ u _{λ} u _{λ}^{⊤}}

Σ (λ_{1}, λ_{2}) = \frac{u _{λ_{1}} u _{λ_{2}}}{∥ u _{λ_{1}} ∥ ∥ u _{λ_{2}} ∥}

Σ (λ_{1}, λ_{2}) = \frac{u _{λ_{1}} u _{λ_{2}}}{∥ u _{λ_{1}} ∥ ∥ u _{λ_{2}} ∥}

i

i

i_{- 1}

v

\hat{δ} (λ_{g}, λ_{β}) = a^{⊤} v = a^{⊤} ϕ (i) \hat{d}

\hat{δ} (λ_{g}, λ_{β}) = a^{⊤} v = a^{⊤} ϕ (i) \hat{d}

= a^{⊤} ϕ (i) (ϕ (i_{- 1})^{⊤} ϕ (i_{- 1}) + λ_{g} P_{g})^{- 1} ϕ (i_{- 1})^{⊤} Y .

\displaystyle\left(\begin{array}[]{c c}\bigtriangledown_{\bm{d}}\\ \bigtriangledown_{\bm{c}}\end{array}\right)=\left(\begin{array}[]{c}\Phi^{\top}\left\{\bm{Y}-\Phi\bm{d}\right\}+\lambda_{g}\mathbb{P}_{g}\bm{d}\\ \Psi^{\top}\text{diag}\left\{\Phi^{(1)}\bm{d}\right\}\left\{\bm{Y}-\Phi^{\top}\bm{d}\right\}+\lambda_{\beta}\mathbb{P}_{\beta}\bm{c}\end{array}\right)=\left(\begin{array}[]{c}\mathbb{Z}_{g}\\ \mathbb{Z}_{\beta}\end{array}\right)(\bm{Y}-\Phi\bm{d})+\left(\begin{array}[]{c}\lambda_{g}\mathbb{P}_{g}\bm{d}\\ \lambda_{\beta}\mathbb{P}_{\beta}\bm{c}\end{array}\right)

\displaystyle\left(\begin{array}[]{c c}\bigtriangledown_{\bm{d}}\\ \bigtriangledown_{\bm{c}}\end{array}\right)=\left(\begin{array}[]{c}\Phi^{\top}\left\{\bm{Y}-\Phi\bm{d}\right\}+\lambda_{g}\mathbb{P}_{g}\bm{d}\\ \Psi^{\top}\text{diag}\left\{\Phi^{(1)}\bm{d}\right\}\left\{\bm{Y}-\Phi^{\top}\bm{d}\right\}+\lambda_{\beta}\mathbb{P}_{\beta}\bm{c}\end{array}\right)=\left(\begin{array}[]{c}\mathbb{Z}_{g}\\ \mathbb{Z}_{\beta}\end{array}\right)(\bm{Y}-\Phi\bm{d})+\left(\begin{array}[]{c}\lambda_{g}\mathbb{P}_{g}\bm{d}\\ \lambda_{\beta}\mathbb{P}_{\beta}\bm{c}\end{array}\right)

\displaystyle\mathbb{H}=\left(\begin{array}[]{cc}\mathbb{Z}_{g}^{\top}\mathbb{Z}_{g}+\lambda_{g}\mathbb{P}_{g}&\mathbb{Z}_{g}^{\top}\mathbb{Z}_{\beta}\\ \mathbb{Z}_{\beta}^{\top}\mathbb{Z}_{g}&\mathbb{Z}_{\beta}^{\top}\mathbb{Z}_{\beta}+\lambda_{\beta}\mathbb{P}_{\beta}\end{array}\right).

\displaystyle\mathbb{H}=\left(\begin{array}[]{cc}\mathbb{Z}_{g}^{\top}\mathbb{Z}_{g}+\lambda_{g}\mathbb{P}_{g}&\mathbb{Z}_{g}^{\top}\mathbb{Z}_{\beta}\\ \mathbb{Z}_{\beta}^{\top}\mathbb{Z}_{g}&\mathbb{Z}_{\beta}^{\top}\mathbb{Z}_{\beta}+\lambda_{\beta}\mathbb{P}_{\beta}\end{array}\right).

\displaystyle\text{cov}\left(\begin{array}[]{c c}\bm{d}\\ \bm{c}\end{array}\right)=\hat{\sigma}^{2}\mathbb{H}^{-1}\left(\begin{array}[]{c c}\mathbb{Z}_{g}\\ \mathbb{Z}_{\beta}\end{array}\right)^{\top}\left(\begin{array}[]{c c}\mathbb{Z}_{g}\\ \mathbb{Z}_{\beta}\end{array}\right)\mathbb{H}^{-1}

\displaystyle\text{cov}\left(\begin{array}[]{c c}\bm{d}\\ \bm{c}\end{array}\right)=\hat{\sigma}^{2}\mathbb{H}^{-1}\left(\begin{array}[]{c c}\mathbb{Z}_{g}\\ \mathbb{Z}_{\beta}\end{array}\right)^{\top}\left(\begin{array}[]{c c}\mathbb{Z}_{g}\\ \mathbb{Z}_{\beta}\end{array}\right)\mathbb{H}^{-1}

\overset{σ}{^}^{2} = \frac{1}{df _{res}} i = 1 \sum n [Y_{i} - \overset{g}{^} (\int X_{i} \hat{β})]^{2}

\overset{σ}{^}^{2} = \frac{1}{df _{res}} i = 1 \sum n [Y_{i} - \overset{g}{^} (\int X_{i} \hat{β})]^{2}

df_{res} = n - 2 tr (S) + tr (S S^{⊤})

df_{res} = n - 2 tr (S) + tr (S S^{⊤})

\mathbb{S}\doteq\mathbb{S}\left(\lambda_{g},\lambda_{\beta}\right)=\left(\begin{array}[]{c c}\mathbb{Z}_{g}\\ \mathbb{Z}_{\beta}\end{array}\right)\mathbb{H}\left(\mathbb{Z}_{g}^{\top},\ \mathbb{Z}_{\beta}^{\top}\right)

\mathbb{S}\doteq\mathbb{S}\left(\lambda_{g},\lambda_{\beta}\right)=\left(\begin{array}[]{c c}\mathbb{Z}_{g}\\ \mathbb{Z}_{\beta}\end{array}\right)\mathbb{H}\left(\mathbb{Z}_{g}^{\top},\ \mathbb{Z}_{\beta}^{\top}\right)

t ≐ t (λ_{g}, λ_{β}) ≐ \frac{δ ^ ( λ _{g} , λ _{β} )}{sd [ δ ^ ( λ _{g} , λ _{β} ) ]} .

t ≐ t (λ_{g}, λ_{β}) ≐ \frac{δ ^ ( λ _{g} , λ _{β} )}{sd [ δ ^ ( λ _{g} , λ _{β} ) ]} .

sd [\hat{δ} (λ_{g}, λ_{β})] = {[a^{⊤} ϕ (i)] cov (\hat{d}) [a^{⊤} ϕ (i)]^{⊤}}^{\frac{1}{2}} .

sd [\hat{δ} (λ_{g}, λ_{β})] = {[a^{⊤} ϕ (i)] cov (\hat{d}) [a^{⊤} ϕ (i)]^{⊤}}^{\frac{1}{2}} .

u_{λ} = a^{⊤} ϕ (i) (ϕ (i_{- 1})^{⊤} ϕ (i_{- 1}) + λ P)^{- 1} ϕ (i_{- 1})^{⊤},

u_{λ} = a^{⊤} ϕ (i) (ϕ (i_{- 1})^{⊤} ϕ (i_{- 1}) + λ P)^{- 1} ϕ (i_{- 1})^{⊤},

X (t) = i = 1 \sum 25 ξ_{i} ψ_{i} (t),

X (t) = i = 1 \sum 25 ξ_{i} ψ_{i} (t),

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsEcology and Vegetation Dynamics Studies · Plant and animal studies · Animal Ecology and Behavior Studies

Full text

**The Jensen Effect and Functional Single Index Models:

Estimating the Ecological Implications of Nonlinear Reaction Norms

** Zi Ye111PhD Candidate, Department of Statistical Science, Cornell University, [email protected], Giles Hooker222Associate Professor, Department of Statistical Science, Cornell University, [email protected], Stephen Ellner333Professor, Department of Ecology and Evolutionary Biology, Cornell University, [email protected]

Abstract

This paper develops tools to characterize how species are affected by environmental variability, based on a functional single index model relating a response such as growth rate to environmental conditions. In ecology, the curvature of such responses are used, via Jensen’s inequality, to determine whether environmental variability is harmful or beneficial, and differing nonlinear responses to environmental variability can contribute to the coexistence of competing species.

Here, we address estimation and inference for these models with observational data on individual responses to environmental conditions. Because nonparametric estimation of the curvature (second derivative) in a nonparametric functional single index model requires unrealistic sample sizes, we instead focus on directly estimating the effect of the nonlinearity, by comparing the average response to a variable environment with the response at the expected environment, which we call the Jensen Effect. We develop a test statistic to assess whether this effect is significantly different from zero. In doing so we re-interpret the SiZer method of Chaudhuri and Marron (1999) by maximizing a test statistic over smoothing parameters. We show that our proposed method works well both in simulations and on real ecological data from the long-term data set described in Drake (2005).

1 Introduction

In natural ecosystems, environmental conditions are highly variable over time and space (e.g., Vasseur and McCann, 2007) and many classical questions in ecology and evolution are therefore concerned with the potential consequences of this variation. Two important topics have been how species’ traits and life histories evolve so that species can persist in environments that may be favorable at some times and unfavorable at others (see Cohen (1966) and Koons et al. (2008)). Differing nonlinear responses to environmental variability can contribute to maintaining the biodiversity of competing species, allowing them to coexist stably (see for example Hutchinson (1961); Chesson and Warner (1981); Ellner (1987); Chesson (1994, 2000b, 2000a)). Nonlinear responses to environmental conditions are also important for forecasting responses to climate change, as environmental variability can either increase population growth rate (Drake (2005); Koons et al. (2009)) or decrease it (Lewontin and Cohen (1969)), depending on the shape of the norm of reaction between environmental variables and the components of population growth rate (survival, growth, and reproduction).

The goal of this paper is to develop methods for determining the effect of environmental variability on some component of population growth. Within mathematical biology, this is a result of the curvature of the model; from Jensen’s inequality convex functions result in higher growth under a variable environment, than when the environment is held constant at its mean, and vice versa for concave functions. However, common statistical models for species growth make parametric assumptions that pre-determine these effects. For example, exponential models will always return convex results, while Michaelis-Menten saturation effects produce concave relationships. Recent statistical research in semi-parametric or nonparametric methods inspires us to understand the effect of environmental variation via nonparametric models that do not impose these assumptions. Specifically, we consider spline-based methods (see Wood (2000), Ramsay (2006), Ramsay et al. (2009) and Ruppert et al. (2003)) to predict nonlinear responses under environmental fluctuation.

The nonparametric model considered here is

[TABLE]

where $\bm{G}$ and $\bm{E}$ are the growth (or future size) and environment of an organism. The function $g$ is the link function to be estimated, and $\epsilon$ is random error. We assume that the environmental $\bm{E}$ is described by a climate history, such as temperature or precipitation observed at a fine time-scale, such as daily resolution. Following Teller et al. (2016), these are thought of as functional covariates, leading to representation of $\bm{E}$ as a functional linear term

[TABLE]

where $\beta\left(t\right)$ is the coefficient function to be estimated, and $X\left(t\right)$ is the observed climate history.

Combining $\left(\ref{model}\right)$ and $\left(\ref{env}\right)$ , a functional model for observed species growth is

[TABLE]

This is the Functional Single Index model introduced in Chen et al. (2011) and Ma (2016). The functional single index model is an extension of the single index model to a functional covariate via $\int X\left(t\right)\beta\left(t\right)\mathrm{d}t$ . This model allows a nonlinear relationship between response $Y$ and covariate function $X\left(t\right)$ . In addition, it improves stability in estimating the link function $g$ through imposing smoothness on $\beta\left(t\right)$ .

With estimates of $g$ and $\bm{E}$ , we wish to assess the impact of environmental variability. Within mathematical biology, this is done by an analysis of $g^{\prime\prime}$ : an always positive second derivative corresponds to $g$ being convex and increased growth under environmental variability due to Jensen’s inequality: $Eg(\bm{E})>g(E\bm{E})$ . Ye and Hooker (2018) investigates estimating $g^{\prime\prime}$ in a functional single index model using a local quadratic estimate for $g$ and demonstrates convergence rates but finds that the resulting estimators are impractical for finite sample size. A brief demonstration of this same effect using our estimators is given in Appendix A.

The methods in this paper bypass estimating curvature and instead we estimate the consequences of environmental variation directly. That is, we define the quantity

[TABLE]

directly and conduct inference about its sign. We have titled this the “Jensen Effect” as being the result of Jensen’s inequality. However, we note that $g$ need not be strictly convex for $\delta$ to be positive (or conversely for it to be negative) and observe that the ecological interest lies in $\delta$ rather than $g^{\prime\prime}$ . Indeed, in our real-world examples in Section 5 we observe that $g$ is often not estimated to have the same curvature over the whole range; nonetheless the interaction of curvature and the distribution of covariates does produce a consistent Jensen Effect.

We argue that this target is also better suited for inference: it avoids estimating derivatives and, by averaging, we expect to gain stability relative to pointwise estimates of $g$ . By contrast, direct inference about curvature requires a test of $\min_{s}g^{\prime\prime}(s)>0$ or $\max_{s}g^{\prime\prime}(s)<0$ , which is substantially less stable. While bounding curvature does allow statements about consequences of variability to be independent of the distribution of $\bm{E}$ , we observe that using a non-parametric estimate of $g$ already constrains the range of the argument $s$ at which we can make statements about $g^{\prime\prime}(s)$ and our tests can be employed using any given alternative distribution for $\bm{E}$ that is of interest.

In order to conduct inference, we take inspiration from the SiZer method of Chaudhuri and Marron (1999). Our estimates $\hat{g}_{\lambda}(\cdot)$ and $\hat{\beta}_{\lambda}(\cdot)$ both employ smoothing parameters. Rather than choosing these parameters, we instead examine the resulting $\hat{\delta}_{\lambda}$ as a function of $\lambda$ and use $\max_{\lambda}|\hat{\delta}_{\lambda}|$ as a test statistic. We can assess the significance of this statistic by treating $\hat{\delta}_{\lambda}$ as a Gaussian process over $\lambda$ and simulating from a null distribution in which $E\hat{\delta}_{\lambda}=0$ . This avoids the need to account for the selection of $\lambda$ as well as allowing us to detect relationships that may not be significant at smoothing parameters chosen by GCV or other criteria.

1.1 Related Literature: Single Index Models

There is a large literature on single index models covering both applied methodology and theoretical properties. The link function $g$ and the coefficient vector $\bm{\beta}$ have been estimated by three different methods. (1) The most widely used is the Projection Pursuit Regression (PPR) approach introduced in Hardle et al. (1993). This method is a nested estimation procedure, with the link function $g$ estimated by local polynomial approximation and the coefficient function by minimizing the MSE. Theoretical properties were studied in Hardle et al. (1993) and Ichimura (1993). (2) The Average Derivative approach was introduced in Hristache et al. (2001). (3) Li (1991) introduced the Sliced Inverse Regression method, which considered the estimation of the coefficient vector as a dimension-reduction problem.

In contrast, there are few studies of the functional single index model. A counterpart to the Projection Pursuit Regression was introduced in Chen et al. (2011), where the coefficient function $\beta$ was approximated by a spline basis and the coefficient vector was estimated. In addition, a convergence rate was found for this method. Ma (2016) used two spline bases to approximate the coefficient function and the link function, respectively, and derived some asymptotic properties of the resulting estimate. Methods that use single index models as an additive term for function-on-scalar regression have been developed and studied in Li et al. (2017).

1.2 Related Literature: SiZer

The SiZer (SIgnificant ZERo crossings of derivative) method that we adopt to assess significance was introduced by Chaudhuri and Marron (1999) to assess the significance of peaks and other features in nonparametric smooths while bypassing the selection of smoothing parameters. Chaudhuri and Marron (1999) observe that smoothing parameters can have a substantial impact on the features observed, and rather than base inference on a selected smoothing parameter, they examine a range of reasonable smooths.

Specifically, SiZer is particularly concerned with the assessment of sign changes of derivatives. Local modes, in particular, can be represented by sign changes in a first derivative. To assess the significance of these Chaudhuri and Marron (1999) obtains a nonparametric smooth $\hat{f}(x;\lambda)$ of a relationship over $x$ indexed by smoothing parameters $\lambda$ . The method then constructs the $t$ -statistic $t(x;\lambda)=\hat{f}^{(k)}(x;\lambda)/\mbox{sd}\left(\hat{f}^{(k)}(x;\lambda)\right)$ . $\hat{f}^{(k)}(x;\lambda)$ is then plotted over both $x$ and $\lambda$ with regions in which $|t(x;\lambda)|>C$ are indicated. In particular, changes from a positive value of $\hat{f}^{(1)}(x;\lambda)$ to a negative value as $x$ is changed indicate a local maximum. Chaudhuri and Marron (1999) present their methods in the context of local polynomial smoothing and kernel density estimation, but exactly the same tools can be employed for any nonparametric method including the smoothing splines we employ here. See Sonderegger et al. (2009) for an example of the use of SiZer in ecological models when searching for points of rapid ecological change to in response to environmental forcing.

The interpretation of the significance of these changes depends on the selection of the critical value $C$ . Chaudhuri and Marron (1999) suggest a variety of choices, including using pointwise significance levels (in which a number of false positives may be expected when searching over both $x$ and $\lambda$ ), using a critical value that approximately controls the familywise error rate in $x$ for each $\lambda$ and obtaining a uniform bound by finding a critical value of $\max_{x,\lambda}\left|t(x;\lambda)\right|$ via a bootstrap. Marron and Zhang (2005) develops approximations for Gaussian processes that reduce the computational burden while improving coverage probabilities.

We borrow from these ideas, and particularly repurpose the search over “scale-space” to examine the Jensen Effect over a range of smoothing parameters. In contrast to SiZer, which examines features over a range of $x$ , the Jensen Effect averages over covariates. Similar to Marron and Zhang (2005), we select a significance threshold designed to control the familywise error rate over smoothing parameters, however we select this from an explicit Gaussian approximation which can be efficiently simulated.

7 smoothing parameter selection by comparing the estimates of a curve over a range of smoothing parameters. Our test statistic is inspired by the SiZer method. Instead of trying to select an optimal smoothing parameter for estimation and inference, we examine estimates over a range of smoothing parameters and conduct inference based on maximizing a test statistic over that range.

In the remainder of this paper we provide details of our estimation procedure for a functional Single Index Model in Section 2, and our assessment and test of the Jensen Effect in Section 3. Simulation studies to assess the efficacy of our test are conducted in Section 4. Section 5 provides a motivating example in which we examine the response of 8 copepod species to water temperature variability in data obtained from the North Temperate Lakes Long Term Experimental Research station where we find evidence for positive adaptation to environmental variability in 5 species. Section 6 concludes.

2 Estimation Procedure

Here we provide details of our representation of $\beta$ and $g$ and our estimation procedure given particular smoothing parameters. Appendix A provides a demonstration of using these approaches to estimate curvature rather than the Jensen effect we examine here.

$\beta$ and $g$ are represented using smoothed basis expansions. Assume that $n$ independent and identically distributed data pairs $\left(X_{1}\left(t\right),Y_{1}\right),\cdots,\left(X_{n}\left(t\right),Y_{n}\right)$ are observed where $X_{j}(t)$ is a real-valued function on [0,1]. The Functional Single Index model is

[TABLE]

where the coefficient function $\beta\left(t\right)$ is, like X(t), defined on the interval $\left[0,1\right]$ . $\epsilon$ is assumed to be Gaussian random error. This integral may need to be evaluated numerically, depending on the representations used for $X_{i}$ and $\beta$ , and we assume that this is done up to ignorable error throughout the calculations below. To ensure identifiability of the model, we require that $\left\|\beta\right\|=\int\beta\left(t\right)^{2}\mathrm{d}t=1$ .

We use a $K_{1}$ -dimensional B-spline basis for the link function $g$ . For any $s$ in the range of possible $\int X\beta$ values, the link function $g$ can be written as

[TABLE]

where $\bm{d}$ is a $K_{1}$ -dimensional column coefficient vector.

We use a $K_{2}$ -dimensional basis for the coefficient function $\beta$ , such that

[TABLE]

where $\bm{c}$ is a $K_{2}$ -dimensional column coefficient vector, and $t\in\left[0,1\right]$ .

The coefficient vectors $\bm{c}$ and $\bm{d}$ are estimated by minimizing a penalized sum of squares

[TABLE]

where $g_{i}\doteq g\left(\int X_{i}\beta\right)$ and the penalty matrices $\left[\mathbb{P}_{\beta}\right]_{ij}=\int\psi_{i}^{(2)}(t)\psi_{j}^{(2)}(t)dt$ and $\left[\mathbb{P}_{g}\right]_{ij}=\int\phi_{i}^{(2)}(t)\phi_{j}^{(2)}(t)dt$ are available analytically for most common choices of basis expansion.

Equation $\left(\ref{PLS}\right)$ specifies a nonlinear optimization problem, which we solve numerically using built-in optimizers in R (see below). Denoting the estimated coefficients as $\hat{\bm{c}}$ and $\hat{\bm{d}}$ , the estimates are

[TABLE]

and

[TABLE]

where $i=1,\cdots,n$ .

2.1 Notes on Implementation and Model Selection

Our objective criterion (4) requires nonlinear numerical optimization. In our experiments below we have used the R function optim with some additional modifications. The simulations reported in Section 4 used the BFGS gradient-based optimizer. However, at large values of $\lambda$ we find that very tight convergence criteria are needed to reduce the numerical error to below that of the estimated noise. This was mitigated with two strategies:

We initialize our optimization with $\bm{d}$ chosen so that $\hat{g}$ is exactly linear and $\bm{c}$ is obtained from functional linear regression. 2. 2.

We re-initialize BFGS once it converges, and run it a second time. BFGS uses a sequentially-calculated approximate Hessian, and can stop early due to poor estimation of this Hessian. Re-initialization resets the approximate Hessian to the identity, so that optimization is restarted with a steepest descent step. 3. 3.

In our motivating data example we additionally attempted re-initializing from the solutions found at surrounding 8 combinations of $\lambda_{g}$ and $\lambda_{\beta}$ and chose the best optima among these. This was repeated until the maximum relative improvement in optima was less than 0.01. This strategy was employed in order to ensure a smooth relationship between smoothing parameters and estimated effects, but can substantially increase computational costs.

To maintain identifiability of our model, we normalize our estimate of $\beta(t)$ within each evaluation of the objective function and multiply by $\text{sign}(\beta(0))$ .

In order to represent $\hat{g}$ with a basis expansion, we need to control the range of its arguments. Throughout our estimates below, we have used the identifiability requirement that $\left\|\beta\right\|=1$ to use a range of $[-S,\ S]$ where $S$ is the largest score for the maximum eigenvalue from a principal components decomposition of the $X_{i}$ . If $|\int X_{i}(t)\beta(t)dt|>S$ for some $i$ , we replace the argument with the corresponding end-point of the range and add a penalty of $|\int X_{i}(t)\beta(t)dt-S|$ to the objective (4). In practice, while we find that this excecdance can occur during optimization, it never appears in the final result.

We find that these procedures are sufficient to provide reliable inference for the Jensen Effect. However, estimates of $g^{\prime\prime}(\cdot)$ can be highly sensitive to initial conditions and optimization strategies Appendix A provides a brief example of the sensitivity of curvature to both noise and initial conditions. We speculate that this sensitivity is a result of a complex optimization landscape in which there are many good estimates of $g$ , but these can vary substantially for $g^{\prime\prime}$ .

While our assessment of statistical significance avoids selecting $\lambda$ , it will be useful to have a value for visualization and for an estimate of residual variance. To choose $\lambda$ , we define a smoother matrix $\mathbb{S}_{\lambda}$ associated with $\lambda$ , and the GCV value for selecting $\lambda$ is calculated from

[TABLE]

where $\mathbb{I}$ is the $\left(n\times n\right)$ -dimensional identity matrix. We derive $\mathbb{S}_{\lambda}$ from a Taylor expansion in (34) below.

3 Jensen Effect

The ecological interest in $g^{\prime\prime}$ is in the comparison of $g\left[\mathrm{E}\left(\bm{E}\right)\right]$ and $\mathrm{E}\left[g\left(\bm{E}\right)\right]$ . Because reliable estimation of $g^{\prime\prime}$ requires unrealistic sample sizes, we instead compare these quantities directly to estimate what we call the “Jensen Effect”.

We define a difference statistic by

[TABLE]

where $\bar{X}=\frac{1}{n}\sum\limits_{i=1}^{n}X_{i}$ . In mathematical analyses, if the link function $g$ is convex, then $\delta>0$ which indicates better growth with a varying environment; otherwise, the difference $\delta<0$ and a constant environment is better for growth. However, this estimate still depends on the smoothing parameters $\lambda_{g}$ and $\lambda_{\beta}$ . Inspired by the SiZer method of Chaudhuri and Marron (1999), we examine the difference $\delta$ over a range of $\lambda$ values for $g$ and $\beta$ , and generate hypothesis tests using the maximum or minimum value of $\delta$ as a function of $\lambda$ .

3.1 Hypothesis Test $1$ : Nonparametric Smoothing

We begin by briefly developing our SiZer-inspired test for a standard smoothing spline (treating the environment $\bm{E}$ as known) before developing the test for a functional single index model. Here defining $\Phi$ to be matrix of evaluations, $\Phi_{ij}=\phi_{j}\left(E_{i}\right)$ and $\mathbb{P}$ to be the second derivative penalty matrix, the standard smoothing spline estimate is

[TABLE]

Define the $\left(n+1\right)$ -dimensional column vector $\bm{a}\doteq\left(\frac{1}{n},\cdots,\frac{1}{n},-1\right)^{\top}$ and the augmented set of evaluation points $\mathbf{e}=\left(E_{1},\ldots,E_{n},\bar{E}\right)^{\top}$ , where $E_{i}=\int X_{i}\hat{\beta}$ (at each observed environment value) and $\bar{E}=\frac{1}{n}\sum\limits_{i=1}^{n}\int X_{i}\hat{\beta}$ (averaged across all environment values), with corresponding evaluation matrix $\Phi^{+}$ . We can write

[TABLE]

which we can standardize to obtain the t-statistic

[TABLE]

in which $\hat{\sigma}$ , the estimate of the standard deviation of the random error $\epsilon$ , is obtained from the value of $\lambda$ selected by GCV (see details below).

Since the response variable $Y$ is Gaussian, the test statistics $t_{\lambda}$ is also a Gaussian process with the covariance function

[TABLE]

which involves no unknown parameters. We can thus use $\max_{\lambda}\left|t_{\lambda}\right|$ as a test statistic, obtaining critical values by simulating from the Gaussian process $\mathbb{GP}\left(\bm{0},\Sigma\right)$ . Under the null hypothesis $\delta=0$ , $t_{\lambda}$ is a Gaussian process with mean $\bm{0}$ .

An important consideration here is that we expect $\delta_{\lambda}$ to inherit smoothing bias, but this should result in under-estimation of the Jensen Effect because it will shrink the estimated second derivative. In analogy to SiZer, by examining $t_{\lambda}$ over the whole range of $\lambda$ we can assess this effect at various levels of smoothing; our use of $\max|t_{\lambda}|$ as a test statistic allows to maintain a conservative test. We do still need to choose $\lambda$ by GCV in our estimate $\hat{\sigma}^{2}$ , because we use the same $\hat{\sigma}^{2}$ when calculating the covariance matrix $\Sigma$ . We expect $\hat{\sigma}^{2}$ to be relatively insensitive to the specific $\lambda$ chosen so long as we do not over-smooth (see arguments in Ruppert et al. (2003)); maintaining a constant $\hat{\sigma}^{2}$ in the $t$ -statistic removes the need to account for changes in $\hat{\sigma}^{2}$ across $\lambda$ .

We note the potential for $\hat{\delta}_{\lambda}$ to change signs over the range of $\lambda$ . For example, if the underlying $g$ is strongly convex in a very narrow region but concave more broadly we might find a positive effect at small values of $\lambda$ and a negative effect at large values as the convex portion of $g$ is smoothed over. We would regard this as good reason to examine the resulting estimates of $g$ with an eye to plausibility at both values of $\lambda$ . In our motivating data in Section 5, we found a couple of examples in which $\delta_{\lambda}$ was declared significant at two values of $\lambda$ where $\delta_{\lambda}$ had opposite signs. One of these could be dismissed easily as occuring only at one set of extreme values. The case of Diacyclops Thomasi required further investigation which we defer to Section 5.

3.2 Hypothesis Test $2$ : Functional Single Index Model

The functional single index model complicates the process described above by including two smoothing parameters and nonlinear effects of $\hat{\beta}$ , necessitating a Taylor expansion to approximate the recipe above. For each pair of smoothing parameters $\left(\lambda_{g},\lambda_{\beta}\right)$ , we obtain an estimate of $\beta$ , $g$ , and $\delta$ , denoted as $\hat{\delta}\left(\lambda_{g},\lambda_{\beta}\right)$ .

Defining

[TABLE]

the estimated difference function given $\left(\lambda_{g},\lambda_{\beta}\right)$ is

[TABLE]

To construct a t-statistic to test the significance of $\delta$ , an estimate of the variance of the difference function $\hat{\delta}$ is needed. The estimated difference function $\hat{\delta}\left(\lambda_{g},\lambda_{\beta}\right)$ is defined on an estimate of $\hat{\bm{c}}$ and $\hat{\bm{d}}$ , which are the coefficients of $\beta$ and $g$ respectively. Therefore, we need to calculate the covariance of the estimated $\hat{\bm{c}}$ and $\hat{\bm{d}}$ .

Recall (4), the penalized least squares criterion to be minimized, and define the matrices of linear basis effects $\Psi_{ij}=\int X_{i}\left(t\right)\psi_{j}\left(t\right)\mathrm{d}t$ and evaluations of the link function bases and derivatives $\Phi^{\left(k\right)}_{ij}=\phi_{j}^{\left(k\right)}\left(\Psi_{i\cdot}\bm{c}\right)$ with $\bm{c}$ taken at its expected estimate. We derive gradients of PLS as

[TABLE]

and expected Hessian

[TABLE]

We can now obtain the sandwich covariance

[TABLE]

where we estimate $\sigma^{2}$ from

[TABLE]

where, following Ruppert et al. (2003), the residual degree of freedom is defined as

[TABLE]

where

[TABLE]

is an approximate smoother matrix in which we use the values of $(\lambda_{g},\lambda_{\beta})$ selected by GCV.

We now define a t-statistic for $\delta$ as a function of $\lambda$ ,

[TABLE]

where $\text{sd}\left[\hat{\delta}\left(\lambda_{g},\lambda_{\beta}\right)\right]$ is given by

[TABLE]

Defining the $n$ -dimensional row vector $\bm{u}_{\lambda}$ as

[TABLE]

the estimated covariance matrix of $\hat{\delta}_{\lambda}$ is $\hat{\sigma}^{2}\bm{u}_{\lambda}\bm{u}_{\lambda}^{\top}$ . $\delta_{\lambda}$ is therefore approximately a Gaussian process indexed by $\lambda$ and we can get the estimated variance of $t_{\lambda}$ from $\frac{\bm{u}_{\lambda}\bm{u}_{\lambda}^{\top}}{\left\|\bm{u}_{\lambda}\right\|^{2}}$ .

We want to test if $\delta\equiv 0$ . Denote the number of the smoothing parameters $\lambda_{g}$ as $m$ , we test $H_{0}$ : $\left(\hat{\delta}_{\lambda_{1}},\cdots,\hat{\delta}_{\lambda_{m}}\right)^{\top}=\bm{0}_{m}$ . Under $H_{0}$ , $\left(t_{\lambda_{1}},\cdots,t_{\lambda_{m}}\right)^{\top}\sim\mathrm{N}\left(\bm{0}_{m},\mathbb{A}_{mm}\right)$ , where $\bm{0}_{m}$ is a $m$ -dimensional column vector, and the covariance matrix $\mathbb{A}$ is $\left(m\times m\right)$ -dimensional with the $\left(i,j\right)$ term equals to $\frac{\bm{u}_{\lambda_{i}}\bm{u}_{\lambda_{j}}^{\top}}{\left\|\bm{u}_{\lambda_{i}}\right\|\left\|\bm{u}_{\lambda_{i}}\right\|}$ . The test statistic that we examine is $T=\max\left\{t_{\lambda_{1}},\cdots,t_{\lambda_{m}}\right\}$ .

In order to obtain a critical value for this statistic, we repeatedly simulate $t_{\lambda}$ from $\mathrm{N}\left(\bm{0}_{m},\mathbb{A}_{mm}\right)$ and obtain a distribution for $\max_{\lambda}\left|t_{\lambda}\right|$ .

4 Simulation Study

In this section, we use simulated data to explore the power of our test for both single index and functional single index models. Computation time for these models depends on numerous properties of the model: the number of smoothing parameter values to try, the nonlinearities of the underlying $g(\cdot)$ , the size of the error variance and the data set size. Because differing settings could result in substantially different computing times for exactly the same problem, we have not given exact timings here, but note that our single index models run within a few minutes per simulation, while functional versions can require ten to thirty minutes. Our real-world examples, with larger data sets and a finer mesh of smoothing parameters required more than half an hour on a recently purchased laptop.

4.1 Single Index Model

We test for a Jensen Effect by calculating the difference function $\delta$ over a range of smoothing parameters. If the link function $g$ is convex, the $\delta$ function will be positive for most of $\lambda$ values, although it may have high variance at low $\lambda$ and high bias at high $\lambda$ . For each simulation, we conduct the hypothesis test introduced in previous section.

Our simulation study starts with the single index model with $p=5$ covariates generated independently and uniformly on $\left[-0.5,0.5\right]$ , and the coefficient $\bm{\beta}=\frac{1}{\sqrt{p}}\bm{1}_{p}$ so that $\left\|\bm{\beta}\right\|=1$ .

To illustrate the Jensen Effect, we choose three different link functions, (1) $g\left(s\right)=e^{s}$ , (2) $g\left(s\right)=-s^{2}$ , (3) $g\left(s\right)=s$ . We represented $g$ by a $25$ -dimensional quintic B-spline basis. For each link function, we simulated $1000$ data sets of size $100$ , with error standard deviation $0.1$ . We obtained critical values for our test by simulating $5000$ normal samples from the null distribution. Figure 1 presents a sample of $\delta_{\lambda}$ and $t_{\lambda}$ functions functions versus $\log(\lambda)$ for $g\left(s\right)=e^{s}$ ; plots for the other link functions are in Appendix B. The rejection rates for these functions are: $99.2\%$ , $99.3\%$ and $5.7\%$ respectively.

4.2 Functional Single Index Model

To define a distribution for the functional covariates, we use a $25$ -dimensional Fourier basis $\bm{\psi}\left(t\right)$ , where $t\in\left[0,1\right]$ . The covariate functions $X\left(t\right)$ are generated as

[TABLE]

where $\xi_{i}\sim\mathrm{N}\left(0,e^{-\left(i-1\right)/12}\right)$ . The coefficient function is

[TABLE]

where $\bm{c}=\left(0,1,1,0.5,0,\cdots,0\right)^{\top}$ .

Again we used the three link functions $g\left(s\right)=e^{2}$ , $g\left(s\right)=-s^{2}$ , $g\left(s\right)=s$ . We represented $g$ by a $25$ -dimensional quintic B-spline basis. For each link function, we generated $1000$ simulated data sets of size $100$ with error standard deviation $0.1$ , and for each such data set we generated $5000$ normal samples from the null distribution to obtain critical values.

A plot of the $\delta_{\lambda}$ and $t_{\lambda}$ functions for $g\left(s\right)=e^{s}$ is presented in Figure 2. We have placed equivalent plots for $g\left(s\right)=-s^{2}$ and $g\left(s\right)=s$ in Appendix C. The rejection rates for the three link functions were $100\%$ , $100\%$ and $7.3\%$ , showing very good power with a reasonable sample size and close to nominal rate when the null hypothesis is true (no curvature).

4.3 Power Analysis

To investigate the power of our test in more detail we consider a series of increasingly nonlinear link functions

[TABLE]

with $0\leq\eta\leq 1.2$ for the single index model and $0\leq\eta\leq 0.8$ for the functional single index model. As $\eta$ increases, $g$ becomes strongly convex. For each $\eta$ , we generate $1000$ simulated data sets and again used $5000$ normal samples under the null distribution to obtain critical values. Due to the computational overhead associated with searching over another smoothing parameter, we used only 200 simulations for the functional single index model. We replicated each simulation changing $n$ to 200 and changing $\sigma$ to 0.2 to test for the expected loss of power with increasing noise and gain of power with sample size.

Figure 3 presents the rejection rate plotted against $\eta$ . We observe a sharp increase as $\eta$ increases, as expected: as the link function $g$ becomes more and more convex, the rejection rate will converge to $1$ . The expected patterns of decreasing power with increasing $\sigma$ and increasing power with $n$ are observed, although a smaller simulation size makes this less clear in the functional single index model.

We also used this experiment as an opportunity to verify that our estimates of residual standard error, $\sigma$ performed well. See Appendix D for further details.

5 Application to real ecological data

To demonstrate application of our tests, we analyze the North Temperate Lakes LTER: Zooplankton - Trout Lake Area data set444Zooplankton records from

https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-ntl&identifier=37&revision=29

water temperature from

https://portal.edirepository.org/nis/mapbrowse?scope=knb-lter-ntl&identifier=129. An earlier version of these data were analyzed by Drake (2005) to examine the temperature-dependence of copepod populations. In our data set, the density of the populations of eight species of copepods and rotifers, along with water temperature, were recorded from $1981$ to $2015$ in $8$ different lakes. Our choice of species was determined based on the number and length of observations available and differs from those studied in Drake (2005). Note that the growth response in these data is not the growth (in size) of an individual, but the growth (in numbers) of a population, but our functional single index model is still appropriate for this setting.

The values recorded in the original data set are:

$\tt{d}$ : species’ density at a specific time and lake. 2. 2.

$\tt{t}$ : record of water temperature collected as the same time as $\tt{d}$ .

Both measurements were recorded on irregular time points among different years and lakes, so in order to obtain functional covariates, we preprocessed the temperature data by fitting a smoothing spline; see details in Appendix E.

Our response is change in density between successive observed time points normalized by the time change:

[TABLE]

recorded so long as $s_{i+1}-s_{i}<100$ , where $s_{i}$ is the time (in days) of the $i^{th}$ sample.

At the time of each observed response, we used temperature values over the $60$ preceding days as the climate history covariate $X\left(t\right)$ . For each species, we fit a penalized spline functional single index model for the growth in population density as a function of temperature history in each lake; details these procedures are given in Appendix E. We represented $\beta$ with 12 order-6 B-splines covering the 60 day history and re-interpolated the smoothed $X(t)$ processes onto this basis. We used a $25$ -dimensional cubic B-spline basis to represent $g$ because we wanted a linear function to fall in the span of our basis and set its range to be $\pm\max\int\phi_{j}(t)X_{i}(t)dt$ to ensure that the single index values fell within it. We searched over values of $\log_{10}(\lambda)$ in the range $[-6,2]$ for $g$ and $[-2,6]$ for $\beta$ . These were chosen to cover the range of values selected by GCV for most data sets while avoiding spurious significance due to optimization errors.

The Jensen Effect $\delta$ was estimated to be positive at all smoothing parameters in 6 out of the 8 species; in Keratella Earlinae only one extreme combination of smoothing parameters produced a negative $\delta$ . However, statistically significant values of $\delta$ were only found for 5 out of these 7 species (Kellicottia Longispina, Keratella Cochlearis, Keratella Earlinae, Polyarthra Remata, and Polyarthra Vulgaris). Nonetheless, we conclude from this that a majority of the copepod species in this data set are evolved to take advantage of environmental variability. Examining estimates of $\beta(t)$ , we find some of these are undersmoothed at GCV values, but they tend to represent gradients corresponding to either warming or cooling water temperatures, likely associated with seasonal abundance trends. The use of water temperature as a sole covariate means that its effects are conflated with other environmental variables that change seasonally, such as the availability of nutrient sources. We thus cannot conclude a causal relationship, but note that the same analysis can be undertaken while accounting for other covariates if when they are available.

Figure 4 provides a canonical set of plots for the species Polyarthra vulgaris; equivalent plots for the remaining species are presented in Appendix E. The top four panels provide the estimated $g(s)$ , $g^{\prime\prime}(s)$ and $\beta(t)$ estimated at the smoothing parameters selected by GCV as well as a contour plot of $\delta$ over the smoothing parameters with a shaded area that indicates values at which the effect was found not to be significant. In this case, $\delta$ is positive and significant at all smoothing parameters. In order to explore the shape of the response further, we also plot $g(s)$ and $\beta(t)$ at the smoothing parameters that result in the maximum value of $\delta$ . Note that maximizing $\delta$ here results in particularly large edge effects which should be treated with caution.

One species, Diacyclops Thomasi, was an exception to the general pattern in our analyses, producing areas in the ( $\lambda_{g},\lambda_{\beta}$ plant where the estimated $\delta$ has large positive values and other areas where it has large negative values. Figure 5 provides plots of $g(s)$ at 3 smoothing parameter values, as well as $\delta(\lambda_{g},\lambda_{\beta})$ along with indicators of where it is significant in order to visualize the effects of smoothing parameters. We note that $\delta$ is not significant at the values of the smoothing parameters selected by GCV (black square), but it is declared to be significant in both the negative (middle of the plot given by the black circle) and positive (bottom indicated by the triangle) directions. Figure 5 provides plots of $g$ at each of these two points for comparison. The positive estimate is associated with an inflection of $g$ at the low end of the range of $\int X_{i}(t)\hat{\beta}(t)dt$ values, and is likely the result of undersmoothing. We therefore conclude, tentatively, that Diacyclops Thomasi apparently differs from the other species in being harmed by temperature variability, but we feel that further experiments are warranted in this case. We note that this is the only species in which our signal is ambiguous (as opposed to inconclusive). In Keratella Earlinae (Figure 20) a single significant negative value is found at the bottom corner of the contour plot, which we feel can be dismissed; see an equivalent but positive corner effect in Kellicottia Longispina in Figure 18.

6 Conclusion

Environmental variability is ubiquitous, and project to change significantly over the next century as one aspect of global climate change. Projecting how biological populations will respond to climate change therefore requires methods to estimate how they will respond to changes in the variance of conditions, not just the mean.

With the goal of estimating the net effect of environmental variability on components of population growth rate, we first attempted to estimate the curvature of the link function $g$ in a Functional Single Index model. In our penalized spline based method, we found that unrealistic sample sizes were needed to obtain accurate estimates. So instead, we directly investigated the effect of environmental variability by comparing the expected response (averaged across the environmental variation) to the response at the expected environment. We termed this the “Jensen Effect” since it describes the effect of Jensen’s Inequality, but we note that it operates far more broadly than on just convex or concave functions. Inspired by the SiZer method, our test for a nonzero Jensen Effect is based on maximizing a test statistic across a wide range of smoothing parameters encompassing all plausible values, thus avoiding the need for smoothing parameter selection. We have shown that our proposed procedures work well, on both simulated and real data.

There are multiple potential extensions of this methodology. We have used observed data as representative of the covariates of interest, to define the average and distribution of environmental variability. However, the test can be conducted for any assumed distribution of covariates, and it may be of interest to describe regions of single index values in which the estimated response function produces a Jensen Effect. A way to achieve this is to plot the $a$ and $b$ for which $E(g(S))-g(ES)$ is significantly different from 0, when our procedure is applied under the assumption that $S\sim U[a,b]$ .

It may also be of interest to ask about Jensen Effects on different scales of measurement. For example, using size (rather than change in size) as a response in our empirical application to copepod growth rate results in heteroskedasticity in the response, which can be ameliorated by a log transform. We would then employ the model $E\log(Y)=g(\int X(t)\beta(t)dt)$ and the Jensen effect of interest is $E\exp(g(\int X(t)\beta(t)dt))-\exp(g(E\int X(t)\beta(t)dt))$ . Smoothing $\hat{g}_{\lambda}$ biases the Jensen effect towards being positive, and our method would need modifications to take account of this.

The same challenge arises in extending our approach to other response structures, such as survival or count data, which also use nonlinear link functions for fitting within a generalized linear models framework. Any of the standard GLM links could be modified by placing a nonparametric $g(s)$ within the GLM link. But the Jensen Effect then applies to the composition of the GLM link with $g(s)$ , so again there would need to be a way of accounting for the bias introduced by smoothing.

Appendix A A Simulated Demonstration

We present here a brief simulation study to observe the accuracy of curvature estimates. The covariate function $X\left(t\right)$ was generated based on a Fourier basis

[TABLE]

where $\mu\left(t\right)=t$ , $\eta_{1}\left(t\right)=\frac{1}{\sqrt{2}}\sin\left(2\pi t\right)$ , $\eta_{2}\left(t\right)=\frac{1}{\sqrt{2}}\cos\left(2\pi t\right)$ , $\eta_{3}\left(t\right)=\frac{1}{\sqrt{2}}\sin\left(4\pi t\right)$ , $\eta_{4}\left(t\right)=\frac{1}{\sqrt{2}}\cos\left(4\pi t\right)$ , and $\xi_{ik}$ are independent $\mathrm{N}\left(0,\gamma_{k}\right)$ with $\gamma_{1}=1$ , $\gamma_{2}=\frac{1}{2}$ , $\gamma_{3}=\frac{1}{4}$ , $\gamma_{4}=\frac{1}{8}$ . The coefficient function is

[TABLE]

We observe that the coefficients for $\beta$ satisfy $\left\|\bm{c}\right\|=1$ , under an orthonormal basis. The random errors $\epsilon_{i}$ are simulated as i.i.d. Gaussian noise with mean [math] and $\text{var}\left(\epsilon\right)=0.1\text{var}\left[g\left(\int X\beta\right)\right]$ .

We selected the sample size as $n=100$ and examined three link functions:

$g\left(s\right)=e^{-s}$ . 2. 2.

$g\left(s\right)=-s^{2}$ . 3. 3.

$g\left(s\right)=s$ .

To measure the performance of our estimators we define the MSE of the estimated $\beta$ and $g^{\left(k\right)}$ to be

[TABLE]

and

[TABLE]

where $\hat{Y}^{\left(k\right)}_{i}=\hat{g}^{\left(k\right)}\left(\int X_{i}\left(t\right)\hat{\beta}\left(t\right)\mathrm{d}t\right)$ for $k=0,1,\cdots$ .

Of particular concern in the results (Table 1) is the substantial discrepancy between estimates from different initial conditions. Ye and Hooker (2018) similarly observed that second derivative estimates were highly sensitive to the effort placed into optimization.

The plots in Figure 6 provide an example of our results. The estimate of the link function nearly overlaps the true curve, indicating that our estimate of the link function is quite accurate. However, for the second derivative, the estimate deviates from the true curve, becoming negative towards the right-hand limit. This reduced accuracy is also evident in the results in Table 1. These plots indicate that our estimate of the curvature is not good enough to use as a basic for decisions on the convexity of $g$ . In addition, the performance of the estimators varies a lot from different initial values. In Figure 7 we see that different initial conditions can lead to either over- or under-fitting $g"$ . Further examples are provided in Figures 8 and 9 for quadratic and linear generating $g$ respectively.

Appendix B Diagnostic Plots for the Jensen Effect: Single Index Model

Figures 10 and 11 give example $\delta$ functions using a single index model and the corresponding $t$ functions for links $g(s)=-s^{2}$ and $g(s)=s$ respectively.

Appendix C Diagnostic Plots for the Functional Single Index Model

Figures 12 and 13 give example $\delta$ functions using a single index model and the corresponding $t$ functions for links $g(s)=-s^{2}$ and $g(s)=s$ respectively.

Appendix D On Estimates of Residual Variance

We use the power simulations in Section 4.3 to confirm our expectation that we can select $\hat{sigma}$ based on residual squared error at smoothing values selected by GCV. In Figure 14 we examine both single index models and functional single index models. For single index models we plot $\hat{\sigma}_{\lambda}$ as a function of $\lambda$ the first 10 simulations and indicate the value of GCV with an asterisk for each. When the data is generated from a linear relationship, these curves are nearly flat, when we use the maximum value of $\eta$ there are noticeable changes in $\hat{\sigma}_{\lambda}$ but each curve has an extended flat area around the correct value which is reliably estimated by GCV. We note that GCV appropriately chooses large smoothing parameters when the relationship is linear and there is little bias, but selects lower values for simulations with higher curvature. It is less easy to make these plots for functional single index models, and we instead provide a histogram of estimates at the GCV value for both linear and maximally curved relationships where we see that in both cases our estimates focus on approximately the correct values.

Appendix E Plots for the copepod data

In order to obtain a functional covariate, we smoothed water temperature readings in each lake employing B-splines with 21 knots per year and a second derivative penalty with penalty parameter selected by GCV. These were then evaluated on each of the 60 days prior to a population observation and re-projected onto 12 order 6 Bsplines. Yearly temperature curves for each lake are given in Figure 15.

Plots 16 through 23 repeat the top four panels in Figure 4 for each species from the copepod study. For each species, we provide plots of $g(s)$ (top left), $\beta(t)$ (top right), and $g^{\prime\prime}(s)$ (bottom left) along with pointwise confidence intervals. All these plots are given at the values which minimize GCV. The bottom right of each figure plots $\delta$ as a function of both $\lambda_{g}$ and $\lambda_{\beta}$ . Regions where $t_{\lambda_{g},\lambda_{\beta}}$ exceed the critical value are indicated by a white background and a black square gives the minimizing value of GCV.

Bibliography28

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1Chaudhuri and Marron (1999) Chaudhuri, P. and J. S. Marron (1999). Sizer for exploration of structures in curves. Journal of the American Statistical Association 94 (447), 807–823.
2Chen et al. (2011) Chen, D., P. Hall, H.-G. Müller, et al. (2011). Single and multiple index functional regression models with nonparametric link. The Annals of Statistics 39 (3), 1720–1747.
3Chesson (1994) Chesson, P. (1994). Multispecies competition in variable environments. Theoretical population biology 45 (3), 227–276.
4Chesson (2000 a) Chesson, P. (2000 a). General theory of competitive coexistence in spatially-varying environments. Theoretical population biology 58 (3), 211–237.
5Chesson (2000 b) Chesson, P. (2000 b). Mechanisms of maintenance of species diversity. Annual review of Ecology and Systematics 31 (1), 343–366.
6Chesson and Warner (1981) Chesson, P. L. and R. R. Warner (1981). Environmental variability promotes coexistence in lottery competitive systems. The American Naturalist 117 (6), 923–943.
7Cohen (1966) Cohen, D. (1966). Optimizing reproduction in a randomly varying environment. Journal of theoretical biology 12 (1), 119–129.
8Drake (2005) Drake, J. M. (2005). Population effects of increased climate variation. Proceedings of the Royal Society of London B: Biological Sciences 272 (1574), 1823–1827.

TL;DR

Contribution

Findings

Abstract

Peer Reviews

Videos

Taxonomy

Abstract

1 Introduction

1.1 Related Literature: Single Index Models

1.2 Related Literature: SiZer

2 Estimation Procedure

2.1 Notes on Implementation and Model Selection

3 Jensen Effect

3.1 Hypothesis Test 111: Nonparametric Smoothing

3.2 Hypothesis Test 222: Functional Single Index Model

4 Simulation Study

4.1 Single Index Model

4.2 Functional Single Index Model

4.3 Power Analysis

5 Application to real ecological data

6 Conclusion

Appendix A A Simulated Demonstration

Appendix B Diagnostic Plots for the Jensen Effect: Single Index Model

Appendix C Diagnostic Plots for the Functional Single Index Model

Appendix D On Estimates of Residual Variance

Appendix E Plots for the copepod data

3.1 Hypothesis Test $1$ : Nonparametric Smoothing

3.2 Hypothesis Test $2$ : Functional Single Index Model