Super-Consistent Estimation of Points of Impact in Nonparametric Regression with Functional Predictors

Dominik Poß; Dominik Liebl; Alois Kneip; Hedwig Eisenbarth; Tor D. Wager; Lisa Feldman Barrett

arXiv:1905.09021·math.ST·July 14, 2020


TL;DR

This paper introduces a super-consistent estimator for identifying specific impactful points in functional predictors within nonparametric regression models, improving accuracy without prior knowledge of model components.

Contribution

The authors develop a novel estimator for points of impact that achieves super-consistent convergence and does not require pre-estimates of other model parts.

Findings

01 Estimator has super-consistent convergence rate

02 Method performs well in finite samples

03 Applicable to nonparametric and generalized linear models

Abstract

Predicting scalar outcomes using functional predictors is a classic problem in functional data analysis. In many applications, however, only specific locations or time-points of the functional predictors have an impact on the outcome. Such ``points of impact'' are typically unknown and have to be estimated in addition to estimating the usual model components. We show that our points of impact estimator enjoys a super-consistent convergence rate and does not require knowledge or pre-estimates of the unknown model components. This remarkable result facilitates the subsequent estimation of the remaining model components as shown in the theoretical part, where we consider the case of nonparametric models and the practically relevant case of generalized linear models. The finite sample properties of our estimators are assessed by means of a simulation study. Our methodology is motivated by…


Tables

Table 2: Mean Average Squared Errors (MASE) for the nonparametric estimator $\widehat{g}_{\widehat{\tau}}$

DGP	$p$	$n$	TRH (MASE)	MPDP (MASE)
2	100	100	0.098	0.100
2	100	200	0.061	0.089
2	100	500	0.017
2	500	100	0.097	0.098
2	500	200	0.064	0.092
2	500	500	0.023
2	1000	100	0.094
2	1000	200	0.060
2	1000	500	0.022
3	100	100	0.155	0.175
3	100	200	0.105	0.156
3	100	500	0.058
3	500	100	0.150	0.173
3	500	200	0.102	0.161
3	500	500	0.060
3	1000	100	0.149
3	1000	200	0.100
3	1000	500	0.059

Table 3: Average Mean Squared Errors (AvgMSE) for $\widehat{\tau}_{1},\dots,\widehat{\tau}_{S}$

DGP	$p$	$n$	TRH (AvgMSE)	MPDP (AvgMSE)
2	100	100	0.0002	0.0063
2	100	200	0.0001	0.0023
2	100	500	0.0000
2	500	100	0.0002	0.0084
2	500	200	0.0001	0.0013
2	500	500	0.0000
2	1000	100	0.0002
2	1000	200	0.0001
2	1000	500	0.0000
3	100	100	0.0002	0.0186
3	100	200	0.0001	0.0036
3	100	500	0.0000
3	500	100	0.0002	0.0218
3	500	200	0.0001	0.0035
3	500	500	0.0000
3	1000	100	0.0002
3	1000	200	0.0001
3	1000	500	0.0000

Equations

Y=g\big(X(\tau_{1}),\dots,X(\tau_{S})\big)+\varepsilon,

Y_{i}=g\big(X_{i}(\tau_{1}),\dots,X_{i}(\tau_{S})\big)+\varepsilon_{i},

\sigma(s,t)=\omega(s,t,|s-t|^{\kappa}).

\sigma(s,t)=\omega(s,t,|s-t|^{2H})=\big(s^{2H}+t^{2H}-|s-t|^{2H}\big)\,c_{2}

\sigma(s,t)=\omega_{\nu}(s,t,|s-t|)=\frac{\pi\phi}{2^{\nu-1}\Gamma(\nu+1/2)\alpha^{2\nu}}(\alpha|s-t|)^{\nu}K_{\nu}\big(\alpha|s-t|\big),

f_{XY}(s):=\mathbb{E}(X_{i}(s)Y_{i})=\sum_{r=1}^{S}\vartheta_{r}\sigma(s,\tau_{r})\quad\text{for }s\in[a,b].

f_{XY}(s)-\frac{1}{2}\big(f_{XY}(s+\delta)+f_{XY}(s-\delta)\big).

Z_{\delta,i}(s):=X_{i}(s)-\frac{1}{2}\big(X_{i}(s-\delta)+X_{i}(s+\delta)\big)\quad\text{for }s\in[a+\delta,b-\delta],

\mathbb{E}(Z_{\delta,i}(s)Y_{i})=f_{XY}(s)-\frac{1}{2}\big(f_{XY}(s+\delta)+f_{XY}(s-\delta)\big).

\begin{array}{ll}\mathbb{E}(Z_{\delta,i}(s)Y_{i})=\vartheta_{r}c(\tau_{r})\delta^{\kappa}+o(\delta^{\kappa})&\text{if }s\in\{\tau_{1},\dots,\tau_{S}\}\\ \mathbb{E}(Z_{\delta,i}(s)Y_{i})=O(\delta^{2})&\text{if }s\notin\{\tau_{1},\dots,\tau_{S}\},\end{array}

\frac{1}{n}\sum_{i=1}^{n}Z_{\delta,i}(s)Y_{i}-\mathbb{E}(Z_{\delta,i}(s)Y_{i})=O_{P}\big(\sqrt{\delta^{\kappa}/n}\big).

\widehat{S}=\min\Big\{l\in\mathbb{N}_{0}:\ \Big|\frac{\frac{1}{n}\sum_{i=1}^{n}Z_{\delta,i}(\widehat{\tau}_{l+1})Y_{i}}{\big(\frac{1}{n}\sum_{i=1}^{n}Z_{\delta,i}(\widehat{\tau}_{l+1})^{2}\big)^{1/2}}\Big|<\lambda\Big\}\quad\text{for some threshold }\lambda>0.

\max_{r=1,\dots,\widehat{S}}\ \min_{s=1,\dots,S}|\widehat{\tau}_{r}-\tau_{s}|=O_{P}(n^{-1/\kappa}).

\lambda\equiv\lambda_{n}=A\sqrt{\frac{\sigma_{|y|}^{2}}{n}\log\Big(\frac{b-a}{\delta}\Big)},\quad A>D,\quad\text{and }\delta^{2}=O(n^{-1}),\text{ then}

P(\widehat{S}=S)\to 1\quad\text{as }n\to\infty.

\mathbb{E}\big(X_{i}(s)Y_{i}\big)=\sum_{r=1}^{S}\sigma^{*}(s,\tau_{r})\,\mathbb{E}\big(V_{i}^{2}\vartheta_{r}(V_{i})\big)=\sum_{r=1}^{S}\sigma(s,\tau_{r})\frac{\mathbb{E}\big(V_{i}^{2}\vartheta_{r}(V_{i})\big)}{\mathbb{V}(V_{i})},

\begin{array}{ll}\mathbb{E}(Z_{\delta,i}(s)Y_{i})=C(\tau_{r})\delta^{\kappa}+o(\delta^{\kappa})&\text{if }s\in\{\tau_{1},\dots,\tau_{r}\}\\ \mathbb{E}(Z_{\delta,i}(s)Y_{i})=O(\delta^{2})&\text{if }s\notin\{\tau_{1},\dots,\tau_{r}\}\end{array}

\tilde{Z}^{(4)}_{\delta,i}(s):=X_{i}(s)-\frac{2}{3}\big(X_{i}(s-\delta)+X_{i}(s+\delta)\big)+\frac{1}{6}\big(X_{i}(s-2\delta)+X_{i}(s+2\delta)\big).

\widehat{g}_{\widehat{\tau}}(x_{1},\dots,x_{S})=\frac{\sum_{i=1}^{n}K\Big(\frac{X_{i}(\widehat{\tau}_{1})-x_{1}}{h_{1}},\dots,\frac{X_{i}(\widehat{\tau}_{S})-x_{S}}{h_{S}}\Big)Y_{i}}{\sum_{i=1}^{n}K\Big(\frac{X_{i}(\widehat{\tau}_{1})-x_{1}}{h_{1}},\dots,\frac{X_{i}(\widehat{\tau}_{S})-x_{S}}{h_{S}}\Big)},

\widehat{g}_{\widehat{\tau}}(x_{1},\dots,x_{S})-g(x_{1},\dots,x_{S})=O_{p}\Big(\sum_{r=1}^{S}h_{r}^{2}+(nh_{1}\dots h_{S})^{-1/2}+\sum_{r=1}^{S}\frac{1}{n^{\min\{1,1/\kappa\}}(h_{1}\dots h_{S})h_{r}^{2}}\Big)

\widehat{g}_{\widehat{\tau}}(x_{1},\dots,x_{S})-g(x_{1},\dots,x_{S})=O_{p}\big(n^{-2/(S+4)}\big).

Y_{i}=g\Big(\alpha+\sum_{r=1}^{S}\beta_{r}X_{i}(\tau_{r})\Big)+\varepsilon_{i},

\eta_{i}=\alpha+\sum_{r=1}^{S}\beta_{r}X_{i}(\tau_{r})

\mathbb{E}\left(\bigg(g\Big(\alpha+\sum_{r=1}^{S}\beta_{r}X_{i}(\tau_{r})\Big)-g\Big(\alpha^{*}+\sum_{r=1}^{S^{*}}\beta_{r}^{*}X_{i}(\tau_{r})\Big)\bigg)^{2}\right)>0,

\mathbf{U}_{n}(\boldsymbol{\beta})=\mathbf{D}_{n}(\boldsymbol{\beta})^{T}\mathbf{V}_{n}(\boldsymbol{\beta})^{-1}\big(\mathbf{Y}_{n}-\boldsymbol{\mu}_{n}(\boldsymbol{\beta})\big).

\mathbf{F}_{n}(\boldsymbol{\beta})=\mathbf{D}_{n}(\boldsymbol{\beta})^{T}\mathbf{V}_{n}(\boldsymbol{\beta})^{-1}\mathbf{D}_{n}(\boldsymbol{\beta})\quad\text{and}\quad\widehat{\mathbf{F}}_{n}(\boldsymbol{\beta})=\widehat{\mathbf{D}}_{n}(\boldsymbol{\beta})^{T}\widehat{\mathbf{V}}_{n}(\boldsymbol{\beta})^{-1}\widehat{\mathbf{D}}_{n}(\boldsymbol{\beta}).

\sqrt{n}\,(\widehat{\boldsymbol{\beta}}-\boldsymbol{\beta}_{0})\stackrel{d}{\to}N\big(\mathbf{0},(\mathbb{E}(\mathbf{F}(\boldsymbol{\beta}_{0})))^{-1}\big).

\operatorname{BIC}_{X}(\delta)=-2\log L_{X}+K_{X}\log(n).

\widehat{\boldsymbol{\beta}}_{m}=\widehat{\boldsymbol{\beta}}_{m-1}+\big(\widehat{\mathbf{F}}_{n}(\widehat{\boldsymbol{\beta}}_{m-1})\big)^{-1}\widehat{\mathbf{U}}_{n}(\widehat{\boldsymbol{\beta}}_{m-1}).


Full text

Super-Consistent Estimation of Points of Impact in Nonparametric Regression with Functional Predictors

Dominik Poß (Institute of Finance and Statistics, University of Bonn, Bonn, Germany), Dominik Liebl and Alois Kneip (Institute of Finance and Statistics and Hausdorff Center for Mathematics, University of Bonn, Bonn, Germany), Hedwig Eisenbarth (School of Psychology, Victoria University of Wellington, Wellington, New Zealand), Tor D. Wager (Presidential Cluster in Neuroscience and Department of Psychological and Brain Sciences, Dartmouth College, Hanover, New Hampshire, USA),

and Lisa Feldman Barrett (Department of Psychology, Northeastern University and Department of Psychiatry, Massachusetts General Hospital/Harvard Medical School, Boston, Massachusetts, USA; and Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, Charlestown, Massachusetts, USA)

Abstract

Predicting scalar outcomes using functional predictors is a classic problem in functional data analysis. In many applications, however, only specific locations or time-points of the functional predictors have an impact on the outcome. Such “points of impact” are typically unknown and have to be estimated in addition to estimating the usual model components. We show that our points of impact estimator enjoys a super-consistent convergence rate and does not require knowledge or pre-estimates of the unknown model components. This remarkable result facilitates the subsequent estimation of the remaining model components as shown in the theoretical part, where we consider the case of nonparametric models and the practically relevant case of generalized linear models. The finite sample properties of our estimators are assessed by means of a simulation study. Our methodology is motivated by data from a psychological experiment in which the participants were asked to continuously rate their emotional state while watching an affective video eliciting a varying intensity of emotional reactions.

Keywords: functional data analysis; variable selection; nonparametric regression; quasi-maximum likelihood; emotional stimuli; online video rating

1 Introduction

Identifying important time points in time continuous trajectories is a difficult but highly relevant problem. For instance, current psychological research on emotional experiences often includes time continuous stimuli such as videos to induce emotional states, say $X(t)\in\mathbb{R}$ , with $t\in[a,b]$ , where $a$ denotes the start of the video and $b$ the end (see Figure 1). The evaluation of such stimuli is based on asking participants whether the video made them feel negative, say $Y=0$ , or positive, say $Y=1$ . In this paper we consider a novel dataset where participants were asked to continuously report their emotional states while watching an affective documentary video on the persecution of African albinos. After watching the video, the participants were asked to rate their final overall feeling. Psychologists are interested in understanding how such concluding overall ratings relate to the fluctuations of the emotional states while watching the video, as this has implications for the way emotional states are assessed in research using such material. Due to a lack of appropriate statistical methods, existing approaches use heuristics such as the “peak-and-end rule” in order to link the overall ratings with the continuous emotional stimuli (see Section 5). Such heuristic approaches, however, can produce results that do not accurately capture the summary rating and can be easily over-interpreted, as there is no unbiased formal inference about which time points contribute to the summary rating. By contrast, our new methodology allows us to identify the crucial affective video scenes – the basic prerequisite to understanding the emergence of emotional states in this kind of experiment.

Identifying “influential” stimuli in a video amounts to identifying the corresponding time points $\tau\in(a,b)$ . We aim to estimate such time points within the nonparametric model

$$Y=g\big(X(\tau_{1}),\dots,X(\tau_{S})\big)+\varepsilon, \tag{1}$$

where $\tau_{1},\dots,\tau_{S}\in(a,b)$ and their number $S\in\mathbb{N}$ are unknown and need to be estimated. The values $\tau_{1},\dots,\tau_{S}$ are called points of impact and provide specific locations at which the functional predictor $X\in L^{2}([a,b])$ influences the scalar outcome $Y$ . In our real data application in Section 5, $Y$ is a binary variable and the functional predictor $X$ is evaluated at two estimated points of impact $\hat{\tau}_{1}$ and $\hat{\tau}_{2}$ ; see Figure 5.

Our method builds upon the work of Kneip et al. (2016); however, we consider the much more challenging case of estimating points of impact within a fully nonparametric function $g$ . A remarkable feature of our method is that identification and estimation of the points of impact $\tau_{1},\dots,\tau_{S}$ require neither knowledge about the nonparametric function $g$ nor an estimate of $g$ . The estimation of the points of impact $\tau_{1},\dots,\tau_{S}$ is thus robust to model misspecifications and free of additional contaminating estimation errors. This result goes far beyond the special case of a functional linear model as considered by Kneip et al. (2016).

To the best of our knowledge, the problem of estimating $\tau_{1},\dots,\tau_{S}$ in (1) is, so far, only considered by Ferraty et al. (2010), who propose to estimate $g$ nonparametrically for any combination of point of impact candidates $t^{\ast}_{1},\dots,t^{\ast}_{S^{\ast}}\in\{t_{j}:t_{j}=a+j(b-a)/p\,\text{ with }\,j=1,\dots,p\}$ and to select the best model using cross-validation. This brute force method, however, becomes problematic in practice for $S\geq 2$ and large $p$ . Furthermore, the nonparametric estimation of $g$ implies that the points of impact $\tau_{1},\dots,\tau_{S}$ can be estimated at most with the non-parametric rate $n^{-2/(4+S)}$ , where $n$ denotes the sample size. Here the speed of convergence decreases dramatically for dimensions $S\geq 2$ . By contrast, we can estimate the points of impact with a super-consistent convergence rate, that is, faster than the parametric rate $n^{-1/2}$ , and our estimation algorithm is applicable in practice for any fixed $S$ and large $p\gg n$ .
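To give a feeling for why an exhaustive search becomes impractical, the following minimal sketch counts how many candidate sets of size $S$ exist on a grid of $p$ points. It assumes that every subset of size $S$ would have to be fitted and cross-validated, which is only a rough reading of the brute-force idea; the function name is illustrative and not taken from any package.

```python
from math import comb


def n_candidate_models(p: int, s: int) -> int:
    """Number of distinct sets of s candidate points on a grid of p points."""
    return comb(p, s)


# Each of these candidate sets would require its own nonparametric fit.
for p in (100, 500, 1000):
    counts = [n_candidate_models(p, s) for s in (1, 2, 3)]
    print(f"p={p}: S=1,2,3 -> {counts}")
```

Already for $p=1000$ and $S=3$ the count exceeds $10^{8}$, before the cost of each cross-validated fit is taken into account.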

The super-consistency result for our points of impact estimators is very beneficial for subsequent estimation problems and allows us to estimate the unknown function $g$ as if the points of impact were known. We demonstrate this for a nonparametric model $g$ as well as for the practically relevant case of generalized linear models with linear predictors, that is, $g(X(\tau_{1}),\dots,X(\tau_{S}))=g(\alpha+\sum_{r=1}^{S}\beta_{r}X(\tau_{r}))$ with assumed known parametric link function $g$ .

So far, the purely nonparametric framework is only considered by Ferraty et al. (2010). The case of a known $g$ and linear predictor function $\alpha+\sum_{r=1}^{S}\beta_{r}X(\tau_{r})$ had already been considered by previous studies; however, none of these studies provides a super-consistent estimation of points of impact independent of the model $g$ . The term “impact point” was coined by Lindquist and McKeague (2009) and McKeague and Sen (2010). Lindquist and McKeague (2009) consider a logistic regression framework and McKeague and Sen (2010) consider a linear regression framework. A point of impact model, where $S=1$ is assumed known, has also been studied in survival analysis for the Cox regression model (Zhang, 2012). Kneip et al. (2016) allow for an unknown number $S\geq 0$ of points of impact augmenting the functional linear regression model. Liebl et al. (2020) propose an improved estimation algorithm for the latter work. Aneiros and Vieu (2014) consider a linear regression framework with multiple points of impact postulating the existence of some consistent estimation procedure. Berrendero et al. (2019) consider a linear regression framework and propose a reproducing kernel Hilbert space approach. Selecting sparse features from functional data $X$ is also useful for clustering. For instance, Floriello and Vitelli (2017) propose a method for sparse clustering of functional data. In a slightly different context, Park et al. (2016) focus on selecting predictive subdomains of the functional data. Related to this paper is also the work of Lindquist (2012) and Sobel and Lindquist (2014). Lindquist (2012) extends structural equation models to the functional data analysis setting and uses his methodology to select significantly impacting time-intervals in functional magnetic resonance imaging (fMRI) data. 
Sobel and Lindquist (2014) propose a mixed effects model which facilitates selecting significant impact regions in fMRI data by controlling for systematic measurement errors.

The rest of this work is structured as follows. Section 2 considers the estimation of the points of impact $\tau_{r}$ and their number $S$ independent of the model $g$ . Subsequent estimation of the function $g$ is discussed in Section 3. The simulation study and the real data application are in Sections 4 and 5. All proofs and additional simulation results can be found in the appendixes of the supplementary paper supporting this article (Poß et al., 2020). The R-package fdapoi and the R-scripts for reproducing our main empirical results are also provided as part of the online supplementary material (Poß et al., 2020).

2 Estimating points of impact

In the following we present our theoretical framework (Section 2.1), the estimation algorithm (Section 2.2) and our asymptotic results (Section 2.3). The section concludes with a discussion of possibilities to generalize our theoretical results (Section 2.4).

2.1 Theoretical framework

In this section we present our theoretical framework for estimating the points of impact $\tau_{1},\dots,\tau_{S}$ without knowing or (pre-)estimating the possibly nonparametric model function $g$ . The identification of points of impact constitutes a particular variable selection problem. Since we consider the case where the functional predictor is observed over a densely discretized grid, one might be tempted to apply multivariate variable selection methods like Lasso or related procedures. Note, however, that the high correlation of the predictor at neighboring discretization points violates the basic requirements of these multivariate variable selection procedures.
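The correlation problem can be made concrete with a small sketch. It assumes, purely for illustration, that the predictor is a standard Brownian motion, for which the correlation between the values at $t$ and $t+h$ equals $\sqrt{t/(t+h)}$; the function name is hypothetical.

```python
import math


def neighbor_correlation(t: float, h: float) -> float:
    """Correlation of standard Brownian motion at times t and t + h (t > 0, h >= 0).

    Uses Cov(B_s, B_t) = min(s, t), so Corr(B_t, B_{t+h}) = sqrt(t / (t + h)).
    """
    if t <= 0 or h < 0:
        raise ValueError("need t > 0 and h >= 0")
    return math.sqrt(t / (t + h))


# Grid spacing h = 1/p on [0, 1]; evaluate the correlation at the midpoint.
for p in (100, 500, 1000):
    print(p, round(neighbor_correlation(0.5, 1.0 / p), 5))
```

As the grid becomes denser, neighboring columns of the design matrix become almost perfectly correlated, which is the situation in which standard variable selection procedures break down.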

Suppose we are given an i.i.d. sample of data $(X_{i},Y_{i})$ , $i=1,\dots,n$ , where $X_{i}=\{X_{i}(t),t\in[a,b]\}$ is a stochastic process with $\mathbb{E}(\int_{a}^{b}X_{i}(t)^{2}\,dt)<\infty$ , $[a,b]$ is a compact subset of $\mathbb{R}$ and $Y_{i}$ is a real valued random variable. It is assumed that the relationship between $Y_{i}$ and the functional predictor $X_{i}$ can be modeled as

$$Y_{i}=g\big(X_{i}(\tau_{1}),\dots,X_{i}(\tau_{S})\big)+\varepsilon_{i}, \tag{2}$$

where $\varepsilon_{i}$ denotes the statistical error term with $\mathbb{E}(\varepsilon_{i}|X_{i}(t))=0$ for all $t\in[a,b]$ . The number $0\leq S<\infty$ and the points of impact $\tau_{1},\dots,\tau_{S}$ are unknown and have to be estimated from the data – without knowing the true model function $g$ . The points of impact $\tau_{1},\dots,\tau_{S}$ indicate the locations at which the functional values $X_{i}(\tau_{1}),\dots,X_{i}(\tau_{S})$ have a specific influence on $Y_{i}$ . Without loss of generality, we consider centered random functions $X_{i}$ with $\operatorname{\mathbb{E}}(X_{i}(t))=0$ for all $t\in[a,b]$ .

Surprisingly, the unknown function $g$ has to fulfill only some very mild regularity conditions and does not have to be estimated in order to estimate the points of impact $\tau_{1},\dots,\tau_{S}$ (see Theorem 2.1). Estimating points of impact, however, necessarily depends on the structure of $X_{i}$ . Motivated by our application we consider stochastic processes with rough sample paths such as (fractional) Brownian motion, Ornstein-Uhlenbeck processes, Poisson processes, etc. These processes also have important applications in fields such as finance, chemometrics, econometrics, and the analysis of gene expression data (Lee and Ready, 1991; Levina et al., 2007; Dagsvik and Strøm, 2006; Rohlfs et al., 2013). Common to these processes are covariance functions $\sigma(s,t)=\operatorname{\mathbb{E}}(X_{i}(s)X_{i}(t))$ which are two times continuously differentiable for all points $s\neq t$ , but not two times differentiable at the diagonal $s=t$ . The following assumption on the covariance function of $X_{i}$ describes a very large class of such stochastic processes and allows us to derive precise theoretical results:

Assumption 2.1.

For some open subset $\Omega\subset\mathbb{R}^{3}$ with $[a,b]^{2}\times[0,b-a]\subset\Omega$ , there exists a twice continuously differentiable function $\omega:\Omega\rightarrow\mathbb{R}$ as well as some $0<\kappa<2$ such that for all $s,t\in[a,b]$

$$\sigma(s,t)=\omega(s,t,|s-t|^{\kappa}). \tag{3}$$

Moreover, $0<\inf_{t\in[a,b]}c(t)$ , where $c(t):=-\frac{\partial}{\partial z}\omega(t,t,z)|_{z=0}$ .

The parameter $\kappa$ quantifies the degree of smoothness of the covariance function $\sigma$ at the diagonal. While a twice continuously differentiable covariance function will satisfy (3) with $\kappa=2$ , small values $0<\kappa<2$ indicate a process with non-smooth sample paths.

Assumption 2.1 covers several important classes of stochastic processes. Recall, for instance, that the class of self-similar (not necessarily centered) processes $X_{i}=\{X_{i}(t):t\geq 0\}$ has the property that $X_{i}(c_{1}t)=c_{1}^{H}X_{i}(t)$ for any constant $c_{1}>0$ and some exponent $H>0$ . It is then well known that the covariance function of any such process $X_{i}$ with stationary increments and $0<\mathbb{E}(X_{i}(1)^{2})<\infty$ satisfies

$$\sigma(s,t)=\omega(s,t,|s-t|^{2H})=\big(s^{2H}+t^{2H}-|s-t|^{2H}\big)\,c_{2}$$

for some constant $c_{2}>0$ ; see Theorem 1.2 in Embrechts and Maejima (2000). If $0<H<1$ such a process respects Assumption 2.1 with $\kappa=2H$ and $c(t)=c_{2}$ . A prominent example of a self-similar process is the fractional Brownian motion.
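As a short worked check, offered as a sketch rather than as part of the original argument, one can insert the self-similar covariance $\sigma(s,t)=(s^{2H}+t^{2H}-|s-t|^{2H})\,c_{2}$ into a symmetric second difference at the diagonal. The smooth part $t^{2H}$ contributes only $O(\delta^{2})$ for fixed $t>0$, while the term $|s-t|^{2H}$ leaves a contribution of exact order $\delta^{2H}$:

```latex
\begin{aligned}
\sigma(t,t)-\tfrac{1}{2}\big(\sigma(t+\delta,t)+\sigma(t-\delta,t)\big)
 &= c_{2}\Big(t^{2H}-\tfrac{1}{2}\big((t+\delta)^{2H}+(t-\delta)^{2H}\big)\Big)+c_{2}\,\delta^{2H}\\
 &= c_{2}\,\delta^{2H}+O(\delta^{2}),\qquad t>0 \text{ fixed},\ 0<H<1 .
\end{aligned}
```

This agrees with $\kappa=2H$ and $c(t)=c_{2}$, since here $\omega(s,t,z)=(s^{2H}+t^{2H}-z)\,c_{2}$ and hence $-\partial\omega(t,t,z)/\partial z=c_{2}$.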

Another class of processes is given when $X_{i}=\{X_{i}(t):t\geq 0\}$ is a second order process with stationary and independent increments. In this case it is easy to show that $\sigma(s,t)=\omega(s,t,|s-t|)=(s+t-|s-t|)\,c_{3}$ for some constant $c_{3}>0$ . Assumption 2.1 then holds with $\kappa=1$ and $c(t)=c_{3}$ . The latter conditions on $X_{i}$ are, for instance, satisfied by second order Lévy processes, which include important processes such as Poisson processes, compound Poisson processes, or jump-diffusion processes.

A third important class of processes satisfying Assumption 2.1 are those with a Matérn covariance function. For this class of processes the covariance function depends only on the distance between $s$ and $t$ through

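$$\sigma(s,t)=\omega_{\nu}(s,t,|s-t|)=\frac{\pi\phi}{2^{\nu-1}\Gamma(\nu+1/2)\alpha^{2\nu}}(\alpha|s-t|)^{\nu}K_{\nu}\big(\alpha|s-t|\big),$$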

where $K_{\nu}$ is the modified Bessel function of the second kind, and $\phi$ , $\nu$ and $\alpha$ are non-negative parameters of the covariance. It is known that this covariance function is $2m$ times differentiable if and only if $\nu>m$ (cf. Stein, 1999, Ch. 2.7, p. 32). Assumption 2.1 is then satisfied for $\nu<1$ . For the special case where $\nu=0.5$ one may derive the long term covariance function of an Ornstein-Uhlenbeck process which is given as $\sigma(s,t)=\omega(s,t,|s-t|)=0.5\,\exp(-\theta|s-t|)\sigma_{OU}^{2}/\theta$ , for some parameter $\theta>0$ and $\sigma_{OU}>0$ . Assumption 2.1 is then satisfied with $\kappa=1$ and $c(t)=0.5\,\sigma_{OU}^{2}$ .
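The reduction for $\nu=0.5$ can be sketched in two lines. The only outside ingredient is the standard closed form of the Bessel function $K_{1/2}$; constants in front of the exponential are left unspecified here, and the identification $\theta=\alpha$ is our reading of the correspondence with the Ornstein-Uhlenbeck case.

```latex
\begin{aligned}
K_{1/2}(x)=\sqrt{\frac{\pi}{2x}}\,e^{-x}
 \quad&\Longrightarrow\quad
 (\alpha d)^{1/2}K_{1/2}(\alpha d)=\sqrt{\frac{\pi}{2}}\,e^{-\alpha d},\qquad d=|s-t|,\\
\frac{\sigma_{OU}^{2}}{2\theta}\,e^{-\theta d}
 &=\frac{\sigma_{OU}^{2}}{2\theta}-\frac{\sigma_{OU}^{2}}{2}\,d+O(d^{2}) .
\end{aligned}
```

The first line shows that the Matérn covariance is proportional to $e^{-\alpha|s-t|}$ when $\nu=0.5$. The second line shows that the exponential covariance is linear in $|s-t|$ near the diagonal, which gives $\kappa=1$ and $c(t)=-\partial\omega(t,t,z)/\partial z|_{z=0}=0.5\,\sigma_{OU}^{2}$.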

The remarkable result that identification and estimation of the points of impact $\tau_{1},\dots,\tau_{S}$ requires neither knowledge about the possibly nonparametric function $g$ nor an estimate of $g$ is based on the following theorem.

Theorem 2.1.

Let $X_{i}$ be a Gaussian process. For any function $g(x_{1},\dots,x_{S})$ such that for all $r=1,\dots,S$ the partial derivative $\partial g(x_{1},\dots,x_{S})/\partial x_{r}$ is continuous almost everywhere and $0<|\operatorname{\mathbb{E}}(\frac{\partial}{\partial x_{r}}g(X_{i}(\tau_{1}),\dots,X_{i}(\tau_{S})))|<\infty$ , we define $\vartheta_{r}=\operatorname{\mathbb{E}}\big{(}\frac{\partial}{\partial x_{r}}g(X_{i}(\tau_{1}),\dots,X_{i}(\tau_{S}))\big{)}$ . Then the equation $\operatorname{\mathbb{E}}\big{(}X_{i}(s)Y_{i}\big{)}=\sum_{r=1}^{S}\vartheta_{r}\sigma(s,\tau_{r})$ holds for all $s\in[a,b]$ .

Theorem 2.1 allows us to decompose the cross-covariance $\operatorname{\mathbb{E}}(X_{i}(s)Y_{i})$ into coefficients $\vartheta_{r}$ , which depend on the unknown function $g$ , and the covariance function $\sigma$ , which only depends on $X_{i}$ . Our estimation strategy for the points of impact $\tau_{r}$ works for unknown $\vartheta_{r}$ with $0<|\vartheta_{r}|<\infty$ . The latter imposes only mild regularity assumptions on $g$ and is fulfilled, for instance, by any nonparametric single-index model, $g(X_{i}(\tau_{1}),\dots,X_{i}(\tau_{S}))\equiv g(\eta_{i})$ with $\eta_{i}=\alpha+\sum_{r=1}^{S}\beta_{r}X_{i}(\tau_{r})$ , where $0<|\operatorname{\mathbb{E}}(g^{\prime}(\eta_{i}))|<\infty$ . Of course, the class of possible functions $g$ defined by Theorem 2.1 also contains much more complex cases than single-index models.
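For intuition, the case $S=1$ can be sketched with Stein's lemma. This is an illustrative argument under the assumptions that $\sigma(\tau,\tau)>0$ and that $g$ is regular enough for Stein's lemma to apply; the proof in the supplementary paper may proceed differently. Write $W=X_{i}(\tau)$ and decompose $X_{i}(s)$ into its projection on $W$ plus an independent Gaussian remainder $R$ with mean zero:

```latex
\begin{aligned}
X_{i}(s) &= \frac{\sigma(s,\tau)}{\sigma(\tau,\tau)}\,W + R, \qquad R \text{ independent of } W,\ \mathbb{E}(R)=0,\\
\mathbb{E}\big(X_{i}(s)Y_{i}\big)
 &= \frac{\sigma(s,\tau)}{\sigma(\tau,\tau)}\,\mathbb{E}\big(W g(W)\big)
   + \mathbb{E}(R)\,\mathbb{E}\big(g(W)\big)
   + \mathbb{E}\big(X_{i}(s)\varepsilon_{i}\big)\\
 &= \frac{\sigma(s,\tau)}{\sigma(\tau,\tau)}\,\sigma(\tau,\tau)\,\mathbb{E}\big(g'(W)\big)
 \;=\; \vartheta_{1}\,\sigma(s,\tau).
\end{aligned}
```

The second equality uses $\mathbb{E}(\varepsilon_{i}|X_{i}(t))=0$ for all $t$, and the third uses Stein's lemma, $\mathbb{E}(Wg(W))=\mathbb{V}(W)\,\mathbb{E}(g'(W))$ for centered Gaussian $W$.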

The intention of our estimator for the points of impact $\tau_{r}$ is to exploit the covariance structure of processes described by Assumption 2.1. Covariance functions $\sigma(s,t)$ satisfying Assumption 2.1 are obviously not two times differentiable at the diagonal $s=t$ , but are two times differentiable for $s\neq t$ . Using Theorem 2.1 in conjunction with Assumption 2.1 allows us to uniquely identify the locations of the points of impact from the cross-covariance $\operatorname{\mathbb{E}}(X_{i}(s)Y_{i})$ . Let us make this more precise by defining

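$$f_{XY}(s):=\mathbb{E}(X_{i}(s)Y_{i})=\sum_{r=1}^{S}\vartheta_{r}\sigma(s,\tau_{r})\quad\text{for }s\in[a,b].$$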

Since $\sigma(s,t)$ is not two times differentiable at $s=t$ , the cross-covariance $f_{XY}(s)$ will not be two times differentiable at $s=\tau_{r}$ , for all $r=1,\dots,S$ , resulting in kink-like features at $\tau_{r}$ as depicted in the upper plot of Figure 2.

A natural strategy for estimating $\tau_{r}$ is to detect these kinks by considering the following modified central difference approximation of the second derivative of $f_{XY}$ at a point $s\in[a+\delta,b-\delta]$ for some $\delta>0$ :

$$f_{XY}(s)-\frac{1}{2}\big(f_{XY}(s+\delta)+f_{XY}(s-\delta)\big). \tag{4}$$

By defining the auxiliary process

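$$Z_{\delta,i}(s):=X_{i}(s)-\frac{1}{2}\big(X_{i}(s-\delta)+X_{i}(s+\delta)\big)\quad\text{for }s\in[a+\delta,b-\delta],$$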

we have the following equivalent moment expression for (4):

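$$\mathbb{E}(Z_{\delta,i}(s)Y_{i})=f_{XY}(s)-\frac{1}{2}\big(f_{XY}(s+\delta)+f_{XY}(s-\delta)\big). \tag{5}$$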

At $s=\tau_{r}$ , expression (5) will decline more slowly to zero as $\delta\to 0$ than for $s\neq\tau_{r}$ , $r=1,\dots,S$ . For suitable values of $\delta$ , the points of impact $\tau_{r}$ can then be estimated using the local extrema of the empirical counterpart of $|\mathbb{E}(Z_{\delta,i}(s)Y_{i})|$ (see middle panel of Figure 2).

More precisely, Theorem 2.1 together with Proposition C.1 and Lemma C.4 in Appendix C of the supplementary paper Poß et al. (2020) imply that as $\delta\to 0$

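$$\begin{array}{ll}\mathbb{E}(Z_{\delta,i}(s)Y_{i})=\vartheta_{r}c(\tau_{r})\delta^{\kappa}+o(\delta^{\kappa})&\text{if }s\in\{\tau_{1},\dots,\tau_{S}\}\\ \mathbb{E}(Z_{\delta,i}(s)Y_{i})=O(\delta^{2})&\text{if }s\notin\{\tau_{1},\dots,\tau_{S}\},\end{array}$$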

where $0<\kappa<2$ and $c(\cdot)>0$ are as defined in Assumption 2.1.
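The two regimes can be checked numerically in the simplest case $\kappa=1$. The sketch below assumes the covariance $\sigma(s,t)=(s+t-|s-t|)\,c_{3}$ of a process with stationary independent increments and uses made-up values for the points of impact and the coefficients $\vartheta_{r}$; all function names are illustrative.

```python
from typing import Sequence


def sigma(s: float, t: float, c3: float = 0.5) -> float:
    """Covariance (s + t - |s - t|) * c3; c3 = 0.5 gives standard Brownian motion."""
    return c3 * (s + t - abs(s - t))


def f_xy(s: float, taus: Sequence[float], thetas: Sequence[float], c3: float = 0.5) -> float:
    """Cross-covariance sum_r theta_r * sigma(s, tau_r) from Theorem 2.1."""
    return sum(th * sigma(s, tau, c3) for tau, th in zip(taus, thetas))


def second_difference(s: float, delta: float, taus: Sequence[float],
                      thetas: Sequence[float], c3: float = 0.5) -> float:
    """f_XY(s) - 0.5 * (f_XY(s + delta) + f_XY(s - delta))."""
    return f_xy(s, taus, thetas, c3) - 0.5 * (
        f_xy(s + delta, taus, thetas, c3) + f_xy(s - delta, taus, thetas, c3)
    )


taus, thetas = [0.3, 0.7], [2.0, -1.0]  # hypothetical values for illustration
for delta in (0.04, 0.02, 0.01):
    at_tau = second_difference(0.3, delta, taus, thetas)
    away = second_difference(0.5, delta, taus, thetas)
    print(delta, at_tau, away)
```

At $s=\tau_{1}=0.3$ the value equals $\vartheta_{1}c_{3}\delta$ and thus halves whenever $\delta$ is halved, whereas at $s=0.5$ it vanishes up to rounding error because this particular covariance is piecewise linear away from the diagonal.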

Of course, $\mathbb{E}(Z_{\delta,i}(s)Y_{i})$ is not known and we have to rely on $n^{-1}\sum_{i=1}^{n}Z_{\delta,i}(s)Y_{i}$ as its estimate. Under our setting we will have that the variance $\operatorname{\mathbb{V}}(Z_{\delta,i}(s)Y_{i})=O(\delta^{\kappa})$ which implies

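$$\frac{1}{n}\sum_{i=1}^{n}Z_{\delta,i}(s)Y_{i}-\mathbb{E}(Z_{\delta,i}(s)Y_{i})=O_{P}\big(\sqrt{\delta^{\kappa}/n}\big).$$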

Consequently, the identification of points of impact requires a sensible choice of $\delta$ . For too small $\delta$ -values (e.g., $\delta^{\kappa}\sim n^{-1}$ ) the estimation noise will overlay the signal; this situation is depicted in the bottom plot of Figure 2. For too large $\delta$ -values, however, it will not be possible to distinguish between neighboring points of impact.
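The trade-off can be summarized by comparing orders of magnitude. The following back-of-the-envelope calculation only combines the rates stated in this section (a signal of order $\delta^{\kappa}$ at a point of impact, $O(\delta^{2})$ elsewhere, and a variance of order $\delta^{\kappa}$) and is meant as an informal sketch:

```latex
\begin{aligned}
\frac{\text{signal at }\tau_{r}}{\text{noise}}
 &\asymp \frac{\delta^{\kappa}}{\sqrt{\delta^{\kappa}/n}} = \sqrt{n\,\delta^{\kappa}},
 &\qquad
\frac{\text{value away from }\tau_{r}}{\text{signal at }\tau_{r}}
 &= \frac{O(\delta^{2})}{\delta^{\kappa}} = O\big(\delta^{2-\kappa}\big).
\end{aligned}
```

The first ratio stays bounded when $\delta^{\kappa}\sim n^{-1}$, so the kink is not visible above the noise, and it diverges only if $n\delta^{\kappa}\to\infty$. The second ratio tends to zero as $\delta\to 0$ because $\kappa<2$, so smaller values of $\delta$ sharpen the contrast between points of impact and all other locations.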

Remark: Even if the covariance function $\sigma(s,t)$ does not satisfy Assumption 2.1, the points of impact $\tau_{r}$ may still be estimated using the local extrema of $\operatorname{\mathbb{E}}(Z_{\delta,i}(s)Y_{i})$ . Suppose, for instance, there exists a $m\geq 2$ times differentiable function $\widetilde{\sigma}:\mathbb{R}\to\mathbb{R}$ such that $\sigma(s,t)=\widetilde{\sigma}(|s-t|)$ , where $\widetilde{\sigma}(|s-t|)$ decays fast enough, as $|s-t|$ increases, such that $X_{i}(s)$ is essentially uncorrelated with $X_{i}(\tau_{r})$ for $|\tau_{r}-s|\gg 0$ . If $|\widetilde{\sigma}^{\prime\prime}(0)|>|\widetilde{\sigma}^{\prime\prime}(|s-t|)|$ , for $s\neq t$ , and $\min_{r\neq k}|\tau_{r}-\tau_{k}|$ is large enough, then all points of impact might be identified as local extrema of $\operatorname{\mathbb{E}}(Z_{\delta,i}(s)Y_{i})$ .

2.2 Estimation algorithm

In the following we consider the case where each $X_{i}$ has been observed over $p$ equidistant points $t_{j}=a+(j-1)(b-a)/(p-1)$ , $j=1,\dots,p$ , where $p$ may be much larger than $n$ . Estimators for the points of impact $\tau_{r}$ are determined by sufficiently large local maxima of $\left|n^{-1}\sum_{i=1}^{n}Z_{\delta,i}(t_{j})Y_{i}\right|$ .

Algorithm 2.1. (Estimating points of impact)

1. Calculate $\widehat{f}_{XY}(t_{j}):=\frac{1}{n}\sum_{i=1}^{n}X_{i}(t_{j})Y_{i}$ for each $j=1,\dots,p$ .

2. Choose $\delta>0$ such that there exists some $k_{\delta}\in\mathbb{N}$ with $1\leq k_{\delta}<(p-1)/2$ and $\delta=k_{\delta}(b-a)/(p-1)$ .

3. Calculate $\widehat{f}_{ZY}(t_{j}):=\widehat{f}_{XY}(t_{j})-\frac{1}{2}(\widehat{f}_{XY}(t_{j}-\delta)+\widehat{f}_{XY}(t_{j}+\delta))$ for all $j\in{\cal J}_{\delta}$ , where ${\cal J}_{\delta}:=\{k_{\delta}+1,\ldots,p-k_{\delta}\}$ .

4. Repeat:

   Initiate the repetition by setting $l=1$ .

   Estimate the $l$ th point of impact candidate as $\widehat{\tau}_{l}=\underset{t_{j}:\,j\in{\cal J}_{\delta}}{\arg\max}|\widehat{f}_{ZY}(t_{j})|$ .

   Update ${\cal J}_{\delta}$ by eliminating all points in ${\cal J}_{\delta}$ in an interval of size $\sqrt{\delta}$ around $\widehat{\tau}_{l}$ .

   Update $l$ by setting $l=l+1$ .

   End the repetition if ${\cal J}_{\delta}=\emptyset$ .

The procedure will result in estimates $\widehat{\tau}_{1},\widehat{\tau}_{2},\dots,\widehat{\tau}_{M_{\delta}}$ , where $M_{\delta}<\infty$ denotes the maximum number of possible repetitions. The estimator of $S$ is

[TABLE]

An asymptotically valid choice of the threshold $\lambda$ is presented in Theorem 2.2, and a practical implementation of $\lambda$ is discussed below Theorem 2.2.
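The steps of Algorithm 2.1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `poi_candidates` and `select_by_threshold` are hypothetical, the "interval of size $\sqrt{\delta}$" is assumed to be centred at the current candidate, indices are 0-based, and the thresholding rule (keep the leading candidates until one falls below $\lambda$) is an assumed reading of the estimator of $S$.

```python
import math


def poi_candidates(X, Y, a, b, k_delta):
    """Return point-of-impact candidates as (location, |f_ZY|) pairs.

    X is a list of n trajectories, each observed at p equidistant grid
    points on [a, b]; Y is the list of n scalar responses.
    """
    n, p = len(X), len(X[0])
    if not 1 <= k_delta < (p - 1) / 2:
        raise ValueError("k_delta must satisfy 1 <= k_delta < (p - 1) / 2")
    step = (b - a) / (p - 1)
    grid = [a + j * step for j in range(p)]  # 0-based grid t_0, ..., t_{p-1}
    delta = k_delta * step

    # Step 1: empirical cross-moments f_XY(t_j).
    f_xy = [sum(X[i][j] * Y[i] for i in range(n)) / n for j in range(p)]

    # Step 3: f_ZY(t_j) on the interior index set J_delta (0-based here).
    f_zy = {
        j: f_xy[j] - 0.5 * (f_xy[j - k_delta] + f_xy[j + k_delta])
        for j in range(k_delta, p - k_delta)
    }

    # Step 4: repeatedly pick the largest |f_ZY| and drop its neighbourhood.
    half_width = math.sqrt(delta) / 2  # assumption: interval centred at the pick
    active = set(f_zy)
    candidates = []
    while active:
        j_max = max(active, key=lambda j: abs(f_zy[j]))
        candidates.append((grid[j_max], abs(f_zy[j_max])))
        active = {j for j in active if abs(grid[j] - grid[j_max]) > half_width}
    return candidates


def select_by_threshold(candidates, lam):
    """Assumed rule: keep leading candidates until one falls below lam."""
    selected = []
    for location, magnitude in candidates:
        if magnitude < lam:
            break
        selected.append(location)
    return selected
```

Because each candidate is the maximizer over a shrinking index set, the returned magnitudes are non-increasing, which is what makes a simple cut-off at $\lambda$ meaningful.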

Remark: This estimation algorithm is made for the case of densely observed functional data. In practice this means functional data that are sampled at a high frequency such as in our real data example (Section 5). Unfortunately, we do not see a simple way to generalize our method to the case of irregularly or sparsely sampled functional data. Such a generalization would require a very different approach based on nonparametric smoothing procedures.

2.3 Asymptotic results

In this section, we consider asymptotics as $n\rightarrow\infty$ with $p\equiv p_{n}\geq Ln^{1/\kappa}$ for some constant $(b-a)/2<L<\infty$ . We introduce the following assumption:

Assumption 2.2.

a)

$X_{1},\dots,X_{n}$ are i.i.d. random functions distributed according to $X$ . The process $X$ is Gaussian with covariance function $\sigma(s,t)$ .

b)

There exists a $0<\sigma_{|y|}<\infty$ such that for each $m=1,2,\dots$ we have $\operatorname{\mathbb{E}}(|Y_{i}|^{2m})\leq 2^{m-1}m!\sigma_{|y|}^{2m}$ .

The moment condition in b) is obviously fulfilled for bounded $Y_{i}$ . For instance, in the functional logistic regression we have that $\operatorname{\mathbb{E}}(|Y_{i}|^{m})\leq 1$ for all $m=1,2,\dots$ . Condition b) also holds for any centered sub-Gaussian $Y_{i}$ , where a centering of $Y_{i}$ can always be achieved by substituting $g(X_{i}(\tau_{1}),\dots,X_{i}(\tau_{S}))+\operatorname{\mathbb{E}}(g(X_{i}(\tau_{1}),\dots,X_{i}(\tau_{S})))$ for $g(X_{i}(\tau_{1}),\dots,X_{i}(\tau_{S}))$ in Model (2). If $X_{i}$ satisfies condition a), then condition b) in particular holds if the errors $\varepsilon_{i}$ are sub-Gaussian and $g$ is differentiable with bounded partial derivatives.
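As a quick check of the claim for bounded responses (a sketch; the bound $B$ is introduced here purely for illustration): if $|Y_{i}|\leq B$ almost surely for some $0<B<\infty$ , then condition b) holds with $\sigma_{|y|}=B$ , because $2^{m-1}m!\geq 1$ for every $m\geq 1$ :

$$
\operatorname{\mathbb{E}}(|Y_{i}|^{2m})\;\leq\;B^{2m}\;\leq\;2^{m-1}\,m!\,B^{2m},\qquad m=1,2,\dots
$$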

The following result shows consistency of our estimators for the points of impact $\widehat{\tau}_{r}$ and the estimator $\widehat{S}$ :

Theorem 2.2.

Under Assumptions 2.1, 2.2, and the assumptions of Theorem 2.1, let $\delta\equiv\delta_{n}\rightarrow 0$ as $n\rightarrow\infty$ such that $n\delta^{\kappa}/|\log\delta|\rightarrow\infty$ and $\delta^{\kappa}/n^{-\kappa+1}\rightarrow 0$ . We then obtain that

[TABLE]

Moreover, there exists a constant $0<D<\infty$ such that when Algorithm 2.1 is applied with threshold

[TABLE]

Note that the rates of convergence in (6) are super-consistent, that is, faster than the parametric rate $n^{-1/2}$ , since $0<\kappa<2$ . For instance, for Ornstein-Uhlenbeck processes or Brownian motions we have $\kappa=1$ , such that $\max_{r=1,\ldots,\widehat{S}}\min_{s=1,\ldots,S}|\widehat{\tau}_{r}-\tau_{s}|\ =\ O_{P}(n^{-1})$ .

In principle, arbitrarily fast rates of convergence can be achieved for $\kappa$ -values close to zero, because small $\kappa$ -values correspond to rough processes, $X_{i}$ . Roughness means that the process has strong local variations even within small intervals $[\tau_{r}-\epsilon,\tau_{r}+\epsilon]$ , $\epsilon>0$ , which facilitates differentiating a point of impact $\tau_{r}$ , $r=1,\dots,S$ , from the neighboring points $t\in[\tau_{r}-\epsilon,\tau_{r}+\epsilon]$ . By contrast, for smooth processes (large $\kappa$ -values) all values of $X_{i}(t)$ with $t\in[\tau_{r}-\epsilon,\tau_{r}+\epsilon]$ will be almost identical, which makes it hard to identify the correct point of impact $\tau_{r}$ .

A practical and asymptotically valid threshold specification which performed well in our simulation studies is given by $\lambda=A((\operatorname{\mathbb{E}}(Y_{i}^{4}))^{1/2}\log\big{(}(b-a)/\delta\big{)}/n)^{1/2}$ , where $\operatorname{\mathbb{E}}(Y_{i}^{4})$ is estimated by $\widehat{\operatorname{\mathbb{E}}}(Y_{i}^{4})=n^{-1}\sum_{i=1}^{n}Y_{i}^{4}$ and $A=\sqrt{2\sqrt{3}}$ . This value is motivated by an argument using the central limit theorem in the derivations of the threshold for Theorem 2.2. See the remark after the proof of Lemma C.3 in Appendix C for additional information.
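The practical threshold above is straightforward to compute from the responses alone. A minimal Python sketch follows; the function name `practical_threshold` is hypothetical, and the code simply transcribes the stated formula with $A=\sqrt{2\sqrt{3}}$ and the empirical fourth moment of $Y_{i}$ .

```python
import math


def practical_threshold(Y, a, b, delta):
    """lambda = A * sqrt( sqrt(mean(Y^4)) * log((b - a) / delta) / n )."""
    n = len(Y)
    fourth_moment = sum(y ** 4 for y in Y) / n  # empirical E(Y^4)
    A = math.sqrt(2.0 * math.sqrt(3.0))
    return A * math.sqrt(math.sqrt(fourth_moment) * math.log((b - a) / delta) / n)
```

For binary responses the fourth moment equals the sample mean of $Y_{i}$ , so the threshold then depends on the data only through the proportion of successes, $n$ , and $\delta$ .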

The super-consistency result in Theorem 2.2 is very general and does not require knowledge of $g$ or a pre-estimate of $g$ ; only a set of mild and verifiable assumptions on $g$ is postulated. Therefore, we expect that the theorem will be found useful for deriving inferential results for a broad variety of different models $g$ . In the following we demonstrate the usefulness of Theorem 2.2 for deriving inferential results for nonparametric models and parametric generalized linear models. Note that the related Corollary 1 in Ferraty et al. (2010) requires the simultaneous estimation of the nonparametric model function $g$ and the points of impact. This approach results in substantially slower nonparametric convergence rates and limits the applicability of their result considerably.

2.4 Generalizations

The above theoretical assumptions provide a tractable setup that will be used also in the remaining parts of the paper. In this subsection, however, we show that the Gaussian assumption of Theorem 2.1 and Theorem 2.2 can be relaxed and that our approach for identifying and estimating the points of impact may also work for a large class of non-Gaussian processes (Section 2.4.1). Moreover, we outline how our estimation procedure can be adapted to a more general version of the covariance Assumption 2.1 (Section 2.4.2).

2.4.1 Non-Gaussian processes

To generalize Theorem 2.1 one can build upon the framework of elliptical processes which includes the case of non-Gaussian, heavy-tailed distributions. That is, one can consider processes $X_{i}$ that depend on some latent random variable $V_{i}$ such that the conditional distribution of $X_{i}$ given $V_{i}=v$ is Gaussian. However, the (unconditional) distribution of $X_{i}$ then additionally depends on the distribution of $V_{i}$ and may be far from Gaussian.

Our conditions A and B in Appendix B.2 define a general framework for such non-Gaussian processes $X_{i}$ and Proposition B.1 in Appendix B.2 generalizes Theorem 2.1 for this general framework. Here in this subsection, however, we focus on the arguably most important special case of our general framework – namely, the case of elliptically distributed processes. Elliptical distributions include the special case of a Gaussian distribution as considered in Theorem 2.1, but also many important non-Gaussian distributions such as the t-distribution, the Laplace distribution, and the logistic distribution (see, for instance, Boente et al., 2014).

Let $X_{i}$ be a (centered) elliptical process, that is, let $X_{i}(t)\overset{d}{=}V_{i}X_{i}^{*}(t)$ , $t\in[a,b]$ , where $V_{i}>0$ is a strictly positive real-valued random variable, $X_{i}^{*}$ is a zero mean Gaussian process with covariance function $\sigma^{*}(s,t)$ , and where $V_{i}$ and $X_{i}^{*}$ are independent of each other. Moreover, let the error term $\varepsilon_{i}$ in (2) be independent of $V_{i}$ and $X_{i}$ and let $\operatorname{\mathbb{V}}(V_{i})<\infty$ . Then the elliptically distributed random function $X_{i}$ fulfills our conditions A and B in Appendix B.2 and it follows by Proposition B.1 in Appendix B.2 that

[TABLE]

where $\vartheta_{r}(V_{i})=\operatorname{\mathbb{E}}\big{(}\frac{\partial}{\partial x_{r}}g(X_{i}(\tau_{1}),\dots,X_{i}(\tau_{S}))\big{|}V_{i}\big{)}$ and $\sigma(s,t)=\operatorname{\mathbb{V}}(V_{i})\sigma^{*}(s,t)$ is the covariance function of the elliptically distributed process $X_{i}$ . As in the case of Theorem 2.1, the above result allows us to decompose the cross-covariance $\operatorname{\mathbb{E}}(X_{i}(s)Y_{i})$ into a scaling coefficient $\operatorname{\mathbb{E}}\left(V_{i}^{2}\vartheta_{r}(V_{i})\right)/\operatorname{\mathbb{V}}(V_{i})$ which depends on the unknown function $g$ (via $\vartheta_{r}$ ) and the covariance function $\sigma(s,\tau_{r})$ which only depends on $X_{i}$ . This result holds for elliptically distributed $X_{i}$ and requires only mild regularity assumptions on $g$ which are essentially equivalent to those imposed by Theorem 2.1; see conditions A and B in Appendix B.2.

As in the preceding section, the identification of the points of impact relies only on the structural covariance Assumption 2.1 which holds for rough – Gaussian or non-Gaussian – processes $X_{i}$ . Since $\sigma(s,t)=\operatorname{\mathbb{V}}(V_{i})\sigma^{*}(s,t)$ , the requirements of Assumption 2.1 may directly be applied to the covariance function $\sigma^{*}(s,t)$ of the Gaussian process component $X_{i}^{*}$ of the elliptical process $X_{i}$ . If $\sigma^{*}$ satisfies Assumption 2.1 for some $\omega^{*}:\Omega\to\mathbb{R}$ , then Proposition B.1 in Appendix B.2 leads to

[TABLE]

as $\delta\to 0$ with $C(\tau_{r})=c^{*}(\tau_{r})\operatorname{\mathbb{E}}\left(V_{i}^{2}\vartheta_{r}(V_{i})\right)$ , where $c^{*}(\tau_{r})=-\frac{\partial}{\partial z}\omega^{*}(\tau_{r},\tau_{r},z)|_{z=0}$ , $r=1,\dots,S$ .

Theorem 2.2 can also be generalized to the case that $X_{i}$ is elliptically distributed. Note that then $Z_{\delta,i}(s)Y_{i}\overset{d}{=}Z^{\star}_{\delta,i}(s)Y^{\star}_{i}$ , where $Z^{\star}_{\delta,i}(s)=X^{\star}_{i}(s)-\frac{1}{2}(X^{\star}_{i}(s-\delta)+X^{\star}_{i}(s+\delta))$ , for $s\in[a+\delta,b-\delta]$ , and $Y_{i}^{\star}=V_{i}Y_{i}$ . Therefore, estimating points of impact from data $(X_{i},Y_{i})$ is equivalent to estimating points of impact from data $(X_{i}^{\star},Y_{i}^{\star})$ . Thus, Theorem 2.2 remains valid if all conditions on $X_{i}$ and $Y_{i}$ in Theorem 2.2 now apply to $X_{i}^{\star}$ and $Y_{i}^{\star}$ .

Our more general framework of conditions A and B in Appendix B.2 includes even more complex cases than the elliptical processes discussed above. For instance, one may consider processes $X_{i}(t)\overset{d}{=}V_{i1}(t)X_{i}^{*}(t)+V_{i2}(t)$ , where $(V_{i1},V_{i2})$ is jointly independent of $X_{i}^{*}$ and where $V_{i1}$ and $V_{i2}$ are almost surely twice continuously differentiable functions on $[a,b]$ (see Appendix B.2 in the supplementary paper Poß et al. (2020) for more details).

2.4.2 Generalizing covariance Assumption 2.1

Assumption 2.1 holds for non-smooth/rough processes $X_{i}$ with covariance function $\sigma(s,t)=\omega(s,t,|s-t|^{\kappa})$ , where the requirement $0<\kappa<2$ excludes all smooth, twice continuously differentiable processes, $X_{i}$ , with $\kappa\geq 2$ .

However, the degree of roughness of the processes, $X_{i}$ , is actually not a necessary requirement for identifying and estimating points of impact. The crucial property is that the covariance function $\sigma(s,t)$ of $X_{i}$ is less smooth at the diagonal than for $|t-s|>0$ . For instance, let $\sigma(s,t)$ be $d=4$ times continuously differentiable at all off-diagonal points, $s\neq t$ , but not $d=4$ times differentiable at the diagonal points, $s=t$ . This scenario corresponds to a generalization of Assumption 2.1 with $0<\kappa<d=4$ which now excludes only all four times continuously differentiable processes, $X_{i}$ , with $\kappa\geq d=4$ . In this case, one may look at the modified $4$ th central difference approximation of the $4$ th derivative of $\mathbb{E}(X_{i}(s)Y_{i})$ and replace $Z_{\delta,i}(s)$ by

[TABLE]

Theoretical results may then be derived under a generalized version of Assumption 2.1 demanding that there exists a $d=4$ times differentiable function $\omega$ such that (3) holds for any $\kappa<d=4$ .

Equivalent generalizations can, for instance, be made for any $d\in\{2,4,6,8,\dots\}$ , which would then involve a modified $d$ th order central difference process $\tilde{Z}^{(d)}_{\delta,i}(s)$ . This way, Assumption 2.1 can be generalized to the requirement $0<\kappa<d$ , which then also includes smooth processes, $X_{i}$ . Deriving the estimation theory under this setup would then lead to even more accurate points of impact estimators with an even faster super-consistent convergence rate. However, taking higher order differences in practice usually involves numerical instabilities.
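To illustrate what such higher order differences look like, the following Python sketch computes standard $d$ th order central difference weights, rescaled so that the central weight equals one. This normalization is an assumption made here because it reproduces $Z_{\delta,i}(s)=X_{i}(s)-\frac{1}{2}(X_{i}(s-\delta)+X_{i}(s+\delta))$ for $d=2$ ; the exact form of the modified difference in the display above may differ. The names `central_difference_weights` and `difference_at` are hypothetical.

```python
from math import comb


def central_difference_weights(d):
    """Weights for offsets -d/2, ..., d/2 (in units of delta), centre weight 1."""
    if d < 2 or d % 2 != 0:
        raise ValueError("d must be an even integer >= 2")
    half = d // 2
    raw = [(-1) ** abs(k) * comb(d, half + k) for k in range(-half, half + 1)]
    centre = raw[half]
    return [w / centre for w in raw]


def difference_at(x, j, k_delta, d):
    """Apply the rescaled d-th order central difference to grid values x at index j."""
    half = d // 2
    weights = central_difference_weights(d)
    return sum(w * x[j + (k - half) * k_delta] for k, w in enumerate(weights))
```

A $d$ th order difference annihilates polynomials of degree below $d$ , so only behaviour that is rougher than this locally contributes; the alternating binomial weights also amplify noise as $d$ grows, which is consistent with the numerical instabilities mentioned above.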

3 Subsequent estimation of $g$

Given estimates of the points of impact $\tau_{1},\dots,\tau_{S}$ and their number $S$ , one is typically interested in the subsequent estimation and inference regarding the remaining model components. Section 3.1 considers the case of a nonparametric model $g$ . Section 3.2 considers the case of a generalized linear model, which is of particular practical relevance.

In the following we assume the existence of some consistent estimation procedure for the points of impact satisfying $\max_{r=1,\ldots,\widehat{S}}|\widehat{\tau}_{r}-\tau_{r}|\ =\ O_{P}(n^{-1/\kappa})$ and $P(\widehat{S}=S)\to 1$ , where we use matched labels in the sense that $\tau_{r}=\arg\min_{s=1,\dots,S}|\widehat{\tau}_{r}-\tau_{s}|$ . These requirements are fulfilled by our estimation procedure described in Section 2.2, but may also be fulfilled for alternative procedures.

3.1 Nonparametric estimation

Estimating the nonparametric function $g$ in (2) is a non-standard estimation problem, since the unknown points of impact $\tau_{r}$ of the predictor variables $X_{i}(\tau_{r})$ must be replaced by their estimates $\widehat{\tau}_{r}$ . That is, for given estimates $\widehat{\tau}_{1},\dots,\widehat{\tau}_{S}$ we may estimate the unknown regression function $g$ by the following Nadaraya-Watson type estimator

[TABLE]

where $K$ denotes a standard nonnegative symmetric bounded second-order kernel function with $\smallint K(u)du=1$ , and where $h_{1},\dots,h_{S}$ denote the bandwidth parameters.
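A Nadaraya-Watson type estimator evaluated at the estimated points of impact can be sketched as follows. This is a minimal illustration under stated assumptions: a product of univariate Gaussian kernels is used (any kernel satisfying the conditions above would do), and the function names `gaussian_kernel` and `nadaraya_watson` are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, a nonnegative symmetric second-order kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x, X_hat, Y, h):
    """Estimate g(x) from rows X_hat[i] = (X_i(tau_hat_1), ..., X_i(tau_hat_S)).

    x is the evaluation point (length S) and h the bandwidths (length S).
    """
    numerator = 0.0
    denominator = 0.0
    for row, y in zip(X_hat, Y):
        weight = 1.0
        for x_r, x_ir, h_r in zip(x, row, h):
            weight *= gaussian_kernel((x_ir - x_r) / h_r) / h_r  # product kernel
        numerator += weight * y
        denominator += weight
    if denominator == 0.0:
        raise ZeroDivisionError("all kernel weights are zero at this point")
    return numerator / denominator
```

The estimator is a locally weighted average of the responses, so the only place where the estimated points of impact enter is through the rows of `X_hat`.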

For the following result we make use of our super-consistency result in Theorem 2.2. Note, however, that the rates of consistency for the point of impact estimators $\widehat{\tau}_{r}$ of Theorem 2.2 cannot be used directly to quantify the errors $|X_{i}(\widehat{\tau}_{r})-X_{i}(\tau_{r})|$ , $r=1,\dots,S$ , since under Assumption 2.1 we cannot make use of Taylor-expansions of $X_{i}$ . Therefore, the following result is non-standard because of the additional error component $\widehat{g}_{\widehat{\tau}}(x_{1},\dots,x_{S})-\widehat{g}_{\tau}(x_{1},\dots,x_{S})=O_{p}\big{(}\sum_{r=1}^{S}1/(n^{\min\{1,1/\kappa\}}(h_{1}\cdots h_{S})h_{r}^{2})\big{)}$ contained in (9), where $\widehat{g}_{\tau}$ is defined as in (8), but using the true predictor variables $X_{i}(\tau_{1}),\dots,X_{i}(\tau_{S})$ .

Theorem 3.1.

Let $\widehat{S}=S$ , $\max_{r=1,\dots,S}|\widehat{\tau}_{r}-\tau_{r}|=O_{p}(n^{-1/\kappa})$ , and let Assumptions 2.1 and 2.2 and the assumptions of Theorem 2.2 hold. Moreover, let the kernel function $K:\mathbb{R}^{S}\to\mathbb{R}$ be a second-order kernel (i.e., a density function that is symmetric around zero) with continuous second-order partial derivatives and let the regression function $g$ have continuous second-order partial derivatives. We then have for any points $x_{1},\dots,x_{S}$ in the interior of the support of $X$ that

[TABLE]

for $n\to\infty$ , and $h_{1},\dots,h_{S}\to 0$ with $n^{\min\{1,1/\kappa\}}(h_{1}\cdots h_{S})h_{r}^{2}\to\infty$ , for each $r=1,\dots,S$ .

If each bandwidth has the same order of magnitude and $0<\kappa\leq 1$ , the well-known optimal bandwidth choice $h_{r}\sim n^{-1/(S+4)}$ , $r=1,\dots,S$ , can be used to simplify Theorem 3.1 as follows.

Corollary 3.1.

Under the assumptions of Theorem 3.1, let $0<\kappa\leq 1$ and $h_{r}\sim n^{-1/(S+4)}$ for all $r=1,\dots,S$ . Then

[TABLE]

That is, under the conditions of Corollary 3.1, we have the same optimal rates of convergence as in the case where the points of impact were known.
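The bandwidth condition of Theorem 3.1 and the order of the additional error component can be checked directly for this bandwidth choice. A short worked calculation, using only the rates stated above and $\min\{1,1/\kappa\}=1$ for $0<\kappa\leq 1$ :

$$
n^{\min\{1,1/\kappa\}}(h_{1}\cdots h_{S})\,h_{r}^{2}\;\sim\;n\cdot n^{-S/(S+4)}\cdot n^{-2/(S+4)}\;=\;n^{2/(S+4)}\;\to\;\infty .
$$

Hence the additional error component caused by estimating the points of impact is $O_{p}(n^{-2/(S+4)})$ , which, assuming the usual $n^{-2/(S+4)}$ rate for second-order kernels in $S$ dimensions, is of the same order as the error of the estimator based on the true points of impact.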

3.2 Parametric estimation

In this section it is assumed that the relationship between $Y_{i}$ and the functional predictor $X_{i}$ can be modeled using the framework of generalized linear models with known parametric function $g$ ,

[TABLE]

in which the i.i.d. error term $\varepsilon_{i}$ respects $\mathbb{E}(\varepsilon_{i}|X_{i}(t))=0$ for all $t\in[a,b]$ and where $\operatorname{\mathbb{V}}(\varepsilon_{i}|X_{i}(t),t\in[a,b])=\sigma^{2}(g(\eta_{i}))<\infty$ with strictly positive variance function $\sigma^{2}(\cdot)$ defined over the range of $g$ . For simplicity the function $g$ is assumed to be a known, strictly monotone and smooth function with bounded first and second order derivatives and hence invertible (see, for instance, Müller and Stadtmüller, 2005, for similar assumptions). The constant $\alpha$ allows us to consider centered random functions $X_{i}$ with $\operatorname{\mathbb{E}}(X_{i}(t))=0$ for all $t\in[a,b]$ . Note that we do not assume that the conditional distribution of $Y_{i}$ belongs to the exponential family of distributions. Denoting the linear predictor

[TABLE]

allows us to write $\operatorname{\mathbb{E}}(Y_{i}|X_{i})=g(\eta_{i})$ as well as $\operatorname{\mathbb{V}}(Y_{i}|X_{i})=\sigma^{2}(g(\eta_{i}))<\infty$ . Hence, this setup of model (10) belongs to the broad class of quasi-likelihood models which can be seen as a generalization of a generalized linear model framework (cf. McCullagh and Nelder, 1989, Ch. 9).

Identifiability of the model parameters in (10) is not obvious due to the functional predictor $X_{i}(\cdot)$ , which, in principle, allows for infinitely many alternative model candidates. The following Theorem 3.2 shows that any possible kind of model-misspecification in $\alpha$ , $\beta_{r}$ , $\tau_{r}$ , $r=1,\dots,S$ , or $S$ , will lead to a different model in the mean squared error sense implying the identifiability of model (10).

Theorem 3.2.

Let $g(\cdot)$ be invertible and assume that $X_{i}$ satisfies Assumptions 2.1 and 2.2. Then for all $S^{*}\geq S$ , all $\alpha^{*},\beta_{1}^{*},\ldots,\beta_{S^{*}}^{*}\in\mathbb{R}$ , and all $\tau_{1},\dots,\tau_{S^{*}}\in(a,b)$ with $\tau_{k}\notin\{\tau_{1},\dots,\tau_{S}\}$ , $k=S+1,\dots,S^{*}$ , we obtain

[TABLE]

whenever $|\alpha-\alpha^{*}|>0$ , or $\sup_{r=1,\ldots,S}|\beta_{r}-\beta^{*}_{r}|>0$ , or $\sup_{r=S+1,\ldots,S^{*}}|\beta^{*}_{r}|>0$ .

Note that the proof of Theorem 3.2 requires only the existence of second moments and, therefore, may also be generalized to the case of non-Gaussian processes $X_{i}$ .

Estimation of $\operatorname{\boldsymbol{\beta}}_{0}=(\alpha,\beta_{1},\dots,\beta_{S})^{T}$ is performed by quasi-maximum likelihood. Define $\operatorname{\mathbf{X}}_{i}(\widehat{\operatorname{\boldsymbol{\tau}}})=(1,X_{i}(\widehat{\tau}_{1}),\dots,X_{i}(\widehat{\tau}_{S}))^{T}$ and denote the $j$ th, $1\leq j\leq S+1$ , element of the latter vector as $\widehat{X}_{ij}$ . For $\operatorname{\boldsymbol{\beta}}\in\mathbf{R}^{S+1}$ let $\widehat{\eta}_{i}(\operatorname{\boldsymbol{\beta}})=\operatorname{\mathbf{X}}_{i}(\widehat{\operatorname{\boldsymbol{\tau}}})^{T}\operatorname{\boldsymbol{\beta}}$ , $\widehat{\operatorname{\boldsymbol{\mu}}}_{n}(\operatorname{\boldsymbol{\beta}})=(g(\widehat{\eta}_{1}(\operatorname{\boldsymbol{\beta}})),\dots,g(\widehat{\eta}_{n}(\operatorname{\boldsymbol{\beta}})))^{T}$ , $\widehat{\operatorname{\mathbf{D}}}_{n}(\operatorname{\boldsymbol{\beta}})$ be the $n\times(S+1)$ matrix with entries $g^{\prime}(\widehat{\eta}_{i}(\operatorname{\boldsymbol{\beta}}))\widehat{X}_{ij}$ , and let $\widehat{\operatorname{\mathbf{V}}}_{n}(\operatorname{\boldsymbol{\beta}})$ be a $n\times n$ diagonal matrix with elements $\sigma^{2}(g(\widehat{\eta}_{i}(\operatorname{\boldsymbol{\beta}})))$ . Furthermore, denote the corresponding objects evaluated at the true points of impact $\tau_{r}$ by $\operatorname{\mathbf{X}}_{i}(\operatorname{\boldsymbol{\tau}})$ , $X_{ij}$ , $\eta_{i}(\operatorname{\boldsymbol{\beta}})$ , $\operatorname{\boldsymbol{\mu}}_{n}(\operatorname{\boldsymbol{\beta}})$ , $\operatorname{\mathbf{D}}_{n}(\operatorname{\boldsymbol{\beta}})$ , and $\operatorname{\mathbf{V}}_{n}(\operatorname{\boldsymbol{\beta}})$ ; this notational convention applies also to the below defined objects.

Our estimator $\widehat{\operatorname{\boldsymbol{\beta}}}$ for $\operatorname{\boldsymbol{\beta}}_{0}=(\alpha,\beta_{1},\dots,\beta_{S})^{T}$ is defined as the solution of the $S+1$ score equations $\widehat{\operatorname{\mathbf{U}}}_{n}(\widehat{\operatorname{\boldsymbol{\beta}}})=0$ , where

[TABLE]

Note that these are non-classic score equations evaluated at the estimates $\widehat{\tau}_{r}$ instead of $\tau_{r}$ .

In the following, it will be convenient to define

[TABLE]

By definition it holds that $\operatorname{\mathbb{E}}(n^{-1}\operatorname{\mathbf{F}}_{n}(\operatorname{\boldsymbol{\beta}}))=[\operatorname{\mathbb{E}}(g^{\prime}(\eta_{i}(\operatorname{\boldsymbol{\beta}}))^{2}/\sigma^{2}(g(\eta_{i}(\operatorname{\boldsymbol{\beta}})))\,X_{ik}X_{il})]_{k,l}$ with $k,l=1,\dots,S+1$ . Let $\eta(\operatorname{\boldsymbol{\beta}})$ and $X_{j}$ be generic copies of $\eta_{i}(\operatorname{\boldsymbol{\beta}})$ and of the $j$ th component of $\operatorname{\mathbf{X}}_{i}(\operatorname{\boldsymbol{\tau}})$ , respectively. This allows us to write $\operatorname{\mathbb{E}}(n^{-1}\operatorname{\mathbf{F}}_{n}(\operatorname{\boldsymbol{\beta}}))=\operatorname{\mathbb{E}}(\operatorname{\mathbf{F}}(\operatorname{\boldsymbol{\beta}}))$ with $\operatorname{\mathbb{E}}(\operatorname{\mathbf{F}}(\operatorname{\boldsymbol{\beta}}))=[\operatorname{\mathbb{E}}(g^{\prime}(\eta(\operatorname{\boldsymbol{\beta}}))^{2}/\sigma^{2}(g(\eta(\operatorname{\boldsymbol{\beta}})))\,X_{k}X_{l})]_{k,l}$ , where we point out that $\operatorname{\mathbb{E}}(\operatorname{\mathbf{F}}(\operatorname{\boldsymbol{\beta}}))$ is for all $\operatorname{\boldsymbol{\beta}}\in\mathbf{R}^{S+1}$ a symmetric and strictly positive definite matrix with inverse $\operatorname{\mathbb{E}}(\operatorname{\mathbf{F}}(\operatorname{\boldsymbol{\beta}}))^{-1}$ . Indeed, suppose $\operatorname{\mathbb{E}}(\operatorname{\mathbf{F}}(\operatorname{\boldsymbol{\beta}}))$ were not strictly positive definite, we would then derive the contradiction $\operatorname{\mathbb{E}}((\sum_{j=1}^{S+1}a_{j}X_{j}g^{\prime}(\eta(\operatorname{\boldsymbol{\beta}}))/\sigma(g(\eta(\operatorname{\boldsymbol{\beta}}))))^{2})=0$ for nonzero constants $a_{1},\dots,a_{S+1}$ . 
A similar argument can be used to show that $\operatorname{\mathbb{E}}(\widehat{\operatorname{\mathbf{F}}}(\operatorname{\boldsymbol{\beta}}))$ is strictly positive definite, where $\operatorname{\mathbb{E}}(\widehat{\operatorname{\mathbf{F}}}(\operatorname{\boldsymbol{\beta}}))=[\operatorname{\mathbb{E}}(g^{\prime}(\widehat{\eta}(\operatorname{\boldsymbol{\beta}}))^{2}/\sigma^{2}(g(\widehat{\eta}(\operatorname{\boldsymbol{\beta}})))\,\widehat{X}_{k}\widehat{X}_{l})]_{k,l}$ .

The following additional set of assumptions is used to derive more precise theoretical statements:

Assumption 3.1.

a)

There exists a constant $0<M_{\varepsilon}<\infty$ , such that $\operatorname{\mathbb{E}}(\varepsilon_{i}^{p}|X_{i}(t))\leq M_{\varepsilon}$ , for all $t\in[a,b]$ and for some even $p$ with $p\geq\max\{2/\kappa+\epsilon,4\}$ and some $\epsilon>0$ .

b)

The function $g$ is monotone, invertible with two bounded derivatives $|g^{\prime}(\cdot)|\leq c_{g}$ , $|g^{\prime\prime}(\cdot)|\leq c_{g}$ , for some constant $0\leq c_{g}<\infty$ .

c)

$h(\cdot):=g^{\prime}(\cdot)/\sigma^{2}(g(\cdot))$ is a bounded function with two bounded derivatives.

Condition a) states that some higher moments of $\varepsilon_{i}$ exist. While the condition on $p\geq 4$ and $p$ being even simplifies the proofs, the condition $p>2/\kappa$ is a more crucial one and is used in the proof of Proposition D.2 in the supplementary Appendix D.2. Conditions a) to c) hold, for example, in the important case of a functional logistic regression with points of impact, where $g$ is the standard logistic function. Condition c) is satisfied, for instance, in the special case of generalized linear models with natural link functions. For the latter case, we have $\sigma^{2}(g(x))=g^{\prime}(x)$ such that $h(x)=1$ .

Theorem 3.3.

Let $\widehat{S}=S$ , $\max_{r=1,\dots,S}|\widehat{\tau}_{r}-\tau_{r}|=O_{P}(n^{-1/\kappa})$ and let $X_{i}$ be a Gaussian process satisfying Assumption 2.1. Under Assumption 3.1 we then obtain

[TABLE]

That is, our estimator based on $\widehat{\tau}_{r}$ enjoys the same asymptotic efficiency properties as if the true points of impact $\tau_{r}$ were known. In fact, it achieves the same asymptotic efficiency properties as under classic multivariate setups (cf. McCullagh, 1983). In practice one might replace $\mathbb{E}(\operatorname{\mathbf{F}}(\operatorname{\boldsymbol{\beta}}_{0}))$ with its consistent estimator $n^{-1}\widehat{\operatorname{\mathbf{F}}}_{n}(\widehat{\operatorname{\boldsymbol{\beta}}})$ in order to derive approximate results. This is a direct consequence of Equations (129) and (155) in the supplementary Appendix D.2.

3.2.1 Parametric estimation: Practical implementation

An implementation of our parametric estimation procedure comprises, first, the estimation of the points of impact $\tau_{r}$ and, second, the estimation of the parameters $\alpha$ and $\beta_{r}$ . Estimating the points of impact $\tau_{r}$ relies on the choice of $\delta$ and a choice of the threshold parameter $\lambda$ (see Section 2.2). Asymptotic specifications are given in Theorem 2.2; however, these determine the tuning parameters $\delta$ and $\lambda$ only up to constants and are generally of limited use in practice. In the following we propose an alternative fully data-driven model selection approach.

For a given $\delta$ , our estimation procedure leads to a set of potential point of impact candidates $\{\widehat{\tau}_{1},\widehat{\tau}_{2},\dots,\widehat{\tau}_{M_{\delta}}\}$ (see Section 2.2). Selecting final point of impact estimates from this set of candidates corresponds to a classic variable selection problem. In the case where the distribution of $Y_{i}|X_{i}$ belongs to the exponential family (as in the logistic regression) one may perform a best subset selection optimizing a standard model selection criterion such as the Bayesian Information Criterion (BIC),

[TABLE]

Here, $\log\mathcal{L}_{\mathcal{X}}$ is the log-likelihood of the model with intercept and predictor variables $\mathcal{X}\subseteq\{X_{i}(\widehat{\tau}_{1}),X_{i}(\widehat{\tau}_{2}),\dots,X_{i}(\widehat{\tau}_{M_{\delta}})\}$ , where $K_{\mathcal{X}}=1+|\mathcal{X}|$ denotes the number of model parameters (the intercept plus the selected predictors). Minimizing $\operatorname{BIC}_{\mathcal{X}}(\delta)$ over $0<\delta<(b-a)/2$ leads to the final model choice.
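For a fixed $\delta$ , the best subset search over the candidates can be sketched as below. This is an illustration only: it assumes the standard form $\operatorname{BIC}=-2\log\mathcal{L}+K\log n$ with $K=1+|\mathcal{X}|$ , the names `bic` and `best_subset` are hypothetical, and `fit_log_likelihood` stands for any user-supplied routine that fits the model on a subset of candidates and returns its maximized log-likelihood.

```python
import math
from itertools import combinations


def bic(log_likelihood, n_predictors, n):
    """Assumed standard BIC: -2 log L + K log n with K = 1 + n_predictors."""
    return -2.0 * log_likelihood + (1 + n_predictors) * math.log(n)


def best_subset(candidates, fit_log_likelihood, n):
    """Return (BIC, subset) minimizing BIC over all subsets of the candidates."""
    best = None
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            score = bic(fit_log_likelihood(subset), len(subset), n)
            if best is None or score < best[0]:
                best = (score, subset)
    return best
```

The search visits $2^{M_{\delta}}$ subsets, which is feasible here because Algorithm 2.1 keeps the number of candidates $M_{\delta}$ small; repeating it over a grid of $\delta$ -values and keeping the overall minimizer gives the final model.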

In the more general case of quasi-likelihood models (cf. McCullagh and Nelder, 1989, Ch. 9) where only the first two moments $\operatorname{\mathbb{E}}(Y_{i}|X_{i})=g(\eta_{i})$ and $\operatorname{\mathbb{V}}(Y_{i}|X_{i})=\sigma^{2}(g(\eta_{i}))$ are known, one may replace the deviance $-2\log\mathcal{L}_{\mathcal{X}}$ by the expression for the quasi-deviance $-2Q_{\mathcal{X}}=-2\sum_{i=1}^{n}\int_{y_{i}}^{g(\widehat{\eta}_{\mathcal{X},i})}(y_{i}-t)/(\sigma^{2}(t))\,dt$ , where $\widehat{\eta}_{\mathcal{X},i}$ is the linear predictor with intercept and predictor variables $\mathcal{X}$ .

In order to calculate $\operatorname{BIC}_{\mathcal{X}}(\delta)$ , we need the estimates $\widehat{\operatorname{\boldsymbol{\beta}}}$ solving the estimating equations $\widehat{\operatorname{\mathbf{U}}}_{n}(\widehat{\operatorname{\boldsymbol{\beta}}})=0$ . In practice these equations are solved iteratively, for instance, by the usual Newton-Raphson method with Fisher-type scoring. That is, for an arbitrary initial value $\widehat{\operatorname{\boldsymbol{\beta}}}_{0}$ sufficiently close to $\widehat{\operatorname{\boldsymbol{\beta}}}$ one generates a sequence of estimates $\widehat{\operatorname{\boldsymbol{\beta}}}_{m}$ , with $m=1,2,\dots$ ,

[TABLE]

Iteration is executed until convergence and the final step of the procedure yields the estimate $\widehat{\operatorname{\boldsymbol{\beta}}}$ . Here, $\widehat{\operatorname{\mathbf{F}}}_{n}(\operatorname{\boldsymbol{\beta}})$ and $\widehat{\operatorname{\mathbf{U}}}_{n}(\operatorname{\boldsymbol{\beta}})$ replace $\operatorname{\mathbf{F}}_{n}(\operatorname{\boldsymbol{\beta}})$ and $\operatorname{\mathbf{U}}_{n}(\operatorname{\boldsymbol{\beta}})$ in the usual Fisher scoring algorithm, since the unknown $\tau_{r}$ , $1\leq r\leq S$ , are replaced by their estimates $\widehat{\tau}_{r}$ . The latter is justified asymptotically by our results in Corollary D.1 and Proposition D.3 in Appendix D.2.
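For the functional logistic regression, where the natural link gives $h(x)=1$ , the scoring iteration can be sketched in plain Python. This is an illustrative sketch rather than the authors' code: it assumes the usual update $\boldsymbol{\beta}_{m+1}=\boldsymbol{\beta}_{m}+\mathbf{F}_{n}(\boldsymbol{\beta}_{m})^{-1}\mathbf{U}_{n}(\boldsymbol{\beta}_{m})$ , starts from zero, and uses the hypothetical names `solve_linear` and `fisher_scoring_logistic`. Each design row is $(1,X_{i}(\widehat{\tau}_{1}),\dots,X_{i}(\widehat{\tau}_{S}))$ .

```python
import math


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, m):
            factor = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][c] * x[c] for c in range(r + 1, m))) / M[r][r]
    return x


def fisher_scoring_logistic(design, Y, max_iter=50, tol=1e-10):
    """Fisher scoring for logistic regression on the given design rows."""
    q = len(design[0])
    beta = [0.0] * q
    for _ in range(max_iter):
        U = [0.0] * q                      # score vector
        F = [[0.0] * q for _ in range(q)]  # Fisher-type information matrix
        for row, y in zip(design, Y):
            eta = sum(x * b for x, b in zip(row, beta))
            mu = 1.0 / (1.0 + math.exp(-eta))
            w = mu * (1.0 - mu)            # g'(eta) = sigma^2(g(eta)) for the logit
            for k in range(q):
                U[k] += row[k] * (y - mu)
                for l in range(q):
                    F[k][l] += w * row[k] * row[l]
        step = solve_linear(F, U)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta
```

In this sketch the estimated points of impact enter only through the design rows, mirroring the replacement of $\tau_{r}$ by $\widehat{\tau}_{r}$ described above.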

4 Simulation

We investigate the finite sample performance of our estimators using Monte Carlo simulations. After simulating a trajectory $X_{i}$ over $p$ equidistant grid points $t_{j}$ , $j=1,\dots,p$ , on $[a,b]=[0,1]$ , linear predictors of the form $\eta_{i}=\alpha+\sum_{r=1}^{S}\beta_{r}X_{i}(\tau_{r})$ are constructed for some predetermined model parameters $\alpha$ , $\beta_{r}$ , $\tau_{r}$ , and $S$ , where a point of impact is implemented as the smallest observed grid point $t_{j}$ closest to $\tau_{r}$ . The response $Y_{i}$ is derived as a realization of a Bernoulli random variable with success probability $g(\eta_{i})=\exp(\eta_{i})/(1+\exp(\eta_{i}))$ , resulting in a logistic regression framework with points of impact. The simulation study is implemented in R (R Core Team, 2020), where we use the R-package glmulti (Calcagno, 2013) in order to implement the fully data-driven BIC-based best subset selection estimation procedure described in Section 3.2.1. The threshold estimator from Section 2.2 requires appropriate choices of $\delta=\delta_{n}$ and $\lambda=\lambda_{n}$ . Theorem 2.2 suggests that a suitable choice of $\delta$ is given by $\delta=c_{\delta}\,n^{-1/2}$ for some constant $c_{\delta}>0$ . Our simulation results are based on $c_{\delta}=1.5$ ; qualitatively similar results were obtained for a broader range of values $c_{\delta}$ . For the threshold $\lambda$ we use $\lambda=A((\widehat{\operatorname{\mathbb{E}}}(Y^{4}))^{1/2}\log((b-a)/\delta)/n)^{1/2}$ , where $A=\sqrt{2\sqrt{3}}$ and $\widehat{\operatorname{\mathbb{E}}}(Y^{4})=n^{-1}\sum_{i=1}^{n}Y_{i}^{4}$ , as motivated below Theorem 2.2.

In what follows, we denote the BIC-based selection (see Section 3.2.1) of points of impact by POI and the threshold-based selection (Algorithm 2.1) by TRH. Estimated points of impact candidates are related to the true impact points by the following matching rule: In a first step the interval $[a,b]$ is partitioned into $S$ subintervals of the form $I_{j}=[m_{j-1},m_{j})$ , where $m_{0}=a$ , $m_{S}=b$ and $m_{j}=(\tau_{j}+\tau_{j+1})/2$ for $0<j<S$ . The candidate $\widehat{\tau}_{l}$ in interval $I_{j}$ with the closest distance to $\tau_{j}$ is then taken as the estimate of $\tau_{j}$ .
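The matching rule can be written down compactly. The following Python sketch is only an illustration; the function name `match_candidates` is hypothetical, and returning `None` for an interval without any candidate is a convention chosen here, not one stated in the text.

```python
def match_candidates(tau_true, tau_hat, a, b):
    """Match candidates to true points of impact via the midpoint partition.

    The interval [a, b] is split into I_j = [m_{j-1}, m_j) with m_0 = a,
    m_S = b and midpoints m_j = (tau_j + tau_{j+1}) / 2 in between.
    """
    taus = sorted(tau_true)
    S = len(taus)
    bounds = [a] + [(taus[j] + taus[j + 1]) / 2 for j in range(S - 1)] + [b]
    matches = []
    for j, tau in enumerate(taus):
        lower, upper = bounds[j], bounds[j + 1]
        inside = [t for t in tau_hat if lower <= t < upper]
        # Closest candidate in I_j, or None if the interval is empty.
        matches.append(min(inside, key=lambda t: abs(t - tau)) if inside else None)
    return matches
```

For example, with true points $0.25$ and $0.75$ on $[0,1]$ the partition is $[0,0.5)$ and $[0.5,1)$ , and each true point is paired with the nearest candidate falling into its own interval.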

The simulation results for our parametric estimation procedure (Section 3.2) are based on $1000$ Monte Carlo iterations for each constellation of $n\in\{100,200,500,1000,3000\}$ and $p\in\{100,500,1000\}$ . The results for our nonparametric estimation procedure (Section 3.1) are based on the same general setup, but consider the reduced set of sample sizes $n\in\{100,200,500\}$ . Estimation errors for the parametric estimation procedure are illustrated by boxplots with error bars representing the $10\%$ and $90\%$ quantiles. The estimation errors for the nonparametric estimation procedure are quantified by the Mean Average Squared Error, $\operatorname{MASE}=1000^{-1}\sum_{r=1}^{1000}n^{-1}\sum_{i=1}^{n}\big{(}g(\eta_{i})-\widehat{g}^{r}_{\widehat{\tau}}(X^{r}_{i}(\widehat{\tau}^{r}_{1}),\dots,X^{r}_{i}(\widehat{\tau}^{r}_{\hat{S}}))\big{)}^{2}$ , where the superscript $r$ denotes the $r$ th simulation run.

Five data generating processes (DGP) are considered (see Table 1) using the following three processes $\{X_{i}(t):0\leq t\leq 1\}$ covering a broad range of situations:

OUP

Ornstein-Uhlenbeck Process. A Gaussian process with covariance function $\sigma(s,t)=\sigma^{2}_{u}/(2\theta)(\exp(-\theta|s-t|)-\exp(-\theta(s+t)))$ . We choose $\theta=5$ and $\sigma^{2}_{u}=3.5$ .

GCM

Gaussian Covariance Model. A Gaussian process with covariance function $\sigma(s,t)=\sigma(|s-t|)=\exp(-(|s-t|/d)^{2})$ . We choose $d=1/10$ .

EBM

Exponential Brownian Motion. A non Gaussian process with covariance function $\sigma(s,t)=\exp((s+t+|s-t|)/2)-1$ . It is defined by $X_{i}(t)=\exp(B_{i}(t))$ , where $B_{i}(t)$ is a Brownian motion.

DGP 1-3 are increasingly complex, but satisfy our theoretical assumptions. The general setups of DGP 4 and DGP 5 are equivalent to DGP 2, but the processes $X_{i}$ (GCM and EBM) violate our theoretical assumptions. The covariance function in DGP 4 is infinitely many times differentiable, even at the diagonal where $s=t$, contradicting Assumption 2.1, but fitting the remark underneath this assumption. The process in DGP 5 is non-Gaussian and thus contradicts the Gaussian Assumption 2.2.
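Two of the three processes can be simulated exactly on an equidistant grid without any matrix factorization. The sketch below is an illustration only: it assumes the grid $t_{j}=(j-1)/(p-1)$, starts the Ornstein-Uhlenbeck path at $X_{i}(0)=0$ (which matches the stated covariance function, since $\sigma(0,0)=0$), and uses the hypothetical function names `simulate_ou` and `simulate_ebm`. The GCM process is omitted because it would require a Cholesky factor of its covariance matrix.

```python
import math
import random


def simulate_ou(p, theta=5.0, sigma2_u=3.5, rng=None):
    """Ornstein-Uhlenbeck path on p equidistant points of [0, 1], X(0) = 0."""
    if rng is None:
        rng = random.Random()
    h = 1.0 / (p - 1)
    decay = math.exp(-theta * h)
    # Exact one-step transition: X(t + h) = decay * X(t) + innov_sd * N(0, 1).
    innov_sd = math.sqrt(sigma2_u * (1.0 - decay ** 2) / (2.0 * theta))
    path = [0.0]
    for _ in range(p - 1):
        path.append(decay * path[-1] + innov_sd * rng.gauss(0.0, 1.0))
    return path


def simulate_ebm(p, rng=None):
    """Exponential Brownian motion X(t) = exp(B(t)) on p equidistant points."""
    if rng is None:
        rng = random.Random()
    h = 1.0 / (p - 1)
    b = 0.0
    path = [1.0]                                    # exp(B(0)) = exp(0)
    for _ in range(p - 1):
        b += math.sqrt(h) * rng.gauss(0.0, 1.0)     # Brownian increment
        path.append(math.exp(b))
    return path
```

With the one-step recursion, the variance of the simulated Ornstein-Uhlenbeck path at time $t$ is $\sigma^{2}_{u}/(2\theta)\,(1-\exp(-2\theta t))$, in agreement with $\sigma(t,t)$ above.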

4.1 Evaluation of the parametric estimation procedure

DGP 1 allows us to compare our data-driven BIC-based estimation procedure from Section 3.2.1 (denoted as POI) with the estimation procedure of Lindquist and McKeague (2009) (denoted as LMcK). Lindquist and McKeague (2009) consider situations where $S=1$ is known and propose estimating the unknown parameters $\alpha,\beta_{1}$ and $\tau_{1}$ by simultaneously maximizing the likelihood over $\alpha$ , $\beta_{1}$ and the grid points $t_{j}$ . Our estimation procedure does not require knowledge about $S$ , but profits from a situation where $S=1$ is known. Therefore, for reasons of comparability, we restrict the BIC-based model selection process to allow only for models containing one point of impact candidate. The simulation results are depicted in Figure 3 and are virtually identical for both methods and show a satisfying behavior of the estimates. It should be noted, however, that our estimator is computationally advantageous as it greatly thins out the number of possible point of impact candidates by allowing only the local maxima of $|n^{-1}\sum_{i=1}^{n}Z_{\delta,i}(s)Y_{i}|$ as possible point of impact candidates. Our threshold-based estimation procedure leads to similar qualitative results. We omit these results, however, in order to allow for a clear display in Figure 3. The performance of our threshold-based procedure is reported in detail for the remaining simulation studies (DGP 2-5).
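The candidate-thinning step can be sketched as follows. The sketch assumes that $Z_{\delta,i}(s)$ is the central second difference $X_{i}(s)-\frac{1}{2}(X_{i}(s-\delta)+X_{i}(s+\delta))$ from Section 2, that $\delta$ corresponds to an integer number `k` of grid steps, and that ties are broken towards the left neighbor; the function names are hypothetical.

```python
def impact_criterion(X, y, k):
    """Return {j: |n^{-1} sum_i Z_i(t_j) * y_i|} for all interior grid indices j.

    X : list of n trajectories, each a list of p values on an equidistant grid
    y : list of n responses
    k : delta expressed as a number of grid steps (assumption of this sketch)
    """
    n, p = len(X), len(X[0])
    crit = {}
    for j in range(k, p - k):
        s = 0.0
        for xi, yi in zip(X, y):
            z = xi[j] - 0.5 * (xi[j - k] + xi[j + k])  # central second difference
            s += z * yi
        crit[j] = abs(s / n)
    return crit


def local_maxima(crit):
    """Grid indices at which the criterion has a local maximum."""
    out = []
    for j in sorted(crit):
        left = crit.get(j - 1, float("-inf"))
        right = crit.get(j + 1, float("-inf"))
        if crit[j] > left and crit[j] >= right:
            out.append(j)
    return out
```

Only the indices returned by `local_maxima` would then enter the subsequent model selection step, which is typically far fewer than all $p$ grid points.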

DGP 2 is more complex than DGP 1, since $S=2$ and $S$ is treated as unknown. Figure 4 compares the estimation errors of our BIC-based POI estimator with those of our threshold-based estimator (denoted as TRH). For smaller sample sizes $n$, the POI estimator seems to be preferable to the TRH estimator. Although both procedures estimate the locations of the points of impact $\tau_{1}$ and $\tau_{2}$ quite accurately, the number $S$ is estimated correctly more often by the POI estimator (see upper right panel). The more precise estimation of $S$ when using the POI estimator results in essentially unbiased estimates of the parameters $\alpha$, $\beta_{1}$, and $\beta_{2}$. By contrast, the less precise estimation of $S$ using the TRH estimator leads to clearly visible omitted variable biases in the estimates of the parameters $\alpha$, $\beta_{1}$, and $\beta_{2}$. As the sample size increases, however, the accuracy of $\widehat{S}$ improves for the TRH estimator, such that both estimators eventually show a similar performance.

DGP 3 with $S=4$ unknown points of impact constitutes an even more complex situation than DGP 2. For reasons of space, Figure 7 is deferred to Appendix A. It shows that the qualitative results from DGP 2 still hold. For large $n$, the POI and TRH estimators both lead to accurate estimates of the model parameters for all choices of $p$. As already observed in DGP 2, however, the TRH estimator leads to imprecise estimates of $S$ for small $n$, which results in omitted variable biases in the estimates of the parameters $\alpha$, $\beta_{1}$, $\beta_{2}$, $\beta_{3}$, and $\beta_{4}$. Because of the increased complexity of DGP 3, these biases are even more pronounced than in DGP 2. This is partly due to the construction of the TRH estimator, where we set the value of $\delta$ to $\delta=c_{\delta}n^{-1/2}$ with $c_{\delta}=1.5$. Asymptotically, the choice of $c_{\delta}$ has a negligible effect, but it may be inappropriate for small $n$, since the estimation procedure eliminates all points within a $\sqrt{\delta}$-neighborhood around a chosen candidate $\widehat{\tau}_{r}$ (see Section 2.2). For DGP 3, the choice of $c_{\delta}=1.5$ results in too large a $\sqrt{\delta}$-neighborhood, such that the estimation procedure also eliminates true point of impact locations for small $n$. By contrast, the POI estimator is able to avoid such adverse eliminations as the BIC criterion is also minimized over $\delta$.
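The elimination effect described above can be illustrated by a generic greedy thresholding sketch. It is not a transcription of Algorithm 2.1: the statistic values `stat` are taken as given, the neighborhood is assumed to have half-width $\sqrt{\delta}/2$ unless `half_width` is supplied, and the function name `threshold_select` is hypothetical.

```python
import math


def threshold_select(grid, stat, delta, lam, half_width=None):
    """Greedy selection of points of impact candidates.

    Repeatedly pick the grid point with the largest remaining statistic, stop
    once that statistic falls below lam, and discard every grid point that
    lies within half_width of a selected point.
    """
    if half_width is None:
        half_width = math.sqrt(delta) / 2.0   # assumed neighborhood convention
    remaining = list(range(len(grid)))
    selected = []
    while remaining:
        best = max(remaining, key=lambda j: stat[j])
        if stat[best] < lam:
            break
        selected.append(grid[best])
        remaining = [j for j in remaining
                     if abs(grid[j] - grid[best]) > half_width]
    return selected
```

If two true points of impact lie closer together than the neighborhood half-width, the second one is removed as soon as the first is selected, which is the small-$n$ problem discussed for DGP 3.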

DGP 4 takes up the general setup of DGP 2, but the functional data $X_{i}$ are simulated using a Gaussian covariance model (GCM) which is characterized by an infinitely many times differentiable covariance function. This setting contradicts our basic Assumption 2.1, but fits our remark at the end of Section 2.1. From Figure 5 it can be concluded that even under the failure of Assumption 2.1, both estimation procedures are capable of consistently estimating the points of impact and the model parameters. The TRH estimator, however, fails to estimate the number of points of impact $S$ even for large $n$ , since the $\lambda$ -threshold is tailored for situations under Assumption 2.1. Here the TRH estimator is able to estimate the true points of impact, but additionally selects more and more redundant point of impact candidates as $n$ becomes large. That is, the TRH estimator becomes more a screening than a selection procedure which can be problematic in practice. By contrast, the POI estimator is able to avoid such redundant selections of point of impact candidates, as the BIC criterion only selects points of impact candidates if they result in a sufficiently large improvement of the model fit.

DGP 5 also takes up the setup of DGP 2; however, the process $X_{i}$ is simulated as an exponential Brownian Motion (EBM) violating Assumption 2.2, but still satisfying Assumption 2.1. Here we set the asymptotically negligible tuning parameter $c_{\delta}$ of the TRH estimator equal to 3. The evolution of the estimation errors can be seen in Figure 8 in Appendix A. The results are comparable with our previous simulations in DGP 2 and DGP 3, indicating that the estimation procedure is robust to at least some violations of Assumption 2.2.

4.2 Evaluation of the nonparametric estimation procedure

Table 2 contains the simulation results for our nonparametric estimation procedure described in Section 3.1. We focus on the more challenging data generating processes, DGP 2-5, with at least two points of impact and compare our nonparametric method with the Most-Predictive Design Points (MPDP) method of Ferraty et al. (2010). To the best of our knowledge, the MPDP method is the only comparable method in the literature. We tried hard to carry out the full simulation study for the MPDP method; however, Ferraty et al. (2010) use a brute force minimization approach based on cross-validation considering $2^{p}$ grid point combinations, which makes their method computationally extremely expensive. (Footnote 1: Due to the high computational costs, the simulation study in Ferraty et al. (2010) is based on only 50 Monte Carlo replications. In a readme-file, provided at Frederic Ferraty’s homepage, the authors report that one run with a dataset of $n=149$ curves and $p=700$ grid points lasts about 30 minutes.) For the MPDP method, we therefore had to limit the number of Monte Carlo replications to 500, the number of grid points to $p\in\{100,500\}$, and the sample sizes to $n\in\{100,200\}$.

The results in Table 2 show that the MASE decreases with increasing sample size $n$ and that the effect of different numbers of grid points $p$ is essentially negligible for both methods. The differences in the simulation results for the different data generating processes are generally equivalent to those discussed for the parametric estimation procedure. DGP 3 with its four points of impact is the most challenging case and, therefore, produces the largest estimation errors. The MPDP method of Ferraty et al. (2010) has larger estimation errors throughout than our nonparametric estimation procedure based on the TRH estimator (Algorithm 2.1). The larger estimation errors in $\widehat{g}$ of the MPDP method can be explained by its larger estimation errors when estimating the points of impact $\tau_{1},\dots,\tau_{S}$ (see Table 3). In fact, our super-consistent points of impact estimator has substantially smaller estimation errors (factor $1/10$ to $1/100$) than the MPDP method.

5 Points of impact in continuous emotional stimuli

Current psychological research on emotional experiences increasingly includes continuous emotional stimuli such as videos to induce emotional states as an attempt to increase ecological validity (Trautmann et al., 2009). Asking participants to evaluate those stimuli is most often done using an overall rating such as “How positive or negative did this video make you feel?”. Such global overall ratings are guided by the participant’s affective experiences while watching the video (Schubert, 1999; Mauss et al., 2005) which makes it crucial to identify the relevant parts of the stimulus impacting the overall rating in order to understand the emergence of emotional states and to make use of specific “impacting” parts of the stimuli.

Due to a lack of appropriate statistical methods, existing approaches use heuristics such as the “peak-and-end rule” in order to link the overall ratings with the continuous emotional stimuli. The peak-and-end rule states that people’s evaluations can be well predicted using just two characteristics: the moment of emotional peak intensity and the ending of the emotional stimuli (Fredrickson, 2000). Such a heuristic approach, however, is only of limited practical use. The peak intensity moment and the ending are not necessarily good predictors. Furthermore, the peak intensity moment can vary strongly across participants, which prevents linking the overall rating to specific moments in the continuous emotional stimuli that are of a common relevance.

Our case study comprises data from $n=65$ participants, who were asked to continuously report their emotional state (from very negative to very positive) while watching a documentary video (112 sec.) on the persecution of African albinos. A version of the video can be found online at YouTube. (Footnote 2: Link to the video: https://youtu.be/9F6UpuJIFaY. The video clip used in the experiment corresponds approximately to the first 115 sec. of the video at YouTube.) The first six data points ($<1$ sec.) are removed as they contain some obviously erratic components. Figure 1 shows the standardized emotion trajectories $X_{i}(t_{j})$, where $t_{j}$ are equidistant grid points within the unit-interval $0=t_{1}<\dots<t_{p}=1$ with $p=167$. After watching the video, the participants were asked to rate their final overall feeling. This overall rating was coded as a binary variable $Y_{i}\in\{0,1\}$, where $Y_{i}=0$ denotes “I feel negative” ($48\%$ of the participants) and $Y_{i}=1$ denotes “I do not feel negative” ($52\%$ of the participants). The data were collected in May 2013. Participants were recruited through Amazon Mechanical Turk (www.mturk.com) and received 1USD for completing the ratings via the online survey platform SoSci Survey (www.soscisurvey.de). The study was approved by the local institutional review board (IRB, University of Colorado Boulder). The documentary video is taken from the Interdisciplinary Affective Science Laboratory Movie Set (Feldman Barrett, L., unpublished).

To analyze the data we use our parametric estimation procedure (Section 3.2) using a logit link function $g$ and the BIC-based selection of points of impact (Section 3.2.1). We compare our estimation procedure with the performance of the following two logit regression models based on peak-and-end rule (PER) predictor variables:

PER-1

Logit regression with peak intensity predictor $X_{i}(p^{\operatorname{abs}}_{i})$ and the end-feeling predictor $X_{i}(1)$ , where $p^{\operatorname{abs}}_{i}=\arg\max_{t}(|X_{i}(t)|)$

PER-2

Logit regression with peak intensity predictors $X_{i}(p^{\operatorname{pos}}_{i})$ and $X_{i}(p^{\operatorname{neg}}_{i})$ and end-feeling predictor $X_{i}(1)$ , where $p^{\operatorname{pos}}_{i}=\arg\max_{t}(X_{i}(t))$ and $p^{\operatorname{neg}}_{i}=\arg\min_{t}(X_{i}(t))$
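The predictor variables of both peak-and-end benchmarks can be extracted from a single discretized trajectory as in the following Python sketch. The function name `per_predictors` and the returned dictionary layout are hypothetical; the last grid value is taken as $X_{i}(1)$.

```python
def per_predictors(x):
    """Peak-and-end predictors for one trajectory x sampled on the grid."""
    peak_abs = max(x, key=abs)   # X_i(p_i^abs): value with largest absolute size
    peak_pos = max(x)            # X_i(p_i^pos): most positive value
    peak_neg = min(x)            # X_i(p_i^neg): most negative value
    end = x[-1]                  # X_i(1): end-feeling predictor
    return {"PER-1": (peak_abs, end), "PER-2": (peak_pos, peak_neg, end)}
```

Note that the peak locations differ from participant to participant, whereas points of impact are common time points for all participants.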

Table 4 shows the estimated coefficients, standard errors, as well as summary statistics for each of the three models, where our estimation procedure is denoted by POI. In comparison to our POI estimator, both benchmark models (PER-1 and PER-2) have significantly lower model fits (McFadden Pseudo R2) and significantly lower predictive abilities (Somers’ $D_{xy}$ ), where $D_{xy}=0$ means that a model is making random predictions and $D_{xy}=1$ means that a model discriminates perfectly.
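Both summary statistics can be computed directly from fitted probabilities. The sketch below uses the standard definitions, namely Somers' $D_{xy}$ as the difference between the proportions of concordant and discordant pairs and McFadden's pseudo $R^{2}$ as one minus the ratio of the model and null log-likelihoods; it is an illustration with hypothetical function names and assumes that both outcome classes are present.

```python
import math


def somers_dxy(y, p_hat):
    """Somers' D_xy for binary y: (concordant - discordant) / number of pairs."""
    conc = disc = 0
    for yi, pi in zip(y, p_hat):
        if yi != 1:
            continue
        for yj, pj in zip(y, p_hat):
            if yj != 0:
                continue
            if pi > pj:
                conc += 1
            elif pi < pj:
                disc += 1
    n_pairs = sum(y) * (len(y) - sum(y))   # pairs of one 1 and one 0
    return (conc - disc) / n_pairs


def mcfadden_r2(y, p_hat):
    """McFadden pseudo R^2 = 1 - logL(model) / logL(intercept-only model)."""
    ll = sum(math.log(p if yi == 1 else 1.0 - p) for yi, p in zip(y, p_hat))
    p0 = sum(y) / len(y)
    ll0 = sum(math.log(p0 if yi == 1 else 1.0 - p0) for yi in y)
    return 1.0 - ll / ll0
```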

Figure 6 shows the positive (p) and negative (n) peak intensity predictors, $X_{i}(p_{i}^{\operatorname{pos}})$ and $X_{i}(p_{i}^{\operatorname{neg}})$ , for all participants; the absolute intensity predictors, $X_{i}(p_{i}^{\operatorname{abs}})$ , form a subset of these. The peak intensity predictors are distributed across the total domain and, therefore, do not allow linking the overall ratings $Y_{i}$ to specific common time points $t\in[0,1]$ in the continuous emotional stimuli. By contrast, the estimated points of impact $\widehat{\tau}_{1}$ and $\widehat{\tau}_{2}$ allow for such a link and point to two emotionally arousing movie scenes:

$\widehat{\tau}_{1}$ :

Portrait shot of the traumatized African albino protagonist nervously moving eyes.

$\widehat{\tau}_{2}$ :

Spoken words: “[…]the money we’ve got from selling his body parts.”

Supplementary materials. The online supplementary materials (Poß et al., 2020) include the supplementary paper containing additional simulation results and the proofs of our theoretical results, the R-package fdapoi and R-scripts for reproducing our main empirical results.

Acknowledgments. The online rating tool for the data collection was kindly provided by Dominik Leiner (SoSci Survey, Germany). Many thanks go to the Editor, the Associate Editor and two anonymous referees whose constructive comments helped us to improve our manuscript and motivated Section 2.4.

Funding. Data collection was funded by the National Institutes of Health Director’s Pioneer Award (DP1OD003312) to Lisa Feldman Barrett, and the National Institute on Drug Abuse grant (R01DA035484) to Tor D. Wager. The development of the Interdisciplinary Affective Science Laboratory (IASLab) Movie Set was supported by a grant from the U.S. Army Research Institute for the Behavioral and Social Sciences (W5J9CQ-11-C-0046) to Lisa Feldman Barrett and Tor D. Wager. The views, opinions, and/or findings contained in this paper are those of the authors and shall not be construed as an official U.S. Department of the Army position, policy, or decision, unless so designated by other documents.

Appendix A Additional simulation results

This appendix contains the additional simulation results discussed in Section 4 of the main paper. Figure 7 depicts the results for DGP 3 and Figure 8 illustrates the results for DGP 5.

Appendix B Identifying points of impact

B.1 Proof of Theorem 2.1

Proof of Theorem 2.1.

Since $X_{i}$ is a Gaussian process satisfying $\operatorname{\mathbb{E}}(X_{i}(s))=0$ for all $s\in[a,b]$ , $\operatorname{\mathbf{X}}_{s,i}=(X_{i}(s),X_{i}(\tau_{1}),\dots,X_{i}(\tau_{S}))^{T}$ follows for all $s\in[a,b]$ a multivariate normal distribution with mean zero and covariance matrix $\operatorname{\boldsymbol{\Sigma}}_{s}=\operatorname{\mathbb{E}}(\operatorname{\mathbf{X}}_{s,i}\operatorname{\mathbf{X}}_{s,i}^{T})$ .

Let $\mathbf{Z}_{s,i}$ be an $(S+1)$-dimensional standard normal distributed random vector, then $\operatorname{\mathbf{X}}_{s,i}=\operatorname{\boldsymbol{\Sigma}}_{s}^{1/2}\mathbf{Z}_{s,i}$. Furthermore, define $\tilde{g}(\operatorname{\mathbf{X}}_{s,i})=g(X_{i}(\tau_{1}),\dots,X_{i}(\tau_{S}))$ for all $s\in[a,b]$. It then follows from the proof of Lemma 1 in \citeappendix{L1994} that

[TABLE]

Specifically,

[TABLE]

Setting $\vartheta_{r}=\operatorname{\mathbb{E}}\big{(}\frac{\partial}{\partial x_{r}}g(X_{i}(\tau_{1}),\dots,X_{i}(\tau_{S}))\big{)}$ completes the proof. ∎

B.2 Identification for non-Gaussian processes

So far, our strategy for identifying and estimating points of impact rests upon assuming a non-smooth Gaussian process. Theorem 2.1 together with Assumption 2.1 then implies that $\operatorname{\mathbb{E}}\big{(}X_{i}(s)Y_{i}\big{)}$ is not twice differentiable at the points of impact $\tau_{r}$ , $r=1,\dots,S$ . For the auxiliary process $Z_{\delta,i}$ defined in Section 2 we then obtain peaks of the function $\operatorname{\mathbb{E}}\big{(}Z_{\delta,i}(s)Y_{i}\big{)}$ for $s\in\{\tau_{1},\dots,\tau_{S}\}$ .

In the following we will show that the Gaussian assumption can be relaxed. We will provide more general assumptions under which the same identification strategy can be pursued. The generalization is based on the idea that realizations of $X_{i}$ may depend on some latent random variable $V_{i}$ such that the conditional distributions of $X_{i}$ given $V_{i}=v$ are Gaussian. The (unconditional) distribution of $X_{i}$ then additionally depends on the distribution of $V_{i}$ and may be far from Gaussian.

A simple example is given by elliptical processes: for some strictly positive real-valued random variable $V_{i}>0$ with $E(V_{i}^{2})<\infty$, suppose that

$X_{i}(t)=V_{i}X_{i}^{*}(t),\quad t\in[a,b],\qquad(17)$

where $X_{i}^{*}(t)$ is a zero mean Gaussian process with covariance function $\sigma^{*}(s,t)$ . In this case the conditional distribution of $X_{i}$ given $V_{i}=v$ is Gaussian with mean zero and covariance function $v^{2}\sigma^{*}(s,t)$ .

A more general framework is given by the following condition:

A)

$X_{i}$ is a zero mean stochastic process on $[a,b]$ . Realizations of the process $X_{i}$ depend on the realizations of a latent random variable $V_{i}$ defined on a metric space ${\cal V}$ . The joint distribution of $(X_{i},V_{i})$ is such that for each $v\in{\cal V}$ the conditional distribution of $X_{i}$ given $V_{i}=v$ is Gaussian with conditional mean function

[TABLE]

and continuous conditional covariance function

[TABLE]

Moreover, the error term $\varepsilon_{i}$ in (2) is independent of $V_{i}$ and $X_{i}$ , and for all $r=1,\dots,S$

[TABLE]

is a measurable function of $v\in{\cal V}$ . In the following we will additionally assume that the joint distribution of $(X_{i},V_{i})$ is such that all conditional and unconditional expectations used in subsequent arguments exist. This will go without saying.

An additional condition then ensures identifiability of points of impact:

B)

$M(s):=\operatorname{\mathbb{E}}\left[\mu(s;V_{i})\operatorname{\mathbb{E}}\big{(}g(X_{i}(\tau_{1}),\dots,X_{i}(\tau_{S}))\big{|}V_{i}\big{)}\right]$ is a twice continuously differentiable function of $s\in[a,b]$. Furthermore, the conditional covariance functions satisfy Assumption 2.1. With $\Omega:=[a,b]^{2}\times[0,b-a]$, there exists a function $\omega:\Omega\times{\cal V}\rightarrow\mathbb{R}$ such that

[TABLE]

where $\omega(\cdot;v)$ is measurable in $v$ , and

[TABLE]

are twice continuously differentiable functions of $s,t,z\in[a,b]^{2}\times[0,b-a]$ . Moreover, $0\neq C(\tau_{r}):=-\frac{\partial}{\partial z}W_{r}(\tau_{r},\tau_{r},z)|_{z=0}$ for all $r=1,\dots,S$ .

Proposition B.1.

i)

Under Condition A) we obtain

[TABLE]

ii)

Under Conditions A) and B) we have $\operatorname{\mathbb{E}}\big{(}X_{i}(s)Y_{i}\big{)}=\sum_{r=1}^{S}W_{r}(s,\tau_{r},|s-\tau_{r}|^{\kappa})+M(s)$ , and

[TABLE]

as $\delta\rightarrow 0$ .

The elliptical process introduced in (17) provides an example. In this case we have $\sigma(s,t;v)=v^{2}\sigma^{*}(s,t)$ as well as $\mu(s;v)=0$ . If all relevant moments exist, then (20) simplifies to

[TABLE]

where $\sigma(s,t)=\operatorname{\mathbb{V}}(V_{i})\sigma^{*}(s,t)$ is the covariance function of $X_{i}$ . Additionally, if the Gaussian process $X_{i}^{*}$ satisfies Assumption 2.1 for some $\omega^{*}:\Omega\rightarrow\mathbb{R}$ , then $\omega(t,s;v)=v^{2}\omega^{*}(s,t,|s-t|^{\kappa})$ , and (19) leads to

[TABLE]

which is twice continuously differentiable in $(s,t,z)$ . Result (21) then holds with $C(\tau_{r})=c^{*}(\tau_{r})\operatorname{\mathbb{E}}\left(V_{i}^{2}\vartheta_{r}(V_{i})\right)$ , where $c^{*}(\tau_{r})=-\frac{\partial}{\partial z}\omega^{*}(\tau_{r},\tau_{r},z)|_{z=0}$ , $r=1,\dots,S$ .

A more complex example is given by the following situation: There exist smooth, non-Gaussian stochastic processes $V_{i1}(t),V_{i2}(t)$ , $t\in[a,b]$ , as well as a zero mean Gaussian process $X_{i}^{*}(t)$ such that

[TABLE]

where $V_{i}=(V_{i1},V_{i2})$ are independent of $X_{i}^{*}$ . All relevant moments exist, and with probability 1 any realization of $V_{i1}$ as well as any realization of $V_{i2}$ are twice continuously differentiable functions on $[a,b]$ . If $\sigma^{*}$ denotes the covariance function of $X_{i}^{*}$ , then given $(V_{i1},V_{i2})=(v_{1},v_{2})$ the conditional distribution of $X_{i}$ is Gaussian, and

[TABLE]

Consequently, (20) becomes

[TABLE]

Smoothness of $v_{2}$ implies smoothness of $v_{2}(s)\operatorname{\mathbb{E}}\big{(}g(X_{i}(\tau_{1}),\dots,X_{i}(\tau_{S}))\big{|}V_{i}=v_{2}\big{)}$ in $s\in[a,b]$, and hence $M(s)=\operatorname{\mathbb{E}}\left(V_{i2}(s)\operatorname{\mathbb{E}}\big{(}g(X_{i}(\tau_{1}),\dots,X_{i}(\tau_{S}))\big{|}V_{i}\big{)}\right)$ is twice continuously differentiable for $s\in[a,b]$. Furthermore, if $X_{i}^{*}$ satisfies Assumption 2.1 for some $\omega^{*}:\Omega\rightarrow\mathbb{R}$, then for almost every realization $v=(v_{1},v_{2})$ of $V_{i}$ and any $r=1,\dots,S$

[TABLE]

is a.s. a twice continuously differentiable function of $(s,t,z)$ . This in turn implies that

[TABLE]

are twice continuously differentiable functions of $s,t,z\in[a,b]^{2}\times[0,b-a]$ .

Proof of Proposition B.1.

For $v\in{\cal V}$ the conditional distribution of $X_{i}-\mu(\cdot;v)$ given $V_{i}=v$ corresponds to a zero mean Gaussian process, and hence the arguments in the proof of Theorem 2.1 imply that

[TABLE]

Therefore

[TABLE]

Assertion i) thus follows from

[TABLE]

Now note that under Condition B) our definition of $Z_{\delta,i}$ leads to

[TABLE]

Since $M(s)$ and $W_{r}(s,t,z)$, $r=1,\dots,S$, are twice continuously differentiable, Taylor expansions then immediately lead to $\mathbb{E}(Z_{\delta,i}(s)Y_{i})=O(\delta^{2})$ for all $s\notin\{\tau_{1},\dots,\tau_{S}\}$. Furthermore, for $s\in\{\tau_{1},\dots,\tau_{S}\}$ Taylor expansions imply that

[TABLE]

as $\delta\rightarrow 0$ , $r=1,\dots,S$ .

∎

Appendix C Estimating points of impact

Parts of the proofs of Proposition C.1, Lemmas C.2-C.4, Theorem 2.2 and Lemma D.1 follow similar arguments as in Kneip et al. (2016). However, we emphasize that we consider a fundamentally different, more challenging nonparametric statistical model, which requires additional new theoretical arguments. Furthermore, we correct some minor mistakes occurring in Kneip et al. (2016); in particular, the arguments around Equation (70) of Theorem 2.2 correct the arguments around Equation (C.36) in Appendix C of the supplementary paper \citeappendix{KPS_S_2015}.

Proposition C.1.

Under Assumption 2.1 we have for all $t\in(a,b)$ , and any sufficiently small $\delta>0$ for some constants $M_{1}<\infty$ and $M_{2}<\infty$ :

[TABLE]

Moreover, for any $0<c<\infty$ we have for any sufficiently small $\delta>0$ and all $u\in[-c,c]$ :

[TABLE]

where for some constants $M_{3,c}<\infty$ and $M_{4,c}<\infty$

[TABLE]

hold for all $u\in[-c,c]$ .

Finally, for all $s\in[a,b]$ with $|t-s|\geq\delta$ we have for some constant $M_{5}<\infty$

[TABLE]

Proof of Proposition C.1.

Assumption 2.1 implies that the absolute values of all first and second order partial derivatives of $\omega(t,s,z)$ are uniformly bounded by some constant $M<\infty$ for all $(t,s,z)$ in the compact subset $[a,b]^{2}\times[0,b-a]$ of $\Omega$ .

By definition of $Z_{\delta,i}$ it thus follows from a Taylor expansion of $\omega$ that for $t\in(a,b)$ , any sufficiently small $\delta>0$ and some constant $M_{1}<\infty$

[TABLE]

which proves assertion (22). For $\operatorname{\mathbb{E}}(Z_{\delta,i}(t)^{2})$, i.e. the variance of $Z_{\delta,i}(t)$, we obtain by similar arguments

[TABLE]

for some constant $M_{2}<\infty$ , i.e. assertion (23).

Assertions (24) and (25) follow again from Taylor expansions of $\omega$, i.e. we have for any $0<c<\infty$ and any sufficiently small $\delta>0$ and all $u\in[-c,c]$

[TABLE]

where for some constants $M_{3,c}<\infty$ and $M_{4,c}<\infty$

[TABLE]

hold for all $u\in[-c,c]$. Assertion (24) now follows immediately from the preceding expansion together with the expression of $\operatorname{\mathbb{E}}(Z_{\delta,i}(t)X_{i}(t))$ in terms of $\omega$, while assertion (25) corresponds to (30).

In order to prove Assertion (26), let $s\in[a,b]$ with $|t-s|\geq\delta$. Another Taylor expansion yields:

[TABLE]

where for some constant $M_{5,1}<\infty$ we have

[TABLE]

and $\omega_{3}(x,y,z)$ denotes the partial derivative of $\omega(x,y,z)$ with respect to its third argument.

For $\kappa=1$ we then have for $|t-s|\geq\delta$ , $||t-s+\delta|-|t-s||=\delta$ , $||t-s-\delta|-|t-s||=\delta$ as well as $|t-s+\delta|+|t-s-\delta|-2|t-s|=0$ , which leads together with (31) to

[TABLE]

for some constant $M_{5,2}<\infty$ .
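The three identities used for $\kappa=1$ can be checked directly. As a short worked verification, take $v=t-s\geq\delta>0$; the case $v\leq-\delta$ is symmetric.

```latex
\begin{aligned}
|v+\delta|-|v| &= (v+\delta)-v = \delta,\\
|v-\delta|-|v| &= (v-\delta)-v = -\delta,\\
|v+\delta|+|v-\delta|-2|v| &= (v+\delta)+(v-\delta)-2v = 0.
\end{aligned}
```

Hence both one-sided increments have absolute value $\delta$, while the symmetric second difference of the absolute value function vanishes as long as the interval $[v-\delta,v+\delta]$ does not contain zero in its interior.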

For $\kappa\neq 1$, define $v:=t-s$ to shorten the notation and suppose $v\geq\delta>0$ (i.e. suppose $t>s$; the case $s>t$ follows from arguments similar to those below).

[TABLE]

Suppose $v\geq 2\delta$. It follows from a Taylor expansion of $f(\delta)$ around $0$ that there exists a $\xi$ between $0$ and $\delta$ such that

[TABLE]

Since $\kappa<2$ we have

[TABLE]

On the other hand, since $v\geq 2\delta$ , we have for all $0\leq\xi\leq\delta$

[TABLE]

Now consider the case $\delta\leq v\leq 2\delta$ . If $\kappa>1$ , then $f(\delta)\geq 0$ and

[TABLE]

For $\kappa<1$ we have $f(\delta)\leq 0$ . With $\delta\leq v\leq 2\delta$ , we then have

[TABLE]

Finally, similar arguments may be used to show that additionally we have for some constant $M_{5,3}<\infty$

[TABLE]

Assertion (26) then follows from (31) - (37). ∎

Proofs for estimation of impact points

We begin by stating a deviation bound for the central $\chi^{2}$ distribution.

Lemma C.1.

Let $W\sim\chi^{2}_{n}$. Then for all $0\leq x<1/2$ we have

[TABLE]

Proof of Lemma C.1.

Equations (A.2) and (A.3) in \citeappendix{JL2009} imply that for $0\leq x<1/2$ we have

[TABLE]

∎

Lemma C.2.

Under Assumption 2.1 there exist constants $0<D_{1}<\infty$ and $0<D_{2}<\infty$ , such that for all $n$ , all $0<\delta<(b-a)/2$ , all $t\in[a+\delta,b-\delta]$ , all $0<s\leq 1/2$ with $\delta^{\kappa}s^{\kappa}\geq s\delta^{2}$ , and every $0<z\leq\sqrt{n}$ we obtain

[TABLE]

and

[TABLE]

Proof of Lemma C.2.

Choose some arbitrary $0<\delta<(b-a)/2$ , $t\in[a+\delta,b-\delta]$ , as well as $0<s\leq 1/2$ . For $q_{1},q_{2}\in[-1,1]$ , Taylor expansions then yield

[TABLE]

for some constant $L_{2,1}<\infty$ .

Note that there exists a constant $L_{2,2}<\infty$ such that for all $0<s\leq 0.5$ we have $||q_{2}-q_{1}+\frac{1}{s}|^{\kappa}+|q_{2}-q_{1}-\frac{1}{s}|^{\kappa}-2\frac{1}{s^{\kappa}}|\leq L_{2,2}|q_{2}-q_{1}|^{2}$ as well as $||\frac{q_{2}-q_{1}}{2}+\frac{1}{s}|^{\kappa}+|\frac{q_{2}-q_{1}}{2}-\frac{1}{s}|^{\kappa}-2\frac{1}{s^{\kappa}}|\leq L_{2,2}|q_{2}-q_{1}|^{2}$ .

Together with (41) this implies that there exists a constant $L_{2,3}<\infty$ , which can be chosen independent of $s$ and $\delta$ , such that for all $q_{1},q_{2}\in[-1,1]$

[TABLE]

Define $Z_{\delta,i}^{*}(q):=\frac{1}{\sqrt{s^{\kappa}\delta^{\kappa}}}(Z_{\delta,i}(t+qs\delta)Y_{i}-\mathbb{E}(Z_{\delta,i}(t+qs\delta)Y_{i}))$ and $Z_{\delta}^{*}(q):=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}Z_{\delta,i}^{*}(q)$. By bounding the absolute moments $E(|Y_{i}|^{2m})$ according to Assumption 2.2, one can easily verify that for $K=4\sqrt{L_{2,3}|q_{2}-q_{1}|^{\min\{1,\kappa\}}}\sigma_{|y|}$, the Bernstein condition

[TABLE]

holds for all $0<s\leq 0.5$ , all integers $m\geq 2$ , all $q_{1},q_{2}\in[-1,1]$ and all $0<\delta<(b-a)/2$ .

An application of Corollary 1 in \citeappendix{vandeGeer2013} then guarantees that the Orlicz norm of $Z_{\delta}^{*}(q_{1})-Z_{\delta}^{*}(q_{2})$ is bounded, i.e., one has for all $q_{1},q_{2}\in[-1,1]$

[TABLE]

for some constant $0<L_{2,4}<\infty$ .

The proof then follows from well-known maximal inequalities of empirical process theory. In particular, by (43) one may apply Theorem 2.2.4 of \citeappendix{vanderVaart1996}. It is immediately seen that the covering integral appearing in this theorem is finite, and we can thus infer that there exists a constant $0<D_{1,1}<\infty$ such that

[TABLE]

For every $z>0$ , the Markov inequality then yields

[TABLE]

At the same time it follows from a Taylor expansion that for any $0<z\leq\sqrt{n}$ there exists a constant $0<D_{1,2}<\infty$ such that

[TABLE]

The first assertion of Lemma C.2 is an immediate consequence.

In order to prove the second assertion of Lemma C.2, first note that $Z_{\delta,i}(t_{1})^{2}-Z_{\delta,i}(t_{2})^{2}=(Z_{\delta,i}(t_{1})-Z_{\delta,i}(t_{2}))(Z_{\delta,i}(t_{1})+Z_{\delta,i}(t_{2}))$. Equation (26) implies the existence of a constant $0<L_{2,5}<\infty$ such that $\mathbb{E}((Z_{\delta,i}(t+q_{1}s\delta)+Z_{\delta,i}(t+q_{2}s\delta))^{2})\leq L_{2,5}\delta^{\kappa}$ for all $q_{1},q_{2}\in[-1,1]$, and all $n$, $t$, $s$ and $\delta$. With $Z_{\delta}^{**}(q)=\frac{1}{\sqrt{\delta^{2\kappa}s^{\kappa}}}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(Z_{\delta,i}(t+qs\delta)^{2}-\mathbb{E}(Z_{\delta,i}(t+qs\delta)^{2}))$, similar steps as above now imply the existence of a constant $0<L_{2,6}<\infty$ such that

[TABLE]

Using again maximal inequalities of empirical process theory and (44), the second assertion of Lemma C.2 now follows from arguments similar to those used to prove the first assertion. ∎

Lemma C.3.

Under the assumptions of Theorem 2.2 there exist constants $0<D_{3}<D_{4}<\infty$ and $0<D_{5}<\infty$ such that

[TABLE]

Moreover, there exists a constant $0<D<\infty$ such that for any $A^{*}$ with $D<A^{*}\leq A$ we obtain as $n\rightarrow\infty$:

[TABLE]

for any constant $D_{4,\epsilon}<\infty$ with $D_{4}<D_{4,\epsilon}$ .

Proof of Lemma C.3.

Assertion (45) follows directly from Assumption 2.1 and equation (23).

Let ${\cal J}_{\delta}:=\{j|\ t_{j}\in[a+\delta,b-\delta],j\in\{1,\dots,p\}\}$. Choose some constants $w_{1},w_{2}$ with $1<w_{1}<w_{2}$ and determine an equidistant grid $s_{1}=a+\delta<s_{2}<\dots<s_{N_{w_{1}}}=b-\delta$ of $N_{w_{1}}=[(\frac{b-a}{\delta})^{w_{1}}]$ points in $[a+\delta,b-\delta]$. Obviously, $\ell_{w_{1}}:=|s_{j}-s_{j-1}|=O(\delta^{w_{1}})$, $j=2,\dots,N_{w_{1}}$, as $\delta\rightarrow 0$. Then

[TABLE]

By Assumption 2.2, using the deviation bound (38) as well as $\sup_{t\in[a+\delta,b-\delta]}\mathbb{E}(Z_{\delta,i}(t)^{2})\leq D_{4}\delta^{\kappa}$ , it follows from the Bonferroni-inequality that as $n\rightarrow\infty$

[TABLE]

while Lemma C.2 implies that as $n\rightarrow\infty$

[TABLE]

Recall that $\frac{\ell_{w_{1}}^{\kappa}}{\delta^{\kappa}}=O(\delta^{\kappa(w_{1}-1)})$ and hence $\sqrt{\frac{\ell_{w_{1}}^{\kappa}}{\delta^{\kappa}n}\log(\frac{b-a}{\delta})}=o(\sqrt{\frac{1}{n}\log(\frac{b-a}{\delta})})$ . When combining the above arguments we thus obtain (46).

Before considering (47) note that it follows from (45) and (46) that there exists a constant $0<L_{3,1}<\infty$ such that as $n\rightarrow\infty$

[TABLE]

Now consider (47) and keep in mind that

[TABLE]

Some straightforward computations lead to

[TABLE]

Furthermore,

[TABLE]

and it follows from (42) and (45) that there is a constant $0<L_{3,2}<\infty$ such that for every $j\in\{2,\dots,N_{w_{1}}\}$

[TABLE]

holds for all sufficiently large $n$ . Together with (49), we can therefore infer from Lemma C.2 that for some constants $0<L_{3,3}<\infty$ , $0<L_{3,4}<\infty$

[TABLE]

as $n\rightarrow\infty$ . Moreover, by (49), we have with probability $1$ as $n\to\infty$ ,

[TABLE]

Choose an arbitrary point $s_{j}$ , and define $W_{i}(s_{j}):=\frac{1}{\sqrt{D_{4}\delta^{\kappa}\sigma_{|y|}^{2}}}(Z_{\delta,i}(s_{j})Y_{i}-\operatorname{\mathbb{E}}(Z_{\delta,i}(s_{j})Y_{i}))$ . Then $\operatorname{\mathbb{E}}(W_{i}(s_{j}))=0$ and it is easy to show that under Assumption 2.2 with $K=4$ , a constant which is independent of $s_{j}$ , $W_{i}(s_{j})$ satisfies the Bernstein condition in Corollary 1 of \citeappendix{vandeGeer2013}, i.e., we have

[TABLE]

It immediately follows from an application of Corollary 1 in \citeappendix{vandeGeer2013} that there exists a constant $0<L_{3,5}<\infty$ such that the Orlicz norm $||\frac{1}{\sqrt{n}}\sum_{i=1}^{n}W_{i}(s_{j})||_{\Psi}$ can be bounded by $L_{3,5}$ . Hence we can infer that

[TABLE]

It then follows from similar steps as in the proof of Lemma C.2 that there exists a constant $0<L_{3,6}<\infty$ such that for all $0<z\leq\sqrt{n}$ we obtain for all $s_{j}$

[TABLE]

We can thus conclude that there exists a constant $0<L_{3,7}<\infty$ such that

[TABLE]

Choose some $\omega_{3}$ with $\omega_{3}>\omega_{1}>1$ and note that our conditions on $\delta$ imply that $z=\sqrt{\omega_{3}\log(\frac{b-a}{\delta})}\leq\sqrt{n}$ for all sufficiently large $n$ . Using the union bound this leads to

[TABLE]

With $D=\sqrt{\omega_{3}}L_{3,7}<\infty$ , assertion (47) holds for all $A^{*}>D$ by (LABEL:lem0pr01)-(54).

Finally, (48) follows from (45), (46) and (47) by noting that the assertions in particular imply that with probability converging to $1$ (as $n\to\infty$ ) we have $\sup_{t\in(a,b)}\frac{1}{n}\sum_{i=1}^{n}Z_{\delta,i}(t)^{2}\leq D_{4,\epsilon}\delta^{\kappa}$ for any constant $D_{4,\epsilon}>D_{4}$ . ∎

Contrary to Lemma 2 in the supplementary paper to Kneip et al. (2016), we don’t have the simple expression $D=\sqrt{2}$ ; this is the price to pay for not assuming $Y_{i}$ to be Gaussian.

Remarks to Lemma C.3 concerning the threshold $\lambda$ :

Using a slight abuse of notation, first note that there is a close connection between $\lambda=A\sqrt{\sigma_{|y|}^{2}\log(\frac{b-a}{\delta})/n}$ for some $A>D$ given in Theorem 2.2 and $\widetilde{\lambda}:=A\sqrt{\sqrt{\operatorname{\mathbb{E}}(Y^{4})}\log(\frac{b-a}{\delta})/n}$ for $A=\sqrt{2\sqrt{3}}$ as used in our simulations. Indeed, set $\sigma_{|y|}^{2}=\operatorname{\mathbb{E}}(Y^{2})$ . Jensen’s inequality gives $\operatorname{\mathbb{E}}(Y^{2})\leq\sqrt{\operatorname{\mathbb{E}}(Y^{4})}$ , so there exists a constant $1\leq\widetilde{D}<\infty$ such that $\operatorname{\mathbb{E}}(Y^{2})\widetilde{D}=\sqrt{\operatorname{\mathbb{E}}(Y^{4})}$ . We can therefore rewrite the expression for $\widetilde{\lambda}$ in the form of $\lambda$ presented in Theorem 2.2 as $A\sqrt{\sigma_{|y|}^{2}\log(\frac{b-a}{\delta})/n}$ with $A=\sqrt{2\sqrt{3}\widetilde{D}}$ .

We proceed to give more details about the motivation for the threshold used in the simulation:

Arguments for the applicability of the threshold $\lambda$ in the proof of Theorem 2.2 follow from Lemma C.3. The crucial step for determining an operable threshold $\lambda$ is to derive useful bounds on

[TABLE]

Define $V_{\delta}(t):=(1/n\sum_{i=1}^{n}Z_{\delta,i}(t)Y_{i}-\operatorname{\mathbb{E}}(Z_{\delta,i}(t)Y_{i}))/(1/n\sum_{i=1}^{n}Z_{\delta,i}(t)^{2})^{1/2}$ . It is then easy to see that under our assumptions $\sqrt{n}(1/n\sum_{i=1}^{n}Z_{\delta,i}(t)Y_{i}-\operatorname{\mathbb{E}}(Z_{\delta,i}(t)Y_{i}))$ satisfies the Lyapunov conditions. We hence can conclude that $\sqrt{n}V_{\delta}(t)$ converges for all $t$ in distribution to $N(0,\operatorname{\mathbb{V}}(Z_{\delta,i}(t)Y_{i})/\operatorname{\mathbb{E}}(Z_{\delta,i}(t)^{2}))$ , while at the same time the Cauchy-Schwarz inequality implies $\operatorname{\mathbb{V}}(Z_{\delta,i}(t)Y_{i})/\operatorname{\mathbb{E}}(Z_{\delta,i}(t)^{2})\leq\sqrt{3\operatorname{\mathbb{E}}(Y_{i}^{4})}$ .

If the convergence to the normal distribution is sufficiently fast, the union bound in the proof of Lemma C.3 together with an elementary bound on the tails of the normal distribution leads to

[TABLE]

for some $A^{*}\geq\sqrt{2\sqrt{3}}$ . The threshold $A\sqrt{\sqrt{\operatorname{\mathbb{E}}(Y_{i}^{4})}\log(\frac{b-a}{\delta})/n}$ for some $A\geq\sqrt{2\sqrt{3}}$ is then an immediate consequence.
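As a minimal sketch of how the simulation threshold $\widetilde{\lambda}=A\sqrt{\sqrt{\operatorname{\mathbb{E}}(Y^{4})}\log(\frac{b-a}{\delta})/n}$ with $A=\sqrt{2\sqrt{3}}$ could be evaluated, consider the following Python function. Replacing $\operatorname{\mathbb{E}}(Y^{4})$ by the sample fourth moment of the responses is an assumption made here for illustration, and the function name `simulation_threshold` is hypothetical.

```python
import math


def simulation_threshold(y, a, b, delta, A=math.sqrt(2.0 * math.sqrt(3.0))):
    """Evaluate A * sqrt( sqrt(E(Y^4)) * log((b - a) / delta) / n ).

    E(Y^4) is replaced by the sample fourth moment of y (an assumption of
    this sketch, not a prescription taken from the text).
    """
    n = len(y)
    fourth_moment = sum(v ** 4 for v in y) / n
    return A * math.sqrt(math.sqrt(fourth_moment) * math.log((b - a) / delta) / n)
```

The default value of `A` is the smallest constant $\sqrt{2\sqrt{3}}$ allowed by the argument above; larger values give a more conservative cut-off.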

Lemma C.4.

Under the assumptions of Theorem 2.2 let $I_{r}:=\{t\in[a,b]|\ |t-\tau_{r}|\leq\min_{s\neq r}|t-\tau_{s}|\}$ , $r=1,\dots,S$ . If $S>0$ , there then exist a constant $0<Q_{1}<\infty$ and for each $0<\kappa<2$ a constant $0<Q_{2}<\infty$ such that for all sufficiently small $\delta>0$ and all $r=1,\dots,S$ we have

[TABLE]

as well as

[TABLE]

and for any $u\in[-0.5,0.5]$

[TABLE]

where $|R_{7;r}(u)|\leq\widetilde{M}_{r}||u|^{1/2}\delta|^{\min\{2\kappa,2\}}$ for some constants $\widetilde{M}_{r}<\infty$ , $r=1,\dots,S$ and $\vartheta_{r}=\operatorname{\mathbb{E}}(\frac{\partial}{\partial x_{r}}g(X_{i}(\tau_{1}),\dots,X_{i}(\tau_{S})))$ .

Proof of Lemma C.4.

Theorem 2.1 guarantees the existence of constants $\vartheta_{1},\dots,\vartheta_{S}$ with $\vartheta_{r}=\operatorname{\mathbb{E}}(\frac{\partial}{\partial x_{r}}g(X_{i}(\tau_{1}),\dots,X_{i}(\tau_{S})))$ , $r=1,\dots,S$ , such that for all $t\in[a+\delta,b-\delta]$ we have

[TABLE]

Since $\tau_{1},\dots,\tau_{S}\in(a,b)$ are fixed, we have $\tau_{r}\in[a+\delta,b-\delta]$ , $r=1,\dots,S$ , as well as $\delta\ll\frac{1}{2}\min_{r\neq s}|\tau_{r}-\tau_{s}|$ for all sufficiently small $\delta>0$ . Using (58), assertions (55) and (56) are thus immediate consequences of (25) and (26).

In order to prove (C.4), note that, similar to (24), straightforward Taylor expansions can be used to show that

[TABLE]

where for $|u|\leq 0.5$ we have $|R_{8;u,t,\delta}|\leq L_{4,1}(|u|^{0.5}\delta)^{\min\{2,2\kappa\}}$ for some constant $L_{4,1}$ with $|L_{4,1}|<\infty$ . Since $\delta\ll\frac{1}{2}\min_{r\neq s}|\tau_{r}-\tau_{s}|$ for all sufficiently small $\delta$ , we can conclude from (LABEL:eq:lem4.aux) that

[TABLE]

for some constant $L_{4,2}$ with $|L_{4,2}|<\infty$ and all $r,s\in\{1,\dots,S\}$ , $r\neq s$ . Assertion (C.4) is then an immediate consequence of (58) and (24). ∎

Proof of Theorem 2.2.

Let $\lambda_{n}=A\sqrt{\frac{\sigma_{|y|}^{2}}{n}\log\big{(}\frac{b-a}{\delta}\big{)}}$ and let $I_{\delta}:=\{t_{j}|\ t_{j}\in[a+\delta,b-\delta],j\in\{1,\dots,p\}\}$ . For any $t\in I_{\delta}$ we have

[TABLE]

In the case in which there are no points of impact, i.e. $S=0$ , we then have $\operatorname{\mathbb{E}}(Z_{\delta,i}(t)Y_{i})=0$ for all $t\in[a+\delta,b-\delta]$ and all $\delta>0$ . Lemma C.3 implies that

[TABLE]

and hence $P(\widehat{S}=S)\rightarrow 1$ , as $n\rightarrow\infty$ .

Now consider the case $S\geq 1$ . Select some arbitrary $\alpha>2$ . As $n\rightarrow\infty$ we have $\delta\equiv\delta_{n}\rightarrow 0$ . Therefore, $\tau_{r}\in[a+\delta,b-\delta]$ , $r=1,\dots,S$ , as well as $\sqrt{\delta}/\alpha<\frac{1}{2}\min_{r\neq s}|\tau_{r}-\tau_{s}|$ , provided that $n$ is sufficiently large.

Let $I_{r,\delta,\alpha}:=\{t\in I_{\delta}\ |\ |t-\tau_{r}|\leq\sqrt{\delta}/\alpha\}$ , $r=1,\dots,S$ , as well as $I_{\delta,\alpha}=\bigcup_{r=1}^{S}I_{r,\delta,\alpha}$ and $I_{\delta,\alpha}^{C}:=I_{\delta}\backslash I_{\delta,\alpha}$ .

By our assumptions on the sequence $\delta\equiv\delta_{n}$ we can infer from (60), (55), and (48) that there exist constants $0<C_{1}<\infty$ and $0<C_{2}<\infty$ such that the event

[TABLE]

holds with probability tending to 1 as $n\rightarrow\infty$ . Since by assumption $\frac{|\log\delta|}{n\delta^{\kappa}}\rightarrow 0$ and hence $\sqrt{\frac{\delta^{\kappa}}{n}|\log\delta|}=o(\delta^{\kappa})$ , the decomposition given in (60) together with (56) and (48) imply the existence of a constant $0<C_{3}<Q_{2}$ such that

[TABLE]

hold with probability tending to 1 as $n\rightarrow\infty$ .

For $r=1,\dots,S$ let $j(r)$ be an index satisfying $|\tau_{r}-t_{j(r)}|=\min_{j=1,\dots,p}|\tau_{r}-t_{j}|$ . Obviously $|\tau_{r}-t_{j(r)}|\leq\frac{b-a}{2(p-1)}$ and by (25) our conditions on $p\equiv p_{n}$ imply

[TABLE]

Using again (48) together with $\sqrt{\frac{\delta^{\kappa}}{n}|\log\delta|}=o(\delta^{\kappa})$ and (26), we can thus conclude that there exists a sequence $\{\epsilon_{n}\}$ of positive numbers with $\lim_{n\rightarrow\infty}\epsilon_{n}=0$ such that

[TABLE]

holds with probability tending to 1 as $n\rightarrow\infty$ .

Now define

[TABLE]

Since $\delta^{1+\kappa/2}=o(\delta^{\kappa})$ and since $\alpha>2$ , one can infer from (61) - (63) that the following assertions hold with probability tending to 1 as $n\rightarrow\infty$ :

[TABLE]

as well as

[TABLE]

But under (65) and (66) the construction of the estimators $\widehat{\tau}_{k}$ , $k=1,\dots,S$ , for the first $S$ steps of our estimation procedure implies that $\{\widehat{\tau}_{1},\dots,\widehat{\tau}_{S}\}=\{\widetilde{\tau}_{1},\dots,\widetilde{\tau}_{S}\}$ . Therefore,

[TABLE]

as $n\rightarrow\infty$ .

By definition of $\widetilde{\tau}_{r}$ , $r=1,\dots,S$ , in (64) it already follows from (67) that $\widehat{\tau}_{1},\dots,\widehat{\tau}_{S}$ provide consistent estimators of the true points of impact. Some more precise approximations are, however, required to show Assertion (6).

Note that for all $r=1,\dots,S$ and all $t\in(a,b)$

[TABLE]

Note that for all sufficiently large $n$ , our assumptions on $p_{n}$ guarantee that $|\tau_{r}-t_{j(r)}|\leq\frac{b-a}{2(p-1)}\leq M_{L}n^{-1/\kappa}$ for some constant $M_{L}<1$ . Let $M_{p}:=\max\{m\in\mathbb{N}|\ \frac{\delta}{2^{m}}\geq 2n^{-1/\kappa}\}$ . Our assumptions on the sequence $\delta\equiv\delta_{n}$ yield $\sup_{m=1,\dots,M_{p}}\frac{|2^{-m/2}\delta|^{\min\{2\kappa,2\}}}{2^{-\kappa m}\delta^{\kappa}}\rightarrow 0$ .
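To make the role of $M_{p}$ concrete: it counts how many times the scale $\delta$ can be halved before it falls below $2n^{-1/\kappa}$, so it grows only logarithmically in $\delta n^{1/\kappa}$. A minimal sketch follows; the function name `max_dyadic_level` and the convention of returning 0 when no $m\geq 1$ qualifies are assumptions of this illustration.

```python
def max_dyadic_level(delta, n, kappa):
    """Largest integer m >= 1 with delta / 2**m >= 2 * n**(-1/kappa).

    Returns 0 if no such m exists (a convention chosen for this sketch).
    """
    floor_scale = 2.0 * n ** (-1.0 / kappa)
    m = 0
    # Keep halving the scale while the next level still satisfies the bound.
    while delta / 2 ** (m + 1) >= floor_scale:
        m += 1
    return m
```

Equivalently, whenever the set is non-empty, $M_{p}$ is the integer part of $\log_{2}(\delta n^{1/\kappa}/2)$.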

We can therefore infer from (C.4) that there are constants $0<C_{5,m}<C_{6,m}<\infty$ with $\inf(C_{6,m}-C_{5,m})\geq M_{C}$ for some constant $M_{C}>0$ such that for all $m=1,2,\dots,M_{p}$ and all sufficiently large $n$

[TABLE]

hold for every $r=1,\dots,S$ .

On the other hand, the exponential inequality (C.2) obviously implies the existence of a constant $0<C_{7}<\infty$ such that for any $0<q\leq\sqrt{n}$

[TABLE]

holds for all $m=1,2,\dots$ and each $r=1,\dots,S$ .

For all $m=1,2,\dots$ and $r=1,\dots,S$ let ${\cal A}(n,m,r)$ denote the event that

[TABLE]

Inequality (71) implies that with $C_{8}:=\frac{C_{7}}{M_{C}(\min_{r=1,\dots,S}|\vartheta_{r}|c(\tau_{r}))}$ the complementary events ${\cal A}(n,m,r)^{C}$ can be bounded by

[TABLE]

for all $m=1,2,\dots$ and $r=1,\dots,S$ . If $m\leq M_{p}$ , then (69) and (70) imply that under ${\cal A}(n,m,r)$ we have

[TABLE]

for each $r=1,\dots,S$ and all sufficiently large $n$ .

Choose an arbitrary $\epsilon>0$ and set

[TABLE]

whenever there exists an integer $m>0$ such that $\epsilon\geq C_{8}\sqrt{\frac{2^{\kappa m}}{\delta^{\kappa}n}}$ and set $m^{*}(\epsilon):=1$ otherwise. Furthermore define

[TABLE]

By our assumptions on $\delta\equiv\delta_{n}$ there then obviously exists a constant $A(\epsilon)<\infty$ such that for all sufficiently large $n$ ,

[TABLE]

Now consider the event ${\cal A}(n,\epsilon):=\bigcap_{r=1}^{S}\bigcap_{m=1}^{m(\epsilon)}{\cal A}(n,m,r)$ . By (72) the Bonferroni inequality implies that

[TABLE]

But under event ${\cal A}(n,\epsilon)$ we can infer from (73) that

[TABLE]

Additionally let ${\cal A}^{*}(n)$ denote the event that $\{\widehat{\tau}_{1},\dots,\widehat{\tau}_{S}\}=\{\widetilde{\tau}_{1},\dots,\widetilde{\tau}_{S}\}$ . The definitions in (64) and (73) yield $\widetilde{\tau}_{r,1}=\widetilde{\tau}_{r}$ , $r=1,\dots,S$ , and we can thus conclude from (74) and (76) that under events ${\cal A}^{*}(n)$ and ${\cal A}(n,\epsilon)$ we have

[TABLE]

for all $n$ sufficiently large.

Recall that by (67) we have $P({\cal A}^{*}(n))\rightarrow 1$ as $n\rightarrow\infty$ . Since $\epsilon$ is arbitrary, (6) thus follows from (75) and (77).

It remains to prove Assertion (7). For some $A^{*}<A$ with $D\leq A^{*}$ define $\lambda_{n}^{*}<\lambda_{n}$ by $\lambda_{n}^{*}:=A^{*}\sqrt{\frac{\sigma_{|y|}^{2}}{n}\log(\frac{b-a}{\delta})}$ . By (47) it is immediately seen that in addition to (61) also

[TABLE]

holds with probability tending to 1 as $n\rightarrow\infty$ .

But (46) and our assumptions on the sequence $\delta\equiv\delta_{n}$ lead to $\frac{\delta^{1+\kappa/2}}{\inf_{t\in I_{\delta,\alpha}^{C}}\sqrt{\frac{1}{n}\sum_{i=1}^{n}Z_{\delta,i}(t)^{2}}}=o_{P}(\lambda_{n}^{*})$ .

Using (68), the construction of the estimator $\widehat{\tau}_{S+1}$ therefore implies that as $n\rightarrow\infty$ ,

[TABLE]

while (63) together with (67), (45) and (46) yield

[TABLE]

By definition of our estimator $\widehat{S}$ , (7) is an immediate consequence. ∎
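The selection steps that the proof refers to (maximise a criterion over the grid, discard a neighbourhood of the selected point, stop once the standardized criterion falls below the threshold) can be sketched as follows. This is a hedged illustration, not the authors' implementation: the definition $Z_{\delta,i}(t)=X_{i}(t)-\frac{1}{2}(X_{i}(t-\delta)+X_{i}(t+\delta))$, the use of the unstandardized criterion for the maximisation, the exclusion radius $\sqrt{\delta}/2$, and all function names are assumptions that are not stated in this appendix.

```python
import math


def local_contrast(curve, j, k):
    # Assumed definition: Z_delta(t_j) = X(t_j) - (X(t_j - delta) + X(t_j + delta)) / 2,
    # where delta corresponds to k grid steps.
    return curve[j] - 0.5 * (curve[j - k] + curve[j + k])


def estimate_points_of_impact(curves, y, grid, k, lam):
    """Greedy selection sketch; returns the selected grid locations in order."""
    n, p = len(curves), len(grid)
    delta = grid[k] - grid[0]
    raw, standardized = {}, {}
    for j in range(k, p - k):  # grid points in [a + delta, b - delta]
        z = [local_contrast(curve, j, k) for curve in curves]
        cross = sum(zi * yi for zi, yi in zip(z, y)) / n
        scale = math.sqrt(sum(zi * zi for zi in z) / n)
        raw[j] = abs(cross)
        standardized[j] = abs(cross) / scale if scale > 0.0 else 0.0
    candidates = sorted(raw)
    selected = []
    while candidates:
        best = max(candidates, key=lambda j: raw[j])
        if standardized[best] < lam:
            break  # standardized criterion is below the threshold lambda
        selected.append(grid[best])
        # Assumed exclusion rule: drop grid points within sqrt(delta) / 2.
        candidates = [j for j in candidates
                      if abs(grid[j] - grid[best]) > math.sqrt(delta) / 2.0]
    return selected
```

The number of selected locations plays the role of $\widehat{S}$; the threshold `lam` corresponds to $\lambda_{n}$ in the proof.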

Appendix D Subsequent estimation of $g$

In this appendix, $||X||_{\Psi}=\inf\{C>0|\operatorname{\mathbb{E}}(\Psi(|X|/C))\leq 1\}$ refers to the Orlicz norm of a random variable $X$ with respect to $\Psi(x)=\exp(n/6(\sqrt{1+2\sqrt{6}x/\sqrt{n}}-1)^{2})-1$ . Similarly, for $p\geq 1$ we use the Orlicz norm $||X||_{p}=\inf\{C>0:(\operatorname{\mathbb{E}}(|X|^{p}))^{1/p}\leq C\}$ , which corresponds to the usual $L_{p}$ -norm.
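For orientation, the function $\Psi$ behaves like $\exp(x^{2})-1$ when $x$ is small relative to $\sqrt{n}$ and grows only exponentially in $x$ for large $x$. The following sketch evaluates $\Psi$ and an empirical $L_{p}$ norm; the function names are hypothetical and the empirical norm (sample mean in place of the expectation) is an assumption used purely for illustration.

```python
import math


def bernstein_orlicz_psi(x, n):
    """Psi(x) = exp( n/6 * (sqrt(1 + 2*sqrt(6)*x/sqrt(n)) - 1)**2 ) - 1 for x >= 0."""
    root = math.sqrt(1.0 + 2.0 * math.sqrt(6.0) * x / math.sqrt(n))
    return math.exp(n / 6.0 * (root - 1.0) ** 2) - 1.0


def empirical_lp_norm(sample, p):
    """Sample analogue of (E|X|^p)^(1/p)."""
    return (sum(abs(v) ** p for v in sample) / len(sample)) ** (1.0 / p)
```

The small-$x$ behaviour follows from $\sqrt{1+2u}-1\approx u$ for small $u$, which gives $\frac{n}{6}(\sqrt{1+2\sqrt{6}x/\sqrt{n}}-1)^{2}\approx x^{2}$.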

For the proofs of this section we need the following Lemma:

Lemma D.1.

Under Assumption 2.1, there exist constants $M<\infty$ , $M_{*}<\infty$ and $M_{**}<\infty$ such that for all sufficiently small $s>0$ , all $q\in[-1,1]$ , all $t\in(a,b)$ and all $t^{*}\in[a,b]$ we have

[TABLE]

Proof of Lemma D.1.

Let $q\in[-1,1]$ and choose an arbitrary $t^{*}\in[a,b]$ as well as $t\in(a,b)$ . For all sufficiently small $s$ it then follows from Taylor expansions that

[TABLE]

Assertion (78) then follows immediately from the fact that $||t+qs-t^{*}|^{\kappa}-|t-t^{*}|^{\kappa}|=O(|qs|^{\min\{1,\kappa\}})$ as well as $|qs|=O(|qs|^{\min\{1,\kappa\}})$ .

Moreover, further Taylor expansions can be used to show that for all $q_{1},q_{2}\in[-1,1]$ , all $t\in(a,b)$ and all sufficiently small $s>0$ we have

[TABLE]

proving Assertion (79). ∎

D.1 The nonparametric case

In order to prove Theorem 3.1 we need some auxiliary results.

Proposition D.1.

Let $X_{i}=(X_{i}(t):t\in[a,b])$ , $i=1,\dots,n$ , be i.i.d. Gaussian processes with covariance function $\sigma(s,t)$ satisfying Assumption 2.1. For any differentiable and bounded function $F(x_{1},\dots,x_{S})$ with $|\operatorname{\mathbb{E}}\Big{(}\frac{\partial}{\partial x_{r}}F(X_{i}(\tau_{1}),\dots,X_{i}(\tau_{S}))\Big{)}|<\infty$ for $r=1,\dots,S$ , we then have

[TABLE]

Proof of Proposition D.1.

To ease notation, let $F(\operatorname{\mathbf{X}}_{i})=F(X_{i}(\tau_{1}),\dots,X_{i}(\tau_{S}))$ . For an arbitrary choice of $r\in\{1,\dots,S\}$ choose any $0<s$ sufficiently small and define for $q_{1},q_{2}\in[-1,1]$

[TABLE]

We then have $\operatorname{\mathbb{E}}(\chi_{i}(q_{1},q_{2}))=0$ and it follows from some straightforward calculations, since $|F(\operatorname{\mathbf{X}}_{i})|\leq M_{F}<\infty$ , that there exists a constant $0<M_{1}<\infty$ such that for $m=2,3,\dots$ we have

[TABLE]

Corollary 1 in \citeappendix{vandeGeer2013} now guarantees that there exists a constant $0<M_{2}<\infty$ such that the Orlicz norm of $\frac{1}{\sqrt{ns^{\kappa}}}\sum_{i=1}^{n}\chi_{i}(q_{1},q_{2})$ can be bounded, i.e., we have:

[TABLE]

The proof then follows from maximal inequalities for empirical processes. By (85) one can apply Theorem 2.2.4 of \citeappendix{vanderVaart1996}. The covering integral in this theorem can easily be seen to be finite and one can thus infer that there exists a constant $0<M_{3}<\infty$ such that

[TABLE]

For every $x>0$ , the Markov inequality then yields

[TABLE]

To improve readability, note that it follows from a Taylor expansion of $\frac{n}{6}(\sqrt{1+x/\sqrt{n}}-1)^{2}$ that there exists a constant $0<M_{4}<\infty$ such that for all $0<x\leq\sqrt{n}$ we have

[TABLE]

Now, note that it follows from Theorem 2.1 that

[TABLE]

Together with (78) we then have for all $q_{1}\in[-1,1]$ and some constant $M_{5}<\infty$ :

[TABLE]

Using (87) together with (86) we then can conclude that for all $0<x\leq\sqrt{n}$ :

[TABLE]

Since $|\widehat{\tau}_{r}-\tau_{r}|=O_{p}(n^{-\frac{1}{\kappa}})$ , Assertion (80) then follows immediately.

The proof of Assertion (81) follows from similar steps. Define

[TABLE]

We then have $\operatorname{\mathbb{E}}\Big{(}\chi_{i}^{(1)}(q_{1},q_{2})\Big{)}=0$ and with

[TABLE]

it follows from some straightforward calculations, that there exists a constant $0<M_{6}<\infty$ such that for $m=2,3,\dots$ we have

[TABLE]

Once more, Corollary 1 in \citeappendix{vandeGeer2013} now guarantees that there exists a constant $M_{7}<\infty$ such that

[TABLE]

By (90) another application of the maximal inequalities of empirical processes, using similar steps as in the proof of Assertion (80), and since by (79) we have $\operatorname{\mathbb{E}}((X_{i}(\tau_{r}+sq_{1})-X_{i}(\tau_{r}))^{2})\leq M_{8}s^{\kappa}$ for some constant $M_{8}<\infty$ , we can conclude that for all $0<x\leq\sqrt{n}$ we have for some constant $0\leq M_{9}<\infty$ :

[TABLE]

Since $|\widehat{\tau}_{r}-\tau_{r}|=O_{p}(n^{-\frac{1}{\kappa}})$ , Assertion (81) then follows immediately.

In order to show Assertion (82) we make use of the Orlicz norm $||X||_{p}$ .

Choose some $p>\frac{2}{\kappa}=p_{\kappa}$ , and let $p$ be even. Note that $\operatorname{\mathbb{E}}((X_{i}(\tau_{r}+sq_{1})-X_{i}(\tau_{r}+sq_{2}))\varepsilon_{i}F(\operatorname{\mathbf{X}}_{i}))=0$ . For all sufficiently small $0<s$ and all $q_{1},q_{2}\in[-1,1]$ it is easy to show that there exists a constant $M_{10}<\infty$ such that

[TABLE]

We may conclude that

[TABLE]

By inequality (92) one may once again apply Theorem 2.2.4 in \citeappendix{vanderVaart1996}. Our condition on $p$ ensures that the covering integral appearing in this theorem is finite. The maximal inequalities for empirical processes then imply:

[TABLE]

for some constant $M_{11}<\infty$ . At the same time, the Markov inequality implies

[TABLE]

Assertion (82) then follows from $|\widehat{\tau}_{r}-\tau_{r}|=O_{p}(n^{-\frac{1}{\kappa}})$ .

It remains to prove (83). For real numbers $x$ and $y$ it obviously holds that $x^{4}-y^{4}=(x-y)(x+y)(x^{2}+y^{2})$ . With the help of this equation and inequality (79) it is easy to see that there exists a constant $M_{12}<\infty$ such that for all $p\geq 1$ , all sufficiently small $s$ , and all $q_{1},q_{2}\in[-1,1]$ we now have

[TABLE]

At the same time inequality (79) implies that there exists a constant $0<M_{13}<\infty$ such that $|\operatorname{\mathbb{E}}((X_{i}(\tau_{r}+sq_{1})-X_{i}(\tau_{r}))^{4})|\leq M_{13}s^{2\kappa}$ . Choose some $p>\frac{2}{\kappa}$ . By (93) and another application of the maximal inequalities for empirical processes we can then conclude that there exists a constant $M_{14}<\infty$ such that

[TABLE]

Assertion (83) then follows once more from $|\widehat{\tau}_{r}-\tau_{r}|=O_{p}(n^{-\frac{1}{\kappa}})$ . ∎

Suppose $\widehat{S}=S$ . The feasible kernel density estimator for the density $f_{\tau}(\operatorname{\mathbf{x}})=f(x_{1},\dots,x_{S})$ of $(X_{i}(\tau_{1}),\dots,X_{i}(\tau_{S}))$ replaces the unknown $\tau_{r}$ with their estimates $\widehat{\tau}_{r}$ :

[TABLE]
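The feasible estimator has the usual product-kernel form $(nh_{1}\cdots h_{S})^{-1}\sum_{i=1}^{n}K((\operatorname{\mathbf{X}}_{i}(\widehat{\operatorname{\boldsymbol{\tau}}})-\operatorname{\mathbf{x}})/\operatorname{\mathbf{h}})$ used in the proofs of this section. A minimal sketch is given below, assuming a Gaussian product kernel; this kernel choice and the function names are illustrative assumptions, not taken from the text.

```python
import math


def gaussian_product_kernel(u):
    """Product of standard normal densities evaluated at the coordinates of u."""
    return math.prod(math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi) for v in u)


def feasible_kde(x, evaluations, bandwidths):
    """Kernel density estimate at x.

    evaluations[i][r] holds X_i evaluated at the estimated point of impact
    tau_hat_r; bandwidths[r] is h_r.
    """
    n = len(evaluations)
    total = sum(
        gaussian_product_kernel(
            [(xi - xr) / h for xi, xr, h in zip(row, x, bandwidths)]
        )
        for row in evaluations
    )
    return total / (n * math.prod(bandwidths))
```

A Gaussian kernel is bounded and twice continuously differentiable with bounded derivatives, so it is one kernel covered by the smoothness condition imposed in Theorem D.1.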

Theorem D.1.

Let $\widehat{S}=S$ , $\max_{r=1,\dots,S}|\widehat{\tau}_{r}-\tau_{r}|=O_{p}(n^{-1/\kappa})$ and let $X_{i}$ be a Gaussian process satisfying Assumption 2.1. If the kernel $K:\mathbb{R}^{S}\to\mathbb{R}$ is bounded and twice continuously differentiable with bounded derivatives, we then have

[TABLE]

where $\widehat{f}_{\tau}(\operatorname{\mathbf{x}})$ denotes the kernel density estimator for the density of $(X_{i}(\tau_{1}),\dots,X_{i}(\tau_{S}))$ .

Proof of Theorem D.1.

By the usual decomposition we have

[TABLE]

with $K\Big{(}\frac{\operatorname{\mathbf{X}}_{i}(\widehat{\operatorname{\boldsymbol{\tau}}})-\operatorname{\mathbf{x}}}{\operatorname{\mathbf{h}}}\Big{)}=K\Big{(}\frac{X_{i}(\widehat{\tau}_{1})-x_{1}}{h_{1}},\dots,\frac{X_{i}(\widehat{\tau}_{S})-x_{S}}{h_{S}}\Big{)}$ and $K\Big{(}\frac{\operatorname{\mathbf{X}}_{i}(\operatorname{\boldsymbol{\tau}})-\operatorname{\mathbf{x}}}{\operatorname{\mathbf{h}}}\Big{)}=K\Big{(}\frac{X_{i}(\tau_{1})-x_{1}}{h_{1}},\dots,\frac{X_{i}(\tau_{S})-x_{S}}{h_{S}}\Big{)}$ . A Taylor expansion then yields

[TABLE]

where $|R_{i}(\operatorname{\mathbf{x}})|\leq M\sum_{r=1}^{S}\left(\frac{X_{i}(\widehat{\tau}_{r})-X_{i}(\tau_{r})}{h_{r}}\right)^{2}$ for some constant $M<\infty$ . Remember that under our assumptions we have by assertion (81) $\frac{1}{n}\sum_{i=1}^{n}(X_{i}(\widehat{\tau}_{r})-X_{i}(\tau_{r}))^{2}=O_{p}(n^{-1})$ , hence:

[TABLE]

It then remains to bound $\frac{1}{n}\sum_{i=1}^{n}\big{(}X_{i}(\widehat{\tau}_{r})-X_{i}(\tau_{r})\big{)}K_{r}^{\prime}\Big{(}(\operatorname{\mathbf{X}}_{i}(\operatorname{\boldsymbol{\tau}})-\operatorname{\mathbf{x}})/\operatorname{\mathbf{h}}\Big{)}$ . With $F(X_{i}(\tau_{1}),\dots,X_{i}(\tau_{S}))=K_{r}^{\prime}\Big{(}(X_{i}(\tau_{1})-x_{1})/h_{1},\dots,(X_{i}(\tau_{S})-x_{S})/h_{S}\Big{)}$ we have

[TABLE]

Since $|K_{rl}^{\prime\prime}(\cdot)|<\infty$ , it then follows from (80) in Proposition D.1 that

[TABLE]

Note that for some constant $0<M_{S}<\infty$ we have $(\sum_{r=1}^{S}1/h_{r})^{2}\leq M_{S}\sum_{r=1}^{S}1/h_{r}^{2}$ . We can thus conclude from (98) that

[TABLE]

The assertion of the theorem now follows immediately from (95), (96), (97) and (99). ∎

Proof of Theorem 3.1.

The proof follows from generalizations of arguments used in proofs for nonparametric regression (see for example \citeappendix[][Ch. 2]{LR06}). We have

[TABLE]

where $\widehat{m}_{\widehat{\tau}}(\operatorname{\mathbf{x}})=(\widehat{g}_{\widehat{\tau}}(\operatorname{\mathbf{x}})-g(\operatorname{\mathbf{x}}))\widehat{f}_{\widehat{\tau}}(\operatorname{\mathbf{x}})$ . With $Y_{i}=g(\operatorname{\mathbf{X}}_{i}(\operatorname{\boldsymbol{\tau}}))+\varepsilon_{i}$ we can then write

[TABLE]

where

[TABLE]

With $\widehat{m}_{\tau,1}(\operatorname{\mathbf{x}})=(nh_{1}\cdots h_{S})^{-1}\sum_{i=1}^{n}(g(\operatorname{\mathbf{X}}_{i}(\operatorname{\boldsymbol{\tau}}))-g(\operatorname{\mathbf{x}}))K((\operatorname{\mathbf{X}}_{i}(\operatorname{\boldsymbol{\tau}})-\operatorname{\mathbf{x}})/\operatorname{\mathbf{h}})$ we have by the usual decomposition:

[TABLE]

A Taylor expansion then yields

[TABLE]

where $|R_{i,1}(\operatorname{\mathbf{x}})|\leq M\sum_{r=1}^{S}\left(\frac{X_{i}(\widehat{\tau}_{r})-X_{i}(\tau_{r})}{h_{r}}\right)^{2}$ for some constant $0<M<\infty$ , since all partial second derivatives of $K$ are bounded.

Now, assuming that $g$ and its first derivative are bounded, we obtain by Proposition D.1:

[TABLE]

On the other hand, since by (81) we have $\frac{1}{n}\sum_{i=1}^{n}(X_{i}(\widehat{\tau}_{r})-X_{i}(\tau_{r}))^{2}=O_{p}(n^{-1})$ and since $g$ is bounded, we obtain:

[TABLE]

Together with (104) and (105) we then obtain

[TABLE]

At the same time, since $K$ is bounded, (82) implies

[TABLE]

Another Taylor expansion then yields

[TABLE]

With $|R_{i,2}(\operatorname{\mathbf{x}})|\leq M_{K}\sum_{r=1}^{S}(X_{i}(\widehat{\tau}_{r})-X_{i}(\tau_{r}))^{2}/h_{r}^{2}\cdot|\varepsilon_{i}|$ , for some constant $M_{K}<\infty$ , it follows from (82) and (83) together with the Cauchy-Schwarz inequality, that

[TABLE]

With (106) and (107) we can conclude from (101) that

[TABLE]

Since $f_{\tau}(\operatorname{\mathbf{x}})>0$ and $\widehat{m}_{\tau}=O_{p}\Big{(}\sum_{r=1}^{S}h_{r}^{2}+(nh_{1}\cdots h_{S})^{-1/2}\Big{)}$ (see for example \citeappendix[][Ch. 2]{LR06}) we then arrive together with (94) at

[TABLE]

provided $n^{\min\{1,1/\kappa\}}(h_{1}\cdots h_{S})h_{r}^{2}\to\infty$ for $r=1,\dots,S$ . ∎

Proof of Corollary 3.1.

For $\kappa\leq 1$ and $h_{r}\sim n^{-1/(S+4)}$ , $r=1,\dots,S$ , the assertion of Corollary 3.1 is a direct consequence of (9) in Theorem 3.1. ∎
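To illustrate the estimator analysed in this subsection, the sketch below computes a local-constant (Nadaraya-Watson) estimate of $g$ from the curves evaluated at the estimated points of impact, together with bandwidths of order $n^{-1/(S+4)}$ as in Corollary 3.1. The local-constant form is inferred from the relation $\widehat{m}_{\widehat{\tau}}=(\widehat{g}_{\widehat{\tau}}-g)\widehat{f}_{\widehat{\tau}}$ ; the Gaussian product kernel, the proportionality constant `scale`, and the function names are assumptions of this illustration.

```python
import math


def nadaraya_watson(x, evaluations, y, bandwidths):
    """Local-constant estimate of g at x.

    evaluations[i][r] holds X_i evaluated at tau_hat_r; y[i] is the response.
    """
    weights = [
        math.prod(
            math.exp(-0.5 * ((xi - xr) / h) ** 2)
            for xi, xr, h in zip(row, x, bandwidths)
        )
        for row in evaluations
    ]
    denominator = sum(weights)
    if denominator == 0.0:
        raise ValueError("all kernel weights vanish at x; increase the bandwidths")
    return sum(w * yi for w, yi in zip(weights, y)) / denominator


def corollary_bandwidths(n, S, scale=1.0):
    """Bandwidths h_r = scale * n**(-1/(S+4)); the constant scale is a free choice."""
    return [scale * n ** (-1.0 / (S + 4))] * S
```

The normalising constants of the kernel cancel in the ratio, which is why the weights omit them.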

D.2 The parametric case

In the proofs of this section, we will make use of the following lemma, which summarizes results from \citeappendix{B2012b} and \citeappendix{B2012a}:

Lemma D.2.

Let $X_{1}$ and $X_{2}$ be two random variables with $0<\operatorname{\mathbb{V}}(X_{j})<\infty$ , $j=1,2$ . For any real valued function $f:\mathbb{R}\to\mathbb{R}$ with $\operatorname{\mathbb{E}}(|f(X_{1})|)<\infty$ and $\operatorname{\mathbb{E}}(|f(X_{1})X_{1}|)<\infty$ we have

[TABLE]

provided that $cov\left(X_{2}-\frac{cov(X_{1},X_{2})}{\operatorname{\mathbb{V}}(X_{1})}X_{1},f(X_{1})\right)=0$ .

Proof of Lemma D.2.

With $X_{2}=\frac{cov(X_{1},X_{2})}{\operatorname{\mathbb{V}}(X_{1})}X_{1}+\left(X_{2}-\frac{cov(X_{1},X_{2})}{\operatorname{\mathbb{V}}(X_{1})}X_{1}\right)$ , the assertion of the lemma follows immediately, since

[TABLE]

∎

We begin with the proof of Theorem 3.2.

Proof of Theorem 3.2.

Since $X_{i}$ satisfies Assumption 2.1, Theorem 3 in Kneip et al. (2016) implies that the assumptions of Theorem 1 in Kneip et al. (2016) are met. Since

[TABLE]

it follows from Theorem 1 in Kneip et al. (2016) that

[TABLE]

whenever $|\alpha-\alpha^{*}|>0$ , or $\sup_{r=1,\ldots,S}|\beta_{r}-\beta^{*}_{r}|>0$ , or $\sup_{r=S+1,\ldots,S^{*}}|\beta^{*}_{r}|>0$ .

Now suppose

[TABLE]

It then follows that $g(\alpha+\sum_{r=1}^{S}\beta_{r}X_{i}(\tau_{r}))$ and $g(\alpha^{*}+\sum_{r=1}^{S^{*}}\beta_{r}^{*}X_{i}(\tau_{r}))$ must be identical, i.e.,

[TABLE]

Since $g$ is invertible we then have

[TABLE]

if and only if

[TABLE]

But by (110) we have

[TABLE]

whenever $|\alpha-\alpha^{*}|>0$ , or $\sup_{r=1,\ldots,S}|\beta_{r}-\beta^{*}_{r}|>0$ , or $\sup_{r=S+1,\ldots,S^{*}}|\beta^{*}_{r}|>0$ , implying

[TABLE]

whenever $|\alpha-\alpha^{*}|>0$ , or $\sup_{r=1,\ldots,S}|\beta_{r}-\beta^{*}_{r}|>0$ , or $\sup_{r=S+1,\ldots,S^{*}}|\beta^{*}_{r}|>0$ , which proves the assertion of the theorem. ∎

The following Proposition D.2 is instrumental in deriving rates of convergence for the system of estimated score equations $\widehat{\operatorname{\mathbf{U}}}_{n}$ and their derivatives.

Proposition D.2.

Let $X_{i}=(X_{i}(t):t\in[a,b])$ , $i=1,\dots,n$ , be i.i.d. Gaussian processes with covariance function $\sigma(s,t)$ satisfying Assumption 2.1. Let $\operatorname{\mathbb{E}}(\varepsilon_{i}|X_{i})=0$ with $\operatorname{\mathbb{E}}(\varepsilon_{i}^{p}|X_{i})\leq M_{\varepsilon}<\infty$ for some even $p$ with $p>\frac{2}{\kappa}$ and let $\widehat{\tau}_{r}$ enjoy the property given by (6), i.e. $|\widehat{\tau}_{r}-\tau_{r}|=O_{P}(n^{-\frac{1}{\kappa}})$ . We then have for any bounded function $f:\mathbb{R}\to\mathbb{R}$ with $|f(x)|\leq M_{f}<\infty$ , any $t^{*}\in[a,b]$ , any linear predictor $\eta_{i}^{*}=\beta_{0}^{*}+\sum_{r=1}^{S^{*}}\beta_{r}^{*}X_{i}(t_{r}^{*})$ , where $t_{r}^{*}\in[a,b]$ , $\beta_{r}^{*}\in\mathbb{R}$ and $S^{*}$ are arbitrary, and any $r=1,\dots,S$ :

[TABLE]

The proof of this Proposition shares some arguments used in the proof of Lemma C.2.

Proof of Proposition D.2.

Before the different assertions are proven, note that if $(X_{1},X_{2})$ are bivariate Gaussian, then $cov\left(X_{2}-\frac{cov(X_{1},X_{2})}{\operatorname{\mathbb{V}}(X_{1})}X_{1},X_{1}\right)=0$ and $\left(X_{2}-\frac{cov(X_{1},X_{2})}{\operatorname{\mathbb{V}}(X_{1})}X_{1}\right)$ and $X_{1}$ are independent. Hence we additionally have $cov\left(X_{2}-\frac{cov(X_{1},X_{2})}{\operatorname{\mathbb{V}}(X_{1})}X_{1},f(X_{1})\right)=0$ and by Lemma D.2:

[TABLE]

Furthermore, it follows from Stein’s Lemma (\citeappendix{S1981}) that $\frac{cov(f(X_{1}),X_{1})}{\operatorname{\mathbb{V}}(X_{1})}=\operatorname{\mathbb{E}}(f^{\prime}(X_{1}))$ provided $f$ is differentiable and $\operatorname{\mathbb{E}}(|f^{\prime}(X_{1})|)<\infty$ (cf. Lemma 1 in \citeappendix{B2012a}).
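Stein's identity for a Gaussian $X_{1}$ can be checked numerically. The sketch below approximates the Gaussian expectations by the trapezoidal rule and compares both sides for $f=\tanh$ ; the choice of $f$, the quadrature settings, and the function names are assumptions made for this illustration only.

```python
import math


def gaussian_expectation(func, mean=0.0, sd=1.0, half_width=10.0, steps=20000):
    """Approximate E[func(X)] for X ~ N(mean, sd^2) by the trapezoidal rule
    on the interval mean +/- half_width * sd."""
    lower = mean - half_width * sd
    dx = 2.0 * half_width * sd / steps
    total = 0.0
    for k in range(steps + 1):
        x = lower + k * dx
        weight = 0.5 if k in (0, steps) else 1.0
        density = math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
        total += weight * func(x) * density
    return total * dx


def stein_gap(f, f_prime, mean, sd):
    """Return cov(f(X), X) / Var(X) - E[f'(X)], which should be close to zero."""
    covariance = (
        gaussian_expectation(lambda x: f(x) * x, mean, sd)
        - gaussian_expectation(f, mean, sd) * mean
    )
    return covariance / sd ** 2 - gaussian_expectation(f_prime, mean, sd)
```

For example, `stein_gap(math.tanh, lambda x: 1.0 - math.tanh(x) ** 2, 0.3, 1.5)` evaluates the difference of the two sides for a bounded, differentiable $f$.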

We are now equipped with the tools to prove the different assertions of the proposition. Assertions (111) and (116) are proven in Proposition D.1. The remaining assertions are proven in a similar manner: In order to prove Assertion (112), choose any $0<s$ sufficiently small and define for $q_{1},q_{2}\in[-1,1]$

[TABLE]

We then have $\mathbb{E}(\chi_{i}(q_{1},q_{2}))=0$ and it follows from some straightforward calculations, since $|f(\eta_{i}^{*})|\leq M_{f}$ , that there exists a constant $L_{1}<\infty$ such that for $m=2,3,\dots$ we have

[TABLE]

Corollary 1 in \citeappendix{vandeGeer2013} now guarantees that there exists a constant $0<L_{2}<\infty$ such that the Orlicz norm of $\frac{1}{\sqrt{ns^{\kappa}}}\sum_{i=1}^{n}(\chi_{i}(q_{1},q_{2}))$ can be bounded, i.e., we have:

[TABLE]

By (118) one may apply Theorem 2.2.4 of \citeappendix{vanderVaart1996}. The covering integral in this theorem can easily be seen to be finite and one can thus infer that there exists a constant $0<L_{3}<\infty$ such that

[TABLE]

For every $x>0$ , the Markov inequality then yields

[TABLE]

To improve readability, note that it follows from a Taylor expansion of $\frac{n}{6}(\sqrt{1+x/\sqrt{n}}-1)^{2}$ that there exists a constant $0<L_{4}<\infty$ such that for all $0<x\leq\sqrt{n}$ we have

[TABLE]

Now, note that it follows from Lemma D.2 that there exists a constant $|c_{0}|<\infty$ , not depending on $t^{*}$ , such that $\mathbb{E}(X_{i}(t^{*})f(\eta_{i}^{*}))=c_{0}\operatorname{\mathbb{E}}(X_{i}(t^{*})\eta_{i}^{*})$ for all $t^{*}\in[a,b]$ . Together with (78) we can therefore conclude that there exists a constant $0\leq L_{5}<\infty$ such that for all $q_{1}\in[-1,1]$ :

[TABLE]

Using (120) together with (119) we can conclude that for all $0<x\leq\sqrt{n}$ we have:

[TABLE]

Assertion (112) then follows immediately from (6).

By the boundedness of $f$ , the proof of (113) proceeds similarly, but one now has to bound

[TABLE]

For $X_{i}(t^{*})=\eta_{i}^{*}$ , Lemma D.2 together with (78) already implies that there exists a constant $L_{6}<\infty$ such that $|\mathbb{E}(X_{i}(t^{*})(X_{i}(\tau_{r}+sq_{1})-X_{i}(\tau_{r}))f(\eta_{i}^{*}))|\leq L_{6}s^{\min\{1,\kappa\}}$ . Let $X_{i}(t^{*})\neq\eta_{i}^{*}$ . Note that $(X_{i}(t^{*}),(X_{i}(\tau_{r}+sq_{1})-X_{i}(\tau_{r})),\eta_{i}^{*})$ are multivariate normal. Hence also the conditional distribution of $((X_{i}(\tau_{r}+sq_{1})-X_{i}(\tau_{r})),\eta_{i}^{*})$ given $X_{i}(t^{*})$ is multivariate normal. To ease the notation set $X_{1}=\eta_{i}^{*}$ , $X_{2}=(X_{i}(\tau_{r}+sq_{1})-X_{i}(\tau_{r}))$ and $X_{3}=X_{i}(t^{*})$ and denote by $\sigma_{jk}$ , $j,k\in\{1,2,3\}$ , their associated covariances and variances. We then have by conditional expectation together with an application of Lemma D.2

[TABLE]

Using (78) it is then easy to see that there exists a constant $0<L_{7}<\infty$ such that $|\sigma_{12}|\leq L_{7}s^{\min\{1,\kappa\}}$ , as well as $|\sigma_{23}|\leq L_{7}s^{\min\{1,\kappa\}}$ . On the other hand Assumption 2.1 implies that there exists a constant $0<L_{8}<\infty$ such that $|\sigma_{13}|\leq L_{8}$ . Note that $cov(f(X_{1}),X_{1}|X_{3})=\operatorname{\mathbb{E}}(f(X_{1})X_{1}|X_{3})-\operatorname{\mathbb{E}}(f(X_{1})|X_{3})\operatorname{\mathbb{E}}(X_{1}|X_{3})$ and $\operatorname{\mathbb{V}}(X_{1}|X_{3})=\sigma_{11}-\frac{\sigma_{13}^{2}}{\sigma_{33}}>0.$ Moreover note that if $f$ is assumed to be differentiable and $\operatorname{\mathbb{E}}(|f^{\prime}(X_{1})||X_{3})<\infty$ , it follows from Stein’s Lemma (\citeappendix{S1981}) that $cov(f(X_{1}),X_{1}|X_{3})/\operatorname{\mathbb{V}}(X_{1}|X_{3})$ can be substituted by $\operatorname{\mathbb{E}}(f^{\prime}(X_{1})|X_{3})$ .

Since $f$ is bounded it then follows immediately that for all linear predictors $\eta_{i}^{*}$ and all $t^{*}\in[a,b]$ there exists a constant $0<L_{9}<\infty$ such that for all $q_{1}\in[-1,1]$ and all sufficiently small $s$ and all $r=1,\dots,S$ we have:

[TABLE]

By (122) one can conclude, similarly to (121), that for all $0<x\leq\sqrt{n}$ and for some constant $0<L_{10}<\infty$

[TABLE]

Assertion (113) then follows again immediately from (6).

In order to show assertion $\eqref{lem:res7}$ we make use of the Orlicz norm $||X||_{p}$.

Choose some $p>\frac{2}{\kappa}=p_{\kappa}$ , and let $p$ be even. Note that $\operatorname{\mathbb{E}}((X_{i}(\tau_{r}+sq_{1})-X_{i}(\tau_{r}+sq_{2}))\varepsilon_{i}f(\eta_{i}^{*}))=0$ . For all sufficiently small $0<s$ and all $q_{1},q_{2}\in[-1,1]$ it is easy to show that there exists a constant $L_{11}<\infty$ such that

[TABLE]

We may conclude

[TABLE]

By assertion (123) one may apply Theorem 2.2.4 in van der Vaart and Wellner (1996). Our condition on $p$ ensures that the covering integral appearing in this theorem is finite. The maximal inequalities for empirical processes then imply:

[TABLE]

for some constant $L_{12}<\infty$ . At the same time, the Markov inequality implies

[TABLE]

Assertion (114) then follows from (6) and our conditions on $p$ . Moreover, assertion (115) follows from similar steps.

∎

The rest of the Appendix makes use of the notation introduced in Section 3.2. In what follows, it will, however, be convenient to set $\operatorname{\mathbf{X}}_{i}:=\operatorname{\mathbf{X}}_{i}(\operatorname{\boldsymbol{\tau}})=(1,X_{i}(\tau_{1}),\dots,X_{i}(\tau_{S}))$ and $\widehat{\operatorname{\mathbf{X}}}_{i}:=\operatorname{\mathbf{X}}_{i}(\widehat{\operatorname{\boldsymbol{\tau}}})=(1,X_{i}(\widehat{\tau}_{1}),\dots,X_{i}(\widehat{\tau}_{S}))$. The $j$th element of $\operatorname{\mathbf{X}}_{i}$ and $\widehat{\operatorname{\mathbf{X}}}_{i}$ is then denoted by $X_{ij}$ and $\widehat{X}_{ij}$, respectively.

For the following proofs we introduce some additional notation. Let $h(x)=g^{\prime}(x)/\sigma^{2}(g(x))$ and note that differentiating the estimation equation

[TABLE]

leads to

[TABLE]

Similarly, one obtains by replacing the estimates $\widehat{\tau}_{r}$ with their true counterparts $\tau_{r}$ :

[TABLE]

where

[TABLE]

and

[TABLE]

Now, let $\widehat{\eta}(\operatorname{\boldsymbol{\beta}})$ , $\widehat{\operatorname{\mathbf{X}}}$ and $y$ be generic copies of $\widehat{\eta}_{i}(\operatorname{\boldsymbol{\beta}})$ , $\widehat{\operatorname{\mathbf{X}}}_{i}$ and $y_{i}$ . We then have

[TABLE]

as well as

[TABLE]

In a similar manner $\operatorname{\mathbb{E}}(\operatorname{\mathbf{F}}(\operatorname{\boldsymbol{\beta}}))=\operatorname{\mathbb{E}}(n^{-1}\operatorname{\mathbf{F}}_{n}(\operatorname{\boldsymbol{\beta}}))$ and $\operatorname{\mathbb{E}}(\operatorname{\mathbf{R}}(\operatorname{\boldsymbol{\beta}}))=\operatorname{\mathbb{E}}(n^{-1}\operatorname{\mathbf{R}}_{n}(\operatorname{\boldsymbol{\beta}}))$ are defined.
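As a concrete illustration of the function $h$ (a sketch under an assumed link, not a case singled out in this appendix), consider the logistic model with mean function $g(x)=1/(1+e^{-x})$ and variance function $\sigma^{2}(\mu)=\mu(1-\mu)$. Since $g^{\prime}(x)=g(x)(1-g(x))$, one obtains

$$
h(x)=\frac{g^{\prime}(x)}{\sigma^{2}(g(x))}=\frac{g(x)(1-g(x))}{g(x)(1-g(x))}=1,
\qquad
\frac{g^{\prime}(x)^{2}}{\sigma^{2}(g(x))}=g(x)(1-g(x))\in\left(0,\tfrac{1}{4}\right],
$$

so that $h^{\prime}=h^{\prime\prime}=0$, and boundedness conditions of the type $|h^{\prime}(\cdot)|\leq M_{h}$ and $|h^{\prime\prime}(\cdot)|\leq M_{h}$ used in the proofs below hold trivially in this example.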

The next proposition is crucial, as it tells us that the estimated score function and its derivative are sufficiently close to their counterparts based on the true points of impact. Of particular importance are the facts that

[TABLE]

and

[TABLE]

which follow from this proposition.

Proposition D.3.

Let $X_{i}=(X_{i}(t):t\in[a,b])$ , $i=1,...,n$ be i.i.d. Gaussian processes. Under Assumption 3.1 and under the results of Proposition D.2 we have

[TABLE]

Additionally, for all $\operatorname{\boldsymbol{\beta}}\in\mathbf{R}^{S+1}$ :

[TABLE]

Moreover, we have as $n\to\infty$

[TABLE]

Particularly,

[TABLE]

Proof of Proposition D.3.

To ease notation we use $\operatorname{\boldsymbol{\beta}}_{0}=(\beta_{0}^{(0)},\beta_{1}^{(0)},\dots,\beta_{S}^{(0)})^{T}$ to denote the true parameter vector. For instance, the intercept is given by $\beta_{0}^{(0)}$, while $\beta_{r}^{(0)}$ is the coefficient for the $r$th point of impact. Similarly, we denote the entries of $\operatorname{\boldsymbol{\beta}}$ by $(\beta_{0},\dots,\beta_{S})$. Write

[TABLE]

then $\mathbf{Rest}_{n}(\operatorname{\boldsymbol{\beta}})$ can be decomposed into two parts:

[TABLE]

The first summand $\mathbf{Rest}_{1}(\operatorname{\boldsymbol{\beta}})$ is given by:

[TABLE]

The $j$ th equation of $\mathbf{Rest}_{1}(\operatorname{\boldsymbol{\beta}})$ can be written as

[TABLE]

With $h(x)=g^{\prime}(x)/\sigma^{2}(g(x))$, a Taylor expansion implies the existence of some $\xi_{i,1}$ between $\widehat{\eta}_{i}(\operatorname{\boldsymbol{\beta}})$ and $\eta_{i}(\operatorname{\boldsymbol{\beta}})$ such that for $\operatorname{\boldsymbol{\beta}}=\operatorname{\boldsymbol{\beta}}_{0}$

[TABLE]

Since $|h^{\prime}(\cdot)|\leq M_{h}$ and $|h^{\prime\prime}(\cdot)|\leq M_{h}$, $R_{j,1,a}(\operatorname{\boldsymbol{\beta}}_{0})=O_{P}(n^{-1})$ for $j=1,\dots,S+1$ follows immediately from (114) and (115) together with the Cauchy-Schwarz inequality and (116). At the same time it follows from similar arguments that for all $j=1,\dots,S+1$ we have $R_{j,1,b}(\operatorname{\boldsymbol{\beta}}_{0})=\frac{1}{n}\sum_{i=1}^{n}(\widehat{X}_{ij}-X_{ij})h(\widehat{\eta}_{i}(\operatorname{\boldsymbol{\beta}}_{0}))\varepsilon_{i}=O_{P}(n^{-1})$. The above arguments then imply:

[TABLE]

The $j$th equation of $\mathbf{Rest}_{2}(\operatorname{\boldsymbol{\beta}})$ can be written as $Rest_{j,2}(\operatorname{\boldsymbol{\beta}})=\frac{1}{n}\sum_{i=1}^{n}h(\widehat{\eta}_{i}(\operatorname{\boldsymbol{\beta}}))\widehat{X}_{ij}(g(\eta_{i}(\operatorname{\boldsymbol{\beta}}))-g(\widehat{\eta}_{i}(\operatorname{\boldsymbol{\beta}})))$. Taylor expansions together with assertions (112) and (113), as well as the Cauchy-Schwarz inequality together with (116), can now be used to conclude that for all $\operatorname{\boldsymbol{\beta}}$ and $j=1,\dots,S+1$ we have

[TABLE]

Assertion (124) then follows from (135), (136) and (137). Note that our assumptions in particular imply that $\mathbf{Rest}_{1}(\operatorname{\boldsymbol{\beta}}_{0})$ and $\mathbf{Rest}_{2}(\operatorname{\boldsymbol{\beta}}_{0})$ are uniformly integrable. In addition to $\eqref{eq:thQML1}$, we thus have $\operatorname{\mathbb{E}}(\mathbf{Rest}_{n}(\operatorname{\boldsymbol{\beta}}_{0}))\to\operatorname{\boldsymbol{0}}$, implying (132), that is, $\operatorname{\mathbb{E}}(\widehat{\operatorname{\mathbf{U}}}_{n}(\operatorname{\boldsymbol{\beta}}_{0})/n)\to\operatorname{\boldsymbol{0}}$, since $\operatorname{\mathbb{E}}(\operatorname{\mathbf{U}}_{n}(\operatorname{\boldsymbol{\beta}}_{0})/n)=0$.

In order to prove assertion (125), suppose $\operatorname{\boldsymbol{\beta}}\neq\operatorname{\boldsymbol{\beta}}_{0}$ and note that we still have (137). However, $\mathbf{Rest}_{1}(\operatorname{\boldsymbol{\beta}})$ needs a closer investigation. Its $j$th row can be written as

[TABLE]

To obtain (125) it is sufficient to use rather conservative bounds for each of the terms that appear. For instance, another Taylor expansion together with the Cauchy-Schwarz inequality and (111) now yields

[TABLE]

Meanwhile, the Cauchy-Schwarz inequality together with (111) yields

[TABLE]

It follows from additional Taylor expansions that there exists a $\xi_{i,2}$ between $\widehat{\eta}_{i}(\operatorname{\boldsymbol{\beta}})$ and $\eta_{i}(\operatorname{\boldsymbol{\beta}})$ as well as some $\xi_{i,3}$ between $\eta_{i}(\operatorname{\boldsymbol{\beta}})$ and $\eta_{i}(\operatorname{\boldsymbol{\beta}}_{0})$ such that:

[TABLE]

Again, with the help of the Cauchy-Schwarz inequality together with (111) it can immediately be seen that

[TABLE]

Similarly, one may show that

[TABLE]

Assertion (125) then follows from (137) and (138)–(141). (128) follows again from a closer investigation of the existence and boundedness of moments of the involved remainder terms, leading to (137).

In order to prove (126), note that the $(S+1)\times(S+1)$ matrix $\widehat{\operatorname{\mathbf{F}}}(\operatorname{\boldsymbol{\beta}})=\widehat{\operatorname{\mathbf{D}}}^{T}(\operatorname{\boldsymbol{\beta}})\widehat{\operatorname{\mathbf{V}}}^{-1}(\operatorname{\boldsymbol{\beta}})\widehat{\operatorname{\mathbf{D}}}(\operatorname{\boldsymbol{\beta}})$ may be written as

[TABLE]

$\mathbf{Rest}_{n}^{(F)}(\operatorname{\boldsymbol{\beta}})$ has a typical element $Rest_{jk}^{(F)}(\operatorname{\boldsymbol{\beta}})$ which is given by

[TABLE]

$Rest_{jk}^{(F)}(\operatorname{\boldsymbol{\beta}})$ consists of the sum of four terms. We begin with (142).

Define $h_{1}(x)=g^{\prime}(x)^{2}/\sigma^{2}(g(x))$ and note that $|h_{1}(x)|\leq M_{h_{1}}$ as well as $|h_{1}^{\prime}(x)|\leq M_{h_{1}}$ for some constant $M_{h_{1}}<\infty$ . With the help of the Cauchy-Schwarz inequality and $\eqref{lem:res0}$ , it follows from another Taylor expansion that there exists a $\xi_{i,4}$ between $\widehat{\eta}_{i}(\operatorname{\boldsymbol{\beta}})$ and $\eta_{i}(\operatorname{\boldsymbol{\beta}})$ such that:

[TABLE]

On the other hand, the Cauchy-Schwarz inequality together with the boundedness of $|h_{1}(x)|$ and (111) implies that each of the other terms (143)–(145) is $O_{P}(n^{-1/2})$. Assertion (126) is then an immediate consequence. Moreover, since $h_{1}(x)$ is bounded, it can immediately be seen that $Rest_{jk}^{(F)}(\operatorname{\boldsymbol{\beta}})$ is uniformly integrable, which additionally yields $\operatorname{\mathbb{E}}(Rest_{jk}^{(F)}(\operatorname{\boldsymbol{\beta}}))\to 0$. Assertion (129) follows immediately.

In order to show (127), note that $\widehat{\operatorname{\mathbf{R}}}_{n}(\operatorname{\boldsymbol{\beta}})/n={\operatorname{\mathbf{R}}}_{n}(\operatorname{\boldsymbol{\beta}})/n+\mathbf{Rest}_{n}^{(R)}(\operatorname{\boldsymbol{\beta}})$. A typical entry $Rest^{(R)}_{jk}(\operatorname{\boldsymbol{\beta}})$ of $\mathbf{Rest}_{n}^{(R)}(\operatorname{\boldsymbol{\beta}})$ reads as

[TABLE]

We will first show

[TABLE]

For $\operatorname{\boldsymbol{\beta}}=\operatorname{\boldsymbol{\beta}}_{0}$, since $|h^{\prime\prime}(\cdot)|\leq M_{h}$, a Taylor expansion together with the Cauchy-Schwarz inequality and (111) yields $\frac{1}{n}\sum_{i=1}^{n}(h^{\prime}(\widehat{\eta}_{i}(\operatorname{\boldsymbol{\beta}}_{0}))-h^{\prime}(\eta_{i}(\operatorname{\boldsymbol{\beta}}_{0})))X_{ij}X_{ik}\varepsilon_{i}=O_{P}(n^{-\frac{1}{2}})$. Similarly, each of the terms (147)–(149) is $O_{P}(n^{-\frac{1}{2}})$. At the same time, another Taylor expansion of (150), together with the Cauchy-Schwarz inequality and (111), yields for some $\xi_{i,5}$ between $\widehat{\eta}_{i}(\operatorname{\boldsymbol{\beta}})$ and $\eta_{i}(\operatorname{\boldsymbol{\beta}})$:

[TABLE]

We may conclude that

[TABLE]

Moreover, our assumptions in particular imply that besides $Rest^{(R)}_{jk}(\operatorname{\boldsymbol{\beta}}_{0})/n=O_{P}(n^{-\frac{1}{2}})$ we have $\operatorname{\mathbb{E}}(Rest^{(R)}_{jk}(\operatorname{\boldsymbol{\beta}}_{0}))\to 0,$ proving assertions (151) and (131).

Now suppose $\operatorname{\boldsymbol{\beta}}\neq\operatorname{\boldsymbol{\beta}}_{0}$ and take another look at (146):

[TABLE]

Similar arguments as before, together with $\operatorname{\mathbb{E}}(\varepsilon_{i}^{4})<\infty$ , can now be used to show that

[TABLE]

A Taylor expansion of $g(\eta_{i}(\operatorname{\boldsymbol{\beta}}))$ leads for some $\xi_{i,6}$ between $\eta_{i}(\operatorname{\boldsymbol{\beta}})$ and $\eta_{i}(\operatorname{\boldsymbol{\beta}}_{0})$ to

[TABLE]

Another Taylor expansion of $h^{\prime}(\widehat{\eta}_{i}(\operatorname{\boldsymbol{\beta}}))$ together with the Cauchy-Schwarz inequality and the boundedness of $|g^{\prime}(x)|$ and $|h^{\prime\prime}(x)|$ leads for some $\xi_{i,7}$ between $\widehat{\eta}_{i}(\operatorname{\boldsymbol{\beta}})$ and $\eta_{i}(\operatorname{\boldsymbol{\beta}})$ to

[TABLE]

With similar arguments $\eqref{eq:h5}$ and (147) are, for all $\operatorname{\boldsymbol{\beta}}$ , $O_{P}(n^{-\frac{1}{2}})$ .

Considerations for (148)–(149) are parallel to the case $\eqref{eq:h2}$; assertion (127) then follows immediately. Assertion (130) follows again from a closer investigation of the existence and boundedness of the moments of the remainder terms used in the derivation of (127). ∎

The proof of Theorem 3.3 consists roughly of two steps. In a first step asymptotic existence and consistency of our estimator $\widehat{\operatorname{\boldsymbol{\beta}}}$ is developed. In a second step we can then make use of the usual Taylor expansion of the estimation equation $\widehat{\operatorname{\mathbf{U}}}_{n}(\operatorname{\boldsymbol{\beta}})$ . With the help of Proposition D.3 asymptotic normality of our estimator will follow.

Proof of Theorem 3.3.

For a $q_{1}\times q_{2}$ matrix $\mathbf{A}$ let $||\mathbf{A}||=\sqrt{\sum_{i=1}^{q_{1}}\sum_{j=1}^{q_{2}}a_{ij}^{2}}$ denote its Frobenius norm. Moreover, we denote by $\mathbf{A}^{1/2}$ ($\mathbf{A}^{T/2}$) the left (the corresponding right) square root of a positive definite matrix $\mathbf{A}$.
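To make this notation concrete, the sketch below computes a Frobenius norm and one possible left square root for a small example. It assumes the Cholesky factor as the left square root, so that $\mathbf{A}=\mathbf{A}^{1/2}\mathbf{A}^{T/2}$ with $\mathbf{A}^{T/2}=(\mathbf{A}^{1/2})^{T}$; the text above only speaks of a left and a corresponding right square root, so this choice, the helper names and the matrix entries are illustrative assumptions.

```python
import math

def frobenius_norm(a):
    """Square root of the sum of squared entries of the matrix a."""
    return math.sqrt(sum(x * x for row in a for x in row))

def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive definite a."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(a[i][i] - s)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return lower

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

# Hypothetical positive definite matrix.
A = [[4.0, 2.0], [2.0, 3.0]]
left = cholesky(A)          # plays the role of A^{1/2}
right = transpose(left)     # plays the role of A^{T/2}
recon = matmul(left, right) # should reproduce A
print(frobenius_norm(A), left, recon)
```

Any other factorization with $\mathbf{A}=\mathbf{A}^{1/2}\mathbf{A}^{T/2}$ would serve the same purpose; the Cholesky factor is merely convenient to compute.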

The proof generalizes the arguments used in Corollary 3 and Theorem 1 in Fahrmeir and Kaufmann (1985). For $\delta_{1}>0$ define the neighborhoods

[TABLE]

and remember that with $h_{1}(x)=g^{\prime}(x)^{2}/\sigma^{2}(g(x))$ we have:

[TABLE]

The $(j,k)$-element of this random matrix is given by $\frac{1}{n}\sum_{i=1}^{n}h_{1}(\widehat{\eta_{i}}(\operatorname{\boldsymbol{\beta}}))\widehat{X}_{ij}\widehat{X}_{ik}$ and constitutes a triangular array of row-wise independent and identically distributed random variables. Let $\widehat{\eta}(\operatorname{\boldsymbol{\beta}})$, $\widehat{\operatorname{\mathbf{X}}}$ and $\varepsilon$ be generic copies of $\widehat{\eta_{i}}(\operatorname{\boldsymbol{\beta}})$, $\widehat{\operatorname{\mathbf{X}}}_{i}$ and $\varepsilon_{i}$. Since $h_{1}$ is bounded it is then easy to see that for any compact neighborhood $N$ around $\operatorname{\boldsymbol{\beta}}_{0}$ we have for all $p\geq 1$:

[TABLE]

for some constant $M_{1}<\infty$ , not depending on $n$ . On the other hand the $(j,k)$ -element of $\widehat{\operatorname{\mathbf{R}}}_{n}(\operatorname{\boldsymbol{\beta}})/n$ can be written as

[TABLE]

Using the boundedness of $g^{\prime}$ and $h^{\prime}$ it follows from a Taylor expansion that for all $p\geq 1$ :

[TABLE]

for some constant $M_{2}<\infty$, not depending on $n$. Meanwhile, the Cauchy-Schwarz inequality together with the assumption $\operatorname{\mathbb{E}}(\varepsilon^{4})<\infty$ implies that for $1\leq p\leq 2$:

[TABLE]

for some constant $M_{3}<\infty$ , not depending on $n$ . By (152), (153) and (154) a uniform law of large numbers for triangular arrays leads to

[TABLE]

Moreover, by (152), $\widehat{\operatorname{\mathbf{F}}}_{n}(\operatorname{\boldsymbol{\beta}}_{0})/n$ converges a.s. to $\operatorname{\mathbb{E}}(\widehat{\operatorname{\mathbf{F}}}(\operatorname{\boldsymbol{\beta}}_{0}))$ , implying $\lambda_{min}\widehat{\operatorname{\mathbf{F}}}_{n}(\operatorname{\boldsymbol{\beta}}_{0})\to\infty$ a.s., where $\lambda_{min}\mathbf{A}$ denotes the smallest eigenvalue of a matrix $\mathbf{A}$ . Note that as a direct consequence the neighborhoods $N_{n}(\delta_{1})$ shrink (a.s.) to $\operatorname{\boldsymbol{\beta}}_{0}$ for all $\delta_{1}>0$ . On the other hand, since by (131), $\operatorname{\mathbb{E}}(\widehat{\operatorname{\mathbf{R}}}(\operatorname{\boldsymbol{\beta}}_{0}))\to 0$ and $\operatorname{\mathbb{E}}(\widehat{\operatorname{\mathbf{R}}}(\operatorname{\boldsymbol{\beta}}))$ is continuous in $\operatorname{\boldsymbol{\beta}}$ we have for all $\epsilon>0$ , with probability converging to $1$ ,

[TABLE]

if $\operatorname{\boldsymbol{\beta}}$ is sufficiently close to $\operatorname{\boldsymbol{\beta}}_{0}$ .

The usual decomposition then yields for all $\epsilon>0$ , with probability converging to $1$ :

[TABLE]

if $\operatorname{\boldsymbol{\beta}}$ is sufficiently close to $\operatorname{\boldsymbol{\beta}}_{0}$. Similarly to the proof of Corollary 3 in Fahrmeir and Kaufmann (1985), we may infer from this inequality that for all $\delta_{1}>0$ we have

[TABLE]

where $\widehat{\boldsymbol{\mathcal{V}}}_{n}(\operatorname{\boldsymbol{\beta}})=-\widehat{\operatorname{\mathbf{F}}}_{n}^{-1/2}(\operatorname{\boldsymbol{\beta}}_{0})\widehat{\operatorname{\mathbf{H}}}_{n}(\operatorname{\boldsymbol{\beta}})\widehat{\operatorname{\mathbf{F}}}_{n}^{-T/2}(\operatorname{\boldsymbol{\beta}}_{0})$ and $\operatorname{\mathbf{I}}_{p}$ denotes the $p\times p$ identity matrix. Again, following the arguments in Fahrmeir and Kaufmann (1985, cf. Section 4.1), this in particular implies that for all $\delta_{1}>0$ we have

[TABLE]

for some constant $c>0$ , $c$ independent of $\delta_{1}$ .

Let $\widehat{Q}_{n}(\operatorname{\boldsymbol{\beta}})$ be the quasi-likelihood function evaluated at the points of impact estimates $\widehat{\tau}_{r}$ . We aim to show that for any $\zeta>0$ there exists a $\delta_{1}>0$ such that

[TABLE]

for all sufficiently large $n$. Note that the event $\widehat{Q}_{n}(\operatorname{\boldsymbol{\beta}})-\widehat{Q}_{n}(\operatorname{\boldsymbol{\beta}}_{0})<0\text{ for all }\operatorname{\boldsymbol{\beta}}\in\partial N_{n}(\delta_{1})$ implies that there is a maximum inside of $N_{n}(\delta_{1})$. Moreover, since $\widehat{\operatorname{\mathbf{R}}}_{n}(\operatorname{\boldsymbol{\beta}})/n$ is asymptotically negligible in a neighborhood around $\operatorname{\boldsymbol{\beta}}_{0}$, and at the same time $\widehat{\operatorname{\mathbf{F}}}_{n}(\operatorname{\boldsymbol{\beta}})/n$ converges in probability to a positive definite matrix, the maximum will, with probability converging to 1, be uniquely determined as a zero of the score function $\widehat{\operatorname{\mathbf{U}}}_{n}(\operatorname{\boldsymbol{\beta}})$. Assertion (158) then in particular implies that $P(\widehat{\operatorname{\mathbf{U}}}_{n}(\widehat{\operatorname{\boldsymbol{\beta}}})=0)\to 1$ and, together with the observation that the neighborhoods $N_{n}(\delta_{1})$ shrink (a.s.) to $\operatorname{\boldsymbol{\beta}}_{0}$, it implies consistency of our estimator, i.e. $\widehat{\operatorname{\boldsymbol{\beta}}}\stackrel{{\scriptstyle p}}{{\to}}\operatorname{\boldsymbol{\beta}}_{0}$.
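The characterization of the estimator as a zero of the score function suggests how such a root can be computed in practice. The sketch below uses Fisher scoring, a standard iteration for quasi-likelihood equations; the text above does not prescribe an algorithm, so this is an assumption. It further assumes a logistic link (for which $h\equiv 1$ and $h_{1}(x)=g(x)(1-g(x))$), a single point of impact, and simulated covariate values standing in for $X_{i}(\widehat{\tau}_{1})$. The names `score_and_fisher` and `fisher_scoring` are hypothetical.

```python
import math
import random

def g(x):
    """Logistic mean function (illustrative choice of link)."""
    return 1.0 / (1.0 + math.exp(-x))

def score_and_fisher(beta, xs, ys):
    """Return the score U_n(beta) and the matrix F_n(beta) for design rows (1, x)."""
    u = [0.0, 0.0]
    f = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in zip(xs, ys):
        row = (1.0, x)
        mu = g(beta[0] + beta[1] * x)
        resid = y - mu           # h(eta) = 1 for the logistic link
        w = mu * (1.0 - mu)      # h_1(eta) = g'(eta)^2 / sigma^2(g(eta))
        for j in range(2):
            u[j] += row[j] * resid
            for k in range(2):
                f[j][k] += w * row[j] * row[k]
    return u, f

def fisher_scoring(xs, ys, tol=1e-10, max_iter=50):
    """Iterate beta <- beta + F_n(beta)^{-1} U_n(beta) until the step is tiny."""
    beta = [0.0, 0.0]
    for _ in range(max_iter):
        u, f = score_and_fisher(beta, xs, ys)
        det = f[0][0] * f[1][1] - f[0][1] * f[1][0]
        step = [(f[1][1] * u[0] - f[0][1] * u[1]) / det,
                (f[0][0] * u[1] - f[1][0] * u[0]) / det]
        beta = [beta[0] + step[0], beta[1] + step[1]]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Simulated data: xs stands in for the predictor evaluated at an estimated point of impact.
rng = random.Random(0)
beta_true = (-0.5, 1.0)
xs = [rng.gauss(0.0, 1.0) for _ in range(2000)]
ys = [1.0 if rng.random() < g(beta_true[0] + beta_true[1] * x) else 0.0 for x in xs]

beta_hat = fisher_scoring(xs, ys)
u_hat, _ = score_and_fisher(beta_hat, xs, ys)
print(beta_hat, u_hat)
```

At convergence the score evaluated at `beta_hat` is numerically zero, mirroring the statement that the maximizer is determined as a root of the score equation.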

A Taylor expansion yields, with $\boldsymbol{\lambda}=\widehat{\operatorname{\mathbf{F}}}_{n}^{T/2}(\operatorname{\boldsymbol{\beta}}_{0})(\operatorname{\boldsymbol{\beta}}-\operatorname{\boldsymbol{\beta}}_{0})/\delta_{1}$ , for some $\widetilde{\operatorname{\boldsymbol{\beta}}}$ on the line segment between $\operatorname{\boldsymbol{\beta}}$ and $\operatorname{\boldsymbol{\beta}}_{0}$ :

[TABLE]

Using the spectral norm for the next few lines, one may argue similarly to (3.9) in Fahrmeir and Kaufmann (1985) that it suffices to show that for any $\zeta>0$ we have

[TABLE]

Note that (157) implies that with probability converging to one we have

[TABLE]

Hence, with probability converging to one:

[TABLE]

At the same time (124) and (126) can be used to derive

[TABLE]

By the continuous mapping theorem we then have for all $\epsilon>0$ with probability converging to $1$

[TABLE]

Since $\operatorname{\mathbb{E}}(||\operatorname{\mathbf{F}}_{n}^{-1/2}(\operatorname{\boldsymbol{\beta}}_{0})\operatorname{\mathbf{U}}_{n}(\operatorname{\boldsymbol{\beta}}_{0})||^{2})=p$ , we may conclude from (159) that with probability converging to $1$ we have for all sufficiently large $n$ :

[TABLE]

yielding (158) for $\delta_{1}^{2}=8p/(c^{2}\zeta)$ . Asymptotic existence and consistency of our estimator are immediate consequences.

Remember that we have

[TABLE]

Now, a Taylor expansion of $\widehat{\operatorname{\mathbf{U}}}_{n}(\widehat{\operatorname{\boldsymbol{\beta}}})$ around $\operatorname{\boldsymbol{\beta}}_{0}$ yields for some $\widetilde{\operatorname{\boldsymbol{\beta}}}$ between $\widehat{\operatorname{\boldsymbol{\beta}}}$ and $\operatorname{\boldsymbol{\beta}}_{0}$ (note that $\widetilde{\operatorname{\boldsymbol{\beta}}}$ obviously differs from element to element):

[TABLE]

With some straightforward calculations this leads to

[TABLE]

By (126) and (127) in Proposition D.3 we have

[TABLE]

But since $h^{\prime}$ is bounded we have for all $\operatorname{\boldsymbol{\beta}}\in\mathbf{R}^{S+1}$

[TABLE]

implying $(\operatorname{\mathbf{H}}_{n}(\operatorname{\boldsymbol{\beta}}_{0})+\operatorname{\mathbf{F}}_{n}(\operatorname{\boldsymbol{\beta}}_{0}))/n=O_{P}(n^{-\frac{1}{2}})$ and hence also

[TABLE]

By using (155) and (156) we can conclude that for any compact neighborhood $N$ around $\operatorname{\boldsymbol{\beta}}_{0}$ :

[TABLE]

Obviously, $\widetilde{\operatorname{\boldsymbol{\beta}}}$ is consistent for $\operatorname{\boldsymbol{\beta}}_{0}$ , since $\widehat{\operatorname{\boldsymbol{\beta}}}$ is consistent for $\operatorname{\boldsymbol{\beta}}_{0}$ . We may conclude that $\widetilde{\operatorname{\boldsymbol{\beta}}}$ will be in some compact neighborhood $N$ around $\operatorname{\boldsymbol{\beta}}_{0}$ with probability converging to $1$ . Moreover, since $\operatorname{\mathbb{E}}(\widehat{\operatorname{\mathbf{H}}}(\operatorname{\boldsymbol{\beta}}))$ is continuous in $\operatorname{\boldsymbol{\beta}}$ , (161) then implies that additionally we have

[TABLE]

The above arguments can then be used to show that

[TABLE]

Hence it also holds that

[TABLE]

The asymptotically dominant term in (160) is then seen to be

[TABLE]

It is easy to see that our assumptions on $h(x)=g^{\prime}(x)/\sigma^{2}(g(x))$ imply that $\operatorname{\mathbb{E}}(||\operatorname{\mathbf{F}}_{n}(\operatorname{\boldsymbol{\beta}}_{0})/n||^{2})=O(\frac{1}{n})$ . Together with (126) we thus have $\widehat{\operatorname{\mathbf{F}}}_{n}(\operatorname{\boldsymbol{\beta}}_{0})/n=\operatorname{\mathbf{F}}_{n}(\operatorname{\boldsymbol{\beta}}_{0})/n+O_{P}(n^{-\frac{1}{2}})=\operatorname{\mathbb{E}}(\operatorname{\mathbf{F}}(\operatorname{\boldsymbol{\beta}}_{0}))+O_{P}(n^{-\frac{1}{2}})$ as well as $(\widehat{\operatorname{\mathbf{F}}}_{n}(\operatorname{\boldsymbol{\beta}}_{0})/n)^{-1}=(\operatorname{\mathbb{E}}(\operatorname{\mathbf{F}}(\operatorname{\boldsymbol{\beta}}_{0})))^{-1}+O_{P}(n^{-\frac{1}{2}})$ .

On the other hand, the Lindeberg-Lévy central limit theorem implies that $\frac{1}{\sqrt{n}}U(\operatorname{\boldsymbol{\beta}}_{0})\stackrel{{\scriptstyle d}}{{\to}}N(\operatorname{\boldsymbol{0}},\operatorname{\mathbb{E}}(\operatorname{\mathbf{F}}(\operatorname{\boldsymbol{\beta}}_{0})))$ . Together with (124) we then obtain

[TABLE]

which proves the theorem. ∎
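For orientation, the final step of the proof can be summarized heuristically as follows. This is a sketch assembled from the statements above, with $\approx$ denoting equality up to $o_{P}(1)$ terms; it does not replace the formal displays of the proof.

$$
\begin{aligned}
\sqrt{n}\,(\widehat{\operatorname{\boldsymbol{\beta}}}-\operatorname{\boldsymbol{\beta}}_{0})
&\approx\left(\operatorname{\mathbf{F}}_{n}(\operatorname{\boldsymbol{\beta}}_{0})/n\right)^{-1}\frac{1}{\sqrt{n}}\operatorname{\mathbf{U}}_{n}(\operatorname{\boldsymbol{\beta}}_{0})\\
&\stackrel{d}{\to}N\!\left(\operatorname{\boldsymbol{0}},\,\operatorname{\mathbb{E}}(\operatorname{\mathbf{F}}(\operatorname{\boldsymbol{\beta}}_{0}))^{-1}\operatorname{\mathbb{E}}(\operatorname{\mathbf{F}}(\operatorname{\boldsymbol{\beta}}_{0}))\operatorname{\mathbb{E}}(\operatorname{\mathbf{F}}(\operatorname{\boldsymbol{\beta}}_{0}))^{-1}\right)
=N\!\left(\operatorname{\boldsymbol{0}},\,\operatorname{\mathbb{E}}(\operatorname{\mathbf{F}}(\operatorname{\boldsymbol{\beta}}_{0}))^{-1}\right).
\end{aligned}
$$

The sandwich form collapses because the limiting covariance of the normalized score coincides with $\operatorname{\mathbb{E}}(\operatorname{\mathbf{F}}(\operatorname{\boldsymbol{\beta}}_{0}))$.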

Corollary D.1.

Under the assumptions of Section 3.2, for any compact neighborhood $N$ around $\operatorname{\boldsymbol{\beta}}_{0}$ we have

[TABLE]

Proof of Corollary D.1.

The proofs of Assertions (164)–(166) are very similar. We begin with the proof of Assertion (164). Using again generic copies of $\widehat{\eta}_{i}$, $\widehat{\operatorname{\mathbf{X}}}_{i}$ and $y_{i}$ we have with $h(x)=g^{\prime}(x)/\sigma^{2}(g(x))$:

[TABLE]

The $j$ -th equation of $\widehat{\operatorname{\mathbf{U}}}(\operatorname{\boldsymbol{\beta}})$ can be rewritten as

[TABLE]

Choose an arbitrary compact neighborhood $N$ around $\operatorname{\boldsymbol{\beta}}_{0}$. Since $|h(\cdot)|\leq M_{h}$, $\operatorname{\mathbb{E}}(\varepsilon^{4})<\infty$ and $|g^{\prime}(\cdot)|<M_{g}$, it follows from a Taylor expansion that for $1\leq p\leq 2$ we have

[TABLE]

for a constant $0\leq M_{1,1}<\infty$ not depending on $n$ . By (168) we can apply a uniform law of large numbers for triangular arrays to conclude that

[TABLE]

Similar considerations lead to

[TABLE]

By the usual decomposition we have

[TABLE]

Assertion (164) then follows immediately from (169), (170), if we can show that $\operatorname{\mathbb{E}}(\widehat{\operatorname{\mathbf{U}}}(\operatorname{\boldsymbol{\beta}}))$ converges uniformly to $\operatorname{\mathbb{E}}(\operatorname{\mathbf{U}}(\operatorname{\boldsymbol{\beta}}))$ and not only pointwise as given in (128).

It is well known that pointwise convergence of a sequence of functions $f_{n}$ on a compact set $N$ can be extended to uniform convergence over $N$ , if $f_{n}$ is an equicontinuous sequence. Remember that a sufficient condition for equicontinuity is that there exists a common Lipschitz constant. We aim to show that there exists a constant $L<\infty$ , where $L$ does not depend on $n$ , such that for all $\operatorname{\boldsymbol{\beta}}$ and $\widetilde{\operatorname{\boldsymbol{\beta}}}$ in $N$ we have $||\operatorname{\mathbb{E}}(\widehat{\operatorname{\mathbf{U}}}(\operatorname{\boldsymbol{\beta}}))-\operatorname{\mathbb{E}}(\widehat{\operatorname{\mathbf{U}}}(\widetilde{\operatorname{\boldsymbol{\beta}}}))||\leq L||\operatorname{\boldsymbol{\beta}}-\widetilde{\operatorname{\boldsymbol{\beta}}}||.$ Remember that the $j$ th equation of $\operatorname{\mathbb{E}}(\widehat{\operatorname{\mathbf{U}}}(\operatorname{\boldsymbol{\beta}}))$ is given by $\operatorname{\mathbb{E}}(h(\widehat{\eta}(\operatorname{\boldsymbol{\beta}}))\widehat{X}_{j}(y-g(\widehat{\eta}(\operatorname{\boldsymbol{\beta}}))))$ . Note that

[TABLE]

Since for a $J\times K$ matrix $A$ we have $||A||=\sqrt{\sum_{j,k}a_{jk}^{2}}\leq\sum_{j,k}|a_{jk}|$, and since $h$, $h^{\prime}$ and $g^{\prime}$ are bounded and $N$ is compact, our assumptions on $X$ together with (172) in particular imply that there exists a constant $L<\infty$, independent of $n$, such that for all $\operatorname{\boldsymbol{\beta}}$ and $\widetilde{\operatorname{\boldsymbol{\beta}}}\in N$

[TABLE]

Assertion (164) then follows from (171) together with (169), (170), (128) and (173).
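The extension from pointwise to uniform convergence invoked above can be spelled out as a standard covering argument. This is a sketch; the symbols $f$, $\operatorname{\boldsymbol{\beta}}_{k}$ and $m$ are generic and introduced here only for illustration. Suppose $f_{n}\to f$ pointwise on the compact set $N$ and $||f_{n}(\operatorname{\boldsymbol{\beta}})-f_{n}(\widetilde{\operatorname{\boldsymbol{\beta}}})||\leq L||\operatorname{\boldsymbol{\beta}}-\widetilde{\operatorname{\boldsymbol{\beta}}}||$ for all $n$, with $0<L<\infty$. Then $f$ inherits the same Lipschitz constant, and for any $\epsilon>0$ there are finitely many points $\operatorname{\boldsymbol{\beta}}_{1},\dots,\operatorname{\boldsymbol{\beta}}_{m}\in N$ such that every $\operatorname{\boldsymbol{\beta}}\in N$ satisfies $||\operatorname{\boldsymbol{\beta}}-\operatorname{\boldsymbol{\beta}}_{k}||\leq\epsilon/(3L)$ for some $k$. Hence

$$
\sup_{\operatorname{\boldsymbol{\beta}}\in N}||f_{n}(\operatorname{\boldsymbol{\beta}})-f(\operatorname{\boldsymbol{\beta}})||
\leq\max_{1\leq k\leq m}||f_{n}(\operatorname{\boldsymbol{\beta}}_{k})-f(\operatorname{\boldsymbol{\beta}}_{k})||+2L\cdot\frac{\epsilon}{3L}
\leq\epsilon
$$

for all sufficiently large $n$, since the maximum over finitely many points tends to zero and is therefore eventually below $\epsilon/3$.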

In order to prove Assertion (165) we can use the decomposition

[TABLE]

Let $h_{1}(x)=g^{\prime}(x)^{2}/\sigma^{2}(g(x))$ and remember that $|h_{1}(\cdot)|\leq M_{h_{1}}$ for some constant $M_{h_{1}}<\infty$. It immediately follows that for any compact neighborhood $N$ around $\operatorname{\boldsymbol{\beta}}_{0}$ we have

[TABLE]

By $\eqref{eq:lalalalala}$ we can apply a uniform law of large numbers to derive

[TABLE]

Assertion (165) then follows immediately from (175), (155) and (129). The latter holds uniformly on $N$ by similar steps as above, noting that a typical element of $\widehat{\operatorname{\mathbf{F}}}(\operatorname{\boldsymbol{\beta}})-\widehat{\operatorname{\mathbf{F}}}(\widetilde{\operatorname{\boldsymbol{\beta}}})$ can be written as $\widehat{X}_{j}\widehat{X}_{k}(h_{1}(\widehat{\eta}(\operatorname{\boldsymbol{\beta}}))-h_{1}(\widehat{\eta}(\widetilde{\operatorname{\boldsymbol{\beta}}})))$.

Assertion (166) can be proved in a similar manner using (128), (161) and the uniform convergence of $||\operatorname{\mathbf{R}}_{n}(\operatorname{\boldsymbol{\beta}})/n-\operatorname{\mathbb{E}}(\operatorname{\mathbf{R}}(\operatorname{\boldsymbol{\beta}}))||$, which is easy to establish using $h_{1}(\eta(\operatorname{\boldsymbol{\beta}}))\operatorname{\mathbf{X}}\operatorname{\mathbf{X}}^{T}(y-g(\eta(\operatorname{\boldsymbol{\beta}})))=h_{1}(\eta(\operatorname{\boldsymbol{\beta}}))\operatorname{\mathbf{X}}\operatorname{\mathbf{X}}^{T}\varepsilon+h_{1}(\eta(\operatorname{\boldsymbol{\beta}}))\operatorname{\mathbf{X}}\operatorname{\mathbf{X}}^{T}(g(\eta(\operatorname{\boldsymbol{\beta}}_{0}))-g(\eta(\operatorname{\boldsymbol{\beta}})))$ and the assumption that $|h_{1}^{\prime}(\cdot)|\leq M_{h_{1}}$.

To prove assertion (167), recall that $\widehat{\operatorname{\mathbf{H}}}_{n}(\operatorname{\boldsymbol{\beta}})/n=-\widehat{\operatorname{\mathbf{F}}}_{n}(\operatorname{\boldsymbol{\beta}})/n+\widehat{\operatorname{\mathbf{R}}}_{n}(\operatorname{\boldsymbol{\beta}})/n$; assertion (167) then follows immediately from (165) and (166). ∎


Bibliography


Aneiros, G. and P. Vieu (2014). Variable selection in infinite-dimensional problems. Statistics & Probability Letters 94, 12–20.
Berrendero, J. R., B. Bueno-Larraz, and A. Cuevas (2019). An RKHS model for variable selection in functional linear regression. Journal of Multivariate Analysis 170, 25–45.
Boente, G., M. S. Barrera, and D. E. Tyler (2014). A characterization of elliptical distributions and some optimality properties of principal components for functional data. Journal of Multivariate Analysis 131, 254–264.
Calcagno, V. (2013). glmulti: Model selection and multimodel inference made easy. R package version 1.0.7.
Dagsvik, J. K. and S. Strøm (2006). Sectoral labour supply, choice restrictions and functional form. Journal of Applied Econometrics 21(6), 803–826.
Embrechts, P. and M. Maejima (2000). An introduction to the theory of self-similar stochastic processes. International Journal of Modern Physics B 14(12n13), 1399–1420.
Ferraty, F., P. Hall, and P. Vieu (2010). Most-predictive design points for functional data predictors. Biometrika 97(4), 807–824.
Floriello, D. and V. Vitelli (2017). Sparse clustering of functional data. Journal of Multivariate Analysis 154, 1–18.
