Correlation bounds, mixing and m-dependence under random time-varying network distances with an application to Cox-Processes

Alexander Kreiss

arXiv:1906.03179·math.ST·July 15, 2024


TL;DR

This paper develops new correlation and mixing bounds for stochastic processes on dynamic networks, and applies these results to analyze a Cox-process model for bike-sharing data, demonstrating asymptotic properties of a goodness-of-fit test.

Contribution

It introduces novel correlation and mixing bounds for processes on time-varying networks and applies them to Cox-process models, advancing understanding of dependence in dynamic network data.

Findings

1. Established exponential inequalities for weak dependence on dynamic networks.
2. Proved asymptotic properties of a goodness-of-fit test in Cox-process models.
3. Applied the theoretical results to real bike-sharing data.





Taxonomy

Topics: Statistical Methods and Inference · Complex Network Analysis Techniques · Stochastic processes and statistical mechanics

Full text


Alexander Kreiß

KU Leuven

ORSTAT KU Leuven

Naamsestraat 69

3000 Leuven

Belgium

[email protected]

We consider multivariate stochastic processes indexed either by vertices or pairs of vertices of a dynamic network. By a dynamic network we mean a network with a fixed vertex set and an edge set which changes randomly over time. We assume that the spatial dependence structure of the processes, conditional on the network, behaves as follows: close vertices (or pairs of vertices) are dependent, while the dependence decreases as the distance in the network increases. We make this intuition mathematically precise by considering three concepts based on correlation, $\beta$-mixing with time-varying $\beta$-coefficients and conditional independence. These concepts allow proving weak-dependence results, e.g. an exponential inequality, which might be of independent interest. In order to demonstrate the use of these concepts in an application, we study the asymptotics (for growing networks) of a goodness-of-fit test in a dynamic interaction network model based on a Cox-type model for counting processes. This model is then applied to bike-sharing data.

1 Introduction

Data indexed by vertices or pairs of vertices of networks have become popular in recent years (see e.g. Brownlees et al. [4], Demirer et al. [13], Butts [5] for recent applications), as such data sets have become increasingly available, see e.g. the websites of SNAP (Stanford University) or KONECT (University of Koblenz-Landau). In order to illustrate the contribution of this paper, we consider the following example of network data: Suppose we observe on the interval $[0,T]$ a network with vertex set $V_{n}:=\{1,...,n\}$ and random, dynamic adjacency matrix $C_{n}$, i.e., for all $i,j\in V_{n},i\neq j$ we have a stochastic process $C_{n,ij}:[0,T]\to\{0,1\}$, where $C_{n,ij}(t)=1$ means that $i$ and $j$ are connected by an edge at time $t$. We consider the vertices to be actors who can interact with each other whenever they are connected by an edge. As an example, the actors could be users of a social media platform and an interaction is sending a private message. Then, we observe for all pairs $(i,j)$ a counting process $N_{n,ij}$ which counts the interactions between $i$ and $j$ and a multivariate process $X_{n,ij}$ which carries information about $i$ and $j$, e.g. the number of interactions in the past or information about mutually shared interests. In this situation it is intuitive that the tuples $(N_{n,ij},X_{n,ij},C_{n,ij})$ cannot be modelled as independent. Instead, we adopt the following heuristic: For any two pairs $(i,j),(k,l)$ and time points $t\in[0,T]$, we suppose that on a small neighbourhood $U_{t}$ around $t$ the dependence is influenced by the closeness of $(i,j)$ and $(k,l)$, where closeness is to be understood relative to the random adjacency matrix $C_{n}(t)$ (we will be more precise later):

1. The processes $(N_{n,ij},X_{n,ij},C_{n,ij})$ and $(N_{n,kl},X_{n,kl},C_{n,kl})$ restricted to $U_{t}$ are dependent conditional on $(i,j)$ and $(k,l)$ being close at time $t$ in $C_{n}(t)$.

2. The processes $(N_{n,ij},X_{n,ij},C_{n,ij})$ and $(N_{n,kl},X_{n,kl},C_{n,kl})$ restricted to $U_{t}$ are almost independent conditional on $(i,j)$ and $(k,l)$ being far apart in $C_{n}(t)$.

Note that we implicitly allow that the dependence structure may randomly change over time by allowing that the adjacency matrix $C_{n}$ is a random function of time. In order to use this intuition we will have to assume that in large networks a given pair $(i,j)$ is at a given time $t$ most likely not close to too many other pairs.

The main contribution of this work is to make the above heuristic precise. We do this by formalizing time-varying spatial dependence concepts for multivariate processes indexed by pairs of vertices in a network. More precisely, we extend the concept of asymptotic uncorrelation, which was used in the previous work Kreiß et al. [28], to momentary-$m$-dependence and $\beta$-mixing on networks. In contrast to asymptotic uncorrelation, the two new concepts take the random network structure into account. Thus, time-varying $\beta$-mixing coefficients will allow us to prove exponential inequalities. Moreover, by using momentary-$m$-dependence we can adapt a technique from Mammen and Nielsen [32] to networks in order to handle predictability problems related to counting processes which were also noted e.g. in Nielsen et al. [37]. In order to illustrate the necessity of these concepts in a specific situation, we study a goodness-of-fit test in a counting-process-based network model. In the derivation of the asymptotic distribution of the test statistic under the null we require uniform control of the whole estimated parameter function. This cannot be handled by simple second-order conditions (e.g. asymptotic uncorrelation as in Kreiß et al. [28]). More generally, these concepts can be used to provide interpretable conditions to transfer inference results for multivariate counting processes from the iid case (cf. Andersen and Gill [2]) to the case of random network data. Thus, Section 2 is of independent interest for the literature on multivariate (counting) processes on networks (see e.g. Butts [5], Perry and Wolfe [41], Fox et al. [15], Vu et al. [49] for such models).

For an overview of statistical methods in network analysis we refer to the books Kolaczyk [26], Jackson [23] and Newman [35]. The general situation that the relational structure of network data is different from other dependent-data scenarios like time-series and spatial data analysis is for example mentioned in the beginning of Chapter 2 in Kolaczyk [27]. Classical results about dependent processes can e.g. be found in the books Doukhan [14] and Rio [43]. Some models which are used in the context of mixing, particularly in econometrics, are mentioned in Nze and Doukhan [38]. Further asymptotic normality results based on local dependence can be found in Chen and Shao [6] for random fields and in Schweinberger and Handcock [44], Kojevnikov et al. [25] for random, non-dynamic networks. Other approaches for modelling dependence in random networks are for example the extension of the concept of stationarity to random (but not time-changing) networks as in Vainora [46] or Bayesian networks (cf. Pearl [40] and Friedman et al. [16] for an extension to time series and Grzegorczyk et al. [19] for an application). In the application we will study a Cox-type proportional hazard model (cf. Andersen et al. [3], Martinussen and Scheike [33], Cox [8], Andersen and Gill [1]). Generalisations and variations of such models have been studied outside a network context e.g. in Nielsen et al. [37], Nielsen and Linton [36], Linton et al. [31, 30]. For network interactions, parameter estimation in models of this type has been considered e.g. in Butts [5] and Perry and Wolfe [41]. The goodness of fit test which we will consider is based on an $L^{2}$ -type test statistic as in Härdle and Mammen [21]. Particular references for smooth testing in survival analysis are Müller and Van Keilegom [34] (use the same type of test statistic) and Kauermann and Berger [24] (use a local likelihood approach), however, not within a network context.

After collecting some notation and briefly reviewing asymptotic uncorrelation in Sections 2.1 and 2.2, we introduce in the main part of Section 2 the concepts of momentary-$m$-dependence (Section 2.3) and $\beta$-mixing on networks (Section 2.4). In Section 2.5 we provide examples of data generating processes and motivate why they exhibit these properties. In Section 3 we apply the methods established in Section 2 to a goodness-of-fit test problem. The whole procedure is then illustrated on bike-sharing data in Section 4. In the Appendix (Section 5) we collect missing proofs from the main part of the paper as well as some additional technical results. The R code used for the bike-data illustration is available at https://github.com/akreiss/Estimate-Event-Network.

2 Describing Dependence on Dynamic Networks

In this section we introduce the dependence concepts. For ease of exposition we stick to a model for relational event data which was also used e.g. in Butts [5], Perry and Wolfe [41], Kreiß et al. [28]. In Section 2.1 we will briefly review the basics of this framework and in Sections 2.2-2.4 we introduce the dependence concepts. Section 2.5 provides examples. We finish in Section 2.6 with a short note on processes indexed by vertices.

2.1 Preliminaries and Notation

We use the following notation from graph theory. We consider directed (undirected), dynamic, random networks $G_{n,t}=(V_{n},E_{n,t})$ for $n\in\mathbb{N}$ and $t\in[0,T]$, which consist of a fixed vertex set $V_{n}:=\{1,...,n\}$ and a random dynamic edge set $E_{n,t}\subseteq L_{n}$, where $L_{n}:=\{(i,j):i,j\in V_{n},i\neq j\}$ is the set of all directed (undirected) pairs (we exclude loops). The adjacency matrix of $G_{n,t}$ at time $t$ is denoted by $(C_{n,ij}(t))_{i,j\in V_{n}}$. Furthermore, we denote by $r_{n}:=|L_{n}|$ the number of directed (undirected) pairs of vertices.
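As a small illustration of this notation, the following sketch enumerates $L_{n}$ and computes $r_{n}$ in the directed and the undirected case. The function name `pair_set` is hypothetical and chosen only for this example.

```python
from itertools import combinations, permutations


def pair_set(n, directed=True):
    """Return L_n: all pairs of distinct vertices from {1, ..., n} (no loops)."""
    vertices = range(1, n + 1)
    if directed:
        # ordered pairs (i, j) with i != j
        return list(permutations(vertices, 2))
    # unordered pairs, represented as (i, j) with i < j
    return list(combinations(vertices, 2))


n = 5
r_directed = len(pair_set(n, directed=True))     # equals n * (n - 1)
r_undirected = len(pair_set(n, directed=False))  # equals n * (n - 1) / 2
```

In the undirected case this reproduces $r_{n}=\frac{n(n-1)}{2}$, the value used in Section 2.2.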

We study stochastic processes $(N_{n,ij},X_{n,ij},C_{n,ij})_{(i,j)\in L_{n}}$ with the following properties.

**Measurability.** For all $n\in\mathbb{N}$ there is a filtration $(\mathcal{F}_{t}^{n})_{t\in[0,T]}$ such that for all $i,j\in V_{n}$ the processes $N_{n,ij}:[0,T]\to\mathbb{N}$ are counting processes which are adapted to $\mathcal{F}_{t}^{n}$ and such that for all $i,j\in V_{n}$ the processes $X_{n,ij}:[0,T]\to\mathbb{R}^{q}$ and $C_{n,ij}:[0,T]\to\{0,1\}$ are predictable with respect to $\mathcal{F}_{t}^{n}$. Moreover, the intensity function of $N_{n,ij}$ is given by $\lambda_{n,ij}(t)=C_{n,ij}(t)\lambda(t,X_{n,ij}(t))$ for some link function $\lambda:[0,T]\times\mathbb{R}^{q}\to[0,\infty)$.
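To make the intensity structure $\lambda_{n,ij}(t)=C_{n,ij}(t)\lambda(t,X_{n,ij}(t))$ concrete, here is a minimal simulation sketch for a single pair. It uses thinning of a homogeneous Poisson process, which is a standard simulation technique and not a method taken from the paper. The activity indicator `C`, the covariate `X`, the link function `link` and the bound `lam_max` are all assumptions made for this example; in particular, the covariate is deterministic here, whereas in the model it may depend on the past.

```python
import math
import random


def simulate_counting_process(intensity, T, lam_max, rng):
    """Simulate event times on [0, T] by thinning.

    Candidates come from a homogeneous Poisson process with rate lam_max and
    are kept with probability intensity(t) / lam_max. This requires
    intensity(t) <= lam_max on [0, T].
    """
    events, t = [], 0.0
    while True:
        t += rng.expovariate(lam_max)
        if t > T:
            return events
        if rng.random() * lam_max < intensity(t):
            events.append(t)


# Hypothetical ingredients for one pair (i, j):
def C(t):
    return 1 if t <= 0.5 else 0  # pair is active on [0, 0.5] only


def X(t):
    return t  # toy one-dimensional covariate


def link(t, x):
    return 20.0 * math.exp(-x)  # toy link function, bounded by 20


def intensity(t):
    return C(t) * link(t, X(t))  # lambda_{n,ij}(t) = C(t) * lambda(t, X(t))


rng = random.Random(0)
events = simulate_counting_process(intensity, T=1.0, lam_max=20.0, rng=rng)
```

Because the intensity vanishes whenever `C(t)` is zero, all simulated events fall into the period in which the pair is active.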

Remark 2.1.

1. We choose here to index the processes with pairs of vertices. Similarly one could also index the processes with the vertices directly (cf. Section 2.6). We choose pairs here because we imagine observations to be driven by the interplay of two actors.

2. The processes $C_{n,ij}$ are indicators of whether the pair $(i,j)$ is active at time $t$ ($C_{n,ij}(t)=1$) or not ($C_{n,ij}(t)=0$). Our understanding is that, for a given $(i,j)\in L_{n}$, the process $N_{n,ij}$ is only interesting (i.e. useful for inference) on the set $\{t\in[0,T]:\,C_{n,ij}(t)=1\}$.

3. We are not too much concerned about the existence of a filtration as required in the above definition. One possibility would be to assume that $C_{n,ij}$ and $X_{n,ij}$ are continuous from the left and let $\mathcal{F}_{t}^{n}:=\sigma(N_{n,ij}(s),X_{n,ij}(s),C_{n,ij}(s):\,(i,j)\in L_{n},s\leq t)$ be the filtration generated by the processes $(N_{n,ij},X_{n,ij},C_{n,ij})$ for all $(i,j)\in L_{n}$.

It is intuitively reasonable to assume that relabelling the vertices is not going to change the distribution of the processes. Hence, we will assume that

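$$(N_{n,ij},X_{n,ij},C_{n,ij})_{(i,j)\in L_{n}}\sim(N_{n,\sigma(i)\sigma(j)},X_{n,\sigma(i)\sigma(j)},C_{n,\sigma(i)\sigma(j)})_{(i,j)\in L_{n}}\tag{2.1}$$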

holds for all permutations $\sigma:V_{n}\to V_{n}$ and all $n\in\mathbb{N}$. This property is also called joint exchangeability of arrays (cf. Orbanz and Roy [39]). Note that for any two different pairs $(i,j),(k,l)\in L_{n}$ we can construct a permutation $\sigma$ with $\sigma(i)=k$ and $\sigma(j)=l$ (recall that we consider networks without loops). Hence, $(N_{n,ij},X_{n,ij},C_{n,ij})$ and $(N_{n,kl},X_{n,kl},C_{n,kl})$ are identically distributed. This notion allows for the concept of hubs, but every vertex has a priori the same potential of becoming a hub. Moreover, we assume that all possible interactions between vertices are observed. Therefore we do not have to worry about edge sampling issues as mentioned in Crane and Dempsey [10]. Note lastly that the permutations $\sigma$ from above are deterministic and thus, in particular, they may not be chosen depending on the actually observed network structure. The same holds in Section 2.2 when discussing asymptotic uncorrelation. Thus, these two properties do not take the actually observed network into account. However, in Sections 2.3 and 2.4, when introducing momentary-$m$-dependence and $\beta$-mixing, we condition on the observed network. Thus, in these concepts we consider the observations after making choices which are strongly dependent on the observed network.
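The existence of a relabelling $\sigma$ with $\sigma(i)=k$ and $\sigma(j)=l$ can be made explicit. The sketch below is one possible construction, under the assumption that vertices are labelled $1,...,n$; the function name `relabelling` is hypothetical.

```python
def relabelling(n, pair_from, pair_to):
    """Build a permutation sigma of {1, ..., n} with sigma(i) = k, sigma(j) = l.

    Both pairs must consist of two distinct vertices (no loops), which is
    exactly what makes the construction possible.
    """
    i, j = pair_from
    k, l = pair_to
    sigma = {i: k, j: l}
    # Match the remaining n - 2 sources with the remaining n - 2 targets.
    rest_sources = [v for v in range(1, n + 1) if v not in (i, j)]
    rest_targets = [v for v in range(1, n + 1) if v not in (k, l)]
    sigma.update(zip(rest_sources, rest_targets))
    return sigma


sigma = relabelling(6, (1, 2), (2, 5))
```

Since $i\neq j$ and $k\neq l$, the two remaining lists have the same length, so the result is a bijection of $V_{n}$.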

One way of taking the network structure into account is through random distance functions on networks. A random distance function on networks is a collection of stochastic processes $d_{t}^{n}:L_{n}\times L_{n}\to[0,\infty]$ such that for any $t\in[0,T]$ and $n\in\mathbb{N}$ , $d_{t}^{n}$ is almost surely a metric. For later reference we collect all the above in a single definition.

Definition 2.2.

The processes $(N_{n,ij},X_{n,ij},C_{n,ij})_{(i,j)\in L_{n}}$ on $[0,T]$ together with the random distance function on networks $d_{t}^{n}$ are called a structured interaction network process if for all $n\in\mathbb{N}$

1. the above-mentioned measurability properties hold,

2. the network process is exchangeable, i.e., (2.1) holds for all permutations $\sigma:V_{n}\to V_{n}$,

3. $t\mapsto d_{t}^{n}((i,j),(k,l))$ is predictable with respect to $\mathcal{F}_{t}^{n}$ for all $(i,j),(k,l)\in L_{n}$.

In this case $p_{n}(t):=\mathbb{P}(C_{n,ij}(t)=1)$ is well defined.

Remark 2.3.

• Later the interpretation of $d_{t}^{n}$ will be as follows: The distance $d_{t}^{n}((i,j),(k,l))$ reflects how strongly the pairs $(i,j)$ and $(k,l)$ are related conditionally on the observed network (short distance means strong relation, large distance means weak relation).

• From a modelling perspective, we emphasize that the distance function $d^{n}_{t}$ does not need to be known to the researcher. It is only necessary that it exists.

• In order to allow for sparsity we explicitly allow that $p_{n}(t)\to 0$ for $n\to\infty$.

2.2 Asymptotic Uncorrelation

We briefly review a stationarity type result which was similarly used for static networks in Vainora [46] and for dynamic networks in Kreiß et al. [28]. In this subsection we restrict to undirected networks (see also the paragraph below Corollary 2.4). Consider square integrable random variables $(Z_{n,ij})_{(i,j)\in L_{n}}$ with the following property: The $Z_{n,ij}$ are identically distributed. For $(i,j),(k,l)\in L_{n}$ let $\kappa((i,j),(k,l)):=|\{i,j\}\cap\{k,l\}|\in\{0,1,2\}$ denote the number of common vertices of $(i,j)$ and $(k,l)$ . For $(i,j),(k,l),(i^{\prime},j^{\prime}),(k^{\prime},l^{\prime})\in L_{n}$ the pairs $(Z_{n,ij},Z_{n,kl})$ and $(Z_{n,i^{\prime}j^{\prime}},Z_{n,k^{\prime}l^{\prime}})$ are identically distributed if $\kappa((i,j),(k,l))=\kappa((i^{\prime},j^{\prime}),(k^{\prime},l^{\prime}))$ .

We will later consider $Z_{n,ij}:=\varphi(N_{n,ij},X_{n,ij},C_{n,ij})$ where $\varphi$ takes real values and $\mathbb{E}(Z_{n,ij}^{2})<\infty$ . The exchangeability assumption in Definition 2.2 guarantees the above property which in turn yields the following corollary:

Corollary 2.4.

For all $n\in\mathbb{N}$ , let $(Z_{n,ij})_{(i,j)\in L_{n}}$ be as above. Recall that $r_{n}=\frac{n(n-1)}{2}$ is the number of undirected pairs. Then, for pairwise different vertices $v_{1},v_{2},v_{3},v_{4}\in V_{n}$ ,

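$$\begin{aligned}\mathcal{V}_{n}&:=\textrm{Var}\Bigg(\frac{1}{r_{n}}\sum_{(i,j)\in L_{n}}Z_{n,ij}\Bigg)=\frac{1}{r_{n}^{2}}\sum_{(i,j),(k,l)\in L_{n}}\textrm{Cov}(Z_{n,ij},Z_{n,kl})\\&=\frac{1}{r_{n}^{2}}\Bigg(\sum_{\kappa((i,j),(k,l))=2}\textrm{Var}(Z_{n,v_{1}v_{2}})+\sum_{\kappa((i,j),(k,l))=1}\textrm{Cov}(Z_{n,v_{1}v_{2}},Z_{n,v_{2}v_{3}})+\sum_{\kappa((i,j),(k,l))=0}\textrm{Cov}(Z_{n,v_{1}v_{2}},Z_{n,v_{3}v_{4}})\Bigg).\end{aligned}$$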

For the proof of this corollary we just need to think about the number of terms in each sum. It is a combinatorial exercise to find that their sizes are of the order $r_{n},r_{n}^{\frac{3}{2}}$ and $r_{n}^{2}$ respectively. In order to have $\mathcal{V}_{n}\to 0$ we hence require that $\textrm{Cov}(Z_{n,v_{1}v_{2}},Z_{n,v_{2}v_{3}})=o\left(r_{n}^{\frac{1}{2}}\right)$ and $\textrm{Cov}(Z_{n,v_{1}v_{2}},Z_{n,v_{3}v_{4}})=o(1)$. We will call assumptions of this type asymptotic uncorrelation assumptions. This result naturally extends to directed networks by splitting the sum according to all possible overlap patterns of two directed pairs.
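The combinatorial exercise can be checked by brute force for small $n$. The following sketch counts, for undirected pairs, how many ordered combinations of two pairs share two, one or no vertices. The closed-form counts in the comments are elementary and stated here only for comparison; the function name `overlap_counts` is hypothetical.

```python
from itertools import combinations


def overlap_counts(n):
    """Count ordered combinations of undirected pairs by number of shared vertices."""
    pairs = list(combinations(range(1, n + 1), 2))
    counts = {0: 0, 1: 0, 2: 0}
    for p in pairs:
        for q in pairs:
            counts[len(set(p) & set(q))] += 1  # kappa(p, q)
    return len(pairs), counts


n = 8
r_n, counts = overlap_counts(n)
# counts[2] = r_n                       (variance terms, order r_n)
# counts[1] = 2 * (n - 2) * r_n         (one shared vertex, order r_n^{3/2})
# counts[0] = r_n * (n - 2) * (n - 3) / 2   (no shared vertex, order r_n^2)
```

Since $n$ is of the order $r_{n}^{\frac{1}{2}}$, these counts match the orders $r_{n}$, $r_{n}^{\frac{3}{2}}$ and $r_{n}^{2}$ mentioned above.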

2.3 Momentarily m-Dependent Networks

We introduce momentary- $m$ -dependence for processes $\mathcal{N}_{n,ij}:=(N_{n,ij},X_{n,ij},C_{n,ij})$ . The aim is to mathematically formulate and use the following intuition: The processes $\mathcal{N}_{n,ij}$ and $\mathcal{N}_{n,kl}$ are dependent for any fixed choice of $(i,j)$ and $(k,l)$ . However, if we choose $(i,j)$ and $(k,l)$ such that they are far apart in the observed network (in terms of $d_{t_{0}}^{n}$ for some time $t_{0}\in[0,T]$ ), then for real world actors it is likely that it takes some time for the pair $(i,j)$ to receive knowledge of interactions between $(k,l)$ and to process them before reacting by casting interactions themselves. Therefore, we assume: Provided that we know the network structure at time $t_{0}$ and that we know the past of all processes up to time $t_{0}$ and provided that we know that for two pairs $(i,j),(k,l)\in L_{n}$ the distance $d_{t_{0}}^{n}((i,j),(k,l))$ is large, then the processes $\mathcal{N}_{n,ij}(t)_{t\in[t_{0},t_{0}+6\Delta]}$ and $\mathcal{N}_{n,kl}(t)_{t\in[t_{0},t_{0}+6\Delta]}$ are conditionally independent given all information up to time $t_{0}$ for some $\Delta>0$ (the factor six is chosen for later convenience). We illustrate this in Figure 1: The horizontal axis is time and the vertical axis is distance. The two lines correspond to two pairs $(i,j)$ and $(k,l)$ and the vertical distance between these two lines represents the distance between the pairs $(i,j)$ and $(k,l)$ . Dots on the lines indicate events between the respective vertices. The two gray rectangles in the future (next to the line at $t_{0}$ ) stand for the information of the processes of $(i,j)$ on the interval $[t_{0},t_{0}+6\Delta]$ and the processes of $(k,l)$ on the interval $[t_{0},t_{0}+6\Delta]$ . We suppose that these two are conditionally independent given the information up to time $t_{0}$ . So there is no direct information flow between these two areas. 
However, they are not unconditionally independent: from the gray rectangle in the future of $(k,l)$ we can draw inferences about its past, when $(i,j)$ and $(k,l)$ were possibly close, and hence about the past of $(i,j)$, which is informative about its future. But if we already know the past, then additional knowledge of the future of $(k,l)$ carries no further information about the future of $(i,j)$.

In mathematical terms this can be described as follows. For a set $J\subseteq L_{n}$ of pairs, let $d_{s}^{n}((i,j),J):=\min\{d_{s}^{n}((i,j),(k,l)):\,(k,l)\in J\}$ be the distance of $(i,j)$ to $J$ at time $s$ .

Definition 2.5.

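A structured interaction network process $(\mathcal{N}_{n,ij})_{(i,j)\in L_{n}}$ with filtration $(\mathcal{F}_{t}^{n})_{t\in[0,T]}$ and distance $d^{n}_{t}$ is said to be momentarily-$m$-dependent for $m\in[0,\infty)$, if

$$\begin{aligned}&\forall n\in\mathbb{N},\ \exists\,\delta_{n}>0,\ \forall t_{0}\in[0,T],\ \forall J\subseteq L_{n}:\ \text{given }\mathcal{F}_{t_{0}}^{n},\\&\quad(\mathcal{N}_{n,ij}(t))_{(i,j)\in J,\,t\in[t_{0},t_{0}+6\delta_{n}]}\ \text{is conditionally independent of}\\&\quad\sigma\Big(\mathcal{N}_{n,ij}(r)\cdot\mathbbm{1}(d_{s}^{n}((i,j),J)\geq m):\,s\leq t_{0},\,r\leq s+6\delta_{n},\,(i,j)\in L_{n}\Big).\end{aligned}$$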

In order to work with momentary-$m$-dependent networks, we introduce two augmentations of the filtration $\mathcal{F}_{t}^{n}$. Generally, enlarging a filtration yields more predictable processes and fewer martingales. One of the two extensions of $\mathcal{F}_{t}^{n}$ introduced in the following definition strikes exactly the right trade-off: Certain processes become predictable with respect to the extension while certain other processes remain martingales (see Lemma 2.7).

For two $\sigma$ -fields $\mathcal{A}$ and $\mathcal{B}$ we denote by $\mathcal{A}\lor\mathcal{B}$ the $\sigma$ -field which is generated by the union of $\mathcal{A}$ and $\mathcal{B}$ .

Definition 2.6.

Let $(\mathcal{N}_{n,ij})_{(i,j)\in L_{n}}$ be a structured interaction network with filtration $(\mathcal{F}_{t}^{n})_{t\in[0,T]}$ and distance $d^{n}_{t}$ . For a subset $J\subseteq L_{n}$ define

[TABLE]

We call $\mathcal{F}^{n,J,m}_{t}$ the long-sighted leave- $J$ -out filtration. In contrast, the short-sighted leave- $J$ -out filtration $\widetilde{\mathcal{F}}^{n,J,m}_{I,t}$ for $I\subseteq J$ is defined by

[TABLE]

Denote further, for any pair $(i,j)\in L_{n}$, $F_{(i,j)}(t):=\{(k,l)\in L_{n}:\,d_{t}^{n}((i,j),(k,l))\geq m\}$. Functions which are predictable with respect to $\widetilde{\mathcal{F}}_{I,t}^{n,J,m}$ are said to be of leave-$m$-out type.
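The distance to a set, $d_{s}^{n}((i,j),J)$, and the set $F_{(i,j)}(t)$ of far-away pairs can be sketched as follows for one fixed time point. The paper only requires that some distance exists and does not prescribe it; purely as an assumption for this example, undirected pairs are compared through the Hausdorff distance of their vertex sets with respect to hop distance in the current graph. All function names are hypothetical.

```python
import math
from collections import deque


def hop_distances(adj, source):
    """Breadth-first-search hop distances from source in an undirected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def pair_distance(adj, p, q):
    """Assumed example distance: Hausdorff distance between the vertex sets of p and q."""
    def one_sided(a_set, b_set):
        worst = 0
        for a in a_set:
            da = hop_distances(adj, a)
            # unreachable vertices are at infinite distance
            worst = max(worst, min(da.get(b, math.inf) for b in b_set))
        return worst
    return max(one_sided(p, q), one_sided(q, p))


def distance_to_set(adj, p, J):
    """d(p, J): minimum of the distances from p to the pairs in J."""
    return min(pair_distance(adj, p, q) for q in J)


def far_set(adj, p, pairs, m):
    """F_p: all pairs whose distance to p is at least m."""
    return {q for q in pairs if pair_distance(adj, p, q) >= m}


# Toy network at one fixed time: the path 1 - 2 - 3 - 4 - 5 - 6.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}
pairs = [(a, b) for a in adj for b in adj if a < b]
d_to_J = distance_to_set(adj, (1, 2), [(3, 4), (5, 6)])
far = far_set(adj, (1, 2), pairs, m=4)
```

In a dynamic network these quantities would be recomputed from the adjacency structure at each time point, which is why $F_{(i,j)}(t)$ carries a time argument.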

It holds that $\mathcal{F}_{t}^{n,J,m}\supseteq\widetilde{\mathcal{F}}_{I,t}^{n,J,m}$. We can now make the earlier mentioned property of the long-sighted leave-$J$-out filtration precise: The counting processes stay counting processes and in particular their martingales are still martingales. The result follows directly from the definition; the proof can be found in Appendix 5.5.

Lemma 2.7.

We consider a structured momentarily-$m$-dependent interaction network. For $J\subseteq L_{n}$, the processes $\left(N_{n,ij}(t)\right)_{(i,j)\in J}$ form a multivariate counting process with respective intensity functions $(\lambda_{n,ij}(t))_{(i,j)\in J}$ with respect to $\mathcal{F}_{t}^{n,J,m}$. This means in particular that the counting process martingales $M_{n,ij}(t):=N_{n,ij}(t)-\int_{0}^{t}\lambda_{n,ij}(s)ds$ remain martingales with respect to $\mathcal{F}_{t}^{n,J,m}$.
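The compensated process $M_{n,ij}(t)=N_{n,ij}(t)-\int_{0}^{t}\lambda_{n,ij}(s)ds$ can be illustrated by a small Monte Carlo sketch. It only checks the centering $\mathbb{E}(M_{n,ij}(T))=0$ for a toy intensity and says nothing about the enlarged filtration, which is the actual content of the lemma. The constant rate, the activity period and the function name `compensated_count` are assumptions made for this example.

```python
import random


def compensated_count(rate, active_until, T, rng):
    """Return M(T) = N(T) - int_0^T lambda(s) ds for lambda(s) = rate * 1(s <= active_until)."""
    horizon = min(active_until, T)
    t, count = 0.0, 0
    while True:
        t += rng.expovariate(rate)  # waiting time until the next event
        if t > horizon:
            break
        count += 1
    # The compensator is the integrated intensity over the active period.
    return count - rate * horizon


rng = random.Random(1)
samples = [compensated_count(3.0, 0.6, 1.0, rng) for _ in range(20000)]
mean_M = sum(samples) / len(samples)
```

The sample mean of $M(T)$ should be close to zero, whereas the raw counts $N(T)$ have mean equal to the integrated intensity.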

Remark 2.8.

Throughout we will use the notions of Stieltjes and Itô integration interchangeably when possible. In particular, when $\varphi$ is predictable, we will understand $\int_{0}^{T}\varphi(t)dM_{n,ij}(t)$ as an Itô integral and use its martingale properties (since $M_{n,ij}$ is a martingale). If $\varphi$ is not predictable, we can understand the same integral as a Stieltjes integral, which is defined path-wise (no predictability required) but is not itself a martingale (in contrast to the Itô integral).

We can use momentary- $m$ -dependence in order to extend a technique which Mammen and Nielsen [32] applied to iid observations in a non-network context: Approximate non-predictable integrands by processes which are predictable with respect to a larger filtration. The proof of the following result is along the lines of Mammen and Nielsen [32] and is given in Appendix 5.5.

Proposition 2.9.

Let $\mathcal{N}_{n,ij}$ be momentarily- $m$ -dependent and let $\varphi_{n,ij}:[0,T]\to\mathbb{R}$ for $n\in\mathbb{N},(i,j)\in L_{n}$ be random functions (not necessarily predictable). Let furthermore $\widetilde{\varphi}_{n,ij}^{J}:[0,T]\to\mathbb{R}$ for $(i,j)\in J\subseteq L_{n}$ and $|J|\leq 2$ be of leave- $m$ -out type, i.e., predictable with respect to $\widetilde{\mathcal{F}}_{(i,j),t}^{n,J,m}$ . Then, we have ( $\lambda_{n,ij}$ and $M_{n,ij}$ mean the same as in Lemma 2.7)

[TABLE]

For our purposes we have to extend this technique even further: When studying kernel estimators we encounter integrals of the type

[TABLE]

where $h\to 0$ is a bandwidth and

[TABLE]

for some real-valued function $f$ . Hence, for a given $t$ the integrand in (2.2) is non-predictable. However, under momentary- $m$ -dependence, by removing the correct terms from the sum in the definition of $\varphi_{n,ij,kl}(t,r)$ , we obtain processes which are partially predictable with respect to $\widetilde{\mathcal{F}}^{n,I,m}_{\{(i,j),(k,l)\},t}$ :

Definition 2.10.

Let $\varphi$ be a real-valued stochastic process defined on $[0,T]^{2}$ . $\varphi$ is called partially-predictable with respect to a filtration $\mathcal{G}_{t}$ if for any filtration $\mathcal{H}_{t}\supseteq\mathcal{G}_{t}$ and any process $X$ which is adapted to $\mathcal{H}_{t}$ the process

[TABLE]

is predictable with respect to $\mathcal{H}_{t}$ . Note that $\varphi(r,t)=g(r)h(t)f(r,t)$ with $g$ being adapted, $h$ being predictable (both with respect to $\mathcal{G}_{t}$ ) and $f$ deterministic has this property.

Since the martingales $M_{n,ij}$ and $M_{n,kl}$ remain martingales under the correct long-sighted filtration, we can now use stochastic integral properties. For ease of notation, we use the convention $u_{r}:=(i_{r},j_{r})\in L_{n}$ for $r=1,...,4$ and we write sets without curly brackets, e.g. instead of $\{u_{1},u_{2},u_{3}\}\subseteq L_{n}$ , we simply write $u_{1}u_{2}u_{3}\subseteq L_{n}$ . The proof of the following result is given in Appendix 5.1.

Theorem 2.11.

Let $(N_{n,ij},X_{n,ij},C_{n,ij})_{(i,j)\in L_{n}}$ be a structured interaction network with filtration $(\mathcal{F}_{t}^{n})_{t\in[0,T]}$ and distance $d^{n}_{t}$ . Let $\varphi_{n,u_{1}u_{2}}:[0,T]\times[0,T]\to\mathbb{R}$ for $u_{1},u_{2}\in L_{n}$ be random functions (possibly not predictable with respect to $\mathcal{F}_{t}^{n}$ ). It holds that

[TABLE]

for $n\to\infty$ , if

1. the processes $(N_{n,ij},X_{n,ij},C_{n,ij})_{(i,j)\in L_{n}}$ are momentarily- $m$ -dependent and

2. there exist random functions $\widetilde{\varphi}_{n,u_{1}u_{2}}^{I}(t,r)$ for all $u_{1}u_{2}\subseteq J\subseteq L_{n}$ with $|J|\leq 4$ which are partially predictable with respect to $\widetilde{\mathcal{F}}_{u_{1}u_{2},t}^{n,J,m}$ , respectively, and such that (the symbol $\neg$ means negation)

[TABLE]

2.4 Mixing Networks

In this section our interest lies in proving a Bernstein type exponential inequality e.g. for the following average

[TABLE]

where we will later have $Z_{n,ij}=\varphi(N_{n,ij},X_{n,ij},C_{n,ij})$ for a real-valued function $\varphi$ . However, the following results do not depend on this specific functional form as long as the $Z_{n,ij}$ have the exchangeability property 2 in Definition 2.2. The difficulties here are two-fold: We usually have that $Z_{n,ij}=0$ when $C_{n,ij}(t)=0$ and hence the number of terms in the sum is random and, secondly, the terms are dependent. We argued in the discussion of Figure 1 that unconditional independence is not a good assumption. However, it is reasonable to assume that, conditionally on the network, far apart actors influence each other very weakly. We include this aspect in the model by imposing mixing assumptions with time-varying mixing coefficients. These mixing assumptions will be used in the proofs by applying the grouping technique for mixing random variables (cf. Rio [43], Doukhan [14], Viennet [48]): The idea is to group the random variables $Z_{n,ij}$ in blocks which have large distances between each other in the observed network. To this end, we define a partitioning of a network as follows (the existence of such partitions will be discussed in Section 2.5).

Definition 2.12.

Let $\Delta>0$ , $t\in[0,T]$ , $\mathcal{K},n,m\in\mathbb{N}$ and $k\in\{1,...,\mathcal{K}\}$ . We call the random sets $G^{t}(k,m,\Delta)\subseteq L_{n}$ a $\Delta$ -partition of the network at time $t$ (note that we omit $n$ in the notation) if

1. $(k,m)\neq(k^{\prime},m^{\prime})\,\Rightarrow\,G^{t}(k,m,\Delta)\cap G^{t}(k^{\prime},m^{\prime},\Delta)=\emptyset$ ,

2. For $k\in\{1,...,\mathcal{K}\}$ and $m\neq m^{\prime}$ : $(i,j)\in G^{t}(k,m,\Delta),\,(k,l)\in G^{t}(k,m^{\prime},\Delta)\,\Rightarrow\,d_{t}^{n}((i,j),(k,l))\geq\Delta$ .

Intuitively speaking, the sets $G^{t}(k,m,\Delta)$ form random groups where two different groups of the same type $k$ are far apart in the random network. For the following definition we use the notion of $\beta$ -mixing coefficients. For any two $\sigma$ -fields $\mathcal{A}$ and $\mathcal{B}$ denote the $\beta$ -mixing coefficient by (cf. e.g. Rio [43])

[TABLE]

where $\mathbb{P}_{\mathcal{A}\otimes\mathcal{B}}$ and $\mathbb{P}_{\mathcal{A}}\otimes\mathbb{P}_{\mathcal{B}}$ denote measures on $\mathcal{A}\otimes\mathcal{B}$ for which for all sets $A\times B\in\mathcal{A}\otimes\mathcal{B}$

[TABLE]

For two random variables $X,Y$ we denote $\beta(X,Y):=\beta(\sigma(X),\sigma(Y))$ where $\sigma(X)$ and $\sigma(Y)$ denote the $\sigma$ -fields generated by $X$ and $Y$ respectively.
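For random variables with finitely many values, $\beta(X,Y)$ can be computed directly. The sketch below assumes the common normalisation in which the coefficient equals the total variation distance between the joint law and the product of the marginals (half the $\ell_{1}$ distance of the probability tables); other references may differ by a constant factor. The function name `beta_coefficient` is hypothetical.

```python
import numpy as np

def beta_coefficient(joint):
    """Total variation distance between a joint pmf and the product of its marginals.

    `joint[a, b]` is P(X = x_a, Y = y_b); the table is renormalised to sum to one.
    """
    joint = np.asarray(joint, dtype=float)
    joint = joint / joint.sum()
    marginal_x = joint.sum(axis=1, keepdims=True)  # column vector P(X = x_a)
    marginal_y = joint.sum(axis=0, keepdims=True)  # row vector P(Y = y_b)
    return 0.5 * np.abs(joint - marginal_x * marginal_y).sum()

if __name__ == "__main__":
    independent = [[0.25, 0.25], [0.25, 0.25]]
    identical = [[0.5, 0.0], [0.0, 0.5]]  # Y = X for a fair coin X
    print(beta_coefficient(independent), beta_coefficient(identical))
```

Independent variables give the value zero, and two copies of the same fair coin give $1/2$ under this normalisation.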

Definition 2.13.

Let $(Z_{n,ij})_{(i,j)\in L_{n}}$ be a sequence of random variables, let $\Delta>0$ and let $G^{t}(k,m,\Delta)$ be a $\Delta$ -partition of the network as in Definition 2.12. For every time point $t$ and every pair $(i,j)\in L_{n}$ , we define

$$I_{n,ij}^{k,m,t}:=\mathbbm{1}\left((i,j)\in G^{t}(k,m,\Delta)\right),$$

the indicator function which checks if $(i,j)$ belongs to the $m$ -th block of type $k$ at time $t$ . Group the $Z_{n,ij}$ based on the partition $G^{t}(k,m,\Delta)$ , i.e.,

[TABLE]

Then we define the $\beta$ -Mixing coefficient which depends on the graph partitioning $G^{t}(k,m,\Delta)$ (which we do not indicate in the notation) via:

[TABLE]

Remark 2.14.

In most (but not all) situations we have, in addition to the properties of Definition 2.12, that

[TABLE]

where $E_{n,t}$ is the random edge set of the network. In the case where $Z_{n,ij}=0$ for $C_{n,ij}(t)=0$ , i.e., if $Z_{n,ij}C_{n,ij}(t)=Z_{n,ij}$ , all relevant pairs $(i,j)\in L_{n}$ are covered by the partition and it holds that

[TABLE]

In general, for our results to hold, we do not have to require (2.9). It will be sufficient to assume that (2.10) holds.

In applications, the random variables $Z_{n,ij}$ will depend on a time point $t_{0}\in[0,T]$ . So it will be the case that for $t$ close to $t_{0}$ the $\beta$ -Mixing coefficients at time $t$ will be small while they might be large for $t$ far away from $t_{0}$ . The following result is the main result of this section (inspired by Doukhan [14]). The proof is deferred to Section 5.1.

Lemma 2.15.

Let $(Z_{n,ij})_{(i,j)\in L_{n}}$ be an array of random variables which fulfils (2.10) for a given $t\in[0,T]$ and let $\Delta_{n}>0,\mathcal{K}_{n}\in\mathbb{N}$ . Suppose that for all $n\in\mathbb{N}$ there exist $\Delta_{n}$ -partitions with $\mathcal{K}_{n}$ block types and numbers $E_{k,m}^{n,t},E_{k}^{n,t}>0$ for $k=1,...,\mathcal{K}_{n}$ and $m=1,...,r_{n}$ as well as $\sigma^{2},c_{1},c_{2},c_{3}>0$ such that (cf. notation from Definition 2.13)

1. $\forall n\in\mathbb{N},\rho\in\mathbb{N}\setminus\{0,1\},k\in\{1,...,\mathcal{K}\},m\in\{1,...,r_{n}\}:$

[TABLE]

2. for $|E|_{n,t}:=\sum_{k=1}^{\mathcal{K}}\sum_{m=1}^{r_{n}}E_{k,m}^{n,t}$ , it holds for all $n\in\mathbb{N}$ and all $k=1,...,\mathcal{K}_{n}$

[TABLE]

Then, for any $x>0$ and all $n\in\mathbb{N}$ ,

[TABLE]

Note that $E_{k,m}^{n,t}$ can be understood as the expected size of group $m$ of type $k$ and that $E_{k}^{n,t}$ can be understood as the largest expected group size of groups of type $k$ . Then, $|E|_{n,t}$ can be understood as the expected number of edges in the network at time $t$ , i.e., $|E|_{n,t}\approx r_{n}p_{n}(t)$ . Now the first part of condition 2 in Lemma 2.15 translates to assuming that the expected fraction of edges contained in all groups of type $k$ is non-negligible. The second part means that the largest single group cannot be too large. The first condition, the moment condition, will be discussed in the next lemma. We will also show that it suffices to assume the existence of a suitable partition as above with high probability. To this end we introduce an indicator function $\Gamma_{n}^{t}$ which ensures that we can partition the network suitably. Conditionally on that, we can use the previous mixing results. In order to obtain an unconditional result we need to assume that $\Gamma_{n}^{t}=1$ sufficiently often. This is reflected in the unusual condition on $x$ . The proof of the following result can be found in Appendix 5.1. In addition we will show in the Appendix (Lemma 5.22) a different result which provides an exponential inequality for (unbounded) martingales and also avoids the moment condition.

Lemma 2.16.

Let $(Z_{n,ij})_{(i,j)\in L_{n}}$ be random variables bounded by $M>0$ and let $(C_{n,ij}(t))_{i,j\in V_{n}}$ be the adjacency matrix of a random, undirected network at time $t\in[0,T]$ . Let furthermore $\Delta_{n}>0$ and $I_{n,ij}^{k,m,t}$ be the indicators of a $\Delta_{n}$ -partition with $\mathcal{K}_{n}$ group types which fulfils (2.10) (cf. also Definition 2.13). Suppose there are numbers $E_{k,m}^{n,t}>0$ , $c_{3}>0$ such that for $|E|_{n,t}:=\sum_{k,m}E_{k,m}^{n,t}$ and

[TABLE]

there are constants $c_{2}>0$ and $C>0$ such that $\forall k=1,...,\mathcal{K}:\frac{1}{|E|_{n,t}}\sum_{m=1}^{r_{n}}E_{k,m}^{n,t}\geq c_{2}$ and that for pairwise different vertices $i,j,k,l\in V_{n}$

[TABLE]

Let $\beta_{t}(\Delta_{n})$ denote the $\beta$ -mixing coefficients with respect to $(Y_{n,ij})_{(i,j)\in L_{n}}$ as in Definition 2.13. Then, for $x>M\mathbb{P}(\Gamma_{n}^{t}=0)r_{n}\left(\log(|E|_{n,t})\cdot|E|_{n,t}\right)^{-\frac{1}{2}}$

[TABLE]

2.5 Examples

In the following we illustrate the previous concepts with examples.

2.5.1 On $\Delta$ -Partitions

For the exponential inequality to hold, we do not need to know the specific partition in practice: Knowledge of its existence is sufficient. Nevertheless, we discuss under which circumstances a $\Delta$-partition with the properties of Lemma 2.16 can be expected to exist. Let $G_{n}$ be a given random, dynamic network with adjacency matrix $C_{n}$. As a distance function we take the graph distance, i.e., $d(ij,kl)$ denotes the length of the shortest path between the pairs $(i,j)$ and $(k,l)$ if $C_{n,ij}=C_{n,kl}=1$ (e.g. $d(ij,kl)=1$ if $(i,j)$ and $(k,l)$ are adjacent, $d(ij,kl)=2$ if there is one edge between $(i,j)$ and $(k,l)$ and so forth). Otherwise, or if there is no path, we set $d(ij,kl)=\infty$. We begin by supposing that for a given point in time $t$ the network $G_{n}(t)$ is a two-dimensional grid. In that situation we consider a chess-board like partitioning $(G^{t}(k,m,\Delta))_{k,m}$ of the edges as illustrated in Figure 2, where the sides and corners of the blocks lie exactly on the vertices. For edges which lie on the sides of the blocks we take the convention that the bottom and left side belong to the respective block. Each block (square) is of side length $\Delta$ and each block is assigned one of four types. In Figure 2 all blocks of the same type $k\in\{1,...,4\}$ have been assigned the same number. It is clear that the distance between two points taken from two different blocks of the same type $k$ is at least $\Delta$. We assign numbers $\{1,2,3,...\}$ to all blocks of the same type such that we can speak of the $m$-th block of type $k$. Later we will choose $\Delta_{n}\approx a\log n$, and $E_{k,m}^{n,t}$ in Lemma 2.16 will be the expected size of the $m$-th block of type $k$. The $\Delta$-partition above is constructed such that all edges are contained in exactly one set $G^{t}(k,m,\Delta)$ and, as a consequence, $|E|_{n,t}$ equals by definition the expected number of edges.
Moreover, the blocks $G^{t}(k,m,\Delta)$ all have identical size $2\Delta_{n}^{2}\approx 2(a\log n)^{2}$ . Thus $\frac{1}{|E|_{n,t}}\sum_{m=1}^{r_{n}}E_{k,m}^{n,t}=1/4$ . Also $S_{k}(t)=2(a\log n)^{2}$ and hence $\Gamma_{n}^{t}=0$ for $c_{3}$ chosen large enough. These considerations can be directly transferred to higher dimensional grids. Hence, for networks which form a grid of any dimension, the assumption of the existence of a sequence of $\Delta_{n}$ -partitions as required in Lemma 2.16 with $\Delta_{n}=O(\log n)$ is proven. In consideration of this, we conclude that for a network which roughly looks like a grid, the above construction still yields a valid partition.
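The block sizes and the fraction $1/4$ stated above can be checked by direct counting. The sketch below makes assumptions beyond the text: it uses a torus (periodic grid) whose side length is a multiple of $2\Delta$ so that no boundary effects occur, it labels the four types $0,\dots,3$ instead of $1,\dots,4$, and it implements the bottom-and-left convention by attributing each edge to its lower-left endpoint. All function names are hypothetical.

```python
from collections import Counter

def chessboard_blocks(side, delta):
    """Count edges per block of a chess-board partition of a side x side torus grid.

    Returns a Counter mapping (block_type, block_index) to the number of edges.
    """
    assert side % (2 * delta) == 0, "side must be a multiple of 2 * delta"
    sizes = Counter()
    for x in range(side):
        for y in range(side):
            bx, by = x // delta, y // delta        # block containing vertex (x, y)
            block_type = 2 * (bx % 2) + (by % 2)   # one of the four types 0..3
            # Vertex (x, y) is the lower-left endpoint of its right and its upper edge.
            sizes[(block_type, (bx, by))] += 2
    return sizes

def type_fractions(sizes):
    """Fraction of all edges that fall into blocks of each type."""
    total = sum(sizes.values())
    per_type = Counter()
    for (block_type, _), size in sizes.items():
        per_type[block_type] += size
    return {k: v / total for k, v in per_type.items()}

if __name__ == "__main__":
    sizes = chessboard_blocks(side=12, delta=3)
    print(sorted(set(sizes.values())))   # every block holds 2 * delta**2 edges
    print(type_fractions(sizes))         # each type holds a quarter of the edges
```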

In order to check the assumption for a general network, we assign to each pair of vertices random, $d$-dimensional coordinates. Then, we plot these coordinates in $d$-dimensional space and partition the edges by using a chess-board like partitioning as before. We suggest two example strategies for doing this.

Example 2.17.

Let $e_{1},...,e_{d}\in L_{n}$ be arbitrary pairs of vertices. For any $n\in\mathbb{N},t\in[0,T]$ and $(i,j)\in L_{n}$ we call $(d_{t}^{n}((i,j),e_{1}),...,d_{t}^{n}((i,j),e_{d}))$ the coordinates of $(i,j)$ at time $t$ . Let $G^{t}(k,m,\Delta)$ for $k=1,...,2^{d}$ and $m\in\mathbb{N}$ comprise all pairs $(i,j)$ with coordinates lying in the $m$ -th block of type $k$ in a chess-board like grouping (similar to Figure 2 in the case $d=2$ ).

Note that above we construct the partition for each time point $t$ individually. Hence, the choice of the reference pairs $e_{1},...,e_{d}$ may depend on time as well. Moreover, the pairs may be chosen randomly since $\Delta$ -partitions are allowed to be random. That we produce indeed a $\Delta$ -partition in the above example is ensured by the following Lemma.

Lemma 2.18.

Let $\Delta>0$ be given. The sets $G^{t}(k,m,\Delta)$ defined in Example 2.17 form a $\Delta$ -partition of the network in the sense of Definition 2.12.

Proof.

We consider the case $d=2$ (The proof for $d>2$ follows analogous arguments). $G^{t}(k,m,\Delta)$ and $G^{t}(k^{\prime},m^{\prime},\Delta)$ are disjoint for $(k,m)\neq(k^{\prime},m^{\prime})$ by construction. Let $(i,j),(k,l)\in L_{n}$ and denote by $(q,r):=(d_{t}^{n}((i,j),e_{1}),d_{t}^{n}((i,j),e_{2}))$ and $(q^{\prime},r^{\prime}):=(d_{t}^{n}((k,l),e_{1}),d_{t}^{n}((k,l),e_{2}))$ their respective coordinates. Then we obtain by the triangle inequality

[TABLE]

which yields $d_{t}^{n}((i,j),(k,l))\geq|q-q^{\prime}|$ . Analogously, we obtain $d_{t}^{n}((i,j),(k,l))\geq|r-r^{\prime}|$ . The second condition in Definition 2.12 follows if we notice that by definition for $m\neq m^{\prime}$ , $(i,j)\in G^{t}(k,m,\Delta)$ and $(k,l)\in G^{t}(k,m^{\prime},\Delta)$ implies either $|q-q^{\prime}|\geq\Delta$ or $|r-r^{\prime}|\geq\Delta$ . ∎
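The construction of Example 2.17 and the statement of Lemma 2.18 can be tried out on a small graph. The following sketch rests on assumptions made only for illustration: the nodes of `adj` stand for the indexing objects (pairs of vertices, or vertices as in Section 2.6), the graph is connected, the distance is the shortest path length, $\Delta$ is a positive integer, and blocks are identified by the tuple of block indices rather than by a running number $m$. The helper names are hypothetical.

```python
from collections import deque
from itertools import combinations

def grid_graph(width, height):
    """Adjacency lists of a width x height grid graph (used as toy network)."""
    adj = {(x, y): [] for x in range(width) for y in range(height)}
    for (x, y) in adj:
        for dx, dy in ((1, 0), (0, 1)):
            nb = (x + dx, y + dy)
            if nb in adj:
                adj[(x, y)].append(nb)
                adj[nb].append((x, y))
    return adj

def bfs_distances(adj, source):
    """Shortest path lengths from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def reference_partition(adj, references, delta):
    """Label each node by (type, block) using its distances to the reference nodes."""
    ref_dist = [bfs_distances(adj, e) for e in references]
    labels = {}
    for u in adj:
        block = tuple(d[u] // delta for d in ref_dist)   # chess-board cell of the coordinates
        block_type = tuple(b % 2 for b in block)         # one of 2**d types
        labels[u] = (block_type, block)
    return labels

def is_delta_partition(adj, labels, delta):
    """Check condition 2 of Definition 2.12 by brute force over all node pairs."""
    all_dist = {u: bfs_distances(adj, u) for u in adj}
    for u, v in combinations(adj, 2):
        (type_u, block_u), (type_v, block_v) = labels[u], labels[v]
        if type_u == type_v and block_u != block_v and all_dist[u][v] < delta:
            return False
    return True

if __name__ == "__main__":
    adj = grid_graph(6, 6)
    labels = reference_partition(adj, references=[(0, 0), (5, 0)], delta=2)
    print(is_delta_partition(adj, labels, delta=2))
```

The first condition of Definition 2.12 holds automatically here because every node receives exactly one label.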

In addition to Example 2.17, we provide another method for equipping edges with $d$-dimensional coordinates, based on multidimensional scaling.

Example 2.19.

Use Multidimensional Scaling (MDS) (cf. Cox and Cox [9]) to find for each $(i,j)\in L_{n}$ coordinates $p(i,j)$ in $\mathbb{R}^{d}$ such that $\|p(i,j)-p(k,l)\|_{2}\approx d_{t}^{n}((i,j),(k,l))$ , where $\|.\|_{2}$ denotes the Euclidean distance in $\mathbb{R}^{d}$ .

In general it is not possible to have equality above. So the method yields only an approximation. However, the resulting partition might still be valid for a different $\Delta$ . In general we expect that for networks in which the vertices are already related to some position in $\mathbb{R}^{d}$ (e.g. geographical positions) the assumption of the existence of such a $\Delta$ -partition is not restrictive.
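As a rough illustration of Example 2.19, the sketch below implements classical (Torgerson) multidimensional scaling with NumPy. This is one specific MDS variant chosen here for simplicity; the text does not specify which variant is meant. The function names are hypothetical, and `dist` is assumed to be a symmetric matrix of finite distances.

```python
import numpy as np

def classical_mds(dist, dim):
    """Classical MDS: coordinates in R^dim whose distances approximate `dist`."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering   # double-centred squared distances
    eigval, eigvec = np.linalg.eigh(gram)               # eigenvalues in ascending order
    top = np.argsort(eigval)[::-1][:dim]                # indices of the largest eigenvalues
    scale = np.sqrt(np.clip(eigval[top], 0.0, None))    # negative eigenvalues are truncated
    return eigvec[:, top] * scale

def pairwise_distances(points):
    """Euclidean distance matrix of the rows of `points`."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.normal(size=(10, 2))
    dist = pairwise_distances(points)
    embedding = classical_mds(dist, dim=2)
    print(np.abs(pairwise_distances(embedding) - dist).max())
```

For distances that are exactly Euclidean in $\mathbb{R}^{d}$ the reconstruction is exact up to rounding; for graph distances it is only an approximation, which matches the caveat above.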

2.5.2 Example: Momentary- $m$ -Dependence

This section provides an example of a data generating process which is momentarily- $m$ -dependent and exchangeable. Consider the following over-simplistic model for the use of on-line communication: A population $V_{n}:=\{1,...,n\}$ of people (e.g. employees of a company) is connected through a social network with adjacency matrix $C_{n}$ , i.e., people $i,j\in V_{n}$ are in regular personal contact if $C_{n,ij}=1$ . Now a new on-line communication tool is introduced. Consider a pair $(i,j)\in L_{n}$ with $C_{n,ij}=1$ . At a given point in time $t\in[0,T]$ , the pair either has started to communicate via the on-line tool ( $N_{n,ij}(t)=1$ ) or not ( $N_{n,ij}(t)=0$ ). We suppose that pairs $(i,j)$ with $C_{n,ij}=0$ will also not connect via the communication tool and hence $N_{n,ij}\equiv 0$ in these cases. So the processes $N_{n,ij}$ have at most one jump in the period $[0,T]$ . Suppose we are interested in studying a statistic which depends on the array $\left(N_{n,ij}\right)_{n,ij}$ . Clearly it would not be justifiable to assume that all $N_{n,ij}$ are independent because people who are connected will influence each other. However, assuming momentarily- $m$ -dependence and exchangeability is less restrictive as we will motivate next.

In order to focus on the main ideas, we restrict to a time-constant network model. However, we can also apply dynamic network models and consider the distribution of a snapshot of the network at a given time of interest $t_{0}$ . As a network generating process we consider a stochastic block model (cf. Holland et al. [20]) with random group assignments. That is, we suppose that every vertex $i\in V_{n}$ is randomly assigned to a group $g(i)\in\{1,...,\mathcal{G}\}$ . While the number $\mathcal{G}\in\mathbb{N}$ is fixed, the random variables $g(i)$ for $i=1,...,n$ are assumed to be independent and identically distributed. Now we suppose that the random variables $C_{n,ij}\in\{0,1\}$ are independent conditionally on all $g(i)$ and that for $i>j$ $\mathbb{P}(C_{n,ij}=1|g(i),g(j))=Q(g(i),g(j))$ where $Q\in[0,1]^{\mathcal{G}\times\mathcal{G}}$ contains the connection probabilities. Set $C_{n,ij}=C_{n,ji}$ for $i>j$ and $C_{n,ii}=0$ . We suppose that all these random variables are measurable with respect to $\mathcal{F}^{n}_{0}$ .
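The network generating process just described can be sampled in a few lines. This is a minimal sketch under assumptions not fixed by the text: groups are labelled $0,\dots,\mathcal{G}-1$, the group distribution `group_probs` and the seed are free parameters, and the function name `sample_sbm` is hypothetical.

```python
import numpy as np

def sample_sbm(n, group_probs, Q, seed=0):
    """Sample group labels g(1..n) and a symmetric adjacency matrix C of a block model.

    Q[a, b] is the connection probability between a vertex in group a and one in group b.
    """
    rng = np.random.default_rng(seed)
    Q = np.asarray(Q, dtype=float)
    groups = rng.choice(len(group_probs), size=n, p=group_probs)  # i.i.d. group assignments
    probs = Q[groups[:, None], groups[None, :]]                   # P(C_ij = 1 | g(i), g(j))
    draws = rng.random((n, n)) < probs                            # independent Bernoulli draws
    lower = np.tril(draws, k=-1)                                  # keep only entries with i > j
    C = (lower | lower.T).astype(int)                             # C_ij = C_ji, C_ii = 0
    return groups, C

if __name__ == "__main__":
    groups, C = sample_sbm(n=8, group_probs=[0.5, 0.5], Q=[[0.8, 0.1], [0.1, 0.6]])
    print(groups)
    print(C)
```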

The model for the processes $N_{n,ij}$ is as follows. We assume that the decision of a pair $(i,j)$ with $C_{n,ij}=1$ to use the communication tool is influenced by how many neighbouring communication connections are established, in the sense that the pair is more likely to use the tool if many others use it as well. In addition, we assume that it takes some time to process information, such that if a pair $(i,j)$ starts using the tool at time $t$, pair $(j,k)$ will not be influenced by it before time $t+\delta_{ij,jk}$ (let $\delta_{ij,jk}=\infty$ if $C_{n,ij}=0$ or $C_{n,jk}=0$ ). We allow that some pairs process information faster than others but we do not allow chains of arbitrarily fast communication, i.e., we suppose there is $\delta_{0,n}>0$ and $m_{0}\in\mathbb{N}$ such that

[TABLE]

Let $U_{n,ij}\geq 0$ denote pair $(i,j)$ ’s perception of the new tool. We suppose that the $U_{n,ij}$ are independent and identically distributed among all pairs. For any pair $(i,j)\in L_{n}$ define moreover by $L_{n}(i,j):=\{(k,l):k\in\{i,j\},l\in V_{n}\}$ the set of potential neighbours of $(i,j)$ . Using these preparations, we consider the following model for the process $N_{n,ij}$ for given $\alpha_{0},\theta_{0}>0$

[TABLE]

For simplicity of exposition, we choose here a model without covariates and consider only the process $(N_{n,ij},C_{n,ij})_{(i,j)\in L_{n}}$ . Since the group assignments and the initial perceptions $U_{n,ij}$ are iid, the process $(N_{n,ij},C_{n,ij})_{(i,j)\in L_{n}}$ fulfils the exchangeability property (2.1).

Let $\mathcal{F}^{n}_{t}$ denote the canonical filtration with respect to which all $N_{n,ij}$ are adapted and $C_{n,ij}$ and $g(1),...,g(n)$ are measurable with respect to $\mathcal{F}_{0}^{n}$ . Definition 2.5 reads in this situation as follows

[TABLE]

Note firstly that $C_{n,ij}$ is measurable with respect to $\mathcal{F}_{t_{0}}^{n}$ for all $t_{0}$ and thus may be treated as a constant. In order to see that the above holds for $m=m_{0}$ and $6\delta_{n}<\delta_{0,n}$ we use the following notation: A sequence of pairs $P=(p_{a})_{a=1}^{M}\subseteq L_{n}$ is called a path from $(i,j)$ to $(k,l)$ if $p_{1}=(i,j)$ , $p_{M}=(k,l)$ and $p_{a}$ and $p_{a+1}$ share at least one vertex. For such a path we denote by $\delta(P):=\sum_{a=1}^{M-1}\delta_{p_{a},p_{a+1}}$ . Let $t_{0}\in[0,T],J\subseteq L_{n}$ be arbitrary and let $(k,l)\in J$ . Let, moreover, $(i,j)\in L_{n}$ be given with $d((i,j),(k,l))\geq m_{0}$ and let $t\in[t_{0},t_{0}+6\delta_{n}]$ . By construction it is clear that $N_{n,kl}(t)$ depends only on those events of $N_{n,ij}$ which happened before time (the $\inf$ is taken over all paths from $(k,l)$ to $(i,j)$ )

[TABLE]

since $6\delta_{n}\leq\delta_{0,n}$ . Information about these is available in $\mathcal{F}^{n}_{t_{0}}$ . Hence, the events of the processes $N_{n,ij}\mathbbm{1}(d((i,j),J)\geq m_{0})$ on $[t_{0},t_{0}+6\delta_{n}]$ are non-influential to $N_{n,kl}$ for $(k,l)\in J$ on $[t_{0},t_{0}+6\delta_{n}]$ provided that $\mathcal{F}_{t_{0}}^{n}$ is known. Therefore momentary- $m$ -dependence holds.

2.5.3 Example: Mixing

In this section we will show that a simplified version of the process described in Section 2.5.2 is exchangeable and $\beta$ -mixing (see also Remark 2.20 at the end of this section). Let $G_{0}$ be a 2-dimensional discrete torus with a suitable number of $n$ vertices, i.e., the network has grid structure as in Figure 2 but the vertices on the left and on the right are identified, as well as the vertices on the bottom and the top. The random network $G_{n}$ is obtained by randomly assigning labels to the vertices of $G_{0}$ . As before $C_{n}$ denotes the adjacency matrix of $G_{n}$ . We consider processes $N_{n,ij}(t)=C_{n,ij}\mathbbm{1}(A_{n,ij}(t)\geq\theta_{0})$ where $A_{n,ij}(t)$ is a stochastic process which we specify now. Let $\varphi:\{1,...,r_{n}\}\to L_{n}$ be an arbitrary enumeration of the pairs of vertices $L_{n}$ and let $A(t):=(A_{n,\varphi(x)}(t))_{x=1,...,r_{n}}$ . Denote by $\widetilde{C}\in\{0,1\}^{r_{n}\times r_{n}}$ the random matrix with $\widetilde{C}_{x,y}=1$ if and only if $C_{n,\varphi(x)}=C_{n,\varphi(y)}=1$ and the pairs $\varphi(x)$ and $\varphi(y)$ share exactly one vertex. Set $\widetilde{C}_{x,x}=0$ . We suppose that $A(t)$ follows the AR-model

$$A(t)=\alpha_{0}\widetilde{C}A(t)+\varepsilon(t),$$

where $\alpha_{0}<1/6$ and $\varepsilon=(\varepsilon_{1},...,\varepsilon_{r_{n}})^{T}$ is (for simplicity) a vector of independent Brownian motions scaled by $t^{-1/2}$ for $t>0$ . Then, $\varepsilon_{x}(t)\sim N(0,1)$ for all $t$ and all $x$ . Since we assigned the vertex labels randomly, the processes $A_{n,ij}$ and thus $N_{n,ij}$ are exchangeable.

We prove now that the mixing coefficients at a given time $t$ decay exponentially fast. The $\Delta$-partition we consider is as follows: Fix a chess-board like partitioning as in Figure 2 with side-length $\Delta-1\in\mathbb{N}$ on the deterministic network $G_{0}$. The random blocks $G^{t}(k,m,\Delta)$ are formed based on the edges which lie in the corresponding square in $G_{0}$. Fix $(k,m_{1})\neq(k,m_{2})$ and let for ease of notation $I_{1}:=G^{t}(k,m_{1},\Delta)$ and $I_{2}:=G^{t}(k,m_{2},\Delta)$. The distance $d$ is defined as before. Then, $d(ij,kl)\geq\Delta$ if $(i,j)\in I_{1}$ and $(k,l)\in I_{2}$. Denote $U_{1}:=\sum_{(i,j)\in L_{n}}\left(N_{n,ij}(t)-\mathbb{E}(N_{n,ij}(t))\right)\mathbbm{1}((i,j)\in I_{1})$ and define $U_{2}$ analogously for $I_{2}$. Note that by the symmetry of the network and the choice of the $\Delta$-partition the conditional distribution of $U_{2}$ given $C_{n}$ is actually the same for all realisations of $C_{n}$. As a consequence $\mathbb{P}(U_{2}\in S_{2}|C_{n}=C_{0})=\mathbb{P}(U_{2}\in S_{2})$ for all adjacency matrices $C_{0}$ and all sets $S_{2}\subseteq\mathbb{R}$. In consideration of this, we can find the mixing coefficient $\beta(U_{1},U_{2})$ as the supremum over all partitions $(S_{1,a}),(S_{2,b})$ of $\mathbb{R}$ of (cf. Dedecker et al. [11])

[TABLE]

where $\sum_{C_{0}}$ is the sum over all adjacency matrices. On $C_{n}=C_{0}$ , the random variables $U_{1}$ and $U_{2}$ are deterministic functions of $D_{1}:=(A_{n,ij})_{(i,j)\in I_{1}}$ and $D_{2}:=(A_{n,ij})_{(i,j)\in I_{2}}$ , respectively. Thus, by Pinsker’s Inequality (e.g., Lemma 2.5 in Tsybakov [45])

[TABLE]

where $KL(\cdot,\cdot|C_{n}=C_{0})$ denotes the Kullback-Leibler divergence (conditionally on $C_{n}=C_{0}$ ) and $(\widetilde{D_{1}},\widetilde{D}_{2})$ are independent with the same marginal distributions as $(D_{1},D_{2})$ . It follows from the properties of the normal-distribution that an exponential bound on $\textrm{Cov}(A_{x}(t),A_{y}(t)|C_{n}=C_{0})$ implies a similar exponential bound on the Kullback-Leibler divergence and thus on (2.13). Details are given in Appendix 5.6. We prove now an exponential bound on the covariances for $x\neq y$ .

Note that all eigenvalues of $\alpha_{0}\widetilde{C}$ can be bounded in absolute value by $6\alpha_{0}<1$ (since every edge has exactly six neighbours). Hence, $A(t)=(I-\alpha_{0}\widetilde{C})^{-1}\varepsilon(t)$ and by the Neumann series representation

$$A(t)=\sum_{k=0}^{\infty}\alpha_{0}^{k}\widetilde{C}^{k}\varepsilon(t).$$

Thus, conditionally on $C_{n}$, all $A_{x}(t)$ are normally distributed. Recall that $\left(\widetilde{C}^{k}\right)_{z_{1},z_{2}}$ gives the number of paths from $\varphi(z_{1})$ to $\varphi(z_{2})$ of length exactly $k$. Hence, for all pairs $\varphi(z)\in L_{n}$ we must have $\left(\widetilde{C}^{k}\right)_{x,z}\left(\widetilde{C}^{r}\right)_{y,z}=0$ whenever $r+k<d(\varphi(x),\varphi(y))$ because otherwise there would be a path of length shorter than $d(\varphi(x),\varphi(y))$ which connects $\varphi(x)$ and $\varphi(y)$ via $\varphi(z)$. Moreover, $\left(\widetilde{C}^{k}\right)_{z_{1},z_{2}}\leq 6^{k}$ for all $z_{1},z_{2}\in\{1,...,r_{n}\}$. Therefore we obtain by symmetry of $\widetilde{C}$ that there is a constant $c^{*}>0$ (which depends only on $\alpha_{0}$ ) such that

[TABLE]
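The Neumann-series argument can be checked numerically on a small torus. The sketch below is an illustration under stated assumptions: the side length and $\alpha_{0}=0.1$ are arbitrary choices, and the explicit bound $\sum_{s\geq d}(s+1)(6\alpha_{0})^{s}$ is one form that follows from the counting argument above (using $\widetilde{C}^{k}\widetilde{C}^{r}=\widetilde{C}^{k+r}$ and $\mathrm{Cov}(\varepsilon(t))=I$); it is not claimed to coincide with the constant $c^{*}$. All function names are hypothetical.

```python
import numpy as np
from collections import deque

def torus_line_graph(side):
    """Matrix with entry 1 for edge pairs sharing exactly one vertex on a side x side torus."""
    edges = []
    for x in range(side):
        for y in range(side):
            edges.append(((x, y), ((x + 1) % side, y)))
            edges.append(((x, y), (x, (y + 1) % side)))
    r = len(edges)
    C = np.zeros((r, r), dtype=int)
    for a in range(r):
        for b in range(a + 1, r):
            if len(set(edges[a]) & set(edges[b])) == 1:
                C[a, b] = C[b, a] = 1
    return C

def graph_distances(C):
    """All-pairs shortest path lengths by breadth-first search on the adjacency matrix C."""
    r = C.shape[0]
    dist = np.full((r, r), -1)
    neighbours = [np.flatnonzero(C[a]) for a in range(r)]
    for s in range(r):
        dist[s, s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in neighbours[u]:
                if dist[s, v] < 0:
                    dist[s, v] = dist[s, u] + 1
                    queue.append(v)
    return dist

def neumann_partial_sum(C, alpha, terms):
    """Partial sum of sum_k (alpha * C)^k with the given number of terms."""
    r = C.shape[0]
    total = np.zeros((r, r))
    power = np.eye(r)
    for _ in range(terms):
        total += power
        power = power @ (alpha * C)
    return total

def covariance_and_bound(C, alpha):
    """Conditional covariance of A(t) and the bound sum_{s >= d} (s + 1) (6 alpha)^s."""
    r = C.shape[0]
    B = np.linalg.inv(np.eye(r) - alpha * C)   # A(t) = B eps(t)
    cov = B @ B.T                              # Cov(eps(t)) is the identity
    rho = 6 * alpha
    d = graph_distances(C)
    bound = rho ** d * ((d + 1) / (1 - rho) + rho / (1 - rho) ** 2)
    return cov, bound, d

if __name__ == "__main__":
    C = torus_line_graph(5)
    cov, bound, d = covariance_and_bound(C, alpha=0.1)
    for dist in range(1, d.max() + 1):
        print(dist, cov[d == dist].max(), bound[d == dist].max())
```

The printed maxima should decrease geometrically in the distance and stay below the bound.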

Remark 2.20.

If $\mathbb{P}(U_{2}\in S_{2}|C_{n}=C_{0})$ depends on $C_{0}$, we can write a more general version of (2.13) which requires two estimates. Firstly, the distribution of the sum over a single block may not depend too strongly on the specific network. In that sense, the main task of the $\Delta$-partition is to group pairs together such that similarly behaved blocks emerge. This is possibly also the case in the example from Section 2.5.2 if the $\Delta$-partition takes the original group structure into account. Once this holds, in a second step, it suffices to bound the conditional mixing coefficients for all fixed network realisations.

2.6 Processes Indexed by Vertices

The dependence concepts in Sections 2.3 and 2.4 have been introduced for processes $Z_{n,ij}$ which are indexed by pairs $(i,j)\in L_{n}$. The results also transfer to processes $(\widetilde{Z}_{n,i})_{i\in V_{n}}$ indexed by vertices. The results and definitions from Sections 2.3 and 2.4 can be obtained for this case by replacing all $Z_{n,ij}$ by $\widetilde{Z}_{n,i}$, all indices $(i,j)$ of pairs of vertices by vertex indices $i$, and the set $L_{n}$ by $V_{n}$. Moreover, $r_{n}$ has to be adapted accordingly.

3 Application

We apply the previously introduced dependence concepts to find the asymptotic null-distribution of an $L^{2}$ -type test statistic in the following situation. We consider a structured interaction network process $(N_{n,ij},X_{n,ij},C_{n,ij})_{(i,j)\in L_{n}}$ (cf. Definition 2.2). In the measurability assumption in Section 2.1 we consider a Cox-type link function $\lambda$ which depends on an unknown parameter function $\theta_{0}:[0,T]\to\Theta\subseteq\mathbb{R}^{q}$ (recall that $q$ is the dimension of the covariate functions $X_{n,ij}$ ), i.e., the intensity functions of the counting processes $N_{n,ij}$ are given by

[TABLE]

Examples for choices of the covariate vector $X_{n,ij}$ can be found in Butts [5], Perry and Wolfe [41] and Kreiß et al. [28]. Our interest lies in testing the hypothesis

[TABLE]

On $\textrm{H}_{0}$ , we denote the value of the constant parameter function also by $\theta_{0}$ . For setting up a test statistic, we compare a non-parametric estimator of $\theta_{0}(t)$ with a parametric estimator which assumes that $\theta_{0}(t)$ is constant. As non-parametric estimator we use the local maximum likelihood estimator $\hat{\theta}_{n}(t_{0}):=\underset{\theta\in\Theta}{\operatorname{argmin}}\,\ell_{n}(\theta;t_{0})$ as in Kreiß et al. [28] where $\ell_{n}(\theta;t_{0})$ is the localized-likelihood which is given by

[TABLE]

where $K_{h,t_{0}}(t):=\frac{1}{h}K\left(\frac{t-t_{0}}{h}\right)$ is a kernel with kernel function $K$ and bandwidth $h>0$. Note that when removing the kernel $K_{h,t_{0}}$ in (3.1) we end up with the regular likelihood $\ell_{n}(\theta)$ for the case when $\theta_{0}$ is a constant (cf. Andersen et al. [3]). Denote finally by $\overline{\theta}_{n}$ a parametric estimator for $\theta_{0}$ which assumes that the parameter function is constant (e.g. the maximum-likelihood estimator $\overline{\theta}_{n}:=\underset{\theta\in\Theta}{\operatorname{argmin}}\,\ell_{n}(\theta)$ ). Similarly to Härdle and Mammen [21], we compare the non-parametric and the parametric estimator above by means of the following test statistic

[TABLE]

where $w$ is a non-negative weight function with $\textrm{supp}\,w\subseteq[\delta,T-\delta]$ for $\delta>0$ and $\overline{p}_{n}(t_{0}):=\int_{0}^{T}K_{h,t_{0}}(s)p_{n}(s)ds$ is the smoothed version of $p_{n}(t)=\mathbb{P}(C_{n,ij}(t)=1)$ . In contrast to Härdle and Mammen [21], we know in advance that we test for a constant function. Therefore we can directly compare the parametric and non-parametric estimate and we do not require additional smoothing. For the statement of the following theorem define (note that under the following Assumption (A3, 1) the right hand side below does not depend on $(i,j)$ )

[TABLE]

with the abbreviation (on $H_{0}$ ) $\Sigma_{t}:=\Sigma(t,\theta_{0})$ . The following theorem gives the asymptotic distribution of the test statistic on the hypothesis $H_{0}$ . The proof is given in Section 5.2 in the Appendix.

Theorem 3.1.

Under the Assumptions stated in the remainder of this section, on $\textrm{H}_{0}$

[TABLE]

Note that $A_{n}$ can be approximated by using a plug-in estimator for $\Sigma$, and $B$ can be approximated by means of Lemma 5.2 in the Appendix.

In the following we firstly state an assumption and then discuss its meaning and the intuition behind it. All assumptions are formulated on $\textrm{H}_{0}$ , in particular, $\theta_{0}$ denotes the true value of the constant parameter function. We use the abbreviation

[TABLE]

such that $\lambda_{n,ij}(t,\theta_{0})$ denotes the true intensity function on $H_{0}$ .

(A1) Boundary Cut-Off

$w\colon[0,T]\to[0,\infty)$ is continuous, bounded and $\mathbbm{T}:=\textrm{supp}(w)\subseteq[\delta,T-\delta]$ for some $\delta>0$ .

(A2) Exhaustiveness of $\Theta$

There is an open and bounded set $\Theta\subseteq\mathbb{R}^{q}$ (denote the bound by $\tau$ ) such that $\theta_{0}\in\Theta$ .

Assumption (A1) allows us to ignore convergence issues of the kernel estimator at the boundary and Assumption (A2) allows us to simplify some notation. Neither assumption is very restrictive.

(A3) Modelling Assumptions

1. The conditional distribution of $(X_{n,ij}(s),N_{n,ij}(s))$ given $C_{n,ij}(s)=1$ is independent of $n$ and $(i,j)\in L_{n}$ .

2. For $\overline{p}_{n}:=\int_{0}^{T}\overline{p}_{n}(s)ds$ the estimator $\overline{\theta}_{n}$ fulfils $\left\|\overline{\theta}_{n}-\theta_{0}\right\|=O_{P}\left((r_{n}\overline{p}_{n})^{-\frac{1}{2}}\right)$ .

3. The covariates $X_{n,ij}$ are almost surely bounded by a constant $\hat{K}$ . Together with (A2) this implies that $\lambda_{n,ij}(t,\theta)$ is almost surely bounded by a constant $\Lambda$ for all $\theta\in\Theta$ .

Assumption (A3, 1) is identical to Assumption (A1) in Kreiß et al. [28]. It reflects our intuition about the asymptotics of the network: For growing networks we assume that the number of actors to whom a fixed actor has active connections remains bounded over time. In our intuition, the distribution of the covariates and events on an active edge is therefore only influenced by this group, which is not growing. In view of this, we regard Assumption (A3, 1) as not restrictive. (A3, 2) holds for example for the maximum likelihood estimator as introduced in Chapter VI.1.2. in Andersen et al. [3]. However, for our theory here, it is not required that $\overline{\theta}_{n}$ is the maximum likelihood estimator. For (A3, 3) we note that examples of covariates are the number of common friends, age difference, number of interactions in the past and so on. These quantities are naturally bounded, e.g. if we believe that interacting and maintaining friendships require time. More generally, we expect the intensity functions to be bounded if the actors have to invest time in the interactions (e.g. sending a message takes some time even though the actual event of sending is instantaneous), because in this case, at least on average, actors will not cast arbitrarily many events in a given time frame.

(A4) Kernel and Bandwidth

1. For $p_{n}:=\inf_{t\in[0,T]}p_{n}(t)$ the bandwidth $h$ fulfils $\frac{\sqrt{r_{n}p_{n}}\cdot h}{(\log r_{n})^{\frac{3}{2}}}\to\infty$ and $h(\log r_{n})^{2}\to 0$ .

2. The kernel $K:\mathbb{R}\to[0,\infty)$ is supported on $[-1,1]$ and is Hoelder continuous with exponent $\alpha_{K}$ and constant $H_{K}$ , i.e., $|K(x)-K(y)|\leq H_{K}\cdot|x-y|^{\alpha_{K}}$ . As a consequence it is bounded by a constant which we also denote by $K$ .

(A4, 1) holds for example when $h\approx(p_{n}r_{n})^{-\frac{1}{5}}$ , which is the asymptotically optimal bandwidth choice in most one-dimensional regression contexts (e.g. Tsybakov [45]), so the bandwidth conditions are standard for this type of problem. The Hoelder continuity of the kernel in (A4, 2) is a mild assumption which avoids technical problems later. It holds for most simple kernels such as the Epanechnikov or the triangular kernel.
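As a rough numerical illustration of (A4, 1), the following Python sketch evaluates the two bandwidth conditions for $h=(p_{n}r_{n})^{-1/5}$ along a growing sequence of $r_{n}$ . The constant value $p_{n}=0.1$ , the grid of values for $r_{n}$ and the function name `bandwidth_conditions` are assumptions made for illustration only and are not taken from the paper.

```python
import math


def bandwidth_conditions(r_n, p_n):
    """Return (h, growing, shrinking) for the bandwidth h = (p_n * r_n)^(-1/5).

    growing   = sqrt(r_n * p_n) * h / (log r_n)^(3/2), required to tend to infinity
    shrinking = h * (log r_n)^2, required to tend to zero
    """
    h = (p_n * r_n) ** (-1.0 / 5.0)
    growing = math.sqrt(r_n * p_n) * h / math.log(r_n) ** 1.5
    shrinking = h * math.log(r_n) ** 2
    return h, growing, shrinking


# Hypothetical setting: p_n is held fixed while r_n grows.
p_n = 0.1
grid = [10**6, 10**8, 10**10, 10**12]
results = [bandwidth_conditions(r_n, p_n) for r_n in grid]

for r_n, (h, growing, shrinking) in zip(grid, results):
    print(f"r_n={r_n:.0e}  h={h:.4f}  growing={growing:.3f}  shrinking={shrinking:.3f}")
```

On this grid the first quantity increases and the second decreases, which is in line with the two limits required in (A4, 1). A finite grid cannot establish a limit, and the decay of $h(\log r_{n})^{2}$ is slow, so this is only a plausibility check.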

(A5) Invertibility of Fisher-Information

The matrix $\Sigma_{t}=\Sigma(t,\theta_{0})$ (cf. (3.2)) is invertible for all $t\in[0,T]$ and $t\mapsto\Sigma_{t}$ is continuously differentiable. Particularly, $D:=\sup_{t\in[0,T]}\left\|\partial_{t}\Sigma_{t}\right\|<\infty$ and $t\mapsto\Sigma_{t}$ is uniformly continuous on $[0,T]$ .

In (A5) we assume that the Fisher Information is invertible. This is a classical assumption. The assumption that $t\mapsto\Sigma_{t}$ is smooth reflects our belief that the behaviour of the network is also changing smoothly over time. Note that $\Sigma_{t}$ is a conditional expectation given $C_{n,ij}(t)=1$ , i.e., changes in the network itself (appearance or disappearance of edges) do not interfere with the smoothness of $\Sigma_{t}$ .

(A6) Behaviour of $p_{n}(t)$

The quotient $\frac{\max_{s\in[0,T]}p_{n}(s)}{\min_{s\in[0,T]}p_{n}(s)}$ is bounded in $n\in\mathbb{N}$ and the function $p_{n}(t)$ is Hoelder continuous with fixed exponent $\alpha_{c}$ but the constant $H_{n,c}$ may vary like a power of $n$ .

In this assumption we require that $p_{n}(t)$ lies for a given $n$ always on the same scale. The convergence rate of the non-parametric estimator at a given point in time $t$ depends on $r_{n}p_{n}(t)$ . Hence, we actually assume here that the non-parametric estimator has the same rate at all points in time. Note, however, that $p_{n}=\min_{s\in[0,T]}p_{n}(s)\to 0$ is still allowed.

Before we can present the assumptions on the weak dependence structure, we introduce the concept of hubs. Informally speaking, a hub is a pair $(i,j)$ which is close to many other pairs.

Definition 3.2.

Let $m>0$ , $F\in\mathbb{N}$ and $[a,b]\subseteq[0,T]$ . For a subset of pairs $A\subseteq L_{n}$ we let

[TABLE]

be the maximal number of active edges being close to pairs in $A$ . A pair $(i,j)\in L_{n}$ is called a hub on $[a,b]$ if $K_{m}^{(i,j)}(a,b)\geq F$ .

Consider a collection $[a_{t},b_{t}]\subseteq[0,T]$ for $t\in[0,T]$ . Every random variable $H_{UB}^{A}\in\{0,1\}$ with

[TABLE]

is called hub-ability of the set $A$ . By $N_{UB}:=\sum_{(i,j)\in L_{n}}H_{UB}^{ij}$ we denote an upper bound on the number of possible hubs in the networks $G_{n,t}$ .

The definitions of $H_{UB}^{A}$ and $N_{UB}$ depend on the choice of $([a_{t},b_{t}])_{t\in[0,T]}$ . In order to avoid cluttering the notation, we do not indicate this dependence. Note that $K_{m}^{L_{n}}(a,b)$ denotes the size of the largest hub on $[a,b]$ . We think about hubs in the following way: Consider a social media setting where every edge represents the connection between two people. Golder et al. [18] and Huberman et al. [22] argue that in social media most of the friendships between users are actually inactive in the sense that the users do not exchange messages. This supports the plausible idea that every actor has close contact with only a bounded number of people. Having close contact means in our formulas that their distance is less than $m$ . That means that most people interact with not more than, say, $F$ people, regardless of the size of the network. Thus, if one edge exceeds the threshold of $F$ , we call it a hub. In the following assumptions (H1) and (H2) we have to balance the size and frequency of hubs. This is necessary because if there were one pair which strongly influences the entire network, inference would be impossible.

(H1) Hub Predictability

For some $m\in\mathbb{N}$ and for $a_{t}:=t-4h,b_{t}:=t+2h$ , the random variables $K_{m}^{L_{n}}:=\sup_{t\in[0,T]}K_{m}^{L_{n}}(a_{t},b_{t})$ and $H_{UB}^{ij}$ are measurable with respect to $\mathcal{F}_{0}^{n}$ . As a consequence also $N_{UB}$ is measurable with respect to $\mathcal{F}_{0}^{n}$ as well as $\mathcal{C}_{n,ij}:=F+3H_{UB}^{ij}\left(K_{m}^{L_{n}}\right)^{2}$ .

In this assumption we require that whether a pair of vertices has the potential to become a hub is determined at the beginning of the observation period. Note that this does not require that every potential hub is a hub from the beginning. A pair can be close to few others in the beginning and then become a hub later. In addition, the maximal size of the hubs over time is assumed to be determined at the beginning as well (however, it might be unknown). This latter assumption might be relaxed to an exponential growth condition.

(H2) Hub size restriction

Let $m\in\mathbb{N}$ be as in (H1). The frequency of hubs is restricted to the following constraints:

[TABLE]

The following assumptions refer to the dependence types we reviewed and introduced in Sections 2.2-2.4. For a discussion of them we refer to the respective section.

(D1) Momentary- $m$ -Dependence

Let $m>0$ be as in (H1). The processes are momentarily- $m$ -dependent in the sense of Definition 2.5. Moreover, the conditions (2.4)-(2.8) of Theorem 2.11 are fulfilled for

[TABLE]

where $u_{1},u_{2}\in L_{n}$ , $J\subseteq L_{n}$ ( $d_{t}^{n}(u,\emptyset)=\infty$ ) and

[TABLE]

and $\varphi_{n,u_{1}u_{2}}(t,r):=\widetilde{\varphi}_{n,u_{1}u_{2}}^{\emptyset}(t,r)$ .

Proving the conditions (2.4)-(2.8) is very tedious. Therefore, we assume here that they hold and provide in Appendix 5.4 a list of technical but easy to believe assumptions under which they can be proven.

(D2) Asymptotic Uncorrelation

It holds that

[TABLE]

(D3) Mixing

For any $a>0$ , $n\in\mathbb{N}$ and $t\in[0,T]$ there is a $\Delta_{n}$ -partition with $\Delta_{n}=a\log n$ and $\mathcal{K}$ many types such that for all $(i,j)\in L_{n}$

[TABLE]

and $I_{n,ij}^{k,m,t}(\Delta_{n})$ is measurable with respect to $\mathcal{F}^{n}_{t-h}$ . Define

[TABLE]

where $L$ is either the kernel $K$ or $\widetilde{K}(u)=\frac{1}{2}\mathbbm{1}(u\in[-2,0])$ and $L_{h,t}$ is defined analogously to $K_{h,t}$ . Suppose that for $p_{n}^{*}:=\sup_{t\in[0,T]}p_{n}(t)$ there is $c_{3}>0$ such that for

[TABLE]

it holds that $\sup_{t\in[0,T]}\mathbb{P}\left(\Gamma_{n}^{t}=0\right)$ vanishes exponentially fast. Also we suppose that there is a constant $c_{2}>0$ such that for all $t\in[0,T]$ and either choice of $L$

[TABLE]

Let $\beta_{t}$ denote the mixing coefficients as in Definition 2.13 of

[TABLE]

We suppose that $\beta_{t}(\Delta)\leq\alpha_{1}\exp(-\alpha_{2}\Delta)$ for some $\alpha_{1},\alpha_{2}>0$ . Let

[TABLE]

for any choice $r_{1},r_{2}\in\{1,...,q\}$ . Consider for each $t\in[\delta,T-\delta]$ and each $\theta\in\Theta$ either

[TABLE]

Suppose that for either choice, there is $C>0$ such that for pairwise different vertices $i,j,k,l\in V_{n}$ , all $n\in\mathbb{N}$ and all $t\in[0,T]$ it holds that (use $L=K$ in the definition of $q_{n}^{L}$ for (3.9)-(3.11) and $L=\widetilde{K}$ for (3.12))

[TABLE]

The assumptions have been mostly discussed in Sections 2.2-2.4. However, we would like to comment on $\Gamma_{n}^{t}$ and the measurability in Assumption (D3). The way $\Gamma_{n}^{t}$ is used ensures that the mixing property is only required if the partitioning of the network is reasonable. However, we also assume that the probability that the partitioning is reasonable is large. The inequality in (3.8) means that we assume that the percentage of the edges which are on average contained in the blocks of type $k$ is never negligible, i.e., that no block type is obsolete, which is a plausible assumption. We also tacitly assume that the number of block types $\mathcal{K}$ is the same for all time points and does not change with $n$ . This assumption reflects the idea that the network geometry stays the same while the network size increases. The measurability assumption is required because of Lemma 5.22 in the Appendix. In the proof of the lemma we see that measurability is essential because we have to apply martingale results. In practice this means that the $\sigma$ -field $\mathcal{F}_{t-h}^{n}$ contains the information about which pairs $(i,j)$ that are inactive at time $t-h$ (i.e., $C_{n,ij}(t-h)=0$ ) may become active in the interval $[t-h,t+h]$ (i.e., $\sup_{r\in[t-h,t+h]}C_{n,ij}(r)=1$ ). Since the condition at the beginning of (D3) is stated with $\geq$ , this information is not required exactly: It is no problem if the partitioning contains a few pairs too many. When adding this information to the filtration we assume that the intensity process remains unaffected. This is plausible because we only add information about the future connectivity (not activity) of pairs which are currently known to be inactive (so they are known not to cast events among each other currently, regardless of their future behaviour).

Denote for the next assumption

[TABLE]

The following assumptions look cumbersome and difficult to check. However, we ask the reader to keep in mind that Assumptions (AD, 3.14, 3.15) are moment conditions which merely require polynomial growth (but do not specify the exponent). Moreover, in (AD, 3.15) the integral is over an interval of length $2h$ , so it is to be expected that this integral is small.

(AD) Additional Dependence

For any given $k_{0}\in\mathbb{N}$ we can choose $\xi>0$ such that

[TABLE]

For the next assumptions we use the notation $d|M_{n,ij}|(s):=dN_{n,ij}(s)+\lambda_{n,ij}(s)ds$ . There is $\kappa>0$ such that for all $\xi>1$ , $(i,j)\in L_{n}$ it holds that

[TABLE]

Additionally suppose that

[TABLE]

Assumption (AD, 3.13) requires the network to concentrate around its expected size. It could be proven at the expense of other technical assumptions. In order to prove (3.13) we need an exponential inequality for averages of counting processes. Such an inequality can be shown by employing $\beta$ -mixing as in the proof of Lemma 2.15. However, instead of using the Bernstein inequality (see e.g. Proposition 5.25) we need a tail bound valid for independent sums of counting processes with bounded intensity functions. For our purposes it is sufficient to use a tail bound obtained from Chebyshev’s Inequality in its exponential form. The remaining assumptions (AD, 3.14)-(AD, 3.16) are moment growth conditions. Overall they appear to be weak because we only require that the moments do not grow super-polynomially. The main reason why we need these assumptions is that the martingales cannot be computed under the conditional probability.
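For orientation, the exponential form of Chebyshev’s Inequality states that $\mathbb{P}(X\geq x)\leq e^{-\lambda x}\mathbb{E}\left(e^{\lambda X}\right)$ for every $\lambda>0$ . The following Python sketch evaluates this bound in the simplest special case of a single Poisson count, which is an assumption made here for illustration; the setting above concerns sums of counting processes with bounded intensity functions. For $X\sim\textrm{Poisson}(\mu)$ and $x>\mu$ , optimising over $\lambda$ gives the bound $e^{-\mu}(e\mu/x)^{x}$ . The values of `mu` and `x` and the function names are hypothetical.

```python
import math


def poisson_tail_exact(mu, x):
    """Exact tail probability P(X >= x) for X ~ Poisson(mu) and an integer x >= 0."""
    below = sum(math.exp(-mu) * mu**k / math.factorial(k) for k in range(x))
    return 1.0 - below


def poisson_tail_exponential_bound(mu, x):
    """Exponential Chebyshev bound exp(-lambda*x) * E[exp(lambda*X)],
    evaluated at the minimiser lambda = log(x / mu); requires x > mu."""
    return math.exp(-mu) * (math.e * mu / x) ** x


# Hypothetical values: mean count 5, threshold 15.
mu, x = 5.0, 15
exact = poisson_tail_exact(mu, x)
bound = poisson_tail_exponential_bound(mu, x)
print(f"exact tail = {exact:.3e}, exponential bound = {bound:.3e}")
```

The bound is not sharp, but it decays exponentially in the threshold, which is the property used in arguments of this type.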

(AC) Additional Continuity

Recall the definition of $H_{n,ij}(s,\theta_{0})$ in (D3). For every choice of entries $r_{1},r_{2}\in\{1,...,q\}$ there is $k>0$ such that

[TABLE]

Instead of posing specific assumptions on the covariate processes $X_{n,ij}$ , we choose to state the continuity which is required in the proofs directly. Assumption (AC, 3.18) could be replaced by assuming that the conditional expectation function $\mathbb{E}(X_{n,ij}(t)|C_{n,ij}(t)=1)$ is uniformly continuous. For Assumption (AC, 3.17) we could for example assume that the sample paths of the covariate processes are continuous and that the number of edges which change their status in a small time interval is very small.

4 Bike Data Illustration

In this section we apply the test from Section 3 to bike-sharing data from Washington D.C. The data is publicly available at https://www.capitalbikeshare.com/system-data. For this small application we use the same setting as in Section 3.2 in Kreiß et al. [28]. In particular we consider the $527$ bike stations as vertices. The bike stations $i$ and $j$ interact whenever there is a bike ride from station $i$ to station $j$ . In contrast to Kreiß et al. [28] we consider only bike rides which happened on May 5th, 2018 between 5am and 10pm. We consider a short time span because for a longer time span it would be obvious that the parameter function is not constant (e.g. on weekends and weekdays). Without additional detailed information about short term effects (e.g. street closures due to accidents or increased biking due to festivals), it is difficult to observe the true dynamic network. We therefore use a non-dynamic conservative network as described in Section 2.1 in Kreiß et al. [28]: We consider two bike stations $i$ and $j$ connected by an edge if they are regularly used, by which we mean that there were at least ten bike rides from $i$ to $j$ in April 2018 (that is, more than two rides per week). The true (but unobserved) time-varying network is assumed to contain more edges than the conservative network at each time. But the above methodology could also be applied if we had a dynamic conservative network. Note finally that this small data application serves just as an illustration and is not meant to be an in-depth analysis of bike data, which would in particular include a sensitivity analysis of the threshold for the network construction.
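The construction of the conservative network can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code used for the analysis: the input format (a list of `(start_station, end_station)` pairs for the rides of the reference month), the function name `conservative_edges` and the toy station labels are all hypothetical.

```python
from collections import Counter


def conservative_edges(rides, min_rides=10):
    """Return the set of directed pairs (i, j) with at least `min_rides` rides from i to j.

    `rides` is an iterable of (start_station, end_station) tuples, assumed to
    contain the rides of the reference month (April 2018 in the text).
    """
    counts = Counter(rides)
    return {pair for pair, count in counts.items() if count >= min_rides}


# Toy data with hypothetical station labels "A", "B", "C".
rides = [("A", "B")] * 12 + [("B", "A")] * 3 + [("A", "C")] * 10
edges = conservative_edges(rides)
print(sorted(edges))
```

In this toy example the pair from "B" to "A" is dropped because it has only three rides, while the two pairs with at least ten rides are kept. Changing `min_rides` is the kind of sensitivity check mentioned above.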

As covariates, we choose for this application the geographical distance between the bike stations. Let $d_{i,j}$ denote the logarithm of the distance (in minutes of bike time) between bike station $i$ and $j$ . Then, we consider the following covariate vector

[TABLE]

We suppose that the weak dependence concepts which we introduced in Sections 2.2-2.4 are applicable here because the bike stations have an underlying geographic structure and it is very plausible that bike connections which are geographically far apart can be treated more or less independently. Therefore, we use a distance function which is related to the geographical distance (recall that we do not need the actual values of the distance function to apply the technique). To be more specific: If we observe the bike rides between two bike stations $i$ and $j$ during a short time-period $[t_{0},t_{1}]$ , we can make inference about other bike rides between other stations $k$ and $l$ in the same time period $[t_{0},t_{1}]$ only if these stations lie geographically close to each other. If there is e.g. a sudden traffic incident which affects the bike rides between $i$ and $j$ , it is likely that the bike rides between $k$ and $l$ are also affected if they lie in the same area. However, if they lie in a different part of the city, there is no influence. Therefore, we assume that asymptotic uncorrelation and $\beta$ -mixing are plausible assumptions. In order for the assumption of momentary- $m$ -dependence to hold we need to assume that global events which affect the entire city, like big sport events, are included in the filtration, as we condition on it. As a consequence, special events should be included in the intensity function too. Since we restrict the data example to one day (May 5th, 2018) this is a plausible assumption too.

The bandwidth choice for the non-parametric estimator as defined above (3.1) is carried out in the same way as in Kreiß et al. [28] and for details about the procedure we refer to that paper: We first compute, for different bandwidths $h$ , a prediction of the bike rides per edge by using a locally-linear estimator with a one-sided kernel. The resulting prediction error is shown in Figure 3 for a discrete grid of choices of $h$ (we chose the grid for computational simplicity). It can be seen that the prediction error starts to flatten out roughly at $h=1$ and is minimal for $h=1.1$ . So we take that bandwidth and transfer it to the case of a regular kernel estimator by dividing by $\rho\approx 1.82$ (see Kreiß et al. [28] for details). Hence, the bandwidth we use is $h\approx 0.604$ . The non-parametric and parametric estimates are shown in Figure 4(a). In this scenario the centred and scaled test statistic yields a value above $16$ . At least asymptotically, we consider the centred and scaled test statistic to be $N(0,1)$ distributed if the underlying data generating process has indeed a constant parameter function. So from this point of view, we have provided evidence that the model with the time-varying parameter function fits the data better. When looking at Figure 4(a) this result is at least intuitively not surprising. If we focus on a shorter time period, e.g. 4pm to 8pm, the result is not as extreme. In Figure 4(b) the corresponding estimators are shown. In this case the scaled and centred test statistic is about $-0.79$ , which results in a p-value of about $0.43$ ; this is usually not regarded as significant.
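The arithmetic in this paragraph can be retraced with a few lines of Python. The sketch below assumes that the reported p-value is two-sided with respect to the $N(0,1)$ limit, an assumption which reproduces the stated value of about $0.43$ ; the function name `two_sided_p_value` is hypothetical.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value P(|Z| >= |z|) for Z ~ N(0, 1), computed via erfc for accuracy."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Bandwidth transfer: one-sided bandwidth 1.1 divided by rho (approximately 1.82).
h_one_sided = 1.1
rho = 1.82
h = h_one_sided / rho

# Test statistics reported in the text: above 16 (full day) and about -0.79 (4pm to 8pm).
p_full_day = two_sided_p_value(16.0)
p_afternoon = two_sided_p_value(-0.79)

print(f"h = {h:.3f}")
print(f"p-value, full day (statistic 16): {p_full_day:.2e}")
print(f"p-value, 4pm to 8pm (statistic -0.79): {p_afternoon:.2f}")
```

Since the full-day statistic is only reported as being above $16$ , the corresponding number is an upper bound for the p-value under the same assumption.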

5 Appendix

5.1 Proofs of Section 2

Proof of Theorem 2.11.

The idea of the proof is to translate the convergence statement about $\varphi_{n,u_{1}u_{2}}$ to statements about $\widetilde{\varphi}_{n,u_{1}u_{2}}^{J}$ . This will be useful because the latter are partially predictable with respect to the short sighted filtration. Since we have certain processes which are martingales with respect to the short sighted filtration (cf. Lemma 2.7) we can make use of martingale properties of the Itô Integral. For the first step, we see that the asymptotic behaviour of (2.3) is the same as the sum over the leave- $m$ -out approximations, i.e.,

[TABLE]

and (5.1) converges to zero by (2.4). Hence, we only have to study (5.2). $\widetilde{\varphi}_{n,u_{1}u_{2}}^{u_{1}u_{2}}(t,r)$ is partially-predictable with respect to the filtration $\mathcal{F}^{n,u_{1}u_{2},m}_{t}$ and, by the assumption of momentary $m$ -dependence (cf. Definition 2.5 and Lemma 2.7), $M_{n,ij}$ is a martingale with respect to $\mathcal{F}^{n,J,m}_{t}$ for all $J\subseteq L_{n}$ with $(i,j)\in J$ . We will use this observation in order to prove that (5.2) converges to zero in probability by applying Markov’s Inequality:

[TABLE]

We will treat the terms (5.3)-(5.5) separately. Note, that in contrast to (2.3), all of the above expressions contain only the approximations with their predictability property. We will show in the following how this is useful.

(5.3) converges to zero by (2.5).

In order to see that (5.4) converges to zero, we note firstly that the two stochastic integrals in (5.4) (with respect to $M_{n,u_{1}}(t)$ and $M_{n,u_{3}}(t)$ ) are martingales with respect to the correct leave- $m$ -out filtrations (namely $\mathcal{F}_{t}^{n,u_{1},m}$ and $\mathcal{F}_{t}^{n,u_{3},m}$ , respectively). Although these two filtrations are in general not the same, we can make use of the fact that the leave- $m$ -out filtrations allow future knowledge. Define furthermore for Lebesgue sets $A\subseteq\mathbb{R}$

[TABLE]

Note that $M_{n,u_{3}}$ and $M_{n,u_{4}}$ are adapted with respect to all leave- $m$ -out filtrations. Since $\widetilde{\varphi}_{n,u_{3}u_{4}}^{u_{1}u_{2}u_{3}u_{4}}(t,r)$ is partially-predictable with respect to $\mathcal{F}_{t}^{n,u_{1}u_{2}u_{3}u_{4},m}$ , we get that

[TABLE]

is predictable (cf. Definition 2.10) and as a consequence, $t\mapsto J_{[0,t)}(u_{1},u_{2},u_{3},u_{4})$ is predictable as well with respect to $\mathcal{F}_{t}^{n,u_{1}u_{2}u_{3}u_{4},m}$ .

With these definitions we have (if $\alpha>\beta$ we define $(\alpha,\beta]:=\emptyset$ )

[TABLE]

We show that this is $o(1)$ by considering the three lines separately. Recall for this purpose that $F_{u_{1}}(t)=\{u_{2}\in L_{n}:d_{t}^{n}(u_{2},u_{1})\geq m\}$ is the set of pairs which have distance at least $m$ from $u_{1}$ at time $t$ .

For (5.6), we prove firstly that for each $q\in[t-2\delta_{n},t]$

[TABLE]

is measurable with respect to $\mathcal{F}_{q}^{n,u_{1},m}$ . This follows from the following intermediate results:

1. The integrators $M_{n,u_{3}}$ and $M_{n,u_{4}}$ in $J_{[t,t+2\delta_{n}]}(u_{1},u_{2},u_{3},u_{4})$ are only considered up to time at most $t+2\delta_{n}$ and $\mathcal{F}_{q}^{n,u_{1},m}$ contains information up to and including time $q+6\delta_{n}\geq t+4\delta_{n}$ for processes which are at time $q$ at least of distance $m$ to $u_{1}$ .

2. We show that $\int_{a-2\delta_{n}}^{a-}\widetilde{\varphi}_{n,u_{3}u_{4}}^{u_{1}u_{2}u_{3}u_{4}}(a,r)dM_{n,u_{2}}(r)\mathbbm{1}(u_{3},u_{4}\in F_{u_{1}}(q))$ is measurable with respect to $\mathcal{F}_{q}^{n,u_{1},m}$ for all $a\in[t,t+2\delta_{n}]$ :

(a) $\widetilde{\varphi}_{n,u_{3}u_{4}}^{u_{1}u_{2}u_{3}u_{4}}(a,r)$ is partially-predictable with respect to $\widetilde{\mathcal{F}}_{u_{3}u_{4},a}^{n,u_{1}u_{2}u_{3}u_{4},m}$ by assumption. In particular, it is measurable with respect to $\widetilde{\mathcal{F}}_{u_{3}u_{4},t+2\delta_{n}}^{n,u_{1}u_{2}u_{3}u_{4},m}$ for all $r<a\leq t+2\delta_{n}$ . Thus $\widetilde{\varphi}_{n,u_{3}u_{4}}^{u_{1}u_{2}u_{3}u_{4}}(a,r)\mathbbm{1}(u_{3},u_{4}\in F_{u_{1}}(q))$ requires two types of information: One on $X_{u_{3}}$ and $X_{u_{4}}$ up to time $t+2\delta_{n}$ , and another type of information about the future (after $t+2\delta_{n}$ ) on all processes which are away from $u_{1}u_{2}u_{3}u_{4}$ . Both are contained in $\mathcal{F}_{q}^{n,u_{1},m}$ as we shall show in the following two steps.

(b) The information about $X_{u_{3}}(\tau)$ and $X_{u_{4}}(\tau)$ for $\tau\leq t+2\delta_{n}$ is included in $\mathcal{F}_{q}^{n,u_{1},m}$ by the same arguments as in 1.

(c) Let $s\leq t-2\delta_{n}$ and $r\leq s+6\delta_{n}$ , then

[TABLE]

is measurable with respect to $\mathcal{F}_{q}^{n,u_{1},m}$ because $s\leq t-2\delta_{n}\leq q$ .

Together the above points imply that $J_{[t,t+2\delta_{n}]}(u_{1},u_{2},u_{3},u_{4})\mathbbm{1}(u_{3},u_{4}\in F_{u_{1}}(t-2\delta_{n}))$ is predictable with respect to $\mathcal{F}_{t}^{n,u_{1},m}$ . Moreover, $M_{n,u_{1}}$ is a martingale with respect to $\mathcal{F}_{t}^{n,u_{1},m}$ by momentary $m$ -dependence. Hence,

[TABLE]

The last part is $o(1)$ by (2.6).

In (5.7), we see that $J_{[0,t)}(u_{1},u_{2},u_{3},u_{4})$ is predictable with respect to $\mathcal{F}_{t}^{n,u_{1},m}\supseteq\mathcal{F}_{t}^{n,u_{1}u_{2}u_{3}u_{4},m}$ . Thus, using that $M_{n,u_{1}}$ is a martingale with respect to $\mathcal{F}_{t}^{n,u_{1},m}$ (with arguments analogous to those in the first case), we conclude that (5.7) equals zero.

For (5.8), we note firstly that

[TABLE]

Now, we can play a similar game: This time, $M_{n,u_{3}}$ is a martingale with respect to $\mathcal{F}_{\xi}^{n,u_{3},m}$ . Furthermore, $I_{[0,\xi-2\delta_{n})}(u_{1},u_{2},u_{3},u_{4})$ requires knowledge of $M_{n,u_{1}}(\tau)$ , $M_{n,u_{2}}(\tau)$ , $X_{n,u_{1}}(\tau)$ and $X_{n,u_{2}}(\tau)$ for $\tau<\xi-2\delta_{n}$ which is included in $\mathcal{F}_{\xi}^{n,u_{3},m}$ as well as knowledge of $[N_{n,ij}(r),X_{n,ij}(r),C_{n,ij}(r)]\cdot\mathbbm{1}(d_{s}^{n}((i,j),\{u_{1},u_{2}\})\geq m)$ for $s\leq\xi-6\delta_{n}$ and $r\leq s+6\delta_{n}$ , i.e., $r\leq\xi$ which is again included in $\mathcal{F}_{\xi}^{n,u_{3},m}$ . Hence, $\xi\mapsto I_{[0,\xi-2\delta_{n})}(u_{1},u_{2},u_{3},u_{4})$ is predictable with respect to $\mathcal{F}_{\xi}^{n,u_{3},m}$ . Hence, the integrand of (5.8) is a martingale and we obtain that (5.8) equals zero. Thus, we have shown that (5.4) is $o(1)$ .

Finally, we consider (5.5). Therefore note firstly that $\widetilde{\varphi}_{n,u_{1}u_{2}}^{u_{1}u_{2}u_{3}u_{4}}(t,r)$ and $\widetilde{\varphi}_{n,u_{3}u_{4}}^{u_{1}u_{2}u_{3}u_{4}}(t,r)$ are both partially-predictable with respect to $\mathcal{F}_{t}^{n,u_{1}u_{2}u_{3}u_{4},m}$ . Moreover, $M_{n,u_{1}}$ , $M_{n,u_{2}}$ , $M_{n,u_{3}}$ and $M_{n,u_{4}}$ are all martingales with respect to $\mathcal{F}_{t}^{n,u_{1}u_{2}u_{3}u_{4},m}$ . Hence,

[TABLE]

is also a predictable function in $t$ and

[TABLE]

is a martingale. The same holds when $M_{n,u_{1}}$ and $M_{n,u_{2}}$ are replaced by $M_{n,u_{3}}$ and $M_{n,u_{4}}$ . Hence, for $u_{1}\neq u_{3}$

[TABLE]

For $u_{1}=u_{3}$ we will apply firstly a martingale result to compute the covariance of the two stochastic integrals (first equality below), in the second equality below we employ a similar technique as in the computations for (5.4): Note that $C_{n,u_{1}}(t)\lambda_{n,u_{1}}(t)\mathbbm{1}(u_{2},u_{4}\in F_{u_{1}}(t-2\delta_{n}))$ is measurable with respect to $\mathcal{F}_{t-2\delta_{n}}^{n,u_{2}u_{4},m}$ , additionally $M_{n,u_{2}}$ and $M_{n,u_{4}}$ are martingales with respect to $\mathcal{F}_{t}^{n,u_{2}u_{4},m}$ . Hence,

[TABLE]

So we may rewrite

[TABLE]

By (2.7) and (2.8) we conclude that (5.5) is $o(1)$ . Thus we have finally shown that (5.2) converges to zero in probability and hence the proof is complete. ∎

Proof of Lemma 2.15.

The proof of (2.11) is an immediate consequence of the following Proposition 5.1 together with the assumptions:

[TABLE]

∎

Proposition 5.1.

Let $(Z_{n,ij})_{(i,j)\in L_{n}}$ be a set of random variables which fulfils (2.10) for a given $t\in[0,T]$ . With the same notation as in Definition 2.13 assume that there is a $\Delta$ -partition such that for all $\rho\in\mathbb{N}$ with $\rho\geq 2$ and all $k\in\{1,...,\mathcal{K}\}$ and $m\in\{1,...,r_{n}\}$

[TABLE]

for some numbers $\sigma^{2},E_{k,m},E_{k}$ and $C$ with $|E|_{n}:=\sum_{k=1}^{\mathcal{K}}\sum_{m=1}^{r_{n}}E_{k,m}<+\infty$ . Then,

[TABLE]

Proof.

With the definitions as in Definition 2.13 we obtain by (2.10) that

[TABLE]

Hence,

[TABLE]

In order to reduce notation, we omit $(\Delta)$ when talking about $U_{k,m}^{n,t}(\Delta)$ . By Lemma 5.24 we can construct sequences $U_{k,m}^{*}$ as follows: We assume that the $\sigma$ -field $\mathcal{F}_{t}^{n}$ is rich enough to allow for independent extra random variables $\delta_{k,m}$ which are uniformly distributed on $[0,1]$ and which are independent amongst each other and of everything else. The construction is the same for every $k$ , so we only construct the sequence $U_{1,m}^{*}$ , all other sequences $U_{k,m}^{*}$ for $k\geq 2$ are constructed analogously. Define $U_{1,1}^{*}:=U_{1,1}$ . For $m\geq 2$ there is by Lemma 5.24 a function $f_{m}$ such that $U_{1,m}^{*}:=f_{m}(U_{1,1},...,U_{1,m-1},\delta_{1,m},U_{1,m})$ has the same distribution as $U_{1,m}$ , is independent of $U_{1,1},...,U_{1,m-1}$ and

[TABLE]

To sum it up, we have sequences $U_{k,m}^{*}$ with

1. For any $k$ and any fixed $R\in\mathbb{N}$ , $\left(U_{k,m}^{*}\right)_{m=1,...,R}$ is a sequence of independent random variables.

2. $U_{k,m}^{*}$ and $U_{k,m}$ have the same distribution.

3. For all $k=1,...,\mathcal{K}$ : $\mathbb{P}\left(\exists m\in\{1,...,r_{n}\}:U_{k,m}\neq U_{k,m}^{*}\right)\leq r_{n}\cdot\beta_{t}(\Delta)$ .

Denote by $R_{k}$ the random number of blocks $U_{k,m}$ of type $k$ which exist, i.e., such that for $m>R_{k}$ we have $U_{k,m}=0$ . So we obtain by (5.9) for any $x\geq 0$ and any sequence $(\alpha_{k})_{k=1,...,\mathcal{K}}$ with $\sum_{k=1}^{\mathcal{K}}\alpha_{k}=1$ and $\alpha_{k}\geq 0$ :

[TABLE]

For every $k$ the sequence $U_{k,m}^{*}$ is a sequence of independent random variables. Moreover, by definition $\mathbb{E}(U_{k,m})=0$ . So, the assumptions of Proposition 5.25 are fulfilled with $\sigma_{m}^{2}:=E_{k,m}\sigma^{2}$ and $c:=E_{k}C$ . So we can estimate the first part of (5.10) by

[TABLE]

We choose $\alpha_{k}=|E|_{n}^{-1}\sum_{m=1}^{r_{n}}E_{k,m}$ , which is non-negative and sums to one over $k$ , and obtain by combining (5.10) and (5.11),

[TABLE]

∎

Proof of Lemma 2.16.

Define $\varepsilon:=x\cdot\sqrt{\frac{\log|E|_{n,t}}{|E|_{n,t}}}$ . Then,

[TABLE]

Line (5.15) is part of the statement, so we just leave it as it is. For line (5.14) we have

[TABLE]

Thus line (5.14) equals zero by choice of $x$ . For line (5.13) we can make a similar argument

[TABLE]

The last expression is a false statement and hence the first line cannot be true. Thus, (5.13) equals zero. For line (5.12) we apply Lemma 2.15 to $Y_{n,ij}$ which is given in the statement of Lemma 2.16. We have

[TABLE]

In order to see that the conditions of Lemma 2.15 hold, let $\rho\in\mathbb{N}$ with $\rho\geq 2$ . We conclude for the grouping of $Y_{n,ij}$ by using the above estimate (recall that $\mathbb{E}\left(U_{k,m}^{n,t}(\Delta_{n})\right)=0$ )

[TABLE]

Moreover, by assumption

[TABLE]

Thus the first requirement of Lemma 2.15 holds for the definitions in the statement of this Lemma and $\sigma^{2}=3CM^{2}$ and $c_{1}=4M$ . The first part of the second condition in Lemma 2.15 holds by assumption and the second part holds by definition of $E_{k}^{n,t}$ . Thus, we may apply Lemma 2.15 and obtain for (5.12)

[TABLE]

This yields the statement. ∎

5.2 Proof of Theorem 3.1

Recall that $M_{n,ij}(t):=N_{n,ij}(t)-\int_{0}^{t}\lambda_{n,ij}(s)ds$ denotes the counting process martingale and decompose the likelihood as follows:

[TABLE]

Define moreover for $(i,j),(k,l)\in L_{n}$ and $s,t\in[0,T]$ (we use the convention $\Sigma_{t}^{-T}:=\left(\Sigma_{t}^{-1}\right)^{T}$ )

[TABLE]

We will also need the following functions defined for all $(i,j),(k,l)\in L_{n}$

[TABLE]

where $\int_{0}^{s-}$ denotes the integral over the set $[0,s)$ . Most technical difficulties are contained in the proofs of the following Lemmas 5.2-5.7. Their proofs are presented in Appendix 5.3.

Lemma 5.2.

It holds that

[TABLE]

and

[TABLE]

The definition of $B$ is given in Theorem 3.1.

Lemma 5.3.

For any $\varepsilon>0$

[TABLE]

Lemma 5.4.

There is a sequence $B_{n}$ with $B_{n}=O_{P}(1)$ , such that for all $t_{0}\in\mathbbm{T}$

[TABLE]

Lemma 5.5.

There is a sequence $K_{n}$ with $K_{n}=O_{P}(1)$ such that for all $\theta_{1},\theta_{2}$ and $t\in\mathbbm{T}$

[TABLE]

Lemma 5.6.

Denote by $T_{n,k}$ for $n,k\in\mathbb{N}$ the grid

[TABLE]

Then, for any $k_{0}$ there is $C>0$ such that

[TABLE]

Lemma 5.7.

Define for $k,n\in\mathbb{N}$ the grid

[TABLE]

Then, for any $k_{0}\in\mathbb{N}$ , there is $C>0$ such that

[TABLE]

The above lemmas hold under the assumptions in Theorem 3.1. Therefore, we can use all their statements in the following. We begin the proof of Theorem 3.1 by showing the following small lemmas.

Lemma 5.8.

Suppose (A4, 2) holds and that $p_{n}>0$ . Let $\alpha_{p}:=\alpha_{K}$ . Then it holds for any $t_{0},t_{1}\in[h,T-h]$ and all $n\in\mathbb{N}$ that

[TABLE]

Suppose that, in addition, (A6) holds. Then,

[TABLE]

are uniformly bounded in $n$ and $t$ .

Proof.

The proof is just a direct calculation: Note that $\overline{p}_{n}(t)\geq p_{n}$ for $t\in[h,T-h]$ and hence

[TABLE]

The second statement is now a direct consequence by noting that for $v\in[-2,2]$

[TABLE]

The right hand side is uniformly bounded under (A6). The boundedness of $\frac{p_{n}(t)}{\overline{p}_{n}(t)}$ is also a direct consequence of (A6). ∎

Lemma 5.9.

Suppose Assumption (A5) holds. There exist $M,\rho\in(0,\infty)$ such that for all $t$ and all matrices $X\in\mathbb{R}^{p\times p}$ with $\|\Sigma(t,\theta_{0})-X\|<\rho$ it holds that $X$ is invertible and $\|X^{-1}\|<M$ .

Proof.

We begin by showing that

[TABLE]

Define $\rho_{n}:=\frac{1}{n}$ and suppose the statement was wrong. Then, we find for all $n\in\mathbb{N}$ numbers $t_{n}\in[0,T]$ and matrices $X_{n}\in\mathbb{R}^{p\times p}$ such that $\|X_{n}-\Sigma(t_{n},\theta_{0})\|<\rho_{n}$ but $X_{n}$ is not invertible. Since $(t_{n})_{n\in\mathbb{N}}\subseteq[0,T]$ and $[0,T]$ is compact, there is a subsequence $(t_{n_{k}})_{k\in\mathbb{N}}$ such that $t_{n_{k}}\to t_{0}\in[0,T]$ for $k\to\infty$ . By continuity of $\Sigma(t,\theta_{0})$ in $t_{0}$ we conclude that

[TABLE]

and hence $X_{n_{k}}\to\Sigma(t_{0},\theta_{0})$ for $k\to\infty$ . Note finally that the space of non-invertible matrices is given by $\textrm{det}^{-1}(\{0\})$ . Since $\textrm{det}:\mathbb{R}^{p\times p}\to\mathbb{R}$ is continuous, the set of non-invertible matrices is closed. By construction $X_{n_{k}}$ is non-invertible and hence $\Sigma(t_{0},\theta_{0})$ is non-invertible, too. This is a clear contradiction to (A5).

In order to find $M>0$ choose $\rho$ in (5.26) such that $\rho\cdot\sup_{t\in[0,T]}\|\Sigma(t,\theta_{0})^{-1}\|\leq\frac{1}{2}$ . This is possible because inverting a matrix is a continuous operation and by continuity of $t\mapsto\Sigma(t,\theta_{0})$ as in (A5). Let now $t$ and $X$ be as in (5.26). By using the fact that the spectral-norm of a matrix is sub-multiplicative, we find

[TABLE]

Hence, $\|X^{-1}\|\leq 2\sup_{t\in[0,T]}\|\Sigma(t,\theta_{0})^{-1}\|=:M<\infty$ . ∎
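The final bound in this proof can be spelled out with a standard resolvent identity. The following is only a sketch, assuming the omitted display uses this or an equivalent argument; the shorthand $S$ and the abbreviation $\Sigma_{t}:=\Sigma(t,\theta_{0})$ are used here purely for illustration.

```latex
% Sketch: bound on \|X^{-1}\| when \|\Sigma_t - X\| < \rho and \rho S \le 1/2.
% S := \sup_{t\in[0,T]} \|\Sigma_t^{-1}\| is shorthand introduced for this sketch only.
\begin{align*}
X^{-1} - \Sigma_t^{-1} &= X^{-1}\,(\Sigma_t - X)\,\Sigma_t^{-1}, \\
\|X^{-1}\| &\le \|\Sigma_t^{-1}\| + \|X^{-1}\|\,\|\Sigma_t - X\|\,\|\Sigma_t^{-1}\|
            \le S + \rho S\,\|X^{-1}\| \le S + \tfrac{1}{2}\,\|X^{-1}\|, \\
\|X^{-1}\| &\le 2S = M .
\end{align*}
```

The identity in the first line follows by expanding the right hand side; sub-multiplicativity of the spectral norm gives the second line.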

Lemma 5.10.

Under (A3, 3), the functions $\ell_{n}(\theta,t)$ and $P_{n}(\theta,t)$ are twice differentiable with respect to $\theta$ and the derivatives can be computed by interchanging integral and differential.

Proof.

The integral with respect to $M_{n,ij}$ can be split into an integral with respect to $N_{n,ij}$ (which is a sum) and a regular Lebesgue integral. Therefore, the stochastic integration does not induce additional difficulties and we can apply standard theory for Lebesgue integration. The integrands are clearly differentiable with respect to $\theta$ . Boundedness of the covariates guarantees that the derivatives can be bounded by an integrable function (which does not depend on $\theta$ ). Then the integral and derivative may be interchanged. ∎

Lemma 5.11.

Under Assumptions (A4, 2), (A5) and (A3, 3) and $p_{n}>0$ we have that for any $(i,j),(k,l)\in L_{n}$ and any $r\in\{1,...,p\}$ the order of integration in the following integrals can be interchanged

[TABLE]

Proof.

Note that similar to the proof of Lemma 5.10, the integrals with respect to the martingales $M_{n,ij}$ can be split into two integrals. The integral with respect to the counting process is a sum and hence it is clear that the order of integration can be interchanged. For the other (Lebesgue) integrals we can apply Fubini’s Theorem: We show that the iterated integrals exist even after taking the norm within the integral. For both iterated integrals we may apply Lemma 5.9 in order to remove $\Sigma$ from the consideration. Then, the innermost integral is in both cases an integral over the kernel, the weight function $w$ and in case of the first iterated integral of $\overline{p}_{n}(t)$ . All these functions are bounded (cf Assumptions (A1), (A4, 2) and Lemma 5.8) and hence the innermost integral can just be bounded by a constant. The outer integrals are now integrals over $\left\|\partial_{\theta}\lambda_{n,i}(t,\theta_{0})\right\|$ or $\left\|\partial_{\theta}\lambda_{n,i}(t,\theta_{0})\cdot\lambda_{n,i}(t,\theta_{0})\right\|$ , both of which are integrable by Assumption (A3, 3). ∎

Lemma 5.12.

Suppose that (A2) and (A3, 3) hold true. Then, there is $\gamma_{\Sigma}:[0,T]\to(0,\infty)$ such that $\left\|\Sigma(s,\theta_{1})-\Sigma(s,\theta_{2})\right\|\leq\gamma_{\Sigma}(s)\|\theta_{1}-\theta_{2}\|$ for all $s\in[0,T]$ and all $\theta_{1},\theta_{2}\in\Theta$ , i.e., $\theta\mapsto\Sigma(t,\theta)$ is Lipschitz continuous in $\theta$ for every fixed $t$ . Additionally,

[TABLE]

Proof.

The proof is immediate since the covariates are bounded by Assumption (A3, 3) and the parameter space $\Theta$ is bounded by Assumption (A2). ∎

Lemma 5.13.

Suppose that (A1), (A3, 3), (A4), (A5) and (A6) hold. For $g_{n,ij}(s):=h^{-\frac{1}{2}}\int_{[0,s)}f_{n,ij,ij}(s,t)dM_{n,ij}(t)$ , we have for $n\to\infty$

[TABLE]

Proof.

We use the bounds from (A1), (A3, 3) and Lemma 5.9 as well as the kernel properties (A4, 2) to obtain

[TABLE]

Using this estimate we can bound (denote $C:=K\hat{K}^{2}\|w\|_{\infty}$ )

[TABLE]

The statement follows now by using the properties of $p_{n}(t)$ in (A6) and the bandwidth $h$ in (A4, 1). ∎

Lemma 5.14.

Suppose that (H2) holds. Then, we have for

[TABLE]

Proof.

Follows by applying (H2, 3.4). ∎

We continue with three more involved propositions. It is through these propositions that the dependence structures enter the proof of Theorem 3.1.

Proposition 5.15.

Under the same assumptions as in Theorem 3.1 we have

[TABLE]

Proof.

We note firstly that existence of the derivative of $\ell_{n}$ is ensured by Lemma 5.10 and we can compute the derivative by taking the derivative under the integral sign. Let $\delta_{n}:=\sqrt{\frac{\log r_{n}}{r_{n}h}}$ and recall the definition of the grid $T_{n,k}$ in (5.24). Denote by $\pi_{n,k}:[0,T]\to T_{n,k}$ the corresponding projection on $T_{n,k}$ . Then $\#T_{n,k}\leq(T+1)\cdot hn^{k}$ and $|t-\pi_{n,k}(t)|\leq\frac{1}{hn^{k}}$ . Using this projection we can estimate for $C>0$

[TABLE]

We have to prove that both (5.28) and (5.29) converge to zero. We start with (5.28). Denote therefore $g_{n,ij}(t,t_{0})=hK_{h,t_{0}}\left(t\right)\partial_{\theta}\log\lambda_{n,i}(\theta_{0},t)$ , then

[TABLE]

because $P_{n}^{\prime}(\theta_{0},t_{0})=0$ . Then we get

[TABLE]

For (5.31) we apply Lenglart’s inequality (cf. Lemma 5.26 in the Appendix) to obtain for any choice of $c^{*}>0$

[TABLE]

If we restrict to $c^{*}<C$ we obtain furthermore

[TABLE]

Since for any $x,y\geq 0$ , $|\sqrt{x}-\sqrt{y}|\leq\sqrt{|x-y|}$ , Lemma 5.8 implies that $\sqrt{\frac{1}{\overline{p}_{n}(t_{0})}}$ is Hoelder continuous with exponent $\frac{\alpha_{p}}{2}$ and constant $\sqrt{H_{n,p}}$ . Moreover, we have $\sup_{t_{0}\in\mathbbm{T}}\frac{1}{\sqrt{\overline{p}_{n}(t_{0})}}\leq\frac{1}{\sqrt{p_{n}}}$ and Hoelder continuity of the kernel $K$ by Assumption (A4, 2) (we denote the bound on the kernel also by $K$ ). Combining all these, we obtain for $|t_{0}-s_{0}|\leq hn^{-k}$

[TABLE]

since $\alpha_{p}=\alpha_{K}$ by Lemma 5.8. So we get

[TABLE]

By definition of $H_{n,p}$ in Lemma 5.8, we have $H_{n,p}p_{n}h^{\alpha_{p}}=O(1)$ , and the covariates are bounded by (A3, 3). Hence, (5.33) is small: we can choose $k=k_{0}$ such that for large enough $c^{*}$ the probability is small for all $n\in\mathbb{N}$ , and then we can choose $C$ large enough such that the whole expression is small. Then (5.28) is also small for this choice $k=k_{0}$ , which we keep fixed from now on.
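The square-root step used earlier in this argument can be verified directly. The sketch below assumes that Lemma 5.8 provides Hoelder continuity of $t_{0}\mapsto\overline{p}_{n}(t_{0})^{-1}$ with constant $H_{n,p}$ and exponent $\alpha_{p}$ ; this form is inferred from the surrounding text, since the corresponding display is not reproduced here.

```latex
% Elementary inequality, shown for x \ge y \ge 0 (the case y > x is symmetric):
\begin{align*}
(\sqrt{x}-\sqrt{y})^{2} \le (\sqrt{x}-\sqrt{y})(\sqrt{x}+\sqrt{y}) = x-y
\quad\Longrightarrow\quad |\sqrt{x}-\sqrt{y}| \le \sqrt{|x-y|}.
\end{align*}
% Assumed form of Lemma 5.8:
%   |1/\bar p_n(t_0) - 1/\bar p_n(s_0)| \le H_{n,p} |t_0 - s_0|^{\alpha_p}.
\begin{align*}
\left|\frac{1}{\sqrt{\overline{p}_{n}(t_{0})}}-\frac{1}{\sqrt{\overline{p}_{n}(s_{0})}}\right|
\le \sqrt{\left|\frac{1}{\overline{p}_{n}(t_{0})}-\frac{1}{\overline{p}_{n}(s_{0})}\right|}
\le \sqrt{H_{n,p}}\;|t_{0}-s_{0}|^{\alpha_{p}/2}.
\end{align*}
```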

Let us now turn to (5.29). Here we take the supremum over a finite set, so we can apply a union bound and Lemma 5.6 to estimate, for $C>0$ large enough,

[TABLE]

∎
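The union bound over the finite grid that closes the preceding proof has the following generic form. This is a sketch only: $Y_{n}(t)$ is a placeholder for the statistic whose supremum is taken and $x>0$ is an arbitrary threshold; neither symbol is the paper's notation.

```latex
% Y_n(t): generic placeholder statistic; x > 0: arbitrary threshold (both hypothetical).
\begin{align*}
\mathbb{P}\Big(\max_{t\in T_{n,k_{0}}} |Y_{n}(t)| > x\Big)
 &= \mathbb{P}\Big(\bigcup_{t\in T_{n,k_{0}}} \{|Y_{n}(t)| > x\}\Big)
 \le \sum_{t\in T_{n,k_{0}}} \mathbb{P}\big(|Y_{n}(t)| > x\big) \\
 &\le \#T_{n,k_{0}}\cdot \max_{t\in T_{n,k_{0}}} \mathbb{P}\big(|Y_{n}(t)| > x\big).
\end{align*}
```

The bound is useful whenever the pointwise tail probability, here controlled by Lemma 5.6, decays faster than the cardinality of the grid grows.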

Having established Proposition 5.15, we can quickly show the following result.

Lemma 5.16.

Under the same assumptions as in Theorem 3.1 we have

[TABLE]

Proof.

By Lemmas 5.4 and 5.5 we have that for any choice of $t_{0}\in\mathbbm{T}$

[TABLE]

where $B_{n},K_{n}=O_{P}(1)$ . Thus, we find by Proposition 5.15 that

[TABLE]

Hence, we can apply Kantorovich’s Theorem (cf. Theorem 5.29) for all $t_{0}\in\mathbbm{T}$ with the same choice of $B_{n},K_{n}$ and $\eta_{n}$ as above. Thus, there is $\hat{\theta}_{n}(t_{0})$ such that for all $t_{0}$

[TABLE]

∎

Corollary 5.17.

The probability of the event that $\hat{\theta}_{n}(t_{0})\in\Theta$ holds for all $t_{0}\in\mathbbm{T}$ converges to one.

Proof.

By Assumption (A2) it holds that $\theta_{0}\in\Theta$ and hence, by Lemma 5.16, all estimates $\hat{\theta}_{n}(t_{0})$ also lie in $\Theta$ with probability converging to one. ∎

Proposition 5.18.

Under the same assumptions as in Theorem 3.1 and for any choice of $\theta_{1}^{*}(t_{0}),...,\theta^{*}_{p}(t_{0})\in[\theta_{0},\hat{\theta}(t_{0})]$ (where for $a,b\in\mathbb{R}^{p}$ we denote by $[a,b]$ the connecting line between $a$ and $b$ ), define the matrix

[TABLE]

where $\partial_{\theta}^{2}\ell_{n,r\cdot}$ denotes for $r\in\{1,...,p\}$ the $r$ -th row of the second derivative of $\ell_{n}$ with respect to $\theta$ . The matrix $\ell_{n}^{*}(t_{0})$ concentrates around $\Sigma(\theta_{0},t_{0})$ (cf. (3.2)), i.e.,

[TABLE]

Furthermore, $\ell_{n}^{*}(t_{0})$ is invertible and

[TABLE]

Proof.

We begin by rewriting $\ell_{n}^{*}$ in terms of the second derivatives

[TABLE]

Since $p$ does not vary in $n$ , it is enough to consider each term in the sum on the right hand side above separately. In order to reduce notation, we do not indicate which intermediate value $\theta^{*}_{r}(t_{0})$ we consider and write simply $\theta^{*}(t_{0})$ instead. Recall the definitions of $H_{n,ij}$ , $\widetilde{H}_{n,ij}$ and $\Sigma(s,\theta)$ in (5.19), (5.20) and (3.2), respectively. It holds that $\Sigma(s,\theta)=\mathbb{E}(H_{n,ij}|C_{n,ij}(s)=1)$ . Now, we can separate the problem as follows: Recall the abbreviation $\Sigma_{t}:=\Sigma(t,\theta_{0})$

[TABLE]

We note firstly that the remaining term of the decomposition, which is not covered by (5.35)-(5.37), equals zero by definition of $\overline{p}_{n}(t_{0})$ . Moreover, after taking the $\sup$ over all $t_{0}$ , the convergence rate of line (5.36) equals $O_{P}\left(\sqrt{\frac{\log r_{n}}{r_{n}p_{n}h}}\right)$ , because of the Lipschitz continuity of $\Sigma$ in Lemma 5.12 and Lemma 5.16 (recall that $\theta^{*}(t_{0})$ is an intermediate value between $\hat{\theta}_{n}(t_{0})$ and $\theta_{0}$ in Taylor’s Formula). The expression in (5.37) can be handled by Assumption (A5) which states boundedness of $\partial_{t}\Sigma_{t}$ together with a Taylor expansion in the time parameter:

[TABLE]

where we used in the last step that the kernel is supported on $[-1,1]$ and hence $|s-t_{0}|\leq h$ . So (5.37) is of order $h$ .
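The order-$h$ claim can be illustrated by the following sketch. It assumes a non-negative kernel with $K_{h,t_{0}}(s)=h^{-1}K\left(\frac{s-t_{0}}{h}\right)$ supported on $|s-t_{0}|\leq h$ ; the constant $C_{\Sigma}$ and the weight $\omega$ are names introduced here for illustration and are not the paper's notation.

```latex
% C_\Sigma := \sup_{t\in[0,T]} \|\partial_t \Sigma_t\| < \infty (boundedness as in (A5));
% \omega \ge 0 is any weight supported on [t_0-h, t_0+h] with \int \omega(s)\,ds = 1,
% e.g. \omega(s) = K_{h,t_0}(s)\,p_n(s)/\overline{p}_n(t_0) for a non-negative kernel.
\begin{align*}
\|\Sigma_{s}-\Sigma_{t_{0}}\|
  &= \Big\|\int_{t_{0}}^{s}\partial_{t}\Sigma_{t}\,dt\Big\|
  \le C_{\Sigma}\,|s-t_{0}| \le C_{\Sigma}\,h
  \qquad \text{for } |s-t_{0}|\le h, \\
\Big\|\int \omega(s)\,\big(\Sigma_{s}-\Sigma_{t_{0}}\big)\,ds\Big\|
  &\le \int \omega(s)\,\|\Sigma_{s}-\Sigma_{t_{0}}\|\,ds \le C_{\Sigma}\,h .
\end{align*}
```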

To deal with the first expression, line (5.35), we let $\delta_{n}:=\sqrt{\frac{\log r_{n}p_{n}}{r_{n}p_{n}\cdot h}}$ and $C>0$ and denote by $T_{n,k_{0}}$ the discrete grid covering $\mathbbm{T}\times\Theta$ as defined in (5.25). We apply the same splitting technique as in (5.28) and (5.29) and obtain

[TABLE]

In order to show that (5.39) converges to zero, consider $t_{0},t_{0}^{\prime}$ and $\theta_{1},\theta_{2}$ with $|t_{0}-t_{0}^{\prime}|\leq hn^{-{k_{0}}}$ and $|\theta_{1}-\theta_{2}|\leq n^{-{k_{0}}}$ . By Lemma 5.12 and Assumption (A3, 3), $\widetilde{H}_{n,ij}(s,\theta)$ is Hoelder continuous with exponent $\alpha_{H}$ and a random, time-dependent constant $\gamma_{n,ij}(s)$ which is uniformly bounded. Thus, we get by Hoelder continuity of the kernel (Assumption (A4, 2)) and of $\overline{p}_{n}(t_{0})^{-1}$ (Lemma 5.8)

[TABLE]

which converges to zero when $k_{0}$ is chosen large enough. (5.40) converges to zero by Lemma 5.7. Thus we have shown the first part of the proposition. To prove that inversion preserves the rate, we denote $X_{n}(t_{0}):=\frac{1}{r_{n}\overline{p}_{n}(t_{0})}\ell_{n}^{*}(t_{0})$ . Since we have just shown that $X_{n}(t_{0})$ converges in probability to $\Sigma_{t_{0}}$ , we conclude by Lemma 5.9 that, with probability converging to one, $X_{n}(t_{0})$ is invertible and $\|X_{n}(t_{0})^{-1}\|\leq M$ . Thus, on this event,

[TABLE]

which concludes the proof of the proposition. ∎
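The claim that inversion preserves the rate rests on a standard identity for the difference of two inverses. The following sketch states it in the notation of this proof; whether the omitted display is written in exactly this form is an assumption.

```latex
% On the event where X_n(t_0) is invertible with \|X_n(t_0)^{-1}\| \le M:
\begin{align*}
X_{n}(t_{0})^{-1}-\Sigma_{t_{0}}^{-1}
  &= X_{n}(t_{0})^{-1}\,\big(\Sigma_{t_{0}}-X_{n}(t_{0})\big)\,\Sigma_{t_{0}}^{-1}, \\
\big\|X_{n}(t_{0})^{-1}-\Sigma_{t_{0}}^{-1}\big\|
  &\le M\cdot\big\|\Sigma_{t_{0}}^{-1}\big\|\cdot\big\|X_{n}(t_{0})-\Sigma_{t_{0}}\big\| .
\end{align*}
```

Since $\sup_{t}\|\Sigma_{t}^{-1}\|<\infty$ by Lemma 5.9, the inverse inherits the uniform rate of $X_{n}(t_{0})-\Sigma_{t_{0}}$ up to a constant factor.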

Proposition 5.19.

Under Assumption (A3, 2)

[TABLE]

Proof.

To begin with, we use the Taylor expansion from equation (5.45) which is shown there without reference to this Proposition. By using also the Cauchy-Schwarz Inequality we get for every entry $r\in\{1,...,p\}$

[TABLE]

We show now that (5.41) and (5.42) are both $o_{P}(1)$ . We begin with (5.41). Let $\varepsilon,\eta>0$ be arbitrary, then for any $C>0$

[TABLE]

In order to deal with the first part, we use Markov’s Inequality. The resulting expectation is written in terms of (5.30) and can be bounded by using the fact that the counting process martingales are uncorrelated and that everything is identically distributed. More precisely, we obtain for $h<\delta/2$ (cf. Assumption (A1))

[TABLE]

where $K$ is the bound on the kernel from (A4, 2), $\|w\|_{\infty}<\infty$ by (A1), the supremum is finite by Assumption (A3, 3) and $\int_{\delta/2}^{T-\delta/2}\frac{p_{n}(t)}{\overline{p}_{n}}dt\leq 1$ by definition. By Proposition 5.18 and the assumptions on $h$ in (A4, 1) we find that for all $C>0$ and thus in particular $C=\frac{\varepsilon\sqrt{\eta}}{\sqrt{2C^{*}}}$ it holds for $n$ large enough that

[TABLE]

Combining all previous considerations with (5.43), we obtain for $n$ large enough

[TABLE]

Since $\varepsilon,\eta>0$ were chosen arbitrarily, we have shown that (5.41) is $o_{P}(1)$ .

We continue with (5.42). This is easier to handle because $\Sigma^{-1}_{t_{0}}$ is deterministic and thus in particular predictable. It is therefore not necessary to separate first and second derivative as we did in (5.41). Let $\varepsilon>0$ be arbitrary, then we find by applying Lemma 5.11 again for $h<\delta/2$

[TABLE]

Since $h\to 0$ , this converges to zero by Assumption (A3, 3), Lemma 5.9 and using once again that $\int_{\delta/2}^{T-\delta/2}\frac{p_{n}(t)}{\overline{p}_{n}}dt\leq 1$ . Thus, (5.42) is also $o_{P}(1)$ . ∎

Proof of Theorem 3.1.

We note firstly that we may replace the estimator $\overline{\theta}_{n}$ in the test statistic with $\theta_{0}$ because for $T_{0,n}:=\int_{0}^{T}\left\|\hat{\theta}_{n}(t_{0})-\theta_{0}\right\|^{2}\overline{p}_{n}(t_{0})w(t_{0})dt_{0}$ it holds that

[TABLE]

By the Assumptions (A1), (A3, 2), Proposition 5.19 and $h\to 0$ the last two terms may be asymptotically neglected. Hence, the limiting distribution of $r_{n}h^{\frac{1}{2}}T_{n}$ can be found by studying $r_{n}h^{\frac{1}{2}}T_{0,n}$ .

By Corollary 5.17, $\hat{\theta}(t_{0})\in\Theta$ with high probability. Since $\ell_{n}$ is differentiable by Lemma 5.10, we thus have $\partial_{\theta}\ell_{n}(\hat{\theta}(t_{0}),t_{0})=0$ on this event. As we are concerned with convergence in distribution, we may restrict to this event. By a Taylor expansion there are $\theta^{*}_{r}(t_{0})$ which lie on the connecting line between $\hat{\theta}(t_{0})$ and $\theta_{0}$ such that

[TABLE]

where $\partial_{\theta}\ell_{n,r}$ is the $r$ -th component of the gradient (with respect to $\theta$ ) of $\ell_{n}$ and $\partial_{\theta}^{2}\ell_{n,r\cdot}$ denotes the $r$ -th row of the Hessian Matrix of $\ell_{n}$ with respect to $\theta$ . Define

[TABLE]

By Proposition 5.18 and (A4, 1) we have that $\ell_{n}^{*}(t_{0})$ is uniformly close to $\Sigma_{t_{0}}$ . Hence, by Lemma 5.9 we find that with probability tending to one $\ell^{*}_{n}(t_{0})$ is invertible for all $t_{0}\in[0,T]$ . Thus, (5.44) is equivalent to

[TABLE]

Using this expansion and by applying Propositions 5.15 and 5.18, we obtain (use also the properties of $w$ in (A1))

[TABLE]

Hence,

[TABLE]

where the $O_{P}$ part is $o_{P}(1)$ by Assumption (A4, 1) on the bandwidth. Thus, for the asymptotic considerations, we have to investigate only the first part. By noting that $\log x\cdot y-x\leq\log y\cdot y-y$ for all $x,y>0$ we see that $\theta\mapsto\log\lambda_{n,ij}(\theta,s)\cdot\lambda_{n,ij}(\theta_{0},s)-\lambda_{n,ij}(\theta,s)$ is maximal for $\theta=\theta_{0}$ and this holds for all $(i,j)\in L_{n}$ and all $s\in[0,T]$ . Hence, $\theta\mapsto P_{n}(\theta,t_{0})$ defined in (5.16) has a local maximum at $\theta_{0}$ . By differentiability of $P_{n}$ (cf. Lemma 5.10) we conclude that $\partial_{\theta}P_{n}(\theta_{0},t_{0})=0$ for all $t_{0}\in[0,T]$ . Using this together with the decomposition of the likelihood in (5.17) we obtain (the order of integration may be interchanged by Lemma 5.11)

[TABLE]

where $f_{n,ij,kl}$ was defined in (5.18). Note that $f_{n,ij,kl}(s,t)=f_{n,kl,ij}(t,s)$ . Then (in the second line we reorder integration by Lemma 5.11, and the third equality is not term-wise the same but for the whole sum),

[TABLE]
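The elementary inequality $\log x\cdot y-x\leq\log y\cdot y-y$ invoked earlier in this proof can be checked by a one-variable concavity argument. In the sketch below, $f$ is an auxiliary function introduced only for this verification.

```latex
% For fixed y > 0 define the auxiliary function f(x) := y \log x - x on (0,\infty).
\begin{align*}
f'(x) = \frac{y}{x}-1, \qquad f''(x) = -\frac{y}{x^{2}} < 0 .
\end{align*}
% f is strictly concave with f'(y) = 0, so x = y is its unique maximiser:
\begin{align*}
y\log x - x \;\le\; y\log y - y \qquad \text{for all } x,y>0 .
\end{align*}
```

Taking $x=\lambda_{n,ij}(\theta,s)$ and $y=\lambda_{n,ij}(\theta_{0},s)$ gives the maximality at $\theta=\theta_{0}$ used in the argument.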

We will consider lines (5.47) and (5.48) separately. We start with line (5.47) and in there, we start with the second integral: Note that the martingales $M_{n,ij}$ have jumps of height exactly one at those positions where the counting processes $N_{n,ij}$ jump (this is because we assume a continuous integrated intensity process). Hence we have

[TABLE]

and furthermore

[TABLE]
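The observation about unit jumps can be made explicit as follows. This is a sketch with a generic integrand $g$ (a placeholder, not the paper's integrand at this step) and the shorthand $\Lambda_{n,ij}(t):=\int_{0}^{t}\lambda_{n,ij}(s)ds$ for the compensator; it is not claimed to reproduce the omitted displays.

```latex
% g: generic placeholder integrand; \Lambda_{n,ij}(t) := \int_0^t \lambda_{n,ij}(s)\,ds.
\begin{align*}
\Delta M_{n,ij}(t) &= \Delta N_{n,ij}(t) \in \{0,1\}
  && \text{(since } \Lambda_{n,ij} \text{ is continuous)}, \\
[M_{n,ij}](t) &= \sum_{s\le t}\big(\Delta M_{n,ij}(s)\big)^{2} = N_{n,ij}(t), \\
\int_{0}^{T} g(t)\,d[M_{n,ij}](t) &= \int_{0}^{T} g(t)\,dN_{n,ij}(t)
  = \int_{0}^{T} g(t)\,dM_{n,ij}(t) + \int_{0}^{T} g(t)\,\lambda_{n,ij}(t)\,dt .
\end{align*}
```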

Using the above equality, we obtain

[TABLE]

The first sum is a sum of uncorrelated martingales and so it will converge to zero in probability by an application of Markov’s inequality: Denote by $g_{n,ij}(s)$ a sequence of identically distributed, predictable functions, then in general $\mathbb{E}\left(\int_{0}^{T}g_{n,ij}(s)dM_{n,ij}(s)\right)=0$ and for $(i,j),(k,l)\in L_{n}$

[TABLE]

So we get for any $\varepsilon>0$

[TABLE]

When letting $g_{n,ij}(s)=h^{-\frac{1}{2}}\int_{0}^{s-}f_{n,ij,ij}(s,t)dM_{n,ij}(t)$ , we have by Lemma 5.13 that the above converges to zero. Moreover, by definition

[TABLE]

Combining these considerations yields $(5.47)=o_{P}(1)+h^{-\frac{1}{2}}A_{n}$ .

Next we consider (5.48). Firstly, we note, using an analogue of (5.49), that the second integral in (5.48) equals zero because the two martingales $M_{n,ij}$ and $M_{n,kl}$ never jump simultaneously because $(i,j)\neq(k,l)$ . To investigate the first integral we simplify notation by using the predictable functions $\tau_{n,ij,kl}$ defined in (5.21). Then

[TABLE]

is a martingale in $T$ . We intend to show convergence to a normal distribution by using Rebolledo’s martingale central limit theorem (Theorem 5.28). To this end, we need to prove the convergence of the quadratic variation towards a deterministic quantity and that the jump parts of the process converge to zero. We start with the quadratic variation (note that $M_{n,ij}$ and $M_{n,kl}$ are uncorrelated whenever $(i,j)\neq(k,l)$ ):

[TABLE]

by Lemma 5.2. Now the jump process (the process which contains all jumps of size greater than or equal to $\varepsilon>0$ ) is given by (note that no two martingales jump at the same time)

[TABLE]

which converges to zero by Lemma 5.3. Hence, by Rebolledo’s martingale central limit theorem (see Theorem 5.28 in the Appendix)

[TABLE]

and the statement of the theorem is shown. ∎

5.3 Proof of Lemmas 5.2-5.7

Proof of Lemma 5.2.

Recall that for $u,v\in L_{n}$

[TABLE]

By substituting $y=\frac{s-t_{0}}{h}$ we obtain

[TABLE]

For ease of notation we denote

[TABLE]

Note firstly that $\widetilde{f}_{n}$ is not random and secondly that for $t<s-2h$ the above expression equals zero because we assume that the kernel is supported on $[-1,1]$ (cf. Assumption (A4, 2)). So we obtain for $\tau_{n,u,v}$ :

[TABLE]

The integral in the above display is over a vector-valued integrand. Such integrals are always understood element-wise. We begin with proving (5.23). By using the representation of $\tau_{n,u,v}$ in (5.51) we obtain

[TABLE]

In order to study the behaviour of these integrals, we write the product of the two integrals as a sum. The equation from before continues (we can interchange the order of integration because the integrand after the second equality is non-negative and the martingales can be split into the counting process integral and a regular Lebesgue integral):

[TABLE]

Note that we do not need the indicator $\mathbbm{1}_{t=r}$ because $u_{1}\neq u_{2}$ and hence the martingales $M_{n,u_{1}}$ and $M_{n,u_{2}}$ will not jump simultaneously almost surely. We continue (for the second equality interchange the roles of $u_{1}$ and $u_{2}$ as well as the roles of $t$ and $r$ )

[TABLE]

here we have introduced the notation $\varphi_{n,u_{1}u_{2}}(t,r):=\widetilde{\varphi}_{n,u_{1}u_{2}}^{\emptyset}(t,r)$ , where for any set of pairs $I\subseteq L_{n}$ ( $d_{t}^{n}(u,\emptyset)=\infty$ )

[TABLE]

The functions $\widetilde{\varphi}^{I}_{n,u_{1}u_{2}}(t,r)$ are partially predictable with respect to $\widetilde{\mathcal{F}}_{u_{1}u_{2},t}^{n,I,m}$ for $I\supseteq\{u_{1},u_{2}\}$ because the integrand above has the product structure as mentioned in Definition 2.10, and summation and integration preserve this property as it is a measurability property. In order to prove that (5.52) converges to zero we apply Theorem 2.11. To this end we have to prove that (2.4)-(2.8) hold with $\delta_{n}=h$ . This is true either by Assumption (D1) or by Lemma 5.21. Thus, we may apply Theorem 2.11 and conclude that (5.23) holds.

To prove (5.22) in Lemma 5.2 we apply very similar techniques as before. In fact, we can use almost exactly the same steps that led to (5.52), now with $u_{2}=u_{1}$ , with one exception: At some point we said that we can ignore the indicator function $\mathbbm{1}_{t=r}$ because $u_{1}\neq u_{2}$ . This is no longer true and we need to take care of this term. We obtain

[TABLE]

where we used the abbreviations $\varphi_{n,u}(r,t):=\varphi_{n,uu}(r,t)$ and $\varphi_{n,u}(t):=\varphi_{n,u}(t,t)$ . We prove that (5.54) converges to zero in probability by applying similar techniques as before. We start by approximating $\varphi_{n,u}$ by its measurable approximation $\widetilde{\varphi}_{n,u}^{u}$ :

[TABLE]

We now use again the approximation (5.69) and obtain by using martingale properties (recall that $K_{m}^{L_{n}}$ is measurable by Assumption (H1)) and Markov’s Inequality for any $\varepsilon>0$

[TABLE]

by Assumption (H2, 3.5).

For (5.57), we recall that $\widetilde{\varphi}_{n,u}^{u}$ is partially predictable with respect to $\mathcal{F}_{t}^{n,u,m}$ (cf. remark after (5.53)) and thus we may apply Lemma 2.7. Together with (5.98) we get

[TABLE]

This converges to zero by conditioning on $A_{n}(t)/r_{n}p_{n}>\alpha$ and Assumptions (AD, 3.13, 3.15).

We study now the convergence behaviour of (5.55). Note firstly that

[TABLE]

The first part, (5.58), converges to zero by an application of Proposition 2.9. We get by said Proposition with $\widetilde{\varphi}_{n,u}^{I}(t):=\widetilde{\varphi}_{n,u}^{I}(t,t)$

[TABLE]

We apply estimates (5.69) and (5.98) to show that the three lines above converge to zero. We have by exchangeability of the network

[TABLE]

Convergence to zero of the above expression is implied by the fact that the third moment of $A_{n}(t)$ exists by Assumption (AD, 3.16). We continue with (5.61) and (5.62) to get

[TABLE]

which converges to zero by conditioning and Assumptions (H2, 3.4) and (AD, 3.13, 3.14). Moreover,

[TABLE]

It can be shown that the above converges to zero by using martingale properties, Assumptions (H2, 3.4) and (D2, 3.7), and Corollary 5.27. We conclude that (5.58) converges to zero.

So we have left to prove convergence of (5.59) which we do as follows: Denote by superscripts entries of vectors or matrices, i.e., $X^{r_{1}}_{n,u}(t)^{2}$ refers to the square of the $r_{1}$ -th entry of $X_{n,u}(t)$ and $\widetilde{f}^{r_{1},r_{2}}_{n}(t+\xi h,t)$ refers to the entry in row $r_{1}$ and column $r_{2}$ of the matrix $\widetilde{f}_{n}(t+\xi h,t)$ . Then

[TABLE]

where for all $a,b,c,d\in\{1,...,q\}$

[TABLE]

We keep this in mind and prove now for all $r_{1},r_{1}^{\prime},r_{2},r_{2}^{\prime}\in\{1,...,q\}$

[TABLE]

via exponential inequality techniques. Since the argument is the same for all indices, we omit $r_{1},r_{1}^{\prime},r_{2},r_{2}^{\prime}$ in the notation. Let therefore $\mathbbm{T}_{n}$ denote a grid of $[0,T]\times[0,2]^{2}$ with mesh $H_{n,p}^{-\frac{1}{\alpha_{p}}}n^{-k_{X}}$ (where $k_{X}$ is chosen later) and let $(t^{*},\sigma^{*},\tau^{*})$ be the projection of $(t,\sigma,\tau)\in[0,T]\times[0,2]^{2}$ onto $\mathbbm{T}_{n}$ , i.e., $\|(t,\sigma,\tau)-(t^{*},\sigma^{*},\tau^{*})\|\leq H_{n,p}^{-\frac{1}{\alpha_{p}}}n^{-k_{X}}$ . We obtain

[TABLE]

For (5.66) we define $\xi_{n,u}(t):=X_{n,u}(t)X_{n,u}(t)C_{n,u}(t)\lambda_{n,u}(t)$ . Then, we find

[TABLE]

By the Assumptions (A1), (A5), (A6) and Lemma 5.8 we have that the functions $\Sigma_{t}$ , $w$ , $p_{n}$ and $\overline{p}_{n}$ are all continuous on the compact interval $[0,T]$ . Therefore $p_{n}$ and $\widetilde{f}_{n}$ are uniformly continuous on $[0,T]$ . Hence, we can choose $k_{X}$ large enough such that the first term converges to zero (recall also that the covariates are bounded by Assumption (A3, 3)). The second and third term converge to zero by Assumptions (AC, 3.17, 3.18), respectively, after possibly increasing $k_{X}$ further. Keep this choice of $k_{X}$ fixed for the remainder of the proof.

Lastly we need to discuss (5.67). To this end, we apply a standard union bound technique together with Lemma 2.16. Noting that the $\sup$ in (5.67) is actually only taken over $\mathbbm{T}_{n}$ , we can estimate by (5.70) for every $\varepsilon>0$

[TABLE]

The two lines above work completely analogously and hence, we continue only with the first line. The proof of the second line is then identical, we just have to replace $\xi_{n,u}$ by $-\xi_{n,u}$ . We will also replace now $\frac{\varepsilon}{C}$ by $\varepsilon$ for notational convenience, i.e., we show that

[TABLE]

To this end, we apply Lemma 2.16 to the array of random variables

[TABLE]

which is bounded by $M:=\hat{K}^{2}\Lambda$ . Consider the sequence of $\Delta_{n}$ -partitions as in Assumption (D3). Since $\xi_{n,u}(t)=\xi_{n,u}(t)C_{n,u}(t)$ , we have that $(Z_{n,u})_{u\in L_{n}}$ fulfils (2.10) with respect to this $\Delta_{n}$ -partitioning. The further requirements of Lemma 2.16 on the partitioning are imposed in (D3). The asymptotic uncorrelation condition of Lemma 2.16 holds by Assumption (D3, 3.9). Note that we show in the proof of Lemma 5.22 that $|E|_{n,t}=r_{n}\overline{p}_{n}(t)$ . Thus, we may apply Lemma 2.16 with $M=\hat{K}^{2}\Lambda$ . We use that $\frac{p_{n}}{p_{n}^{*}}>\frac{1}{c_{0}}$ for some $c_{0}>0$ by Assumption (A6) and that $x\mapsto\frac{x}{\log x}$ is monotonically increasing to obtain

[TABLE]

The exponent of $r_{n}p_{n}$ can be chosen arbitrarily small. By Assumption (D3), $\mathbb{P}(\Gamma_{n}^{t}=0)$ converges to zero exponentially fast and we can choose $\Delta_{n}$ such that $\beta_{t}(\Delta_{n})$ vanishes as fast as we want. Thus, also the product with $\#\mathbbm{T}_{n}$ converges to zero. This was the last piece for establishing (5.65).
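The monotonicity of $x\mapsto\frac{x}{\log x}$ used in the last estimate holds on $[e,\infty)$ , as the short computation below shows. That the relevant arguments eventually exceed $e$ is an assumption of this sketch, plausible when they diverge with $n$ .

```latex
% Derivative of x / \log x for x > 1; the sign is non-negative exactly when \log x \ge 1.
\begin{align*}
\frac{d}{dx}\,\frac{x}{\log x} \;=\; \frac{\log x-1}{(\log x)^{2}} \;\ge\; 0
\qquad \text{for } x\ge e .
\end{align*}
```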

We can now continue to compute (5.59) or equivalently (5.64). Note that by (5.70) it holds uniformly over all indices that

[TABLE]

as $n\to\infty$ . Thus, the second line in (5.64) converges to zero. The limit of the first line of (5.64) is by using (5.65) the same as

[TABLE]

where we used continuity of $\Sigma_{t}$ from Assumption (A5) and where

[TABLE]

Thus we have proven (5.22) and the proof of Lemma 5.2 is complete. ∎

Lemma 5.20.

Suppose that (A1), (A3, 3), (A4, 2) and (A6) hold. Then, there is a constant $C^{*}>0$ such that for all $I,J\subseteq L_{n}$ and all $r,t\in[\delta,T-\delta]$ with $r\in[t-2h,t]$ it holds for $\widetilde{\varphi}$ defined in (5.53) that

[TABLE]

where $I\Delta J:=(I\setminus J)\cup(J\setminus I)$ denotes the symmetric difference of $I$ and $J$ and $K_{m}^{L_{n}}$ and $H_{UB}^{I\Delta J}$ are to be understood with respect to $U_{t}:=[t-4h,t+2h]$ .

Proof.

By Lemma 5.9, Assumptions (A1), (A4, 2) and (A6) we find a constant $c>0$ such that for all $r\in[t-2h,t]$ and $t\in[\delta,T-\delta]$ (note that then $\frac{r-t}{h}\leq 0$ )

[TABLE]

By the assumption of bounded covariates (A3, 3), we get for any index sets $I,J\subseteq L_{n}$ and $r\in[t-2h,t]$ (i.e. $\frac{r-t}{h}\in[-2,0]$ ), that

[TABLE]

∎

Proof of Lemma 5.3.

Let $\varepsilon>0$ be arbitrary. We have to show that a martingale evaluated at a certain time point $T$ converges to zero in probability. By Lenglart’s Inequality as in Corollary 5.27 it is sufficient to prove that the quadratic variation converges to zero in probability. Simply taking the $\sup$ yields for the quadratic variation

[TABLE]

Lemma 5.2 states that the second part converges, and hence it is sufficient to prove that the indicator function converges to zero in probability, which is equivalent to proving uniform convergence in probability (uniform in $u$ and $s$ ) of

[TABLE]

to zero. We are going to employ Lemma 5.22. To this end note firstly that $\tau_{n,uv}(s)$ has the following structure

[TABLE]

where $\widetilde{f}_{n}(s,t)$ is defined in (5.50) and can be written as

[TABLE]

We can simplify the expression by interchanging the integrals, taking the norm inside and using the boundedness properties from Assumption (A3, 3) and Lemma 5.9:

[TABLE]

Now, the quantity under consideration is $o_{P}(1)$ because the expression in (5.71) is the same as in $\ell_{n}^{\prime}(\theta_{0},t_{0})$ but with $X_{n,v}(t)$ replaced by $\|X_{n,v}(t)\|$ . Moreover, all mixing properties valid for $X_{n,v}(t)$ hold for $\|X_{n,v}(t)\|$ as well and, of course, $\|X_{n,v}(t)\|$ is also bounded. Thus, we may repeat the proof of Proposition 5.15 and all subsidiary results (whose proofs do not require this lemma) word by word and (5.71) converges to zero in probability. We also have that (5.72) is $O_{P}\left(h^{\frac{1}{2}}\right)=o_{P}(1)$ by the later proven equation (5.78). Hence, we have shown that

[TABLE]

and this finalizes the proof of the Lemma. ∎

Proof of Lemma 5.4.

Consider the following event

[TABLE]

where $\rho$ is the same as in Lemma 5.9 and suppose for the moment that $\mathbb{P}(A_{n})\to 1$ . On $A_{n}$ , we find

[TABLE]

by Lemma 5.9. Hence, we conclude

[TABLE]

as required. In order to prove $\mathbb{P}(A_{n})\to 1$ , we have to prove the uniform convergence of $\frac{1}{r_{n}\overline{p}_{n}(t_{0})}\ell_{n}^{\prime\prime}(\theta_{0},t_{0})$ to $\Sigma(\theta_{0},t_{0})$ . Denote therefore by $T_{n}$ a grid of $\mathbbm{T}:=[\delta,T-\delta]$ with mesh $n^{-k}$ for some $k$ and, for $t_{0}\in\mathbbm{T}$ , let $t_{0}^{*}$ be the projection of $t_{0}$ on $T_{n}$ , i.e., we have $|t_{0}-t_{0}^{*}|\leq n^{-k}$ . Then we obtain

[TABLE]

The second $\sup$ in (5.73) is converging to zero for $k$ chosen large enough because $\sup_{t_{0}\in\mathbbm{T}}|t_{0}-t_{0}^{*}|\to 0$ as $n\to\infty$ and by uniform continuity of $t\mapsto\Sigma(\theta_{0},t)$ (cf. Assumption (A5)). To prove that the first part of (5.73) is $o_{P}(1)$ , we note that by the boundedness Assumption (A3, 3)

[TABLE]

Since $K$ and $\overline{p}_{n}$ are Hoelder continuous by Assumption (A4, 2) and Lemma 5.8, respectively, the above converges to zero as $n\to\infty$ after possibly increasing $k$ further. For this choice of $k$ , which we keep fixed from now on, (5.73) is $o_{P}(1)$ . So finally, we have to prove that (5.74) is also $o_{P}(1)$ . To this end, we firstly note that the $\sup$ is actually only taken over $T_{n}$ because we only consider $t_{0}^{*}$ . So we apply a standard union bound technique to get the $\sup$ out of the probability and we include $\Gamma_{n}^{t_{0}}$ : Let $x>0$ and recall the Definition of $H_{n,u}(s,\theta)=-C_{n,u}(s)X_{n,u}(s)X_{n,u}(s)^{T}e^{\theta^{T}X_{n,u}(s)}$ from the proof of Proposition 5.18. Then,

[TABLE]

The probability in line (5.76) equals zero for $n$ large enough because for $t_{0}\in[\delta,T-\delta]$ we have by the definition of $\overline{p}_{n}(s)$

[TABLE]

Finally, line (5.75) will be treated by applying Lemma 2.16. Note therefore firstly that we may work element-wise because we can estimate the norm from above by the 1-norm and consider each term separately (note that the dimension of the covariates is not increasing). Thus, we may pretend for the following that $H_{n,u}(s,\theta_{0})$ is a number rather than a matrix. Moreover, we can repeat the following proof word by word for $-H_{n,u}(s,\theta_{0})$ and thus we may consider $H_{n,u}(s,\theta_{0})$ instead of $|H_{n,u}(s,\theta)|$ . We apply Lemma 2.16 to

[TABLE]

$Z_{n,u}$ is bounded by $M:=K\hat{K}^{2}\Lambda$ by (A3, 3). The assumptions on the $\Delta_{n}$ partitions and asymptotic uncorrelation are fulfilled by Assumption (D3, 3.10) with $C=O(1)$ . Then, the upper bound provided by Lemma 2.16 converges faster to zero than any power of $n$ because of the properties of the bandwidth $h$ in (A4, 1) and the properties of $\Gamma_{n}^{t_{0}}$ and the $\beta$ -mixing coefficients from Assumption (D3). This proves that (5.75) converges to zero for any choice of $x>0$ and thus $\eqref{eq:L29exp}=o_{P}(1)$ and the proof of the Lemma is complete. ∎

Proof of Lemma 5.5.

We note firstly that by Assumption (A3, 3) for all $t_{0}\in\mathbbm{T}$ and all $\theta_{1},\theta_{2}\in\Theta$ we can estimate by a Taylor approximation

[TABLE]

Hence, we obtain for all $t_{0}\in\mathbbm{T}$ and all $\theta_{1},\theta_{2}\in\Theta$

[TABLE]

Consequently, we can choose $K_{n}:=\sup_{t_{0}\in\mathbbm{T}}\hat{K}^{3}e^{\tau\hat{K}}\cdot\frac{1}{r_{n}\overline{p}_{n}(t_{0})}\sum_{u\in L_{n}}\int_{0}^{T}K_{h,t_{0}}(s)C_{n,u}(s)ds$ which is $O_{P}(1)$ if

[TABLE]

So let us prove this. Denote therefore by $T_{n}$ a grid with mesh $n^{-k}$ (where $k$ is chosen later) which covers $\mathbbm{T}$ . For a given time $t_{0}\in\mathbbm{T}$ we denote by $t_{0}^{*}\in T_{n}$ the closest element of $T_{n}$ to $t_{0}$ , i.e., $|t_{0}-t_{0}^{*}|\leq n^{-k}$ . Now we split the $\sup$ over an uncountable set as usual in a $\sup$ over close elements and a $\sup$ over a finite set:

[TABLE]
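The discretisation step can be sketched numerically. The snippet below is only an illustration under assumed values of $n$, $k$, $T$ and $\delta$ (all hypothetical); the helper names `make_grid` and `project` are not from the paper. It builds a grid with mesh at most $n^{-k}$, projects arbitrary points $t_{0}$ onto it and records the worst projection error, so that the split into a $\sup$ over close elements and a $\sup$ over finitely many grid points becomes concrete.

```python
import math
import random


def make_grid(lower: float, upper: float, mesh: float) -> list[float]:
    """Equally spaced grid covering [lower, upper] with spacing at most `mesh`."""
    steps = max(1, math.ceil((upper - lower) / mesh))
    return [lower + (upper - lower) * i / steps for i in range(steps + 1)]


def project(t0: float, grid: list[float]) -> float:
    """Return the grid point closest to t0 (the point called t0* in the text)."""
    return min(grid, key=lambda g: abs(g - t0))


# Hypothetical values, chosen only for illustration.
n, k = 10, 2
T, delta = 1.0, 0.1
mesh = n ** (-k)

grid = make_grid(delta, T - delta, mesh)
random.seed(0)
points = [random.uniform(delta, T - delta) for _ in range(1000)]
worst_gap = max(abs(t0 - project(t0, grid)) for t0 in points)

print(len(grid))           # number of grid points, of order n**k
print(worst_gap <= mesh)   # projection error never exceeds the mesh
```

The number of grid points grows only polynomially in $n$, which is what makes the subsequent union bound over $T_{n}$ affordable.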

We can apply the simple bound $\sum_{u\in L_{n}}C_{n,u}(s)\leq r_{n}$ and use Hoelder continuity of $K$ and $\overline{p}_{n}(t)^{-1}$ , cf. Assumption (A4, 2) and Lemma 5.8 respectively, to see that (5.79) converges to zero in probability. Next, we show that $\eqref{eq:s3}=O_{P}(1)$ which concludes the proof of the Lemma. We begin, as usual, by taking the $\sup$ out of the probability (and recall that $\overline{p}_{n}(t_{0})=\int_{0}^{T}K_{h,t_{0}}(s)p_{n}(s)ds$ ):

[TABLE]

We will apply Lemma 2.16 to

[TABLE]

As before we use the $\Delta_{n}$ -partitions as provided in Assumption (D3). Then, all requirements of Lemma 2.16 on the partitions are fulfilled and $|E|_{n,t_{0}}=r_{n}\overline{p}_{n}(t_{0})$ . Moreover, $Z_{n,u}$ is bounded by $M:=1$ and the asymptotic uncorrelation also holds by Assumption (D3, 3.11). As a consequence we can apply Lemma 2.16 to obtain an upper bound on (5.81). Taking into account the assumptions on $\Gamma_{n}^{t}$ and the $\beta$ -mixing coefficients in Assumption (D3), the upper bound on (5.81) provided by Lemma 2.16 converges faster to zero than any power of $n$ for $x>1$ . Therefore $\eqref{eq:s31}\to 0$ as $n\to\infty$ . We conclude that $\eqref{eq:s3}=O_{P}(1)$ . ∎

Proof of Lemma 5.6.

By employing Lemma 5.22 the proof of this result is fairly straightforward. Let $c^{**}$ be the constant such that $\|y\|\leq c^{**}\cdot\|y\|_{1}$ for all $y\in\mathbb{R}^{q}$ where $\|.\|$ and $\|.\|_{1}$ denote the Euclidean and the 1-norm, respectively. We have

[TABLE]

Since

[TABLE]

we can directly apply Lemma 5.22 and obtain

[TABLE]

We see that for a $C>0$ chosen sufficiently large the first term decreases faster as $hn^{k_{0}}$ . Moreover, by Assumption (D3), $\beta(\Delta_{n})$ decreases fast enough too. Finally, $\mathbb{P}(\Gamma_{n}^{t_{0}}=0)$ decreases fast enough as well by the same assumption. ∎

Proof of Lemma 5.7.

Let $\delta_{n}:=\sqrt{\frac{\log r_{n}p_{n}}{r_{n}p_{n}\cdot h}}$ . We begin with the standard union bound argument:

[TABLE]

The above probability can be shown by Lemma 2.16 to decrease faster than any power when $C>0$ is chosen large enough. As a matter of fact we argued already that Lemma 2.16 can be applied in this situation when we discussed (5.75). We just have to replace $\theta_{0}$ by an arbitrary $\theta\in\Theta$ . But this does not change the argument and all necessary assumptions were made in Assumption (D3). ∎
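The union bound argument used in this and the preceding proofs follows one pattern, sketched here for a generic family $(Y_{n}(t))_{t\in T_{n}}$ and a grid $T_{n}$ with $|T_{n}|=O(n^{k})$. Both are placeholders introduced for illustration, and the tail probabilities are assumed to decay faster than any power of $n$ uniformly in $t$:

```latex
\mathbb{P}\Big(\sup_{t\in T_{n}}|Y_{n}(t)|>x\Big)
  \leq\sum_{t\in T_{n}}\mathbb{P}\big(|Y_{n}(t)|>x\big)
  \leq|T_{n}|\cdot\max_{t\in T_{n}}\mathbb{P}\big(|Y_{n}(t)|>x\big)
  =O(n^{k})\cdot o(n^{-k})\to 0 .
```

This is why a polynomially growing grid is harmless once an exponential inequality such as Lemma 2.16 is available for each fixed grid point.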

5.4 Primitive Assumptions for (D1)

In this section we provide additional assumptions under which the conditions (2.4)-(2.8) of Theorem 2.11 can be proven. These additional assumptions are in the spirit of the assumptions (H2) and (AD). Therefore, we present them here as extensions of them.

(H2) Hub size restriction

There is a deterministic sequence $H_{n}$ such that $H_{n}\geq K_{m}^{L_{n}}$ with

[TABLE]

The extension of (H2) requires a deterministic bound on the maximal hub size. This seems restrictive, but for large networks it is plausible that even highly connected actors are only connected to a fraction of the whole network. Keep in mind also that connected here really means active influence. So if we were to allow a single pair to influence the entire network, then statistical inference would be (at least intuitively) impossible.

Denote

[TABLE]

(AD) Additional Dependence

Let $k\in\mathbb{N}$ be arbitrary and consider the following choices for the pair $(\varepsilon_{n},c_{n})$

[TABLE]

where $\alpha_{c}$ and $H_{n,c}$ are the Hoelder exponent and constant of $p_{n}(t)$ , respectively. For any given $k_{0}\in\mathbb{N}$ we can choose $k\in\mathbb{N}$ such that for both choices above it holds that

[TABLE]

Moreover, there is $\kappa>0$ such that for all $\xi_{1},\xi_{2}>1$ , $(i,j)\in L_{n}$ it holds that

[TABLE]

Assumption (AD, 5.84) essentially states that not too many $N_{n,ij}$ jump at the same time. Note that (AD, 5.84) could be proven (similarly to (AD, 3.13)) by using other technical assumptions. In order to prove these assumptions we need an exponential inequality for averages of counting processes. Such an inequality can be shown by employing $\beta$ -mixing as in the proof of Lemma 2.15. However, instead of using the Bernstein inequality (see e.g. Proposition 5.25) we need a tail bound valid for independent sums of counting processes with bounded intensity functions. For our purposes it is sufficient to use a tail bound obtained from Chebyshev’s Inequality in its exponential form. For the remaining assumptions we emphasize that they require polynomial growth of certain moments. Since the power of the polynomial can be chosen arbitrarily, we regard these assumptions as easy to believe.
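The exponential form of Chebyshev's inequality, $\mathbb{P}(N\geq x)\leq\inf_{\lambda>0}e^{-\lambda x}\mathbb{E}(e^{\lambda N})$ , can be illustrated in the simplest case. The sketch below assumes, purely for illustration, that the count on an interval is Poisson with mean `mu = big_lambda * length`. This is a homogeneous Poisson process with a hypothetical intensity bound and interval length, not the processes $N_{n,ij}$ of the paper. It compares the resulting bound with the exact tail.

```python
import math


def poisson_upper_tail(mu: float, x: int) -> float:
    """Exact P(N >= x) for N ~ Poisson(mu)."""
    below = sum(math.exp(-mu) * mu ** j / math.factorial(j) for j in range(x))
    return max(0.0, 1.0 - below)


def chernoff_bound(mu: float, x: float) -> float:
    """Exponential Chebyshev bound inf_l exp(-l*x) E exp(l*N), valid for x > mu.

    For N ~ Poisson(mu), E exp(l*N) = exp(mu * (exp(l) - 1)); the infimum
    over l > 0 is attained at l = log(x / mu).
    """
    lam = math.log(x / mu)
    return math.exp(-lam * x + mu * (math.exp(lam) - 1.0))


big_lambda, length = 2.0, 2.5   # hypothetical intensity bound and interval length
mu = big_lambda * length
for x in (10, 15, 20):
    print(x, poisson_upper_tail(mu, x), chernoff_bound(mu, x))
```

In every row the exact tail stays below the bound, and the bound itself decays faster than exponentially in $x$ .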

Lemma 5.21.

Suppose that the assumptions (H2) and (AD) hold in their extended form as above. Then, the conditions (2.4)-(2.8) of Theorem 2.11 hold for $\widetilde{\varphi}_{n,u_{1}u_{2}}^{I}$ as in (5.53) with $\delta_{n}=h$ .

Proof.

We begin with (2.4). Note that by Assumption (H1), $H_{UB}^{u}$ is measurable with respect to $\mathcal{F}_{0}^{n}$ for all $u\in L_{n}$ and that by (A2, 3) $\lambda_{n,u}(t)$ is bounded by $C_{n,u}(t)\Lambda$ . Denote $d|M_{n,u}|(t)=dN_{n,u}(t)+\lambda_{n,u}(t)dt$ . We get by applying the estimate (5.69) from Lemma 5.20 for any $\varepsilon>0$ and any $F>0$

[TABLE]

We keep this in mind and make a similar estimation for (2.5): We use (5.69) in Lemma 5.20 in order to obtain (the step from (5.92) to (5.93) is unmotivated at this point, but will be useful later)

[TABLE]

A simple application of Markov’s Inequality is hence showing that $\mathbb{E}(\eqref{eq:b1})\to 0$ implies that $\eqref{eq:byProduct}\to 0$ . To show the former we make the following definitions. Denote $\mathcal{C}_{n,u_{1}}:=F+3H_{UB}^{u_{1}}\left(K_{m}^{L_{n}}\right)^{2}$ . Using that $H_{UB}^{u_{1}u_{2}}\leq H_{UB}^{u_{1}}+H_{UB}^{u_{2}}$ , we can calculate that

[TABLE]

Define $\Omega_{n}(t):=\frac{1}{r_{n}p_{n}(t)}\sum_{u_{2}\in L_{n}}\int_{t-2h}^{t-}d|M_{n,u_{2}}|(r)\mathcal{C}_{n,u_{2}}$ and continue with the estimation:

[TABLE]

Define $\omega_{n}(s):=1/(r_{n}p_{n}(s))\sum_{v\in L_{n}}\mathcal{C}_{n,v}C_{n,v}(s)$ . By using Itô’s Lemma (cf. Theorem 5.30) for the semi-martingale $|M_{n,u}|(t):=N_{n,u}(t)+\int_{0}^{t}\lambda_{n,u}(s)ds$ in the first step we obtain for fixed constants $\xi_{1},\xi_{2}>0$

[TABLE]

Line (5.96) converges to zero by (5.27) and $h\to 0$ . Line (5.97) converges to zero by Lemma 5.23 and Assumption (AD, 5.85). For line (5.95) we use Lemma 5.23 together with Assumption (AD, 5.86) to show that it converges to zero for the correct choices of $\xi_{1}$ and $\xi_{2}$ . For line (5.94) we note that for $\xi>0$

[TABLE]

is bounded by Assumptions (H2, 3.4), (AD, 5.87) and Lemma 5.23. Thus we have shown that (2.4) and (2.5) hold.

For showing condition (2.6) we define the random number of active edges as

[TABLE]

and use this to estimate for all $u_{1},u_{2}\in L_{n}$ , all $I\subseteq L_{n}$ and all $r,t\in[0,T]$

[TABLE]

In addition to the above estimate, we also observe that

[TABLE]

Using these two estimates together with the estimate in (5.69), we obtain

[TABLE]

Note that $K_{m}^{u_{1}}\leq F+K_{m}^{L_{n}}H_{UB}^{u_{1}}$ . Hence, we just saw in the proof of (2.5) (cf. (5.93)) that the first part above converges to zero. For the second part we employ similar techniques, i.e., we condition on the event that $A_{n}(t)/r_{n}p_{n}$ is bounded. So for any $\alpha>0$ the second part above (without the square root) can be bounded by

[TABLE]

By the Assumptions (AD, 3.13, 5.90, 5.88) the expression above converges to zero and we have shown (2.6).

The indicator function in (2.7) is not significantly shortening the sum and hence we just ignore it. Moreover, we use the bound from (5.98) to obtain for any $\alpha>0$

[TABLE]

The expression above converges to zero by Assumptions (D2, 3.6), (AD, 3.13, 5.89). Hence, we have also that (2.7) holds.

For (2.8) we finally use that for every fixed choice of $u_{2},u_{2}^{\prime}\in L_{n}$ we get

[TABLE]

Thus, we obtain together with (5.98)

[TABLE]

The above converges to zero by conditioning on $A_{n}(t)/r_{n}p_{n}>\alpha$ and the Assumptions (H2, 3.4) and (AD, 3.13, 3.14). Hence, we have shown that (2.8) holds as well. ∎

5.5 Further Proofs

Proof of Lemma 2.7.

Note that $\mathcal{F}_{t}^{n,J,m}\supseteq\mathcal{F}_{t}^{n}$ . Hence, $(N_{n,ij}(t))_{(i,j)\in J}$ is adapted with respect to $\mathcal{F}_{t}^{n,J,m}$ and $(\lambda_{n,ij}(t))_{(i,j)\in J}$ is predictable with respect to $\mathcal{F}_{t}^{n,J,m}$ . So $N_{n,ij}(t)$ is a counting process for all $(i,j)\in J$ . In order to check that $\lambda_{n,ij}(t)$ is the intensity function, we need to check the martingale property: Let $t>0$ and $t^{\prime}\in[t,t+6\delta_{n}]$ , then by definition and assumption (recall $M_{n,ij}(t)=N_{n,ij}(t)-\int_{0}^{t}\lambda_{n,ij}(s)ds$ )

[TABLE]

∎

Proof of Proposition 2.9.

The proof is almost exactly along the lines of Mammen and Nielsen [32] but it is not identical and we give it here for completeness. We see at first that

[TABLE]

We use now that $\widetilde{\varphi}_{n,ij}^{ij,kl}$ and $\widetilde{\varphi}_{n,kl}^{ij,kl}$ are both predictable with respect to $\mathcal{F}_{t}^{n,\{(i,j),(k,l)\},m}$ and that $M_{n,ij}$ and $M_{n,kl}$ are uncorrelated martingales with respect to the same filtration (cf. Lemma 2.7). Hence, we obtain

[TABLE]

and the statement follows. ∎

The following result provides an exponential inequality for martingales. Note that the result is different from van de Geer [47] because the asymptotics in the motivation are different and in the following result we can avoid appearance of the higher order variation process in the probability.

Lemma 5.22.

Suppose that (A3, 3) and (A4, 2) hold. Recall the following definitions from (A3) and (A4): $\Lambda$ is the bound on the intensity function, $K$ the bound on the kernel and $\hat{K}$ the bound on the covariates. Let $A>0$ be so large such that

[TABLE]

Suppose we have a $\Delta_{n}$ -partition measurable with respect to $\mathcal{F}_{t_{0}-h}^{n}$ (we write $I_{n,u}^{k,m}:=I_{n,u}^{k,m,t_{0}-h}$ for ease of notation)

[TABLE]

for some $t_{0}\in[0,T]$ and all $u\in L_{n}$ . Define furthermore for arbitrary $c_{3}>0$ and $u\in L_{n}$ ,

[TABLE]

Assume that there is $c_{2}>0$ such that for all $k\in\{1,...,\mathcal{K}\}$

[TABLE]

Then it holds that

[TABLE]

where $q$ is the dimension of the covariate and $c^{**}$ is the constant for which $\|y\|\leq c^{**}\|y\|_{1}$ for all $y\in\mathbb{R}^{q}$ and $\|.\|$ and $\|.\|_{1}$ are the Euclidean and $1$ -Norm respectively. The process $X_{n,u}$ can be replaced by any other predictable process which is bounded by $\hat{K}$ .

Proof.

We remark firstly that it is sufficient to consider univariate covariates, because (denote by $X_{n,u}^{r}$ the $r$ -th entry of $X_{n,u}$ for $r=1,...,q$ )

[TABLE]

Since $-X_{n,u}^{r}$ is a covariate with the exact same properties as $X_{n,u}$ (in particular predictability with respect to $\mathcal{F}_{t}^{n}$ and boundedness by $\hat{K}$ , cf. Assumption (A3, 3)), it is sufficient to assume (for simplicity of notation) that $X_{n,u}$ is univariate and to prove that

[TABLE]

The main idea of the proof is to apply Lemma 2.15 to the correct structured interaction network (in the sense of Definition 2.2). Define to this end

[TABLE]

Note that both, $\widetilde{F}_{n,u}(s)$ and $F_{n,u}^{k,m}(s)$ , are predictable processes because they are deterministically equal to zero for $s\leq t_{0}-h$ and the sets $t\mapsto G^{t}(k,m,\Delta_{n})$ are predictable with respect to $\mathcal{F}^{n}_{t}$ . Hence, $Z_{n,u}(t)$ is a martingale. We are going to prove that $(Z_{n,u}(T))_{u\in L_{n}}$ fulfils the conditions of Lemma 2.15. Condition (2.10) is easy to check. Note that for $s\in[t_{0}-h,t_{0}+h]$ we have $\sup_{r\in[t_{0}-h,t_{0}+h]}C_{n,u}(r)C_{n,u}(s)=C_{n,u}(s)$ and hence

[TABLE]

Thus, condition 2 of Lemma 2.15 holds simply by assumption and definition of $E_{k}^{n,t_{0}}$ . The main part of this proof is now to prove condition 1. Note therefore firstly that

[TABLE]

Hence, we need to show for $t=T$

[TABLE]

We will show (5.102) for all $t\in[0,T]$ and then it holds particularly in the case $t=T$ which is of primary interest to us. The idea of the proof is to prove a recursion inequality for the moments of stochastic integrals by applying Itô’s Formula and then using induction. Note that $F_{n,u}^{k,m}(s)=0$ for $s\notin[t_{0}-h,t_{0}+h]$ . Therefore, (5.102) holds trivially for $t\leq t_{0}-h$ and it holds for $t\geq t_{0}+h$ when it holds for $t=t_{0}+h$ . Hence, we can restrict to the case $t\in[t_{0}-h,t_{0}+h]$ .

For $\rho\geq 2$ we have that the function $f_{\rho}(x):=|x|^{\rho}$ is twice continuously differentiable and hence also $\widetilde{f}_{\rho}(x_{1},...,x_{m}):=f_{\rho}(x_{1}+...+x_{m})$ is twice continuously differentiable. So by the multivariate Itô Formula for semi-martingales with jumps given in Theorem 5.30 and the fact that with probability one no two counting processes jump at the same time, we obtain for $\rho\geq 2$ : Enumerate for the following computations the pairs in $L_{n}$ , i.e., such that $L_{n}=\{1,...,r_{n}\}$ . Then,

[TABLE]

and we can compute

[TABLE]

Note now that

[TABLE]

Hence, (*) contains a Taylor series expansion of $f_{\rho}$ around the point

[TABLE]

and we continue:

[TABLE]

where $\Delta(s)\in\Big{[}0,\sum_{r\in L_{n}}F_{n,r}^{k,m}(s)\Delta N_{n,r}(s)\Big{]}$ . Since only one of the counting processes jumps at a time, we obtain $|\Delta(s)|\leq K_{h}$ with $K_{h}:=\frac{1}{h}K\hat{K}$ and continue by using again that no two processes jump at the same time:

[TABLE]

where we used in the last line that $F_{n,u}^{k,m}(s)=0$ when $s\leq t_{0}-h$ . Now, the integrand is predictable and we can apply the expectation on both sides to obtain a recursion formula: Use that for $x\geq 0$ we have $f_{\rho}^{\prime\prime}(x)=\rho(\rho-1)f_{\rho-2}(x)$ to get

[TABLE]

Define $Z^{k,m}(t)=\sum_{u\in L_{n}}\int_{0}^{t}F_{n,u}^{k,m}(\tau)dM_{n,u}(\tau)$ to summarize the previous inequality chain in the following recursion formula: For $\rho\geq 2$ it holds almost surely

[TABLE]

By uniting the (countably many) exception sets of measure zero, these inequalities hold for all $\rho\geq 2$ and all $t\in[0,T]\cap\mathbb{Q}$ on the same set of measure one. Since both sides are continuous from the right (cf. Corollary 5.1.9 in Cohen and Elliott [7]), we also have it for all $t\in[0,T]$ on the same set of measure one. Taking now limits from the left and repeating the same argument with continuity from the left, we obtain the same result for $Z^{k,m}(t-)$ on the left hand side also on the same set of measure one.

We are going to prove now via induction that almost surely (on the same set of measure one)

[TABLE]

We begin with the induction start: For $\rho=2$ , (5.103) gives for all $t\in[t_{0}-h,t_{0}+h]$

[TABLE]

where the last inequality holds by choice of $A$ in (5.100) and because $t\in[t_{0}-h,t_{0}+h]$ . Hence, the induction start is complete and we continue with the induction step. Assume that (5.104) holds for all powers $2\leq p\leq\rho$ and all $t\in[t_{0}-h,t_{0}+h]$ and show that it holds for $\rho+1$ and all $t\in[t_{0}-h,t_{0}+h]$ as well. We use first (5.103), then the binomial theorem and finally the induction hypothesis (5.104) for powers greater than one:

[TABLE]

Recall that $S_{k}=\max_{m=1,...,r_{n}}\sum_{u\in L_{n}}I_{n,u}^{k,m}\geq\sum_{u=1}^{r_{n}}I_{n,u}^{k,m}C_{n,u}(s)$ for all $k$ and $m$ as well as for all $s$ ; moreover, $S_{k}$ is measurable with respect to $\mathcal{F}_{t_{0}-h}^{n}$ . Hence, we may estimate

[TABLE]

Using this estimation we continue with the main inequality chain

[TABLE]

At this point, we see that we are done with the induction step if $\Gamma_{n}^{t_{0}}=0$ . Hence, we only need to show that the above is less than or equal to (5.104) on the event $\Gamma_{n}^{t_{0}}=1$ . This, in turn, we may conclude if the second part above is smaller than or equal to one (on the event $\Gamma_{n}^{t_{0}}=1$ ). This is the case because we have chosen $A$ appropriately and because $h\leq 1$ and $S_{k}\sqrt{h}\geq 1$ (and thus also $S_{k}\geq 1$ ) on $\Gamma_{n}^{t_{0}}$ :

[TABLE]

and the induction is complete. To finalize the proof, we compute the expectation of $S_{k,m}S_{k}^{\rho-2}$ . Note that on $\Gamma_{n}^{t_{0}}=1$ , $S_{k}\leq c_{3}\cdot\sqrt{\frac{r_{n}\overline{p}_{n}(t_{0})}{\log r_{n}\overline{p}_{n}(t_{0})}}=E_{k}^{n,t_{0}}$

[TABLE]

Taking expectations on both sides of (5.104) and together with the previous line, we obtain

[TABLE]

Hence, condition 1 of Lemma 2.15 is fulfilled and we can apply it to get

[TABLE]

∎

Lemma 5.23.

Let $\Omega_{n}(s)$ and $\omega_{n}(s)$ be defined as in the proof of Lemma 5.21. For any $\alpha>0$ there are $\xi>1$ and $C>0$ such that

[TABLE]

Proof.

The proof follows standard arguments. Let $T_{n,k}$ denote a discrete grid of $[0,T]$ with $O(n^{k})$ many elements such that every $t\in[0,T]$ lies within distance $n^{-k}$ of some $s\in T_{n,k}$ . Then,

[TABLE]

For the first probability we note that for $|t-s|<n^{-k}$ the intervals $[t-2h,t)$ and $[s-2h,s)$ are overlapping on a length of at most $2h$ and the area covered by only one interval is of length at most $2n^{-k}$ . We get for $t<s$ when using $d|M_{n,u}|(r)=dN_{n,u}(r)+\lambda_{n,u}(r)dr$ (recall that $\mathcal{C}_{n,u}$ is bounded by $H_{n}$ )

[TABLE]

The last line is deterministic and converges faster to zero than $\sqrt{h}$ by the Hoelder continuity of $p_{n}(t)$ (cf. Assumption (A6)) and since $H_{n}$ grows moderately (cf. Assumption (H2, (5.83))). For the expressions in the first line, we note that in the end it comes down to evaluating expressions of the type $\sup_{|t-s|<\varepsilon_{n}}\sum_{u\in L_{n}}N_{n,u}([s,t])$ where $\varepsilon_{n}$ equals either $n^{-k}$ or $2h+n^{-k}$ . In Assumption (AD, 5.84) we assume that in both cases the average behaves in such a way that the first probability in (5.106) converges to zero as fast as required if $k$ is chosen large enough. We keep this choice of $k$ fixed for the remainder of the proof.

For the second part of (5.106), we rewrite

[TABLE]

For both parts we have exponential inequalities available in Lemmas 5.22 and 2.16, respectively. So we just have to check that their conditions hold. Since $\mathcal{C}_{n,u}$ is bounded by $H_{n}^{2}$ , we can divide by the bound and apply Lemma 5.22 with $X_{n,u}(s)=1$ and $K_{h,t_{0}}(s)=\frac{1}{2h}\mathbbm{1}(s\in[t-2h,t))$ . Note first that by Assumption (A6) we can replace $p_{n}(t)$ by $\overline{p}_{n}(t)$ when adding a multiplicative constant which we can compensate for by choosing $\xi$ appropriately. Moreover, by Assumption (D3) there are $\Delta_{n}$ -partitions as required and the $\beta$ -mixing coefficients decay exponentially fast. Since $\Delta_{n}=a\log n$ , the mixing coefficients decay as fast as required. Moreover, by the same assumption, $\sup_{t\in[0,T]}\mathbb{P}(\Gamma_{n}^{t}=0)$ vanishes exponentially fast. Finally, by Assumption (H2, 5.83), the bound $H_{n}$ on $K_{m}^{L_{n}}$ behaves in such a way that the leading term also decays as fast as we want if $\xi$ is chosen large enough. Therefore, the probability that the first part of (5.107) is larger than $\frac{\xi-1}{2}\sqrt{h}$ decreases to zero faster than any given power of $r_{n}p_{n}$ for large enough $\xi$ .

The second term in (5.107) can be bounded by analogous arguments and Lemma 2.16. Denote $Y_{n,u}=\frac{1}{h}\int_{t-2h}^{t-}\mathcal{C}_{n,u}\lambda_{n,u}(r)dr$ . Then $\mathbb{E}(Y_{n,u})/p_{n}(t)\leq c^{*}$ by Assumption (H2, 3.4) (note that $c^{*}$ is independent of $t$ and $n$ ). Keeping this in mind we obtain for the second term in (5.107) for small enough $h$ for all $n$ and $t$

[TABLE]

Choose $E_{k,m}^{n,t}$ in the same way as in Lemma 5.22 with $K_{h,t}(s)=\frac{1}{2h}\mathbbm{1}(s\in[t-2h,t))$ . Then $|E|_{n,t}=r_{n}\overline{p}_{n}(t)$ and also $E_{k}^{n,t}$ is as defined in Lemma 5.22. Therefore all restrictions on the $\Delta_{n}$ -partitioning are fulfilled by Assumption (D3) and the mixing coefficients vanish exponentially fast. Moreover $\mathbb{P}(\Gamma_{n}^{t}=0)$ vanishes exponentially fast. Lastly, by Assumption (D3, 3.12), the asymptotic uncorrelation conditions hold. Hence, we may apply Lemma 2.16 and obtain the desired results by the same arguments as for the first part of (5.107) by using again Assumption (H2, 5.83).

The proof of the concentration inequality for $\omega_{n}(s)$ follows from similar arguments. ∎

5.6 Details for Example 2.5.3

Let $(D_{1},D_{2})$ and $(\widetilde{D}_{1},\widetilde{D}_{2})$ be two pairs of random variables with

[TABLE]

We suppose that $D_{1},D_{2},\widetilde{D}_{1},\widetilde{D}_{2}\in\mathbb{R}^{p}$ all have the same dimension $p$ . The matrix $\sigma\in\mathbb{R}^{p\times p}$ contains the covariances of $D_{1}$ and $D_{2}$ . Let $|.|$ denote the determinant of a matrix and $I_{p}$ is the $p\times p$ identity matrix. We can compute that (use formulas for the Kullback-Leibler divergence of two multivariate normals and for the determinant of block-matrices)

[TABLE]

Suppose that the entries of $\sigma$ are small and that $\Sigma_{1}$ and $\Sigma_{2}$ are positive definite. In that case $\Sigma_{2}^{-1}\sigma^{\prime}\Sigma_{1}^{-1}\sigma$ has small eigenvalues and $I_{p}-\Sigma_{2}^{-1}\sigma^{\prime}\Sigma_{1}^{-1}\sigma$ is positive definite. With this we may continue the estimation by applying the bound $\log|A|\leq\textrm{tr}(A-I)$ and using the Neumann Series representation:

[TABLE]
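For reference, the two elementary facts invoked in this step read as follows. The sketch is stated for a generic $p\times p$ matrix $A$ whose eigenvalues $\lambda_{1},...,\lambda_{p}$ are real and positive and for a generic $p\times p$ matrix $B$ ; both are placeholders introduced here. The first line uses $\log\lambda\leq\lambda-1$ for $\lambda>0$ :

```latex
\log|A| = \sum_{i=1}^{p}\log\lambda_{i}
        \leq \sum_{i=1}^{p}(\lambda_{i}-1)
        = \mathrm{tr}(A-I_{p}),
\qquad
(I_{p}-B)^{-1} = \sum_{k=0}^{\infty}B^{k}
\quad\text{if the spectral radius of } B \text{ is smaller than one.}
```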

Let every entry of $\sigma$ be bounded in absolute value by $\varepsilon$ . Then, there is $c>0$ such that each entry of $\Sigma_{2}^{-1}\sigma^{\prime}\Sigma_{1}^{-1}\sigma$ is in absolute value bounded by $cp^{3}\varepsilon^{2}\leq cp^{4}\varepsilon^{2}$ . And by induction each entry of $\left(\Sigma_{2}^{-1}\sigma^{\prime}\Sigma_{1}^{-1}\sigma\right)^{k}$ is bounded by $c^{k}\varepsilon^{2k}p^{4k}$ . Then, we continue

[TABLE]

We have shown in (2.14) that $\varepsilon=c^{*}\sqrt{6\alpha_{0}}^{\Delta}$ . Moreover, $p=|I_{1}|=|I_{2}|=2(\Delta-1)^{2}$ . Recalling in addition that $6\alpha_{0}<1$ , we conclude overall that $c\varepsilon^{2}p^{4}\to 0$ exponentially fast for $\Delta\to\infty$ .
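The competition between the polynomial factor $p^{4}$ and the exponentially small $\varepsilon^{2}$ can be made visible numerically. The following sketch evaluates $c\varepsilon^{2}p^{4}$ with $\varepsilon=c^{*}\sqrt{6\alpha_{0}}^{\Delta}$ and $p=2(\Delta-1)^{2}$ on the log scale. The values $\alpha_{0}=0.1$ and $c=c^{*}=1$ are assumptions made only for illustration, and the function name `log_bound` is hypothetical.

```python
import math


def log_bound(delta: int, alpha0: float, c: float = 1.0, c_star: float = 1.0) -> float:
    """Logarithm of c * eps**2 * p**4 with eps = c_star * sqrt(6*alpha0)**delta
    and p = 2 * (delta - 1)**2 (requires delta >= 2)."""
    log_eps_sq = 2.0 * math.log(c_star) + delta * math.log(6.0 * alpha0)
    log_p4 = 4.0 * math.log(2.0 * (delta - 1) ** 2)
    return math.log(c) + log_eps_sq + log_p4


alpha0 = 0.1  # hypothetical value with 6 * alpha0 < 1
for delta in (5, 20, 50, 100, 200):
    print(delta, math.exp(log_bound(delta, alpha0)))
```

For small $\Delta$ the polynomial factor dominates and the expression can be large; the exponential decay takes over only for larger $\Delta$ , in line with the asymptotic statement.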

5.7 Useful Results

For the convenience of the reader, we collect some results which are needed in the proofs.

We will consider the Grouping Lemma in the following form (Rio [43], Lemma 5.1 therein).

Lemma 5.24.

Let $\mathcal{A}$ be a $\sigma$ -field in $(\Omega,\mathcal{F},\mathbb{P})$ and let $X$ be a random variable with values in a Polish space $\mathcal{X}$ . Let $\delta$ be a random variable with uniform distribution over $[0,1]$ which is independent of the $\sigma$ -field generated by $\mathcal{A}$ and $X$ . Then, there exists a random variable $X^{*}$ which has the same law as $X$ and which is independent of $\mathcal{A}$ , such that $\mathbb{P}(X\neq X^{*})=\beta(\mathcal{A},\sigma(X))$ . Furthermore, $X^{*}$ is measurable with respect to the $\sigma$ -field generated by $\mathcal{A}$ and $(X,\delta)$ .

The Bernstein Inequality will be used in this form (cf. Giné and Nickl [17]).

Proposition 5.25.

Let $X_{i}$ , $i=1,...,n$ be a sequence of independent, centred random variables such that there are numbers $c$ and $\sigma_{i}$ such that for all $k\geq 2$ $\mathbb{E}(|X_{i}|^{k})\leq\frac{k!}{2}\sigma_{i}^{2}c^{k-2}$ . Set $\sigma^{2}:=\sum_{i=1}^{n}\sigma_{i}^{2}$ , $S_{n}:=\sum_{i=1}^{n}X_{i}$ . Then, for all $t\geq 0$ $\mathbb{P}(S_{n}\geq t)\leq\exp\left(-\frac{t^{2}}{2(\sigma^{2}+ct)}\right)$ .
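A small Monte Carlo check may help to see the inequality at work. The sketch assumes i.i.d. Uniform $(-1,1)$ summands, a choice made here only for illustration: they are centred and satisfy $\mathbb{E}(|X_{i}|^{k})=\frac{1}{k+1}\leq\frac{k!}{2}\cdot\frac{1}{3}$ for $k\geq 2$ , so the moment condition holds with $\sigma_{i}^{2}=1/3$ and $c=1$ . The function names are hypothetical.

```python
import math
import random


def bernstein_bound(t: float, sigma_sq: float, c: float) -> float:
    """Right-hand side exp(-t^2 / (2 (sigma^2 + c t))) of the inequality."""
    return math.exp(-t * t / (2.0 * (sigma_sq + c * t)))


def empirical_tail(n: int, t: float, reps: int, rng: random.Random) -> float:
    """Monte Carlo estimate of P(S_n >= t) for i.i.d. Uniform(-1, 1) summands."""
    hits = 0
    for _ in range(reps):
        s = sum(rng.uniform(-1.0, 1.0) for _ in range(n))
        hits += s >= t
    return hits / reps


# sigma^2 = n * (1/3) and c = 1 for Uniform(-1, 1) summands (see the lead-in).
n, reps = 50, 5000
rng = random.Random(1)
for t in (5.0, 10.0):
    print(t, empirical_tail(n, t, reps, rng), bernstein_bound(t, n / 3.0, 1.0))
```

The simulated tail frequencies lie well below the bound, which is expected since the inequality is not tight for moderate $t$ .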

Lenglart’s Inequality shows how a martingale may be controlled by using the quadratic variation. We state in the following a slight adaptation of the original version as it is provided in Lenglart [29].

Lemma 5.26.

Let $X$ be a non-negative, right-continuous local sub-martingale and denote by $A$ its compensator. Then it holds for all finite stopping times $S>0$ and all $c,d>0$ that

[TABLE]

In this paper, we will apply Lenglart’s Inequality mostly in the following form which is close to Andersen et al. [3]. The following is an easy corollary to the previous lemma.

Corollary 5.27.

Let $M$ be a locally square integrable, right-continuous martingale and denote by $\langle M\rangle$ its compensator.

1. For all $T,c,d>0$ we have

[TABLE]

2. For all $T>0$ it is true that

[TABLE]

The main tool for finding the asymptotic distributions in this paper is Rebolledo’s Martingale Central Limit Theorem. It is known that a Brownian Motion is the only continuous Gaussian process with a certain covariance structure. This is used to formulate a martingale central limit theorem in the following. We state here the version of the theorem as Theorem II.5.1 in Andersen et al. [3], the original work is Rebolledo [42].

Let $M^{n}=(M_{1}^{n},...,M_{k}^{n})$ be a vector of sequences of locally square integrable martingales on an interval $\mathcal{T}$ . For $\varepsilon>0$ we denote by $M_{\varepsilon}^{n}$ a vector of locally square integrable martingales that contain all jumps of components of $M^{n}$ which are larger in absolute value than $\varepsilon$ , i.e., $M_{i}^{n}-M_{\varepsilon,i}^{n}$ is a locally square integrable martingale for all $i=1,...,k$ and $|\Delta M_{i}^{n}-\Delta M_{\varepsilon,i}^{n}|\leq\varepsilon$ . Furthermore, we denote by $\langle M^{n}\rangle:=\left(\langle M_{i}^{n},M_{j}^{n}\rangle\right)_{i,j=1,...,k}$ the $k\times k$ matrix of quadratic covariations. Moreover, we denote by $M$ a multivariate, continuous Gaussian martingale with $\langle M\rangle_{t}=V_{t}$ , where $V:\mathcal{T}\to\mathbb{R}^{k\times k}$ is a continuous deterministic $k\times k$ positive semi-definite matrix valued function on $\mathcal{T}$ such that its increments $V_{t}-V_{s}$ are also positive semi-definite for $s\leq t$ . Then $M_{t}-M_{s}\sim\mathcal{N}(0,V_{t}-V_{s})$ is independent of $(M_{r}:\,r\leq s)$ . Given such a function $V$ , such a Gaussian process $M$ always exists. We can now formulate the central limit theorem for martingales.

Theorem 5.28.

Let $\mathcal{T}_{0}\subseteq\mathcal{T}$ . Assume that for all $t\in\mathcal{T}_{0}$ as $n\to\infty$

[TABLE]

Then

[TABLE]

as $n\to\infty$ for all $t\in\mathcal{T}_{0}$ .
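A minimal simulation can illustrate the type of statement. The example is an assumption made here and does not appear in the paper: take a Poisson process $N$ with intensity $n\cdot$ `rate` and set $M^{n}_{t}=(N_{t}-n\cdot\texttt{rate}\cdot t)/\sqrt{n}$ . Its predictable variation is $\texttt{rate}\cdot t$ for every $n$ and its jumps have size $n^{-1/2}$ , so for large $n$ no jump exceeds a fixed $\varepsilon$ . The sketch checks that the simulated values have mean near zero and variance near $V_{t}=\texttt{rate}\cdot t$ .

```python
import math
import random
import statistics


def scaled_poisson_martingale(n: int, rate: float, t: float, rng: random.Random) -> float:
    """Return M^n_t = (N_t - n*rate*t) / sqrt(n), where N is a Poisson process
    with intensity n*rate, simulated through its exponential waiting times."""
    count, clock = 0, rng.expovariate(n * rate)
    while clock <= t:
        count += 1
        clock += rng.expovariate(n * rate)
    return (count - n * rate * t) / math.sqrt(n)


rate, t, n, reps = 1.0, 1.0, 400, 2000   # hypothetical illustration values
rng = random.Random(2)
sample = [scaled_poisson_martingale(n, rate, t, rng) for _ in range(reps)]

print(statistics.fmean(sample))      # should be close to 0
print(statistics.variance(sample))   # should be close to V_t = rate * t
```

A histogram of `sample` would look approximately Gaussian; the sketch only inspects the first two moments.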

We remark that the predictable quadratic variation may be replaced by the optional quadratic variation. But we do not use that in this paper.

Finally, the theorem by Kantorovich gives a relation between the solution of an equation system and its derivative at the solution (see e.g. Deimling [12]):

Theorem 5.29.

(Newton-Kantorovich Theorem) Let $R(x)=0$ be a system of equations where $R:D_{0}\subseteq\mathbb{R}^{p}\to\mathbb{R}^{p}$ is a function defined on $D_{0}$ . Let $R$ be differentiable and denote by $R^{\prime}$ its first derivative. Assume that there is an $x_{0}$ such that all expressions in the following statements exist and such that the following statements are true:

1. $||R^{\prime}(x_{0})^{-1}||\leq B$ ,

2. $||R^{\prime}(x_{0})^{-1}R(x_{0})||\leq\eta$ ,

3. $||R^{\prime}(x)-R^{\prime}(y)||\leq K||x-y||$ for all $x,y\in D_{0}$ ,

4. $r:=BK\eta\leq\frac{1}{2}$ and $\Omega_{*}:=\{x:||x-x_{0}||<2\eta\}\subseteq D_{0}$ .

Then there is $x^{*}\in\Omega_{*}$ with $R(x^{*})=0$ and

[TABLE]
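The hypotheses can be checked by hand in a toy case. The following sketch uses a hypothetical one-dimensional example ( $p=1$ ), $R(x)=x^{2}-2$ on $D_{0}=[1,2]$ with $x_{0}=1.5$ , which is not taken from the paper. It computes $B$ , $\eta$ , $K$ and $r$ , and confirms that the root lies in $\Omega_{*}$ . Newton's iteration is used only as a convenient way to locate the root.

```python
def newton(R, R_prime, x0: float, steps: int = 20) -> float:
    """Plain Newton iteration x <- x - R(x) / R'(x) in one dimension."""
    x = x0
    for _ in range(steps):
        x = x - R(x) / R_prime(x)
    return x


# Hypothetical one-dimensional example (p = 1): R(x) = x^2 - 2 on D_0 = [1, 2].
R = lambda x: x * x - 2.0
R_prime = lambda x: 2.0 * x

x0 = 1.5
B = abs(1.0 / R_prime(x0))        # bound on |R'(x0)^{-1}|
eta = abs(R(x0) / R_prime(x0))    # bound on |R'(x0)^{-1} R(x0)|
K = 2.0                           # Lipschitz constant of R'(x) = 2x
r = B * K * eta

x_star = newton(R, R_prime, x0)
print(r <= 0.5, abs(x_star - x0) < 2.0 * eta)
```

Here $\Omega_{*}=(x_{0}-2\eta,x_{0}+2\eta)$ is contained in $D_{0}$ , so all four conditions of the theorem are met in this example.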

We also use Itô’s Formula for semi-martingales with jumps (Theorem 14.2.4 in Cohen and Elliott [7]). Here $X$ is to be understood as the cadlag modification of $X$ (cf. Corollary 5.1.9 in Cohen and Elliott [7]).

Theorem 5.30.

Let $X$ be an $n$ -dimensional vector of semi-martingales $X=(X^{1},...,X^{n})$ and let $f:\mathbb{R}^{n}\to\mathbb{R}$ be a twice continuously differentiable function. Then,

[TABLE]

The above equality means that the processes to the left and to the right are indistinguishable and $[X^{i},X^{j}]$ denotes the optional covariation of $X^{i}$ and $X^{j}$ (see below).

The optional quadratic variation of a cadlag square integrable local martingale $M$ is given by $[M]_{t}:=M_{t}^{2}-2\int_{0}^{t}M_{s-}dM_{s}$ . The optional covariation for two such martingales $M$ and $N$ is given by $[M,N]:=\frac{1}{2}([M+N]-[M]-[N])$ .
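For a counting process martingale the optional quadratic variation is simply the number of jumps, and this can be verified path by path. The sketch below assumes a compensated homogeneous Poisson process $M_{s}=N_{s}-\texttt{rate}\cdot s$ with $M_{0}=0$ (a hypothetical example). It evaluates the standard identity $[M]_{t}=M_{t}^{2}-2\int_{0}^{t}M_{s-}dM_{s}$ in closed form along one simulated path and compares the result with $N_{t}$ .

```python
import random


def check_quadratic_variation(rate: float, t: float, rng: random.Random) -> tuple[int, float]:
    """Simulate M_s = N_s - rate*s on [0, t] and return (N_t, M_t^2 - 2*I),
    where I is the stochastic integral of M_{s-} with respect to M."""
    jumps, clock = [], rng.expovariate(rate)
    while clock <= t:
        jumps.append(clock)
        clock += rng.expovariate(rate)

    # Jump part: with i counted from 0, the left limit of M at the (i+1)-th
    # jump time T_i equals i - rate * T_i, and each jump of N has size one.
    jump_part = sum(i - rate * T_i for i, T_i in enumerate(jumps))
    # Drift part: integral of M_s ds, using integral of N_s ds = sum (t - T_i).
    integral_M = sum(t - T_i for T_i in jumps) - rate * t * t / 2.0
    stochastic_integral = jump_part - rate * integral_M

    M_t = len(jumps) - rate * t
    return len(jumps), M_t * M_t - 2.0 * stochastic_integral


rng = random.Random(3)
n_jumps, qv = check_quadratic_variation(rate=2.0, t=5.0, rng=rng)
print(n_jumps, qv)   # the two numbers agree up to rounding error
```

The agreement reflects that $[M]_{t}$ is the sum of squared jumps of $M$ , which for unit jumps equals $N_{t}$ .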

Acknowledgement This work is part of my PhD Thesis which I have written under the supervision of Prof. Dr. Enno Mammen (Heidelberg University) and Prof. Dr. Wolfgang Polonik (UC Davis). I am thankful for many discussions and helpful remarks. My work was supported by Deutsche Forschungsgemeinschaft through the Research Training Group RTG 1953.

I am also very thankful for the comments of the associate editor and two referees; their suggestions significantly contributed to a great improvement of the paper.

Bibliography

[1] P. K. Andersen and R. D. Gill. Cox’s regression model for counting processes: A large sample study. Ann. Statist., 10(4):1100–1120, 1982a.
[2] P. K. Andersen and R. D. Gill. Cox’s regression model for counting processes: A large sample study. The Annals of Statistics, 10(4):1100–1120, 1982b.
[3] P. K. Andersen, O. Borgan, R. D. Gill, and N. Keiding. Statistical Models Based on Counting Processes. Springer, 1993.
[4] C. Brownlees, E. Nualart, and Y. Sun. Realized networks. Journal of Applied Econometrics, 33(7):986–1006, 2018.
[5] C. T. Butts. A relational event framework for social action. Sociological Methodology, 38(1):155–200, 2008.
[6] L. H. Y. Chen and Q.-M. Shao. Normal approximation under local dependence. Ann. Probab., 32(3):1985–2028, 2004.
[7] S. N. Cohen and R. J. Elliott. Stochastic Calculus and Applications. Birkhäuser, 2015.
[8] D. R. Cox. Regression models and life-tables. Journal of the Royal Statistical Society. Series B (Methodological), 34(2):187–220, 1972.


Theorem 5.30.