Nonparametric estimation of jump rates for a specific class of Piecewise Deterministic Markov Processes

Nathalie Krell (IRMAR); Emeline Schmisser (LPP)

arXiv:1901.10166·math.ST·December 9, 2020

TL;DR

This paper introduces a nonparametric method to estimate the jump rate of a specific class of PDMPs, using an adaptive stationary density estimator and a quotient estimator, with theoretical risk bounds and simulation validation.

Contribution

It develops a novel adaptive estimation procedure for the jump rate of PDMPs, achieving nearly minimax optimality with theoretical guarantees.

Findings

1. The estimator of the jump rate is nearly minimax optimal.

2. Uniform risk bounds are established for the estimators.

3. Simulations demonstrate the estimator's effectiveness.

Abstract

In this paper, we consider a piecewise deterministic Markov process (PDMP), with known flow and deterministic transition measure, and unknown jump rate $\lambda$. To estimate nonparametrically the jump rate, we first construct an adaptive estimator of the stationary density, then we derive a quotient estimator $\hat{\lambda}_{n}$ of $\lambda$. We provide uniform bounds for the risk of these estimators, and prove that the estimator of the jump rate is nearly minimax (up to a $\ln^{2}(n)$ factor). Simulations illustrate the behavior of our estimator.

Equations

$$\lambda(x)=\frac{\nu(x)}{\mathbf{D}(x)}$$

$$\forall x\in\mathbb{R}^{+},\ \exists\,\varepsilon^{\prime}>0 \text{ such that } \int_{0}^{\varepsilon^{\prime}}\lambda(\phi(x,s))\,ds<\infty$$

$$\mathbb{P}(T_{1}>t\mid X_{0}=x_{0})=e^{-\Lambda(x_{0},t)},\quad\text{where } \Lambda(x,t)=\int_{0}^{t}\lambda(\phi(x,s))\,ds.$$

$$X(t)=\begin{cases}\phi(x_{0},t) & \text{for } t<T_{1},\\ Z_{1} & \text{for } t=T_{1}.\end{cases}$$

$$\mathbb{P}(Y_{1}>y\mid Z_{0}=z_{0})=\exp\left(-\int_{0}^{(\phi_{z_{0}})^{-1}(y)}\lambda(\phi_{z_{0}}(s))\,ds\right)\mathbf{1}_{\{y\geq z_{0}\}}$$

$$\mathbb{P}(Y_{1}>y\mid Z_{0}=z_{0})=\exp\left(-\int_{z_{0}}^{y}\lambda(u)(\phi_{z_{0}}^{-1})^{\prime}(u)\,du\right)\mathbf{1}_{\{y\geq z_{0}\}}.$$

$$\mathcal{P}(z_{0},y):=\lambda(y)(\phi_{z_{0}}^{-1})^{\prime}(y)\,e^{-\int_{z_{0}}^{y}\lambda(u)(\phi_{z_{0}}^{-1})^{\prime}(u)\,du}\,\mathbf{1}_{\{y\geq z_{0}\}}.$$

$$\mu(dz)=\int_{\mathbb{R}^{+}}\boldsymbol{\nu}(dy)Q(y,dz)=\int_{\mathbb{R}^{+}}\rho(dy,dz),\quad \rho(dy,dz)=\boldsymbol{\nu}(dy)Q(y,dz),$$

$$\boldsymbol{\nu}(dy)=\int_{\mathbb{R}^{+}}\xi(dz,dy)=\int_{\mathbb{R}^{+}}\mathcal{P}(z,dy)\mu(dz),\quad \xi(dz,dy)=\mu(dz)\mathcal{P}(z,dy).$$

$$\left|\mathbb{E}\left(\psi(Y_{k},Z_{k})\mid Z_{0}=z_{0}\right)-\mathbb{E}_{\rho}\left(\psi(Y_{1},Z_{1})\right)\right|\leq R\,\mathbf{V}_{\lambda}(z_{0})\,\gamma^{k}.$$

$$\sup_{y\in[i_{1},i_{2}]}\nu(y)=\sup_{y\in[i_{1},i_{2}]}\int_{0}^{i_{2}}\mathcal{P}(z,y)\mu(dz)<\infty.$$

$$\mathbb{E}_{z_{0}}\left(\frac{1}{n}\sum_{k=1}^{n}s(Y_{k},Z_{k})\right)-\int s(y,z)\rho(dy,dz)\leq\left\|s\right\|_{\infty}\frac{R\,\mathbf{V}_{\lambda}(z_{0})}{n(1-\gamma)}$$

$$\mathrm{Var}_{z_{0}}\left(\frac{1}{n}\sum_{k=1}^{n}s(Y_{k},Z_{k})\right)\leq\frac{1}{n}\int s^{2}(y,z)\rho(dy,dz)+\frac{\left\|s\right\|_{\infty}}{n}\int|s(y,z)|G_{\lambda}(z)\rho(dy,dz)+\frac{c_{\lambda}\left\|s\right\|_{\infty}^{2}}{n^{2}}$$

$$\mathscr{O}_{a}^{b}=\sigma\left(\{X_{j_{1}}\in I_{1},\ldots,X_{j_{n}}\in I_{n}\},\ a\leq j_{1}\leq\ldots\leq j_{n}\leq b,\ n\in\mathbb{N},\ I_{k}\in\mathcal{B}(\mathbb{R}^{+})\right).$$

$$\beta_{X}(t)=\sup_{k}\ \sup_{E\in\mathscr{O}_{0}^{k}\times\mathscr{O}_{t+k}^{\infty}}\left|P_{\mathscr{O}_{0}^{k},\mathscr{O}_{t+k}^{\infty}}(E)-P_{\mathscr{O}_{0}^{k}}\otimes P_{\mathscr{O}_{t+k}^{\infty}}(E)\right|$$

$$\beta_{Y,Z}(k)\leq c\gamma^{k}\quad\text{where } c=R\int\mathbf{V}_{\lambda}(z)\mu(dz)+R(1+R)\mathbf{V}_{\lambda}(x_{0}).$$

$$\lambda(y)(\phi_{z_{0}}^{-1})^{\prime}(y)\mathbf{1}_{\{z_{0}\leq y\}}\mathbb{P}(Y_{1}>y\mid Z_{0}=z_{0})$$

$$\lambda(y)\,\mathbb{E}\left(\mathbf{1}_{\{Z_{0}\leq y<Y_{1}\}}(\phi_{Z_{0}}^{-1})^{\prime}(y)\mid Z_{0}=z_{0}\right)$$

$$\lambda(y)\,\mathbb{E}_{\xi}\left((\phi_{Z_{0}}^{-1})^{\prime}(y)\mathbf{1}_{\{Z_{0}\leq y<Y_{1}\}}\right)=\int\mathcal{P}(z,y)\mu(dz)=\nu(y)$$

$$\mathbf{D}(y):=\mathbb{E}_{\xi}\left((\phi_{Z_{0}}^{-1})^{\prime}(y)\mathbf{1}_{\{Z_{0}\leq y<Y_{1}\}}\right).$$

$$\lambda(y)=\frac{\nu(y)}{\mathbf{D}(y)}.$$

$$\inf_{y\in\mathcal{I}}\mathbf{D}(y)\geq D_{0}>0.$$

$$\mathbf{D}(y)\geq\Phi_{0}\,\mathbb{P}_{\xi}\left(Z_{0}\leq y<Y_{1}\right).$$

$$\hat{\mathbf{D}}_{n}(y)=\frac{1}{n}\sum_{k=1}^{n}(\phi_{Z_{k-1}}^{-1})^{\prime}(y)\mathbf{1}_{\{Z_{k-1}\leq y\leq Y_{k}\}}>0.$$

$$0<\mathbf{m}(y)\leq(\phi_{x}^{-1})^{\prime}(y)\leq\mathbf{M}(y).$$

$$\forall y\geq i_{1},\quad\lambda(y)\mathbf{m}(y)\geq\mathbf{a}\frac{y^{b}}{b+1}.$$

$$i_{2}^{\prime}=\max\left(i_{2},\ (i_{2}-i_{1})+\left(\frac{1}{\mathbf{a}(1-\kappa^{b+1})}\ln\left(\frac{2\kappa^{b+1}}{1-\kappa^{b+1}}\right)\right)^{1/(b+1)}\mathbf{1}_{\{\kappa^{b+1}\geq 1/3\}}\right).$$

$$Q(x,dy)=Q_{1}(x,y)\,dy+p_{0}(x)\delta_{0}(dy)+\sum_{i=1}^{j_{Q}}p_{i}(x)\delta_{f_{i}(x)}(dy)$$

$$\mathcal{E}(\mathfrak{s},b,\alpha)=\left\{\lambda\in H^{\alpha}(J),\ \forall y\geq i_{1},\ \lambda(y)\mathbf{m}(y)\geq\frac{\mathbf{a}y^{b}}{b+1},\ \int_{0}^{i_{1}}\lambda(u)\mathbf{M}(u)\,du\leq\mathbf{l},\ \left\|\lambda\right\|_{H^{\alpha}(J)}\leq\mathbf{L}\right\}$$

$$J=J_{\lfloor\alpha\rfloor}\cup[i_{1},i_{2}^{\prime}]:=[j_{1},j_{2}]$$

$$J_{0}=\mathcal{I}\quad\text{and}\quad J_{k+1}=\mathrm{Conv}\left(\mathcal{I}\cup\bigcup_{i=1}^{j_{Q}}f_{i}^{-1}(J_{k})\right).$$

Taxonomy

Topics: Markov Chains and Monte Carlo Methods · Simulation Techniques and Applications · Advanced Queuing Theory Analysis

Full text

Nonparametric estimation of jump rates for a specific class of piecewise deterministic Markov processes

N. Krell, Université de Rennes 1, Institut de Recherche mathématique de Rennes, CNRS-UMR 6625, Campus de Beaulieu, Bâtiment 22, 35042 Rennes Cedex, France.

E. Schmisser, Laboratoire Paul Painlevé, Université des Sciences et Technologies de Lille, Bureau 314, Bâtiment M3, Cité Scientifique, 59655 Villeneuve d'Ascq Cedex, France.

Abstract

In this paper, we consider a unidimensional piecewise deterministic Markov process (PDMP), with homogeneous jump rate $\lambda(x)$ . This process is observed continuously, so the flow $\phi$ is known. To estimate nonparametrically the jump rate, we first construct an adaptive estimator of the stationary density, then we derive a quotient estimator $\hat{\lambda}_{n}$ of $\lambda$ . Under some ergodicity conditions, we bound the risk of these estimators (and give a uniform bound on a small class of functions), and prove that the estimator of the jump rate is nearly minimax (up to a $\ln^{2}(n)$ factor). The simulations illustrate our theoretical results.

Keywords: Piecewise deterministic Markov processes, model selection, nonparametric estimation

Mathematical Subject Classification: 62G05, 62G07, 62M05, 60J25

1 Introduction

Piecewise deterministic Markov processes are a large class of continuous-time stochastic models first introduced by Davis [13]. They are used to model deterministic phenomena in which randomness appears as point events. They are not diffusions, which adds complexity to their study. This family of stochastic processes is well suited to modelling various problems in biology (see for instance Cloez et al. [10], Rudnicki and Tyran-Kamińska [29]), neuroscience (Höpfner et al. [22], Renault et al. [28]), physics (Blanchard and Jadczyk [9]), reliability (De Saporta et al. [14]), optimal consumption and exploration (Farid and David [18]), risk insurance, seismology, and so on. See also the references in the survey by Azaïs et al. [4].

In this article, we consider a filtered piecewise deterministic Markov process (PDMP) $(X_{t})_{t\geq 0}$ taking values in $\mathbb{R}^{+}$, with flow $\phi$, transition measure $Q(x,dy)$ and homogeneous jump rate $\lambda(x)$. Starting from the initial value $x_{0}$, the process follows the flow $\phi$ until the first jump time $T_{1}$, which occurs spontaneously in a Poisson-like fashion with rate $\lambda(\phi(x_{0},t))$. The post-jump location of the process at time $T_{1}$ is governed by the transition distribution $Q(\phi(x_{0},T_{1}),dy)$, and the motion restarts from this new point as before.

To fix ideas, let us consider two major examples of unidimensional PDMPs.

The TCP (transmission control protocol) (see Dumas et al. [17], Guillemin et al. [20] for instance) is one of the main data transmission protocols on the Internet. The maximum number of packets that can be sent at time $t_{k}$ in a round is a random variable $X_{t_{k}}$. If the transmission is successful, then the maximum number of packets is increased by one: $X_{t_{k+1}}=X_{t_{k}}+1$. If the transmission fails, then we set $X_{t_{k+1}}=\kappa X_{t_{k}}$ with $\kappa\in(0,1)$. A correct scaling of this process leads to a piecewise deterministic Markov process $(X_{t})$ with flow $\phi(x,t)=x+ct$ and deterministic transition measure $Q(x,\{y\})=\mathbf{1}_{\{y=\kappa x\}}$. This process grows linearly (by construction) and the constant $\kappa$ can be configured in the server implementation (so it is also known), but the moment when the transmission fails is of course unknown. In the literature it is usually assumed that the jump rate satisfies $\lambda(x)=x$; with this work, we can check whether this assumption is realistic.

Another example of a PDMP is the size of a marked bacterium (see Doumic et al. [16], Robert et al. [26], Laurençot and Perthame [25]). We randomly choose a bacterium and follow its growth until it divides in two. Then we randomly choose one of its daughters, and so on. Between the jumps, the bacterium grows exponentially: $\phi(x,t)=xe^{ct}$. The size of the bacterium after the division is random, as it does not divide into two equal parts.

The process $(X_{t})$ is observed continuously without errors (so the flow $\phi$ is known); it is assumed to be ergodic, with fast convergence toward the stationary measure, and exponentially $\beta$ -mixing. We denote by $(T_{1},\ldots,T_{n})$ the jump times and consider the Markov chain $(Z_{0}=x_{0},(Y_{k}=X_{T_{k}^{-}},Z_{k}=X_{T_{k}})_{k\in\mathbb{N}})$ . Our aim is to construct a non-parametric adaptive estimator of the jump rate $\lambda$ on a compact interval.

Few results exist on the estimation of PDMPs. Azaïs et al. [5] and Azaïs and Muller-Gueudin [3] consider a more general model, for a multidimensional PDMP. They construct a quotient of kernel estimators, which estimates the compound function $\lambda(\phi(x,t))$. Their estimator is consistent ([5]) and asymptotically normal, and its pointwise rate of convergence depends on the bandwidth of the kernel (see [3]). They explain how to construct an adaptive estimator, but do not bound its risk.

Doumic et al. [16] and Hodara et al. [21] also consider multi-dimensional PDMPs but for very specific biological models.

Fujii [19] and Krell [24] both consider unidimensional PDMPs and provide estimators of $\lambda(x)$. Fujii [19] constructs an estimator of $\lambda(x)$ through a Rice formula, by estimating local times, and proves its consistency. Krell [24] considers a deterministic transition measure (so $Y_{k}$ is a function of $Z_{k}$). Her estimator of $\lambda$ is the quotient of a kernel estimator of the stationary density of $Z_{k}$ and an empirical estimator $\hat{\mathbf{D}}_{n}$ of another function $\mathbf{D}$, which converges at the parametric rate $n^{1/2}$. This nonparametric estimator is asymptotically normal, and bounds for the pointwise risk are provided. In a very recent article, Azaïs and Genadot [2] construct a nonparametric estimator of $\lambda(x)$ for a multidimensional PDMP and prove its consistency.

This article is an extension of the work of [24]. We consider a wider class of models (in particular, the transition measure $Q$ does not need to be deterministic any more). We bound the $L^{2}$ risk of the adaptive estimator, whereas [24] only considers the pointwise risk of the nonparametric estimator with fixed bandwidth $h$ . We also prove that our estimator is minimax (up to a $\ln^{2}(n)$ factor).

For this purpose, in analogy with [24], we use the equality

$$\lambda(x)=\frac{\nu(x)}{\mathbf{D}(x)},$$

where $\nu$ is the stationary density of the pre-jump locations $Y_{k}$ (see Assumption A2 for the existence of this stationary density) and $\mathbf{D}$ is a function defined in equation (5). We get an estimator $\hat{\mathbf{D}}_{n}(x)$, which converges with rate $n^{1/2}$. To estimate the density function $\nu$, we use a projection method. We obtain a series of estimators $(\hat{\nu}_{0},\hat{\nu}_{1},\ldots,\hat{\nu}_{m},\ldots)$ of $\nu$. Then we choose the "best" estimator by a penalization method, in the same way as Barron et al. [6], and give an oracle inequality for the adaptive estimator $\hat{\nu}_{\hat{m}}$. The constant in the penalty term is intractable, but can be estimated thanks to a slope heuristic. Finally, we construct a quotient estimator of $\lambda$, $\hat{\lambda}=\hat{\nu}_{\hat{m}}/\hat{\mathbf{D}}_{n}$, and bound its $L^{2}$-risk. In Section 2, we specify the model and its assumptions. The main results are stated in Section 3. Proofs are gathered in Section 4, and in Appendix A for the technical results. In Appendix B, some simulations for the TCP protocol and the bacterial growth are provided, with various functions $\lambda$. The outcomes are consistent with the theoretical results.

2 PDMP

A piecewise deterministic Markov process (PDMP) is defined by its local characteristics, namely, the jump rate $\lambda$ , the flow $\phi$ and the transition measure $Q$ according to which the location of the process is chosen after the jump. In this article, we consider a unidimensional PDMP $\{X(t)\}_{t\geq 0}$ . More precisely,

Assumption A 1.

a. The flow $\phi:\mathbb{R}^{+}\times\mathbb{R}^{+}\mapsto\mathbb{R}^{+}$ is a one-parameter group of homeomorphisms: $\phi$ is $\mathcal{C}^{1}$, for each $t\in\mathbb{R}^{+}$, $\phi(.,t)$ is a homeomorphism satisfying the semigroup property $\phi(.,t+s)=\phi(\phi(.,s),t)$, and for each $x\in\mathbb{R}^{+}$, $\phi_{x}(.):=\phi(x,.)$ is an increasing $\mathcal{C}^{1}$-diffeomorphism. This implies that $\phi(x,0)=x$.

b. The jump rate $\lambda:\mathbb{R}^{+}\rightarrow\mathbb{R}^{+}$ is a measurable function satisfying

$$\forall x\in\mathbb{R}^{+},\ \exists\,\varepsilon^{\prime}>0 \text{ such that } \int_{0}^{\varepsilon^{\prime}}\lambda(\phi(x,s))\,ds<\infty,$$

that is, the jump rate does not explode.

c. $\forall x\in\mathbb{R}^{+}$, $Q(x,\mathbb{R}^{+}\setminus\{x\})=1$.

For instance, we can take $\phi(x,t)=x+ct$ (linear flow) or $\phi(x,t)=xe^{ct}$ (exponential flow). The transition measure may be absolutely continuous with respect to the Lebesgue measure, or deterministic ($Q(x,\{y\})=\mathbf{1}_{\{y=f(x)\}}$).
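For these two flows, the inverse $\phi_{x}^{-1}$ and its derivative, which are used repeatedly in what follows, can be written explicitly. The short computation below is only an illustration and assumes a growth constant $c>0$:

```latex
% Linear flow \phi_x(t) = x + ct, with c > 0 assumed
\phi_x^{-1}(y) = \frac{y - x}{c}, \qquad (\phi_x^{-1})'(y) = \frac{1}{c}, \qquad y \geq x.

% Exponential flow \phi_x(t) = x e^{ct}, with c > 0 and x > 0 assumed
\phi_x^{-1}(y) = \frac{1}{c}\ln\frac{y}{x}, \qquad (\phi_x^{-1})'(y) = \frac{1}{cy}, \qquad y \geq x.
```

In the linear case the derivative does not depend on $(x,y)$; in the exponential case it depends on $y$ only.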

Given these three characteristics, it can be shown (Davis [13, pp. 62-66]) that there exists a filtered probability space $(\Omega,\mathcal{F},\{\mathcal{F}_{t}\},\{\mathbb{P}_{x}\})$ such that the motion of the process $\{X(t)\}_{t\geq 0}$ starting from a point $x_{0}\in\mathbb{R}^{+}$ may be constructed as follows. Consider a random variable $T_{1}$ with survival function

$$\mathbb{P}(T_{1}>t\mid X_{0}=x_{0})=e^{-\Lambda(x_{0},t)},\quad\text{where } \Lambda(x,t)=\int_{0}^{t}\lambda(\phi(x,s))\,ds. \tag{1}$$

If $T_{1}$ is equal to infinity, then the process $\{X(t)\}_{t\geq 0}$ follows the flow, i.e. for $t\in\mathbb{R}^{+}$, $X(t)=\phi(x_{0},t)$. Otherwise, let $Y_{1}=\phi(x_{0},T_{1}^{-})$ be the pre-jump location and $Z_{1}$ the post-jump location. $Z_{1}$ is defined through the transition kernel $Q$: $\mathbb{P}\left(\left.Z_{1}\in A\right|Y_{1}=y\right)=\int_{A}Q(y,dz)$. The trajectory of $\{X(t)\}$ starting at $x_{0}$, for $t\in[0,T_{1}]$, is given by

$$X(t)=\begin{cases}\phi(x_{0},t) & \text{for } t<T_{1},\\ Z_{1} & \text{for } t=T_{1}.\end{cases}$$

Inductively starting from $X(T_{n})=Z_{n}$ , we now select the next inter-jump time $T_{n+1}-T_{n}$ and post-jump location $X(T_{n+1})=Z_{n+1}$ in a similar way. This construction properly defines a strong Markov process $\{X(t)\}_{t\geq 0}$ with jump times $\{T_{k}\}_{k\in\mathbb{N}}$ (where $T_{0}=0$ ). A very natural Markov chain is linked to $\{X(t)\}_{t\geq 0}$ , namely the jump chain $\{Y_{n},Z_{n}\}_{n\in\mathbb{N}}$ (or, equivalently, $\{T_{n},Z_{n}\}_{n\in\mathbb{N}}$ ).
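The construction above can be illustrated on the TCP example. The following is a minimal sketch, assuming the linear flow $\phi(x,t)=x+ct$, the jump rate $\lambda(x)=x$ and the deterministic transition $Z_{k}=\kappa Y_{k}$; the function name and the default parameter values are hypothetical. Under these assumptions $\Lambda(z,t)=zt+ct^{2}/2$, so the waiting time is obtained by solving $\Lambda(z,t)=E$ for a standard exponential variable $E$.

```python
import math
import random


def simulate_tcp_chain(n, x0=1.0, c=1.0, kappa=0.5, seed=0):
    """Simulate (Y_k, Z_k), k = 1..n, for an illustrative TCP-type PDMP.

    Assumptions (for illustration only): flow phi(x, t) = x + c*t,
    jump rate lambda(x) = x, transition Z_k = kappa * Y_k.
    """
    rng = random.Random(seed)
    z = x0
    ys, zs, waits = [], [], []
    for _ in range(n):
        e = rng.expovariate(1.0)  # the waiting time t solves Lambda(z, t) = e
        # Lambda(z, t) = z*t + c*t**2/2; keep the nonnegative root
        t = (-z + math.sqrt(z * z + 2.0 * c * e)) / c
        y = z + c * t             # pre-jump location Y_k = phi(z, t)
        z = kappa * y             # post-jump location Z_k
        ys.append(y)
        zs.append(z)
        waits.append(t)
    return ys, zs, waits
```

The returned lists contain the pre-jump locations, the post-jump locations and the inter-jump times; the jump times are the cumulative sums of the latter.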

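To simplify the notation, let us set $\phi_{x}(t)=\phi(x,t)$ and $z_{0}=x_{0}$. By (1),

$$\mathbb{P}(Y_{1}>y\mid Z_{0}=z_{0})=\exp\left(-\int_{0}^{(\phi_{z_{0}})^{-1}(y)}\lambda(\phi_{z_{0}}(s))\,ds\right)\mathbf{1}_{\{y\geq z_{0}\}}$$

and by the change of variable $u=\phi_{z_{0}}(s)$ (we recall that for any $z\in\mathbb{R}^{+}$, $\phi_{z}$ is a monotonic function), we get

$$\mathbb{P}(Y_{1}>y\mid Z_{0}=z_{0})=\exp\left(-\int_{z_{0}}^{y}\lambda(u)(\phi_{z_{0}}^{-1})^{\prime}(u)\,du\right)\mathbf{1}_{\{y\geq z_{0}\}}. \tag{2}$$

If the function $\lambda(y)(\phi_{z_{0}}^{-1})^{\prime}(y)$ is finite, we obtain the conditional density:

$$\mathcal{P}(z_{0},y):=\lambda(y)(\phi_{z_{0}}^{-1})^{\prime}(y)\,e^{-\int_{z_{0}}^{y}\lambda(u)(\phi_{z_{0}}^{-1})^{\prime}(u)\,du}\,\mathbf{1}_{\{y\geq z_{0}\}}. \tag{3}$$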

By analogy, we set $\mathcal{P}(z_{0},dy)=\mathbb{P}\left(\left.Y_{1}\in dy\right|Z_{0}=z_{0}\right)$ .
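As a worked illustration (not taken from the text), consider the TCP example under the assumptions $\phi(x,t)=x+ct$ and $\lambda(u)=u$, so that $(\phi_{z_{0}}^{-1})^{\prime}(u)=1/c$. The survival function of $Y_{1}$ and the conditional density then become explicit:

```latex
\int_{z_0}^{y} \lambda(u)\,(\phi_{z_0}^{-1})'(u)\,du
  = \int_{z_0}^{y} \frac{u}{c}\,du
  = \frac{y^{2} - z_0^{2}}{2c},

\mathbb{P}(Y_1 > y \mid Z_0 = z_0)
  = \exp\!\left(-\frac{y^{2} - z_0^{2}}{2c}\right), \qquad y \geq z_0,

\mathcal{P}(z_0, y)
  = \frac{y}{c}\,\exp\!\left(-\frac{y^{2} - z_0^{2}}{2c}\right)\mathbf{1}_{\{y \geq z_0\}}.
```

One can check that this density integrates to one over $[z_{0},\infty)$, since it is the derivative of $-\exp(-(y^{2}-z_{0}^{2})/(2c))$.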

Our aim is to estimate the jump rate $\lambda$ on the compact interval $\mathcal{I}:=[i_{1},i_{2}]\subset(0,\infty)$ .

Ergodicity is often a keystone in statistical inference for Markov processes. We also assume fast convergence toward the stationary density.

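Assumption A 2.

a. The jump rate does not explode before $i_{2}$: for all $x\leq i_{1}$, $\int_{0}^{i_{1}}\lambda(y)(\phi_{x}^{-1})^{\prime}(y)dy<\infty$ and $\sup_{y\in[i_{1},i_{2}]}\lambda(y)<\infty$.

b. The process $(Y_{k},Z_{k})$ is positive recurrent and strongly ergodic. We denote by $\boldsymbol{\nu}$ the stationary measure of $Y_{k}$, by $\mu$ that of $Z_{k}$, by $\rho$ the stationary measure of the couple $(Y_{k},Z_{k})$ and by $\xi$ that of $(Z_{k},Y_{k+1})$. We have that:

$$\mu(dz)=\int_{\mathbb{R}^{+}}\boldsymbol{\nu}(dy)Q(y,dz)=\int_{\mathbb{R}^{+}}\rho(dy,dz),\quad \rho(dy,dz)=\boldsymbol{\nu}(dy)Q(y,dz),$$

$$\boldsymbol{\nu}(dy)=\int_{\mathbb{R}^{+}}\xi(dz,dy)=\int_{\mathbb{R}^{+}}\mathcal{P}(z,dy)\mu(dz),\quad \xi(dz,dy)=\mu(dz)\mathcal{P}(z,dy).$$

c. There exist a function $\mathbf{V}_{\lambda}$ greater than 1 and two constants $\gamma\in(0,1)$, $R\in\mathbb{R}^{+*}$ such that, for any function $\psi:(\mathbb{R}^{+})^{2}\mapsto\mathbb{R}^{+}$ with $|\psi|\leq\mathbf{V}_{\lambda}$ and for any integer $k$:

$$\left|\mathbb{E}\left(\psi(Y_{k},Z_{k})\mid Z_{0}=z_{0}\right)-\mathbb{E}_{\rho}\left(\psi(Y_{1},Z_{1})\right)\right|\leq R\,\mathbf{V}_{\lambda}(z_{0})\,\gamma^{k}.$$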

The inequality $|\psi|\leq\mathbf{V}_{\lambda}$ means that, for any $(y,z)\in(\mathbb{R}^{+})^{2}$ , $|\psi(y,z)|\leq\mathbf{V}_{\lambda}(z)$ . This inequality is true in particular for any function $\psi$ bounded by 1 and for $\psi(y,z)=\mathbf{V}_{\lambda}(z)$ .

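Under Assumption A2 a, the conditional measure $\mathcal{P}$ is absolutely continuous with respect to the Lebesgue measure on $[0,i_{2}]\times[i_{1},i_{2}]$ and $\sup_{(x,y)\in[0,i_{2}]\times[i_{1},i_{2}]}\mathcal{P}(x,y)<\infty$. The same holds for the stationary measure $\boldsymbol{\nu}$, which admits a density $y\to\nu(y)$: $\boldsymbol{\nu}(dy)=\nu(y)dy$. Moreover,

$$\sup_{y\in[i_{1},i_{2}]}\nu(y)=\sup_{y\in[i_{1},i_{2}]}\int_{0}^{i_{2}}\mathcal{P}(z,y)\mu(dz)<\infty.$$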

We can also remark that, for any $x>0$ , $|\mathbb{E}_{\mu}\left(\mathbf{V}_{\lambda}(Z_{1})\right)|\leq\mathbf{V}_{\lambda}(x)+R\mathbf{V}_{\lambda}(x)<\infty$ .

Let us set $\mathbb{E}_{z_{0}}\left(U\right)=\mathbb{E}\left(\left.U\right|Z_{0}=z_{0}\right)$ . Under Assumption A2, the empirical mean is close to its expectation under the stationary density, as shown by the following lemma (proved in the Appendix).

Lemma 1.

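Under Assumptions A1-A2, for any bounded function $s$:

$$\mathbb{E}_{z_{0}}\left(\frac{1}{n}\sum_{k=1}^{n}s(Y_{k},Z_{k})\right)-\int s(y,z)\rho(dy,dz)\leq\left\|s\right\|_{\infty}\frac{R\,\mathbf{V}_{\lambda}(z_{0})}{n(1-\gamma)}$$

and

$$\mathrm{Var}_{z_{0}}\left(\frac{1}{n}\sum_{k=1}^{n}s(Y_{k},Z_{k})\right)\leq\frac{1}{n}\int s^{2}(y,z)\rho(dy,dz)+\frac{\left\|s\right\|_{\infty}}{n}\int|s(y,z)|G_{\lambda}(z)\rho(dy,dz)+\frac{c_{\lambda}\left\|s\right\|_{\infty}^{2}}{n^{2}}$$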

where $G_{\lambda}(z)=\frac{R}{1-\gamma}\left(\mathbf{V}_{\lambda}(z)+\int\mathbf{V}_{\lambda}(u)\mu(du)\right)$ and $c_{\lambda}$ depends explicitly on $(\gamma,R,\mathbf{V}_{\lambda})$ . We can remark that $C_{\lambda}:=\int G_{\lambda}(z)\mu(dz)=\frac{2R}{1-\gamma}\int\mathbf{V}_{\lambda}(z)\mu(dz)$ .

In the bound of the variance, the first term is the same as for i.i.d. variables. The second term is due to covariance terms (a similar term appears for stationary $\beta$-mixing processes), and the third comes from the non-stationarity of the random vectors $(Y_{k},Z_{k})$.

To study an adaptive estimator of $\nu$ , we need to prove that the Markov chain $(Y_{k},Z_{k})$ is weakly dependent. It is the case if the process is $\beta$ -mixing.

Definition 2.

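Let $(X_{k})_{k\geq 0}$ be a Markov chain. Let us define the $\sigma$-algebra

$$\mathscr{O}_{a}^{b}=\sigma\left(\{X_{j_{1}}\in I_{1},\ldots,X_{j_{n}}\in I_{n}\},\ a\leq j_{1}\leq\ldots\leq j_{n}\leq b,\ n\in\mathbb{N},\ I_{k}\in\mathcal{B}(\mathbb{R}^{+})\right).$$

The $\beta$-mixing coefficient of the Markov chain $(X_{k})$ is

$$\beta_{X}(t)=\sup_{k}\ \sup_{E\in\mathscr{O}_{0}^{k}\times\mathscr{O}_{t+k}^{\infty}}\left|P_{\mathscr{O}_{0}^{k},\mathscr{O}_{t+k}^{\infty}}(E)-P_{\mathscr{O}_{0}^{k}}\otimes P_{\mathscr{O}_{t+k}^{\infty}}(E)\right|,$$

where $P_{\mathscr{O},\mathscr{S}}$ is the joint law of an event on $\mathscr{O}\times\mathscr{S}$. The $\beta$-mixing coefficient characterizes the dependence between what happens before $T_{k}$ and what happens after $T_{t+k}$. The process $(X_{k})_{k\geq 0}$ is $\beta$-mixing if $\lim_{k\rightarrow\infty}\beta_{X}(k)=0$. It is exponentially (or geometrically) $\beta$-mixing if there exist two positive constants $c$, $\beta$ such that $\beta_{X}(k)\leq ce^{-\beta k}$.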

The following lemma is a consequence of Assumption A2. It is proved in the Appendix.

Lemma 3.

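Under Assumptions A1-A2, the Markov chain $(Y_{k},Z_{k})$ is geometrically $\beta$-mixing. Moreover, its $\beta$-mixing coefficient satisfies, for all $k\in\mathbb{N}$:

$$\beta_{Y,Z}(k)\leq c\gamma^{k}\quad\text{where } c=R\int\mathbf{V}_{\lambda}(z)\mu(dz)+R(1+R)\mathbf{V}_{\lambda}(x_{0}).$$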

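Estimating $\lambda$ directly is difficult, but we can construct a quotient estimator. By (2) and (3), we get that, for any $y\in\mathcal{I}$,

$$\mathcal{P}(z_{0},y)=\lambda(y)(\phi_{z_{0}}^{-1})^{\prime}(y)\mathbf{1}_{\{z_{0}\leq y\}}\mathbb{P}(Y_{1}>y\mid Z_{0}=z_{0})=\lambda(y)\,\mathbb{E}\left(\mathbf{1}_{\{Z_{0}\leq y<Y_{1}\}}(\phi_{Z_{0}}^{-1})^{\prime}(y)\mid Z_{0}=z_{0}\right)$$

and we integrate with respect to the stationary distribution $\mu$ of $Z_{0}$:

$$\lambda(y)\,\mathbb{E}_{\xi}\left((\phi_{Z_{0}}^{-1})^{\prime}(y)\mathbf{1}_{\{Z_{0}\leq y<Y_{1}\}}\right)=\int\mathcal{P}(z,y)\mu(dz)=\nu(y),$$

recalling that $\xi$ is the stationary measure of the couple $(Z_{0},Y_{1})$. Let us set

$$\mathbf{D}(y):=\mathbb{E}_{\xi}\left((\phi_{Z_{0}}^{-1})^{\prime}(y)\mathbf{1}_{\{Z_{0}\leq y<Y_{1}\}}\right). \tag{5}$$

Then, if $\mathbf{D}(y)>0$, we get:

$$\lambda(y)=\frac{\nu(y)}{\mathbf{D}(y)}.$$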

It remains to ensure that $\mathbf{D}(y)>0$ on $\mathcal{I}=[i_{1},i_{2}]$ .

Assumption A 3.

There exists $D_{0}>0$ such that

$$\inf_{y\in\mathcal{I}}\mathbf{D}(y)\geq D_{0}>0.$$

*Remark.*

Assumption A3 is very natural; indeed, let us set $\Phi_{0}:=\inf_{x\leq i_{2},y\in\mathcal{I}}(\phi_{x}^{-1})^{\prime}(y)$. As $\phi_{x}$ is invertible and $\phi_{\cdot}^{\prime}(\cdot)$ is continuous, $\Phi_{0}>0$. Then

$$\mathbf{D}(y)\geq\Phi_{0}\,\mathbb{P}_{\xi}\left(Z_{0}\leq y<Y_{1}\right).$$

If the probability $\mathbb{P}_{\xi}\left(Z_{0}\leq y<Y_{1}\right)$ is null, then under the stationary distribution, the probability that $(X_{t})$ passes through $y$ is null and the jump rate at that point cannot be measured.

We can remark that if $\mathbf{D}(y)>0$ at some point $y$, then so are $\mathbb{P}_{\xi}\left(Z_{0}\leq y\leq Y_{1}\right)$ and the estimator

$$\hat{\mathbf{D}}_{n}(y)=\frac{1}{n}\sum_{k=1}^{n}(\phi_{Z_{k-1}}^{-1})^{\prime}(y)\mathbf{1}_{\{Z_{k-1}\leq y\leq Y_{k}\}}>0.$$

Hence, if for some $n$ and some observation $(X_{t})_{t\geq 0}$ the estimator $\hat{\mathbf{D}}_{n}$ is positive on an interval $[\hat{i}_{1},\hat{i}_{2}]$, then Assumption A3 is satisfied on $[\hat{i}_{1},\hat{i}_{2}]$. However, the true value of $D_{0}$ is unknown in that case. It should be noted that the interval $[\hat{i}_{1},\hat{i}_{2}]$ should not be changed for each simulation; otherwise the convergence of the estimator on the whole interval cannot be guaranteed: the interval of estimation would become larger and larger and, as $\mathbf{D}$ is smaller near the edges of the new interval, the convergence of the estimator would be slower.
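The empirical estimator of $\mathbf{D}$ is a plain average over the observed stretches $(Z_{k-1},Y_{k})$ of the quantity $(\phi_{Z_{k-1}}^{-1})^{\prime}(y)\mathbf{1}_{\{Z_{k-1}\leq y\leq Y_{k}\}}$. The sketch below is only illustrative: the function name `d_hat`, the callback `inv_flow_derivative` and the toy data are hypothetical, and the linear flow with $c=2$ is an assumption made for the example.

```python
def d_hat(y, z_prev, y_next, inv_flow_derivative):
    """Empirical estimator of D(y) from the pairs (Z_{k-1}, Y_k).

    inv_flow_derivative(x, y) must return (phi_x^{-1})'(y); it is supplied
    by the user because the flow is assumed to be known.
    """
    n = len(z_prev)
    total = 0.0
    for z, yk in zip(z_prev, y_next):
        if z <= y <= yk:  # the trajectory passes through y on this stretch
            total += inv_flow_derivative(z, y)
    return total / n


# Hypothetical example: linear flow phi(x, t) = x + c*t with c = 2,
# for which (phi_x^{-1})'(y) = 1/c.
c = 2.0
z_prev = [0.5, 1.0, 0.2, 1.5]   # toy post-jump locations Z_{k-1}
y_next = [2.0, 1.2, 3.0, 1.8]   # toy pre-jump locations Y_k
value = d_hat(1.1, z_prev, y_next, lambda x, y: 1.0 / c)
```

In this toy data set, three of the four stretches contain the point $y=1.1$, so the estimate is $3\times(1/2)/4$. A point that no stretch crosses gives an estimate of zero, which is the situation excluded by Assumption A3.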

Assumptions A2 and A3 are not explicit in $(\lambda,Q,\phi)$ , so it is not easy to check that a particular model satisfies those assumptions. We give some explicit sufficient conditions on the coefficients $(\lambda,Q,\phi)$ . For the next assumption, we use the Hölder spaces $H^{\alpha}$ , as defined in Appendix A.4.

Assumption (S).

a. The transition kernel is a contraction mapping: there exists $\kappa<1$ such that $\mathbb{P}\left(Z_{1}\leq\kappa Y_{1}\right)=1$.

b. The flow is bounded: there exist two functions $\mathbf{m}$ and $\mathbf{M}$ such that, $\forall (x,y)\in(\mathbb{R}^{+})^{2}$:

$$0<\mathbf{m}(y)\leq(\phi_{x}^{-1})^{\prime}(y)\leq\mathbf{M}(y).$$

c. The jump rate is positive on $[i_{1},\infty)$ and there exist $\mathbf{a}>0$, $b>-1$ such that

$$\forall y\geq i_{1},\quad\lambda(y)\mathbf{m}(y)\geq\mathbf{a}\frac{y^{b}}{b+1}.$$

Then $\forall y\geq z$, $\mathbb{P}_{z}\left(Y_{1}\geq y\right)\leq\exp(-\mathbf{a}(y^{b+1}-z^{b+1}))$ and $\lim_{y\rightarrow\infty}\mathbb{P}_{z}\left(Y_{1}\geq y\right)=0$.

d. The jump rate does not explode too soon: there exist two positive constants $\mathbf{L},\mathbf{l}$ such that $\left\|\lambda\right\|_{L^{\infty}([i_{1},i^{\prime}_{2}])}\leq\mathbf{L}$ and $\int_{0}^{i_{1}}\lambda(u)\mathbf{M}(u)du\leq\mathbf{l}$, where

$$i_{2}^{\prime}=\max\left(i_{2},\ (i_{2}-i_{1})+\left(\frac{1}{\mathbf{a}(1-\kappa^{b+1})}\ln\left(\frac{2\kappa^{b+1}}{1-\kappa^{b+1}}\right)\right)^{1/(b+1)}\mathbf{1}_{\{\kappa^{b+1}\geq 1/3\}}\right).$$

These conditions ensure that Assumptions A2 and A3 are satisfied. The following two assumptions allow us to control the regularity of $\nu$ (the rate of convergence of the estimator $\hat{\lambda}_{n}$ depends on the regularity of $\nu$ , not on the regularity of $\lambda$ ).

e. For any $y\in\mathbb{R}^{+}$, $\lambda(y)<\infty$. This ensures that $\nu$ and $\mathcal{P}$ are absolutely continuous with respect to the Lebesgue measure on $\mathbb{R}^{+}$.

f. There exists $\alpha>0$ such that:

• for all compact $K\subset\mathbb{R}^{+*}$ and all $z\in\mathbb{R}^{+*}$, the function $(\phi_{.}^{-1})^{\prime}(.)$ belongs to $H^{\alpha}([0,z]\times K)$;

• for all compact $K\subset\mathbb{R}^{+*}$, $\lambda\in H^{\alpha}(K)$;

• the transition measure $Q$ can be written

$$Q(x,dy)=Q_{1}(x,y)\,dy+p_{0}(x)\delta_{0}(dy)+\sum_{i=1}^{j_{Q}}p_{i}(x)\delta_{f_{i}(x)}(dy)$$

with, for any compact $K$, $Q_{1}$ and $(p_{i})_{0\leq i\leq j_{Q}}$ in $H^{\alpha-1}(K)$, and $(f_{i})_{1\leq i\leq j_{Q}}$ invertible functions such that $(f_{i}^{-1})_{1\leq i\leq j_{Q}}\in H^{\alpha}(K)$.

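If Assumption (S) is satisfied, for a fixed flow $\phi$ and transition measure $Q$, we can introduce the class of functions

$$\mathcal{E}(\mathfrak{s},b,\alpha)=\left\{\lambda\in H^{\alpha}(J),\ \forall y\geq i_{1},\ \lambda(y)\mathbf{m}(y)\geq\frac{\mathbf{a}y^{b}}{b+1},\ \int_{0}^{i_{1}}\lambda(u)\mathbf{M}(u)\,du\leq\mathbf{l},\ \left\|\lambda\right\|_{H^{\alpha}(J)}\leq\mathbf{L}\right\}$$

with $\mathfrak{s}=(\mathbf{a},\mathbf{l},\mathbf{L})\in(\mathbb{R}^{+})^{3}$, where the convex set

$$J=J_{\lfloor\alpha\rfloor}\cup[i_{1},i_{2}^{\prime}]:=[j_{1},j_{2}]$$

is defined by the recurrence:

$$J_{0}=\mathcal{I}\quad\text{and}\quad J_{k+1}=\mathrm{Conv}\left(\mathcal{I}\cup\bigcup_{i=1}^{j_{Q}}f_{i}^{-1}(J_{k})\right).$$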

The following lemmas are proved in the Appendix.

Lemma 4.

Under Assumptions A1 and (S):

a. Assumption A2 is satisfied for $\mathbb{V}_{b}(x):=\exp\left(\mathbf{a}x^{b+1}\right)$: there exist $R$ and $\gamma$ such that, for any function $|\psi|\leq\mathbb{V}_{b}$,

$$\left|\mathbb{E}\left(\psi(Y_{k},Z_{k})\mid Z_{0}=z_{0}\right)-\mathbb{E}_{\rho}\left(\psi(Y_{1},Z_{1})\right)\right|\leq R\,\mathbb{V}_{b}(z_{0})\,\gamma^{k},$$

recalling that the inequality $|\psi|\leq\mathbb{V}_{b}$ means that, for any $(y,z)\in(\mathbb{R}^{+})^{2}$, $|\psi(y,z)|\leq\mathbb{V}_{b}(z)$.

b. Assumption A3 is satisfied. Moreover, there exist $\eta>0$, $D_{0}>0$ such that

[TABLE]

Lemma 5.

If Assumptions A1 and (S) are satisfied, we can control the regularity of $\nu$ :

[TABLE]

*Remark.*

In [24], the author introduces the set of functions $\mathcal{F}(\mathfrak{c},b)$ with very similar conditions. As she considers a deterministic transition measure $Q$, the sets $\mathcal{F}(\mathfrak{c},b)$ and $\mathcal{E}(\mathfrak{s},b,\alpha)\cap H^{\alpha}$ may not be equal. In particular, if $\lambda\in\mathcal{E}(\mathfrak{s},b,\alpha)$, then there exists $\mathfrak{c}$ such that $\lambda\in\mathcal{F}(\mathfrak{c},b)\cap H^{\alpha}$. Conversely, if $\lambda$ belongs to $\mathcal{F}(\mathfrak{c},b)\cap H^{\alpha}$ and the deterministic transition $f$ is $f(x)=\kappa x$, then for $i_{1}$ large enough, there exists $\mathfrak{s}$ such that $\lambda\in\mathcal{E}(\mathfrak{s},b,\alpha)$. This is no longer the case if, for instance, $f(x)\propto x^{\beta}$. As the transition measure $Q$ is unknown, it is not possible to exploit its characteristics.

Another difference between the two sets is that $\lambda$ is estimated on the fixed interval $[i_{1},i_{2}]$ and the assumptions depend on $(i_{1},i_{2})$, whereas in [24], the interval of estimation depends on the set $\mathcal{F}(\mathfrak{c},b)$.

3 Estimation of the jump rate

3.1 The observation scheme

As in [3] and [24], the statistical inference is based on the observation scheme $(X(t),t\leq T_{n})$, and asymptotics are considered when the number of jumps of the process, $n$, goes to infinity. Actually, the simpler observation scheme $(X(0),(X(T_{i}^{-}),X(T_{i})),1\leq i\leq n)=(Z_{0},(Y_{i},Z_{i}),1\leq i\leq n)$ is sufficient: as $\phi$ is known, the jump times can be recovered from the relation $T_{n}-T_{n-1}=\phi_{Z_{n-1}}^{-1}(Y_{n})$, valid for all $n\geq 1$.
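Since the flow is known, the time spent between two consecutive jumps is $\phi_{Z_{k-1}}^{-1}(Y_{k})$, and the jump times follow by accumulation. The sketch below illustrates this for the linear and exponential flows; the function names and the toy observations are hypothetical, and a growth constant $c>0$ is assumed.

```python
import math


def waiting_time_linear(z_prev, y, c):
    """phi_z^{-1}(y) for the linear flow phi(z, t) = z + c*t."""
    return (y - z_prev) / c


def waiting_time_exponential(z_prev, y, c):
    """phi_z^{-1}(y) for the exponential flow phi(z, t) = z*exp(c*t)."""
    return math.log(y / z_prev) / c


def jump_times(z0, pairs, waiting_time, c):
    """Rebuild T_1 <= ... <= T_n from Z_0 and the pairs (Y_k, Z_k)."""
    times, t, z = [], 0.0, z0
    for y, z_next in pairs:
        t += waiting_time(z, y, c)  # time spent on the flow before the jump
        times.append(t)
        z = z_next                  # the flow restarts from the post-jump value
    return times


pairs = [(3.0, 1.5), (2.5, 1.25)]   # hypothetical observations (Y_k, Z_k)
linear_times = jump_times(1.0, pairs, waiting_time_linear, c=1.0)
```

With these toy values the process runs from 1 to 3 (two time units), jumps to 1.5, then runs to 2.5 (one more time unit).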

3.2 Methodology

[24] and [3] construct a pointwise kernel estimator of $\nu$ before deriving an estimator of $\lambda$. Indeed, densities are often approximated by kernel methods (see Tsybakov [30] for instance). If the kernel is positive, the estimator is also a density. However, we want to control the $L^{2}$ risk of our estimator (not the pointwise risk), and also to construct an adaptive estimator. Projection estimators are well adapted to $L^{2}$ estimation: although they take longer to compute at a single point than pointwise estimators, it is sufficient to know the estimated coefficients to construct the whole function. Furthermore, to find an adaptive estimator, we minimize a function of the norm of our estimator (that is, the sum of the squares of the coefficients) and of the dimension. That is the reason why we choose an estimation by projection.

We first aim at estimating $\nu$ on the compact set $\mathcal{I}$. We construct a sequence of $L^{2}$ estimators by projection on an orthonormal basis. As usual in nonparametric estimation, their risks can be decomposed into a variance term and a bias term which depends on the regularity of the density function $\nu$. We choose to use the Besov spaces (see Section A.4) to characterize the regularity; they are well adapted to $L^{2}$ estimation (particularly for the wavelet decomposition). The "best" estimator is then selected by penalization. To construct the sequence of estimators, we introduce a sequence of linear subspaces $S_{m}$. We construct an estimator $\hat{\nu}_{m}$ of $\nu$ on each subspace and then select the best estimator $\hat{\nu}_{\hat{m}}$.

Assumption A 4.

a. The subspaces $S_{m}$ are increasing and have finite dimension $D_{m}$.

b. The $L^{2}$-norm and the $L^{\infty}$-norm are connected:

[TABLE]

This implies that, for any orthonormal basis $(\varphi_{l})$ of $S_{m}$,

[TABLE]

c. There exists a constant $\psi_{2}>0$ such that, for any $m\in\mathbb{N}$, there exists an orthonormal basis $(\varphi_{l})$ such that:

[TABLE]

d. There exists $\mathbf{r}\in\mathbb{N}$, called the regularity of the decomposition, such that:

[TABLE]

where $s_{m}$ is the orthogonal projection of $s$ on $S_{m}$ and $\mathbf{\mathit{B}}_{2,\infty}^{\alpha}$ is a Besov space (see Appendix A.4).

Conditions a, b and d are usual (see Comte et al. [12, section 2.3] for instance). They are satisfied for subspaces generated by wavelets, piecewise polynomials or trigonometric polynomials (see DeVore and Lorentz [15] for trigonometric polynomials and piecewise polynomials and Meyer [27] for wavelets). Condition c is necessary because we are not in the stationary case: it helps us to control some covariance terms. It is obviously satisfied for bounded bases (trigonometric polynomials) and localized bases (piecewise polynomials). Let us prove it for a wavelet basis. Let $\varphi$ be a father wavelet function; then $D_{m}=2^{m}$ and $\varphi_{l}(x)=2^{m/2}\varphi(2^{m}x-l)$. We get that $\left\|\sum_{l=1}^{D_{m}}\left\|\varphi_{l}\right\|_{\infty}|\varphi_{l}(x)|\right\|_{\infty}\leq 2^{m}\left\|\varphi\right\|_{\infty}\left\|\sum_{l\in\mathbb{Z}}|\varphi(x-l)|\right\|_{\infty}$. As $\varphi$ is at least 0-regular, there exists a constant $C$ such that $|\varphi(x)|\leq C(1+|x|)^{-2}$. Then $\sup_{x}\sum_{l\in\mathbb{Z}}|\varphi(x-l)|\leq C\sup_{x}\sum_{l\in\mathbb{Z}}(1+|x-l|)^{-2}<\infty$ and condition c is satisfied.

3.3 Estimation of the stationary density

Let us now construct an estimator $\hat{\nu}_{m}$ of $\nu$ on the subspace $S_{m}$. We consider an orthonormal basis $(\varphi_{l})$ of $S_{m}$ satisfying Assumption A4. Let us set

[TABLE]

The function $\nu_{m}$ is the orthogonal projection of $\nu$ on $S_{m}$. We consider the estimator

[TABLE]
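A minimal sketch of such a projection estimator is given here, using the empirical coefficients $\hat{a}_{l}=\frac{1}{n}\sum_{k=1}^{n}\varphi_{l}(Y_{k})$ that appear later in this section. The histogram basis, the interval $[0,2)$ and the i.i.d. exponential sample are assumptions chosen for illustration: in the paper the $Y_{k}$ form a Markov chain, and any basis satisfying Assumption A4 could be used.

```python
import random


def histogram_basis(dim, lo, hi):
    """Orthonormal basis of piecewise-constant functions on [lo, hi) with dim cells."""
    width = (hi - lo) / dim

    def make(l):
        left = lo + l * width
        return lambda x: width ** -0.5 if left <= x < left + width else 0.0

    return [make(l) for l in range(dim)]


def projection_estimator(sample, basis):
    """Return the coefficients a_hat_l and the function x -> sum_l a_hat_l * phi_l(x)."""
    n = len(sample)
    coeffs = [sum(phi(y) for y in sample) / n for phi in basis]

    def nu_hat(x):
        return sum(a * phi(x) for a, phi in zip(coeffs, basis))

    return coeffs, nu_hat


random.seed(0)
sample = [random.expovariate(1.0) for _ in range(2000)]  # stand-in for Y_1, ..., Y_n
basis = histogram_basis(dim=8, lo=0.0, hi=2.0)
coeffs, nu_hat = projection_estimator(sample, basis)
print(nu_hat(0.1), nu_hat(1.9))
```

With this basis the estimator is a normalized histogram: its integral over $[0,2)$ equals the proportion of observations falling in that interval.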

Proposition 6.

If $D_{m}^{2}\leq n$ , under Assumptions A1-A2 and A4,

[TABLE]

where $C_{\lambda}=\frac{2R}{1-\gamma}\int\mathbf{V}_{\lambda}(z)\mu(dz)$ and $c_{\lambda}$ depends explicitly on $\mathbf{V}_{\lambda}$, $\gamma$, $R$.

When $m$ increases, the bias term decreases whereas the variance term increases, so we need to find a good bias-variance compromise. If $\nu$ belongs to the Besov space $\mathbf{\mathit{B}}_{2,\infty}^{\alpha}(\mathcal{I})$, then $\left\|\nu_{m}-\nu\right\|_{L^{2}(\mathcal{I})}^{2}\leq C\left\|\nu\right\|_{B_{2,\infty}^{\alpha}(\mathcal{I})}D_{m}^{-2\alpha}$ (see Assumption A4 d). If $\alpha\geq 1/2$, the risk bound is then minimal for $D_{m_{opt}}\propto n^{1/(2\alpha+1)}$ and we have, for some continuous function $\psi$:

[TABLE]

This is the usual nonparametric convergence rate (see Tsybakov [30]). If $\alpha<1/2$, then the risk bound is minimal for $D_{m}=n^{1/2}$, the largest dimension allowed by the constraint $D_{m}^{2}\leq n$, and the bias term is greater than the variance term. We can remark that a piecewise continuous function belongs to $B_{2,\infty}^{1/2}$.
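The compromise can be checked numerically. The sketch below minimizes a bound of the form $D^{-2\alpha}+D/n$ over the admissible dimensions $1\leq D\leq\sqrt{n}$. Both constants are set to 1, which is an assumption made for illustration (the true constants depend on $\nu$ and on the model), and the function names are hypothetical.

```python
import math


def risk_bound(dim, n, alpha, bias_const=1.0, var_const=1.0):
    """Bound bias_const * D^(-2 alpha) + var_const * D / n (constants are placeholders)."""
    return bias_const * dim ** (-2.0 * alpha) + var_const * dim / n


def best_dimension(n, alpha):
    """Minimize the bound over the admissible dimensions 1 <= D <= sqrt(n)."""
    max_dim = math.isqrt(n)
    return min(range(1, max_dim + 1), key=lambda d: risk_bound(d, n, alpha))


n = 10 ** 6
for alpha in (0.25, 1.0, 2.0):
    # Compare the minimizer with the theoretical order n^(1 / (2 alpha + 1)).
    print(alpha, best_dimension(n, alpha), round(n ** (1.0 / (2.0 * alpha + 1.0)), 1))
```

For $\alpha\geq 1/2$ the minimizer is of order $n^{1/(2\alpha+1)}$, whereas for $\alpha=1/4$ the constraint is active and the minimizer is the largest admissible dimension $\sqrt{n}$.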

Let us now construct the adaptive estimator. We compute $(\hat{\nu}_{0},\ldots,\hat{\nu}_{m},\ldots)$ for $m\in\mathscr{M}_{n}=\{m,D_{m}^{2}\leq n\}$. Our aim is to select $m$ automatically, without knowing the regularity of the stationary density $\nu$. Let us introduce the contrast function $\gamma_{n}(s)=\left\|s\right\|_{L^{2}}^{2}-\frac{2}{n}\sum_{k=1}^{n}s(Y_{k})$. If $s\in S_{m}$, then we can write $s=\sum_{l}b_{l}\varphi_{l}$ and

[TABLE]

The minimum is obtained for $b_{l}=\hat{a}_{l}=\frac{1}{n}\sum_{k=1}^{n}\varphi_{l}(Y_{k})$ . Therefore

[TABLE]

As the subspaces $S_{m}$ are increasing, the function $\gamma_{n}(\hat{\nu}_{m})$ decreases when $m$ increases. To find an adaptive estimator, we need to add a penalty term $pen(m)$ . Let us set $pen(m)=\frac{48(\psi_{1}+C_{\lambda}\psi_{2})D_{m}}{n}+\frac{48c_{\lambda}\psi_{1}}{n}$ (or more generally $pen(m)=\frac{\sigma D_{m}}{n}+\frac{\sigma^{\prime}}{n}$ , with $\sigma\geq 48(\psi_{1}+C_{\lambda}\psi_{2})$ , $\sigma^{\prime}\geq 48c_{\lambda}\psi_{1}$ ) and choose

[TABLE]

We obtain an adaptive estimator $\hat{\nu}_{\hat{m}}$ .
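The selection step can be sketched as follows for dyadic histogram bases with $D_{m}=2^{m}$. Since the basis is orthonormal, $\gamma_{n}(\hat{\nu}_{m})=-\sum_{l}\hat{a}_{l}^{2}$, and the term $\sigma^{\prime}/n$ of the penalty does not depend on $m$, so it is omitted from the criterion. The value of `sigma`, the histogram basis, the interval $[0,2)$ and the i.i.d. sample are assumptions made for illustration; the theoretical constant $48(\psi_{1}+C_{\lambda}\psi_{2})$ is not computed here.

```python
import math
import random


def contrast_at_estimator(sample, dim, lo, hi):
    """gamma_n(nu_hat_m) = -sum_l a_hat_l^2 for the histogram basis with dim cells."""
    n = len(sample)
    width = (hi - lo) / dim
    counts = [0] * dim
    for y in sample:
        if lo <= y < hi:
            counts[min(int((y - lo) / width), dim - 1)] += 1
    # a_hat_l = counts[l] / (n * sqrt(width)) because phi_l = width^(-1/2) on its cell
    return -sum((c / (n * math.sqrt(width))) ** 2 for c in counts)


def select_model(sample, lo, hi, sigma):
    """Return m_hat minimizing gamma_n(nu_hat_m) + sigma * D_m / n over D_m^2 <= n."""
    n = len(sample)
    best_m, best_crit = None, math.inf
    m = 0
    while (2 ** m) ** 2 <= n:  # collection M_n = {m : D_m^2 <= n}
        dim = 2 ** m
        crit = contrast_at_estimator(sample, dim, lo, hi) + sigma * dim / n
        if crit < best_crit:
            best_m, best_crit = m, crit
        m += 1
    return best_m


random.seed(1)
sample = [random.expovariate(1.0) for _ in range(2000)]  # stand-in for Y_1, ..., Y_n
m_hat = select_model(sample, lo=0.0, hi=2.0, sigma=2.0)  # sigma is a hypothetical value
print(m_hat, 2 ** m_hat)
```

Without the penalty, the criterion would always select the largest model, because the contrast decreases along the nested subspaces.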

Theorem 7 (Risk of the adaptive estimator).

Under Assumptions A1-A2 and A4, $\forall\sigma\geq 48(\psi_{1}+C_{\lambda}\psi_{2})$ , $\sigma^{\prime}\geq 48c_{\lambda}\psi_{1}$ , $pen(m)=\frac{\sigma D_{m}}{n}+\frac{\sigma^{\prime}}{n}$ ,

[TABLE]

where $c^{\prime}$ is a function of $(\mathbf{V}_{\lambda},R,\gamma,\left\|\nu\right\|_{L^{2}(\mathcal{I})})$ . We recall that $\mathscr{M}_{n}=\{m,D_{m}^{2}\leq n\}$ .

The estimator is adaptive: it realizes the best bias-variance compromise, up to a multiplicative constant. We have an explicit rate of convergence if $\nu$ belongs to some (unknown) Besov space $\mathbf{\mathit{B}}_{2,\infty}^{\alpha}$ : in that case,

[TABLE]

and if $\alpha\geq 1/2$ ,

[TABLE]

for some continuous function $\psi$ .

3.4 Estimation of the jump rate

By (6), we have

[TABLE]

where $\xi$ is the stationary measure of $(Z_{k},Y_{k+1})$ .

*Remark.*

We notice that this formula differs from the one used in [24]:

[TABLE]

where

[TABLE]

In [24], the author works under the assumption that $Q(x,\{y\})=\mathbb{1}_{\{y=f(x)\}}$, which makes the study easier; here we need to consider the Markov chain $(Y_{k},Z_{k})_{k\in\mathbb{N}}$.

To estimate the jump rate, we construct a quotient estimator. Let us consider the estimator

[TABLE]

where

[TABLE]

*Remark.*

As the process $\{X(t)\}$ is observed continuously without errors, $\phi^{-1}$ (and therefore $(\phi^{-1})^{\prime}$ ) is known on $\cup_{k}[Z_{k-1},Y_{k}]$ so $\hat{\mathbf{D}}_{n}(y)$ is computable.
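The computation of a quotient estimator of this kind can be sketched as follows. The sketch assumes a truncated-quotient form: the ratio $\hat{\nu}_{\hat{m}}(y)/\hat{\mathbf{D}}_{n}(y)$ is returned when $\hat{\nu}_{\hat{m}}(y)\geq 0$ and $\hat{\mathbf{D}}_{n}(y)\geq\ln(n)^{-1}$, and 0 otherwise. This form is an assumption suggested by the threshold $\ln^{-1}(n)$ and the indicator used in the proofs, not a transcription of the definition. The empirical average over the pairs $(Z_{k-1},Y_{k})$, the unit-speed flow $\phi(x,t)=x+t$ (for which $(\phi_{x}^{-1})^{\prime}\equiv 1$) and all function names are likewise hypothetical.

```python
import math


def d_hat(y, pairs, inv_flow_derivative=lambda x, y: 1.0):
    """Average of (phi_x^{-1})'(y) * 1{x <= y <= z} over the pairs (x, z) = (Z_{k-1}, Y_k)."""
    n = len(pairs)
    return sum(inv_flow_derivative(x, y) for x, z in pairs if x <= y <= z) / n


def lambda_hat(y, nu_hat_value, pairs):
    """Truncated quotient nu_hat(y) / D_hat(y), set to 0 when the ratio is unreliable."""
    n = len(pairs)
    denom = d_hat(y, pairs)
    if nu_hat_value < 0.0 or denom < 1.0 / math.log(n):
        return 0.0
    return nu_hat_value / denom


# Toy data: 50 inter-jump segments [0, 1] and 50 segments [0.5, 2].
pairs = [(0.0, 1.0)] * 50 + [(0.5, 2.0)] * 50
print(d_hat(0.7, pairs), d_hat(1.5, pairs), d_hat(3.0, pairs))
print(lambda_hat(1.5, 0.3, pairs), lambda_hat(3.0, 0.3, pairs))
```

The truncation prevents the quotient from exploding at points that are rarely visited by the process, at the price of the logarithmic factor discussed in Section 3.5.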

The estimator $\hat{\lambda}_{n}$ converges with nearly the same rate of convergence as $\hat{\nu}$ :

Theorem 8.

Under A1-A4, as soon as $\ln(n)^{-1}\leq D_{0}/2$ ,

[TABLE]

where

[TABLE]

The bias term depends on the regularity of the stationary density $\nu$, not on the regularity of $\lambda$. If we consider $\lambda$ and $\nu$ as functions of a Besov space, their regularities are not related: the Besov spaces are not stable under products (as they are subspaces of $L^{2}(\mathcal{I})$). We would like to link the rate of convergence of $\hat{\lambda}_{n}$ to the regularity of $\lambda$ rather than $\nu$, at least when $\lambda\in\mathcal{E}(\mathfrak{s},b,\alpha)$. In that case, $\lambda$ belongs to some Hölder space, and Hölder spaces are stable under products, composition and integration. See Appendix A.4 for the definition and properties of Besov and Hölder spaces. We obtain the following corollary:

Corollary 9.

Under A1, (S) and A4, as soon as $\ln(n)^{-1}\leq D_{0}/2$ , for any $\alpha\geq 1/2$ ,

[TABLE]

*Remark.*

In [24], the same rate of convergence is obtained for a kernel estimator (with the regularity of $\lambda$ known).

3.5 Minimax bound for the estimator of the jump rate

We have proved that, under assumptions A1, (S) and A4,

[TABLE]

We would like to verify that our estimator converges at the minimax rate of convergence, i.e.:

[TABLE]

The $\ln^{2}(n)$ factor comes from the quotient estimator; we cannot expect it to remain in the minimax bound. Indeed, one could replace $\ln^{-1}(n)$ in (11) by any function $w(n)$ no larger than $D_{0}/2$, in line with the condition $\ln(n)^{-1}\leq D_{0}/2$ of Theorem 8. The best estimator would of course be obtained by taking $w(n)=D_{0}/2$, and the risk of this estimator (unreachable, as $D_{0}$ is unknown) would be proportional to $n^{-2\alpha/(2\alpha+1)}$.

Theorem 10 (Minimax bound).

If A1, (S) and A4 are satisfied, then

[TABLE]

where the infimum is taken among all estimators.

4 Proofs

Lemmas 1, 3, 4 and 5 are proved in the Appendix.

4.1 Proof of Proposition 6

We have the following bias-variance decomposition:

[TABLE]

The estimator $\hat{\nu}_{m}$ (and therefore its expectation $\mathbb{E}_{z_{0}}\left(\hat{\nu}_{m}\right)$ ) belongs to the subspace $S_{m}$ . Then, by orthogonality

[TABLE]

The first two terms are bias terms; the third is a variance term. Let us first bound the second bias term. As the functions $(\varphi_{l})_{1\leq l\leq D_{m}}$ form an orthonormal basis of $S_{m}$, we have

[TABLE]

By Lemma 1,

[TABLE]

As the $L^{2}$ and the $L^{\infty}$ -norms are connected (see Assumption A4 b), $\left\|\varphi_{l}\right\|_{\infty}^{2}\leq\psi_{1}D_{m}$ and, since $D_{m}^{2}\leq n$ , we get:

[TABLE]

Let us now consider the variance term. As the functions $(\varphi_{l})$ form an orthonormal basis of $S_{m}$, the integrated variance of $\hat{\nu}_{m}$ is the sum of the variances of the coefficients $\hat{a}_{l}$:

[TABLE]

By Lemma 1, as $\int_{\mathbb{R}^{+}}\rho(x,dz)=\nu(x)$ , we get:

[TABLE]

By Assumptions A4 b and c, $\forall x$ , $\sum_{l=1}^{D_{m}}\varphi_{l}^{2}(x)\leq\psi_{1}D_{m}$ , $\sum_{l=1}^{D_{m}}\left\|\varphi_{l}\right\|_{\infty}|\varphi_{l}(x)|\leq\psi_{2}D_{m}$ and $\sum_{l=1}^{D_{m}}\left\|\varphi_{l}\right\|^{2}_{\infty}\leq\psi_{1}D_{m}^{2}\leq\psi_{1}n$ . Therefore:

[TABLE]

where $C_{\lambda}=\int G_{\lambda}(z)\mu(dz)=\frac{2R}{1-\gamma}\int_{\mathcal{I}}\mathbf{V}_{\lambda}(z)\mu(dz)$ and $c_{\lambda}$ depends only on $\mathbf{V}_{\lambda}$ , $R$ and $\gamma$ .

4.2 Proof of Theorem 7

The number of coefficients in the adaptive estimator is random. Although the bias term is still easy to control, we cannot simply control the variance of our estimator by adding the variances of its coefficients. For any $m\in\mathscr{M}_{n}$, by definition of $\hat{m}$ (see (8) and (9)), we have the following inequality:

[TABLE]

with $\gamma_{n}(s)=\left\|s\right\|_{L^{2}(\mathcal{I})}^{2}-2n^{-1}\sum_{k=1}^{n}s(Y_{k})$ . Then

[TABLE]

We have that, for any function $s\in L^{2}(\mathcal{I})$ , $\left\|s\right\|_{L^{2}(\mathcal{I})}^{2}=\left\|s-\nu\right\|_{L^{2}(\mathcal{I})}^{2}-\left\|\nu\right\|_{L^{2}(\mathcal{I})}^{2}+2\int_{\mathcal{I}}s(x)\nu(x)dx$ . We apply this equality to $\hat{\nu}_{\hat{m}}$ and $\nu_{m}$ . Equation (13) becomes:

[TABLE]

The function $\hat{\nu}_{\hat{m}}-\nu_{m}$ belongs to the subspace $S_{\hat{m}}+S_{m}$. Therefore:

[TABLE]

where $\mathscr{B}_{m,m^{\prime}}=\{s\in S_{m}+S_{m^{\prime}},\left\|s\right\|_{L^{2}(\mathcal{I})}=1\}$. As the sequence $(S_{m})$ is increasing, $S_{m}+S_{m^{\prime}}$ is simply the larger of the two subspaces. By the inequality of arithmetic and geometric means,

[TABLE]

By the triangle inequality, $\left\|\hat{\nu}_{\hat{m}}-\nu_{m}\right\|_{L^{2}(\mathcal{I})}^{2}\leq 2\left\|\hat{\nu}_{\hat{m}}-\nu\right\|_{L^{2}(\mathcal{I})}^{2}+2\left\|\nu_{m}-\nu\right\|_{L^{2}(\mathcal{I})}^{2}$, and:

[TABLE]

We can decompose the last term in a bias term and a variance term. Let us set:

[TABLE]

and $p(m,m^{\prime}):=(pen(m)+pen(m^{\prime}))/8$ . Then:

[TABLE]

By Assumption A4 b, $s\in\mathscr{B}_{m,\hat{m}}$ implies that $\left\|s\right\|_{\infty}^{2}\leq\psi_{1}(D_{m}+D_{\hat{m}})\leq 2\psi_{1}n^{1/2}$ (we recall that $D_{m}$ and $D_{\hat{m}}$ are smaller than $n^{1/2}$ ). Then by Lemma 1,

[TABLE]

It remains to bound $\mathbb{E}_{z_{0}}\left(\sup_{s\in\mathscr{B}_{m,\hat{m}}}I_{n}^{2}(s)-p(m,\hat{m})\right)_{+}$. The unit ball $\mathscr{B}_{m,\hat{m}}$ is random, so we cannot bound $I_{n}^{2}(s)$ on it directly: we have to control the risk on the fixed balls $\mathscr{B}_{m,m^{\prime}}$. We can write:

[TABLE]

The Markov chain $(Y_{1},\ldots,Y_{n})$ is exponentially $\beta$-mixing with $\beta$-mixing coefficient $\beta_{Y}(k)\leq c\gamma^{k}=ce^{-\ln(1/\gamma)k}$. The following lemma is deduced from Berbee's coupling lemma and a Talagrand inequality. It is proved in the appendix.

Lemma 11 (Talagrand’s inequality for $\beta$ -mixing variables).

Let $Y_{1},\dots,Y_{n}$ be an exponentially $\beta$-mixing Markov chain, with $\beta$-mixing coefficient $\beta_{Y}(k)\leq ce^{-b_{0}k}$. We choose $q_{n}:=c_{q}\ln(n)$ with $c_{q}\geq 2/b_{0}$, and $p_{n}=n/(2q_{n})$. We have that $\beta_{Y}(q_{n})\leq ce^{-b_{0}c_{q}\ln(n)}\leq cn^{-2}$. Let us consider

[TABLE]

If we can find a triplet ( $M_{2}$ , $V$ and $H$ ) such that:

[TABLE]

then we have:

[TABLE]

where $K_{1}$ , $K_{2}$ , $k_{1}$ and $k_{2}$ are universal constants.
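The choice of the block length in Lemma 11 can be checked on a small example. The sketch below computes $q_{n}=c_{q}\ln(n)$ and $p_{n}=n/(2q_{n})$ and verifies that the bound $ce^{-b_{0}q_{n}}$ on the mixing coefficient is at most $cn^{-2}$ when $c_{q}\geq 2/b_{0}$. The numerical values of $n$, $b_{0}$ and $c$ are arbitrary assumptions, and the function names are hypothetical.

```python
import math


def block_sizes(n, b0, c_q=None):
    """Return (q_n, p_n) with q_n = c_q * ln(n) and p_n = n / (2 q_n)."""
    if c_q is None:
        c_q = 2.0 / b0  # smallest value allowed by the lemma
    q_n = c_q * math.log(n)
    p_n = n / (2.0 * q_n)
    return q_n, p_n


def mixing_bound(q, c, b0):
    """Upper bound c * exp(-b0 * q) on the beta-mixing coefficient at lag q."""
    return c * math.exp(-b0 * q)


n, b0, c = 10_000, 0.5, 3.0
q_n, p_n = block_sizes(n, b0)
print(q_n, p_n)
print(mixing_bound(q_n, c, b0), c * n ** -2)
```

The sample is thus cut into $2p_{n}$ blocks of length $q_{n}$, and the dependence between blocks that are $q_{n}$ apart is of order $n^{-2}$.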

For the sake of simplicity, let us set $D=D_{m}+D_{m^{\prime}}$ and $\mathscr{B}=\mathscr{B}_{m,m^{\prime}}$ . By Assumption A4 b,

[TABLE]

By Lemma 1,

[TABLE]

By Cauchy-Schwarz,

[TABLE]

and

[TABLE]

Then

[TABLE]

By Assumption A4 b, $\left\|s\right\|_{\infty}\leq\psi_{1}^{1/2}D^{1/2}$. Moreover, $\sup_{s\in\mathscr{B}}\left\|s\right\|_{L^{2}(\mathcal{I})}=1$ and then

[TABLE]

It remains to find $H$ such that $\mathbb{E}_{z_{0}}\left(\sup_{s\in\mathscr{B}}|I_{n}(s)|\right)\leq H/\sqrt{n}$ . Let us introduce $(\varphi_{l})_{1\leq l\leq D}$ an orthonormal basis of $S_{m}+S_{m^{\prime}}=S_{\max(m,m^{\prime})}$ satisfying Assumption A4. Then we can write $s=\sum_{l}b_{l}\varphi_{l}.$ As the function $s\rightarrow I_{n}(s)$ is linear:

[TABLE]

We can remark that $I_{n}(\varphi_{l})=\hat{a}_{l}-\mathbb{E}_{z_{0}}\left(\hat{a}_{l}\right)$ (see equation (14)) and consequently, $\mathbb{E}_{z_{0}}\left(I_{n}^{2}(\varphi_{l})\right)=\operatorname{Var}_{z_{0}}\left(\hat{a}_{l}\right)$. By (12):

[TABLE]

We can now apply Lemma 11 with

[TABLE]

For $p(m,m^{\prime})\geq 6(\psi_{1}+C_{\lambda}\psi_{2})D/n+6c_{\lambda}\psi_{1}/n$ , we get

[TABLE]

As $2/(x+y)\geq\min(1/x,1/y)$ ,

[TABLE]

and therefore

[TABLE]

where $(K_{i}^{\lambda})_{1\leq i\leq 3}$ and $(k_{i}^{\lambda})_{1\leq i\leq 2}$ depend on $(\mathbf{V}_{\lambda},R,\gamma,(\psi_{1},\psi_{2}),\left\|\nu\right\|_{L^{2}(\mathcal{I})})$. The second term can be made smaller than $n^{-2}$ for $c_{q}$ large enough. The third is also smaller than $n^{-2}$ thanks to the exponential term. Then

[TABLE]

All the dimensions $D_{m,m^{\prime}}$ are different, so $\sum_{m^{\prime}\in\mathscr{M}_{n}}D_{m,m^{\prime}}e^{-cD_{m,m^{\prime}}^{1/2}}\leq\sum_{l=1}^{\infty}le^{-cl^{1/2}}<\infty$. Moreover, as $\sup_{m^{\prime}\in\mathscr{M}_{n}}D_{m,m^{\prime}}\leq\sqrt{n}$, $\sum_{m^{\prime}\in\mathscr{M}_{n}}D_{m,m^{\prime}}\leq\max_{m^{\prime}\in\mathscr{M}_{n}}D_{m,m^{\prime}}^{2}\leq n$. Then by (17),

[TABLE]

Collecting (15), (16) and (18), for any $m\in\mathscr{M}_{n}$ :

[TABLE]

All the constants involved in the bound of $J_{n}^{2}$ and $I_{n}^{2}$ ( $M_{2}$, $H$, $V$ ) depend on $\mathbf{V}_{\lambda}$, $\gamma$, $R$ and $\left\|\nu\right\|_{L^{2}(\mathcal{I})}\leq\left\|\nu\right\|_{\mathbf{\mathit{B}}_{2,\infty}^{\alpha}(\mathcal{I})}$. Then there exists a continuous function $\psi$ such that $x\to\psi(x,v,c,r)$ is increasing and

[TABLE]

4.3 Proof of Theorem 8

Let us first control $\mathbb{E}_{z_{0}}\left((\hat{\mathbf{D}}_{n}(y)-\mathbf{D}(y))^{2}\right)$. As $\phi$ is a diffeomorphism, the function $\left(\phi_{x}^{-1}\right)^{\prime}(y)$ is bounded on $[0,i_{2}]\times\mathcal{I}$. The function $s_{x,z}(y)=\left(\phi_{x}^{-1}\right)^{\prime}(y)\mathbb{1}_{\{x\leq y\leq z\}}$ is bounded by a constant on $\mathcal{I}$:

[TABLE]

We have that

[TABLE]

with $\xi$ the stationary density of $(Z_{k-1},Y_{k})$ introduced in Assumption A2. By Lemma 1, we have

[TABLE]

and therefore

[TABLE]

For $n$ large enough, $1/\ln(n)$ is smaller than $D_{0}/2$ ( $D_{0}$ is defined in Assumption A3) and then, by Markov's inequality,

[TABLE]

As $\nu$ is a positive function, $|\hat{\lambda}_{n}(y)-\lambda(y)|\mathbb{1}_{\{\hat{\nu}_{\hat{m}}(y)\geq 0\}}\leq|\hat{\lambda}_{n}(y)-\lambda(y)|$ and therefore, according to the definition of the estimator $\hat{\lambda}_{n}$ (see (11)),

[TABLE]

We can write:

[TABLE]

As $\mathbf{D}\geq D_{0}$ by Assumption A3:

[TABLE]

By (20) and (21),

[TABLE]

with $c_{\lambda}^{\prime}=\frac{\Phi_{1}^{2}}{D_{0}^{2}}(4+2C_{\lambda})(3\left\|\nu\right\|_{L^{2}(\mathcal{I})}^{2}+12\left\|\lambda\right\|_{L^{2}(\mathcal{I})}^{2})$ .

4.4 Proof of Theorem 10

We use the reduction scheme described in Tsybakov [30, chapter 2]. By Markov inequality,

[TABLE]

Our aim is to show that

[TABLE]

Instead of working on the whole class $\mathcal{E}(\mathfrak{s},b,\alpha)$, we can restrict ourselves to a finite set $\{\lambda_{0},\ldots,\lambda_{P_{n}}\}\subset\mathcal{E}(\mathfrak{s},b,\alpha)$, such that

[TABLE]

Then

[TABLE]

We denote by $\psi^{*}$ the predictor

[TABLE]

By the triangle inequality, $\left\|\hat{\lambda}_{n}-\lambda_{j}\right\|_{L^{2}(\mathcal{I})}\geq\left\|\lambda_{\psi^{*}}-\lambda_{j}\right\|_{L^{2}(\mathcal{I})}-\left\|\lambda_{\psi^{*}}-\hat{\lambda}_{n}\right\|_{L^{2}(\mathcal{I})}$.

Consequently, as $\left\|\hat{\lambda}_{n}-\lambda_{j}\right\|_{L^{2}(\mathcal{I})}\geq\left\|\hat{\lambda}_{n}-\lambda_{\psi^{*}}\right\|_{L^{2}(\mathcal{I})}$ ,

[TABLE]

By (22), $\left\|\lambda_{\psi^{*}}-\lambda_{j}\right\|_{L^{2}(\mathcal{I})}\geq 2C^{\prime}n^{-\alpha/(2\alpha+1)}\mathbb{1}_{\{\psi^{*}\neq j\}}$. Then, setting $A_{n}=C^{\prime}n^{-\alpha/(2\alpha+1)}$, we have $\left\{\left\|\hat{\lambda}_{n}-\lambda_{j}\right\|_{L^{2}(\mathcal{I})}\geq C^{\prime}n^{-\alpha/(2\alpha+1)}\right\}\supseteq\left\{\psi^{*}\neq j\right\}$ and therefore:

[TABLE]

We denote by $\mathbf{P}^{\lambda_{j}}$ the law of $(Z_{0},Y_{1},Z_{1},\ldots,Y_{n},Z_{n})$ under $\lambda_{j}$ . The following lemma is exactly Theorem 2.5 of Tsybakov [30].

Lemma 12.

Let us consider a family of functions $\lambda_{0},\ldots,\lambda_{P_{n}}$ such that:

a. The functions $\lambda_{i}$ are sufficiently far apart: $\forall i\neq j$,

[TABLE]

b. For all $i$, the function $\lambda_{i}$ belongs to the class $\mathcal{E}(\mathfrak{s},b,\alpha)$.

c. Absolute continuity: $\forall 1\leq j\leq P_{n}$, $\mathbf{P}^{\lambda_{j}}\ll\mathbf{P}^{\lambda_{0}}$.

d. The distance between the probability measures is not too large:

[TABLE]

with $0<c<1/8$ , and $\chi^{2}(.,.)$ the $\chi$ -square divergence.

Then

[TABLE]

Step 1: Construction of $(\lambda_{0},\ldots,\lambda_{P_{n}})$.

Let us set

[TABLE]

with $\mathcal{J}=[j_{1},j_{2}]$ defined in (7). As $\lambda_{0}$ is constant on $\mathcal{J}$ , this function belongs to the Hölder space $H^{\alpha}(\mathcal{J})$ and $\left\|\lambda_{0}\right\|_{H^{\alpha}(\mathcal{J})}=\varepsilon$ (see Appendix A.4 for the definition of the Hölder space). It remains to ensure that it belongs to $\mathcal{E}(\mathfrak{s},b,\alpha)$ . If $\varepsilon>\mathbf{L}$ , then $\mathcal{E}(\mathfrak{s},b,\alpha)=\emptyset$ . If $\mathbf{L}=\varepsilon$ , then any function $\lambda\in\mathcal{E}(\mathfrak{s},b,\alpha)$ satisfies: $\forall x\in[i_{1},j_{2}],\lambda(x)=\lambda_{0}(x)$ . Let us assume that $\varepsilon<\mathbf{L}$ : in that case, there exists $\delta>0$ such that $\left\|\lambda_{0}\right\|_{H^{\alpha}(\mathcal{J})}\leq\mathbf{L}-\delta$ .

We consider a non-negative function $K\in H^{\alpha}(\mathbb{R})$, bounded, with support in $[0,j_{2}-i_{1})$ and such that $\left\|K\right\|_{L^{1}}\leq 1$. We set $h_{n}=n^{-1/(2\alpha+1)}$, $p_{n}=\lceil 1/h_{n}\rceil$ and, for $0\leq k\leq p_{n}-1$, $x_{k}=i_{1}+h_{n}k(j_{2}-i_{1})$. We consider the functions $\varphi_{k}(x):=ah_{n}^{\alpha}K\left((x-x_{k})/h_{n}\right)$ with $a<1$. The functions $\varphi_{k}$ have support in $[x_{k},x_{k+1})\subset\mathcal{J}$. Moreover, by the change of variable $y=(x-x_{k})/h_{n}$, $\left\|\varphi_{k}\right\|_{L^{1}}=ah_{n}^{\alpha+1}\left\|K\right\|_{L^{1}}\leq ah_{n}^{\alpha+1}$ and $\left\|\varphi_{k}\right\|_{L^{2}}^{2}=a^{2}h_{n}^{2\alpha+1}\left\|K\right\|_{L^{2}}^{2}$. We consider the set of functions

[TABLE]

The cardinality of $\mathscr{G}_{n}$ is $2^{p_{n}}$. For two vectors $(\epsilon,\eta)$ with values in $\{0,1\}^{p_{n}}$, the distance between the two functions $\lambda_{\epsilon}$ and $\lambda_{\eta}$ is:

[TABLE]

As the sequences $\epsilon_{k}$ and $\eta_{k}$ have values in $\{0,1\}$, the quantity

[TABLE]

is the Hamming distance between $\eta$ and $\epsilon$ . To apply Lemma 12, we need that, $\forall\eta\neq\epsilon$ ,

[TABLE]

This is not the case if we take the whole set $\mathscr{G}_{n}$ (the minimal Hamming distance between two distinct vectors $\epsilon$ and $\eta$ is 1). We need to extract a subfamily of functions. According to Tsybakov [30, Lemma 2.7] (Varshamov-Gilbert bound), it is possible to extract a family $(\epsilon_{(0)},\ldots,\epsilon_{(P_{n})})$ of the set $\Omega=\{0,1\}^{p_{n}}$ such that $\epsilon_{(0)}=(0,\ldots,0)$ and

[TABLE]

As $p_{n}\geq n^{1/(2\alpha+1)}$ ,

[TABLE]

We define

[TABLE]

Then, for any $\lambda_{j},\lambda_{k}\in\mathscr{H}_{n}$ , if $j\neq k$ , as $p_{n}=\lceil 1/h_{n}\rceil$ , by (23),

[TABLE]

This is exactly the expected lower bound if we take $C^{\prime}=a\left\|K\right\|_{L^{2}}/(4\sqrt{2})$ .
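The role of the Hamming distance in Step 1 can be illustrated numerically. Since the $\varphi_{k}$ have disjoint supports and $\left\|\varphi_{k}\right\|_{L^{2}}^{2}=a^{2}h_{n}^{2\alpha+1}\left\|K\right\|_{L^{2}}^{2}$, the squared $L^{2}$ distance between $\lambda_{\epsilon}$ and $\lambda_{\eta}$ is this quantity multiplied by the Hamming distance between $\epsilon$ and $\eta$. The sketch below also extracts a well-separated family by a greedy search; this greedy procedure is an illustrative stand-in, not the argument behind the Varshamov-Gilbert bound, and the numerical values and function names are assumptions.

```python
import math
from itertools import product


def hamming(eps, eta):
    """Number of coordinates where the two binary vectors differ."""
    return sum(1 for e, h in zip(eps, eta) if e != h)


def squared_l2_distance(eps, eta, a, h_n, alpha, kernel_l2_norm):
    """a^2 * h_n^(2 alpha + 1) * ||K||_{L^2}^2 * Hamming distance (disjoint supports)."""
    return a ** 2 * h_n ** (2 * alpha + 1) * kernel_l2_norm ** 2 * hamming(eps, eta)


def greedy_packing(p, min_dist):
    """Greedily keep vectors of {0, 1}^p at Hamming distance >= min_dist from those kept."""
    chosen = [tuple([0] * p)]  # epsilon_(0) = (0, ..., 0)
    for cand in product((0, 1), repeat=p):
        if all(hamming(cand, c) >= min_dist for c in chosen):
            chosen.append(cand)
    return chosen


p = 10
family = greedy_packing(p, min_dist=math.ceil(p / 8))
print(len(family) - 1, 2 ** (p / 8))  # size found versus the guaranteed 2^(p/8)
print(squared_l2_distance((0, 1, 1), (1, 1, 0), a=0.5, h_n=0.1, alpha=1.0, kernel_l2_norm=1.0))
```

For such a small $p$ the greedy search finds far more vectors than the $2^{p/8}$ guaranteed by the bound; only the guaranteed size is needed in the proof.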

Step 2: Functions $\lambda_{j}$ belong to $\mathcal{E}(\mathfrak{s},b,\alpha)$ .

We already know that $\lambda_{0}$ belongs to $\mathcal{E}(\mathfrak{s},b,\alpha)$ . Let us first compute the norm of $\lambda_{j}$ on $H^{\alpha}(\mathcal{J})$ . Let us set $\mathbf{r}=\lfloor\alpha\rfloor$ . We have that $(K(./h_{n}))^{(\mathbf{r})}=h_{n}^{-\mathbf{r}}K^{(\mathbf{r})}(./h_{n})$ . We compute the modulus of smoothness:

[TABLE]

and

[TABLE]

by the change of variable $z=t/h_{n}$. The functions $\varphi_{k}$ have disjoint supports. For any $(x,y)\in\mathcal{J}^{2}$, there exists $(i,j)$ such that $x\in[x_{i},x_{i+1})$ and $y\in[x_{j},x_{j+1})$. Then

[TABLE]

Therefore

[TABLE]

and $|\lambda_{k}|_{H^{\alpha}(\mathcal{J})}\leq 2a|K|_{H^{\alpha}}$ . Moreover,

[TABLE]

and consequently $\left\|\lambda_{k}\right\|_{H^{\alpha}(\mathcal{J})}\leq\left\|\lambda_{0}\right\|_{H^{\alpha}(\mathcal{J})}+2a\left\|K\right\|_{H^{\alpha}}$ . Then $\lambda_{k}\in H^{\alpha}(\mathcal{J},\mathbf{L})$ for $a$ sufficiently small. It remains to check that $\lambda_{k}\in\mathcal{E}(\mathfrak{s},b,\alpha)$ . For any $0\leq k\leq P_{n}$ :

a. As $K$ is non-negative, $\forall x\geq i_{1}$, $\lambda_{k}(x)\geq\mathbf{a}\frac{x^{b}}{(b+1)\mathbf{m}(x)}$.

b. $\left\|\lambda_{k}\right\|_{H^{\alpha}(\mathcal{J})}\leq\mathbf{L}$ for $a$ small enough.

c. $\int_{0}^{i_{1}}\lambda_{k}(u)\mathbf{M}(u)du=0\leq\mathbf{l}$.

Therefore $\lambda_{k}\in\mathcal{E}(\mathfrak{s},b,\alpha)$ for $a$ small enough.

Step 3: Absolute continuity.

We denote by $\mathcal{P}_{j}$ the transition densities $\mathcal{P}_{\lambda_{j}}$ . As $(Z_{0},Y_{1},Z_{1},\ldots,Y_{n},Z_{n})$ is a Markov process,

[TABLE]

By (3), we can rewrite: $\mathcal{P}_{0}(x,y)=A_{x,y}\exp(-\tilde{A}_{x,y})$ where

[TABLE]

and $\mathcal{P}_{j}(x,y)=(A_{x,y}+B_{x,y})\exp(-\tilde{A}_{x,y}-\tilde{B}_{x,y})$ where $B_{x,y}=\sum_{k=1}^{p_{n}}\epsilon_{k}B^{k}_{x,y}$, $\tilde{B}_{x,y}=\sum_{k=1}^{p_{n}}\epsilon_{k}\tilde{B}^{k}_{x,y}$ and

[TABLE]

The density of $\mathbf{P}^{\lambda_{0}}$ vanishes only if one of the $Q(y_{i},dz_{i})$ is null, if one of the indicator functions $\mathbb{1}_{\{y_{i+1}\geq z_{i}\}}$ is equal to 0, or if one of the $y_{i}$ is smaller than $i_{1}$. In each of these cases the density of $\mathbf{P}^{\lambda_{j}}$ vanishes as well, so $\mathbf{P}^{\lambda_{j}}$ is absolutely continuous with respect to $\mathbf{P}^{\lambda_{0}}$.

Step 4: The $\chi^{2}$ divergence.

As $\mathbf{P}^{\lambda_{0}},\mathbf{P}^{\lambda_{j}}$ are equivalent measures, we have:

[TABLE]

Let us set $E_{3}:=\chi^{2}(\mathbf{P}^{\lambda_{j}},\mathbf{P}^{\lambda_{0}})+1$ . We can write:

[TABLE]

As $Q$ is a transition kernel, for any $y_{n}$, $\int_{\mathbb{R}^{+}}Q(y_{n},dz_{n})=1$. Moreover, as $\int_{\mathbb{R}^{+}}\mathcal{P}_{0}(x,y)dy=\int_{\mathbb{R}^{+}}\mathcal{P}_{j}(x,y)dy=1$,

[TABLE]

This expression of the $\chi^{2}$ divergence enables us to bound it more precisely. Let us set

[TABLE]

As the support of $\varphi_{k}$ is included in $\mathcal{J}$ , we can remark that $B_{x,y}$ is null on $\mathcal{J}^{c}$ and

[TABLE]

We bound the $\chi^{2}$ -divergence differently on $\mathcal{J}$ and $\mathcal{J}^{c}$ : $DP=R_{1}+R_{2}$ where

[TABLE]

We have that $B_{x,y}^{k}\leq\mathbf{M}(y)\left\|\varphi_{k}\right\|_{\infty}\mathbb{1}_{\{y\geq x\}}\mathbb{1}_{\{y\in[x_{k},x_{k+1})\}}$ and therefore, as $\left\|\varphi_{k}\right\|_{\infty}=ah_{n}^{\alpha}\left\|K\right\|_{\infty}$,

[TABLE]

By (26), we obtain, as the functions $\varphi_{k}$ are supported in $\mathcal{J}$ :

[TABLE]

and, as $p_{n}=\lceil 1/h_{n}\rceil$ ,

[TABLE]

Then by (30) and as $\int_{\mathbb{R}^{+}}A_{x,y}\exp(-\tilde{A}_{x,y})dy=1$

[TABLE]

As $\lambda_{0}=\varepsilon$ on $\mathcal{J}$ , we get by (25) that

[TABLE]

Moreover, on $\mathbb{R}^{+}$ , $\exp(-\tilde{A}_{x,y})\leq 1$ . Then by (29) and (30), we get that

[TABLE]

Therefore $DP=O(a^{2}h_{n}^{2\alpha})$ and, by (27) and (28), we get by induction

[TABLE]

As $h_{n}=n^{-\frac{1}{2\alpha+1}}$ ,

[TABLE]

By (24), $\ln(P_{n})\geq\ln(2)n^{1/(2\alpha+1)}/8$ and therefore,

[TABLE]

for $a$ small enough, which concludes the proof.
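The calibration $h_{n}=n^{-1/(2\alpha+1)}$ used throughout this proof can be checked numerically. With this choice, $nh_{n}^{2\alpha}=h_{n}^{-1}=n^{1/(2\alpha+1)}$, which is the order of the lower bound on $\ln(P_{n})$ recalled above, while $h_{n}^{2\alpha}=n^{-2\alpha/(2\alpha+1)}$ is the announced rate. The sketch below is only an arithmetic check; the function name and the values of $n$ and $\alpha$ are assumptions.

```python
def bandwidth(n, alpha):
    """h_n = n^(-1 / (2 alpha + 1))."""
    return n ** (-1.0 / (2.0 * alpha + 1.0))


n = 10 ** 6
for alpha in (0.5, 1.0, 2.0):
    h = bandwidth(n, alpha)
    # n * h^(2 alpha) and 1 / h coincide; h^(2 alpha) is the squared minimax rate.
    print(alpha, n * h ** (2 * alpha), 1.0 / h, h ** (2 * alpha))
```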

Acknowledgements

N. Krell was partly supported by the Agence Nationale de la Recherche PIECE 12-JS01-0006-01. The research of E. Schmisser was supported in part by the Labex CEMPI (ANR-11-LABX-0007-01).

Appendix A Technical proofs and results

A.1 Proof of Lemma 1

We consider a function $s$ such that $\left\|s\right\|_{\infty}=1$ ; we obtain the expected result by dividing $s$ by its $L^{\infty}$ -norm. According to Assumption A2,

[TABLE]

which proves the first inequality. Let us set $\tilde{s}(Y,Z)=s(Y,Z)-\mathbb{E}_{z_{0}}\left(s(Y,Z)\right)$ . We have:

[TABLE]

We notice that:

[TABLE]

by Assumption A2. Therefore

[TABLE]

Let us bound the last term of (31). We can remark that $(Z_{0},Y_{1},Z_{1},\ldots,Y_{k},Z_{k},\ldots)$ is an inhomogeneous Markov chain. Therefore, for any $(k<k^{\prime})$ , $\mathbb{E}\left(\left.s(Y_{k^{\prime}},Z_{k^{\prime}})\right|Y_{k},Z_{k}\right)=\mathbb{E}\left(\left.s(Y_{k^{\prime}},Z_{k^{\prime}})\right|Z_{k}\right)$ and by Assumption A2,

[TABLE]

Then

[TABLE]

As $|\tilde{s}(Y_{k},Z_{k})|\leq|s(Y_{k},Z_{k})|+\mathbb{E}_{z_{0}}\left(|s(Y_{k},Z_{k})|\right)$ ,

[TABLE]

By Assumption A2, for any function $|\psi|\leq\mathbf{V}_{\lambda}$ ,

[TABLE]

Then

[TABLE]

By (31) and (32), we get:

[TABLE]

where $C$ depends only on $R$ , $\mathbf{V}_{\lambda}$ and $\gamma$ and we recall that $G_{\lambda}(z)=\frac{R}{1-\gamma}\left(\mathbf{V}_{\lambda}(z)+\int\mathbf{V}_{\lambda}(u)\mu(du)\right)$ .

A.2 Proof of Lemma 3

Let $G$ be an event of $\mathscr{O}_{0}^{k}\times\mathscr{O}_{t+k}^{\infty}$. Then $G$ is a disjoint union of events $E^{i}\cap F^{i,j}$ where

[TABLE]

with $J_{j}^{i}$ and $I_{l}^{i,j}$ subsets of $(\mathbb{R}^{+})^{2}$ and $1\leq\mathbf{n}<\infty$ . Then

[TABLE]

As $(Y_{k},Z_{k})_{k\in\mathbb{N}}$ is a Markov chain,

[TABLE]

To simplify the notations, let us set

[TABLE]

and $\mathcal{R}^{t}(x,dy,dz)=\mathbb{P}\left(\left.Y_{t}\in dy,Z_{t}\in dz\right|Z_{0}=x\right)$ . Then

[TABLE]

We regroup the $F^{i,j}$ :

[TABLE]

where $\psi(y,z):=\sum_{j}\mathbb{1}_{\{(y,z)\in I_{0}^{i,j}\}}\int_{I_{1}^{i,j}\times\ldots\times I_{\mathbf{n}}^{i,j}}\mathcal{R}_{\mathbf{n}}(z,dy^{\prime}_{1},dz^{\prime}_{1},\ldots,dy^{\prime}_{\mathbf{n}},dz^{\prime}_{\mathbf{n}})$. We can remark that $\psi(y,z)=\sum_{j}\mathbb{1}_{\{(y,z)\in I_{0}^{i,j}\}}\mathbb{P}_{z}\left((Y_{1},Z_{1})\in I_{1}^{i,j},\ldots,(Y_{\mathbf{n}},Z_{\mathbf{n}})\in I_{\mathbf{n}}^{i,j}\right)$ and, by the law of total probability, $\psi(y,z)\leq 1$. We can apply Assumption A2 to the function $\psi$:

[TABLE]

Then

[TABLE]

By Lemma 1,

[TABLE]

Therefore

[TABLE]

As $\gamma<1$ ,

[TABLE]

with $\beta=-\ln(\gamma)$ , $c=R\left(\int\mathbf{V}_{\lambda}(z)\mu(dz)+(1+R)\mathbf{V}_{\lambda}(z_{0})\right)$ .

A.3 Proof of Lemma 4

A.3.1 Assumption A2 is satisfied

Assumption (S)d implies Assumption A2 a. To prove Assumption A2 b and c, in analogy with [24], we apply the following result, which is Theorem 1.1 of Baxendale [7] written for a Markov chain on $\mathbb{R}^{2}$ and a finite measure instead of a probability.

Result (Sufficient conditions for ergodicity).

Let us consider $(Y_{k},Z_{k})_{k\geq 1}$ a homogeneous Markov chain on $(\mathbb{R}^{2},\mathscr{B}(\mathbb{R}^{2}))$ with transition probability $\tilde{R}$. Under the following three conditions,

Minorization condition

There exist a set $\mathscr{C}\subset\mathbb{R}^{2}$ and a finite measure $\mathbf{s}$ such that $\forall(y_{1},z_{1})\in\mathscr{C},\forall A\in\mathscr{B}(\mathbb{R}^{2})$ ,

[TABLE]

Strong aperiodicity condition

$\mathbf{s}(\mathscr{C})>0$ .

Drift condition

There exist a function $\mathbf{V}:\mathbb{R}^{2}\to[1,\infty[$ and two constants $c<1$, $K>0$ such that

[TABLE]

Then the process $\{(Y_{k},Z_{k}):k\geq 0\}$ is positive recurrent and strongly ergodic, and has a unique stationary probability measure $\rho$.

Moreover, there exist $\gamma$ and $R$ depending only on $\mathbf{s}$, $c$ and $K$ such that, for any function $\psi\leq\mathbf{V}$,

[TABLE]

Then Assumptions A2 b and c are satisfied.

Let us check that the three conditions of this result (minorization, strong aperiodicity and drift) are satisfied. We need to control the transition density. As $(Z_{0},Y_{1},Z_{1},\ldots)$ is an (inhomogeneous) Markov chain, let us write

[TABLE]

Let us set

[TABLE]

Minorization condition

Let us set $\mathscr{C}=\mathbb{R}^{+}\times[0,i^{\prime}_{1}]$ . For any $(y_{1},z_{1})\in\mathscr{C}$ , any $A\subseteq(\mathbb{R}^{+})^{2}$ , by Assumption (S)b, we have that

[TABLE]

By Assumption (S)d and c, for any $\lambda\in\mathcal{E}(\mathfrak{s},b,\alpha)$ ,

[TABLE]

and $\mathbf{s}_{\mathfrak{s}}(A)$ is a finite measure.

Strong aperiodicity condition

[TABLE]

For any $y\leq\kappa^{-1}i^{\prime}_{1}$ , by Assumption (S) a, $\int_{0}^{i^{\prime}_{1}}Q(y,dz)=\mathbb{P}\left(\left.Z_{1}\leq i^{\prime}_{1}\right|Y_{1}=y\right)=1$ . Therefore

[TABLE]

Then $\mathbf{s}_{\mathfrak{s}}(\mathscr{C})>0$ .

Drift condition

For any $(y_{1},z_{1})$ , as $(Y_{1},Z_{1},Y_{2},Z_{2})$ is an (inhomogeneous) Markov chain,

[TABLE]

where $\mathbb{V}_{b}(z)=\exp\left(\mathbf{a}z^{b+1}\right)$ . By Assumption (S)a, as $\mathbb{V}_{b}$ is an increasing function, $\mathbb{V}_{b}(Z_{2})>z\iff Z_{2}\geq\mathbb{V}_{b}^{-1}(z)\implies Y_{2}\geq\kappa^{-1}\mathbb{V}_{b}^{-1}(z)$ . Then by (2) and Assumption (S)b,

[TABLE]

Let us make the change of variable $y=\kappa^{-1}\mathbb{V}_{b}^{-1}(z)$ , then $dz=\kappa\mathbb{V}_{b}^{\prime}(\kappa y)dy$ and

[TABLE]

Let us first bound this quantity for $z_{1}\geq i_{1}$ . By Assumption (S)c, for any $z_{1}\geq i_{1}$ ,

[TABLE]

As $\mathbb{V}_{b}(y)=\exp\left(\mathbf{a}y^{b+1}\right)$ , $\mathbb{V}_{b}^{\prime}(\kappa y)=\mathbf{a}(b+1)\kappa^{b}y^{b}\exp\left(\mathbf{a}\kappa^{b+1}y^{b+1}\right)$ and, for any $z_{1}\geq i_{1}$ ,

[TABLE]

We have that

[TABLE]

Therefore, for any $z_{1}\geq i^{\prime}_{1}$ , as $\mathbb{V}_{b}$ is an increasing function,

[TABLE]

Then

[TABLE]

Moreover, by (34),

[TABLE]

and by (35),

[TABLE]

Therefore the three conditions (minorization, strong aperiodicity and drift) are satisfied, which gives Assumption A2.

A.3.2 Assumption A3 is satisfied

It remains to prove that Assumption A3 is satisfied. We recall that

[TABLE]

By equation (5), for any $y\in\mathcal{I}$ ,

[TABLE]

By equation (2), Assumption (S)b and d, for any $y\in\mathcal{I}$ ,

[TABLE]

Then

[TABLE]

It remains to bound $\mu([0,i_{1}])$ away from 0.

As $\mu$ is the stationary density of $(Z_{k})$, $\mu(]z,\infty[)=\mathbb{P}_{\mu}\left(Z_{1}>z\right)$. Therefore, by Markov's inequality, as $\mathbb{V}_{b}$ is an increasing function,

[TABLE]

By Lemma 4 a,

[TABLE]

As $\sup_{\lambda\in\mathcal{E}(\mathfrak{s},b,\alpha)}\mathbb{E}_{\mu}\left(\mathbb{V}_{b}(Z_{1})\right)<\infty$, and $\mathbb{V}_{b}$ is an increasing function, there exists $y_{0}>0$ such that $\sup_{\lambda\in\mathcal{E}(\mathfrak{s},b,\alpha)}\mu(]y_{0},\infty[)<1$ and consequently, $\inf_{\lambda\in\mathcal{E}(\mathfrak{s},b,\alpha)}\mu([0,y_{0}])>0$. Let us consider the sequence

[TABLE]

where $z_{k_{n}-1}<y_{0}\leq z_{k_{n}}$ . We can remark that

[TABLE]

As $\mu$ is the stationary density, for any $z>0$ ,

[TABLE]

As $\mathbb{P}\left(Z_{1}\leq\kappa Y_{1}\right)=1$ , $\mathbb{P}\left(\left.Z_{1}\leq z\right|Z_{0}=x\right)\geq\mathbb{P}\left(\left.Y_{1}\leq\kappa^{-1}z\right|Z_{0}=x\right)$ and by (2),

[TABLE]

as $\kappa^{-1}z_{j}\geq z_{j+1}$. By Assumption (S)b and c, $\lambda$ and $(\phi_{x}^{-1})^{\prime}$ are bounded from below and there exists a constant $\eta$ such that

[TABLE]

Therefore, as $\kappa^{-1}z_{j}=\kappa^{-1/2}z_{j+1}$ ,

[TABLE]

Let us set $c_{j}=1-\exp\left(-\eta\kappa^{-1}z_{j}(1-\sqrt{\kappa})\right)$. We can note that

[TABLE]

and in particular, $\mu([0,i_{1}])(1+c_{0})\geq c_{0}\mu([0,z_{1}])$. By induction, we obtain:

[TABLE]

Then by (37)

[TABLE]

which concludes the proof.

A.4 Besov and Hölder spaces

Definition 13 (Modulus of continuity).

The modulus of continuity is defined by

[TABLE]

If $f$ is Lipschitz, the modulus of continuity is proportional to $t$ . If $\omega(f,t)=o(t)$ , then $f$ is constant: the modulus of continuity can not measure higher smoothness.

Definition 14 (Modulus of smoothness).

If $f$ is a function on $\mathcal{A}$ , we define its modulus of smoothness by

[TABLE]

We can remark that if $f$ is $C^{\mathbf{r}}$ , then

[TABLE]

In particular, if $f\in C^{\mathbf{r}}(\mathcal{A})$ with $\mathcal{A}$ compact and if $f^{({\mathbf{r}}+1)}$ is Lipschitz, then $\omega_{\mathbf{r}+1}(f,t)_{p}=O(t^{\mathbf{r}+1})$ . If $f^{(\mathbf{r})}$ is $(\alpha-\mathbf{r})$ -Hölder-continuous, that is if $\forall x,y\in\mathcal{A}$ , $|f^{(\mathbf{r})}(x)-f^{(\mathbf{r})}(y)|\leq C|x-y|^{\alpha-\mathbf{r}}$ , then

[TABLE]

If $f^{(\mathbf{r})}$ is piecewise-continuous and $(\alpha-\mathbf{r})$ -Hölder on the points of continuity, then

[TABLE]

The modulus of continuity and the modulus of smoothness are sublinear:

[TABLE]

Definition 15 (Besov space).

The Besov space $\mathbf{\mathit{B}}_{2,\infty}^{\alpha}(\mathcal{A})$ is the set of functions:

[TABLE]

where $\mathbf{r}=\lfloor\alpha\rfloor$ . The norm is defined by: $\left\|f\right\|_{B_{2,\infty}^{\alpha}}:=\sup_{t>0}t^{-\alpha}\omega_{\mathbf{r}+1}(f,t)_{2}+\left\|f\right\|_{L^{2}(\mathcal{A})}$ . We denote $\mathscr{B}_{2,\infty}^{\alpha}(\mathcal{A},M_{1})=\{f\in\mathscr{B}_{2,\infty}^{\alpha}(\mathcal{A}),\left\|f\right\|_{B_{2,\infty}^{\alpha}(\mathcal{A})}\leq M_{1}\}$ .

See DeVore and Lorentz [15] and Meyer [27] for more details. We use the Besov space to control the risk of the estimator of the stationary density $\nu$ .

Definition 16 (Hölder space).

The Hölder space is the set of functions:

[TABLE]

where $\mathbf{r}=\lfloor\alpha\rfloor$ . We note $|f|_{H^{\alpha}(\mathcal{A})}:=\sup_{t>0}t^{\mathbf{r}-\alpha}\omega(f^{(\mathbf{r})},t)_{\infty}$ and define the norm of the Hölder space $\left\|f\right\|_{H^{\alpha}(\mathcal{A})}=|f|_{H^{\alpha}(\mathcal{A})}+\left\|f\right\|_{L^{\infty}(\mathcal{A})}$ and $H^{\alpha}(\mathcal{A},M_{1})=\{f\in H^{\alpha}(\mathcal{A}),\left\|f\right\|_{H^{\alpha}(\mathcal{A})}\leq M_{1}\}$ .

As noted before, $t^{\mathbf{r}-\alpha}\omega(f^{(\mathbf{r})},t)_{\infty}=t^{-\alpha}\omega_{\mathbf{r}}(f,t)_{\infty}$ : the Hölder space $H^{\alpha}(\mathcal{A})$ is included in $\mathbf{\mathit{B}}_{\infty,\infty}^{\alpha}(\mathcal{A})$ which itself is included in $\mathbf{\mathit{B}}_{2,\infty}^{\alpha}(\mathcal{A})$ . We can remark that if a function is $C^{\mathbf{r}}$ and piecewise $C^{\mathbf{r}+1}$ , it belongs to $\mathbf{\mathit{B}}_{2,\infty}^{\mathbf{r}+1/2}$ but only to $H^{\mathbf{r}}$ .

A.5 Proof of Lemma 5

As $\boldsymbol{\nu}$ is the stationary distribution of $(Y_{k})$ , by (4) and (3), we have, for all $y\in\mathcal{J}$ :

[TABLE]

with

[TABLE]

As the Hölder spaces are stable under multiplication, composition and integration, $\Lambda$ has the same regularity as $\lambda$ and $(\phi_{x}^{-1})^{\prime}$ . We have that

[TABLE]

Let us set $\mathcal{Q}_{g}(y)=\int_{\mathbb{R}^{+}}\int_{0}^{y}g(x)Q(x,dz)dx.$ If $\mathcal{Q}_{\nu}$ is differentiable, we get:

[TABLE]

and if $\mathcal{Q}_{\nu}$ belongs to $C^{\mathbf{r}}$ , there exist $(c_{k_{1},k_{2}})_{k_{1}+k_{2}\leq\mathbf{r}-1}\in\mathbb{R}$ such that:

[TABLE]

It remains to study the regularity of the function $\mathcal{Q}_{\nu}$ .

We consider some particular transition measures $Q$ in order to understand how the regularity of $\mathcal{Q}_{\nu}$ (and $\nu$ ) depends on the form and the regularity of $Q$ .

Continuous transition measure

There exists a function $Q_{1}$ such that $Q(x,dy)=Q_{1}(x,y)dy$ , and we can write

[TABLE]

Moreover, as $Q_{1}(x,y)=0$ if $x<y$ , with $\mathcal{I}=[i_{1},i_{2}]$ , we get

[TABLE]

Furthermore, by definition of the Hölder semi-norm, for $\mathbf{r}=\lfloor\alpha\rfloor$

[TABLE]

Then $\left\|\mathcal{Q}_{\nu}\right\|_{H^{\alpha}(\mathcal{I})}\leq i_{2}\left\|Q_{1}\right\|_{H^{\alpha-1}([i_{1},\infty[\times\mathcal{I})}$ .

Deterministic transition measure

Let us assume that $Q$ can be written $Q(x,dy)=\delta_{f(x)}(dy)$ with $f$ a bijection. As $\mathbb{P}\left(Z\leq\kappa Y\right)=1$ , $f(0)=0$ . Then we have that

[TABLE]

If $f^{-1}$ is differentiable:

[TABLE]

So we get:

[TABLE]

The regularity of $\nu^{\prime}$ on $\mathcal{I}$ depends on the regularity of $\nu$ on $f^{-1}(\mathcal{I})$ and of $\Lambda$ and $f^{-1}$ on $\mathcal{I}$ . By induction, there exists a function $\psi_{2}$ such that

[TABLE]

where

[TABLE]

If $f$ is not a bijection (and $f(x)\neq 0$ ), then $\nu$ can be less regular than $\lambda$ . Let us consider $f(x)=\lfloor x/2\rfloor$ . Then

[TABLE]

Then $\nu$ is a piecewise constant function and is not differentiable. We can remark that

[TABLE]

is not differentiable.

If $Q(x,dy)=\delta_{0}(dy)$ (which implies that the vectors $(Z_{k},Y_{k})$ are independent), then $\nu(y)=\int_{\mathbb{R}^{+}}\nu(x)\Lambda(0,y)dx=\Lambda(0,y)$ has the same regularity as $\Lambda$ . We can remark that $\mathcal{Q}_{\nu}(y)=\int_{\mathbb{R}^{+}}\nu(x)dx=1$ is $C^{\infty}$ .

General case

Under Assumption (S),

[TABLE]

with $(f_{i})$ invertible, therefore

[TABLE]

and

[TABLE]

Therefore, there exists a function $\psi_{2}$ such that

[TABLE]

As $\lambda\in H^{\alpha}(\mathcal{J})$ and $(\phi_{z}^{-1})^{\prime}\in H^{\alpha}(\mathcal{J})$ for all $z$ , we have $\Lambda(z,.)\in H^{\alpha}(\mathcal{J})$ for all $z$ , and there exists a continuous function $\psi_{1}$ such that

[TABLE]

which ends the proof.

A.6 Proof of Talagrand’s inequality for beta-mixing variables

The following lemma is very useful for replacing weakly dependent variables by variables that are independent by blocks. It is proved by Viennet [31, proof of Proposition 5.1].

Lemma 17 (Berbee’s coupling lemma).

The random variables $\{Y_{k}\}_{k\in\mathbb{N}}$ are exponentially $\beta$ -mixing. Let us set $q_{n}=\lfloor(r+1)\ln(n)/\beta\rfloor$ where $\beta$ characterizes the $\beta$ -mixing coefficient (see Definition 2). We have that $\beta(q_{n})\leq 1/n^{r+1}$ . We set $p_{n}=n/(2q_{n})$ . There exist random vectors $(Y_{1}^{*},\ldots,Y_{n}^{*})$ such that:

• $Y_{i}$ and $Y_{i}^{*}$ have the same law.

• The random vectors $(Y_{2kq_{n}+1}^{*},\ldots,Y_{(2k+1)q_{n}}^{*})_{0\leq k<p_{n}}$ are independent, as are the random vectors $(Y_{(2k+1)q_{n}+1}^{*},\ldots,Y_{(2k+2)q_{n}}^{*})_{0\leq k<p_{n}}$ .

• For any integer $k$ , $0\leq k\leq 2p_{n}-1$ , $\mathbb{P}\left((Y_{kq_{n}+1},\ldots,Y_{(k+1)q_{n}})\neq(Y^{*}_{kq_{n}+1},\ldots,Y^{*}_{(k+1)q_{n}})\right)\leq\beta_{Y}(q_{n})\leq n^{-(r+1)}$ .

Let us set $\Omega^{*}=\{\omega,\forall k,Y_{k}=Y_{k}^{*}\}$ . Then

[TABLE]
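The block structure used in the coupling lemma can be made concrete with a short sketch. It computes $q_{n}$ and $p_{n}$ as in the statement and lists the index sets of the alternating blocks. The function name `berbee_blocks` is hypothetical, and taking the integer part of $n/(2q_{n})$ (so that leftover indices are ignored) is a simplifying assumption.

```python
import math

def berbee_blocks(n, beta, r=1.0):
    """Index sets of the alternating blocks of length q_n (indices are 1-based)."""
    q = max(1, math.floor((r + 1) * math.log(n) / beta))
    p = n // (2 * q)  # assumption: integer part, remaining indices are dropped
    even = [list(range(2 * k * q + 1, (2 * k + 1) * q + 1)) for k in range(p)]
    odd = [list(range((2 * k + 1) * q + 1, (2 * k + 2) * q + 1)) for k in range(p)]
    return q, p, even, odd

q_n, p_n, even_blocks, odd_blocks = berbee_blocks(n=1000, beta=1.0, r=1.0)
```

The coupled vectors indexed by `even_blocks` are mutually independent, and likewise for `odd_blocks`; two consecutive blocks of the same parity are separated by a gap of $q_{n}$ indices.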

The following inequality comes from Talagrand’s inequalities (see Birgé and Massart [8, Corollary 2 p354]).

Lemma 18 (Talagrand’s inequality).

Let $X_{1},\ldots,X_{n}$ be independent random variables and $S$ a vector subspace of finite dimension $D$ satisfying Assumption 4. We denote by $\mathscr{F}$ a countable family of functions of $S$ . Let us set

[TABLE]

with $u\in L^{2}$ . If

[TABLE]

then

[TABLE]

where $C$ is a universal constant and $k_{2}=(\sqrt{2}-1)/(21\sqrt{2})$ .

Proof of Lemma 18.

We apply Theorem 1.1 of Klein and Rio [23] to the functions $s^{i}(u)=\frac{u(Y_{i})-\mathbb{E}_{z_{0}}\left(u(Y_{i})\right)}{2M_{2}}$ (notation used in Theorem 1.1 of Klein and Rio [23]). We obtain that

[TABLE]

We modify this inequality following Corollary 2 of Birgé and Massart [8]. It gives:

[TABLE]

The end of the proof is done in Comte and Merlevède [11, p222-223]. ∎

Proof of Lemma 11.

To deduce Lemma 11, we simply apply Berbee’s coupling lemma to exponentially $\beta$ -mixing variables, and then Talagrand’s inequality. Indeed, by Berbee’s coupling lemma, as $Y_{k}^{*}$ and $Y_{k}$ have the same law:

[TABLE]

We first bound the second part of the sum $I_{2}(s):=\frac{1}{n}\sum_{k=1}^{n}s(Y_{k})-s(Y_{k}^{*})$ . We have:

[TABLE]

By the Cauchy-Schwarz inequality, $I_{2}^{2}(s)\leq\frac{4M_{2}^{2}}{n}\sum_{k=1}^{n}\mathbb{1}_{\{Y_{k}\neq Y_{k}^{*}\}}$ and by Berbee’s coupling lemma,

$\mathbb{E}_{z_{0}}\left(\sup_{s\in\mathscr{B}}I_{2}(s)\right)\leq\frac{4M_{2}^{2}}{n^{2}}.$

Let us now bound the first term $I_{1}(s):=\frac{1}{n}\sum_{k=1}^{n}s(Y_{k}^{*})-\mathbb{E}_{z_{0}}\left(s(Y_{k}^{*})\right)$ . We have

[TABLE]

where $X_{j,i}:=\left(Y^{*}_{2(j+i)q_{n}+1},\ldots,Y^{*}_{(2(j+i)+1)q_{n}}\right)$ and $u_{s}(x_{1},\ldots,x_{q_{n}}):=\frac{1}{q_{n}}\sum_{k=1}^{q_{n}}s(x_{k})$ . The random variables $X_{j,0}$ are independent, the same can be said for $X_{j,1}$ . Moreover, $|X_{j,i}|\leq M_{2}$ and $\operatorname{Var}_{z_{0}}\left(X_{j,i}\right)\leq V$ . Let us set

[TABLE]

We have: $I_{1}(s):=(I_{n,0}^{*}(s)+I_{n,1}^{*}(s))/2$ . Then,

[TABLE]

As the dimension of $S$ is finite, we can find a countable family $\mathscr{F}$ dense in $\mathscr{B}$ and we can then apply Talagrand’s inequality to $I_{n,0}^{*}$ and $I_{n,1}^{*}$ , which concludes the proof.

∎

Appendix B: Simulations

For the simulations, two classical PDMPs are considered: the TCP process and the size of a marked bacterium.

TCP protocol.

The transmission control protocol (TCP) is one of the main data transmission protocols of the Internet. The maximum number of packets that can be sent at time $t_{k}$ in a round is a random variable $X_{t_{k}}$ . If the transmission is successful, then the maximum number of packets is increased by one: $X_{t_{k+1}}=X_{t_{k}}+1$ . If the transmission fails, then $X_{t_{k+1}}=\kappa X_{t_{k}}$ with $\kappa\in(0,1)$ . A correct scaling of this process leads to a piecewise deterministic Markov process $(X_{t})$ with the characteristics:

[TABLE]

Then the function $(\phi_{x}^{-1})^{\prime}$ is constant: $(\phi_{x}^{-1})^{\prime}=1/c$ . Let us denote by $\Lambda$ a primitive of $\lambda$ . By (2), we have:

[TABLE]

As $\lambda$ is positive, its primitive is invertible and by a change of variable:

[TABLE]

Then, conditionally on $\Lambda(Z_{j-1})$ , $\Lambda(Y_{j})$ follows an exponential distribution of parameter $1/c$ shifted by $\Lambda(Z_{j-1})$ . Therefore, if we can find the inverse of the function $\Lambda$ , we can construct the sequence $(Y_{j},Z_{j})$ recursively:

[TABLE]

where $E_{j}$ are i.i.d. of law $\mathscr{E}(1)$ .

If $\lambda(x)=\lambda x^{\delta}$ with $\delta>-1$ , then $Y_{j}^{\delta+1}=Z_{j-1}^{\delta+1}+\frac{c(\delta+1)}{\lambda}E_{j}$ and we obtain

[TABLE]

This model satisfies Assumption (S). In order to have a model with a function $\lambda$ that is not increasing, we also consider the function $\lambda(x)=(x-a)^{2}+b$ with $a>0$ , $b\geq 0$ . In that case, by (38),

[TABLE]

and, by Cardano’s formula, this equation has a unique real solution, which is

[TABLE]

where $Q=3cE_{j}+(Z_{j-1}-a)^{3}+3b(Z_{j-1}-a)$ . This model also satisfies Assumption (S).
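The two TCP constructions above can be sketched as follows. This is a minimal illustration, not the authors’ code: the function names are hypothetical, the transition $Z_{j}=\kappa Y_{j}$ is read off the description of a failed transmission, and the cubic $(Y_{j}-a)^{3}+3b(Y_{j}-a)=Q$ is solved with the standard Cardano expression, which is assumed to agree with the displayed solution.

```python
import math
import random

def cbrt(x):
    """Real cube root, valid for negative arguments."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def next_y_power(z, e, lam, delta, c):
    """Next pre-jump value when lambda(x) = lam * x**delta, delta > -1."""
    return (z ** (delta + 1) + c * (delta + 1) / lam * e) ** (1.0 / (delta + 1))

def next_y_quadratic(z, e, a, b, c):
    """Next pre-jump value when lambda(x) = (x - a)**2 + b, via Cardano's formula."""
    q = 3 * c * e + (z - a) ** 3 + 3 * b * (z - a)
    root = math.sqrt(q * q / 4 + b ** 3)
    return a + cbrt(q / 2 + root) + cbrt(q / 2 - root)

def simulate_tcp(n, next_y, kappa=0.5, z0=1.0, seed=0):
    """Simulate (Y_j, Z_j), j = 1..n, with Z_j = kappa * Y_j and E_j i.i.d. Exp(1)."""
    rng = random.Random(seed)
    ys, zs, z = [], [], z0
    for _ in range(n):
        y = next_y(z, rng.expovariate(1.0))
        z = kappa * y
        ys.append(y)
        zs.append(z)
    return ys, zs

ys_power, zs_power = simulate_tcp(200, lambda z, e: next_y_power(z, e, 1.0, 1.0, 1.0))
ys_quad, zs_quad = simulate_tcp(200, lambda z, e: next_y_quadratic(z, e, 1.0, 0.5, 1.0))
```

Passing the update rule as a function keeps the recursion identical for both jump rates; the parameter values ( $\kappa=1/2$ , $c=1$ , $a=1$ , $b=1/2$ ) are arbitrary examples.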

Bacterial growth.

We randomly choose a bacterium and follow its growth until it divides into two more or less equal parts. Then we randomly choose one of its daughters, and so on. Between the jumps, the bacterium grows exponentially. During a jump, the size of the bacterium is more or less divided by two. We model this by setting $Z_{k}=Y_{k}\times U_{k}$ , where $U_{k}$ is a random variable independent of $Y_{k}$ , with values in $(0,1)$ , and centered at $1/2$ . The Beta distribution $\beta(\alpha,\alpha)$ satisfies these conditions. For $\alpha=1$ , it is the uniform distribution, and when $\alpha$ increases, the distribution is more concentrated around $1/2$ . We choose $\alpha=20$ . Then

[TABLE]

Then $(\phi_{x}^{-1})^{\prime}(y)=\frac{1}{y}$ and by (2),

[TABLE]

We need to find a primitive of $\lambda(x)/x$ . If $\lambda(x)=\lambda x^{\delta}$ , $\delta>0$ , then:

[TABLE]

Therefore

[TABLE]

and the law of the random variable $Y_{k}^{\delta}$ is an exponential distribution of parameter $\lambda/(\delta c)$ shifted by $Z_{k-1}^{\delta}$ . Then

[TABLE]

with $E_{k}\sim\mathscr{E}(1)$ i.i.d. and $U_{k}\sim\beta(20,20)$ i.i.d. All the conditions of Assumption (S) are satisfied, except point a. Indeed, $\mathbb{P}\left(Z<Y\right)=1$ , but there does not exist any $\kappa<1$ such that $\mathbb{P}\left(Z\leq\kappa Y\right)=1$ . However, in the simulations, it seems that the process is ergodic and that A3 is satisfied.
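A minimal sketch of this recursion, under the assumption that it reads $Y_{k}^{\delta}=Z_{k-1}^{\delta}+\frac{\delta c}{\lambda}E_{k}$ and $Z_{k}=Y_{k}U_{k}$ (which is what an exponential of parameter $\lambda/(\delta c)$ shifted by $Z_{k-1}^{\delta}$ gives). The function name and the default parameter values are illustrative only.

```python
import random

def simulate_growth(n, lam=1.0, delta=1.0, c=1.0, alpha=20.0, z0=1.0, seed=0):
    """Simulate (Y_k, Z_k), k = 1..n, for lambda(x) = lam * x**delta, delta > 0."""
    rng = random.Random(seed)
    ys, zs, z = [], [], z0
    for _ in range(n):
        e = rng.expovariate(1.0)                      # E_k ~ Exp(1)
        y = (z ** delta + delta * c / lam * e) ** (1.0 / delta)
        u = rng.betavariate(alpha, alpha)             # U_k ~ Beta(alpha, alpha)
        z = y * u                                     # size just after division
        ys.append(y)
        zs.append(z)
    return ys, zs

ys_growth, zs_growth = simulate_growth(500)
ratios = [z / y for y, z in zip(ys_growth, zs_growth)]
```

The ratios $Z_{k}/Y_{k}$ concentrate around $1/2$ but are not bounded away from 1 by any fixed $\kappa<1$ , which is the reason point a of Assumption (S) fails for this model.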

Computations

For the two models, $\nu$ has a density with respect to the Lebesgue measure on $\mathbb{R}$ , so it can be estimated on any compact interval $\mathcal{A}$ , here $\mathcal{A}=[-1,5]$ to avoid edge effects. The estimator is computed by projection on a trigonometric basis. The constant involved in $pen(m)$ , cpen, should be greater than $\frac{3}{2}\left(\psi_{1}+\psi_{2}C_{\lambda}\right)$ , with $\psi_{1}=\psi_{2}=\frac{1}{3}$ . The problem is that $C_{\lambda}$ , a correlation term, is not easily tractable. We set cpen= $\psi_{1}+\psi_{2}=2/3$ for all models. This choice seems to be confirmed by the simulation results: the oracle $or$ remains close to 1.
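To fix ideas, here is a hedged sketch of a penalized projection density estimator on a trigonometric basis of $L^{2}(\mathcal{A})$ . The penalty shape cpen $\times D_{m}/n$ , the nested models of dimension $D_{m}=2m+1$ , and the function names are assumptions made for illustration; the exact form of $pen(m)$ is the one given in the main text.

```python
import math
import random

def trig_basis(j, x, a, b):
    """j-th function of the orthonormal trigonometric basis of L2([a, b]); zero outside [a, b]."""
    if x < a or x > b:
        return 0.0
    length = b - a
    if j == 0:
        return 1.0 / math.sqrt(length)
    freq = (j + 1) // 2
    angle = 2.0 * math.pi * freq * (x - a) / length
    scale = math.sqrt(2.0 / length)
    return scale * (math.cos(angle) if j % 2 == 1 else math.sin(angle))

def select_dimension(sample, a, b, max_freq, cpen):
    """Minimize -sum_j a_j^2 + cpen * D_m / n over nested models (assumed penalty shape)."""
    n = len(sample)
    coeffs = [sum(trig_basis(j, y, a, b) for y in sample) / n
              for j in range(2 * max_freq + 1)]
    best_dim, best_crit = 1, float("inf")
    for m in range(max_freq + 1):
        dim = 2 * m + 1
        crit = -sum(cf * cf for cf in coeffs[:dim]) + cpen * dim / n
        if crit < best_crit:
            best_dim, best_crit = dim, crit
    return best_dim, coeffs[:best_dim]

def density_estimate(x, coeffs, a, b):
    """Evaluate the projection estimator at x."""
    return sum(cf * trig_basis(j, x, a, b) for j, cf in enumerate(coeffs))

rng = random.Random(1)
sample = [rng.uniform(0.0, 4.0) for _ in range(500)]  # toy data, not a PDMP
dim_hat, coeffs_hat = select_dimension(sample, -1.0, 5.0, max_freq=10, cpen=2.0 / 3.0)
```

With i.i.d. toy data this only illustrates the mechanics of the selection rule; for the PDMP the observations $Y_{k}$ are dependent, which is why the correlation term $C_{\lambda}$ enters the theoretical constant.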

The constant cpen could be determined via the slope heuristic. Indeed, if the constant in the penalty is too small, the algorithm selects the maximal dimension. If the penalty is large enough, it selects models of reasonable size. We then let the constant $c$ in the penalty vary and note the dimension selected. For $c$ smaller than a value $c_{min}$ , the largest models are selected, and for $c$ greater than $c_{min}$ , smaller models are chosen. The “best” constant is $c=2c_{min}$ . See Arlot and Massart [1] for instance.
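One common way to locate $c_{min}$ in practice is the “dimension jump”: scan a grid of constants and take the value at which the selected dimension drops the most. The sketch below assumes this variant; the function name and the example values are made up for illustration and are not the data behind Figure 2.

```python
def slope_heuristic(constants, selected_dims):
    """Return (c_min, 2 * c_min), with c_min located at the largest dimension drop.

    constants: increasing grid of penalty constants.
    selected_dims: dimension selected for each constant.
    """
    drops = [selected_dims[i] - selected_dims[i + 1] for i in range(len(constants) - 1)]
    i_star = max(range(len(drops)), key=drops.__getitem__)
    c_min = constants[i_star + 1]  # first constant after the largest drop
    return c_min, 2.0 * c_min

# Hypothetical grid and selected dimensions, for illustration only.
grid = [0.08, 0.16, 0.24, 0.32, 0.40]
dims = [120, 95, 18, 17, 17]
c_min, c_best = slope_heuristic(grid, dims)
```

Each entry of `dims` requires running the whole selection procedure once, which is why this calibration is costly.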

Figure 2 shows the selected dimension with respect to cpen, the constant in the penalty. When the constant in the penalty increases, the chosen dimension first decreases very rapidly, until cpen= $0.24$ , then it decreases very slowly towards 1. Hence $c_{min}=0.24$ and $2c_{min}=0.48$ . Our chosen penalty constant, $2/3$ , is a little greater than $2c_{min}$ , and selects the same dimension (here 17).

However, the slope heuristic involves quite a lot of computations, so it cannot be used for every simulation, only to check that the penalty constant is consistent.

In Figures 3 and 4, for each graph, five simulations of the PDMP with $n=10^{5}$ are run. For each simulation, the estimator $\hat{\lambda}$ , the density $\hat{\nu}_{\hat{m}}$ and $\hat{\mathbf{D}}_{n}$ are drawn.

In the tables, 200 simulations for each 4-tuple $(n,c,\kappa,\lambda)$ are computed. The estimation interval $\mathcal{I}=[0.5,2]$ is such that $\mathbf{D}$ is greater than the threshold $(\ln(n))^{-1}$ on $\mathcal{I}$ for $n=10^{5}$ for all our models. For each set of parameters, the mean of the selected dimension $D_{\hat{m}}$ , and the mean and the standard deviation of the $L^{2}$ error on $\mathcal{I}$ , denoted by “risk” and “sd”, are calculated. We also want to check that our estimator is truly adaptive. As $\nu$ is unknown, we cannot check that $\hat{m}$ is the best choice for estimating $\nu$ . Instead, let us consider the estimator

[TABLE]

Then $\hat{\lambda}_{n}=\hat{\lambda}_{\hat{m}}.$ The optimal dimension is

[TABLE]

and the minimal risk is $\left\|\hat{\lambda}_{m_{opt}}-\lambda\right\|_{L^{2}(\mathcal{I})}^{2}$ . In the tables, we give the empirical means of $D_{\hat{m}}$ and $D_{m_{opt}}$ , the empirical mean and standard deviation of the risk, and the empirical mean of the oracle

[TABLE]

In Figure 5, four simulations are realized, each for a different value of $n$ ( $n=10^{2}$ , $10^{3}$ , $10^{4}$ and $10^{5}$ ) in order to show the convergence of our estimator.

Results

In Figures 3-4, the estimator $\hat{\lambda}$ is very close to $\lambda$ , at least when $x$ is neither too small nor too large, that is, when there are enough observations to compute the estimator. The estimators $\hat{\nu}_{\hat{m}}$ and $\hat{\mathbf{D}}_{n}$ are quite smooth, whereas $\hat{\lambda}$ tends to oscillate. This is due to the division of one estimator by the other. In Tables 3-4, the risk decreases when $n$ increases and seems to tend toward 0. The oracle remains close to 1, which indicates that our estimator is truly adaptive. When the number of observations is small, the risk may seem quite large (for instance, in Figure 4 when $\lambda(x)=x^{2}$ ). This is simply because $\mathbf{D}$ is smaller than the threshold ( $1/\ln(10^{2})\approx 0.22$ ), and the estimator $\hat{\lambda}$ is set to 0 on some part of $\mathcal{I}$ , or even on the whole interval. The estimation near 0 can be good for some models, for instance when $\kappa=1/5$ and $\lambda(x)=x$ , because the random variables $Z_{k}$ take smaller values (at a jump, we divide the process by 5 instead of by 2). The function $\mathbf{D}$ then takes higher values near 0, and the estimator $\hat{\lambda}$ is positive even for small values of $x$ . The effect of the threshold is illustrated in Figure 5: when $n$ increases, the estimator is better both because the support interval of $\hat{\lambda}$ increases and because, on the support interval, the estimator is closer to the true function.
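The thresholding effect described above can be sketched in a few lines. The code assumes the quotient form $\hat{\lambda}=\hat{\nu}/\hat{\mathbf{D}}$ wherever $\hat{\mathbf{D}}\geq 1/\ln(n)$ and $\hat{\lambda}=0$ elsewhere; the function name and the numerical values are illustrative, and whether the inequality is strict is an assumption.

```python
import math

def thresholded_quotient(nu_hat, d_hat, n):
    """Quotient estimator on a grid, set to 0 where d_hat is below 1 / ln(n)."""
    threshold = 1.0 / math.log(n)
    return [nu / d if d >= threshold else 0.0 for nu, d in zip(nu_hat, d_hat)]

# Same (hypothetical) pointwise estimates, two sample sizes.
nu_values = [0.3, 0.3]
d_values = [0.25, 0.2]
small_n = thresholded_quotient(nu_values, d_values, n=10 ** 2)  # threshold about 0.217
large_n = thresholded_quotient(nu_values, d_values, n=10 ** 5)  # threshold about 0.087
```

With $n=10^{2}$ the second point falls below the threshold and the estimate there is 0, whereas with $n=10^{5}$ both points are kept, which is the enlargement of the support interval visible in Figure 5.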

Bibliography

[1] S. Arlot and P. Massart. Data-driven calibration of penalties for least-squares regression. Journal of Machine Learning Research, 10:245–279, 2009.
[2] R. Azaïs and A. Genadot. A new characterization of the jump rate for piecewise-deterministic Markov processes with discrete transitions. Comm. Statist. Theory Methods, 47(8):1812–1829, 2018.
[3] R. Azaïs and A. Muller-Gueudin. Optimal choice among a class of nonparametric estimators of the jump rate for piecewise-deterministic Markov processes. Electron. J. Stat., 10(2):3648–3692, 2016. ISSN 1935-7524.
[4] R. Azaïs, J.-B. Bardet, A. Génadot, N. Krell, and P.-A. Zitt. Piecewise deterministic Markov process—recent results. In Journées MAS 2012, volume 44 of ESAIM Proc., pages 276–290. EDP Sci., Les Ulis, 2014.
[5] R. Azaïs, F. Dufour, and A. Gégout-Petit. Non-parametric estimation of the conditional distribution of the interjumping times for piecewise-deterministic Markov processes. Scandinavian Journal of Statistics. Theory and Applications, 41(4):950–969, 2014.
[6] A. Barron, L. Birgé, and P. Massart. Risk bounds for model selection via penalization. Probab. Theory Related Fields, 113(3):301–413, 1999. ISSN 0178-8051.
[7] P. Baxendale. Renewal theory and computable convergence rates for geometrically ergodic Markov chains. The Annals of Applied Probability, 15(1B):700–738, 2005.
[8] L. Birgé and P. Massart. Minimum contrast estimators on sieves: exponential bounds and rates of convergence. Bernoulli, 4(3):329–375, 1998. ISSN 1350-7265.
