On Bernstein Type Inequalities for Stochastic Integrals of Multivariate Point Processes

Hanchao Wang; Zhengyan Lin; Zhonggen Su

arXiv:1703.07966·math.PR·March 24, 2017

TL;DR

This paper establishes Bernstein-type concentration inequalities for stochastic integrals of multivariate point processes, providing new bounds and convergence rates for related martingales and estimators.

Contribution

It introduces novel Bernstein inequalities for multivariate point process integrals and applies them to improve convergence rate results for nonparametric MLEs.

Findings

1. Derived a Bernstein-type concentration inequality using the Doléans-Dade exponential formula.

2. Established a uniform exponential inequality via generic chaining.

3. Provided an improved convergence rate for nonparametric maximum likelihood estimators.

Abstract

We consider stochastic integrals of multivariate point processes and study their concentration phenomena. In particular, we obtain a Bernstein type concentration inequality through the Doléans-Dade exponential formula and a uniform exponential inequality using a generic chaining argument. As applications, we obtain an upper bound for a sequence of discrete time martingales indexed by a class of functionals, and so derive the rate of convergence for nonparametric maximum likelihood estimators, which improves earlier work of van de Geer.

Equations

$$\textsf{E}|\xi_{k}|^{p}\leq\frac{p!\,a^{p-2}}{2}\,\textsf{E}\xi_{k}^{2},\qquad p\geq 2 \tag{1.1}$$

$$\textsf{P}(|S_{n}|\geq nx)\leq 2\exp\Big(-\frac{nx^{2}}{2(K+ax)}\Big) \tag{1.2}$$

$$|\Delta M_{t}|\leq K,\qquad t>0 \tag{1.3}$$

$$\textsf{P}(M_{t}\geq x~\text{and}~V_{t}\leq y^{2}~\text{for some}~t)\leq\exp\Big(-\frac{x^{2}}{2(xK+y^{2})}\Big). \tag{1.4}$$

$$V_{m,t}\leq\frac{m!}{2}K^{m-2}R_{t},\qquad t\geq 0, \tag{1.5}$$

$$\textsf{P}(M_{t}\geq x~\text{and}~V_{t}\leq y^{2}~\text{for some}~t)\leq\exp\Big(-\frac{x^{2}}{2(xK+y^{2})}\Big). \tag{1.6}$$

$$\mu(dt,dx)=\sum_{k\geq 1}1_{\{T_{k}<\infty\}}\varepsilon_{(T_{k},X_{k})}(dt\times dx). \tag{1.7}$$

$$\nu(\omega,dt,dx)=dA_{t}(\omega)K_{\omega,t}(dx). \tag{1.8}$$

$$\hat{W}_{t}=\int_{\mathbb{R}}W(t,x)\,\nu(\{t\}\times dx). \tag{1.9}$$

$$\Xi(W)_{t}=\max\{0,(W-\hat{W})\}*\nu_{t}+(1-a_{s})\max\{0,-\hat{W}_{t}\}. \tag{1.10}$$

$$C(W)_{t}=\langle W*(\mu-\nu),W*(\mu-\nu)\rangle_{t}. \tag{1.11}$$

$$C(W)_{t}=(W-\hat{W})^{2}*\nu_{t}+\sum_{s\leq t}(1-a_{s})(\hat{W}_{s})^{2}. \tag{1.12}$$

$$Q(W,m)_{t}=\max\{0,(W-\hat{W})\}^{m}*\nu_{t}+\sum_{s\leq t}(1-a_{s})\max\{0,-\hat{W}_{s}\}^{m},\qquad m\geq 3. \tag{1.13}$$

$$\Xi(W)_{t}\leq K,\qquad Q(W,m)_{t}\leq\frac{m!}{2}K^{m-2}C(W)_{t},\qquad m\geq 3. \tag{1.14}$$

$$\textsf{P}(|W*(\mu-\nu)_{t}|\geq x~\text{and}~C(W)_{t}\leq y^{2}~\text{for some}~t)\leq\exp\Big(-\frac{x^{2}}{2(xK+y^{2})}\Big). \tag{1.15}$$

$$d_{1}(\psi_{1},\psi_{2})=||\Xi(W^{\psi_{1}}-W^{\psi_{2}})_{T}||_{\infty}, \tag{1.16}$$

$$d_{2}(\psi_{1},\psi_{2})=||C(W^{\psi_{1}}-W^{\psi_{2}})_{T}||_{\infty} \tag{1.17}$$

$$\textsf{P}(|X^{\psi_{1}}-X^{\psi_{2}}|>x)\leq\exp\Big(-\frac{x^{2}}{2(d_{2}^{2}(\psi_{1},\psi_{2})+d_{1}(\psi_{1},\psi_{2})x)}\Big). \tag{1.18}$$

$$\gamma_{\alpha}(\Psi,d)=\inf\sup_{\psi\in\Psi}\sum_{n\geq 0}2^{n/\alpha}\Upsilon_{d}(A_{n}(\psi)), \tag{1.19}$$

$$\Xi(W^{\psi})_{T}\leq K,\qquad C(W^{\psi})_{T}^{m}\leq\frac{m!}{2}K^{m-2}C(W^{\psi})_{T},\qquad m\geq 3. \tag{1.20}$$

$$\textsf{P}\big(\sup_{\psi\in\Psi}|X^{\psi}|\geq Cu(\gamma_{2}(\Psi,d_{2})+\gamma_{1}(\Psi,d_{1}))\big)\leq C\exp\Big(-\frac{u}{2}\Big). \tag{1.21}$$

$$\textsf{E}\sup_{\psi\in\Psi}X^{\psi}\leq C\big(\gamma_{2}(\Psi,d_{2})+\gamma_{1}(\Psi,d_{1})\big). \tag{1.22}$$

$$\textsf{P}(|W*(\mu-\nu)_{t}|\geq x)\leq\textsf{P}(|W|*(\mu-\nu)_{t}\geq x).$$

$$X_{t}=W*(\mu-\nu)_{t},$$

$$S(\lambda)_{t}=\int_{0}^{t}\int_{\mathbb{R}}\big(e^{\lambda(W-\hat{W})}-1-\lambda(W-\hat{W})\big)\,\nu(ds,dx)+\sum_{s\leq t}(1-a_{s})\big(e^{-\lambda\hat{W}_{s}}-1+\lambda\hat{W}_{s}\big),$$

$$\mathcal{E}(Y)_{t}=e^{Y_{t}-Y_{0}-\frac{1}{2}\langle Y^{c},Y^{c}\rangle_{t}}\prod_{s\leq t}(1+\Delta Y_{s})e^{-\Delta Y_{s}}.$$

$$\Delta S(\lambda)_{t}=\frac{\int e^{\lambda W}\nu(\{t\},dx)-a_{t}e^{\lambda\hat{W}_{t}}+(1-a_{t})(1-e^{\lambda\hat{W}_{t}})}{e^{\lambda\hat{W}_{t}}}$$

$$\int e^{\lambda W}\nu(\{t\},dx)+1-a_{t}>0,$$

$$\Delta S(\lambda)_{t}>-1$$

$$\Delta X=(W-\hat{W})1_{D}-\hat{W}1_{D^{c}},$$

Taxonomy

Topics: Point processes and geometric inequalities

Full text

Hanchao Wang (corresponding author)

Zhongtai Security Institute for Financial Studies, Shandong University, Jinan, 250100, PRC

Zhengyan Lin, Zhonggen Su

School of Mathematical Sciences, Zhejiang University, Hangzhou, 310027, PRC

Keywords: Bernstein inequality, Doléans-Dade exponential formula, generic chaining method, multivariate point process.

1 Introduction

Measure concentration phenomena have attracted a great deal of research over the past decades. The reader is referred to the books of Ledoux and Talagrand [12] and Ledoux [11] and to the paper of Talagrand [14] for remarkable results and powerful methods. The primary purpose of the present paper is to establish a Bernstein type exponential concentration inequality for stochastic integrals of multivariate point processes.

To set the stage, we begin with the classical Bernstein inequality for sums of independent random variables. Assume that $(\Omega,\mathcal{F},\textsf{P})$ is a probability space large enough to carry all random objects of interest. Let $\xi_{1}$, $\xi_{2}$, $\cdots$ be a sequence of centered independent random variables with finite variance, and denote $S_{n}=\xi_{1}+\xi_{2}+\cdots+\xi_{n}$. If there exists a constant $a>0$ such that

$$\textsf{E}|\xi_{k}|^{p}\leq\frac{p!\,a^{p-2}}{2}\,\textsf{E}\xi_{k}^{2},\qquad p\geq 2, \tag{1.1}$$

then

$$\textsf{P}(|S_{n}|\geq nx)\leq 2\exp\Big(-\frac{nx^{2}}{2(K+ax)}\Big) \tag{1.2}$$

for all $x>0$ and for all $K$ satisfying $Var(S_{n})=\sum_{i=1}^{n}\textsf{E}\xi_{i}^{2}\leq K$.
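As a sanity check of (1.2), the following minimal sketch compares the exact two-sided tail of a sum of Rademacher signs with the Bernstein bound. The choice of Rademacher variables (which satisfy (1.1) with $a=1$), the sample size, and the function names are illustrative assumptions, not part of the paper. The sketch also evaluates the bound with $K$ replaced by the average variance $Var(S_{n})/n$, which is the usual sharper normalisation and is likewise an assumption here.

```python
import math


def bernstein_bound(n, x, K, a):
    """Right-hand side 2*exp(-n x^2 / (2 (K + a x))) of the classical bound."""
    return 2.0 * math.exp(-n * x * x / (2.0 * (K + a * x)))


def exact_two_sided_tail(n, x):
    """P(|S_n| >= n x) for S_n a sum of n independent Rademacher (+1/-1) signs.

    S_n = 2 B - n with B ~ Binomial(n, 1/2), so the tail is an exact finite sum.
    """
    threshold = n * x
    total = sum(
        math.comb(n, b)
        for b in range(n + 1)
        if abs(2 * b - n) >= threshold - 1e-12  # small tolerance for float rounding
    )
    return total / 2 ** n


n, a = 50, 1.0
K_paper = float(n)  # Var(S_n) = n, so K = n satisfies Var(S_n) <= K
K_sharp = 1.0       # assumption: K replaced by the average variance Var(S_n) / n
for x in (0.2, 0.4, 0.6):
    p = exact_two_sided_tail(n, x)
    print(x, p, bernstein_bound(n, x, K_sharp, a), bernstein_bound(n, x, K_paper, a))
```

For these values the exact tail stays below both bounds; the bound with $K=n$ is valid but exceeds one, so it carries no information in this range.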

Condition (1.1) is due to Bernstein [6] and is now referred to as the Bernstein condition. Since then various extensions and improvements have appeared in the literature, among them Bennett [1, 2], Hoeffding [9], Freedman [8], Bentkus [4, 5] and Fan et al. [7]. The recent book of Bercu et al. [3] gives a clear exposition of concentration inequalities for sums of independent random variables and martingales.

An important extension of the Bernstein inequality is to discrete time and continuous time martingales. In particular, Freedman [8] first obtained the Bernstein inequality for discrete time martingales with bounded jumps, and Shorack and Wellner [13] then extended Freedman's result to continuous time martingales. More precisely, let $(\Omega,\mathcal{F},(\mathcal{F}_{t})_{t\in[0,T]},\textsf{P})$ be a stochastic basis and let $\{M_{t}\}_{t\geq 0}$ be a locally square integrable martingale with respect to the filtration $\{\mathcal{F}_{t}\}_{t\in[0,T]}$ with $M_{0}=0$. Denote the jump by $\Delta M_{t}=M_{t}-M_{t-}$ and the predictable variation by $V_{t}=\langle M,M\rangle_{t}$, $t>0$. Assume that

$$|\Delta M_{t}|\leq K,\qquad t>0 \tag{1.3}$$

for a positive constant $K$. Then for each $x,y>0$,

$$\textsf{P}(M_{t}\geq x~\text{and}~V_{t}\leq y^{2}~\text{for some}~t)\leq\exp\Big(-\frac{x^{2}}{2(xK+y^{2})}\Big). \tag{1.4}$$

The bounded jump assumption (1.3) can be relaxed. In fact, van de Geer [17] improved the above result under a Bernstein type condition. For each $m\geq 2$, consider the process $\{\sum_{s\leq t}|\Delta M_{s}|^{m}\}_{t\geq 0}$ and its predictable compensator $\{V_{m,t}\}_{t\geq 0}$. If there exist a constant $K>0$ and a predictable process $\{R_{t}\}_{t\geq 0}$ such that

$$V_{m,t}\leq\frac{m!}{2}K^{m-2}R_{t},\qquad t\geq 0, \tag{1.5}$$

then for each $x,y>0$,

$$\textsf{P}(M_{t}\geq x~\text{and}~V_{t}\leq y^{2}~\text{for some}~t)\leq\exp\Big(-\frac{x^{2}}{2(xK+y^{2})}\Big). \tag{1.6}$$

We remark that any locally square integrable martingale can be represented as the sum of a continuous local martingale and a pure jump local martingale. The nonzero continuous local martingale part played a crucial role in the proofs of both (1.4) and (1.6). It is therefore natural to ask what happens for a pure jump local martingale. Establishing a concentration inequality for general pure jump local martingales is an interesting and challenging problem. We shall restrict ourselves to stochastic integrals of multivariate point processes.

Let $(E,{\cal E})$ be a Blackwell space. Let $\{T_{k}\}_{k\geq 1}$ be a strictly increasing sequence of positive random variables and $\{X_{k}\}_{k\geq 1}$ a sequence of $E$-valued random variables such that $X_{k}$ is measurable with respect to ${\cal F}_{T_{k}}$ for each $k\geq 1$. A multivariate point process is the integer-valued random measure defined by

$$\mu(dt,dx)=\sum_{k\geq 1}1_{\{T_{k}<\infty\}}\varepsilon_{(T_{k},X_{k})}(dt\times dx). \tag{1.7}$$

We note that Poisson point processes and compound Poisson point processes are classic and well-studied examples of multivariate point processes. We shall be interested in stochastic integrals of a predictable process with respect to the measure $\mu$. Let $\nu$ be the predictable compensator of $\mu$ and assume that $\nu$ admits the disintegration

$$\nu(\omega,dt,dx)=dA_{t}(\omega)K_{\omega,t}(dx), \tag{1.8}$$

where $K$ is a transition probability from $(\Omega\times[0,T],\mathcal{P})$ into $(E,\mathcal{E})$ and $A$ is an increasing càdlàg predictable process. Denote $a_{t}=\nu(\{t\}\times\mathbb{R})$. It is easy to see that $a\equiv 0$ if $\mu$ is a Lévy point process. However, we are more interested in the case $a\neq 0$, namely $\Delta A_{t}\neq 0$.

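Given a predictable function $W$ on $\tilde{\Omega}$, $\tilde{\Omega}=\Omega\times\mathbb{R}_{+}\times\mathbb{R}$, define the stochastic integral

$$\hat{W}_{t}=\int_{\mathbb{R}}W(t,x)\,\nu(\{t\}\times dx). \tag{1.9}$$

In addition, put

$$\Xi(W)_{t}=\max\{0,(W-\hat{W})\}*\nu_{t}+(1-a_{s})\max\{0,-\hat{W}_{t}\} \tag{1.10}$$

and

$$C(W)_{t}=\langle W*(\mu-\nu),W*(\mu-\nu)\rangle_{t}. \tag{1.11}$$

An easy computation, see Chapter 2 of Jacod and Shiryaev [10], implies

$$C(W)_{t}=(W-\hat{W})^{2}*\nu_{t}+\sum_{s\leq t}(1-a_{s})(\hat{W}_{s})^{2}. \tag{1.12}$$

Motivated by (1.12), we introduce the following quantities

$$Q(W,m)_{t}=\max\{0,(W-\hat{W})\}^{m}*\nu_{t}+\sum_{s\leq t}(1-a_{s})\max\{0,-\hat{W}_{s}\}^{m},\qquad m\geq 3. \tag{1.13}$$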

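The Bernstein inequality for $W*(\mu-\nu)$ reads as follows.

Theorem 1.1

Suppose that for all $t>0$ and some $0<K<\infty$

$$\Xi(W)_{t}\leq K,\qquad Q(W,m)_{t}\leq\frac{m!}{2}K^{m-2}C(W)_{t},\qquad m\geq 3. \tag{1.14}$$

Then for each $x>0$, $y>0$,

$$\textsf{P}(|W*(\mu-\nu)_{t}|\geq x~\text{and}~C(W)_{t}\leq y^{2}~\text{for some}~t)\leq\exp\Big(-\frac{x^{2}}{2(xK+y^{2})}\Big). \tag{1.15}$$

The proof of Theorem 1.1 will be given in Section 2. A key ingredient is the Doléans-Dade exponential formula for semimartingales with given predictable characteristics.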

Next we turn to uniform bounds for a family of stochastic integrals of predictable functions with respect to a multivariate point process. Let $(\Psi,d)$ be a metric space and $\mathcal{W}=\{W^{\psi}:\psi\in\Psi\}$ a family of predictable functions on $\Omega\times\mathbb{R}_{+}\times\mathbb{R}$.

Fix $T>0$. We denote $X^{\psi}=W^{\psi}*(\mu-\nu)_{T}$ and define two metrics as follows:

$$d_{1}(\psi_{1},\psi_{2})=||\Xi(W^{\psi_{1}}-W^{\psi_{2}})_{T}||_{\infty}, \tag{1.16}$$

$$d_{2}(\psi_{1},\psi_{2})=||C(W^{\psi_{1}}-W^{\psi_{2}})_{T}||_{\infty}, \tag{1.17}$$

where $||\cdot||_{\infty}$ stands for the norm of $L^{\infty}$.

By Theorem 1.1, one can easily obtain

$$\textsf{P}(|X^{\psi_{1}}-X^{\psi_{2}}|>x)\leq\exp\Big(-\frac{x^{2}}{2(d_{2}^{2}(\psi_{1},\psi_{2})+d_{1}(\psi_{1},\psi_{2})x)}\Big). \tag{1.18}$$

Inequality (1.18) is an increment condition. We can further derive a uniform inequality for $\sup_{\psi\in\Psi}{X^{\psi}}$ using the generic chaining method as in Talagrand [15]. To this end, we need more notation. For a given metric space $(\Psi,d)$, an increasing sequence $(\mathcal{A}_{n})_{n\geq 1}$ of partitions of $\Psi$ is called an admissible sequence if $\sharp\mathcal{A}_{n}\leq 2^{2^{n}}$. Denote by $A_{n}(\psi)$ the unique element of $\mathcal{A}_{n}$ containing $\psi$, and by $\Upsilon_{d}(A_{n}(\psi))$ the diameter of $A_{n}(\psi)$ under $d$. In addition, let

$$\gamma_{\alpha}(\Psi,d)=\inf\sup_{\psi\in\Psi}\sum_{n\geq 0}2^{n/\alpha}\Upsilon_{d}(A_{n}(\psi)), \tag{1.19}$$

where the infimum is taken over all admissible sequences. We can now state a uniform inequality for $\sup_{\psi\in\Psi}{X^{\psi}}$ in terms of $\gamma_{1}$ and $\gamma_{2}$.

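Theorem 1.2

Suppose that for all $\psi\in\Psi$ and some $0<K<\infty$

$$\Xi(W^{\psi})_{T}\leq K,\qquad C(W^{\psi})_{T}^{m}\leq\frac{m!}{2}K^{m-2}C(W^{\psi})_{T},\qquad m\geq 3. \tag{1.20}$$

Then we have

$$\textsf{P}\big(\sup_{\psi\in\Psi}|X^{\psi}|\geq Cu(\gamma_{2}(\Psi,d_{2})+\gamma_{1}(\Psi,d_{1}))\big)\leq C\exp\Big(-\frac{u}{2}\Big). \tag{1.21}$$

Moreover, it follows that

$$\textsf{E}\sup_{\psi\in\Psi}X^{\psi}\leq C\big(\gamma_{2}(\Psi,d_{2})+\gamma_{1}(\Psi,d_{1})\big). \tag{1.22}$$

The proof of Theorem 1.2 will also be given in Section 2. As applications, we will obtain a Bernstein type exponential inequality for a class of functional index empirical processes and so derive a convergence rate for nonparametric maximum likelihood estimators. This is the content of Section 3.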

2 Proofs of Theorems 1.1 and 1.2

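**Proof of Theorem 1.1** Clearly, it follows that

$$\textsf{P}(|W*(\mu-\nu)_{t}|\geq x)\leq\textsf{P}(|W|*(\mu-\nu)_{t}\geq x).$$

So without loss of generality, we can and do assume $W>0$. For simplicity of notation, put

$$X_{t}=W*(\mu-\nu)_{t},$$

and

$$S(\lambda)_{t}=\int_{0}^{t}\int_{\mathbb{R}}\big(e^{\lambda(W-\hat{W})}-1-\lambda(W-\hat{W})\big)\,\nu(ds,dx)+\sum_{s\leq t}(1-a_{s})\big(e^{-\lambda\hat{W}_{s}}-1+\lambda\hat{W}_{s}\big),$$

where $0<\lambda<\frac{1}{K}$. Note that for any semimartingale $Y$, the Doléans-Dade exponential is

$$\mathcal{E}(Y)_{t}=e^{Y_{t}-Y_{0}-\frac{1}{2}\langle Y^{c},Y^{c}\rangle_{t}}\prod_{s\leq t}(1+\Delta Y_{s})e^{-\Delta Y_{s}}.$$

Since

$$\Delta S(\lambda)_{t}=\frac{\int e^{\lambda W}\nu(\{t\},dx)-a_{t}e^{\lambda\hat{W}_{t}}+(1-a_{t})(1-e^{\lambda\hat{W}_{t}})}{e^{\lambda\hat{W}_{t}}}$$

and

$$\int e^{\lambda W}\nu(\{t\},dx)+1-a_{t}>0,$$

we can obtain

$$\Delta S(\lambda)_{t}>-1$$

for all $t>0$.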
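The role of the condition $\Delta S(\lambda)_{t}>-1$ can be illustrated on a toy path. The sketch below assumes a process with finitely many jumps, no continuous part, and starting value zero, so that the Doléans-Dade exponential collapses to the product of the factors $1+\Delta Y_{s}$. The jump sizes and function names are hypothetical. It checks that the closed form agrees with the recursive solution of $G=1+G_{-}\cdot Y$, and that the exponential stays positive and below $e^{Y_{t}}$ when all jumps exceed $-1$.

```python
import math


def stochastic_exponential(jumps):
    """Doleans-Dade exponential of a pure-jump finite-variation path started at 0.

    With no continuous martingale part and Y_t equal to the sum of its jumps,
    exp(Y_t) * prod (1 + dY_s) exp(-dY_s) collapses to prod (1 + dY_s).
    """
    value = 1.0
    for jump in jumps:
        value *= 1.0 + jump
    return value


def solve_linear_equation(jumps):
    """Solve G = 1 + G_- . Y jump by jump: G_new = G_old + G_old * jump."""
    g = 1.0
    for jump in jumps:
        g = g + g * jump
    return g


jumps = [0.30, -0.50, 0.10, -0.20, 0.75]  # hypothetical jump sizes, all > -1
closed_form = stochastic_exponential(jumps)
recursive = solve_linear_equation(jumps)
upper = math.exp(sum(jumps))  # e^{Y_t}, an upper bound because 1 + x <= e^x

print(closed_form, recursive, upper)
```

If some jump were below $-1$, a factor of the product would turn negative and the quotient $e^{\lambda X}/\mathcal{E}(S(\lambda))$ used in the proof would no longer be well behaved, which is why the positivity check precedes the local martingale argument.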

We shall first show that the process $\Big{(}e^{\lambda X}/\mathcal{E}(S(\lambda))\Big{)}_{t\geq 0}$ is a local martingale. The jump part of $X$ is

$$\Delta X=(W-\hat{W})1_{D}-\hat{W}1_{D^{c}},$$

where $D$ is the thin set exhausted by $\{T_{n}\}_{n\geq 1}$.

We denote by $\mu^{X}$ the jump measure of $X$ . Let $\nu^{X}$ be the predictable compensator of $\mu^{X}$ , and

[TABLE]

The Itô formula yields

[TABLE]

Furthermore,

[TABLE]

We obtain that

[TABLE]

is a local martingale. Set $H=e^{\lambda X}$ , $G=\mathcal{E}(S(\lambda))$ , $A=S(\lambda)$ and $f(h,g)=\frac{h}{g}.$ The Itô formula yields

[TABLE]

Let $N_{1}=H-H_{-}\cdot A$ , and note $N_{1}$ is also a local martingale. We have

[TABLE]

By the definition of $\mathcal{E}(A)$ , we have $G=1+G_{-}\cdot A$ , thus

[TABLE]

Then

[TABLE]

Noting that $\Delta G=G_{-}\Delta A$ , $\Delta N_{1}=\Delta H-H_{-}\Delta A$ , we have

[TABLE]

where $A$ is a predictable process, and $N$ is a local martingale. By the property of the Stieltjes integral, we have

[TABLE]

Thus $\Big{(}e^{\lambda X}/\mathcal{E}(S(\lambda))\Big{)}_{t}$ is a local martingale.

Since $e^{x}\geq 1+x$ implies $e^{S(\lambda)_{t}}\geq\mathcal{E}(S(\lambda))_{t}$,

[TABLE]

Thus,

[TABLE]

Set

[TABLE]

we have

[TABLE]

On $A$ ,

[TABLE]

then

[TABLE]

Taking $\lambda=\frac{x}{y^{2}+Kx}$, which satisfies $0<\lambda<\frac{1}{K}$, we obtain

[TABLE]
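The final optimisation step can be checked numerically. The sketch below assumes that the preceding displays lead to a Chernoff-type exponent of the form $-\lambda x+\frac{\lambda^{2}y^{2}}{2(1-\lambda K)}$ for $0<\lambda<\frac{1}{K}$; this intermediate form is a reconstruction from the Bernstein condition (1.14), not a quotation of the paper. With the choice $\lambda=\frac{x}{y^{2}+Kx}$, the value consistent with the $y^{2}$ in Theorem 1.1, the exponent equals $-\frac{x^{2}}{2(xK+y^{2})}$. The parameter triples are arbitrary test values.

```python
import math


def exponent(lam, x, y, K):
    """Assumed Chernoff exponent -lam*x + lam^2 y^2 / (2 (1 - lam K)), 0 < lam < 1/K."""
    return -lam * x + lam * lam * y * y / (2.0 * (1.0 - lam * K))


def bernstein_exponent(x, y, K):
    """Exponent -x^2 / (2 (x K + y^2)) appearing in Theorem 1.1."""
    return -x * x / (2.0 * (x * K + y * y))


for x, y, K in [(1.0, 1.0, 1.0), (3.0, 0.5, 2.0), (0.2, 4.0, 0.1)]:
    lam = x / (y * y + K * x)  # always lies strictly between 0 and 1/K
    print(lam, exponent(lam, x, y, K), bernstein_exponent(x, y, K))
```

This value of $\lambda$ is a convenient choice that reproduces the stated bound; it is not claimed to be the exact minimiser of the exponent.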

**Proof of Theorem 1.2**

By Theorem 1.1, we can obtain

[TABLE]

and

[TABLE]

for $u>0$ . We set $X^{\psi_{0}}=0$ .

Consider an admissible sequence $(\mathcal{B}_{n})$ such that

[TABLE]

where $\Upsilon_{1}(B_{n}(\psi))$ is the diameter of the set $B_{n}(\psi)$ for $d_{1}$ , and an admissible sequence $(\mathcal{C}_{n})$ such that

[TABLE]

where $\Upsilon_{2}(C_{n}(\psi))$ is the diameter of the set $C_{n}(\psi)$ for $d_{2}$ .

We define partitions $\mathcal{A}_{n}$ of $\Psi$ as follows: $\mathcal{A}_{0}=\{\Psi\}$,

[TABLE]
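In the standard generic chaining construction, which the inclusion $A_{n-1}(\psi)\subset B_{n-2}(\psi)$ used later in the proof suggests, $\mathcal{A}_{n}$ is the partition generated by $\mathcal{B}_{n-1}$ and $\mathcal{C}_{n-1}$. Its cardinality is then at most $2^{2^{n-1}}\cdot 2^{2^{n-1}}=2^{2^{n}}$, so the sequence remains admissible. The following minimal sketch computes such a common refinement for two small hypothetical partitions given as label lists; the data and the function name are illustrative assumptions.

```python
def common_refinement(labels_b, labels_c):
    """Partition generated by two partitions of the same index set.

    Each partition is given as a list of block labels, one per point; two
    points share a block of the refinement iff they agree in both lists.
    """
    blocks = {}
    for point, key in enumerate(zip(labels_b, labels_c)):
        blocks.setdefault(key, []).append(point)
    return list(blocks.values())


# Hypothetical partitions of the eight points 0..7.
labels_b = [0, 0, 0, 0, 1, 1, 1, 1]  # plays the role of B_{n-1}, 2 blocks
labels_c = [0, 0, 1, 1, 1, 1, 2, 2]  # plays the role of C_{n-1}, 3 blocks
refinement = common_refinement(labels_b, labels_c)
print(refinement)  # [[0, 1], [2, 3], [4, 5], [6, 7]]
```

Every block of the refinement lies inside one block of each input partition, so its diameter under $d_{1}$ is controlled by $\mathcal{B}_{n-1}$ and its diameter under $d_{2}$ by $\mathcal{C}_{n-1}$.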

Consider a set $\Phi_{n}$ that contains exactly one point of each element of $\mathcal{A}_{n}$. For $\psi\in\Psi$, let $\pi_{n}(\psi)$ be the element of $\Phi_{n}$ that belongs to $A_{n}(\psi)$. We can easily obtain

[TABLE]

Let $\Lambda_{n}$ be the event defined by

[TABLE]

For $u>1$ , it easily follows

[TABLE]

Letting $\Omega_{u}=\cap_{n\geq 1}\Lambda_{n}$ , we obtain

[TABLE]

On $\Omega_{u}$ ,

[TABLE]

Hence,

[TABLE]

where

[TABLE]

Thus

[TABLE]

Obviously,

[TABLE]

When $n\geq 2$ , we have $\pi_{n}(\psi)$ , $\pi_{n-1}(\psi)\in A_{n-1}(\psi)\subset B_{n-2}(\psi)$ , so that

[TABLE]

Thus

[TABLE]

Proceeding similarly for $d_{2}$ , we obtain

[TABLE]

on $\Omega_{u}$. Thus

[TABLE]

which completes the proof of (1.21).

We can obtain (1.22) through

[TABLE]

Remark 2.1

Let $(\Psi,d)$ be a metric space, and let $\big{(}X^{\psi},\psi\in\Psi\big{)}$ be a family of stochastic processes defined on a probability space $(\Omega,\mathcal{F},\textsf{ P})$ . A primary problem is to study the bounds for $\textsf{E}\sup_{\psi\in\Psi}X^{\psi}$ , where

[TABLE]

However, this is far from easy for general processes. The generic chaining method was introduced by Talagrand in a series of articles to deal with $\textsf{E}\sup_{\psi\in\Psi}X^{\psi}$. In particular, under the increment condition

[TABLE]

Talagrand [15] proved

[TABLE]

In addition, if the condition (2.31) is replaced by

[TABLE]

then it follows

[TABLE]

Theorem 1.2 implies that if the following increment condition is satisfied

[TABLE]

then (1.21) still holds true.

3 Applications

In this section we first apply the previous results to functional index empirical processes. Consider an adapted stationary time series $(Y_{n})_{n\geq 0}$ on the discrete time stochastic basis $(\Omega,\mathcal{F},(\mathcal{F}_{n})_{n\geq 0},\textsf{P})$. Let $\Psi$ be the space of all bounded measurable functions on $\mathbb{R}$. For $\psi\in\Psi$, define

[TABLE]

Obviously, for each $\psi$, $\{X_{n}^{\psi}\}_{n\geq 0}$ is a discrete time martingale. Note also that $X_{n}^{\psi}$ can be realized as a stochastic integral of $\psi$ with respect to a multivariate point process. In fact, let $T_{k}=k$, $X_{k}=Y_{k}$, $\psi(k,x)=\psi(x_{k})$; then

[TABLE]

A simple computation shows

[TABLE]

and

[TABLE]

As a direct consequence of Theorem 1.1, we have

Theorem 3.1

Suppose that, for all $k>0$ and some $0<K<\infty$

[TABLE]

Then for each $x>0,$ $y>0$ ,

[TABLE]

Remark 3.2

If we denote

[TABLE]

and

[TABLE]

The conditions (3.5) and (3.6) imply the conditions (1.14) in Theorem 1.1 for $\Xi(\psi)_{n}$ and $Q(\psi,m)_{n}$ .

Furthermore, for fixed $n>0$, we define the metric

[TABLE]

where $||\cdot||_{\infty}$ stands for norm of $L^{\infty}$ .

Theorem 3.3

Suppose that, for all $\psi\in\Psi$ and some $0<K<\infty$ , (3.5) and (3.6) hold. Then

[TABLE]

Proof. It follows directly from the proof of Theorem 1.2

[TABLE]

Note

[TABLE]

Then (3.12) easily holds. \qed

As a special example of functional index empirical processes, we consider the nonparametric maximum likelihood estimators below.

Let $\mathcal{P}=\{P_{\theta},\theta\in\Theta\}$ be a family of probability measures dominated by Lebesgue measure. Denote the density of $P_{\theta}$ by $f_{\theta}=\frac{dP_{\theta}}{dx}$, $\theta\in\Theta$. Fix $\theta_{0}\in\Theta$ such that $f_{\theta_{0}}>0$, and let $X_{1},X_{2},\cdots$ be a sequence of i.i.d. observations from $P_{0}=P_{\theta_{0}}$. Define the empirical distribution

[TABLE]

on the basis of the first $n$ observations. The maximum likelihood estimator $\hat{\theta}_{n}$ of $\theta_{0}$ is defined by

[TABLE]

We assume throughout that a $\hat{\theta}_{n}$ exists.

The rate of convergence of $f_{\hat{\theta}_{n}}$ to $f_{\theta_{0}}$ is a central question in nonparametric statistical inference. Recall that the Hellinger distance is commonly used to measure the distance between two probability measures. In particular, for a pair $(P,\bar{P})$ of probability measures the Hellinger distance $H(P,\bar{P})$ is defined by

[TABLE]

where $Q$ is a measure dominating $P$ and $\bar{P}$ .

In our setting $Q$ is Lebesgue measure, $f=\frac{dP}{dx}$, $\bar{f}=\frac{d\bar{P}}{dx}$, and we simply write $h(f,\bar{f})=H(P,\bar{P})$. It is natural to ask what the rate of convergence of $f_{\hat{\theta}_{n}}$ to $f_{\theta_{0}}$ is in terms of $h^{2}(f_{\hat{\theta}_{n}},f_{\theta_{0}})$. We have the following result in this direction. Denote $\mathcal{G}=\{g_{\theta}:=\sqrt{\frac{f_{\theta}}{f_{\theta_{0}}}}-1,~{}~{}\theta\in\Theta\}$, and set

[TABLE]
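The link between the class $\mathcal{G}$ and the Hellinger distance can be seen in a small numerical example. The sketch below assumes the common normalisation $h^{2}(f,\bar{f})=\frac{1}{2}\int(\sqrt{f}-\sqrt{\bar{f}})^{2}$, which may differ from the paper's convention by a constant factor, and replaces Lebesgue measure by counting measure on four points. Under these assumptions the mean of $g_{\theta}$ under $P_{0}$ equals $-h^{2}(f_{\theta},f_{\theta_{0}})$. The two densities are hypothetical.

```python
import math


def hellinger_squared(f, f_bar):
    """h^2 = (1/2) * sum (sqrt(f) - sqrt(f_bar))^2 for densities on a finite set.

    The factor 1/2 is an assumed normalisation; counting measure stands in
    for Lebesgue measure.
    """
    return 0.5 * sum((math.sqrt(p) - math.sqrt(q)) ** 2 for p, q in zip(f, f_bar))


def mean_of_g(f_theta, f_0):
    """E_0[g_theta] with g_theta = sqrt(f_theta / f_0) - 1, expectation under f_0."""
    return sum(q * (math.sqrt(p / q) - 1.0) for p, q in zip(f_theta, f_0))


f_0 = [0.1, 0.2, 0.3, 0.4]          # hypothetical true density, strictly positive
f_theta = [0.25, 0.25, 0.25, 0.25]  # hypothetical candidate density
print(hellinger_squared(f_theta, f_0), mean_of_g(f_theta, f_0))
```

The identity follows from $\int\sqrt{f_{\theta}f_{\theta_{0}}}=1-h^{2}(f_{\theta},f_{\theta_{0}})$, and it explains why centred sums of $g_{\theta}(X_{i})$ are the natural objects to control when bounding $h^{2}(f_{\hat{\theta}_{n}},f_{\theta_{0}})$.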

Theorem 3.4

Suppose that there is a positive constant $0<K<\infty$ such that for all $\theta\in\Theta$

[TABLE]

Then it follows

[TABLE]

Proof.

Since $\log(1+x)\leq x$ for any $x>-1$ ,

[TABLE]

Also, according to (3.16),

[TABLE]

Thus we have

[TABLE]

We now can complete the proof by Theorem 3.3. \qed

Remark 3.5

van de Geer [16, 17] discussed a similar problem for maximum likelihood estimators. To our knowledge, Theorem 3.4 is new in this area. We also remark that Theorem 3.4 can be extended, using Theorems 1.2 and 3.3, to the stationary sample case with suitable maximum likelihood estimators. This is left to future work.

Acknowledgments

This research was supported by the National Natural Science Foundation of China (No. 11371317, 11171303) and the Fundamental Research Fund of Shandong University (No. 2016GN019).

References

[1] Bennett, G. Probability inequalities for the sum of independent random variables. J. Amer. Statist. Assoc., 57, 33-45, (1962).
[2] Bennett, G. On the probability of large deviations from the expectation for sums of bounded independent random variables. Biometrika, 50, 528-535, (1963).
[3] Bercu, B., Delyon, B. and Rio, E. Concentration inequalities for sums and martingales. Springer Briefs in Mathematics, Springer, (2015).
[4] Bentkus, V. An inequality for tail probabilities of martingales with differences bounded from one side. J. Theoret. Probab., 16, 161-173, (2003).
[5] Bentkus, V. On Hoeffding's inequalities. Ann. Probab., 32, 1650-1673, (2004).
[6] Bernstein, S.N. Theory of Probability, Moscow, (1927).
[7] Fan, X., Grama, I. and Liu, Q. Exponential inequalities for martingales with applications. Electron. J. Probab., 20, (2015).
[8] Freedman, D. On tail probabilities for martingales. Ann. Probability, 3, 100-118, (1975).