Second Order Expansions for Sample Median with Random Sample Size

Gerd Christoph; Vladimir V. Ulyanov; Vladimir E. Bening

arXiv:1905.07765·math.ST·June 25, 2020


TL;DR

This paper develops second order asymptotic expansions for the sample median when the sample size is random, extending classical results to more realistic scenarios where sample size varies unpredictably.

Contribution

It introduces novel second order Chebyshev–Edgeworth and Cornish–Fisher expansions for the median with a specific type of random sample size, advancing asymptotic theory.

Findings

01. Derived second order expansions for median with random sample size

02. Applied expansions to Student's t- and Laplace distributions

03. Enhanced understanding of median's asymptotic behavior under randomness

Abstract

In practice, we often encounter situations where the sample size is not defined in advance and can be a random value. The randomness of the sample size crucially changes the asymptotic properties of the underlying statistic. In the present paper second order Chebyshev-Edgeworth and Cornish-Fisher expansions based on Student's $t$- and Laplace distributions and their quantiles are derived for the sample median with random sample size of a special kind.

Equations

\overline{T}_{N_{n}}(\omega):=\overline{T}_{N_{n}(\omega)}\big(X_{1}(\omega),\dots,X_{N_{n}(\omega)}\big),\quad\omega\in\Omega,

S_{N_{n}}=\sum_{k=1}^{N_{n}}X_{k}\quad\mbox{and}\quad T_{N_{n}}=\frac{1}{N_{n}}\sum_{k=1}^{N_{n}}X_{k}=\frac{1}{N_{n}}S_{N_{n}},

M_{m}=\left\{\begin{array}{ll}X_{(j)},&\quad m=2j-1,\\ (X_{(j)}+X_{(j+1)})/2,&\quad m=2j,\end{array}\right.\qquad j,m\in\mathbb{N}.

\sup_{x\in\mathbb{R}}\left|\mathbb{P}_{\theta}\Big(2p_{X}(0)\sqrt{m}\,(M_{m}-\theta)\leq x\Big)-\Phi(x)\right|\to 0\quad\mbox{as}\quad m\to\infty,

\Phi(x)=\int_{-\infty}^{x}\varphi(y)\,dy\quad\mbox{with}\quad\varphi(y)=\frac{1}{\sqrt{2\pi}}\,e^{-y^{2}/2}.

s_{\nu}(x)=\frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\,\Big(1+\frac{x^{2}}{\nu}\Big)^{-(\nu+1)/2},\quad\nu>0,\quad x\in\mathbb{R},

t_{a}(x)=\frac{a-|x|}{a^{2}}\,1_{(-a,a)}(x),\quad a>0,\quad\mbox{with}\quad 1_{A}(x):=\left\{\begin{array}{ll}1,&x\in A,\\ 0,&x\notin A,\end{array}\right.\quad A\subset\mathbb{R},

u_{a}(x)=\frac{1}{2a}\,1_{(-a,a)}(x),\quad a>0,

l_{\mu}(x)=\frac{1}{\sqrt{2}\,\mu}\,e^{-\sqrt{2}\,|x|/\mu},\quad x\in\mathbb{R},\quad\mu>0.

\left.\begin{array}{llll}\bullet\ \varphi(x):&p_{0}=1/\sqrt{2\pi},&p_{1}=0,&p_{2}=-1/\sqrt{2\pi},\\ \bullet\ s_{\nu}(x):&p_{0}=\frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)},&p_{1}=0,&p_{2}=-\frac{\Gamma((\nu+3)/2)}{\sqrt{\nu\pi}\,\Gamma((\nu+2)/2)},\\ \bullet\ t_{a}(x):&p_{0}=a^{-1},&p_{1}=-a^{-2},&p_{2}=0,\\ \bullet\ u_{a}(x):&p_{0}=(2a)^{-1},&p_{1}=0,&p_{2}=0,\\ \bullet\ l_{\mu}(x):&p_{0}=1/(\sqrt{2}\,\mu),&p_{1}=-\mu^{-2},&p_{2}=\sqrt{2}\,\mu^{-3}.\end{array}\right\}

m^{*}=2\,[m/2]=\left\{\begin{array}{ll}m&\mbox{for even $m$,}\\ m-1&\mbox{for odd $m$.}\end{array}\right.

\sup_{x\in\mathbb{R}}\left|\mathbb{P}_{\theta}\Big(2p_{0}\sqrt{m^{*}}(M_{m}-\theta)\leq x\Big)-\Phi(x)-\frac{f_{1}(x)}{\sqrt{m^{*}}}-\frac{f_{2}(x)}{m^{*}}\right|\leq\frac{C_{1}}{m^{3/2}},

f_{1}(x)=\frac{p_{1}x|x|}{4p_{0}^{2}}\varphi(x)\quad\mbox{and}\quad f_{2}(x)=\frac{x}{4}\Big(3+x^{2}+\frac{p_{2}x^{2}}{6p_{0}^{3}}-\frac{p_{1}^{2}x^{4}}{8p_{0}^{4}}\Big)\varphi(x).

\sup_{x\in\mathbb{R}}\left|\mathbb{P}_{\theta}\Big(2p_{0}\sqrt{m^{*}}(M_{m}-\theta)\leq x\Big)-\Phi(x)-\frac{f_{1}(x)}{\sqrt{m}}-\frac{f_{2}(x)}{m}\right|\leq\frac{C_{2}}{m^{3/2}},

\left|\mathbb{P}_{\theta}\Big(2p_{0}\sqrt{m^{*}}(M_{m^{*}}-\theta)\leq x\Big)-\mathbb{P}_{\theta}\Big(2p_{0}\sqrt{m^{*}}(M_{m^{*}+1}-\theta)\leq x\Big)\right|\leq C\,m^{-3/2}.

\left.\begin{array}{lcl}\Gamma(z)&=&\sqrt{2\pi}\,z^{z-1/2}\,e^{-z}\,\Big(1+\frac{1}{12z}+\frac{1}{288z^{2}}+R_{3}(z)\Big),\\[4pt] \frac{1}{\Gamma(z)}&=&\frac{1}{\sqrt{2\pi}}\,z^{-z+1/2}\,e^{z}\,\Big(1-\frac{1}{12z}+\frac{1}{288z^{2}}+\tilde{R}_{3}(z)\Big),\end{array}\right\}\quad z>0,

\left.\begin{array}{ll}\sup_{y\geq 0}\left|\mathbb{P}\left(g_{n}^{-1}N_{n}\leq y\right)-H(y)\right|\leq C_{3}n^{-b},&0<b\leq 1\\[4pt] \sup_{y\geq 0}\left|\mathbb{P}\left(g_{n}^{-1}N_{n}\leq y\right)-H(y)-n^{-1}h_{2}(y)\right|\leq C_{3}n^{-b},&b>1\end{array}\right\}

\sup_{x\in\mathbb{R}}\Big|\mathbb{P}_{\theta}\Big(2p_{0}\sqrt{g_{n}\,N_{n}^{*}/N_{n}}\,(M_{N_{n}}-\theta)\leq x\Big)-G_{n}(x,1/g_{n})\Big|

\leq C_{2}\,\mathbb{E}(N_{n}^{-3/2})+(C_{3}D_{n}+C_{4})\,n^{-b},

G_{n}(x,1/g_{n})=\int_{1/g_{n}}^{\infty}\Big(\Phi(x\sqrt{y})+\frac{f_{1}(x\sqrt{y})}{\sqrt{g_{n}y}}+\frac{f_{2}(x\sqrt{y})}{g_{n}y}\Big)d\Big(H(y)+\frac{h_{2}(y)}{n}\Big),

D_{n}=\sup_{x}D_{n}(x)\leq D<\infty

D_{n}(x)=\int_{1/g_{n}}^{\infty}\Big|\frac{\partial}{\partial y}\Big(\Phi(x\sqrt{y})+\frac{f_{1}(x\sqrt{y})}{\sqrt{y\,g_{n}}}+\frac{f_{2}(x\sqrt{y})}{y\,g_{n}}\Big)\Big|\,dy,

N_{n}^{*}=2\,[N_{n}/2]=\left\{\begin{array}{ll}N_{n}&\mbox{for even realizations of $N_{n}$,}\\ N_{n}-1&\mbox{for odd realizations of $N_{n}$.}\end{array}\right.

\frac{1}{2n}\sum_{u=1}^{\infty}\int_{(2u-1)/g_{n}}^{2u/g_{n}}x\sqrt{y}\,\varphi(x\sqrt{y})\,dH(y)\leq\frac{1}{2\sqrt{2\pi e}\,n}.

g_{n}^{-1}\int_{1/g_{n}}^{\infty}\frac{f_{2}(x\sqrt{y})}{y}\,dH(y)\leq c\,g_{n}^{-b}\quad\mbox{if}\quad b<1.

\mathbb{P}_{\theta}\Big(2p_{0}\sqrt{g_{n}N_{n}^{*}/N_{n}}\,(M_{N_{n}}-\theta)\leq x\Big)=\mathbb{P}_{\theta}\Big(2p_{0}\sqrt{N_{n}^{*}}\,(M_{N_{n}}-\theta)\leq x\sqrt{N_{n}/g_{n}}\Big)

\begin{array}{l}\sup_{x}\sum_{m=1}^{\infty}\left|\mathbb{P}_{\theta}\Big(2p_{0}\sqrt{m^{*}}(M_{m}-\theta)\leq x\sqrt{m/g_{n}}\Big)-\Phi_{m}(x\sqrt{m/g_{n}})\right|\,\mathbb{P}(N_{n}=m)\\ \qquad\qquad\qquad\stackrel{(2.4)}{\leq}C_{2}\,\sum_{m=1}^{\infty}m^{-3/2}\,\mathbb{P}(N_{n}=m)=C_{2}\,\mathbb{E}(N_{n}^{-3/2}).\end{array}

\sum_{m=1}^{\infty}\Phi_{m}(x\sqrt{m/g_{n}})\,\mathbb{P}(N_{n}=m)=\mathbb{E}_{\theta}\big(\Phi_{N_{n}}(x\sqrt{N_{n}/g_{n}})\big)

=\int_{1/g_{n}}^{\infty}\Delta_{n}(x,y)\,d\,\mathbb{P}(N_{n}/g_{n}\leq y)=G_{n}(x,1/g_{n})+I,


Taxonomy

Topics: Advanced Statistical Methods and Models · Bayesian Methods and Mixture Models · Statistical Distribution Estimation and Applications

Full text

Second Order Expansions for Sample Median with Random Sample Size

Gerd Christoph, Vladimir V. Ulyanov and Vladimir E. Bening

Otto-von-Guericke University Magdeburg, Department of Mathematics,

Postfach 4120,

39016 Magdeburg, Germany.

[email protected]

Lomonosov Moscow State University,

Faculty of Computational Mathematics and Cybernetics

119991, Leninskie Gory, 1/52, Moscow, Russia.

National Research University Higher School of Economics,

101000, Myasnitskaya ulitsa, 20, Moscow, Russia

[email protected]

Lomonosov Moscow State University,

Faculty of Computational Mathematics and Cybernetics

119991, Leninskie Gory, 1/52, Moscow, Russia

[email protected]

Abstract.

In practice, we often encounter situations where the sample size is not defined in advance and can be a random value. The randomness of the sample size crucially changes the asymptotic properties of the underlying statistic. In the present paper the second order Chebyshev–Edgeworth and Cornish–Fisher expansions based on Student’s $t$- and Laplace distributions and their quantiles are derived for the sample median with random sample size of a special kind.

Key words and phrases:

Sample median; samples with random sizes; second order expansions; Laplace distribution; Student’s $t$-distribution; negative binomial distribution; discrete Pareto distribution.

2000 Mathematics Subject Classification:

60F05, 60G50, 62E17, 62H10.

1. Introduction

In classical statistical inference the number of observations is usually known. Often, however, the sample size is not known in advance, or some observations are missing, so the sample size is itself a realization of a random variable.

There are many practical situations, where it is almost impossible to have a fixed sample size. They often occur when observations are collected in a fixed time span. For example, in reliability testing this is the number of failed devices, in medicine – the number of patients with a specific disease, in finance – the number of market transactions, in queueing theory – the number of customers entering a store, in insurance – the number of claims. All these numbers are random variables.

The use of samples with random sample sizes has been steadily growing over the years. For an overview of statistical inferences with a random number of observations and some applications see, e.g. Esquível (2016) and the references therein.

Let $X_{1},X_{2},\ldots\in\mathbb{R}=(-\infty\,,\,\infty)$ and $N_{1},N_{2},\ldots\in\mathbb{N}=\{1,2,...\}$ be random variables on the same probability space $\left(\Omega,\mathbb{A},\mathbb{P}\right)$. In statistics the random variables $X_{1},X_{2},\ldots$ are the observations. Let $N_{n}$ be the random size of the underlying sample, which depends on a parameter $n\in\mathbb{N}$. We suppose for each $n\in\mathbb{N}$ that $N_{n}\in\mathbb{N}$ is independent of $X_{1},X_{2},\ldots$ and that $N_{n}\to\infty$ in probability as $n\to\infty$.

Let $T_{m}:=T_{m}\left(X_{1},\ldots,X_{m}\right)$ be some statistic of a sample with non-random sample size $m\in\mathbb{N}$ . Define the random variable $\overline{T}_{N_{n}}$ for every $n\in\mathbb{N}$ :

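$$\overline{T}_{N_{n}}(\omega):=\overline{T}_{N_{n}(\omega)}\big(X_{1}(\omega),\dots,X_{N_{n}(\omega)}\big),\quad\omega\in\Omega,\tag{1.1}$$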

i.e. $\overline{T}_{N_{n}}$ is some statistic obtained from a random sample $X_{1},X_{2},\ldots,X_{N_{n}}$ .

Gnedenko (1989) considered the asymptotic properties of the distributions of sample quantiles for samples of random size. In Nunes et al. (2019a) unknown sample sizes are assumed in medical research for the analysis of one-way fixed effects ANOVA models to avoid false rejections. Applications of orthogonal mixed models to situations with samples of random size are investigated in Nunes et al. (2019b). Esquível (2016) considered inference for the mean with known and unknown variance and inference for the variance in the normal model. Prediction intervals for future observations of generalized order statistics and confidence intervals for quantiles based on samples of random size are studied in Barakat et al. (2018) and Al-Mutairi and Raqab (2020), respectively. They illustrated their results with a real biometric data set, the duration of remission of leukemia patients treated by one drug. General asymptotic expansions for statistics with random sample sizes $\overline{T}_{N_{n}}$ are given in Bening et al. (2013), applying the corresponding asymptotic expansions for the normalized statistic $T_{m}$ and for the suitably scaled random sample size $N_{n}$.

Many models lead to random sums and random means

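$$S_{N_{n}}=\sum_{k=1}^{N_{n}}X_{k}\quad\mbox{and}\quad T_{N_{n}}=\frac{1}{N_{n}}\sum_{k=1}^{N_{n}}X_{k}=\frac{1}{N_{n}}S_{N_{n}},\tag{1.2}$$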

respectively. Wald’s identity for random sums $\mathbb{E}(S_{N_{n}})=\mathbb{E}(N_{n})\mathbb{E}(X_{1})$ if $N_{n}$ and $X_{1}$ have finite expectations is a powerful tool in statistical inference, particularly in sequential analysis, see e.g. Wald (1945) and Kolmogorov and Prokhorov (1949). Robbins (1948) proved that asymptotic normality of the index $N_{n}$ automatically implies asymptotic normality of the corresponding random sum $S_{N_{n}}$ .

The randomness of the sample size may crucially change asymptotic properties of random sums, see e.g. Gnedenko (1989) or Gnedenko and Korolev (1996). If the statistic $T_{m}$ is asymptotically normal, then the limit laws of normalized statistic $T_{N_{n}}$ are scale mixtures of normal distributions with zero mean, depending on the random sample size $N_{n}$ .

A fundamental introduction to asymptotic distributions of random sums is given in Döbler (2015). Using Stein’s method, quantitative Berry-Esseen bounds for random sums were proved in Chen et al. (2011, Theorem 10.6), Döbler (2015, Theorems 2.5 and 2.7) and Pike and Ren (2014, Theorem 1.3) in the case of approximation by normal and Laplace distributions. Moderate and large deviations are investigated in Eichelsbacher and Löwe (2019) and Klüppelberg and Mikosch (1997). Many applications of geometric random sums, when $N_{n}$ is geometrically distributed, are given in Kalashnikov (1997). Bounds on the total variation distance between a geometric random sum of independent, non-negative, integer-valued random variables and the geometric distribution are studied in Peköz et al. (2014, Section 3).

It is worth mentioning that the choice of scaling factor for the random sums $S_{N_{n}}$ or random means $T_{N_{n}}$ affects the type of limit distribution. Consider the random mean $T_{N_{n}}$ given in (1.2). For the sake of convenience let $X_{1},X_{2},...$ be independent standard normal random variables and let $N_{n}\in\mathbb{N}$ be geometrically distributed with $\mathbb{E}(N_{n})=n$ and independent of $X_{1},X_{2},...$. Then one has

[TABLE]

We have three different limit distributions. The scaled random mean $T_{N_{n}}$ is standard normally distributed when we use the random scaling factor $\sqrt{N_{n}}$, whereas it tends to the Student distribution with 2 degrees of freedom when we use the non-random scaling factor $\sqrt{\mathbb{E}N_{n}}$. Moreover, we get the Laplace distribution with variance 1 if we use scaling with the mixed factor $N_{n}/\sqrt{\mathbb{E}(N_{n})}$.

We obtain assertion (1.3) by conditioning and the stability of the normal law. The Student distribution as a limit for statistics from samples with a random sample size is proved e.g. in Bening and Korolev (2005) and Schluter and Trede (2016), hence relationship (1.4) holds. Since $N_{n}\,T_{N_{n}}=S_{N_{n}}$, statement (1.5) follows e.g. from Bening and Korolev (2008) or Schluter and Trede (2016).
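The effect of the three scalings can be checked by a small simulation. The following is only a sketch under stated assumptions: it uses the verbal description above (standard normal observations, geometric $N_{n}$ with mean $n$), draws $T_{N_{n}}$ conditionally on $N_{n}$ via the stability of the normal law, and compares the empirical share of values in $[-1,1]$ with the standard closed-form probabilities for the normal law, Student's $t$ with 2 degrees of freedom and the Laplace law with variance 1. The function names, the seed and the sample sizes are illustrative choices, not part of the paper.

```python
import math
import random


def simulate_scaled_means(n=400, reps=40_000, seed=1):
    """Draw (sqrt(N) T_N, sqrt(n) T_N, N T_N / sqrt(n)) for geometric N with mean n >= 2."""
    rng = random.Random(seed)
    log_q = math.log1p(-1.0 / n)            # log(1 - p) with success probability p = 1/n
    draws = []
    for _ in range(reps):
        u = 1.0 - rng.random()              # uniform on (0, 1]
        size = int(math.log(u) / log_q) + 1  # geometric on {1, 2, ...} with mean n
        # Given N = m, the mean of m standard normal observations is N(0, 1/m).
        t_mean = rng.gauss(0.0, 1.0) / math.sqrt(size)
        draws.append((math.sqrt(size) * t_mean,
                      math.sqrt(n) * t_mean,
                      size * t_mean / math.sqrt(n)))
    return draws


def share_within_one(draws, k):
    """Empirical P(|statistic k| <= 1)."""
    return sum(abs(d[k]) <= 1.0 for d in draws) / len(draws)


draws = simulate_scaled_means()
normal_ref = math.erf(1 / math.sqrt(2))      # P(|Z| <= 1), standard normal
student_ref = 1 / math.sqrt(3)               # P(|t_2| <= 1), Student's t, 2 d.o.f.
laplace_ref = 1 - math.exp(-math.sqrt(2))    # P(|L| <= 1), Laplace with variance 1

for k, ref in enumerate((normal_ref, student_ref, laplace_ref)):
    print(f"scaling {k}: empirical {share_within_one(draws, k):.4f}, reference {ref:.4f}")
```

The agreement is only approximate for the second and third scaling, since $N_{n}/n$ is exponentially distributed only in the limit.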

In Bening et al. (2013) first order expansions of the random mean $T_{N_{n}}=(X_{1}+...+X_{N_{n}})/N_{n}$ are proved if the sample size $N_{n}$ is negative binomial distributed with success probability $1/n$ or it is the maximum of $n$ independent identically distributed discrete Pareto random variables with tail index 1, using first order Chebyshev-Edgeworth expansions for mean $T_{m}=(X_{1}+...+X_{m})/m$ and the rate of convergence for the distribution of suitably normalized random sample size $N_{n}$ to the corresponding limit law. Second order asymptotic expansions of suitably normalized random sample size $N_{n}$ are proved in Christoph et al. (2020) which were used to derive second order Chebyshev-Edgeworth expansions for the random mean $T_{N_{n}}=(X_{1}+...+X_{N_{n}})/N_{n}$ .

In the present paper we investigate the median of a sample $\{X_{1},\ldots,X_{N_{n}}\}$ with the random sizes $N_{n}$ mentioned above.

Let $F_{X}(x\,-\,\theta)$ and $p_{X}(x\,-\,\theta)$ be the known common distribution function and the probability density function of independent components of the sample $\{X_{1},X_{2},...,X_{m}\}$ , where $\theta$ is the unknown location parameter to be estimated from the given sample. By $X_{(1)}\leq X_{(2)}\leq...\leq X_{(m)}$ we denote the order statistics constructed from the original observations $X_{1},X_{2},...,X_{m}$ .

As statistic $T_{m}$ we consider the sample median $M_{m}$ , that is,

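$$M_{m}=\left\{\begin{array}{ll}X_{(j)},&\quad m=2j-1,\\ (X_{(j)}+X_{(j+1)})/2,&\quad m=2j,\end{array}\right.\qquad j,m\in\mathbb{N}.\tag{1.6}$$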
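In code, the definition of the sample median reads as follows. This is a minimal sketch; the function name `sample_median` is an illustrative choice.

```python
def sample_median(xs):
    """Sample median M_m: X_(j) for m = 2j - 1, (X_(j) + X_(j+1)) / 2 for m = 2j."""
    order = sorted(xs)                 # order statistics X_(1) <= ... <= X_(m)
    m = len(order)
    if m == 0:
        raise ValueError("need at least one observation")
    j = (m + 1) // 2                   # m = 2j - 1 or m = 2j
    if m % 2 == 1:
        return order[j - 1]            # X_(j); Python lists are zero-based
    return (order[j - 1] + order[j]) / 2
```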

Huang (1999) discussed the even-odd phenomenon for the median in statistical literature and gave a counterexample which contradicts the statistical folklore: “It never pays to base the median on an odd number of observations”.

When looking for change points in the location parameter of a time series, tests for a change in mean may be susceptible to outliers in the data, whereas tests for a change in median may still reveal a change of the center of the marginal distribution, see Shao and Zang (2010), Vogel and Wendler (2017) and the references therein.

To perform statistical analysis of large data sets, Minsker (2019) presents new results for the median-of-means estimator using new algorithms for distributed statistical estimation that exploit a divide-and-conquer approach.

To estimate the location parameter $\theta$ one could use the random mean $T_{N_{n}}$ as well, but for its second order expansion more than the fourth moment of $X_{1}$ is required. For heavy tailed distributions $F$ of $X_{1}$ with tail index $\leq 4$ such second order Edgeworth expansions of the random mean $T_{m}$ cannot be obtained. If the tail index $\leq 1$ , then the mean does not exist: $\mathbb{E}|X_{1}|=\infty$ . The mean need not always exist, whereas the median always exists.
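The contrast between mean and median under heavy tails can be illustrated numerically. The sketch below assumes standard Cauchy observations (tail index 1) and compares the interquartile range of the sample mean with that of the sample median over repeated samples. The sample size, the number of replications and the seed are arbitrary illustrative values.

```python
import math
import random
import statistics


def cauchy_sample(m, rng):
    # Inverse-CDF draws from the standard Cauchy law (Student's t with nu = 1).
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(m)]


def iqr(values):
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def compare_mean_and_median(m=101, reps=2000, seed=7):
    """Interquartile ranges of the sample mean and the sample median over `reps` samples."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        xs = cauchy_sample(m, rng)
        means.append(statistics.fmean(xs))
        medians.append(statistics.median(xs))
    return iqr(means), iqr(medians)


iqr_mean, iqr_median = compare_mean_and_median()
print(f"IQR of sample mean:   {iqr_mean:.3f}")
print(f"IQR of sample median: {iqr_median:.3f}")
```

The mean of Cauchy observations is again standard Cauchy, so its spread does not shrink with $m$, while the spread of the median does.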

In Peña and Kim (2019) confidence regions for the median of $X_{1}$ in the nonparametric measurement error model are constructed, and several applications are given where a confidence interval for the center of a distribution is desired.

Therefore, it is reasonable to use the sample median $M_{m}$ .

The asymptotic normality of the normalized sample median $M_{m}$ is well known, see e.g. Cramér (1946, Chapter 28.5): If $F_{X}(0)=1/2$ , $p_{X}(0)>0$ and the density $p_{X}(x)$ is continuous and has a continuous derivative $p^{\prime}_{X}(x)$ in some neighborhood of $x=0$ , then

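$$\sup_{x\in\mathbb{R}}\left|\mathbb{P}_{\theta}\Big(2p_{X}(0)\sqrt{m}\,(M_{m}-\theta)\leq x\Big)-\Phi(x)\right|\to 0\quad\mbox{as}\quad m\to\infty,\tag{1.7}$$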

where $\Phi(x)$ is the standard Gaussian distribution function having density $\varphi(x)$ :

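$$\Phi(x)=\int_{-\infty}^{x}\varphi(y)\,dy\quad\mbox{with}\quad\varphi(y)=\frac{1}{\sqrt{2\pi}}\,e^{-y^{2}/2}.\tag{1.8}$$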

Instead of moment conditions now regularity assumptions on the density $p_{X}(x)$ are required:

Assumption A: The density $p_{X}(x)$ is symmetric around zero, i.e., $p_{X}(-x)=p_{X}(x)$, $x\in\mathbb{R}$, and $p_{X}(0)>0$. Moreover, the density $p_{X}(x)$ has three continuous bounded derivatives in some interval $(0,x_{0})$, $x_{0}>0$.

Define $p_{0}=p_{X}(0)>0,\quad p_{1}=p^{\prime}_{X}(0+)\quad\mbox{and}\quad p_{2}=p^{\prime\prime}_{X}(0+).$

The regularity conditions in Assumption A are fulfilled, for example, for

$\bullet$ normal density (1.8),

$\bullet$ heavy tailed Student’s $t$ -distribution $S_{\nu}(x)$ with density function

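$$s_{\nu}(x)=\frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\,\Big(1+\frac{x^{2}}{\nu}\Big)^{-(\nu+1)/2},\quad\nu>0,\quad x\in\mathbb{R},\tag{1.9}$$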

including the Cauchy distribution in the case $\nu=1$, where the degrees of freedom parameter $\nu>0$ determines the heaviness of the distribution tail,

$\bullet$ the triangular distribution with density

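$$t_{a}(x)=\frac{a-|x|}{a^{2}}\,1_{(-a,a)}(x),\quad a>0,\quad\mbox{with}\quad 1_{A}(x):=\left\{\begin{array}{ll}1,&x\in A,\\ 0,&x\notin A,\end{array}\right.\quad A\subset\mathbb{R},\tag{1.10}$$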

$\bullet$ the continuous uniform distribution or rectangular distribution with density

$$u_{a}(x)=\frac{1}{2a}\,1_{(-a,a)}(x),\quad a>0,\tag{1.11}$$

$\bullet$ and symmetric Laplace distribution $L_{\mu}(x)$ having density

$$l_{\mu}(x)=\frac{1}{\sqrt{2}\,\mu}\,e^{-\sqrt{2}\,|x|/\mu},\quad x\in\mathbb{R},\quad\mu>0.\tag{1.12}$$

The corresponding coefficients $p_{0},p_{1}$ , and $p_{2}$ in these examples are:

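$$\left.\begin{array}{llll}\bullet\ \varphi(x):&p_{0}=1/\sqrt{2\pi},&p_{1}=0,&p_{2}=-1/\sqrt{2\pi},\\ \bullet\ s_{\nu}(x):&p_{0}=\frac{\Gamma((\nu+1)/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)},&p_{1}=0,&p_{2}=-\frac{\Gamma((\nu+3)/2)}{\sqrt{\nu\pi}\,\Gamma((\nu+2)/2)},\\ \bullet\ t_{a}(x):&p_{0}=a^{-1},&p_{1}=-a^{-2},&p_{2}=0,\\ \bullet\ u_{a}(x):&p_{0}=(2a)^{-1},&p_{1}=0,&p_{2}=0,\\ \bullet\ l_{\mu}(x):&p_{0}=1/(\sqrt{2}\,\mu),&p_{1}=-\mu^{-2},&p_{2}=\sqrt{2}\,\mu^{-3}.\end{array}\right\}\tag{1.13}$$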
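The coefficients $p_{0},p_{1},p_{2}$ can be cross-checked numerically with one-sided finite differences at $0+$. The sketch below does this for the Student and Laplace cases. It assumes the Laplace density in the variance-$\mu^{2}$ parametrization $l_{\mu}(x)=e^{-\sqrt{2}|x|/\mu}/(\sqrt{2}\mu)$, which is the form consistent with $p_{0}=1/(\sqrt{2}\mu)$; the helper names and the step size `h` are illustrative.

```python
import math


def one_sided_coefficients(density, h=1e-3):
    """Approximate p0 = p(0), p1 = p'(0+), p2 = p''(0+) by second order forward differences."""
    f0, f1, f2, f3 = (density(k * h) for k in range(4))
    p1 = (-3 * f0 + 4 * f1 - f2) / (2 * h)
    p2 = (2 * f0 - 5 * f1 + 4 * f2 - f3) / h**2
    return f0, p1, p2


def student_density(nu):
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return lambda x: c * (1 + x * x / nu) ** (-(nu + 1) / 2)


def laplace_density(mu):
    # Assumed parametrization with variance mu^2.
    return lambda x: math.exp(-math.sqrt(2) * abs(x) / mu) / (math.sqrt(2) * mu)


def student_closed_form(nu):
    root = math.sqrt(nu * math.pi)
    p0 = math.gamma((nu + 1) / 2) / (root * math.gamma(nu / 2))
    p2 = -math.gamma((nu + 3) / 2) / (root * math.gamma((nu + 2) / 2))
    return p0, 0.0, p2


def laplace_closed_form(mu):
    return 1 / (math.sqrt(2) * mu), -(mu ** -2), math.sqrt(2) * mu ** -3


print(one_sided_coefficients(student_density(3.0)), student_closed_form(3.0))
print(one_sided_coefficients(laplace_density(2.0)), laplace_closed_form(2.0))
```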

Under Assumption A Burnashev (1997, Theorem 1) proved in relation (1.7) an asymptotic expansion in terms of orders $m^{-1/2}$ and $m^{-1}$ with remainder $\mathcal{O}(m^{-3/2})$ as $m\to\infty$ . In the present paper we prove a similar second order expansion for the sample median $M_{N_{n}}$ constructed from a sample with random sample size $N_{n}$ . Therefore in Section 2 we clarify the result of Burnashev (1997) in the sense that we get non-asymptotic relations for any integer $m\geq 1$ estimating the closeness of the sample median $M_{m}$ and the corresponding second order expansion by inequalities. In Section 3 we give a transition proposition from non-random to random sample size and in Sections 4 and 5 the cases of Student $t$ - and Laplace distributions as limit laws for the random median $M_{N_{n}}$ are considered. In Section 6 the Cornish-Fisher expansions for the quantiles of sample medians $M_{N_{n}}$ and $M_{m}$ are derived from the corresponding Edgeworth-type expansions.

2. Non-Asymptotic Expansions for Sample Median

Let $[y]$ denote the integer part of value $y$ . Define

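$$m^{*}=2\,[m/2]=\left\{\begin{array}{ll}m&\mbox{for even $m$,}\\ m-1&\mbox{for odd $m$.}\end{array}\right.\tag{2.1}$$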

Proposition 2.1.

Let Assumption A be satisfied, then for all $m\geq 2$ :

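$$\sup_{x\in\mathbb{R}}\left|\mathbb{P}_{\theta}\Big(2p_{0}\sqrt{m^{*}}(M_{m}-\theta)\leq x\Big)-\Phi(x)-\frac{f_{1}(x)}{\sqrt{m^{*}}}-\frac{f_{2}(x)}{m^{*}}\right|\leq\frac{C_{1}}{m^{3/2}},\tag{2.2}$$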

where $C_{1}$ does not depend on $m$ ,

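$$f_{1}(x)=\frac{p_{1}x|x|}{4p_{0}^{2}}\varphi(x)\quad\mbox{and}\quad f_{2}(x)=\frac{x}{4}\Big(3+x^{2}+\frac{p_{2}x^{2}}{6p_{0}^{3}}-\frac{p_{1}^{2}x^{4}}{8p_{0}^{4}}\Big)\varphi(x).\tag{2.3}$$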

Since $0<(m-1)^{-\alpha}-m^{-\alpha}\leq 2\,m^{-3/2}$ for $m\geq 2$ and $\alpha=1/2$ or $\alpha=1$ an immediate consequence of inequality (2.2) is

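$$\sup_{x\in\mathbb{R}}\left|\mathbb{P}_{\theta}\Big(2p_{0}\sqrt{m^{*}}(M_{m}-\theta)\leq x\Big)-\Phi(x)-\frac{f_{1}(x)}{\sqrt{m}}-\frac{f_{2}(x)}{m}\right|\leq\frac{C_{2}}{m^{3/2}},\tag{2.4}$$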

where (2.4) for $m=1$ is trivial and $C_{2}$ does not depend on $m$.
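For odd $m=2j-1$ the distribution of the median is available exactly, since $M_{m}\leq t$ holds if and only if at least $j$ observations are $\leq t$. This allows a numerical check of the expansion in Proposition 2.1. The sketch below assumes Laplace observations with variance $\mu^{2}$ and $\theta=0$, uses the coefficients $p_{0}=1/(\sqrt{2}\mu)$, $p_{1}=-\mu^{-2}$, $p_{2}=\sqrt{2}\mu^{-3}$, and compares the exact distribution function with $\Phi(x)$ alone and with $\Phi(x)+f_{1}(x)/\sqrt{m^{*}}+f_{2}(x)/m^{*}$ on a grid. The grid, the value of $m$ and all function names are illustrative choices.

```python
import math


def phi(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def Phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def second_order_cdf(x, m, p0, p1, p2):
    """Phi(x) + f1(x)/sqrt(m*) + f2(x)/m* with m* = 2[m/2]."""
    m_star = 2 * (m // 2)
    f1 = p1 * x * abs(x) / (4 * p0**2) * phi(x)
    f2 = x / 4 * (3 + x**2 + p2 * x**2 / (6 * p0**3)
                  - p1**2 * x**4 / (8 * p0**4)) * phi(x)
    return Phi(x) + f1 / math.sqrt(m_star) + f2 / m_star


def exact_median_cdf_odd(t, m, cdf):
    """P(M_m <= t) for odd m = 2j - 1: at least j of the m observations are <= t."""
    j = (m + 1) // 2
    F = cdf(t)
    return sum(math.comb(m, k) * F**k * (1 - F) ** (m - k) for k in range(j, m + 1))


def laplace_cdf(t, mu=1.0):
    # Assumed Laplace law with variance mu^2, centred at theta = 0.
    tail = 0.5 * math.exp(-math.sqrt(2) * abs(t) / mu)
    return 1 - tail if t >= 0 else tail


def sup_errors(m=101, mu=1.0):
    """Grid approximation of the sup-distance to Phi and to the second order expansion."""
    p0, p1, p2 = 1 / (math.sqrt(2) * mu), -(mu ** -2), math.sqrt(2) * mu ** -3
    scale = 2 * p0 * math.sqrt(2 * (m // 2))      # 2 p0 sqrt(m*)
    err_normal = err_second = 0.0
    for i in range(-300, 301):
        x = i / 100
        exact = exact_median_cdf_odd(x / scale, m, lambda t: laplace_cdf(t, mu))
        err_normal = max(err_normal, abs(exact - Phi(x)))
        err_second = max(err_second, abs(exact - second_order_cdf(x, m, p0, p1, p2)))
    return err_normal, err_second


err_normal, err_second = sup_errors()
print(f"normal approximation error: {err_normal:.5f}")
print(f"second order error:         {err_second:.5f}")
```

Since $p_{1}\neq 0$ for the Laplace density, the normal approximation alone has an error of order $m^{-1/2}$, which the two correction terms remove.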

Remarks: 1. If the parent distribution of the sample $\{X_{1},...,X_{m}\}$ has the normal density (1.8), Student’s $t$-density (1.9) or the continuous uniform density (1.11), then by (1.13) the first term $f_{1}(x)/\sqrt{m^{*}}$ vanishes since in these cases $p_{1}=0$. Therefore the convergence rate of the distribution of the sample median $M_{m}$ to normality has order $m^{-1}$. The triangular density (1.10) and the Laplace density (1.12) have discontinuous derivatives at $x=0$; nevertheless the one-sided derivative $p_{1}=p^{\prime}_{X}(0+)$ exists with $p_{1}\neq 0$ (by (1.13) it is negative), and the convergence rate to normality has the order $m^{-1/2}$.

In Cramér (1946, Chapter 28.5) asymptotic normality (1.7) requires that the density $p_{X}(x)$ has a continuous derivative $p^{\prime}_{X}(x)$ in some neighborhood of $x=0$.

2. As in Burnashev (1997) the natural normalizing factor in (2.2) is $\sqrt{m^{*}}$, i.e., $\sqrt{m-1}$ for odd $m\geq 3$ and $\sqrt{m}$ for even $m$. He also proved for all $m\geq 2$

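$$\left|\mathbb{P}_{\theta}\Big(2p_{0}\sqrt{m^{*}}(M_{m^{*}}-\theta)\leq x\Big)-\mathbb{P}_{\theta}\Big(2p_{0}\sqrt{m^{*}}(M_{m^{*}+1}-\theta)\leq x\Big)\right|\leq C\,m^{-3/2}.$$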

Hence, for the sample median $M_{m}$ each odd observation adds an amount of information of order $m^{-3/2}$ and not $m^{-1}$ as usual if the normalizing factor by $(M_{m}-\theta)$ is $\sqrt{m}$ .

Proof of Proposition 2.1: Following the detailed proof of Burnashev (1997, Theorem 1), one has to replace Stirling’s formula for the Gamma functions $\Gamma(z)$ and $1/\Gamma(z)$ as $z\to\infty$ by the inequalities proved in Nemes (2015, Theorem 1.3):

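$$\left.\begin{array}{lcl}\Gamma(z)&=&\sqrt{2\pi}\,z^{z-1/2}\,e^{-z}\,\Big(1+\frac{1}{12z}+\frac{1}{288z^{2}}+R_{3}(z)\Big),\\[4pt] \frac{1}{\Gamma(z)}&=&\frac{1}{\sqrt{2\pi}}\,z^{-z+1/2}\,e^{z}\,\Big(1-\frac{1}{12z}+\frac{1}{288z^{2}}+\tilde{R}_{3}(z)\Big),\end{array}\right\}\quad z>0,$$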

with $\max\{|R_{3}(z)|,|\tilde{R}_{3}(z)|\}\leq cz^{-3}$ and $c=\frac{(1+\zeta(3))\Gamma(3)(2\sqrt{3}+1)}{2\,(2\pi)^{4}}\approx 0.0063$.

Here $\zeta(z)$ is the Riemann zeta function with $\zeta(3)\approx 1.202$.

Finally, whenever Taylor’s formula is used with a remainder in big $\mathcal{O}$ notation, the remainder has to be estimated in Lagrange form by an inequality. The constants $C_{1},C_{2}>0$ in (2.2) and (2.4) depend only on $p_{0},p_{1},p_{2}$ and the upper bound of $p_{X}^{\prime\prime\prime}(x)$ in some interval $(0,x_{0}),\,\,x_{0}>0$. $\Box$

3. Transfer Proposition from Non-Random to Random Sample Sizes

Suppose that distribution functions of the random sample size $N_{n}$ satisfy the following condition.

Assumption B: There exist a distribution function $H(y)$ with $H(0+)=0$ , a function of bounded variation $h_{2}(y)$ with $h_{2}(0)=h_{2}(\infty)=0$ , a sequence $0<g_{n}\uparrow\infty$ and real numbers $b>0$ and $C_{3}>0$ such that for all $n\in\mathbb{N}$

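$$\left.\begin{array}{ll}\sup_{y\geq 0}\left|\mathbb{P}\left(g_{n}^{-1}N_{n}\leq y\right)-H(y)\right|\leq C_{3}n^{-b},&0<b\leq 1\\[4pt] \sup_{y\geq 0}\left|\mathbb{P}\left(g_{n}^{-1}N_{n}\leq y\right)-H(y)-n^{-1}h_{2}(y)\right|\leq C_{3}n^{-b},&b>1\end{array}\right\}\tag{3.1}$$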

Theorem 3.1.

Let both Assumptions A and B be satisfied. Then the following inequality holds for all $n\in\mathbb{N}$ :

[TABLE]

where $f_{1}(z),f_{2}(z),h_{2}(y)$ are given in (2.3) and (3.1) and

[TABLE]

The positive constants $C_{2},C_{3},C_{4},D$ do not depend on $n$ .

Remarks: 1. The scaling factor $\sqrt{g_{n}\,N_{n}^{*}/N_{n}}$ seems to be the natural one for the median of a sample with random sample size $N_{n}$, since $N_{n}/g_{n}$ has a known limit distribution and $N_{n}^{*}$ has the same structure as $m^{*}$ in Burnashev (1997).

2. Without the quotient $N_{n}^{*}/N_{n}$ in the scaling factor $\sqrt{g_{n}\,N_{n}^{*}/N_{n}}$ an additional term in the expansion occurs:

[TABLE]

3. The lower bound of the integral in (3.3) depends on $g_{n}$ which can affect the coefficients at $1/\sqrt{g_{n}}$ and $1/g_{n}$ in the approximation. For example the proof of Theorem 4.2 in Section 4 shows that among other integrals

[TABLE]

Proof of Theorem 3.1: The proof follows the same lines as that of the more general transfer theorem in Bening et al. (2013, Theorem 3.1), under the conditions of our Theorem 3.1. Conditioning on $N_{n}$, we have

[TABLE]

Using now (2.4) with $\Phi_{m}(z):=\Phi(z)+m^{-1/2}f_{1}(z)+m^{-1}f_{2}(z)$ :

[TABLE]

Taking into account $\mathbb{P}\big(N_{n}/g_{n}<1/g_{n}\big)=\mathbb{P}\big(N_{n}<1\big)=0$ we obtain

[TABLE]

where $\Delta_{n}(x,y):=\Phi(x\sqrt{y})+f_{1}(x\sqrt{y})/\sqrt{g_{n}y}+f_{2}(x\sqrt{y})/(g_{n}y)$ , $G_{n}(x,1/g_{n})$ is defined in (3.3) and

[TABLE]

To estimate the integral $I$ we use integration by parts for Lebesgue-Stieltjes integrals.

[TABLE]

First we calculate $(\partial/\partial\,y)\Delta_{n}(x,y)$ . Obviously $\frac{\displaystyle\partial}{\displaystyle\partial y}\Phi(x\,\sqrt{y})=\,\frac{\displaystyle x}{\displaystyle 2}\,y^{-1/2}\varphi(x\,\sqrt{y})$ and

[TABLE]

where $a_{0}=p_{1}/(4p_{0})$ , $a_{1}=1+p_{2}/(6p_{0}^{3})$ and $a_{2}=p_{1}^{2}/(8p_{0}^{4})$ , see (2.3).

The functions $f_{k}(z)$ and $q_{k}(z)$, $k=1,2$, are bounded; we suppose

[TABLE]

To estimate $D_{n}$ defined in (3.4) we consider $D_{n}(x)$ for $x\not=0$ since $D_{n}(0)=0$. Because $0\leq\int_{1/g_{n}}^{\infty}(\partial/\partial\,y)\Phi(x\sqrt{y})dy=1-\Phi(x/\sqrt{g_{n}})\leq 1/2$ for $x>0$ and $\int_{1/g_{n}}^{\infty}|(\partial/\partial\,y)\Phi(x\sqrt{y})|dy=\Phi(x/\sqrt{g_{n}})\leq 1/2$ for $x<0$ we find with (3.8) $D_{n}(x)\leq 1/2+c_{1}^{**}+c_{2}^{**}/8$ for $x\not=0$. Therefore inequality (3.4) holds with $D=1/2+c_{1}^{**}+c_{2}^{**}/8$. It follows now from (3.1) and (3.8) that

[TABLE]

and $C_{4}=(1+c_{1}^{*}+c_{2}^{*})\,C_{3}$ . Theorem 3.1 is proved. $\Box$

Theorem 3.2.

Under the conditions of Theorem 3.1 and the following additional conditions on the functions $H(.)$ and $h_{2}(.)$, depending on the convergence rate $b>0$ in (3.1):

[TABLE]

we obtain for the function $G_{n}(x,1/g_{n})$ defined in (3.3):

[TABLE]

with

[TABLE]

and

[TABLE]

Remarks: If $b>1/2$ then (3.9ii) implies (3.9i). If $b>1$ then (3.9iii) implies (3.9ii) and (3.9i). Conditions (3.9) and (3.10) lead to the range of the integrals in (3.12) which ensures (3.11). The length of the asymptotic expansion is defined by (3.12).

Proof of Theorem 3.2: Using condition (3.9i) we find

[TABLE]

It follows from (3.8), (3.9ii) and (3.9iii) that for $k=1,2$

[TABLE]

Integration by parts, $|z|\varphi(z)/2\leq c^{*}=(8\,\pi\,e)^{-1/2}$ , (3.10i) and (3.10ii) lead to

[TABLE]

Taking into account (3.3), (3.12), (3.13) and (3.14) we obtain (3.11). $\Box$

In the next two sections we use both Theorems 3.1 and 3.2 when the scale mixture $G(x)=\int_{0}^{\infty}\Phi(x\,\sqrt{y})dH(y)$ as limiting distribution of $M_{N_{n}}$ can be expressed in terms of the well-known distributions. We obtain non-asymptotic results like in Proposition 2.1 for the sample median $M_{N_{n}}$ , using second order approximations for both the statistic $M_{m}$ and for the random sample size $N_{n}$ . In both cases the jumps of the distribution function of the random sample size $N_{n}$ only affect the function $h_{2}(y)$ in formula (3.1).
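The scale mixture $G(x)=\int_{0}^{\infty}\Phi(x\sqrt{y})\,dH(y)$ can be evaluated numerically when $H$ has a density. The following sketch assumes $H$ is the Gamma law with shape and rate both equal to $r$ and checks, for $r=1$, the standard fact that the mixture equals the distribution function of Student's $t$ with 2 degrees of freedom. The quadrature settings and the function names are illustrative assumptions, and the Gamma helper is meant for $r\geq 1$ only.

```python
import math


def Phi(x):
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def scale_mixture_cdf(x, mixing_density, upper=8.0, steps=4000):
    """Simpson approximation of G(x) = int_0^inf Phi(x sqrt(y)) h(y) dy.

    The substitution y = s^2 removes the square-root kink at y = 0;
    `steps` must be even and `upper` bounds the range of s.
    """
    def integrand(s):
        return Phi(x * s) * mixing_density(s * s) * 2 * s

    h = upper / steps
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


def gamma_rr_density(r):
    # Gamma density with shape r and rate r (mean 1); intended for r >= 1.
    c = r**r / math.gamma(r)
    return lambda y: c * y ** (r - 1) * math.exp(-r * y)


def student2_cdf(x):
    # Closed-form distribution function of Student's t with 2 degrees of freedom.
    return 0.5 + x / (2 * math.sqrt(2 + x * x))


for x in (-2.0, 0.0, 1.0, 3.0):
    print(x, scale_mixture_cdf(x, gamma_rr_density(1.0)), student2_cdf(x))
```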

4. Student’s Distribution as Limit for Random Sample Median $M_{N_{n}}$

Let the sample size $N_{n}(r)$ be negative binomial distributed (shifted by 1) with parameters $1/n$ and $r>0$, having probability mass function

[TABLE]

with $g_{n}=\mathbb{E}(N_{n}(r))=r\,(n-1)+1$. Schluter and Trede (2016, Section 2.1) pointed out that the negative binomial distribution is one of the two leading cases for count models; it accommodates the over-dispersion typically observed in count data (which the Poisson model cannot). They showed in a general unifying framework

[TABLE]

where $G_{r,r}(x)$ is the Gamma distribution function with the shape parameter which coincides with the scale parameter and equals $r>0$ , having density

[TABLE]

The statement (4.2) was proved earlier in Bening and Korolev (2005, Lemma 2.2).
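The convergence in (4.2) can be checked numerically. The following minimal sketch assumes that (4.1) is the usual shifted negative binomial law, $\mathbb{P}(N_{n}(r)=j)=\frac{\Gamma(j+r-1)}{(j-1)!\,\Gamma(r)}\,n^{-r}(1-1/n)^{j-1}$ for $j=1,2,\dots$ , and that $G_{r,r}$ has density $r^{r}x^{r-1}e^{-rx}/\Gamma(r)$ on $(0,\infty)$ . The function names are illustrative, and the Gamma distribution function is coded only for integer $r$ (Erlang form).

```python
import math


def nb_shifted_pmf(j, r, n):
    """P(N_n(r) = j), j = 1, 2, ... (assumed form of (4.1); requires n >= 2)."""
    log_p = (math.lgamma(j + r - 1) - math.lgamma(j) - math.lgamma(r)
             - r * math.log(n) + (j - 1) * math.log1p(-1.0 / n))
    return math.exp(log_p)


def gamma_rr_cdf_integer(x, r):
    """G_{r,r}(x) for integer r: 1 - exp(-r x) * sum_{k < r} (r x)^k / k!."""
    if x <= 0:
        return 0.0
    t = r * x
    return 1.0 - math.exp(-t) * sum(t ** k / math.factorial(k) for k in range(r))


def mean_and_sup_distance(r, n, j_max=None):
    """Return (E N_n(r), sup_x |P(N_n(r) <= g_n x) - G_{r,r}(x)|) for integer r."""
    g_n = r * (n - 1) + 1
    j_max = j_max or 100 * n            # truncation; the neglected tail is about exp(-100)
    mean, cdf, sup = 0.0, 0.0, 0.0
    for j in range(1, j_max + 1):
        target = gamma_rr_cdf_integer(j / g_n, r)
        sup = max(sup, abs(cdf - target))   # left limit of the step function at x = j / g_n
        p = nb_shifted_pmf(j, r, n)
        mean += j * p
        cdf += p
        sup = max(sup, abs(cdf - target))   # value of the step function at x = j / g_n
    return mean, sup


if __name__ == "__main__":
    for n in (50, 200):
        mean, sup = mean_and_sup_distance(2, n)
        print(n, mean, sup)
```

Since the distribution function of $N_{n}(r)/g_{n}$ is a step function and $G_{r,r}$ is continuous and increasing, it suffices to compare the two at the jump points, which is what the loop does.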

The convergence rate in (4.2) for $r>0$ is given in Bening et al. (2013, Formula (21)) or Gavrilenko et al. (2017, Formula (17)):

[TABLE]

In Schluter and Trede (2016) and Gavrilenko et al. (2017) the negative binomial random variable $\tilde{N}_{n}(r)$ is not shifted: $\tilde{N}_{n}(r)=N_{n}(r)-1\in\{0,1,2,...\}$ with $\mathbb{E}\tilde{N}_{n}(r)=r(n-1)$ . Then we have $\mathbb{P}(\tilde{N}_{n}(r)\leq 0)-G_{r,r}(0)=n^{-r}\to 0$ as $n\to\infty$ instead of $\mathbb{P}(N_{n}(r)\leq 0)-G_{r,r}(0)=0$ . Moreover

[TABLE]

The statements (4.2) and (4.4) still hold when $\tilde{N}_{n}(r)$ is shifted by a fixed integer. From the Taylor expansion with Lagrange remainder term it follows that for $r>1$

[TABLE]

Hence, for $r>1$ , shifting $\tilde{N}_{n}(r)$ influences the term at $g_{n}^{-1}$ . Second order asymptotic expansions for $N_{n}(r)$ were proved in Christoph et al. (2020, Theorem 1):

Proposition 4.1.

Let $r>0$ , let the discrete random variable $N_{n}(r)$ have probability mass function (4.1) and let $g_{n}:=\mathbb{E}N_{n}(r)=r(n-1)+1$ . For $x>0$ and all $n\in\mathbb{N}$ there exists a real number $C_{3}(r)>0$ such that

[TABLE]

where

[TABLE]

Remark: The jumps of the sample size $N_{n}(r)$ have an effect only on the function $Q_{1}(.)$ in the term $h_{2;r}(x)$ . The function $Q_{1}(y)$ is periodic with period 1 and right-continuous with jump height 1 at each integer point $y$ . The Fourier series expansion of $Q_{1}(y)$ at all non-integer points $y$ is

[TABLE]

see formula 5.4.2.9 in Prudnikov et al. (1992, p. 726) with $a=0$ .
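The Fourier expansion can be illustrated numerically. The sketch below assumes that (4.7) defines the sawtooth $Q_{1}(y)=1/2-(y-[y])$ , which matches the stated properties (period 1, right-continuous, jump height 1 at the integers), and that (4.8) is its classical series $\sum_{k\geq 1}\sin(2\pi ky)/(k\pi)$ . Both forms are assumptions here, since the displayed formulas are not reproduced in this text.

```python
import math


def q1(y):
    """Sawtooth Q_1(y) = 1/2 - (y - floor(y)) (assumed form of (4.7))."""
    return 0.5 - (y - math.floor(y))


def q1_fourier(y, terms=20000):
    """Partial sum of sum_{k >= 1} sin(2 pi k y) / (k pi) (assumed form of (4.8))."""
    return sum(math.sin(2.0 * math.pi * k * y) / (k * math.pi)
               for k in range(1, terms + 1))


if __name__ == "__main__":
    for y in (0.25, 0.3, 1.7):
        print(y, q1(y), q1_fourier(y))
```

At integer points every term of the series vanishes, so the series gives the midpoint 0 of the jump rather than the right-continuous value $1/2$ ; this is why the expansion is stated only for non-integer $y$ .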

In Theorem 3.1 an estimate for the negative moment $\mathbb{E}(N_{n})^{-3/2}$ of the random sample size $N_{n}$ is required. Proposition 4.1 is used in Bening (2020, Corollary 2) to obtain an asymptotic expansion of negative moments $\mathbb{E}(N_{n}(r))^{-p}$ for $r>1$ and $0<p\leq r-1$ . Such expansions are applied in the mentioned paper to analyze asymptotic deficiencies and risk functions of estimates based on random-size samples. An improved result is given here, including the correct bounds for $0<p\leq r-1$ :

Corollary 4.2.

Let $r>0$ and $p>0$ . Then for all $n\geq 2$ the following expansions hold for negative moments:

[TABLE]

where $|R_{k;n}^{*}|\leq c_{k}^{*}(p,r)\,g_{n}^{-\min\{r,2\}}$ for some constants $c_{k}^{*}(p,r)<\infty$ , $k=1,2,...,5$ .

Remark: The leading terms in (4.9) and the bound (4.5) lead to the estimate

[TABLE]
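The order of the negative moments can be explored by direct summation. The sketch below is a numerical illustration only, not the expansion (4.9) itself: it assumes the shifted negative binomial form of (4.1), and it compares $\mathbb{E}(N_{n}(r))^{-p}$ for $p<r$ with the heuristic leading term $g_{n}^{-p}\,\mathbb{E}W^{-p}$ , where $W$ has the limit Gamma law with shape and rate $r$ , so that $\mathbb{E}W^{-p}=r^{p}\,\Gamma(r-p)/\Gamma(r)$ . For $p>r$ it shows, in the geometric case $r=1$ , that $n\,\mathbb{E}(N_{n}(1))^{-p}$ stays bounded.

```python
import math


def negative_moment(p, r, n):
    """E[N_n(r)^(-p)] by summing the pmf (assumed form of (4.1); requires n >= 2)."""
    total = 0.0
    for j in range(1, 100 * n + 1):     # neglected tail is about exp(-100)
        log_pmf = (math.lgamma(j + r - 1) - math.lgamma(j) - math.lgamma(r)
                   - r * math.log(n) + (j - 1) * math.log1p(-1.0 / n))
        total += math.exp(log_pmf - p * math.log(j))
    return total


def gamma_limit_moment(p, r):
    """E[W^(-p)] for W ~ Gamma(shape r, rate r); finite only for p < r."""
    return r ** p * math.gamma(r - p) / math.gamma(r)


if __name__ == "__main__":
    r, p, n = 4.0, 1.5, 200
    g_n = r * (n - 1) + 1
    # Case p < r: the ratio to the Gamma-limit heuristic should be close to 1.
    print(g_n ** p * negative_moment(p, r, n) / gamma_limit_moment(p, r))
    # Case p > r (here r = 1): n * E[N^(-p)] stays bounded, i.e. the order is n^(-r).
    print(n * negative_moment(1.5, 1.0, n))
```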

Proof of Corollary 4.2: Integrating by parts and substituting $y/g_{n}=x$ , we obtain

[TABLE]

where with (4.5) of Proposition 4.1

[TABLE]

Next we calculate the first part $I_{1}$ of the integral in (4):

[TABLE]

with $R_{2}(n)=G_{r,r}(1/g_{n})\leq r^{r}g_{n}^{-r}/\Gamma(r+1)$

[TABLE]

where for $p<r$

[TABLE]

In case $p\geq r$ we split the integral in $I_{1}(n,p)$ into three parts, the first one leads to the leading term in (4.12) for $p=r$ and $p>r$ , respectively:

[TABLE]

Then we obtain

[TABLE]

Now we calculate the second part $I_{2}$ of the integral in (4) in case of $r>1$ :

[TABLE]

First we show that the integral $I_{2,p}(n)$ has the order of the remainder:

[TABLE]

Let $0<p<r-1$ where $r>1$ . Then

[TABLE]

where since $g_{n}\leq r\,n$ for $r>1$

[TABLE]

and, considering (4.8) and interchanging integral and sum,

[TABLE]

Applying formula 2.5.31.4 in Prudnikov et al. (1992, p. 446) with $\alpha=r-p-1,\,p=r$ and $b=2\pi kg_{n}$ , we obtain

[TABLE]

Hence

[TABLE]

with Riemann zeta function $\zeta(.)$ and $\frac{\displaystyle 1}{\displaystyle r-p-1}\leq\zeta(r-p)\leq\frac{\displaystyle r-p}{\displaystyle r-p-1}<\infty$ and

[TABLE]

In case $0<p=r-1$ the Fourier series expansion (4.8) of $Q_{1}(y)$ and integration by parts lead to

[TABLE]

and

[TABLE]

If $p>r-1$ , using $|Q_{1}(y)|\leq 1/2$ we find

[TABLE]

and (4.15) is proved.

It remains to calculate the first term on the right-hand side of (4.14), say $I_{3,p}(n)=I_{2}-I_{2,p}(n)$ . Since the integrals in $I_{1}$ and $I_{3,p}(n)$ have the same structure, one gets with the above method

[TABLE]

where $|R_{k}(n)|\leq c_{k}(r,p)g_{n}^{-r}$ , with some constants $c_{k}(r,p)$ , $k=7,8,9$ .

Estimates (4), (4.12) and (4.16) lead to (4.9) and Corollary 4.2 is proved. $\Box$

If the statistic $T_{m}$ is asymptotically normal, the limit distribution of the standardized statistic $T_{N_{n}(r)}$ with random size $N_{n}(r)$ is Student’s $t$ -distribution $S_{2r}(x)$ having density (1.9) with $\nu=2r$ , see Bening and Korolev (2005) or Schluter and Trede (2016).
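That the Gamma scale mixture of normal laws is a Student distribution can be verified by quadrature. The sketch below assumes the density $r^{r}y^{r-1}e^{-ry}/\Gamma(r)$ for $G_{r,r}$ and compares $\int_{0}^{\infty}\Phi(x\sqrt{y})\,dG_{r,r}(y)$ for $r=1$ with the standard closed form of the $t$ distribution function with 2 degrees of freedom, $1/2+x/(2\sqrt{2+x^{2}})$ . The helper names and the Simpson rule are illustrative choices.

```python
import math


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simpson(f, a, b, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def gamma_scale_mixture(x, r):
    """int_0^inf Phi(x sqrt(y)) dG_{r,r}(y) via the substitution y = u^2 (assumes r >= 1/2)."""
    c = r ** r / math.gamma(r)

    def integrand(u):
        # Density of G_{r,r} at y = u^2 times dy/du = 2u.
        return std_normal_cdf(x * u) * 2.0 * c * u ** (2.0 * r - 1.0) * math.exp(-r * u * u)

    return simpson(integrand, 0.0, 10.0)   # exp(-r * 100) makes the cut-off harmless for r >= 1/2


def student_t2_cdf(x):
    """Student t distribution function with 2 degrees of freedom."""
    return 0.5 + x / (2.0 * math.sqrt(2.0 + x * x))


if __name__ == "__main__":
    for x in (-0.5, 0.0, 1.0, 2.0):
        print(x, gamma_scale_mixture(x, 1.0), student_t2_cdf(x))
```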

Theorem 4.3.

Let $r>0$ . Consider the sample median $M_{N_{n}}$ with random sample size $N_{n}=N_{n}(r)$ having probability mass function (4.1) and $g_{n}=\mathbb{E}N_{n}(r)=r(n-1)+1$ . If inequalities (2.4) and (4.5) hold for the median $M_{m}(X_{1},...,X_{m})$ and the random sample size $N_{n}(r)$ , respectively, then there exists a constant $C_{r}$ such that

[TABLE]

for all $n\in\mathbb{N}$ uniformly in $x\in\mathbb{R}$ , where $N_{n}^{*}$ is defined in (3.5),

[TABLE]

Remark: Under the condition (4.4) with $r>1/2$ a first order expansion of $\mathbb{P}_{\theta}\big{(}2p_{0}\sqrt{g_{n}}(M_{N_{n}}-\theta)\leq x\big{)}$ was announced in the conference paper Bening et al. (2016). Note that the convergence rate in Theorems 3.1 and 3.2 as well as in Corollaries 3.1 and 3.2 of that paper in case $1/2<r<1$ has to be $\mathcal{O}(n^{-r})$ instead of $\mathcal{O}(n^{-1})$ as announced. Moreover, in case $r=1$ the convergence order $\mathcal{O}(n^{-1})$ in (4.17) improves the rate $\mathcal{O}(\ln n\,n^{-1})$ given in Bening et al. (2016).
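A small Monte Carlo experiment illustrates the limit law behind Theorem 4.3. The sketch below is a rough illustration under stated assumptions: standard normal observations (so $\theta=0$ and $p_{0}=1/\sqrt{2\pi}$ ), the geometric case $r=1$ of (4.1) in its assumed shifted form with $g_{n}=n$ , and the closed-form $t$ distribution function with 2 degrees of freedom as limit. The seed, the number of replications and the function names are arbitrary, and the correction terms of (4.18) are not included.

```python
import math
import random
import statistics


def student_t2_cdf(x):
    """Student t distribution function with 2 degrees of freedom (limit law for r = 1)."""
    return 0.5 + x / (2.0 * math.sqrt(2.0 + x * x))


def simulate_standardized_median(n, reps, seed=12345):
    """Draw 2 p0 sqrt(g_n) (M_{N_n} - theta) for N(0,1) data and geometric N_n (r = 1)."""
    rng = random.Random(seed)
    p0 = 1.0 / math.sqrt(2.0 * math.pi)   # N(0,1) density at its median theta = 0
    g_n = n                               # r (n - 1) + 1 with r = 1
    log_q = math.log1p(-1.0 / n)
    draws = []
    for _ in range(reps):
        # Geometric law on {1, 2, ...} with success probability 1/n (inverse transform).
        size = 1 + int(math.log(1.0 - rng.random()) / log_q)
        sample = [rng.gauss(0.0, 1.0) for _ in range(size)]
        draws.append(2.0 * p0 * math.sqrt(g_n) * statistics.median(sample))
    return draws


if __name__ == "__main__":
    values = simulate_standardized_median(n=200, reps=2000)
    for x in (0.0, 1.0, 2.0):
        empirical = sum(v <= x for v in values) / len(values)
        print(x, empirical, student_t2_cdf(x))
```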

Proof of Theorem 4.3: We use Theorems 3.1 and 3.2 with $H(y)=G_{r,r}(y)$ , $h_{2}(y)=h_{2;r}(y)$ , $g_{n}=r(n-1)+1$ and $b=\min\{r\,,\,2\}$ defined in Proposition 4.1.

It follows from (4.10) with $p=3/2$ that

[TABLE]

The conditions (3.9) and (3.10) follow from (4.13) with $p=k/2<r$ , $k=0,1,2$ , respectively $p=-1,0$ , $r>1$ .

Now we estimate the integrals (3.13) and (3.14) to obtain a bound in inequality (3.11). Using (3.8) for $f_{1}(z)$ and $f_{2}(z)$ defined in (2.3) we find

[TABLE]

for $0<r<1/2$ . If $r=1/2$ then with $x^{2}/(1+x^{2})\leq 1$

[TABLE]

Consider the second term in (3.13). Let $r<1$ . Using now $c_{2}^{*}=\sup_{z}|f_{2}(z)|$ , then

[TABLE]

If $r=1$ we define the polynomial $P_{4}(z)$ by $f_{2}(z)=z\,P_{4}(z)\,\varphi(z)$ with $z=x\,\sqrt{y}$ and put $c_{4}^{*}=\sup_{z}\{|P_{4}(z)|\varphi(z/\sqrt{2})\}<\infty$ . Then $|f_{2}(z)|\leq c_{4}^{*}|z|\,\varphi(z/\sqrt{2})$ and using $|x|\,(1+x^{2}/4)^{-1/2}\leq 2$ we obtain

[TABLE]

and for $0<r\leq 1$ uniformly in $x$

[TABLE]

It remains to estimate $I_{2}(x,n)$ in (3.14) for $r>1$ . Integration by parts for Lebesgue-Stieltjes integrals and (3.10i) lead to

[TABLE]

with

[TABLE]

where for $k=1,2$ functions $f_{k}(z)$ and $q_{k}(z)$ are bounded, see (3.8).

Moreover $g_{n}y^{2}\geq\sqrt{g_{n}}y^{3\,/2}$ for $y\geq 1/g_{n}$ and $g_{n}\leq n\,r$ for $r>1$ .

If $1<r<3/2$ with above defined $c_{3}^{*}$ we find

[TABLE]

If $r>3/2$ with $c_{4}^{*}=\frac{\displaystyle r^{r-1}}{\displaystyle 2\,\Gamma(r)}\,\sup_{y}\{e^{-r\,y/2}\,(|y-1|\,|2-r|+1)\}<\infty$ we obtain

[TABLE]

For $r=3/2$ the above estimates of $|I_{2}^{*}(x,n)|$ lead to an exponential integral:

[TABLE]

In the latter case $|I_{2}^{*}(x,n)|\leq C\,g_{n}^{-3/2}$ may be obtained with an analogous procedure as for estimating the above integral $|I_{1}(x,n)|$ for $r=1$ in (4.20). This proof is omitted because the rate of convergence in Theorem 4.3, see (4.17), is determined by the negative moment (4.19), where the term $\ln n$ cannot be omitted.

To obtain (4.18) we calculate the integrals in (3.12), which are similar to those in the proof of Theorem 2 in Christoph et al. (2020). Using formula 2.3.3.1 in Prudnikov et al. (1992, p. 322) with $\beta>-r$ and $p=r+x^{2}/2$ :

[TABLE]

we compute the first integral in (3.12) with $\beta=1/2$ in (4.25):

[TABLE]

Hence

[TABLE]

For $r>1/2$ we find with $f_{1}(x)$ defined in (2.3) and $\beta=1/2$ in (4.25)

[TABLE]

For $r>1$ we obtain with $f_{2}(x)$ from (2.3) and $\beta=-1/2,1/2,3/2$ in (4.25)

[TABLE]

The integral $\int_{0}^{\infty}\Phi(x\sqrt{y})dh_{2;r}(y)$ in (3.12) is the same as the integral $J_{4}(x)$ in the proof of Theorem 2 in Christoph et al. (2020), where it is shown that

[TABLE]

With (4.26) and $|(rn)^{-1}-g_{n}^{-1}|\leq(1/r)n^{-2}$ the term by $1/g_{n}$ in (4.18) follows. $\Box$

5. Laplace Distribution as Limit for Random Sample Median $M_{N_{n}}$

Let $Y(s)\in\mathbb{N}$ be discrete Pareto II distributed with parameter $s>0$ , having probability mass and distribution functions

[TABLE]

which is a particular class of a general model of discrete Pareto distributions, obtained by discretizing continuous Pareto II (Lomax) distributions on the integers, see Buddana and Kozubowski (2014).

Now, let $Y_{1}(s),Y_{2}(s),...$ , be independent random variables with the same distribution (5.1). Define for $n\in\mathbb{N}$ and $s>0$ the random variable

[TABLE]

The distribution of $N_{n}(s)$ is extremely spread out on the positive integers.

In Christoph et al. (2020) the following Edgeworth expansion was proved:

Proposition 5.1.

Let the discrete random variable $N_{n}(s)$ have distribution function (5.2). For $y>0$ , fixed $s>0$ and all $n\in\mathbb{N}$ there exists a real number $C_{3}(s)>0$ such that

[TABLE]

where $Q_{1}(y)$ is defined in (4.7).

Remarks: 1. Lyamin (2010) proved a first order bound in (5.3) for integer $s\geq 1$

[TABLE]

In case $n=1$ and $s=1$ we have $\mathbb{P}\left(N_{1}(1)\leq x\right)=0$ for $0<x<1$ and

[TABLE]

2. The continuous function $H_{s}(y)=e^{-s/y}{\rm\bf{1}}_{(0\,,\,\infty)}(y)$ with parameter $s>0$ is the distribution function of the inverse exponential random variable $W(s)=1/V(s)$ , where $V(s)$ is exponentially distributed with rate parameter $s>0$ . Both $H_{s}(y)$ and $\mathbb{P}(N_{n}(s)\leq y)$ are heavy tailed with shape parameter 1.

Therefore $\mathbb{E}\big{(}N_{n}(s)\big{)}=\infty$ for all $n\in\mathbb{N}$ and $\mathbb{E}\big{(}W(s)\big{)}=\infty$ . Moreover:

$\bullet$ First absolute pseudo moment $\nu_{1}=\int_{0}^{\infty}x\big{|}d\big{(}\mathbb{P}\big{(}N_{n}(s)\leq n\,x\big{)}-e^{-s/x}\big{)}\big{|}=\infty$ ,

$\bullet$ Absolute difference moment $\chi_{u}=\int_{0}^{\infty}x^{u-1}\big{|}\mathbb{P}\big{(}N_{n}(s)\leq n\,x\big{)}-e^{-s/x}\big{|}dx<\infty$

for $1\leq u<2$ . These statements are proved in Christoph et al. (2020, Lemma 2). On pseudo moments and some of their generalizations see e.g. Christoph and Wolf (1992, Chapter 2).
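The first order rate in (5.5) can be observed numerically. The sketch below assumes that (5.1) and (5.2) describe $N_{n}(s)=\max_{1\leq j\leq n}Y_{j}(s)$ with $\mathbb{P}(Y(s)\leq k)=k/(s+k)$ , so that $\mathbb{P}(N_{n}(s)\leq k)=(k/(s+k))^{n}$ for integers $k\geq 0$ ; this form is an assumption, since the displayed formulas are not reproduced in this text. It evaluates $\sup_{y}|\mathbb{P}(N_{n}(s)\leq ny)-e^{-s/y}|$ on the jump points and checks that the product with $n$ stays bounded.

```python
import math


def sup_distance_pareto_max(n, s, y_max=200.0):
    """sup_y |P(N_n(s) <= n y) - exp(-s / y)| for 0 < y <= y_max.

    Uses the assumed form P(N_n(s) <= k) = (k / (s + k))^n for integers k >= 0.
    """
    sup = 0.0
    for k in range(0, int(y_max * n) + 1):
        step_value = (k / (s + k)) ** n       # value of the step function on [k/n, (k+1)/n)
        if k > 0:
            sup = max(sup, abs(step_value - math.exp(-s * n / k)))
        # Left limit of the step function at y = (k + 1) / n.
        sup = max(sup, abs(step_value - math.exp(-s * n / (k + 1))))
    return sup


if __name__ == "__main__":
    for n in (100, 400):
        d = sup_distance_pareto_max(n, 1.0)
        print(n, d, n * d)
```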

Next we estimate the negative moment $\mathbb{E}(N_{n}(s))^{-p}$ , $p>0$ , for the random sample size $N_{n}(s)$ :

Corollary 5.2.

Let $s>0$ and $p>0$ . Then for all $n\geq 2$ the following expansions hold for negative moments:

[TABLE]

where $|R_{k;n}^{*}|\leq c_{k}^{*}(p)\,n^{-2}$ for some constants $c_{k}^{*}(p)<\infty$ , $k=1,2,3$ .

Remarks: 1. The leading terms in (5.6) and the bound (5.3) lead to the estimate

[TABLE]

where for $0<p\leq 2$ the order of the bound is optimal.

2. In Bening (2020, Corollary 3) the expansion (5.6) for $0<p<1$ is given with an additional term at $n^{-p-1}$ .

Proof of Corollary 5.2: As in the beginning of the proof of Corollary 4.2 we obtain

[TABLE]

where

[TABLE]

considering (4.8)

[TABLE]

and with (5.3) of Proposition 5.1

[TABLE]

Since for $\alpha>0$ , $0<\beta\leq 2$ and $s>0$

[TABLE]

we find with $c_{2}=p\,C(p,2-p,s)$ and $c_{3}=sp|s-1|\,C(p,2-p,s)/2$

[TABLE]

Both remainders decrease exponentially with order $n\,e^{-s\,n}$ respectively $n^{3}\,e^{-s\,n}$ .

It remains to estimate $I_{3}$ . Partial integration leads to

[TABLE]

Considering (5.8) and $\sum_{k=1}^{\infty}k^{-2}=\pi^{2}/6$ we obtain $|I_{3}|\leq c(s,p)n^{-2}$ . $\Box$

For an asymptotically normally distributed statistic $T_{m}$ the limit distribution of the standardized $T_{N_{n}(s)}$ is Laplace distribution $L_{1/\sqrt{s}}(x)$ having density (1.12) with $\mu=1/\sqrt{s}$ , therefore $l_{1/\sqrt{s}}(x)=\sqrt{s/2}\,e^{-\sqrt{2\,s}\,|x|}$ . See Bening and Korolev (2008) or Schluter and Trede (2016).
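The Laplace limit can be checked by quadrature in the same spirit. The sketch below takes $H_{s}(y)=e^{-s/y}$ from Remark 2 after Proposition 5.1, evaluates $\int_{0}^{\infty}\Phi(x\sqrt{y})\,dH_{s}(y)$ numerically and compares it with the distribution function belonging to the density $\sqrt{s/2}\,e^{-\sqrt{2s}\,|x|}$ . The substitution, the cut-off and the helper names are illustrative choices.

```python
import math


def std_normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def simpson(f, a, b, n=4000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def inverse_exponential_scale_mixture(x, s):
    """int_0^inf Phi(x sqrt(y)) dH_s(y) with H_s(y) = exp(-s / y).

    Substituting y = 1 / u^2 turns the integral into
    int_0^inf Phi(x / u) * 2 s u exp(-s u^2) du.
    """
    def integrand(u):
        if u == 0.0:
            return 0.0                      # the factor u makes the integrand vanish at 0
        return std_normal_cdf(x / u) * 2.0 * s * u * math.exp(-s * u * u)

    return simpson(integrand, 0.0, 10.0 / math.sqrt(s))   # neglected tail is about exp(-100)


def laplace_cdf(x, s):
    """Distribution function with density sqrt(s/2) exp(-sqrt(2 s) |x|)."""
    rate = math.sqrt(2.0 * s)
    return 1.0 - 0.5 * math.exp(-rate * x) if x >= 0 else 0.5 * math.exp(rate * x)


if __name__ == "__main__":
    for x in (-0.3, 0.0, 0.7):
        print(x, inverse_exponential_scale_mixture(x, 2.0), laplace_cdf(x, 2.0))
```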

Theorem 5.3.

Let $s>0$ . Consider the statistic $M_{N_{n}}$ with random sample size $N_{n}=N_{n}(s)$ having distribution function (5.2). If for the statistic $M_{m}(X_{1},...,X_{m})$ inequality (2.4) holds and $g_{n}=n$ , then there exists a constant $C_{s}$ such that $\forall n\in\mathbb{N}$

[TABLE]

where $N_{n}^{*}$ is defined in (3.5) and

[TABLE]

Remark: Under the condition (5.5) a first order expansion was announced in the conference paper Bening et al. (2016, Theorem 4.1).

Proof of Theorem 5.3: We use Theorems 3.1 and 3.2 with $H(y)=H_{s}(y)$ and $h_{2}(y)=h_{2;s}(y)$ defined in (5.4), $b=2$ and $g_{n}=n$ .

Considering (5.8) the functions $H_{s}(1/n)$ , $h_{2;s}(1/n)$ and the corresponding integrals decrease even exponentially with order $n\,e^{-s\,n}$ or $n^{2}\,e^{-s\,n}$ , $s>0$ . Moreover, $h_{2;s}(0)=\lim_{y\downarrow 0}h_{2;s}(y)=0$ . Hence conditions (3.9) and (3.10) are fulfilled.

It remains to estimate $I_{2}(x,n)$ given in (3.14). Replacing $h_{2;r}(y)$ by $h_{2;s}(y)$ in the estimates (4.23) and (4.24) of the corresponding $I_{2}(x,n)$ in the proof of Theorem 4.3, and using partial integration, the relations (5.4), (3.8) and $ny^{2}\geq\sqrt{n}y^{3/2}$ for $y\geq 1/n$ , we obtain

[TABLE]

To obtain (5.10) we calculate integrals in (3.12) for $b=3/2$ as in the proof of Theorem 5 in Christoph et al. (2020). Here we use formula 2.3.16.3 in Prudnikov et al. (1992, p. 344) with $p=x^{2}/2>0$ , $s>0$ , $m=0,1,2$ :

[TABLE]

where

[TABLE]

In the mentioned proof we obtained with (5.12) for $m=1$

[TABLE]

and with (5.12) for $m=2$

[TABLE]

Moreover, using (5.12) for $m=1$ , we find

[TABLE]

and, finally, with (5.12) for $m=0,1,2$ , we calculate

[TABLE]

Together with $\mathbb{E}\left(N_{n}(s)\right)^{-3/2}\leq C(s)n^{-3/2}$ for all $s\geq s_{0}>0$ , proved in Christoph et al. (2020, Lemma 3), this proves (5.9). $\Box$

6. Cornish-Fisher Expansions for Quantiles of $M_{m}$ and $M_{N_{n}}$

In statistical inference it is of fundamental importance to obtain the quantiles of the distribution of statistics under consideration. Statistical applications and modeling with quantile functions are discussed extensively by Gilchrist (2000). There are very few quantile functions which can be expressed in closed form. The Cornish-Fisher expansions provide tools to approximate the quantiles of probability distributions.

Let $F_{n}(x)$ be a distribution function admitting a Chebyshev-Edgeworth expansion in powers of $g_{n}^{-1/2}$ with $0<g_{n}\uparrow\infty$ as $n\to\infty$ :

[TABLE]

where $g(x)$ is the density of a three times differentiable limit distribution $G(x)$ .

Proposition 6.1.

Let $F_{n}(x)$ be given by (6.1) and let $x(u)$ and $u$ be quantiles of distributions $F_{n}$ and $G$ with the same order $\alpha$ , i.e. $F_{n}(x(u))=G(u)=\alpha$ . Then the following relation holds for $n\to\infty$ :

[TABLE]

with

[TABLE]

Proposition 6.1 is a direct consequence of more general statements, see e.g. Ulyanov (2011, p. 311-315), Fujikoshi et al. (2010, Chapter 5.6.1) or Ulyanov et al. (2016) and the references therein.
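The mechanism of such a transfer from (6.1) to a quantile expansion can be illustrated on a toy example. The sketch below assumes the standard second order inversion $x(u)=u+b_{1}(u)g_{n}^{-1/2}+b_{2}(u)g_{n}^{-1}$ with $b_{1}(u)=-a_{1}(u)$ and $b_{2}(u)=\frac{g^{\prime}(u)}{2g(u)}a_{1}^{2}(u)+a_{1}^{\prime}(u)a_{1}(u)-a_{2}(u)$ , obtained by inserting $x(u)$ into (6.1) and matching powers of $g_{n}^{-1/2}$ ; whether this coincides literally with (6.2) and (6.3) is an assumption. The limit law is taken to be normal, and the corrections $a_{1}(x)=x^{2}/4$ , $a_{2}(x)=x/3$ are hypothetical, chosen only to make the example concrete.

```python
import statistics

STD_NORMAL = statistics.NormalDist()


def a1(x):
    return x * x / 4.0          # hypothetical first order correction


def a2(x):
    return x / 3.0              # hypothetical second order correction


def f_n(x, eps):
    """Toy expansion F_n(x) = Phi(x) + phi(x) (a1(x) eps + a2(x) eps^2), eps = g_n^(-1/2)."""
    return STD_NORMAL.cdf(x) + STD_NORMAL.pdf(x) * (a1(x) * eps + a2(x) * eps ** 2)


def exact_quantile(alpha, eps, lo=0.0, hi=3.0):
    """Solve F_n(x) = alpha by bisection; F_n is increasing on [0, 3] for small eps."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_n(mid, eps) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cornish_fisher(alpha, eps):
    """Return the zeroth, first and second order quantile approximations."""
    u = STD_NORMAL.inv_cdf(alpha)
    b1 = -a1(u)
    # For the normal limit: g'(u) / (2 g(u)) = -u / 2, and here a1'(u) = u / 2.
    b2 = (-u / 2.0) * a1(u) ** 2 + (u / 2.0) * a1(u) - a2(u)
    return u, u + b1 * eps, u + b1 * eps + b2 * eps ** 2


if __name__ == "__main__":
    alpha, eps = 0.9, 0.05
    x = exact_quantile(alpha, eps)
    print(x, [abs(approx - x) for approx in cornish_fisher(alpha, eps)])
```

Each additional term should reduce the error by roughly one power of $g_{n}^{-1/2}$ , which is what the printed errors are meant to show.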

First we consider the random median $M_{N_{n}}$ when the sample size $N_{n}=N_{n}(r)$ is negative binomial distributed with probability mass function (4.1) and Student’s $t$ -distribution $S_{2r}(x)$ is the limit law. The second order expansion (4.17) in Theorem 4.3 admits a relation like (6.1) with $g_{n}=r(n-1)+1$ and $a_{k}(x)=A_{k;r}(x)$ , $k=1,2$ . The transfer Proposition 6.1 implies the following statement:

Corollary 6.2.

Suppose $r>0$ . Let $x=x_{\alpha}$ and $u=u_{\alpha}$ be $\alpha$ -quantiles of standardized statistic $\mathbb{P}\Big{(}2p_{0}\sqrt{g_{n}}(M_{N_{n}(r)}-\theta)\leq x\Big{)}$ and of the limit Student’s $t$ -distribution $S_{2r}(u)$ , respectively. Then with previous definitions the following Cornish-Fisher expansion holds as $n\to\infty$ :

[TABLE]

where $B_{2}(u)=\,\frac{\displaystyle p_{1}^{2}\,u^{3}}{\displaystyle 8\,p_{0}^{4}}-\frac{\displaystyle(5-r)\,u^{3}\,+\,(5r+2)\,u}{\displaystyle 4(2\,r\,-\,1)}-\frac{\displaystyle u^{3}}{\displaystyle 4}\Big{(}1+\frac{\displaystyle p_{2}}{\displaystyle 6\,p_{0}^{3}}\Big{)}.$

Next we study the approximation of quantiles for the random median $M_{N_{n}}$ when the sample size $N_{n}=N_{n}(s)$ is based on discrete Pareto distributions with distribution function (5.2) and the Laplace distribution $L_{1/\sqrt{s}}(x)$ is the limit law. Relation (5.9) in Theorem 5.3 admits an expansion like (6.1) with $g_{n}=n$ and $a_{k}(x)=A_{k;s}(x)$ , $k=1,2$ . The transfer Proposition 6.1 leads now to:

Corollary 6.3.

Suppose $s>0$ . Let $x=x_{\alpha}$ and $u=u_{\alpha}$ be $\alpha$ -quantiles of standardized statistic $\mathbb{P}\Big{(}2p_{0}\sqrt{n}(M_{N_{n}(s)}-\theta)\leq x\Big{)}$ and of the limit Laplace distribution $L_{1/\sqrt{s}}(u)$ , respectively. Then with previous definitions the following Cornish-Fisher expansion holds

[TABLE]

where $B_{2}(u)=\frac{\displaystyle p_{1}^{2}\,u^{3}}{\displaystyle 8\,p_{0}^{4}}\,+\,\frac{\displaystyle(4-s)\,u\,(1+\sqrt{2s}|u|)}{\displaystyle 8\,s}-\frac{\displaystyle u^{3}}{\displaystyle 4}\Big{(}1+\frac{\displaystyle p_{2}}{\displaystyle 6p_{0}^{3}}\Big{)}.$

For the sake of completeness, let us also consider the Cornish-Fisher expansion for the median $M_{m}$ , using (2.4) with $a_{k}(x)=f_{k}(x)$ , $k=1,2$ , defined in (2.3).

Corollary 6.4.

Let $x=x_{\alpha}$ and $u=u_{\alpha}$ be $\alpha$ -quantiles of the standardized statistic $\mathbb{P}_{\theta}\Big{(}2p_{0}\sqrt{2[m/2]}(M_{m}-\theta)\leq x\Big{)}$ and of the limit normal distribution $\Phi(u)$ , respectively. Then with previous definitions the classical Cornish-Fisher expansion holds as $m\to\infty$ :

[TABLE]

7. Acknowledgement

Proposition 2.1, Theorems 3.1 and 4.3 and Corollary 4.2 have been obtained under support of the RSF Grant No. 18-11-00132. The paper was prepared within the framework of the Moscow Center for Fundamental and Applied Mathematics, Moscow State University and HSE University Basic Research Programs.

Bibliography

1. Al-Mutairi, J.S. and Raqab, M.Z. Confidence intervals for quantiles based on samples of random sizes. Statist. Papers 61 (1), 261-277 (2020). MR 4056802.
2. Barakat, H.M., Nigm, E.M., El-Adll, M.E. and Yusuf, M. Prediction of future generalized order statistics based on exponential distribution with random sample size. Statist. Papers 59 (2), 605-631 (2018). MR 3800816.
3. Bening, V.E. On risks of estimates based on random-size samples. Moscow University Computational Mathematics and Cybernetics 44 (1), 16-26 (2020).
4. Bening, V.E. and Korolev, V.Yu. On the use of Student’s distribution in problems of probability theory and mathematical statistics. Theory Probab. Appl. 49 (3), 377-391 (2005). MR 2144862.
5. Bening, V.E. and Korolev, V.Yu. Some statistical problems related to the Laplace distribution (Russian). Informatics and its Applications, IPI RAN 2 (2), 19-34 (2008).
6. Bening, V.E., Galieva, N.K. and Korolev, V.Yu. Asymptotic expansions for the distribution functions of statistics constructed from samples with random sizes (Russian). Informatics and its Applications, IPI RAN 7 (2), 75-83 (2013).
7. Bening, V.E., Korolev, V.Yu. and Zeifman, A.I. Asymptotic expansions for the distribution function of the sample median constructed from a sample with random size. In Proceedings 30th ECMS 2016 Regensburg, edited by Claus, T. et al., 669-675 (2016). doi:10.7148/2016-0669.
8. Buddana, A. and Kozubowski, T.J. Discrete Pareto distributions. Econ. Qual. Control 29 (2), 143-156 (2014).
