Strong Asymptotic Properties of Kernel Smoothing Estimation for NA Random Variables with Right Censoring
Jianhua Shi, Jiansen Xu, Jinfeng Xu

TL;DR
This paper extends kernel smoothing estimation methods to negatively associated random variables with right censoring, establishing their strong asymptotic properties to support practical applications in incomplete-data scenarios.
Contribution
It introduces the first strong asymptotic analysis of kernel estimators for NA variables under right censoring, relaxing previous ideal assumptions.
Findings
Established strong asymptotic properties of kernel density estimators
Validated the use of Kaplan-Meier based estimators in NA contexts
Provided theoretical justification for practical kernel smoothing in censored data
Abstract
Most studies for negatively associated (NA) random variables consider the complete-data situation, which is actually a relatively ideal condition in practice. The paper relaxes this condition to the incomplete-data setting and considers kernel smoothing density and hazard function estimation in the presence of right censoring based on the Kaplan-Meier estimator. We establish the strong asymptotic properties for these two estimators to assess their asymptotic behavior and justify their practical use.
Taxonomy
Topics: Statistical Methods and Inference · Statistical Distribution Estimation and Applications
Jian-hua Shi^{a,b,c,d}, Jian-sen Xu^{a}, Jin-feng Xu^{a,*}
^a School of Mathematics and Statistics, Minnan Normal University, Zhangzhou, 363000, China
^b Fujian Key Laboratory of Granular Computing and Applications, Zhangzhou, 363000, China
^c The Institute of Meteorological Big Data-Digital Fujian, Zhangzhou, 363000, China
^d Fujian Key Laboratory of Data Science and Statistics, Zhangzhou, 363000, China
Keywords: Kaplan-Meier estimator; kernel smoothing estimator; NA random variable; right-censoring.
2000 Mathematics Subject Classification. 62G05
* Corresponding author: Jin-feng Xu, Minnan Normal University, Zhangzhou, Fujian, China 363000. E-mail: [email protected].
1 Introduction
A negatively associated (NA) random sequence is a sequence of dependent random variables. The concept was first introduced by Alam and Saxena (1981) and then studied in detail by Joag-Dev and Proschan (1983). The definition is as follows.
Definition (Joag-Dev and Proschan, 1983) Random variables $X_1, \dots, X_n$ are said to be negatively associated (NA) if for every pair of disjoint subsets $A$ and $B$ of $\{1, \dots, n\}$,
$$\mathrm{Cov}\big(f(X_i, i \in A),\; g(X_j, j \in B)\big) \le 0,$$
whenever the covariance exists and $f$ and $g$ are both increasing in every variable (or both decreasing in every variable). A sequence of random variables is said to be NA if every finite subfamily is NA.
Clearly, sequences of independent random variables are NA, and many dependent sequences, for example random sampling without replacement from a finite population, also fall into the NA category. Many researchers have studied the properties of NA random variables and obtained important results. For example, Su et al. (1997) established a probability inequality and some moment inequalities for the partial sums of an NA sequence, which can be used to prove properties of strictly stationary NA sequences such as the weak invariance principle. Shao (2000) proved that most of the well-known inequalities, such as the Kolmogorov exponential inequality and the Rosenthal maximal inequality, still hold for NA random variables. Wu and Chen (2013) presented two strong representation results for the Kaplan-Meier estimator for NA data with censoring, which are the results most relevant to the main results of our paper. Zhou and Lin (2015) considered a nonparametric regression model with repeated NA error structures, where the wavelet method is used to estimate the regression function. Thuan and Quang (2016) studied some properties of NA random variables and obtained inequalities including a maximal inequality and a Hájek-Rényi type inequality. Tang et al. (2018) studied the asymptotic normality of the wavelet estimator in nonparametric regression, where the random errors are asymptotically NA random variables. Meng (2018) established two general strong laws of large numbers which also involve NA random variables.
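The sampling-without-replacement example can be checked by hand in the simplest case. The sketch below is an illustration only, not part of the paper: it computes the exact covariance between the first two draws without replacement from a small finite population, which corresponds to taking both test functions in the NA definition to be the identity. The function name and the population values are hypothetical choices.

```python
from fractions import Fraction
from itertools import permutations


def covariance_without_replacement(population):
    """Exact Cov(X1, X2) for the first two draws made without replacement."""
    # All ordered pairs of distinct positions are equally likely.
    pairs = list(permutations(population, 2))
    n_pairs = Fraction(len(pairs))
    mean_first = sum(Fraction(a) for a, _ in pairs) / n_pairs
    mean_second = sum(Fraction(b) for _, b in pairs) / n_pairs
    mean_product = sum(Fraction(a) * Fraction(b) for a, b in pairs) / n_pairs
    return mean_product - mean_first * mean_second


population = [1, 2, 3, 4, 5]  # hypothetical finite population
cov = covariance_without_replacement(population)
print(cov)  # -1/2
```

The result agrees with the classical formula $-\sigma^2/(N-1)$ for a population of size $N$ with variance $\sigma^2$ (here $\sigma^2 = 2$ and $N = 5$). A negative covariance for one pair of increasing functions is only a necessary consequence of negative association, not a proof of it.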
Most studies of NA random variables are conducted under a complete-data setting, which is a relatively ideal condition in practice. In survival analysis, right censoring is often encountered. For detailed discussion of censoring and its practical relevance, see Gijbels and Wang (1993), Zhou and Yip (1999), Chen et al. (2015), Qiu et al. (2015), Shi et al. (2018), Ma et al. (2019) and Zhang and Zhou (2018), among many others.
Let $\{(X_i, Y_i), i \ge 1\}$ denote a sequence of random vectors, where $X_i$ is the true survival time of interest, which is right censored by the random variable $Y_i$. It is assumed that $\{X_i\}$ is independent of $\{Y_i\}$, but the i.i.d. assumption is not made for the $X_i$'s and $Y_i$'s, which are both NA in our paper. The observations consist of $\{(Z_i, \delta_i), 1 \le i \le n\}$, where
$$Z_i = \min(X_i, Y_i)$$
and
$$\delta_i = I(X_i \le Y_i),$$
and $I(A)$ is the indicator function of the random event $A$. For simplicity, assume that the $X_i$'s have a common continuous marginal distribution function $F$, with survival distribution $\bar F = 1 - F$. The random censoring times $Y_i$, being independent of the random variables $X_i$, are assumed to have a common distribution function $G$ with survival distribution $\bar G = 1 - G$. Meanwhile, let $H$ be the distribution of the observed variables $Z_i$, and write its survival distribution as $\bar H = 1 - H = \bar F \bar G$. For any distribution function $L$, we define the left and right endpoints of its support as $a_L = \inf\{x : L(x) > 0\}$ and $b_L = \sup\{x : L(x) < 1\}$ throughout the paper.
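As a concrete illustration of the right-censoring scheme, the following minimal sketch builds the observed pairs from latent survival and censoring times. It assumes the usual survival-analysis convention that the observed time is the minimum of the two and that the indicator equals 1 when the survival time does not exceed the censoring time; the function name `censor` and the numbers are hypothetical.

```python
def censor(survival_times, censoring_times):
    """Return observed pairs (z, delta): z = min(x, y), delta = 1 if x <= y else 0."""
    return [
        (min(x, y), int(x <= y))
        for x, y in zip(survival_times, censoring_times)
    ]


# Hypothetical latent data: only the pairs in `observed` would be seen in practice.
latent_survival = [2.0, 5.0, 1.0]
latent_censoring = [3.0, 4.0, 1.0]
observed = censor(latent_survival, latent_censoring)
print(observed)  # [(2.0, 1), (4.0, 0), (1.0, 1)]
```

The second subject is censored: its true survival time 5.0 is never observed, only the fact that it exceeds 4.0.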
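The distribution function $H$ can be consistently estimated by the empirical distribution function $H_n$, which is defined as follows:
$$H_n(x) = 1 - \frac{1}{n}\sum_{i=1}^{n} I(Z_i > x),$$
where $\sum_{i=1}^{n} I(Z_i > x)$ denotes the number of uncensored and censored observations larger than time $x$, and $\bar H_n(x) = 1 - H_n(x)$.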
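To draw nonparametric inference about the unknown $F$ based on the censored observations $\{(Z_i, \delta_i)\}$, we introduce a stochastic process on $[0, \infty)$ as follows:
$$N_n(x) = \sum_{i=1}^{n} I(Z_i \le x,\ \delta_i = 1),$$
which counts the number of uncensored observations no larger than time $x$. One nonparametric maximum likelihood estimator of $F$ is the well-known Kaplan-Meier (K-M) estimator (Kaplan and Meier, 1958), which is commonly used to estimate $F$ from the incomplete data $\{(Z_i, \delta_i)\}$, i.e.
$$1 - \hat F_n(x) = \prod_{s \le x}\left(1 - \frac{dN_n(s)}{\sum_{i=1}^{n} I(Z_i \ge s)}\right),$$
where the jump $dN_n(s) = N_n(s) - N_n(s-)$.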
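The product-limit construction can be sketched in a few lines. The code below is a minimal illustration under the standard textbook form of the Kaplan-Meier estimator (at each distinct uncensored time, multiply the running survival estimate by one minus the number of events divided by the number still at risk); it is not taken from the paper, and the function name and data are hypothetical.

```python
def kaplan_meier(observations):
    """Kaplan-Meier survival estimate.

    observations: list of (time, delta) pairs, delta = 1 for an uncensored time.
    Returns a list of (time, survival_estimate) at each distinct uncensored time.
    """
    data = sorted(observations)
    n = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < n:
        t = data[i][0]
        at_risk = n - i  # observations with time >= t
        events = 0
        j = i
        while j < n and data[j][0] == t:  # group ties at the same time
            events += data[j][1]
            j += 1
        if events > 0:
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        i = j
    return curve


sample = [(1.0, 1), (2.0, 0), (3.0, 1), (4.0, 1)]  # hypothetical data
km_curve = kaplan_meier(sample)
print(km_curve)  # [(1.0, 0.75), (3.0, 0.375), (4.0, 0.0)]
```

The censored observation at time 2.0 produces no jump, but it reduces the risk set for the later event times, which is why the jump at 3.0 is larger than 1/4.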
Define the sub-distribution function Since , using integration by parts, we have
[TABLE]
and then
[TABLE]
It is further assumed that $F$ has a density function $f$. Estimation of the hazard function $\lambda$ is also of substantial interest in survival analysis; it is defined as
$$\lambda(x) = \frac{f(x)}{1 - F(x)}, \quad F(x) < 1.$$
Its corresponding cumulative hazard function is defined as
$$\Lambda(x) = \int_0^x \lambda(s)\,ds = -\log\{1 - F(x)\}.$$
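As a quick worked example of the hazard and cumulative hazard, consider an exponential lifetime with rate $\theta > 0$. The exponential model and the symbol $\theta$ are assumptions made only for illustration; the hazard is taken to be the density divided by the survival function, and the cumulative hazard its integral from zero.

```latex
% Worked example (assumption: exponential lifetime with rate \theta > 0)
\begin{align*}
f(x) &= \theta e^{-\theta x}, \qquad 1 - F(x) = e^{-\theta x}, \\
\lambda(x) &= \frac{f(x)}{1 - F(x)} = \theta, \\
\Lambda(x) &= \int_0^x \lambda(s)\,ds = \theta x = -\log\{1 - F(x)\}.
\end{align*}
```

The constant hazard reflects the memoryless property of the exponential distribution, and the last identity holds for any continuous distribution function, not only in this example.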
The representation (1.1) of in terms of and suggests the empirical estimator for by
[TABLE]
where and
[TABLE]
denotes the empirical estimator of .
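Let $Z_{(1)} \le Z_{(2)} \le \dots \le Z_{(n)}$ be the order statistics of $Z_1, \dots, Z_n$, and let $\delta_{(i)}$ be the concomitant of $Z_{(i)}$. It can be verified that, in the absence of ties, the estimators $\hat F_n$ and $\Lambda_n$ can be respectively represented as
$$1 - \hat F_n(x) = \prod_{i:\, Z_{(i)} \le x}\left(\frac{n-i}{n-i+1}\right)^{\delta_{(i)}}$$
and
$$\Lambda_n(x) = \sum_{i:\, Z_{(i)} \le x}\frac{\delta_{(i)}}{n-i+1}.$$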
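The order-statistic representations translate directly into code. The sketch below assumes the standard forms with no ties: the cumulative hazard estimator adds the indicator divided by $n - i + 1$ for each ordered observation up to $x$, and the Kaplan-Meier estimator multiplies the factor $(n - i)/(n - i + 1)$ over the uncensored ones. Function names and data are hypothetical.

```python
def cumulative_hazard_estimate(observations, x):
    """Sum of delta_(i) / (n - i + 1) over ordered observations with Z_(i) <= x."""
    data = sorted(observations)  # assumes no ties among the observed times
    n = len(data)
    total = 0.0
    for i, (z, delta) in enumerate(data, start=1):
        if z > x:
            break
        total += delta / (n - i + 1)
    return total


def km_distribution_estimate(observations, x):
    """One minus the product of (n - i) / (n - i + 1) over uncensored Z_(i) <= x."""
    data = sorted(observations)
    n = len(data)
    survival = 1.0
    for i, (z, delta) in enumerate(data, start=1):
        if z > x:
            break
        if delta == 1:
            survival *= (n - i) / (n - i + 1)
    return 1.0 - survival


sample = [(1.0, 1), (2.0, 0), (3.0, 1), (4.0, 1)]  # hypothetical data
print(cumulative_hazard_estimate(sample, 3.0))  # 0.75
print(km_distribution_estimate(sample, 3.0))    # 0.625
```

Both functions visit the ordered sample once, so the two estimators share the same jump locations, namely the uncensored observation times.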
In the case of right censoring, the K-M estimator $\hat F_n$ and the estimator $\Lambda_n$ have been generally accepted as substitutes for the usual empirical estimators of the distribution function $F$ and the cumulative hazard function $\Lambda$, respectively. They serve as building blocks for other estimators, such as the kernel density estimator and the kernel hazard estimator introduced below.
A kernel smoothed estimator for $f$ based on $\hat F_n$ can be constructed as
$$f_n(x) = \frac{1}{h_n}\int K\left(\frac{x-s}{h_n}\right) d\hat F_n(s),$$
where $K$ is a smooth probability kernel function and $\{h_n\}$ is a sequence of bandwidths tending to zero at appropriate rates.
Similarly, we can construct a kernel smoothed estimator for the hazard function $\lambda$ under NA sampling, which is defined by
$$\lambda_n(x) = \frac{1}{h_n}\int K\left(\frac{x-s}{h_n}\right) d\Lambda_n(s).$$
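Because the Kaplan-Meier estimator and the empirical cumulative hazard are step functions, the kernel smoothed estimators reduce to weighted sums over the observed times. The sketch below assumes the convolution form (kernel evaluated at the scaled distance, divided by the bandwidth, integrated against the estimator's jumps), no ties, and an Epanechnikov kernel; the kernel choice, function names, bandwidth and data are all illustrative assumptions rather than the paper's choices.

```python
def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def estimator_jumps(observations):
    """Return (time, K-M jump, cumulative hazard jump) per ordered observation."""
    data = sorted(observations)  # assumes no ties among the observed times
    n = len(data)
    survival = 1.0
    jumps = []
    for i, (z, delta) in enumerate(data):
        at_risk = n - i
        hazard_jump = delta / at_risk
        density_jump = survival * hazard_jump  # mass the K-M estimator puts at z
        survival *= 1.0 - hazard_jump
        jumps.append((z, density_jump, hazard_jump))
    return jumps


def kernel_estimates(x, observations, bandwidth, kernel=epanechnikov):
    """Kernel smoothed density and hazard estimates at the point x."""
    density = 0.0
    hazard = 0.0
    for z, density_jump, hazard_jump in estimator_jumps(observations):
        weight = kernel((x - z) / bandwidth) / bandwidth
        density += weight * density_jump
        hazard += weight * hazard_jump
    return density, hazard


sample = [(1.0, 1), (2.0, 0), (3.0, 1), (4.0, 1)]  # hypothetical data
density_at_3, hazard_at_3 = kernel_estimates(3.0, sample, bandwidth=1.0)
print(density_at_3, hazard_at_3)  # 0.28125 0.375
```

With bandwidth 1.0 only the observation at time 3.0 receives positive kernel weight at $x = 3$, so the two estimates are the kernel value 0.75 times the respective jumps 0.375 and 0.5. Censored observations contribute no jump of their own but still shape the jumps at later event times.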
The estimators $f_n$ and $\lambda_n$ have attracted the attention of many investigators. For example, Mielniczuk (1986) investigated the kernel estimator of a density function using the K-M estimator for censored data. For censored data sampled from an $\alpha$-mixing sequence, Cai (1998) explored the uniform consistency (with rates) and the asymptotic normality of the kernel estimators for the density and hazard functions. Zhou (1999) established several asymptotic uniformly strong and weak representations for kernel estimators of the density function and the hazard function under left truncation. Antoniadis et al. (1999) proposed a wavelet method for estimating density and hazard rate functions from randomly right-censored data. For other results, one may refer to Diehl and Stute (1988), Gijbels and Wang (1993), Arcones and Giné (1995), Zhou and Yip (1999), Lemdani and Ould-Saïd (2007) and Shen and He (2008), among others.
To present our main results, define
[TABLE]
The main purpose of this paper is to study the asymptotic properties of the kernel smoothing density estimator and hazard estimator based on censored NA random variables. Under certain regularity conditions, we establish the strong asymptotic properties for the two estimators with the convergence rates , where will be defined in Section 2. Throughout the paper, the sequences of variables $\{X_i\}$ and $\{Y_i\}$ are all non-negative unless otherwise specified.
2 Main results and their proofs
We first present two lemmas (Wu and Chen, 2013) that will help to prove our theorems.
Lemma 1
Let and be two sequences of NA random variables. Suppose that the sequences and are independent. Then, for any
[TABLE]
and
[TABLE]
For positive real numbers and , write
[TABLE]
Lemma 2
Let and be two sequences of NA random variables. Suppose that the sequences and are independent. Then, for any
[TABLE]
and
[TABLE]
where .
Remark 1
Note that by the definition of ,
[TABLE]
Therefore, we can obtain by Lemma 2 that
[TABLE]
One can establish the following lemma by noting that and are both sequences of NA random variables according to Joag-Dev and Proschan (1983).
Lemma 3
Under the conditions of Lemma 1, for any there are
[TABLE]
and
[TABLE]
Theorem 1
Under the conditions of Lemma 1, assume that the kernel density has bounded variation on some finite interval with for , where . Suppose that density distribution and are bounded on the closed interval for some Then there is
[TABLE]
where the sequence satisfies .
Theorem 2
Under the conditions of Lemma 1, assume that the kernel density has bounded variation on some finite interval with for , where Suppose that density distribution and are bounded on the closed interval for some Then there is
[TABLE]
where the sequence satisfies .
Remark 2
Theorem 1 and Theorem 2 are key results in studying censored NA sequences, and they can be useful in deriving asymptotic properties of the kernel density estimator $f_n$ and the hazard function estimator $\lambda_n$, respectively. For example, if one can establish results similar to those in Hall (1981) for NA sequences, then by using Theorem 1, the following proposition will hold.
Proposition 1 Suppose that the sequence satisfies for , and
(a)
(b) in such a way that
[TABLE]
then there will be
[TABLE]
where is some functional for and .
For simplicity and without loss of generality, it can be assumed that in the following proof.
Proof of Theorem 1 According to Remark 1, can be expressed as
[TABLE]
Considering , we have
[TABLE]
Thus, we have the following formula
[TABLE]
Using the partial integration for , we get
[TABLE]
Note that the kernel function is zero outside the interval and the fact that is monotone. Then, when is large enough, by (2.5) and the definition of , applying the change of variable formula, we have
[TABLE]
and
[TABLE]
Again, note that
[TABLE]
Integrating by parts for , we have for ,
[TABLE]
Since density function and are bounded in the closed interval , which means that is also bounded in the interval , and hence
[TABLE]
where is some positive constant number.
Thus, combining equations (2.10)-(2.13), we have
[TABLE]
On the other hand, similar to the discussion of ,
[TABLE]
where
[TABLE]
and
[TABLE]
It can be obtained by (2.5) that
[TABLE]
As for term , note that is of bounded variation, it follows from Lemma 2 that
[TABLE]
Combining (2.9) and (2.14)-(2.16) completes the proof.
Proof of Theorem 2 Note by the strong asymptotic expression from (2.4),
[TABLE]
and similarly for the term , we have
[TABLE]
Then, following the proof of Theorem 1, we can also obtain Theorem 2. This completes the proof.
References
[1] Antoniadis, A., Grégoire, G., Nason, G. P. (1999) Density and hazard rate estimation for right-censored data by using wavelet methods. Journal of the Royal Statistical Society, Series B, 61(1), 63-84.
[2] Alam, K., Saxena, K. M. L. (1981) Positive dependence in multivariate distributions. Communications in Statistics - Theory and Methods, 10(12), 1183-1196.
[3] Arcones, M. A., Giné, E. (1995) On the law of the iterated logarithm for canonical U-statistics and processes. Stochastic Processes and Their Applications, 58(2), 217-245.
[4] Cai, Z. (1998) Asymptotic properties of Kaplan-Meier estimator for censored dependent data. Statistics and Probability Letters, 37(4), 381-389.
[5] Chen, X., Shi, J., Zhou, Y. (2015) Monotone rank estimation of transformation models with length-biased and right-censored data. Science China Mathematics, 58(10), 2055-2068.
[6] Diehl, S., Stute, W. (1988) Kernel density and hazard function estimation in the presence of censoring. Journal of Multivariate Analysis, 25(2), 299-310.
[7] Gijbels, I., Wang, J. (1993) Strong representations of the survival function estimator for truncated and censored data with applications. Journal of Multivariate Analysis, 47(2), 210-229.
[8] Joag-Dev, K., Proschan, F. (1983) Negative association of random variables with applications. The Annals of Statistics, 11(1), 286-295.
