Maximum Approximate Bernstein Likelihood Estimation in Proportional Hazard Model for Interval-Censored Data

Zhong Guan

arXiv:1906.08882 · stat.ME · December 25, 2020


TL;DR

This paper introduces a new Bernstein likelihood estimation method for proportional hazard models with interval-censored data, providing smoother survival estimates and improved regression coefficient accuracy.

Contribution

It proposes a maximum approximate Bernstein likelihood approach that enhances estimation of baseline density and regression coefficients in interval-censored survival analysis.

Findings

1. Faster convergence rate for survival function estimates
2. Better finite sample performance than existing methods
3. Effective real data application demonstration

Abstract

Maximum approximate Bernstein likelihood estimates of the baseline density function and the regression coefficients in proportional hazards regression models based on interval-censored event time data are proposed. This results not only in a smooth estimate of the survival function, which enjoys a faster convergence rate, but also in improved estimates of the regression coefficients. Simulation shows that the finite-sample performance of the proposed method is better than that of existing methods. The proposed method is illustrated by real data applications.

Figures: 4


Tables: 1

Table 1: Mean squared errors of the estimates of the regression coefficients using the semiparametric method (SP), the proposed method with $m=\tilde{m}$ (B1), the proposed method with $m=\hat{m}$ (B2), and the parametric method (P).

Method	$\gamma_{1}$, $n = 30$	$\gamma_{1}$, $n = 50$	$\gamma_{1}$, $n = 100$	$\gamma_{2}$, $n = 30$	$\gamma_{2}$, $n = 50$	$\gamma_{2}$, $n = 100$
SP	0.2799	0.1202	0.0467	0.1038	0.0478	0.0184
B1	0.2392	0.1095	0.0469	0.0883	0.0443	0.0175
B2	0.2380	0.1090	0.0461	0.0868	0.0439	0.0174
P	0.2184	0.0973	0.0437	0.0756	0.0389	0.0163

Equations: 301

$\Lambda(t|\bm{x})=-\log S(t|\bm{x}),\quad O(y|\bm{x})=\frac{S(y|\bm{x})}{1-S(y|\bm{x})},\quad\lambda(t|\bm{x})=\frac{d}{dt}\Lambda(t|\bm{x})=\frac{f(t|\bm{x})}{S(t|\bm{x})}.$

$S(t|\bm{x})=S(t|\bm{x};\bm{\gamma},f_{0})=S(t|\bm{x}_{0})^{\exp(\bm{\gamma}^{\top}\tilde{\bm{x}})},$

$f(t|\bm{x})=f(t|\bm{x};\bm{\gamma},f_{0})=\exp(\bm{\gamma}^{\top}\tilde{\bm{x}})\,S(t|\bm{x}_{0})^{\exp(\bm{\gamma}^{\top}\tilde{\bm{x}})-1}f(t|\bm{x}_{0}).$

$\ell(\bm{\gamma},f_{0};\bm{z})=\cdots+\delta\log\big[S(y_{1}|\bm{x}_{0})^{e^{\bm{\gamma}^{\top}\tilde{\bm{x}}}}-S(y_{2}|\bm{x}_{0})^{e^{\bm{\gamma}^{\top}\tilde{\bm{x}}}}\big].$

$f_{m}(t|\bm{x}_{0};\bm{p})=\begin{cases}\frac{1}{\tau_{n}}\sum_{i=0}^{m}p_{i}\beta_{mi}(t/\tau_{n}), & t\in[0,\tau_{n}];\\ p_{m+1}\alpha(t-\tau_{n}), & t\in(\tau_{n},\infty),\end{cases}$

$S_{m}(t|\bm{x}_{0};\bm{p})=\begin{cases}\sum_{i=0}^{m+1}p_{i}\bar{\mathcal{B}}_{mi}(t/\tau_{n}), & t\in[0,\tau_{n}];\\ p_{m+1}\bar{\mathcal{A}}(t-\tau_{n}), & t\in(\tau_{n},\infty).\end{cases}$

$S_{m}(t|\bm{x};\bm{\gamma},\bm{p})$ and $f_{m}(t|\bm{x};\bm{\gamma},\bm{p})$

$\bm{p}=\bm{p}(\bm{x}_{0})=(p_{0},\ldots,p_{m^{*}})\in\mathbb{S}_{m^{*}},\quad 0\leq p_{m+1}<1.$

$\ell_{m}(\bm{\gamma},\bm{p};\bm{z})=\cdots+\delta\log\big[S_{m}(y_{1}|\bm{x}_{0};\bm{p})^{e^{\bm{\gamma}^{\top}\tilde{\bm{x}}}}-S_{m}(y_{2}|\bm{x}_{0};\bm{p})^{e^{\bm{\gamma}^{\top}\tilde{\bm{x}}}}\big],$

$\ell_{m}(\bm{\gamma},\bm{p})=\sum_{i=1}^{n}\ell_{m}(\bm{\gamma},\bm{p};\bm{z}_{i}).$

$\hat{f}_{\mathrm{B}}(t|\bm{x})=f_{m}(t|\bm{x};\hat{\bm{\gamma}},\hat{\bm{p}}),\quad\hat{S}_{\mathrm{B}}(t|\bm{x})=S_{m}(t|\bm{x};\hat{\bm{\gamma}},\hat{\bm{p}}).$

$\frac{\partial\ell_{m}(\bm{\gamma},\bm{p};\bm{z})}{\partial\bm{p}}=\cdots$

$\Psi_{j}(\bm{\gamma},\bm{p};\bm{z})=\cdots\cdot\frac{S_{m}(y_{1}|\bm{x}_{0};\bm{p})^{e^{\bm{\gamma}^{\top}\tilde{\bm{x}}}-1}\bar{\mathcal{B}}_{mj}(y_{1})-S_{m}(y_{2}|\bm{x}_{0};\bm{p})^{e^{\bm{\gamma}^{\top}\tilde{\bm{x}}}-1}\bar{\mathcal{B}}_{mj}(y_{2})}{S_{m}(y_{1}|\bm{x}_{0};\bm{p})^{e^{\bm{\gamma}^{\top}\tilde{\bm{x}}}}-S_{m}(y_{2}|\bm{x}_{0};\bm{p})^{e^{\bm{\gamma}^{\top}\tilde{\bm{x}}}}}.$

$\lambda_{n}(\bm{\gamma}):=\sum_{i=1}^{n}e^{\bm{\gamma}^{\top}\tilde{\bm{x}}_{i}}\geq\cdots$

$\bar{\Psi}_{j}(\bm{\gamma},\bm{p})=\frac{1}{\lambda_{n}(\bm{\gamma})}\frac{\partial\ell_{m}}{\partial p_{j}}(\bm{\gamma},\bm{p})=\cdots$

$p_{j}^{[s+1]}$

$\tilde{f}_{\mathrm{B}}(t|\bm{x})=\exp(\tilde{\bm{\gamma}}^{\top}\tilde{\bm{x}})\,[S_{m}(t|\bm{x}_{0};\tilde{\bm{p}})]^{\exp(\tilde{\bm{\gamma}}^{\top}\tilde{\bm{x}})-1}f_{m}(t;\tilde{\bm{p}}),$

$\tilde{S}_{\mathrm{B}}(t|\bm{x})$

$p_{j}^{(s+1)}=\frac{p_{j}^{(s)}}{n}\sum_{i=1}^{n}\Psi_{j}(\bm{p}^{(s)};\bm{z}_{i}),\quad j\in\mathbb{I}_{0}^{m^{*}},$

$\Psi_{j}(\bm{p};\bm{z})=\frac{(1-\delta)\beta_{mj}(y)}{f_{m}(y;\bm{p})}+\delta\,\frac{\bar{\mathcal{B}}_{mj}(y_{1})-\bar{\mathcal{B}}_{mj}(y_{2})}{S_{m}(y_{1};\bm{p})-S_{m}(y_{2};\bm{p})},\quad j\in\mathbb{I}_{0}^{m^{*}},$

$R(m_{i})=k\log\Big(\frac{\ell_{k}-\ell_{0}}{k}\Big)-i\log\Big(\frac{\ell_{i}-\ell_{0}}{i}\Big)-(k-i)\log\Big(\frac{\ell_{k}-\ell_{i}}{k-i}\Big),\quad i\in\mathbb{I}_{1}^{k},$

$\frac{f_{m}(t|\bm{x}_{0};\bm{p}_{0})-f(t|\bm{x}_{0})}{f(t|\bm{x}_{0})}=O(m^{-\rho/2}),$

$\frac{f_{m}(t;\bm{p}_{0})-\varphi(t)}{\varphi(t)}=O(m^{-\rho/2}).$

$\mathrm{E}[O(Y|\bm{X})]=\int_{\mathcal{X}}\int_{0}^{\infty}O(y|\bm{x})\,dG_{1}(y|\bm{x})\,dH(\bm{x})<\infty.$

$\mathrm{E}[O(Y_{1}|\bm{X})S(Y_{1}|\bm{X})]=\int_{\mathcal{X}}\int_{0}^{\infty}O(y_{1}|\bm{x})S(y_{1}|\bm{x})\,dG_{21}(y_{1}|\bm{x})\,dH(\bm{x})<\infty,$

$\chi_{0}^{2}(\bm{p};\bm{x}_{0})$

$D_{0j}^{2}(\bm{\gamma},\bm{p};\bm{x}_{0})$

$D_{0}^{2}(\bm{p};\bm{x}_{0})$

$D_{1}^{2}(\bm{p};\bm{x}_{0})$


Taxonomy

Topics: Statistical Methods and Inference · Control Systems and Identification · Fuzzy Systems and Optimization

Full text


Department of Mathematical Sciences

Indiana University South Bend, USA



Key Words and Phrases: Approximate Likelihood, Bernstein Polynomial Model, Cox’s Proportional Hazard Regression Model, Density Estimation, Interval Censoring, Survival Curve.

1 Introduction

Traditionally, in semi- and nonparametric statistics, we approximate an unknown smooth distribution function by a step function and parameterize this infinite-dimensional parameter by the jump sizes of the step function at the observed values. The working model is therefore of finite but varying dimension. The resulting estimate is a step function and does not admit a density. This approach works well when the infinite-dimensional parameter is a nuisance parameter. However, when the survival, hazard, or density function is itself of interest, the traditional approach, which yields a jagged step-function estimate, is not satisfactory. This is especially true when the sample size is small, as is usually the case in survival analysis of rare diseases. Besides the roughness of the estimate, incompletely observed data make the unknown survival function difficult to parameterize. They also make the nonparametric maximum likelihood estimate hard to find, owing to the complication of assigning probabilities and to the large number of parameters (usually equal to the sample size) to be estimated. Moreover, the roughness of the estimate of the nonparametric component could reduce the accuracy of the estimates of the parameters in semiparametric models. Turnbull, (1976) presented an EM algorithm (Dempster et al.,, 1977) to compute the discrete nonparametric maximum likelihood estimate (NPMLE) of the distribution function from grouped, censored, and truncated data without covariates (see also Groeneboom and Wellner,, 1992). Finkelstein, (1986), Huang, (1996), Huang and Wellner, (1997), and Pan, (1999) generalized the method to obtain the semiparametric maximum likelihood estimate (SPMLE) of the survival function for models including Cox’s proportional hazards (PH) model. Finkelstein and Wolfe, (1985) proposed some semiparametric models for interval-censored data. 
Asymptotic results for some semiparametric models can be found in Huang and Wellner, (1997) and Schick and Yu, (2000), among others. With interval-censored data, the assignment of probabilities within the Turnbull intervals cannot be uniquely determined (Anderson-Bergman, 2017b). Groeneboom and Wellner, (1992) suggested an iterative convex minorant (ICM) algorithm, which was improved or generalized by Wellner and Zhan, (1997), Pan, (1999), and Anderson-Bergman, (2017a). Grouped failure time data have been studied by, among others, Prentice and Gloeckler, (1978) and Pierce et al., (1979). Unfortunately, the NPMLE or SPMLE of the survival function is a step function and may not be unique. Parametric models and kernel smoothing methods (Parzen,, 1962; Rosenblatt,, 1956) have been applied to obtain smooth estimators of the survival function (Lindsey,, 1998; Lindsey and Ryan,, 1998; Betensky et al.,, 1999). Another continuous estimator is due to Becker and Melbye, (1991), who assumed a piecewise constant intensity model. Carstensen, (1996) generalized this method to regression models by assuming a piecewise constant baseline rate.

Goetghebeur and Ryan, (2000) pointed out the relatively ad hoc nature of the procedures that many EM-like methods use to impute missing data. To avoid this problem, they proposed an approximate likelihood method that retains some of the appealing features of nonparametric smoothing methods, such as the regression spline smoothing of Kooperberg and Clarkson, (1998) and the local likelihood kernel smoothing of Betensky et al., (1999).

Nonparametric density estimation is rather difficult because a sample contains little information about the density (Bickel et al.,, 1998; Ibragimov and Khasminskii,, 1983). The kernel method is usually unsatisfactory when the sample size is small, even for complete data. Some authors have studied estimating the density function from censored data without covariates (see, for example, Braun et al.,, 2005; Harlass,, 2016, and the references therein).

A useful working statistical model must be finite-dimensional and must approximate the true underlying distribution (see page 1 of Bickel et al.,, 1998). Instead of approximating the underlying continuous distribution function by a step function, which is a multinomial probability model, Guan, (2016) suggested a Bernstein polynomial approximation (Bernstein,, 1912; Lorentz,, 1963), which is actually a mixture of specific beta distributions. This Bernstein polynomial model performs much better than the classical kernel method for density estimation, even from grouped data (Guan,, 2017). The maximum approximate Bernstein likelihood estimate can be viewed as a continuous version of the NPMLE or SPMLE. This paper proposes such estimates of the conditional survival and density functions given covariates, obtained by fitting Cox’s proportional hazards model to interval-censored data.

2 Methodology

2.1 Proportional Hazards Model

Let $T$ be an event time and $\bm{X}$ be an associated $d$ -dimensional covariate with distribution $H(\bm{x})$ on $\cal{X}$ . We denote the marginal and the conditional survival functions of $T$ , respectively, by $S(t)=\bar{F}(t)=1-F(t)=P(T>t)$ and $S(t|\bm{x})=\bar{F}(t|\bm{x})=1-F(t|\bm{x})=P(T>t|\bm{X}=\bm{x}).$ Let $f(t|\bm{x})$ denote the conditional density of a continuous $T$ given $\bm{X}=\bm{x}$ . The conditional cumulative hazard function, odds ratio, and hazard rate are, respectively,

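$\Lambda(t|\bm{x})=-\log S(t|\bm{x}),\quad O(y|\bm{x})=\frac{S(y|\bm{x})}{1-S(y|\bm{x})},\quad\lambda(t|\bm{x})=\frac{d}{dt}\Lambda(t|\bm{x})=\frac{f(t|\bm{x})}{S(t|\bm{x})}.$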

Consider Cox’s proportional hazards (PH) regression model (Cox,, 1972)

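$S(t|\bm{x})=S(t|\bm{x};\bm{\gamma},f_{0})=S(t|\bm{x}_{0})^{\exp(\bm{\gamma}^{\top}\tilde{\bm{x}})},\qquad(1)$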

where $\bm{\gamma}\in\Gamma\subset\mathbb{R}^{d}$ , $\tilde{\bm{x}}=\bm{x}-\bm{x}_{0}$ , $\bm{x}_{0}$ is any fixed covariate value, $f_{0}(\cdot)=f(\cdot|\bm{x}_{0})$ is the unknown baseline density and $S(\cdot|\bm{x}_{0})=\int_{\cdot}^{\infty}f(t|\bm{x}_{0})dt$ is the corresponding survival function. This is equivalent to

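$f(t|\bm{x})=f(t|\bm{x};\bm{\gamma},f_{0})=\exp(\bm{\gamma}^{\top}\tilde{\bm{x}})\,S(t|\bm{x}_{0})^{\exp(\bm{\gamma}^{\top}\tilde{\bm{x}})-1}f(t|\bm{x}_{0}).\qquad(2)$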

It is clear that (1) and (2) still hold if we change the “baseline” covariate $\bm{x}_{0}$ to any $\bm{x}_{0}^{*}\in\cal{X}$, keeping the same $\bm{\gamma}$ and replacing $\tilde{\bm{x}}$ by $\tilde{\bm{x}}^{*}=\bm{x}-\bm{x}_{0}^{*}$ . For a given $\bm{\gamma}\in\Gamma$ , define a $\bm{\gamma}$ -related “baseline” as an $\bm{x}_{\bm{\gamma}}\in\arg\min_{\bm{x}\in\mathcal{X}}\bm{\gamma}^{\!\mathrm{\scriptscriptstyle\top}\!}\bm{x}$ and denote $\tilde{\bm{x}}_{\bm{\gamma}}=\bm{x}-\bm{x}_{\bm{\gamma}}$ . Define $\tau=\inf\{t:F(t|\bm{x}_{0})=1\}$ . Note that $\tau$ does not depend on $\bm{x}_{0}$ , that $0<\tau\leq\infty$ , and that the densities $f(t|\bm{x})$ have the same support $[0,\tau]$ for all $\bm{x}\in\cal{X}$ . It is also clear that for any strictly increasing continuous function $\psi$ , $P(\psi(T)>t|\bm{x})=P(\psi(T)>t|\bm{x}_{0})^{\exp(\bm{\gamma}^{\!\mathrm{\scriptscriptstyle\top}\!}\tilde{\bm{x}})}$ . Thus the transformed event time $\psi(T)$ also satisfies the Cox model (1).
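The following Python sketch gives a quick numerical illustration of (1) and (2). It uses a hypothetical Weibull baseline, with shape and scale values chosen by us rather than taken from the paper, and checks that the hazard ratio $\lambda(t|\bm{x})/\lambda(t|\bm{x}_{0})$ equals $\exp(\bm{\gamma}^{\top}\tilde{\bm{x}})$ at several values of $t$. The helper names `s0`, `f0`, `surv`, `dens` and `hazard` are illustrative only.

```python
import math

# Hypothetical Weibull baseline (not from the paper): S(t|x0) = exp(-(t/SCALE)^SHAPE).
SHAPE, SCALE = 1.5, 2.0


def s0(t):
    """Baseline survival S(t|x0)."""
    return math.exp(-((t / SCALE) ** SHAPE))


def f0(t):
    """Baseline density f(t|x0) = -dS(t|x0)/dt."""
    return (SHAPE / SCALE) * (t / SCALE) ** (SHAPE - 1) * s0(t)


def surv(t, gamma, xt):
    """Model (1): S(t|x) = S(t|x0)^exp(gamma' x~), with xt = x - x0."""
    r = math.exp(sum(g * v for g, v in zip(gamma, xt)))
    return s0(t) ** r


def dens(t, gamma, xt):
    """Model (2): f(t|x) = exp(gamma' x~) S(t|x0)^(exp(gamma' x~) - 1) f(t|x0)."""
    r = math.exp(sum(g * v for g, v in zip(gamma, xt)))
    return r * s0(t) ** (r - 1) * f0(t)


def hazard(t, gamma, xt):
    """Hazard rate lambda(t|x) = f(t|x) / S(t|x)."""
    return dens(t, gamma, xt) / surv(t, gamma, xt)


gamma, xt = [0.7, -0.3], [1.0, 0.5]
ratios = [hazard(t, gamma, xt) / hazard(t, [0.0, 0.0], [0.0, 0.0]) for t in (0.5, 1.0, 3.0)]
print(ratios)  # every ratio is exp(0.7 * 1.0 - 0.3 * 0.5) = exp(0.55), up to rounding
```

Any other continuous baseline survival function gives the same constant ratio, which is the proportional hazards property.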

We consider the general situation in which the event time is subject to interval censoring. The observed data are $\bm{Z}=(\bm{Y},\bm{X},\Delta)$ , where $\bm{Y}=(Y_{1},Y_{2}]$ and $\Delta$ is the censoring indicator: $T=Y=Y_{1}=Y_{2}$ is uncensored if $\Delta=0$ , and $T\in\bm{Y}=(Y_{1},Y_{2}]$ , $0\leq Y_{1}<Y_{2}\leq\infty$ , is interval censored if $\Delta=1$ . The reader is referred to Huang and Wellner, (1997) for a review of interval censoring and further references. Right censoring ( $Y_{2}=\infty$ ) and left censoring ( $Y_{1}=0$ ) are included as special cases. Consider an individual observation $\bm{z}=(\bm{y},\bm{x},\delta)$ , where $\bm{y}=y=t$ if $\delta=0$ and $\bm{y}=(y_{1},y_{2}]\ni t$ , $0\leq y_{1}<y_{2}\leq\infty$ , if $\delta=1$ . Up to an additive term independent of $(\bm{\gamma},f_{0})$ , its full loglikelihood is

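$\ell(\bm{\gamma},f_{0};\bm{z})=(1-\delta)\big[\bm{\gamma}^{\top}\tilde{\bm{x}}+(e^{\bm{\gamma}^{\top}\tilde{\bm{x}}}-1)\log S(y|\bm{x}_{0})+\log f(y|\bm{x}_{0})\big]+\delta\log\big[S(y_{1}|\bm{x}_{0})^{e^{\bm{\gamma}^{\top}\tilde{\bm{x}}}}-S(y_{2}|\bm{x}_{0})^{e^{\bm{\gamma}^{\top}\tilde{\bm{x}}}}\big].\qquad(3)$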
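The per-observation loglikelihood (3) can be sketched as follows. For illustration only, the sketch assumes a known exponential baseline, where `RATE` is a hypothetical value. Under this assumption $T$ given $\bm{x}$ is again exponential with rate `RATE` $\cdot e^{\bm{\gamma}^{\top}\tilde{\bm{x}}}$, which gives an easy check of both branches.

```python
import math

RATE = 0.8  # hypothetical exponential baseline (not from the paper): S(t|x0) = exp(-RATE t)


def s0(t):
    """Baseline survival S(t|x0), with S(inf|x0) = 0."""
    return 0.0 if math.isinf(t) else math.exp(-RATE * t)


def f0(t):
    """Baseline density f(t|x0)."""
    return RATE * math.exp(-RATE * t)


def loglik_obs(gamma, xt, y, delta):
    """Contribution of one observation z = (y, x, delta) to the loglikelihood (3).

    delta == 0: y is the exact (uncensored) event time.
    delta == 1: y = (y1, y2) with 0 <= y1 < y2 <= inf; y2 = inf is right censoring
                and y1 = 0 is left censoring.
    xt is the centred covariate x - x0.
    """
    eta = sum(g * v for g, v in zip(gamma, xt))  # gamma' x~
    r = math.exp(eta)
    if delta == 0:
        # log f(y|x) = gamma' x~ + (e^{gamma' x~} - 1) log S(y|x0) + log f(y|x0)
        return eta + (r - 1.0) * math.log(s0(y)) + math.log(f0(y))
    y1, y2 = y
    # log[S(y1|x0)^r - S(y2|x0)^r] = log P(y1 < T <= y2 | x)
    return math.log(s0(y1) ** r - s0(y2) ** r)


print(loglik_obs([0.4], [1.0], 2.0, 0))              # log(RATE e^0.4) - RATE e^0.4 * 2.0
print(loglik_obs([0.4], [1.0], (1.5, math.inf), 1))  # -RATE e^0.4 * 1.5
```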

Let $(\bm{y}_{i},\bm{x}_{i},\delta_{i})$ , $i\in\mathbb{I}_{1}^{n}$ , be independent observations of $(\bm{Y},\bm{X},\Delta)$ ; here and in what follows, $\mathbb{I}_{m}^{n}=\{m,\ldots,n\}$ for any integers $m\leq n\leq\infty$ . Suppose $\tau$ is unknown or $\tau=\infty$ , and $\tau_{n}$ is at least the last finite observed time, i.e., $\tau_{n}\geq y_{(n)}=\max\{y_{i1},y_{j2}:y_{j2}<\infty;\,i,j\in\mathbb{I}_{1}^{n}\}$ . Then $[\tau_{n},\infty)$ is contained in the last Turnbull interval (Turnbull,, 1976). It is well known that if the last event time is right censored, then the distribution of $T$ is not “nonparametrically estimable” on $[\tau_{n},\infty)$ . Thus all finite observed times are in $[0,\tau_{n}]$ , and we can only estimate the truncated version of $f(t|\bm{x})$ on $[0,\tau_{n}]$ , $\bar{f}(t|\bm{x})=f(t|T\in[0,\tau_{n}],\bm{x})={f(t|\bm{x})}/{F(\tau_{n}|\bm{x})}$ , $t\in[0,\tau_{n}]$ . In many applications where the last observation is right censored, $\bar{f}(t|\bm{x})$ does not approximate $f(t|\bm{x})$ well, because $F(\tau_{n}|\bm{x})$ may not be close to one.
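The sketch below shows one way to compute $\tau_{n}=y_{(n)}$ and then rescale to $\tau_{n}=1$, as Section 2.2 does when it divides all observed times by $\tau_{n}$. The data layout and the function names are assumptions made for illustration. Each observation is a `(y, delta)` pair, and `math.inf` marks an open right end.

```python
import math


def tau_n(obs):
    """Last finite observed time y_(n) = max{y_i1, y_j2 : y_j2 < inf}.

    obs: list of (y, delta); y is a float when delta == 0 (exact time)
    and a pair (y1, y2), with y2 possibly math.inf, when delta == 1.
    """
    finite = []
    for y, delta in obs:
        if delta == 0:
            finite.append(y)
        else:
            y1, y2 = y
            finite.append(y1)
            if not math.isinf(y2):
                finite.append(y2)
    return max(finite)


def rescale(obs, tau):
    """Divide every observed time by tau so that tau_n = 1 (inf stays inf)."""
    return [(y / tau, 0) if delta == 0 else ((y[0] / tau, y[1] / tau), 1)
            for y, delta in obs]


obs = [(2.0, 0), ((1.0, 4.0), 1), ((5.0, math.inf), 1)]
t = tau_n(obs)  # 5.0: the left end of the right-censored interval is the largest finite time
print(t, rescale(obs, t))
```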

2.2 Approximate Bernstein Polynomial Model

The full likelihood (3) cannot be maximized without specifying $S(t|\bm{x}_{0})$ through a finite-dimensional model. The traditional method approximates $S(t|\bm{x}_{0})$ by a step function and treats the jumps at the observations as unknown parameters. For censored or other incompletely observed data this parametrization is difficult and complicated, whereas the Bernstein polynomial approximation makes the parametrization simple and much easier (Guan,, 2016, 2017). Given any $\bm{x}_{0}$ , we approximate the truncated density $\bar{f}(t|\bm{x}_{0})=f(t|\bm{x}_{0})/F(\tau_{n}|\bm{x}_{0})$ by $\bar{f}_{m}(t|\bm{x}_{0};\bar{\bm{p}})=\tau_{n}^{-1}\sum_{i=0}^{m}\bar{p}_{i}\beta_{mi}(t/\tau_{n})$ . This is a mixture of beta densities $\beta_{mi}$ with shape parameters $(i+1,m-i+1)$ , $i\in\mathbb{I}_{0}^{m}$ , and unknown mixing proportions $\bar{\bm{p}}=\bar{\bm{p}}(\bm{x}_{0})=(\bar{p}_{0},\ldots,\bar{p}_{m})$ . In what follows, the dependence of $\bar{\bm{p}}$ on $\bm{x}_{0}$ will be suppressed. The mixing proportions $\bar{\bm{p}}$ are subject to the constraints $\bar{\bm{p}}\in\mathbb{S}_{m}\equiv\{(u_{0},\ldots,u_{m})^{\!\mathrm{\scriptscriptstyle\top}\!}\in\mathbb{R}^{m+1}:u_{i}\geq 0,\sum_{i=0}^{m}u_{i}=1\}.$ Denote $\pi=\pi(\bm{x}_{0})=F(\tau_{n}|\bm{x}_{0})$ . Reparametrizing with $p_{i}=\pi\bar{p}_{i}$ , $i\in\mathbb{I}_{0}^{m}$ , we can approximate $f(t|\bm{x}_{0})$ on $[0,\tau_{n}]$ by $f_{m}(t|\bm{x}_{0};\bm{p})=\pi(\bm{x}_{0})\bar{f}_{m}(t|\bm{x}_{0};\bar{\bm{p}})=\frac{1}{\tau_{n}}\sum_{i=0}^{m}p_{i}\beta_{mi}(t/\tau_{n})$ . 
If $\pi<1$ , we neither need nor can estimate the values of $f(t|\bm{x}_{0})$ on $(\tau_{n},\infty)$ , but we can still assign them an arbitrary guess. One example is $f_{m}(t|\bm{x}_{0};\bm{p})=p_{m+1}\alpha(t-\tau_{n})$ , $t\in(\tau_{n},\infty)$ , where $p_{m+1}=1-\pi$ and $\alpha(\cdot)$ is a density on $[0,\infty)$ . Choosing $\alpha$ such that $(1-\pi)\alpha(0)=(m+1)p_{m}/\tau_{n}$ makes $f_{m}(t|\bm{x}_{0};\bm{p})$ continuous at $t=\tau_{n}$ , e.g., $\alpha(t)=\alpha(0)\exp[-\alpha(0)t]$ . Thus $f(t|\bm{x}_{0})$ and $S(t|\bm{x}_{0})$ on $[0,\infty)$ can be “approximated”, respectively, by

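$f_{m}(t|\bm{x}_{0};\bm{p})=\begin{cases}\frac{1}{\tau_{n}}\sum_{i=0}^{m}p_{i}\beta_{mi}(t/\tau_{n}), & t\in[0,\tau_{n}];\\ p_{m+1}\alpha(t-\tau_{n}), & t\in(\tau_{n},\infty),\end{cases}$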

and

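$S_{m}(t|\bm{x}_{0};\bm{p})=\begin{cases}\sum_{i=0}^{m+1}p_{i}\bar{\mathcal{B}}_{mi}(t/\tau_{n}), & t\in[0,\tau_{n}];\\ p_{m+1}\bar{\mathcal{A}}(t-\tau_{n}), & t\in(\tau_{n},\infty),\end{cases}$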

where $\bar{\mathcal{B}}_{mi}(t)=1-{\mathcal{B}}_{mi}(t)=1-\int_{0}^{t}\beta_{mi}(s)ds$ , $i\in\mathbb{I}_{0}^{m}$ , $\bar{\mathcal{B}}_{m,m+1}(t)\equiv 1$ , and $\bar{\mathcal{A}}(t)=\int_{t}^{\infty}\alpha(u)du$ . Thus we can approximate $S(t|\bm{x})$ and $f(t|\bm{x})$ on $[0,\tau_{n}]$ , respectively, by

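$S_{m}(t|\bm{x};\bm{\gamma},\bm{p})=S_{m}(t|\bm{x}_{0};\bm{p})^{\exp(\bm{\gamma}^{\top}\tilde{\bm{x}})},\quad f_{m}(t|\bm{x};\bm{\gamma},\bm{p})=\exp(\bm{\gamma}^{\top}\tilde{\bm{x}})\,S_{m}(t|\bm{x}_{0};\bm{p})^{\exp(\bm{\gamma}^{\top}\tilde{\bm{x}})-1}f_{m}(t|\bm{x}_{0};\bm{p}).$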
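The Bernstein baseline on $[0,\tau_{n}]$ with $\tau_{n}=1$, together with its PH extension, can be sketched in pure Python. This minimal sketch relies on the standard identity $\bar{\mathcal{B}}_{mi}(t)=P\{\mathrm{Binomial}(m+1,t)\leq i\}$ for the survival function of Beta$(i+1,m-i+1)$. It does not implement the tail density $\alpha$ on $(\tau_{n},\infty)$, and all function names are hypothetical.

```python
import math
from math import comb


def beta_mi(t, m, i):
    """Beta(i+1, m-i+1) density on [0, 1]: (m+1) C(m, i) t^i (1-t)^(m-i)."""
    return (m + 1) * comb(m, i) * t**i * (1 - t) ** (m - i)


def beta_bar_mi(t, m, i):
    """Survival 1 - B_mi(t) of Beta(i+1, m-i+1), i.e. P(Binomial(m+1, t) <= i)."""
    return sum(comb(m + 1, k) * t**k * (1 - t) ** (m + 1 - k) for k in range(i + 1))


def f_m(t, p, m):
    """Bernstein baseline density on [0, 1] (tau_n = 1): sum_{i<=m} p_i beta_mi(t)."""
    return sum(p[i] * beta_mi(t, m, i) for i in range(m + 1))


def s_m(t, p, m):
    """Baseline survival on [0, 1]: sum_{i<=m} p_i Bbar_mi(t) + p_{m+1}.

    p has m+1 entries (p_{m+1} = 0 specified) or m+2 entries (m* = m+1).
    """
    tail = p[m + 1] if len(p) > m + 1 else 0.0
    return sum(p[i] * beta_bar_mi(t, m, i) for i in range(m + 1)) + tail


def s_m_x(t, p, m, eta):
    """PH extension S_m(t|x; gamma, p) = S_m(t|x0; p)^exp(eta), eta = gamma' x~."""
    return s_m(t, p, m) ** math.exp(eta)


def f_m_x(t, p, m, eta):
    """f_m(t|x; gamma, p) = e^eta S_m(t|x0; p)^(e^eta - 1) f_m(t|x0; p)."""
    r = math.exp(eta)
    return r * s_m(t, p, m) ** (r - 1) * f_m(t, p, m)


m, p = 3, [0.2, 0.3, 0.1, 0.25, 0.15]   # the last entry is the tail mass p_{m+1}
print(s_m(0.0, p, m), s_m(1.0, p, m))    # 1.0 and p_{m+1} = 0.15, up to rounding
print(f_m(1.0, p, m))                    # (m+1) p_m = 1.0
```

The last line illustrates the continuity condition $(1-\pi)\alpha(0)=(m+1)p_{m}/\tau_{n}$. With $\tau_{n}=1$, the value $f_{m}(1)=(m+1)p_{m}$ is what the tail density must match at $t=\tau_{n}$.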

If $\tau$ is finite and known, we choose $\tau_{n}=\tau$ and specify $p_{m+1}=0$ . Otherwise, we choose $\tau_{n}=y_{(n)}$ . In this case, (3) shows that for data with neither right censoring nor covariates we have to specify $p_{m+1}=0$ , because $p_{m+1}$ is not identifiable. If $\tau_{n}\neq 1$ , we divide all observed times by $\tau_{n}$ ; thus we assume $\tau_{n}=1$ in what follows. We set $m^{*}=m$ if $p_{m+1}=0$ is specified and $m^{*}=m+1$ otherwise. Thus $\bm{p}=(p_{0},\ldots,p_{m^{*}})$ satisfies the constraints

$\bm{p}=\bm{p}(\bm{x}_{0})=(p_{0},\ldots,p_{m^{*}})\in\mathbb{S}_{m^{*}},\quad 0\leq p_{m+1}<1.\qquad(8)$

The loglikelihood $\ell(\bm{\gamma},f_{0};\bm{z})$ can be approximated by the Bernstein loglikelihood $\ell_{m}(\bm{\gamma},\bm{p};\bm{z})=\ell(\bm{\gamma},f_{m}(\cdot|\bm{x}_{0};\bm{p});\bm{z})$ , that is,

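$\ell_{m}(\bm{\gamma},\bm{p};\bm{z})=(1-\delta)\big[\bm{\gamma}^{\top}\tilde{\bm{x}}+(e^{\bm{\gamma}^{\top}\tilde{\bm{x}}}-1)\log S_{m}(y|\bm{x}_{0};\bm{p})+\log f_{m}(y|\bm{x}_{0};\bm{p})\big]+\delta\log\big[S_{m}(y_{1}|\bm{x}_{0};\bm{p})^{e^{\bm{\gamma}^{\top}\tilde{\bm{x}}}}-S_{m}(y_{2}|\bm{x}_{0};\bm{p})^{e^{\bm{\gamma}^{\top}\tilde{\bm{x}}}}\big],$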

where $S_{m}(\infty|\bm{x}_{0};\bm{p})=0$ . The loglikelihood $\ell(\bm{\gamma},f_{0})=\sum_{i=1}^{n}\ell(\bm{\gamma},f_{0};\bm{z}_{i})$ can be approximated by

$\ell_{m}(\bm{\gamma},\bm{p})=\sum_{i=1}^{n}\ell_{m}(\bm{\gamma},\bm{p};\bm{z}_{i}).$

For a given degree $m$ , if $(\hat{\bm{\gamma}},\hat{\bm{p}})$ maximizes $\ell_{m}(\bm{\gamma},\bm{p})$ subject to the constraints in (8) for some ${\bm{x}}_{0}$ , then $(\hat{\bm{\gamma}},\hat{\bm{p}})$ is called the maximum approximate Bernstein (or beta) likelihood estimator (MABLE) of $(\bm{\gamma},\bm{p})$ . This is a full likelihood method. The MABLEs of $f(t|\bm{x})$ and $S(t|\bm{x})$ are, respectively,

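$\hat{f}_{\mathrm{B}}(t|\bm{x})=f_{m}(t|\bm{x};\hat{\bm{\gamma}},\hat{\bm{p}}),\quad\hat{S}_{\mathrm{B}}(t|\bm{x})=S_{m}(t|\bm{x};\hat{\bm{\gamma}},\hat{\bm{p}}).$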
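The approximate loglikelihood $\ell_{m}(\bm{\gamma},\bm{p})$ that the MABLE maximizes can be evaluated with a sketch like the one below. The pure-Python Bernstein helpers are hypothetical. The data layout `(xt, y, delta)`, with `xt` $=\bm{x}-\bm{x}_{0}$ and times already divided by $\tau_{n}$, is our own convention. The sketch does not cover maximizing over $(\bm{\gamma},\bm{p})$ subject to (8).

```python
import math
from math import comb


def beta_mi(t, m, i):
    return (m + 1) * comb(m, i) * t**i * (1 - t) ** (m - i)


def beta_bar_mi(t, m, i):
    return sum(comb(m + 1, k) * t**k * (1 - t) ** (m + 1 - k) for k in range(i + 1))


def f_m(t, p, m):
    return sum(p[i] * beta_mi(t, m, i) for i in range(m + 1))


def s_m(t, p, m):
    if math.isinf(t):  # S_m(inf|x0; p) = 0
        return 0.0
    tail = p[m + 1] if len(p) > m + 1 else 0.0
    return sum(p[i] * beta_bar_mi(t, m, i) for i in range(m + 1)) + tail


def loglik_m(gamma, p, m, data):
    """ell_m(gamma, p) = sum_i ell_m(gamma, p; z_i) for the Bernstein PH model.

    data: list of (xt, y, delta); exact times are assumed to lie where
    S_m > 0 (for example strictly inside [0, 1) when p_{m+1} = 0).
    """
    total = 0.0
    for xt, y, delta in data:
        eta = sum(g * v for g, v in zip(gamma, xt))
        r = math.exp(eta)
        if delta == 0:
            total += eta + (r - 1) * math.log(s_m(y, p, m)) + math.log(f_m(y, p, m))
        else:
            y1, y2 = y
            total += math.log(s_m(y1, p, m) ** r - s_m(y2, p, m) ** r)
    return total


data = [([0.0], 0.5, 0), ([1.0], (0.2, 0.6), 1), ([0.5], (0.5, math.inf), 1)]
print(loglik_m([0.3], [0.2, 0.3, 0.5], 2, data))
```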

The derivative of $\ell_{m}(\bm{\gamma},\bm{p};\bm{z})$ with respect to $\bm{p}$ is

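$\frac{\partial\ell_{m}(\bm{\gamma},\bm{p};\bm{z})}{\partial\bm{p}}=\big(\Psi_{0}(\bm{\gamma},\bm{p};\bm{z}),\ldots,\Psi_{m^{*}}(\bm{\gamma},\bm{p};\bm{z})\big)^{\top},$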

where, for $j\in\mathbb{I}_{0}^{m^{*}}$ ,

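$\Psi_{j}(\bm{\gamma},\bm{p};\bm{z})=(1-\delta)\Big[\frac{\beta_{mj}(y)}{f_{m}(y|\bm{x}_{0};\bm{p})}+\big(e^{\bm{\gamma}^{\top}\tilde{\bm{x}}}-1\big)\frac{\bar{\mathcal{B}}_{mj}(y)}{S_{m}(y|\bm{x}_{0};\bm{p})}\Big]+\delta\,e^{\bm{\gamma}^{\top}\tilde{\bm{x}}}\cdot\frac{S_{m}(y_{1}|\bm{x}_{0};\bm{p})^{e^{\bm{\gamma}^{\top}\tilde{\bm{x}}}-1}\bar{\mathcal{B}}_{mj}(y_{1})-S_{m}(y_{2}|\bm{x}_{0};\bm{p})^{e^{\bm{\gamma}^{\top}\tilde{\bm{x}}}-1}\bar{\mathcal{B}}_{mj}(y_{2})}{S_{m}(y_{1}|\bm{x}_{0};\bm{p})^{e^{\bm{\gamma}^{\top}\tilde{\bm{x}}}}-S_{m}(y_{2}|\bm{x}_{0};\bm{p})^{e^{\bm{\gamma}^{\top}\tilde{\bm{x}}}}},$ with $\beta_{m,m+1}\equiv 0$ on $[0,1]$ and the $y_{2}$-term equal to zero when $y_{2}=\infty$ .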

Lemma 1.

The Hessian matrix $\bm{H}(\bm{\gamma},\bm{p})=\frac{\partial^{2}\ell_{m}(\bm{\gamma},\bm{p})}{\partial\bm{p}\partial\bm{p}^{\!\mathrm{\scriptscriptstyle\top}\!}}$ is nonpositive, i.e., all entries are nonpositive. For any fixed $\bm{\gamma}$ if ${\bm{\gamma}}^{\!\mathrm{\scriptscriptstyle\top}\!}\bm{x}_{0}\leq\min_{1\leq i\leq n}\{{{\bm{\gamma}}^{\!\mathrm{\scriptscriptstyle\top}\!}\bm{x}_{i}}\}$ then $\bm{H}(\bm{\gamma},\bm{p})$ is negative semi-definite for each $\bm{p}\in\mathbb{S}_{m^{*}}$ . If, in addition, the vectors $[\Psi_{j}(\bm{\gamma},\bm{p};\bm{z}_{1})$ , $\ldots,\Psi_{j}(\bm{\gamma},\bm{p};\bm{z}_{n})]$ , $j\in\mathbb{I}_{0}^{m^{*}}$ , are linearly independent, then $\bm{H}(\bm{\gamma},\bm{p})$ is negative definite.

Let $\tilde{\bm{p}}=\tilde{\bm{p}}(\bm{\gamma})=(\tilde{p}_{0},\ldots,\tilde{p}_{m^{*}})^{\!\mathrm{\scriptscriptstyle\top}\!}$ denote the maximizer of $\ell_{m}(\bm{\gamma},\bm{p})$ with respect to $\bm{p}=(p_{0},\ldots,p_{m^{*}})^{\!\mathrm{\scriptscriptstyle\top}\!}$ subject to constraints in (8).

Similar to Peters, Jr. and Walker, (1978) we have the following result about a necessary and sufficient condition for $\tilde{\bm{p}}$ .

Theorem 1.

For any fixed $\bm{\gamma}$ if ${\bm{\gamma}}^{\!\mathrm{\scriptscriptstyle\top}\!}\bm{x}_{0}\leq\min_{1\leq i\leq n}\{{{\bm{\gamma}}^{\!\mathrm{\scriptscriptstyle\top}\!}\bm{x}_{i}}\}$ then $\tilde{\bm{p}}=\tilde{\bm{p}}(\bm{\gamma})$ is a maximizer of $\ell_{m}(\bm{\gamma},\bm{p})$ if and only if

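$\lambda_{n}(\bm{\gamma}):=\sum_{i=1}^{n}e^{\bm{\gamma}^{\top}\tilde{\bm{x}}_{i}}\geq\frac{\partial\ell_{m}}{\partial p_{j}}(\bm{\gamma},\tilde{\bm{p}})$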

for all $j\in\mathbb{I}_{0}^{m^{*}}$ with equality if $\tilde{p}_{j}>0$ . If, in addition, the vectors $[\Psi_{j}(\bm{\gamma},\bm{p};\bm{z}_{1})$ , $\ldots,\Psi_{j}(\bm{\gamma},\bm{p};\bm{z}_{n})]$ , $j\in\mathbb{I}_{0}^{m^{*}}$ , are linearly independent for all $\bm{p}$ in the interior of $\mathbb{S}_{m^{*}}$ , then $\tilde{\bm{p}}$ is unique.

So it is necessary that $\tilde{p}_{j}=\tilde{p}_{j}\bar{\Psi}_{j}({\bm{\gamma}},\tilde{\bm{p}})$ , $j\in\mathbb{I}_{0}^{m^{*}}$ , where

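$\bar{\Psi}_{j}(\bm{\gamma},\bm{p})=\frac{1}{\lambda_{n}(\bm{\gamma})}\frac{\partial\ell_{m}}{\partial p_{j}}(\bm{\gamma},\bm{p}),\quad\lambda_{n}(\bm{\gamma})=\sum_{i=1}^{n}e^{\bm{\gamma}^{\top}\tilde{\bm{x}}_{i}}.$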

We have fixed-point iteration

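$p_{j}^{[s+1]}=p_{j}^{[s]}\,\bar{\Psi}_{j}(\bm{\gamma},\bm{p}^{[s]}),\quad j\in\mathbb{I}_{0}^{m^{*}},\ s=0,1,2,\ldots\qquad(13)$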

If ${\bm{\gamma}}^{\!\mathrm{\scriptscriptstyle\top}\!}\bm{x}_{0}\leq\min_{1\leq i\leq n}\{{{\bm{\gamma}}^{\!\mathrm{\scriptscriptstyle\top}\!}\bm{x}_{i}}\}$ then $\bar{\Psi}_{j}({\bm{\gamma}},{\bm{p}})\geq 0$ for all $j\in\mathbb{I}_{0}^{m^{*}}$ and $\bm{p}\in\mathbb{S}_{m^{*}}$ .
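A minimal Python sketch of the fixed-point iteration (13) for a fixed $\bm{\gamma}$ follows. The gradient $\partial\ell_{m}/\partial p_{j}$ is our own direct differentiation of $\ell_{m}$, an assumption rather than a copy of the paper's display. It satisfies $\sum_{j}p_{j}\,\partial\ell_{m}/\partial p_{j}=\lambda_{n}(\bm{\gamma})$, so each update keeps $\bm{p}$ on $\mathbb{S}_{m^{*}}$. The covariates are assumed centred at a baseline with $\bm{\gamma}^{\top}\tilde{\bm{x}}_{i}\geq 0$, as Theorem 2 requires. The data layout, helper names and fixed iteration count are hypothetical.

```python
import math
from math import comb


def beta_mi(t, m, j):
    """beta_mj(t); by convention beta_{m,m+1} = 0 on [0, 1] (tail component)."""
    if j > m:
        return 0.0
    return (m + 1) * comb(m, j) * t**j * (1 - t) ** (m - j)


def beta_bar_mi(t, m, j):
    """Bbar_mj(t); Bbar_{m,m+1} = 1 on [0, 1], and every Bbar_mj(inf) = 0."""
    if math.isinf(t):
        return 0.0
    if j > m:
        return 1.0
    return sum(comb(m + 1, k) * t**k * (1 - t) ** (m + 1 - k) for k in range(j + 1))


def f_m(t, p, m):
    return sum(pj * beta_mi(t, m, j) for j, pj in enumerate(p))


def s_m(t, p, m):
    return sum(pj * beta_bar_mi(t, m, j) for j, pj in enumerate(p))


def loglik_m(gamma, p, m, data):
    total = 0.0
    for xt, y, delta in data:
        eta = sum(a * b for a, b in zip(gamma, xt))
        r = math.exp(eta)
        if delta == 0:
            total += eta + (r - 1) * math.log(s_m(y, p, m)) + math.log(f_m(y, p, m))
        else:
            total += math.log(s_m(y[0], p, m) ** r - s_m(y[1], p, m) ** r)
    return total


def grad_p(gamma, p, m, data):
    """d ell_m / d p_j for j = 0..m*, by differentiating ell_m directly."""
    g = [0.0] * len(p)
    for xt, y, delta in data:
        r = math.exp(sum(a * b for a, b in zip(gamma, xt)))
        if delta == 0:
            fy, sy = f_m(y, p, m), s_m(y, p, m)
            for j in range(len(p)):
                g[j] += beta_mi(y, m, j) / fy
                if r != 1.0:
                    g[j] += (r - 1) * beta_bar_mi(y, m, j) / sy
        else:
            y1, y2 = y
            s1, s2 = s_m(y1, p, m), s_m(y2, p, m)
            denom = s1**r - s2**r
            for j in range(len(p)):
                num = s1 ** (r - 1) * beta_bar_mi(y1, m, j)
                if not math.isinf(y2):  # the y2 = inf term vanishes
                    num -= s2 ** (r - 1) * beta_bar_mi(y2, m, j)
                g[j] += r * num / denom
    return g


def fixed_point_p(gamma, m, m_star, data, iters=500):
    """p_j <- p_j * Psi_bar_j(gamma, p), Psi_bar_j = (d ell_m / d p_j) / lambda_n(gamma)."""
    lam = sum(math.exp(sum(a * b for a, b in zip(gamma, xt))) for xt, _, _ in data)
    p = [1.0 / (m_star + 1)] * (m_star + 1)  # interior starting point u_m
    for _ in range(iters):
        g = grad_p(gamma, p, m, data)
        p = [pj * gj / lam for pj, gj in zip(p, g)]
    return p


data = [([0.0], 0.3, 0), ([1.0], (0.1, 0.5), 1), ([0.5], (0.6, math.inf), 1),
        ([0.0], (0.2, 0.9), 1), ([1.0], 0.7, 0), ([0.5], (0.4, math.inf), 1)]
gamma = [0.5]  # gamma' x~_i >= 0 for every observation
p_tilde = fixed_point_p(gamma, m=2, m_star=3, data=data)
print(p_tilde, sum(p_tilde))
```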

Similar to the proof of Theorem 4 of Peters, Jr. and Walker, (1978) we can prove the convergence of ${\bm{p}}^{[s]}$ .

Theorem 2.

For any fixed $\bm{\gamma}$ suppose ${\bm{\gamma}}^{\!\mathrm{\scriptscriptstyle\top}\!}\bm{x}_{0}\leq\min_{1\leq i\leq n}\{{{\bm{\gamma}}^{\!\mathrm{\scriptscriptstyle\top}\!}\bm{x}_{i}}\}$ . If $\bm{p}^{[0]}$ is in the interior of $\mathbb{S}_{m^{*}}$ , the sequence $\{\bm{p}^{[s]}\}$ of (13) converges to $\tilde{\bm{p}}$ .

Define an empirical $\bm{\gamma}$ -related “baseline” $\hat{\bm{x}}_{0}=\hat{\bm{x}}_{0}(\bm{\gamma})$ such that ${\bm{\gamma}}^{\!\mathrm{\scriptscriptstyle\top}\!}\hat{\bm{x}}_{0}=\min_{1\leq i\leq n}\{{{\bm{\gamma}}^{\!\mathrm{\scriptscriptstyle\top}\!}\bm{x}_{i}}\}$ .

Lemma 2.

The matrix $\frac{\partial^{2}\ell_{m}(\bm{\gamma},\bm{p})}{\partial\bm{\gamma}\partial\bm{\gamma}^{\!\mathrm{\scriptscriptstyle\top}\!}}$ is negative definite.

Let $\tilde{\bm{\gamma}}$ be an efficient estimator of $\bm{\gamma}$ , such as the NPMLE or SPMLE. We choose ${\bm{x}}_{0}=\hat{\bm{x}}_{0}(\tilde{\bm{\gamma}})$ . Then we maximize $\ell_{m}(\tilde{\bm{\gamma}},\bm{p})$ to obtain $\tilde{\bm{p}}=\tilde{\bm{p}}(\tilde{\bm{\gamma}})$ . Therefore we can estimate $f(t|\bm{x})$ and $S(t|\bm{x})$ on $[0,1]$ , respectively, by

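$\tilde{f}_{\mathrm{B}}(t|\bm{x})=\exp(\tilde{\bm{\gamma}}^{\top}\tilde{\bm{x}})\,[S_{m}(t|\bm{x}_{0};\tilde{\bm{p}})]^{\exp(\tilde{\bm{\gamma}}^{\top}\tilde{\bm{x}})-1}f_{m}(t|\bm{x}_{0};\tilde{\bm{p}}),\quad\tilde{S}_{\mathrm{B}}(t|\bm{x})=[S_{m}(t|\bm{x}_{0};\tilde{\bm{p}})]^{\exp(\tilde{\bm{\gamma}}^{\top}\tilde{\bm{x}})}.$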

For the data without covariate, we have $\hat{\bm{\gamma}}=\bm{0}$ . Then we have $\hat{f}_{\mathrm{B}}(t)=f_{m}(t|\bm{x};\bm{0},\hat{\bm{p}})$ and $\hat{S}_{\mathrm{B}}(t)=S_{m}(t|\bm{x};\bm{0},\hat{\bm{p}})$ .

For the NPMLE or SPMLE $\tilde{\bm{\gamma}}$ of $\bm{\gamma}$ , the profile estimates $(\tilde{\bm{\gamma}},\tilde{\bm{p}})$ are close to $(\hat{\bm{\gamma}},\hat{\bm{p}})$ , especially for large sample sizes. Thus $(\tilde{\bm{\gamma}},\tilde{\bm{p}})$ can be used as initial values to find $(\hat{\bm{\gamma}},\hat{\bm{p}})$ by the following algorithm. Such a procedure was also suggested by Huang, (1996).

- Step 0:

Start with an initial guess $\bm{\gamma}^{(0)}$ of $\bm{\gamma}$ . Choose $\bm{x}_{0}^{(0)}=\hat{\bm{x}}_{0}(\bm{\gamma}^{(0)})$ . Use (13) with $\tilde{\bm{\gamma}}=\bm{\gamma}^{(0)}$ , $\bm{x}_{0}=\bm{x}_{0}^{(0)}$ , and starting point $\bm{p}^{[0]}=\bm{u}_{m}\equiv(1,\ldots,1)/(m^{*}+1)$ to get $\bm{p}^{(0)}=\tilde{\bm{p}}$ . Set $s=0$ .

Step 1:

Find the maximizer $\bm{\gamma}^{(s+1)}$ of $\ell_{m}(\bm{\gamma},\bm{p}^{(s)})$ using the Newton-Raphson method.

Step 2:

Choose $\bm{x}_{0}^{(s+1)}=\hat{\bm{x}}_{0}(\bm{\gamma}^{(s+1)})$ and $\tilde{\bm{\gamma}}=\bm{\gamma}^{(s+1)}$ . If $\tilde{\bm{\gamma}}^{\!\mathrm{\scriptscriptstyle\top}\!}\Delta\bm{x}_{0}\equiv\tilde{\bm{\gamma}}^{\!\mathrm{\scriptscriptstyle\top}\!}(\bm{x}_{0}^{({s+1})}-\bm{x}_{0}^{({s})})=0$ , set $\bm{p}^{[0]}=\bm{p}^{(s)}$ . Otherwise, set $p^{[0]}_{i}=C_{m}f_{m}(i/m|\bm{x}_{0}^{(s+1)};\tilde{\bm{\gamma}},\bm{p}^{(s)})$ , $i\in\mathbb{I}_{0}^{m}$ , and, if $m^{*}=m+1$ , $p^{[0]}_{m+1}=(p^{(s)}_{m+1})^{e^{\tilde{\bm{\gamma}}^{\!\mathrm{\scriptscriptstyle\top}\!}\Delta\bm{x}_{0}}}$ , where $C_{m}$ is chosen so that $\sum_{i=0}^{m}p^{[0]}_{i}=1-p^{[0]}_{m+1}$ . If the $\bm{p}^{[0]}$ so obtained is not in the interior of $\mathbb{S}_{m^{*}}$ , replace it by $(\bm{p}^{[0]}+\epsilon\bm{u}_{m})/(1+\epsilon)$ for a small $\epsilon>0$ . Then use (13) with $\bm{x}_{0}=\bm{x}_{0}^{({s+1})}$ and starting point $\bm{p}^{[0]}$ to get $\bm{p}^{(s+1)}=\tilde{\bm{p}}$ . Set $s=s+1$ .

Step 3:

Repeat Steps 1 and 2 until convergence. The final $\bm{\gamma}^{(s)}$ and $\bm{p}^{(s)}$ are taken as the MABLE $(\hat{\bm{\gamma}},\hat{\bm{p}})$ of $(\bm{\gamma},\bm{p})$ with baseline $\hat{\bm{x}}_{0}=\bm{x}_{0}^{(s)}$ .
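Several bookkeeping pieces of Steps 0 and 2 are simple enough to sketch directly:

- the empirical baseline $\hat{\bm{x}}_{0}(\bm{\gamma})$;
- the uniform start $\bm{u}_{m}$;
- the tail update $p^{[0]}_{m+1}=(p^{(s)}_{m+1})^{e^{\tilde{\bm{\gamma}}^{\top}\Delta\bm{x}_{0}}}$;
- the $\epsilon$-adjustment into the interior of $\mathbb{S}_{m^{*}}$.

This is a partial sketch with our own function names and default $\epsilon$. It does not reproduce the Newton-Raphson step, the density-based reweighting of $p^{[0]}_{i}$, or iteration (13).

```python
import math


def empirical_baseline(gamma, xs):
    """x0_hat(gamma): an observed covariate vector attaining min_i gamma' x_i."""
    return min(xs, key=lambda x: sum(g * v for g, v in zip(gamma, x)))


def uniform_start(m_star):
    """u_m = (1, ..., 1) / (m* + 1), the interior starting point used in Step 0."""
    return [1.0 / (m_star + 1)] * (m_star + 1)


def updated_tail(p_tail, gamma, x0_new, x0_old):
    """Step 2 tail mass: (p_{m+1}^(s))^exp(gamma' (x0_new - x0_old))."""
    d = sum(g * (a - b) for g, a, b in zip(gamma, x0_new, x0_old))
    return p_tail ** math.exp(d)


def make_interior(p, eps=1e-3):
    """If some p_j == 0 (boundary of the simplex), return (p + eps * u_m) / (1 + eps)."""
    if all(pj > 0 for pj in p):
        return list(p)
    u = uniform_start(len(p) - 1)
    return [(pj + eps * uj) / (1 + eps) for pj, uj in zip(p, u)]


xs = [[0.0, 0.0], [1.0, 2.0], [2.0, 0.0]]
print(empirical_baseline([1.0, -1.0], xs))  # [1.0, 2.0], since gamma' x = 0, -1, 2
print(make_interior([0.5, 0.5, 0.0]))        # strictly positive and still sums to 1
```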

The concavity of $\ell_{m}(\bm{\gamma},\bm{p})$ with respect to $\bm{\gamma}$ and to $\bm{p}$ ensures that the above iterative algorithm is a point-to-point map and that the solution set contains a single point. Convergence of $(\bm{\gamma}^{(s)},\bm{p}^{(s)})$ to $(\hat{\bm{\gamma}},\hat{\bm{p}})$ is guaranteed by the Global Convergence Theorem (Zangwill,, 1969).

2.2.1 Some Special Cases

Data Without Covariate: For interval-censored data without covariate, $\bm{z}_{i}=(\bm{y}_{i},\delta_{i})$ , $i\in\mathbb{I}_{1}^{n}$ . The iteration (13) reduces to

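$p_{j}^{(s+1)}=\frac{p_{j}^{(s)}}{n}\sum_{i=1}^{n}\Psi_{j}(\bm{p}^{(s)};\bm{z}_{i}),\quad j\in\mathbb{I}_{0}^{m^{*}},$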

where

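$\Psi_{j}(\bm{p};\bm{z})=\frac{(1-\delta)\beta_{mj}(y)}{f_{m}(y;\bm{p})}+\delta\,\frac{\bar{\mathcal{B}}_{mj}(y_{1})-\bar{\mathcal{B}}_{mj}(y_{2})}{S_{m}(y_{1};\bm{p})-S_{m}(y_{2};\bm{p})},\quad j\in\mathbb{I}_{0}^{m^{*}},$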

$f_{m}(t;\bm{p})=\sum_{j=0}^{m}p_{j}\beta_{mj}(t)$ , and $S_{m}(t;\bm{p})=\sum_{j=0}^{m^{*}}p_{j}\bar{\mathcal{B}}_{mj}(t)$ .
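For data without covariates, the no-covariate form of iteration (13) can be sketched as follows. In our interpretation, this update reads as an EM algorithm for a beta mixture with interval-censored observations, in which $p_{j}\Psi_{j}(\bm{p};\bm{z})$ is the posterior weight of component $j$. The loglikelihood should therefore not decrease, and the sketch records it so this can be checked. The data layout, the names and the iteration count are hypothetical. Times are assumed rescaled to $\tau_{n}=1$.

```python
import math
from math import comb


def beta_mi(t, m, j):
    """beta_mj(t) on [0, 1]; the tail component j = m+1 has no density there."""
    if j > m:
        return 0.0
    return (m + 1) * comb(m, j) * t**j * (1 - t) ** (m - j)


def beta_bar_mi(t, m, j):
    """Bbar_mj(t); Bbar_{m,m+1} = 1 on [0, 1] and Bbar_mj(inf) = 0."""
    if math.isinf(t):
        return 0.0
    if j > m:
        return 1.0
    return sum(comb(m + 1, k) * t**k * (1 - t) ** (m + 1 - k) for k in range(j + 1))


def fit_no_covariate(data, m, m_star, iters=200):
    """p_j <- (p_j / n) sum_i Psi_j(p; z_i) for data = [(y, delta), ...] on [0, 1]."""
    n = len(data)
    p = [1.0 / (m_star + 1)] * (m_star + 1)  # interior start u_m
    history = []                             # loglikelihood before each update
    for _ in range(iters):
        acc = [0.0] * len(p)
        ll = 0.0
        for y, delta in data:
            if delta == 0:                   # terms p_j beta_mj(y)
                w = [pj * beta_mi(y, m, j) for j, pj in enumerate(p)]
            else:                            # terms p_j [Bbar_mj(y1) - Bbar_mj(y2)]
                y1, y2 = y
                w = [pj * (beta_bar_mi(y1, m, j) - beta_bar_mi(y2, m, j))
                     for j, pj in enumerate(p)]
            total = sum(w)                   # f_m(y; p) or S_m(y1; p) - S_m(y2; p)
            ll += math.log(total)
            for j in range(len(p)):
                acc[j] += w[j] / total       # p_j * Psi_j(p; z)
        history.append(ll)
        p = [a / n for a in acc]
    return p, history


data = [(0.3, 0), (0.55, 0), ((0.1, 0.4), 1), ((0.2, 0.8), 1),
        ((0.6, math.inf), 1), ((0.5, math.inf), 1), (0.7, 0)]
p_hat, hist = fit_no_covariate(data, m=3, m_star=4)
print([round(v, 4) for v in p_hat], hist[-1])
```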

Two-Sample Data: When $\bm{x}=x$ is binary, with $x=1$ for cases and $x=0$ for controls, we have a two-sample PH model, which specifies $S(t|1)=[S(t|0)]^{\exp(\gamma)}$ . In this case, usually $\gamma\geq 0$ , so that $\Psi_{j}({\bm{\gamma}},{\bm{p}};\bm{z})$ is nonnegative for each $j$ . If $\gamma<0$ , we switch the case and control data.

2.3 Model Selection

The change-point method for model degree selection (Guan,, 2016) can be applied to find an optimal degree $m$ for a given regression model. Let $M=\{m_{0},\ldots,m_{k}\}$ , $m_{i}=m_{0}+i$ , $i\in\mathbb{I}_{0}^{k}$ . For each $i\in\mathbb{I}_{0}^{k}$ , fit the data to obtain $(\hat{\bm{\gamma}},\hat{\bm{p}})$ and $\ell_{i}=\ell_{m_{i}}(\hat{\bm{\gamma}},\hat{\bm{p}})$ . The optimal degree $m$ is the maximizer $\hat{m}$ of

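$R(m_{i})=k\log\Big(\frac{\ell_{k}-\ell_{0}}{k}\Big)-i\log\Big(\frac{\ell_{i}-\ell_{0}}{i}\Big)-(k-i)\log\Big(\frac{\ell_{k}-\ell_{i}}{k-i}\Big),\quad i\in\mathbb{I}_{1}^{k},$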

where $R(m_{k})=0$ . Alternatively, we can replace $\ell_{i}$ by $\ell_{m_{i}}(\tilde{\bm{\gamma}},\tilde{\bm{p}})$ where $\tilde{\bm{p}}=\tilde{\bm{p}}(\tilde{\bm{\gamma}})$ for a fixed efficient estimate $\tilde{\bm{\gamma}}$ for all $i$ . The resulting optimal degree is denoted by $\tilde{m}$ . Then using $m=\hat{m}$ or $m=\tilde{m}$ we obtain $(\hat{\bm{\gamma}},\hat{\bm{p}})$ .
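The change-point rule can be sketched as a short function. It takes the fitted loglikelihoods $\ell_{0},\ldots,\ell_{k}$ as given; producing them requires fitting the model at each candidate degree, which is not shown. The strictly increasing loglikelihood profile in the usage line is made up for illustration, and the function name is hypothetical.

```python
import math


def change_point_degree(degrees, logliks):
    """Return the degree maximising R(m_i), i = 1..k, with R(m_k) = 0.

    degrees: consecutive candidates m_0, ..., m_k (m_i = m_0 + i);
    logliks: fitted values ell_0 < ell_1 < ... < ell_k.
    """
    k = len(degrees) - 1
    l0, lk = logliks[0], logliks[k]
    r = []
    for i in range(1, k):
        li = logliks[i]
        r.append(k * math.log((lk - l0) / k)
                 - i * math.log((li - l0) / i)
                 - (k - i) * math.log((lk - li) / (k - i)))
    r.append(0.0)                             # R(m_k) = 0 by definition
    best = max(range(k), key=lambda j: r[j])  # r[j] corresponds to i = j + 1
    return degrees[best + 1], r


# Hypothetical loglikelihood profile that flattens after m = 5.
m_hat, r_vals = change_point_degree(list(range(2, 9)), [0, 10, 20, 30, 31, 32, 33])
print(m_hat)  # 5
```

The rule picks the degree at which the gains in $\ell_{i}$ change from steep to flat.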

3 Asymptotic Results

3.1 Some Assumptions and Conditions

The following assumptions are needed to develop asymptotic theory.

(A1).

The support $\mathcal{X}$ of covariate $\bm{X}$ is compact and for each $\bm{x}_{0}\in\mathcal{X}$ , $\mathrm{E}(\tilde{\bm{X}}\tilde{\bm{X}}^{\!\mathrm{\scriptscriptstyle\top}\!})$ is positive definite, where $\tilde{\bm{X}}=\bm{X}-\bm{x}_{0}$ .

(A2).

For each $\bm{x}_{0}\in\mathcal{X}$ and $\tau_{n}>0$ , there exist $f_{m}(t|\bm{x}_{0};{\bm{p}}_{0})$ and $\rho>0$ such that, uniformly in $t\in[0,\tau_{n}]$ ,

$\frac{f_{m}(t|\bm{x}_{0};\bm{p}_{0})-f(t|\bm{x}_{0})}{f(t|\bm{x}_{0})}=O(m^{-\rho/2}),$

where ${\bm{p}}_{0}=(p_{00},\ldots,p_{0m},p_{0,m+1})^{\!\mathrm{\scriptscriptstyle\top}\!}$ , $p_{0i}=\pi(\bm{x}_{0})\bar{p}_{0i}$ , $i\in\mathbb{I}_{0}^{m}$ , $p_{0,m+1}=1-\pi(\bm{x}_{0})=S(\tau_{n}|\bm{x}_{0})$ .

For any $\bm{\gamma}$ , the compactness of $\cal{X}$ ensures the existence of $\bm{x}_{\gamma}\in\arg\min\{\bm{\gamma}^{\!\mathrm{\scriptscriptstyle\top}\!}\bm{x}:\bm{x}\in\cal{X}\}$ . Boundedness of $\bm{X}$ is also assumed in the literature, e.g., (A3)(b) of Huang and Wellner, (1997). The positive definiteness of $\mathrm{E}(\tilde{\bm{X}}\tilde{\bm{X}}^{\!\mathrm{\scriptscriptstyle\top}\!})$ ensures the identifiability of $\bm{\gamma}$ .

Let $\mathcal{C}^{(r)}[0,1]$ be the class of functions that have a continuous $r$ th derivative $f^{(r)}$ on $[0,1]$ . A function $f$ is said to be $\alpha$ -Hölder continuous with $\alpha\in(0,1]$ if $|f(x)-f(y)|\leq C|x-y|^{\alpha}$ for all $x,y$ and some constant $C>0$ . We have the following result.

Lemma 3.

Suppose that $\varphi(t)=t^{a}(1-t)^{b}\varphi_{0}(t)$ is a density on $[0,1]$ , $a$ and $b$ are nonnegative integers, $\varphi_{0}\in\mathcal{C}^{(r)}[0,1]$ , $r\geq 0$ , $\varphi_{0}(t)\geq b_{0}>0$ , and $\varphi_{0}^{(r)}$ is $\alpha$ -Hölder continuous with $\alpha\in(0,1]$ . Then there exists $\bm{p}_{0}\in\mathbb{S}_{m}$ such that uniformly in $t\in[0,1]$ , with $\rho=r+\alpha$ ,

$\frac{f_{m}(t;\bm{p}_{0})-\varphi(t)}{\varphi(t)}=O(m^{-\rho/2}).$

This lemma was proved in Wang and Guan, (2019). It generalizes the result of Lorentz, (1963), which requires a positive lower bound for $\varphi$ , i.e., $a=b=0$ .

If $\varphi(t)=\tau_{n}\bar{f}(\tau_{n}t|\bm{x}_{0})=\tau_{n}f(\tau_{n}t|\bm{x}_{0})/\pi(\bm{x}_{0})$ , as a density on $[0,1]$ , fulfills the condition of Lemma 3, then assumption (A2) is fulfilled. The condition of Lemma 3 appears to be only sufficient, not necessary, for (A2).

In the following, all expectations $\mathrm{E}(\cdot)$ are taken with respect to the (joint) distribution of random variable(s) in upper case. The following are the conditions for cases considered in the asymptotic results.

(C0).

The event time $T$ is uncensored and $\tau_{n}=\tau<\infty$ .

(C1).

The event time $T$ is subject to Case 1 interval censoring. Given $\bm{X}=\bm{x}$ the inspection time $Y$ has cdf $G_{1}(\cdot|\bm{x})$ on $[\tau_{l},\tau_{u}]$ , $0<\tau_{l}<\tau_{u}=\tau_{n}<\tau\leq\infty$ , and

[TABLE]

(C2).

The event time $T$ is subject to Case $k$ ( $k\geq 2$ ) interval censoring. Given $\bm{X}=\bm{x}$ , the observed inspection times $\bm{Y}=(Y_{1},Y_{2})$ have joint cdf $G_{2}(\cdot,\cdot|\bm{x})$ on $\{\bm{y}=(y_{1},y_{2}):0<\tau_{l}\leq y_{1}\leq y_{2}\leq\tau_{u}\}$ , $\tau_{n}=\tau_{u}<\tau$ , and

[TABLE]

where $G_{21}$ is the marginal cdf of $Y_{1}$ .

The conditions on the support of the inspection times are similar to those of Huang and Wellner, (1997). The next theorem concerns the identifiability of the approximate model.

Theorem 3.

Suppose that $\bm{X}$ is almost surely linearly independent on $\mathcal{X}$ . Then for uncensored data both $\bm{\gamma}$ and $\bm{p}$ are identifiable. For censored data, if, in addition, the inspection time is continuous, then both $\bm{\gamma}$ and $\bm{p}$ are identifiable.

3.2 Some Statistical Distances

Under condition (C0), define statistical distances

[TABLE]

where $\bm{\gamma}_{0}$ is the true value of $\bm{\gamma}$ .

Under condition (C1), we define a weighted version of the Anderson and Darling, (1954) distance as

[TABLE]

Under condition (C2), we define

[TABLE]

In the following, the same symbols $C$ and $C^{\prime}$ may represent different constants in different places.

Theorem 4.

Let $(\hat{\bm{\gamma}},\hat{\bm{p}})$ be the MABLE of $(\bm{\gamma},\bm{p})$ with degree $m\geq Cn^{1/\rho}$ for some constant $C>0$ . Suppose that assumptions (A1) and (A2) are satisfied. For each $i=0,1,2$ and any $\epsilon\in(0,1/2)$ , under condition (C $i$ ), we have $\|\hat{\bm{\gamma}}-\bm{\gamma}_{0}\|^{2}\leq Cn^{-1+\epsilon}$ , a.s., and $D^{2}_{i}(\hat{\bm{p}};\hat{\bm{x}}_{0})\leq Cn^{-1+\epsilon}$ , a.s.

Theorem 5.

Suppose that assumptions (A1) and (A2) are satisfied. Let $\tilde{\bm{\gamma}}=\tilde{\bm{\gamma}}({\bm{p}}_{0})$ be the maximizer of $\ell_{m}(\bm{\gamma},\bm{p}_{0})$ for some $\bm{p}_{0}$ that satisfies (A2). For each $i=0,1,2$ , under condition (C $i$ ), $\sqrt{n}(\tilde{\bm{\gamma}}-\bm{\gamma}_{0})$ converges in distribution to $N(\bm{0},\mathcal{I}^{-1})$ as $n\to\infty$ , where $\bm{x}_{0}\in\arg\min_{\bm{x}\in\mathcal{X}}\bm{\gamma}_{0}^{\!\mathrm{\scriptscriptstyle\top}\!}\bm{x}$ , $\mathcal{I}=\mathrm{E}(\tilde{\bm{X}}\tilde{\bm{X}}^{\!\mathrm{\scriptscriptstyle\top}\!})$ under condition (C0); $\mathcal{I}=\mathrm{E}\{[O(Y|\bm{X})\Lambda^{2}(Y|\bm{X})]\tilde{\bm{X}}\tilde{\bm{X}}^{\!\mathrm{\scriptscriptstyle\top}\!}\}$ under condition (C1); and

[TABLE]

under condition (C2), where

[TABLE]

Remark 1.

For Cox’s maximum partial likelihood estimator $\hat{\bm{\gamma}}_{cox}$ from uncensored data, the information is

[TABLE]

By the law of total covariance

[TABLE]

with equality iff $\mathrm{E}(\bm{X}|T=t)$ is constant. So in this idealized situation, where $\bm{p}_{0}$ is known, the information $\mathcal{I}=\mathrm{E}(\tilde{\bm{X}}^{\otimes 2})\geq\mathcal{I}_{cox}$ for all $\bm{x}_{0}\in\mathcal{X}$ . More theoretical work is needed to assess the information loss due to the unknown $\bm{p}_{0}$ .
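The comparison can be spelled out in two steps, assuming $\tilde{\bm{x}}=\bm{x}-\bm{x}_{0}$ (consistent with the proof of Theorem 1, where $\bm{\gamma}^{\top}\tilde{\bm{x}}_{i}\geq 0$ when $\bm{x}_{0}$ minimizes $\bm{\gamma}^{\top}\bm{x}$ ) and reading $\geq$ in the positive semidefinite order. The first step holds for every $\bm{x}_{0}$ , which is why the conclusion holds for all $\bm{x}_{0}\in\mathcal{X}$ ; the second is the law of total covariance, with equality if and only if $\mathrm{E}(\bm{X}|T)$ is almost surely constant. How $\mathcal{I}_{cox}$ relates to $\mathrm{E}\{\mathrm{Cov}(\bm{X}|T)\}$ is given by the displays above and is not restated here.

```latex
\mathrm{E}(\tilde{\bm X}^{\otimes 2})
  = \mathrm{Cov}(\bm X) + \{\mathrm{E}(\bm X)-\bm x_{0}\}^{\otimes 2}
  \;\geq\; \mathrm{Cov}(\bm X)
  = \mathrm{E}\{\mathrm{Cov}(\bm X\mid T)\} + \mathrm{Cov}\{\mathrm{E}(\bm X\mid T)\}
  \;\geq\; \mathrm{E}\{\mathrm{Cov}(\bm X\mid T)\}.
```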

Although $\bm{p}_{0}$ is unknown, $\ell_{m}(\bm{\gamma},\bm{p})$ depends on $\bm{p}$ through $f_{m}(\cdot|\bm{x}_{0};\bm{p})$ , and $f_{m}(\cdot|\bm{x}_{0};\bm{p}_{0})\approx f_{m}(\cdot|\bm{x}_{0};\hat{\bm{p}})$ ; hence $\hat{\bm{\gamma}}\approx\tilde{\bm{\gamma}}$ . With $\bm{x}_{0}=\hat{\bm{x}}_{0}$ , we can estimate the information $\mathcal{I}$ by

[TABLE]
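Because the display defining the estimator of $\mathcal{I}$ is not reproduced above, the following NumPy sketch covers only the uncensored case (C0), where Theorem 5 gives $\mathcal{I}=\mathrm{E}(\tilde{\bm{X}}\tilde{\bm{X}}^{\top})$ . It is a plug-in moment estimate under stated assumptions: $\tilde{\bm{x}}=\bm{x}-\hat{\bm{x}}_{0}$ , $\hat{\bm{x}}_{0}$ is taken as the sample minimizer of $\hat{\bm{\gamma}}^{\top}\bm{x}$ , the expectation is replaced by a sample average, and $\hat{\bm{\gamma}}\approx\tilde{\bm{\gamma}}$ is used to read off standard errors from $\sqrt{n}(\tilde{\bm{\gamma}}-\bm{\gamma}_{0})\to N(\bm{0},\mathcal{I}^{-1})$ . The function name `information_c0` is hypothetical.

```python
import numpy as np


def information_c0(X: np.ndarray, gamma_hat: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical plug-in estimate of I = E(X_tilde X_tilde^T) under condition (C0).

    X: (n, d) covariate matrix; gamma_hat: (d,) estimated regression coefficients.
    Returns (I_hat, standard errors of gamma_hat).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    x0_hat = X[np.argmin(X @ gamma_hat)]   # sample minimizer of gamma_hat^T x (assumption)
    X_tilde = X - x0_hat                   # assumed definition x_tilde = x - x0
    I_hat = X_tilde.T @ X_tilde / n        # sample version of E(X_tilde X_tilde^T)
    # sqrt(n)(gamma - gamma0) ~ N(0, I^{-1})  =>  Var(gamma_hat) ~ I^{-1} / n
    se = np.sqrt(np.diag(np.linalg.inv(I_hat)) / n)
    return I_hat, se


X_demo = np.array([[0.0], [0.0], [1.0], [1.0]])  # toy covariates, not data from the paper
print(information_c0(X_demo, np.array([-0.5])))
```

Under (C1) and (C2) the information carries the extra weights shown in Theorem 5, so this sketch should not be used for censored data as is.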

4 Simulation

Assume that, given $\bm{X}=\bm{x}$ , $T$ is Weibull $W(\theta,\sigma e^{-\bm{\gamma}^{\!\mathrm{\scriptscriptstyle\top}\!}\bm{x}/\theta})$ , so that the baseline ( $\bm{x}=\bm{0}$ ) distribution is $W(\theta,\sigma)$ with shape $\theta=2$ and scale $\sigma=2$ . The function simIC_weib() of the R package icenReg (Anderson-Bergman, 2017b) was used to generate interval-censored data of sizes $n=30,50,100$ from Weibull distributions with censoring probability 70%. For data with covariates, $\bm{X}=(X_{1},X_{2})$ , where $X_{1}$ and $X_{2}$ are independent, $X_{1}$ is uniform on $[-1,1]$ , and $X_{2}$ takes the values $\pm 1$ with equal probability; the coefficients are $\gamma_{1}=0.5$ and $\gamma_{2}=-0.5$ . For data without covariates, the kernel density estimator of Braun et al., (2005), implemented in the R package ICE, was used. In each case, 1000 samples were generated and used to estimate $\bm{\gamma}$ , $f(\cdot|\bm{0})$ , and $S(\cdot|\bm{0})$ on $[0,7]$ . If $\tau_{n}=y_{(n)}<7$ , we use the exponential $\alpha(\cdot)$ on $(\tau_{n},7)$ as in (4) and (5).
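The data-generating model above can be sketched in NumPy for the event times only. Since a Weibull variate with shape $\theta$ and scale $\lambda$ has $S(t)=\exp\{-(t/\lambda)^{\theta}\}$ , the scale $\lambda=\sigma e^{-\bm{\gamma}^{\top}\bm{x}/\theta}$ gives $S(t|\bm{x})=S(t|\bm{0})^{\exp(\bm{\gamma}^{\top}\bm{x})}$ , a proportional hazards model. This sketch does not call or reimplement icenReg's simIC_weib() and does not reproduce its censoring mechanism; the function names are hypothetical.

```python
import numpy as np


def simulate_weibull_ph(n, gamma, theta=2.0, sigma=2.0, rng=None):
    """Draw X and T with T | X=x ~ W(theta, sigma * exp(-gamma^T x / theta)).

    X1 ~ Uniform[-1, 1] and X2 = -1 or +1 with probability 1/2 each, independently.
    Only event times are generated; no censoring is applied here.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.column_stack([rng.uniform(-1.0, 1.0, n), rng.choice([-1.0, 1.0], n)])
    scale = sigma * np.exp(-(X @ gamma) / theta)
    # Generator.weibull(a) returns (-log U)^(1/a): a Weibull variate with shape a, scale 1.
    T = scale * rng.weibull(theta, n)
    return X, T


def true_survival(t, X, gamma, theta=2.0, sigma=2.0):
    """S(t | x) = exp{-(t/sigma)^theta * exp(gamma^T x)} = S(t | 0) ** exp(gamma^T x)."""
    return np.exp(-((t / sigma) ** theta) * np.exp(X @ gamma))


rng = np.random.default_rng(2020)
gamma = np.array([0.5, -0.5])
X, T = simulate_weibull_ph(100, gamma, rng=rng)
print(X[:3], T[:3])
```

Interval-censored observations would be obtained by adding an inspection-time scheme on top of these event times, which is deliberately left out.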

The simulation results on the estimation of the regression coefficients are summarized in Table 1. The pointwise mean squared errors of the estimated survival functions are plotted in Figure 1. Since the proposed $\hat{S}_{\mathrm{B}}$ has smaller variance than the discrete SPMLE, especially when the sample size is small, the new estimator $\hat{\bm{\gamma}}$ may have a smaller standard deviation than the traditional one. This is confirmed by the simulation. These results show that the proposed estimates of the $\gamma$ ’s are better than the semiparametric estimates (the only exception in Table 1 is B1 for $\gamma_{1}$ with $n=100$ , where the two are nearly equal) and are close to the parametric maximum likelihood estimates (PMLEs), especially for small samples. The two proposed estimates using $m=\tilde{m}$ and $m=\hat{m}$ are very close. The proposed method is also compared with the kernel smoothing method of Braun et al., (2005) (see the right panels of Figure 1). The overall performance of the proposed method is close to that of the PMLE, increasingly so as the sample size grows, and much better than that of the NPMLE and the kernel estimates.
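The gains in Table 1 are easier to read as relative efficiencies $\mathrm{MSE}(\mathrm{SP})/\mathrm{MSE}(\mathrm{B})$ , where values above 1 favour the proposed method. The numbers below are copied from Table 1; the script is only a convenience and adds no new results.

```python
# Mean squared errors copied from Table 1, for sample sizes n = 30, 50, 100.
SIZES = [30, 50, 100]
mse = {
    "gamma1": {"SP": [0.2799, 0.1202, 0.0467], "B1": [0.2392, 0.1095, 0.0469], "B2": [0.2380, 0.1090, 0.0461]},
    "gamma2": {"SP": [0.1038, 0.0478, 0.0184], "B1": [0.0883, 0.0443, 0.0175], "B2": [0.0868, 0.0439, 0.0174]},
}


def efficiency(coef: str, method: str) -> list[float]:
    """Relative efficiency MSE(SP) / MSE(method) for each sample size."""
    return [sp / b for sp, b in zip(mse[coef]["SP"], mse[coef][method])]


for coef in mse:
    for method in ("B1", "B2"):
        ratios = efficiency(coef, method)
        print(coef, method, {n: round(r, 3) for n, r in zip(SIZES, ratios)})
```

The ratios are largest at $n=30$ (about 1.17 to 1.20) and shrink towards 1 as $n$ grows, which matches the remark that the advantage is most visible for small samples.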

5 Examples

5.1 Gentleman and Geyer, (1994)’s Example

Gentleman and Geyer, (1994) gave an artificial data set to show that Turnbull’s nonparametric maximum likelihood estimator $\hat{F}(t)$ exists but that Turnbull’s self-consistency algorithm has two fixed points. The data consist of six intervals (0, 1), (0, 2), (0, 2), (1, 3), (1, 3), (2, 3). Since there is no right-censored event time, $p_{m+1}=0$ . Choosing $\tau_{n}=3$ , the transformed intervals $(y_{i1},y_{i2})$ are $(0,1/3),(0,2/3),(0,2/3),(1/3,1),(1/3,1),(2/3,1).$ Let $q_{1}(\bm{p})=\sum_{j=0}^{m}p_{j}\mathcal{B}_{mj}(1/3)$ and $q_{2}(\bm{p})=\sum_{j=0}^{m}p_{j}\mathcal{B}_{mj}(2/3)$ , where $\bm{p}=(p_{0},p_{1},\ldots,p_{m})$ . The log-likelihood is $\ell_{m}(\bm{p})=\ell(q_{1},q_{2})=\log q_{1}+2\log q_{2}+2\log(1-q_{1})+\log(1-q_{2}),$ which attains its maximum $-3.819085$ at $(q_{1},q_{2})=(1/3,2/3)$ . So $\ell_{m}(\bm{p})$ is maximized whenever $q_{1}=\sum_{j=0}^{m}p_{j}\mathcal{B}_{mj}(1/3)=1/3$ and $q_{2}=\sum_{j=0}^{m}p_{j}\mathcal{B}_{mj}(2/3)=2/3$ .
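The maximization claim can be checked numerically. The sketch grid-searches $\ell(q_{1},q_{2})$ over $0<q_{1}\leq q_{2}<1$ and then evaluates $q_{1}$ and $q_{2}$ at the uniform $\bm{p}$ with $m=6$ . The helper `beta_cdf_int` encodes our assumption that $\mathcal{B}_{mj}$ is the Beta $(j+1,m-j+1)$ cdf, which for integer parameters equals $P\{\mathrm{Binomial}(m+1,x)\geq j+1\}$ ; the paper's own definition appears earlier in the paper and is not restated here.

```python
from math import comb, log


def loglik(q1: float, q2: float) -> float:
    """l(q1, q2) = log q1 + 2 log q2 + 2 log(1 - q1) + log(1 - q2)."""
    return log(q1) + 2 * log(q2) + 2 * log(1 - q1) + log(1 - q2)


# Grid with step 1/300 over 0 < q1 <= q2 < 1, so (1/3, 2/3) = (100/300, 200/300) is on it.
N = 300
best = max((loglik(a / N, b / N), a, b) for a in range(1, N) for b in range(a, N))
print(best)


def beta_cdf_int(m: int, j: int, x: float) -> float:
    """Assumed B_mj(x): Beta(j+1, m-j+1) cdf = P(Binomial(m+1, x) >= j+1)."""
    return sum(comb(m + 1, k) * x**k * (1 - x) ** (m + 1 - k) for k in range(j + 1, m + 2))


m = 6
p_uniform = [1 / (m + 1)] * (m + 1)
q1 = sum(pj * beta_cdf_int(m, j, 1 / 3) for j, pj in enumerate(p_uniform))
q2 = sum(pj * beta_cdf_int(m, j, 2 / 3) for j, pj in enumerate(p_uniform))
print(q1, q2)
```

Under this assumption the uniform $\bm{p}$ gives $q_{1}=1/3$ and $q_{2}=2/3$ exactly, because the equally weighted mixture of Beta $(j+1,m-j+1)$ , $j=0,\ldots,m$ , is the uniform distribution on $[0,1]$ ; this is consistent with the statement that the MABLE is uniform when $m=1,2$ .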

For this artificial dataset, the MABLE of $\bm{p}$ is unique and uniform if $m=1,2$ but not unique if $m\geq 3$ . Figure 2 shows the NPMLE of $S(t)$ and the MABLEs of $S(t)$ and $f(t)$ when $m=6$ with different starting points $\bm{p}_{1}^{[0]}=(1,2,\ldots,7)/28$ , $\bm{p}_{2}^{[0]}=(1,1,\ldots,1)/7$ , and $\bm{p}_{3}^{[0]}=(1,2,3,4,3,2,1)/16$ . Although the MABLE $\hat{\bm{p}}$ is not unique, as shown in Figure 2, the resulting estimated survival functions are almost identical. A kernel density estimate for this dataset was discussed in Braun et al., (2005).
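The non-uniqueness for $m=6$ can be seen by running a fixed-point iteration from the three starting points $\bm{p}_{1}^{[0]},\bm{p}_{2}^{[0]},\bm{p}_{3}^{[0]}$ above. With no covariate, the log-likelihood is $\sum_{i}\log(\bm{a}_{i}^{\top}\bm{p})$ with $a_{ij}=\mathcal{B}_{mj}(y_{i2})-\mathcal{B}_{mj}(y_{i1})$ , and the sketch uses the textbook EM update for mixture weights, $p_{j}\leftarrow p_{j}n^{-1}\sum_{i}a_{ij}/(\bm{a}_{i}^{\top}\bm{p})$ . Whether this coincides with the iteration analysed in Theorem 2 is our assumption, and $\mathcal{B}_{mj}$ is again assumed to be the Beta $(j+1,m-j+1)$ cdf.

```python
from math import comb, log

M = 6  # Bernstein degree used for Figure 2
# Gentleman-Geyer intervals divided by tau_n = 3.
INTERVALS = [(0, 1 / 3), (0, 2 / 3), (0, 2 / 3), (1 / 3, 1), (1 / 3, 1), (2 / 3, 1)]


def beta_cdf_int(m: int, j: int, x: float) -> float:
    """Assumed B_mj(x): Beta(j+1, m-j+1) cdf = P(Binomial(m+1, x) >= j+1)."""
    return sum(comb(m + 1, k) * x**k * (1 - x) ** (m + 1 - k) for k in range(j + 1, m + 2))


# A[i][j] = B_mj(right_i) - B_mj(left_i): mass that component j puts on interval i.
A = [[beta_cdf_int(M, j, right) - beta_cdf_int(M, j, left) for j in range(M + 1)]
     for left, right in INTERVALS]


def loglik(p: list[float]) -> float:
    return sum(log(sum(a * pj for a, pj in zip(row, p))) for row in A)


def em(p: list[float], iters: int = 3000) -> list[float]:
    """EM for mixture weights: p_j <- (p_j / n) * sum_i A[i][j] / (A[i] . p)."""
    n = len(A)
    for _ in range(iters):
        lik = [sum(a * pj for a, pj in zip(row, p)) for row in A]
        p = [pj * sum(row[j] / li for row, li in zip(A, lik)) / n for j, pj in enumerate(p)]
    return p


starts = {
    "p1": [k / 28 for k in range(1, 8)],
    "p2": [1 / 7] * 7,
    "p3": [k / 16 for k in (1, 2, 3, 4, 3, 2, 1)],
}
fits = {name: em(p0) for name, p0 in starts.items()}
for name, p in fits.items():
    print(name, [round(x, 4) for x in p], round(loglik(p), 6))
```

All three runs reach the same maximum log-likelihood while the fitted weight vectors differ (the uniform start is already a fixed point), which is the non-uniqueness described above.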

5.2 Stanford Heart Transplant Data

To illustrate the use of the proposed method for right-censored data with a binary covariate, we use the Stanford Heart Transplant data, which are available in the R package survival. More information about this dataset can be found in Crowley and Hu, (1977). We choose $X$ , the indicator of prior bypass surgery, as the covariate and $\tau_{n}=y_{(n)}=1799$ . Cox’s partial likelihood estimate of $\gamma$ is $\tilde{\gamma}=-0.74072$ (s.e. 0.3591). With $\gamma=\tilde{\gamma}$ fixed, the estimated degree is $\tilde{m}=14$ . The MABLE of $\bm{p}$ is $\tilde{\bm{p}}=(\tilde{p}_{0},\ldots,\tilde{p}_{15})^{\!\mathrm{\scriptscriptstyle\top}\!}$ , where $\tilde{p}_{0}=0.470490$ , $\tilde{p}_{6}=1.3256\times 10^{-6}$ , $\tilde{p}_{7}=0.151148$ , $\tilde{p}_{8}=2.7997\times 10^{-5}$ , $\tilde{p}_{10}=1.1001\times 10^{-7}$ , $\tilde{p}_{11}=0.038977$ , $\tilde{p}_{15}=1-\tilde{\pi}=0.339359$ , and all the other $\tilde{p}_{i}$ ’s are smaller than $10^{-9}$ . Then we obtain

[TABLE]
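Since the display above is not reproduced, the following sketch shows how survival curves could be evaluated from the reported $\tilde{\bm{p}}$ under explicit assumptions: $S_{m}(t|\bm{x}_{0};\bm{p})=\sum_{j=0}^{m}p_{j}\{1-\mathcal{B}_{mj}(t/\tau_{n})\}+p_{m+1}$ on $[0,\tau_{n}]$ with $\mathcal{B}_{mj}$ the Beta $(j+1,m-j+1)$ cdf, and the proportional hazards relation $S(t|x)=S(t|x_{0})^{\exp\{\gamma(x-x_{0})\}}$ . The baseline is $x_{0}=1$ because $\tilde{\gamma}<0$ makes $x=1$ the minimizer of $\tilde{\gamma}x$ over $\{0,1\}$ , in line with the label $\hat{S}_{\mathrm{B}}(t|\bm{x}=1)$ . Weights reported as below $10^{-9}$ are set to zero, and the exponential tail extension beyond $\tau_{n}$ is not implemented.

```python
from math import comb, exp

M, TAU, GAMMA, X0 = 14, 1799.0, -0.74072, 1
# p_0, ..., p_{m+1} from the text; entries reported as < 1e-9 are set to zero.
P = [0.0] * (M + 2)
P[0], P[6], P[7], P[8] = 0.470490, 1.3256e-6, 0.151148, 2.7997e-5
P[10], P[11], P[15] = 1.1001e-7, 0.038977, 0.339359  # P[15] = 1 - pi_tilde


def beta_cdf_int(m: int, j: int, x: float) -> float:
    """Assumed B_mj(x): Beta(j+1, m-j+1) cdf = P(Binomial(m+1, x) >= j+1)."""
    return sum(comb(m + 1, k) * x**k * (1 - x) ** (m + 1 - k) for k in range(j + 1, m + 2))


def surv_baseline(t: float) -> float:
    """Assumed S_m(t | x0; p) = sum_j p_j [1 - B_mj(t / tau_n)] + p_{m+1}, for 0 <= t <= tau_n."""
    if not 0.0 <= t <= TAU:
        raise ValueError("this sketch only covers 0 <= t <= tau_n")
    u = t / TAU
    return sum(P[j] * (1.0 - beta_cdf_int(M, j, u)) for j in range(M + 1)) + P[M + 1]


def surv(t: float, x: int) -> float:
    """Proportional hazards: S(t | x) = S(t | x0) ** exp{gamma (x - x0)}."""
    return surv_baseline(t) ** exp(GAMMA * (x - X0))


for t in (0.0, 365.0, 730.0, 1799.0):
    print(t, round(surv(t, 1), 4), round(surv(t, 0), 4))
```

Because the reported weights sum to about $1.000003$ rather than exactly 1, $S(0|x)$ is 1 only up to rounding; at $t=\tau_{n}$ the baseline curve equals $\tilde{p}_{15}=1-\tilde{\pi}$ .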

With the chosen $\tilde{m}=14$ , the maximizer $(\hat{\bm{\gamma}},\hat{\bm{p}})$ of $\ell_{\tilde{m}}(\bm{\gamma},\bm{p})$ was found to be $\hat{\gamma}=-0.95151$ (s.e. 0.12309) and $\hat{\bm{p}}=(\hat{p}_{0},\ldots,\hat{p}_{15})^{\!\mathrm{\scriptscriptstyle\top}\!}$ , where $\hat{p}_{0}=0.40848$ , $\hat{p}_{2}=4.49876\times 10^{-6}$ , $\hat{p}_{3}=3.35856\times 10^{-6}$ , $\hat{p}_{6}=1.12521\times 10^{-6}$ , $\hat{p}_{7}=0.14646$ , $\hat{p}_{8}=2.28252\times 10^{-6}$ , $\hat{p}_{10}=1.30873\times 10^{-6}$ , $\hat{p}_{11}=0.03827$ , $\hat{p}_{12}=1.21518\times 10^{-6}$ , $\hat{p}_{15}=1-\hat{\pi}=0.40677$ , and all the other $\hat{p}_{i}$ ’s are smaller than $10^{-6}$ . The resulting estimated survival function is denoted by $\hat{S}_{\mathrm{B}}(t|\bm{x}=1)$ with $m=14$ .

The optimal degree is $\hat{m}=12$ based on the full likelihood $\ell_{m}(\hat{\bm{\gamma}},\hat{\bm{p}})$ . The MABLE of $(\bm{\gamma},\bm{p})$ was found to be $\hat{\gamma}=-1.05959$ (s.e. 0.12309) and $\hat{\bm{p}}=(\hat{p}_{0},\ldots,\hat{p}_{13})^{\!\mathrm{\scriptscriptstyle\top}\!}$ , where $\hat{p}_{0}=0.38968$ , $\hat{p}_{6}=0.11718$ , $\hat{p}_{7}=0.02320$ , $\hat{p}_{8}=4.19865\times 10^{-6}$ , $\hat{p}_{9}=0.03226$ , $\hat{p}_{10}=5.74877\times 10^{-6}$ , $\hat{p}_{13}=1-\hat{\pi}=0.43767$ , and all the other $\hat{p}_{i}$ ’s are smaller than $10^{-6}$ . The resulting estimated survival function is denoted by $\hat{S}_{\mathrm{B}}(t|\bm{x}=1)$ with $m=12$ . The results are shown in Figure 3. For patients with prior bypass surgery, the proposed estimates of the survival probabilities are much larger than the SPMLEs; for those without, they are slightly smaller.

5.3 Ovarian Cancer Data

As an example of right-censored data with a continuous covariate, we use the ovarian cancer dataset contained in the R package survival (Therneau, 2015), which was originally reported by Edmonson et al., (1979) and has been used as a real data example by several authors (e.g., Collett, 2003; Huang and Ghosh, 2014). In this study, $n=26$ patients with advanced ovarian carcinoma (stages IIIB and IV) were treated with either cyclophosphamide alone (1 g/m²) or cyclophosphamide (500 mg/m²) plus adriamycin (40 mg/m²) by i.v. injection every 3 weeks, in order to compare the effects of the treatments in prolonging survival. Twelve observations are uncensored and the rest are right-censored. We choose $X$ = Age. Cox’s partial likelihood estimate of $\gamma$ is $\tilde{\gamma}=0.16162$ (s.e. 0.04974). Using the proposed method, we obtained the optimal degree $m=23$ based on either $\ell_{m}(\tilde{\gamma},\hat{\bm{p}})$ or $\ell_{m}(\hat{\gamma},\hat{\bm{p}})$ (see the upper panels of Figure 4). With $m=23$ , we have $\hat{\gamma}=0.17665$ (s.e. 0.01218) and $\hat{x}_{0}=38.89$ . The components of $\hat{\bm{p}}$ are $\hat{p}_{2}=0.00226$ , $\hat{p}_{9}=0.02789$ , $\hat{p}_{10}=0.00277$ , $\hat{p}_{24}=0.96707$ , and all the other $\hat{p}_{i}<10^{-6}$ . The estimated survival curves for ages 60 and 65 are shown in Figure 4.
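The proportional hazards structure makes the comparison between ages explicit. Assuming $\tilde{x}=x-\hat{x}_{0}$ and $S(t|x)=S(t|\hat{x}_{0})^{\exp(\hat{\gamma}\tilde{x})}$ , the sketch below computes the multiplier $\exp\{\hat{\gamma}(\mathrm{age}-\hat{x}_{0})\}$ for ages 60 and 65. Under the further assumption that the last weight $\hat{p}_{24}=\hat{p}_{m+1}$ equals $S(\tau_{n}|\hat{x}_{0})$ (as $\tilde{p}_{15}=1-\tilde{\pi}$ does in the heart transplant example), it also prints the implied survival probabilities at $\tau_{n}$ . These are derived numbers, not results reported in the paper.

```python
from math import exp

GAMMA_HAT = 0.17665   # MABLE of the age coefficient, m = 23
X0_HAT = 38.89        # estimated baseline age, the minimizer of gamma_hat * age
TAIL_MASS = 0.96707   # p_hat_24 = p_hat_{m+1}; assumed to equal S(tau_n | x0_hat)


def hazard_multiplier(age: float) -> float:
    """eta = exp{gamma_hat (age - x0_hat)}, so S(t | age) = S(t | x0_hat) ** eta."""
    return exp(GAMMA_HAT * (age - X0_HAT))


for age in (60, 65):
    eta = hazard_multiplier(age)
    print(age, round(eta, 2), round(TAIL_MASS ** eta, 4))

print("hazard ratio, age 65 vs 60:", round(hazard_multiplier(65) / hazard_multiplier(60), 4))
```

The hazard ratio between ages 65 and 60 does not depend on $\hat{x}_{0}$ ; it is simply $\exp(5\hat{\gamma})$ .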

6 Concluding Remarks

We have seen that, with a continuous approximate model, it is much easier to write down the full likelihood. The parameter $\bm{p}$ is identifiable under some conditions. This overcomes the unidentifiability and roughness problems of the discrete NPMLE or SPMLE of the survival function. Furthermore, the proposed method gives better estimates of the regression coefficients. Nevertheless, the discrete NPMLE or SPMLE remains useful for obtaining initial values for the proposed MABLEs of the survival function and the regression coefficients.

7 Appendix

7.1 Proof of Lemma 1

Let $\bm{p}$ be any point in the interior of $\mathbb{S}_{m}$ . For any nonzero vector $\bm{v}=(v_{0},\ldots,v_{{m^{*}}})^{\!\mathrm{\scriptscriptstyle\top}\!}\in\mathbb{R}^{m^{*}+1}$ , define

[TABLE]

By (11), the $(j,k)$ -entry of $\bm{H}(\bm{\gamma},\bm{p})$ is $H_{jk}=\sum_{i=1}^{n}H_{jk}(\bm{z}_{i})$ , where

[TABLE]

Temporarily denote $\eta=e^{{\bm{\gamma}}^{\!\mathrm{\scriptscriptstyle\top}\!}\tilde{\bm{x}}}$ , $B_{ij}=\bar{\mathcal{B}}_{mi}(y_{j};\bm{v})$ , and $V_{j}=S_{m}(y_{j}|\bm{x}_{0};\bm{p})$ , $i\in\mathbb{I}_{0}^{m^{*}}$ , $j=1,2$ . We know $V_{1}\geq V_{2}$ and $B_{i1}\geq B_{i2}$ . In order to show that $H_{jk}(\bm{z})\leq 0$ for all $j,k\in\mathbb{I}_{0}^{m^{*}}$ , it suffices to show $A\leq B$ , where $A=(V_{1}^{\eta-2}B_{j1}B_{k1}-V_{2}^{\eta-2}B_{j2}B_{k2})(V_{1}^{\eta}-V_{2}^{\eta})$ and $B=(V_{1}^{\eta-1}B_{j1}-V_{2}^{\eta-1}B_{j2})(V_{1}^{\eta-1}B_{k1}-V_{2}^{\eta-1}B_{k2})$ . Now

[TABLE]

For any $\bm{v}\in\mathbb{R}^{m^{*}+1}$ , denoting $W_{i}=W(y_{i};\bm{v})$ , $i=1,2$ , we have $\bm{v}^{\!\mathrm{\scriptscriptstyle\top}\!}\bm{H}(\bm{\gamma},\bm{p})\bm{v}=\sum_{i=1}^{n}\bm{v}^{\!\mathrm{\scriptscriptstyle\top}\!}\bm{H}(\bm{\gamma},\bm{p};\bm{z}_{i})\bm{v}$ , where, by simple algebra,

[TABLE]

Since $\eta\geq 1$ we have

[TABLE]

where

[TABLE]

Now $\bm{v}^{\!\mathrm{\scriptscriptstyle\top}\!}\sum_{i=1}^{n}\bm{U}_{0}(\bm{\gamma},\bm{p};\bm{z}_{i})\bm{v}=0$ implies, for all $i\in\mathbb{I}_{1}^{n}$ , $\sum_{j=0}^{m^{*}}v_{j}\Psi_{j}(\bm{\gamma},\bm{p};\bm{z}_{i})=0$ . The proof of Lemma 1 is complete.

7.2 Proof of Lemma 2

The derivatives of $\ell_{m}(\bm{\gamma},\bm{p};\bm{z})$ with respect to $\bm{\gamma}$ are

[TABLE]

where

[TABLE]

The lemma follows easily from (22) through (24).

7.3 Proof of Theorem 1

If ${\bm{\gamma}}^{\!\mathrm{\scriptscriptstyle\top}\!}\bm{x}_{0}=\min_{1\leq i\leq n}\{{{\bm{\gamma}}^{\!\mathrm{\scriptscriptstyle\top}\!}\bm{x}_{i}}\},$ we have ${\bm{\gamma}}^{\!\mathrm{\scriptscriptstyle\top}\!}\tilde{\bm{x}}_{i}\geq 0$ . By Lemma 1, $\ell_{m}(\bm{\gamma},\bm{p})$ is strictly concave on the compact and convex set $\mathbb{S}_{{m^{*}}}$ for the fixed $\bm{\gamma}$ . By the optimality condition for convex optimization (Boyd and Vandenberghe, 2004), $\tilde{\bm{p}}$ is the unique maximizer of $\ell_{m}(\bm{\gamma},\bm{p})$ if and only if

[TABLE]

where $\nabla_{\bm{p}}\ell_{m}(\bm{\gamma},{\bm{p}})=\partial\ell_{m}(\bm{\gamma},{\bm{p}})/\partial\bm{p}$ . Therefore $\tilde{\bm{p}}$ is a maximizer of $\ell_{m}(\bm{\gamma},{\bm{p}})$ for the fixed $\bm{\gamma}$ if and only if

[TABLE]

for all $j\in\mathbb{I}_{0}^{{m^{*}}}$ with equality if $\tilde{p}_{j}>0$ . The proof is complete.
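For intuition, consider the covariate-free special case ( $\eta=1$ for every observation), where $\ell_{m}(\bm{p})=\sum_{i}\log(\bm{a}_{i}^{\top}\bm{p})$ is concave on the simplex. Since $\sum_{j}p_{j}\,\partial\ell_{m}/\partial p_{j}=n$ , the generic optimality condition reads $\partial\ell_{m}/\partial p_{j}\leq n$ for all $j$ , with equality where $p_{j}>0$ . The sketch checks this on the Gentleman and Geyer data of Section 5.1 with $m=6$ , assuming $\mathcal{B}_{mj}$ is the Beta $(j+1,m-j+1)$ cdf; the exact condition for general $\bm{\gamma}$ is given by the displays in this proof and is not reproduced here. The helper `kkt_gap` is hypothetical.

```python
from math import comb

M = 6
# Gentleman-Geyer intervals divided by tau_n = 3 (Section 5.1).
INTERVALS = [(0, 1 / 3), (0, 2 / 3), (0, 2 / 3), (1 / 3, 1), (1 / 3, 1), (2 / 3, 1)]


def beta_cdf_int(m: int, j: int, x: float) -> float:
    """Assumed B_mj(x): Beta(j+1, m-j+1) cdf = P(Binomial(m+1, x) >= j+1)."""
    return sum(comb(m + 1, k) * x**k * (1 - x) ** (m + 1 - k) for k in range(j + 1, m + 2))


A = [[beta_cdf_int(M, j, right) - beta_cdf_int(M, j, left) for j in range(M + 1)]
     for left, right in INTERVALS]
N_OBS = len(A)


def gradient(p: list[float]) -> list[float]:
    """Partial derivatives of l(p) = sum_i log(A[i] . p): sum_i A[i][j] / (A[i] . p)."""
    lik = [sum(a * pj for a, pj in zip(row, p)) for row in A]
    return [sum(row[j] / li for row, li in zip(A, lik)) for j in range(len(p))]


def kkt_gap(p: list[float], tol: float = 1e-8) -> float:
    """Largest violation of: grad_j <= n for all j, with equality wherever p_j > tol."""
    g = gradient(p)
    return max(abs(gj - N_OBS) if pj > tol else max(gj - N_OBS, 0.0) for gj, pj in zip(g, p))


p_uniform = [1 / 7] * 7                  # satisfies q1 = 1/3, q2 = 2/3 under the assumption
p_start = [k / 28 for k in range(1, 8)]  # a starting point used for Figure 2
print(kkt_gap(p_uniform), kkt_gap(p_start))
```

At the uniform weights every partial derivative equals $n=6$ , so the condition holds; at the starting point $(1,2,\ldots,7)/28$ it fails, as expected for a non-optimal point.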

7.4 Proof of Theorem 2

Following the proof of Theorems 1 and 2 and the Corollary of Peters, Jr. and Walker, (1978) we define $\bm{\Pi}=\mbox{diag}\{\bm{p}\}$ and ${\bm{A}}(\bm{p},\bm{\gamma})=\bm{\Pi}\nabla_{\bm{p}}\bar{\bm{\Psi}}(\bm{p},\bm{\gamma}),$ where $\bar{\bm{\Psi}}(\bm{p},\bm{\gamma})=[\bar{\Psi}_{0}(\bm{p},\bm{\gamma}),\ldots,\bar{\Psi}_{{m^{*}}}(\bm{p},\bm{\gamma})]^{\!\mathrm{\scriptscriptstyle\top}\!}$ . Then

[TABLE]

Its gradient is

[TABLE]

For any norm on $\mathbb{R}^{m^{*}+1}$ we have

[TABLE]

Consider $\nabla{\bm{A}}(\tilde{\bm{p}},\bm{\gamma})$ as an operator on subspace

[TABLE]

If all components of $\tilde{\bm{p}}$ are positive then $\nabla_{\bm{p}}\ell_{m}(\bm{\gamma},\tilde{\bm{p}})=\lambda_{n}(\bm{\gamma})\bm{1}$ , and $\nabla{\bm{A}}(\tilde{\bm{p}},\bm{\gamma})=I_{m^{*}+1}-\bm{Q},$ where

[TABLE]

From Lemma 1 and (20) it follows that $\bm{Q}$ is a left stochastic matrix and $\tilde{\bm{p}}^{\!\mathrm{\scriptscriptstyle\top}\!}\frac{\partial^{2}\ell_{m}(\bm{\gamma},\tilde{\bm{p}})}{\partial\bm{p}\partial\bm{p}^{\!\mathrm{\scriptscriptstyle\top}\!}}=-\frac{\partial\ell_{m}(\bm{\gamma},\tilde{\bm{p}})}{\partial\bm{p}^{\!\mathrm{\scriptscriptstyle\top}\!}}=-\lambda_{n}(\bm{\gamma})\bm{1}^{\!\mathrm{\scriptscriptstyle\top}\!}$ . So $\mathbb{Z}_{m}$ is invariant under $\bm{Q}$ .

Define an inner product $\langle\cdot,\cdot\rangle$ by $\langle\bm{u},\bm{v}\rangle=\bm{u}^{\!\mathrm{\scriptscriptstyle\top}\!}\tilde{\bm{\Pi}}^{-1}\bm{v}$ for $\bm{u}$ , $\bm{v}$ in $\mathbb{Z}_{m}$ . It can be easily shown that, with respect to this inner product, $\bm{Q}$ is symmetric and positive semidefinite on $\mathbb{Z}_{m}$ :

[TABLE]

Let $\mu_{0}$ and $\mu_{m}$ be the smallest and largest eigenvalues of $\bm{Q}$ associated with eigenvectors in $\mathbb{Z}_{m}$ . Then the operator norm of $\nabla{\bm{A}}(\tilde{\bm{p}},\bm{\gamma})$ on $\mathbb{Z}_{m}$ w.r.t. this inner product equals $\max\{|1-\mu_{0}|,|1-\mu_{m}|\}$ . It is clear that $0\leq\mu_{0}\leq\mu_{m}\leq 1$ because $\bm{Q}$ is a left stochastic matrix. By Lemma 1 we have $\mu_{0}>0$ . As in the proof of Theorem 2 of Peters, Jr. and Walker, (1978), the assertion of the theorem follows. If $\tilde{\bm{p}}$ contains zero component(s), say $\tilde{p}_{j}=0$ , $j\in J_{0}$ , then by deleting the $j$ -th row and $j$ -th column of the vectors and matrices in the above proof for all $j\in J_{0}$ , we can show that the iterates $p_{j}^{[s]}$ , $s\in\mathbb{I}_{0}^{\infty}$ , converge to $\tilde{p}_{j}$ as $s\to\infty$ for all $j\notin J_{0}$ . Because $\sum_{j=0}^{m^{*}}p_{j}^{[s]}=1$ and $p_{j}^{[s]}\geq 0$ , $j\in\mathbb{I}_{0}^{m^{*}}$ , for those $j\in J_{0}$ , $p_{j}^{[s]}$ converges to zero as $s\to\infty$ . The proof of Theorem 2 is complete.

7.5 Proof of Theorem 3

If $\ell_{m}(\bm{\gamma}^{(1)},\bm{p}^{(1)};\bm{z})\equiv\ell_{m}(\bm{\gamma}^{(2)},\bm{p}^{(2)};\bm{z})$ , where $\bm{\gamma}^{(i)}\in\Gamma$ and $\bm{p}^{(i)}\in\mathbb{S}_{m^{*}}$ , $i=1,2$ , then (i) for uncensored data we have $f_{m}(y|\bm{x}_{0};\bm{p}^{(1)})\equiv f_{m}(y|\bm{x}_{0};\bm{p}^{(2)})$ and $\tilde{\bm{x}}^{\!\mathrm{\scriptscriptstyle\top}\!}\bm{\gamma}^{(1)}\equiv\tilde{\bm{x}}^{\!\mathrm{\scriptscriptstyle\top}\!}\bm{\gamma}^{(2)}$ ; and (ii) for censored data we have

[TABLE]

For case (i) we have $\bm{p}^{(1)}=\bm{p}^{(2)}$ as shown by Guan, (2016) and $\bm{\gamma}^{(1)}=\bm{\gamma}^{(2)}$ if $\tilde{\bm{x}}$ is linearly independent. For case (ii) we have $S_{m}(y_{j}|\bm{x}_{0};\bm{p}^{(1)})\equiv S_{m}(y_{j}|\bm{x}_{0};\bm{p}^{(2)})$ which implies $\bm{p}^{(1)}=\bm{p}^{(2)}$ and $\bm{\gamma}^{(1)}=\bm{\gamma}^{(2)}$ if $\tilde{\bm{x}}$ is linearly independent.

7.6 Proof of Theorem 4

We need the following lemma for the proof.

Lemma 4.

Suppose that assumptions (A1) and (A2), with $m\geq C_{0}n^{1/\rho}$ for some constant $C_{0}$ , and condition (C $i$ ) are satisfied for some $i\in\mathbb{I}_{0}^{2}$ and some $\epsilon\in(0,1/2)$ . If $\|\bm{\gamma}-\bm{\gamma}_{0}\|^{2}\leq Cn^{-1+\epsilon}$ , then for any $\epsilon^{\prime}\in(\epsilon,1/2)$ and $n$ large enough the maximizer $\tilde{\bm{p}}=\tilde{\bm{p}}(\bm{\gamma})$ of $\ell_{m}(\bm{\gamma},\bm{p})$ almost surely satisfies $D_{i}^{2}(\tilde{\bm{p}};\bm{x}_{0})\leq C^{\prime}n^{-1+\epsilon^{\prime}}$ , for some constant $C^{\prime}>0$ , where $\bm{x}_{0}=\bm{x}_{\bm{\gamma}}$ , and $\tilde{\bm{p}}\in A_{m}(\epsilon_{n})$ . Conversely, if ${D_{i}^{2}(\tilde{\bm{p}};\bm{x}_{0})\leq Cn^{-1+\epsilon}}$ for some $\bm{x}_{0}$ , then for any $\epsilon^{\prime}\in(\epsilon,1/2)$ and $n$ large enough the maximizer $\tilde{\bm{\gamma}}=\tilde{\bm{\gamma}}(\bm{p})$ of $\ell(\bm{\gamma},\bm{p})$ for the fixed $\bm{p}$ almost surely satisfies $\|\tilde{\bm{\gamma}}-\bm{\gamma}_{0}\|^{2}\leq C^{\prime}n^{-1+\epsilon^{\prime}}$ , for some constant $C^{\prime}>0$ .

Proof of Lemma 4

Define $\ell(\bm{\gamma},f_{0})=\sum_{i=1}^{n}\ell(\gamma,f_{0};\bm{z}_{i})$ and ${\mathcal{R}}({\bm{\gamma}},\bm{p})=\ell({\bm{\gamma}}_{0},f_{0})-\ell_{m}({\bm{\gamma}},\bm{p})$ . By Taylor expansion we have, for all $\bm{p}\in\mathcal{A}(\epsilon_{n})$ ,

[TABLE]

where $\tilde{\mathcal{R}}_{00}({\bm{\gamma}},\bm{p})=\sum_{i=1}^{n}(1-\delta_{i})U_{0i}(\bm{\gamma})$ , $\tilde{\mathcal{R}}_{01}({\bm{\gamma}},\bm{p})=-\sum_{i=1}^{n}(1-\delta_{i})[U_{1i}(\bm{p})+(e^{{\bm{\gamma}}^{\!\mathrm{\scriptscriptstyle\top}\!}\tilde{\bm{x}}_{i}}-1)U_{2i}(\bm{p})]$ , $\tilde{\mathcal{R}}_{02}({\bm{\gamma}},\bm{p})=\frac{1}{2}\sum_{i=1}^{n}(1-\delta_{i})[U_{1i}^{2}(\bm{p})+(e^{{\bm{\gamma}}^{\!\mathrm{\scriptscriptstyle\top}\!}\tilde{\bm{x}}_{i}}-1)U_{2i}^{2}(\bm{p})]$ , $\tilde{\mathcal{R}}_{11}({\bm{\gamma}},\bm{p})=-\sum_{i=1}^{n}\delta_{i}U_{3i}({\bm{\gamma}},\bm{p})$ , $\tilde{\mathcal{R}}_{12}({\bm{\gamma}},\bm{p})=\frac{1}{2}\sum_{i=1}^{n}\delta_{i}U_{3i}^{2}({\bm{\gamma}},\bm{p})$ ,

[TABLE]

Clearly, for all real $x$ ,

[TABLE]

Proof of Lemma 4 under condition (C0):

For uncensored data, all $\delta_{i}=0$ . By integration by parts we have

[TABLE]

where $j\in\mathbb{I}_{3}^{\infty}$ . Since $\bm{X}$ is bounded we have, for all $\bm{\gamma}\in\mathbb{B}_{d}(n^{-1+\epsilon})$ ,

[TABLE]

where $\lambda_{0}>0$ and $\lambda_{d}>0$ are, respectively, the minimum and maximum eigenvalues of $\mathrm{E}(\tilde{\bm{X}}\tilde{\bm{X}}^{\!\mathrm{\scriptscriptstyle\top}\!})$ . Similarly, repeated integration by parts implies

[TABLE]

By (28) we have $|e^{x}-1-x|\leq\frac{1}{2}|x|^{2}e^{|x|}$ , and

[TABLE]

Consequently

[TABLE]

Therefore, by the LIL, we have, for all $\bm{\gamma}\in\mathbb{B}_{d}(n^{-1+\epsilon})$ ,

[TABLE]

For $j=1,2$ , denote

[TABLE]

Integration by parts implies

[TABLE]

We also have

[TABLE]

Therefore by (35) we have

[TABLE]

and

[TABLE]

Thus

[TABLE]

If $T$ is independent of covariate $\bm{X}$ then $\bm{\gamma}_{0}=\bm{0}$ and $\mathrm{E}[V^{2}_{1i}(\bm{p})]=\chi_{0}^{2}(\bm{p};\bm{x}_{0}).$ If $\bm{\gamma}_{0}\neq\bm{0}$ we have $\bm{\gamma}\neq\bm{0}$ for large $n$ and

[TABLE]

Since $\bm{\gamma}^{\!\mathrm{\scriptscriptstyle\top}\!}\bm{x}_{0}=\min\{\bm{\gamma}^{\!\mathrm{\scriptscriptstyle\top}\!}\bm{x}:\bm{x}\in\mathcal{X}\}$ , for any $\delta_{0}>0$ such that $\delta_{0}e^{\delta_{0}}<1$ , we have

[TABLE]

Hence we have

[TABLE]

Since for $x\geq 0$ , $e^{x}-1\leq xe^{x}$ , we have, for ${\bm{\gamma}}^{\!\mathrm{\scriptscriptstyle\top}\!}\tilde{\bm{x}}\leq\delta_{0}$ , ${\delta_{0}}^{-1}({e^{{\bm{\gamma}}^{\!\mathrm{\scriptscriptstyle\top}\!}\tilde{\bm{x}}}-1})-1\leq e^{{\bm{\gamma}}^{\!\mathrm{\scriptscriptstyle\top}\!}\tilde{\bm{x}}}-1\leq\delta_{0}e^{\delta_{0}}$ . We have

[TABLE]

Choosing $\delta_{0}$ to maximize $\delta_{0}(1-\delta_{0}e^{\delta_{0}})$ , we have

[TABLE]

and

[TABLE]
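The proof does not state the value of $\delta_{0}$ obtained by this choice; it can be computed numerically. Setting the derivative of $g(\delta_{0})=\delta_{0}(1-\delta_{0}e^{\delta_{0}})$ to zero gives $(2\delta_{0}+\delta_{0}^{2})e^{\delta_{0}}=1$ , whose left side is increasing in $\delta_{0}\geq 0$ , so bisection on $[0,0.5]$ suffices. This is a side computation, not part of the original argument.

```python
from math import exp


def g(d: float) -> float:
    """Objective d * (1 - d * e^d) maximized in the proof."""
    return d * (1.0 - d * exp(d))


def dg(d: float) -> float:
    """Derivative of g: 1 - (2d + d^2) e^d, decreasing for d >= 0."""
    return 1.0 - (2.0 * d + d * d) * exp(d)


# dg(0) = 1 > 0 and dg(0.5) < 0, so the unique root in [0, 0.5] can be bisected.
lo, hi = 0.0, 0.5
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if dg(mid) > 0:
        lo = mid
    else:
        hi = mid
d_star = 0.5 * (lo + hi)
print(round(d_star, 4), round(g(d_star), 4), round(d_star * exp(d_star), 4))
```

The maximizer is about $0.315$ , with $g\approx 0.179$ , and it satisfies $\delta_{0}e^{\delta_{0}}\approx 0.43<1$ , as the argument requires.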

By (35)

[TABLE]

If $\mathrm{E}[W_{2i}(\bm{\gamma},\bm{p})]\leq n^{-1+\epsilon^{\prime}}$ , for any $\epsilon^{\prime}\in(\epsilon,1/2)$ , then by (37), (38), (7.6) we have, for all $\bm{\gamma}\in\mathbb{B}_{d}(n^{-1+\epsilon})$ ,

[TABLE]

For any $\epsilon^{\prime}\in(\epsilon,1/2)$ , if

[TABLE]

then we have, by (7.6), (7.6), and the LIL,

[TABLE]

and, by Kolmogorov’s SLLN,

[TABLE]

Thus, by (7.6), there is an $\eta>0$ such that ${\mathcal{R}}({\bm{\gamma}},\bm{p})=\sum_{j=0}^{2}\tilde{\mathcal{R}}_{0j}({\bm{\gamma}},\bm{p})\geq\eta n^{\epsilon^{\prime}}.$ On the other hand, at $\bm{p}=\bm{p}_{0}$ with $m\geq C_{0}n^{1/\rho}$ , we have ${\mathcal{R}}({\bm{\gamma}},\bm{p}_{0})=\mathcal{O}(n^{\epsilon})=o(n^{\epsilon^{\prime}}).$ By (7.6), the minimizer $\tilde{\bm{p}}$ of ${\mathcal{R}}({\bm{\gamma}},\bm{p})$ for the fixed $\bm{\gamma}$ satisfies $D_{0}^{2}(\tilde{\bm{p}};\bm{x}_{0})\leq\mathrm{E}[W_{2i}(\bm{\gamma},\bm{p})]+\mathrm{E}[|e^{{\bm{\gamma}}^{\!\mathrm{\scriptscriptstyle\top}\!}\tilde{\bm{x}}_{i}}-e^{{\bm{\gamma}}_{0}^{\!\mathrm{\scriptscriptstyle\top}\!}\tilde{\bm{x}}_{i}}|U_{2i}^{2}(\bm{p})]\leq C^{\prime}n^{-1+\epsilon^{\prime}}$ for some constant $C^{\prime}$ and $\tilde{\bm{p}}\in A_{m}(\epsilon_{n})$ .

Similarly, for any $\bm{p}$ that satisfies $D_{0}^{2}({\bm{p}};\bm{x}_{0})\leq Cn^{-1+\epsilon}$ , we can prove that the maximizer $\tilde{\bm{\gamma}}$ of $\ell(\bm{\gamma},\bm{p})$ for the fixed $\bm{p}$ satisfies $\|\tilde{\bm{\gamma}}-\bm{\gamma}_{0}\|^{2}\leq C^{\prime}n^{-1+\epsilon^{\prime}}$ , for all $\epsilon^{\prime}\in(\epsilon,1/2)$ , almost surely. The proof under condition (C0) is complete.

Proof of Lemma 4 under condition (C1):

Case I: current status data, all $\delta_{i}=1$ . Let $G_{1}(\cdot|\bm{x})$ be the conditional distribution of the censoring variable given $\bm{X}=\bm{x}$ . We have

[TABLE]

The LIL and Kolmogorov’s SLLN for the $U_{3i}$ ’s imply, for all $\bm{p}\in\mathcal{A}(\epsilon_{n})$ ,

[TABLE]

By Taylor expansion, with $u=e^{\bm{\gamma}^{\!\mathrm{\scriptscriptstyle\top}\!}\tilde{\bm{x}}}$ , $a=e^{\bm{\gamma}_{0}^{\!\mathrm{\scriptscriptstyle\top}\!}\tilde{\bm{x}}}$ , $v=S_{m}(y|\bm{x}_{0};\bm{p})$ , $b=S(y|\bm{x}_{0})$ ,

[TABLE]

where

[TABLE]

for some $(\bar{a},\bar{b})$ on the line segment joining $(u,v)$ and $(a,b)$ , i.e.,

[TABLE]

For all $\bm{p}\in A_{m}(\epsilon_{n})$ , $|v-b|/{b}\leq\epsilon_{n}$ ,

[TABLE]

For $k=1,2$ ,

[TABLE]

Since $\log(1+z)=\sum_{k=1}^{\infty}(-1)^{k+1}\frac{z^{k}}{k}$ , $|z|<1$ , we have, for all $\bm{p}\in A_{m}(\epsilon_{n})$ ,

[TABLE]

For every positive integer $k$ , we have

[TABLE]

Fix $\epsilon\in(0,1/2)$ , $\bm{\gamma}\in\mathbb{B}_{d}(n^{-1+\epsilon})$ , and $\bm{x}_{0}$ such that $\bm{\gamma}^{\!\mathrm{\scriptscriptstyle\top}\!}\bm{x}_{0}=\max_{\bm{x}\in\mathcal{X}}\bm{\gamma}^{\!\mathrm{\scriptscriptstyle\top}\!}\bm{x}$ . If, for $\epsilon^{\prime}\in(\epsilon,1/2)$ ,

[TABLE]

then it follows from (45–49), the triangle inequality, and the inequality $|u(\log u)^{k}|\leq k^{k}e^{-k}$ , $u\in[0,1]$ , for positive integers $k$ , that, for all $\bm{p}\in A_{m}(\epsilon_{n})$ ,
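The elementary bound used here follows from a one-variable maximization. For $u\in(0,1]$ put $s=-\log u\in[0,\infty)$ (at $u=0$ the left side is taken as its limit $0$ ):

```latex
|u(\log u)^{k}| = s^{k}e^{-s} =: h(s), \qquad
h'(s) = s^{k-1}e^{-s}(k-s),
\qquad\text{so}\qquad
\sup_{s\ge 0} h(s) = h(k) = k^{k}e^{-k},
```

since $h$ increases on $[0,k]$ and decreases on $[k,\infty)$ ; equality holds at $u=e^{-k}$ .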

[TABLE]

By (49), $\sigma^{2}(U_{3i})\geq|n^{-(1-\epsilon^{\prime})/2}-2e^{-1}\mathrm{E}^{1/2}[O(Y|\bm{X})]\mathcal{O}(n^{-(1-\epsilon)/2})|^{2}+o(n^{-1+\epsilon^{\prime}}).$ Thus, there is an $\eta_{0}>0$ such that, for all $\bm{p}$ that satisfy $D_{1}^{2}(\bm{p};\bm{x}_{0})=n^{-1+\epsilon^{\prime}}$ , we have $\mathcal{R}(\bm{\gamma},\bm{p})\geq\eta_{0}n^{\epsilon^{\prime}}$ , a.s. At $\bm{p}=\bm{p}_{0}$ , with $m\geq C_{0}n^{1/\rho}$ , $\mathcal{R}(\bm{\gamma},\bm{p}_{0})=\mathcal{O}(n^{\epsilon})$ , a.s. Therefore, $\mathcal{R}(\bm{\gamma},\bm{p})$ is minimized by $\tilde{\bm{p}}=\tilde{\bm{p}}(\bm{\gamma})$ such that

[TABLE]

Similarly, by (7.6), if $D_{1}^{2}(\tilde{\bm{p}};\bm{x}_{0})<n^{-1+\epsilon}$ for some $\bm{x}_{0}\in\mathcal{X}$ , then the minimizer $\tilde{\bm{\gamma}}=\tilde{\bm{\gamma}}(\bm{p})$ of $\mathcal{R}(\bm{\gamma},\bm{p})$ satisfies $\tilde{\bm{\gamma}}\in\mathbb{B}_{d}(n^{-1+\epsilon^{\prime}})$ for all $\epsilon^{\prime}\in(\epsilon,1/2)$ .

Proof of Lemma 4 under condition (C2):

For Case II interval-censored data, all $\delta_{i}=1$ ; let $G_{2}(y_{1},y_{2}|\bm{x})$ be the conditional distribution of $(Y_{1},Y_{2})$ given $\bm{X}=\bm{x}$ . We have

[TABLE]

Similarly

[TABLE]

With the simplified notation $\tilde{S}_{i}=S_{m}(Y_{i}|\bm{X};\bm{\gamma},\bm{p})$ , $S_{i}=S(Y_{i}|\bm{X})$ , and $\Lambda_{i}=\Lambda(Y_{i}|\bm{X})$ , $i=1,2,$ we clearly have

[TABLE]

Thus the proof under condition (C2) can be done by the argument similar to the proof under condition (C1). The proof of Lemma 4 is complete.

Now we prove Theorem 4. Let $\mathbb{B}_{d}(r)=\{\bm{\gamma}:\|\bm{\gamma}-\bm{\gamma}_{0}\|\leq r\}$ , where $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{R}^{d}$ . For a positive sequence $\epsilon_{n}$ decreasing slowly to $0$ as $n\to\infty$ , e.g., $\epsilon_{n}=1/\log(n+2)$ , let $A_{m}(\epsilon_{n})$ be the set of $\bm{p}\in\mathbb{S}_{{m^{*}}}$ such that, for all $t\in[0,b]$ , $|f_{m}(t|\bm{x}_{0};\bm{p})-f(t|\bm{x}_{0})|/f(t|\bm{x}_{0})\leq\epsilon_{n}$ . Clearly, for all $\bm{p}\in A_{m}(\epsilon_{n})$ , we have $|S_{m}(t|\bm{x}_{0};\bm{p})-S(t|\bm{x}_{0})|/S(t|\bm{x}_{0})\leq\epsilon_{n}$ .

If $\bm{\gamma}^{(0)}$ is chosen to be an efficient and asymptotically normal estimator of $\bm{\gamma}$ as in Cox, (1972) and Huang and Wellner, (1997), then, under the conditions of the theorem, for large $n$ , almost surely $\|\bm{\gamma}^{(0)}-\bm{\gamma}_{0}\|^{2}<n^{-1+\epsilon}.$ Lemma 4 and the convergence of $(\bm{\gamma}^{(s)},\bm{p}^{(s)})$ imply that $\|\hat{\bm{\gamma}}-\bm{\gamma}_{0}\|^{2}\leq n^{-1+\epsilon}$ , $D^{2}_{i}(\hat{\bm{p}};\hat{\bm{x}}_{0})\leq n^{-1+\epsilon}$ , and $\hat{\bm{p}}\in A_{m}(\epsilon_{n})$ . The proof is complete.

7.7 Proof of Theorem 5.

Uncensored Data: all $\delta_{i}=0$

Expansion of $Q(\tilde{\bm{\gamma}},S_{m})=\frac{\partial\ell_{m}(\tilde{\bm{\gamma}},{\bm{p}}_{0})}{\partial\bm{\gamma}}$ at $\bm{\gamma}_{0}$ :

[TABLE]

where

[TABLE]

and $\bar{\bm{\gamma}}={\bm{\gamma}}_{0}+\theta(\tilde{\bm{\gamma}}-{\bm{\gamma}}_{0})$ for some $\theta\in[0,1]$ . If $m=m_{n}$ satisfies $n^{1/2}m^{-\rho/2}=o(1)$ then

[TABLE]

and $\bm{Z}_{n}$ converges in distribution to normal with mean $\bm{0}$ and variance $\mathcal{I}$ . For any $\epsilon>0$ and large $n$ , $R_{n}(\tilde{\bm{\gamma}})=\mathcal{O}(n^{\epsilon})$ , a.s. Therefore, $\sqrt{n}(\tilde{\bm{\gamma}}-\bm{\gamma}_{0})=\bm{J}_{n}^{-1}[\bm{Z}_{n}+\mathcal{O}(n^{-1/2+\epsilon})]$ converges in distribution to normal with mean $\bm{0}$ and variance $\mathcal{I}^{-1}$ .

Interval censored Data: all $\delta_{i}=1$

Expansion of $Q(\tilde{\bm{\gamma}},S_{m})=\frac{\partial\ell_{m}(\tilde{\bm{\gamma}},{\bm{p}}_{0})}{\partial\bm{\gamma}}$ at $\bm{\gamma}_{0}$ gives

[TABLE]

where

[TABLE]

for any $\epsilon>0$ and large $n$ , $R_{n}(\tilde{\bm{\gamma}})=\mathcal{O}(n^{\epsilon})$ , a.s. If $m=m_{n}$ satisfies $n^{1/2}m^{-\rho/2}=o(1)$ , then, for current status data,

[TABLE]

and for Case $k$ ( $k\geq 2$ ) interval censored data,

[TABLE]

where $\Lambda_{i}=\Lambda(Y_{i}|\bm{X})$ , $i=1,2$ . Clearly,

[TABLE]

In both cases, $\bm{Z}_{n}$ converges in distribution to normal with mean $\bm{0}$ and variance $\mathcal{I}$ . For any $\epsilon>0$ and large $n$ , $R_{n}(\tilde{\bm{\gamma}})=\mathcal{O}(n^{\epsilon})$ , a.s. Hence $\sqrt{n}(\tilde{\bm{\gamma}}-\bm{\gamma}_{0})=\bm{J}_{n}^{-1}[\bm{Z}_{n}+\mathcal{O}(n^{-1/2+\epsilon})]$ converges in distribution to normal with mean $\bm{0}$ and variance $\mathcal{I}^{-1}$ .

Bibliography

Anderson, T. W. and Darling, D. A. (1954). A test of goodness of fit. Journal of the American Statistical Association, 49:765–769.
Anderson-Bergman, C. (2017a). An efficient implementation of the EMICM algorithm for the interval censored NPMLE. Journal of Computational and Graphical Statistics, 26(2):463–467.
Anderson-Bergman, C. (2017b). icenReg: Regression models for interval censored data in R. Journal of Statistical Software, Articles, 81(12):1–23.
Becker, N. G. and Melbye, M. (1991). Use of a log-linear model to compute the empirical survival curve from interval-censored data, with application to data on tests for HIV-positivity. Australian Journal of Statistics, 33(2):125–133.
Bernstein, S. N. (1912). Démonstration du théorème de Weierstrass fondée sur le calcul des probabilités. Communications of the Kharkov Mathematical Society, 13:1–2.
Betensky, R. A., Lindsey, J. C., Ryan, L. M., and Wand, M. P. (1999). Local EM estimation of the hazard function for interval-censored data. Biometrics, 55(1):238–245.
Bickel, P. J., Klaassen, C. A. J., Ritov, Y., and Wellner, J. A. (1998). Efficient and adaptive estimation for semiparametric models. Springer-Verlag, New York.
Boyd, S. and Vandenberghe, L. (2004). Convex optimization. Cambridge University Press, Cambridge.
