Efficient Estimation For The Cox Proportional Hazards Cure Model

Khandoker Akib Mohammad; Yuichi Hirose; Budhi Surya; Yuan Yao

arXiv:1905.08424·stat.ME·January 27, 2020


TL;DR

This paper develops an efficient estimation method for the Cox Proportional Hazards Cure Model, providing explicit variance estimation and demonstrating its effectiveness through simulations and real data analysis.

Contribution

It introduces an explicit variance estimator for the profile likelihood estimator and shows the equivalence of the efficient score function and the profile likelihood score function.

Findings

1. Estimated standard errors are comparable between the bootstrap (SMCURE package) and the proposed method.
2. The method provides accurate variance estimates in simulation studies.
3. Application to melanoma data demonstrates practical utility.


Tables

Table 1: Simulation results for Cox PH cure model

| Cure rate | Parameter | True value | SMCURE: Bias | SMCURE: SE | SMCURE: ESE | SMCURE: CP | Our approach: Bias | Our approach: SE | Our approach: ESE | Our approach: CP |
|---|---|---|---|---|---|---|---|---|---|---|
| 25% | $b_{0}$ | 2.1 | 0.0622 | 0.3493 | 0.3570 | 0.9490 | 0.0678 | 0.3489 | 0.3429 | 0.9660 |
| 25% | $b_{1}$ | -1 | -0.0270 | 0.4227 | 0.4255 | 0.9720 | -0.0527 | 0.4170 | 0.4182 | 0.9670 |
| 25% | $b_{2}$ | 0.3 | 0.0104 | 0.2034 | 0.2116 | 0.9570 | 0.0414 | 0.1976 | 0.2025 | 0.9530 |
| 25% | $\beta_{1}$ | -1 | 0.0019 | 0.1844 | 0.1834 | 0.9500 | 0.0152 | 0.1818 | 0.1707 | 0.9320 |
| 25% | $\beta_{2}$ | 0.5 | 0.0055 | 0.0883 | 0.0924 | 0.9550 | -0.0073 | 0.0889 | 0.0908 | 0.9510 |
| 50% | $b_{0}$ | 1.022 | 0.0247 | 0.2352 | 0.2190 | 0.9290 | 0.0285 | 0.2353 | 0.2279 | 0.9480 |
| 50% | $b_{1}$ | -1 | -0.0311 | 0.3084 | 0.3210 | 0.9590 | -0.0565 | 0.3082 | 0.3139 | 0.9550 |
| 50% | $b_{2}$ | 0.3 | 0.0091 | 0.1691 | 0.1763 | 0.9580 | 0.0211 | 0.1697 | 0.1712 | 0.9570 |
| 50% | $\beta_{1}$ | -1 | -0.0152 | 0.2240 | 0.2232 | 0.9530 | -0.0086 | 0.2241 | 0.2035 | 0.9300 |
| 50% | $\beta_{2}$ | 0.5 | 0.0103 | 0.1155 | 0.1153 | 0.9490 | -0.0003 | 0.1147 | 0.1172 | 0.9530 |
| 75% | $b_{0}$ | -0.1 | -0.0049 | 0.2043 | 0.1926 | 0.9370 | 0.0105 | 0.2109 | 0.2073 | 0.9430 |
| 75% | $b_{1}$ | -1 | -0.0106 | 0.3200 | 0.3283 | 0.9640 | -0.0315 | 0.3362 | 0.3231 | 0.9490 |
| 75% | $b_{2}$ | 0.3 | 0.0027 | 0.1603 | 0.1603 | 0.9530 | 0.0191 | 0.1678 | 0.1580 | 0.9440 |
| 75% | $\beta_{1}$ | -1 | -0.0092 | 0.3112 | 0.3240 | 0.9580 | 0.0186 | 0.3207 | 0.2948 | 0.9150 |
| 75% | $\beta_{2}$ | 0.5 | 0.0147 | 0.1401 | 0.1449 | 0.9530 | -0.0045 | 0.1427 | 0.1484 | 0.9600 |

Equations

p=\Pr(V=1,W;b)=\frac{e^{b^{\prime}W}}{1+e^{b^{\prime}W}},

\lambda(t|V=1,Z;\beta)=\lambda_{0}(t|V=1)e^{\beta^{\prime}Z},

p\,f(t|V=1,Z;\lambda,\beta),

(1-p)+p\,S(t|V=1,Z;\lambda,\beta),

\delta_{i}=\left\{\begin{array}[]{rcl}1&\mbox{for}&T_{i}=\textrm{event time}\\ 0&\mbox{for}&T_{i}=\textrm{censored time}\end{array}\right.

L(b,\beta,\lambda)=\prod_{i=1}^{n}\bigg{\{}p_{i}f(t_{i}|V=1,Z_{i};\lambda,\beta)\bigg{\}}^{\delta_{i}}\bigg{\{}(1-p_{i})+p_{i}S(t_{i}|V=1,Z_{i};\lambda,\beta)\bigg{\}}^{1-\delta_{i}},

f(t|V=1,Z;\lambda,\beta)=\lambda(t|V=1,Z;\beta)\,S(t|V=1,Z;\lambda,\beta).

L_{c}(b,\beta,\Lambda_{0};v)=\cdots\times\prod_{i=1}^{n}\bigg{[}p_{i}S(t_{i}|V=1,Z_{i};\lambda,\beta)\bigg{]}^{(1-\delta_{i})v_{i}}\times\prod_{i=1}^{n}\bigg{[}1-p_{i}\bigg{]}^{(1-\delta_{i})(1-v_{i})}.

\sum_{i=1}^{n}\bigg{\{}\gamma(V_{i})\log p_{i}+(1-\gamma(V_{i}))\log(1-p_{i})\bigg{\}}+\sum_{i=1}^{n}\gamma(V_{i})\bigg{\{}\delta_{i}\log\lambda(t_{i}|V=1,Z_{i};\beta)+\log S(t_{i}|V=1,Z_{i};\lambda,\beta)\bigg{\}},

\sum_{i=1}^{n}\gamma(V_{i})\bigg{[}\delta_{i}\big{\{}\log\lambda_{i}+\beta^{\prime}Z_{i}\big{\}}-e^{\beta^{\prime}Z_{i}}\sum_{j=1}^{n}\lambda_{j}1\{t_{j}\leq t_{i}\}\bigg{]}.

\hat{\lambda}_{k}(t|V=1;\beta)=\frac{\delta_{k}}{\sum_{l=1}^{n}\gamma(V_{l})1\{t_{k}\leq t_{l}\}e^{\beta^{\prime}Z_{l}}}.

\hat{\Lambda}(t|V=1;\beta)=\sum_{i=1}^{n}\frac{\delta_{i}1\{t_{i}\leq t\}}{\sum_{l=1}^{n}\gamma(V_{l})1\{t_{i}\leq t_{l}\}e^{\beta^{\prime}Z_{l}}}.

\gamma(V_{i})=E(V_{i}|T_{i},\delta_{i},Z_{i})=\bigg{(}\frac{p_{i}S\big{(}t_{i}|V=1,Z_{i};\hat{\lambda}(\beta),\beta\big{)}}{1-p_{i}+p_{i}S\big{(}t_{i}|V=1,Z_{i};\hat{\lambda}(\beta),\beta\big{)}}\bigg{)}^{1-\delta_{i}}.

\sum_{i=1}^{n}\bigg{[}\bigg{\{}\gamma(V_{i})\log p_{i}+(1-\gamma(V_{i}))\log(1-p_{i})\bigg{\}}+\gamma(V_{i})\bigg{\{}\delta_{i}\log\hat{\lambda}(t_{i}|V=1,Z_{i};\beta)+\log S(t_{i}|V=1,Z_{i};\hat{\lambda}(\beta),\beta)\bigg{\}}\bigg{]},

\hat{\Lambda}(t)=\int_{0}^{t}\frac{\sum_{i=1}^{n}dN_{i}(u)}{\sum_{i=1}^{n}\gamma(V_{i})Y_{i}(u)e^{\beta^{\prime}Z_{i}}},

\hat{\Lambda}_{\beta,F_{n}}(t)=\int_{0}^{t}\frac{E_{F_{n}}dN(u)}{E_{F_{n}}\gamma(V)Y(u)e^{\beta^{\prime}Z}}.

\sum_{i=1}^{n}\bigg{\{}\log P(V_{i}|b)+\log P\big{(}T_{i},\delta_{i}|\hat{\Lambda}_{\beta,F_{n}},\beta\big{)}\bigg{\}},

\phi(V_{i},T_{i},\delta_{i}|b,\beta,F_{n})=\phi_{l}(V_{i}|b)+\phi_{s}(T_{i},\delta_{i}|\beta,F_{n}),

B(T_{i},\delta_{i}|\beta,F)h=\cdots+e^{\beta^{\prime}Z_{i}}\int_{0}^{T_{i}}\frac{E_{F}dN(u)E_{h}Y(u)\gamma(V)e^{\beta^{\prime}Z}}{\big{(}E_{F}Y(u)\gamma(V)e^{\beta^{\prime}Z}\big{)}^{2}}.

\hat{\Lambda}_{\beta_{0},F_{0}}(t)=\int_{0}^{t}\frac{E[dN(u)]}{E[\gamma(V)Y(u)e^{\beta_{0}^{\prime}Z}]},


Khandoker Akib Mohammad∗, Yuichi Hirose, Budhi Surya and Yuan Yao

Victoria University of Wellington

Abstract: When analysing time-to-event data, a certain fraction of subjects may never experience the event of interest; such subjects are said to be cured. Survival models that take this feature into account are commonly referred to as cure models. In the presence of covariates, the conditional survival function of the population can be modelled by a cure model that depends on the probability of being uncured (incidence) and the conditional survival function of the uncured subjects (latency); logistic regression and Cox proportional hazards (PH) regression are used to model the incidence and latency, respectively. In this paper, we show the asymptotic normality of the profile likelihood estimator via an asymptotic expansion of the profile likelihood and obtain the explicit form of the variance estimator, even though the profile likelihood involves an implicit function. We also show that the efficient score function based on projection theory and the profile likelihood score function are equal. Our contribution in this paper is that we express the efficient information matrix as the variance of the profile likelihood score function. A simulation study suggests that the estimated standard errors from bootstrap samples (SMCURE package) and from the profile likelihood score function (our approach) are similar. We also illustrate the proposed method on the melanoma data from the SMCURE R package (Cai et al., 2012) and compare the results with the output obtained from the SMCURE package.

Key words and phrases: Cox PH model, Cure model, Efficient score function, EM algorithm, Implicit function, Profile likelihood.

1 Introduction

In survival analysis, the Cox PH cure model has attracted attention for decades. Kuk and Chen (1992) first proposed the Cox PH cure model as a semiparametric generalization of Farewell's model (1982), in which the Cox PH model and logistic regression are combined to study the survival times of uncured subjects and the cure rate, respectively. In clinical settings, the Cox PH cure model has been widely used to model failure-time data from various cancer studies, such as breast cancer, head and neck cancer, leukemia, prostate cancer and melanoma (Peng and Dear, 2000; Sy and Taylor, 2000, 2001; Zhao and Zhou, 2006; Othus et al., 2012; Peng and Taylor, 2014; Amico and Keilegom, 2018).

The efficiency and asymptotic distribution of the semiparametric maximum likelihood estimator for the Cox PH cure model were studied by Fang, Li and Sun (2005). Later, a nonparametric maximum likelihood approach was used to estimate the cumulative hazard and the regression parameters of the Cox PH cure model, and the asymptotic properties were established using modern empirical process theory (Lu, 2008). Lu (2008) used the joint maximization approach developed by Murphy (1994, 1995) to find efficient estimators for the Cox PH cure model. However, these works on the efficiency and asymptotic distribution of the maximum likelihood estimator did not address computation with the implicit function in profile likelihood estimation. Later, Cai et al. (2012) developed an R package (SMCURE) to fit the Cox PH cure model, which has received much attention in recent years (Peng and Taylor, 2014; Amico and Keilegom, 2018). In the SMCURE package, Cai et al. (2012) used the melanoma data from the ECOG phase III clinical trial e1684, and the standard errors of the estimated parameters are calculated using bootstrap methods.

Hsieh et al. (2006) pointed out that, in some examples, the estimator of the baseline hazard function based on the profile likelihood approach is an implicit function. For these examples, including the model in this paper, it is very challenging to show asymptotic normality of the profile likelihood estimator. For this reason, Hsieh et al. (2006) proposed using the bootstrap method (Efron and Tibshirani, 1994) to obtain standard errors under the profile likelihood approach.

In the papers (Zeng and Cai, 2005; Zeng and Lin, 2007, 2010), the baseline hazard and regression parameters are maximized jointly, using the equality between the profile likelihood estimator and the maximum likelihood estimator: maximizing the profile likelihood with respect to the regression parameters leads to the same estimate as maximizing the likelihood jointly with respect to the regression parameters and the baseline hazard. In these papers, the asymptotic distributions of the estimated regression parameters and baseline hazard were derived by jointly maximizing the likelihood function, with projection theory used to compute the efficient score function. Using these results, the asymptotic normality of the estimated regression parameters was shown without going through a profile likelihood expansion.

Murphy and Vaart (2000) proposed a version of the profile likelihood approach in semiparametric models as an inferential tool. They used an 'approximate least favorable submodel' to deal with the implicit function under the profile likelihood approach. Their approach differentiates the approximate least favorable submodel and does not involve differentiating the profile likelihood function itself.

All of these works (Murphy and Vaart, 2000; Zeng and Cai, 2005; Zeng and Lin, 2007, 2010) use techniques that avoid differentiating the implicit function in the profile likelihood function, and therefore they do not derive the score function based on the profile likelihood. Ultimately, they show the asymptotic variance of the estimator to be the inverse of the efficient information matrix. However, they do not express the efficient information matrix in terms of the profile likelihood score function.

In this paper, we profile out the baseline hazard function from the Cox PH cure model and plug the estimator into the likelihood function. The difficulty is that the estimator of the baseline hazard function is an implicit function (Rizopoulos, 2012). We overcome this difficulty in showing asymptotic normality of the estimator (Theorem-3 and Theorem-4 in Section 3.3). This approach is an alternative to the methodologies in which the asymptotic normality of the profile likelihood estimator has been studied (Murphy and Vaart, 2000; Zeng and Cai, 2005; Zeng and Lin, 2007, 2010; Hirose, 2011b, 2016). We use the asymptotic expansion of the profile likelihood function to obtain the asymptotic normality of the profile likelihood estimator, and we obtain the explicit form of the variance estimator using the profile likelihood score function. These results can be used to calculate the estimated variance of the profile likelihood estimator, as illustrated in the simulation study (Section-4) and the numerical example (Section-5). For the numerical example, we use the data (ECOG phase III clinical trial e1684) from the SMCURE package and compute the standard errors of the estimated parameters from the efficient information matrix (based on the profile likelihood score function).

Our contribution in this paper is to express the efficient information matrix as the variance of the score function in the profile likelihood. This gives not only an analytical understanding of profile likelihood estimation, but also a numerical method to compute the efficient information matrix using the profile likelihood score function.

This paper is organized as follows. Section-2 briefly reviews the Cox PH cure model. In Section-3, we describe the estimation procedure and the theorems used to show that the profile likelihood estimators are consistent and asymptotically normal. Results obtained from the profile likelihood expansion of the Cox PH cure model are shown in Section-4 (simulation study) and Section-5 (application to the ECOG data). Section-6 concludes the paper with a short discussion.

2 Cox PH Cure Model

Let us define a binary variable $V$, where $V=0$ indicates an individual that will be a long-term survivor (never experiences the event of interest) and $V=1$ indicates an individual that will experience the event. For an individual with covariate vector $W=(1,W_{1},...,W_{n})$, the probability that $V=1$ can be expressed by a logistic model

$$p=\Pr(V=1,W;b)=\frac{e^{b^{\prime}W}}{1+e^{b^{\prime}W}},$$

where $p$ is the probability of being susceptible (often called the incidence of the model), $b$ is a vector of parameters and $W$ includes the intercept. The time to the event among individuals with $V=1$ can be modelled by the Cox PH model

$$\lambda(t|V=1,Z;\beta)=\lambda_{0}(t|V=1)e^{\beta^{\prime}Z},$$

where $Z$ is another set of covariates (without intercept) and $\lambda_{0}(t|V=1)$ is the baseline hazard function. The two sets of covariates may be identical, or partially or completely different from each other (Kuk and Chen, 1992).

An individual who experiences the event at time $t$ contributes the likelihood factor

$$p\,f(t|V=1,Z;\lambda,\beta),$$

which is the probability (density) of death at time $t$ (Kuk and Chen, 1992). On the other hand, an individual who has been followed up to time $t$ without experiencing the event contributes the likelihood factor

$$(1-p)+p\,S(t|V=1,Z;\lambda,\beta),$$

which is the probability of being a long-term survivor (cured) plus the probability of being susceptible and experiencing the event after time $t$ (Kuk and Chen, 1992). In addition, $S(t|V=1,Z;\lambda,\beta)=S_{0}(t|V=1)^{e^{\beta^{\prime}Z}}$ is the conditional survival function of the susceptibles (often called the latency), where $S_{0}(t|V=1)=\exp\big{(}-\Lambda_{0}(t|V=1)\big{)}=\exp\big{(}-\int_{0}^{t}\lambda_{0}(s|V=1)ds\big{)}$ is the baseline survival function and $\Lambda_{0}(t|V=1)$ is the baseline cumulative hazard function.
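The population survival function $(1-p)+pS(t|V=1,Z;\lambda,\beta)$, with $S=S_{0}^{e^{\beta^{\prime}Z}}$, starts at 1 and levels off at the cure fraction $1-p$ as $t$ grows. The minimal Python sketch below illustrates this under an assumption that is not part of the model: a Weibull baseline cumulative hazard $\Lambda_{0}(t|V=1)=(t/\text{scale})^{\text{shape}}$. The function name and default parameter values are hypothetical.

```python
import math


def population_survival(t, p, lin_pred, shape=1.5, scale=1.0):
    """Population survival (1 - p) + p * S(t | V=1, Z) in the Cox PH cure model.

    Illustrative assumption: Weibull baseline cumulative hazard
    Lambda_0(t | V=1) = (t / scale) ** shape.  `lin_pred` is beta'Z.
    """
    cum_hazard0 = (t / scale) ** shape
    # S(t | V=1, Z) = S_0(t | V=1) ** exp(beta'Z) = exp(-Lambda_0(t) * exp(beta'Z))
    s_uncured = math.exp(-cum_hazard0 * math.exp(lin_pred))
    return (1.0 - p) + p * s_uncured


# With p = 0.75 the curve starts at 1 and plateaus at the cure fraction 0.25.
print(population_survival(0.0, 0.75, 0.0))             # 1.0
print(round(population_survival(50.0, 0.75, 0.0), 6))  # 0.25
```

The plateau at $1-p$ is what distinguishes a cure model from a standard Cox PH model, whose population survival function tends to zero.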

3 Estimation

Suppose the observed data for individual $i$ are denoted by $(T_{i},\delta_{i},Z_{i})$, $i=1,2,...,n$, where $T_{i}$ is the length of time subject $i$ was observed and $Z_{i}$ is a vector of covariates. Moreover, $\delta_{i}$ indicates whether the observed time is censored:

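$$\delta_{i}=\begin{cases}1 & \text{if } T_{i} \text{ is an event time},\\ 0 & \text{if } T_{i} \text{ is a censored time}.\end{cases}$$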

For convenience, let $W_{i}=(1,Z^{\prime}_{i})^{\prime}$ , although the covariates in $W_{i}$ and $Z_{i}$ do not have to be equal.

The likelihood for $n$ observations is

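$$L(b,\beta,\lambda)=\prod_{i=1}^{n}\Big\{p_{i}f(t_{i}|V=1,Z_{i};\lambda,\beta)\Big\}^{\delta_{i}}\Big\{(1-p_{i})+p_{i}S(t_{i}|V=1,Z_{i};\lambda,\beta)\Big\}^{1-\delta_{i}},$$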

where $p_{i}$ is the probability of $i$ th individual being susceptible. We know that

$$f(t|V=1,Z;\lambda,\beta)=\lambda(t|V=1,Z;\beta)\,S(t|V=1,Z;\lambda,\beta).$$

So for the Cox PH cure model, the observed full likelihood function can be written as

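$$L(b,\beta,\Lambda_{0})=\prod_{i=1}^{n}\Big\{p_{i}\lambda_{0}(t_{i}|V=1)e^{\beta^{\prime}Z_{i}}\exp\big(-\Lambda_{0}(t_{i}|V=1)e^{\beta^{\prime}Z_{i}}\big)\Big\}^{\delta_{i}}\Big\{(1-p_{i})+p_{i}\exp\big(-\Lambda_{0}(t_{i}|V=1)e^{\beta^{\prime}Z_{i}}\big)\Big\}^{1-\delta_{i}}.$$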

Here we want to obtain the estimates of $b$ and $\beta$ that maximize $L(b,\beta,\Lambda_{0})$. To maximize $L(b,\beta,\Lambda_{0})$, we apply the profile likelihood technique, in which $\Lambda_{0}(t)$ is profiled out of the likelihood.

3.1 The Expectation-Maximization (EM) Algorithm

Let us define the complete data by $(t_{i},\delta_{i},Z_{i},v_{i}),~{}i=1,...,n$, which include the observed data and the unobserved $v_{i}$, where $v_{i}$ is the value taken by the variable $V_{i}$. If $\delta_{i}=1$ then $v_{i}=1$, and if $\delta_{i}=0$ then $v_{i}$ is unobserved. The EM algorithm is a natural choice because the model depends on a latent variable, $v_{i}$ (cure status): it maximizes the observed-data likelihood by working iteratively with the complete-data likelihood (Dempster, Laird and Rubin, 1977). The complete-data likelihood can be written as

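$$L_{c}(b,\beta,\Lambda_{0};v)=\prod_{i=1}^{n}\Big[p_{i}f(t_{i}|V=1,Z_{i};\lambda,\beta)\Big]^{\delta_{i}}\times\prod_{i=1}^{n}\Big[p_{i}S(t_{i}|V=1,Z_{i};\lambda,\beta)\Big]^{(1-\delta_{i})v_{i}}\times\prod_{i=1}^{n}\Big[1-p_{i}\Big]^{(1-\delta_{i})(1-v_{i})}.$$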

The above equation can be rewritten as the product of a logistic and a PH component.

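$$L_{c}(b,\beta,\Lambda_{0};v)=\prod_{i=1}^{n}p_{i}^{v_{i}}(1-p_{i})^{1-v_{i}}\times\prod_{i=1}^{n}\lambda(t_{i}|V=1,Z_{i};\beta)^{\delta_{i}}S(t_{i}|V=1,Z_{i};\lambda,\beta)^{v_{i}}.$$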

So it is possible to estimate the incidence and the latency separately (Amico and Keilegom, 2018). Now the expected complete data log-likelihood under $p(V|T,\delta,Z)$ is

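$$\sum_{i=1}^{n}\Big\{\gamma(V_{i})\log p_{i}+(1-\gamma(V_{i}))\log(1-p_{i})\Big\}+\sum_{i=1}^{n}\gamma(V_{i})\Big\{\delta_{i}\log\lambda(t_{i}|V=1,Z_{i};\beta)+\log S(t_{i}|V=1,Z_{i};\lambda,\beta)\Big\},$$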

where $\gamma(V_{i})$ can be defined as

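$$\gamma(V_{i})=\begin{cases}1 & \text{if } \delta_{i}=1,\\ E(V_{i}|T_{i},\delta_{i}=0,Z_{i}) & \text{if } \delta_{i}=0.\end{cases}$$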

Here, $\gamma(V_{i})=E(V_{i}|T_{i},\delta_{i},Z_{i})$ for censored cases and $\gamma(V_{i})=1$ for uncensored cases. To estimate all parameters and the baseline hazard simultaneously, we combine the EM algorithm with the profile likelihood approach. From equation (3.4), the likelihood function for the logistic component is the same as that of a classical logistic regression model, so the incidence parameters can be estimated with the Newton-Raphson technique.

Baseline Hazard Estimation

Before starting the EM algorithm, we profile out the baseline hazard function $\lambda_{0}(t)$ using nonparametric maximum likelihood estimation (NPMLE). The survival part of equation (3.5) can be maximized separately with respect to $\lambda$ using the log-likelihood

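$$\sum_{i=1}^{n}\gamma(V_{i})\Big[\delta_{i}\big\{\log\lambda_{i}+\beta^{\prime}Z_{i}\big\}-e^{\beta^{\prime}Z_{i}}\sum_{j=1}^{n}\lambda_{j}1\{t_{j}\leq t_{i}\}\Big].$$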

Setting the derivative with respect to $\lambda_{k}$ to zero, we get

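$$\hat{\lambda}_{k}(t|V=1;\beta)=\frac{\delta_{k}}{\sum_{l=1}^{n}\gamma(V_{l})1\{t_{k}\leq t_{l}\}e^{\beta^{\prime}Z_{l}}}.$$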

So the estimate of the baseline cumulative hazard, $\Lambda(t)$ will be

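$$\hat{\Lambda}(t|V=1;\beta)=\sum_{i=1}^{n}\frac{\delta_{i}1\{t_{i}\leq t\}}{\sum_{l=1}^{n}\gamma(V_{l})1\{t_{i}\leq t_{l}\}e^{\beta^{\prime}Z_{l}}}.$$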
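The estimator of the baseline hazard is a Breslow-type estimator in which each subject's contribution to the risk set is weighted by $\gamma(V_{l})$. A minimal NumPy sketch, assuming covariates are stored row-wise in an $n\times q$ array `z`; `baseline_hazard_jumps` is a hypothetical helper name, not part of any package:

```python
import numpy as np


def baseline_hazard_jumps(t, delta, z, beta, gamma):
    """Weighted Breslow-type jumps and cumulative hazard at each observed time.

    lambda_hat_k    = delta_k / sum_l gamma_l * 1{t_k <= t_l} * exp(beta'Z_l)
    Lambda_hat(t_k) = sum_i lambda_hat_i * 1{t_i <= t_k}
    """
    t = np.asarray(t, dtype=float)
    delta = np.asarray(delta, dtype=float)
    risk = np.asarray(gamma, dtype=float) * np.exp(np.asarray(z, dtype=float) @ np.asarray(beta, dtype=float))
    at_risk = (t[None, :] >= t[:, None]).astype(float)  # at_risk[k, l] = 1{t_k <= t_l}
    denom = at_risk @ risk
    # Censored subjects (delta_k = 0) get no jump; this also avoids 0/0 when denom is 0.
    jumps = np.divide(delta, denom, out=np.zeros_like(delta), where=delta > 0)
    cum = (t[None, :] <= t[:, None]).astype(float) @ jumps  # cum[k] = Lambda_hat(t_k)
    return jumps, cum


# With gamma = 1 and beta = 0 this reduces to the Nelson-Aalen estimator.
jumps, cum = baseline_hazard_jumps([1.0, 2.0, 3.0], [1, 1, 1], np.zeros((3, 1)), [0.0], [1.0, 1.0, 1.0])
print(cum)  # [0.33333333 0.83333333 1.83333333]
```

Summing the jumps $\hat{\lambda}_{k}$ over $t_{k}\leq t$ gives $\hat{\Lambda}(t|V=1;\beta)$, so each denominator uses the risk set at the event time $t_{i}$.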

The E-step

In the E-step, we use the current parameter estimates $b$ and $\beta$ to find the expected values of $V_{i}$ :

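$$\gamma(V_{i})=E(V_{i}|T_{i},\delta_{i},Z_{i})=\bigg(\frac{p_{i}S\big(t_{i}|V=1,Z_{i};\hat{\lambda}(\beta),\beta\big)}{1-p_{i}+p_{i}S\big(t_{i}|V=1,Z_{i};\hat{\lambda}(\beta),\beta\big)}\bigg)^{1-\delta_{i}}.$$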
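The E-step weight is the posterior probability of being susceptible: it equals 1 for an observed event and $p_{i}S_{i}/(1-p_{i}+p_{i}S_{i})$ for a censored subject, where $S_{i}=\exp\{-\hat{\Lambda}(t_{i}|V=1;\beta)e^{\beta^{\prime}Z_{i}}\}$. A minimal vectorized sketch (the function name `e_step` is hypothetical):

```python
import numpy as np


def e_step(delta, p, cum_hazard, lin_pred):
    """gamma(V_i) = E(V_i | T_i, delta_i, Z_i) for the Cox PH cure model.

    delta:      event indicators delta_i
    p:          incidence probabilities p_i = P(V_i = 1 | W_i; b)
    cum_hazard: baseline cumulative hazard Lambda_hat(t_i | V=1; beta)
    lin_pred:   linear predictors beta'Z_i
    """
    delta = np.asarray(delta)
    p = np.asarray(p, dtype=float)
    s = np.exp(-np.asarray(cum_hazard, dtype=float) * np.exp(np.asarray(lin_pred, dtype=float)))
    posterior = p * s / (1.0 - p + p * s)   # only used where delta_i = 0
    return np.where(delta == 1, 1.0, posterior)


# An event keeps weight 1; a censored subject with no follow-up keeps weight p;
# longer event-free follow-up pushes the weight towards 0.
print(e_step([1, 0, 0], [0.5, 0.5, 0.5], [0.0, 0.0, 2.0], [0.0, 0.0, 0.0]))
```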

The M-step

By replacing $\lambda$ with $\hat{\lambda}(\beta)$, we maximize equation (3.5),

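$$\sum_{i=1}^{n}\Big[\Big\{\gamma(V_{i})\log p_{i}+(1-\gamma(V_{i}))\log(1-p_{i})\Big\}+\gamma(V_{i})\Big\{\delta_{i}\log\hat{\lambda}(t_{i}|V=1,Z_{i};\beta)+\log S(t_{i}|V=1,Z_{i};\hat{\lambda}(\beta),\beta)\Big\}\Big],$$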

with respect to $b$ and $\beta$ to obtain $\hat{b}$ and $\hat{\beta}$, respectively. The estimates from the M-step are passed back to the E-step, and the two steps are iterated until $\hat{b}$ and $\hat{\beta}$ converge.

3.2 Score Functions

An estimator of the baseline cumulative hazard function in the counting process notation (Fleming and Harrington, 2011) can be written from equation (3.8) as

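$$\hat{\Lambda}(t)=\int_{0}^{t}\frac{\sum_{i=1}^{n}dN_{i}(u)}{\sum_{i=1}^{n}\gamma(V_{i})Y_{i}(u)e^{\beta^{\prime}Z_{i}}},$$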

where $N(t)=1\{T\leq t,\delta=1\}$ and $Y(t)=1\{T\geq t\}.$

Let us denote $E_{F_{n}}f=\int fdF_{n}$ . Then $\hat{\Lambda}(t)$ can be expressed as

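$$\hat{\Lambda}_{\beta,F_{n}}(t)=\int_{0}^{t}\frac{E_{F_{n}}dN(u)}{E_{F_{n}}\gamma(V)Y(u)e^{\beta^{\prime}Z}}.$$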
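Writing $\hat{\Lambda}$ with empirical expectations shows that the $1/n$ factors in numerator and denominator cancel, so the $E_{F_{n}}$ form and the sum form give the same estimate. A minimal sketch that evaluates the $E_{F_{n}}$ form directly, jump time by jump time; the function name is hypothetical, `Z` is an $n\times q$ covariate array and `gamma` holds the current $\gamma(V_{i})$:

```python
import numpy as np


def cum_hazard_empirical(t_grid, T, delta, Z, beta, gamma):
    """Lambda_hat_{beta,F_n}(t) = int_0^t E_{F_n} dN(u) / E_{F_n} gamma(V) Y(u) e^{beta'Z}.

    E_{F_n} f is the sample mean (1/n) sum_i f(X_i); the 1/n factors cancel in the ratio.
    """
    T = np.asarray(T, dtype=float)
    delta = np.asarray(delta, dtype=float)
    risk = np.asarray(gamma, dtype=float) * np.exp(np.asarray(Z, dtype=float) @ np.asarray(beta, dtype=float))
    jump_times = np.unique(T[delta > 0])
    increments = []
    for u in jump_times:
        e_dN = np.mean((T == u) & (delta > 0))  # E_{F_n} dN(u): mean of N(u) - N(u-)
        e_risk = np.mean((T >= u) * risk)       # E_{F_n} gamma(V) Y(u) e^{beta'Z}
        increments.append(e_dN / e_risk)
    increments = np.array(increments)
    return np.array([increments[jump_times <= s].sum() for s in np.atleast_1d(t_grid)])


print(cum_hazard_empirical([1, 2, 3], [1.0, 2.0, 3.0], [1, 1, 1], np.zeros((3, 1)), [0.0], [1.0, 1.0, 1.0]))
# [0.33333333 0.83333333 1.83333333]
```

Tied event times are handled by grouping them into a single jump, which matches the sum of the individual Breslow-type jumps at that time.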

Now from (3.10), the log-profile likelihood can be written as

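$$\sum_{i=1}^{n}\Big\{\log P(V_{i}|b)+\log P\big(T_{i},\delta_{i}|\hat{\Lambda}_{\beta,F_{n}},\beta\big)\Big\},$$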

where $\log P(V_{i}|b)$ and $\log P\big{(}T_{i},\delta_{i}|\hat{\Lambda}_{\beta,F_{n}},\beta\big{)}$ are the log-profile likelihood functions (for one observation) for the logistic and Cox PH components, respectively. We can express the components as

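$$\log P(V_{i}|b)=\gamma(V_{i})\log p_{i}+(1-\gamma(V_{i}))\log(1-p_{i})=\gamma(V_{i})\,b^{\prime}W_{i}-\log\big(1+e^{b^{\prime}W_{i}}\big)$$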

and

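$$\log P\big(T_{i},\delta_{i}|\hat{\Lambda}_{\beta,F_{n}},\beta\big)=\gamma(V_{i})\Big\{\delta_{i}\log\hat{\lambda}(t_{i}|V=1,Z_{i};\beta)+\log S\big(t_{i}|V=1,Z_{i};\hat{\lambda}(\beta),\beta\big)\Big\}.$$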

The score functions for the profile likelihood are

$$\phi(V_{i},T_{i},\delta_{i}|b,\beta,F_{n})=\phi_{l}(V_{i}|b)+\phi_{s}(T_{i},\delta_{i}|\beta,F_{n}),$$

where $\phi_{l}(V_{i}|b)$ is the score function for logistic component which can be expressed as

$$\phi_{l}(V_{i}|b)=\frac{\partial}{\partial b}\log P(V_{i}|b),$$

and $\phi_{s}(T_{i},\delta_{i}|\beta,F_{n})$ is the score function for survival component which can be written as

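$$\phi_{s}(T_{i},\delta_{i}|\beta,F_{n})=\frac{\partial}{\partial\beta}\log P\big(T_{i},\delta_{i}|\hat{\Lambda}_{\beta,F_{n}},\beta\big).$$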
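For the logistic component, $\log P(V_{i}|b)=\gamma(V_{i})b^{\prime}W_{i}-\log(1+e^{b^{\prime}W_{i}})$, so if $\gamma(V_{i})$ is held fixed (as in the M-step) the per-observation score is $(\gamma(V_{i})-p_{i})W_{i}$. The sketch below checks this against a central finite difference; treating $\gamma$ as fixed is an assumption of the illustration, and the function names are hypothetical.

```python
import numpy as np


def logistic_log_lik(b, W, gamma):
    """sum_i {gamma_i log p_i + (1 - gamma_i) log(1 - p_i)} = sum_i {gamma_i b'W_i - log(1 + e^{b'W_i})}."""
    eta = W @ b
    return np.sum(gamma * eta - np.logaddexp(0.0, eta))


def logistic_scores(b, W, gamma):
    """Per-observation scores (gamma_i - p_i) W_i, with gamma held fixed."""
    p = 1.0 / (1.0 + np.exp(-(W @ b)))
    return W * (gamma - p)[:, None]


# Compare the analytic score with a central finite difference of the log-likelihood.
rng = np.random.default_rng(0)
W = np.column_stack([np.ones(20), rng.normal(size=20)])
gamma = rng.uniform(size=20)
b = np.array([0.3, -0.7])
eps = 1e-6
numeric = np.array([
    (logistic_log_lik(b + eps * e, W, gamma) - logistic_log_lik(b - eps * e, W, gamma)) / (2 * eps)
    for e in np.eye(2)
])
print(np.allclose(numeric, logistic_scores(b, W, gamma).sum(axis=0), atol=1e-5))  # True
```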

Now we will calculate the score function $B(T_{i},\delta_{i}|\beta,F)$ , which is Hadamard differentiable with respect to $F$ . For an integrable function $h$ with the same domain as $F$ , we can express

$$B(T_{i},\delta_{i}|\beta,F)h=d_{F}\log P\big(T_{i},\delta_{i}|\beta,\hat{\Lambda}_{\beta,F}\big)h,$$

where, $d_{F}\log P(T_{i},\delta_{i}|\beta,\hat{\Lambda}_{\beta,F})$ represents the Hadamard derivative of $\log P(T_{i},\delta_{i}|\beta,\hat{\Lambda}_{\beta,F})$ with respect to $F$ (Hirose, 2011a).

Theorem-1: At the true value of $(b,\beta,F)$, the following hold:

(i) $\hat{\Lambda}_{\beta_{0},F_{0}}(t)=\Lambda_{0}(t)$, the true cumulative hazard; and

(ii) the score function $\phi(V,T,\delta|b_{0},\beta_{0},F_{0})$ defined in (3.16) is the efficient score function (here we drop the subscript $i$).

Proof: Replacing $F_{n}$ by $F_{0}$, we get from (3.12)

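$$\hat{\Lambda}_{\beta_{0},F_{0}}(t)=\int_{0}^{t}\frac{E[dN(u)]}{E[\gamma(V)Y(u)e^{\beta_{0}^{\prime}Z}]},$$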

where $E$ is the expectation with respect to the true distribution $F_{0}$ . At the true value of the parameters $(\beta,F)$ we can write

$$E[dN(u)]=E[\gamma(V)Y(u)e^{\beta_{0}^{\prime}Z}]\,d\Lambda_{0}(u).$$

Hence, $\hat{\Lambda}_{\beta_{0},F_{0}}(t)=\Lambda_{0}(t).$

The score function $\phi(V,T,\delta|b,\beta,F)=\phi_{l}(V|b)+\phi_{s}(T,\delta|\beta,F)$ in (3.16) has two parts. We know that the logistic model is a parametric model that does not involve $\Lambda$ , so we will work on the survival part of the score function. So the score function for the survival part at the true value of the parameters $(\beta,F)$ can be expressed as

[TABLE]

Let $M_{1}(u)=E[\gamma(V)Y(u)Ze^{\beta_{0}^{\prime}Z}]$ and $M_{0}(u)=E[\gamma(V)Y(u)e^{\beta_{0}^{\prime}Z}]$ . So by using equation (3.20), the above equation can be expressed as

[TABLE]

which is the efficient score function for Cox PH cure model. The calculation of efficient score function based on the projection theory is given in Supplementary Materials (equation S5.12).

3.3 Asymptotic Normality of the MLE

Assumptions:

To show the asymptotic normality of the MLE and derive its asymptotic variance, we make the following assumptions. On the set of cdf functions $\digamma$, we use the sup-norm, i.e., for $F,F_{0}\in\digamma$,

$$\|F-F_{0}\|=\sup_{x}|F(x)-F_{0}(x)|.$$

For $\rho>0$ , let

$$\zeta_{\rho}=\{F\in\digamma:\|F-F_{0}\|\leq\rho\}.$$

The assumptions are given below

A1: We assume that there exists a finite number $\tau>0$ such that $S(\tau)=P(T>\tau)=E[Y(\tau)]>0$ .

A2: The range of $Z$ is bounded and $\beta$ lies in a compact set $\Theta$, such that $||Z||\leq M$ and $||\beta||\leq M$ for some $0<M<\infty$.

A3: The empirical cdf $F_{n}$ is $\sqrt{n}$-consistent, i.e., $\sqrt{n}||F_{n}-F_{0}||=O_{p}(1)$.

A4: The efficient information matrix $I_{s}^{*}=E[\phi^{*}_{\beta}\phi^{*^{\prime}}_{\beta}]$ is invertible.
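Assumption A3 is stated in the sup-norm. As an illustration, for i.i.d. data from the Uniform(0, 1) distribution, where $F_{0}(x)=x$, the quantity $\sqrt{n}\,\|F_{n}-F_{0}\|$ is the scaled Kolmogorov-Smirnov statistic and stays bounded in probability as $n$ grows. A small sketch under that uniform-data assumption (hypothetical helper name):

```python
import numpy as np


def sqrt_n_sup_distance_uniform(sample):
    """sqrt(n) * sup_x |F_n(x) - x| for a sample on [0, 1] (F_0 = Uniform(0, 1) cdf)."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    i = np.arange(1, n + 1)
    # The supremum is attained at a data point, either at or just before a jump of F_n.
    d = max(np.max(i / n - x), np.max(x - (i - 1) / n))
    return np.sqrt(n) * d


rng = np.random.default_rng(1)
for n in (100, 10_000, 1_000_000):
    # The scaled distance does not grow with n, consistent with O_p(1).
    print(n, round(sqrt_n_sup_distance_uniform(rng.uniform(size=n)), 3))
```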

Theorem-2: If assumptions (A1)-(A4) hold, then

(i) $\hat{\beta}_{n}\stackrel{{\scriptstyle P}}{{\rightarrow}}\beta_{0}$ as $n\rightarrow\infty$, and

(ii) $\hat{\Lambda}_{\hat{\beta}_{n},F_{n}}-\Lambda_{0}=o_{p}(1)$.

The proof of Theorem-2 is given in Supplementary Materials.

Theorem-3: Let the score functions $\phi_{s}(T,\delta|\beta,F)$ and $B(T,\delta|\beta,F)$ be as defined previously. Suppose (A1)-(A4) hold, and $\hat{\beta}_{n}\stackrel{{\scriptstyle P}}{{\rightarrow}}\beta_{0}$ and $F_{n}\stackrel{{\scriptstyle P}}{{\rightarrow}}F_{0}$ as $n\rightarrow\infty$. Then we have

[TABLE]

and

[TABLE]

Remark: These results are obtained without assuming that the derivatives of the score functions, $\frac{\partial}{\partial\beta}\phi_{s}(T,\delta|\beta,F)$ and $d_{F}B(T,\delta|\beta,F)$, exist. They give an asymptotic expansion of the profile likelihood without differentiating score functions that involve the implicit function.

The proof of Theorem-3 is given in Supplementary Materials.

Theorem-4: If assumptions (A1)-(A4) are satisfied, then a consistent estimator $\hat{\beta}_{n}$ that solves the estimating equation

$$\sum_{i=1}^{n}\phi_{s}(T_{i},\delta_{i}|\beta,F_{n})=0$$

is an asymptotically linear estimator for $\beta_{0}$ (Hirose, 2011a) with the efficient influence function $(I_{s}^{*})^{-1}\phi_{s}(T,\delta|\beta_{0},F_{0})$ , so that

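$$\sqrt{n}(\hat{\beta}_{n}-\beta_{0})=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}(I_{s}^{*})^{-1}\phi_{s}(T_{i},\delta_{i}|\beta_{0},F_{0})+o_{p}(1)\stackrel{d}{\rightarrow}N\{0,(I_{s}^{*})^{-1}\},$$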

where $N\{0,(I_{s}^{*})^{-1}\}$ is a normal distribution with mean zero and variance $(I_{s}^{*})^{-1}$ . So the estimator $\hat{\beta}_{n}$ is efficient.
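Theorem-4 and the definition $I_{s}^{*}=E[\phi^{*}_{\beta}\phi^{*\prime}_{\beta}]$ suggest a plug-in variance estimate: replace the expectation by the empirical average of outer products of the per-observation profile score vectors evaluated at the estimates, invert, and divide by $n$. A minimal sketch, assuming those score vectors are already available as the rows of an array; the function name is hypothetical, and this is not claimed to be the authors' code:

```python
import numpy as np


def standard_errors_from_scores(scores):
    """Standard errors from per-observation score vectors.

    scores: (n, q) array whose i-th row is the score of observation i,
            evaluated at the estimates (hypothetical input format).
    """
    scores = np.asarray(scores, dtype=float)
    n = scores.shape[0]
    info = scores.T @ scores / n      # empirical E[phi phi'], the estimated information
    cov = np.linalg.inv(info) / n     # Var(theta_hat) is approximately I^{-1} / n
    return np.sqrt(np.diag(cov)), cov


se, cov = standard_errors_from_scores([[1.0], [-1.0], [1.0], [-1.0]])
print(se)  # [0.5]
```

No resampling is needed: one pass over the data at the final estimates gives the whole covariance matrix, in contrast to refitting the model on each bootstrap sample.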

The proof of Theorem-4 is given in Supplementary Materials.

4 Simulation Study

We perform a simulation study to compare the SMCURE package with our approach in terms of parameter estimation and standard error estimation. Survival times and censoring times were generated from a Weibull proportional hazards model and a uniform distribution, respectively. Simulation results for the Cox PH cure model were evaluated with two covariates (fixed by design): one binary covariate from a binomial distribution with probability 0.5 and one continuous covariate generated from the standard normal distribution $N(0,1)$. Therefore, the covariate vectors for the logistic and survival components were $W=(W_{0},W_{1},W_{2})$ and $Z=(Z_{1},Z_{2})$, respectively.
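A data-generating sketch for this design. The Weibull shape and scale and the upper bound of the uniform censoring distribution are not stated here, so `shape`, `scale` and `cens_max` below are placeholder assumptions, as is the function name; event times are drawn by inverting $S(t|V=1,Z)=\exp\{-(t/\text{scale})^{\text{shape}}e^{\beta^{\prime}Z}\}$.

```python
import numpy as np


def simulate_cure_data(n, b, beta, shape=1.5, scale=1.0, cens_max=10.0, seed=None):
    """Hypothetical generator for the Cox PH cure model simulation design."""
    rng = np.random.default_rng(seed)
    x1 = rng.binomial(1, 0.5, n).astype(float)   # binary covariate
    x2 = rng.standard_normal(n)                   # continuous covariate
    W = np.column_stack([np.ones(n), x1, x2])     # incidence covariates (with intercept)
    Z = np.column_stack([x1, x2])                 # latency covariates (no intercept)
    p = 1.0 / (1.0 + np.exp(-(W @ b)))            # P(V = 1 | W)
    uncured = rng.uniform(size=n) < p
    # Weibull PH: solve exp(-(T / scale)^shape * e^{beta'Z}) = U for T
    u = rng.uniform(size=n)
    event = scale * (-np.log(u) / np.exp(Z @ beta)) ** (1.0 / shape)
    event = np.where(uncured, event, np.inf)      # cured subjects never fail
    cens = rng.uniform(0.0, cens_max, n)
    t = np.minimum(event, cens)
    delta = (event <= cens).astype(int)
    return t, delta, W, Z, uncured


t, delta, W, Z, uncured = simulate_cure_data(200, np.array([2.1, -1.0, 0.3]), np.array([-1.0, 0.5]), seed=1)
```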

The cure rates were varied through the coefficients ($b$) corresponding to $W$. The low cure rate for the treatment group ($W_{1}=1$) was 25% and for the control group ($W_{1}=0$) was 11%, resulting from $b=(2.1,-1,0.3)$. The moderate cure rate for the treatment group was 50% and for the control group was 27%, resulting from $b=(1.022,-1,0.3)$. The substantial cure rate for the treatment group was 75% and for the control group was 53%, resulting from $b=(-0.1,-1,0.3)$. For each configuration, these cure rates are evaluated at the mean (zero) of the continuous covariate ($W_{2}$). The coefficient vector for the survival part was $\beta=(-1,0.5)$. Each configuration uses samples of 200 individuals ($n=200$) with 1000 replications for both the SMCURE package and our approach.
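The stated cure rates follow from the logistic incidence model: the cure rate is $1-p=1/(1+e^{b^{\prime}W})$ evaluated at $W=(1,W_{1},0)$. A quick check (hypothetical function name):

```python
import math


def cure_rate(b, w1, w2=0.0):
    """1 - P(V = 1 | W) for W = (1, w1, w2) under the logistic incidence model."""
    eta = b[0] + b[1] * w1 + b[2] * w2
    return 1.0 - 1.0 / (1.0 + math.exp(-eta))


for b in [(2.1, -1.0, 0.3), (1.022, -1.0, 0.3), (-0.1, -1.0, 0.3)]:
    # treatment group (W1 = 1) first, then control group (W1 = 0), both at W2 = 0
    print(b, round(cure_rate(b, 1), 3), round(cure_rate(b, 0), 3))
```

The computed values agree with the percentages quoted above to within about half a percentage point.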

The simulation results, namely the estimate biases, standard errors (SE), estimated standard errors (ESE) and confidence interval coverage probabilities (CP) for each configuration, are given in Table-1. The SE was calculated from the 1000 simulation estimates. In the SMCURE package, bootstrap samples are used to compute the ESE, whereas in our approach the ESE is calculated analytically through the profile likelihood score function. For the coverage probabilities (CP), we computed the 95% confidence interval for each parameter estimate and determined the frequency with which the true parameter value was captured.
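The four summaries in Table 1 can be computed per parameter from the replication output. A sketch, assuming SE is the empirical standard deviation of the estimates, ESE is the average of the estimated standard errors, and CP is the coverage of the Wald interval $\hat{\theta}\pm1.96\,\widehat{\text{SE}}$; these definitions are assumptions based on the description above, and the function name is hypothetical.

```python
import numpy as np


def simulation_summary(estimates, std_errors, true_value, z=1.959963984540054):
    """Bias, SE, ESE and CP over simulation replications for one parameter.

    estimates:  parameter estimates from each replication
    std_errors: estimated standard errors from each replication
    """
    est = np.asarray(estimates, dtype=float)
    ese_each = np.asarray(std_errors, dtype=float)
    bias = est.mean() - true_value
    se = est.std(ddof=1)        # empirical SD of the estimates across replications
    ese = ese_each.mean()       # average estimated standard error
    lower, upper = est - z * ese_each, est + z * ese_each
    cp = np.mean((lower <= true_value) & (true_value <= upper))  # 95% interval coverage
    return bias, se, ese, cp


bias, se, ese, cp = simulation_summary([0.9, 1.1, 1.5], [0.1, 0.1, 0.1], 1.0)
```

When ESE is close to SE and CP is close to 0.95, the variance estimator is behaving as the asymptotic theory predicts.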

From Table 1, we observe that for both the SMCURE package and our approach, the parameter estimates are close to the true values and the estimate biases are very small, mostly less than 0.05 in absolute value. For all configurations, with only a few exceptions, the SE and ESE of the parameters are very close for both the SMCURE package and our approach. The coverage probabilities of the confidence intervals are similar for the SMCURE package and our approach.

Due to the complexity of the estimating equation in the SMCURE package, the ESE of the estimated parameters are not directly available. As a result, the package uses bootstrap samples to compute the standard errors of the estimated parameters. In contrast, we obtain the explicit form of the efficient score function (via the profile likelihood score function) and hence compute the ESE analytically through the efficient information matrix.

5 Application to Eastern Cooperative Oncology Group (ECOG) Data

We use the melanoma data (ECOG phase III clinical trial e1684) from the SMCURE package (Cai et al., 2012) as a numerical example to compare our results with the output obtained from the SMCURE package. The advantage of our approach is that it uses the efficient score function to obtain the standard errors of the estimated parameters, whereas the SMCURE package uses a bootstrap procedure because of the complexity of the estimating equation in the EM algorithm (the implicit form of its score function).

In the dataset, the subjects had melanoma and were treated with an interferon alpha-2b (IFN) regimen. The purpose of the study was to investigate the effect of a high-dose interferon alpha-2b (IFN) regimen against placebo as postoperative adjuvant therapy. In this example, the outcome is relapse-free survival, and the failure time is the time from initial treatment to recurrence of melanoma. A total of 284 observations (after deleting two observations with missing values) were used in the statistical analysis. Three covariates are considered for both the incidence and latency parts: gender (0=male, 1=female), treatment (0=control, 1=treatment) and age (a continuous variable centered at the mean).

Of the 284 individuals, 196 experienced recurrence of melanoma (a censoring rate of approximately 31%). The observed follow-up times ranged from 0.032 to 9.643 years. The parameter estimates, standard errors and 95% CIs from the SMCURE package and our approach are given in Table 2 (logistic component) and Table 3 (Cox PH component).

From Table 2, only the intercept is significant at the 5% level with the SMCURE package. With our approach, both the intercept and treatment have significant effects on long-term incidence. The treatment result suggests that the probability of melanoma recurrence is significantly higher in the control group than in the treatment group. Age and gender are insignificant under both the SMCURE package and our approach.
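Significance at the 5% level and 95% CIs of this kind are typically read off a Wald z-test. The sketch below assumes Wald-type inference (estimate ± z·SE), which the text does not state explicitly. The inputs are hypothetical and are not the values in Table 2.

```python
from statistics import NormalDist

def wald_summary(estimate, se, level=0.95):
    """Wald z statistic, two-sided p-value and normal-theory CI for one coefficient."""
    std_normal = NormalDist()                          # N(0, 1)
    z = estimate / se
    p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    z_crit = std_normal.inv_cdf(0.5 + level / 2.0)     # about 1.96 for level = 0.95
    ci = (estimate - z_crit * se, estimate + z_crit * se)
    return z, p_value, ci

# Hypothetical coefficient and standard error (NOT taken from Table 2).
z, p_value, ci = wald_summary(estimate=-0.6, se=0.25)
significant_at_5pct = p_value < 0.05   # equivalent to 0 lying outside the 95% CI
print(z, p_value, ci, significant_at_5pct)
```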

From Table 3, none of the covariates has a significant effect on latency, under either the SMCURE package or our approach.

6 Discussion

Over the years, several techniques have been developed that avoid differentiating the implicit function in the profile likelihood (Murphy and Vaart, 2000; Zeng and Cai, 2005; Zeng and Lin, 2007; Lu, 2008; Zeng and Lin, 2010). These approaches therefore do not involve the profile likelihood score function, and they establish the asymptotic variance of the estimator without it.

In this paper, we have shown the asymptotic normality of the maximum profile likelihood estimator via an asymptotic expansion of the profile likelihood. We have also computed the efficient information matrix from the profile likelihood score function. This provides an additional method for computing standard errors of the maximum profile likelihood estimator.

Supplementary Materials

An additional document is provided as Supplementary Materials, and it contains the proofs of all the necessary theorems.

S1 Lemma with Proof

Lemma 1: Let $\digamma$ be the set of cdf functions and $\zeta_{\rho}\subset\digamma$ ($\zeta_{\rho}$ is defined in Section 3.3 of the main manuscript). If assumptions (A1-A4) hold, then

(i) $P(T,\delta|\beta,F)$ is bounded away from zero.

(ii) The class of functions $\bigg{\{}\log P\big{(}T,\delta|\beta,F\big{)}:\beta\in\Theta,F\in\zeta_{\rho}\bigg{\}}$ is uniformly bounded Donsker.

(iii) The class of functions $\bigg{\{}\phi_{s}\big{(}T,\delta|\beta,F\big{)}:\beta\in\Theta,F\in\zeta_{\rho}\bigg{\}}$ is uniformly bounded Donsker.

Proof: For (i), we know

[TABLE]

Since the map $(f,F)\rightarrow E_{F}(f)=\int fdF$ is continuous, there is a constant $c>0$ , such that for all $F\in\zeta_{\rho}$ (based on A1), we can write

[TABLE]

We know $\gamma(V)=\bigg{(}\frac{pS(\tau)}{1-p+pS(\tau)}\bigg{)}^{1-\delta}$ , so we have

[TABLE]

On the basis of A2, we can write $e^{-M^{2}}\leq e^{\beta^{\prime}Z}\leq e^{M^{2}}$ . So the upper bound of $E_{F}\gamma(V)Y(u)e^{\beta^{\prime}Z}$ can be expressed as

[TABLE]

Now by using equation (S1.2), we can write

[TABLE]

For some constant $c_{1}>0$ , we can write $0<c_{1}\leq E_{F}{dN(u)}\leq 1$ . Since $E_{F}\gamma(V)Y(u)e^{\beta^{\prime}Z}$ is bounded away from zero (equation S1.4), we get

[TABLE]

When $\delta=1$ , from (S1.1) we get

[TABLE]

and when $\delta=0$ ,

[TABLE]

From the above equations, we can write

[TABLE]

Hence $P(T,\delta|\beta,F)$ is bounded away from zero, which proves (i).
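The two bounds that drive part (i) can be checked numerically. The assumptions are not restated in this excerpt, so the Python sketch below assumes concrete versions of them. It takes $p$ and $S$ bounded below by hypothetical constants `p_min` and `s_min`, and it takes $\|\beta\|\leq M$ and $\|Z\|\leq M$, so that $|\beta^{\prime}Z|\leq M^{2}$ by the Cauchy-Schwarz inequality. Because $pS/(1-p+pS)$ is increasing in both $p$ and $S$, $\gamma(V)$ is then bounded below by its value at (`p_min`, `s_min`).

```python
import numpy as np

def gamma_weight(p, surv, delta):
    """gamma(V) = (p S / (1 - p + p S)) ** (1 - delta); equals 1 when delta = 1."""
    return (p * surv / (1.0 - p + p * surv)) ** (1 - delta)

rng = np.random.default_rng(0)
n, dim, M = 10_000, 3, 1.5
p_min, s_min = 0.05, 0.01   # hypothetical lower bounds standing in for the assumptions

p = rng.uniform(p_min, 1.0, n)
surv = rng.uniform(s_min, 1.0, n)
delta = rng.integers(0, 2, n)
gam = gamma_weight(p, surv, delta)
gamma_floor = p_min * s_min / (1.0 - p_min + p_min * s_min)

def draw_in_ball(k, d, radius):
    """Random points with Euclidean norm at most `radius`."""
    x = rng.normal(size=(k, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)
    return x * radius * rng.uniform(0.0, 1.0, size=(k, 1))

beta = draw_in_ball(n, dim, M)
Z = draw_in_ball(n, dim, M)
risk = np.exp(np.sum(beta * Z, axis=1))   # e^{beta'Z}, with |beta'Z| <= M^2
print(gam.min(), gamma_floor, risk.min(), risk.max())
```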

For (ii), the profile log-likelihood function of the survival part for the Cox PH cure model is

[TABLE]

We know that the set of cdf functions $\digamma$ is uniformly bounded Donsker, so the subset $\zeta_{\rho}\subset\digamma$ is also uniformly bounded Donsker. The classes of functions $\big{\{}N(t):t\in[0,\tau]\big{\}}$ and $\big{\{}Y(t):t\in[0,\tau]\big{\}}$ are uniformly bounded Donsker (Theorem 2.10.6 in Van Der Vaart and Wellner, 1996).

The class of functions $\big{\{}\beta^{\prime}Z:\beta\in\Theta\big{\}}$ is Lipschitz in $\beta$ . So, by Theorem 2.10.6 in Van Der Vaart and Wellner (1996), the class of functions $\big{\{}\beta^{\prime}Z:\beta\in\Theta\big{\}}$ is uniformly bounded Donsker.

Since $x\mapsto e^{x}$ is Lipschitz continuous on the bounded range of $\beta^{\prime}Z$ (by A2), Theorem 2.10.6 in Van Der Vaart and Wellner (1996) implies that the class of functions $\big{\{}e^{\beta^{\prime}Z}:\beta\in\Theta\big{\}}$ is uniformly bounded Donsker.

Since $\big{\{}Y(t):t\in[0,\tau]\big{\}}$ and $\big{\{}e^{\beta^{\prime}Z}:\beta\in\Theta\big{\}}$ are uniformly bounded Donsker, by Example 2.10.8 (Van Der Vaart and Wellner, 1996) the class of functions $\big{\{}\gamma(V)Y(t)e^{\beta^{\prime}Z}:t\in[0,\tau],\beta\in\Theta\big{\}}$ is uniformly bounded Donsker.

Since $E_{F}(f)=\int fdF$ is Lipschitz, for the class of functions $\big{\{}E_{F}\gamma(V)Y(t)e^{\beta^{\prime}Z}:t\in[0,\tau],\beta\in\Theta,F\in\zeta_{\rho}\big{\}}$ we can write

[TABLE]

Taking $M\geq 1$, so that $\max\big(Me^{M^{2}},e^{M^{2}}\big)=Me^{M^{2}}$, the above inequality can be expressed as

[TABLE]

which is Lipschitz in the parameters $(\beta,F)$. So, by Theorem 2.10.6 in Van Der Vaart and Wellner (1996), the class of functions $\big{\{}E_{F}\gamma(V)Y(t)e^{\beta^{\prime}Z}:t\in[0,\tau],\beta\in\Theta,F\in\zeta_{\rho}\big{\}}$ is uniformly bounded Donsker. Similarly, the class of functions $\big{\{}E_{F}N(t):t\in[0,\tau]\big{\}}$ is uniformly bounded Donsker.

Since $\big{\{}E_{F}\gamma(V)Y(t)e^{\beta^{\prime}Z}:t\in[0,\tau],\beta\in\Theta,F\in\zeta_{\rho}\big{\}}$ is uniformly bounded Donsker and $E_{F}\gamma(V)Y(u)e^{\beta^{\prime}Z}$ is bounded away from zero (equation S1.4), by Example 2.10.9 in Van Der Vaart and Wellner (1996), the class of functions

[TABLE]

is uniformly bounded Donsker.

Since the map $(f,F)\rightarrow E_{F}(f)=\int fdF$ is Lipschitz, by Theorem 2.10.6 (Van Der Vaart and Wellner, 1996), the class of functions

[TABLE]

is uniformly bounded Donsker.

Since $\big{\{}e^{\beta^{\prime}Z}:\beta\in\Theta\big{\}}$ is uniformly bounded Donsker, by Example 2.10.8 (Van Der Vaart and Wellner, 1996) the class of functions

[TABLE]

is uniformly bounded Donsker.

Since the class $\big{\{}{\beta^{\prime}Z}:\beta\in\Theta\big{\}}$ is uniformly bounded Donsker, by Example 2.10.7 (Van Der Vaart and Wellner, 1996), the class of functions

[TABLE]

is uniformly bounded Donsker.

Since the map $(f,F)\rightarrow E_{F}(f)=\int fdF$ is Lipschitz, by Theorem 2.10.6 in Van Der Vaart and Wellner (1996) the class of functions $\bigg{\{}\log P(T,\delta|\beta,F):\beta\in\Theta,F\in\zeta_{\rho}\bigg{\}}$ is uniformly bounded Donsker, which proves (ii).

For (iii), recall that the score function of the survival part for the Cox PH cure model is

[TABLE]

By an argument similar to that for (ii), the class of functions $\bigg{\{}\phi_{s}(T,\delta|\beta,F):\beta\in\Theta,F\in\zeta_{\rho}\bigg{\}}$ is uniformly bounded Donsker.

Lemma 2: If assumptions (A1-A4) hold, then

[TABLE]

where $M^{\prime\prime}$ is a $P_{0}$ -square integrable function.

Proof: From equation (3.18) of the main manuscript, the score function for the survival part is

[TABLE]

Define

[TABLE]

Then the function is differentiable with respect to $\beta$ , $F$ and $\Lambda$ . Now we have

[TABLE]

As in the proof of Lemma 1, one can show that the derivative of the score function is also uniformly bounded.

Hence the class of functions $\big{\{}\phi_{s}(T,\delta|\beta,F,\Lambda):\beta\in\Theta,F\in\zeta_{\rho},\Lambda\in H\big{\}}$ is Lipschitz in the parameters $(\beta,F,\Lambda)$, and the result follows:

[TABLE]

S2 Theorem 2 with Proof

Theorem 2: If assumptions (A1-A4) hold, then

(1) $\hat{\beta}_{n}\stackrel{{\scriptstyle P}}{{\rightarrow}}\beta_{0}$ as $n\rightarrow\infty$, and

(2) $\hat{\Lambda}_{\hat{\beta}_{n},F_{n}}-\Lambda_{0}=o_{p}(1)$.

Proof: For (1), we use the argument of Theorem 5.7 (Van der Vaart, 2000), for which we need to show that

(i) $\int\log P(T,\delta|\hat{\beta}_{n},F_{n})dF_{n}-\int\log P(T,\delta|\hat{\beta}_{n},F_{0})dF_{0}\stackrel{{\scriptstyle P}}{{\longrightarrow}}0$ as $n\rightarrow\infty$

(ii) $E\bigg{[}\log\frac{P(T,\delta|\beta_{0},F_{0})}{P(T,\delta|\hat{\beta}_{n},F_{0})}\bigg{]}>0$ if $\beta_{0}\neq\hat{\beta}_{n}$

We start with (i). Since the class of functions $\bigg{\{}\log P(T,\delta|\beta,F):\beta\in\Theta,F\in\zeta_{\rho}\bigg{\}}$ is uniformly bounded Donsker, it is also Glivenko-Cantelli, so we can write

[TABLE]

For (ii), we use the Kullback-Leibler (KL) divergence. The KL divergence between $P(T,\delta|\beta_{0},F_{0})$ and $P(T,\delta|\hat{\beta}_{n},F_{0})$ can be written as

[TABLE]

We know that $-\log x$ is a convex function for $x>0$ , so by using Jensen’s inequality in (S2.1), we can write

[TABLE]

and

[TABLE]

Hence (ii) is also proven.
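Step (ii) rests on the non-negativity of the Kullback-Leibler divergence, which is exactly the Jensen step above. A minimal Python check follows. It uses generic, randomly generated discrete distributions, not the model densities $P(T,\delta|\beta,F_{0})$.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) = sum_k p_k log(p_k / q_k) for strictly positive probability vectors."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

# Jensen with the convex function -log:
#   E_p[-log(q/p)] >= -log E_p[q/p] = -log(sum(q)) = 0, with equality iff q = p.
rng = np.random.default_rng(0)
kl_values = []
for _ in range(1_000):
    p = rng.dirichlet(np.ones(6))
    q = rng.dirichlet(np.ones(6))
    kl_values.append(kl_divergence(p, q))

p_ref = rng.dirichlet(np.ones(6))
print(min(kl_values), kl_divergence(p_ref, p_ref))
```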

So from Theorem 5.7 (Van Der Vaart, 2000), it follows that

[TABLE]

For (2), from Theorem 1 we can write

[TABLE]

We know that $\hat{\beta}_{n}\stackrel{{\scriptstyle P}}{{\rightarrow}}\beta_{0}$ and $F_{n}\stackrel{{\scriptstyle P}}{{\rightarrow}}F_{0}$ as $n\rightarrow\infty$. Since $(f,F)\rightarrow E_{F}(f)=\int fdF$ is continuous, and $\hat{\Lambda}_{\beta,F}$ is differentiable with respect to $\beta$ and Hadamard differentiable with respect to $F$, we can write

[TABLE]

This proves (2) and completes the proof of Theorem 2.

S3 Theorem 3 with Proof

Theorem 3: Suppose that assumptions (A1-A4) hold, and that $\hat{\beta}_{n}\stackrel{{\scriptstyle P}}{{\rightarrow}}\beta_{0}$ and $F_{n}\stackrel{{\scriptstyle P}}{{\rightarrow}}F_{0}$ as $n\rightarrow\infty$. Then

[TABLE]

and

[TABLE]

Proof: By Lemma 1, $P(T,\delta|\beta_{0},F_{0})>c_{0}$ for some constant $c_{0}>0$. So, by the differentiability of $P(T,\delta|\beta,F)$ with respect to $\beta$ and $F$, we have

[TABLE]

In Lemma 1, we showed that the class of functions $\big{\{}\phi_{s}(T,\delta|\beta,F):\beta\in\Theta,F\in\zeta_{\rho}\big{\}}$ is uniformly bounded. Similarly, the class of functions $\big{\{}B(T,\delta|\beta,F):\beta\in\Theta,F\in\zeta_{\rho}\big{\}}$ is uniformly bounded. It follows that there is a $P_{0}$-square integrable function such that

[TABLE]

where $M^{\prime}$ is a $P_{0}$-square integrable function, and the bounds hold for all $\beta,\beta^{\prime}\in\Theta$ and all $F,F^{\prime}\in\zeta_{\rho}$.

We start with (S3.1). For each $n$, the equality

[TABLE]

holds and we can express the above equation as

[TABLE]

By the dominated convergence theorem together with (S3.3), as $n\rightarrow\infty$ the right-hand side of (S3.6) becomes

[TABLE]

So from (S3.6) and (S3.7), we can write

[TABLE]

This proves (S3.1). We now prove (S3.2) by a similar argument. For each $n$, the following equation holds

[TABLE]

We can express the above equation as

[TABLE]

By the dominated convergence theorem together with (S3.5) and Lemma 2, as $n\rightarrow\infty$ the left-hand side of (S3.8) becomes

[TABLE]

where we used $F_{n}-F_{0}=o_{p}(1)$ from assumption (A3). We also used $\hat{\beta}_{n}-\beta_{0}=o_{p}(1)$ and $\hat{\Lambda}_{\hat{\beta}_{n},F_{n}}-\hat{\Lambda}_{{\beta}_{0},F_{0}}=o_{p}(1)$ from Theorem 2.

By the dominated convergence theorem together with (S3.4), as $n\rightarrow\infty$ the right-hand side of (S3.8) can be written as

[TABLE]

So by combining (S3.9) and (S3.10), the equality (S3.8) is equivalent to

[TABLE]

This proves (S3.2), and hence Theorem 3.

S4 Theorem 4 with Proof

Theorem 4: If assumptions $(\textbf{A1-A4})$ are satisfied, then a consistent solution $\hat{\beta}_{n}$ of the estimating equation

[TABLE]

is an asymptotically linear estimator for $\beta_{0}$ (Hirose, 2011a) with the efficient influence function $(I_{s}^{*})^{-1}\phi_{s}(T,\delta|\beta_{0},F_{0})$ , so that

[TABLE]

where $N\{0,(I_{s}^{*})^{-1}\}$ is a normal distribution with mean zero and covariance matrix $(I_{s}^{*})^{-1}$. Hence the estimator $\hat{\beta}_{n}$ is efficient.

In addition, since $\phi_{l}(V|b)$ is the score function of logistic regression (a parametric model), a consistent solution $\hat{b}_{n}$ of the estimating equation

[TABLE]

is an asymptotically linear estimator for $b_{0}$ with the influence function $(I_{l})^{-1}\phi_{l}(V|b_{0})$ , so that

[TABLE]

where $I_{l}=E[\phi_{l}\phi^{\prime}_{l}]$, and $N\{0,(I_{l})^{-1}\}$ is a normal distribution with mean zero and covariance matrix $(I_{l})^{-1}$.
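The identity $I_{l}=E[\phi_{l}\phi^{\prime}_{l}]$ can be checked empirically. At the true parameter, the outer product of the logistic scores and the negative expected Hessian estimate the same matrix. The sketch below uses hypothetical simulated data and, for illustration only, treats the incidence indicator as observed.

```python
import numpy as np

def logistic_scores(X, y, b):
    """Per-subject logistic score contributions phi_l(V_i | b) = X_i (y_i - p_i)."""
    p = 1.0 / (1.0 + np.exp(-X @ b))
    return X * (y - p)[:, None], p

rng = np.random.default_rng(2)
n = 200_000
b0 = np.array([0.5, -0.8, 0.3])                        # hypothetical true coefficients
X = np.column_stack([np.ones(n), rng.binomial(1, 0.5, n), rng.normal(size=n)])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ b0)))

scores, prob = logistic_scores(X, y, b0)
opg = scores.T @ scores / n                                  # estimates E[phi_l phi_l']
neg_hessian = X.T @ (X * (prob * (1 - prob))[:, None]) / n   # estimates -E[d phi_l / d b']
cov_b_hat = np.linalg.inv(opg) / n                           # approx. Cov(b_hat) = (I_l)^{-1} / n
print(scores.mean(axis=0))
print(np.round(opg, 3))
print(np.round(neg_hessian, 3))
```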

Proof: Since the class of functions $\big{\{}\phi_{s}(T,\delta|\beta,F):\beta\in\Theta,F\in\zeta_{\rho}\big{\}}$ is uniformly bounded Donsker (Lemma 1), by Lemma 19.24 (Van Der Vaart, 2000) we can write

[TABLE]

From (S3.1), it follows that

[TABLE]

From (S3.2), it follows that

[TABLE]

Since $B(T,\delta|\beta_{0},F_{0})$ lies in the nuisance tangent space, and $\phi_{s}(T,\delta|\beta_{0},F_{0})$ is the efficient score function, which is orthogonal to the nuisance tangent space, we have

[TABLE]

Now using (S4.3), (S4.4) and (S4.5), the right hand side of (S4.2) can be expressed as

[TABLE]

Since $\hat{\beta}_{n}$ solves the estimating equation, $\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\phi_{s}(T_{i},\delta_{i}|\hat{\beta}_{n},F_{n})=0$. Using (S4.6), equation (S4.2) can then be written as

[TABLE]

By the central limit theorem (CLT), $\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\phi_{s}(T_{i},\delta_{i}|\beta_{0},F_{0})+o_{p}(1)=O_{p}(1)$. Since $I_{s}^{*}$ is invertible, $\big{(}I_{s}^{*}+o_{p}(1)\big{)}^{-1}=O_{p}(1)$.

So from (S4.7) we can write $\sqrt{n}(\hat{\beta}_{n}-\beta_{0})=\big{(}I_{s}^{*}+o_{p}(1)\big{)}^{-1}O_{p}(1)=O_{p}(1)$ .

Finally we can express (S4.2) as

[TABLE]

It follows that the large sample distribution of the estimator $\hat{\beta}_{n}$ can be expressed as

[TABLE]

where $I_{s}^{*}=E[\phi^{*}_{\beta}\phi^{*^{\prime}}_{\beta}]$ is the efficient information ($\phi^{*}_{\beta}$ is the efficient score function defined in Theorem 1).
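Theorem 4 suggests a plug-in standard error in practice. One natural choice replaces $I_{s}^{*}$ by the empirical second moment of the efficient score at the estimates. This is sketched here as an assumption, not as the exact estimator used in Section 4:

```latex
\hat{I}_{s}^{*}=\frac{1}{n}\sum_{i=1}^{n}\phi_{s}(T_{i},\delta_{i}|\hat{\beta}_{n},F_{n})\,\phi_{s}(T_{i},\delta_{i}|\hat{\beta}_{n},F_{n})^{\prime},
\qquad
\widehat{\mathrm{SE}}(\hat{\beta}_{n,j})=\sqrt{\big[(\hat{I}_{s}^{*})^{-1}\big]_{jj}/n},
\qquad
\hat{\beta}_{n,j}\pm 1.96\,\widehat{\mathrm{SE}}(\hat{\beta}_{n,j}).
```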

S5 Efficient Score Function for Cox PH Cure Model using Projection Theory

To get the efficient score function using the projection theory, we assume the parameters $(\beta,\Lambda)$ are evaluated at the true values $\beta_{0}$ , $\Lambda_{0}$ and omit subscript “0” for brevity.

The log-likelihood function of the survival part for one observation can be written as

[TABLE]

Score Function for $\beta$

[TABLE]

Score Operator for $\Lambda$

Let $g:[0,\tau]\rightarrow\mathbb{R}$ be a bounded measurable function. We define $g$ on $[0,\tau]$ because $\Lambda$ is also restricted to this interval. The path is defined as

[TABLE]

The corresponding path for the baseline hazard function is

[TABLE]

The derivative of the log-likelihood function with respect to $s$ can be expressed as

[TABLE]
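The score operator can be checked against a finite-difference derivative along the path $d\Lambda_{s}=(1+sg)d\Lambda$. The displayed log-likelihood is not reproduced in this excerpt, so the Python sketch below assumes the standard mixture cure form for one observation, $\delta\{\log p+\log d\Lambda(T)+\beta^{\prime}Z-e^{\beta^{\prime}Z}\Lambda(T)\}+(1-\delta)\log\{1-p+pS_{u}(T)\}$ with $S_{u}(T)=\exp(-e^{\beta^{\prime}Z}\Lambda(T))$. It also uses a discrete $\Lambda$ whose jumps sit on a hypothetical grid. Under that assumption, the derivative at $s=0$ is $\delta g(T)-w\,e^{\beta^{\prime}Z}\int_{0}^{T}g\,d\Lambda$, where $w=1$ if $\delta=1$ and $w=pS_{u}(T)/(1-p+pS_{u}(T))$ otherwise. This weight has the same form as $\gamma(V)$ in the proof of Lemma 1.

```python
import numpy as np

def loglik(jumps, j, delta, p, risk):
    """Assumed mixture-cure log-likelihood of one observation with T at grid point j."""
    cum_haz = np.sum(jumps[: j + 1])                  # Lambda(T)
    if delta == 1:
        return np.log(p) + np.log(jumps[j]) + np.log(risk) - risk * cum_haz
    return np.log(1.0 - p + p * np.exp(-risk * cum_haz))

def score_operator(jumps, g, j, delta, p, risk):
    """Closed-form d/ds loglik at s = 0 along dLambda_s = (1 + s g) dLambda."""
    cum_haz = np.sum(jumps[: j + 1])
    surv_u = np.exp(-risk * cum_haz)                  # S_u(T) of the uncured
    w = 1.0 if delta == 1 else p * surv_u / (1.0 - p + p * surv_u)
    int_g = np.sum(g[: j + 1] * jumps[: j + 1])       # int_0^T g dLambda
    return delta * g[j] - w * risk * int_g

rng = np.random.default_rng(3)
K = 8
jumps = rng.uniform(0.05, 0.3, K)     # hypothetical jumps of a discrete Lambda
g = rng.uniform(-1.0, 1.0, K)         # bounded direction g
p, risk = 0.7, np.exp(0.4)            # hypothetical p and e^{beta'Z}
h = 1e-6

max_err = 0.0
for delta in (0, 1):
    for j in range(K):
        fd = (loglik(jumps * (1 + h * g), j, delta, p, risk)
              - loglik(jumps * (1 - h * g), j, delta, p, risk)) / (2 * h)
        max_err = max(max_err, abs(fd - score_operator(jumps, g, j, delta, p, risk)))
print(max_err)
```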

Information Operator $B^{*}_{\Lambda}B_{\Lambda}$ and its Inverse $\big{(}B^{*}_{\Lambda}B_{\Lambda}\big{)}^{-1}$

Let us start with the information operator $B^{*}_{\Lambda}B_{\Lambda}$ and take two arbitrary functions $f$ and $g$ . By definition of the adjoint, we can write

[TABLE]

For bounded $f$ and $g$, the path defined by $d\Lambda_{r,s}=(1+rf+sg+rsfg)d\Lambda$ is positive for small $|r|$ and $|s|$, and it factorizes as $d\Lambda_{r,s}=(1+rf)(1+sg)d\Lambda$. The corresponding path for the baseline hazard function is

[TABLE]

Now we can write

[TABLE]

and

[TABLE]

Using (S5.2) and (S5.3) we can write

[TABLE]

Manipulating the integral involving the function $\xi$, we deduce

[TABLE]

Indeed, if $\xi>T$, the integrand contributes nothing to the integral. So the last term in equation (S5.4) can be expressed as

[TABLE]

Using Fubini’s theorem, equation (S5.5) can be written as

[TABLE]

From equation (S5.1) we can write

[TABLE]

So, the information operator is

[TABLE]

It follows that the inverse of the information operator is

[TABLE]

The Action of the Adjoint Score Operator $B^{*}_{\Lambda}$ on the Score Function $\phi_{\beta}$

Consider the differentiable paths $(r,s)\mapsto P(T,\delta|\beta+ru,\Lambda_{s})$ with $d\Lambda_{s}=(1+sg)d\Lambda$. Now we can write

[TABLE]

and

[TABLE]

Using equations (S5.7) and (S5.8), we can write

[TABLE]

Now by manipulating the integral involving the function $\xi$ , the equation (S5.9) can be expressed as

[TABLE]

Using Fubini's theorem, we can conclude that

[TABLE]

We know that

[TABLE]

So we can write

[TABLE]

Efficient Score Function $\phi^{*}_{\beta}$ :

Finally the efficient score function can be expressed as

[TABLE]

where $M_{0}(T)$ and $M_{1}(T)$ were defined in the proof of Theorem 1.
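In its standard form, the projection behind $\phi^{*}_{\beta}$ is $\phi_{\beta}-B_{\Lambda}(B^{*}_{\Lambda}B_{\Lambda})^{-1}B^{*}_{\Lambda}\phi_{\beta}$, and it has a simple finite-dimensional analogue. When the nuisance tangent space is spanned by finitely many scores, the efficient score is the residual of the least-squares projection of the $\beta$-score on those scores. Its second moment is then the Schur complement $I_{\beta\beta}-I_{\beta\eta}I_{\eta\eta}^{-1}I_{\eta\beta}$. The Python sketch below checks this with hypothetical simulated score vectors. It illustrates the projection idea only, not the infinite-dimensional computation in this section.

```python
import numpy as np

def efficient_score(score_beta, score_nuis):
    """Residual of the beta-scores after least-squares projection on the nuisance scores."""
    coef, *_ = np.linalg.lstsq(score_nuis, score_beta, rcond=None)
    return score_beta - score_nuis @ coef

rng = np.random.default_rng(4)
n, p_dim, q_dim = 5_000, 2, 4
# Hypothetical per-subject score contributions (rows = subjects).
score_nuis = rng.normal(size=(n, q_dim))
score_beta = score_nuis @ rng.normal(size=(q_dim, p_dim)) + rng.normal(size=(n, p_dim))

eff = efficient_score(score_beta, score_nuis)

# Empirical second-moment (information) blocks.
I_bb = score_beta.T @ score_beta / n
I_be = score_beta.T @ score_nuis / n
I_ee = score_nuis.T @ score_nuis / n
schur = I_bb - I_be @ np.linalg.solve(I_ee, I_be.T)   # efficient information
eff_info = eff.T @ eff / n                              # second moment of efficient score
print(np.round(eff_info, 4))
print(np.round(schur, 4))
```

The efficient information never exceeds the full information $I_{\beta\beta}$, reflecting the price of not knowing the nuisance parameter.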

Bibliography

Amico, M. and Van Keilegom, I. (2018). Cure models in survival analysis. Annual Review of Statistics and Its Application 5, pp. 311-342.
Cai, C., Zou, Y., Peng, Y. and Zhang, J. (2012). smcure: An R-package for estimating semiparametric mixture cure models. Computer Methods and Programs in Biomedicine 108(3), pp. 1255-1260.
Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), pp. 1-38.
Efron, B. and Tibshirani, R. J. (1994). An introduction to the bootstrap. CRC Press.
Fang, H. B., Li, G. and Sun, J. (2005). Maximum likelihood estimation in a semiparametric logistic/proportional-hazards mixture model. Scandinavian Journal of Statistics 32(1), pp. 59-75.
Farewell, V. T. (1982). The use of mixture models for the analysis of survival data with long-term survivors. Biometrics 38, pp. 1041-1046.
Fleming, T. R. and Harrington, D. P. (2011). Counting processes and survival analysis. John Wiley and Sons 169.
Hirose, Y. (2011a). Efficiency of profile likelihood in semi-parametric models. Annals of the Institute of Statistical Mathematics 63(6), pp. 1247-1275.
