On the assumption of independent right censoring

Morten Overgaard; Stefan Nygaard Hansen

arXiv:1905.02508 · math.ST · December 11, 2024


TL;DR

This paper examines various assumptions on right-censoring mechanisms in survival analysis, distinguishing between minimal identifiability and stronger independence assumptions, and characterizes their implications for estimator consistency.

Contribution

It provides a comprehensive classification of eight assumptions on right censoring, clarifying their relationships and implications for survival analysis estimators.

Findings

1. Eight assumptions categorized into two groups.
2. Characterization of pointwise and full independence.
3. Examples illustrating assumption differences.

Abstract

Various assumptions on a right-censoring mechanism to ensure consistency of the Kaplan–Meier and Aalen–Johansen estimators in a competing risks setting are studied. Specifically, eight different assumptions are seen to fall in two categories: a weaker identifiability assumption, which is the weakest possible assumption in a precise sense, and a stronger representativity assumption which ensures the existence of an independent censoring time. When a given censoring time is considered, similar assumptions can be made on the censoring time. This allows for a characterization of so-called pointwise independence as well as full independence of censoring time and event time and type. Examples illustrate how the various assumptions differ.

Equations

$$\mathbf{P}(s,t)=\begin{pmatrix}S(t\mid s)&F_{1}(t\mid s)&\cdots&F_{d}(t\mid s)\\0&1&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&1\end{pmatrix}$$

$$\mathbf{H}(t)=\begin{pmatrix}-H(t)&H_{1}(t)&\cdots&H_{d}(t)\\0&0&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&0\end{pmatrix},\qquad\tilde{\mathbf{H}}(t)=\begin{pmatrix}-\tilde{H}(t)&\tilde{H}_{1}(t)&\cdots&\tilde{H}_{d}(t)\\0&0&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&0\end{pmatrix}$$

$$a_{j}(t)+B(t)=1$$

$$\tilde{N}_{j}(t)-\int_{0}^{t}\tilde{Y}(s)H_{j}(\mathrm{d}s)$$

$$a_{j}(t)=\operatorname{P}(\tilde{T}\geq t\mid T\geq t)$$

$$\begin{aligned}\tilde{N}_{j}(t)-\int_{0}^{t}\tilde{Y}(s)H_{j}(\mathrm{d}s)&=\int_{(0,t]\cap\mathcal{J}}\big(\tilde{N}_{j}(\mathrm{d}s)-\tilde{Y}(s)H_{j}(\mathrm{d}s)\big)\\&=\int_{(0,t]\cap\mathcal{J}}\big(\tilde{N}_{j}(\mathrm{d}s)-\tilde{Y}(s)\tilde{H}_{j}(\mathrm{d}s)\big)=\tilde{N}_{j}(t)-\int_{0}^{t}\tilde{Y}(s)\tilde{H}_{j}(\mathrm{d}s)\end{aligned}$$

$$\tilde{F}_{j}(t)=\operatorname{E}(\tilde{N}_{j}(t))=\int_{0}^{t}\operatorname{E}(\tilde{Y}(s))H_{j}(\mathrm{d}s)=\int_{0}^{t}\tilde{S}(s-)H_{j}(\mathrm{d}s)$$

$$H_{j}(t)=\int_{0}^{t}\frac{1}{\tilde{S}(s-)}\tilde{F}_{j}(\mathrm{d}s)=\tilde{H}_{j}(t)$$

$$\tilde{H}_{j}(t)=\int_{0}^{t}\frac{a_{j}(s)}{\operatorname{P}(\tilde{T}\geq s\mid T\geq s)}H_{j}(\mathrm{d}s)$$

$$N_{j}(t)-\int_{0}^{t}Y(s)H_{j}(\mathrm{d}s)$$

$$\operatorname{P}(T\leq t,D=j\mid\tilde{T}>s)=\operatorname{P}(T\leq t,D=j\mid T>s)$$

$$\operatorname{P}(T\leq t,D=j\mid\tilde{T}=s,\tilde{D}=0)=\operatorname{P}(T\leq t,D=j\mid T>s)$$

$$\operatorname{P}(T\leq t,D=j\mid\tilde{T}>s)=\int_{s}^{t}\operatorname{P}(T\geq u\mid\tilde{T}>s)H_{j}(\mathrm{d}u)$$

$$\operatorname{E}\big(\mathbf{1}(T\in(s,t],D=j,\tilde{T}>u)\big)=\operatorname{E}\Big(\int_{s}^{t}\mathbf{1}(T\geq v,\tilde{T}>u)H_{j}(\mathrm{d}v)\Big)$$

$$\begin{aligned}\operatorname{P}(T\leq t,D=j,C>s)&=\prodi_{0}^{s}(1-\check{H}_{0}(\mathrm{d}u))\int_{0}^{t}\prodi_{0}^{u-}(1-\tilde{H}(\mathrm{d}v))\tilde{H}_{j}(\mathrm{d}u)\\&=\operatorname{P}(C>s)\operatorname{P}(T\leq t,D=j)\end{aligned}$$

$$\begin{aligned}\operatorname{P}(T\in(s,t],D=j,C>s)&=\operatorname{P}(T\leq t,D=j\mid T>s)\operatorname{P}(\tilde{T}>s)\\&=\int_{s}^{t}\prodi_{s}^{u-}(1-H(\mathrm{d}v))H_{j}(\mathrm{d}u)\prodi_{0}^{s}\big(1-(\tilde{H}+\tilde{H}_{0})(\mathrm{d}u)\big)\\&=\int_{s}^{t}\prodi_{0}^{u-}(1-H(\mathrm{d}v))H_{j}(\mathrm{d}u)\prodi_{0}^{s}(1-\check{H}_{0}(\mathrm{d}u))\\&=\operatorname{P}(T\in(s,t],D=j)\operatorname{P}(C>s)\end{aligned}$$

$$\check{H}_{0}(t)=\int_{0}^{t}\frac{1}{1-\Delta\tilde{H}(s)}\tilde{H}_{0}(\mathrm{d}s)$$

$$\tilde{S}(t)=\prodi_{0}^{t}(1-\tilde{H}(\mathrm{d}s))\prodi_{0}^{t}(1-\check{H}_{0}(\mathrm{d}s))$$

$$\check{H}_{0}(t)=\int_{0}^{t}\frac{\operatorname{P}(\tilde{T}=s,\tilde{D}=0\mid C=s)}{\check{S}(s)/K(s-)}H_{0}(\mathrm{d}s)$$

$$\tilde{F}_{j}(t)=\int_{0}^{t}K(s-)F_{j}(\mathrm{d}s)$$

$$\tilde{F}_{0}(t)=\int_{0}^{t}S(s)G(\mathrm{d}s)$$

$$\operatorname{P}(C\leq t\mid\tilde{T}=s,\tilde{D}=j)=\operatorname{P}(C\leq t\mid C\geq s)$$

$T_{2}(t,c)$, $C_{2}(t,c)$

$$\beta_{1,j+1}(s,t)=\int_{s}^{t}\beta_{1,1}(s,u-)\,\mathrm{d}H_{j}(u)$$

$$\operatorname{P}(\tilde{T}<t\mid T\geq t)=B(t)+\int_{0}^{t-}\frac{\tilde{S}(s-)}{S(s)}(\tilde{H}-H)(\mathrm{d}s)$$

$$\operatorname{P}(T\leq t\mid C\geq t)=\sum_{j=1}^{d}\int_{0}^{t}\frac{1}{K(s-)}\tilde{F}_{j}(\mathrm{d}s)+\int_{0}^{t-}\frac{\check{S}(s)}{K(s)}(\check{H}_{0}-H_{0})(\mathrm{d}s)$$


Full text


Morten Overgaard and Stefan Nygaard Hansen

Department of Public Health, Aarhus University, Bartholins Allé 2 - Building 1260, DK-8000 Aarhus C, Aarhus, Denmark

May 7, 2019


Keywords: Censoring; competing risks; consistency; identifiability; product integral; representativity.

1 Introduction

When dealing with right censoring in survival analysis, assumptions on the censoring mechanism are inevitably needed to bridge the gap between the observable world and the underlying world of interest. Many seemingly different assumptions have been proposed in the literature. The papers of Williams & Lagakos, (1977), Kalbfleisch & MacKay, (1979), and Lagakos, (1979) clarified connections between some of these assumptions. Since then, martingale theory has become a widely used tool in survival analysis, and assumptions on the censoring mechanism are made by means of martingale assumptions in, for instance, Aalen & Johansen, (1978), Gill, (1980), and Andersen et al., (1993). A clear overview and comparison of the various assumptions does, however, seem to be lacking.

The purpose of this paper is to provide a clear overview and a comparison of various assumptions on the censoring mechanism made in order to ensure consistency of estimators such as the Kaplan–Meier and Aalen–Johansen estimators. This is done in a competing risks setting and without assuming absolute continuity of the involved random variables. Along the way, we obtain assumptions that are minimal in a precise sense for ensuring this consistency. We also make clear that important differences exist between considering a given, underlying censoring time producing right censoring and not considering such a censoring time.

Additionally, the use of product integrals and the techniques used in the many proofs may themselves be of interest to researchers in theoretical survival analysis. In particular, the appendix provides a wealth of technical results that may be useful in other settings as well.

The paper is structured as follows. In Section 2, right censoring in a general form is studied and minimal conditions to ensure consistency of the Kaplan–Meier and Aalen–Johansen estimators are obtained. Various assumptions from the literature are discussed, and it is shown that they correspond to only two nested properties: an identifiability assumption and a representativity assumption, the latter being the stronger. Section 3 concerns the setting where an explicit censoring time is given. We discuss assumptions on the censoring mechanism and show that independence of the event and censoring times is equivalent to representativity assumptions on both the event and censoring time. In Section 4 we treat two examples in order to show that the representativity assumption is strictly stronger than the identifiability assumption and to illustrate the assumptions in a practical setting. Finally, in Section 5 we discuss some of the perspectives of the paper.

2 A censored event time

Consider an event time $T>0$ and event type $D\in\{1,\ldots,d\}$ that are subject to right censoring, meaning that we only observe some $\tilde{T}>0$ with $\tilde{T}\leq T$ and an indicator $\tilde{D}=D\mathbf{1}(\tilde{T}=T)$ with values in $\{0,\ldots,d\}$, where 0 indicates censoring. These are all considered proper random variables, that is, $\operatorname{P}(\tilde{T}<\infty)=\operatorname{P}(T<\infty)=1$. We will refer to $\tilde{T}$ and $\tilde{D}$ as the observed exit time and exit type, respectively, because the risk set is exited at time $\tilde{T}$ and $\tilde{D}$ states how. This setting does not involve an explicit, underlying censoring time and may be useful in practical settings where such a censoring time is difficult to define. A setting with a given censoring time is dealt with in the next section.

For the pair $(T,D)$ of interest we define the survival function $S(t)=\operatorname{P}(T>t)$, the cause-specific cumulative incidence functions $F_{j}(t)=\operatorname{P}(T\leq t,D=j)$ for $j=1,\ldots,d$ and the cause-specific cumulative hazard functions $H_{j}(t)=\int_{0}^{t}S(s-)^{-1}F_{j}(\hskip 1.0pt\mathrm{d}s)$ for $j=1,\ldots,d$. We define the corresponding functions for the observed pair $(\tilde{T},\tilde{D})$, that is, $\tilde{S}(t)=\operatorname{P}(\tilde{T}>t)$, $\tilde{F}_{j}(t)=\operatorname{P}(\tilde{T}\leq t,\tilde{D}=j)$ and $\tilde{H}_{j}(t)=\int_{0}^{t}\tilde{S}(s-)^{-1}\tilde{F}_{j}(\hskip 1.0pt\mathrm{d}s)$ for $j=0,\ldots,d$. Both $H_{j}$ and $\tilde{H}_{j}$ are well-defined functions from $[0,\infty)$ into $[0,\infty]$ for $j=1,\dots,d$. Here and in the following, division by 0 can be interpreted as 0 or any arbitrary number since it only occurs in integrals on a null set of the integrator. Frequently, a restriction to the interval $\mathcal{J}=\{t\in[0,\infty)\mathbin{\mid}\tilde{S}(t-)>0\}$ is relevant since we will almost surely never observe an exit time beyond $\mathcal{J}$. Let $\tau$ denote $\sup\{t>0:\tilde{S}(t)>0\}$ and note that either $\mathcal{J}=[0,\tau)$, when $\tilde{S}(\tau-)=0$, or $\mathcal{J}=[0,\tau]$, when $\tilde{S}(\tau-)>0$.
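As a small illustration of these definitions, the sketch below evaluates $\tilde{S}$, $\tilde{F}_{j}$ and $\tilde{H}_{j}$ when $(\tilde{T},\tilde{D})$ has finitely many atoms, in which case $\int_{0}^{t}\tilde{S}(s-)^{-1}\tilde{F}_{j}(\mathrm{d}s)$ reduces to a sum over atoms $s\leq t$. This is a hypothetical Python example, not code from the paper: the function name `observed_functions` and the example distribution are assumptions chosen for illustration, and exact rational arithmetic is used so the values can be checked by hand.

```python
from fractions import Fraction as Fr


def observed_functions(obs_pmf, t):
    """Return S~(t), {j: F~_j(t)} and {j: H~_j(t)} for a discrete law of (T~, D~).

    obs_pmf maps (exit time, exit type) to a probability; type 0 means censored.
    For a discrete law, H~_j(t) is the sum over atoms s <= t of
    P(T~ = s, D~ = j) / S~(s-), where S~(s-) = P(T~ >= s).
    """
    times = sorted({s for s, _ in obs_pmf})
    types = sorted({j for _, j in obs_pmf})

    def surv(u):  # S~(u) = P(T~ > u)
        return sum(p for (s, _), p in obs_pmf.items() if s > u)

    def surv_left(u):  # S~(u-) = P(T~ >= u)
        return sum(p for (s, _), p in obs_pmf.items() if s >= u)

    cif = {j: sum(p for (s, k), p in obs_pmf.items() if k == j and s <= t)
           for j in types}
    # Atoms with S~(s-) = 0 carry no mass, so they are skipped (0/0 read as 0).
    haz = {j: sum(obs_pmf.get((s, j), 0) / surv_left(s)
                  for s in times if s <= t and surv_left(s) > 0)
           for j in types}
    return surv(t), cif, haz


# Hypothetical law of (T~, D~) with two event types and censoring.
obs = {(1, 1): Fr(1, 4), (1, 0): Fr(1, 4), (2, 2): Fr(1, 4), (3, 1): Fr(1, 4)}
S, F, H = observed_functions(obs, t=2)
print(S, F, H)
```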

In this section we shall study the assumptions under which we can identify $S$ and $F_{j}$ by the Kaplan–Meier and Aalen–Johansen estimators defined in Appendix 2. To this end, let $\mathbf{P}$ denote the $(d+1)\times(d+1)$ matrix of transition probabilities

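$$\mathbf{P}(s,t)=\begin{pmatrix}S(t\mid s)&F_{1}(t\mid s)&\cdots&F_{d}(t\mid s)\\0&1&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&1\end{pmatrix},$$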

where $S(t\mathbin{\mid}s)=\operatorname{P}(T>t\mathbin{\mid}T>s)=S(t)/S(s)$ and $F_{j}(t\mathbin{\mid}s)=\operatorname{P}(T\leq t,D=j\mathbin{\mid}T>s)=(F_{j}(t)-F_{j}(s))/S(s)$ for $j=1,\ldots,d$ and $t\geq s$ . With a slight abuse of notation, we let $\mathbf{P}(t)=\mathbf{P}(0,t)$ which is the matrix of interest. If $H(t)=\sum_{j=1}^{d}H_{j}(t)$ denotes the all-cause cumulative hazard function and $\tilde{H}(t)=\sum_{j=1}^{d}\tilde{H}_{j}(t)$ denotes the observed counterpart, then we define the two $(d+1)\times(d+1)$ matrices

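$$\mathbf{H}(t)=\begin{pmatrix}-H(t)&H_{1}(t)&\cdots&H_{d}(t)\\0&0&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&0\end{pmatrix},\qquad\tilde{\mathbf{H}}(t)=\begin{pmatrix}-\tilde{H}(t)&\tilde{H}_{1}(t)&\cdots&\tilde{H}_{d}(t)\\0&0&\cdots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\cdots&0\end{pmatrix},$$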

and again, with slight abuse of notation, we let $\mathbf{H}(s,t)=\mathbf{H}(t)-\mathbf{H}(s)$ and $\tilde{\mathbf{H}}(s,t)=\tilde{\mathbf{H}}(t)-\tilde{\mathbf{H}}(s)$ for $t\geq s$ .

According to (14) of the appendix, the Aalen–Johansen estimator $\hat{\mathbf{P}}_{n}(t)$ , defined in (13) of the appendix, is consistent for $\prodi_{0}^{t}(\mathbf{I}+\tilde{\mathbf{H}}(\hskip 1.0pt\mathrm{d}s))$ for any $t\in\mathcal{J}$ in a setting with independent and identically distributed observations. We now have the following result.

Proposition 1.

In a setting with $n$ independent and identically distributed observations, the Aalen–Johansen estimator $\hat{\mathbf{P}}_{n}(t)$ consistently estimates $\mathbf{P}(t)$ for all $t\in\mathcal{J}$ if and only if $\tilde{\mathbf{H}}(t)=\mathbf{H}(t)$ for all $t\in\mathcal{J}$ . In other words, the Aalen–Johansen estimator of $F_{j}(t)$ is consistent for all $t\in\mathcal{J}$ for $j=1,\dots,d$ if and only if $\tilde{H}_{j}(t)=H_{j}(t)$ for all $t\in\mathcal{J}$ for $j=1,\dots,d$ .

Proof.

By uniqueness of the product integral, $\mathbf{H}(t)=\tilde{\mathbf{H}}(t)$ for all $t\in\mathcal{J}$ if and only if $\prodi_{0}^{t}(\mathbf{I}+\mathbf{H}(\hskip 1.0pt\mathrm{d}s))=\prodi_{0}^{t}(\mathbf{I}+\tilde{\mathbf{H}}(\hskip 1.0pt\mathrm{d}s))$ for all $t\in\mathcal{J}$. This follows from Theorem 3 of Gill & Johansen, (1990), since both $\mathbf{H}$ and $\tilde{\mathbf{H}}$ are of bounded variation on $[0,t]$ for any $t\in\mathcal{J}$. Now, by definition of $H_{j}$, $\mathbf{P}$ satisfies the requirements of Lemma 8 of the appendix, from which it follows that $\mathbf{P}(t)=\prodi_{0}^{t}(\mathbf{I}+\mathbf{H}(\hskip 1.0pt\mathrm{d}s))$. Since $\hat{\mathbf{P}}_{n}(t)$ is consistent for $\prodi_{0}^{t}(\mathbf{I}+\tilde{\mathbf{H}}(\hskip 1.0pt\mathrm{d}s))$, this establishes the equivalence. ∎
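The identity $\mathbf{P}(t)=\prodi_{0}^{t}(\mathbf{I}+\mathbf{H}(\mathrm{d}s))$ used in the proof can be checked numerically in the simplest case. For a purely discrete $(T,D)$, the product integral reduces to the finite matrix product of $\mathbf{I}+\Delta\mathbf{H}(s)$ over the atoms $s\leq t$. The following Python sketch assumes such a discrete law; the distribution and the helper names (`identity`, `matmul`, `product_integral`) are hypothetical, and the first row of the product is compared with $(S(t),F_{1}(t),\dots,F_{d}(t))$.

```python
from fractions import Fraction as Fr


def identity(n):
    return [[Fr(int(i == j)) for j in range(n)] for i in range(n)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def product_integral(event_pmf, d, t):
    """prod over atoms s <= t of (I + Delta H(s)) for a discrete law of (T, D), types 1..d.

    Only row 0 of Delta H(s) is non-zero: (-Delta H(s), Delta H_1(s), ..., Delta H_d(s)).
    """
    P = identity(d + 1)
    for s in sorted({u for u, _ in event_pmf}):
        at_risk = sum(p for (u, _), p in event_pmf.items() if u >= s)  # S(s-)
        if s > t or at_risk == 0:
            break
        step = identity(d + 1)
        for j in range(1, d + 1):
            dH_j = event_pmf.get((s, j), 0) / at_risk  # Delta H_j(s)
            step[0][j] += dH_j
            step[0][0] -= dH_j  # diagonal entry accumulates -Delta H(s)
        P = matmul(P, step)
    return P


# Hypothetical law of (T, D) with d = 2 event types.
event_pmf = {(1, 1): Fr(1, 4), (2, 2): Fr(1, 4), (3, 1): Fr(1, 2)}
P2 = product_integral(event_pmf, d=2, t=2)
# Row 0 should equal (S(2), F_1(2), F_2(2)) = (1/2, 1/4, 1/4).
print(P2[0])
```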

A similar argument reveals that the Kaplan–Meier estimator $\hat{S}_{n}(t)$ from (15) in the appendix consistently estimates $S(t)$ for all $t\in\mathcal{J}$ if and only if $\tilde{H}(t)=H(t)$ for all $t\in\mathcal{J}$ .
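For concreteness, here is a minimal Python sketch of the Aalen–Johansen estimator in its usual product-limit form, assuming that it coincides with the definition in (13) of the appendix, which is not reproduced here. The first entry of the returned row is the Kaplan–Meier estimate of $S(t)$. In line with $\tilde{Y}(s)=\mathbf{1}(\tilde{T}\geq s)$, an observation censored at $s$ still counts as at risk at $s$. The function name `aalen_johansen` and the toy data are hypothetical.

```python
from fractions import Fraction


def aalen_johansen(times, types, d, t):
    """Estimate row 0 of P(t), i.e. (S(t), F_1(t), ..., F_d(t)), from n observations.

    times: observed exit times T~_i; types: observed exit types D~_i (0 = censored).
    The first entry returned is the Kaplan-Meier estimate of S(t).
    """
    surv = Fraction(1)             # running estimate of S(s-)
    cif = [Fraction(0)] * (d + 1)  # cif[j] estimates F_j; index 0 is unused
    for s in sorted(set(times)):
        if s > t:
            break
        at_risk = sum(1 for u in times if u >= s)  # number with T~_i >= s
        counts = [sum(1 for u, k in zip(times, types) if u == s and k == j)
                  for j in range(d + 1)]
        for j in range(1, d + 1):
            cif[j] += surv * Fraction(counts[j], at_risk)  # S^(s-) * Delta H^_j(s)
        surv *= 1 - Fraction(sum(counts[1:]), at_risk)     # factor (1 - Delta H^(s))
    return [surv] + cif[1:]


# Hypothetical sample: one event of each type and one censoring tied with a type-2 event.
times = [1, 2, 2, 3]
types = [1, 0, 2, 1]
print(aalen_johansen(times, types, d=2, t=2))  # [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
```

Exact fractions keep the hand calculation transparent; for real data one would use floating point and a sorted single pass instead of the quadratic risk-set counts.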

Following Elandt-Johnson, (1976), we call the property in Proposition 1 identity of forces of mortality. An assumption of identity of forces of mortality is, for instance, used by Gail, (1975) in a competing risks setting as a weaker substitute for the assumption of independent latent event times.

Williams & Lagakos, (1977) study, in a setting without competing risks, the constant-sum assumption as a weaker alternative to the assumption of independence of event time and censoring time. Let $a_{j}$ be the function, unique up to $F_{j}$-null sets, given by $a_{j}(t)=\operatorname{P}(\tilde{T}=t,\tilde{D}=j\mathbin{\mid}T=t,D=j)$ and let $B(t)=\int_{0}^{t-}S(s)^{-1}\tilde{F}_{0}(\hskip 1.0pt\mathrm{d}s)$. In the competing risks setting, the constant-sum property can then be phrased as

$$a_{j}(t)+B(t)=1$$

for $F_{j}$-almost all $t\in\mathcal{J}$ for $j=1,\dots,d$. In the paper of Kalbfleisch & MacKay, (1979), the authors argue that this property is equivalent to identity of forces of mortality in a setting without competing risks and with a differentiable event hazard function.
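To see the constant-sum property at work, consider a hypothetical discrete example in which a censoring time $C$ is independent of $(T,D)$ and an event wins a tie with censoring. Then $a_{j}(t)=\operatorname{P}(C\geq t)$ and $B(t)=\sum_{s<t}\operatorname{P}(C=s,T>s)/S(s)=\operatorname{P}(C<t)$, so the two terms add to one. The Python sketch below, with illustrative names and distributions of our own choosing, computes both terms directly from the law of $(\tilde{T},\tilde{D})$.

```python
from fractions import Fraction as Fr


def observed_pmf(event_pmf, cens_pmf):
    """Law of (T~, D~) = (min(T, C), D * 1(T <= C)) for C independent of (T, D)."""
    obs = {}
    for (t, j), p in event_pmf.items():
        for c, q in cens_pmf.items():
            key = (t, j) if t <= c else (c, 0)  # the event wins a tie with censoring
            obs[key] = obs.get(key, 0) + p * q
    return obs


def constant_sum_terms(event_pmf, cens_pmf, t, j):
    """Return (a_j(t), B(t)) at an atom (t, j) of the law of (T, D)."""
    obs = observed_pmf(event_pmf, cens_pmf)
    # a_j(t) = P(T~ = t, D~ = j | T = t, D = j); {T~ = t, D~ = j} lies inside {T = t, D = j}
    a = obs.get((t, j), 0) / event_pmf[(t, j)]

    def surv(u):  # S(u) = P(T > u)
        return sum(p for (v, _), p in event_pmf.items() if v > u)

    # B(t) = int_0^{t-} S(s)^{-1} F~_0(ds): a sum over censoring atoms s < t
    b = sum(p / surv(s) for (s, k), p in obs.items() if k == 0 and s < t and p > 0)
    return a, b


event_pmf = {(1, 1): Fr(1, 4), (2, 2): Fr(1, 4), (3, 1): Fr(1, 2)}  # hypothetical
cens_pmf = {1: Fr(1, 3), 2: Fr(1, 3), 4: Fr(1, 3)}                 # hypothetical
for (t, j) in event_pmf:
    a, b = constant_sum_terms(event_pmf, cens_pmf, t, j)
    print(t, j, a, b, a + b)
```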

Estimators in survival analysis and in the competing risks setting have often been studied using martingale theory, for instance in Aalen & Johansen, (1978), Gill, (1980), Jacobsen, (1989), and Andersen et al., (1993). In such a setting, the following martingale property, which we will call the weak martingale property in light of stronger properties introduced later on, has been shown to ensure the desired consistency of estimators. Let $\tilde{N}_{j}(t)=\mathbf{1}(\tilde{T}\leq t,\tilde{D}=j)$ for $j=0,\dots,d$ and $\tilde{Y}(t)=\mathbf{1}(\tilde{T}\geq t)$ . The weak martingale property is that the processes given by

$$\tilde{N}_{j}(t)-\int_{0}^{t}\tilde{Y}(s)H_{j}(\mathrm{d}s),$$

for $t\geq 0$ , for $j=1,\dots,d$ are all martingales with respect to the filtration given by $\tilde{\mathcal{F}}_{t}=\sigma(\tilde{N}_{j}(s):j\in\{0,\dots,d\},s\leq t)$ , which models the observed information. This or similar assumptions are made, for instance, in Assumption 3.1.1 of Gill, (1980), in (2.9) of Jacobsen, (1989), in Definition 3.1.1 of Martinussen & Scheike, (2007), in (5.5) of Kalbfleisch & Prentice, (1980), and in Theorem 1.3.1 of Fleming & Harrington, (1991).

Recall that $a_{j}(t)=\operatorname{P}(\tilde{T}=t,\tilde{D}=j\mathbin{\mid}T=t,D=j)$ . We consider here yet another property, which we call status-independent observation. Status-independent observation is the property that

$$a_{j}(t)=\operatorname{P}(\tilde{T}\geq t\mid T\geq t)$$

for $F_{j}$-almost all $t\in\mathcal{J}$ for $j=1,\dots,d$. The name reflects that, for the two statuses of being at risk at time $t$, $T\geq t$, and of having an event of type $j$ at that time, $T=t$ with $D=j$, the probability that the status is actually observed does not depend on which of the two statuses holds.

As the following result shows, these four properties are in fact equivalent, and we will refer to them collectively as the identifiability property in light of Proposition 1.

Proposition 2.

The following properties are equivalent.

(2.1) Identity of forces of mortality: $\tilde{H}_{j}(t)=H_{j}(t)$ for $j=1,\dots,d$ and for any $t\in\mathcal{J}$.

(2.2) The weak martingale property: The processes given by $\tilde{N}_{j}(t)-\int_{0}^{t}\tilde{Y}(s)H_{j}(\hskip 1.0pt\mathrm{d}s)$, $t\geq 0$, for $j=1,\dots,d$ are all martingales with respect to the filtration $(\tilde{\mathcal{F}}_{t})$, the observed information.

(2.3) Status-independent observation: $a_{j}(t)=\operatorname{P}(\tilde{T}\geq t\mathbin{\mid}T\geq t)$ for $F_{j}$-almost all $t\in\mathcal{J}$ for $j=1,\ldots,d$.

(2.4) The constant-sum property: $a_{j}(t)+B(t)=1$ for $F_{j}$-almost all $t\in\mathcal{J}$ for $j=1,\dots,d$.

Proof.

We consider it well known that $\tilde{N}_{j}(t)-\int_{0}^{t}\tilde{Y}(s)\tilde{H}_{j}(\hskip 1.0pt\mathrm{d}s)$ , $t\geq 0$ defines a martingale with respect to $(\tilde{\mathcal{F}}_{t})$ . Under the assumption of (2.1) and since $\tilde{Y}$ is 0 and there is no increment in $\tilde{N}_{j}$ outside $\mathcal{J}$ almost surely, we have that

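$$\begin{aligned}\tilde{N}_{j}(t)-\int_{0}^{t}\tilde{Y}(s)H_{j}(\mathrm{d}s)&=\int_{(0,t]\cap\mathcal{J}}\big(\tilde{N}_{j}(\mathrm{d}s)-\tilde{Y}(s)H_{j}(\mathrm{d}s)\big)\\&=\int_{(0,t]\cap\mathcal{J}}\big(\tilde{N}_{j}(\mathrm{d}s)-\tilde{Y}(s)\tilde{H}_{j}(\mathrm{d}s)\big)=\tilde{N}_{j}(t)-\int_{0}^{t}\tilde{Y}(s)\tilde{H}_{j}(\mathrm{d}s)\end{aligned}$$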

almost surely for all $t\geq 0$, which yields (2.2). On the other hand, assume that (2.2) holds. Then, for a given $j\in\{1,\dots,d\}$ and a given $t\in\mathcal{J}$,

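$$\tilde{F}_{j}(t)=\operatorname{E}(\tilde{N}_{j}(t))=\int_{0}^{t}\operatorname{E}(\tilde{Y}(s))H_{j}(\mathrm{d}s)=\int_{0}^{t}\tilde{S}(s-)H_{j}(\mathrm{d}s).$$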

Since $\tilde{S}(s-)>0$ for $s\leq t$ , integrating $\tilde{S}(s-)^{-1}$ with respect to both sides establishes

$$H_{j}(t)=\int_{0}^{t}\frac{1}{\tilde{S}(s-)}\tilde{F}_{j}(\mathrm{d}s)=\tilde{H}_{j}(t)$$

and this yields (2.1).

Generally, $\tilde{F}_{j}(t)=\int_{0}^{t}a_{j}(s)F_{j}(\hskip 1.0pt\mathrm{d}s)$ and $\tilde{S}(s-)=\operatorname{P}(\tilde{T}\geq s\mathbin{\mid}T\geq s)S(s-)$ . For $t\in\mathcal{J}$ , this establishes

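$$\tilde{H}_{j}(t)=\int_{0}^{t}\frac{a_{j}(s)}{\operatorname{P}(\tilde{T}\geq s\mid T\geq s)}H_{j}(\mathrm{d}s)$$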

and thereby the equivalence of (2.1) and (2.3), since $H_{j}$ and $F_{j}$ have the same null sets on $\mathcal{J}$. Assume that (2.1) and (2.3) hold. By using equation (6) of the appendix, it can be seen that $B(t)=\operatorname{P}(\tilde{T}<t\mathbin{\mid}T\geq t)$ for all $t\in\mathcal{J}$ under this assumption. Since $\operatorname{P}(\tilde{T}\geq t\mathbin{\mid}T\geq t)=a_{j}(t)$ for $F_{j}$-almost all $t\in\mathcal{J}$ for $j=1,\dots,d$ under the assumption, we have established $a_{j}(t)+B(t)=1$ for $F_{j}$-almost all $t\in\mathcal{J}$ for $j=1,\dots,d$, which is (2.4). Assume instead that (2.4) holds. Equation (8) of the appendix implies that, again, $B(t)=\operatorname{P}(\tilde{T}<t\mathbin{\mid}T\geq t)$ for all $t\in\mathcal{J}$. Using the constant-sum condition once more then yields $a_{j}(t)=\operatorname{P}(\tilde{T}\geq t\mathbin{\mid}T\geq t)$ for $F_{j}$-almost all $t\in\mathcal{J}$ for $j=1,\dots,d$, which is (2.3). ∎
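A small numerical check of (2.1) may help. The sketch below computes cumulative hazards of a discrete law with one formula, applied both to $(T,D)$ and to $(\tilde{T},\tilde{D})$. Both laws of $(\tilde{T},\tilde{D})$ are hypothetical: one arises from a censoring time uniform on $\{1,2,4\}$ and independent of $(T,D)$, and one from a dependent mechanism in which half of the subjects with $T=3$ exit censored at time 1. In the first case $\tilde{H}_{j}=H_{j}$ on $\mathcal{J}$; in the second, $\tilde{H}_{2}(2)=1/2\neq 1/3=H_{2}(2)$, so by Proposition 1 the Aalen–Johansen estimator would not be consistent. The function name `cum_hazard` is an illustrative choice.

```python
from fractions import Fraction as Fr


def cum_hazard(pmf, j, t):
    """Cumulative hazard of type j at t for a discrete law {(time, type): prob}.

    Sums P(X = s, type = j) / P(X >= s) over atoms s <= t; the same formula gives
    H_j from the law of (T, D) and H~_j from the law of (T~, D~).
    """
    total = Fr(0)
    for s in sorted({u for u, _ in pmf}):
        if s > t:
            break
        at_risk = sum(p for (u, _), p in pmf.items() if u >= s)
        if at_risk > 0:
            total += pmf.get((s, j), 0) / at_risk
    return total


# Hypothetical law of (T, D).
event_pmf = {(1, 1): Fr(1, 4), (2, 2): Fr(1, 4), (3, 1): Fr(1, 2)}
# Law of (T~, D~) when C is uniform on {1, 2, 4}, independent of (T, D), and events win ties.
obs_indep = {(1, 1): Fr(1, 4), (1, 0): Fr(1, 4), (2, 2): Fr(1, 6),
             (2, 0): Fr(1, 6), (3, 1): Fr(1, 6)}
# Hypothetical dependent mechanism: half of those with T = 3 exit censored at time 1.
obs_dep = {(1, 1): Fr(1, 4), (1, 0): Fr(1, 4), (2, 2): Fr(1, 4), (3, 1): Fr(1, 4)}

print(cum_hazard(event_pmf, 2, 2), cum_hazard(obs_indep, 2, 2), cum_hazard(obs_dep, 2, 2))
# 1/3 1/3 1/2
```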

A somewhat stronger martingale property has, however, also been considered. Let $N_{j}(t)=\mathbf{1}(T\leq t,D=j)$ for $j=1,\dots,d$ and $Y(t)=\mathbf{1}(T\geq t)$ . Define also a filtration by $\mathcal{F}_{t}=\sigma(N_{j}(s):j\in\{1,\dots,d\},s\leq t)$ and an enlarged filtration by $\mathcal{G}_{t}=\mathcal{F}_{t}\bigvee\tilde{\mathcal{F}}_{t}$ . What we call the strong martingale property is that the processes given by

$$N_{j}(t)-\int_{0}^{t}Y(s)H_{j}(\mathrm{d}s),$$

for $t\geq 0$ , for $j=1,\dots,d$ are all martingales with respect to the enlarged filtration $(\mathcal{G}_{t})$ . It seems well-known that the processes are martingales with respect to $(\mathcal{F}_{t})$ . So, loosely speaking, the property states that enlarging the filtration by $(\tilde{\mathcal{F}}_{t})$ does not add any information relevant for the processes. This property has similarities to Definition III.2.1 of Andersen et al., (1993) of an independent right censoring concept, which also requires the underlying martingale processes to be martingales with respect to an enlarged filtration. Similarly, Aalen & Johansen, (1978) also require the underlying martingale processes to be martingales with respect to an enlarged filtration.

The property that

$$\operatorname{P}(T\leq t,D=j\mid\tilde{T}>s)=\operatorname{P}(T\leq t,D=j\mid T>s)$$

for all $t\geq 0$, $s\in\mathcal{J}$ and $j=1,\ldots,d$ plays a role in Theorem 3.1.1 of Gill, (1980), in condition (G) of Jacobsen, (1989), and also matches the interpretation of independent right censoring given by Andersen & Keiding, (2006), p. 466. We call this the property of non-prognostic observation since it implies that, given survival past time $s$, the extra knowledge that survival past $s$ is observed, $\tilde{T}>s$, does not influence the prognosis, that is, the probability of having events at a later point in time.

In Williams & Lagakos, (1977), survival is said to be independent of the conditions producing censoring when a property like

$$\operatorname{P}(T\leq t,D=j\mid\tilde{T}=s,\tilde{D}=0)=\operatorname{P}(T\leq t,D=j\mid T>s)$$

holds for any $t\geq 0$, for $\tilde{H}_{0}$-almost all $s\in\mathcal{J}$ and for $j=1,\dots,d$. With inspiration from Lagakos, (1979), we call this property non-prognostic censoring because, under this property, the censoring provides no prognostic information about the event time or type beyond survival up to the censoring time.
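Non-prognostic observation and non-prognostic censoring can likewise be checked in a discrete toy model. The Python sketch below, with hypothetical names and distributions, builds the joint law of $(T,D,\tilde{T},\tilde{D})$ when a censoring time $C$ is independent of $(T,D)$ and compares the conditional probabilities on the two sides of each property. It also includes a hypothetical dependent mechanism, in which half of the subjects with $T=3$ exit censored at time 1, where non-prognostic observation fails: $\operatorname{P}(T\leq 3,D=1\mid\tilde{T}>1)=1/2$ but $\operatorname{P}(T\leq 3,D=1\mid T>1)=2/3$.

```python
from fractions import Fraction as Fr


def joint_law(event_pmf, cens_pmf):
    """Atoms (T, D, T~, D~, prob) when C is independent of (T, D); events win ties."""
    return [(t, j, min(t, c), j if t <= c else 0, p * q)
            for (t, j), p in event_pmf.items() for c, q in cens_pmf.items()]


def cond_prob(law, event, given):
    """P(event | given) for predicates on (T, D, T~, D~); assumes P(given) > 0."""
    num = sum(p for *x, p in law if event(*x) and given(*x))
    den = sum(p for *x, p in law if given(*x))
    return num / den


def non_prognostic_observation(law, t, j, s):
    """Both sides of P(T <= t, D = j | T~ > s) = P(T <= t, D = j | T > s)."""
    event = lambda T, D, Tt, Dt: T <= t and D == j
    return (cond_prob(law, event, lambda T, D, Tt, Dt: Tt > s),
            cond_prob(law, event, lambda T, D, Tt, Dt: T > s))


def non_prognostic_censoring(law, t, j, s):
    """Both sides of P(T <= t, D = j | T~ = s, D~ = 0) = P(T <= t, D = j | T > s)."""
    event = lambda T, D, Tt, Dt: T <= t and D == j
    return (cond_prob(law, event, lambda T, D, Tt, Dt: Tt == s and Dt == 0),
            cond_prob(law, event, lambda T, D, Tt, Dt: T > s))


event_pmf = {(1, 1): Fr(1, 4), (2, 2): Fr(1, 4), (3, 1): Fr(1, 2)}  # hypothetical
cens_pmf = {1: Fr(1, 3), 2: Fr(1, 3), 4: Fr(1, 3)}                 # hypothetical
law = joint_law(event_pmf, cens_pmf)
# Hypothetical dependent mechanism: half of those with T = 3 exit censored at time 1.
law_dep = [(1, 1, 1, 1, Fr(1, 4)), (2, 2, 2, 2, Fr(1, 4)),
           (3, 1, 1, 0, Fr(1, 4)), (3, 1, 3, 1, Fr(1, 4))]
print(non_prognostic_observation(law_dep, t=3, j=1, s=1))
```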

The following result shows that these three properties are equivalent and, moreover, that they are equivalent to the existence of an independent censoring time. We will refer to them collectively as the representativity property because, looking at (3.2) and (3.3), this property implies that those at risk at a given time, $\tilde{T}>s$ , are representative for those being censored at this time, $\tilde{T}=s,\tilde{D}=0$ , in terms of the event risks.

Proposition 3.

The following properties are equivalent.

(3.1) The strong martingale property: The processes given by $N_{j}(t)-\int_{0}^{t}Y(s)H_{j}(\hskip 1.0pt\mathrm{d}s)$, $t\geq 0$, for $j=1,\dots,d$, are martingales with respect to the enlarged filtration $(\mathcal{G}_{t})$.

(3.2) Non-prognostic observation: $\operatorname{P}(T\leq t,D=j\mathbin{\mid}\tilde{T}>s)=\operatorname{P}(T\leq t,D=j\mathbin{\mid}T>s)$ for all $t\geq 0$, $s\in\mathcal{J}$ and $j=1,\dots,d$.

(3.3) Non-prognostic censoring: $\operatorname{P}(T\leq t,D=j\mathbin{\mid}\tilde{T}=s,\tilde{D}=0)=\operatorname{P}(T\leq t,D=j\mathbin{\mid}T>s)$ for all $t\geq 0$, $\tilde{F}_{0}$-almost all $s\in\mathcal{J}$ and $j=1,\dots,d$.

(3.4) Existence of an independent censoring time: A censoring time $C>0$ exists such that $\tilde{T}=T\wedge C$ and $C\perp\!\!\!\perp(T,D)$.

Proof.

Assume (3.1) and let $s\in\mathcal{J}$ and $t>s$ be given. Since $\{\tilde{T}>s\}\in\mathcal{G}_{s}$ , we can use the martingale property to obtain $\operatorname{E}(\mathbf{1}(T\in(s,t],D=j,\tilde{T}>s))=\operatorname{E}(\int_{s}^{t}\mathbf{1}(T\geq u,\tilde{T}>s)H_{j}(\hskip 1.0pt\mathrm{d}u))$ and divide by $\operatorname{P}(\tilde{T}>s)$ to get

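$$\operatorname{P}(T\leq t,D=j\mid\tilde{T}>s)=\int_{s}^{t}\operatorname{P}(T\geq u\mid\tilde{T}>s)H_{j}(\mathrm{d}u).\tag{2}$$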

The $(d+1)\times(d+1)$ matrix-valued function given by $\mathbf{B}(s,t)=\{\beta_{ij}(s,t)\}$ , $\beta_{1,j+1}(s,t)=\operatorname{P}(T\leq t,D=j\mathbin{\mid}\tilde{T}>s)$ for $j=1,\dots,d$ , $\beta_{1,1}(s,t)=1-\sum_{j=1}^{d}\beta_{1,j+1}(s,t)=\operatorname{P}(T>t\mathbin{\mid}\tilde{T}>s)$ , and $\beta_{i,j}(s,t)=\mathbf{1}(i=j)$ for $i=2,\dots,d+1$ and $j=1,\ldots,d+1$ , is right continuous with left limits in both variables and by (2) is seen to satisfy the conditions of Lemma 8. Thus, we conclude that $\mathbf{B}(s,t)=\prodi_{s}^{t}(\mathbf{I}+\mathbf{H}(\hskip 1.0pt\mathrm{d}u))$ , which then implies that $\mathbf{B}(s,t)=\mathbf{P}(s,t)$ since $\mathbf{P}(s,t)=\prodi_{s}^{t}(\mathbf{I}+\mathbf{H}(\hskip 1.0pt\mathrm{d}u))$ as seen earlier. In particular we have $\operatorname{P}(T\leq t,D=j\mathbin{\mid}\tilde{T}>s)=\operatorname{P}(T\leq t,D=j\mathbin{\mid}T>s)$ for all $t\in[s,\infty)\cap\{u:S(u)>0\}$ by this argument, and this extends to all $t\geq 0$ since $\{T\notin[s,\infty)\cap\{u:S(u)>0\}\}$ has probability 0 in either probability measure. We have thereby established (3.2).

Assuming (3.2), we may argue the other way to obtain, for $u\leq s\leq t$ ,

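$$\operatorname{E}\big(\mathbf{1}(T\in(s,t],D=j,\tilde{T}>u)\big)=\operatorname{E}\Big(\int_{s}^{t}\mathbf{1}(T\geq v,\tilde{T}>u)H_{j}(\mathrm{d}v)\Big),$$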

which is enough to establish the martingale property of (3.1) since $\mathcal{G}_{t}$ is generated by sets of the type $\{T>t,\tilde{T}>s\}$ for $s\leq t$ and $\{T\leq s,D=j,\tilde{T}>u\}$ for $s,u\leq t$ .

Assume again (3.2). Then we have the strong martingale property, (3.1), which implies (2.2) since integration of the $(\mathcal{G}_{t})$-predictable process $t\mapsto\mathbf{1}(\tilde{T}\geq t)$ with respect to the integrator $t\mapsto N_{j}(t)-\int_{0}^{t}Y(s)H_{j}(\hskip 1.0pt\mathrm{d}s)$ yields $t\mapsto\tilde{N}_{j}(t)-\int_{0}^{t}\tilde{Y}(s)H_{j}(\hskip 1.0pt\mathrm{d}s)$, which is then a $(\mathcal{G}_{t})$-martingale and thus also a $(\tilde{\mathcal{F}}_{t})$-martingale. In light of Proposition 2, this means that (2.1) holds. For any given $t\geq 0$ and $j\in\{1,\dots,d\}$, equation (10) of the appendix reveals that $\int_{s}^{\infty}(\operatorname{P}(T\leq t,D=j\mathbin{\mid}\tilde{T}=u,\tilde{D}=0)-\operatorname{P}(T\leq t,D=j\mathbin{\mid}T>u))\tilde{F}_{0}(\hskip 1.0pt\mathrm{d}u)=0$ for any $s\geq 0$: the integrand is 0 for $u>t$, we are assuming (3.2), and the first two integrals of (10) are always zero because, for $u\geq 0$, either $\tilde{H}_{j}(u)=H_{j}(u)$ for all $j=1,\dots,d$ or $\tilde{S}(u-)=0$. This establishes (3.3).

If we instead assume (3.3), we obtain (2.4) from equation (11) of the appendix and so (2.1) from Proposition 2. Then equation (10) of the appendix shows that (3.2) holds since, again, for $u\geq 0$ either $\tilde{H}_{j}(u)=H_{j}(u)$ for $j=1,\dots,d$ or $\tilde{S}(u-)=0$ .

Assume now that (3.2) holds, and let us show (3.4). The construction used is the one given in Appendix 3 and is based on the modification $\check{H}_{0}$ of $\tilde{H}_{0}$ defined in equation (3) below. By construction, $\tilde{T}=T\wedge C$. Furthermore, for $t\leq s$ with $t\in\mathcal{J}$, we have

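$$\begin{aligned}\operatorname{P}(T\leq t,D=j,C>s)&=\prodi_{0}^{s}(1-\check{H}_{0}(\mathrm{d}u))\int_{0}^{t}\prodi_{0}^{u-}(1-\tilde{H}(\mathrm{d}v))\tilde{H}_{j}(\mathrm{d}u)\\&=\operatorname{P}(C>s)\operatorname{P}(T\leq t,D=j),\end{aligned}$$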

according to equations (16) and (17) of the appendix, since (2.1) also holds. The conclusion $\operatorname{P}(T\leq t,D=j,C>s)=\operatorname{P}(C>s)\operatorname{P}(T\leq t,D=j)$ remains valid for $t\leq s$ when $t\in(0,\infty)\setminus\mathcal{J}$, and hence $s\in(0,\infty)\setminus\mathcal{J}$, since in this case either $\operatorname{P}(C>s)=0$ or $\operatorname{P}(T\leq t,D=j)=\operatorname{P}(T\in(0,t]\cap\mathcal{J},D=j)$ because $\operatorname{P}(C\wedge T\in\mathcal{J})=1$. For $t>s$ with $s\in\mathcal{J}$, we have

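$$\begin{aligned}\operatorname{P}(T\in(s,t],D=j,C>s)&=\operatorname{P}(T\leq t,D=j\mid T>s)\operatorname{P}(\tilde{T}>s)\\&=\int_{s}^{t}\prodi_{s}^{u-}(1-H(\mathrm{d}v))H_{j}(\mathrm{d}u)\prodi_{0}^{s}\big(1-(\tilde{H}+\tilde{H}_{0})(\mathrm{d}u)\big)\\&=\int_{s}^{t}\prodi_{0}^{u-}(1-H(\mathrm{d}v))H_{j}(\mathrm{d}u)\prodi_{0}^{s}(1-\check{H}_{0}(\mathrm{d}u))\\&=\operatorname{P}(T\in(s,t],D=j)\operatorname{P}(C>s),\end{aligned}$$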

using among other things (3.2) and the product structure of (4) below. The conclusion $\operatorname{P}(T\in(s,t],D=j,C>s)=\operatorname{P}(T\in(s,t],D=j)\operatorname{P}(C>s)$ remains valid when $s\in(0,\infty)\backslash\mathcal{J}$ since either side is 0 in this case. Put together, this establishes independence of $C$ and $(T,D)$ and so (3.4).

Under assumption of (3.4) we have $\operatorname{P}(T\leq t,D=j\mathbin{\mid}\tilde{T}>s)=\operatorname{P}(T\leq t,D=j\mathbin{\mid}T>s,C>s)=\operatorname{P}(T\leq t,D=j\mathbin{\mid}T>s)$ , using the independence, and this is (3.2). ∎

As noted by many authors working under assumption of some version of the representativity property, representativity implies identifiability. As demonstrated by Williams & Lagakos, (1977) in their setting, the two properties are not equivalent. This is also the case in our setting.

Proposition 4.

The representativity property implies the identifiability property, but the reverse does not hold.

Proof.

The implication was already established in the proof of Proposition 3; here is another argument. Assume (3.4) and choose a censoring time $C$ accordingly, such that $C\perp\!\!\!\perp(T,D)$. Then (2.3) holds since $a_{j}(t)=\operatorname{P}(C\geq t)=\operatorname{P}(\tilde{T}\geq t\mathbin{\mid}T\geq t)$ for $F_{j}$-almost all $t\in\mathcal{J}$ for $j=1,\dots,d$. This shows the implication.

On the other hand, the event time $T_{2}$ and the observed pair $(\tilde{T},\tilde{D})$ constructed in Section 4 below provide an example where identifiability holds but representativity does not. ∎

3 Censoring by a given censoring time

In this section we consider as given an event time $T$ , an event type $D$ , and a censoring time $C$ . The observed pair is thus explicitly $\tilde{T}=T\wedge C$ and $\tilde{D}=D\mathbf{1}(T\leq C)$ , which is a special case of the setting in Section 2.

For the censoring time, we denote its survival function $K(t)=\operatorname{P}(C>t)$ , distribution function $G(t)=\operatorname{P}(C\leq t)$ and cumulative hazard function $H_{0}(t)=\int_{0}^{t}K(s-)^{-1}G(\hskip 1.0pt\mathrm{d}s)$ .

As a result of the asymmetry between $T$ and $C$ in the definition of $(\tilde{T},\tilde{D})$ where $T$ takes priority, a modification of $\tilde{H}_{0}$ is relevant for it to be comparable to the defined $H_{0}$ . We let

$$\check{H}_{0}(t)=\int_{0}^{t}\frac{1}{1-\Delta\tilde{H}(s)}\tilde{H}_{0}(\mathrm{d}s)\tag{3}$$

define this modification and note that $\Delta\check{H}_{0}(t)(1-\Delta\tilde{H}(t))=\Delta\tilde{H}_{0}(t)$ and so $(1-\Delta\check{H}_{0}(t))(1-\Delta\tilde{H}(t))=1-\Delta(\tilde{H}_{0}+\tilde{H})(t)$ . The continuous parts of $\check{H}_{0}$ and $\tilde{H}_{0}$ are the same so by the characterization of the product integral $\tilde{S}(t)=\prodi_{0}^{t}(1-(\tilde{H}_{0}+\tilde{H})(\hskip 1.0pt\mathrm{d}s))$ of Definition 4 from Gill & Johansen, (1990), the modification allows for the product structure

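$$\tilde{S}(t)=\prodi_{0}^{t}(1-\tilde{H}(\hskip 1.0pt\mathrm{d}s))\prodi_{0}^{t}(1-\check{H}_{0}(\hskip 1.0pt\mathrm{d}s)), \tag{4}$$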

which will be of technical importance in what follows.

If we define $\check{S}(t)=\tilde{S}(t-)(1-\Delta\tilde{H}(t))=\prodi_{0}^{t}(1-\tilde{H}(\hskip 1.0pt\mathrm{d}s))\prodi_{0}^{t-}(1-\check{H}_{0}(\hskip 1.0pt\mathrm{d}s))$ , this modification can also be expressed as $\check{H}_{0}(t)=\int_{0}^{t}\check{S}(s)^{-1}\tilde{F}_{0}(\hskip 1.0pt\mathrm{d}s)$ using the definition of $\tilde{H}_{0}$ . The difference $\check{S}(t)-\tilde{S}(t)$ is seen to be $\check{S}(t)\Delta\check{H}_{0}(t)=\Delta\tilde{F}_{0}(t)=\operatorname{P}(\tilde{T}=t,\tilde{D}=0)$ , and, by letting $\check{Y}(t)=\mathbf{1}(\tilde{T}>t)+\mathbf{1}(\tilde{T}=t,\tilde{D}=0)$ , we see that $\check{S}(t)=\operatorname{E}(\check{Y}(t))=\operatorname{P}(T>t,C\geq t)$ .
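The identities in the preceding paragraph hold for any joint law of $(T,C)$, without any independence assumption. As a sanity check, the Python sketch below verifies them in exact rational arithmetic for a small, deliberately dependent and purely hypothetical discrete law of $(T,C)$ with a single cause ($d=1$); the names `PMF`, `prob` and `check_section3_identities` are ours. Jumps of $\check{H}_{0}$ are computed from the relation $\Delta\check{H}_{0}(t)(1-\Delta\tilde{H}(t))=\Delta\tilde{H}_{0}(t)$ noted above.

```python
from fractions import Fraction
from itertools import product

# Hypothetical, deliberately dependent joint pmf of (T, C) with one cause (d = 1).
# T = 5 lies beyond the largest censoring time, so P(T~ >= s) > 0 for s = 1, ..., 4.
T_VALUES, C_VALUES = [1, 2, 3, 5], [1, 2, 3, 4]
WEIGHTS = {(t, c): Fraction(t * c + 1) for t, c in product(T_VALUES, C_VALUES)}
TOTAL = sum(WEIGHTS.values())
PMF = {tc: w / TOTAL for tc, w in WEIGHTS.items()}


def prob(event):
    """P(event), where event is a predicate on the pair (t, c)."""
    return sum(p for (t, c), p in PMF.items() if event(t, c))


def check_section3_identities(grid=(1, 2, 3, 4)):
    """One tuple of booleans per time point s; D~ = 1(T <= C), so ties go to T."""
    results = []
    prod_event = prod_cens = Fraction(1)  # running prod(1 - dH~) and prod(1 - dH0_check)
    for s in grid:
        at_risk = prob(lambda t, c: min(t, c) >= s)          # S~(s-) = P(T~ >= s)
        dF1 = prob(lambda t, c: min(t, c) == s and t <= c)   # P(T~ = s, D~ = 1)
        dF0 = prob(lambda t, c: min(t, c) == s and t > c)    # P(T~ = s, D~ = 0)
        dH, dH0_tilde = dF1 / at_risk, dF0 / at_risk
        # Jump of the modification: dH0_check * (1 - dH~) = dH0_tilde.
        dH0_check = dH0_tilde / (1 - dH) if dH < 1 else Fraction(0)
        S_check = at_risk * (1 - dH)                          # S_check(s) = S~(s-)(1 - dH~(s))
        prod_event *= 1 - dH
        prod_cens *= 1 - dH0_check
        results.append((
            S_check == prob(lambda t, c: t > s and c >= s),   # S_check(s) = P(T > s, C >= s)
            S_check * dH0_check == dF0,                       # S_check(s) dH0_check(s) = P(T~ = s, D~ = 0)
            prob(lambda t, c: min(t, c) > s) == prod_event * prod_cens,  # product structure of S~
        ))
    return results


print(check_section3_identities())
```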

We now consider properties of the censoring time analogous to those of Proposition 2; collectively, they are naturally termed the censoring identifiability property.

Proposition 5.

The following properties are equivalent.

(5.1) We have that $\check{H}_{0}(t)=H_{0}(t)$ for any $t\in\mathcal{J}$.

(5.2) The process given by $\tilde{N}_{0}(t)-\int_{0}^{t}\check{Y}(s)H_{0}(\hskip 1.0pt\mathrm{d}s)$, $t\geq 0$, is a martingale with respect to the filtration $(\tilde{\mathcal{F}}_{t})$, the observed information.

(5.3) We have that $\operatorname{P}(\tilde{T}=t,\tilde{D}=0\mathbin{\mid}C=t)=\operatorname{P}(T>t\mathbin{\mid}C\geq t)$ for $G$-almost all $t\in\mathcal{J}$.

(5.4) We have that $\operatorname{P}(\tilde{T}=t,\tilde{D}=0\mathbin{\mid}C=t)+\sum_{j=1}^{d}\int_{0}^{t}K(s-)^{-1}\tilde{F}_{j}(\hskip 1.0pt\mathrm{d}s)=1$ for $G$-almost all $t\in\mathcal{J}$.

Proof.

The equivalence of (5.1) and (5.2) follows by an argument similar to that in the proof of Proposition 2, now using the fact that $\tilde{N}_{0}(t)-\int_{0}^{t}\check{Y}(s)\check{H}_{0}(\hskip 1.0pt\mathrm{d}s)$, $t\geq 0$, can be shown to define a martingale.

The equivalence of (5.1) and (5.3) is obtained by mimicking the steps in the proof of Proposition 2 while using the identity

[TABLE]

The equivalence then follows by noting that $\check{S}(t)/K(t-)=\operatorname{P}(T>t\mathbin{\mid}C\geq t)$, since $\check{S}(t)=\operatorname{P}(T>t,C\geq t)$ and $K(t-)=\operatorname{P}(C\geq t)$.

Since the equivalence of (5.1) and (5.3) has already been established, the identity (7) of the appendix immediately shows that (5.1) implies (5.4). Similarly, the identity (10) of the appendix immediately shows that (5.4) implies (5.3). ∎

Williams & Lagakos, (1977), inspired by Gail, (1975), considered an independent censoring assumption which, in the present setting, may be formulated as the following property:

[TABLE]

for $j=1,\dots,d$ and

[TABLE]

for all $t\in\mathcal{J}$. In Williams & Lagakos, (1977), this assumption was found to be stronger than the constant-sum property, given here in (2.4). As also noted by Kalbfleisch & MacKay, (1979) in the setting of their paper, this is the case only because the assumption includes an additional requirement on the given censoring time. This is the content of the following result.

Proposition 6.

The following properties are equivalent.

(6.1) $\tilde{H}_{j}(t)=H_{j}(t)$ for $j=1,\dots,d$ and $\check{H}_{0}(t)=H_{0}(t)$ for all $t\in\mathcal{J}$.

(6.2) $\tilde{F}_{j}(t)=\int_{0}^{t}K(s-)F_{j}(\hskip 1.0pt\mathrm{d}s)$ for $j=1,\dots,d$ and $\tilde{F}_{0}(t)=\int_{0}^{t}S(s)G(\hskip 1.0pt\mathrm{d}s)$ for all $t\in\mathcal{J}$.

(6.3) $\operatorname{P}(C\geq t\mathbin{\mid}T=t,D=j)=\operatorname{P}(C\geq t)$ for $F_{j}$-almost all $t\in\mathcal{J}$ for $j=1,\dots,d$, and $\operatorname{P}(T>t\mathbin{\mid}C=t)=\operatorname{P}(T>t)$ for $G$-almost all $t\in\mathcal{J}$.

Proof.

Assume that (6.1) holds. The product structure $\tilde{S}(t)=\prodi_{0}^{t}(1-\tilde{H}(\hskip 1.0pt\mathrm{d}s))\prodi_{0}^{t}(1-\check{H}_{0}(\hskip 1.0pt\mathrm{d}s))$ results in $\tilde{S}(t)=S(t)K(t)$ and similarly $\check{S}(t)=S(t)K(t-)$ under the assumption. Using the assumption again, we have $\tilde{F}_{j}(t)=\int_{0}^{t}\tilde{S}(s-)H_{j}(\hskip 1.0pt\mathrm{d}s)=\int_{0}^{t}K(s-)F_{j}(\hskip 1.0pt\mathrm{d}s)$ and $\tilde{F}_{0}(t)=\int_{0}^{t}\check{S}(s)H_{0}(\hskip 1.0pt\mathrm{d}s)=\int_{0}^{t}S(s)G(\hskip 1.0pt\mathrm{d}s)$ which is (6.2).

Assume now that (6.2) holds. Then $B(t)=\int_{0}^{t-}S(s)^{-1}\tilde{F}_{0}(\hskip 1.0pt\mathrm{d}s)=G(t-)=1-K(t-)$ by the last part of the assumption. Combining this with the first part of the assumption yields $\tilde{F}_{j}(t)=\int_{0}^{t}(1-B(s))F_{j}(\hskip 1.0pt\mathrm{d}s)$. Since we already know that $\tilde{F}_{j}(t)=\int_{0}^{t}a_{j}(s)F_{j}(\hskip 1.0pt\mathrm{d}s)$, the constant-sum property (2.4), and hence also (2.1), follows. Property (5.4), and so (5.1), can be obtained in a similar manner. This establishes (6.1).

From the equalities $\tilde{F}_{j}(t)=\int_{0}^{t}\operatorname{P}(C\geq s\mathbin{\mid}T=s,D=j)F_{j}(\hskip 1.0pt\mathrm{d}s)$ and $\tilde{F}_{0}(t)=\int_{0}^{t}\operatorname{P}(T>s\mathbin{\mid}C=s)G(\hskip 1.0pt\mathrm{d}s)$ which hold for all $t\in\mathcal{J}$ , the properties of (6.2) and (6.3) are seen to be equivalent. ∎
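Proposition 6 can be checked in a small discrete case. The Python sketch below assumes a hypothetical law of $(T,D)$ with $d=2$ causes and a censoring time $C$ independent of $(T,D)$, with ties between $T$ and $C$ possible, and confirms in exact arithmetic that the jumps of $\tilde{H}_{j}$ and $\check{H}_{0}$ coincide with those of $H_{j}$ and $H_{0}$ and that the sub-distributions satisfy (6.2). It also records $\Delta\tilde{H}_{0}$ next to $\Delta H_{0}$: with ties, the unmodified $\tilde{H}_{0}$ differs from $H_{0}$ even under independence, which illustrates why $\check{H}_{0}$ is used instead. All names and numbers are illustrative assumptions.

```python
from fractions import Fraction as Fr

# Hypothetical laws: (T, D) with d = 2 causes, and a censoring time C independent of (T, D).
LAW_TD = {(1, 1): Fr(1, 10), (1, 2): Fr(1, 10), (2, 1): Fr(1, 5),
          (3, 2): Fr(1, 5), (5, 1): Fr(2, 5)}
LAW_C = {1: Fr(1, 4), 2: Fr(1, 4), 4: Fr(1, 2)}
GRID = [1, 2, 3, 4]  # time points with P(T~ >= s) > 0

# Joint pmf of (T, D, C) under full independence.
JOINT = {(t, j, c): p * q for (t, j), p in LAW_TD.items() for c, q in LAW_C.items()}


def P(event):
    """P(event) for a predicate event(t, j, c)."""
    return sum(p for (t, j, c), p in JOINT.items() if event(t, j, c))


def check_pointwise_independence():
    checks, gaps = [], []
    for s in GRID:
        S_prev, S_now = P(lambda t, j, c: t >= s), P(lambda t, j, c: t > s)  # S(s-), S(s)
        K_prev, dG = P(lambda t, j, c: c >= s), P(lambda t, j, c: c == s)    # K(s-), dG(s)
        at_risk = P(lambda t, j, c: min(t, c) >= s)                         # S~(s-)
        dF0_obs = P(lambda t, j, c: c == s and t > c)                       # P(T~=s, D~=0)
        dF_obs = {k: P(lambda t, j, c: t == s and j == k and t <= c) for k in (1, 2)}
        dF = {k: P(lambda t, j, c: t == s and j == k) for k in (1, 2)}      # dF_k(s)
        dH_obs = {k: dF_obs[k] / at_risk for k in (1, 2)}                   # jumps of H~_k
        dH0_tilde = dF0_obs / at_risk
        dH0_check = dH0_tilde / (1 - sum(dH_obs.values()))
        dH0 = dG / K_prev                                                   # jump of H_0
        for k in (1, 2):
            checks.append(dH_obs[k] == dF[k] / S_prev)      # (6.1): H~_k = H_k
            checks.append(dF_obs[k] == K_prev * dF[k])      # (6.2), event part
        checks.append(dH0_check == dH0)                     # (6.1): H0_check = H_0
        checks.append(dF0_obs == S_now * dG)                # (6.2), censoring part
        gaps.append((s, dH0_tilde, dH0))                    # unmodified H~_0 vs H_0
    return checks, gaps


checks, gaps = check_pointwise_independence()
print(all(checks), gaps[0])
```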

The equivalence between (6.1) and (6.2) shows that the property introduced by Williams & Lagakos, (1977) is equivalent to having both the identifiability and the censoring identifiability property. Property (6.3) can be considered pointwise independence between $(T,D)$ and $C$ and, as is evident from the proof of Proposition 6, it also implies $\operatorname{P}(T>t,C>t)=\operatorname{P}(T>t)\operatorname{P}(C>t)$ for all $t\geq 0$. For this reason, we refer to the properties in Proposition 6 collectively as the property of pointwise independence. It does not, however, imply independence of $(T,D)$ and $C$.

Independence of $(T,D)$ and $C$ is here referred to as full independence. This assumption is made by many authors; it is used, for instance, by Kaplan & Meier, (1958). In Lagakos, (1979), the property is described as strictly stronger than the non-prognostic censoring property from Proposition 3. The next result shows that this is the case only because full independence includes a further property, namely representativity of the given censoring time. This property is that

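$$\operatorname{P}(C\leq t\mathbin{\mid}\tilde{T}=s,\tilde{D}=j)=\operatorname{P}(C\leq t\mathbin{\mid}C\geq s) \tag{5}$$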

holds for any $t\geq 0$ and $\tilde{F}_{j}$-almost all $s\in\mathcal{J}$ for $j=1,\dots,d$. We refer to this as the censoring representativity property, as it is a counterpart to (3.2). An argument similar to the one used in the proof of Proposition 4 shows that censoring representativity implies censoring identifiability, but that the two properties are not equivalent. The following result now applies.

Proposition 7.

Full independence, $C\perp\!\!\!\perp(T,D)$, holds if and only if both the representativity property and the censoring representativity property hold.

Proof.

It is evident that full independence implies (3.4). Similarly, $\operatorname{P}(C\leq t\mathbin{\mid}\tilde{T}=s,\tilde{D}=j)=\operatorname{P}(C\leq t\mathbin{\mid}C\geq s,T=s,D=j)=\operatorname{P}(C\leq t\mathbin{\mid}C\geq s)$ for any $t\geq 0$ and $\tilde{H}_{j}$ -almost all $s\in\mathcal{J}$ for $j=1,\dots,d$ under the independence assumption.

Conversely, assume that the properties of Proposition 3 and (5) hold. By equation (12) of the appendix, (5.4) holds, and so, by Proposition 5, (5.1) holds as well. Combined with (5.1), (5) then states that $\operatorname{P}(C>t\mathbin{\mid}\tilde{T}=s,\tilde{D}=j)=\prodi_{s-}^{t}(1-\check{H}_{0}(\hskip 1.0pt\mathrm{d}u))$. This is exactly the conditional distribution of the independent censoring time constructed in the proof of Proposition 3. Since we are assuming that the properties of Proposition 3 hold, the same calculations lead to the independence of $C$ and $(T,D)$. ∎

4 Examples

4.1 A technical setting

This technical example serves to illustrate the differences between the identifiability and representativity properties.

Consider the probability space $(\Omega,\mathcal{F},\operatorname{P})$ with $\Omega=[0,1]^{2}=\{(t,c)\in\mathbb{R}^{2}\mathbin{\mid}0\leq t\leq 1,0\leq c\leq 1\}$ , $\mathcal{F}$ the Borel $\sigma$ -algebra, and $\operatorname{P}$ the uniform distribution such that $\operatorname{P}([s,t]\times[u,v])=(t-s)(v-u)$ for $s\leq t$ , $u\leq v$ , all in $[0,1]$ . The random variables given by $T_{1}(t,c)=t$ and $C_{1}(t,c)=c$ are then independent. We further define the random variables

[TABLE]

and $C_{3}(t,c)=c\mathbf{1}(c<t)+\mathbf{1}(c\geq t)$ . A direct calculation reveals that the distributions of $T_{1},T_{2},C_{1},C_{2}$ are all uniform on $[0,1]$ .

If we define $\tilde{T}(t,c)=t\wedge c$ and $\tilde{D}(t,c)=\mathbf{1}(t\leq c)$ , then $\tilde{T}=T_{i}\wedge C_{j}$ and $\tilde{D}=\mathbf{1}(T_{i}\leq C_{j})$ for any choice of $i\in\{1,2\}$ and $j\in\{1,2,3\}$ . That is, any combination of the event and censoring times defined above yields the same observable exit time and exit type.

Note that the representativity property holds for $T_{1}$ by virtue of (3.4), because $T_{1}$ is independent of $C_{1}$ and $\tilde{T}=T_{1}\wedge C_{1}$, and so, by Proposition 4, the identifiability property also holds for $T_{1}$. The identifiability property then also holds for $T_{2}$ since, for instance, the identity of forces of mortality involves only the distribution of $(\tilde{T},\tilde{D})$ and the distribution of the event time, and $T_{1}$ and $T_{2}$ have the same distribution. A calculation reveals that, for $\tilde{F}_{0}$-almost all $s\in[0,\frac{1}{2})$, we have $\operatorname{P}(T_{2}\leq 1-s\mathbin{\mid}\tilde{T}=s,\tilde{D}=0)=1$ and $\operatorname{P}(T_{2}\leq 1-s\mathbin{\mid}T_{2}>s)=1-s/(1-s)$, so that non-prognostic censoring, and thereby representativity, cannot hold for $T_{2}$. Similarly, censoring representativity holds for $C_{1}$ but cannot hold for $C_{2}$.

Since $\operatorname{P}(C_{3}\leq t)=\operatorname{P}(\tilde{T}\leq t,\tilde{D}=0)=\tilde{F}_{0}(t)$ for $t\in[0,1)$, the cumulative hazard associated with the distribution of $C_{3}$ is $\int_{0}^{t}(1-\tilde{F}_{0}(s))^{-1}\tilde{F}_{0}(\hskip 1.0pt\mathrm{d}s)<\int_{0}^{t}(1-\tilde{F}_{0}(s)-\tilde{F}_{1}(s))^{-1}\tilde{F}_{0}(\hskip 1.0pt\mathrm{d}s)=\int_{0}^{t}\tilde{S}(s)^{-1}\tilde{F}_{0}(\hskip 1.0pt\mathrm{d}s)=\tilde{H}_{0}(t)$ for all $t\in(0,1)$. As $\tilde{H}_{0}$ is continuous here, $\check{H}_{0}=\tilde{H}_{0}$, so censoring identifiability cannot hold for $C_{3}$.
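The claims about $C_{3}$ can be illustrated numerically. A short calculation of ours (not spelled out in the text) gives, for $t\in[0,1)$, $\tilde{F}_{0}(t)=t-t^{2}/2$ and $\tilde{S}(t)=(1-t)^{2}$, hence $\tilde{H}_{0}(t)=-\log(1-t)$, which is also the cumulative hazard of the uniform $C_{1}$, while the cumulative hazard of $C_{3}$ is $-\log(1-t+t^{2}/2)$. The Python sketch below uses these closed forms together with a Monte Carlo check that $T_{1}\wedge C_{3}$ and $T_{1}\wedge C_{1}$ produce the same observed pair; it only covers $T_{1}$, $C_{1}$ and $C_{3}$, and the seed and sample size are arbitrary choices.

```python
import math
import random

# Section 4.1: Omega = [0, 1]^2 with the uniform law; omega = (t, c).
def T1(t, c):
    return t


def C1(t, c):
    return c


def C3(t, c):
    return c if c < t else 1.0  # C_3(t, c) = c 1(c < t) + 1(c >= t)


def exit_pair(T, C, t, c):
    """(T~, D~) = (T ^ C, 1(T <= C)) evaluated at omega = (t, c)."""
    x, y = T(t, c), C(t, c)
    return min(x, y), int(x <= y)


def H0_tilde(t):
    # Our own closed form: F~_0(s) = s - s^2/2 and S~(s) = (1 - s)^2 on [0, 1),
    # so H~_0(t) = int_0^t (1 - s) / (1 - s)^2 ds = -log(1 - t).
    return -math.log(1.0 - t)


def hazard_C3(t):
    # Cumulative hazard of C_3 on [0, 1): int_0^t (1 - s) / (1 - s + s^2/2) ds.
    return -math.log(1.0 - t + t * t / 2.0)


rng = random.Random(2024)
omegas = [(rng.random(), rng.random()) for _ in range(100_000)]
same_observation = all(exit_pair(T1, C3, t, c) == exit_pair(T1, C1, t, c) for t, c in omegas)
# Monte Carlo estimate of P(C_3 <= 0.5), which should be close to F~_0(0.5) = 0.375.
freq = sum(C3(t, c) <= 0.5 for t, c in omegas) / len(omegas)
print(same_observation, round(freq, 3))
for t in (0.25, 0.5, 0.75):
    print(t, round(hazard_C3(t), 4), round(H0_tilde(t), 4))
```

The strict inequality between the two hazards reduces to $1-t+t^{2}/2>1-t$, which holds for every $t\in(0,1)$.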

Figure 1 illustrates the definitions of $T_{i}$ and $C_{j}$ as well as the observed exit time $\tilde{T}$ as a heat map. Note how, for any combination of $T_{i}$ and $C_{j}$, the minimum of their respective graphs corresponds to the graph of $\tilde{T}$. The assumptions met by the various choices of $T_{i}$ and $C_{j}$ producing $(\tilde{T},\tilde{D})$ are summarized in Table 1.

The primary idea behind these examples is that, starting from independent $T_{1}$ and $C_{1}$, we can alter the unobserved parts of the underlying event and censoring times without altering the observed $(\tilde{T},\tilde{D})$. If the event time is left unaltered but the unobserved part of the censoring time is altered arbitrarily, the representativity property is retained. If the marginal distribution is preserved, as is the case for $T_{2}$ and $C_{2}$, the identifiability property is retained.

4.2 A practical setting

As an illustration of a practical setting, consider the following example of a register-based study. Suppose we are interested in the cumulative incidences of different causes of death in a certain population. We can then let $(T,D)$ denote the time of death and cause of death of a randomly picked member of the population. Imagine that age at death and cause of death are recorded for all population members except those who emigrate from the population before dying. In other words, we observe $\tilde{T}\leq T$, which equals $T$ if the time of death is observed and is the time of emigration otherwise, and $\tilde{D}=D\mathbf{1}(\tilde{T}=T)$, which is the cause of death if the time of death is observed and 0, denoting emigration, otherwise. As can be seen from Proposition 13 of the appendix, data on the observed pair $(\tilde{T},\tilde{D})$ alone do not allow us to refute the idea that $(\tilde{T},\tilde{D})$ is produced by $(T,D)$ and a time to emigration $C$ that are independent, $C\perp\!\!\!\perp(T,D)$. However, common sense tells us that emigration cannot happen after death, so the time to emigration $C$ can never be independent of $(T,D)$ in this case. Instead, one should rather define $C=\infty$ when $\tilde{D}\neq 0$ or, as in Section 2, simply not trouble oneself with defining a time to emigration for all individuals.

Suppose now we are interested in estimating the cumulative incidence proportion $F_{j}(t)$ for various time points $t\in\mathcal{J}$ for the different causes of death $j=1,\dots,d$ . In the end, the problem of defining a time to emigration has no bearing on the validity of the Aalen–Johansen estimator as an estimate of $F_{j}(t)$ . We instead require the identifiability property relating $(T,D)$ to $(\tilde{T},\tilde{D})$ directly as laid out in Proposition 2. In terms of the identity of forces of mortality property, this requirement has the interpretation that the observable hazard of any of the causes of death as represented by $\tilde{H}_{j}$ should equal the underlying hazard of the same cause as represented by $H_{j}$ on the relevant time interval. In terms of the status-independent observation property, the requirement has the interpretation that the status of survival up to any time point and the status of death of a certain cause at the same time point are equally likely to be observed, that is, the probability of not emigrating before that time point given the status does not depend on the status.

If we are instead interested in using the Aalen–Johansen estimator for prognosis, we need a stronger assumption. Suppose we are looking at population members who are alive and have not emigrated at time point $s$, and we are interested in estimating the probabilities of dying of the different causes before time $t>s$. In other words, we are interested in estimating $\operatorname{P}(T\leq t,D=j\mathbin{\mid}\tilde{T}>s)$. Under the identifiability assumption, a valid estimate of $\operatorname{P}(T\leq t,D=j\mathbin{\mid}T>s)=(F_{j}(t)-F_{j}(s))/S(s)$ can be obtained from the Aalen–Johansen and the related Kaplan–Meier estimators. For this to be a valid estimate of $\operatorname{P}(T\leq t,D=j\mathbin{\mid}\tilde{T}>s)$ as well, the non-prognostic observation property from Proposition 3 is needed. That is, we require the stronger representativity property to hold. By the equivalence to non-prognostic censoring, this entails that population members emigrating at time $s$ have the same probabilities of dying of each of the causes as members who are alive at time $s$.

The representativity assumption also implies the existence of a censoring time $C$ independent of $(T,D)$ which corresponds to the time to emigration for individuals who emigrate. It may be useful to think in terms of such a $C$, but its value for individuals who are not observed to emigrate should not, at least without further assumptions, be confused with a counterfactual emigration time that would have been observed had death not occurred first. In fact, the censoring time $C$ may not have any relevant interpretation for individuals who are observed to die.

In register-based studies, censoring at the end of follow-up may be much more prominent than, for instance, censoring by emigration from the population. End of follow-up is an example of a censoring time that may be defined explicitly without consideration of the underlying $(T,D)$. This extra piece of information can be used to judge whether censoring identifiability and censoring representativity are appropriate, but these properties do not help us judge the validity of the representativity or identifiability properties related to $(T,D)$, and thus the validity of the Aalen–Johansen and Kaplan–Meier estimators.

5 Discussion

When no given, underlying censoring time is considered, the assumptions studied here that ensure consistency of the Kaplan–Meier and Aalen–Johansen estimators fall into two categories: an identifiability assumption and a representativity assumption. Although the properties within one category are all equivalent, they differ considerably in their interpretations, and hence some may be easier to communicate to a clinical researcher than others. Which interpretation is most suitable is a matter of preference, but it seems to us that the properties of status-independent observation and non-prognostic observation are much easier to interpret and potentially refute than, for example, the corresponding martingale properties.

The appropriateness of either assumption cannot be assessed based on information on the exit time and exit type alone, as is seen from Proposition 13 of the appendix, which ensures the existence of an event time and type that realize the observed exit time and exit type and at the same time satisfy the representativity assumption. This is in a similar vein to the result by Molenberghs et al., (2008) that one cannot distinguish between missing-at-random and missing-not-at-random models based on the observed data only. Consequently, any information used to assess the validity of the identifiability or representativity assumption must come from an external source.

Other properties than the ones treated in this paper have been considered in the literature. Ebrahimi et al., (2003) considered a certain property and argued that it is equivalent to the constant-sum property. Jacobsen, (1989) considered three nested properties in a setting where the observed censoring times need not be independent and identically distributed; see his Proposition 3.4. It appears that these three properties all translate into equivalents of the representativity property in our setting.

Our focus has been on marginal distributions, but in regression analysis in a survival analysis context, the analogous question of which assumptions on the censoring mechanism are necessary is highly relevant. Versions of the assumptions studied here, formulated for the conditional distribution given the covariates of a regression model, seem useful in this respect.

Acknowledgements

The authors would like to thank an anonymous referee for valuable comments that have greatly improved the manuscript. Morten Overgaard is supported by the Novo Nordisk Foundation grant NNF17OC0028276.

Appendix 1

Technical results

Consider the matrix $\mathbf{H}$ defined in (1). We then have the following characterization of the product integral $\prodi(\mathbf{I}+\hskip 1.0pt\mathrm{d}\mathbf{H})$ .

Lemma 8.

Consider a $(d+1)\times(d+1)$ matrix-valued function given by $\mathbf{B}(s,t)=\{\beta_{i,j}(s,t)\}$ for $s,t\geq 0$ which is right continuous with left limits in both variables. Then, for given $s\geq 0$ , $\mathbf{B}(s,t)=\prodi_{s}^{t}(\mathbf{I}+\mathbf{H}(\hskip 1.0pt\mathrm{d}u))$ for all $t\in[s,\infty)\cap\{t:S(t)>0\}$ if and only if

[TABLE]

for $j=1,\dots,d$ , $\beta_{1,1}(s,t)=1-\sum_{j=1}^{d}\beta_{1,j+1}(s,t)$ , and $\beta_{i,j}(s,t)=\mathbf{1}(i=j)$ for $i=2,\dots,d+1$ for all $t\in[s,\infty)\cap\{t:S(t)>0\}$ .

Proof.

This is a special case of Theorem 5 of Gill & Johansen, (1990), which establishes that $\mathbf{B}(s,t)=\prodi_{s}^{t}(\mathbf{I}+\mathbf{H}(\hskip 1.0pt\mathrm{d}u))$ if and only if the forward equation $\mathbf{B}(s,t)-\mathbf{I}=\int_{s}^{t}\mathbf{B}(s,u-)\mathbf{H}(\hskip 1.0pt\mathrm{d}u)$ holds for all $t\in[s,\infty)\cap\{t:S(t)>0\}$. The only solutions to the equations $\beta_{i,1}(s,t)=-\int_{s}^{t}\beta_{i,1}(s,u-)H(\hskip 1.0pt\mathrm{d}u)$ for all $t\in[s,\infty)\cap\{t:S(t)>0\}$ for $i=2,\dots,d+1$ implied by the forward equation are $\beta_{i,1}(s,t)=0$; see, for instance, Theorem 10 of Gill & Johansen, (1990). This, in turn, implies that $\beta_{i,j}(s,t)=\mathbf{1}(i=j)$ for $i=2,\dots,d+1$ for $\mathbf{B}$ to be a solution to the forward equation. ∎

In the following, we give some useful identities in the setup of Section 3, where event time $T$, event type $D$, and censoring time $C$ are all given and we observe $\tilde{T}=T\wedge C$ and $\tilde{D}=D\mathbf{1}(T\leq C)$. The identities not involving $C$ may, however, also be used in the setting of Section 2, where a censoring time $C$ is not explicitly given.

Lemma 9.

The equalities

[TABLE]

and

[TABLE]

hold for all $t\in\mathcal{J}$ .

Proof.

Since the product integral structures $S(t)/S(s)=\prodi_{s}^{t}(1-H(\hskip 1.0pt\mathrm{d}u))$ and $\tilde{S}(t)/\tilde{S}(s)=\prodi_{s}^{t}(1-(\tilde{H}_{0}+\tilde{H})(\hskip 1.0pt\mathrm{d}u))$ hold, the equality $S(t)-\tilde{S}(t)=\int_{0}^{t}S(t)S(s)^{-1}\tilde{S}(s-)(\tilde{H}_{0}+\tilde{H}-H)(\hskip 1.0pt\mathrm{d}s)$ holds according to the Duhamel equation, see Theorem 6 of Gill & Johansen, (1990). Note that $S(t-)-\tilde{S}(t-)=\operatorname{P}(\tilde{T}<t,T\geq t)$ and recall that $B(t)=\int_{0}^{t-}S(s)^{-1}\tilde{F}_{0}(\hskip 1.0pt\mathrm{d}s)=\int_{0}^{t-}\tilde{S}(s-)S(s)^{-1}\tilde{H}_{0}(\hskip 1.0pt\mathrm{d}s)$ to obtain the equality (6).

A similar argument leads to $\operatorname{P}(T\leq t\mathbin{\mid}C\geq t)=\sum_{j=1}^{d}\Delta\tilde{F}_{j}(t)/K(t-)+\sum_{j=1}^{d}\int_{0}^{t-}K(s)^{-1}\tilde{F}_{j}(\hskip 1.0pt\mathrm{d}s)+\int_{0}^{t-}\tilde{S}(s-)K(s)^{-1}(\tilde{H}_{0}-H_{0})(\hskip 1.0pt\mathrm{d}s)$. The equality (7) now follows by realizing that $\int_{0}^{t-}\check{S}(s)K(s)^{-1}\check{H}_{0}(\hskip 1.0pt\mathrm{d}s)=\int_{0}^{t-}\tilde{S}(s-)K(s)^{-1}\tilde{H}_{0}(\hskip 1.0pt\mathrm{d}s)$ and $\int_{0}^{t-}(\tilde{S}(s-)-\check{S}(s))K(s)^{-1}H_{0}(\hskip 1.0pt\mathrm{d}s)=\sum_{j=1}^{d}\sum_{s<t}\Delta\tilde{F}_{j}(s)\Delta H_{0}(s)/K(s)=\sum_{j=1}^{d}\int_{0}^{t-}(K(s)^{-1}-K(s-)^{-1})\tilde{F}_{j}(\hskip 1.0pt\mathrm{d}s)$, where the last equality uses $\Delta H_{0}(s)/K(s)=K(s)^{-1}-K(s-)^{-1}$. ∎
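Two ingredients of the proof of Lemma 9 can be checked in a discrete setting, where product integrals reduce to finite products. The Python sketch below uses hypothetical hazard jumps `a` (playing $\Delta H$) and `b` (playing $\Delta(\tilde{H}_{0}+\tilde{H})$) to verify the scalar Duhamel equation $S(t)-\tilde{S}(t)=\sum_{s\leq t}S(t)S(s)^{-1}\tilde{S}(s-)\Delta(\tilde{H}_{0}+\tilde{H}-H)(s)$ in exact arithmetic, and a hypothetical censoring survival function `K` to verify the jump relation $\Delta H_{0}(s)/K(s)=K(s)^{-1}-K(s-)^{-1}$, which follows from $\Delta H_{0}(s)=\Delta G(s)/K(s-)$. The values are arbitrary and serve only as an illustration.

```python
from fractions import Fraction as Fr

# Hypothetical hazard jumps on times 1..4: `a` plays Delta H, `b` plays Delta(H~_0 + H~).
a = {1: Fr(1, 5), 2: Fr(1, 3), 3: Fr(1, 4), 4: Fr(1, 2)}
b = {1: Fr(1, 4), 2: Fr(1, 2), 3: Fr(1, 3), 4: Fr(2, 3)}


def prod_limit(jumps, upto):
    """prod_{1 <= u <= upto} (1 - jumps[u]); equals 1 for upto = 0."""
    out = Fr(1)
    for u in range(1, upto + 1):
        out *= 1 - jumps[u]
    return out


def duhamel_gap(t):
    """S(t) - S~(t) minus sum_{s <= t} S(t)/S(s) * S~(s-) * (b - a)(s); zero if Duhamel holds."""
    S, S_tilde = prod_limit(a, t), prod_limit(b, t)
    rhs = sum(S / prod_limit(a, s) * prod_limit(b, s - 1) * (b[s] - a[s])
              for s in range(1, t + 1))
    return (S - S_tilde) - rhs


# Hypothetical survival function of a censoring time: K[s] = K(s) and K[s - 1] = K(s-).
K = {0: Fr(1), 1: Fr(3, 4), 2: Fr(1, 2), 3: Fr(1, 2), 4: Fr(1, 8)}


def jump_identity_holds(s):
    """Delta H_0(s) / K(s) == 1/K(s) - 1/K(s-), with Delta H_0(s) = Delta G(s) / K(s-)."""
    dH0 = (K[s - 1] - K[s]) / K[s - 1]
    return dH0 / K[s] == 1 / K[s] - 1 / K[s - 1]


print([duhamel_gap(t) for t in range(1, 5)], [jump_identity_holds(s) for s in range(1, 5)])
```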

Lemma 10.

The equalities

[TABLE]

and

[TABLE]

hold for all $t\in\mathcal{J}$ .

Proof.

We know that $\sum_{j=1}^{d}F_{j}(t-)=1-S(t-)$ and similarly that $\tilde{S}(t-)+\sum_{j=0}^{d}\tilde{F}_{j}(t-)=1$ . Since $\tilde{F}_{j}(s)=\int_{0}^{s}a_{j}(u)F_{j}(\hskip 1.0pt\mathrm{d}u)$ , we have $1-\sum_{j=1}^{d}\int_{0}^{t-}a_{j}(s)F_{j}(\hskip 1.0pt\mathrm{d}s)=\tilde{S}(t-)+\tilde{F}_{0}(t-)$ . Recall that $B(s)=\int_{0}^{s-}S(u)^{-1}\tilde{F}_{0}(\hskip 1.0pt\mathrm{d}u)$ . A change in the order of integration reveals that $\tilde{F}_{0}(t-)-\sum_{j=1}^{d}\int_{0}^{t-}B(s)F_{j}(\hskip 1.0pt\mathrm{d}s)=\int_{0}^{t-}\operatorname{P}(T\geq t\mathbin{\mid}T>u)\tilde{F}_{0}(\hskip 1.0pt\mathrm{d}u)=S(t-)B(t)$ , where the definition of $B$ is used once more. Put together, this establishes that the equality

[TABLE]

holds for all $t\in\mathcal{J}$ . Now the desired result follows since $\operatorname{P}(\tilde{T}<t\mathbin{\mid}T\geq t)=(S(t-)-\tilde{S}(t-))/S(t-)$ . As for the second equality, similar arguments lead to $\sum_{j=1}^{d}\int_{0}^{t-}K(u-)^{-1}\tilde{F}_{j}(\hskip 1.0pt\mathrm{d}u)=(1-\tilde{S}(t-)K(t-)^{-1})+K(t-)^{-1}\int_{0}^{t-}(1-\operatorname{P}(\tilde{T}=s,\tilde{D}=0\mathbin{\mid}C=s)-\sum_{j=1}^{d}\int_{0}^{s}K(u-)^{-1}\tilde{F}_{j}(\hskip 1.0pt\mathrm{d}u))G(\hskip 1.0pt\mathrm{d}s)$ and (10) then follows since $\operatorname{P}(T\leq t\mathbin{\mid}C\geq t)=(1-\check{S}(t)K(t-)^{-1})=(1-\tilde{S}(t-)K(t-)^{-1})+\sum_{j=1}^{d}\Delta\tilde{F}_{j}(t)/K(t-)$ . ∎
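The change in the order of integration in the proof of Lemma 10 is a purely measure-theoretic step, so it can be checked on any discrete law. The Python sketch below assumes a hypothetical, dependent law of $(T,C)$ with a single cause and verifies $\tilde{F}_{0}(t-)-\int_{0}^{t-}B(s)F(\mathrm{d}s)=S(t-)B(t)$, with $B(t)=\int_{0}^{t-}S(u)^{-1}\tilde{F}_{0}(\mathrm{d}u)$, in exact arithmetic; the helper names are ours.

```python
from fractions import Fraction as Fr
from itertools import product

# Hypothetical, deliberately dependent joint pmf of (T, C); a single cause (d = 1).
T_VALUES, C_VALUES = [1, 2, 3, 4], [1, 2, 3]
W = {(t, c): Fr((t + 2 * c) % 5 + 1) for t, c in product(T_VALUES, C_VALUES)}
Z = sum(W.values())
PMF = {tc: w / Z for tc, w in W.items()}


def P(event):
    """P(event) for a predicate event(t, c)."""
    return sum(p for (t, c), p in PMF.items() if event(t, c))


def S(u):        # S(u) = P(T > u)
    return P(lambda t, c: t > u)


def dF(u):       # Delta F(u) = P(T = u), all causes together
    return P(lambda t, c: t == u)


def dF0_obs(u):  # Delta F~_0(u) = P(T~ = u, D~ = 0) = P(C = u < T)
    return P(lambda t, c: c == u and t > u)


def B(s):        # B(s) = int_0^{s-} S(u)^{-1} F~_0(du), a sum over u < s
    return sum(dF0_obs(u) / S(u) for u in range(1, s))


def order_swap_holds(t):
    """Check F~_0(t-) - int_0^{t-} B(s) F(ds) == S(t-) B(t) exactly."""
    lhs = sum(dF0_obs(u) for u in range(1, t)) - sum(B(s) * dF(s) for s in range(1, t))
    return lhs == S(t - 1) * B(t)


print([order_swap_holds(t) for t in range(1, 5)])
```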

Lemma 11.

The equality

[TABLE]

holds for all $t\geq 0$ and $s\in\mathcal{J}$ with $s<t$ .

Proof.

As a preliminary step, we have $\operatorname{P}(T\leq t,D=j\mathbin{\mid}\tilde{T}>s)-\operatorname{P}(\tilde{T}\leq t,\tilde{D}=j\mathbin{\mid}\tilde{T}>s)=\operatorname{P}(T\leq t,D=j,\tilde{T}\leq t,\tilde{D}=0\mathbin{\mid}\tilde{T}>s)=\operatorname{P}(\tilde{T}>s)^{-1}\int_{s}^{t}\operatorname{P}(T\leq t,D=j\mathbin{\mid}\tilde{T}=u,\tilde{D}=0)\tilde{F}_{0}(\hskip 1.0pt\mathrm{d}u)$. On the other hand, an application of the Duhamel equation in $d+2$ dimensions, or a direct calculation, reveals that $\operatorname{P}(T\leq t,D=j\mathbin{\mid}T>s)-\operatorname{P}(\tilde{T}\leq t,\tilde{D}=j\mathbin{\mid}\tilde{T}>s)=\int_{s}^{t}\operatorname{P}(T\leq t,D=j\mathbin{\mid}T>u)\tilde{S}(u-)\tilde{S}(s)^{-1}(\tilde{H}_{0}+\tilde{H}-H)(\hskip 1.0pt\mathrm{d}u)+\int_{s}^{t}\tilde{S}(u-)\tilde{S}(s)^{-1}(H_{j}-\tilde{H}_{j})(\hskip 1.0pt\mathrm{d}u)$. Subtract the first expression from the second expression to obtain (10). ∎

Lemma 12.

The equalities

[TABLE]

and

[TABLE]

hold for all $t\in\mathcal{J}$ and $j=1,\dots,d$.

Proof.

A change in the order of integration shows that $\int_{0}^{t}B(u)F_{j}(\hskip 1.0pt\mathrm{d}u)=\int_{0}^{t}F_{j}(t\mathbin{\mid}s)\tilde{F}_{0}(\hskip 1.0pt\mathrm{d}s)$. Split up $F_{j}(t)=\tilde{F}_{j}(t)+\int_{0}^{t}\operatorname{P}(T\leq t,D=j\mathbin{\mid}\tilde{T}=s,\tilde{D}=0)\tilde{F}_{0}(\hskip 1.0pt\mathrm{d}s)$, where $\tilde{F}_{j}(t)=\int_{0}^{t}a_{j}(s)F_{j}(\hskip 1.0pt\mathrm{d}s)$, and combine the two to obtain (11). The argument for (12) is similar. ∎

Appendix 2

Convergence of the Aalen–Johansen estimator

For an i.i.d. sample $(\tilde{T}_{1},\tilde{D}_{1}),\ldots,(\tilde{T}_{n},\tilde{D}_{n})$ of $(\tilde{T},\tilde{D})$ , we let $\hat{H}_{j,n}$ denote the Nelson–Aalen estimator for $H_{j}$ and $\hat{H}_{n}=\sum_{j=1}^{d}\hat{H}_{j,n}$ . If we define the $(d+1)\times(d+1)$ matrix

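$$\hat{\mathbf{H}}_{n}(t)=\begin{pmatrix}-\hat{H}_{n}(t)&\hat{H}_{1,n}(t)&\dots&\hat{H}_{d,n}(t)\\0&0&\dots&0\\\vdots&\vdots&\ddots&\vdots\\0&0&\dots&0\end{pmatrix},$$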

then the Aalen–Johansen estimator is defined as

$$\hat{\mathbf{P}}_{n}(t)=\prodi_{0}^{t}(\mathbf{I}+\hat{\mathbf{H}}_{n}(\hskip 1.0pt\mathrm{d}s)).$$

By arguments similar to those in Section 4.2 of Gill & Johansen, (1990), we see that $\sup_{s\in[0,t]}|\hat{H}_{j,n}(s)-\tilde{H}_{j}(s)|\to 0$ almost surely for $n\to\infty$ for all $j$ and $t\in\mathcal{J}$, and thus also $\sup_{s\in[0,t]}\|\hat{\mathbf{H}}_{n}(s)-\tilde{\mathbf{H}}(s)\|\to 0$ almost surely for $n\to\infty$ for all $t\in\mathcal{J}$. By continuity of the product integral (Gill & Johansen, 1990), we conclude that

[TABLE]

almost surely as $n\to\infty$ for all $t\in\mathcal{J}$ .

The Kaplan–Meier estimator for the all-cause survival function $S$ is defined as

[TABLE]

which is just entry $(1,1)$ of $\hat{\mathbf{P}}_{n}(t)$ .
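For a finite sample, $\hat{\mathbf{H}}_{n}$ only jumps at observed exit times, so the product integral defining the Aalen–Johansen estimator is a finite matrix product. The following Python sketch is our own illustration, not code from the paper: it computes the Nelson–Aalen increments $\#\{\tilde{T}_{i}=t,\tilde{D}_{i}=j\}/\#\{\tilde{T}_{i}\geq t\}$, multiplies the matrices $\mathbf{I}+\Delta\hat{\mathbf{H}}_{n}(t)$ (only the first row is non-trivial, as for the matrix $\mathbf{H}$ in (1)), and reads off the Kaplan–Meier estimate as entry $(1,1)$, which is index `[0][0]` in Python. Exact fractions are used only to make the output easy to check by hand.

```python
from collections import Counter
from fractions import Fraction


def matmul(A, B):
    """Plain matrix product of two nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def aalen_johansen(times, types, d):
    """Aalen-Johansen estimate from a sample of exit times and exit types.

    types take values in {0, ..., d}, with 0 meaning censored. Returns a list of
    (t, P_hat(t)) over the distinct observed times; row 0 of P_hat(t) is
    (S_hat(t), F_hat_1(t), ..., F_hat_d(t)) and entry [0][0] is Kaplan-Meier.
    """
    counts = Counter(zip(times, types))
    identity = [[Fraction(int(i == j)) for j in range(d + 1)] for i in range(d + 1)]
    P, at_risk, path = identity, len(times), []
    for t in sorted(set(times)):
        # Nelson-Aalen increments: #{T~_i = t, D~_i = j} / #{T~_i >= t}.
        dH = [Fraction(counts[(t, j)], at_risk) for j in range(1, d + 1)]
        step = [row[:] for row in identity]   # I + dH_hat(t); only row 0 is non-trivial
        step[0][0] = 1 - sum(dH)
        step[0][1:] = dH
        P = matmul(P, step)
        path.append((t, P))
        at_risk -= sum(counts[(t, j)] for j in range(d + 1))
    return path


path = aalen_johansen([1, 2, 3, 4], [1, 0, 2, 1], d=2)
t_last, P_last = path[-1]
print(t_last, P_last[0])
```

With the four hypothetical observations above (one censored at time 2), row 0 at the last time is $(0,5/8,3/8)$: the Kaplan–Meier estimate has reached 0 and the two estimated cumulative incidences sum to 1, while the remaining rows stay equal to those of the identity matrix, in line with the structure described in Lemma 8.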

Appendix 3

Constructing latent times

Let us consider a probability space $(\Omega,\mathcal{F},\operatorname{P})$ on which random variables $\tilde{T}:\Omega\to(0,\infty)$ and $\tilde{D}:\Omega\to\{0,\dots,d\}$ are defined. We now want to extend the probability space in order to define a random variable $C$ that satisfies $C\geq\tilde{T}$ when $\tilde{D}\neq 0$ and $C=\tilde{T}$ when $\tilde{D}=0$, and that follows a certain conditional distribution given $(\tilde{T},\tilde{D})$. The desired conditional cumulative distribution function is given by $F_{C\mathbin{\mid}\tilde{T},\tilde{D}}(c\mathbin{\mid}\tilde{t},\tilde{d})=\mathbf{1}(\tilde{t}\leq c)\mathbf{1}(\tilde{d}=0)+(1-\prodi_{\tilde{t}-}^{c}(1-\check{H}_{0}(\hskip 1.0pt\mathrm{d}u)))\mathbf{1}(\tilde{t}\leq c)\mathbf{1}(\tilde{d}\neq 0)$, where $\check{H}_{0}$ is defined in (3) solely based on the distribution of $(\tilde{T},\tilde{D})$. For given $\tilde{t}$ and $\tilde{d}$, the function $c\mapsto F_{C\mathbin{\mid}\tilde{T},\tilde{D}}(c\mathbin{\mid}\tilde{t},\tilde{d})$ is right-continuous and increasing.

For a right-continuous and increasing function $f:\mathbb{R}\to\mathbb{R}$, the inversion given by $f^{\leftarrow}(u)=\inf\{x:f(x)\geq u\}\in\mathbb{R}\cup\{-\infty,\infty\}$, as defined, for instance, in Section II.2a of Asmussen & Glynn, (2007), is a useful concept. Since $f$ is right-continuous and increasing, $u\leq f(x)$ if and only if $f^{\leftarrow}(u)\leq x$. Extend the sample space to $\Omega^{\prime}=\Omega\times[0,1]$, the $\sigma$-algebra to $\mathcal{F}^{\prime}=\mathcal{F}\times\mathcal{B}([0,1])$, where $\mathcal{B}$ denotes the Borel $\sigma$-algebra, and the probability measure to $\operatorname{P}^{\prime}$ given by $\operatorname{P}^{\prime}(A\times(u,v])=\operatorname{P}(A)(v-u)$.

With these extensions, the random variable $U$ defined on the probability space $(\Omega^{\prime},\mathcal{F}^{\prime},\operatorname{P}^{\prime})$ and given by $U(\omega^{\prime})=u$ for $\omega^{\prime}=(\omega,u)\in\Omega^{\prime}$ follows a uniform distribution on $[0,1]$ and is independent of $(\tilde{T},\tilde{D})$. The random variable defined by $C(\omega^{\prime})=F_{C\mathbin{\mid}\tilde{T},\tilde{D}}^{\leftarrow}(u\mathbin{\mid}\tilde{T}(\omega),\tilde{D}(\omega))$ for $\omega^{\prime}=(\omega,u)$, where $u\mapsto F_{C\mathbin{\mid}\tilde{T},\tilde{D}}^{\leftarrow}(u\mathbin{\mid}\tilde{t},\tilde{d})$ is the inversion of $c\mapsto F_{C\mathbin{\mid}\tilde{T},\tilde{D}}(c\mathbin{\mid}\tilde{t},\tilde{d})$, now fulfills $C(\omega^{\prime})\geq\tilde{T}(\omega)$ when $\tilde{D}(\omega)\neq 0$ and $C(\omega^{\prime})=\tilde{T}(\omega)$ when $\tilde{D}(\omega)=0$, and has the desired conditional distribution. Renaming $\operatorname{P}^{\prime}$ to $\operatorname{P}$, we note that, for $t\leq s$,

[TABLE]

where the last equation uses $\tilde{S}(s)=\prodi_{0}^{s}(1-\check{H}_{0}(\hskip 1.0pt\mathrm{d}u))\prodi_{0}^{s}(1-\tilde{H}(\hskip 1.0pt\mathrm{d}u))$ and $\tilde{H}_{j}(s)=\int_{0}^{s}\tilde{S}(u-)^{-1}\tilde{F}_{j}(\hskip 1.0pt\mathrm{d}u)$ . Since $\prodi_{0}^{s}(1-\tilde{H}(\hskip 1.0pt\mathrm{d}u))+\sum_{j=1}^{d}\int_{0}^{s}\prodi_{0}^{u-}(1-\tilde{H}(\hskip 1.0pt\mathrm{d}v))\tilde{H}_{j}(\hskip 1.0pt\mathrm{d}u)=1$ , these equalities can also be used to establish that

[TABLE]

This reveals that the constructed $C$ is proper if and only if $\prodi_{0}^{s}(1-\check{H}_{0}(\hskip 1.0pt\mathrm{d}u))\to 0$ for $s\to\infty$.
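In a discrete setting, the inversion construction of $C$ can be sketched directly. Below, `generalized_inverse` implements $f^{\leftarrow}(u)=\inf\{x:f(x)\geq u\}$ for a right-continuous step function via binary search, and `conditional_cdf_of_C` builds $c\mapsto F_{C\mid\tilde{T},\tilde{D}}(c\mid\tilde{t},\tilde{d})$ from hypothetical jumps of $\check{H}_{0}$ on a grid. Drawing $U$ uniformly and applying the inverse then gives a draw of $C$ with the desired conditional distribution; the code also checks the equivalence $u\leq f(x)\iff f^{\leftarrow}(u)\leq x$ on a grid of levels. The grid, jump sizes and seed are illustrative assumptions, not quantities from the paper.

```python
import bisect
import math
import random
from fractions import Fraction as Fr


def generalized_inverse(xs, cdf, u):
    """f^{<-}(u) = inf{x : f(x) >= u} for a right-continuous step function f.

    f jumps only at the increasing points xs and f(xs[k]) = cdf[k]; assumes u > 0.
    Returns math.inf when no point reaches level u (an improper distribution).
    """
    k = bisect.bisect_left(cdf, u)  # first index k with cdf[k] >= u
    return xs[k] if k < len(xs) else math.inf


def conditional_cdf_of_C(t_obs, d_obs, grid, dH0_check):
    """Support points and CDF of C given (T~, D~) = (t_obs, d_obs), discrete case.

    grid and dH0_check are hypothetical jump times and jump sizes of H0_check.
    """
    if d_obs == 0:                   # censored: C = T~ with probability one
        return [t_obs], [Fr(1)]
    xs, cdf, surv = [], [], Fr(1)
    for u, dh in zip(grid, dH0_check):
        if u >= t_obs:               # product over [t_obs, c], i.e. starting from t_obs-
            surv *= 1 - dh
            xs.append(u)
            cdf.append(1 - surv)
    return xs, cdf


GRID, DH0_CHECK = [1, 2, 3, 4], [Fr(1, 4), Fr(0), Fr(1, 2), Fr(1)]
xs, cdf = conditional_cdf_of_C(2, 1, GRID, DH0_CHECK)

# Equivalence u <= f(x) <=> f^{<-}(u) <= x, checked at the support points.
levels = [Fr(k, 20) for k in range(1, 21)]
galois_ok = all((u <= f) == (generalized_inverse(xs, cdf, u) <= x)
                for u in levels for x, f in zip(xs, cdf))

# U is drawn in (0, 1]; the excluded value u = 0 has probability zero.
rng = random.Random(7)
draws = [generalized_inverse(xs, cdf, 1 - rng.random()) for _ in range(10_000)]
print(xs, cdf, galois_ok, draws.count(3) / len(draws))
```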

In the same setting as above, we now want to construct $(T,D)$ such that $(T,D)=(\tilde{T},\tilde{D})$ when $\tilde{D}\neq 0$ and $T>\tilde{T}$ when $\tilde{D}=0$, and such that $(T,D)$ has a certain conditional distribution given $(\tilde{T},\tilde{D})$. Here, the desired conditional cumulative distribution function is given by $F_{T,D\mathbin{\mid}\tilde{T},\tilde{D}}(t,j\mathbin{\mid}\tilde{t},\tilde{d})=\mathbf{1}(\tilde{t}\leq t)\mathbf{1}(0<\tilde{d}\leq j)+\sum_{k=1}^{j}\int_{\tilde{t}}^{t}\prodi_{\tilde{t}}^{s-}(1-\tilde{H}(\hskip 1.0pt\mathrm{d}u))\tilde{H}_{k}(\hskip 1.0pt\mathrm{d}s)\mathbf{1}(\tilde{d}=0)\mathbf{1}(\tilde{t}\leq t)$. This can be achieved in a manner similar to the above. A two-step procedure is to first construct $D$ according to the conditional cumulative distribution function given by $F_{D\mathbin{\mid}\tilde{T},\tilde{D}}(j\mathbin{\mid}\tilde{t},\tilde{d})=F_{T,D\mathbin{\mid}\tilde{T},\tilde{D}}(\infty,j\mathbin{\mid}\tilde{t},\tilde{d})(F_{T,D\mathbin{\mid}\tilde{T},\tilde{D}}(\infty,d\mathbin{\mid}\tilde{t},\tilde{d}))^{-1}$ and then to construct $T$ according to the conditional cumulative distribution function given by

[TABLE]

where division by 0 is taken to produce 0. In particular, the pair $(T,D)$ so constructed satisfies $\operatorname{P}(T\leq t,D=j\mathbin{\mid}\tilde{T}=s,\tilde{D}=0)=\int_{s}^{t}\prodi_{s}^{u-}(1-\tilde{H}(\hskip 1.0pt\mathrm{d}v))\tilde{H}_{j}(\hskip 1.0pt\mathrm{d}u)$ for $t\geq s$, $s\in\mathcal{J}$ and $j=1,\dots,d$, and so also

[TABLE]

These constructions lead to the following proposition, which has similarities to Theorem 2 of Tsiatis, (1975).

Proposition 13.

Given a positive random variable $\tilde{T}$ and a random variable $\tilde{D}$ with values in $\{0,\dots,d\}$, there exist positive random variables $T$ and $C$ and a random variable $D$ with values in $\{1,\ldots,d\}$ such that $\tilde{T}=T\wedge C$, $\tilde{D}=D\mathbf{1}(T\leq C)$, and $(T,D)$ is independent of $C$.

Proof.

Construct $C$ , $T$ and $D$ as described above. The distribution of $C$ is given by $\operatorname{P}(C>t)=\prodi_{0}^{t}(1-\check{H}_{0}(\hskip 1.0pt\mathrm{d}s))$ and the distribution of $(T,D)$ is given by $\operatorname{P}(T\leq t,D=j)=\int_{0}^{t}\prodi_{0}^{s-}(1-\tilde{H}(\hskip 1.0pt\mathrm{d}u))\tilde{H}_{j}(\hskip 1.0pt\mathrm{d}s)$ . Equation (16) reveals that, for $t\leq s$ , $\operatorname{P}(T\leq t,D=j,C>s)=\operatorname{P}(\tilde{T}\leq t,\tilde{D}=j,C>s)=\operatorname{P}(C>s)\operatorname{P}(T\leq t,D=j)$ . By construction we have, for $t>s$ , $\operatorname{P}(T\leq t,D=j\mathbin{\mid}\tilde{T}=s,\tilde{D}=0)=\operatorname{P}(T\leq t,D=j\mathbin{\mid}T>s)$ . According to Proposition 3, this implies $\operatorname{P}(T\leq t,D=j\mathbin{\mid}\tilde{T}>s)=\operatorname{P}(T\leq t,D=j\mathbin{\mid}T>s)$ for all $t\geq 0$ and $s\in\mathcal{J}$ . Using this and (4), we have, for $t>s$ with $s\in\mathcal{J}$ ,

[TABLE]

Both sides are 0 if $s\in(0,\infty)\backslash\mathcal{J}$ . We have thereby established that $\operatorname{P}(T\leq t,D=j,C>s)=\operatorname{P}(T\leq t,D=j)\operatorname{P}(C>s)$ for all $t,s\geq 0$ and $j=1,\dots,d$ , and so $(T,D)$ and $C$ are independent. ∎
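As a consistency check in the converse direction, the formula $\operatorname{P}(T\leq t,D=j)=\int_{0}^{t}\prodi_{0}^{s-}(1-\tilde{H}(\mathrm{d}u))\tilde{H}_{j}(\mathrm{d}s)$ used in the proof can be verified exactly in a discrete toy example. The sketch assumes $(T,D)$ and $C$ are independent with finitely many atoms, builds $(\tilde{T},\tilde{D})$ with ties resolved as events ($T\leq C$), and takes $\Delta\tilde{H}_{j}(t)=\operatorname{P}(\tilde{T}=t,\tilde{D}=j)/\operatorname{P}(\tilde{T}\geq t)$ with $0/0=0$; this discrete form of the observed cause-specific hazards is our assumption. Recovery of the full distribution also requires $\operatorname{P}(C\geq t)>0$ wherever $\operatorname{P}(T\geq t)>0$. All distributions and function names are hypothetical; exact fractions avoid rounding.

```python
from fractions import Fraction as Fr


def observed_hazards(p_td, p_c, times, d):
    """Jumps of H~_j at each grid time for independent (T, D) and C.

    p_td[(t, j)] = P(T = t, D = j); p_c[c] = P(C = c). Ties T = C count as events.
    """
    def p_c_ge(t):  # P(C >= t)
        return sum(p for c, p in p_c.items() if c >= t)

    def p_obs(t, j):  # P(T~ = t, D~ = j) = P(T = t, D = j) P(C >= t)
        return p_td.get((t, j), 0) * p_c_ge(t)

    def p_at_risk(t):  # P(T~ >= t) = P(T >= t) P(C >= t)
        return sum(p for (u, _), p in p_td.items() if u >= t) * p_c_ge(t)

    return {(t, j): (p_obs(t, j) / p_at_risk(t) if p_at_risk(t) else Fr(0))
            for t in times for j in range(1, d + 1)}


def cumulative_incidence(hz, times, d, j, t):
    """Sum over grid s <= t of prod_{u < s} (1 - sum_k dH~_k(u)) * dH~_j(s)."""
    total, surv = Fr(0), Fr(1)
    for s in times:
        if s > t:
            break
        total += surv * hz[(s, j)]
        surv *= 1 - sum(hz[(s, k)] for k in range(1, d + 1))
    return total
```

For instance, with $\operatorname{P}(T=1,D=1)=1/4$, $\operatorname{P}(T=2,D=2)=1/4$, $\operatorname{P}(T=3,D=1)=1/2$ and $C$ uniform on $\{2,3\}$, `cumulative_incidence` returns $3/4$ for cause 1 at $t=3$, matching $\operatorname{P}(T\leq 3,D=1)$.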

Bibliography

Aalen, OO, & Johansen, S. 1978. An empirical transition matrix for non-homogeneous Markov chains based on censored observations. Scandinavian Journal of Statistics, 141–150.
Andersen, PK, & Keiding, N. 2006. Survival and event history analysis. Wiley.
Andersen, PK, Borgan, Ø, Gill, RD, & Keiding, N. 1993. Statistical models based on counting processes. Springer Series in Statistics. Springer-Verlag, New York.
Asmussen, S, & Glynn, PW. 2007. Stochastic simulation: algorithms and analysis. Vol. 57. Springer Science & Business Media.
Ebrahimi, N, Molefe, D, & Ying, Z. 2003. Identifiability and censored data. Biometrika, 90(3), 724–727.
Elandt-Johnson, RC. 1976. Conditional failure time distributions under competing risk theory with dependent failure times and proportional hazard rates. Scandinavian Actuarial Journal, 1976(1), 37–51.
Fleming, TR, & Harrington, DP. 1991. Counting Processes and Survival Analysis. John Wiley & Sons.
Gail, M. 1975. A review and critique of some models used in competing risk analysis. Biometrics, 209–222.
