Bayesian nonparametric estimation of survival functions with multiple-samples information

Alan Riva Palacio; Fabrizio Leisen

arXiv:1704.07645 · stat.ME · March 20, 2018


TL;DR

This paper introduces a Bayesian nonparametric approach for estimating survival functions that accounts for dependence among multiple samples, extending previous models to higher dimensions with theoretical and practical validation.

Contribution

It develops a flexible, dependent vector of nonparametric priors for survival analysis, extending existing models to arbitrary dimensions with theoretical insights.

Findings

1. Model effectively captures dependence among groups
2. Theoretical results on posterior behavior are established
3. Performance validated on simulated Clayton copula data

Abstract

In many real problems, dependence structures more general than exchangeability are required. For instance, in some settings partial exchangeability is a more reasonable assumption. For this reason, vectors of dependent Bayesian nonparametric priors have recently gained popularity. They provide flexible models which are tractable from a computational and theoretical point of view. In this paper, we focus on their use for estimating survival functions with multiple-samples information. Our methodology allows one to model the dependence among survival times of different groups of observations and extends previous work to an arbitrary dimension. Theoretical results about the posterior behaviour of the underlying dependent vector of completely random measures are provided. The performance of the model is tested on a simulated dataset arising from a distributional Clayton copula.

Equations

$$S(t)=\mathbb{P}[Y>t\mid\mu]=e^{-\mu(0,t]}$$

$$\lim_{t\to\infty}\mu(0,t]=\infty$$

$$S(\boldsymbol{t})=\mathbb{P}\big[Y^{(1)}_{i_{1}}>t_{1},\dots,Y^{(d)}_{i_{d}}>t_{d}\mid(\mu_{1},\dots,\mu_{d})\big]=e^{-\mu_{1}(0,t_{1}]-\cdots-\mu_{d}(0,t_{d}]}$$

$$Y^{(i)}_{1},\dots,Y^{(i)}_{n_{i}}\stackrel{\text{i.i.d.}}{\sim}\text{NTR}(\mu_{i})$$

$$\big\{Z_{1},\ldots,Z_{m}\big\}\stackrel{\text{d}}{=}\big\{Z_{\pi(1)},\ldots,Z_{\pi(m)}\big\}$$

$$\big\{(Z_{i}^{(1)},Z_{i}^{(2)})\big\}_{i=1}^{\infty}$$

$$\big\{Z_{1}^{(1)},\ldots,Z_{m_{1}}^{(1)},Z_{1}^{(2)},\ldots,Z_{m_{2}}^{(2)}\big\}\stackrel{\text{d}}{=}\big\{Z_{\pi_{1}(1)}^{(1)},\ldots,Z_{\pi_{1}(m_{1})}^{(1)},Z_{\pi_{2}(1)}^{(2)},\ldots,Z_{\pi_{2}(m_{2})}^{(2)}\big\}$$

$$\mu=\mu_{d}+\mu_{r}+\mu_{fl}$$

$$\mu_{r}=\sum_{i=1}^{\infty}W_{i}\,\delta_{X_{i}}$$

$$\mathbb{E}\big[e^{-\lambda\mu_{r}(A)}\big]=e^{-\int_{\mathbb{R}^{+}\times A}(1-e^{-\lambda s})\,\nu(\mathrm{d}s,\mathrm{d}x)}$$

$$\int_{\mathbb{R}^{+}\times A}\min\{s,1\}\,\nu(\mathrm{d}s,\mathrm{d}x)<\infty$$

$$\nu(\mathrm{d}s,\mathrm{d}x)=\rho(\mathrm{d}s)\,\alpha(\mathrm{d}x)$$

$$\nu(\mathrm{d}s,\mathrm{d}x)=\frac{A\sigma s^{-1-\sigma}}{\Gamma(1-\sigma)}\,\mathrm{d}s\,\alpha(\mathrm{d}x)$$

$$\mathbb{E}\big[e^{-\lambda_{1}\mu_{1}(A)-\cdots-\lambda_{d}\mu_{d}(A)}\big]=e^{-\int_{(\mathbb{R}^{+})^{d}\times A}(1-e^{-\lambda_{1}s_{1}-\cdots-\lambda_{d}s_{d}})\,\rho_{d}(\mathrm{d}s_{1},\dots,\mathrm{d}s_{d})\,\alpha(\mathrm{d}x)}$$

$$\mathbb{E}\big[e^{-\lambda_{1}\mu_{1}(0,t]-\cdots-\lambda_{d}\mu_{d}(0,t]}\big]=e^{-\psi_{t}(\boldsymbol{\lambda})}$$

$$\nu_{i}(A)=\int_{A}\nu_{i}(\mathrm{d}s)=\int_{(\mathbb{R}^{+})^{d-1}}\rho_{d}(\mathrm{d}s_{1},\dots,\mathrm{d}s_{i-1},A,\mathrm{d}s_{i+1},\dots,\mathrm{d}s_{d})$$

$$\sum_{\{\boldsymbol{v}\,:\,\boldsymbol{v}\text{ is a vertex of }B\}}\operatorname{sign}(\boldsymbol{v})\,\mathcal{C}(\boldsymbol{v})\geq 0$$

$$\operatorname{sign}(\boldsymbol{v})=\begin{cases}1, & \text{if } v_{k}=s_{k} \text{ for an even number of indices } k,\\ -1, & \text{if } v_{k}=s_{k} \text{ for an odd number of indices } k.\end{cases}$$

$$\mathcal{C}_{\perp,d}(\boldsymbol{s})=s_{1}\mathbf{1}_{\{s_{2}=\cdots=s_{d}=\infty\}}+\cdots+s_{d}\mathbf{1}_{\{s_{1}=\cdots=s_{d-1}=\infty\}}$$

$$\mathcal{C}_{\parallel,d}(\boldsymbol{s})=\min\{s_{1},\dots,s_{d}\}$$

$$\mathcal{C}_{\theta,d}(\boldsymbol{s})=\big(s_{1}^{-\theta}+\cdots+s_{d}^{-\theta}\big)^{-1/\theta}$$

$$\lim_{\theta\to 0}\mathcal{C}_{\theta,d}(\boldsymbol{s})=\mathcal{C}_{\perp,d}(\boldsymbol{s})\quad\text{and}\quad\lim_{\theta\to\infty}\mathcal{C}_{\theta,d}(\boldsymbol{s})=\mathcal{C}_{\parallel,d}(\boldsymbol{s})$$

$$U(\boldsymbol{x})=\int_{x_{1}}^{\infty}\cdots\int_{x_{d}}^{\infty}\left.\frac{\partial^{d}\mathcal{C}_{d}(\boldsymbol{u})}{\partial u_{1}\cdots\partial u_{d}}\right|_{u_{1}=U_{1}(s_{1}),\dots,u_{d}=U_{d}(s_{d})}\nu_{1}(s_{1})\cdots\nu_{d}(s_{d})\,\mathrm{d}\boldsymbol{s}$$

$$\rho_{d}(\boldsymbol{s})=\left.\frac{\partial^{d}\mathcal{C}_{d}(\boldsymbol{u})}{\partial u_{1}\cdots\partial u_{d}}\right|_{u_{1}=U_{1}(s_{1}),\dots,u_{d}=U_{d}(s_{d})}\nu_{1}(s_{1})\cdots\nu_{d}(s_{d})$$

$$\rho_{d,\theta,A,\sigma}(\boldsymbol{s})=\frac{A(1+\theta)(1+2\theta)\cdots(1+(d-1)\theta)\,\sigma^{d}\,(s_{1}s_{2}\cdots s_{d})^{\sigma\theta-1}}{\Gamma(1-\sigma)\,\big(s_{1}^{\sigma\theta}+\cdots+s_{d}^{\sigma\theta}\big)^{\frac{1}{\theta}+d}}$$

$$\rho_{d,A,\sigma}(\boldsymbol{s})=\frac{A(\sigma+1)(\sigma+2)\cdots(\sigma+d-1)\,\sigma}{\Gamma(1-\sigma)\,(s_{1}+\cdots+s_{d})^{\sigma+d}}$$

$$\psi_{d,A,\sigma}(\boldsymbol{\lambda})=A\sum_{i=1}^{d}\frac{\lambda_{i}^{\sigma+d-1}}{\prod_{j=1,\,j\neq i}^{d}(\lambda_{i}-\lambda_{j})},\qquad\lambda_{i}\neq\lambda_{j}\text{ for }j\neq i$$

$$\big\{\{Y^{(i)}_{j}\}_{j=1}^{\infty}\big\}_{i=1}^{d}$$

$$\mathbb{P}\big[Y_{1}^{(1)}>t_{1,1},\dots,Y_{n_{1}}^{(1)}>t_{1,n_{1}},\dots,Y_{1}^{(d)}>t_{d,1},\dots,Y_{n_{d}}^{(d)}>t_{d,n_{d}}\mid(\mu_{1},\dots,\mu_{d})\big]=\prod_{i=1}^{d}\prod_{j=1}^{n_{i}}e^{-\mu_{i}(0,t_{i,j}]}$$


Taxonomy

Topics: Bayesian Methods and Mixture Models · Financial Risk and Volatility Modeling · Statistical Methods and Inference

Full text


Alan Riva Palacio and Fabrizio Leisen

School of Mathematics, Statistics and Actuarial Sciences, University of Kent

Sibson Building, Canterbury, Kent CT2 7FS

Abstract

In many real problems, dependence structures more general than exchangeability are required. For instance, in some settings partial exchangeability is a more reasonable assumption. For this reason, vectors of dependent Bayesian nonparametric priors have recently gained popularity. They provide flexible models which are tractable from a computational and theoretical point of view. In this paper, we focus on their use for estimating multivariate survival functions. Our model extends the work of Epifani and Lijoi (2010) to an arbitrary dimension and allows one to model the dependence among survival times of different groups of observations. Theoretical results about the posterior behaviour of the underlying dependent vector of completely random measures are provided. The performance of the model is tested on a simulated dataset arising from a distributional Clayton copula.

1 Introduction

Bayesian nonparametric modelling in survival analysis often relies on the assumption that the observed times are exchangeable; see, for example, [5] and [10]. This assumption fails when events are pooled from different but dependent scenarios. For example, consider patients under the same treatment but in different hospitals. The survival times of patients from the same hospital could be assumed exchangeable. However, this is not a reasonable assumption for patients from different hospitals, since factors specific to each hospital might exert a significant influence. In general, we can consider data originating from $d$ different but related studies. Formally, we have $d$ sets of observations, and exchangeability is assumed only within each set. In such cases, it is more appropriate to assume a form of dependence called partial exchangeability (see Section 2 for a formal account of exchangeability and partial exchangeability). This motivates the extension of Bayesian nonparametric models to a partially exchangeable setting in which multiple-samples information can be used.

Applications of Bayesian nonparametrics in survival analysis go back, for example, to [5] and [8], who used non-decreasing independent increment processes to construct random survival functions. [6] and [16] focused on random hazard rates. More recently, [10] used a general class of models based on random hazard rates, and [19] used a general model of short-term and long-term hazard ratios. There is an ongoing effort in Bayesian nonparametrics to propose flexible dependent random probability measures, as set forth in the seminal work of [17]. In survival analysis, for example, [4] introduced a model based on a dependent Dirichlet process. In a partially exchangeable setting, survival analysis models have been used, for example, in [7], where a dependent two-dimensional extension of the neutral to the right (NTR) model was introduced, and in [15], where a dependent vector of hazard rates was constructed. [9] introduced a new class of vectors of dependent completely random measures, called Compound Random Measures, where the dependence contribution is modelled with a parametric distribution.

In the seminal work of [5], the NTR model for survival functions was introduced. The NTR model can be expressed in terms of a Completely Random Measure (CRM) $\mu$, that is, a random measure which, when evaluated at pairwise disjoint sets, gives rise to mutually independent nonnegative random variables. We say that a positive random variable $Y$ has an NTR distribution given by a CRM $\mu$, denoted $Y\sim\text{NTR}(\mu)$, if

$$\mathbb{P}[Y>t\mid\mu]=e^{-\mu(0,t]},\tag{1}$$

where $\mu$ is such that

$$\lim_{t\to\infty}\mu(0,t]=\infty.$$

NTR distributions have several appealing properties, including the independence of normalized increments and posterior conjugacy for right-censored data. An extension of the NTR model to a partially exchangeable setting was given by [7] for the 2-dimensional case. In the present work, we follow the approach of [7] and focus on models based on a $d$-dimensional *vector of completely random measures* (VCRM). More precisely, we consider $d$ collections of survival times $\{Y^{(1)}_{j}\}_{j=1}^{\infty},\ldots,\{Y^{(d)}_{j}\}_{j=1}^{\infty}$ such that, for $\boldsymbol{t}=(t_{1},\ldots,t_{d})\in(\mathbb{R}^{+})^{d}$,

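$$\mathbb{P}\big[Y^{(1)}_{i_{1}}>t_{1},\dots,Y^{(d)}_{i_{d}}>t_{d}\mid(\mu_{1},\dots,\mu_{d})\big]=e^{-\mu_{1}(0,t_{1}]-\cdots-\mu_{d}(0,t_{d}]},\tag{2}$$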

with arbitrary $i_{1},\ldots,i_{d}\in\mathbb{N}\setminus\{0\}$. This model is convenient for data in which the dependence among the entries of the VCRM $\boldsymbol{\mu}=(\mu_{1},\ldots,\mu_{d})$ accounts for dependence among the multiple samples in a partially exchangeable setting. Furthermore, marginally we recover the NTR model, namely

$$Y^{(i)}_{1},\dots,Y^{(i)}_{n_{i}}\stackrel{\text{i.i.d.}}{\sim}\text{NTR}(\mu_{i}),$$

with $i\in\{1,\ldots,d\}$ and $n_{i}\in\mathbb{N}\setminus\{0\}$. In (2), we want to model the dependence of the VCRM $\boldsymbol{\mu}$ in a way that keeps the marginal behaviour fixed, so as to exploit the fact that marginally we recover an NTR model. Lévy copulas are a natural framework for modelling the dependence structure of VCRM’s in this way.
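The following Python sketch makes the product form of the multivariate NTR model concrete. It simulates a 2-dimensional vector of CRM’s on a time grid and evaluates the conditional survival functions $e^{-\mu_i(0,t]}$. It does not use a Lévy copula. As an assumption made only for illustration, it builds dependence by superposition, $\mu_i=\mu_0+\eta_i$, where $\mu_0,\eta_1,\eta_2$ are independent gamma CRM’s with base measure proportional to Lebesgue measure. Each $\mu_i$ is then marginally a gamma CRM, so the Monte Carlo averages can be checked against the gamma Laplace transform $\mathbb{E}[e^{-\mu_i(0,t]}]=(b/(b+1))^{(a_0+a_1)t}$. The function names and the parameters `a0`, `a1` and `rate` are hypothetical.

```python
import numpy as np


def gamma_crm_paths(grid, shape_per_unit_time, rate, n_paths, rng):
    """Simulate mu(0, t] on an increasing grid of times t > 0 for a gamma CRM.

    Increments over disjoint intervals are independent Gamma(shape * dt, rate)
    draws, which is the complete-randomness property on the grid.
    """
    dt = np.diff(grid, prepend=0.0)
    increments = rng.gamma(shape=shape_per_unit_time * dt, scale=1.0 / rate,
                           size=(n_paths, len(grid)))
    return np.cumsum(increments, axis=1)


def joint_survival_paths(grid, a0, a1, rate, n_paths, seed=0):
    """Return conditional survival paths S_i(t) = exp(-mu_i(0, t]) for i = 1, 2.

    Dependence here is an illustrative assumption, not the copula construction:
    mu_i = mu_0 + eta_i with independent gamma CRMs mu_0, eta_1, eta_2.
    """
    rng = np.random.default_rng(seed)
    mu0 = gamma_crm_paths(grid, a0, rate, n_paths, rng)
    mu1 = mu0 + gamma_crm_paths(grid, a1, rate, n_paths, rng)
    mu2 = mu0 + gamma_crm_paths(grid, a1, rate, n_paths, rng)
    return np.exp(-mu1), np.exp(-mu2)


def gamma_marginal_mean_survival(t, a, rate):
    """E[exp(-mu(0, t])] for a gamma CRM with shape a * t and the given rate."""
    return (rate / (rate + 1.0)) ** (a * t)


grid = np.array([0.5, 1.0, 2.0])
S1, S2 = joint_survival_paths(grid, a0=1.0, a1=0.5, rate=2.0, n_paths=20000)
# Given (mu_1, mu_2): P[Y^(1) > 0.5, Y^(2) > 2.0] = S1(0.5) * S2(2.0).
joint = (S1[:, 0] * S2[:, 2]).mean()
print(S1[:, 1].mean(), gamma_marginal_mean_survival(1.0, 1.5, 2.0), joint)
```

Superposition keeps the joint increments over disjoint intervals independent, so it gives a valid vector of CRM’s. It is simply the easiest such construction to simulate exactly.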

In this paper we provide a posterior characterization for the above model; see Theorem 1. Similarly to [7] in the 2-dimensional setting, we show that the posterior distribution corresponds to a survival function of the type in (1), leading to a conjugacy property. Extensions of some further results in [7] are also provided. We stress that deriving such results is not trivial in arbitrary dimension. In particular, Proposition 1 gives a general expression for the Laplace exponent when a Lévy copula is used to set the dependence of the VCRM underlying the $d$-dimensional NTR model, and Proposition 3 gives an alternative characterization of the multivariate NTR model. Furthermore, other theoretical results are proved in order to facilitate the calculation of posterior means when the inference is implemented. Finally, we illustrate the methodology on a synthetic dataset.

The paper is organized as follows: Section 2 presents the preliminary notions which are needed in this work. In Section 3 we extend some results in [7] to the multivariate setting. In particular, we state the posterior characterization of the model and provide some useful corollaries for implementing the posterior inference. In Section 4, an application with synthetic data is illustrated. All the proofs can be found in the appendix.

2 Preliminaries

In this section, we provide some preliminaries about exchangeability, partial exchangeability and vectors of completely random measures which are the building blocks of our Bayesian nonparametric proposal. Furthermore, we will illustrate the concept of a positive Lévy copula which is useful to model the dependence structure between the components of a vector of completely random measures.

2.1 Exchangeability and Partial exchangeability

Let $\mathbb{Z}$ be a complete and separable metric space, with corresponding Borel $\sigma$-algebra $\mathcal{Z}=\mathcal{B}(\mathbb{Z})$.

Definition 1.

A collection of random variables $\{Z_{i}\}_{i=1}^{\infty}$ in $\mathbb{Z}$ is exchangeable if, for any $m\geq 1$ and any permutation $\pi$ of $\{1,\dots,m\}$, we have that

$$\big\{Z_{1},\ldots,Z_{m}\big\}\stackrel{\text{d}}{=}\big\{Z_{\pi(1)},\ldots,Z_{\pi(m)}\big\}.$$

As highlighted in the Introduction, in several problems the exchangeability assumption is too restrictive. In particular, we considered $d$ groups of observations where the order in which observations are collected within each group is irrelevant. To describe this setting we resort to the notion of partial exchangeability, as set forth by [3]. It formalizes the idea of partitioning the entire set of observations into a certain number of classes, say $d$, in such a way that exchangeability may reasonably be assumed within each class. For ease of exposition, we confine ourselves to the case $d=2$.

Definition 2.

The collection of random vectors

$$\big\{(Z_{i}^{(1)},Z_{i}^{(2)})\big\}_{i=1}^{\infty}$$

in $\mathbb{Z}^{2}$ is partially exchangeable if, for any $m_{1},m_{2}\geq 1$ and for all permutations $\pi_{1}$ and $\pi_{2}$ of $\{1,\dots,m_{1}\}$ and $\{1,\dots,m_{2}\}$ respectively, we have that

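$$\big\{Z_{1}^{(1)},\ldots,Z_{m_{1}}^{(1)},Z_{1}^{(2)},\ldots,Z_{m_{2}}^{(2)}\big\}\stackrel{\text{d}}{=}\big\{Z_{\pi_{1}(1)}^{(1)},\ldots,Z_{\pi_{1}(m_{1})}^{(1)},Z_{\pi_{2}(1)}^{(2)},\ldots,Z_{\pi_{2}(m_{2})}^{(2)}\big\}.$$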

2.2 Vectors of completely random measures

Given a complete and separable metric space $\mathbb{X}$, with corresponding Borel $\sigma$-algebra $\mathcal{X}=\mathcal{B}(\mathbb{X})$, we call a measure $\mu$ on $(\mathbb{X},\mathcal{X})$ boundedly finite if $\mu(A)<\infty$ for any bounded set $A\in\mathcal{X}$. A random measure is a measurable function from a probability space $(\Omega,\mathcal{F},\mathbb{P})$ into $(\mathbb{M}_{\mathbb{X}},\mathcal{M}_{\mathbb{X}})$. This is the measurable space formed by $\mathbb{M}_{\mathbb{X}}$, the space of boundedly finite measures on $(\mathbb{X},\mathcal{X})$, and its corresponding Borel $\sigma$-algebra $\mathcal{M}_{\mathbb{X}}$. In particular, we focus on the class of completely random measures as introduced in [12].

Definition 3.

A random measure $\mu$ on a complete and separable metric space $\mathbb{X}$ with corresponding Borel $\sigma$ -algebra $\,\mathcal{X}=\mathcal{B}(\mathbb{X})$ is called a completely random measure (CRM) if for any collection of disjoint sets $\{A_{1},\dots,A_{n}\}\subset\mathcal{X}$ the random variables $\mu(A_{1}),\dots,\mu(A_{n})$ are mutually independent.

A CRM $\mu$ has the following representation [12],

$$\mu=\mu_{d}+\mu_{r}+\mu_{fl},$$

where $\mu_{d}$ is a deterministic measure, $\mu_{fl}$ is a measure that consists of jumps with possibly random heights but fixed locations, and

$$\mu_{r}=\sum_{i=1}^{\infty}W_{i}\,\delta_{X_{i}},$$

where, for $i\in\{1,2,\dots\}$, $X_{i}\in\mathbb{X}$ are random jump locations and $W_{i}\in\mathbb{R}^{+}$ are random jump heights. The measures $\mu_{d}$, $\mu_{fl}$ and $\mu_{r}$ are mutually independent. In particular, $\mu_{r}$ is again a CRM and is characterized by the following Laplace transform

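$$\mathbb{E}\big[e^{-\lambda\mu_{r}(A)}\big]=e^{-\int_{\mathbb{R}^{+}\times A}(1-e^{-\lambda s})\,\nu(\mathrm{d}s,\mathrm{d}x)},\tag{3}$$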

where $\lambda>0$ and $\nu$ is a measure on $\mathbb{R}^{+}\times\mathbb{X}$ such that

$$\int_{\mathbb{R}^{+}\times A}\min\{s,1\}\,\nu(\mathrm{d}s,\mathrm{d}x)<\infty$$

for any bounded set $A\in\mathcal{X}$. The measure $\nu$ is usually called the Lévy intensity of $\mu_{r}$. In the remainder of this work we only consider CRM’s $\mu$ with neither fixed jump locations nor a deterministic part, so we take $\mu=\mu_{r}$ to be solely determined by (3). In particular, we focus on Lévy intensities $\nu$ which are homogeneous, i.e.

$$\nu(\mathrm{d}s,\mathrm{d}x)=\rho(\mathrm{d}s)\,\alpha(\mathrm{d}x),$$

where $\alpha$ is a non-atomic measure on $\mathbb{X}$ describing the jump locations and $\rho$ is a measure on $\mathbb{R}^{+}$ describing the jump heights. A popular example of a homogeneous CRM is the $\sigma$-stable process, given by

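$$\nu(\mathrm{d}s,\mathrm{d}x)=\frac{A\sigma s^{-1-\sigma}}{\Gamma(1-\sigma)}\,\mathrm{d}s\,\alpha(\mathrm{d}x).\tag{4}$$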

As an illustration, we plot in Figure 1 the associated process $\mu(0,t]$ for the $\sigma$ -stable process (4) with $\alpha(\mathrm{d}x)=\mathrm{d}x$ .
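As a quick numerical check of the Laplace transform (3) for the $\sigma$-stable intensity (4), the sketch below integrates $1-e^{-\lambda s}$ against $A\sigma s^{-1-\sigma}/\Gamma(1-\sigma)\,\mathrm{d}s$. It compares the result with the closed form $A\lambda^{\sigma}$, which follows from $\int_0^\infty(1-e^{-\lambda s})s^{-1-\sigma}\mathrm{d}s=\Gamma(1-\sigma)\lambda^{\sigma}/\sigma$ (integration by parts). With $\alpha(\mathrm{d}x)=\mathrm{d}x$ this gives $\mathbb{E}[e^{-\lambda\mu(0,t]}]=e^{-tA\lambda^{\sigma}}$. The sketch relies on SciPy's `scipy.integrate.quad`; the helper names are our own.

```python
import math

from scipy.integrate import quad


def stable_levy_density(s, A, sigma):
    """rho(s) for the sigma-stable CRM: A * sigma * s^(-1 - sigma) / Gamma(1 - sigma)."""
    return A * sigma * s ** (-1.0 - sigma) / math.gamma(1.0 - sigma)


def stable_laplace_exponent_numeric(lam, A, sigma):
    """psi(lam) = int_0^inf (1 - exp(-lam * s)) rho(s) ds, by numerical quadrature."""
    # -expm1(-x) = 1 - exp(-x), computed without cancellation for small x.
    integrand = lambda s: -math.expm1(-lam * s) * stable_levy_density(s, A, sigma)
    # Split at s = 1: integrable singularity near 0, slowly decaying tail.
    head, _ = quad(integrand, 0.0, 1.0)
    tail, _ = quad(integrand, 1.0, math.inf)
    return head + tail


def stable_laplace_exponent_exact(lam, A, sigma):
    return A * lam ** sigma


for lam in (0.5, 1.0, 3.0):
    print(lam, stable_laplace_exponent_numeric(lam, 2.0, 0.5),
          stable_laplace_exponent_exact(lam, 2.0, 0.5))
```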

We extend this framework to the multivariate setting by considering vectors $(\mu_{1},\dots,\mu_{d})$ where each $\mu_{i}$ is a homogeneous CRM on $(\mathbb{X},\mathcal{X})$ with Lévy intensity $\bar{\nu}_{i}(\mathrm{d}s,\mathrm{d}x)=\nu_{i}(\mathrm{d}s)\alpha(\mathrm{d}x)$. Moreover, we take the measure $\alpha$ to be smooth in the sense that $\alpha((0,t])=\gamma(t)$, where $\gamma:[0,\infty)\rightarrow\mathbb{R}^{+}$ is a non-decreasing and differentiable function such that $\gamma(0)=0$ and $\lim_{t\to\infty}\gamma(t)=\infty$. This last condition on the limit behaviour ensures that, marginally, we obtain the associated NTR cumulative distributions in our models. For any $A_{1},\dots,A_{n}$ in $\mathcal{X}$ with $A_{i}\cap A_{j}=\emptyset$ for $i\neq j$, the random vectors $(\mu_{1}(A_{i}),\dots,\mu_{d}(A_{i}))$, $i=1,\dots,n$, are mutually independent. Furthermore, there is a multivariate analogue of the Laplace transform (3)

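$$\mathbb{E}\big[e^{-\lambda_{1}\mu_{1}(A)-\cdots-\lambda_{d}\mu_{d}(A)}\big]=e^{-\int_{(\mathbb{R}^{+})^{d}\times A}(1-e^{-\lambda_{1}s_{1}-\cdots-\lambda_{d}s_{d}})\,\rho_{d}(\mathrm{d}s_{1},\dots,\mathrm{d}s_{d})\,\alpha(\mathrm{d}x)},$$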

where $\boldsymbol{\lambda}=(\lambda_{1},\dots,\lambda_{d})\in(\mathbb{R}^{+})^{d}$ and $\rho_{d}$ is a measure on $(\mathbb{R}^{+})^{d}$ . In particular, we introduce the notation for the multivariate Laplace transform

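$$\mathbb{E}\big[e^{-\lambda_{1}\mu_{1}(0,t]-\cdots-\lambda_{d}\mu_{d}(0,t]}\big]=e^{-\psi_{t}(\boldsymbol{\lambda})}.$$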

Henceforth, $\psi_{t}(\boldsymbol{\lambda})$ is called the Laplace exponent of $\boldsymbol{\mu}=(\mu_{1},\dots,\mu_{d})$. In the case at hand, $\psi_{t}(\boldsymbol{\lambda})=\gamma(t)\psi(\boldsymbol{\lambda})$, where $\psi(\boldsymbol{\lambda})=\int_{(\mathbb{R}^{+})^{d}}(1-e^{-\langle\boldsymbol{\lambda},\boldsymbol{s}\rangle})\rho_{d}(\mathrm{d}\boldsymbol{s})$ and $\langle\boldsymbol{\lambda},\boldsymbol{s}\rangle=\sum_{i=1}^{d}\lambda_{i}s_{i}$ is the usual inner product in $\mathbb{R}^{d}$. Marginalizing, we have that

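$$\nu_{i}(A)=\int_{A}\nu_{i}(\mathrm{d}s)=\int_{(\mathbb{R}^{+})^{d-1}}\rho_{d}(\mathrm{d}s_{1},\dots,\mathrm{d}s_{i-1},A,\mathrm{d}s_{i+1},\dots,\mathrm{d}s_{d}).$$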

In Section 3, we use this particular kind of homogeneous and additive vector of CRM’s to construct priors for survival analysis models.

2.3 Positive Lévy copulas

Although in this work we consider vectors of CRM’s with fixed marginal behaviour, it remains to specify the dependence structure. [11] introduced the concept of positive Lévy copulas, which allow one to construct vectors of CRM’s with fixed marginals.

Definition 4.

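A function $\mathcal{C}(\boldsymbol{s})$, $\boldsymbol{s}=(s_{1},\dots,s_{d})$, from $[0,\infty]^{d}$ to $[0,\infty]$ is a positive Lévy copula if

(i)

for every box $B=[s_{1},t_{1}]\times\dots\times[s_{d},t_{d}]\subset[0,\infty]^{d}$ with $s_{1}\leq t_{1},\dots,s_{d}\leq t_{d}$ we have that

$$\sum_{\{\boldsymbol{v}\,:\,\boldsymbol{v}\text{ is a vertex of }B\}}\operatorname{sign}(\boldsymbol{v})\,\mathcal{C}(\boldsymbol{v})\geq 0,$$

with

$$\operatorname{sign}(\boldsymbol{v})=\begin{cases}1, & \text{if } v_{k}=s_{k} \text{ for an even number of indices } k,\\ -1, & \text{if } v_{k}=s_{k} \text{ for an odd number of indices } k;\end{cases}$$

(ii)

if $\boldsymbol{s}$ is such that $s_{i}=0$ for some $i\in\{1,\dots,d\}$, then $\mathcal{C}(\boldsymbol{s})=0$;

(iii)

for $k\in\{1,\dots,d\}$, let $y_{1}=\dots=y_{k-1}=y_{k+1}=\dots=y_{d}=\infty$ and $\mathcal{C}_{k}(s)=\mathcal{C}(y_{1},\dots,y_{k-1},s,y_{k+1},\dots,y_{d})$; then $\mathcal{C}_{k}(s)=s$.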

For example, a vector of independent CRM’s is obtained with

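$$\mathcal{C}_{\perp,d}(\boldsymbol{s})=s_{1}\mathbf{1}_{\{s_{2}=\cdots=s_{d}=\infty\}}+\cdots+s_{d}\mathbf{1}_{\{s_{1}=\cdots=s_{d-1}=\infty\}}.$$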

A vector of completely dependent CRM’s is obtained with the copula below. Here complete dependence means that the jumps of the random vector lie in a set $S$ such that, whenever $\boldsymbol{v},\boldsymbol{u}\in S$, either $v_{i}<u_{i}$ for all $i\in\{1,\dots,d\}$ or $u_{i}<v_{i}$ for all $i\in\{1,\dots,d\}$:

$$\mathcal{C}_{\parallel,d}(\boldsymbol{s})=\min\{s_{1},\dots,s_{d}\}.$$

An interesting example of positive Lévy copulas is the Clayton Lévy copula

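$$\mathcal{C}_{\theta,d}(\boldsymbol{s})=\big(s_{1}^{-\theta}+\cdots+s_{d}^{-\theta}\big)^{-1/\theta}.\tag{7}$$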

The parameter $\theta>0$ regulates the level of dependence. The two copulas above arise as limiting cases of the Clayton Lévy copula, i.e.

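$$\lim_{\theta\to 0}\mathcal{C}_{\theta,d}(\boldsymbol{s})=\mathcal{C}_{\perp,d}(\boldsymbol{s})\quad\text{and}\quad\lim_{\theta\to\infty}\mathcal{C}_{\theta,d}(\boldsymbol{s})=\mathcal{C}_{\parallel,d}(\boldsymbol{s}).$$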
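The Python sketch below evaluates the Clayton Lévy copula on $[0,\infty]^{d}$. It uses the conventions $\infty^{-\theta}=0$ and $\mathcal{C}(\boldsymbol{s})=0$ whenever some $s_i=0$. It also computes the volume $\sum_{\boldsymbol{v}}\operatorname{sign}(\boldsymbol{v})\mathcal{C}(\boldsymbol{v})$ of a box, using the sign rule of Definition 4. With these two functions one can check, at a few chosen points, the margin condition (iii), the box inequality (i) and the two limits above. This is an illustrative numerical check, not a proof. The function names are hypothetical.

```python
import itertools
import math


def clayton_levy_copula(s, theta):
    """C_theta(s) = (s_1^-theta + ... + s_d^-theta)^(-1/theta) on [0, inf]^d."""
    if any(x == 0.0 for x in s):
        return 0.0                      # C vanishes when any argument is 0
    total = sum(0.0 if math.isinf(x) else x ** (-theta) for x in s)
    if total == 0.0:
        return math.inf                 # every argument is infinite
    return total ** (-1.0 / theta)


def copula_volume(copula, lower, upper):
    """Sum over the vertices v of the box [lower, upper] of sign(v) * C(v).

    sign(v) = +1 if v_k = lower_k for an even number of indices k, else -1.
    """
    volume = 0.0
    for choice in itertools.product((0, 1), repeat=len(lower)):
        vertex = [lo if c == 0 else hi for c, lo, hi in zip(choice, lower, upper)]
        n_lower = choice.count(0)
        volume += (1.0 if n_lower % 2 == 0 else -1.0) * copula(vertex)
    return volume


theta = 1.5
C = lambda s: clayton_levy_copula(s, theta)
print(C([2.0, math.inf, math.inf]))            # margin condition (iii): equals 2
print(copula_volume(C, [0.5, 1.0], [2.0, 3.0]))
print(clayton_levy_copula([1.0, 2.0], 200.0))   # large theta: close to min(1, 2)
print(clayton_levy_copula([1.0, 1.0], 0.01))    # small theta: close to 0
```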

We define the tail integral of a univariate Lévy intensity $\nu$ as $U(x)=\int_{x}^{\infty}\nu(s)\mathrm{d}s$. In the setting of Section 2.2, we use a Lévy copula $\mathcal{C}_{d}$ and the marginal tail integrals $U_{1},\dots,U_{d}$ associated with $\nu_{1},\dots,\nu_{d}$ to specify an absolutely continuous $\rho_{d}(\mathrm{d}\boldsymbol{s})=\rho_{d}(\boldsymbol{s})\mathrm{d}\boldsymbol{s}$ via

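$$U(\boldsymbol{x})=\int_{x_{1}}^{\infty}\cdots\int_{x_{d}}^{\infty}\left.\frac{\partial^{d}\mathcal{C}_{d}(\boldsymbol{u})}{\partial u_{1}\cdots\partial u_{d}}\right|_{u_{1}=U_{1}(s_{1}),\dots,u_{d}=U_{d}(s_{d})}\nu_{1}(s_{1})\cdots\nu_{d}(s_{d})\,\mathrm{d}\boldsymbol{s}.$$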

Therefore, under suitable regularity conditions, we can recover the multivariate Lévy intensity from the copula and marginal intensities in the following way

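$$\rho_{d}(\boldsymbol{s})=\left.\frac{\partial^{d}\mathcal{C}_{d}(\boldsymbol{u})}{\partial u_{1}\cdots\partial u_{d}}\right|_{u_{1}=U_{1}(s_{1}),\dots,u_{d}=U_{d}(s_{d})}\nu_{1}(s_{1})\cdots\nu_{d}(s_{d}).\tag{8}$$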
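The passage from the tail-integral representation to the intensity formula can be made explicit with the chain rule. We assume enough smoothness to differentiate under the integral sign. The change of variables $u_i=U_i(s_i)$, with $\mathrm{d}u_i=-\nu_i(s_i)\,\mathrm{d}s_i$, turns the integral representation into $U(\boldsymbol{x})=\mathcal{C}_d(U_1(x_1),\dots,U_d(x_d))$, because $\mathcal{C}_d$ vanishes whenever one argument is 0. Since $U_i'(x_i)=-\nu_i(x_i)$,

$$\frac{\partial^{d}U(\boldsymbol{x})}{\partial x_{1}\cdots\partial x_{d}}=\left.\frac{\partial^{d}\mathcal{C}_{d}(\boldsymbol{u})}{\partial u_{1}\cdots\partial u_{d}}\right|_{u_{i}=U_{i}(x_{i})}\prod_{i=1}^{d}U_{i}'(x_{i})=(-1)^{d}\left.\frac{\partial^{d}\mathcal{C}_{d}(\boldsymbol{u})}{\partial u_{1}\cdots\partial u_{d}}\right|_{u_{i}=U_{i}(x_{i})}\prod_{i=1}^{d}\nu_{i}(x_{i}).$$

On the other hand, differentiating $U(\boldsymbol{x})=\int_{x_1}^{\infty}\cdots\int_{x_d}^{\infty}\rho_d(\boldsymbol{s})\,\mathrm{d}\boldsymbol{s}$ once in each $x_i$ gives $(-1)^{d}\rho_d(\boldsymbol{x})$. Equating the two expressions recovers the formula for $\rho_d$.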

For example, consider the Clayton Lévy copula with $\sigma$-stable margins, given by (4), and $\alpha(\mathrm{d}x)=\mathrm{d}x$. Figure 2 shows the dependence behaviour of a 2-dimensional Clayton Lévy copula with parameter $\theta=0.3$ and $\theta=3.5$; as in Figure 1, we plot the associated stochastic processes $\mu_{i}(0,t]$ with $i\in\{1,2\}$. As expected, when $\theta=0.3$, at each jump time one process has a large jump and the other a small one, since we are close to the independence case (where the processes almost surely share no jump times). On the other hand, when $\theta$ is increased to $3.5$, the larger value of the copula parameter induces visibly stronger dependence. We simulated the trajectories in Figure 2 using Algorithm 6.15 in [2], where a full treatment of the dependence structure of Lévy intensities is also given. [13], [14] and [20] used a Lévy copula approach to build vectors of dependent completely random measures.

2.3.1 Working example

Combining (8) with the $d$-dimensional Clayton Lévy copula (7) with parameter $\theta$ and the $\sigma$-stable marginals (4) with parameters $A,\sigma$, we obtain the Lévy intensity

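$$\rho_{d,\theta,A,\sigma}(\boldsymbol{s})=\frac{A(1+\theta)(1+2\theta)\cdots(1+(d-1)\theta)\,\sigma^{d}\,(s_{1}s_{2}\cdots s_{d})^{\sigma\theta-1}}{\Gamma(1-\sigma)\,\big(s_{1}^{\sigma\theta}+\cdots+s_{d}^{\sigma\theta}\big)^{\frac{1}{\theta}+d}}.$$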
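As a sanity check on the working example, the Python sketch below treats the case $d=2$. It compares the closed-form Clayton/$\sigma$-stable intensity (coded in `rho_closed_form`) with the intensity obtained directly from the copula formula. The latter is the mixed partial derivative $\partial^2\mathcal{C}_{\theta,2}/\partial u_1\partial u_2=(1+\theta)(u_1u_2)^{-\theta-1}(u_1^{-\theta}+u_2^{-\theta})^{-1/\theta-2}$, evaluated at the stable tail integrals $U(x)=Ax^{-\sigma}/\Gamma(1-\sigma)$ and multiplied by the stable densities. The two should agree up to floating-point error. The parameter values and function names are arbitrary choices of ours.

```python
import math


def stable_tail(x, A, sigma):
    """U(x) = A * x^(-sigma) / Gamma(1 - sigma), the sigma-stable tail integral."""
    return A * x ** (-sigma) / math.gamma(1.0 - sigma)


def stable_density(x, A, sigma):
    return A * sigma * x ** (-1.0 - sigma) / math.gamma(1.0 - sigma)


def clayton_mixed_partial_2d(u1, u2, theta):
    """d^2 C_theta(u1, u2) / du1 du2 for the 2-dimensional Clayton Levy copula."""
    return ((1.0 + theta) * (u1 * u2) ** (-theta - 1.0)
            * (u1 ** -theta + u2 ** -theta) ** (-1.0 / theta - 2.0))


def rho_via_copula(s1, s2, theta, A, sigma):
    """Intensity from the copula formula with d = 2."""
    u1, u2 = stable_tail(s1, A, sigma), stable_tail(s2, A, sigma)
    return (clayton_mixed_partial_2d(u1, u2, theta)
            * stable_density(s1, A, sigma) * stable_density(s2, A, sigma))


def rho_closed_form(s, theta, A, sigma):
    """Closed-form Clayton / sigma-stable intensity for general d = len(s)."""
    d = len(s)
    prefactor = A * sigma ** d / math.gamma(1.0 - sigma)
    for k in range(1, d):
        prefactor *= 1.0 + k * theta
    denom = sum(x ** (sigma * theta) for x in s) ** (1.0 / theta + d)
    return prefactor * math.prod(s) ** (sigma * theta - 1.0) / denom


for s1, s2 in [(0.3, 1.2), (1.0, 1.0), (2.5, 0.7)]:
    print(rho_via_copula(s1, s2, 2.0, 1.5, 0.4),
          rho_closed_form([s1, s2], 2.0, 1.5, 0.4))
```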

Furthermore, if we take $\theta=1/\sigma$ we obtain the simplified Lévy intensity

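$$\rho_{d,A,\sigma}(\boldsymbol{s})=\frac{A(\sigma+1)(\sigma+2)\cdots(\sigma+d-1)\,\sigma}{\Gamma(1-\sigma)\,(s_{1}+\cdots+s_{d})^{\sigma+d}}.$$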
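The simplification follows by substituting $\theta=1/\sigma$ into the Clayton/$\sigma$-stable intensity. The power of the product becomes $\sigma\theta-1=0$. The sum in the denominator becomes $s_1+\cdots+s_d$, with exponent $1/\theta+d=\sigma+d$. Finally, the constants combine as

$$\sigma^{d}\prod_{k=1}^{d-1}\Big(1+\frac{k}{\sigma}\Big)=\sigma\prod_{k=1}^{d-1}(\sigma+k)=\sigma(\sigma+1)(\sigma+2)\cdots(\sigma+d-1).$$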

This intensity corresponds to a particular family of vectors of completely random measures, known as Compound Random Measures (CoRM’s), introduced in [9]. It arises when taking $\phi=1$ in equation (4.4) of that paper. A convenient feature of this Lévy intensity is that, as shown in Proposition 3.1 of [20], the corresponding Laplace exponent is available explicitly:

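$$\psi_{d,A,\sigma}(\boldsymbol{\lambda})=A\sum_{i=1}^{d}\frac{\lambda_{i}^{\sigma+d-1}}{\prod_{j=1,\,j\neq i}^{d}(\lambda_{i}-\lambda_{j})},\qquad\lambda_{i}\neq\lambda_{j}\text{ for }j\neq i,$$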

where we take the appropriate limits when $\boldsymbol{\lambda}=(\lambda_{1},\dots,\lambda_{d})$ is such that $\lambda_{i}=\lambda_{j}$ for distinct $i,j\in\{1,\dots,d\}$. As indicated in the remark at the end of Section 3, evaluating the Laplace exponent is necessary for the explicit calculation of the posterior mean of the survival function given censored data.
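For pairwise distinct arguments, the CoRM Laplace exponent $\psi_{d,A,\sigma}(\boldsymbol{\lambda})=A\sum_{i=1}^{d}\lambda_i^{\sigma+d-1}/\prod_{j\neq i}(\lambda_i-\lambda_j)$ equals $A$ times the divided difference $f[\lambda_1,\dots,\lambda_d]$ of $f(x)=x^{\sigma+d-1}$. This follows from the standard identity $\sum_i f(\lambda_i)/\prod_{j\neq i}(\lambda_i-\lambda_j)=f[\lambda_1,\dots,\lambda_d]$. That identity suggests one way to take the appropriate limits for tied arguments: confluent divided differences, where $m+1$ equal nodes contribute $f^{(m)}(\lambda)/m!$. The sketch below follows this approach. The factor $A$ is included so that for $d=1$ the expression reduces to the $\sigma$-stable exponent $A\lambda^{\sigma}$. The function names are hypothetical.

```python
import math


def _scaled_power_derivative(a, k, x):
    """f^(k)(x) / k! for f(x) = x**a, i.e. binom(a, k) * x**(a - k)."""
    coeff = 1.0
    for r in range(k):
        coeff *= (a - r) / (r + 1)
    return coeff * x ** (a - k)


def corm_laplace_exponent(lam, A, sigma):
    """psi_{d,A,sigma}(lam) = A * f[lam_1, ..., lam_d] with f(x) = x**(sigma + d - 1).

    A confluent divided-difference table handles tied arguments through the
    derivative limit instead of dividing by zero.
    """
    nodes = sorted(float(x) for x in lam)
    d = len(nodes)
    a = sigma + d - 1
    table = [[0.0] * d for _ in range(d)]   # table[i][j] = f[nodes[i..j]]
    for i in range(d):
        table[i][i] = nodes[i] ** a
    for width in range(1, d):
        for i in range(d - width):
            j = i + width
            if nodes[j] == nodes[i]:        # sorted, so nodes[i..j] all coincide
                table[i][j] = _scaled_power_derivative(a, width, nodes[i])
            else:
                table[i][j] = (table[i + 1][j] - table[i][j - 1]) / (nodes[j] - nodes[i])
    return A * table[0][d - 1]


def corm_laplace_exponent_direct(lam, A, sigma):
    """The explicit sum, valid only for pairwise distinct arguments."""
    d = len(lam)
    total = 0.0
    for i, li in enumerate(lam):
        denom = math.prod(li - lj for j, lj in enumerate(lam) if j != i)
        total += li ** (sigma + d - 1) / denom
    return A * total


print(corm_laplace_exponent([2.0], 1.5, 0.5))             # sigma-stable case
print(corm_laplace_exponent([1.0, 2.0, 4.0], 1.0, 0.5),
      corm_laplace_exponent_direct([1.0, 2.0, 4.0], 1.0, 0.5))
print(corm_laplace_exponent([2.0, 2.0], 1.0, 0.5))        # tie: A*(sigma+1)*2**sigma
```

For $d=2$ and $\lambda_2=0$, the exponent reduces to $A\lambda_1^{\sigma}$, the marginal $\sigma$-stable exponent, as the marginalization formula requires.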

3 Main results

Let $d\in\mathbb{N}\setminus\{0\}$ , and suppose we have $d$ collections of random variables

$$\big\{\{Y^{(i)}_{j}\}_{j=1}^{\infty}\big\}_{i=1}^{d}.\tag{12}$$

We characterize the probability distribution of these random variables in terms of a vector of CRM’s $\boldsymbol{\mu}=(\mu_{1},\dots,\mu_{d})$ . For $\boldsymbol{t}=(t_{1},\dots,t_{d})\in(\mathbb{R}^{+})^{d}$ , let

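$$S(\boldsymbol{t})=\mathbb{P}\big[Y^{(1)}_{i_{1}}>t_{1},\dots,Y^{(d)}_{i_{d}}>t_{d}\mid(\mu_{1},\dots,\mu_{d})\big]=e^{-\mu_{1}(0,t_{1}]-\cdots-\mu_{d}(0,t_{d}]}.$$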

We observe that under this model the random variables (12) are partially exchangeable and marginally follow an NTR model. The dependence structure in this model can be specified through the Lévy copula associated with the vector of CRM’s $\boldsymbol{\mu}$. This model extends the one in [7] to an arbitrary dimension $d$.

The family of Clayton Lévy copulas is of interest because it includes the independence and complete dependence cases as limits. In the next result, we derive expressions for the Laplace exponent associated with the Clayton family in which the dependence structure is decoupled across dimensions. This is useful since, as we will see, an explicit calculation of $\psi$ is key to implementing Bayesian inference in our survival analysis model.

Let $\rho_{d}(\boldsymbol{s};\theta)$ be the Lévy intensity associated via (8) with the Clayton Lévy copula $\mathcal{C}_{\theta,d}$ and fixed marginal Lévy intensities $\nu_{1},\dots,\nu_{d}$, with corresponding Laplace exponents $\psi_{1},\dots,\psi_{d}$. We denote the vector of tail integrals corresponding to the marginal Lévy intensities by $\boldsymbol{U}_{d}(\boldsymbol{x})=(U_{1}(x_{1}),\dots,U_{d}(x_{d}))$ and fix the notation

[TABLE]

where $d\in\mathbb{N}\setminus\{0\}$ , $\boldsymbol{\lambda}=(\lambda_{1},\dots,\lambda_{d})\in(\mathbb{R}^{+})^{d}$ , $m\in\{1,\dots,d\}$ , and $\boldsymbol{i}=(i_{1},i_{2},\dots,i_{m})\in\{1,\dots,d\}^{m}$ is such that $i_{1}<\dots<i_{m}$ .

Proposition 1.

Suppose that $d\in\{2,3,\dots\}$ and

[TABLE]

then

[TABLE]

where $\boldsymbol{\lambda}=(\lambda_{1},\dots,\lambda_{d})\in(\mathbb{R}^{+})^{d}$ .

We refer to Appendix A.1 for the proof. In the next result, we incorporate the Laplace exponent $\psi$ into the multivariate survival analysis setting of (12). We introduce the notation

[TABLE]

for $h\in\{1,\dots,d\}$ and distinct $i_{1},\dots,i_{h}\in\{1,\dots,d\}$, and denote by $\psi_{i_{1},\dots,i_{h}}$ the respective Laplace exponents.

Proposition 2.

In the context of (12), let $\boldsymbol{1}=(1,\dots,1)$. For $\boldsymbol{t}=(t_{1},\dots,t_{d})\in(\mathbb{R}^{+})^{d}$, let $i_{1},\dots,i_{d}$ be distinct indices in $\{1,\dots,d\}$ such that $t_{i_{1}}\leq\dots\leq t_{i_{d}}$. Then

[TABLE]

We refer to Appendix A.2 for the proof. This result shows the importance of the Laplace exponent $\psi$ for calculating probabilities in the model. It also shows the impact on the survival function of $\gamma(t)$, the time-dependent part of the Laplace exponent. In Section 4, we will show that the availability of the Laplace exponent is also essential for implementing Bayesian inference for the model. Our model generalizes the classic model of [5] to arbitrary dimension, and we now relate it to the notion of neutrality to the right. Let $F$ be a $d$-variate random distribution function on $(\mathbb{R}^{+})^{d}$. For a $d$-variate vector of CRM’s $\boldsymbol{\mu}=(\mu_{1},\dots,\mu_{d})$, denote $\mu_{i}(t)=\mu_{i}\left((0,t]\right)$ for $i\in\{1,\dots,d\}$. We then have the following multivariate extension of Theorem 3.1 in [5] and Proposition 4 in [7].

Proposition 3.

$F(\boldsymbol{t})$, with $\boldsymbol{t}=(t_{1},\dots,t_{d})$, has the same distribution as

[TABLE]

for some $d$-variate CRM $\boldsymbol{\mu}=(\mu_{1},\dots,\mu_{d})$ if and only if the following holds. For $h\in\{1,2,\dots\}$ and vectors $\boldsymbol{t}_{1}=(t_{1,1},\dots,t_{d,1}),\dots,\boldsymbol{t}_{h}=(t_{1,h},\dots,t_{d,h})$ with $t_{0,i}=0<t_{1,i}<\cdots<t_{d,i}$ and $t_{j,0}=0<t_{j,1}<\cdots<t_{j,h}$, there exist $h$ independent random vectors $(V_{1,1},\dots,V_{d,1}),\dots,(V_{1,h},\dots,V_{d,h})$ such that

[TABLE]

where $\bar{V}_{i,j}=1-V_{i,j}$ with $i\in\{1,\dots,d\}$ and $j\in\{1,\dots,h\}$ .

We refer to Appendix A.3 for the proof. We now establish some notation in order to address the posterior distribution arising from (12) when survival data are available. Let $\boldsymbol{Y}_{n_{i}}^{(i)}=\left(Y^{(i)}_{1},\dots,Y^{(i)}_{n_{i}}\right)$, $i=1,\dots,d$, be $d$ groups of observations that come from the distribution given by

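$$\mathbb{P}\big[\boldsymbol{Y}_{n_{1}}^{(1)}>\boldsymbol{t}_{1,n_{1}},\dots,\boldsymbol{Y}_{n_{d}}^{(d)}>\boldsymbol{t}_{d,n_{d}}\mid(\mu_{1},\dots,\mu_{d})\big]=\prod_{i=1}^{d}\prod_{j=1}^{n_{i}}e^{-\mu_{i}(0,t_{i,j}]},$$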

where $\bm{t}_{i,n_{i}}=\left(t_{i,1},\dots,t_{i,n_{i}}\right)$ and the event $\{\bm{Y}_{n_{i}}^{(i)}>\bm{t}_{i,n_{i}}\}$ corresponds to the event $\{Y^{(i)}_{1}>t_{i,1},\dots,Y^{(i)}_{n_{i}}>t_{i,n_{i}}\}$ . Let $c^{(1)}_{1},\dots,c^{(1)}_{n_{1}},\dots,c^{(d)}_{1},\dots,c^{(d)}_{n_{d}}$ be their respective censoring times; therefore, the set of censored data is the following

$$\boldsymbol{D}=\big\{(T^{(i)}_{j},\delta^{(i)}_{j})\,:\,j=1,\dots,n_{i},\ i=1,\dots,d\big\},$$

where $T^{(i)}_{j}=\min\{Y^{(i)}_{j},c^{(i)}_{j}\}$ and $\delta^{(i)}_{j}=\mathbf{1}_{(0,c^{(i)}_{j}]}\left(Y^{(i)}_{j}\right)$. The number of exact observations is $n_{e}=\sum_{i=1}^{d}\sum_{j=1}^{n_{i}}\delta^{(i)}_{j}$ and the number of censored observations is $n_{c}=\sum_{i=1}^{d}n_{i}-n_{e}$. Taking into account possible ties among the observations, we consider the order statistics $(T_{(1)},\dots,T_{(k)})$ of the distinct observed values, where $k$ is the number of distinct observed times among all groups.

Let us define the set functions

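$$m_{i}^{e}(A)=\sum_{j=1}^{n_{i}}\delta^{(i)}_{j}\,\mathbf{1}_{A}\big(T^{(i)}_{j}\big),\qquad m_{i}^{c}(A)=\sum_{j=1}^{n_{i}}\big(1-\delta^{(i)}_{j}\big)\,\mathbf{1}_{A}\big(T^{(i)}_{j}\big),$$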

for $i\in\{1,\dots,d\}$, which give the number of, respectively, exact and censored marginal observations of group $i$ in $A$. We define $N_{i}^{e}(x)=m_{i}^{e}\left((x,\infty)\right)$ and $N_{i}^{c}(x)=m_{i}^{c}\left((x,\infty)\right)$ for $i\in\{1,\dots,d\}$. For $(i,j)\in\{1,\dots,d\}\times\{1,\dots,k\}$ we define $n_{i,j}^{e}=m_{i}^{e}(\{T_{(j)}\})$, $n_{i,j}^{c}=m_{i}^{c}(\{T_{(j)}\})$, $\bar{n}_{i,j}^{e}=\sum_{r=j}^{k}n_{i,r}^{e}$ and $\bar{n}_{i,j}^{c}=\sum_{r=j}^{k}n_{i,r}^{c}$. The corresponding vectors are $\bar{\boldsymbol{n}}^{e}_{j}=(\bar{n}^{e}_{1,j},\dots,\bar{n}^{e}_{d,j})$ and $\bar{\boldsymbol{n}}^{c}_{j}=(\bar{n}^{c}_{1,j},\dots,\bar{n}^{c}_{d,j})$ for $j\in\{1,\dots,k\}$, together with $\boldsymbol{N}^{e}(x)=(N_{1}^{e}(x),\dots,N_{d}^{e}(x))$ and $\boldsymbol{N}^{c}(x)=(N_{1}^{c}(x),\dots,N_{d}^{c}(x))$.
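The bookkeeping above can be sketched in Python as follows. Each group $i$ is a list of pairs $(T^{(i)}_j,\delta^{(i)}_j)$, with $T=\min\{Y,c\}$ and $\delta=1$ when $Y\leq c$ (an exact observation). The sketch returns the distinct ordered times $T_{(1)}<\dots<T_{(k)}$, the counts $n^{e}_{i,j}$ and $n^{c}_{i,j}$, and their tail sums $\bar n^{e}_{i,j}$ and $\bar n^{c}_{i,j}$. A separate function gives the counting functions $N^{e}_i(x)$ and $N^{c}_i(x)$ of observations in $(x,\infty)$. The function names and the toy data are hypothetical, and group and time indices are 0-based in the code.

```python
def censor(y, c):
    """Return (T, delta) with T = min(y, c) and delta = 1 if y <= c (exact)."""
    return (min(y, c), 1 if y <= c else 0)


def summarize_censored(groups):
    """groups[i] is a list of (T, delta) pairs for group i (0-based)."""
    times = sorted({t for g in groups for t, _ in g})   # T_(1) < ... < T_(k)
    k, d = len(times), len(groups)
    position = {t: j for j, t in enumerate(times)}
    n_e = [[0] * k for _ in range(d)]                    # n^e_{i,j}
    n_c = [[0] * k for _ in range(d)]                    # n^c_{i,j}
    for i, group in enumerate(groups):
        for t, delta in group:
            (n_e if delta else n_c)[i][position[t]] += 1
    # Tail sums: bar n_{i,j} = sum_{r >= j} n_{i,r}
    bar_e = [[sum(row[j:]) for j in range(k)] for row in n_e]
    bar_c = [[sum(row[j:]) for j in range(k)] for row in n_c]
    return times, n_e, n_c, bar_e, bar_c


def counting_function(groups, i, x, exact):
    """N^e_i(x) if exact is True, else N^c_i(x): observations of group i in (x, inf)."""
    return sum(1 for t, delta in groups[i] if t > x and bool(delta) == exact)


groups = [
    [censor(1.0, 5.0), censor(4.0, 2.0), censor(2.0, 3.0)],
    [censor(6.0, 1.0), censor(3.0, 3.5)],
]
times, n_e, n_c, bar_e, bar_c = summarize_censored(groups)
n_exact = sum(map(sum, n_e))
n_censored = sum(len(g) for g in groups) - n_exact
print(times, n_e, n_c, bar_e, bar_c, n_exact, n_censored)
```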

The next theorem characterizes the posterior distribution of a vector of CRM’s given censored data, and it applies to general vectors of CRM’s. In particular, the assumption that the Lévy intensity is homogeneous is dropped.

Theorem 1.

Let $\boldsymbol{\mu}=(\mu_{1},\dots,\mu_{d})$ be a $d$-variate CRM whose Lévy intensity $\nu(\boldsymbol{s},\mathrm{d}t)\mathrm{d}\boldsymbol{s}$ is differentiable with respect to $t$ on $\mathbb{R}^{+}\setminus\{0\}$. By this we mean that, for $\eta_{t}(\boldsymbol{s})=\nu(\boldsymbol{s},(0,t])$, the partial derivative $\eta^{\prime}_{t_{0}}(\boldsymbol{s})=\left.\partial\eta_{t}(\boldsymbol{s})/\partial t\right|_{t=t_{0}}$ exists for every $t_{0}>0$. Moreover, we assume that the entries of $\boldsymbol{\mu}$ are not independent. Then the posterior distribution of $\boldsymbol{\mu}$ given data $\boldsymbol{D}$ is the distribution of the random measure

[TABLE]

where

i) $\boldsymbol{\mu}^{\star}=(\mu_{1}^{\star},\dots,\mu_{d}^{\star})$ is a $d$-variate CRM with Lévy intensity $\nu^{\star}$ such that

[TABLE]

for $j\in\{1,\dots,k+1\}$.

ii) The vectors of jumps $\{(J_{1,j},\dots,J_{d,j})\}_{j\in J}$, with $J=\{j\,:\,T_{(j)}\text{ is an exact observation}\}$, are mutually independent, and the vector of jumps corresponding to the exact observation $T_{(j)}$ has density

[TABLE]

iii) The random measure $\boldsymbol{\mu}^{\star}$ is independent of $\{(J_{1,j},\dots,J_{d,j})\}_{j\in J}$, with $J=\{j\,:\,T_{(j)}\text{ is an exact observation}\}$.

We refer to Appendix A.4 for the proof. The previous result shows that the posterior distribution arising from (12) stays within the same framework: it is again described by a vector of CRMs, obtained by updating the prior vector $\boldsymbol{\mu}$ to $\boldsymbol{\mu}^{\star}$ and adding the jumps at the exact observations described in ii).

This result is enough to provide a scheme for posterior inference. In particular, in the setting of (12) and Theorem 1, we want to estimate the corresponding survival function $\mathbb{P}\!\left[Y^{(1)}>t_{1},\dots,Y^{(d)}>t_{d}\,|(\mu_{1},\dots,\mu_{d})\right]$ when information from multiple samples is available.

A natural approach in Bayesian nonparametrics is to marginalize over the infinite-dimensional random element that characterizes the probability model. In our case, given censored data $\boldsymbol{D}$, we compute the posterior mean of the survival function by marginalizing over the vector of CRMs $\boldsymbol{\mu}$; Theorem 1 makes this quantity computable. The next results provide what is needed to estimate the survival function as a posterior mean. We denote by $\boldsymbol{e}_{i}$ the $i$-th vector of the canonical basis of $\mathbb{R}^{d}$, and set $S_{L}(t)=S(t\sum_{l\in L}\boldsymbol{e}_{l})$ for $t>0$ and $\emptyset\neq L\subset\{1,\dots,d\}$; that is, $S_{L}$ is $S$ evaluated at the point with coordinates equal to $t$ on $L$ and to $0$ elsewhere. Because CRMs have independent increments, the posterior means of the functions $S_{L}$ are all that is needed to evaluate the posterior mean of $S$. The next corollary shows how to evaluate the posterior mean of $S_{L}$.

Corollary 1.

Let $\boldsymbol{\mu}$ be a vector of CRMs whose Lévy intensity satisfies $\eta_{t}(\boldsymbol{s})=\gamma(t)\nu(\boldsymbol{s})$, with $\gamma$ a differentiable function such that $\gamma^{\prime}(t)\neq 0$ for $t>0$. Moreover, assume that the entries of $\boldsymbol{\mu}$ are not independent. Let $\emptyset\neq L\subset\{1,\dots,d\}$ and set

[TABLE]

where $T_{(k+1)}=\infty$ . Then,

[TABLE]

where $T_{(0)}=0$ and for $\boldsymbol{\lambda}\in(\mathbb{R}^{+})^{d}$

[TABLE]

Hence we can estimate $S(\boldsymbol{t})$ for arbitrary $\boldsymbol{t}\in(\mathbb{R}^{+})^{d}$ in terms of the estimates defined in the previous corollary. Indeed, let $\boldsymbol{t}=(t_{1},\dots,t_{d})$ and let $\pi$ be a permutation of $\{1,\dots,d\}$ such that $t_{\pi(1)}\leq t_{\pi(2)}\leq\dots\leq t_{\pi(d)}$. We define, for $i\in\{1,\dots,d-1\}$, the following sets

[TABLE]

From the independence of the increments of CRMs, it follows that the posterior mean of the survival function given censored data $\boldsymbol{D}$ is

[TABLE]
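The factorization behind this step can be sketched as follows, assuming only that the posterior vector of CRMs has independent increments (Theorem 1 represents it as $\boldsymbol{\mu}^{\star}$ plus mutually independent fixed-location jumps). With $t_{\pi(0)}=0$ and $L_{r}=\{\pi(r),\dots,\pi(d)\}$, the exponent $\sum_{i}\mu_{i}(0,t_{i}]$ splits into increments over the disjoint intervals $(t_{\pi(r-1)},t_{\pi(r)}]$, so that $S(\boldsymbol{t})=\prod_{r=1}^{d}S_{L_{r}}(t_{\pi(r)})/S_{L_{r}}(t_{\pi(r-1)})$, and the same identity holds for posterior means. The sets $L_{r}$ are our own notation and may be indexed differently from the sets defined above. In the minimal Python sketch below, `S_L` is a hypothetical callable standing in for the posterior mean of $S_{L}$ from Corollary 1, and coordinates are 0-based.

```python
import math

def joint_survival(S_L, t):
    """Combine S_L(L, u) = S(u * sum_{l in L} e_l) into S(t) via independent increments.

    S_L: hypothetical callable (frozenset of 0-based coordinates, time u) -> value.
    t:   sequence of d positive times.
    """
    d = len(t)
    pi = sorted(range(d), key=lambda i: t[i])  # t[pi[0]] <= ... <= t[pi[d-1]]
    result, prev = 1.0, 0.0
    for r in range(d):
        L = frozenset(pi[r:])  # coordinates whose threshold has not been passed yet
        cur = t[pi[r]]
        # Factor for the increments of mu_l, l in L, over the interval (prev, cur]
        denom = S_L(L, prev) if prev > 0 else 1.0  # S_L(0) = 1
        result *= S_L(L, cur) / denom
        prev = cur
    return result

# Toy check with deterministic mu_l(0, u] = c_l * u, so that S(t) = exp(-sum_l c_l t_l)
c = [0.5, 1.0, 2.0]
toy_S_L = lambda L, u: math.exp(-u * sum(c[l] for l in L))
print(joint_survival(toy_S_L, [0.3, 1.2, 0.7]))  # equals exp(-2.75) up to rounding
```

Working with ratios of $S_{L}$ values, rather than with increments directly, keeps the sketch agnostic about how the posterior means of $S_{L}$ are computed.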

Usually, the Lévy intensities we deal with depend on a vector of hyper-parameters $\boldsymbol{c}$. The proof of Theorem 1 outlines how, given censored data $\boldsymbol{D}$ as before, one can derive the likelihood of the hyper-parameters in the Lévy intensity. This likelihood is needed to implement the inferential procedure and is displayed in the next corollary.

Corollary 2.

Let $\boldsymbol{\mu}$ be a vector of CRMs whose Lévy intensity satisfies $\eta_{t}(\boldsymbol{s})=\gamma(t)\rho_{d,\boldsymbol{c}}(\boldsymbol{s})$, with $\gamma$ a differentiable function such that $\gamma^{\prime}(t)\neq 0$ for $t>0$, and $\boldsymbol{c}$ a vector of hyper-parameters. Given censored data $\boldsymbol{D}$, the likelihood of $\boldsymbol{c}$ is

[TABLE]

where $\psi_{d,\boldsymbol{c}}$ is the Laplace exponent associated with $\rho_{d,\boldsymbol{c}}$.

The next lemma provides a useful identity for the computation of the integrals in Corollary 1 and Corollary 2.

Lemma 1.

For $\boldsymbol{q}=(q_{1},\dots,q_{d})\in(\mathbb{R}^{+})^{d}$ and $\boldsymbol{n}=(n_{1},\dots,n_{d})\in\mathbb{N}^{d}$

[TABLE]

We omit the proof, as it is a direct application of the binomial theorem along the same lines as the proof of Lemma 3 in the appendix.

Remark.

The previous results highlight that implementing the inferential procedure hinges on being able to evaluate the Laplace exponent.

4 Applications

In this section we fit a multivariate survival function to right-censored data in the framework of (12). As mentioned in the previous remark, evaluating the Laplace exponent of $\boldsymbol{\mu}$ in (12) is necessary to compute the posterior mean in Corollary 1 and the likelihood in Corollary 2. With this in mind, we choose the random measure $\boldsymbol{\mu}$ given by the Lévy intensity in (9), whose Laplace exponent is readily available from (10). For illustration purposes, we use 4-dimensional data arising from a distributional copula with fixed marginal distributions; see [18] for an overview of distributional copulas. More precisely, we generate simulated data $\boldsymbol{Y}=(Y_{1},\dots,Y_{4})$ with probability distribution $F_{\theta,\lambda}$ given by a distributional Clayton copula with parameter $\theta$ and exponential marginals with parameter $\lambda$. We then perform right-censoring with censoring times $\boldsymbol{c}$ consisting of independent exponential random variables with parameter $\lambda_{c}$, and define

[TABLE]
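A minimal Python sketch of this data-generating mechanism is given below. It assumes the standard Clayton copula $\mathcal{C}_{\theta}(\boldsymbol{u})=(\sum_{i}u_{i}^{-\theta}-d+1)^{-1/\theta}$ with $\theta>0$, rate-parameterized exponential marginals and censoring times, and componentwise censoring $T_{i}=\min\{Y_{i},c_{i}\}$, $\delta_{i}=\mathbbm{1}_{(0,c_{i}]}(Y_{i})$. Dependent uniforms are drawn with the Marshall-Olkin (gamma frailty) algorithm, which is our choice and not necessarily the one used for the paper's figures; the function name and the parameter values in the example are illustrative only.

```python
import numpy as np

def simulate_censored_clayton(n, d, theta, lam, lam_c, seed=0):
    """Draw n vectors Y with a Clayton(theta) copula and Exponential(lam) marginals,
    then censor each coordinate with an independent Exponential(lam_c) time."""
    rng = np.random.default_rng(seed)
    # Marshall-Olkin: a Gamma(1/theta, 1) frailty V, whose Laplace transform
    # (1 + x)^(-1/theta) is the Clayton generator
    V = rng.gamma(shape=1.0 / theta, scale=1.0, size=(n, 1))
    E = rng.exponential(scale=1.0, size=(n, d))
    U = (1.0 + E / V) ** (-1.0 / theta)  # uniform margins, Clayton dependence
    Y = -np.log1p(-U) / lam  # inverse CDF of Exponential(lam): -log(1 - u) / lam
    C = rng.exponential(scale=1.0 / lam_c, size=(n, d))  # censoring times (rate lam_c)
    T = np.minimum(Y, C)
    delta = (Y <= C).astype(int)  # 1 = exact observation, 0 = right-censored
    return Y, T, delta

# Illustrative parameters (not the values used in the paper)
Y, T, delta = simulate_censored_clayton(n=150, d=4, theta=2.0, lam=1.0, lam_c=0.5)
```

With exponential survival and censoring times, each coordinate is exact with probability $\lambda/(\lambda+\lambda_{c})$, which is a quick way to choose $\lambda_{c}$ for a target censoring rate.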

For fitting the data, we use the 4-dimensional Lévy intensity given by (9) and assign priors to its hyper-parameters $\sigma$ and $A$: a log-normal prior for $A$ and a Beta prior for $\sigma$. We use a Metropolis-within-Gibbs algorithm to draw samples from the posterior distributions of $A$ and $\sigma$, based on the likelihood in Corollary 2. We present a Monte Carlo approximation of the estimator (16), obtained by averaging over the posterior draws of $A$ and $\sigma$. A more detailed description of the simulation algorithm is given in Appendix A.5. Figures 3 and 4 show the fit for 150 possibly right-censored observations as in (17). The simulated synthetic observations are such that

[TABLE]

We chose $\lambda_{c}=3.7$ so that at least $75\%$ of the observations of $\boldsymbol{T}$ in each dimension are exact. Constructing $F_{\theta,\lambda}$ through a distributional Clayton copula allows us to compute the associated survival function explicitly, as shown in Appendix A.6; we use this true survival function as a benchmark for the fitted survival functions. The estimated survival functions are given by the posterior mean

[TABLE]

as in (16). The prior distributions of the hyperparameters are

[TABLE]

We ran $1000$ iterations of the Metropolis-within-Gibbs sampler. Figures 3 and 4 show that the estimated survival functions approximate the true functions well. For comparison purposes, we also present a Kaplan-Meier estimate of the true survival function; see, for example, [1]. As the Kaplan-Meier estimator is univariate, we use the following estimator for a multivariate survival function:

[TABLE]

where each $S_{\text{KM}}$ is a univariate Kaplan-Meier estimator restricted to the corresponding set of observations. Figures 3 and 4 show that, in the last subplots of each column, the Kaplan-Meier estimator can fit poorly, because fewer observations enter the conditioned Kaplan-Meier factors of the formula above.
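The univariate building block is the standard product-limit estimator $S_{\text{KM}}(t)=\prod_{u\le t}(1-d_{u}/r_{u})$, where $d_{u}$ is the number of exact observations at time $u$ and $r_{u}$ the number of subjects at risk. The Python sketch below implements it and chains conditional estimates as $S_{\text{KM}}(t_{1})\,S_{\text{KM}}(t_{2}\mid T_{1}>t_{1})\cdots$. The conditioning rule used here (keep the rows whose earlier observed times exceed the earlier thresholds) is our assumption and may differ from the exact restriction in the formula above; both function names are hypothetical.

```python
import numpy as np

def kaplan_meier(times, events, t):
    """Product-limit estimate of P[Y > t] from right-censored data."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    surv = 1.0
    for u in np.unique(times[events]):  # distinct exact times in increasing order
        if u > t:
            break
        at_risk = np.sum(times >= u)            # subjects still under observation at u
        deaths = np.sum((times == u) & events)  # exact observations at u
        surv *= 1.0 - deaths / at_risk
    return float(surv)

def chained_kaplan_meier(T, delta, t):
    """Hypothetical estimate prod_i S_KM(t_i | T_1 > t_1, ..., T_{i-1} > t_{i-1})."""
    T = np.asarray(T, dtype=float)
    delta = np.asarray(delta, dtype=bool)
    keep = np.ones(T.shape[0], dtype=bool)
    estimate = 1.0
    for i, ti in enumerate(t):
        estimate *= kaplan_meier(T[keep, i], delta[keep, i], ti)
        keep &= T[:, i] > ti  # restrict the next factor to the rows beyond t_i
    return estimate
```

Each factor is estimated from a shrinking subset of rows, which is exactly why the later factors, and hence the last subplots, become unstable.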

Appendix

A.1 Proof of Proposition 1

Given $d\in\{2,3,\dots\}$, we use the notation $\nu_{-i}(\boldsymbol{s})=\prod_{j=i+1}^{d}\nu_{j}(s_{j})$ and $\boldsymbol{U}_{k:d}(\boldsymbol{s})=\left(U_{k}(s_{1}),\dots,U_{d}(s_{d-k+1})\right)$ for $\boldsymbol{s}\in(\mathbb{R}^{+})^{d}$. Furthermore, we define the integrals

[TABLE]

and

[TABLE]

where $k\in\{1,\dots,d\}$, $m\in\{0,1,\dots,d\}$ and $\boldsymbol{\lambda}\in(\mathbb{R}^{+})^{d}$ are such that $a_{0,d}(\boldsymbol{\lambda})<\infty$; we also use the convention $\prod_{j=k}^{l}a_{j}=1$ when $k>l$, and denote by $\boldsymbol{x}_{-i}$ the vector $\boldsymbol{x}$ without its $i$-th entry.

An integration by parts shows that

[TABLE]

and in general for $r\in\{1,\dots,d\}$ we get the recursion formula

[TABLE]

We now prove the following technical lemma.

Lemma 2.

If $a_{0,d}(\boldsymbol{\lambda})<\infty$, then the following $d+1$ identities hold:

[TABLE]

Proof.

We proceed by induction on the dimension $d$. From the definition of $\kappa$, we always have

[TABLE]

For the case $d=2$ we have from Proposition 1 in [7] that

[TABLE]

Integrating by parts, we obtain

[TABLE]

Therefore, the identities in (A.19) hold for $d=2$. Now suppose that (A.19) holds for $d=m-1$; we show that it holds for $d=m$. From the recursion formula (A.18) we get, for $r\in\{0,1,\dots,d\}$,

[TABLE]

The validity of (A.19) for $d=m$ follows from the validity for $d=m-1$ and a combinatorial argument. ∎

Proposition 1 follows from the first identity in Lemma 2 and the definition of $a_{0,d}$.

A.2 Proof of Proposition 2

Proof.

Using the independent increments property of CRMs, we get

[TABLE]

∎

A.3 Proof of Proposition 3

For notation purposes, in this proof we use the shorthand $\mu(t)=\mu\left((0,t]\right)$ for a measure $\mu$ and positive real number $t$ .

Proof.

For the "only if" part, define $V_{i,j}=1-\mathbf{e}^{-[\mu_{i}(t_{i,j})-\mu_{i}(t_{i,j-1})]}$ for $i\in\{1,\dots,d\}$ and $j\in\{1,\dots,h\}$. Assuming $(F_{1}(t_{1}),\dots,F_{d}(t_{d}))\stackrel{{\scriptstyle d}}{{=}}(1-\mathbf{e}^{-\mu_{1}(t_{1})},\dots,1-\mathbf{e}^{-\mu_{d}(t_{d})})$, we have

[TABLE]

We observe that for $i\in\{2,\dots,h\}$ and $r\in\{1,\dots,d\}$

[TABLE]

So, for $i\in\{2,\dots,d\}$,

[TABLE]

This concludes the "only if" part.

For the "if" part, define $\mu_{i}(t)=-\log(1-F_{i}(t))$ for $i\in\{1,\dots,d\}$, and suppose that, for $h\in\{1,2,\dots\}$ and $\boldsymbol{t}_{1}=(t_{1,1},\dots,t_{d,1}),\dots,\boldsymbol{t}_{h}=(t_{1,h},\dots,t_{d,h})$ with $t_{0,i}=0<t_{1,i}<\cdots<t_{d,i}$ and $t_{j,0}=0<t_{j,1}<\cdots<t_{j,h}$, there exist independent random vectors $(V_{1,1},\dots,V_{d,1}),\dots,(V_{1,h},\dots,V_{d,h})$ such that (15) holds.

Marginalizing in (15), we can apply Theorem 3.1 of [5] to each $F_{i}$ and obtain that $F_{i}\sim\text{NTR}(\mu_{i})$ for some CRM $\mu_{i}$ that is stochastically continuous, almost surely non-decreasing and has the appropriate limit behaviour.

We observe that

[TABLE]

Hence $(\mu_{1},\dots,\mu_{d})$ defines a vector of CRMs. ∎

A.4 Proof of Theorem 1

This proof is not restricted to the homogeneous Lévy intensity case; in this general setting, recall that the Laplace exponent has the form (6). To prove the theorem, we use the following technical lemma.

Lemma 3.

Let $(\mu_{1},\dots,\mu_{d})$ be a $d$-variate CRM such that $\mu_{1},\dots,\mu_{d}$ are not independent, and let its Lévy intensity $\nu(\boldsymbol{s},\mathrm{d}t)\mathrm{d}\boldsymbol{s}$ be such that $\eta_{t}(\boldsymbol{s})=\nu(\boldsymbol{s},(0,t])$ is differentiable with respect to $t\in\mathbb{R}^{+}$ at some $t_{0}\neq 0$; denote $\eta^{\prime}_{t_{0}}(\boldsymbol{s})=\partial{\left.\kern-1.2pt\eta_{t}(\boldsymbol{s})/\partial t\vphantom{\big{|}}\right|_{t=t_{0}}}$. If $\boldsymbol{q}=(q_{1},\dots,q_{d})\in\mathbb{N}^{d}$ satisfies $\max\{q_{1},\dots,q_{d}\}\geq 1$ and $\boldsymbol{r}=(r_{1},\dots,r_{d})\in(\mathbb{R}^{+})^{d}$ satisfies $\min\{r_{1},\dots,r_{d}\}\geq 1$, then

[TABLE]

as $0<\epsilon\to 0$ , with $A_{\epsilon}=(t_{0}-\epsilon,t_{0}]$ for some $t_{0}\in\mathbb{R}^{+}\setminus\{0\}$ .

Proof.

We denote $\triangle_{s_{1}}^{s_{2}}f_{t}(\boldsymbol{r})=f_{s_{2}}(\boldsymbol{r})-f_{s_{1}}(\boldsymbol{r})$ for a function $f$, where $s_{1},s_{2}\in\mathbb{R}^{+}$ and $\boldsymbol{r}\in\mathbb{R}^{d}$. Using the binomial theorem and taking expectations, we write the left-hand side of the equation above as

[TABLE]

We note that, for $j_{i}\in\{0,\dots,q_{i}\}$, $i\in\{1,\dots,d\}$ and $\boldsymbol{j}=(j_{1},\dots,j_{d})$, a Taylor expansion yields

[TABLE]

Furthermore, the binomial theorem yields the following $d$ identities:

(1)

$\displaystyle\sum_{i=1}^{d}\sum_{j=1}^{q_{i}}\binom{q_{i}}{j}(-1)^{j}(1-\mathbf{e}^{-js_{i}})=-\sum_{i=1}^{d}(1-\mathbf{e}^{-s_{i}})^{q_{i}}$

(2)

$\displaystyle\sum_{\stackrel{{\scriptstyle i_{1},i_{2}\in\{1,\dots,d\}}}{{i_{1}<i_{2}}}}\sum_{j_{1}=1}^{q_{i_{1}}}\sum_{j_{2}=1}^{q_{i_{2}}}\binom{q_{i_{1}}}{j_{1}}\binom{q_{i_{2}}}{j_{2}}(-1)^{j_{1}+j_{2}}(1-\mathbf{e}^{-j_{1}s_{i_{1}}-j_{2}s_{i_{2}}})\\ =\sum_{\stackrel{{\scriptstyle i_{1},i_{2}\in\{1,\dots,d\}}}{{i_{1}<i_{2}}}}\left\{(1-\mathbf{e}^{-s_{i_{1}}})^{q_{i_{1}}}+(1-\mathbf{e}^{-s_{i_{2}}})^{q_{i_{2}}}-(1-\mathbf{e}^{-s_{i_{1}}})^{q_{i_{1}}}(1-\mathbf{e}^{-s_{i_{2}}})^{q_{i_{2}}}\right\}$

⋮
(d-1)

$\displaystyle\sum_{\stackrel{{\scriptstyle i_{1},\dots,i_{d-1}\in\{1,\dots,d\}}}{{i_{1}<\cdots<i_{d-1}}}}\sum_{j_{1}=1}^{q_{i_{1}}}\cdots\sum_{j_{d-1}=1}^{q_{i_{d-1}}}\binom{q_{i_{1}}}{j_{1}}\cdots\binom{q_{i_{d-1}}}{j_{d-1}}(-1)^{j_{1}+\cdots+j_{d-1}}(1-\mathbf{e}^{-j_{1}s_{i_{1}}-\cdots-j_{d-1}s_{i_{d-1}}})\\ =\sum_{\stackrel{{\scriptstyle i_{1},\dots,i_{d-1}\in\{1,\dots,d\}}}{{i_{1}<\cdots<i_{d-1}}}}\Bigg{\{}(-1)^{d-1}\sum_{j=1}^{d-1}(1-\mathbf{e}^{-s_{i_{j}}})^{q_{i_{j}}}+\\ (-1)^{d-2}\sum_{\stackrel{{\scriptstyle j_{1},j_{2}\in\{i_{1},\dots,i_{d-1}\}}}{{j_{1}<j_{2}}}}(1-\mathbf{e}^{-s_{j_{1}}})^{q_{j_{1}}}(1-\mathbf{e}^{-s_{j_{2}}})^{q_{j_{2}}}+\cdots-(1-\mathbf{e}^{-s_{i_{1}}})^{q_{i_{1}}}\cdots(1-\mathbf{e}^{-s_{i_{d-1}}})^{q_{i_{d-1}}}\Bigg{\}}\\$

(d)

$\displaystyle\sum_{j_{1}=1}^{q_{1}}\cdots\sum_{j_{d}=1}^{q_{d}}\binom{q_{1}}{j_{1}}\cdots\binom{q_{d}}{j_{d}}(-1)^{\langle\boldsymbol{1},\boldsymbol{j}\rangle}(1-\mathbf{e}^{-\langle\boldsymbol{j},\boldsymbol{s}\rangle})=(-1)^{d}\sum_{j=1}^{d}(1-\mathbf{e}^{-s_{j}})^{q_{j}}+\\ (-1)^{d-1}\sum_{\stackrel{{\scriptstyle j_{1},j_{2}\in\{1,\dots,d\}}}{{j_{1}<j_{2}}}}(1-\mathbf{e}^{-s_{j_{1}}})^{q_{j_{1}}}(1-\mathbf{e}^{-s_{j_{2}}})^{q_{j_{2}}}+\cdots-(1-\mathbf{e}^{-s_{1}})^{q_{1}}\cdots(1-\mathbf{e}^{-s_{d}})^{q_{d}}$

So we have that (D.20) becomes

[TABLE]

∎
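The $d$ binomial identities (1)-(d) used in the proof of Lemma 3 are instances of a single identity: for $m$ coordinates with exponents $q_{i}\ge 1$ and $s_{i}>0$, the inner sum equals $\sum_{\emptyset\neq S}(-1)^{m-|S|+1}\prod_{i\in S}(1-\mathbf{e}^{-s_{i}})^{q_{i}}$. This follows from $\sum_{j=1}^{q}\binom{q}{j}(-1)^{j}=-1$ and $\sum_{j=1}^{q}\binom{q}{j}(-1)^{j}\mathbf{e}^{-js}=(1-\mathbf{e}^{-s})^{q}-1$, which turn the inner sum into $(-1)^{m}-\prod_{i}\big((1-\mathbf{e}^{-s_{i}})^{q_{i}}-1\big)$. The short Python check below compares both sides numerically for random inputs; it is a sanity check of our reading of the identities, not part of the original proof, and the function names are hypothetical.

```python
import itertools
import math
import random

def inner_sum(q, s):
    """Left-hand side: sum over j_i in {1, ..., q_i} of
    prod_i binom(q_i, j_i) (-1)^{j_i} times (1 - exp(-<j, s>))."""
    total = 0.0
    for j in itertools.product(*(range(1, qi + 1) for qi in q)):
        coef = math.prod(math.comb(qi, ji) * (-1) ** ji for qi, ji in zip(q, j))
        total += coef * (1.0 - math.exp(-sum(ji * si for ji, si in zip(j, s))))
    return total

def subset_expansion(q, s):
    """Right-hand side: sum over non-empty S of
    (-1)^{m - |S| + 1} prod_{i in S} (1 - exp(-s_i))^{q_i}."""
    m = len(q)
    P = [(1.0 - math.exp(-si)) ** qi for qi, si in zip(q, s)]
    total = 0.0
    for size in range(1, m + 1):
        for S in itertools.combinations(range(m), size):
            total += (-1) ** (m - size + 1) * math.prod(P[i] for i in S)
    return total

# Random spot check with m = 3 coordinates
rng = random.Random(0)
q = [rng.randint(1, 4) for _ in range(3)]
s = [rng.uniform(0.1, 3.0) for _ in range(3)]
print(inner_sum(q, s), subset_expansion(q, s))
```

Taking $m=1$ recovers identity (1) coordinatewise, $m=2$ gives (2), and $m=d$ gives (d).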

Define

[TABLE]

so that

[TABLE]

Setting $T_{(0)}=0$ and $\bar{n}_{i,k+1}^{e}=0$ for $i\in\{1,\dots,d\}$, and choosing $\epsilon$ small enough that $t\not\in(T_{(j)}-\epsilon,T_{(j)})$ for all $j\in\{1,\dots,k\}$, we observe that

[TABLE]

So defining

[TABLE]

we get, by the independence property of CRMs,

[TABLE]

We observe that, for $r_{i}=\lambda_{i}\mathbbm{1}_{(0,t]}(T_{(j)})+\bar{n}_{i,j}^{c}+\bar{n}_{i,j+1}^{e}$, $i\in\{1,\dots,d\}$, we have $\min\{r_{1},\dots,r_{d}\}\geq 1$, and, for $j\in\{1,\dots,k\}$ such that $T_{(j)}$ is an exact observation, $\max\{n^{e}_{1,j},\dots,n^{e}_{d,j}\}\geq 1$. Hence Lemma 3 can be applied, yielding

[TABLE]

On the other hand, for $j\not\in\mathcal{J}=\{j\,:\,T_{(j)}\text{ is an exact observation}\}$ we have $n_{i,j}^{e}=0$, so by the continuity of $\eta_{t}(\boldsymbol{s})$ in $t$ we have

[TABLE]

From (D.23), (D.24) and the independence property of CRMs, we obtain

[TABLE]

Also by continuity and independence, defining $\boldsymbol{\lambda}=(\lambda_{1},\dots,\lambda_{d})$ , we get

[TABLE]

So by (D.22), (D.24) and (D.23) we get that

[TABLE]

Similarly,

[TABLE]

Setting $T_{(k+1)}=\infty$, we conclude

[TABLE]

A.5 Simulation Algorithm

We use a Metropolis-within-Gibbs sampler to draw samples from $\sigma|\boldsymbol{\mathcal{D}}$ and $A|\boldsymbol{\mathcal{D}}$ as in Section 4. Recall that Corollary 2 gives the likelihood $l(\sigma,A;\boldsymbol{\mathcal{D}})$, and denote by $p_{\sigma}$ and $p_{A}$ the prior distributions of $\sigma$ and $A$ as in Section 4. Given initial values $\sigma^{(0)}$ and $A^{(0)}$, the algorithm iterates the following two steps for $i=0,1,\dots$:

(1) Draw $A^{(i+1)}$ from a Metropolis-Hastings sampler with proposal distribution $g(x^{\prime}|x)\sim\text{Log-Norm}(\log(x),1)$ and target distribution

[TABLE]

(2) Draw $\sigma^{(i+1)}$ from a Metropolis-Hastings sampler with Uniform proposal distribution and target distribution

[TABLE]

For the fits in Section 4, we used 100 iterations for each inner Metropolis-Hastings sampler and 1000 iterations for the overall Gibbs sampler.
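The two-step sampler can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the log-likelihood of Corollary 2 is passed in as a hypothetical callable `log_lik(sigma, A)`, the prior hyper-parameters (`mu_A`, `s_A` for the log-normal prior and `a_sigma`, `b_sigma` for the Beta prior) are placeholders rather than the values used in Section 4, and the Uniform proposal for $\sigma$ is taken on $(0,1)$ as an independence proposal. We read step (1) as $\log x^{\prime}\sim N(\log x,1)$; because this proposal is not symmetric in $x$, the acceptance ratio includes the Hastings factor $x^{\prime}/x$.

```python
import math
import numpy as np

def metropolis_within_gibbs(log_lik, sigma0, A0, n_outer=1000, n_inner=100,
                            mu_A=0.0, s_A=1.0, a_sigma=2.0, b_sigma=2.0, seed=0):
    """Hypothetical sketch: alternate MH updates of A and sigma and return
    one (sigma, A) pair per outer (Gibbs) iteration."""
    rng = np.random.default_rng(seed)

    def log_post(sig, a):
        # Unnormalized log posterior: likelihood + LogNormal(mu_A, s_A^2) prior on A
        # + Beta(a_sigma, b_sigma) prior on sigma (additive constants dropped)
        lp_A = -math.log(a) - (math.log(a) - mu_A) ** 2 / (2.0 * s_A ** 2)
        lp_sig = (a_sigma - 1.0) * math.log(sig) + (b_sigma - 1.0) * math.log1p(-sig)
        return log_lik(sig, a) + lp_A + lp_sig

    def accept(log_ratio):
        # Accept with probability min(1, exp(log_ratio))
        return rng.uniform() < math.exp(min(0.0, log_ratio))

    sig, a = sigma0, A0
    draws = np.empty((n_outer, 2))
    for it in range(n_outer):
        # Step (1): A' = A * exp(Z), Z ~ N(0, 1); Hastings factor q(A|A') / q(A'|A) = A'/A
        for _ in range(n_inner):
            a_new = a * math.exp(rng.standard_normal())
            if accept(log_post(sig, a_new) - log_post(sig, a) + math.log(a_new / a)):
                a = a_new
        # Step (2): sigma' ~ Uniform(0, 1), an independence proposal (proposal terms cancel)
        for _ in range(n_inner):
            sig_new = rng.uniform()
            if 0.0 < sig_new < 1.0 and accept(log_post(sig_new, a) - log_post(sig, a)):
                sig = sig_new
        draws[it] = (sig, a)
    return draws

# Toy run with a flat likelihood, so the draws should follow the priors
toy_draws = metropolis_within_gibbs(lambda s, a: 0.0, sigma0=0.5, A0=1.0, n_outer=200, n_inner=10)
```

Running several inner Metropolis-Hastings steps per Gibbs sweep, as in the description above, reduces the autocorrelation between successive recorded draws at the cost of more likelihood evaluations.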

A.6 Survival function of $F_{\theta,\lambda}$

Let $\mathcal{C}_{\theta,d}$ be a $d$-dimensional distributional Clayton copula and $\tilde{F}_{i}$, $i=1,\ldots,d$, a collection of marginal cumulative distribution functions; then the survival function associated with the distributional Clayton copula and these marginals is given by

[TABLE]

see Section 2.6 in [18].
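A standard way to evaluate such a survival function is inclusion-exclusion: $\mathbb{P}[Y_{1}>y_{1},\dots,Y_{d}>y_{d}]=\sum_{S\subseteq\{1,\dots,d\}}(-1)^{|S|}\,\mathcal{C}_{\theta,d}(\boldsymbol{u}^{S})$, where $\boldsymbol{u}^{S}$ has entries $\tilde{F}_{i}(y_{i})$ for $i\in S$ and $1$ otherwise, and $\mathcal{C}_{\theta,d}(\boldsymbol{u}^{\emptyset})=1$. The Python sketch below assumes the standard Clayton form $\mathcal{C}_{\theta,d}(\boldsymbol{u})=(\sum_{i}u_{i}^{-\theta}-d+1)^{-1/\theta}$ with $\theta>0$ and exponential marginals $\tilde{F}_{i}(y)=1-\mathbf{e}^{-\lambda y}$; the function name is hypothetical, and the closed form displayed above may be arranged differently.

```python
import itertools
import math

def clayton_exponential_survival(y, theta, lam):
    """P[Y_1 > y_1, ..., Y_d > y_d] for a Clayton(theta) copula with
    Exponential(lam) marginals, by inclusion-exclusion over subsets S.
    Requires y_i > 0 so that every marginal CDF value is positive."""
    u = [1.0 - math.exp(-lam * yi) for yi in y]  # marginal CDF values F(y_i)
    total = 0.0
    for size in range(len(u) + 1):
        for S in itertools.combinations(range(len(u)), size):
            if size == 0:
                C = 1.0  # copula evaluated at (1, ..., 1)
            else:
                # Clayton copula with u_i = 1 outside S: entries equal to 1 contribute
                # 1^(-theta) = 1, which cancels against the "- d + 1" term
                C = (sum(u[i] ** (-theta) for i in S) - size + 1.0) ** (-1.0 / theta)
            total += (-1) ** size * C
    return total

print(clayton_exponential_survival([0.4, 0.9], theta=2.0, lam=1.0))
```

For $d=1$ this reduces to $1-\tilde{F}(y)=\mathbf{e}^{-\lambda y}$, and for $d=2$ to $1-u_{1}-u_{2}+\mathcal{C}_{\theta,2}(u_{1},u_{2})$.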

Bibliography


[1] Aalen, O., Borgan, O., and Gjessing, H. (2008). Survival and event history analysis: a process point of view. Springer Science & Business Media.
[2] Cont, R. and Tankov, P. (2004). Financial modelling with jump processes. Chapman & Hall.
[3] De Finetti, B. (1938). Sur la condition d'équivalence partielle. Colloque consacré à la théorie des probabilités, Vol. VI, Université de Genève, Hermann et Cie, Paris.
[4] De Iorio, M., Johnson, W. O., Müller, P., and Rosner, G. L. (2009). Bayesian nonparametric nonproportional hazards survival modeling. Biometrics, 65, 762-771.
[5] Doksum, K. (1974). Tailfree and neutral random probabilities and their posterior distributions. The Annals of Probability, 2, 183-201.
[6] Dykstra, R. L. and Laud, P. (1981). A Bayesian nonparametric approach to reliability. The Annals of Statistics, 9, 356-367.
[7] Epifani, I. and Lijoi, A. (2010). Nonparametric priors for vectors of survival functions. Statistica Sinica, 20, 1455-1484.
[8] Ferguson, T. S. and Phadia, E. G. (1979). Bayesian nonparametric estimation based on censored data. The Annals of Statistics, 7, 163-186.
