The Mixture of Markov Jump Processes: Monte Carlo Method and the EM Estimation

H. Frydman; B.A. Surya

arXiv:1812.07730·math.ST·February 4, 2019



TL;DR

This paper introduces a Monte Carlo method and EM algorithm for statistical estimation of a mixture of Markov jump processes with unobservable regimes, enabling better modeling of complex stochastic systems.

Contribution

It provides a novel Monte Carlo simulation approach and an EM-based estimation procedure for a generalized mixture of Markov jump processes, extending previous models.

Findings

01 The Monte Carlo method accurately simulates the process.

02 The EM algorithm effectively estimates the model parameters.

03 Numerical examples demonstrate the performance of the method.

Abstract

This paper discusses the tractable development and statistical estimation of a continuous-time stochastic process with a finite state space that does not have the Markov property. The process is formed by a finite mixture of right-continuous Markov jump processes moving at different speeds on the same finite state space, where the speed regimes are assumed to be unobservable. The mixture was first proposed by Frydman (J. Am. Stat. Assoc., 100, 1046-1053, 2005) and recently generalized in Surya (Stoch. Syst. 8, 29-44, 2018), in which distributional properties and explicit identities of the process are given in full generality. The contribution of this paper is twofold. First, we present a Monte Carlo method for constructing the process and show distributional equivalence between the simulated process and the actual process. Secondly, we perform statistical inference on the distribution…

Equations

\tau=\inf\{t\geq 0:X_{t}=\Delta\}\quad\textrm{and}\quad\overline{F}(t)=\mathbb{P}\{\tau>t\}.

\mathbf{Q}=\left(\begin{array}[]{cc}\mathbf{T}&-\mathbf{T}\mathbb{1}\\ \mathbf{0}&0\\ \end{array}\right),


q_{ii}\leq 0,\quad q_{ij}\geq 0,\quad\sum_{j\neq i}q_{ij}=-q_{ii}=q_{i},\quad(i,j)\in\mathbb{S}.

\mathbf{P}(t)=\exp(\mathbf{Q}t),\quad t\geq 0.

\overline{F}(t)=\boldsymbol{\pi}^{\top}e^{\mathbf{T}t}\mathbb{1}\quad\textrm{and}\quad f(t)=-\boldsymbol{\pi}^{\top}e^{\mathbf{T}t}\mathbf{T}\mathbb{1}.

\tau_{k}:=\inf\{t\geq 0:X_{t}\in\Gamma_{k}\}.

\begin{split}\overline{F}(t_{1},...,t_{p})=&\mathbb{P}\{\tau_{1}>t_{1},...,\tau_{p}>t_{p}\}\\ =&\boldsymbol{\pi}^{\top}\prod_{k=1}^{p}\exp\big(\mathbf{T}(t_{i_{k}}-t_{i_{k-1}})\big)\mathbf{H}_{i_{k}}\mathbb{1},\end{split}

\begin{split}\overline{F}_{i,t}(t_{1},...,t_{p})=&\mathbb{P}\big\{\tau_{1}>t_{1},...,\tau_{p}>t_{p}\big|\mathcal{F}_{t,i}\big\}\\ \overline{F}_{t}(t_{1},...,t_{p})=&\mathbb{P}\big\{\tau_{1}>t_{1},...,\tau_{p}>t_{p}\big|\mathcal{G}_{t}\big\},\end{split}

X=\begin{cases}X^{(1)},&\phi=1\\ \;\vdots&\\ X^{(m)},&\phi=m\end{cases}

X(t)=\sum_{k=1}^{m}\Phi^{(k)}X^{(k)}(t)\quad\textrm{with}\quad\sum_{k=1}^{m}\Phi^{(k)}=1.

s_{i_{0}}^{(k)}=\mathbb{P}\{\phi=k|X_{0}=i_{0}\}\quad\textrm{with}\quad\sum_{k=1}^{m}s_{i_{0}}^{(k)}=1,

\begin{split}\mathbf{L}_{i,j}^{Q^{(k)}}(t):=\mathbb{P}\{\mathcal{F}_{t,j}|\phi=k,X_{0}=i\}=\prod_{l\in\mathbb{S}}\exp\big{(}-q_{l}^{(k)}T_{l}\big{)}\prod_{j\neq l,j\in\mathbb{S}}(q_{lj}^{(k)})^{N_{lj}},\end{split}


s_{j}^{(k)}(t)=\mathbb{P}\{\phi=k|\mathcal{F}_{t,j}\},\quad\textrm{with}\quad\sum_{k=1}^{m}s_{j}^{(k)}(t)=1,\quad\textrm{for}\;\;j\in\mathbb{S},\;t\geq 0.

\widetilde{\mathbf{S}}^{(k)}(t)=\left(\begin{array}[]{cc}\mathbf{S}^{(k)}(t)&\mathbf{0}\\ \mathbf{0}&s_{n+1}^{(k)}(t)\\ \end{array}\right),\;\;\textrm{s.t.}\;\;\sum_{k=1}^{m}\widetilde{\mathbf{S}}^{(k)}(t)=\mathbf{I},\;\;\textrm{for}\;\;t\geq 0,


s_{j}^{(k)}(t)=\frac{\boldsymbol{\pi}^{\top}\mathbf{S}^{(k)}\mathbf{L}^{Q^{(k)}}(t)\mathbf{e}_{j}}{\sum_{k=1}^{m}\boldsymbol{\pi}^{\top}\mathbf{S}^{(k)}\mathbf{L}^{Q^{(k)}}(t)\mathbf{e}_{j}},\quad k=1,...,m.

s_{j}^{(k)}(t)=\frac{s_{i_{0}}^{(k)}\mathbf{L}_{i_{0},j}^{Q^{(k)}}(t)}{\sum_{k=1}^{m}s_{i_{0}}^{(k)}\mathbf{L}_{i_{0},j}^{Q^{(k)}}(t)}.

s_{j}^{(k)}(t)=\frac{\widetilde{\boldsymbol{\pi}}^{\top}\widetilde{\mathbf{S}}^{(k)}\exp\big{(}\mathbf{Q}^{(k)}t\big{)}\mathbf{e}_{j}}{\sum_{k=1}^{m}\widetilde{\boldsymbol{\pi}}^{\top}\widetilde{\mathbf{S}}^{(k)}\exp\big{(}\mathbf{Q^{(k)}}t\big{)}\mathbf{e}_{j}}.


s_{j}^{(k)}(t)=\frac{\mathbf{e}_{i_{0}}^{\top}\widetilde{\mathbf{S}}^{(k)}\exp\big{(}\mathbf{Q}^{(k)}t\big{)}\mathbf{e}_{j}}{\sum_{k=1}^{m}\mathbf{e}_{i_{0}}^{\top}\widetilde{\mathbf{S}}^{(k)}\exp\big{(}\mathbf{Q}^{(k)}t\big{)}\mathbf{e}_{j}}.


\mathbb{P}\{\mathcal{F}_{t,j},\phi=k\}=

s_{j}^{(k)}(t)=\mathbb{P}\{\phi=k|\mathcal{F}_{t,j}\}=\frac{\mathbb{P}\{\mathcal{F}_{t,j},\phi=k\}}{\sum_{k=1}^{m}\mathbb{P}\{\mathcal{F}_{t,j},\phi=k\}}.\qquad\square

\exp\big{(}\mathbf{Q}^{(k)}t\big{)}=\sum_{l=1}^{n+1}\exp\big{(}\lambda_{l}^{(k)}t\big{)}\prod_{j=1,j\neq l}^{n+1}\Big{(}\frac{\mathbf{Q}^{(k)}-\lambda_{j}^{(k)}\mathbf{I}}{\lambda_{l}^{(k)}-\lambda_{j}^{(k)}}\Big{)},


\lim_{t\rightarrow\infty}s_{j}^{(k)}(t)=\begin{cases}1,&\text{if $\overline{\lambda}=\lambda_{i_{k}}^{(k)}$}\\ \frac{\widetilde{\boldsymbol{\pi}}^{\top}\widetilde{\mathbf{S}}^{(k)}\mathcal{L}[\mathbf{Q}^{(k)}]\mathbf{e}_{j}}{\widetilde{\boldsymbol{\pi}}^{\top}\big{(}\widetilde{\mathbf{S}}^{(k)}\mathcal{L}[\mathbf{Q}^{(k)}]+\widetilde{\mathbf{S}}^{(l)}\mathcal{L}[\mathbf{Q}^{(l)}]\big{)}\mathbf{e}_{j}},&\text{if $\lambda_{i_{k}}^{(k)}=\lambda_{i_{l}}^{(l)}=\overline{\lambda},l\neq k$}\\ \frac{\widetilde{\boldsymbol{\pi}}^{\top}\widetilde{\mathbf{S}}^{(k)}\mathcal{L}[\mathbf{Q}^{(k)}]\mathbf{e}_{j}}{\sum_{k=1}^{m}\widetilde{\boldsymbol{\pi}}^{\top}\widetilde{\mathbf{S}}^{(k)}\mathcal{L}[\mathbf{Q}^{(k)}]\mathbf{e}_{j}},&\text{if $\lambda_{i_{k}}^{(k)}=\lambda_{i_{l}}^{(l)}=\overline{\lambda},\forall l\neq k$,}\end{cases}


\pi_{j}(t)=\frac{\sum_{k=1}^{m}\boldsymbol{\pi}^{\top}\mathbf{S}^{(k)}\mathbf{L}^{Q^{(k)}}(t)\mathbf{e}_{j}}{\sum_{k=1}^{m}\boldsymbol{\pi}^{\top}\mathbf{S}^{(k)}\mathbf{L}^{Q^{(k)}}(t)\mathbb{1}}.

\pi_{j}(t)=\frac{\sum_{k=1}^{m}s_{i_{0}}^{(k)}\mathbf{L}_{i_{0},j}^{Q^{(k)}}(t)}{\sum_{j\in\mathbb{S}}\sum_{k=1}^{m}s_{i_{0}}^{(k)}\mathbf{L}_{i_{0},j}^{Q^{(k)}}(t)}.

\displaystyle\widetilde{\pi}_{j}(t)=\sum_{k=1}^{m}\widetilde{\boldsymbol{\pi}}^{\top}\widetilde{\mathbf{S}}^{(k)}\exp\big{(}\mathbf{Q}^{(k)}t\big{)}\mathbf{e}_{j}.


\displaystyle\widetilde{\pi}_{j}(t)=\sum_{k=1}^{m}\mathbf{e}_{i_{0}}^{\top}\widetilde{\mathbf{S}}^{(k)}\exp\big{(}\mathbf{Q}^{(k)}t\big{)}\mathbf{e}_{j}.


\mathbb{P}\{\mathcal{F}_{t,j},\phi=k,X_{0}=i\}=

\mathbb{P}\{\mathcal{F}_{t,j}\}=

\pi_{j}(t)=\mathbb{P}\{X_{t}=j|\mathcal{G}_{t}\}=\frac{\mathbb{P}\{\mathcal{F}_{t,j}\}}{\sum_{k\in\mathbb{S}}\mathbb{P}\{\mathcal{F}_{t,k}\}},


Taxonomy

Topics: Statistical Distribution Estimation and Applications · Probabilistic and Robust Engineering Design · Advanced Statistical Process Monitoring

Full text

Conditional Joint Probability Distributions for the Mixture of Markov Jump Processes

B.A. Surya (School of Mathematics and Statistics, Victoria University of Wellington, Gate 6 Kelburn PDE, Wellington 6140, New Zealand. Email address: [email protected])

School of Mathematics and Statistics

Victoria University of Wellington, New Zealand

(12 May 2018)

Abstract

New results on conditional joint probability distributions of first exit times are presented for a continuous-time stochastic process defined as the mixture of Markov jump processes moving at different speeds on the same finite state space, while the mixture occurs at a random time. Such a mixture was first proposed by Frydman [21] and Frydman and Schuermann [20] as a generalization of the mover-stayer model of Blumen et al. [17], and was recently extended in Surya [37], in which further explicit distributional identities of the process are given, in particular in the presence of an absorbing state. We revisit [37] for a finite mixture with different overlapping absorbing sets. The contribution of this paper is twofold. First, we generalize distributional properties of the mixture process discussed in [21], [20] and [37]. Secondly, we give distributional identities of the first exit times to the absorbing sets of the process explicitly in terms of the intensity matrices of the underlying Markov processes and the Bayesian updates of the switching probability and the probability distribution of states, despite the fact that the process itself is non-Markov. These identities form non-stationary functions of time and can capture heterogeneity and path dependence when conditioning on the available information (either full or partial) of the process. In particular, the initial profile of the distributions forms a generalized mixture of the multivariate phase-type distributions of Assaf et al. [8]. When the underlying processes move at the same speed, in which case the mixture becomes a simple Markov process, these features are removed, and the initial distributions reduce to [8]. Some explicit and numerical examples are discussed to illustrate the main results.

MSC2010 Subject Classification: 60J20, 60J27, 60J28, 62N99

Keywords: Markov jump process, mixture of Markov jump processes, first exit times, conditional multivariate distributions, phase-type model

1 Introduction

The Markov process has been one of the most important probabilistic tools for modeling the dynamics of complex stochastic systems. It has been widely used in a variety of applications across various fields, including modeling vegetation dynamics (Balzter [11]) and demography (Nowak [32]); marketing, to model consumer relationships (Berger and Nasr [13] and Pfeifer and Carraway [33]) and to identify the substitution behavior of customers in the assortment problem (Blanchet et al. [16]); describing credit rating transitions used in many credit risk and pricing applications (Jarrow and Turnbull [26], Jarrow et al. [25], Bielecki et al. [14]); and queueing networks and performance engineering (Bolch et al. [18]).

One of the key variables in the analysis of stochastic systems is the time until an event occurs (the lifetime of the system), for example, the lifetime of a corporate bond [25], a customer relationship (Ma et al. [29]), networks [18], etc. It represents the first exit time to an absorbing set of the underlying Markov process. Its distribution is usually referred to as the phase-type distribution, which was first introduced in univariate form by Neuts [31] in 1975 as a generalization of the Erlang distribution. The class of phase-type distributions is dense, in the sense that it can approximate any distribution of a positive random variable arbitrarily well, and it is closed under finite convex mixtures and convolutions. When the jumps of a compound Poisson process have a phase-type distribution, the result is a dense class of Lévy processes; see Asmussen [6]. The advantage of working with phase-type distributions is that they allow analytically tractable results in applications, for example in option pricing (Asmussen et al. [5]), actuarial science (Albrecher and Asmussen [7], Rolski et al. [35], Zadeh et al. [40]), survival analysis (Aalen [2], Aalen and Gjessing [1]), queueing theory (Chakravarthy and Neuts [19], Asmussen [6]), and reliability theory (Assaf and Levikson [9], Okamura and Dohi [38]).

The phase-type distribution $\overline{F}$ is expressed in terms of a Markov jump process $\{X_{t}\}_{t\geq 0}$ with a finite state space $\mathbb{S}=E\cup\{\Delta\}$ , where for some integer $n\geq 1$ , $E=\{i:i=1,...,n\}$ and $\Delta$ represent respectively the transient and absorbing states. We also refer to $\Delta$ as the $(n+1)$ th element of $\mathbb{S}$ , i.e., $\Delta=n+1$ . The first exit time of $X$ to the absorbing state and its distribution are defined by

$$\tau=\inf\{t\geq 0:X_{t}=\Delta\}\quad\textrm{and}\quad\overline{F}(t)=\mathbb{P}\{\tau>t\}.\tag{1.1}$$

In view of credit risk applications, the state space $\mathbb{S}$ represents the possible credit classes, with $1$ being the highest (Aaa in Moody’s rankings) and $n$ being the lowest (C in Moody’s rankings), whilst the absorbing state $\Delta$ represents bankruptcy, D. The distribution $\pi_{k}$ represents the proportion of homogeneous bonds in the rating $k$ . We refer to [26] and [25] and literature therein for details.

Unless stated otherwise, we denote by $\widetilde{\boldsymbol{\pi}}=(\boldsymbol{\pi},\pi_{\Delta})$ the initial probability of starting $X$ in any of the $n+1$ phases. For simplicity, we assume that $\pi_{\Delta}=0$, so that $\mathbb{P}\{\tau>0\}=1$. The speed at which the Markov process moves along the state space $\mathbb{S}$ is described by an intensity matrix $\mathbf{Q}$. Partitioned according to whether the process is in the transient states $E$ or in the absorbing state $\Delta$, this matrix admits the following block form:

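$$\mathbf{Q}=\left(\begin{array}{cc}\mathbf{T}&-\mathbf{T}\mathbb{1}\\ \mathbf{0}&0\\ \end{array}\right),\tag{1.2}$$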

with $\mathbb{1}=(1,...,1)^{\top}$, as the rows of the intensity matrix $\mathbf{Q}$ sum to zero. That is to say, the entries $q_{ij}$ of the matrix $\mathbf{Q}$ satisfy the following properties:

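$$q_{ii}\leq 0,\quad q_{ij}\geq 0,\quad\sum_{j\neq i}q_{ij}=-q_{ii}=q_{i},\quad(i,j)\in\mathbb{S}.\tag{1.3}$$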

See Chapter II of Asmussen [6] for more details on Markov jump processes. Since the states in $E$ are transient, $-\mathbf{T}\mathbb{1}$ is a non-negative vector, and $\mathbb{1}^{\top}\mathbf{T}\mathbb{1}<0$, the condition (1.3) implies that $\mathbf{T}$ is a negative definite matrix; see Section II4d of [6]. The matrix $\mathbf{T}$ is known as the phase generator matrix of $\mathbf{Q}$. Absorption is certain if and only if $\mathbf{T}$ is nonsingular; see Neuts [30].
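The conditions (1.3) and the block form (1.2) can be checked mechanically. The following is a minimal sketch in plain Python, not code from the paper: the function names `is_intensity_matrix` and `split_blocks`, the tolerance, and the example rates are all illustrative assumptions, and states are indexed from 0 with the absorbing state $\Delta$ placed last.

```python
def is_intensity_matrix(Q, tol=1e-12):
    """Check (1.3): non-positive diagonal, non-negative off-diagonal, rows summing to zero."""
    n = len(Q)
    for i, row in enumerate(Q):
        if len(row) != n:
            return False
        if row[i] > tol:
            return False
        if any(q < -tol for j, q in enumerate(row) if j != i):
            return False
        if abs(sum(row)) > tol:
            return False
    return True


def split_blocks(Q):
    """Return (T, exit_rates) for the block form (1.2), absorbing state last.

    T is the transient block and exit_rates is the vector -T 1,
    i.e. the intensities of jumping to the absorbing state.
    """
    n = len(Q) - 1
    T = [row[:n] for row in Q[:n]]
    exit_rates = [-sum(row) for row in T]
    return T, exit_rates


# Hypothetical example: two transient states (0, 1) and one absorbing state (2).
Q = [[-1.0, 0.75, 0.25],
     [0.5, -1.5, 1.0],
     [0.0, 0.0, 0.0]]
T, exit_rates = split_blocks(Q)
```

In this example `exit_rates` coincides with the last column of the transient rows of `Q`, which is exactly the statement that the rows of $\mathbf{Q}$ sum to zero.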

Following Theorem 3.4 and Corollary 3.5 in [6] and by the homogeneity of $X$ , the transition probability matrix $\mathbf{P}(t)$ of $X$ over the period of time $(0,t)$ is

$$\mathbf{P}(t)=\exp(\mathbf{Q}t),\quad t\geq 0.\tag{1.4}$$

The entry $q_{ij}$ has a probabilistic interpretation: $1/(-q_{ii})$ is the expected length of time that $X$ remains in state $i\in E$, and $q_{ij}/q_{i}$ is the probability that, when a transition out of state $i$ occurs, it is to state $j\in\mathbb{S}$, $j\neq i$. The representation of the distribution $\overline{F}$ is uniquely specified by $(\boldsymbol{\pi},\mathbf{T})$. We refer, among others, to Neuts [30] and Asmussen [6] for details. Following [30] and Proposition 4.1 of [6],

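$$\overline{F}(t)=\boldsymbol{\pi}^{\top}e^{\mathbf{T}t}\mathbb{1}\quad\textrm{and}\quad f(t)=-\boldsymbol{\pi}^{\top}e^{\mathbf{T}t}\mathbf{T}\mathbb{1}.\tag{1.5}$$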
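As a numerical illustration, the phase-type survival function $\overline{F}(t)=\boldsymbol{\pi}^{\top}e^{\mathbf{T}t}\mathbb{1}$ and density $f(t)=-\boldsymbol{\pi}^{\top}e^{\mathbf{T}t}\mathbf{T}\mathbb{1}$ can be evaluated with a small matrix exponential routine. The sketch below uses only the Python standard library; the helpers `mat_exp`, `survival` and `density` are hypothetical names, and the truncated Taylor series with scaling and squaring is one simple choice among many, not the method of the paper. The Erlang example is an assumption chosen because its closed form is known.

```python
import math


def mat_mul(A, B):
    """Plain dense matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def mat_exp(A, terms=30):
    """exp(A) via scaling and squaring with a truncated Taylor series."""
    n = len(A)
    norm = max(sum(abs(x) for x in row) for row in A)
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scale = 2.0 ** squarings
    B = [[x / scale for x in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, B)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def survival(pi, T, t):
    """F_bar(t) = pi^T exp(T t) 1."""
    E = mat_exp([[x * t for x in row] for row in T])
    return sum(pi[i] * sum(E[i]) for i in range(len(pi)))


def density(pi, T, t):
    """f(t) = -pi^T exp(T t) T 1."""
    E = mat_exp([[x * t for x in row] for row in T])
    exit_rates = [-sum(row) for row in T]  # the vector -T 1
    return sum(pi[i] * sum(E[i][j] * exit_rates[j] for j in range(len(T)))
               for i in range(len(pi)))


# Hypothetical example: Erlang distribution with two phases and rate lam.
lam = 1.5
pi = [1.0, 0.0]
T = [[-lam, lam],
     [0.0, -lam]]
```

For this Erlang example the closed forms are $\overline{F}(t)=e^{-\lambda t}(1+\lambda t)$ and $f(t)=\lambda^{2}te^{-\lambda t}$, which gives a direct check of the two functions.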

The extension of (1.5) to multivariate form was proposed by Assaf et al. [8] and later by Kulkarni [27]. Following [8], let $\Gamma_{1},...,\Gamma_{p}$ be nonempty stochastically closed subsets of $\mathbb{S}$ such that $\cap_{k=1}^{p}\Gamma_{k}$ is a proper subset of $\mathbb{S}$ . ( $\Gamma_{i}\subset\mathbb{S}$ is said to be stochastically closed if once $X$ enters $\Gamma_{i}$ , it never leaves.) We assume without loss of generality that $\cap_{k=1}^{p}\Gamma_{k}$ consists of only the absorbing state $\Delta$ , i.e., $\cap_{k=1}^{p}\Gamma_{k}=\Delta$ . Since $\Gamma_{k}$ is stochastically closed, necessarily $q_{ij}=0$ if $i\in\Gamma_{k}$ and $j\in\Gamma_{k}^{c}$ .

The first exit time of $X$ to the stochastically closed set $\Gamma_{k}$ is defined by

$$\tau_{k}:=\inf\{t\geq 0:X_{t}\in\Gamma_{k}\}.\tag{1.6}$$

The joint distribution $\overline{F}$ of $\{\tau_{k}\}$ is called the multivariate phase type distribution, see [8]. Let $t_{i_{p}}\geq\dots\geq t_{i_{1}}\geq 0$ be the ordering of $(t_{1},...,t_{p})\in\mathbb{R}_{+}^{p}$ . Following [8],

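$$\begin{split}\overline{F}(t_{1},...,t_{p})=&\mathbb{P}\{\tau_{1}>t_{1},...,\tau_{p}>t_{p}\}\\ =&\boldsymbol{\pi}^{\top}\prod_{k=1}^{p}\exp\big(\mathbf{T}(t_{i_{k}}-t_{i_{k-1}})\big)\mathbf{H}_{i_{k}}\mathbb{1},\end{split}\tag{1.7}$$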

where $\mathbf{H}_{i_{k}}$ is an $(n\times n)$ diagonal matrix whose $i$th diagonal element, for $i=1,...,n$, equals $1$ when $i\in\Gamma_{i_{k}}^{c}$ and is zero otherwise. As before, we assume that $\widetilde{\boldsymbol{\pi}}$ has zero mass on $\Delta$ and $\pi_{i}\neq 0$ for $i\in\bigcap_{k=1}^{p}\Gamma_{k}^{c}$, implying that $\mathbb{P}\{\tau_{1}>0,...,\tau_{p}>0\}=1$.

The multivariate distribution (1.7) has found various applications, e.g., in modeling credit default contagion (Herbertsson [24], Bielecki et al. [14]), in modeling aggregate loss distributions in insurance (Berdel and Hipp [12], Asimit and Jones [3] and Willmot and Woo [39]), and in queueing theory (Badila et al. [10]).

Due to spatial homogeneity of the Markov process, the distributions (1.5) and (1.7) are stationary and are unable to capture heterogeneity or the available information about the past of the process. In their empirical work, Frydman [21] and Frydman and Schuermann [20] found that bonds of the same credit rating, represented by a state of the Markov process, can move at different speeds to other ratings. In addition, the inclusion of past credit ratings improves the out-of-sample prediction of the Nelson-Aalen estimate of credit default intensity. These empirical findings suggest that the credit rating dynamics [25] can be represented by a mixture of Markov jump processes moving at different speeds, where the mixture itself is non-Markov. However, the analyses performed in [21] and [20] were based on a special structure of the mixture process in which each underlying process has the same probability of moving from one state to another. Surya [37] revisited the mixture model of [21] and [20] and gave further explicit distributional identities of the mixture, in particular in the presence of an absorbing state.

This paper attempts to extend [37] by relaxing the assumptions of [21] and [20] for a finite mixture of Markov jump processes with overlapping absorbing sets moving at different speeds. The main contribution of this paper is twofold. First, we give distributional properties of the mixture process $X$ in the general case, in particular the Bayesian update of the probability distribution of starting the process at any given time $t\geq 0$. Secondly, we derive the joint probability distributions of the first exit times $\{\tau_{k}\}$ (1.6) of $X$, conditional on the available (either full or partial) information $\mathcal{F}_{t,i}=\mathcal{F}_{t-}\cup\{X_{t}=i\}$, with $\mathcal{F}_{t-}=\{X_{s}:0\leq s\leq t-\}$, of the process. Using these results, we derive the joint probability distributions of $\{\tau_{k}\}$ conditional on the information $\mathcal{G}_{t}:=\mathcal{F}_{t-}\cup\{X_{t}\neq\Delta\}$, that is, knowing all previous observations of the process and given that it is still "alive" at a given time $t\geq 0$, i.e., $\mathcal{G}_{t}=\bigcup_{i\in E}\mathcal{F}_{t,i}$. We write $\mathcal{G}_{t}=\mathcal{F}_{t-}$ if the only available information is the past observation $\mathcal{F}_{t-}$. Conditional on $\mathcal{F}_{t,i}$ and $\mathcal{G}_{t}$, we derive explicit formulas for

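$$\begin{split}\overline{F}_{i,t}(t_{1},...,t_{p})=&\mathbb{P}\big\{\tau_{1}>t_{1},...,\tau_{p}>t_{p}\big|\mathcal{F}_{t,i}\big\}\\ \overline{F}_{t}(t_{1},...,t_{p})=&\mathbb{P}\big\{\tau_{1}>t_{1},...,\tau_{p}>t_{p}\big|\mathcal{G}_{t}\big\},\end{split}\tag{1.8}$$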

for the mixture process $X$ , with $n\geq 1$ , $i\in E\subseteq\mathbb{S}$ and $0\leq t\leq\min\{t_{1},...,t_{p}\}$ . Unless the underlying Markov processes move at the same speed, we show that the initial profile of the joint distributions (1.8) forms a generalized mixture of (1.7). Under partial information, given the process is still alive in the long run, we give the corresponding limiting (stationary) distributions of (1.8) as $t\rightarrow\infty$ .

From the credit risk point of view (see, e.g., [25], [15], [14], [24]), the quantity $\overline{F}_{i,t}(t_{1},...,t_{p})$ describes the joint probability distribution of the first exit times $\{\tau_{k}\}$ of $i$-rated bonds, due to cause-specific exits (default, prepayment, calling back, debt retirement, etc.), conditional on the credit rating history up to a given time $t$, whilst the function $\overline{F}_{t}(t_{1},...,t_{p})$ determines the joint probability distribution of the bonds' exit times $\{\tau_{k}\}$ across credit ratings viewed at time $t$. In the framework of competing risks (see, e.g., Pintilie [34]), for the observed exit time $\tau:=\min\{\tau_{1},...,\tau_{p}\}$ and the reason of exit $\boldsymbol{\xi}=\textrm{argmin}\{\tau_{1},...,\tau_{p}\}$, the probability $\mathbb{P}\{t\leq\tau\leq s,\boldsymbol{\xi}=1|\mathcal{F}_{t,i}\}$ determines the proportion of $i$-rated bonds exiting by type $1$ from the credit portfolio within the time interval $[t,s]$, whilst $\mathbb{P}\{t\leq\tau\leq s,\boldsymbol{\xi}=1|\mathcal{G}_{t}\}$ represents the percentage of bonds exiting by type $1$.

The organization of this paper is as follows. Section 2 discusses distributional properties of the Markov mixture process which generalize the results of [20] and [37], in particular on the Bayesian update on the probability of starting the process in any state at given time $t\geq 0$ . The main contributions of this paper are given in Section 3, where explicit forms of the conditional probability distributions and their Laplace transforms are presented. Some explicit examples are discussed in Section 4, in which we show that the exit times $\{\tau_{k}\}$ are independent under the Markov model, but not necessarily for the mixture model. Also in this section, we discuss numerical examples of the main results for bivariate distributions of birth-death mixture processes. Section 5 concludes this paper.

2 Mixture of Markov jump processes

Throughout the remainder of this paper we denote by $X=\{X_{t}^{(\phi)},t\geq 0\}$ the Markov mixture process, which is a continuous-time stochastic process defined as a finite mixture of Markov jump processes $X^{(k)}=\{X_{t}^{(k)}:t\geq 0\}$, with $k=1,\dots,m$, whose intensity matrices are given by $\{\mathbf{Q}^{(k)}\}$. We assume that the underlying Markov processes $\{X^{(k)}\}$ have right-continuous sample paths and are defined on the same finite state space $\mathbb{S}=\{1,\dots,n+1\}$. The process $X$ is defined by

$$X=\begin{cases}X^{(1)},&\phi=1\\ \;\vdots&\\ X^{(m)},&\phi=m\end{cases}$$

where the variable $\phi$ represents the speed regimes, assumed to be unobservable.

More conveniently, we can represent $X$ in terms of the underlying processes $\{X^{(k)}\}$ as follows. Define an indicator variable $\Phi^{(k)}=\mathbb{1}_{\{\phi=k\}}$ . Thus, for $t\geq 0$ ,

$$X(t)=\sum_{k=1}^{m}\Phi^{(k)}X^{(k)}(t)\quad\textrm{with}\quad\sum_{k=1}^{m}\Phi^{(k)}=1.$$

It is clear that $X$ (2.1) represents a finite mixture of Markov processes $X^{(k)}$ .

For a given initial state $i_{0}\in\mathbb{S}$ , there is a separate mixing probability

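$$s_{i_{0}}^{(k)}=\mathbb{P}\{\phi=k|X_{0}=i_{0}\}\quad\textrm{with}\quad\sum_{k=1}^{m}s_{i_{0}}^{(k)}=1,\tag{2.2}$$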

and $0\leq s_{i_{0}}^{(k)}\leq 1$. The quantity $s_{i_{0}}^{(k)}$ can be interpreted as the proportion of the population (e.g. bonds) with initial state $i_{0}$ evolving according to $X^{(k)}$. In general, $X^{(k)}$ and $X^{(l)}$, $k\neq l$, have different expected lengths of occupation time of a state $i$, i.e., $1/q_{i}^{(k)}\neq 1/q_{i}^{(l)}$, and have different probabilities of leaving the state $i\in E$ to state $j\in\mathbb{S}$, $j\neq i$, i.e. $q_{ij}^{(k)}/q_{i}^{(k)}\neq q_{ij}^{(l)}/q_{i}^{(l)}$. Note that we have used $q_{i}^{(k)}$ and $q_{ij}^{(k)}$ to denote, respectively, the negative of the $i$th diagonal element and the $(i,j)$ entry of $\mathbf{Q}^{(k)}$.

The Markov mixture process is a generalization of the mover-stayer model, a mixture of two discrete-time Markov chains proposed by Blumen et al. [17] in 1955 to model population heterogeneity in the labor market. In the mover-stayer model [17], the population of workers consists of stayers (workers who always stay in the same job category, $\mathbf{Q}^{(1)}=\mathbf{0}$) and movers (workers who move to other jobs according to a stationary Markov chain with intensity matrix $\mathbf{Q}^{(2)}$). Estimation of the mover-stayer model was discussed in Frydman [22]. Frydman [21] generalized the model to a finite mixture of Markov chains moving at different speeds. Frydman and Schuermann [20] later used the result for the mixture of two Markov jump processes with intensity matrices $\mathbf{Q}^{(1)}$ and $\mathbf{Q}^{(2)}=\boldsymbol{\Psi}\mathbf{Q}^{(1)}$, where $\boldsymbol{\Psi}$ is a diagonal matrix, to model the dynamics of firms' credit ratings. Depending on whether $\psi_{i}:=[\boldsymbol{\Psi}]_{i,i}$ satisfies $\psi_{i}=0$, $0<\psi_{i}<1$, $\psi_{i}>1$ or $\psi_{i}=1$, the process $X^{(2)}$ respectively never moves out of state $i$ (the mover-stayer model), or moves out of state $i$ at a lower rate, a higher rate, or the same rate as $X^{(1)}$. If $\psi_{i}=1$ for all $i\in\mathbb{S}$, the mixture process $X$ reduces to a simple Markov jump process $X^{(1)}$.

Figure 1 illustrates the transition of $X$ for the mixture of two Markov jump processes moving from state $J_{1}$ to $J_{2}$, and vice versa. When $X$ is observed in state $J_{1}$, it stays in that state for an exponential period of time with intensity $q_{j_{1}}^{(1)}$ or $q_{j_{1}}^{(2)}$ before moving to $J_{2}$ with probability $q_{j_{1},j_{2}}^{(1)}/q_{j_{1}}^{(1)}$ or $q_{j_{1},j_{2}}^{(2)}/q_{j_{1}}^{(2)}$, depending on whether it is driven by the underlying Markov process $X^{(1)}$ or $X^{(2)}$.
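The description above suggests a direct way to simulate one path of the mixture: draw the regime from the mixing probabilities of the initial state, then alternate exponential holding times and jumps under the selected intensity matrix. The sketch below is an illustration under stated assumptions rather than the authors' Monte Carlo algorithm: the regime is drawn once at time zero, states and regimes are indexed from 0, and the function names, rates, $\boldsymbol{\Psi}$ values and mixing probabilities are hypothetical.

```python
import random


def simulate_path(Q, start, horizon, rng):
    """Simulate a Markov jump process with intensity matrix Q on [0, horizon).

    Returns a list of (jump_time, state) pairs beginning with (0.0, start).
    """
    path = [(0.0, start)]
    t, state = 0.0, start
    while True:
        rate = -Q[state][state]          # q_i, total intensity of leaving the state
        if rate <= 0.0:                  # absorbing state: no further jumps
            break
        t += rng.expovariate(rate)       # exponential holding time with mean 1/q_i
        if t >= horizon:
            break
        # Next state j != i is chosen with probability q_ij / q_i.
        weights = [q if j != state else 0.0 for j, q in enumerate(Q[state])]
        state = rng.choices(range(len(Q)), weights=weights)[0]
        path.append((t, state))
    return path


def simulate_mixture(Qs, s0, start, horizon, rng):
    """Draw the regime from s0 (mixing probabilities given the start state),
    then follow the Markov jump process of that regime."""
    regime = rng.choices(range(len(Qs)), weights=s0)[0]
    return regime, simulate_path(Qs[regime], start, horizon, rng)


# Hypothetical two-regime example with Q2 = Psi * Q1 (row-wise speed change).
Q1 = [[-1.0, 0.75, 0.25],
      [0.5, -1.5, 1.0],
      [0.0, 0.0, 0.0]]
psi = [2.0, 0.5, 1.0]
Q2 = [[psi[i] * q for q in row] for i, row in enumerate(Q1)]

rng = random.Random(0)
regime, path = simulate_mixture([Q1, Q2], [0.6, 0.4], 0, 10.0, rng)
```

The simulated `regime` is returned only for inspection; in the setting of the paper it would be hidden from the observer, who sees `path` alone.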

2.1 Distributional properties

Recall that the process $X$ (2.1) repeatedly changes its speed randomly in time according to the speed rate $\mathbf{Q}^{(k)}$. The speed regime, represented by the variable $\phi$, is however not directly observable; we cannot tell from which regime the observed process $X$ came. However, it can be identified based on the available information about the process. We denote by $\mathcal{F}_{t-}$ all previous information about $X$ prior to time $t\geq 0$, and by $\mathcal{F}_{t,i}=\mathcal{F}_{t-}\cup\{X_{t}=i\}$, $i\in\mathbb{S}$. The set $\mathcal{F}_{t-}$ may contain full information, partial information, or nothing at all about the past of $X$.

The likelihood of observing the past realization $\mathcal{F}_{t,j}$ of $X$ moving according to the process $X^{(k)}$ conditional on knowing its initial state $i$ is defined by

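$$\mathbf{L}_{i,j}^{Q^{(k)}}(t):=\mathbb{P}\{\mathcal{F}_{t,j}|\phi=k,X_{0}=i\}=\prod_{l\in\mathbb{S}}\exp\big(-q_{l}^{(k)}T_{l}\big)\prod_{j\neq l,j\in\mathbb{S}}(q_{lj}^{(k)})^{N_{lj}},\tag{2.3}$$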

where in the expression above we have denoted, respectively, by $T_{l}$ and $N_{lj}$ the total time the observed process $X$ spent in state $l\in\mathbb{S}$ for $\mathcal{F}_{t,j}$, and the number of transitions from state $l$ to state $j$, with $j\neq l$, observed in the information set $\mathcal{F}_{t,j}$; whereas $q_{lj}^{(k)}$ represents the $(l,j)$ entry of the intensity matrix $\mathbf{Q}^{(k)}$.
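To make the likelihood (2.3) concrete, the sketch below extracts the occupation times $T_{l}$ and transition counts $N_{lj}$ from a fully observed path and evaluates the logarithm of (2.3). It assumes a path stored as a list of `(jump_time, state)` pairs with 0-based states; the function names and the example path are hypothetical, and working on the log scale is a numerical convenience rather than something prescribed by the text.

```python
import math


def sufficient_stats(path, t, n_states):
    """Return (T, N) for a path observed on [0, t].

    T[l] is the total time spent in state l; N[l][j] is the number of
    observed transitions from l to j.
    """
    T = [0.0] * n_states
    N = [[0] * n_states for _ in range(n_states)]
    for (t0, a), (t1, b) in zip(path, path[1:]):
        T[a] += t1 - t0
        N[a][b] += 1
    last_time, last_state = path[-1]
    T[last_state] += t - last_time       # time in the current state up to t
    return T, N


def log_likelihood(Q, T, N):
    """log of (2.3): sum_l ( -q_l T_l + sum_{j != l} N_lj log q_lj )."""
    total = 0.0
    for l in range(len(Q)):
        total += Q[l][l] * T[l]          # Q[l][l] equals -q_l
        for j in range(len(Q)):
            if j != l and N[l][j] > 0:
                if Q[l][j] <= 0.0:
                    return -math.inf     # transition impossible under Q
                total += N[l][j] * math.log(Q[l][j])
    return total


# Hypothetical observation: 0 -> 1 at time 0.5, 1 -> 0 at time 1.0, observed to t = 1.5.
Q1 = [[-1.0, 0.75, 0.25],
      [0.5, -1.5, 1.0],
      [0.0, 0.0, 0.0]]
path = [(0.0, 0), (0.5, 1), (1.0, 0)]
T, N = sufficient_stats(path, 1.5, 3)
ll = log_likelihood(Q1, T, N)
```

Here the path spends a total of 1.0 time units in state 0 and 0.5 in state 1, so the log-likelihood is $-1.0\cdot 1.0-1.5\cdot 0.5+\log 0.75+\log 0.5$.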

2.1.1 Bayesian updates of switching probability

The Bayesian update of the switching probability $s_{j}(t)$ of $X$ (2.1) is defined by

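$$s_{j}^{(k)}(t)=\mathbb{P}\{\phi=k|\mathcal{F}_{t,j}\},\quad\textrm{with}\quad\sum_{k=1}^{m}s_{j}^{(k)}(t)=1,\quad\textrm{for}\;\;j\in\mathbb{S},\;t\geq 0.\tag{2.4}$$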

It represents the proportion of those in state $j$ moving according to $X^{(k)}$ . Note that $s_{j}^{(k)}(0)=s_{j}^{(k)}$ (2.2). Denote by $\widetilde{\mathbf{S}}^{(k)}(t)$ , $t\geq 0$ , a diagonal matrix defined by

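$$\widetilde{\mathbf{S}}^{(k)}(t)=\left(\begin{array}{cc}\mathbf{S}^{(k)}(t)&\mathbf{0}\\ \mathbf{0}&s_{n+1}^{(k)}(t)\\ \end{array}\right),\;\;\textrm{s.t.}\;\;\sum_{k=1}^{m}\widetilde{\mathbf{S}}^{(k)}(t)=\mathbf{I},\;\;\textrm{for}\;\;t\geq 0,\tag{2.5}$$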

where we have denoted by $\mathbf{I}$ an $(n+1)\times(n+1)$ identity matrix, with $\mathbf{S}^{(k)}(t)=\mathrm{diag}(s_{1}^{(k)}(t),s_{2}^{(k)}(t),...,s_{n}^{(k)}(t))$ representing the switching probability matrix of $X$.

For $t=0$, in which case $\mathcal{F}_{t,j}=\{X_{0}=j\}$, we write $\widetilde{\mathbf{S}}^{(k)}:=\widetilde{\mathbf{S}}^{(k)}(0)$, $\mathbf{S}^{(k)}:=\mathbf{S}^{(k)}(0)$. The element $s_{j}^{(k)}(t)$, $j\in\mathbb{S}$, of the switching probability matrix $\widetilde{\mathbf{S}}^{(k)}(t)$ is given below.

Proposition 2.1

Let $\widetilde{\boldsymbol{\pi}}$ be the initial probability of starting the Markov mixture process $X$ (2.1) on a finite state space $\mathbb{S}$ . Define by $\mathbf{L}^{Q^{(k)}}(t)$ the likelihood matrix whose $(i,j)$ element $\mathbf{L}_{i,j}^{Q^{(k)}}(t)$ is defined in (2.3). Then, for $j\in\mathbb{S}$ and $t\geq 0$ ,

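$$s_{j}^{(k)}(t)=\frac{\boldsymbol{\pi}^{\top}\mathbf{S}^{(k)}\mathbf{L}^{Q^{(k)}}(t)\mathbf{e}_{j}}{\sum_{k=1}^{m}\boldsymbol{\pi}^{\top}\mathbf{S}^{(k)}\mathbf{L}^{Q^{(k)}}(t)\mathbf{e}_{j}},\quad k=1,...,m.\tag{2.6}$$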

To be more precise, depending on availability of information set $\mathcal{F}_{t-},$ we have:

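(i)

Under full information $\mathcal{F}_{t,j}=\{X_{s},0\leq s\leq t-\}\cup\{X_{t}=j\}$, we have

$$s_{j}^{(k)}(t)=\frac{s_{i_{0}}^{(k)}\mathbf{L}_{i_{0},j}^{Q^{(k)}}(t)}{\sum_{k=1}^{m}s_{i_{0}}^{(k)}\mathbf{L}_{i_{0},j}^{Q^{(k)}}(t)}.$$

(ii)

Under partial information $\mathcal{F}_{t,j}=\{X_{t}=j\}$, $s_{j}(t)$ is defined by

$$s_{j}^{(k)}(t)=\frac{\widetilde{\boldsymbol{\pi}}^{\top}\widetilde{\mathbf{S}}^{(k)}\exp\big(\mathbf{Q}^{(k)}t\big)\mathbf{e}_{j}}{\sum_{k=1}^{m}\widetilde{\boldsymbol{\pi}}^{\top}\widetilde{\mathbf{S}}^{(k)}\exp\big(\mathbf{Q}^{(k)}t\big)\mathbf{e}_{j}}.$$

(iii)

Under partial information $\mathcal{F}_{t,j}=\{X_{0}=i_{0}\}\cup\{X_{t}=j\}$, $s_{j}(t)$ is given by

$$s_{j}^{(k)}(t)=\frac{\mathbf{e}_{i_{0}}^{\top}\widetilde{\mathbf{S}}^{(k)}\exp\big(\mathbf{Q}^{(k)}t\big)\mathbf{e}_{j}}{\sum_{k=1}^{m}\mathbf{e}_{i_{0}}^{\top}\widetilde{\mathbf{S}}^{(k)}\exp\big(\mathbf{Q}^{(k)}t\big)\mathbf{e}_{j}}.$$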

The expression (2.6) generalizes the result of [20] and Lemma 3.1 in [37].

Note that we have used slightly different notation for the likelihood function (2.3) and the switching probability (2.6) from that used in [20] and [37].
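Under full information with known initial state $i_{0}$, the update weights each regime's prior probability $s_{i_{0}}^{(k)}$ by its likelihood (2.3) and normalizes over regimes. The sketch below implements that case only; `regime_posterior` is a hypothetical name, the inputs are the occupation times $T_{l}$ and transition counts $N_{lj}$ of the observed path with 0-based states, and the log-sum-exp normalization is an added numerical safeguard, not part of the proposition. It assumes at least one regime assigns positive likelihood to the path.

```python
import math


def regime_posterior(Qs, s0, T, N):
    """Posterior regime probabilities s^{(k)}(t) given a fully observed path.

    Qs : list of intensity matrices, one per regime
    s0 : prior mixing probabilities for the observed initial state
    T  : occupation times per state; N : transition counts between states
    """
    log_w = []
    for Q, s in zip(Qs, s0):
        ll = math.log(s) if s > 0 else -math.inf
        for l in range(len(Q)):
            ll += Q[l][l] * T[l]                      # -q_l T_l
            for j in range(len(Q)):
                if j != l and N[l][j] > 0:
                    ll += N[l][j] * math.log(Q[l][j]) if Q[l][j] > 0 else -math.inf
        log_w.append(ll)
    m = max(log_w)
    if m == -math.inf:
        raise ValueError("observed path has zero likelihood under every regime")
    w = [math.exp(x - m) for x in log_w]              # log-sum-exp stabilisation
    total = sum(w)
    return [x / total for x in w]


# Hypothetical two-regime example with Q2 = Psi * Q1.
Q1 = [[-1.0, 0.75, 0.25],
      [0.5, -1.5, 1.0],
      [0.0, 0.0, 0.0]]
psi = [2.0, 0.5, 1.0]
Q2 = [[psi[i] * q for q in row] for i, row in enumerate(Q1)]
T = [1.0, 0.5, 0.0]                  # occupation times of the observed path
N = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]  # one 0 -> 1 and one 1 -> 0 transition
post = regime_posterior([Q1, Q2], [0.6, 0.4], T, N)
```

In this example the transition factors of the two regimes happen to coincide ($0.75\cdot 0.5=1.5\cdot 0.25$), so the posterior is driven by the holding times alone and equals $0.6/(0.6+0.4e^{-0.625})$ for the first regime.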

Proof of Proposition 2.1. By the law of total probability and Bayes' formula,

[TABLE]

The claim in (2.6) is finally established on account of the Bayes’ formula:

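$$s_{j}^{(k)}(t)=\mathbb{P}\{\phi=k|\mathcal{F}_{t,j}\}=\frac{\mathbb{P}\{\mathcal{F}_{t,j},\phi=k\}}{\sum_{k=1}^{m}\mathbb{P}\{\mathcal{F}_{t,j},\phi=k\}}.\qquad\square$$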

If $\{\mathbf{Q}^{(k)}\}$ have distinct eigenvalues $\{\lambda_{j}^{(k)}:j=1,\dots,n+1\}$, it can be proved, similarly to Proposition 3.2 in [37], using the Lagrange-Sylvester formula

[TABLE]

see Theorem 2 of Apostol [4], that, under partial information, the probability $s_{j}^{(k)}(t)\rightarrow 1$ in the long-run, as $t\rightarrow\infty$ , implying that $X$ moves according to $X^{(k)}$ . The result can be used to deduce the stationary distribution of (1.8) as $t\rightarrow\infty.$
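For readers who wish to experiment numerically, the following Python sketch evaluates $e^{\mathbf{Q}t}$ through the Lagrange-Sylvester formula and compares it with a truncated Taylor series. The generator `Q`, the helper names `expm_taylor`, `sylvester_terms` and `expm_sylvester`, and all numerical values are illustrative assumptions that do not come from the paper; the sketch presumes distinct eigenvalues, as in the text.

```python
import numpy as np

def expm_taylor(A, squarings=10, terms=18):
    """Matrix exponential by scaling and squaring (intended for small matrices)."""
    A = np.asarray(A, dtype=float) / 2.0**squarings
    term = np.eye(A.shape[0])
    result = term.copy()
    for n in range(1, terms):
        term = term @ A / n
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result

def sylvester_terms(Q, t):
    """Terms exp(lam_i t) * prod_{j != i} (Q - lam_j I) / (lam_i - lam_j)."""
    Q = np.asarray(Q, dtype=float)
    lam = np.linalg.eigvals(Q).astype(complex)
    eye = np.eye(Q.shape[0], dtype=complex)
    terms = []
    for i in range(len(lam)):
        proj = eye.copy()
        for j in range(len(lam)):
            if j != i:
                proj = proj @ (Q - lam[j] * eye) / (lam[i] - lam[j])
        terms.append(np.exp(lam[i] * t) * proj)
    return lam, terms

def expm_sylvester(Q, t):
    """exp(Q t) as the sum of the Lagrange-Sylvester terms."""
    _, terms = sylvester_terms(Q, t)
    return sum(terms).real

# Hypothetical intensity matrix: two transient states and one absorbing state.
Q = np.array([[-3.0, 1.0, 2.0],
              [0.5, -2.0, 1.5],
              [0.0, 0.0, 0.0]])
P = expm_sylvester(Q, 0.7)
```

For large $t$ the term attached to the largest eigenvalue dominates the sum, which appears to be the mechanism behind the long-run limits stated in this section.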

Proposition 2.2

Let $\{\mathbf{Q}^{(k)}\}$ have distinct eigenvalues $\{\lambda_{j}^{(k)}:j\in\mathbb{S}\},$ with $\lambda_{i_{k}}^{(k)}=\max\{\lambda_{j}^{(k)},j\in\mathbb{S}\},$ $i_{k}=\textrm{argmax}_{j}\{\lambda_{j}^{(k)}\}$ . Define $\overline{\lambda}=\max\{\lambda_{i_{k}}^{(k)}\}.$ For $j\in\mathbb{S}$ ,

[TABLE]

where $\mathcal{L}[\mathbf{Q}^{(k)}]=\prod\limits_{j=1,j\neq i_{k}}^{n+1}\Big{(}\frac{\mathbf{Q}^{(k)}-\lambda_{j}^{(k)}\mathbf{I}}{\lambda_{i_{k}}^{(k)}-\lambda_{j}^{(k)}}\Big{)}$ is the Lagrange interpolation coefficient.

It is clear following the above that when the intensity matrices $\{\mathbf{Q}^{(k)}\}$ take the form of (1.2), (2.8) reduces to the results of Proposition 3.2 of [37].

In the section below we derive the Bayesian updates $\widetilde{\boldsymbol{\pi}}(t)$ on the probability of starting $X$ at a given time $t\geq 0$ and available information of the process.

2.1.2 Bayesian updates of probability distribution $\widetilde{\boldsymbol{\pi}}$

The following proposition and its corollary provide Bayesian updates $\widetilde{\pi}_{j}(t)$ on finding $X$ in any state $j\in\mathbb{S}$ at a given time $t\geq 0$ based on all previous observations $\mathcal{F}_{t-}$ of the process and knowing that it is still “alive” at time $t$ .

Proposition 2.3

Let $\mathcal{G}_{t}=\mathcal{F}_{t-}$ . Define $\pi_{j}(t)=\mathbb{P}\{X_{t}=j|\mathcal{G}_{t}\}$ for $j\in\mathbb{S},t\geq 0$ .

[TABLE]

(i)

Given all previous observations $\mathcal{F}_{t-}=\{X_{s},0\leq s\leq t-\}$ , we have

[TABLE]

(ii)

If $\mathcal{F}_{t-}=\emptyset$ , it follows from (2.3) that $\mathbf{L}^{Q^{(k)}}(t)=\exp\big{(}\mathbf{Q}^{(k)}t\big{)}$ . Then,

[TABLE]

(iii)

If $\mathcal{F}_{t-}=\{X_{0}=i_{0}\}$ , it follows from the above that $\pi_{j}(t)$ is given by

[TABLE]

Notice that $0<\widetilde{\pi}_{E}(t)<1$ , $\widetilde{\pi}_{\Delta}(t)>0$ , $\sum_{j\in\mathbb{S}}\widetilde{\pi}_{j}(t)=1$ for $t\geq 0$ , and $\widetilde{\boldsymbol{\pi}}=\widetilde{\boldsymbol{\pi}}(0)$ .

*Proof * The proof follows from applying the law of total probability and the Bayes’ formula for conditional probability. By applying the latter, we have that

[TABLE]

Therefore, we have by the above and applying the law of total probability that

[TABLE]

The result (2.9) is established by the Bayes’ rule and the law of total probability,

[TABLE]

while $(ii)$ and $(iii)$ follow from the facts that $e^{\mathbf{Q}^{(k)}t}\mathbb{1}=\mathbb{1}$ and $\widetilde{\boldsymbol{\pi}}^{\top}(t)\mathbb{1}=1$ . $\square$

Corollary 2.4

Suppose that the process is still alive at time $t\geq 0$ . Then,

(i)

Under full information $\mathcal{G}_{t}=\{X_{s},0\leq s\leq t-\}\cup\{X_{t}\neq\Delta\}$ , we have

[TABLE]

(ii)

If $\mathcal{G}_{t}=\{X_{t}\neq\Delta\}$ , it follows from (2.3) and the matrix partition (3.4),

[TABLE]

(iii)

If $\mathcal{G}_{t}=\{X_{0}=i_{0}\}\cup\{X_{t}\neq\Delta\}$ , following the above, $\pi_{j}(t)$ is given by

[TABLE]

It follows that $0<\pi_{E}(t)<1$ , $\pi_{\Delta}(t)=0$ , $\sum_{j\in E}\pi_{j}(t)=1$ for $t\geq 0$ , and $\widetilde{\boldsymbol{\pi}}=\widetilde{\boldsymbol{\pi}}(0)$ .

Notice that the Bayesian updates $\pi_{j}(t)$ in (2.12) and (2.13) are the normalizations of the probabilities $\pi_{j}(t)$ in (2.10) and (2.11), respectively, such that $\pi_{\Delta}(t)=0$ . The results of Proposition 2.3 and Corollary 2.4 add further features to the distributional properties of the mixture process studied in [37] and [20].
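The normalization described above can be illustrated numerically in the case where only the initial law is known. By the definition of the mixture, the law of $X_{t}$ is then $\widetilde{\boldsymbol{\pi}}^{\top}\sum_{k}\widetilde{\mathbf{S}}^{(k)}e^{\mathbf{Q}^{(k)}t}$ , and conditioning on $\{X_{t}\neq\Delta\}$ rescales its restriction to $E$ . The matrices `Q1`, `Q2`, `S1`, `S2`, the vector `pi0` and the function names below are hypothetical choices made for illustration only.

```python
import numpy as np

def expm(A, squarings=10, terms=18):
    """Matrix exponential by scaling and squaring (intended for small matrices)."""
    A = np.asarray(A, dtype=float) / 2.0**squarings
    term = np.eye(A.shape[0])
    result = term.copy()
    for n in range(1, terms):
        term = term @ A / n
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result

# Hypothetical two-regime model on E = {1, 2} with absorbing state Delta (last index).
Q1 = np.array([[-3.0, 1.0, 2.0],
               [0.5, -2.0, 1.5],
               [0.0, 0.0, 0.0]])
Q2 = 0.5 * Q1                      # second regime moves at half the speed
S1 = np.diag([0.5, 0.3, 0.5])      # assumed probabilities of regime 1 per state
S2 = np.eye(3) - S1                # regime probabilities sum to the identity
pi0 = np.array([0.6, 0.4, 0.0])    # initial law with no mass on Delta

def state_distribution(t):
    """Law of X_t on E and Delta when only the initial law is known."""
    return pi0 @ (S1 @ expm(Q1 * t) + S2 @ expm(Q2 * t))

def alive_distribution(t):
    """Law of X_t on E after conditioning on X_t != Delta."""
    alive = state_distribution(t)[:-1]
    return alive / alive.sum()
```

Calling `state_distribution(1.0)` returns a vector with positive mass on $\Delta$ , whereas `alive_distribution(1.0)` sums to one over $E$ .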

Below we give the value of $\pi_{j}(t)$ as $t\rightarrow\infty$ under partial information. The result can be used to deduce the stationary distribution of (1.8) as $t\rightarrow\infty$ .

Proposition 2.5

Let $\{\mathbf{T}^{(k)}\}$ have distinct eigenvalues $\{\lambda_{j}^{(k)}:j\in E\},$ with $\lambda_{i_{k}}^{(k)}=\max\{\lambda_{j}^{(k)},j\in E\},$ $i_{k}=\textrm{argmax}_{j}\{\lambda_{j}^{(k)}\}$ . Define $\overline{\lambda}=\max\{\lambda_{i_{k}}^{(k)}\}.$ For $j\in E$ ,

[TABLE]

where $\mathcal{L}[\mathbf{T}^{(k)}]=\prod\limits_{j=1,j\neq i_{k}}^{n}\Big{(}\frac{\mathbf{T}^{(k)}-\lambda_{j}^{(k)}\mathbf{I}}{\lambda_{i_{k}}^{(k)}-\lambda_{j}^{(k)}}\Big{)}$ is the Lagrange interpolation coefficient.

In contrast to (2.10) and (2.11), we see from the above proposition that, given that the process is still alive in the long run, the stationary distribution $\pi_{j}(\infty):=\lim_{t\rightarrow\infty}\pi_{j}(t)$ of $X$ does not have zero mass on the set $E$ , with $\sum_{j\in E}\pi_{j}(\infty)=1.$

2.1.3 $\mathcal{F}_{t}-$ conditional transition probability matrix

The main feature of the mixture process $X$ (2.1) is that unlike its component $X^{(k)}$ , $X$ does not have the Markov property; future development of its state depends on its past information. The following theorem summarizes this property.

Theorem 2.6

For any $s\geq t\geq 0$ , the conditional transition probability matrix $[\mathbf{P}(t,s)]_{i,j}:=\mathbb{P}\{X_{s}=j|\mathcal{F}_{t,i}\}$ , $i,j\in\mathbb{S}$ , of the mixture process $X$ (2.1) is given by

[TABLE]

Theorem 2.6 generalizes the result of a lemma in [20] and Theorem 3.4 in [37].

*Proof * Similar to the proof of Theorem 3.4 in [37], (2.15) is established by applying the law of total probability and Bayes’ rule for conditional probability:

[TABLE]

where in the second-to-last equality we used the fact that $X^{(k)}$ is Markovian. $\square$

It is clear from (2.15) that, unless the underlying Markov processes $X^{(k)}$ all move at the same speed $\mathbf{Q}$ , i.e., $\mathbf{Q}^{(k)}=\mathbf{Q}$ for $k=1,\dots,m$ , $X$ does not inherit the Markov property of $X^{(k)}$ ; that is, the future development of $X$ is determined by its past information $\mathcal{F}_{t,i}$ through its likelihood function (2.3). More precisely, when $\mathbf{Q}^{(k)}=\mathbf{Q}$ , it follows from the transition probability matrix (2.15) that $\mathbf{P}(t,s)=e^{\mathbf{Q}(s-t)},$ so that $X$ reduces to a simple Markov jump process.
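The reduction to the Markov case can be checked numerically. The sketch below assumes that (2.15) has the form $\mathbf{P}(t,s)=\sum_{k}\widetilde{\mathbf{S}}^{(k)}(t)e^{\mathbf{Q}^{(k)}(s-t)}$ with $\sum_{k}\widetilde{\mathbf{S}}^{(k)}(t)=\mathbf{I}$ ; this is our reading of the result, not a transcription of it. The matrix `Q`, the switching matrices in `S` and the function names are hypothetical.

```python
import numpy as np

def expm(A, squarings=10, terms=18):
    """Matrix exponential by scaling and squaring (intended for small matrices)."""
    A = np.asarray(A, dtype=float) / 2.0**squarings
    term = np.eye(A.shape[0])
    result = term.copy()
    for n in range(1, terms):
        term = term @ A / n
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result

def transition_matrix(S_list, Q_list, dt):
    """Assumed form of (2.15): sum_k S^(k) exp(Q^(k) dt), with dt = s - t."""
    return sum(S @ expm(Q * dt) for S, Q in zip(S_list, Q_list))

# Hypothetical intensity matrix and switching probabilities (absorbing last state).
Q = np.array([[-3.0, 1.0, 2.0],
              [0.5, -2.0, 1.5],
              [0.0, 0.0, 0.0]])
S = [0.5 * np.eye(3), 0.5 * np.eye(3)]

# Equal speeds: the mixture collapses to exp(Q dt).
P_same = transition_matrix(S, [Q, Q], 1.0)

# Different speeds with the switching matrices held fixed: rows still sum to
# one, but the one-step matrix no longer composes like a Markov semigroup.
P_mix_1 = transition_matrix(S, [Q, 0.5 * Q], 1.0)
P_mix_2 = transition_matrix(S, [Q, 0.5 * Q], 2.0)
semigroup_gap = np.abs(P_mix_2 - P_mix_1 @ P_mix_1).max()
```

The nonzero `semigroup_gap` reflects the fact that, under different speeds, the switching probabilities have to be updated with the observed history before the next step is taken.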

3 Probability distributions of first exit times

This section presents the main results of this paper on the joint probability distributions of the first exit times $\{\tau_{k}\}$ (1.6) of the Markov mixture process $X$ (2.1), conditional on the available information sets $\mathcal{F}_{t,i}$ and $\mathcal{G}_{t}$ . We first derive the conditional univariate distribution of $\tau$ (1.1). To motivate the main results on the conditional multivariate distributions (1.8), we consider the bivariate case in some detail. Throughout the remainder of the paper, we define the intensity matrix $\mathbf{Q}^{(k)}$ by

[TABLE]

The following results on block partition of the transition probability matrix $\mathbf{P}(t,s)$ (2.15) and exponential matrix $e^{\mathbf{Q}^{(k)}t}$ will be used to derive the conditional probability distributions (1.8). We refer to Proposition 3.7 in [37] for details.

Lemma 3.1

Let the phase generator matrix $\mathbf{T}^{(k)}$ be nonsingular. Then,

[TABLE]

Proposition 3.2

The transition probability matrix (2.15) has block partition:

[TABLE]

3.1 Conditional univariate distributions

This section presents an explicit identity for the probability distribution $\overline{F}_{t}(s)=\mathbb{P}\{\tau>s|\mathcal{G}_{t}\}$ , $s\geq t\geq 0$ , of the first exit time $\tau$ (1.1) given the information $\mathcal{G}_{t}$ .

Lemma 3.3

The $\mathcal{G}_{t}-$ conditional distribution $\overline{F}_{t}(s)$ is given for $s\geq t\geq 0$ by

[TABLE]

*Proof * Without loss of generality, let $\mathcal{G}_{t}=\mathcal{F}_{t-}$ . As $\tau$ is the first exit time of $X$ to the absorbing state $\Delta$ , by applying the law of total probability we have

[TABLE]

Again, by the law of total probability and the Bayes’ formula, we obtain

[TABLE]

Starting from equation (3.7), we have following the above expression that

[TABLE]

We arrive at the probability distribution (3.6) on account of $\sum\limits_{j\in E}\mathbf{e}_{j}\mathbf{e}_{j}^{\top}=\textrm{diag}(\mathbf{I},0)$ and the block partition (3.5) of the transition probability matrix $\mathbf{P}(t,s)$ . $\square$

Applying similar steps of derivation to the proof of (3.6), one can show that

[TABLE]

Lemma 3.4

Following the two identities (3.6) and (3.8), we deduce that

[TABLE]

Note that the measure $-d\overline{F}_{t}(s)$ has probability mass $f_{t}(t)=1-\boldsymbol{\pi}^{\top}(t)\mathbb{1}$ at the point $s=t$ when conditioning on $\mathcal{G}_{t}=\mathcal{F}_{t-}$ , and no mass given $\mathcal{G}_{t}=\mathcal{F}_{t-}\cup\{X_{t}\neq\Delta\}$ . Given that $\boldsymbol{\pi}^{\top}\mathbb{1}=1$ , it has zero mass at $t=0$ . It is absolutely continuous w.r.t Lebesgue measure $ds$ with density $f_{t}(s)$ on $\{s>t\}$ . Following (3.6), the density function $f_{t}(s)$ , its Laplace transform and $n$ th moment are given below.

Theorem 3.5

The $\mathcal{G}_{t}-$ conditional density function $f_{t}(s)$ is given for $s>t$ by

[TABLE]

(i)

The Laplace transform $\Psi_{t}(\lambda)=\int_{0}^{\infty}e^{-\lambda u}f_{t}(t+u)du$ is given by

[TABLE]

(ii)

The $\mathcal{G}_{t}-$ conditional $n$ th moment of $\tau$ , for $n=0,1,...$ , is given by

[TABLE]

Setting $\mathbf{T}^{(k)}=\mathbf{T}$ in (3.10), in which case $X$ never changes its speed, the above results coincide with those given in [31] and Proposition 4.1 in [6] for $t=0$ .
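As a numerical illustration at $t=0$ , the sketch below assumes the standard mixture forms $\overline{F}(s)=\boldsymbol{\pi}^{\top}\sum_{k}\mathbf{S}^{(k)}e^{\mathbf{T}^{(k)}s}\mathbb{1}$ , $\mathbb{E}\{\tau^{n}\}=(-1)^{n}n!\,\boldsymbol{\pi}^{\top}\sum_{k}\mathbf{S}^{(k)}(\mathbf{T}^{(k)})^{-n}\mathbb{1}$ and $\Psi(\lambda)=\boldsymbol{\pi}^{\top}\sum_{k}\mathbf{S}^{(k)}(\lambda\mathbf{I}-\mathbf{T}^{(k)})^{-1}(-\mathbf{T}^{(k)}\mathbb{1})$ . These are our reading of the results at $t=0$ , not a transcription of them, and the matrices `T1`, `T2`, `S1`, `S2` and the vector `pi0` are hypothetical.

```python
import numpy as np
from math import factorial

def expm(A, squarings=10, terms=18):
    """Matrix exponential by scaling and squaring (intended for small matrices)."""
    A = np.asarray(A, dtype=float) / 2.0**squarings
    term = np.eye(A.shape[0])
    result = term.copy()
    for n in range(1, terms):
        term = term @ A / n
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result

# Hypothetical phase generators: the second regime moves at half the speed.
T1 = np.array([[-3.0, 1.0],
               [0.0, -2.0]])
T2 = 0.5 * T1
S1 = 0.5 * np.eye(2)
S2 = np.eye(2) - S1
regimes = [(S1, T1), (S2, T2)]
pi0 = np.array([0.6, 0.4])
ones = np.ones(2)

def survival(s):
    """P(tau > s) at t = 0 under the assumed mixture form."""
    return sum(pi0 @ S @ expm(T * s) @ ones for S, T in regimes)

def moment(n):
    """n-th moment of tau at t = 0 under the assumed mixture form."""
    total = sum(pi0 @ S @ np.linalg.matrix_power(np.linalg.inv(T), n) @ ones
                for S, T in regimes)
    return (-1) ** n * factorial(n) * total

def laplace(lam):
    """E[exp(-lam * tau)] at t = 0 under the assumed mixture form."""
    eye = np.eye(2)
    return sum(pi0 @ S @ np.linalg.solve(lam * eye - T, -T @ ones)
               for S, T in regimes)
```

With these values each regime contributes its own mean exit time, weighted by the switching probabilities, and the first moment agrees with minus the derivative of the Laplace transform at zero.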

The following theorem summarizes the denseness and closure properties of $\overline{F}_{t}(s)$ (3.6) under finite convex mixtures and convolutions. They can be established using the matrix analytic approach [30]. See, e.g., Theorems 4.12 and 4.13 in [37].

Theorem 3.6

The phase-type distribution $\overline{F}_{t}(s)$ (3.6) is closed under finite convex mixtures and convolutions, and forms a dense class of distributions on $\mathbb{R}_{+}$ .

3.2 Conditional bivariate distributions

As in the univariate case, we consider the mixture process $X$ (2.1) on the finite state space $\mathbb{S}=E\cup\{\Delta\}$ . Following [8], let $\boldsymbol{\Gamma}_{1}$ and $\boldsymbol{\Gamma}_{2}$ be two nonempty stochastically closed subsets of $\mathbb{S}$ such that $\boldsymbol{\Gamma}_{1}\cap\boldsymbol{\Gamma}_{2}$ is a proper subset of $\mathbb{S}$ . We assume without loss of generality that $\boldsymbol{\Gamma}_{1}\cap\boldsymbol{\Gamma}_{2}=\Delta$ and the absorption into $\Delta$ is certain, i.e., the generator matrices $\{\mathbf{T}^{(k)}\}$ need to be nonsingular. As $\boldsymbol{\Gamma}_{l}$ , $l=1,2$ , are stochastically closed sets, necessarily we have $[\mathbf{Q}^{(k)}]_{i,j}=0$ if $i\in\boldsymbol{\Gamma}_{l}$ and $j\in\boldsymbol{\Gamma}_{l}^{c}$ .

We denote by $\widetilde{\boldsymbol{\pi}}$ the initial probability vector on $\mathbb{S}$ such that $\pi_{\Delta}=0$ . We shall assume that $\boldsymbol{\pi}_{i}\neq 0$ if $i\in\boldsymbol{\Gamma}_{1}^{c}\cap\boldsymbol{\Gamma}_{2}^{c}$ implying $\mathbb{P}\{\tau_{1}>0,\tau_{2}>0\}=1$ . As before, $\mathcal{F}_{t,i}=\mathcal{F}_{t-}\cup\{X_{t}=i\}$ defines all previous and current information of $X$ .

3.2.1 Conditional joint survival function of $\tau_{1}$ and $\tau_{2}$

The joint distribution of $\tau_{k}$ (1.8), for $k=1,2$ , is given by the following.

Lemma 3.7

The identity for $\mathcal{F}_{t,i}-$ conditional joint distribution $\overline{F}_{i,t}(t_{1},t_{2})=\mathbb{P}\{\tau_{1}>t_{1},\tau_{2}>t_{2}|\mathcal{F}_{t,i}\}$ of $\tau_{1}$ and $\tau_{2}$ is given for $t_{1},t_{2}\geq t\geq 0$ and $i\in E$ by

[TABLE]

with $\sum_{k=1}^{m}\mathbf{S}^{(k)}(t)=\mathbf{I}.$ Note that we have used $\mathbf{H}_{k}$ to denote an $(n\times n)-$ diagonal matrix whose $i$ th diagonal element, for $i\in E$ , equals $1$ if $i\in\Gamma_{k}^{c}$ and $0$ otherwise.

*Proof * To begin with, let $(t_{i_{1}},t_{i_{2}})$ , with $t_{i_{2}}\geq t_{i_{1}}$ be the ordering of $(t_{1},t_{2})$ , with $t_{i_{1}}\geq t_{i_{0}}=t$ . Since $\tau_{i_{k}}$ , $k=1,2$ , is the first exit time of $X$ (2.1) to $\Gamma_{i_{k}}$ ,

[TABLE]

The probability on the r.h.s of the last equality can be worked out as follows.

[TABLE]

Note that we have applied the law of total probability and the Bayes’ rule for conditional probability in the above equality. Recall that $\mathbb{P}\big{\{}X_{t_{i_{0}}}=J_{i_{0}}|\mathcal{F}_{t_{i_{0}},i}\big{\}}=1$ iff $J_{i_{0}}=i$ and zero otherwise. Therefore, starting from eqn. (3.11), we have

[TABLE]

leading to $\overline{F}_{i,t}(t_{1},t_{2})$ on account of $\mathbf{H}_{i_{k}}=\sum\limits_{J_{i_{k}}\in\Gamma_{i_{k}}^{c}}\mathbf{e}_{J_{i_{k}}}\mathbf{e}_{J_{i_{k}}}^{\top}$ , (2.5) and (3.4). $\square$

Using the result of Lemma 3.7, we derive the $\mathcal{G}_{t}-$ conditional probability distribution of $\tau_{1}$ and $\tau_{2}$ . A closed form distributional identity is given below.

Proposition 3.8

The distribution $\overline{F}_{t}(t_{1},t_{2})=\mathbb{P}\{\tau_{1}>t_{1},\tau_{2}>t_{2}|\mathcal{G}_{t}\}$ is given by

[TABLE]

*Proof * By (3.9) and the law of total probability, $\overline{F}_{t}(t_{1},t_{2})=\sum\limits_{i\in E}\pi_{i}(t)\overline{F}_{i,t}(t_{1},t_{2})$ . $\square$

Remark 3.9

As $\mathbf{H}_{2}\mathbf{H}_{1}=\mathbf{H}_{1}\mathbf{H}_{2}$ , the measures $d\overline{F}_{i,t}(t_{1},t_{2})$ and $d\overline{F}_{t}(t_{1},t_{2})$ have probability mass $1-\mathbf{e}_{i}^{\top}\mathbf{H}_{2}\mathbf{H}_{1}\mathbb{1}$ and $1-\boldsymbol{\pi}^{\top}(t)\mathbf{H}_{2}\mathbf{H}_{1}\mathbb{1}$ , respectively, at the point $(t_{1}=t,t_{2}=t)$ . They are absolutely continuous w.r.t. the Lebesgue measure $dt_{1}dt_{2}$ with densities $f_{i,t}(t_{1},t_{2})$ and $f_{t}(t_{1},t_{2})$ , respectively, on $\{(t_{1},t_{2})\in\mathbb{R}_{+}^{2}:t_{1},t_{2}>t\}.$

3.2.2 Conditional joint probability density function

In general, the joint distribution $\overline{F}_{i,t}(t_{1},t_{2})$ (resp. $\overline{F}_{t}(t_{1},t_{2})$ ) has a singular component $\overline{F}_{i,t}^{(0)}(t_{1},t_{2})$ (resp. $\overline{F}_{t}^{(0)}(t_{1},t_{2})$ ) on the set $\{(t_{1},t_{2}):t_{2}=t_{1}\}$ . The singular component can be obtained by deriving the joint density of $\tau_{1}$ and $\tau_{2}$ and deducing the absolutely continuous and singular parts of the pdf, as discussed in the theorem below. For non-matrix-based bivariate functions, see for instance [36].

Theorem 3.10

Given the joint distribution $\overline{F}_{i,t}(t_{1},t_{2})$ of $(\tau_{1},\tau_{2})$ as specified in Lemma 3.7, the joint probability density $f_{i,t}(t_{1},t_{2})$ of $(\tau_{1},\tau_{2})$ is given by

[TABLE]

where the absolutely continuous components $f_{i,t}^{(1)}(t_{1},t_{2})$ and $f_{i,t}^{(2)}(t_{1},t_{2})$ are

[TABLE]

where the matrix operator $[A,B]=AB-BA$ defines the commutator of $A$ and $B$ , whilst the singular component part $f_{i,t}^{(0)}(t_{1},t_{2})$ is defined by the function

[TABLE]

*Proof * The expressions for $f_{i,t}^{(1)}(t_{1},t_{2})$ and $f_{i,t}^{(2)}(t_{1},t_{2})$ follow from taking the mixed partial derivative $\frac{\partial^{2}}{\partial t_{2}\partial t_{1}}\overline{F}_{i,t}(t_{1},t_{2})$ (see the derivation of Theorem 3.20) and taking into account

[TABLE]

To get $f_{i,t}^{(0)}(t_{1},t_{2})$ , recall that $\int_{0}^{\infty}e^{\mathbf{T}t}dt=-\mathbf{T}^{-1}$ , due to the phase-generator matrix $\mathbf{T}$ being negative definite (see Section II4d in [6]). Following Remark 3.9,

[TABLE]

Applying Fubini’s theorem, the first integral is given after some calculations by

[TABLE]

Following the same approach, one can show after some calculations that

[TABLE]

The proof is established on account of (3.14), the two identities above and

[TABLE]

Theorem 3.11

For $t\geq 0$ , the $\mathcal{G}_{t}-$ conditional density $f_{t}(t_{1},t_{2})$ is given by

[TABLE]

where the absolutely continuous components $f_{t}^{(1)}(t_{1},t_{2})$ and $f_{t}^{(2)}(t_{1},t_{2})$ are

[TABLE]

whilst the singular component $f_{t}^{(0)}(t_{1},t_{2})$ is defined by the function

[TABLE]

*Proof * It follows from identity (3.9) that $f_{t}(t_{1},t_{2})=\sum_{i\in E}\pi_{i}(t)f_{i,t}(t_{1},t_{2}).$ $\square$

Corollary 3.12

The singular components of $\overline{F}_{i,t}(t_{1},t_{2})$ and $\overline{F}_{t}(t_{1},t_{2})$ are

[TABLE]

Hence, the singular components of $\overline{F}_{i,t}(t_{1},t_{2})$ and $\overline{F}_{t}(t_{1},t_{2})$ are zero if and only if, for $k=1,\dots,m$ , $[\mathbf{T}^{(k)}]_{i,j}=0$ for $i\in\boldsymbol{\Gamma}_{1}^{c}\cap\boldsymbol{\Gamma}_{2}^{c}$ and $j=\Delta$ , which is equivalent to

[TABLE]

Remark 3.13

Consider the representation (4.2) for the matrices $\{\mathbf{T}^{(k)}\}$ . It is clear following (3.15) that the joint probability density function $f_{t}(t_{1},t_{2})$ coincides with the bivariate phase-type distribution [8] when we set each $\mathbf{T}^{(k)}=\mathbf{T}$ and $t=0$ taking into account the fact that $[\mathbf{T},\mathbf{H}_{1}]\mathbf{H}_{2}=[\mathbf{T},\mathbf{H}_{1}]$ and $[\mathbf{T},\mathbf{H}_{2}]\mathbf{H}_{1}=[\mathbf{T},\mathbf{H}_{2}]$ .

3.2.3 Conditional joint Laplace transform of $\tau_{1}$ and $\tau_{2}$

In order to compute the $\mathcal{F}_{t,i}-$ conditional moment $\mathbb{E}\big{\{}\tau_{1}^{n}\tau_{2}^{m}\big{|}\mathcal{F}_{t,i}\big{\}}$ , it is convenient to study the $\mathcal{F}_{t,i}-$ conditional joint Laplace transform of $\tau_{1}$ and $\tau_{2}$ :

[TABLE]

Theorem 3.14

The $\mathcal{F}_{t,i}-$ conditional joint Laplace transform $\Psi_{i,t}(\lambda_{1},\lambda_{2})$ of the first exit times $\tau_{1}$ and $\tau_{2}$ of $X$ (2.1) is given for $\lambda_{1},\lambda_{2}\geq 0$ , $t\geq 0$ and $i\in E$ by

[TABLE]

*Proof * Recall that for $i\in E$ , $f_{i,t}(t_{1},t_{2})=0$ for $t_{1},t_{2}<t$ . Following Remark 3.9,

[TABLE]

The proof is established by applying Fubini’s theorem to double integrals. $\square$

By the law of total probability and Bayes’ rule we have the following result.

Theorem 3.15

The $\mathcal{G}_{t}-$ conditional joint Laplace transform $\Psi_{t}(\lambda_{1},\lambda_{2}):=\mathbb{E}\big{\{}e^{-\lambda_{1}\tau_{1}-\lambda_{2}\tau_{2}}\big{|}\mathcal{G}_{t}\big{\}}$ of the first exit times $\tau_{1}$ and $\tau_{2}$ is given for $\lambda_{1},\lambda_{2},t\geq 0$ by

[TABLE]

Following the joint Laplace transform (3.17), we obtain the joint moments:

[TABLE]

Example 3.16

The conditional joint moment $\mathbb{E}\{\tau_{1}\tau_{2}|\mathcal{G}_{t}\}$ is given by

[TABLE]

3.3 Conditional multivariate distributions

The extension to the multivariate case follows an approach similar to the bivariate one. Let $\Gamma_{1},...,\Gamma_{p}$ be nonempty stochastically closed subsets of $\mathbb{S}$ such that $\cap_{l=1}^{p}\Gamma_{l}$ is a proper subset of $\mathbb{S}$ . Without loss of generality, we assume that $\cap_{l=1}^{p}\Gamma_{l}=\Delta$ . Since $\Gamma_{l}$ is stochastically closed, we necessarily have $q_{ij}^{(k)}=0$ , $k=1,\dots,m$ , if $i\in\Gamma_{l}$ and $j\in\Gamma_{l}^{c}$ , for $l\in\{1,...,p\}$ , and we assume that $\boldsymbol{\pi}_{i}\neq 0$ whenever $i\in\cap_{l=1}^{p}\boldsymbol{\Gamma}_{l}^{c}$ .

Furthermore, denote by $\tau_{k}$ the first entry time of $X$ in the set $\boldsymbol{\Gamma}_{k}$ defined in (1.6). To formulate the joint distribution of $\{\tau_{k}\}$ , let $(t_{i_{1}},...,t_{i_{p}})$ be the time ordering of $(t_{1},...,t_{p})\in\mathbb{R}_{+}^{p}$ , where $(i_{1},...,i_{p})$ is a permutation of $(1,2,...,p)$ . Subsequently, we define by $j_{i_{k}}\in\boldsymbol{\Gamma}_{i_{k}}^{c}$ the state that $X$ occupies at time $t=t_{i_{k}}$ .

Lemma 3.17

Let $t_{i_{p}}\geq\dots\geq t_{i_{1}}\geq t_{i_{0}}=t\geq 0$ be the time ordering of $(t_{1},...,t_{p})\in\mathbb{R}_{+}^{p}$ . The joint distribution of the first exit times $\{\tau_{k}\}$ is given by

[TABLE]

where $\mathbf{H}_{i_{k}}$ is an $(n\times n)-$ diagonal matrix whose $i$ th element $[\mathbf{H}_{i_{k}}]_{i,i}=\mathbb{1}_{\{i\in\boldsymbol{\Gamma}_{i_{k}}^{c}\}}.$

*Proof * Following arguments similar to those in the proof of the bivariate case, we obtain

[TABLE]

By Bayes’ theorem for conditional probability and the law of total probability,

[TABLE]

Note that $\mathbb{P}\big{\{}X_{t_{i_{0}}}=J_{i_{0}}|\mathcal{F}_{t_{i_{0}},j}\big{\}}=1$ iff $J_{i_{0}}=j$ and $0$ otherwise. In terms of (2.4),

[TABLE]

Therefore, starting from equation (3.19) we have following the above that

[TABLE]

leading to $\overline{F}_{j,t}(t_{i_{1}},\dots,t_{i_{p}})$ on account of (2.5), the fact that $\mathbf{H}_{i_{k}}=\sum\limits_{J_{i_{k}}\in\Gamma_{i_{k}}^{c}}\mathbf{e}_{J_{i_{k}}}\mathbf{e}_{J_{i_{k}}}^{\top}$ and after applying block partition (3.4) to exponential matrices $e^{\mathbf{Q}^{(k)}t}$ . $\square$

Notice that the conditional joint probability distribution (3.18) forms a non-stationary function of time $t$ with the ability to capture heterogeneity and path dependence when conditioning on all previous and current information $\mathcal{F}_{t,j}$ of the mixture process $X$ . These features are removed when $\mathbf{T}^{(k)}=\mathbf{T}$ , in which case, the result reduces to the multivariate phase-type distribution (1.7) for $t=0$ .

Proposition 3.18

Let $t_{i_{p}}\geq\dots\geq t_{i_{1}}\geq t_{i_{0}}=t\geq 0$ be the time ordering of $(t_{1},...,t_{p})\in\mathbb{R}_{+}^{p}$ . The $\mathcal{G}_{t}-$ conditional joint distribution of $\{\tau_{k}\}$ (1.6) is given by

[TABLE]

where $\mathbf{H}_{i_{k}}$ is an $(n\times n)-$ diagonal matrix whose $i$ th element $[\mathbf{H}_{i_{k}}]_{i,i}=\mathbb{1}_{\{i\in\boldsymbol{\Gamma}_{i_{k}}^{c}\}}.$

*Proof * It follows from (3.9) that $\overline{F}_{t}(t_{i_{1}},...,t_{i_{p}})=\sum\limits_{j\in E}\pi_{j}(t)\overline{F}_{j,t}(t_{i_{1}},...,t_{i_{p}})$ . $\square$

Corollary 3.19

Set $\mathbf{T}^{(k)}=\mathbf{T}$ and $t=0$ in (3.20). The distribution of $\{\tau_{k}\}$ ,

[TABLE]

which coincides with the unconditional multivariate phase-type distribution [8].

The absolutely continuous component of the distribution $\overline{F}_{i,t}\big{(}t_{i_{1}},\dots,t_{i_{p}}\big{)}$ (respectively, $\overline{F}_{t}\big{(}t_{i_{1}},\dots,t_{i_{p}}\big{)}$ ) has a density given by the following theorem.

Theorem 3.20

Let $t_{i_{p}}\geq\dots\geq t_{i_{1}}>t_{i_{0}}=t\geq 0$ be the time ordering of $(t_{1},...,t_{p})\in\mathbb{R}_{+}^{p}$ . The conditional joint density function of $\{\tau_{k}\}$ (1.6) is given by

[TABLE]

*Proof * The proof follows from taking the $p-$ fold mixed partial derivative of $\overline{F}_{t}\big{(}t_{i_{1}},\dots,t_{i_{p}}\big{)}$ :

[TABLE]

To establish the result, it is enough to show the following partial derivative holds

[TABLE]

To justify the claim, we use an induction argument. For this purpose, recall that

[TABLE]

Hence, by (3.13) and applying integration by parts as we did before, we have

[TABLE]

from which the second order partial derivative $\frac{\partial^{2}}{\partial t_{i_{2}}\partial t_{i_{1}}}$ of (3.22) is given by

[TABLE]

After $(p-1)$ steps of taking the partial derivative, one can show that

[TABLE]

The claim is established on account of (3.13) and the fact that

[TABLE]

However, owing to the complexity of the joint distributions, the singular component of $\overline{F}_{i,t}(t_{i_{1}},\dots,t_{i_{p}})$ (resp. $\overline{F}_{t}(t_{i_{1}},\dots,t_{i_{p}})$ ) is more difficult to obtain in closed form.

Following (3.18) and (3.20), we see that the distributions are uniquely characterized by the Bayesian update of the probability $\widetilde{\boldsymbol{\pi}}$ of starting the process $X$ in any of the $(n+1)$ phases, by the speeds of the process represented by the phase-generator matrices $\{\mathbf{T}^{(k)}\}$ , and by the Bayesian update of the switching probability matrix $\mathbf{S}^{(k)}$ . The initial profile of the distributions forms a generalized mixture of the multivariate phase-type distributions [8]. Unlike the latter, the distributions are non-stationary and path dependent when conditioning on the available information (either full or partial) of $X$ , which is non-Markov. When the process never changes its speed, i.e., $\mathbf{T}^{(k)}=\mathbf{T}$ , all these properties are removed and the initial distributions reduce to those of [8]. As in the univariate case, the multivariate distributions have closure and denseness properties, which can be established in ways similar to the univariate analogs using the matrix analytic approach [8]. We refer, among others, to [30], [9], [23] and [35] for the Markov model, and to [37] for the mixture model. As a result, we have the following theorem.

Theorem 3.21 (Closure and dense properties)

The conditional multivariate probability distribution (3.20) forms a dense class of distributions on $\mathbb{R}_{+}^{p}$ , which is closed under finite convex mixtures and finite convolutions.

4 Some explicit and numerical examples

This section discusses some explicit examples of the main results presented in Section 3, particularly on the bivariate distributions. Using the closed form density functions (3.12) and (3.15), we discuss the mixtures of exponential distributions, Marshall-Olkin exponential distributions, and their generalization.

Example 4.1 (Mixture of exponential distributions)

Consider the mixture process $X$ (2.1) defined on the state space $\mathbb{S}=\{1,2,3\}\cup\{\Delta\}$ with stochastically closed sets $\boldsymbol{\Gamma}_{1}=\{2,\Delta\}$ and $\boldsymbol{\Gamma}_{2}=\{3,\Delta\}$ . Assume that the speed of the mixture process is represented by the following phase generator matrices:

[TABLE]

It is straightforward to derive from the state space representation that

[TABLE]

After some calculations, the matrices $[\mathbf{T}^{(1)},\mathbf{H}_{k}]$ and $\mathbf{T}^{(1)}\mathbf{H}_{k}$ , $k=1,2$ , are

[TABLE]

The matrices $[\mathbf{T}^{(2)},\mathbf{H}_{k}]$ and $\mathbf{T}^{(2)}\mathbf{H}_{k}$ , for $k=1,2$ , are defined similarly. Set the matrix $\mathbf{S}=\textrm{diag}(p_{1},p_{2},p_{3})$ , with $0<p_{k}<1$ , for $k=1,2,3$ , whilst the initial probability $\boldsymbol{\pi}$ has mass one on the state $1$ , i.e., $\boldsymbol{\pi}=\mathbf{e}_{1}$ . It is straightforward to check that the condition (3.16) is satisfied, implying that the joint density function (3.12) has zero singular component. Hence, following (3.12) we have for $t_{1},t_{2}\geq 0$

[TABLE]

The marginal distributions of $\tau_{1}$ and $\tau_{2}$ are given respectively by

[TABLE]

Hence, clearly, as $f_{\tau_{1},\tau_{2}}(t_{1},t_{2})\neq f_{\tau_{1}}(t_{1})f_{\tau_{2}}(t_{2})$ , it follows that the exit times $\tau_{1}$ and $\tau_{2}$ are not independent under the mixture model. They are independent if and only if $a_{1}=b_{1}=b_{2}=a_{2}$ , in which case the mixture corresponds to a simple Markov jump process. See the example on p. 691 in [8] and p. 59 in [23].
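The dependence between $\tau_{1}$ and $\tau_{2}$ under the mixture can be checked numerically at $t=0$ . The sketch below uses an assumed parametrisation of the phase generators, chosen so that each regime on its own yields independent exponential exit times; it is not claimed to be the matrices of this example. The joint survival function is computed as $\sum_{k}\boldsymbol{\pi}^{\top}\mathbf{S}^{(k)}e^{\mathbf{T}^{(k)}u}\mathbf{H}_{u}e^{\mathbf{T}^{(k)}(v-u)}\mathbf{H}_{v}\mathbb{1}$ , with $u\leq v$ the ordered times, which is our reading of Lemma 3.7. All numerical values and function names are hypothetical.

```python
import numpy as np

def expm(A, squarings=10, terms=18):
    """Matrix exponential by scaling and squaring (intended for small matrices)."""
    A = np.asarray(A, dtype=float) / 2.0**squarings
    term = np.eye(A.shape[0])
    result = term.copy()
    for n in range(1, terms):
        term = term @ A / n
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result

def generator(r1, r2):
    """Assumed phase generator on E = {1, 2, 3}:
    1 -> 2 at rate r1, 1 -> 3 at rate r2, 2 -> Delta at rate r2, 3 -> Delta at rate r1."""
    return np.array([[-(r1 + r2), r1, r2],
                     [0.0, -r2, 0.0],
                     [0.0, 0.0, -r1]])

a1, a2, b1, b2 = 1.0, 2.0, 3.0, 0.5   # hypothetical rates
p1 = 0.4                              # switching probability in state 1
# The process starts in state 1, so only p1 enters the regime weights.
regimes = [(p1, generator(b1, b2)), (1.0 - p1, generator(a1, a2))]
H1 = np.diag([1.0, 0.0, 1.0])         # indicator of states outside Gamma_1 = {2, Delta}
H2 = np.diag([1.0, 1.0, 0.0])         # indicator of states outside Gamma_2 = {3, Delta}
e1 = np.array([1.0, 0.0, 0.0])
ones = np.ones(3)

def joint_survival(t1, t2):
    """P(tau_1 > t1, tau_2 > t2) at t = 0, ordering the two times first."""
    (u, Hu), (v, Hv) = sorted([(t1, H1), (t2, H2)], key=lambda pair: pair[0])
    return sum(w * (e1 @ expm(T * u) @ Hu @ expm(T * (v - u)) @ Hv @ ones)
               for w, T in regimes)

def closed_form(t1, t2):
    """Mixture of products of exponentials implied by the assumed generators."""
    return (p1 * np.exp(-b1 * t1 - b2 * t2)
            + (1.0 - p1) * np.exp(-a1 * t1 - a2 * t2))
```

Comparing `joint_survival(0.5, 0.5)` with the product `joint_survival(0.5, 0.0) * joint_survival(0.0, 0.5)` shows that the joint survival function does not factorize when the two regimes have different rates.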

Furthermore, when conditioning on the information set $\mathcal{F}_{t,i}$ with $i=1$ , the conditional joint density function $f_{1,t}(t_{1},t_{2})$ is given for $t_{1},t_{2}\geq t\geq 0$ by

[TABLE]

where the switching probability $s_{1}(t)$ is defined for $\mathcal{F}_{t-}=\emptyset$ and $t\geq 0$ by

[TABLE]

Observe that, on the event $\{\min\{\tau_{1},\tau_{2}\}>t\}$ , one can check that $s_{1}(t)\rightarrow 0$ (resp. $1$ ) as $t\rightarrow\infty$ if $b_{1}+b_{2}>a_{1}+a_{2}$ (resp. $b_{1}+b_{2}<a_{1}+a_{2}$ ), implying that the mixture $X$ moves as a Markov process at the slower speed $\mathbf{T}^{(1)}$ (resp. $\mathbf{T}^{(2)}$ ) in the long run.
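The long-run behaviour of the switching probability can be illustrated with a short Bayes computation. The sketch assumes that the process starts in state $1$ , that one regime leaves state $1$ at total rate $b_{1}+b_{2}$ with prior probability $p_{1}$ and the other at total rate $a_{1}+a_{2}$ , and that only $\{X_{t}=1\}$ is observed. This parametrisation is consistent with the limits stated above but is an assumption on our part, as is the function name `switching_probability`.

```python
import math

def switching_probability(t, p1, rate_b, rate_a):
    """Posterior probability of the regime with exit rate rate_b from state 1,
    given X_0 = 1 and X_t = 1 (hypothetical parametrisation).

    Bayes' formula gives p1*exp(-rate_b*t) / (p1*exp(-rate_b*t) + (1-p1)*exp(-rate_a*t)),
    evaluated here through the log-odds to avoid underflow for large t.
    """
    log_odds = math.log(p1 / (1.0 - p1)) - (rate_b - rate_a) * t
    if log_odds < -700.0:      # exp(700) is near the largest representable float
        return 0.0
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical values: b1 + b2 = 3.5 exceeds a1 + a2 = 3.0, so the probability decays to zero.
values = [switching_probability(t, 0.4, 3.5, 3.0) for t in (0.0, 1.0, 10.0, 100.0)]
```

The longer the process is observed to remain in state $1$ , the more likely it is that it follows the regime with the smaller exit rate.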

Given that $\Gamma_{1}^{c}\cap\Gamma_{2}^{c}=\{1\}$ , we have $\pi_{1}(t)=1$ for all $t\geq 0$ . Hence, the density function $f_{t}(t_{1},t_{2})$ (3.15) has the same expression as (4.1).

Example 4.2 (Mixture of Marshall-Olkin distributions)

Consider the mixture process $X$ (2.1) with the same state space $\mathbb{S}$ and stochastically closed sets $\Gamma_{1}$ and $\Gamma_{2}$ as defined above. Let the speed of the mixture process be given by

[TABLE]

Set the matrix $\mathbf{S}=\textrm{diag}(p_{1},p_{2},p_{3})$ , with $0<p_{k}<1$ , for $k=1,2,3$ , while the initial distribution has mass one on the state $1$ , i.e., $\pi_{1}=1$ . Following (3.16), the joint density $f_{1,t}(t_{1},t_{2})$ has singular part on the set $\{(t_{1},t_{2}):t_{2}=t_{1}\}$ . By Theorem 3.10 and Corollary 3.12, the absolutely continuous parts are given by

[TABLE]

whereas the singular component $f_{t}^{(0)}(t_{1},t_{2})$ is given by the function:

[TABLE]

Note that the switching probability $s_{1}(t)$ is given for $\mathcal{F}_{t-}=\emptyset$ and $t\geq 0$ by

[TABLE]

4.1 General explicit identity

In order to take advantage of the structure of the generator matrices, let

[TABLE]

The generator matrices $\mathbf{T}^{(1)}$ and $\mathbf{T}^{(2)}$ are nonsingular if and only if $\mathbf{A}_{11}$ , $\mathbf{A}_{22}$ , $\mathbf{A}_{33}$ , $\mathbf{B}_{11}$ , $\mathbf{B}_{22}$ and $\mathbf{B}_{33}$ are all nonsingular. The matrices $\mathbf{H}_{1}$ and $\mathbf{H}_{2}$ are

[TABLE]

After some calculations, the matrices $[\mathbf{T}^{(1)},\mathbf{H}_{k}]$ and $\mathbf{T}^{(1)}\mathbf{H}_{k}$ , $k=1,2$ , are given by

[TABLE]

The matrices $[\mathbf{T}^{(2)},\mathbf{H}_{k}]$ and $\mathbf{T}^{(2)}\mathbf{H}_{k}$ , for $k=1,2$ , are defined similarly. A rather long calculation using the infinite series representation of the matrix exponential shows, following (3.12), that

[TABLE]

for $i\in\Gamma_{1}^{c}\cap\Gamma_{2}^{c}$ , with the absolutely continuous parts $f_{i,t}^{(1)}(t_{1},t_{2})$ and $f_{i,t}^{(2)}(t_{1},t_{2})$ :

[TABLE]

whereas the singular component $f_{i,t}^{(0)}(t_{1},t_{1})$ is defined by the function

[TABLE]

Note that $\mathbf{S}_{11}(t)$ denotes the switching probability matrix of $X$ on $\Gamma_{1}^{c}\cap\Gamma_{2}^{c}$ .

It is straightforward to see that the absolutely continuous parts of $f_{i,t}(t_{1},t_{2})$ vanish when $\mathbf{A}_{12}=\mathbf{B}_{12}=\mathbf{0}$ and $\mathbf{A}_{13}=\mathbf{B}_{13}=\mathbf{0}$ , in which case $\Gamma_{1}$ and $\Gamma_{2}$ are non-overlapping. Moreover, $f_{i,t}(t_{1},t_{2})$ has no singular component $f_{i,t}^{(0)}(t_{1},t_{2})$ iff

[TABLE]

Denote by $\boldsymbol{\alpha}$ and $\boldsymbol{\gamma}$ the restrictions of the probability $\boldsymbol{\pi}$ to the sets $\Gamma_{1}^{c}\cap\Gamma_{2}^{c}$ and $E\backslash\{\Gamma_{1}^{c}\cap\Gamma_{2}^{c}\}$ , respectively, s.t. $\boldsymbol{\pi}=\big{(}\boldsymbol{\alpha},\boldsymbol{\gamma}\big{)}$ . The restriction of the Bayesian update $\boldsymbol{\pi}(t)$ to $\Gamma_{1}^{c}\cap\Gamma_{2}^{c}$ is denoted by $\boldsymbol{\alpha}(t)$ . The conditional density $f_{t}(t_{1},t_{2})$ of $\tau_{1}$ and $\tau_{2}$ is given by

[TABLE]

where the subdensity functions $f_{t}^{(1)}(t_{1},t_{2})$ , $f_{t}^{(2)}(t_{1},t_{2})$ and $f_{t}^{(0)}(t_{1},t_{2})$ are

[TABLE]

The marginal probability density functions $f_{\tau_{k}}^{(i)}(s|t):=-\partial_{s}\mathbb{P}\{\tau_{k}>s\big{|}\mathcal{F}_{t,i}\}$ and $f_{\tau_{k}}(s|t):=-\partial_{s}\mathbb{P}\{\tau_{k}>s\big{|}\mathcal{G}_{t}\}$ of $\tau_{k}$ , $k=1,2$ , can be deduced from $\overline{F}_{i,t}(t_{1},t_{2})$ and $\overline{F}_{t}(t_{1},t_{2})$ . They are given for $s\geq t\geq 0$ and $i\in\Gamma_{k}^{c}$ by the following:

[TABLE]

where the phase-generator matrices $\mathbf{B}_{k}$ and $\mathbf{A}_{k}$ , for $k=1,2$ , are defined by

[TABLE]

In the next section we discuss some numerical examples of the main results presented in Section 3, in particular on the conditional bivariate distributions, taking advantage of the structure of the phase-generator matrices given in (4.2).

4.2 Numerical examples

Consider a mixture of birth-death processes with the state diagram described in Figure 2. Birth-death processes have been widely used in, among others, queueing theory and performance engineering (see [18]), demography, epidemiology and biology [32]. For simplicity, we set $\mathbb{S}=\{1,2,3,4,5\}\cup\{\Delta\}$ with $\Gamma_{1}=\{4,\Delta\}$ and $\Gamma_{2}=\{5,\Delta\}$ . The intensity matrix $\mathbf{Q}^{(1)}$ of $X^{(1)}$ is given by

[TABLE]

while the intensity matrix of $X^{(2)}$ is defined by $\mathbf{Q}^{(2)}=\boldsymbol{\Psi}\mathbf{Q}^{(1)}$ . Following [20] and [37] we choose $\boldsymbol{\Psi}=\psi\mathbf{I}$ , with $\psi\geq 0$ , whilst the initial switching probability matrix is defined by $\mathbf{S}=0.5\mathbf{I}$ . For numerical purposes, we set $\beta_{1}=\beta_{2}=2$ , $\alpha_{1}=\alpha_{2}=0.5$ , $\gamma_{1}=\gamma_{2}=1$ and $\delta_{i}=1$ , $i=1,2,3$ . The initial probability $\boldsymbol{\pi}$ of starting the process $X$ at any of the $5$ states is given by $\boldsymbol{\pi}=(0.6,0.3,0.1,0,0)^{\top}$ .

Numerical results showing the various shapes of the conditional bivariate density functions (4.3) and (4.4) as functions of time $t$ are presented in Figures 3 - 6.

The shapes of the density functions $f_{i,t}(t_{1},t_{2})$ (4.3) and $f_{t}(t_{1},t_{2})$ (4.4) are displayed in Figure 3. The first plot in the top row shows the initial shape of $f_{i,t}(t_{1},t_{2})$ when $X_{t}$ starts in state $i=2$ at time zero, whereas the second plot shows the stationary probability density function of $\tau_{1}$ and $\tau_{2}$ given that the process starts in the same state $i=2$ at time $t=10$ . The plots clearly show that the function is zero at the initial time and is left skewed. The two plots in the bottom row show the function $f_{t}(t_{1},t_{2})$ (4.4) when the process starts in a random initial state in $E$ at time $t=0$ and $t=10$ , respectively. The probability of starting the process in a given state at time $t\geq 0$ is given by $\boldsymbol{\pi}(t)$ . Note that we have used $\psi=0.5$ , so that $X^{(1)}$ moves twice as fast as $X^{(2)}$ . In this case the function has a nonzero value at the initial time. However, unlike in the two plots in the top row, the function loses its hump shape in the long run. This can be seen in more detail in Figure 6 in terms of the marginal density functions of $\tau_{1}$ and $\tau_{2}.$

We observe that the joint density function $f_{i,t}(t_{1},t_{2})$ changes its shape as time $t$ increases, a feature that is absent in the Markov model $(\psi=1)$ . Given that $\mathbf{S}=0.5\mathbf{I}$ , the initial profile of the density function $f_{t}(t_{1},t_{2})$ (for $t=0$ ) forms a mixture of the bivariate phase-type distributions $f_{\boldsymbol{\pi},\mathbf{T}^{(1)}}(t_{1},t_{2})$ and $f_{\boldsymbol{\pi},\mathbf{T}^{(2)}}(t_{1},t_{2})$ , i.e., $f_{t}(t_{1},t_{2})=0.5f_{\boldsymbol{\pi},\mathbf{T}^{(1)}}(t_{1},t_{2})+0.5f_{\boldsymbol{\pi},\mathbf{T}^{(2)}}(t_{1},t_{2})$ , where $f_{\boldsymbol{\pi},\mathbf{T}^{(k)}}(t_{1},t_{2})$ , $k=1,2$ , is obtained by setting $\mathbf{T}^{(1)}=\mathbf{T}^{(2)}$ in (4.4), see e.g. [8]. In contrast to [8], the distribution $f_{t}(t_{1},t_{2})$ changes its shape as $t$ increases, as depicted in Figure 3.
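The mixture structure at $t=0$ can be illustrated numerically in a simplified univariate setting. The sketch below does not implement the bivariate formula (4.4); it assumes the standard univariate phase-type density $\boldsymbol{\pi}^{\top}e^{\mathbf{T}s}(-\mathbf{T}\mathbb{1})$ of a single absorption time and mixes two such densities with weights $0.5$ and speed factor $\psi$. The matrix `T1_DEMO` (exit only from the last phase) and all function names are hypothetical choices made for illustration.

```python
import numpy as np


def expm(A, terms=20):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    A = np.asarray(A, dtype=float)
    norm = np.linalg.norm(A, 1)
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    scaled = A / 2.0 ** squarings
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms + 1):
        term = term @ scaled / k
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result


def phase_type_density(pi, T, s):
    """Univariate phase-type density pi^T exp(T s) (-T 1) at time s."""
    T = np.asarray(T, dtype=float)
    exit_rates = -T @ np.ones(T.shape[0])
    return float(np.asarray(pi, dtype=float) @ expm(T * s) @ exit_rates)


def mixture_density(pi, T1, psi, weight, s):
    """Density of the two-regime mixture: regime 2 uses the generator psi * T1."""
    T1 = np.asarray(T1, dtype=float)
    return (weight * phase_type_density(pi, T1, s)
            + (1.0 - weight) * phase_type_density(pi, psi * T1, s))


# Hypothetical phase generator (NOT the paper's T^(1)): exit only from phase 2.
T1_DEMO = np.array([
    [-2.0,  2.0,  0.0],
    [ 0.5, -2.5,  2.0],
    [ 0.0,  0.5, -1.5],
])

pi_random = [0.6, 0.3, 0.1]   # random initial phase
pi_fixed = [1.0, 0.0, 0.0]    # start in a phase with no direct exit

for s in (0.0, 1.0, 5.0):
    print(s, mixture_density(pi_fixed, T1_DEMO, 0.5, 0.5, s),
          mixture_density(pi_random, T1_DEMO, 0.5, 0.5, s))
```

In this toy setting the density started from a phase with no direct exit vanishes at zero, while a random initial phase with positive mass on an exiting phase gives a positive value at zero. Setting `psi=1.0` makes the two components coincide, which recovers the single phase-type density.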

The stationary values of $\mathbf{S}_{11}(t)$ and $\boldsymbol{\alpha}(t)$ as $t\rightarrow\infty$ are given, respectively, by

[TABLE]

from which it follows that, conditional on still being alive in the long run, $X$ moves according to the Markov process $X^{(2)}$ . Although $\mathbb{1}^{\top}\boldsymbol{\alpha}(0)=1$ , we have $0<\mathbb{1}^{\top}\boldsymbol{\alpha}(t)<1$ for $t>0$ . In all cases, the density is symmetric for the chosen parameter values. The contour plot in Figure 4 confirms this observation.
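The long-run dominance of the slower regime can be checked with a simplified calculation. The sketch below conditions only on survival up to time $t$, which is coarser than the Bayesian updates under $\mathcal{F}_{t}$ or $\mathcal{G}_{t}$ used in the paper, and computes the posterior weight of the fast regime as $s\,\boldsymbol{\pi}^{\top}e^{\mathbf{T}^{(1)}t}\mathbb{1}\,/\,\big(s\,\boldsymbol{\pi}^{\top}e^{\mathbf{T}^{(1)}t}\mathbb{1}+(1-s)\,\boldsymbol{\pi}^{\top}e^{\psi\mathbf{T}^{(1)}t}\mathbb{1}\big)$. The generator `T1_DEMO` and the function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def expm(A, terms=20):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    A = np.asarray(A, dtype=float)
    norm = np.linalg.norm(A, 1)
    squarings = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    scaled = A / 2.0 ** squarings
    result = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms + 1):
        term = term @ scaled / k
        result = result + term
    for _ in range(squarings):
        result = result @ result
    return result


def fast_regime_weight(pi, T1, psi, s, t):
    """Posterior probability of the fast regime given survival up to time t.

    Assumption: conditioning is on survival only, not on the observed path.
    """
    T1 = np.asarray(T1, dtype=float)
    pi = np.asarray(pi, dtype=float)
    ones = np.ones(T1.shape[0])
    survival_fast = pi @ expm(T1 * t) @ ones
    survival_slow = pi @ expm(psi * T1 * t) @ ones
    return float(s * survival_fast
                 / (s * survival_fast + (1.0 - s) * survival_slow))


# Hypothetical phase generator (NOT the paper's T^(1)): exit rate 1 from each phase.
T1_DEMO = np.array([
    [-3.0,  2.0,  0.0],
    [ 0.5, -3.5,  2.0],
    [ 0.0,  0.5, -1.5],
])
pi_demo = [0.6, 0.3, 0.1]

for t in (0.0, 2.0, 10.0, 20.0):
    print(t, fast_regime_weight(pi_demo, T1_DEMO, psi=0.5, s=0.5, t=t))
```

Because every phase of this hypothetical generator exits at rate $1$, the survival probabilities are $e^{-t}$ and $e^{-\psi t}$, so for $\psi=0.5$ and $s=0.5$ the weight equals $1/(1+e^{t/2})$ and decays to zero, whereas for $\psi=1$ it stays at $0.5$ for all $t$.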

The shapes of the marginal densities $f_{\tau_{1}}^{(i)}(t_{1}|t)$ and $f_{\tau_{1}}(t_{1}|t)$ of $\tau_{1}$ are presented in Figure 5 for different values of the speed parameter $\psi$ . By symmetry, the marginal distributions of $\tau_{2}$ have the same shape. Although the shape changes as $t$ increases, the plots strongly suggest that the marginal pdf is left skewed, with zero value at zero for $f_{\tau_{1}}^{(i)}(t_{1}|t)$ and a positive value at zero for $f_{\tau_{1}}(t_{1}|t)$ . Both decay to zero as $t_{1}$ increases, as shown in more detail in Figure 6. We also notice from the latter that the marginal probability density functions of $\tau_{1}$ and $\tau_{2}$ no longer share a common shape when the exit parameter $\delta_{2}$ changes its value.

5 Conclusions

We have introduced a new class of conditional joint probability distributions of first exit times of a continuous-time stochastic process defined as a finite mixture of right-continuous Markov jump processes, with overlapping absorbing sets, moving at different speeds on the same finite state space, while the mixture occurs at a random time. Distributional properties of the mixture process were discussed in the general case, in particular the Bayesian update of the probability of starting the process in any phase of the state space at a given time, based on past observation of the process. The results presented in this paper generalize those given in [21], [20] and [37]. The new distributions form non-stationary functions of time and are able to capture heterogeneity and path dependence when conditioning on the available information (either full or partial) of the process. The path dependence is attributable to the non-Markov property of the process.

Distributional identities are presented explicitly in terms of the intensity matrices of the underlying Markov processes, the Bayesian updates of the switching probability and of the probability of starting the process in any of the phases in the state space, despite the fact that the mixture itself is non-Markov. In particular, the initial distributions form a generalized mixture of the multivariate phase-type distributions of Assaf et al. [8]. When the underlying processes move at the same speed, in which case the mixture becomes a simple Markov jump process, heterogeneity and path dependence are removed and the initial distributions reduce to those of [8]. As in the univariate case, the probability distributions have dense and closure properties under finite convex mixtures and finite convolutions. These properties emphasize the additional importance of the new distributions.

As we have shown in this paper, the Markov mixture process forms a tractable construction of a continuous-time stochastic process having the non-Markov property. Given their availability in explicit form and their tractability, the Markov mixture process and the new conditional multivariate probability distributions should offer appealing features for a variety of applications in which Markov chains and (multivariate) phase-type distributions have played a central role.

6 Acknowledgments

The author acknowledges input and suggestions from participants of the Risk and Stochastic Seminar of the London School of Economics, the Hugo Steinhaus Center of Mathematics Seminar of Wroclaw University of Science and Technology, and the Insurance: Mathematics and Economics Conference held during 16-18 July 2018 at UNSW Sydney, at which parts of the results of this work were presented. He enjoyed the hospitality provided by the hosts during his visits to Wroclaw and London, for which he respectively thanks Professor Zbigniew Palmowski and Angelos Dassios for the invitation. The author also acknowledges financial support from Victoria University of Wellington through research grant # 218772.

Bibliography


[1] Aalen, O.O. and Gjessing, H.K. (2001). Understanding the shape of the hazard rate: a process point of view. Stat. Sci., 16, 1-22.
[2] Aalen, O.O. (1995). Phase type distributions in survival analysis. Scand. J. Stat., 22, 447-463.
[3] Asimit, A.V. and Jones, B.L. (2007). Extreme behavior of multivariate phase-type distributions. Insurance Math. Econom., 41, 223-233.
[4] Apostol, T. (1969). Explicit formulas for the exponential matrix $e^{\mathbf{A}t}$. Am. Math. Mon., 76 (3), 289-292.
[5] Asmussen, S., Avram, F. and Pistorius, M.R. (2004). Russian and American put options under exponential phase-type Lévy models. Stoch. Proc. Appl., 109, 79-111.
[6] Asmussen, S. (2003). Applied Probability and Queues, 2nd Edition, Springer.
[7] Albrecher, H. and Asmussen, S. (2010). Ruin Probabilities, 2nd Edition, World Scientific.
[8] Assaf, D., Langberg, N.A., Savits, T.H. and Shaked, M. (1984). Multivariate phase-type distributions. Oper. Res., 32, 688-702.


