The Mixture of Markov Jump Processes: Monte Carlo Method and the EM Estimation
H. Frydman, B.A. Surya

TL;DR
This paper introduces a Monte Carlo method and EM algorithm for statistical estimation of a mixture of Markov jump processes with unobservable regimes, enabling better modeling of complex stochastic systems.
Contribution
It provides a novel Monte Carlo simulation approach and an EM-based estimation procedure for a generalized mixture of Markov jump processes, extending previous models.
Findings
Monte Carlo method accurately simulates the process.
EM algorithm effectively estimates model parameters.
Numerical examples demonstrate method performance.
Abstract
This paper discusses tractable development and statistical estimation of a continuous time stochastic process with a finite state space having non-Markov property. The process is formed by a finite mixture of right-continuous Markov jump processes moving at different speeds on the same finite state space, whereas the speed regimes are assumed to be unobservable. The mixture was first proposed by Frydman (J. Am. Stat. Assoc., 100, 1046-1053, 2005) in 2005 and recently generalized in Surya (Stoch. Syst. 8, 29-44, 2018), in which distributional properties and explicit identities of the process are given in its full generality. The contribution of this paper is two fold. First, we present Monte Carlo method for constructing the process and show distributional equivalence between the simulated process and the actual process. Secondly, we perform statistical inference on the distribution…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsStatistical Distribution Estimation and Applications · Probabilistic and Robust Engineering Design · Advanced Statistical Process Monitoring
Conditional Joint Probability Distributions for the Mixture of Markov Jump Processes
B.A. Surya111School of Mathematics and Statistics, Victoria University of Wellington, Gate 6 Kelburn PDE, Wellington 6140, New Zealand. Email address: [email protected]
School of Mathematics and Statistics
Victoria University of Wellington, New Zealand
(12 May 2018)
Abstract
New results on conditional joint probability distributions of first exit times are presented for a continuous-time stochastic process defined as the mixture of Markov jump processes moving at different speeds on the same finite state space, while the mixture occurs at a random time. Such mixture was first proposed by Frydman [21] and Frydman and Schuermann [20] as a generalization of the mover-stayer model of Blumen et at. [17], and was recently extended in Surya [37], in which further explicit distributional identities of the process are given, in particular in the presence of an absorbing state. We revisit [37] for a finite mixture with different overlapping absorbing sets. The contribution of this paper is two fold. First, we generalize distributional properties of the mixture process discussed in [21], [20] and [37]. Secondly, we give distributional identities of the first exit times to the absorbing sets of the process explicitly in terms of the intensity matrices of the underlying Markov processes and the Bayesian updates of switching probability and the probability distribution of states, despite the fact that the process itself is non-Markov. They form non-stationary functions of time and have the ability to capture heterogeneity and path dependence when conditioning on the available information (either full or partial) of the process. In particular, the initial profile of the distributions forms a generalized mixture of the multivariate phase-type distributions of Assaf et al. [8]. When the underlying processes move at the same speed, in which case the mixture becomes a simple Markov process, these features are removed, and the initial distributions reduce to [8]. Some explicit and numerical examples are discussed to illustrate the main results.
MSC2010 Subject Classification: 60J20, 60J27, 60J28, 62N99
Keywords: Markov jump process, mixture of Markov jump processes, first exit times, conditional multivariate distributions, phase-type model
1 Introduction
Markov process has been one of the most important probabilistic tools in modeling complex stochastic systems dynamics. It has been widely used in variety of applications across various fields such as, among others, in modeling vegetation dynamics (Balzter [11]), demography (Nowak [32]), in marketing to model consumer relationship (Berger and Nasr [13] and Pfeifer and Carraway [33]) and to identify substitutions behavior of customers in assortment problem (Blanchet et al. [16]), in describing credit rating transitions used in many credit risk and pricing applications (Jarrow and Turnbull [26], Jarrow et al. [25], Bielecki et al. [14]), in queueing networks and performance engineering (Bolch et al. [18]).
One of the key variables in the analysis of stochastic systems is the time until an event occurs (the lifetime of systems), for example, the lifetime of a corporate bond [25], customer relationship (Ma et al. [29]), networks [18], etc. It represents the first exit time to an absorbing set of the underlying Markov process. Its distribution is usually referred to as the phase-type distribution, which was first introduced in univariate form by Neuts [31] in 1975 as generalization of Erlang distribution. It has dense property, which can approximate any distribution of positive random variables arbitrarily well, and has closure property under finite convex mixtures and convolutions. When the jumps of compound Poisson process has phase-type distribution, it results in a dense class of Lévy processes, see Asmussen [6]. The advantage of working under phase-type distribution is that it allows some analytically tractable results in applications. To mention some, in option pricing (Asmussen et al. [5]), actuarial science (Albrecher and Asmussen [7], Rolski et al. [35], Zadeh et al. [40]), in survival analysis (Aalen [2], Aalen and Gjessing [1]), in queueing theory (Chakravarthy and Neuts [19], Asmussen [6]), in reliability theory (Assaf and Levikson [9], Okamura and Dohi [38]).
The phase-type distribution is expressed in terms of a Markov jump process with a finite state space , where for some integer , and represent respectively the transient and absorbing states. We also refer to as the th element of , i.e., . The first exit time of to the absorbing state and its distribution are defined by
[TABLE]
In view of credit risk applications, the state space represents the possible credit classes, with being the highest (Aaa in Moody’s rankings) and being the lowest (C in Moody’s rankings), whilst the absorbing state represents bankruptcy, D. The distribution represents the proportion of homogeneous bonds in the rating . We refer to [26] and [25] and literature therein for details.
Unless stated otherwise, we denote by the initial probability of starting in any of the phases. For simplicity, we assume that , so that . The speed at which the Markov process moves along the state space is described by an intensity matrix . This matrix has block partition according to the process moving in the transient state and in the absorbing state , which admits the following block-partitioned form:
[TABLE]
with , as the rows of the intensity matrix sums to zero. That is to say that the entry of the matrix satisfies the following properties:
[TABLE]
See Chapter II of Asmussen [6] for more details on the Markov jump processes. Since the states is transient and that is a non-negative vector and , the condition (1.3) implies that is a negative definite matrix. See Section II4d of [6]. The matrix is known as the phase generator matrix of . The absorption is certain if and only if is nonsingular, see Neuts [30].
Following Theorem 3.4 and Corollary 3.5 in [6] and by the homogeneity of , the transition probability matrix of over the period of time is
[TABLE]
The entry has probabilistic interpretation: is the expected length of time that remains in state , and is the probability that when a transition out of state occurs, it is to state , . The representation of the distribution is uniquely specified by . We refer among others to Neuts [30] and Asmussen [6] for details. Following [30] and Proposition 4.1 [6],
[TABLE]
The extension of (1.5) to multivariate form was proposed by Assaf et al. [8] and later by Kulkarni [27]. Following [8], let be nonempty stochastically closed subsets of such that is a proper subset of . ( is said to be stochastically closed if once enters , it never leaves.) We assume without loss of generality that consists of only the absorbing state , i.e., . Since is stochastically closed, necessarily if and .
The first exit time of to the stochastically closed set is defined by
[TABLE]
The joint distribution of is called the multivariate phase type distribution, see [8]. Let be the ordering of . Following [8],
[TABLE]
where is diagonal matrix whose th diagonal element, for , equals when and is zero otherwise. As before, we assume that has zero mass on and for implying that .
The multivariate distribution (1.7) has found various applications, e.g., in modeling credit default contagion (Herbertsson [24], Bielecki et al. [14]), in modeling aggregate loss distribution in insurance (Berdel and Hipp [12], Asimit and Jones [3] and Willmot and Woo [39]), and in Queueing theory (Badila et al. [10]).
Due to spatial homogeneity of the Markov process, the distributions (1.5) and (1.7) have stationary property and are unable to capture heterogeneity and available information of its past. In their empirical works, Frydman [21], Frydman and Schuermann [20] found that bonds of the same credit rating, represented by the state space of the Markov process, can move at different speeds to other ratings. In addition to this observation, the inclusion of past credit ratings improves out-of-sample prediction of the Nelson-Aalen estimate of credit default intensity. These empirical findings suggest that the credit rating dynamics [25] can be represented by a mixture of Markov jump processes moving at different speeds, where the mixture itself is non-Markov. However, the analyses performed in [21] and [20] were based on a special structure of mixture process where each underlying process has the same probability of leaving a state to another. Surya [37] revisited the mixture model [21], [20] and gave further explicit distributional identities of the mixture, in particular in the presence of an absorbing state.
This paper attempts to extend [37] by relaxing the assumptions [21], [20] for a finite mixture of Markov jump processes with overlapping absorbing sets moving at different speeds. The main contribution of this paper is two fold. First, we give distributional properties of the mixture process for general case, in particular on the Bayesian update of probability distribution of starting the process at any given time . Secondly, we derive the joint probability distributions of the first exit times (1.6) of , conditional on the available (either full or partial) information , with , of the process. Using the results, we derive the joint probability distributions of conditional on the information knowing all previous observations of the process and given that it is still ”alive” at a given time , i.e., . We write if the only available information is the past observation . Conditional on and , we derive explicit formula for
[TABLE]
for the mixture process , with , and . Unless the underlying Markov processes move at the same speed, we show that the initial profile of the joint distributions (1.8) forms a generalized mixture of (1.7). Under partial information, given the process is still alive in the long run, we give the corresponding limiting (stationary) distributions of (1.8) as .
From the credit risk point of view (see for e.g. [25], [15], [14], [24]), the quantity describes the joint probability distribution of first exit times of rated bonds, due to cause-specific of exits (default, prepayment, calling back, debt retirement, etc), conditional on the credit rating history up to a given time , whilst the function determines the joint probability distribution of the bonds’ exit times across credit ratings viewed at time . In the framework of competing risks (see for e.g. Pintilie [34]), for the observed exit time and the reason of exit , the probability determines the proportion of rated bonds exiting by type from the credit portfolio within time interval , whilst represents the percentage of bonds exiting by type .
The organization of this paper is as follows. Section 2 discusses distributional properties of the Markov mixture process which generalize the results of [20] and [37], in particular on the Bayesian update on the probability of starting the process in any state at given time . The main contributions of this paper are given in Section 3, where explicit forms of the conditional probability distributions and their Laplace transforms are presented. Some explicit examples are discussed in Section 4, in which we show that the exit times are independent under the Markov model, but not necessarily for the mixture model. Also in this section, we discuss numerical examples of the main results for bivariate distributions of birth-death mixture processes. Section 5 concludes this paper.
2 Mixture of Markov jump processes
Throughout the remaining of this paper we denote by the Markov mixture process, which is a continuous-time stochastic process defined as a finite mixture of Markov jump processes , with , whose intensity matrices are given by . We assume that the underlying Markov processes have right-continuous sample paths, and are defined on the same finite state space . It is defined by
[TABLE]
where the variable represents the speed regimes, assumed to be unobservable.
More conveniently, we can represent in terms of the underlying processes as follows. Define an indicator variable . Thus, for ,
[TABLE]
It is clear that (2.1) represents a finite mixture of Markov processes .
For a given initial state , there is a separate mixing probability
[TABLE]
and . The quantity has the interpretation as the proportion of population (e.g. bonds) with initial state evolving w.r.t to . In general, and , , have different expected length of occupation time of a state , i.e., , and have different probability of leaving the state to state , , i.e. . Note that we have used and to denote the negative of the th diagonal element and the entry of .
Markov mixture process is a generalization of mover-stayer model, a mixture of two discrete-time Markov chains proposed by Blumen et al [17] in 1955 to model population heterogeneity in jobs labor market. In the mover-stayer model [17], the population of workers consists of stayers (workers who always stay in the same job category, ) and movers (workers who move to other job according to a stationary Markov chain with intensity matrix ). Estimation of the mover-stayer model was discussed in Frydman [22]. Frydman [21] generalized the model to a finite mixture of Markov chains moving with different speeds. Frydman and Schuermann [20] later on used the result for the mixture of two Markov jump processes moving with intensity matrices and , where is a diagonal matrix, to model the dynamics of firms’ credit ratings. Depending on whether , , or , never moves out of state (the mover-stayer model), moves out of state at lower rate, higher rate or at the same rate, subsequently, than that of . If , for all , the mixture process reduces to a simple Markov jump process .
Figure 1 illustrates the transition of for the mixture of two Markov jump processes moving from state to , and vice versa. When is observed in state , it would stay in the state for an exponential period of time with intensity or before moving to with probability or , depending on whether it is either driven by the underlying Markov process or .
2.1 Distributional properties
Recall that the process (2.1) repeatedly changes its speed randomly in time according to the speed rate . The speed regime, represented by the variable , is however not directly observable; we can not classify from which regime the observed process came from. However, it can be identified based on available information of the process. We denote by all previous information about prior to time , and by , . The set may contain full, partial information or maybe nothing about the past of .
The likelihood of observing the past realization of moving according to the process conditional on knowing its initial state is defined by
[TABLE]
where in the expression above we have denoted subsequently by and the total time the observed process spent in state for , and the number of transitions from state to state , with , observed in the information set ; whereas represents the entry of the intensity matrix .
2.1.1 Bayesian updates of switching probability
The Bayesian updates of switching probability of (2.1) is defined by
[TABLE]
It represents the proportion of those in state moving according to . Note that (2.2). Denote by , , a diagonal matrix defined by
[TABLE]
where we have denoted by an identity matrix, with , representing switching probability matrix of .
For , in which case , we write , . The element , , of the intensity matrix is given below.
Proposition 2.1
Let be the initial probability of starting the Markov mixture process (2.1) on a finite state space . Define by the likelihood matrix whose element is defined in (2.3). Then, for and ,
[TABLE]
To be more precise, depending on availability of information set we have:
- (i)
Under full information that
[TABLE] 2. (ii)
Under partial information , is defined by
[TABLE] 3. (iii)
Under partial information , is given by,
[TABLE]
The expression (2.6) generalizes the result of [20] and Lemma 3.1 in [37].
Note that we have used slightly different notations for the likelihood function (2.3) and the switching probability (2.6) from that of used in [20] and [37].
*Proof *[Proposition 2.1] By the law of total probability and the Bayes’ formula,
[TABLE]
The claim in (2.6) is finally established on account of the Bayes’ formula:
[TABLE]
If have distinct eigenvalues , it can be proved similar to the Proposition 3.2 in [37] using the Lagrange-Sylvester formula
[TABLE]
see Theorem 2 of Apostol [4], that, under partial information, the probability in the long-run, as , implying that moves according to . The result can be used to deduce the stationary distribution of (1.8) as
Proposition 2.2
Let have distinct eigenvalues with . Define For ,
[TABLE]
where \mathcal{L}[\mathbf{Q}^{(k)}]=\prod\limits_{j=1,j\neq i_{k}}^{n+1}\Big{(}\frac{\mathbf{Q}^{(k)}-\lambda_{j}^{(k)}\mathbf{I}}{\lambda_{i_{k}}^{(k)}-\lambda_{j}^{(k)}}\Big{)} is the Lagrange interpolation coefficient.
It is clear following the above that when the intensity matrices take the form of (1.2), (2.8) reduces to the results of Proposition 3.2 of [37].
In the section below we derive the Bayesian updates on the probability of starting at a given time and available information of the process.
2.1.2 Bayesian updates of probability distribution
The following proposition and its corollary provide Bayesian updates on finding in any state at a given time based on all previous observations of the process and knowing that it is still ”alive” at time .
Proposition 2.3
Let . Define for .
[TABLE]
- (i)
Given all previous observations , we have
[TABLE] 2. (ii)
If , it follows from (2.3) that \mathbf{L}^{Q^{(k)}}(t)=\exp\big{(}\mathbf{Q}^{(k)}t\big{)}. Then,
[TABLE] 3. (iii)
If , it follows from the above that is given by
[TABLE]
Notice that , , for , and .
*Proof * The proof follows from applying the law of total probability and the Bayes’ formula for conditional probability. By applying the latter, we have that
[TABLE]
Therefore, we have by the above and applying the law of total probability that
[TABLE]
The result (2.9) is established by the Bayes’ rule and the law of total probability,
[TABLE]
while and follow taking the fact that , and .
Corollary 2.4
Suppose that the process is still alive at time . Then,
- (i)
Under full information , we have
[TABLE] 2. (ii)
If , it follows from (2.3) and the matrix partition (3.4),
[TABLE] 3. (iii)
If , following the above, is given by
[TABLE]
It follows that , , for , and .
Notice that the Bayesian update (2.12) and (2.13) form the normalization of the probability (2.10) and (2.11), respectively, as such that . The results of Proposition 2.3 and Corollary 2.4 give additional features to the distributional properties of the mixture process [37] and [20].
Below we give the value of as under partial information. The result can be used to deduce the stationary distribution of (1.8) as .
Proposition 2.5
Let have distinct eigenvalues with . Define For ,
[TABLE]
where \mathcal{L}[\mathbf{T}^{(k)}]=\prod\limits_{j=1,j\neq i_{k}}^{n}\Big{(}\frac{\mathbf{T}^{(k)}-\lambda_{j}^{(k)}\mathbf{I}}{\lambda_{i_{k}}^{(k)}-\lambda_{j}^{(k)}}\Big{)} is the Lagrange interpolation coefficient.
In contrary to (2.10) and (2.11), we see from the above proposition that given the process still alive in the long run, the stationary distribution of does not have zero mass on the state with
2.1.3 conditional transition probability matrix
The main feature of the mixture process (2.1) is that unlike its component , does not have the Markov property; future development of its state depends on its past information. The following theorem summarizes this property.
Theorem 2.6
For any , the conditional transition probability matrix , , of the mixture process (2.1) is given by
[TABLE]
Theorem 2.6 generalizes the result of a lemma in [20] and Theorem 3.4 in [37].
*Proof * Similar to the proof of Theorem 3.4 in [37], (2.15) is established by applying the law of total probability and Bayes’ rule for conditional probability:
[TABLE]
where on the second last equality we used the fact that is Markovian.
It is clear from (2.15) that, unless the underlying Markov process moves at the same speed , i.e., for , does not inherit the Markov property of , i.e., future development of is determined by its past information through its likelihood function (2.3). To be more precise, when , it follows from the transition probability matrix (2.15) that by which reduces to a simple Markov jump process.
3 Probability distributions of first exit times
This section presents the main results of this paper on the joint probability distributions of the first exit times (1.6) of the Markov mixture process (2.1), conditional on the available information sets and . We first derive conditional univariate distribution of (1.1). To motivate the main results on the conditional multivariate distributions (1.8), we consider the bivariate case in some details. Throughout the remaining, we define intensity matrix by
[TABLE]
The following results on block partition of the transition probability matrix (2.15) and exponential matrix will be used to derive the conditional probability distributions (1.8). We refer to Proposition 3.7 in [37] for details.
Lemma 3.1
Let the phase generator matrix be nonsingular. Then,
[TABLE]
Proposition 3.2
The transition probability matrix (2.15) has block partition:
[TABLE]
3.1 Conditional univariate distributions
This section presents explicit identity for the probability distribution , , of the first exit time (1.1) given the information .
Lemma 3.3
The conditional distribution is given for by
[TABLE]
*Proof * Without loss of generality, let . As is the first exit time of to the absorbing state , by applying the law of total probability we have
[TABLE]
Again, by the law of total probability and the Bayes’ formula, we obtain
[TABLE]
Starting from equation (3.7), we have following the above expression that
[TABLE]
We arrive at the probability distribution (3.6) on account of and the block partition (3.5) of the transition probability matrix .
Applying similar steps of derivation to the proof of (3.6), one can show that
[TABLE]
Lemma 3.4
Following the two identities (3.6) and (3.8), we deduce that
[TABLE]
Note that the measure has probability mass at the point when conditioning on , and no mass given . Given that , it has zero mass at . It is absolutely continuous w.r.t Lebesgue measure with density on . Following (3.6), the density function , its Laplace transform and th moment are given below.
Theorem 3.5
The conditional density function is given for by
[TABLE]
- (i)
The Laplace transform is given by
[TABLE] 2. (ii)
*The **conditional *th moment, for , of is given by
[TABLE]
Setting in (3.10), in which case never changes the speed, the above results coincide with that of given in [31] and Proposition 4.1 in [6] for .
The following theorem summarizes the dense and closure properties under finite convex mixtures and convolutions of (3.6). They can be established using matrix analytic approach [30]. See for e.g. Theorems 4.12 and 4.13 in [37].
Theorem 3.6
The phase-type distribution (3.6) is closed under finite convex mixtures and convolutions, and forms a dense class of distributions on .
3.2 Conditional bivariate distributions
As in the univariate case, we consider the mixture process (2.1) on the finite state space . Following [8], let and be two nonempty stochastically closed subsets of such that is a proper subset of . We assume without loss of generality that and the absorption into is certain, i.e., the generator matrices need to be nonsingular. As , , are stochastically closed sets, necessarily we have if and .
We denote by the initial probability vector on such that . We shall assume that if implying . As before, defines all previous and current information of .
3.2.1 Conditional joint survival function of and
The joint distribution of (1.8), for , are given by the following.
Lemma 3.7
The identity for conditional joint distribution of and is given for and by
[TABLE]
with Note that we have used to denote a diagonal matrix whose th diagonal element for equals if and is [math] otherwise.
*Proof * To begin with, let , with be the ordering of , with . Since , , is the first exit time of (2.1) to ,
[TABLE]
The probability on the r.h.s of the last equality can be worked out as follows.
[TABLE]
Note that we have applied the law of total probability and the Bayes’ rule for conditional probability in the above equality. Recall that \mathbb{P}\big{\{}X_{t_{i_{0}}}=J_{i_{0}}|\mathcal{F}_{t_{i_{0}},i}\big{\}}=1 iff and zero otherwise. Therefore, starting from eqn. (3.11), we have
[TABLE]
leading to on account of , (2.5) and (3.4).
Using the result of Lemma 3.7, we derive the conditional probability distribution of and . A closed form distributional identity is given below.
Proposition 3.8
The distribution is given by
[TABLE]
*Proof * By (3.9) and the total probability law, .
Remark 3.9
As , the measures and have probability mass and , respectively, at the point . They are absolutely continuous w.r.t Lebesgue measure with density and , subsequently, on
3.2.2 Conditional joint probability density function
In general, the joint distribution (resp. ) has a singular component (resp. ) on the set . The singular component can be obtained by deriving the joint density of and and deduce the absolutely continuous and singular parts of the pdf, such as discussed in the theorem below. For non-matrix based bivariate function, see for instance [36].
Theorem 3.10
Given the joint distribution of as specified in Lemma 3.7, the joint probability density of is given by
[TABLE]
where the absolutely continuous components and are
[TABLE]
where the matrix operator defines the commutator of and , whilst the singular component part is defined by the function
[TABLE]
*Proof * The expressions for and follow from taking partial derivative (see derivation of Theorem 3.20) taking account
[TABLE]
To get , recall that , due to the phase-generator matrix being negative definite (see Section II4d in [6]). Following Remark 3.9,
[TABLE]
Applying Fubini’s theorem, the first integral is given after some calculations by
[TABLE]
Following the same approach, one can show after some calculations that
[TABLE]
The proof is established on account of (3.14), the two identities above and
[TABLE]
Theorem 3.11
For , the conditional density is given by
[TABLE]
where the absolutely continuous components and are
[TABLE]
whilst the singular component is defined by the function
[TABLE]
*Proof * It follows from identity (3.9) that
Corollary 3.12
The singular component of and are
[TABLE]
Hence, the singular component of and is zero if and only if, for , for and , which is equivalent to
[TABLE]
Remark 3.13
Consider the representation (4.2) for the matrices . It is clear following (3.15) that the joint probability density function coincides with the bivariate phase-type distribution [8] when we set each and taking into account the fact that and .
3.2.3 Conditional joint Laplace transform of and
In order to compute the conditional moment \mathbb{E}\big{\{}\tau_{1}^{n}\tau_{2}^{m}\big{|}\mathcal{F}_{t,i}\big{\}}, it is therefore convenient to study the conditional joint Laplace transform of and :
[TABLE]
Theorem 3.14
The conditional joint Laplace transform of the first exit times and of (2.1) is given for , and by
[TABLE]
*Proof * Recall that for , for . Following Remark 3.9,
[TABLE]
The proof is established by applying Fubini’s theorem to double integrals.
By the law of total probability and Bayes’ rule we have the following result.
Theorem 3.15
The conditional joint Laplace transform \Psi_{t}(\lambda_{1},\lambda_{2}):=\mathbb{E}\big{\{}e^{-\lambda_{1}\tau_{1}-\lambda_{2}\tau_{2}}\big{|}\mathcal{G}_{t}\big{\}} of the first exit times and is given for by
[TABLE]
Following the joint Laplace transform (3.17), we obtain the joint moments:
[TABLE]
Example 3.16
The conditional joint moments is given by
[TABLE]
3.3 Conditional multivariate distributions
The extension to multivariate case follows similar approach to the bivariate one. Let be nonempty stochastically closed subsets of such that is a proper subset of . Without loss of generality, we assume that . Since is stochastically closed, we necessarily assume that , , if and , for , and whenever .
Furthermore, denote by the first entry time of in the set defined in (1.6). To formulate the joint distribution of , let be the time ordering of , where is a permutation of . Subsequently, we define by the state that occupies at time .
Lemma 3.17
Let be the time ordering of . The joint distribution of the first exit times is given by
[TABLE]
where is an diagonal matrix whose th element
*Proof * Following similar arguments of the proof in bivariate case, we obtain
[TABLE]
By Bayes’ theorem for conditional probability and the law of total probability,
[TABLE]
Note that \mathbb{P}\big{\{}X_{t_{i_{0}}}=J_{i_{0}}|\mathcal{F}_{t_{i_{0}},j}\big{\}}=1 iff and [math] otherwise. In terms of (2.4),
[TABLE]
Therefore, starting from equation (3.19) we have following the above that
[TABLE]
leading to on account of (2.5), the fact that and after applying block partition (3.4) to exponential matrices .
Notice that the conditional joint probability distribution (3.18) forms a non-stationary function of time with the ability to capture heterogeneity and path dependence when conditioning on all previous and current information of the mixture process . These features are removed when , in which case, the result reduces to the multivariate phase-type distribution (1.7) for .
Proposition 3.18
Let be the time ordering of . The conditional joint distribution of (1.6) is given by
[TABLE]
where is an diagonal matrix whose th element
*Proof * It follows from (3.9) that .
Corollary 3.19
Set and in (3.20). The distribution of ,
[TABLE]
which coincides with the unconditional multivariate phase-type distribution [8].
The absolutely continuous component of the distribution \overline{F}_{i,t}\big{(}t_{i_{1}},\dots,t_{i_{p}}\big{)} (respectively, \overline{F}_{t}\big{(}t_{i_{1}},\dots,t_{i_{p}}\big{)}) has a density given by the following theorem.
Theorem 3.20
Let be the time ordering of . The conditional joint density function of (1.6) is given by
[TABLE]
*Proof * The proof follows from taking times partial derivative to F_{t}\big{(}t_{i_{1}},\dots,t_{i_{p}}\big{)}:
[TABLE]
To establish the result, it is enough to show the following partial derivative holds
[TABLE]
To justify the claim, we use induction argument. For this purpose, recall that
[TABLE]
Hence, by (3.13) and applying integration by part as we did before, we have
[TABLE]
from which the second order partial derivative of (3.22) is given by
[TABLE]
After steps of taking the partial derivative, one can show that
[TABLE]
The claim is established on account of (3.13) and the fact that
[TABLE]
However, due to complexity of the joint distributions, the singular component of (resp. ) is more complicated to get in closed form.
Following (3.18) and (3.20), we see that the distributions are uniquely characterized by the Bayesian update on the probability of starting the process in any of the phases, the speeds of the process represented by the phase-generator matrices , and by the Bayesian update of switching probability matrix . The initial profile of the distributions form a generalized mixture of the multivariate phase-type distributions [8]. Unlike the latter, the distributions have non-stationary and path dependence property when conditioning on the available information (either full or partial) of , which is non-Markov. When the process never repeatedly changes the speed, i.e., , all these properties are removed and the initial distributions reduce to [8]. As in the univariate case, the multivariate distributions have closure and dense properties, which can be established in similar ways to the univariate analogs using matrix analytic approach [8]. We refer among others to [30], [9], [23] and [35] for Markov model, and to [37] for the mixture model. As a result, we have the following theorem.
Theorem 3.21** (Closure and dense properties)**
The conditional multivariate probability distribution (3.20) forms a dense class of distributions on , which is closed under finite convex mixtures and finite convolutions.
4 Some explicit and numerical examples
This section discusses some explicit examples of the main results presented in Section 3, particularly on the bivariate distributions. Using the closed form density functions (3.12) and (3.15), we discuss the mixtures of exponential distributions, Marshall-Olkin exponential distributions, and their generalization.
Example 4.1** (Mixture of exponential distributions)**
Consider the mixture process (2.1) defined on the state space with stochastically closed sets and . Assume that the speed of the mixture process is represented by the following phase generator matrices:
[TABLE]
It is straightforward to derive from the state space representation that
[TABLE]
After some calculations, the matrices and , , are
[TABLE]
Similarly defined for and , for . Set the matrix , with , for , whilst the initial probability has mass one on the state , i.e., . It is straightforward to check that the condition (3.16) is clearly satisfied implying that the joint density function (3.12) has zero singular component. Hence, following (3.12) we have for
[TABLE]
The marginal distribution of and are given respectively by
[TABLE]
Hence, clearly, as , it follows that the exit times and are not independent under the mixture model. They are independent if and only if , in which case the mixture corresponds to a simple Markov jump process. See the example on p. 691 in [8] and p. 59 in [23].
Furthermore, when conditioning on the information set with , the conditional joint density function is given for by
[TABLE]
where the switching probability is defined for and by
[TABLE]
Observe that, on the event , one can check that (resp. ) as if b_{1}+b_{2}>\;(\textrm{resp. <})\;a_{1}+a_{2}, implying that the mixture moves as a Markov process at the slow speed (resp. ) in the long run.
Given that , we have for all . Hence, the density function (3.15) has therefore the same expression as (4.1).
Example 4.2** (Mixture of Marshall-Olkin distributions)**
Consider the mixture process (2.1) with the same state space and stochastically closed sets and as defined above. Let the speed of the mixture process be given by
[TABLE]
Set the matrix , with , for , while the initial distribution has mass one on the state , i.e., . Following (3.16), the joint density has singular part on the set . By Theorem 3.10 and Corollary 3.12, the absolutely continuous parts are given by
[TABLE]
whereas the singular component is given by the function:
[TABLE]
Note that the switching probability is given for and by
[TABLE]
4.1 General explicit identity
In order to take advantage of the structure of the generator matrices, let
[TABLE]
The generator matrices and are nonsingular if and only if , , , , and are all nonsingular. The matrices and are
[TABLE]
After some calculations the matrix and , , are given by
[TABLE]
Similarly defined for and , for . A rather long calculations using infinite series representation of exponential matrix shows following (3.12),
[TABLE]
for , with the absolutely continuous parts and :
[TABLE]
whereas the singular component is defined by the function
[TABLE]
Note that denotes the switching probability matrix of on .
It is straightforward to see that the absolutely continuous parts of vanishes when and , in which case and are non overlapping. Moreover, has no singular component iff
[TABLE]
Denote by and the restriction of the probability on the set and , respectively, s.t. \boldsymbol{\pi}=\big{(}\boldsymbol{\alpha},\boldsymbol{\gamma}\big{)}. The Bayesian updates on is defined by . The conditional density of and is given by
[TABLE]
where the subdensity functions , and are
[TABLE]
The marginal probability density functions f_{\tau_{k}}^{(i)}(s|t):=-\partial_{s}\mathbb{P}\{\tau_{k}>s\big{|}\mathcal{F}_{t,i}\} and f_{\tau_{k}}(s|t):=-\partial_{s}\mathbb{P}\{\tau_{k}>s\big{|}\mathcal{G}_{t}\} of , , can be deduced from and . They are given for and by the following:
[TABLE]
where the phase-generator matrices and , for , are defined by
[TABLE]
In the next section we discuss some numerical examples of the main results presented in Section 3, in particular on the conditional bivariate distributions, taking the advantage of the structure of phase-generator matrices given in (4.2).
4.2 Numerical examples
Consider a mixture of birth-death processes with state diagram described in Figure 2. The birth-death process has been widely used in many places such as, among others, in queueing theory, performance engineering, see [18], demography, epidemiology and biology [32]. For simplicity, we set with and . The intensity matrix of is given by
[TABLE]
while the intensity matrix of is defined by . Following [20] and [37] we choose , with , whilst the initial switching probability matrix is defined by . For numerical purposes, we set , , and , . The initial probability of starting the process at any of the states is given by .
Numerical results on getting various shapes of conditional bivariate density functions (4.3) and (4.4) as function of time are presented in Figures 3 - 6.
The shape of the density functions (4.3) and (4.4) are displayed in Figure 3. The first plot in the top pictures exhibits the initial shape of when starts in state at time zero, whereas the second plot presents the shape of stationary probability density function of and given that the process starts in the same state at time . The picture clearly shows that the function has zero value at initial time and left skewed. The two pictures below which represent the function (4.4) when the process starts at a random initial state in at time and , respectively. The probability of starting the process at any given time is given by . Note that we have used , by which moves two times faster than does. The function has nonzero value at initial time. However, unlike the two pictures above which, the function losses its hump shape in the long run. We can see this more detailed in Figure 6 in terms of the marginal density function of and
We observe that the joint density function changes its shape as time increases, a feature that lacks in the Markov model . Given that , the initial profile of density function (for ) forms a mixture of bivariate phase-type distributions and , i.e., , where , , is obtained by setting in (4.4), see e.g. [8]. In contrary to [8], the distribution changes its shape as increases, as depicted in Figure 3.
The stationary values of and are given as respectively by
[TABLE]
from which it follows that, conditional it is still alive in the long run, moves according to the Markov process . Despite , we have for that . In all cases, the density has symmetry property for the values of parameters chosen. The contour plot in Figure 4 confirms this observation.
The shape of marginal distributions and of are presented in Figure 5 for different values of speed parameter . By symmetry, the marginal distributions of also share the same shape. Despite changing its shapes as increases, the pictures strongly suggest that the marginal pdf is left skewed and has zero value at zero for , and positive value for . They both decay to zero as increases as shown in more details in Figure 6. We also notice from the latter that the marginal probability density functions of and do not have common shape when the exit parameter changes its value.
5 Conclusions
We have introduced a new class of conditional joint probability distributions of first exit times of a continuous-time stochastic process defined as a finite mixture of right-continuous Markov jump processes, with overlapping absorbing sets, moving at different speeds on the same finite state space, while the mixture occurs at a random time. Distributional properties of the mixture process were discussed in general case, in particular the Bayesian update on the probability of starting the process in any phase of the state space at a given time, based on past observation of the process. The results presented in this paper generalizes that of given in [21], [20] and [37]. The new distributions form non-stationary functions of time and have the ability to capture heterogeneity and path dependence when conditioning on the available information (either full or partial) of the process. The attribution of path dependence is due to non-Markov property of the process.
Distributional identities are presented explicitly in terms of the intensity matrices of the underlying Markov processes, the Bayesian updates of switching probability and of the probability of starting the process in any of the phases in the state space, despite the fact that the mixture itself is non-Markov. In particular, the initial distributions form of a generalized mixture of the multivariate phase-type distributions of Assaf et al. [8]. When the underlying processes move at the same speed, in which case the mixture becomes a simple Markov jump process, heterogeneity and path dependence are removed and the initial distributions reduce to [8]. As in the univariate case, the probability distributions have dense and closure properties under finite convex mixtures and finite convolutions. These properties emphasize the additional importance of the new distributions.
As we have shown in this paper, the Markov mixture process forms a tractable construction of a continuous-time stochastic process having non-Markov property. Given their availability in explicit form and tractability, the Markov mixture process and the new conditional multivariate probability distributions should be able to offer appealing features for variety of applications, in which the Markov chains and the (multivariate) phase-type distributions have played central role.
6 Acknowledgments
The author acknowledges some inputs and suggestions from participants of Risk and Stochastic Seminar of London School of Economics, Hugo Steinhaus Center of Mathematics Seminar of Wroclaw University of Science and Technology, and Insurance: Mathematics and Economics Conference during 16-18 July 2018 at UNSW Sydney, at which part of the results of this work were presented. He enjoyed the hospitality provided by the hosts during his visits to Wroclaw and London, for which he respectively thanks Professor Zbigniew Palmowski and Angelos Dassios for the invitation. The author also acknowledges financial support from Victoria University of Wellington for the research grant # 218772.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Aalen, O.O. and Gjessing, H.K. (2001). Understanding the shape of the hazard rate: a process point of view. Stat. Sci. , 16 , 1-22.
- 2[2] Aalen, O.O. (1995). Phase type distributions in survival analysis. Scand. J. Stat. , 22 , 447-463.
- 3[3] Asimit, A.V. and Jones, B.L. (2007). Extreme behavior of multivariate phase-type distributions. Insurance Math. Econom. , 41 , 223-233.
- 4[4] Apostol, T. (1969). Explicit formulas for the exponential matrix e 𝐀 t superscript 𝑒 𝐀 𝑡 e^{\mathbf{A}t} . Am. Math. Mon. , 76 (3), p.289-292.
- 5[5] Asmussen, S., Avram, F. and Pistorius, M.R. (2004). Russian and American put options under exponential phase-type Lévy models. Stoch. Proc. Appl. , 109 , 79-111.
- 6[6] Asmussen, S. (2003). Applied Probability and Queues , 2nd Edition, Springer.
- 7[7] Albrecher, H. and Asmussen, S. (2010). Ruin Probabilities , 2nd Edition, World Scientific.
- 8[8] Assaf, D., Langberg, N.A., Savits, T.H. and Shaked, M. (1984). Multivariate phase-type distributions. Oper. Res. , 32 , 688-702.
