The Likelihood of Mixed Hitting Times
Jaap H. Abbring, Tim Salimans

TL;DR
This paper introduces a numerical method to compute the likelihood of mixed hitting-time models involving Lévy processes, enabling maximum likelihood estimation and application to strike data analysis.
Contribution
The paper develops a novel numerical approach for inverting Laplace transforms to compute likelihoods in mixed hitting-time models involving Lévy processes.
Findings
Successfully implemented maximum likelihood estimator in MATLAB.
Applied the method to analyze Kennan's strike data.
Demonstrated the method's effectiveness in real-world data analysis.
Abstract
We present a method for computing the likelihood of a mixed hitting-time model that specifies durations as the first time a latent L\'evy process crosses a heterogeneous threshold. This likelihood is not generally known in closed form, but its Laplace transform is. Our approach to its computation relies on numerical methods for inverting Laplace transforms that exploit special properties of the first passage times of L\'evy processes. We use our method to implement a maximum likelihood estimator of the mixed hitting-time model in MATLAB. We illustrate the application of this estimator with an analysis of Kennan's (1985) strike data.
| I | II | III | IV | V | VI | |
|---|---|---|---|---|---|---|
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSimulation Techniques and Applications · Probability and Risk Models · Advanced Queuing Theory Analysis
The Likelihood of Mixed Hitting Times††thanks: Forthcoming in the Journal of Econometrics: doi.org/10.1016/j.jeconom.2019.08.017.
Jaap H. Abbring Department of Econometrics & OR, Tilburg University, P.O. Box 90153, 5000 LE Tilburg, The Netherlands; and CEPR. E-mail: [email protected]. Web: jaap.abbring.org.
Tim Salimans Brain Team, Google Research, Amsterdam, The Netherlands. E-mail: [email protected]. Web: github.com/TimSalimans.
Keywords: duration analysis, first passage time, identification, Laplace transform, Lévy process, maximum likelihood, Mellin’s inverse formula, mixture, optimal stopping, strike duration.
JEL codes: C14, C41.
© 2021. This manuscript is made available under a CC BY-NC-ND 4.0 International license.
(April 2021)
Abstract
We present a method for computing the likelihood of a mixed hitting-time model that specifies durations as the first time a latent Lévy process crosses a heterogeneous threshold. This likelihood is not generally known in closed form, but its Laplace transform is. Our approach to its computation relies on numerical methods for inverting Laplace transforms that exploit special properties of the first passage times of Lévy processes. We use our method to implement a maximum likelihood estimator of the mixed hitting-time model in MATLAB. We illustrate the application of this estimator with an analysis of Kennan’s (1985) strike data.
1 Introduction
Mixed hitting-time (MHT) models are mixture duration models that specify durations as the first time a latent stochastic process crosses a heterogeneous threshold. They are of substantial interest because they can be applied to the analysis of optimal stopping decisions by heterogeneous agents (Abbring, 2012, 2010). In particular, they can be applied to problems that do not lead to the mixed proportional hazards (MPH) model, Lancaster’s (1979) and Vaupel et al.’s (1979) popular extension of the Cox (1972) proportional hazards model. Examples include models of job durations, marriage durations, and the entry and exit of firms that are driven by Brownian motions and more general persistent processes. Hitting-time duration models are also popular in statistics for their structural and descriptive appeal (Lee and Whitmore, 2006).
This paper considers likelihood-based empirical methods for an MHT model in which the latent process is a spectrally-negative Lévy process, a continuous-time process with stationary and independent increments and no positive jumps, and the threshold is proportional in the effects of observed regressors and unobserved heterogeneity. Spectrally-negative Lévy processes include Brownian motions with linear drifts and Poisson processes compounded with negative shocks as well-known special cases. Following empirical practice with mixture duration models such as the mixed proportional hazards model, we focus on parametric MHT models, and propose flexible parameterizations that can approximate arbitrary functional forms by increasing the number of parameters. The main obstacle in applying standard parametric likelihood methods is that, in general, we have no explicit expression for the MHT model’s likelihood. However, an explicit expression for its Laplace transform is always available. Our approach to likelihood computation exploits this.
We focus on the case in which the latent Lévy process has a nontrivial Gaussian component. We first show that this ensures that the model implies a duration distribution with nonzero Lebesgue density at all positive durations and that it is nonparametrically identified up to innocuous scale normalizations. We then adapt numerical methods for the inversion of the Laplace transforms of the hitting times of Lévy processes with nontrivial Gaussian components to compute the conditional density and survival function implied by the MHT model. In turn, these are used to construct a likelihood for independently censored duration data. If the latent process is a Brownian motion, the likelihood can be explicitly expressed in terms of mixed inverse Gaussian densities and survival functions. Therefore, we can use this special case as a benchmark for evaluating the quality of our procedure for computing the likelihood. We show that the numerical inversion that is required in the general case is sufficiently fast and precise to make maximum likelihood estimation feasible even if no explicit expression of the likelihood is available.
We implement a maximum likelihood estimator that uses this computational strategy in MATLAB, and illustrate its application with a reconsideration of Kennan’s (1985) empirical analysis of US contract strike durations.111We provide MATLAB code that implements the methods in this paper in a public repository at github.com/jabbring/mht-likelihood. The results in this paper can be replicated by running make in version v1.1.1 of this code, which we have deposited as Abbring and Salimans (2021). Our strategy for computing the MHT model’s likelihood can also be used to implement other likelihood-based empirical methods. For example, it can be combined with data augmentation and Markov chain Monte Carlo techniques to implement Bayesian estimators of the MHT model.
Abbring (2012) presented the MHT model studied in this paper, analyzed its empirical content, and highlighted its close relation to optimal stopping problems in economics. This paper shows that the restriction to an MHT model with a nontrivial Gaussian component suffices for its identification. It operationalizes this model by providing and analyzing feasible methods for computing its likelihood and its maximum likelihood estimator.
Singleton (2001) developed similar methods for a different class of models, discretely sampled affine diffusions. He noted that the density of an observation of such a diffusion conditional on the previous observation is not known explicitly, but that its characteristic function is. He proposed a maximum likelihood estimator based on the Fourier inverse of this characteristic function. This paper’s methods for the MHT model instead rely on the inversion of Laplace transforms and exploit specific results for the first passage times of Lévy processes.
Alternatively, we could avoid computation of the likelihood altogether by constructing an estimator directly from the equality of the Laplace transform of the duration data implied by the true model and its empirical analog. Abbring (2012, Section 5.3) sketched such a generalized method of moments (GMM) estimator for the MHT model. A disadvantage of this alternative approach is that, unlike this paper’s likelihood-based approach, it cannot straightforwardly handle censored duration data because we only have an expression of the Laplace transform of the complete (uncensored) duration distribution.222Singleton (2001) developed a similar GMM estimator for discretely sampled diffusions, based on their characteristic function. In that context, censoring is not important and such a GMM estimator is a natural alternative to maximum likelihood. Moreover, a practical implementation of such a GMM estimator is generally less efficient than maximum likelihood. Therefore, this paper focuses on likelihood-based methods.
The remainder of this paper is organized as follows. Section 2 reviews the MHT model and the corresponding characterization of the data presented in Abbring (2012). It also introduces the assumption that the latent process has a nontrivial Gaussian component and explores its implications, including novel nonparametric and parametric identification results. Section 3 presents a method for the computation of the model’s log likelihood and its derivatives and discusses maximum likelihood estimation. Section 4 assesses the numerical accuracy of our method and Section 5 applies it to strike data. Section 6 briefly discusses extensions to Bayesian and sieve estimators and reviews possible applications.
2 Mixed Hitting-Time Model
2.1 Specification
Following Abbring (2012, Section 2), we model the distribution of a random duration conditional on observed covariates by specifying as the first time a real-valued Lévy process crosses a threshold that depends on and some unobservables ; assuming that , , and are mutually independent; and specifying a marginal distribution of .
A Lévy process is the continuous-time equivalent of a random walk: It has stationary and independent increments. Bertoin (1996) provides a comprehensive analysis of Lévy processes. Formally, we have
Definition 1**.**
A Lévy process is a stochastic process such that the increment is independent of and has the same distribution as , for every .
We take to have right-continuous sample paths with left limits. Note that Definition 1 implies that almost surely.
An important example of a Lévy process is the scalar Brownian motion with drift, in which case is normally distributed with mean and variance , for some scalar parameters and . The Brownian motion is the single Lévy process with continuous sample paths. In general, Lévy processes may have jumps. Examples are compound Poisson processes, which have independently and identically distributed jumps at Poisson times. More generally, the jump process of a Lévy process is a Poisson point process with characteristic measure such that , and any Lévy process can be written as the sum of a Brownian motion with drift and an independent pure-jump process with jumps governed by such a point process (Bertoin, 1996, Chapter I, Theorem 1). The characteristic measure of ’s jump process is called its Lévy measure and, together with the drift and dispersion parameters of its Brownian motion component, fully characterizes ’s distributional properties.
Throughout the paper, we will focus on spectrally-negative Lévy processes, which are Lévy processes of which the characteristic measure has negative support, i.e. Lévy processes without positive jumps. This greatly facilitates the analysis of their hitting times, because it excludes that they jump across the threshold. Let be a spectrally-negative Lévy process and the first time it hits a threshold . Here, we use the convention that ; that is, we set if never crosses , which happens with positive probability for some specifications of . We exclude the trivial case that is weakly decreasing and almost surely.333This is implied by Assumption 1, which we will introduce only later because it is easier to formulate after developing the model’s characterization (which requires the weaker assumption made here).
Denote the support of the observed covariates with , let have distribution on , and recall that , , and are mutually independent. The (proportional) mixed hitting-time (MHT) model specifies the cumulative distribution of conditional on as , for some measurable function .444For expositional convenience, we have restricted the supports of and , and therefore of the threshold , to . It is straightforward to extend the analysis to -valued thresholds, as in Abbring (2012, Section 2.2 and Appendix A). This would allow for a probability mass at zero duration (as almost surely) and, with , a mass of “stayers.” Integrating out with respect to the distribution of gives the distribution of . We note the corresponding “survival function” with .
2.2 Characterization
The distribution is fully determined by its Laplace transform, , . Note that may be smaller than 1 if is such that, with positive probability, it never hits .
Abbring (2012, Section 4.1) showed that the Laplace transform , unlike itself, can be explicitly given for any specification of the latent process . This first requires a common probabilistic characterization of , in terms of its characteristic function. Bertoin (1996, Section VII.1) shows that , for all with real part , with the Laplace exponent given by the Lévy-Khintchine formula,
[TABLE]
Here, if is true and [math] otherwise, absorbs any linear drift of , is the dispersion parameter of its Brownian motion component; and is the Lévy measure of its jump component, where satisfies and has negative support. The Laplace exponent of fully characterizes its distributions, through its characteristic function .
Equation (1) gives the most common parameterization of . It corresponds to the Lévy-Itô decomposition of in a Brownian motion with linear drift , a compound Poisson process with jumps in , and a pure-jump martingale with jumps in (Bertoin, 1996, Section I.1). Alternative parameterizations arise if we decompose the jumps of in small and large shocks in other ways. These parameterizations all have the same dispersion parameter and Lévy measure , but have different drift parameters. For example, in the special case that , the compensator term for the small shocks in (1), , is a well-defined linear function of . Therefore, in this case, we can alternatively parameterize as
[TABLE]
where . This includes the important special case that , in which is the sum of a Brownian motion with drift parameter and a compound Poisson process with jumps of sizes in . In general, any of the equivalent parameterizations of can be used in the MHT model’s specification, but some are numerically and statistically more convenient than others; we return to this in Section 2.5.
With determined, we are ready to analyze the Laplace transform . The Laplace exponent, as a function on , is continuous and convex, and satisfies and, because is not weakly decreasing, . Therefore, there exists a largest solution to and an inverse of the restriction of to . Theorem 1 of Bertoin (1996, Chapter VII) implies that (Abbring, 2012, Section 4.1). Using iterated expectations, the Laplace transform of the distribution of follows from
[TABLE]
with the Laplace transform of the distribution of .
2.3 Nontrivial Gaussian Component
To facilitate the numerical computation of the MHT model’s likelihood and ensure standard conditions for the maximum likelihood estimator, we assume throughout the paper’s remainder that has a nontrivial Gaussian component:
Assumption 1** **(Nontrivial Gaussian Component).
satisfies (1) with .
Assumption 1 excludes the case that is a pure-jump process. To motivate this assumption, first consider the special case that itself is a nontrivial Brownian motion, i.e. a Brownian motion with general drift coefficient and dispersion coefficient (obviously, this case satisfies Assumption 1). Then, equals , so that equals and equals
[TABLE]
For later reference, we have made the dependence on the parameters and explicit here. Because there are no jumps, there is no ambiguity in the treatment of small and large jumps, and this parameterization of is unique. In particular, the Lévy-Khintchine representations (1) and (2) of coincide, and .
In this special case, the distribution of is known to be inverse Gaussian, with explicit expressions for its Lebesgue density and survival function (see Section 3.2). If , then and the distribution of is nondefective. If , however, and the distribution of has a defect of size . Either way, the MHT model specifies a mixed inverse Gaussian distribution for in this special case.555Mixed inverse Gaussian distributions have been used to model duration data in the statistical literature. For example, Aalen and Gjessing (2001) proposed such a model with parametric mixing over the Brownian motion’s drift coefficient . Because this distribution has a Lebesgue density with full (and thus parameter-independent) support, it is straightforward to specify the likelihood for a parametric specification of and and to compute the corresponding maximum likelihood estimator, and this estimator will have standard asymptotic properties.
If is a more general spectrally-negative Lévy process, then may have parameter-dependent support. For example, if , then , so that is concentrated on the support of . Assumption 1 excludes this pathology.
Lemma 1** **(Absolute Continuity).
If Assumption 1 holds then, for given and some positive density , for all .
Proof.
Because and , . Moreover, by Assumption 1, for given , the distribution of is the convolution of a normal distribution and the distribution of the cumulated jumps, and therefore has a positive Lebesgue density on . Using that and , Bertoin (1996, Chapter VII, Corollary 3) implies that has a positive Lebesgue density on , and for all . ∎
Note that, by Lemma 1 and Fubini’s theorem, Assumption 1 also implies that , for all , with positive Lebesgue density . Thus, Assumption 1 ensures that a standard parametric maximum likelihood approach can be used, as in the purely Gaussian case. A complication is that the distribution and its density are generally not known in closed form and need to be computed by inverting their Laplace transforms. As we will see in Section 3.3, Assumption 1 facilitates a crucial computational simplification of this inversion. Moreover, in the next section, we will see that Assumption 1, together with Abbring’s (2012) assumptions and innocuous normalizations, suffices for the model’s point identification.
2.4 Nonparametric Identification
The MHT model’s primitives are , , and . By Feller (1971, Section XIII.1, Theorem 1), there is a one-to-one relation between a probability distribution and its Laplace transform. Thus, we can equivalently write the primitives as , , and . By (3) and the definition of , each specification of such an MHT triplet implies a Laplace transform of the distribution , and thus itself, for all .
One may wonder whether, conversely, knowledge of , , would allow one to uniquely determine (“identify”) the model’s primitives , perhaps after imposing some normalizations and restrictions. To be practical, we explicitly take into account that data on and will not allow us to determine if . So, suppose that we can determine up to almost sure equivalence; that is, that we know for all measurable . Section 3.1 assumes a simple type of independent right censoring scheme for which this is true: random sampling from , with and drawn from the joint distribution of implied by some marginal distribution of and the model’s conditional distribution , , and, for given , the censoring time drawn, independently from , from a conditional distribution such that for all .666From the censored data, both the subdensity , for almost all , and the joint survival function are identified up to almost sure equivalence. Thus, the hazard rate is identified for almost all , which determines , up to almost sure equivalence. See e.g. Cox (1962). This argument extends to more general forms of independent censoring (see e.g. Andersen et al., 1993). Note that this includes the case in which we have “complete” observations from the joint distribution of (if always) and extends to more general independent censoring schemes.
Following Gill and Robins (2001, Section 3), we deal with the ambiguity arising from conditioning on (possibly) continuous covariates by assuming continuity of their effects. Let be an open ball of radius around . The support of contains all points such that for all .
Assumption 2** **(Continuity of the Covariate Effects).
The function and support of are such that, for each , .
For isolated mass points , for small enough , and Assumption 2 does not constrain . For points such that for some , Assumption 2 simply requires continuity of , as a function on , at . If has both finitely discrete and continuous components, then Assumption 2 requires continuity of in the continuous components for given values of the discrete components. Assumption 2 is satisfied if, for example, for some parameter vector .
Lemma 2** **(Identification of the Conditional Distribution).
If Assumption 2 holds, then
[TABLE]
Proof.
By Assumption 2 and continuity of , for every , there exists a such that for all , so that . ∎
Note that, if is an isolated point in , then (5) reduces to .
Following Abbring (2012), our identification analysis exploits variation of the threshold with the covariates.
Assumption 3** **(Nontrival Covariate Effects).
For some , .
As is clear from the proof of the following theorem, under Assumption 2, the covariate values and in Assumption 3 can be identified with values such that .
Theorem 1** **(Nonparametric Identification).
Let and be MHT triplets that satisfy Assumptions 1–3 and are observationally equivalent (imply the same conditional distribution up to almost sure equivalence). Then, for some : and for all , and .
Proof.
By Assumption 2 and Lemma 2, we can identify for all . In particular, we can identify such that , which exist by Assumption 3. Take these and as given.
We have that and imply the same identified and , and that . This is the two-sample problem studied by Abbring (2012). We first apply Abbring’s Theorem 1, with Assumption 1, to this two-sample problem and then extend the argument to the full domain of and .
The Lévy-Khintchine formula (1), , and dominated convergence imply that . Using dominated convergence once more, it follows that . With Assumption 1, this gives for all . The same is true for . Thus, both and vary regularly with exponent at infinity (Feller, 1971, Section VIII.8). Consequently, Abbring (2012, Theorem 1) applies with . Noting that Abbring’s setup, unlike ours, imposes a scale normalization on , this implies that, for some , and for all . The inverse of equals the restriction of to and can be uniquely analytically extended to its full domain ; the same is true for the inverse of . This gives for all .
Finally, fix any . Because is identified, observational equivalence implies that for all . Therefore, . ∎
The first part of the proof, which establishes the relation between and , only uses Assumption 2 for continuity at and . So, we can relax Assumption 2 accordingly if we weaken Theorem 1’s claim that to almost surely.
Unlike the model studied by Abbring (2012), our model with a nontrivial Gaussian component is identified, up to two unknown scale parameters and . It is easy to see why and cannot be determined by data on and alone. Mixed hitting times are not affected by rescaling both the latent process and the threshold by the same factor, nor by rescaling the threshold factors and without changing the threshold itself. Specifically, suppose that in Theorem 1 corresponds to a latent process and threshold . Then, the observationally equivalent corresponds to a latent process , an observed threshold factor , and an unobserved threshold factor . Clearly, the implied first hitting times are the same: . Identification therefore requires that the scales of two of , and are normalized. The most convenient way of implementing these normalizations depends on the chosen parameterization.
2.5 Parameterization and Normalization
This paper’s estimation procedure requires a computationally feasible, flexible parameterization of the model. To this end, we specify the Lévy measure up to a finite vector of unknown parameters . With a drift parameter and Gaussian dispersion parameter , this specification and the Lévy-Khintchine formula (in our proposed specifications, (2)) imply a parameterization of the Laplace exponent. We similarly specify , and up to finite vectors and and collect all parameters in . We make sure that the proposed parameterizations are unique, in the sense that different values of map into different primitives , , and . We also discuss ways to normalize them. A corollary to Theorem 1 then establishes parametric identification.
Latent process
Recall that and the Laplace exponent equals , with , if is a nontrivial Brownian motion with drift. We distinguish this basic specification with a subscript “BM” because it appears in our computations for more general specifications of as well. We consider two such specifications.
The first adds an independent compound Poisson process with a finitely discrete shock distribution to the basic specification. Because in this case, the Lévy-Khintchine formula (2) now offers the simplest way to parameterize : , where , with the Poisson rate at which shocks of size arrive; ; and .777Equivalently, in this specification, shocks arrive at a rate and are drawn independently from a distribution with points of support with probabilities . We exclude the boundary cases in which , , or , which correspond to specifications with fewer than shock sizes, to ensure a unique parameterization and standard inference. See Footnote 10.
The second specification instead assumes that shocks arrive at a Poisson rate and have sizes drawn from a gamma distribution with density ; ; at . We can again use (2), which now gives , where .
The Lévy-Khintchine formula (2) provides a unique parameterization of the Laplace exponent in terms of the drift parameter , the Gaussian dispersion parameter , and the Lévy measure .888Bertoin (1996, Chapter 1, Theorem 1) and the discussion following it show that the general Lévy-Khintchine formula (1) provides a unique parameterization of the Laplace exponent in terms of , , and . Consequently, formula (2) does as well with, as discussed in Section 2.2, a different drift parameter. In turn, our two specifications of the jump process give unique parameterizations of . Consequently, both parameterizations are unique.
The scale of can be normalized by setting , which implicitly assumes that , or . After all, if is a Laplace exponent with (or ) then, for , is a Laplace exponent with (or ).999One can alternatively normalize the scale of the jump component, which varies across specifications.
Covariate effects
The threshold is naturally specified to be loglinear in the covariates: . Note that this specification implies Assumption 2.
Suppose that is not contained in a proper linear subspace of . Then, this parameterization is unique: for all implies that . Moreover, it embodies a scale normalization: For given and , there exists no such that .
Unobserved heterogeneity
We entertain a finitely discrete specification of . This specification is versatile, computationally convenient, and appears naturally in Heckman and Singer’s (1984) work on semi-nonparametric estimation of the MPH model. It assumes that has support points , with ; . Then, , with and .101010We assume that all and that all support points are distinct to ensure that the parameterization of is unique. In practice, we may want to include the boundary cases, because these correspond to specifications with fewer than support points. This, however, leads to nonstandard identification and inference, because we can either reduce the number of support points from to by setting , in which case is irrelevant, or by setting , in which case only matters. The inequality constraints ensure that the parameterization is unique. It can be scale normalized by setting .
Corollary 1** **(Parametric Identification).
Let and , via one of this section’s parameterizations, map into observationally equivalent MHT triplets. Suppose that Assumptions 1 and 3 hold, is not contained in a proper linear subspace of , and either or is scale normalized. Then, .
Corollary 1 does not rely on the fact that the finitely discrete specification of ensures that , which would suffice for identification without Assumption 1 (see Abbring, 2012, Section 4.3). We maintain Assumption 1, because it is essential to our approach to estimation (see Section 2.3) and allows for alternative specifications of that do not imply . This may, for example, be useful in an extension to sieve estimation, in which it may be hard to impose (see Section 6).
3 Maximum Likelihood Estimation
Fix one of the previous section’s parameterizations . Denote the implied parametric density of with and the corresponding survival function with . Similarly, write and . This section presents a method for evaluating this parameterization’s likelihood for a basic but common sampling scheme, using the Gaussian special case as a benchmark.
3.1 Sampling and Likelihood
Let be a random sample from the distribution of induced by , , at the “true” parameter vector and some marginal distribution of . We do not directly observe this complete sample, but only a censored version of it: . Here, is the observed duration and a censoring indicator, for some random censoring time . Note that a complete observation pairs an MHT event with a censoring event , whereas a censored observation corresponds to and .
We assume a simple type of independent right-censoring (Andersen et al., 1993). Suppose that is independent across and that, conditional on , is independent of , with a distribution that does not depend on . Then, conditional on , the likelihood contribution of factorizes in an MHT part, , and a censoring part that does not depend on . Thus, the conditional likelihood is proportional to . Its maximizer is the full-information maximum likelihood estimator of if the covariates carry no information on .
Note that the case without censoring, so that and almost surely for all , is included as a special case in which almost surely for all . Also, with more general independent right censoring schemes, the resulting estimator remains a valid (but often, partial) likelihood estimator (Andersen et al., 1993). Moreover, the likelihood, and the corresponding estimator, can easily be adapted to other practically relevant sampling schemes, such as those involving interval censoring.
3.2 Gaussian Special Case
Suppose that is a Brownian motion with drift, so that, by the analysis in Section 2.3, has a mixed inverse Gaussian distribution. Then, up to a constant containing the censoring time events, the log conditional (on the covariates) likelihood equals
[TABLE]
where
[TABLE]
is the Lebesgue density of the inverse Gaussian distribution and
[TABLE]
is its survival function (Cox and Miller, 1965, Section 5.4). Here, is the cumulative standard normal distribution function. With Section 2.5’s finite discrete specification of , the log likelihood in (6) reduces to
[TABLE]
If we e.g. specify , this log likelihood, its derivatives, and its maximizer are easy to compute using (7) and (8). Under standard regularity conditions, including the normalizations and assumptions needed for Corollary 1’s parametric identification, is a consistent and asymptotically normal estimator of . Given the assumption that the marginal distribution of and the censoring times carry no information on , it is also asymptotically efficient. Its asymptotic covariance matrix can quickly be estimated using either the score or Hessian characterization of the Fisher information matrix.
Many of the models studied in the statistics literature similarly lead to explicit expressions for the likelihood that facilitate estimation (Lee and Whitmore, 2006). In the general Lévy case, such explicit expressions are not available, and maximum likelihood cannot be implemented directly. The next section develops methods for computing the maximum likelihood estimator and its asymptotic distribution in this general case.
3.3 General Case
In general, and are not explicitly known, but can be computed by numerically inverting their Laplace transforms. Our approach is based on the work of Rogers (2000), who applied a variant of Abate and Whitt’s (1992) inversion method to the problem of calculating the first-passage-time distribution of a spectrally one-sided Lévy process.
Following Rogers, we first consider calculating the survival function . Using integration by parts, it is easy to show that its Laplace transform . So, for given , we can explicitly construct and obtain using Mellin’s inverse formula (e.g. Davies, 2002),
[TABLE]
Here, the integration is along the contour , which traces out a straight line in , parallel to the imaginary axis from to . We make this contour’s dependence on explicit by writing for its value at . The parameter should be chosen such that it is larger than the real part of any singularity in the Laplace transform . Because is analytic on the set of all with , we can choose any .
The integral in (10) does not generally have an explicit solution, but can be efficiently approximated using numerical methods. A key complication is that our specification of involves the inverse function , which cannot generally be expressed in closed form. To circumvent this problem, we follow Rogers and instead integrate along the composition , which is a contour in from to . Here, is the inverse of the Laplace exponent of the Brownian motion component of , for which (4) gives an explicit expression. Note that necessarily has the same dispersion parameter as , but that its drift parameter is not uniquely pinned down (because the drift parameter of depends on the way we deal with small shocks; see Section 2.2). Fortunately, the exact value of the drift parameter of plays no role in the argument that follows. It can generally be set to the drift parameter in the specific parameterization of used; for example, in (1) or in (2). Following Section 2.5’s specifications of with compound Poisson jumps, we have set the drift parameter of equal to in (2). We make the transformed contour’s dependence on and the parameters of explicit by writing for its value at .
Rogers argued that, under Assumption 1, replacing by in (10) does not affect that integral’s value, so that
[TABLE]
with
[TABLE]
which no longer involves . This argument relies on Cauchy’s integral theorem, which implies that an integral over the analytic integrand in (10) along a closed contour equals zero. This is particularly true for the closed contour formed by going up from to , crossing over from to , going down from to , and crossing back from to . Consequently, the integrals in (10) and (11) are equal, provided that the integrals over the contour from to and the contour from to vanish as . Rogers concluded that this is the case, because the integrand vanishes sufficiently fast along these two contours as (in particular, as ) and, under Assumption 1, their lengths do not grow too fast with . In particular,
[TABLE]
converges to zero as (note that the right hand side of (1) is dominated by the Gaussian term for large ). Similarly, as .
Using a change of variables, we can rewrite (11) as an integral over the real line:
[TABLE]
where . Following Abate and Whitt, we can apply the trapezoidal rule to approximate (12) with the infinite sum
[TABLE]
where is the rule’s step size. Note that we only need to approximate the real part of (12), because its imaginary part should be zero. Abate and Whitt discussed the error introduced by this discretization and noted that it works particularly well because the integrand oscillates and the approximation errors tend to cancel out.
In practice, we need to truncate the infinite sum in (13) to for some and use extrapolation to approximate the case where . Because is nearly periodic in , can be efficiently approximated using Euler summation:
[TABLE]
for some . Abate and Whitt proposed to estimate the associated error by . In our case, this estimate quickly tends to zero as M is increases, which suggests that the approximation is accurate (see also Section 4).
We follow a similar procedure to calculate the density from its Laplace transform . We again start with Mellin’s inverse formula (10) with contour , but now with in its left hand side and in its right hand side. With the finitely discrete specification of , vanishes more rapidly than (, whereas ) as .111111This follows from the fact that the behavior of for large is dominated by the term corresponding to the lowest support point of . With specifications of that have support near zero, may vanish more slowly than as . For example, if is a gamma distribution, one can show that as . Simulations suggest our procedure is nevertheless accurate in this case. This suggests that we can again replace the contour in Mellin’s inverse formula with and that
[TABLE]
where
[TABLE]
As before, we can rewrite this into an integral over the real line,
[TABLE]
where , and approximate this integral with an Euler sum .
One could control the computation of and with different tuning parameters , , , and . However, as our notation and for the corresponding Euler sums suggests, we will not do so in this paper. We take guidance from Rogers in setting the common values of , , , and . In the next sections, we find that his suggestion to use duration- specific values and yields good numerical performance in our case. We will adopt these as our default settings, together with and .121212Rogers (2000) claimed that and trade off accuracy and speed well. Because of the advances in computing speed since then, we can opt for more accuracy. See Section 4 for some details.
The log likelihood for an independently censored sample satisfies
[TABLE]
We have implemented an estimator in MATLAB that maximizes this approximate log likelihood using a quasi-Newton algorithm with BFGS updates for the Hessian and multiple random starting values (Nocedal and Wright, 2006).
We supply an analytical gradient of the approximate log likelihood with respect to the parameter vector to ensure quick and stable maximization. This gradient sums contributions of the observations. Consider the contribution of observation . Suppose that this observation is complete (; the calculations for a censored observation are similar). The approximate likelihood contribution of this observation, , is the real part of a weighted sum of over finitely many values of , with weights that do not depend on . Each term in this weighted sum is the product of three factors;
[TABLE]
that are smooth in and , composed with , which is itself smooth in and . Its complex-valued derivative with respect to follows from tedious but straightforward application of the product and chain rules. We ignore the imaginary part of the weighted sum of these derivatives over , because the imaginary part of the likelihood contribution that we approximate with is zero. So, we set the contribution of observation to the gradient of the log likelihood equal to the real part of this weighted sum of derivatives, divided by . The analytical gradient sums these contributions. We construct asymptotic standard errors from the corresponding Hessian, which we calculate using finite differences of the analytical gradient. The replication package (Abbring and Salimans, 2021) provides further details.
The MATLAB code currently normalizes by setting . Note that this implicitly assumes that . It would be straightforward to adapt the code to instead normalize , which more generally allows for , or , which does not restrict at all.
Our estimator maximizes an approximate log likelihood. For some applications, it has been shown that the maximum approximate likelihood estimator is first order equivalent to the exact maximum likelihood estimator if the approximations improve sufficiently quickly with the sample size (e.g. Aït-Sahalia, 2002). We could try to derive a similar equivalence result for our estimator, using Abate and Whitt’s numerical analysis and some further results on the tail behavior of and . However, as we will see in Section 4, we can compute our estimator very accurately in reasonable time, so that a formal result establishing how accuracy should increase with sample size would not be of much practical use. Therefore, we take the pragmatic approach that much of the literature has taken and simply apply standard maximum likelihood asymptotics.131313This is how Singleton (2001) handled his maximum likelihood estimator of a discretely sampled affine diffusion, which, like our estimator, required numerical Fourier inversion. He expressed some worries about the computational burden of his Fourier inversion procedure, but only for the multivariate case. We only use univariate Fourier inversion and benefit from 20 years of computational development.
4 Numerical Experiments
We have investigated the accuracy of the proposed likelihood approximation by conducting a range of numerical experiments. We discuss the results of three of these experiments here. All three experiments use the default settings for the parameters that control the approximation, unless explicitly stated otherwise. The first two experiments directly compare the explicitly known duration density and likelihood implied by MHT models without shocks to their approximations. The third experiment focuses on a model with shocks, for which the implied duration density is not known in explicit form.
The first experiment compares direct computations of the log likelihood function of the mixed inverse Gaussian model using the explicit expression for the density in (7) to its numerical approximations as we vary . The log likelihood is calculated on the data set that we use in Section 5. This ensures that this experiment provides both a real life test case and a check on the results we present in that section. The data contain 566 complete strike durations. Because the approximation errors are close to unbiased, the error in the log likelihood scales with the root of the sample size.
Figure 1 plots the average of the absolute approximation error of the log likelihood, for different values of , over 100 model parameters randomly generated at the scale of their maximum likelihood estimates. We find that this average absolute error decreases exponentially with ; this result is robust across the various parameter values over which the plotted results are averaged. Consistently with Rogers (2000), we see that already provides a decent approximation for most practical purposes. However, because the time required for the calculations grows only linearly in , we can increase to 25 at a very low computational cost and obtain a nearly thousandfold increase in precision (with most of the gain already obtained with ). Once , other factors, such as rounding errors, become important, and the approximation error levels off. We also find that, with , increasing or decreasing the step size adds very little to the precision of the inversion. The numerical approximation of the log likelihood takes 9–11 times as long to calculate as the analytical expression. However, in absolute terms this is still very manageable. For example, it takes about a second to calculate the density for a specification with shocks on a regular laptop computer 100,000 times.141414We used Figure 3’s specification and MATLAB 2020b on a MacBook Pro (2018, 15inch, 2.9GHz 6-Core Intel Core i9, 32 GB 2400 MHz DDR4) with macOS 10.15.7. Consistently with this, the log likelihood can be maximized, starting from multiple random parameter values for each maximization, in under half a minute for the model specifications that we consider in Section 5.
The second experiment takes a closer look at the numerical approximation of the density of a basic inverse Gaussian model with parameters such that . We only present results for , but found very similar results for any . For the purpose of maximum likelihood estimation, we care most about the errors in the approximation of the log density, . Figure 2 plots the absolute error of this approximation against the log density itself, on a logarithmic scale. The (log-)linear relation displayed by the graph implies that the absolute error in the approximation of roughly equals . Consequently, the approximation error is generally small, but the approximation breaks down when the density gets very small (say, , or ). When estimating the model with maximum likelihood, we can easily avoid this by setting reasonable starting values for the parameters. This ensures that the approximation is sufficiently precise for numerically robust maximum likelihood estimation.
The third experiment considers a model with shocks and a heterogeneous threshold. Figure 3 plots the approximate density of for this model, again using . In this case, the true density is not explicitly known, so we compare the approximate density with a fine histogram of many simulated values of . Our approximate density closely tracks the simulated one. This finding is robust across model specifications.
5 Strike Durations
The mere existence of nontrivial delays in labor agreements has puzzled economists; duration patterns in their resolution have been studied to learn more about underlying bargaining games and information structures.
Lancaster (1972) analyzed strike durations using a Gaussian MHT model with regressors, but without unobserved heterogeneity. He interpreted the gap between the Brownian motion and the threshold as the level of disagreement, and concluded that this model fits his data for the United Kingdom well. Others used proportional hazards models to study strike durations. Kennan (1985), in particular, showed that the US strike duration hazard is -shaped and took this as evidence against Lancaster’s (homogeneous) MHT model. He noted that this aspect of the data can be interpreted in terms of heterogeneity in the conflicts underlying the strikes, but did not subsequently pursue this in his empirical analysis.
Here, we will investigate whether Kennan’s strike data can be matched well by a more general MHT model that explicitly takes into account unobserved heterogeneity in strikes. Such a model comes with Lancaster’s attractive interpretation in terms of a level of disagreement that may both vary over time and initially be heterogeneous between strikes. We will explicitly discuss our estimation results in terms of this interpretation, with an implicit understanding that it is our modest objective to illustrate our methods and the descriptive and potential structural appeal of the MHT model, without providing a fully structural analysis of strike durations.
Kennan’s (1985) data cover all contract strikes in US manufacturing in the period 1968–1976 that involved at least a thousand workers, and that were classified to be primarily about “general wage changes”. They include the durations in days of 566 strikes and, for each strike, a measure of the state of the business cycle in the month it started: the residuals of a regression of log industrial production in US manufacturing on linear and quadratic trend terms and seasonal dummies. We obtained the data in a fixed format text file strkdur.asc from Cameron and Trivedi’s (2005) web page. We divided all strike durations by seven, so that they are measured in weeks.
Table 1 reports maximum likelihood estimates for a range of Section 2.5’s flexible parameterizations. All reported estimates are computed using Section 3.3’s numerical methods, with . To further check these methods and their MATLAB implementation, we have also computed the same estimates for lower values of (not reported), and estimates for the first five specifications using the explicit expressions for the log likelihood that are available in these cases (not reported). These results are virtually identical to those reported in Table 1.
Columns I–V present estimates of models with Brownian motion latent processes and discrete unobserved heterogeneity. Throughout, the drift is normalized to 1 per week (), so that . By its construction as a regression residual, varies around zero and is close to zero on average in the sample. Consequently, can be interpreted as the unobserved initial level of disagreement, measured as the mean number of strike weeks it commands.
The log likelihood substantially improves when adding a second, third and fourth support point to the distribution of , between Columns I and IV, but a fifth support point (Column V) hardly changes the fit and the other parameters’ estimates. The estimates indicate that there is both substantial heterogeneity in the strikes’ initial levels of disagreement and uncertainty in their evolution over time. The numbers in Column IV imply that there are four unobserved types of labor conflict, on average commanding respectively , , , and strike weeks. Each type’s level of disagreement evolves with a standard deviation per week just above the unit drift towards agreement.
It is instructive to note that the variance of the latent process drops substantially, from close to 20 to just over 1, when more heterogeneity is added between Columns I and IV. Clearly, Column I’s specification falsely attributes heterogeneity in the strikes’ initial levels of disagreement to uncertainty in their evolution over time.
The estimates of the coefficient reflect the effect of the business cycle on strike durations. In line with Kennan’s (1985) results, strikes that begin in months with low production last longer. In the MHT model, this is captured by a countercyclical threshold: In times with low production, in expectation, conflicts command more strike days. One interpretation is that strike days are less costly in times with low production. The precision of the estimates of is low. This is consistent with Kennan’s results. He obtained more precise results with a binary cyclical indicator constructed from the indicator used here. For simplicity, we do not follow this lead here.
Column VI reports an estimate of a specification that includes discrete shocks of size at Poisson times. The estimates point to an infrequent shock that sets back just over five weeks of drift towards agreement. The shock only somewhat improves the likelihood; a specification without shock, such as those in Columns IV and V, seems to be sufficient.
Finally, a very similar result is found with a gamma shock at a Poisson time (not reported). With this specification, virtually the same estimate of the arrival rate of the shocks is obtained. Moreover, the estimated gamma shock distribution is close to degenerate at Column VI’s estimate of the shock size (). Specifically, the estimates of the shape () and scale () parameters of the gamma distribution are both very large, and their ratio equals Column VI’s estimated shock size. As expected, the same log likelihood is found.
Figure 4 plots the aggregate hazard implied by the MHT model’s estimates in Column IV of Table 1. It also plots the hazard implied by estimates a MPH model with a Weibull baseline and a discrete heterogeneity distribution with four support points. Note that this MPH specification has exactly the same number of parameters as Column IV’s MHT specification. In both cases, we computed the distribution of implied by these estimates, integrated over the empirical distribution of , and computed and plotted the hazard rate of the resulting distribution. Figure 4 also plots the empirical hazard rate, computed by kernel smoothing the raw data.
Both the MHT and the MPH models fit the empirical hazard well, but the MPH model’s log likelihood, at , is points lower. Because the Weibull baseline is monotonic, the Weibull MPH model can only fit the nonmonotonic strike hazard by compensating an increasing baseline hazard with negative duration dependence due to unobserved heterogeneity. Of course, usually MPH models with richer specifications of the baseline hazard are estimated and a sufficiently rich specification can fit the empirical hazard arbitrarily well.
6 Conclusion
The results in this paper enable applied researchers to analyze duration data with mixed hitting-time (MHT) models using standard likelihood-based estimation and inference methods. The MATLAB code for parametric maximum likelihood estimation that accompanies this paper can directly be applied to either complete or independently right-censored duration data, and is easy to adapt to more general censoring schemes.
Our procedure for likelihood computation lends itself well for use in semi-nonparametric maximum likelihood estimation (e.g. Chen, 2007). As in Heckman and Singer (1984)’s analysis of the MPH model, we could handle unobserved heterogeneity nonparametrically using discrete heterogeneity distributions with a varying number of support points. Some care would have to be taken to ensure that the likelihood approximation continues to work well if the unobserved heterogeneity, in the limit, has support near zero (see Footnote 11). Similarly, the Lévy-Itô decomposition of (see Section 2.2) suggests that we construct a sieve for using Section 2.5’s specification that sums a Gaussian component with an independent compound Poisson component, with the shocks distributed discretely with a varying number of support points. This way, each element of the sieve satisfies Assumption 1 and our computational procedure applies.
The procedure can also be used to implement other likelihood-based methods. For example, it can be combined with data augmentation and Markov chain Monte Carlo methods to implement a Bayesian estimator that can flexibly deal with unobserved heterogeneity.
Two types of empirical application of the MHT framework can be distinguished. First, it can be used as a descriptive framework, much like Cox’s (1972) proportional hazards model and Lancaster’s (1979) mixed proportional hazards model. Section 5’s analysis of Kennan’s (1985) strike data shows that estimates of the MHT model have descriptive appeal, with natural interpretations that nicely complement those that could be obtained from a proportional hazards analysis. Indeed, in statistics, there is substantial interest in the descriptive analysis of duration data with first hitting time models (Singpurwalla, 1995; Yashin and Manton, 1997; Aalen and Gjessing, 2001; Lee and Whitmore, 2006).
Second, it can be applied to the structural empirical analysis of heterogeneous agents’ optimal stopping decisions. Abbring (2012) presents a range of examples, based on the type of optimal stopping models that are reviewed and analyzed in Dixit and Pindyck (1994); Stokey (2009); Kyprianou (2006); Boyarchenko and Levendorskiĭ (2007). These include McDonald and Siegel’s (1986) model for the optimal timing of an irreversible investment; a model of unemployment durations based on Dixit’s (1989) model of entry and exit, complemented with heterogeneity in transition costs; and a model of job separations with heterogeneous search. The identification results in Abbring (2012, 2010) show that data on durations and covariates are informative on the economic primitives of such models. The methods developed in this paper can be applied to measure those primitives.
Acknowledgements
We are grateful to Yanqin Fan, the editor (Dennis Kristensen), two referees, and attendees of various conferences and seminars for their comments. We thank Justin Dijk for excellent research assistance. The research of Jaap Abbring is financially supported by the Dutch Research Council (NWO) through Vici grant 453-11-002. Tim Salimans worked on this paper while employed at Erasmus University Rotterdam.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Aalen and Gjessing (2001) Aalen, O. O. and H. K. Gjessing (2001). Understanding the shape of the hazard rate: A process point of view. Statistical Science 16 (1), 1–14.
- 2Abate and Whitt (1992) Abate, J. and W. Whitt (1992). The Fourier-series method for inverting transforms of probability distributions. Queueing Systems 10 , 5–88.
- 3Abbring (2010) Abbring, J. H. (2010). Identification of dynamic discrete choice models. Annual Review of Economics 2 , 367–394.
- 4Abbring (2012) Abbring, J. H. (2012, March). Mixed hitting-time models. Econometrica 80 (2), 783–819.
- 5Abbring and Salimans (2021) Abbring, J. H. and T. Salimans (2021, April). The likelihood of mixed hitting times: Replication package. Zenodo. https://doi.org/10.5281/zenodo.4670373 . · doi ↗
- 6Aït-Sahalia (2002) Aït-Sahalia (2002). Maximum likelihood estimation of discretely sampled diffusions: A closed-form approximation approach. Econometrica 70 (1), 223–262.
- 7Andersen et al. (1993) Andersen, P. K., Ø. Borgan, R. D. Gill, and N. Keiding (1993). Statistical Models Based on Counting Processes . New York: Springer-Verlag.
- 8Bertoin (1996) Bertoin, J. (1996). Lévy Processes . Number 121 in Cambridge Tracts in Mathematics. Cambridge: Cambridge University Press.
