The Likelihood of Mixed Hitting Times

Jaap H. Abbring; Tim Salimans

arXiv:1905.03463·econ.EM·May 3, 2021

TL;DR

This paper introduces a numerical method to compute the likelihood of mixed hitting-time models involving Lévy processes, enabling maximum likelihood estimation and application to strike data analysis.

Contribution

The paper develops a novel numerical approach for inverting Laplace transforms to compute likelihoods in mixed hitting-time models involving Lévy processes.

Findings

1. Successfully implemented maximum likelihood estimator in MATLAB.
2. Applied the method to analyze Kennan's strike data.
3. Demonstrated the method's effectiveness in real-world data analysis.

Abstract

We present a method for computing the likelihood of a mixed hitting-time model that specifies durations as the first time a latent Lévy process crosses a heterogeneous threshold. This likelihood is not generally known in closed form, but its Laplace transform is. Our approach to its computation relies on numerical methods for inverting Laplace transforms that exploit special properties of the first passage times of Lévy processes. We use our method to implement a maximum likelihood estimator of the mixed hitting-time model in MATLAB. We illustrate the application of this estimator with an analysis of Kennan's (1985) strike data.

Tables

Table 1: Maximum Likelihood Estimates for Kennan's (1985) Strike Duration Data

	I	II	III	IV	V	VI
$μ$	$1$	$1$	$1$	$1$	$1$	$1$
	$(0)$	$(0)$	$(0)$	$(0)$	$(0)$	$(0)$
$σ^{2}$	$19.659$	$6.218$	$2.067$	$1.227$	$1.197$	$0.542$
	$(3.157)$	$(0.863)$	$(0.403)$	$(0.217)$	$(0.218)$	$(0.315)$
$λ$						$0.019$
						$(0.021)$
$ν$						$- 5.133$
						$(2.546)$
$β$	$- 0.931$	$- 1.772$	$- 1.085$	$- 0.867$	$- 0.862$	$- 0.579$
	$(0.601)$	$(0.687)$	$(0.643)$	$(0.628)$	$(0.629)$	$(0.611)$
$v_{1}$	$6.260$	$2.543$	$1.537$	$1.105$	$1.031$	$0.755$
	$(0.467)$	$(0.199)$	$(0.142)$	$(0.113)$	$(0.175)$	$(0.177)$
$v_{2}$		$8.751$	$5.888$	$3.209$	$1.756$	$2.083$
		$(0.520)$	$(0.390)$	$(0.452)$	$(1.032)$	$(0.510)$
$v_{3}$			$18.161$	$7.165$	$3.518$	$4.138$
			$(1.011)$	$(0.560)$	$(0.763)$	$(0.842)$
$v_{4}$				$18.557$	$7.303$	$7.412$
				$(0.698)$	$(0.645)$	$(0.552)$
$v_{5}$					$18.575$	$17.004$
					$(0.693)$	$(1.220)$
$π_{1}$	$1$	$0.399$	$0.353$	$0.252$	$0.199$	$0.198$
	$(0)$	$(0.044)$	$(0.034)$	$(0.038)$	$(0.117)$	$(0.040)$
$π_{2}$		$0.601$	$0.492$	$0.283$	$0.098$	$0.201$
		$(0.044)$	$(0.034)$	$(0.050)$	$(0.133)$	$(0.073)$
$π_{3}$			$0.154$	$0.315$	$0.256$	$0.223$
			$(0.023)$	$(0.053)$	$(0.083)$	$(0.062)$
$π_{4}$				$0.151$	$0.297$	$0.238$
				$(0.019)$	$(0.064)$	$(0.064)$
$π_{5}$					$0.150$	$0.140$
					$(0.019)$	$(0.020)$
$ℓ_{N}$	$- 1658.9$	$- 1588.7$	$- 1583.0$	$- 1576.3$	$- 1576.1$	$- 1575.4$

Equations

$$\psi(s)=\tilde{\mu}s+\frac{\sigma^{2}}{2}s^{2}+\int_{(-\infty,0)}\left[e^{sy}-1-syI(y>-1)\right]\Upsilon(dy).$$

$$\psi(s)=\mu s+\frac{\sigma^{2}}{2}s^{2}+\int_{(-\infty,0)}\left(e^{sy}-1\right)\Upsilon(dy),$$

$${\cal F}(s|x)=\int_{[0,\infty)}\exp(-st)\,dF(t|x)=\int_{(0,\infty)}\left[\int_{[0,\infty)}\exp(-st)\,dF(t|x,v)\right]dG(v)=\int_{(0,\infty)}\exp\left[-\Lambda(s)\phi(x)v\right]dG(v)={\cal G}\left[\Lambda(s)\phi(x)\right],$$

$$\Lambda_{\mathrm{BM}}(s;\mu,\sigma)\equiv\frac{\sqrt{\mu^{2}+2\sigma^{2}s}-\mu}{\sigma^{2}}.$$

$${\cal F}(s|x)=\lim_{\delta\downarrow 0}\frac{\mathbb{E}\left[\exp(-sT)I(X\in B(x,\delta))\right]}{\mathbb{E}\left[I(X\in B(x,\delta))\right]},\quad s\in[0,\infty),\ x\in{\cal X}.$$

$$\ell_{N}(\theta)=\sum_{n=1}^{N}\ln\int f_{\mathrm{BM}}(T_{n}^{*}|X_{n},v;\mu,\sigma,\beta)^{D_{n}}\,\overline{F}_{\mathrm{BM}}(T_{n}^{*}|X_{n},v;\mu,\sigma,\beta)^{1-D_{n}}\,dG(v;\kappa),$$

$$f_{\mathrm{BM}}(t|x,v;\mu,\sigma,\beta)=\frac{\phi(x;\beta)v}{\sigma\sqrt{2\pi t^{3}}}\exp\left(-\frac{\left[\phi(x;\beta)v-\mu t\right]^{2}}{2\sigma^{2}t}\right)$$

$$\overline{F}_{\mathrm{BM}}(t|x,v;\mu,\sigma,\beta)=\Phi\left(\frac{\phi(x;\beta)v-\mu t}{\sigma\sqrt{t}}\right)-\exp\left(\frac{2\mu\phi(x;\beta)v}{\sigma^{2}}\right)\Phi\left(-\frac{\phi(x;\beta)v+\mu t}{\sigma\sqrt{t}}\right)$$

$$\ell_{N}(\theta)=\sum_{n=1}^{N}\ln\sum_{l=1}^{L}\pi_{l}\,f_{\mathrm{BM}}(T_{n}^{*}|X_{n},v_{l};\mu,\sigma,\beta)^{D_{n}}\,\overline{F}_{\mathrm{BM}}(T_{n}^{*}|X_{n},v_{l};\mu,\sigma,\beta)^{1-D_{n}}.$$

$$\overline{F}(t|x;\theta)=\frac{1}{2\pi\mathrm{i}}\lim_{\xi\rightarrow\infty}\int_{\gamma_{\xi}}\exp(st)\,\overline{\cal F}(s|x;\theta)\,ds.$$

$$\overline{F}(t|x;\theta)=\frac{1}{2\pi\mathrm{i}}\lim_{\xi\rightarrow\infty}\int_{\tilde{\gamma}_{\xi}}\exp(st)\,\overline{\cal F}(s|x;\theta)\,ds=\frac{1}{2\pi\mathrm{i}}\lim_{\xi\rightarrow\infty}\int_{\gamma_{\xi}}\overline{q}^{*}(t,s|x;\theta)\,ds,$$

$$\overline{q}^{*}(t,s|x;\theta)\equiv\exp\left\{\psi\left[\Lambda_{\mathrm{BM}}(s;\mu,\sigma);\mu,\sigma,\alpha\right]t\right\}\frac{1-{\cal G}\left[\Lambda_{\mathrm{BM}}(s;\mu,\sigma)\phi(x;\beta);\kappa\right]}{\psi\left[\Lambda_{\mathrm{BM}}(s;\mu,\sigma);\mu,\sigma,\alpha\right]}\frac{d}{ds}\psi\left[\Lambda_{\mathrm{BM}}(s;\mu,\sigma);\mu,\sigma,\alpha\right];$$

$$\frac{\gamma_{\xi}(1;c)-\tilde{\gamma}_{\xi}(1;\mu,\sigma,\alpha,c)}{\gamma_{\xi}(1;c)}=\frac{c+\mathrm{i}\xi-\psi\left[\Lambda_{\mathrm{BM}}(c+\mathrm{i}\xi;\mu,\sigma);\mu,\sigma,\alpha\right]}{c+\mathrm{i}\xi}=\frac{\psi_{\mathrm{BM}}\left[\Lambda_{\mathrm{BM}}(c+\mathrm{i}\xi;\mu,\sigma);\mu,\sigma\right]-\psi\left[\Lambda_{\mathrm{BM}}(c+\mathrm{i}\xi;\mu,\sigma);\mu,\sigma,\alpha\right]}{\psi_{\mathrm{BM}}\left[\Lambda_{\mathrm{BM}}(c+\mathrm{i}\xi;\mu,\sigma);\mu,\sigma\right]}$$

$$\overline{F}(t|x;\theta)=\frac{1}{2\pi}\int_{-\infty}^{\infty}\overline{q}(t,u|x;\theta,c)\,du,$$

$$\overline{S}_{\infty}(t|x;\theta,c,h)\equiv\frac{h}{2\pi}\sum_{r=-\infty}^{\infty}\Re\,\overline{q}(t,rh|x;\theta,c),$$

$$\overline{F}(t|x)\approx\overline{E}_{R,M}(t|x;\theta,c,h)\equiv\sum_{m=0}^{M}2^{-M}\binom{M}{m}\overline{S}_{R+m}(t|x;\theta,c,h),$$

$$f(t|x;\theta)=\frac{1}{2\pi\mathrm{i}}\lim_{\xi\rightarrow\infty}\int_{\gamma_{\xi}}q^{*}(t,s|x;\theta)\,ds,$$

$$q^{*}(t,s|x;\theta)\equiv\exp\left\{\psi\left[\Lambda_{\mathrm{BM}}(s;\mu,\sigma);\mu,\sigma,\alpha\right]t\right\}{\cal G}\left[\Lambda_{\mathrm{BM}}(s;\mu,\sigma)\phi(x;\beta);\kappa\right]\frac{d}{ds}\psi\left[\Lambda_{\mathrm{BM}}(s;\mu,\sigma);\mu,\sigma,\alpha\right].$$

$$f(t|x;\theta)=\frac{1}{2\pi}\int_{-\infty}^{\infty}q(t,u|x;\theta,c)\,du,$$

$$\ell_{N}(\theta)=\sum_{n=1}^{N}\left[D_{n}\ln f(T_{n}^{*}|X_{n};\theta)+(1-D_{n})\ln\overline{F}(T_{n}^{*}|X_{n};\theta)\right]\approx\sum_{n=1}^{N}\left[D_{n}\ln E_{R,M}(T_{n}^{*}|X_{n};\theta,c,h)+(1-D_{n})\ln\overline{E}_{R,M}(T_{n}^{*}|X_{n};\theta,c,h)\right].$$

$$\exp\left[\psi(z;\mu,\sigma,\alpha)T_{n}^{*}\right],\quad{\cal G}\left[z\phi(X_{n};\beta);\kappa\right],\quad\text{and}\quad\psi^{\prime}(z;\mu,\sigma,\alpha)\Lambda_{\mathrm{BM}}^{\prime}(c+\mathrm{i}rh;\mu,\sigma);$$

Code & Models

Repositories

jabbring/mht-likelihood

Taxonomy

Topics: Simulation Techniques and Applications · Probability and Risk Models · Advanced Queuing Theory Analysis

Full text

The Likelihood of Mixed Hitting Times [Forthcoming in the Journal of Econometrics: doi.org/10.1016/j.jeconom.2019.08.017.]

Jaap H. Abbring Department of Econometrics & OR, Tilburg University, P.O. Box 90153, 5000 LE Tilburg, The Netherlands; and CEPR. E-mail: [email protected]. Web: jaap.abbring.org.

Tim Salimans Brain Team, Google Research, Amsterdam, The Netherlands. E-mail: [email protected]. Web: github.com/TimSalimans.

Keywords: duration analysis, first passage time, identification, Laplace transform, Lévy process, maximum likelihood, Mellin’s inverse formula, mixture, optimal stopping, strike duration.

JEL codes: C14, C41.

(April 2021)

1 Introduction

Mixed hitting-time (MHT) models are mixture duration models that specify durations as the first time a latent stochastic process crosses a heterogeneous threshold. They are of substantial interest because they can be applied to the analysis of optimal stopping decisions by heterogeneous agents (Abbring, 2012, 2010). In particular, they can be applied to problems that do not lead to the mixed proportional hazards (MPH) model, Lancaster’s (1979) and Vaupel et al.’s (1979) popular extension of the Cox (1972) proportional hazards model. Examples include models of job durations, marriage durations, and the entry and exit of firms that are driven by Brownian motions and more general persistent processes. Hitting-time duration models are also popular in statistics for their structural and descriptive appeal (Lee and Whitmore, 2006).

This paper considers likelihood-based empirical methods for an MHT model in which the latent process is a spectrally-negative Lévy process, a continuous-time process with stationary and independent increments and no positive jumps, and the threshold is proportional in the effects of observed regressors and unobserved heterogeneity. Spectrally-negative Lévy processes include Brownian motions with linear drifts and Poisson processes compounded with negative shocks as well-known special cases. Following empirical practice with mixture duration models such as the mixed proportional hazards model, we focus on parametric MHT models, and propose flexible parameterizations that can approximate arbitrary functional forms by increasing the number of parameters. The main obstacle in applying standard parametric likelihood methods is that, in general, we have no explicit expression for the MHT model’s likelihood. However, an explicit expression for its Laplace transform is always available. Our approach to likelihood computation exploits this.

We focus on the case in which the latent Lévy process has a nontrivial Gaussian component. We first show that this ensures that the model implies a duration distribution with nonzero Lebesgue density at all positive durations and that it is nonparametrically identified up to innocuous scale normalizations. We then adapt numerical methods for the inversion of the Laplace transforms of the hitting times of Lévy processes with nontrivial Gaussian components to compute the conditional density and survival function implied by the MHT model. In turn, these are used to construct a likelihood for independently censored duration data. If the latent process is a Brownian motion, the likelihood can be explicitly expressed in terms of mixed inverse Gaussian densities and survival functions. Therefore, we can use this special case as a benchmark for evaluating the quality of our procedure for computing the likelihood. We show that the numerical inversion that is required in the general case is sufficiently fast and precise to make maximum likelihood estimation feasible even if no explicit expression of the likelihood is available.

We implement a maximum likelihood estimator that uses this computational strategy in MATLAB, and illustrate its application with a reconsideration of Kennan’s (1985) empirical analysis of US contract strike durations. [Footnote 1: We provide MATLAB code that implements the methods in this paper in a public repository at github.com/jabbring/mht-likelihood. The results in this paper can be replicated by running make in version v1.1.1 of this code, which we have deposited as Abbring and Salimans (2021).] Our strategy for computing the MHT model’s likelihood can also be used to implement other likelihood-based empirical methods. For example, it can be combined with data augmentation and Markov chain Monte Carlo techniques to implement Bayesian estimators of the MHT model.

Abbring (2012) presented the MHT model studied in this paper, analyzed its empirical content, and highlighted its close relation to optimal stopping problems in economics. This paper shows that the restriction to an MHT model with a nontrivial Gaussian component suffices for its identification. It operationalizes this model by providing and analyzing feasible methods for computing its likelihood and its maximum likelihood estimator.

Singleton (2001) developed similar methods for a different class of models, discretely sampled affine diffusions. He noted that the density of an observation of such a diffusion conditional on the previous observation is not known explicitly, but that its characteristic function is. He proposed a maximum likelihood estimator based on the Fourier inverse of this characteristic function. This paper’s methods for the MHT model instead rely on the inversion of Laplace transforms and exploit specific results for the first passage times of Lévy processes.

Alternatively, we could avoid computation of the likelihood altogether by constructing an estimator directly from the equality of the Laplace transform of the duration data implied by the true model and its empirical analog. Abbring (2012, Section 5.3) sketched such a generalized method of moments (GMM) estimator for the MHT model. A disadvantage of this alternative approach is that, unlike this paper’s likelihood-based approach, it cannot straightforwardly handle censored duration data because we only have an expression of the Laplace transform of the complete (uncensored) duration distribution. [Footnote 2: Singleton (2001) developed a similar GMM estimator for discretely sampled diffusions, based on their characteristic function. In that context, censoring is not important and such a GMM estimator is a natural alternative to maximum likelihood.] Moreover, a practical implementation of such a GMM estimator is generally less efficient than maximum likelihood. Therefore, this paper focuses on likelihood-based methods.

The remainder of this paper is organized as follows. Section 2 reviews the MHT model and the corresponding characterization of the data presented in Abbring (2012). It also introduces the assumption that the latent process has a nontrivial Gaussian component and explores its implications, including novel nonparametric and parametric identification results. Section 3 presents a method for the computation of the model’s log likelihood and its derivatives and discusses maximum likelihood estimation. Section 4 assesses the numerical accuracy of our method and Section 5 applies it to strike data. Section 6 briefly discusses extensions to Bayesian and sieve estimators and reviews possible applications.

2 Mixed Hitting-Time Model

2.1 Specification

Following Abbring (2012, Section 2), we model the distribution of a random duration $T$ conditional on observed covariates $X$ by specifying $T$ as the first time a real-valued Lévy process $\{Y\}\equiv\{Y(t);t\geq 0\}$ crosses a threshold that depends on $X$ and some unobservables $V$ ; assuming that $\{Y\}$ , $X$ , and $V$ are mutually independent; and specifying a marginal distribution of $V$ .

A Lévy process is the continuous-time equivalent of a random walk: It has stationary and independent increments. Bertoin (1996) provides a comprehensive analysis of Lévy processes. Formally, we have

Definition 1.

A Lévy process is a stochastic process $\{Y\}$ such that the increment $Y(t+\Delta)-Y(t)$ is independent of $\{Y(\tau);0\leq\tau\leq t\}$ and has the same distribution as $Y(\Delta)$ , for every $t,\Delta\geq 0$ .

We take $\{Y\}$ to have right-continuous sample paths with left limits. Note that Definition 1 implies that $Y(0)=0$ almost surely.

An important example of a Lévy process is the scalar Brownian motion with drift, in which case $Y(\Delta)$ is normally distributed with mean $\mu\Delta$ and variance $\sigma^{2}\Delta$, for some scalar parameters $\mu\in\mathbb{R}$ and $\sigma\in[0,\infty)$. The Brownian motion is the only Lévy process with continuous sample paths. In general, Lévy processes may have jumps. Examples are compound Poisson processes, which have independently and identically distributed jumps at Poisson times. More generally, the jump process $\{\Delta Y\}$ of a Lévy process $\{Y\}$ is a Poisson point process with characteristic measure $\Upsilon$ such that $\int\min\{1,y^{2}\}\Upsilon(dy)<\infty$, and any Lévy process $\{Y\}$ can be written as the sum of a Brownian motion with drift and an independent pure-jump process with jumps governed by such a point process (Bertoin, 1996, Chapter I, Theorem 1). The characteristic measure of $\{Y\}$’s jump process is called its Lévy measure and, together with the drift and dispersion parameters of its Brownian motion component, fully characterizes $\{Y\}$’s distributional properties.

Throughout the paper, we will focus on spectrally-negative Lévy processes, which are Lévy processes whose characteristic measure $\Upsilon$ has negative support, i.e. Lévy processes without positive jumps. This greatly facilitates the analysis of their hitting times, because it rules out jumps across the threshold. Let $\{Y\}$ be a spectrally-negative Lévy process and $T(y)\equiv\inf\{t\geq 0:Y(t)>y\}$ the first time it hits a threshold $y\in[0,\infty)$. Here, we use the convention that $\inf\emptyset\equiv\infty$; that is, we set $T(y)=\infty$ if $\{Y\}$ never crosses $y$, which happens with positive probability for some specifications of $\{Y\}$. We exclude the trivial case that $\{Y\}$ is weakly decreasing and $T(y)=\infty$ almost surely. [Footnote 3: This is implied by Assumption 1, which we will introduce only later because it is easier to formulate after developing the model’s characterization (which requires the weaker assumption made here).]
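To make the definition of $T(y)$ concrete, the following Python sketch simulates one hitting time on a discrete time grid. It is an illustration only and not the authors' MATLAB code: the function name `simulate_hitting_time`, the Euler time grid, and the restriction to a compound Poisson component with a single fixed negative jump size are all assumptions made here. Because crossings between grid points can be missed, simulated times are biased slightly upward.

```python
import math
import random


def simulate_hitting_time(y, mu, sigma, lam, jump, dt=0.01, t_max=50.0, rng=None):
    """Approximate T(y) = inf{t >= 0 : Y(t) > y} on a time grid.

    Y is a Brownian motion with drift mu and dispersion sigma plus a compound
    Poisson process with rate lam and fixed jump size `jump` <= 0 (assumed form).
    Returns math.inf if the threshold is not crossed before t_max.
    """
    if jump > 0:
        raise ValueError("spectrally negative: jumps must not be positive")
    rng = rng or random.Random(0)
    p_jump = 1.0 - math.exp(-lam * dt)  # chance of at least one jump per step
    level, t = 0.0, 0.0
    while t < t_max:
        t += dt
        # Gaussian increment with mean mu*dt and variance sigma^2*dt.
        level += mu * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        # At most one downward jump per step (a small-dt approximation).
        if rng.random() < p_jump:
            level += jump
        if level > y:
            return t
    return math.inf


if __name__ == "__main__":
    draws = [simulate_hitting_time(1.0, 1.0, 1.0, 0.5, -0.5, rng=random.Random(i))
             for i in range(5)]
    print(draws)
```

Returning `math.inf` mirrors the convention $\inf\emptyset\equiv\infty$, although here it only signals that no crossing occurred before the hypothetical horizon `t_max`.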

Denote the support of the observed covariates $X$ with ${\cal X}\subseteq\mathbb{R}^{K}$, let $V$ have distribution $G$ on $(0,\infty)$, and recall that $\{Y\}$, $X$, and $V$ are mutually independent. The (proportional) mixed hitting-time (MHT) model specifies the cumulative distribution $F(\cdot|x,v)$ of $T$ conditional on $(X,V)=(x,v)\in{\cal X}\times(0,\infty)$ as $F(t|x,v)=\Pr\left[T(\phi(x)v)\leq t\right]$, for some measurable function $\phi:{\cal X}\rightarrow(0,\infty)$. [Footnote 4: For expositional convenience, we have restricted the supports of $\phi(X)$ and $V$, and therefore of the threshold $\phi(X)V$, to $(0,\infty)$. It is straightforward to extend the analysis to $[0,\infty]$-valued thresholds, as in Abbring (2012, Section 2.2 and Appendix A). This would allow for a probability mass at zero duration (as $T(0)=0$ almost surely) and, with $T(\infty)\equiv\infty$, a mass of “stayers.”] Integrating out $v$ with respect to the distribution $G$ of $V$ gives the distribution $F(t|x)=\int F(t|x,v)dG(v)=\int\Pr\left[T(\phi(x)v)\leq t\right]dG(v)$ of $T|X=x$. We denote the corresponding “survival function” by $\overline{F}(t|x)\equiv 1-F(t|x)$.

2.2 Characterization

The distribution $F(\cdot|x,v)$ is fully determined by its Laplace transform, ${\cal F}(s|x,v)\equiv\int_{[0,\infty)}\exp\left(-st\right)dF(t|x,v)$ , $s\in[0,\infty)$ . Note that ${\cal F}(0|x,v)=\lim_{t\rightarrow\infty}F(t|x,v)$ may be smaller than 1 if $\{Y\}$ is such that, with positive probability, it never hits $\phi(x)v$ .

Abbring (2012, Section 4.1) showed that the Laplace transform ${\cal F}(\cdot|x,v)$ , unlike $F(\cdot|x,v)$ itself, can be explicitly given for any specification of the latent process $\{Y\}$ . This first requires a common probabilistic characterization of $\{Y\}$ , in terms of its characteristic function. Bertoin (1996, Section VII.1) shows that $\mathbb{E}\left[\exp\left(sY(t)\right)\right]=\exp\left[\psi(s)t\right]$ , for all $s\in\mathbb{C}$ with real part $\Re\,s\geq 0$ , with the Laplace exponent $\psi$ given by the Lévy-Khintchine formula,

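$$\psi(s)=\tilde{\mu}s+\frac{\sigma^{2}}{2}s^{2}+\int_{(-\infty,0)}\left[e^{sy}-1-syI(y>-1)\right]\Upsilon(dy).$$ (1)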

Here, $I(\cdot)\equiv 1$ if $\cdot$ is true and $0$ otherwise; $\tilde{\mu}\in\mathbb{R}$ absorbs any linear drift of $\{Y\}$; $\sigma\geq 0$ is the dispersion parameter of its Brownian motion component; and $\Upsilon$ is the Lévy measure of its jump component, where $\Upsilon$ satisfies $\int\min\{1,y^{2}\}\Upsilon(dy)<\infty$ and has negative support. The Laplace exponent $\psi$ of $\{Y\}$ fully characterizes its distributions, through its characteristic function $u\in\mathbb{R}\mapsto\mathbb{E}\left[\exp\left(\mathrm{i}uY(t)\right)\right]=\exp\left[\psi(\mathrm{i}u)t\right]$.

Equation (1) gives the most common parameterization of $\psi$. It corresponds to the Lévy-Itô decomposition of $\{Y\}$ into a Brownian motion with linear drift $\tilde{\mu}t$, a compound Poisson process with jumps in $(-\infty,-1]$, and a pure-jump martingale with jumps in $(-1,0)$ (Bertoin, 1996, Section I.1). Alternative parameterizations arise if we decompose the jumps of $\{Y\}$ into small and large shocks in other ways. These parameterizations all have the same dispersion parameter $\sigma$ and Lévy measure $\Upsilon$, but have different drift parameters. For example, in the special case that $\int_{(-1,0)}|y|\Upsilon(dy)<\infty$, the compensator term for the small shocks in (1), $\int_{(-\infty,0)}syI(y>-1)\Upsilon(dy)=s\int_{(-1,0)}y\Upsilon(dy)$, is a well-defined linear function of $s$. Therefore, in this case, we can alternatively parameterize $\psi$ as

$$\psi(s)=\mu s+\frac{\sigma^{2}}{2}s^{2}+\int_{(-\infty,0)}\left(e^{sy}-1\right)\Upsilon(dy),$$ (2)

where $\mu\equiv\tilde{\mu}+\int_{(-1,0)}y\Upsilon(dy)$ . This includes the important special case that $\int_{(-\infty,0)}\Upsilon(dy)<\infty$ , in which $\{Y\}$ is the sum of a Brownian motion with drift parameter $\mu$ and a compound Poisson process with jumps of sizes in $(-\infty,0)$ . In general, any of the equivalent parameterizations of $\psi$ can be used in the MHT model’s specification, but some are numerically and statistically more convenient than others; we return to this in Section 2.5.
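As a small illustration of parameterization (2), the Python sketch below evaluates $\psi$ for a Brownian motion with drift plus a compound Poisson process. The function name `laplace_exponent` and the choice of a Lévy measure that puts mass `lam` on a single negative jump size `jump` are assumptions for this example; the paper's own jump specification may differ.

```python
import cmath


def laplace_exponent(s, mu, sigma, lam=0.0, jump=-1.0):
    """psi(s) = mu*s + sigma^2 s^2 / 2 + lam * (exp(s*jump) - 1).

    This is form (2) with the assumed Levy measure lam * (point mass at jump),
    jump < 0. Complex s is allowed, so psi(1j * u) gives the exponent of the
    characteristic function of Y(t).
    """
    if jump >= 0:
        raise ValueError("jump size must be negative")
    s = complex(s)
    return mu * s + 0.5 * sigma ** 2 * s ** 2 + lam * (cmath.exp(s * jump) - 1.0)


if __name__ == "__main__":
    # Pure Brownian motion: psi(1) = mu + sigma^2 / 2.
    print(laplace_exponent(1.0, mu=1.0, sigma=1.0).real)
    # Characteristic function of Y(t) at u = 0.5, t = 2, with jumps.
    print(cmath.exp(laplace_exponent(0.5j, 1.0, 1.0, lam=2.0, jump=-1.0) * 2.0))
```

Setting `lam=0.0` recovers the Brownian motion exponent $\mu s+\sigma^{2}s^{2}/2$.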

With $\psi$ determined, we are ready to analyze the Laplace transform ${\cal F}(\cdot|x,v)$ . The Laplace exponent, as a function on $[0,\infty)$ , is continuous and convex, and satisfies $\psi(0)=0$ and, because $\{Y\}$ is not weakly decreasing, $\lim_{s\rightarrow\infty}\psi(s)=\infty$ . Therefore, there exists a largest solution $\Lambda(0)\geq 0$ to $\psi(\Lambda(0))=0$ and an inverse $\Lambda:[0,\infty)\rightarrow[\Lambda(0),\infty)$ of the restriction of $\psi$ to $[\Lambda(0),\infty)$ . Theorem 1 of Bertoin (1996, Chapter VII) implies that ${\cal F}(s|x,v)=\exp\left[-\Lambda(s)\phi(x)v\right]$ (Abbring, 2012, Section 4.1). Using iterated expectations, the Laplace transform ${\cal F}(\cdot|x)$ of the distribution $F(\cdot|x)$ of $T|X=x$ follows from

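$${\cal F}(s|x)=\int_{[0,\infty)}\exp(-st)\,dF(t|x)=\int_{(0,\infty)}\left[\int_{[0,\infty)}\exp(-st)\,dF(t|x,v)\right]dG(v)=\int_{(0,\infty)}\exp\left[-\Lambda(s)\phi(x)v\right]dG(v)={\cal G}\left[\Lambda(s)\phi(x)\right],$$ (3)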

with ${\cal G}$ the Laplace transform of the distribution $G$ of $V$ .
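The characterization above translates directly into a computation: invert $\psi$ on its increasing branch to get $\Lambda(s)$, then evaluate ${\cal G}$ at $\Lambda(s)\phi(x)$. The Python sketch below does this by bisection for a real argument $s\geq 0$. The names `psi`, `hitting_exponent`, and `mixed_laplace`, the fixed-jump-size compound Poisson form of $\psi$, and the discrete distribution for $V$ are assumptions for this example, and it presumes $\sigma>0$ so that $\psi$ eventually exceeds any $s$.

```python
import math


def psi(z, mu, sigma, lam=0.0, jump=-1.0):
    """Laplace exponent for real z >= 0 (assumed fixed negative jump size)."""
    return mu * z + 0.5 * sigma ** 2 * z ** 2 + lam * (math.exp(z * jump) - 1.0)


def hitting_exponent(s, mu, sigma, lam=0.0, jump=-1.0):
    """Lambda(s): the largest z >= 0 with psi(z) = s, for s >= 0 and sigma > 0."""
    hi = 1.0
    while psi(hi, mu, sigma, lam, jump) <= s:  # find a point strictly above the root
        hi *= 2.0
    lo = 0.0  # psi(0) = 0 <= s
    # psi is convex with psi(0) = 0, so {z >= 0 : psi(z) <= s} is an interval
    # [0, Lambda(s)]; bisection converges to its right end point.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if psi(mid, mu, sigma, lam, jump) <= s:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mixed_laplace(s, phi_x, support, probs, mu, sigma, lam=0.0, jump=-1.0):
    """F(s|x) = G[Lambda(s) phi(x)] for a discrete G with mass probs on support."""
    z = hitting_exponent(s, mu, sigma, lam, jump) * phi_x
    return sum(p * math.exp(-z * v) for v, p in zip(support, probs))


if __name__ == "__main__":
    # With negative drift, the value at s = 0 is the probability of ever hitting.
    print(mixed_laplace(0.0, 1.0, [1.0, 2.0], [0.4, 0.6], mu=-0.5, sigma=1.0))
```

Evaluating `mixed_laplace` at `s = 0` returns the total mass of the duration distribution, which is below 1 exactly when $\Lambda(0)>0$.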

2.3 Nontrivial Gaussian Component

To facilitate the numerical computation of the MHT model’s likelihood and ensure standard conditions for the maximum likelihood estimator, we assume throughout the paper’s remainder that $\{Y\}$ has a nontrivial Gaussian component:

Assumption 1 (Nontrivial Gaussian Component).

$\psi$ satisfies (1) with $\sigma>0$ .

Assumption 1 excludes the case that $\{Y\}$ is a pure-jump process. To motivate this assumption, first consider the special case that $\{Y\}$ itself is a nontrivial Brownian motion, i.e. a Brownian motion with general drift coefficient $\mu\in\mathbb{R}$ and dispersion coefficient $\sigma\in(0,\infty)$ (obviously, this case satisfies Assumption 1). Then, $\psi(s)$ equals $\psi_{\mathrm{BM}}(s;\mu,\sigma)\equiv\mu s+\sigma^{2}s^{2}/2$, so that $\Lambda(0)$ equals $\Lambda_{\mathrm{BM}}(0;\mu,\sigma)\equiv\max\{0,-2\mu/\sigma^{2}\}$ and $\Lambda(s)$ equals

$$\Lambda_{\mathrm{BM}}(s;\mu,\sigma)\equiv\frac{\sqrt{\mu^{2}+2\sigma^{2}s}-\mu}{\sigma^{2}}.$$ (4)

For later reference, we have made the dependence on the parameters $\mu$ and $\sigma$ explicit here. Because there are no jumps, there is no ambiguity in the treatment of small and large jumps, and this parameterization of $\psi$ is unique. In particular, the Lévy-Khintchine representations (1) and (2) of $\psi$ coincide, and $\mu=\tilde{\mu}$ .
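As a worked check of the Brownian motion case, using only the definitions of $\psi_{\mathrm{BM}}$ and $\Lambda$ given in this section, the inverse follows from solving a quadratic and keeping its largest root:

$$\psi_{\mathrm{BM}}(z;\mu,\sigma)=s\iff\frac{\sigma^{2}}{2}z^{2}+\mu z-s=0\iff z=\frac{-\mu\pm\sqrt{\mu^{2}+2\sigma^{2}s}}{\sigma^{2}}.$$

$$\Lambda_{\mathrm{BM}}(s;\mu,\sigma)=\frac{\sqrt{\mu^{2}+2\sigma^{2}s}-\mu}{\sigma^{2}},\qquad\Lambda_{\mathrm{BM}}(0;\mu,\sigma)=\frac{|\mu|-\mu}{\sigma^{2}}=\max\left\{0,-\frac{2\mu}{\sigma^{2}}\right\}.$$

For $s\geq 0$ the discriminant satisfies $\sqrt{\mu^{2}+2\sigma^{2}s}\geq|\mu|$, so the root with the plus sign is nonnegative and is the largest solution, as the definition of $\Lambda$ requires.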

In this special case, the distribution of $T|X=x,V=v$ is known to be inverse Gaussian, with explicit expressions for its Lebesgue density and survival function (see Section 3.2). If $\mu\geq 0$, then $\Lambda_{\mathrm{BM}}(0;\mu,\sigma)=0$ and the distribution of $T|X=x,V=v$ is nondefective. If $\mu<0$, however, $\Lambda_{\mathrm{BM}}(0;\mu,\sigma)=-2\mu/\sigma^{2}>0$ and the distribution of $T|X=x,V=v$ has a defect of size $1-\exp(2\phi(x)v\mu/\sigma^{2})$. Either way, the MHT model specifies a mixed inverse Gaussian distribution for $T|X=x$ in this special case. [Footnote 5: Mixed inverse Gaussian distributions have been used to model duration data in the statistical literature. For example, Aalen and Gjessing (2001) proposed such a model with parametric mixing over the Brownian motion’s drift coefficient $\mu$.] Because this distribution has a Lebesgue density with full (and thus parameter-independent) support, it is straightforward to specify the likelihood for a parametric specification of $\phi$ and $G$ and to compute the corresponding maximum likelihood estimator, and this estimator will have standard asymptotic properties.
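The inverse Gaussian expressions and the size of the defect can be checked numerically. The Python sketch below codes the density $f_{\mathrm{BM}}$ and survival function $\overline{F}_{\mathrm{BM}}$ listed among the paper's equations, writing `a` for the threshold $\phi(x)v$. The function names are hypothetical and this is not the authors' MATLAB implementation.

```python
import math


def norm_cdf(z):
    """Standard normal distribution function, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def f_bm(t, a, mu, sigma):
    """Density of the first time a Brownian motion (mu, sigma) exceeds a > 0."""
    return (a / (sigma * math.sqrt(2.0 * math.pi * t ** 3))
            * math.exp(-(a - mu * t) ** 2 / (2.0 * sigma ** 2 * t)))


def surv_bm(t, a, mu, sigma):
    """Survival function of the same hitting time (includes any defect)."""
    r = sigma * math.sqrt(t)
    return (norm_cdf((a - mu * t) / r)
            - math.exp(2.0 * mu * a / sigma ** 2) * norm_cdf(-(a + mu * t) / r))


def defect(a, mu, sigma):
    """Probability of never hitting a: 1 - exp(2 a mu / sigma^2) if mu < 0, else 0."""
    return 1.0 - math.exp(min(0.0, 2.0 * a * mu / sigma ** 2))


if __name__ == "__main__":
    # For mu < 0 the survival function levels off at the defect.
    print(surv_bm(1e6, 1.0, -0.5, 1.0), defect(1.0, -0.5, 1.0))
```

For large `mu * a / sigma**2` the exponential factor in `surv_bm` can overflow; a production implementation would need to guard against that.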

If $\{Y\}$ is a more general spectrally-negative Lévy process, then $F(\cdot|x)$ may have parameter-dependent support. For example, if $Y(t)=\mu t$ , then $T(\phi(x)v)=\mu^{-1}\phi(x)v$ , so that $F(\cdot|x)$ is concentrated on the support of $\mu^{-1}\phi(x)V$ . Assumption 1 excludes this pathology.

Lemma 1 (Absolute Continuity).

If Assumption 1 holds then, for given $(x,v)\in{\cal X}\times(0,\infty)$ and some positive density $f(\cdot|x,v)$ , $F(t|x,v)=\int_{0}^{t}f(u|x,v)du$ for all $t\in[0,\infty)$ .

Proof.

Because $\phi(x)v>0$ and $\lim_{s\rightarrow\infty}\Lambda(s)=\infty$ , $F(0|x,v)=\lim_{s\rightarrow\infty}{\cal F}(s|x,v)=\lim_{s\rightarrow\infty}\exp\left[-\Lambda(s)\phi(x)v\right]=0$ . Moreover, by Assumption 1, for given $t\in(0,\infty)$ , the distribution of $Y(t)$ is the convolution of a normal distribution and the distribution of the cumulated jumps, and therefore has a positive Lebesgue density on $\mathbb{R}$ . Using that and $\phi(x)v>0$ , Bertoin (1996, Chapter VII, Corollary 3) implies that $F(\cdot|x,v)$ has a positive Lebesgue density $f(\cdot|x,v)$ on $(0,\infty)$ , and $F(t|x,v)=\int_{0}^{t}f(u|x,v)du$ for all $t\in[0,\infty)$ . ∎

Note that, by Lemma 1 and Fubini’s theorem, Assumption 1 also implies that $F(t|x)=\int_{0}^{t}f(u|x)du$ , for all $t\in[0,\infty)$ , with positive Lebesgue density $f(\cdot|x)\equiv\int_{0}^{\infty}f(\cdot|x,v)dG(v)$ . Thus, Assumption 1 ensures that a standard parametric maximum likelihood approach can be used, as in the purely Gaussian case. A complication is that the distribution $F(\cdot|x)$ and its density $f(\cdot|x)$ are generally not known in closed form and need to be computed by inverting their Laplace transforms. As we will see in Section 3.3, Assumption 1 facilitates a crucial computational simplification of this inversion. Moreover, in the next section, we will see that Assumption 1, together with Abbring’s (2012) assumptions and innocuous normalizations, suffices for the model’s point identification.
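For the purely Gaussian benchmark, the likelihood for independently censored data can be written down directly. The Python sketch below evaluates the log likelihood for a discrete heterogeneity distribution with mass `probs` on `support`, using $\phi(x)=\exp(x^{\prime}\beta)$ and the inverse Gaussian density and survival function listed among the paper's equations. The function name `bm_mixture_loglik` and the data layout are assumptions for this illustration, which is not the authors' MATLAB code.

```python
import math


def norm_cdf(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def f_bm(t, a, mu, sigma):
    """Inverse Gaussian density of the hitting time of threshold a > 0."""
    return (a / (sigma * math.sqrt(2.0 * math.pi * t ** 3))
            * math.exp(-(a - mu * t) ** 2 / (2.0 * sigma ** 2 * t)))


def surv_bm(t, a, mu, sigma):
    """Corresponding survival function."""
    r = sigma * math.sqrt(t)
    return (norm_cdf((a - mu * t) / r)
            - math.exp(2.0 * mu * a / sigma ** 2) * norm_cdf(-(a + mu * t) / r))


def bm_mixture_loglik(durations, completed, covariates, beta, mu, sigma, support, probs):
    """Sum over n of ln sum_l pi_l f_BM^{D_n} Fbar_BM^{1 - D_n}.

    completed[n] is 1 for an observed hitting time and 0 for a censored one.
    """
    total = 0.0
    for t, d, x in zip(durations, completed, covariates):
        phi = math.exp(sum(b * xk for b, xk in zip(beta, x)))  # phi(x) = exp(x'beta)
        contrib = 0.0
        for v, p in zip(support, probs):
            a = phi * v  # threshold for heterogeneity type v
            contrib += p * (f_bm(t, a, mu, sigma) if d else surv_bm(t, a, mu, sigma))
        total += math.log(contrib)
    return total


if __name__ == "__main__":
    ll = bm_mixture_loglik([1.0, 2.5], [1, 0], [[0.0], [1.0]], [0.3],
                           1.0, 1.0, [0.8, 2.0], [0.5, 0.5])
    print(ll)
```

In the general case with jumps, `f_bm` and `surv_bm` have no closed form and would be replaced by numerically inverted Laplace transforms.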

2.4 Nonparametric Identification

The MHT model’s primitives are $\psi$ , $\phi$ , and $G$ . By Feller (1971, Section XIII.1, Theorem 1), there is a one-to-one relation between a probability distribution and its Laplace transform. Thus, we can equivalently write the primitives as $\psi$ , $\phi$ , and ${\cal G}$ . By (3) and the definition of $\Lambda$ , each specification of such an MHT triplet $(\psi,\phi,{\cal G})$ implies a Laplace transform ${\cal F}(\cdot|x)$ of the distribution $F(\cdot|x)$ , and thus $F(\cdot|x)$ itself, for all $x\in{\cal X}$ .

One may wonder whether, conversely, knowledge of ${\cal F}(\cdot|x)$, $x\in{\cal X}$, would allow one to uniquely determine (“identify”) the model’s primitives $(\psi,\phi,{\cal G})$, perhaps after imposing some normalizations and restrictions. To be practical, we explicitly take into account that data on $T$ and $X$ will not allow us to determine ${\cal F}(\cdot|x)$ if $\Pr(X=x)=0$. So, suppose that we can determine ${\cal F}(\cdot|X)$ up to almost sure equivalence; that is, that we know $\mathbb{E}\left[{\cal F}(\cdot|X)I(X\in B)\right]=\mathbb{E}\left[\exp\left(-sT\right)I(X\in B)\right]$ for all measurable $B\subseteq{\cal X}$. Section 3.1 assumes a simple type of independent right censoring scheme for which this is true: random sampling from $(\min\{T,C\},I(T\leq C),X)$, with $T$ and $X$ drawn from the joint distribution of $(T,X)$ implied by some marginal distribution of $X$ and the model’s conditional distribution $F(\cdot|x)$, $x\in{\cal X}$, and, for given $X$, the censoring time $C$ drawn, independently from $T$, from a conditional distribution such that $\Pr(C\geq t|X)>0$ for all $t\in[0,\infty)$. [Footnote 6: From the censored data, both the subdensity $f(t|X)\Pr(C\geq t|X)$, for almost all $t$, and the joint survival function $\Pr(T\geq t,C\geq t|X)=\overline{F}(t|X)\Pr(C\geq t|X)$ are identified up to almost sure equivalence. Thus, the hazard rate $f(t|X)/\overline{F}(t|X)=f(t|X)\Pr(C\geq t|X)/\Pr(T\geq t,C\geq t|X)$ is identified for almost all $t$, which determines ${\cal F}(\cdot|X)$, up to almost sure equivalence. See e.g. Cox (1962). This argument extends to more general forms of independent censoring (see e.g. Andersen et al., 1993).] Note that this includes the case in which we have “complete” observations from the joint distribution of $(T,X)$ (if $C=\infty$ always) and extends to more general independent censoring schemes.

Following Gill and Robins (2001, Section 3), we deal with the ambiguity arising from conditioning on (possibly) continuous covariates by assuming continuity of their effects. Let $B(x,\delta)$ be an open ball of radius $\delta>0$ around $x\in\mathbb{R}^{K}$. The support ${\cal X}$ of $X$ contains all points $x\in\mathbb{R}^{K}$ such that $\Pr\left(X\in B(x,\delta)\right)>0$ for all $\delta>0$.

Assumption 2 (Continuity of the Covariate Effects).

The function $\phi$ and support ${\cal X}$ of $X$ are such that, for each $x\in{\cal X}$ , $\lim_{\delta\downarrow 0}\sup_{x^{\prime}\in B(x,\delta)\cap{\cal X}}|\phi(x^{\prime})-\phi(x)|=0$ .

For isolated mass points $x\in{\cal X}$ , $B(x,\delta)\cap{\cal X}=\{x\}$ for small enough $\delta$ , and Assumption 2 does not constrain $\phi$ . For points $x$ such that $B(x,\delta)\subseteq{\cal X}$ for some $\delta>0$ , Assumption 2 simply requires continuity of $\phi$ , as a function on $\mathbb{R}^{K}$ , at $x$ . If $X$ has both finitely discrete and continuous components, then Assumption 2 requires continuity of $\phi$ in the continuous components for given values of the discrete components. Assumption 2 is satisfied if, for example, $\phi(x)=\exp(x^{\prime}\beta)$ for some parameter vector $\beta\in\mathbb{R}^{K}$ .

Lemma 2 (Identification of the Conditional Distribution).

If Assumption 2 holds, then

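$${\cal F}(s|x)=\lim_{\delta\downarrow 0}\frac{\mathbb{E}\left[\exp(-sT)I(X\in B(x,\delta))\right]}{\mathbb{E}\left[I(X\in B(x,\delta))\right]}\qquad(5)$$

for all $s\in[0,\infty)$ and $x\in{\cal X}$.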

Proof.

By Assumption 2 and continuity of ${\cal G}$ , for every $\epsilon>0$ , there exists a $\delta>0$ such that $|{\cal F}(s|x^{\prime})-{\cal F}(s|x)|=|{\cal G}\left(\Lambda(s)\phi(x^{\prime})\right)-{\cal G}\left(\Lambda(s)\phi(x)\right)|<\epsilon$ for all $x^{\prime}\in B(x,\delta)$ , so that $\left|{\cal F}(s|x)-\mathbb{E}\left[\exp(-sT)I(X\in B(x,\delta))\right]/\mathbb{E}\left[I(X\in B(x,\delta))\right]\right|<\epsilon$ . ∎

Note that, if $x$ is an isolated point in ${\cal X}$ , then (5) reduces to ${\cal F}(s|x)=\mathbb{E}\left[\exp(-sT)|X=x\right]$ .

Following Abbring (2012), our identification analysis exploits variation of the threshold with the covariates.

Assumption 3 (Nontrivial Covariate Effects).

For some $x_{0},x_{1}\in{\cal X}$ , $\phi(x_{0})\neq\phi(x_{1})$ .

As is clear from the proof of the following theorem, under Assumption 2, the covariate values $x_{0}$ and $x_{1}$ in Assumption 3 can be identified with values such that $F(\cdot|x_{0})\neq F(\cdot|x_{1})$ .

Theorem 1 (Nonparametric Identification).

Let $(\psi,\phi,{\cal G})$ and $(\tilde{\psi},\tilde{\phi},\widetilde{{\cal G}})$ be MHT triplets that satisfy Assumptions 1–3 and are observationally equivalent (imply the same conditional distribution $F(\cdot|X)$ up to almost sure equivalence). Then, for some $a,b\in(0,\infty)$ : $\tilde{\psi}(s)=\psi(as)$ and $\widetilde{{\cal G}}(s)={\cal G}(bs)$ for all $s\in[0,\infty)$ , and $\tilde{\phi}=ab^{-1}\phi$ .

Proof.

By Assumption 2 and Lemma 2, we can identify ${\cal F}(\cdot|x)$ for all $x\in{\cal X}$. In particular, we can identify $x_{0},x_{1}\in{\cal X}$ such that ${\cal F}(s|x_{0})={\cal G}\left[\Lambda(s)\phi(x_{0})\right]\neq{\cal G}\left[\Lambda(s)\phi(x_{1})\right]={\cal F}(s|x_{1})$, which exist by Assumption 3. Take these $x_{0}$ and $x_{1}$ as given.

We have that $(\psi;\phi(x_{0}),\phi(x_{1});{\cal G})$ and $(\tilde{\psi};\tilde{\phi}(x_{0}),\tilde{\phi}(x_{1});\widetilde{{\cal G}})$ imply the same identified ${\cal F}(\cdot|x_{0})$ and ${\cal F}(\cdot|x_{1})$ , and that ${\cal F}(\cdot|x_{0})\neq{\cal F}(\cdot|x_{1})$ . This is the two-sample problem studied by Abbring (2012). We first apply Abbring’s Theorem 1, with Assumption 1, to this two-sample problem and then extend the argument to the full domain ${\cal X}$ of $\phi$ and $\tilde{\phi}$ .

The Lévy-Khintchine formula (1), $\int\min\{1,y^{2}\}\Upsilon(dy)<\infty$ , and dominated convergence imply that $\psi^{\prime}(s)=\tilde{\mu}+\sigma^{2}s+\int_{(-\infty,0)}\left\{y\mathrm{e}^{sy}-yI(y>-1)\right\}\Upsilon(dy)$ . Using dominated convergence once more, it follows that $\lim_{s\rightarrow\infty}s^{-1}\psi^{\prime}(s)=\sigma^{2}$ . With Assumption 1, this gives $\lim_{s\rightarrow\infty}\psi^{\prime}(ws)/\psi^{\prime}(s)=\lim_{s\rightarrow\infty}w(ws)^{-1}\psi^{\prime}(ws)/\left[s^{-1}\psi^{\prime}(s)\right]=w$ for all $w\in(0,\infty)$ . The same is true for $\tilde{\psi}^{\prime}$ . Thus, both $|\psi^{\prime}|$ and $|\tilde{\psi}^{\prime}|$ vary regularly with exponent $1$ at infinity (Feller, 1971, Section VIII.8). Consequently, Abbring (2012, Theorem 1) applies with $\rho=1$ . Noting that Abbring’s setup, unlike ours, imposes a scale normalization on $\phi$ , this implies that, for some $a,b\in(0,\infty)$ , $\tilde{\Lambda}=a^{-1}\Lambda$ and $\widetilde{{\cal G}}(s)={\cal G}(bs)$ for all $s\in[0,\infty)$ . The inverse of $\Lambda$ equals the restriction of $\psi$ to $[\Lambda(0),\infty)$ and can be uniquely analytically extended to its full domain $[0,\infty)$ ; the same is true for the inverse of $\tilde{\Lambda}$ . This gives $\tilde{\psi}(s)=\psi(as)$ for all $s\in[0,\infty)$ .

Finally, fix any $s\in(0,\infty)$. Because ${\cal F}(\cdot|x)$ is identified, observational equivalence implies that ${\cal G}\left[\Lambda(s)\phi(x)\right]={\cal F}(s|x)=\widetilde{{\cal G}}\left[\tilde{\Lambda}(s)\tilde{\phi}(x)\right]={\cal G}\left[\Lambda(s)a^{-1}b\tilde{\phi}(x)\right]$ for all $x\in{\cal X}$. Because ${\cal G}$ is strictly decreasing and $\Lambda(s)>0$, this requires $\phi(x)=a^{-1}b\tilde{\phi}(x)$ for all $x\in{\cal X}$. Therefore, $\tilde{\phi}=ab^{-1}\phi$. ∎

The first part of the proof, which establishes the relation between $(\psi,{\cal G})$ and $(\tilde{\psi},\widetilde{{\cal G}})$ , only uses Assumption 2 for continuity at $x_{0}$ and $x_{1}$ . So, we can relax Assumption 2 accordingly if we weaken Theorem 1’s claim that $\tilde{\phi}=ab^{-1}\phi$ to $\tilde{\phi}(X)=ab^{-1}\phi(X)$ almost surely.

Unlike the model studied by Abbring (2012), our model with a nontrivial Gaussian component is identified, up to two unknown scale parameters $a$ and $b$ . It is easy to see why $a$ and $b$ cannot be determined by data on $T$ and $X$ alone. Mixed hitting times $T(\phi(X)V)$ are not affected by rescaling both the latent process $\{Y\}$ and the threshold $\phi(X)V$ by the same factor, nor by rescaling the threshold factors $\phi(X)$ and $V$ without changing the threshold itself. Specifically, suppose that $(\psi,\phi,{\cal G})$ in Theorem 1 corresponds to a latent process $\{Y\}$ and threshold $\phi(X)V$ . Then, the observationally equivalent $(\tilde{\psi},\tilde{\phi},\widetilde{{\cal G}})$ corresponds to a latent process $\{aY\}$ , an observed threshold factor $ab^{-1}\phi(X)$ , and an unobserved threshold factor $bV$ . Clearly, the implied first hitting times are the same: $\inf\left\{t\geq 0:Y(t)>\phi(X)V\right\}=\inf\left\{t\geq 0:aY(t)>ab^{-1}\phi(X)bV\right\}$ . Identification therefore requires that the scales of two of $\{Y\}$ , $\phi(X)$ and $V$ are normalized. The most convenient way of implementing these normalizations depends on the chosen parameterization.
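The invariance can be checked numerically. The following Python sketch is illustrative only: the discretized random-walk path, the values chosen for $\phi(X)$ and $V$, and the helper `first_hitting_index` are assumptions made for the example and are not part of the paper's code. The scale factors $a$ and $b$ are powers of two, so that the rescaling is exact in floating-point arithmetic.

```python
import random

def first_hitting_index(path, threshold):
    """Index of the first point of a discretized path that exceeds the threshold."""
    for i, y in enumerate(path):
        if y > threshold:
            return i
    return None  # the threshold is never crossed on this path

random.seed(0)
# Hypothetical discretized latent path: a Gaussian random walk with positive drift.
path = [0.0]
for _ in range(5000):
    path.append(path[-1] + 0.01 + 0.1 * random.gauss(0.0, 1.0))

phi_x, v = 1.5, 2.0   # observed and unobserved threshold factors (illustrative values)
a, b = 2.0, 4.0       # scale factors; powers of two keep the rescaling exact

# Original model: latent process Y and threshold phi(X) * V.
original = first_hitting_index(path, phi_x * v)
# Observationally equivalent model: process a*Y, factors (a/b)*phi(X) and b*V.
rescaled = first_hitting_index([a * y for y in path], (a / b * phi_x) * (b * v))
```

Both calls return the same index, which mirrors the equality of the two infima above.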

2.5 Parameterization and Normalization

This paper’s estimation procedure requires a computationally feasible, flexible parameterization of the model. To this end, we specify the Lévy measure $\Upsilon(\cdot;\alpha)$ up to a finite vector of unknown parameters $\alpha$ . With a drift parameter $\mu$ and Gaussian dispersion parameter $\sigma$ , this specification and the Lévy-Khintchine formula (in our proposed specifications, (2)) imply a parameterization $\psi(\cdot;\mu,\sigma,\alpha)$ of the Laplace exponent. We similarly specify $\phi(\cdot;\beta)$ , and ${\cal G}(\cdot;\kappa)$ up to finite vectors $\beta$ and $\kappa$ and collect all parameters in $\theta\equiv(\mu,\sigma,\alpha,\beta,\kappa)$ . We make sure that the proposed parameterizations are unique, in the sense that different values of $\theta$ map into different primitives $\psi(\cdot;\mu,\sigma,\alpha)$ , $\phi(\cdot;\beta)$ , and ${\cal G}(\cdot;\kappa)$ . We also discuss ways to normalize them. A corollary to Theorem 1 then establishes parametric identification.

Latent process

Recall that $\Upsilon(\cdot;\alpha)=0$ and the Laplace exponent equals $\psi_{\mathrm{BM}}(s;\mu,\sigma)=\mu s+\frac{\sigma^{2}}{2}s^{2}$ , with $\sigma>0$ , if $\{Y\}$ is a nontrivial Brownian motion with drift. We distinguish this basic specification with a subscript “BM” because it appears in our computations for more general specifications of $\psi(\cdot;\alpha)$ as well. We consider two such specifications.

The first adds an independent compound Poisson process with a finitely discrete shock distribution to the basic specification. Because $\int_{(-1,0)}|y|\Upsilon(dy;\alpha)<\infty$ in this case, the Lévy-Khintchine formula (2) now offers the simplest way to parameterize $\psi$: $\psi(s;\mu,\sigma,\alpha)=\mu s+\frac{\sigma^{2}}{2}s^{2}+\sum_{j=1}^{J}\lambda_{j}\left(\mathrm{e}^{s\nu_{j}}-1\right)$, where $\alpha\equiv(\lambda_{1},\ldots,\lambda_{J},\nu_{1},\ldots,\nu_{J})$, with $\lambda_{j}>0$ the Poisson rate at which shocks of size $\nu_{j}<0$ arrive, $j=1,\ldots,J$, and $\nu_{1}<\ldots<\nu_{J}$. [Footnote 7: Equivalently, in this specification, shocks arrive at a rate $\lambda\equiv\sum_{j=1}^{J}\lambda_{j}$ and are drawn independently from a distribution with $J$ points of support $(\nu_{1},\ldots,\nu_{J})$ with probabilities $\left(\lambda_{1}/\lambda,\ldots,\lambda_{J}/\lambda\right)$. We exclude the boundary cases in which $\lambda_{j}=0$, $\nu_{j}=0$, or $\nu_{j-1}=\nu_{j}$, which correspond to specifications with fewer than $J$ shock sizes, to ensure a unique parameterization and standard inference. See Footnote 10.]

The second specification instead assumes that shocks arrive at a Poisson rate $\lambda$ and have sizes drawn from a gamma distribution with density $\frac{\omega^{\tau}}{\Gamma(\tau)}\>(-y)^{\tau-1}\exp(\omega y)$ ; $\omega,\tau>0$ ; at $y\in(-\infty,0)$ . We can again use (2), which now gives $\psi(s;\mu,\sigma,\alpha)=\mu s+\frac{\sigma^{2}}{2}s^{2}+\lambda\left\{(s/\omega+1)^{-\tau}-1\right\}$ , where $\alpha\equiv(\lambda,\omega,\tau)$ .
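The three Laplace exponents of this section can be coded directly from their formulas. The following is a minimal Python sketch for real $s\geq 0$; the function names and the parameter values in the example are hypothetical and do not correspond to the authors' MATLAB implementation.

```python
import math

def psi_bm(s, mu, sigma):
    """Laplace exponent of a Brownian motion with drift mu and dispersion sigma."""
    return mu * s + 0.5 * sigma**2 * s**2

def psi_discrete(s, mu, sigma, lam, nu):
    """Brownian motion plus compound Poisson shocks of sizes nu[j] < 0 at rates lam[j] > 0."""
    jumps = sum(l * (math.exp(s * n) - 1.0) for l, n in zip(lam, nu))
    return psi_bm(s, mu, sigma) + jumps

def psi_gamma(s, mu, sigma, lam, omega, tau):
    """Brownian motion plus Poisson(lam) shocks with gamma(tau, omega) distributed sizes."""
    return psi_bm(s, mu, sigma) + lam * ((s / omega + 1.0) ** (-tau) - 1.0)

# Illustrative evaluation at s = 0.5 with made-up parameter values.
value_discrete = psi_discrete(0.5, 1.0, 0.8, [0.02, 0.01], [-5.0, -1.0])
value_gamma = psi_gamma(0.5, 1.0, 0.8, 0.03, 2.0, 1.0)
```

With $\tau=1$ the gamma shock sizes are exponential, and the jump term reduces to $-\lambda s/(\omega+s)$, which gives a quick sanity check on the implementation.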

The Lévy-Khintchine formula (2) provides a unique parameterization of the Laplace exponent in terms of the drift parameter $\mu$, the Gaussian dispersion parameter $\sigma$, and the Lévy measure $\Upsilon$. [Footnote 8: Bertoin (1996, Chapter 1, Theorem 1) and the discussion following it show that the general Lévy-Khintchine formula (1) provides a unique parameterization of the Laplace exponent in terms of $\tilde{\mu}$, $\sigma$, and $\Upsilon$. Consequently, formula (2) does as well with, as discussed in Section 2.2, a different drift parameter.] In turn, our two specifications of the jump process give unique parameterizations of $\Upsilon$. Consequently, both parameterizations $\psi(\cdot;\alpha)$ are unique.

The scale of $\psi(\cdot;\mu,\sigma,\alpha)$ can be normalized by setting $|\mu|=1$, which implicitly assumes that $\mu\neq 0$, or $\sigma=1$. After all, if $\psi(\cdot;\mu,\sigma,\alpha)$ is a Laplace exponent with $|\mu|=1$ (or $\sigma=1$) then, for $a>0$, $s\mapsto\psi(as;\mu,\sigma,\alpha)$ is a Laplace exponent with $|\mu|=a$ (or $\sigma=a$). [Footnote 9: One can alternatively normalize the scale of the jump component, which varies across specifications.]
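For the specification with finitely discrete shocks, the effect of rescaling can be made explicit: $s\mapsto\psi(as;\mu,\sigma,\alpha)$ is the Laplace exponent with drift $a\mu$, dispersion $a\sigma$, and shock sizes $a\nu_{j}$ (the shock rates are unchanged). The Python sketch below verifies this identity numerically; the helper names and parameter values are assumptions made for illustration.

```python
import math

def psi(s, mu, sigma, lam, nu):
    """Laplace exponent with finitely discrete compound Poisson shocks."""
    return mu * s + 0.5 * sigma**2 * s**2 + sum(
        l * (math.exp(s * n) - 1.0) for l, n in zip(lam, nu))

def rescale(mu, sigma, lam, nu, a):
    """Parameters of s -> psi(a*s), the Laplace exponent of the rescaled process {aY}."""
    return a * mu, a * sigma, list(lam), [a * n for n in nu]

# Illustrative parameter values with drift different from one.
mu, sigma, lam, nu = 2.5, 1.2, [0.02], [-4.0]

# Normalizing |mu| = 1 corresponds to the scale factor a = 1 / |mu|.
a = 1.0 / abs(mu)
mu_n, sigma_n, lam_n, nu_n = rescale(mu, sigma, lam, nu, a)

s = 0.7
lhs = psi(a * s, mu, sigma, lam, nu)       # psi(a s; mu, sigma, alpha)
rhs = psi(s, mu_n, sigma_n, lam_n, nu_n)   # exponent with rescaled parameters
```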

Covariate effects

The threshold is naturally specified to be loglinear in the covariates: $\phi(x;\beta)=\exp(x^{\prime}\beta)$ . Note that this specification implies Assumption 2.

Suppose that ${\cal X}\subseteq\mathbb{R}^{K}$ is not contained in a proper linear subspace of $\mathbb{R}^{K}$. Then, this parameterization is unique: $\exp(x^{\prime}\tilde{\beta})=\exp(x^{\prime}\beta)$ for all $x\in{\cal X}$ implies that $\beta=\tilde{\beta}$. Moreover, it embodies a scale normalization: For given $\beta$ and $a\in(0,\infty)\setminus\{1\}$, there exists no $\tilde{\beta}$ such that $a\phi(x;\beta)=\exp(\ln(a)+x^{\prime}\beta)=\exp(x^{\prime}\tilde{\beta})$.

Unobserved heterogeneity

We entertain a finitely discrete specification of $G$. This specification is versatile, computationally convenient, and appears naturally in Heckman and Singer’s (1984) work on semi-nonparametric estimation of the MPH model. It assumes that $V$ has $L\in\mathbb{N}$ support points $0<v_{1}<\cdots<v_{L}$, with $0<\pi_{l}\equiv\Pr(V=v_{l})<1$, $l=1,\ldots,L$. Then, ${\cal G}(s;\kappa)=\sum_{l=1}^{L}\pi_{l}\exp(-sv_{l})$, with $\kappa\equiv(v_{1},\ldots,v_{L},\pi_{1},\ldots,\pi_{L-1})$ and $\pi_{L}\equiv 1-\sum_{l=1}^{L-1}\pi_{l}$. [Footnote 10: We assume that all $\pi_{l}\in(0,1)$ and that all support points are distinct to ensure that the parameterization of $G$ is unique. In practice, we may want to include the boundary cases, because these correspond to specifications with fewer than $L$ support points. This, however, leads to nonstandard identification and inference, because we can either reduce the number of support points from $L$ to $L-1$ by setting $\pi_{L}=0$, in which case $v_{L}$ is irrelevant, or by setting $v_{L-1}=v_{L}$, in which case only $\pi_{L-1}+\pi_{L}$ matters.] The inequality constraints ensure that the parameterization is unique. It can be scale normalized by setting $v_{1}=1$.
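As an illustration, the finitely discrete mixing distribution and its Laplace transform can be coded as follows. This Python sketch uses hypothetical helper names (`expand_kappa`, `G_laplace`) and made-up parameter values; it is not taken from the authors' implementation.

```python
import math

def expand_kappa(kappa, L):
    """Split kappa = (v_1..v_L, pi_1..pi_{L-1}) into support points and all L probabilities."""
    v = list(kappa[:L])
    pi = list(kappa[L:]) + [1.0 - sum(kappa[L:])]
    if not all(lo < hi for lo, hi in zip([0.0] + v[:-1], v)):
        raise ValueError("support points must satisfy 0 < v_1 < ... < v_L")
    if not all(p > 0.0 for p in pi):
        raise ValueError("all probabilities must be strictly positive")
    return v, pi

def G_laplace(s, v, pi):
    """Laplace transform of the discrete distribution of V at s >= 0."""
    return sum(p * math.exp(-s * vl) for p, vl in zip(pi, v))

# Two support points with the scale normalization v_1 = 1 (illustrative values).
v, pi = expand_kappa((1.0, 2.5, 0.3), L=2)

# Rescaling V by b > 0 changes the transform to s -> G(b s).
b, s = 2.0, 0.4
rescaled_value = G_laplace(s, [b * vl for vl in v], pi)
```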

Corollary 1 (Parametric Identification).

Let $\theta$ and $\tilde{\theta}$ , via one of this section’s parameterizations, map into observationally equivalent MHT triplets. Suppose that Assumptions 1 and 3 hold, ${\cal X}\subseteq\mathbb{R}^{K}$ is not contained in a proper linear subspace of $\mathbb{R}^{K}$ , and either $\phi$ or ${\cal G}$ is scale normalized. Then, $\theta=\tilde{\theta}$ .

Corollary 1 does not rely on the fact that the finitely discrete specification of $G$ ensures that $\mathbb{E}[V]<\infty$ , which would suffice for identification without Assumption 1 (see Abbring, 2012, Section 4.3). We maintain Assumption 1, because it is essential to our approach to estimation (see Section 2.3) and allows for alternative specifications of $G$ that do not imply $\mathbb{E}[V]<\infty$ . This may, for example, be useful in an extension to sieve estimation, in which it may be hard to impose $\mathbb{E}[V]<\infty$ (see Section 6).

3 Maximum Likelihood Estimation

Fix one of the previous section’s parameterizations $\theta\mapsto[\psi(\cdot;\mu,\sigma,\alpha),\phi(\cdot;\beta),{\cal G}(\cdot;\kappa)]$ . Denote the implied parametric density of $T|X=x$ with $f(\cdot|x;\theta)$ and the corresponding survival function with $\overline{F}(\cdot|x;\theta)$ . Similarly, write $f(\cdot|x,v;\theta)$ and $\overline{F}(\cdot|x,v;\theta)$ . This section presents a method for evaluating this parameterization’s likelihood for a basic but common sampling scheme, using the Gaussian special case as a benchmark.

3.1 Sampling and Likelihood

Let $\left\{(T_{1},X_{1}),\ldots,(T_{N},X_{N})\right\}$ be a random sample from the distribution of $(T,X)$ induced by $F(\cdot|x;\theta_{0})$, $x\in{\cal X}$, at the “true” parameter vector $\theta_{0}$ and some marginal distribution of $X$. We do not directly observe this complete sample, but only a censored version of it: $\left\{(T_{1}^{*},D_{1},X_{1}),\ldots,(T_{N}^{*},D_{N},X_{N})\right\}$. Here, $T_{n}^{*}\equiv\min\{T_{n},C_{n}\}$ is the observed duration and $D_{n}\equiv I(T_{n}\leq C_{n})$ a censoring indicator, for some random censoring time $C_{n}$. Note that a complete observation $(T_{n}^{*},D_{n})=(t,1)$ pairs an MHT event $T_{n}=t$ with a censoring event $C_{n}\geq t$, whereas a censored observation $(T_{n}^{*},D_{n})=(t,0)$ corresponds to $T_{n}>t$ and $C_{n}=t$.

We assume a simple type of independent right-censoring (Andersen et al., 1993). Suppose that $(T_{n},C_{n},X_{n})$ is independent across $n$ and that, conditional on $X_{n}$ , $C_{n}$ is independent of $T_{n}$ , with a distribution that does not depend on $\theta_{0}$ . Then, conditional on $X_{n}$ , the likelihood contribution of $(T_{n}^{*},D_{n})$ factorizes in an MHT part, $f(T_{n}^{*}|X_{n};\theta)^{D_{n}}{\overline{F}}(T_{n}^{*}|X_{n};\theta)^{1-D_{n}}$ , and a censoring part that does not depend on $\theta$ . Thus, the conditional likelihood is proportional to $\prod_{n=1}^{N}f(T_{n}^{*}|X_{n};\theta)^{D_{n}}{\overline{F}}(T_{n}^{*}|X_{n};\theta)^{1-D_{n}}$ . Its maximizer is the full-information maximum likelihood estimator of $\theta_{0}$ if the covariates $X_{n}$ carry no information on $\theta_{0}$ .
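The structure of this conditional log likelihood is easy to express in code. The Python sketch below takes the density and survival function as arguments; the exponential duration model used to exercise it is only a stand-in chosen because its likelihood is trivial to verify by hand, and all names are hypothetical.

```python
import math

def censored_loglik(sample, density, survival):
    """Sum over (T*, D, X) of D * ln f(T*|X) + (1 - D) * ln Fbar(T*|X)."""
    total = 0.0
    for t_star, d, x in sample:
        if d == 1:
            total += math.log(density(t_star, x))    # complete spell
        else:
            total += math.log(survival(t_star, x))   # right-censored spell
    return total

# Stand-in model for illustration only: exponential durations with rate exp(x).
def exp_density(t, x):
    rate = math.exp(x)
    return rate * math.exp(-rate * t)

def exp_survival(t, x):
    return math.exp(-math.exp(x) * t)

# One complete spell of length 2 and one spell censored at 3, both with x = 0.
sample = [(2.0, 1, 0.0), (3.0, 0, 0.0)]
value = censored_loglik(sample, exp_density, exp_survival)
```

In the MHT model, `density` and `survival` would be replaced by $f(\cdot|x;\theta)$ and $\overline{F}(\cdot|x;\theta)$.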

Note that the case without censoring, so that $T^{*}_{n}=T_{n}$ and $D_{n}=1$ almost surely for all $n$ , is included as a special case in which $C_{n}=\infty$ almost surely for all $n$ . Also, with more general independent right censoring schemes, the resulting estimator remains a valid (but often, partial) likelihood estimator (Andersen et al., 1993). Moreover, the likelihood, and the corresponding estimator, can easily be adapted to other practically relevant sampling schemes, such as those involving interval censoring.

3.2 Gaussian Special Case

Suppose that $\{Y\}$ is a Brownian motion with drift, so that, by the analysis in Section 2.3, $T|X$ has a mixed inverse Gaussian distribution. Then, up to a constant containing the censoring time events, the log conditional (on the covariates) likelihood $\ell_{N}(\theta)$ equals

[TABLE]

where

[TABLE]

is the Lebesgue density of the inverse Gaussian distribution and

[TABLE]

is its survival function (Cox and Miller, 1965, Section 5.4). Here, $\Phi$ is the cumulative standard normal distribution function. With Section 2.5’s finite discrete specification of $G$ , the log likelihood in (6) reduces to

[TABLE]

If we specify, for example, $\phi(x;\beta)=\exp(x^{\prime}\beta)$, this log likelihood, its derivatives, and its maximizer $\hat{\theta}_{N}$ are easy to compute using (7) and (8). Under standard regularity conditions, including the normalizations and assumptions needed for Corollary 1’s parametric identification, $\hat{\theta}_{N}$ is a consistent and asymptotically normal estimator of $\theta_{0}$. Given the assumption that the marginal distribution of $X$ and the censoring times carry no information on $\theta_{0}$, it is also asymptotically efficient. Its asymptotic covariance matrix can quickly be estimated using either the score or Hessian characterization of the Fisher information matrix.
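A minimal Python sketch of the Gaussian special case follows. It assumes the standard first-passage formulas for a Brownian motion with drift $\mu$ and dispersion $\sigma$ hitting a level $y=\phi(x;\beta)v>0$ (inverse Gaussian density and survival function, as in Cox and Miller, 1965), together with the loglinear threshold and the finitely discrete $G$. Function names and the one-observation data set are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def ig_density(t, y, mu, sigma):
    """Density of the first time a Brownian motion with drift exceeds y > 0."""
    return y / (sigma * math.sqrt(2.0 * math.pi * t**3)) * math.exp(
        -(y - mu * t) ** 2 / (2.0 * sigma**2 * t))

def ig_survival(t, y, mu, sigma):
    """Probability that the threshold y has not been crossed by time t."""
    root = sigma * math.sqrt(t)
    return norm_cdf((y - mu * t) / root) - math.exp(
        2.0 * mu * y / sigma**2) * norm_cdf((-y - mu * t) / root)

def mixed_loglik(data, beta, mu, sigma, v, pi):
    """Log likelihood of (T*, D, x) observations with discrete unobserved heterogeneity."""
    total = 0.0
    for t, d, x in data:
        phi = math.exp(sum(xk * bk for xk, bk in zip(x, beta)))
        part = ig_density if d == 1 else ig_survival
        total += math.log(sum(p * part(t, phi * vl, mu, sigma) for p, vl in zip(pi, v)))
    return total

# A single complete spell with mu = sigma = phi(x) v = 1 (illustrative values).
loglik = mixed_loglik([(1.0, 1, [0.0])], [0.0], 1.0, 1.0, [1.0], [1.0])
```

As a consistency check, minus the derivative of `ig_survival` with respect to `t` should reproduce `ig_density`.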

Many of the models studied in the statistics literature similarly lead to explicit expressions for the likelihood that facilitate estimation (Lee and Whitmore, 2006). In the general Lévy case, such explicit expressions are not available, and maximum likelihood cannot be implemented directly. The next section develops methods for computing the maximum likelihood estimator and its asymptotic distribution in this general case.

3.3 General Case

In general, $f(\cdot|x;\theta)$ and ${\overline{F}}(\cdot|x;\theta)$ are not explicitly known, but can be computed by numerically inverting their Laplace transforms. Our approach is based on the work of Rogers (2000), who applied a variant of Abate and Whitt’s (1992) inversion method to the problem of calculating the first-passage-time distribution of a spectrally one-sided Lévy process.

Following Rogers, we first consider calculating the survival function ${\overline{F}}(\cdot|x;\theta)$. Using integration by parts, it is easy to show that its Laplace transform ${\overline{\cal F}}(s|x;\theta)\equiv\int_{0}^{\infty}\exp(-st)\overline{F}(t|x;\theta)dt=s^{-1}\left\{1-{\cal F}\left(s|x;\theta\right)\right\}$. So, for given $\theta$, we can explicitly construct ${\overline{\cal F}}(s|x;\theta)=s^{-1}\left\{1-{\cal G}\left[\Lambda(s;\mu,\sigma,\alpha)\phi(x;\beta);\kappa\right]\right\}$ and obtain $\overline{F}(\cdot|x;\theta)$ using Mellin’s inverse formula (e.g. Davies, 2002),

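$$\overline{F}(t|x;\theta)=\lim_{\xi\rightarrow\infty}\frac{1}{2\pi\mathrm{i}}\int_{\gamma_{\xi}}\exp(st)\,{\overline{\cal F}}(s|x;\theta)\,ds.\qquad(10)$$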

Here, the integration is along the contour $\gamma_{\xi}:u\in[-1,1]\mapsto c+\mathrm{i}\xi u$ , which traces out a straight line in $\mathbb{C}$ , parallel to the imaginary axis from $c-\mathrm{i}\xi$ to $c+\mathrm{i}\xi$ . We make this contour’s dependence on $c\in\mathbb{R}$ explicit by writing $\gamma_{\xi}(u;c)$ for its value at $u$ . The parameter $c$ should be chosen such that it is larger than the real part of any singularity in the Laplace transform ${\overline{\cal F}}(\cdot|x;\theta)$ . Because ${\overline{\cal F}}(\cdot|x;\theta)$ is analytic on the set of all $s$ with $\Re\,s>0$ , we can choose any $c>0$ .

The integral in (10) does not generally have an explicit solution, but can be efficiently approximated using numerical methods. A key complication is that our specification of ${\overline{\cal F}}(\cdot|x;\theta)$ involves the inverse function $\Lambda$ , which cannot generally be expressed in closed form. To circumvent this problem, we follow Rogers and instead integrate along the composition $\tilde{\gamma}_{\xi}\equiv\psi\circ\Lambda_{\mathrm{BM}}\circ\gamma_{\xi}$ , which is a contour in $\mathbb{C}$ from $\psi\left[\Lambda_{\mathrm{BM}}\left(c-\mathrm{i}\xi;\mu,\sigma\right);\mu,\sigma,\alpha\right]$ to $\psi\left[\Lambda_{\mathrm{BM}}\left(c+\mathrm{i}\xi;\mu,\sigma\right);\mu,\sigma,\alpha\right]$ . Here, $\Lambda_{\mathrm{BM}}$ is the inverse of the Laplace exponent of the Brownian motion component of $\psi$ , for which (4) gives an explicit expression. Note that $\Lambda_{\mathrm{BM}}$ necessarily has the same dispersion parameter $\sigma$ as $\psi$ , but that its drift parameter is not uniquely pinned down (because the drift parameter of $\psi$ depends on the way we deal with small shocks; see Section 2.2). Fortunately, the exact value of the drift parameter of $\Lambda_{\mathrm{BM}}$ plays no role in the argument that follows. It can generally be set to the drift parameter in the specific parameterization of $\psi$ used; for example, $\tilde{\mu}$ in (1) or $\mu$ in (2). Following Section 2.5’s specifications of $\psi$ with compound Poisson jumps, we have set the drift parameter of $\Lambda_{\mathrm{BM}}$ equal to $\mu$ in (2). We make the transformed contour’s dependence on $c$ and the parameters of $\psi$ explicit by writing $\tilde{\gamma}_{\xi}(u;\mu,\sigma,\alpha,c)$ for its value at $u$ .
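The transformed contour is straightforward to compute. The Python sketch below assumes that the explicit expression for $\Lambda_{\mathrm{BM}}$ is the root of $\mu z+\frac{\sigma^{2}}{2}z^{2}=w$ with the larger real part, $\Lambda_{\mathrm{BM}}(w;\mu,\sigma)=\{\sqrt{\mu^{2}+2\sigma^{2}w}-\mu\}/\sigma^{2}$, and uses the specification with finitely discrete shocks. The function names and parameter values are hypothetical.

```python
import cmath
import math

def psi_bm(z, mu, sigma):
    return mu * z + 0.5 * sigma**2 * z**2

def lambda_bm(w, mu, sigma):
    """Inverse of psi_bm (assumed closed form; principal branch of the square root)."""
    return (cmath.sqrt(mu**2 + 2.0 * sigma**2 * w) - mu) / sigma**2

def psi(z, mu, sigma, lam, nu):
    """Laplace exponent with discrete compound Poisson shocks, at complex z."""
    return psi_bm(z, mu, sigma) + sum(
        l * (cmath.exp(z * n) - 1.0) for l, n in zip(lam, nu))

def gamma_line(u, xi, c):
    """Straight contour: u in [-1, 1] -> c + i * xi * u."""
    return complex(c, xi * u)

def gamma_tilde(u, xi, c, mu, sigma, lam, nu):
    """Transformed contour psi(Lambda_BM(gamma(u)))."""
    return psi(lambda_bm(gamma_line(u, xi, c), mu, sigma), mu, sigma, lam, nu)

def relative_gap(xi, c, mu, sigma, lam, nu):
    """|gamma(1) - gamma_tilde(1)| / |gamma(1)| at the upper end of the contour."""
    g = gamma_line(1.0, xi, c)
    return abs(g - gamma_tilde(1.0, xi, c, mu, sigma, lam, nu)) / abs(g)

# Illustrative parameter values.
mu, sigma, lam, nu, c = 1.0, math.sqrt(0.542), [0.019], [-5.133], 11.0
gap_small_xi = relative_gap(10.0, c, mu, sigma, lam, nu)
gap_large_xi = relative_gap(1000.0, c, mu, sigma, lam, nu)
```

Without shocks, the two contours coincide; with shocks, the relative distance between their end points shrinks as $\xi$ grows.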

Rogers argued that, under Assumption 1, replacing $\gamma_{\xi}$ by $\tilde{\gamma}_{\xi}$ in (10) does not affect that integral’s value, so that

[TABLE]

with

[TABLE]

which no longer involves $\Lambda$. This argument relies on Cauchy’s integral theorem, which implies that an integral over the analytic integrand in (10) along a closed contour equals zero. In particular, this holds for the closed contour formed by going up $\gamma_{\xi}$ from $\gamma_{\xi}(-1;c)$ to $\gamma_{\xi}(1;c)$, crossing over from $\gamma_{\xi}(1;c)$ to $\tilde{\gamma}_{\xi}(1;\mu,\sigma,\alpha,c)$, going down $\tilde{\gamma}_{\xi}$ from $\tilde{\gamma}_{\xi}(1;\mu,\sigma,\alpha,c)$ to $\tilde{\gamma}_{\xi}(-1;\mu,\sigma,\alpha,c)$, and crossing back from $\tilde{\gamma}_{\xi}(-1;\mu,\sigma,\alpha,c)$ to $\gamma_{\xi}(-1;c)$. Consequently, the integrals in (10) and (11) are equal, provided that the integrals over the contour from $\gamma_{\xi}(1;c)$ to $\tilde{\gamma}_{\xi}(1;\mu,\sigma,\alpha,c)$ and the contour from $\gamma_{\xi}(-1;c)$ to $\tilde{\gamma}_{\xi}(-1;\mu,\sigma,\alpha,c)$ vanish as $\xi\rightarrow\infty$. Rogers concluded that this is the case, because the integrand vanishes sufficiently fast along these two contours as $\xi\rightarrow\infty$ (specifically, $s{\overline{\cal F}}(s|x;\theta)\rightarrow 1$ as $|s|\rightarrow\infty$) and, under Assumption 1, their lengths do not grow too fast with $\xi$. In particular,

$$\left|\frac{\gamma_{\xi}(1;c)-\tilde{\gamma}_{\xi}(1;\mu,\sigma,\alpha,c)}{\gamma_{\xi}(1;c)}\right|$$

converges to zero as $\xi\rightarrow\infty$ (note that the right hand side of (1) is dominated by the Gaussian term for large $s$ ). Similarly, $\left|\frac{\gamma_{\xi}(-1;c)-\tilde{\gamma}_{\xi}(-1;\mu,\sigma,\alpha,c)}{\gamma_{\xi}(-1;c)}\right|\rightarrow 0$ as $\xi\rightarrow\infty$ .
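The change of variables behind the contour replacement can be spelled out as follows. This is a reconstruction from the definitions in the text, not a quotation of the paper's displays: it writes $w$ for a point on $\gamma_{\xi}$ and $z=\Lambda_{\mathrm{BM}}(w)$, suppresses the parameters of $\psi$ and $\Lambda_{\mathrm{BM}}$, and assumes that $\Lambda(\psi(z))=z$ along the contour, which is the property that removes $\Lambda$ from the integrand.

```latex
% Substitution s = \psi(z) with z = \Lambda_{\mathrm{BM}}(w), w on \gamma_\xi.
% Since \Lambda_{\mathrm{BM}} inverts \psi_{\mathrm{BM}}, we have
% \Lambda_{\mathrm{BM}}'(w) = 1/\psi_{\mathrm{BM}}'(z) = 1/(\mu + \sigma^2 z).
\begin{align*}
ds &= \psi'(z)\,\Lambda_{\mathrm{BM}}'(w)\,dw
    = \frac{\psi'(z)}{\psi_{\mathrm{BM}}'(z)}\,dw, \\
\int_{\tilde{\gamma}_{\xi}} \exp(st)\,\overline{\cal F}(s|x;\theta)\,ds
   &= \int_{\gamma_{\xi}} \exp\left[\psi(z)t\right]\,
      \frac{1-{\cal G}\left[z\,\phi(x;\beta);\kappa\right]}{\psi(z)}\,
      \frac{\psi'(z)}{\psi_{\mathrm{BM}}'(z)}\,dw .
\end{align*}
```

The same substitution applied to ${\cal F}(s|x;\theta)={\cal G}\left[\Lambda(s)\phi(x;\beta);\kappa\right]$ gives an integrand with the factors $\exp[\psi(z)t]$, ${\cal G}[z\phi(x;\beta);\kappa]$, and $\psi'(z)/\psi_{\mathrm{BM}}'(z)$.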

Using a change of variables, we can rewrite (11) as an integral over the real line:

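$$\overline{F}(t|x;\theta)=\frac{1}{2\pi}\int_{-\infty}^{\infty}\overline{q}(t,u|x;\theta,c)\,du,\qquad(12)$$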

where $\overline{q}(t,u|x;\theta,c)\equiv\overline{q}^{*}(t,c+\mathrm{i}u|x;\theta)$ . Following Abate and Whitt, we can apply the trapezoidal rule to approximate (12) with the infinite sum

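$$\overline{S}_{\infty}(t|x;\theta,c,h)\equiv\frac{h}{2\pi}\sum_{r=-\infty}^{\infty}\Re\,\overline{q}(t,rh|x;\theta,c),\qquad(13)$$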

where $h>0$ is the rule’s step size. Note that we only need to approximate the real part of (12), because its imaginary part should be zero. Abate and Whitt discussed the error introduced by this discretization and noted that it works particularly well because the integrand oscillates and the approximation errors tend to cancel out.

In practice, we need to truncate the infinite sum $\overline{S}_{\infty}(t|x;\theta,c,h)$ in (13) to $\overline{S}_{R}(t|x;\theta,c,h)\equiv\frac{h}{2\pi}\sum_{r=-R}^{R}\Re\,\overline{q}(t,rh|x;\theta,c)$ for some $R\in\mathbb{N}$ and use extrapolation to approximate the case where $R\rightarrow\infty$ . Because $\overline{S}_{R}(t|x;\theta,c,h)$ is nearly periodic in $R$ , $\lim_{R\rightarrow\infty}\overline{S}_{R}(t|x;\theta,c,h)$ can be efficiently approximated using Euler summation:

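$$\overline{E}_{R,M}(t|x;\theta,c,h)\equiv\sum_{m=0}^{M}\binom{M}{m}2^{-M}\,\overline{S}_{R+m}(t|x;\theta,c,h)$$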

for some $M\in\mathbb{N}$. Abate and Whitt proposed to estimate the associated error by $\overline{E}_{R,M+1}(t|x;\theta,c,h)-\overline{E}_{R,M}(t|x;\theta,c,h)$. In our case, this estimate quickly tends to zero as $M$ increases, which suggests that the approximation is accurate (see also Section 4).
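The trapezoidal rule and Euler summation can be sketched in a few lines of Python. The code below is an independent illustration, not the authors' MATLAB code: it assumes the usual Abate and Whitt form of Euler summation (a binomial average of the partial sums $\overline{S}_{R},\ldots,\overline{S}_{R+M}$), uses $c=11/t$ and $h=\pi/t$, and inverts the Laplace transform of the survival function in the Brownian motion case, for which the exact answer is available for comparison. All function names are hypothetical.

```python
import cmath
import math

def euler_inversion(transform, t, c_times_t=11.0, R=9, M=25):
    """Invert a Laplace transform at t > 0 by the trapezoidal rule plus Euler summation."""
    c, h = c_times_t / t, math.pi / t

    def term(r):
        s = complex(c, r * h)
        return (cmath.exp(s * t) * transform(s)).real

    # Partial sums over r = -n..n for n = R, ..., R + M, built incrementally.
    partial = term(0) + sum(term(r) + term(-r) for r in range(1, R + 1))
    sums = [partial]
    for n in range(R + 1, R + M + 1):
        partial += term(n) + term(-n)
        sums.append(partial)
    weights = [math.comb(M, m) * 2.0 ** (-M) for m in range(M + 1)]
    return h / (2.0 * math.pi) * sum(w * s_n for w, s_n in zip(weights, sums))

def bm_survival_transform(s, y, mu, sigma):
    """s^{-1} {1 - exp(-Lambda_BM(s) y)} for a Brownian motion with drift."""
    lam_bm = (cmath.sqrt(mu**2 + 2.0 * sigma**2 * s) - mu) / sigma**2
    return (1.0 - cmath.exp(-lam_bm * y)) / s

def bm_survival_exact(t, y, mu, sigma):
    cdf = lambda z: 0.5 * math.erfc(-z / math.sqrt(2.0))
    root = sigma * math.sqrt(t)
    return cdf((y - mu * t) / root) - math.exp(2.0 * mu * y / sigma**2) * cdf((-y - mu * t) / root)

approx = euler_inversion(lambda s: bm_survival_transform(s, 1.0, 1.0, 1.0), t=1.0)
exact = bm_survival_exact(1.0, 1.0, 1.0, 1.0)
```

With $h=\pi/t$, the factor $\exp(\mathrm{i}rht)$ equals $(-1)^{r}$, so the summands alternate in sign, which is what makes Euler summation effective here.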

We follow a similar procedure to calculate the density $f(\cdot|x;\theta)$ from its Laplace transform ${\cal F}(\cdot|x;\theta)$. We again start with Mellin’s inverse formula (10) with contour $\gamma_{\xi}$, but now with $f(t|x;\theta)$ in its left hand side and ${\cal F}(s|x;\theta)$ in its right hand side. With the finitely discrete specification of $G$, ${\cal F}(s|x;\theta)$ vanishes more rapidly than $\overline{{\cal F}}(s|x;\theta)$ ($s{\cal F}(s|x;\theta)\rightarrow 0$, whereas $s{\overline{\cal F}}(s|x;\theta)\rightarrow 1$) as $|s|\rightarrow\infty$. [Footnote 11: This follows from the fact that the behavior of ${\cal F}(s|x;\theta)$ for large $s$ is dominated by the term $\pi_{1}\exp\left\{-\Lambda(s;\mu,\sigma)\phi(x;\beta)v_{1}\right\}$ corresponding to the lowest support point $v_{1}$ of $G$. With specifications of $G$ that have support near zero, ${\cal F}(s|x;\theta)$ may vanish more slowly than $\overline{{\cal F}}(s|x;\theta)$ as $|s|\rightarrow\infty$. For example, if $G$ is a gamma distribution, one can show that $|s{\cal F}(s|x;\theta)|\rightarrow\infty$ as $|s|\rightarrow\infty$. Simulations suggest our procedure is nevertheless accurate in this case.] This suggests that we can again replace the contour $\gamma_{\xi}$ in Mellin’s inverse formula with $\tilde{\gamma}_{\xi}$ and that

[TABLE]

where

[TABLE]

As before, we can rewrite this into an integral over the real line,

[TABLE]

where $q(t,u|x;\theta,c)\equiv q^{*}(t,c+\mathrm{i}u|x;\theta)$ , and approximate this integral with an Euler sum $E_{R,M}(t|x;\theta,c,h)$ .
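The following Python sketch puts the pieces together for the density in the specification with finitely discrete shocks and discrete $G$. It is a reconstruction under stated assumptions, not the authors' code: the integrand is taken to be $\exp[\psi(z)t]\,{\cal G}[z\phi(x;\beta);\kappa]\,\psi'(z)/\psi_{\mathrm{BM}}'(z)$ with $z=\Lambda_{\mathrm{BM}}(c+\mathrm{i}u)$, which follows from the substitution $s=\psi(z)$, and Euler summation is the binomial average of partial sums. The function name `mht_density`, its arguments (`y_levels` for $\phi(x;\beta)v_{l}$ and `weights` for $\pi_{l}$), and the parameter values are hypothetical.

```python
import cmath
import math

def mht_density(t, y_levels, weights, mu, sigma, lam, nu, c_times_t=11.0, R=9, M=25):
    """Approximate f(t|x; theta) by inversion along the transformed contour."""
    c, h = c_times_t / t, math.pi / t

    def term(r):
        w = complex(c, r * h)                                            # point on gamma
        z = (cmath.sqrt(mu**2 + 2.0 * sigma**2 * w) - mu) / sigma**2     # Lambda_BM(w)
        shocks = [l * cmath.exp(z * n) for l, n in zip(lam, nu)]
        psi_z = w + sum(shocks) - sum(lam)          # psi(z), using psi_BM(z) = w
        dpsi_bm = mu + sigma**2 * z                 # psi_BM'(z)
        dpsi = dpsi_bm + sum(n * e for n, e in zip(nu, shocks))          # psi'(z)
        mix = sum(p * cmath.exp(-z * y) for p, y in zip(weights, y_levels))
        return (cmath.exp(psi_z * t) * mix * dpsi / dpsi_bm).real

    partial = term(0) + sum(term(r) + term(-r) for r in range(1, R + 1))
    sums = [partial]
    for n in range(R + 1, R + M + 1):
        partial += term(n) + term(-n)
        sums.append(partial)
    euler = sum(math.comb(M, m) * 2.0 ** (-M) * s_n for m, s_n in enumerate(sums))
    return h / (2.0 * math.pi) * euler

# Illustrative values: one heterogeneity type, threshold 1, duration 1.
mu, sigma = 1.0, math.sqrt(0.542)
exact_bm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))   # inverse Gaussian density at t = y = 1
approx_bm = mht_density(1.0, [1.0], [1.0], mu, sigma, [], [])
approx_jump = mht_density(1.0, [1.0], [1.0], mu, sigma, [0.019], [-5.133])
```

Without shocks, the result should match the inverse Gaussian density. With rare, large negative shocks, a path that has been hit by a shock almost never reaches the threshold by time $t$, so the density should be close to $\exp(-\lambda t)$ times the inverse Gaussian density; this gives an informal check on the example with shocks.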

One could control the computation of $f(t|x;\theta)$ and $\overline{F}(t|x;\theta)$ with different tuning parameters $c$, $h$, $R$, and $M$. However, as our notation $E_{R,M}(t|x;\theta,c,h)$ and $\overline{E}_{R,M}(t|x;\theta,c,h)$ for the corresponding Euler sums suggests, we will not do so in this paper. We take guidance from Rogers in setting the common values of $c$, $h$, $R$, and $M$. In the next sections, we find that his suggestion to use duration-$t$ specific values $c=11/t$ and $h=\pi/t$ yields good numerical performance in our case. We will adopt these as our default settings, together with $R=9$ and $M=25$. [Footnote 12: Rogers (2000) claimed that $R=6$ and $M=15$ trade off accuracy and speed well. Because of the advances in computing speed since then, we can opt for more accuracy. See Section 4 for some details.]

The log likelihood for an independently censored sample satisfies

[TABLE]

We have implemented an estimator in MATLAB that maximizes this approximate log likelihood using a quasi-Newton algorithm with BFGS updates for the Hessian and multiple random starting values (Nocedal and Wright, 2006).

We supply an analytical gradient of the approximate log likelihood with respect to the parameter vector $\theta$ to ensure quick and stable maximization. This gradient sums contributions of the $N$ observations. Consider the contribution of observation $n$. Suppose that this observation is complete ($D_{n}=1$; the calculations for a censored observation are similar). The approximate likelihood contribution of this observation, $E_{R,M}(T^{*}_{n}|X_{n};\theta,c,h)$, is the real part of a weighted sum of $q(T^{*}_{n},rh|X_{n};\theta,c)$ over finitely many values of $r$, with weights that do not depend on $\theta$. Each term $q(T^{*}_{n},rh|X_{n};\theta,c)$ in this weighted sum is the product of three factors,

[TABLE]

that are smooth in $\theta$ and $z$ , composed with $z=\Lambda_{\mathrm{BM}}(c+\mathrm{i}rh;\mu,\sigma)$ , which is itself smooth in $\mu$ and $\sigma$ . Its complex-valued derivative with respect to $\theta$ follows from tedious but straightforward application of the product and chain rules. We ignore the imaginary part of the weighted sum of these derivatives over $r$ , because the imaginary part of the likelihood contribution $f(T^{*}_{n}|X_{n};\theta)$ that we approximate with $E_{R,M}(T^{*}_{n}|X_{n};\theta,c,h)$ is zero. So, we set the contribution of observation $n$ to the gradient of the log likelihood equal to the real part of this weighted sum of derivatives, divided by $E_{R,M}(T^{*}_{n}|X_{n};\theta,c,h)$ . The analytical gradient sums these contributions. We construct asymptotic standard errors from the corresponding Hessian, which we calculate using finite differences of the analytical gradient. The replication package (Abbring and Salimans, 2021) provides further details.

The MATLAB code currently normalizes $\psi(\cdot;\mu,\sigma,\alpha)$ by setting $\mu=1$. Note that this implicitly assumes that $\mu>0$. It would be straightforward to adapt the code to instead normalize $|\mu|=1$, which more generally allows for $\mu\neq 0$, or $\sigma=1$, which does not restrict $\mu$ at all.

Our estimator maximizes an approximate log likelihood. For some applications, it has been shown that the maximum approximate likelihood estimator is first order equivalent to the exact maximum likelihood estimator if the approximations improve sufficiently quickly with the sample size (e.g. Aït-Sahalia, 2002). We could try to derive a similar equivalence result for our estimator, using Abate and Whitt’s numerical analysis and some further results on the tail behavior of $\overline{q}(t,u|x;\theta,c)$ and ${q}(t,u|x;\theta,c)$. However, as we will see in Section 4, we can compute our estimator very accurately in reasonable time, so that a formal result establishing how accuracy should increase with sample size would not be of much practical use. Therefore, we take the pragmatic approach that much of the literature has taken and simply apply standard maximum likelihood asymptotics. [Footnote 13: This is how Singleton (2001) handled his maximum likelihood estimator of a discretely sampled affine diffusion, which, like our estimator, required numerical Fourier inversion. He expressed some worries about the computational burden of his Fourier inversion procedure, but only for the multivariate case. We only use univariate Fourier inversion and benefit from 20 years of computational development.]

4 Numerical Experiments

We have investigated the accuracy of the proposed likelihood approximation by conducting a range of numerical experiments. We discuss the results of three of these experiments here. All three experiments use the default settings for the parameters that control the approximation, unless explicitly stated otherwise. The first two experiments directly compare the explicitly known duration density and likelihood implied by MHT models without shocks to their approximations. The third experiment focuses on a model with shocks, for which the implied duration density is not known in explicit form.

The first experiment compares direct computations of the log likelihood function of the mixed inverse Gaussian model using the explicit expression for the density in (7) to its numerical approximations as we vary $M$ . The log likelihood is calculated on the data set that we use in Section 5. This ensures that this experiment provides both a real life test case and a check on the results we present in that section. The data contain 566 complete strike durations. Because the approximation errors are close to unbiased, the error in the log likelihood scales with the root of the sample size.

Figure 1 plots the average of the absolute approximation error of the log likelihood, for different values of $M$, over 100 model parameters randomly generated at the scale of their maximum likelihood estimates. We find that this average absolute error decreases exponentially with $M$; this result is robust across the various parameter values over which the plotted results are averaged. Consistent with Rogers (2000), we see that $M=15$ already provides a decent approximation for most practical purposes. However, because the time required for the calculations grows only linearly in $M$, we can increase $M$ to 25 at a very low computational cost and obtain a nearly thousandfold increase in precision (with most of the gain already obtained with $M=20$). Once $M\geq 25$, other factors, such as rounding errors, become important, and the approximation error levels off. We also find that, with $M=25$, increasing $R$ or decreasing the step size $h$ adds very little to the precision of the inversion. The numerical approximation of the log likelihood takes 9–11 times as long to calculate as the analytical expression. However, in absolute terms this is still very manageable. For example, on a regular laptop computer, it takes about a second to calculate the density for a specification with shocks 100,000 times (see Footnote 14). Consistent with this, the log likelihood can be maximized, starting from multiple random parameter values for each maximization, in under half a minute for the model specifications that we consider in Section 5.

Footnote 14: We used Figure 3’s specification and MATLAB 2020b on a MacBook Pro (2018, 15-inch, 2.9 GHz 6-Core Intel Core i9, 32 GB 2400 MHz DDR4) with macOS 10.15.7.
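The comparison in the first experiment can be illustrated with a minimal Python sketch. It inverts the Laplace transform of the inverse Gaussian first-passage density with the generic Euler-summation variant of Abate and Whitt's Fourier-series method and compares the result with the explicit density. This is a stand-in under stated assumptions, not the paper's Section 3.3 procedure or its MATLAB code: the function names, the tuning constants `A`, `n`, and `m`, and the unit threshold `y` are hypothetical choices made for illustration.

```python
import cmath
import math


def ig_density(t, y=1.0, mu=1.0, sigma2=1.0):
    """Explicit density of the first time a Brownian motion with drift mu
    and variance sigma2 crosses the threshold y > 0 (inverse Gaussian)."""
    return (y / math.sqrt(2.0 * math.pi * sigma2 * t ** 3)
            * math.exp(-(y - mu * t) ** 2 / (2.0 * sigma2 * t)))


def ig_laplace(s, y=1.0, mu=1.0, sigma2=1.0):
    """Laplace transform E[exp(-s T)] of the same hitting time (complex s)."""
    return cmath.exp(-y * (cmath.sqrt(mu ** 2 + 2.0 * sigma2 * s) - mu) / sigma2)


def euler_invert(transform, t, A=18.4, n=15, m=11):
    """Generic Abate-Whitt Euler inversion of a Laplace transform at t > 0.

    A controls the discretization error (roughly exp(-A)); n and m set the
    number of terms in the partial sums and in the binomial averaging.
    These defaults are illustrative assumptions, not the paper's settings.
    """
    def term(k):
        s = complex(A, 2.0 * k * math.pi) / (2.0 * t)
        return ((-1) ** k) * transform(s).real

    scale = math.exp(A / 2.0) / t
    # Partial sums s_0, ..., s_{n+m} of the alternating series.
    partial = [0.5 * scale * transform(complex(A / (2.0 * t), 0.0)).real]
    for k in range(1, n + m + 1):
        partial.append(partial[-1] + scale * term(k))
    # Binomial average of s_n, ..., s_{n+m} accelerates convergence.
    return sum(math.comb(m, k) * partial[n + k] for k in range(m + 1)) / 2.0 ** m


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0):
        exact = ig_density(t)
        approx = euler_invert(ig_laplace, t)
        print(f"t={t:4.1f}  exact={exact:.10f}  approx={approx:.10f}  "
              f"abs err={abs(approx - exact):.2e}")
```

Summing the logarithms of such approximate densities over the observations would give an approximate log likelihood that can be set against its explicit counterpart, in the spirit of Figure 1.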

The second experiment takes a closer look at the numerical approximation of the density $f_{\mathrm{BM}}$ of a basic inverse Gaussian model with parameters such that $\mu=\sigma^{2}=\phi(X;\beta)V=1$. We only present results for $M=25$, but found very similar results for any $M\geq 20$. For the purpose of maximum likelihood estimation, we care most about the errors in the approximation of the log density, $\ln f_{\mathrm{BM}}$. Figure 2 plots the absolute error of this approximation against the log density itself, on a logarithmic scale. The (log-)linear relation displayed by the graph implies that the absolute error in the approximation of $\ln f_{\mathrm{BM}}(t|X;\theta)$ roughly equals $10^{-11}/f_{\mathrm{BM}}(t|X;\theta)$. Consequently, the approximation error is generally small, but the approximation breaks down when the density gets very small (say, $f_{\mathrm{BM}}(t|X;\theta)<10^{-10}$, or $\ln f_{\mathrm{BM}}(t|X;\theta)<-23$). When estimating the model with maximum likelihood, we can easily avoid this by setting reasonable starting values for the parameters. This ensures that the approximation is sufficiently precise for numerically robust maximum likelihood estimation.
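A first-order expansion suggests why the error in the log density behaves this way. As a sketch, suppose that the inversion delivers the density with a roughly constant absolute error $\varepsilon$, so that the approximation is $\hat{f}_{\mathrm{BM}}=f_{\mathrm{BM}}+\varepsilon$ (the symbols $\varepsilon$ and $\hat{f}_{\mathrm{BM}}$ are introduced here for illustration only). Then

$$\ln\hat{f}_{\mathrm{BM}}(t|X;\theta)-\ln f_{\mathrm{BM}}(t|X;\theta)=\ln\left(1+\frac{\varepsilon}{f_{\mathrm{BM}}(t|X;\theta)}\right)\approx\frac{\varepsilon}{f_{\mathrm{BM}}(t|X;\theta)},$$

which, with $|\varepsilon|\approx 10^{-11}$, matches the relation read off Figure 2. At $f_{\mathrm{BM}}(t|X;\theta)=10^{-10}$, that is $\ln f_{\mathrm{BM}}(t|X;\theta)\approx-23.03$, the implied error in the log density is about $0.1$, which is no longer negligible.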

The third experiment considers a model with shocks and a heterogeneous threshold. Figure 3 plots the approximate density of $\ln T$ for this model, again using $M=25$. In this case, the true density is not explicitly known, so we compare the approximate density with a fine histogram of many simulated values of $\ln T$. Our approximate density closely tracks the simulated one. This finding is robust across model specifications.
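The simulation side of such a comparison can be sketched as follows. The sketch simulates hitting times of a Brownian motion with drift plus Poisson shocks on a time grid and builds a histogram-based density estimate of $\ln T$. It is an illustration under loud assumptions: the function names, the step size `dt`, the number of draws, and the fixed threshold of 1 are hypothetical, and the parameter values are borrowed from Column VI of Table 1 rather than from Figure 3's specification, which is not restated here. The grid also introduces a small discretization bias that a careful implementation would control.

```python
import math
import random


def simulate_hitting_time(y, mu, sigma2, lam, nu, dt=0.01, rng=random):
    """Simulate the first grid time at which a Brownian motion with drift mu,
    variance sigma2, and Poisson shocks of size nu reaches the threshold y.

    Shocks arrive at rate lam; on the grid this is approximated by at most
    one shock per step, occurring with probability lam * dt.
    """
    sd_step = math.sqrt(sigma2 * dt)
    level, t = 0.0, 0.0
    while level < y:
        if rng.random() < lam * dt:
            level += nu  # shock at the start of the step
        level += mu * dt + sd_step * rng.gauss(0.0, 1.0)
        t += dt
    return t


def log_duration_histogram(log_durations, n_bins=40):
    """Return bin midpoints and a histogram-based density estimate."""
    lo, hi = min(log_durations), max(log_durations)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in log_durations:
        counts[min(int((x - lo) / width), n_bins - 1)] += 1
    mids = [lo + (i + 0.5) * width for i in range(n_bins)]
    dens = [c / (len(log_durations) * width) for c in counts]
    return mids, dens


if __name__ == "__main__":
    rng = random.Random(12345)  # fixed seed for reproducibility
    draws = [simulate_hitting_time(1.0, 1.0, 0.542, 0.019, -5.133, rng=rng)
             for _ in range(4000)]
    mids, dens = log_duration_histogram([math.log(t) for t in draws])
    print("mean simulated duration:", sum(draws) / len(draws))
```

Because the simulated process has no upward jumps, Wald's identity suggests that the mean duration should be close to $y/(\mu+\lambda\nu)$ for small `dt`, which gives a quick sanity check on the simulated draws before they are set against an approximate density.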

5 Strike Durations

The mere existence of nontrivial delays in labor agreements has puzzled economists; duration patterns in their resolution have been studied to learn more about underlying bargaining games and information structures.

Lancaster (1972) analyzed strike durations using a Gaussian MHT model with regressors, but without unobserved heterogeneity. He interpreted the gap between the Brownian motion and the threshold as the level of disagreement, and concluded that this model fits his data for the United Kingdom well. Others used proportional hazards models to study strike durations. Kennan (1985), in particular, showed that the US strike duration hazard is $U$-shaped and took this as evidence against Lancaster’s (homogeneous) MHT model. He noted that this aspect of the data can be interpreted in terms of heterogeneity in the conflicts underlying the strikes, but did not subsequently pursue this in his empirical analysis.

Here, we will investigate whether Kennan’s strike data can be matched well by a more general MHT model that explicitly takes into account unobserved heterogeneity in strikes. Such a model comes with Lancaster’s attractive interpretation in terms of a level of disagreement that may both vary over time and initially be heterogeneous between strikes. We will explicitly discuss our estimation results in terms of this interpretation, with an implicit understanding that it is our modest objective to illustrate our methods and the descriptive and potential structural appeal of the MHT model, without providing a fully structural analysis of strike durations.

Kennan’s (1985) data cover all contract strikes in US manufacturing in the period 1968–1976 that involved at least a thousand workers, and that were classified to be primarily about “general wage changes”. They include the durations in days of 566 strikes and, for each strike, a measure of the state of the business cycle in the month it started: the residuals of a regression of log industrial production in US manufacturing on linear and quadratic trend terms and seasonal dummies. We obtained the data in a fixed format text file strkdur.asc from Cameron and Trivedi’s (2005) web page. We divided all strike durations by seven, so that they are measured in weeks.

Table 1 reports maximum likelihood estimates for a range of Section 2.5’s flexible parameterizations. All reported estimates are computed using Section 3.3’s numerical methods, with $M=25$. To further check these methods and their MATLAB implementation, we have also computed the same estimates for lower values of $M\geq 15$ (not reported), and estimates for the first five specifications using the explicit expressions for the log likelihood that are available in these cases (not reported). These results are virtually identical to those reported in Table 1.

Columns I–V present estimates of models with Brownian motion latent processes and discrete unobserved heterogeneity. Throughout, the drift is normalized to 1 per week ($\mu=1$), so that $\mathbb{E}\left[T|X,V\right]=-{\cal F}^{\prime}(0+|X,V;\theta)=\exp(X^{\prime}\beta)V$. By its construction as a regression residual, $X$ varies around zero and is close to zero on average in the sample. Consequently, $V$ can be interpreted as the unobserved initial level of disagreement, measured as the mean number of strike weeks it commands.
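For the Gaussian case, the mean expression can be checked directly. As a sketch, write $y=\phi(X;\beta)V$ for the threshold; the shorthand $y$ is introduced here for illustration, and $\phi(X;\beta)=\exp(X^{\prime}\beta)$ is inferred from the expression above. The first passage time of a Brownian motion with drift $\mu>0$ and variance $\sigma^{2}$ to $y$ has the standard inverse Gaussian Laplace transform, so that

$${\cal F}(s|X,V;\theta)=\exp\left(-\frac{y}{\sigma^{2}}\left[\sqrt{\mu^{2}+2\sigma^{2}s}-\mu\right]\right),\qquad-{\cal F}^{\prime}(0+|X,V;\theta)=\frac{y}{\sigma^{2}}\cdot\frac{\sigma^{2}}{\mu}=\frac{y}{\mu}.$$

With $\mu=1$, this gives $\mathbb{E}\left[T|X,V\right]=y=\exp(X^{\prime}\beta)V$, which does not depend on $\sigma^{2}$.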

The log likelihood substantially improves when adding a second, third and fourth support point to the distribution of $V$, between Columns I and IV, but a fifth support point (Column V) hardly changes the fit and the other parameters’ estimates. The estimates indicate that there is both substantial heterogeneity in the strikes’ initial levels of disagreement and uncertainty in their evolution over time. The numbers in Column IV imply that there are four unobserved types of labor conflict, on average commanding respectively $1.1$, $3.2$, $7.2$, and $18.6$ strike weeks. Each type’s level of disagreement evolves with a standard deviation per week just above the unit drift towards agreement.

It is instructive to note that the variance of the latent process drops substantially, from close to 20 to just over 1, when more heterogeneity is added between Columns I and IV. Clearly, Column I’s specification falsely attributes heterogeneity in the strikes’ initial levels of disagreement to uncertainty in their evolution over time.

The estimates of the coefficient $\beta$ reflect the effect of the business cycle on strike durations. In line with Kennan’s (1985) results, strikes that begin in months with low production last longer. In the MHT model, this is captured by a countercyclical threshold: In times with low production, in expectation, conflicts command more strike days. One interpretation is that strike days are less costly in times with low production. The precision of the estimates of $\beta$ is low. This is consistent with Kennan’s results. He obtained more precise results with a binary cyclical indicator constructed from the indicator used here. For simplicity, we do not follow this lead here.

Column VI reports an estimate of a specification that includes discrete shocks of size $\nu$ at Poisson times. The estimates point to an infrequent shock that sets back just over five weeks of drift towards agreement. The shock only somewhat improves the likelihood; a specification without shock, such as those in Columns IV and V, seems to be sufficient.

Finally, a very similar result is found with a gamma shock at a Poisson time (not reported). With this specification, virtually the same estimate of the arrival rate of the shocks is obtained. Moreover, the estimated gamma shock distribution is close to degenerate at Column VI’s estimate of the shock size ($\nu$). Specifically, the estimates of the shape ($\tau$) and scale ($\omega$) parameters of the gamma distribution are both very large, and their ratio equals Column VI’s estimated shock size. As expected, the same log likelihood is found.

Figure 4 plots the aggregate hazard implied by the MHT model’s estimates in Column IV of Table 1. It also plots the hazard implied by estimates of an MPH model with a Weibull baseline and a discrete heterogeneity distribution with four support points. Note that this MPH specification has exactly the same number of parameters as Column IV’s MHT specification. In both cases, we computed the distribution of $T|X$ implied by these estimates, integrated over the empirical distribution of $X$, and computed and plotted the hazard rate of the resulting distribution. Figure 4 also plots the empirical hazard rate, computed by kernel smoothing the raw data.
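The hazard computation for the Gaussian MHT specification can be sketched in a few lines of Python, using the explicit inverse Gaussian density and distribution function. The sketch makes several assumptions that should be read as such: the function names are hypothetical, $X$ is fixed at zero instead of being integrated over its empirical distribution, the support points are the rounded values quoted in the text for Column IV, and the equal mixing probabilities are placeholders because the estimated probabilities are not restated here.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def ig_pdf(t, y, mu=1.0, sigma2=1.0):
    """Inverse Gaussian first-passage density for threshold y."""
    return (y / math.sqrt(2.0 * math.pi * sigma2 * t ** 3)
            * math.exp(-(y - mu * t) ** 2 / (2.0 * sigma2 * t)))


def ig_cdf(t, y, mu=1.0, sigma2=1.0):
    """Inverse Gaussian first-passage distribution function for threshold y."""
    s = math.sqrt(sigma2 * t)
    return (norm_cdf((mu * t - y) / s)
            + math.exp(2.0 * mu * y / sigma2) * norm_cdf(-(mu * t + y) / s))


def mixture_hazard(t, thresholds, probs, mu=1.0, sigma2=1.0):
    """Aggregate hazard of a finite mixture: mixture density divided by
    mixture survival probability."""
    dens = sum(p * ig_pdf(t, y, mu, sigma2) for y, p in zip(thresholds, probs))
    surv = sum(p * (1.0 - ig_cdf(t, y, mu, sigma2))
               for y, p in zip(thresholds, probs))
    return dens / surv


if __name__ == "__main__":
    thresholds = (1.1, 3.2, 7.2, 18.6)   # rounded support points from the text
    probs = (0.25, 0.25, 0.25, 0.25)     # placeholder weights, not estimates
    for weeks in (0.5, 1.0, 2.0, 5.0, 10.0, 20.0):
        hz = mixture_hazard(weeks, thresholds, probs, mu=1.0, sigma2=1.227)
        print(f"t={weeks:5.1f} weeks  hazard={hz:.4f}")
```

Dividing the mixture density by the mixture survival probability, and not averaging the type-specific hazards, is what makes the result an aggregate hazard: the composition of surviving strikes shifts towards high-threshold types as time passes.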

Both the MHT and the MPH models fit the empirical hazard well, but the MPH model’s log likelihood, at $-1577.9$, is $1.6$ points lower than the MHT model’s. Because the Weibull baseline is monotonic, the Weibull MPH model can only fit the nonmonotonic strike hazard by compensating an increasing baseline hazard with negative duration dependence due to unobserved heterogeneity. Of course, MPH models are usually estimated with richer specifications of the baseline hazard, and a sufficiently rich specification can fit the empirical hazard arbitrarily well.

6 Conclusion

The results in this paper enable applied researchers to analyze duration data with mixed hitting-time (MHT) models using standard likelihood-based estimation and inference methods. The MATLAB code for parametric maximum likelihood estimation that accompanies this paper can directly be applied to either complete or independently right-censored duration data, and is easy to adapt to more general censoring schemes.

Our procedure for likelihood computation lends itself well for use in semi-nonparametric maximum likelihood estimation (e.g. Chen, 2007). As in Heckman and Singer (1984)’s analysis of the MPH model, we could handle unobserved heterogeneity nonparametrically using discrete heterogeneity distributions with a varying number of support points. Some care would have to be taken to ensure that the likelihood approximation continues to work well if the unobserved heterogeneity, in the limit, has support near zero (see Footnote 11). Similarly, the Lévy-Itô decomposition of $\{Y\}$ (see Section 2.2) suggests that we construct a sieve for $\psi$ using Section 2.5’s specification that sums a Gaussian component with an independent compound Poisson component, with the shocks distributed discretely with a varying number of support points. This way, each element of the sieve satisfies Assumption 1 and our computational procedure applies.

The procedure can also be used to implement other likelihood-based methods. For example, it can be combined with data augmentation and Markov chain Monte Carlo methods to implement a Bayesian estimator that can flexibly deal with unobserved heterogeneity.

Two types of empirical application of the MHT framework can be distinguished. First, it can be used as a descriptive framework, much like Cox’s (1972) proportional hazards model and Lancaster’s (1979) mixed proportional hazards model. Section 5’s analysis of Kennan’s (1985) strike data shows that estimates of the MHT model have descriptive appeal, with natural interpretations that nicely complement those that could be obtained from a proportional hazards analysis. Indeed, in statistics, there is substantial interest in the descriptive analysis of duration data with first hitting time models (Singpurwalla, 1995; Yashin and Manton, 1997; Aalen and Gjessing, 2001; Lee and Whitmore, 2006).

Second, it can be applied to the structural empirical analysis of heterogeneous agents’ optimal stopping decisions. Abbring (2012) presents a range of examples, based on the type of optimal stopping models that are reviewed and analyzed in Dixit and Pindyck (1994); Stokey (2009); Kyprianou (2006); Boyarchenko and Levendorskiĭ (2007). These include McDonald and Siegel’s (1986) model for the optimal timing of an irreversible investment; a model of unemployment durations based on Dixit’s (1989) model of entry and exit, complemented with heterogeneity in transition costs; and a model of job separations with heterogeneous search. The identification results in Abbring (2012, 2010) show that data on durations and covariates are informative on the economic primitives of such models. The methods developed in this paper can be applied to measure those primitives.

Acknowledgements

We are grateful to Yanqin Fan, the editor (Dennis Kristensen), two referees, and attendees of various conferences and seminars for their comments. We thank Justin Dijk for excellent research assistance. The research of Jaap Abbring is financially supported by the Dutch Research Council (NWO) through Vici grant 453-11-002. Tim Salimans worked on this paper while employed at Erasmus University Rotterdam.

Bibliography


Aalen, O. O. and H. K. Gjessing (2001). Understanding the shape of the hazard rate: A process point of view. Statistical Science 16(1), 1–14.
Abate, J. and W. Whitt (1992). The Fourier-series method for inverting transforms of probability distributions. Queueing Systems 10, 5–88.
Abbring, J. H. (2010). Identification of dynamic discrete choice models. Annual Review of Economics 2, 367–394.
Abbring, J. H. (2012, March). Mixed hitting-time models. Econometrica 80(2), 783–819.
Abbring, J. H. and T. Salimans (2021, April). The likelihood of mixed hitting times: Replication package. Zenodo. https://doi.org/10.5281/zenodo.4670373.
Aït-Sahalia, Y. (2002). Maximum likelihood estimation of discretely sampled diffusions: A closed-form approximation approach. Econometrica 70(1), 223–262.
Andersen, P. K., Ø. Borgan, R. D. Gill, and N. Keiding (1993). Statistical Models Based on Counting Processes. New York: Springer-Verlag.
Bertoin, J. (1996). Lévy Processes. Number 121 in Cambridge Tracts in Mathematics. Cambridge: Cambridge University Press.


The Likelihood of Mixed Hitting Times. Forthcoming in the Journal of Econometrics: doi.org/10.1016/j.jeconom.2019.08.017.

