General Compound Hawkes Processes in Limit Order Books

Anatoliy Swishchuk; Aiden Huffman

arXiv:1812.02298·q-fin.TR·December 7, 2018

General Compound Hawkes Processes in Limit Order Books

Anatoliy Swishchuk, Aiden Huffman

PDF

TL;DR

This paper introduces and analyzes general compound Hawkes processes within limit order books, establishing their theoretical properties and applying them to understand the relationship between price volatility and order flow.

Contribution

It constructs new compound Hawkes processes, proves LLN and FCLT for these models, and links these results to price volatility analysis in limit order books.

Findings

01

Established LLN and FCLT for compound Hawkes processes

02

Linked price volatility to order flow parameters

03

Provided theoretical foundation for modeling limit order books

Abstract

In this paper, we study various new Hawkes processes. Specifically, we construct general compound Hawkes processes and investigate their properties in limit order books. With regards to these general compound Hawkes processes, we prove a Law of Large Numbers (LLN) and a Functional Central Limit Theorems (FCLT) for several specific variations. We apply several of these FCLTs to limit order books to study the link between price volatility and order flow, where the volatility in mid-price changes is expressed in terms of parameters describing the arrival rates and mid-price process.

Figures29

Click any figure to enlarge with its caption.

Tables9

Table 1. Table 1 . Stock liquidity of AAPL, AMZN, GOOG, MSFT and INTC for June 21st, 2012.

Ticker	Avg # of Orders per Second	Price Changes in 1 Day
AAPL	51	64,350
AMZN	25	27,557
GOOG	21	24,084
MSFT	173	3,217
INTC	176	4,060

Table 2. Table 2 . Each parameter was estimated using a particle swarm optimization method in an attempt to globally optimize the negative log-likelihood function. The values for λ 𝜆 \lambda , α 𝛼 \alpha and β 𝛽 \beta for each data set are as provided.

	$λ$	$α$	$β$
AAPL	1.4683	1045.2676	2556.1844
AMZN	0.6443	653.7524	1556.1702
GOOG	0.4985	865.8553	1980.4409
MSFT	0.0659	479.3482	908.0032
INTC	0.0471	399.6389	760.4991

Table 3. Table 3. Expected number of arrivals on a unit interval using estimated parameters from an MLE method is compared against the empirical arrivals.

	Emp. $E [N ([0, 1])]$	MLE
AAPL	2.4840	2.4841
AMZN	1.1110	1.1110
GOOG	0.8857	0.8857
MSFT	0.1395	0.1396
INTC	0.0991	0.0992

Table 4. Table 4. Provided above are the values for s ∗ superscript 𝑠 s^{*} , σ 𝜎 \sigma as well as the probabilities of an upward/downward movement given an upward/downward movement for each of the 5 stocks in question.

	$p_{d d}$	$p_{u u}$	$σ$	$a^{*}$
AAPL	0.4956	0.4933	0.0049	-1.1463e-5
AMZN	0.4635	0.4576	0.0046	-2.7373e-5
GOOG	0.4769	0.4461	0.0046	-1.4301e-4
MSFT	0.6269	0.5827	0.0062	-2.7956e-4
INTC	0.6106	0.5588	0.0059	-3.1185e-4

Table 5. Table 5 . a ( i ) 𝑎 𝑖 a(i) is the average of the upward or downward mid-price movements. Following our previous convention, the first state will be associated with the mean of all downward mid-price movements and the second state will be associated with the mean of all upward mid-price movements.

	$a (1)$	$a (2)$
AAPL	-0.0172	0.0170
AMZN	-0.0134	0.0133
GOOG	-0.0302	0.0308

Table 6. Table 6. We above the values for s ∗ superscript 𝑠 s^{*} , σ 𝜎 \sigma as well as the probabilities of an upwards/downwards movement given an upwards/downwards movement for the 3 stocks of interest.

	$p_{d d}$	$p_{u u}$	$σ$	$a^{*}$
AAPL	0.4956	0.4933	0.0169	-1.5624e-4
AMZN	0.4635	0.4576	0.0123	-1.0475e-4
GOOG	0.4769	0.4461	0.0282	-5.5095e-4

Table 7. Table 7. Above we have provide the state, associated ergodic probabilities and state values a ( i ) 𝑎 𝑖 a(i) for AMZN, given a 12 state Markov chain which was obtained from choosing a 16 quantile method.

i	$π_{i}^{*}$	$a (X_{i})$
1	0.0275	-0.0524
2	0.0281	-0.0318
3	0.0264	-0.0250
4	0.0382	-0.0200
5	0.0576	-0.0150
6	0.3249	-0.0064
7	0.2321	0.0050
8	0.0923	0.0100
9	0.0578	0.0150
10	0.0353	0.0200
11	0.0412	0.0271
12	0.0387	0.0476

Table 8. Table 8 . We list the mean residuals for several Markov chains with varying numbers of states. These were generated using our modified quantile approach choosing to start with 2, 8, 16 or 32 quantiles. We see that in general the mean residual decreases to some lower limit where we can no longer perform any better. Recall that the only observed mid-price changes for INTC and MSFT were of a half tick size, any increase in the number of quantiles will result in the same performance.

	CHPDO	2	8	16	32
AAPL	0.2679	0.0050	0.0036	0.0036	0.0036
AMZN	0.1122	0.0208	0.0131	0.0124	0.0123
GOOG	0.4036	0.0115	0.0048	0.0045	0.0047
INTC	1.7917e-5	1.7917e-5	1.7917e-5	1.7917e-5	1.7917e-5
MSFT	1.0586e-4	1.0586e-4	1.0586e-4	1.0586e-4	1.0586e-4

Table 9. Table 9 . The coefficients calculated for AAPL, AMZN and GOOG are generated using a Markov chain created by 16 quantiles on the upward and downward movements. While the coefficients for INTC and MSFT are obtained from the CHPDO case.

	Theoretical Coefficient	Regression Coefficient	Percent Error
AAPL	0.02868	0.02828	1.42%
AMZN	0.01450	0.01831	20.8%
GOOG	0.02883	0.03023	4.63%
INTC	0.00186	0.00193	3.4%
MSFT	0.00231	0.00246	6.4%

Equations140

λ (t) = h \to 0 lim \frac{E [ N ( t + h ) - N ( t ) ∣ F ^{N} ( t )]}{h}

λ (t) = h \to 0 lim \frac{E [ N ( t + h ) - N ( t ) ∣ F ^{N} ( t )]}{h}

λ (t) = λ + \int_{0}^{t} μ (t - s) d N (s)

λ (t) = λ + \int_{0}^{t} μ (t - s) d N (s)

P (N (t + h) - N (t) = m ∣ F^{N} (t)) = ⎩ ⎨ ⎧ λ (t) h + o (h) o (h) 1 - λ (t) h + o (h) m = 1 m > 1 m = 0

P (N (t + h) - N (t) = m ∣ F^{N} (t)) = ⎩ ⎨ ⎧ λ (t) h + o (h) o (h) 1 - λ (t) h + o (h) m = 1 m > 1 m = 0

Λ (t) = \int_{0}^{t} λ (s) d s

Λ (t) = \int_{0}^{t} λ (s) d s

N (t) = M (t) + Λ (t) a.s.

N (t) = M (t) + Λ (t) a.s.

μ (t) = α e^{- β t}

μ (t) = α e^{- β t}

λ (t) = λ + \int_{0}^{t} α e^{- β (t - s)} d N (s)

λ (t) = λ + \int_{0}^{t} α e^{- β (t - s)} d N (s)

d λ (t) = β (λ - λ (t)) d t + α d N (t), \vskip 6.0 ptpl u s 2.0 pt min u s 2.0 ptt \geq 0

d λ (t) = β (λ - λ (t)) d t + α d N (t), \vskip 6.0 ptpl u s 2.0 pt min u s 2.0 ptt \geq 0

λ (t) = e^{- β t} (λ_{0} - λ) + λ + \int_{0}^{t} α e^{- β (t - s)} d N (s),

λ (t) = e^{- β t} (λ_{0} - λ) + λ + \int_{0}^{t} α e^{- β (t - s)} d N (s),

λ (t) = λ + \int_{0}^{t} \frac{k}{( c + ( t - s ) ) ^{p}} d N (s)

λ (t) = λ + \int_{0}^{t} \frac{k}{( c + ( t - s ) ) ^{p}} d N (s)

λ^{i} (t) = λ^{i} + \int_{0}^{t} μ^{ij} (t - s) d N^{j} (s)

λ^{i} (t) = λ^{i} + \int_{0}^{t} μ^{ij} (t - s) d N^{j} (s)

λ (t) = λ + M * d N (t)

λ (t) = λ + M * d N (t)

\lambda(t)=h\bigg{(}\lambda+\int_{0}^{t}\mu(t-s)dN(s)\bigg{)}

\lambda(t)=h\bigg{(}\lambda+\int_{0}^{t}\mu(t-s)dN(s)\bigg{)}

S_{t} = S_{0} + i = 1 \sum N_{t} a (X_{k})

S_{t} = S_{0} + i = 1 \sum N_{t} a (X_{k})

\lambda(t)=h\bigg{(}\lambda+\int_{0}^{t}\mu(t-s)dN(s)\bigg{)}

\lambda(t)=h\bigg{(}\lambda+\int_{0}^{t}\mu(t-s)dN(s)\bigg{)}

S_{t} = S_{0} + i = 1 \sum N (t) a (X_{k})

S_{t} = S_{0} + i = 1 \sum N (t) a (X_{k})

S_{t} = S_{0} + i = 1 \sum N (t) a (X_{k})

S_{t} = S_{0} + i = 1 \sum N (t) a (X_{k})

S_{t} = S_{0} + i = 1 \sum N (t) X_{k}

S_{t} = S_{0} + i = 1 \sum N (t) X_{k}

S_{t} = S_{0} + i = 1 \sum N_{t} a (X_{k})

S_{t} = S_{0} + i = 1 \sum N_{t} a (X_{k})

\frac{S _{n t} - N ( n t ) \cdot a ^ ^{*}}{n} n \to \infty \overset{σ}{^}^{*} E [N [0, 1]] W_{t}

\frac{S _{n t} - N ( n t ) \cdot a ^ ^{*}}{n} n \to \infty \overset{σ}{^}^{*} E [N [0, 1]] W_{t}

0 < \overset{μ}{^} := \int_{0}^{\infty} μ (s) d s < 1 and \vskip 6.0 ptpl u s 2.0 pt min u s 2.0 pt \int_{0}^{\infty} s μ (s) d s < \infty

0 < \overset{μ}{^} := \int_{0}^{\infty} μ (s) d s < 1 and \vskip 6.0 ptpl u s 2.0 pt min u s 2.0 pt \int_{0}^{\infty} s μ (s) d s < \infty

(\overset{σ}{^}^{*})^{2} : v (i) b b (i) : g : \overset{a}{^}^{*} : = i \in X \sum π_{i}^{*} v (i) = b (i)^{2} + j \in X \sum (g (j) - g (i))^{2} P (i, j) - 2 b (i) j \in X \sum (g (j) - g (i)) P (i, j) = (b (1), b (2), ..., b (n))^{'} = a (X_{i}) - a^{*} := a (i) - a^{*} = (P + Π^{*} - I)^{- 1} b = i \in X \sum π_{i}^{*} a (X_{i})

(\overset{σ}{^}^{*})^{2} : v (i) b b (i) : g : \overset{a}{^}^{*} : = i \in X \sum π_{i}^{*} v (i) = b (i)^{2} + j \in X \sum (g (j) - g (i))^{2} P (i, j) - 2 b (i) j \in X \sum (g (j) - g (i)) P (i, j) = (b (1), b (2), ..., b (n))^{'} = a (X_{i}) - a^{*} := a (i) - a^{*} = (P + Π^{*} - I)^{- 1} b = i \in X \sum π_{i}^{*} a (X_{i})

S_{n t} = S_{0} + i = 1 \sum N_{n t} a (X_{k})

S_{n t} = S_{0} + i = 1 \sum N_{n t} a (X_{k})

S_{n t} = S_{0} + i = 1 \sum N_{n t} (a (X_{k}) - \overset{a}{^}^{*}) + N (n t) \overset{a}{^}^{*}

S_{n t} = S_{0} + i = 1 \sum N_{n t} (a (X_{k}) - \overset{a}{^}^{*}) + N (n t) \overset{a}{^}^{*}

\frac{S _{n t} - N _{n t} a ^ ^{*}}{n} = \frac{S _{0} + \sum _{i = 1}^{N_{n t}} ( a ( X _{k} ) - a ^ ^{*} )}{n} .

\frac{S _{n t} - N _{n t} a ^ ^{*}}{n} = \frac{S _{0} + \sum _{i = 1}^{N_{n t}} ( a ( X _{k} ) - a ^ ^{*} )}{n} .

\frac{a ( X _{k} ) - a ^ ^{*}}{n}

\frac{a ( X _{k} ) - a ^ ^{*}}{n}

\hat{R}_{n}^{*} := k = 1 \sum n (a (X_{k}) - \overset{a}{^}^{*})

\hat{R}_{n}^{*} := k = 1 \sum n (a (X_{k}) - \overset{a}{^}^{*})

\hat{U}_{n}^{*} (t) := n^{- 1/2} [(1 - (n t - ⌊ n t ⌋))] \hat{R}_{⌊ n t ⌋}^{*} + (n t - ⌊ n t ⌋) \hat{R}_{⌊ n t ⌋ + 1}^{*}]

\hat{U}_{n}^{*} (t) := n^{- 1/2} [(1 - (n t - ⌊ n t ⌋))] \hat{R}_{⌊ n t ⌋}^{*} + (n t - ⌊ n t ⌋) \hat{R}_{⌊ n t ⌋ + 1}^{*}]

\hat{U}_{n}^{*} (t) n \to \infty \overset{σ}{^}^{*} W (t)

\hat{U}_{n}^{*} (t) n \to \infty \overset{σ}{^}^{*} W (t)

\frac{N _{t}}{t} t \to \infty E [N [0, 1]]

\frac{N _{t}}{t} t \to \infty E [N [0, 1]]

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Full text

General Compound Hawkes Processes in Limit Order Books

[email protected]

Abstract.

In this paper, we study various new Hawkes processes. Specifically, we construct general compound Hawkes processes and investigate their properties in limit order books. With regards to these general compound Hawkes processes, we prove a Law of Large Numbers (LLN) and a Functional Central Limit Theorems (FCLT) for several specific variations. We apply several of these FCLTs to limit order books to study the link between price volatility and order flow, where the volatility in mid-price changes is expressed in terms of parameters describing the arrival rates and mid-price process.

Key words and phrases:

Hawkes processes, general compound Hawkes processes, limit order books, functional central limit theorems, LOBSTER data

1991 Mathematics Subject Classification:

Primary: 58F15, 58F17; Secondary: 53C35.

The first and second authors are supported by NSERC

∗ Corresponding author: Aiden Huffman

Anatoliy Swishchuk∗

2500 University Dr. NW

Calgary, AB, Canada

T2N 1N4

Aiden Huffman

2500 University Dr. NW

Calgary, AB, Canada

T2N 1N4

(Communicated by Tony Ware)

1. Introduction

The Hawkes process (HP) is named after its creator Alan Hawkes (1971, 1974), [27], [28]. The HP is a simple point process equipped with a self-exciting property, clustering effect and long run memory. Through its dependence on the history of the process, the HP captures the temporal and cross sectional dependence of the event arrival process as well as the ’self-exciting’ property observed in our empirical data on limit order books. Self-exciting point processes have recently been applied to high frequency data for price changes [54] or order arrival times [55]. HPs have seen their application in many areas, like genetics (2010) [11], occurrence of crime (2010) [50], bank defaults [51] and earthquakes [52].

Point processes gained a significant amount of attention in statistics during the 1950s and 1960s. Cox (1955) [16] introduced the notion of a doubly stochastic process Poisson process (called the Cox process now) and Bartlett (1963) [8] investigated statistical methods for point processes based on their power spectral densities. Lewis (1964) [34] formulated a point process model (for computer power failure patters) which was a step in the direction of the HP. A nice introduction to the theory of point processes can be found in Daley et al. (1988) [17]. The first type of point process in the context of market microstructure is the autoregressive conditional duration (ACD) model introduced by Engel et al. (1998) [19].

A recent application of HP is in financial analysis, in particular limit order books. In this paper, we study various new Hawkes processes, namely general compound Hawkes processes to model the price process in limit order books. We prove a Law of Larges Numbers (LLN) and a Function Central Limit Theorem (FCLT) for specific cases of these processes. Several of these FCLTs are applied to limit order books where we use asymptotic methods to study the link between price volatility and order flow in our models. The volatility of the price changes is expressed in terms of parameters describing the arrival rates and price changes. We also present some numerical examples. The general compound Hawkes process was first introduced in [40] to model the risk process in insurance and studied in detail in [41]. In the paper [43] we obtained functional CLTs and LLNs for general compound Hawkes processes with dependent orders and regime-switching compound Hawkes processes.

Bowsher (2007) [6] was the first one who applied the HP to financial data modelling. Cartea et al. (2011) [9] applied HP to model market order arrivals. Filimonov and Sornette (2012) [25] and Filimonov et al. (2013) [26] applied the HPs to estimate the percentage of price changes caused by endogenous self-generated activity rather than by the exogenous impact of news or novel information. Bauwens and Hautsch (2009) [7] used a five dimensional HP to estimate multivariate volatility between five stocks, based on price intensities. Hewlett (2006) [29] used the instantaneous jump in the intensity caused by the occurrence of an event to qualify the market impact of that event, taking into account the cascading effect of secondary events causing further events. Hewlett (2006) [29] also used the Hawkes model to derive optimal pricing strategies for market makers and optimal trading strategies for investors given that the rational market makers have the historic trading data. Large (2007) [32] applied a Hawkes model for the purpose of investigating market impact, with a specific interest in order book resiliency. Specifically, he considered limit orders, market orders and cancellations on both the buy and sell side, and further categorizes these events based on their level of aggression, resulting in a ten dimensional Hawkes process. Other econometric models based on marked point processes with stochastic intensity include autoregressive conditional intensity (ACI) models with the intensity depending on its history. Hasbrouck (1999) [30] introduced a multivariate point process to model the different events of an order book but did not parametrize the intensity. We note that Brémaud et al. (1996) [4] generalized the HP to its nonlinear form. Also, a functional central limit theorem for nonlinear Hawkes processes was obtained in Zhu (2013) [49]. The ’Hawkes diffusion model’ introduced in Ait-Sahalia et al. (2013) [1] attempted to extend previous models of stock prices to include financial contagion. Chavez-Demoulin et al. (2012) [12] used Hawkes processes to model high-frequency financial data. An application of affine point processes to portfolio credit risk may be found in Errais et al. (2010) [24]. Some applications of Hawkes processes to financial data are also given in Embrechts et al. (2011) [23].

Cohen et al. (2014) [14] derived an explicit filter for Markov modulated Hawkes processes. Vinkovskaya (2014) [47] considered a regime-switching Hawkes process to model its dependency on the bid-ask spread in limit order book. Regime-switching models for pricing of European and American options were considered in Buffington et al. (2000) [2] and Buffington et al. (2002) [3], respectively. Semi-Markov processes were applied to limit order books in [44] to model the mid-price. We also note that level-1 limit order books with time dependent arrival rates $\lambda(t)$ were studied in [13], including the asymptotic distribution of the price process.

The paper by Bacry et al. (2015) [5] proposes an overview of the recent academic literature devoted to the applications of Hawkes processes in finance. It is a nice survey of applications of Hawkes processes in finance. In general, the main models in high-frequency finance can be divided into univariate models, price models, impact models, order-book models and some systemic risk models, models accounting for news, high-dimensional models and clustering with graph models. The book by Cartea et al. (2015) [10] developed models for algorithmic trading such as methods for executing large orders, market making, trading pairs of collections of assets, and executing in the dark pool. This book also contains a link from which several datasets can be downloaded, along with MATLAB code to assist in experimentation with the data.

A detailed description of the mathematical theory of Hawkes processes is given in Liniger (2009) [33]. The paper by Laub et al. (2015) [35] provides background, introduces the field and historical development, and touches upon all major aspects of Hawkes processes. The results of the current paper were first annouced in in [42]

The paper is organized as follows. A definition of a Hawkes process and description of its properties are given in Section 2. General compound Hawkes processes are described in Section 3. Law of Large Numbers (LLN) and Functional Central Limit Theorems (FCLT) for various general compound Hawkes processes, including non-linear, in limit order books are proved in Section 4. Section 5 contains a numerical exploration of the derived diffusion limits to several datasets and finally Section 6 concludes the paper.

2. Definition of Hawkes Processes (HPs)

In this section we give various definitions and some properties of Hawkes processes which can be found in the existing literature (see, e.g., [27], [28], [23] and [48], to name a few). They include in particular one dimensional and non-linear Hawkes processes.

Definition 2.1 (Counting Process).

A counting process is a stochastic process $N(t)$ with $t\geq 0$ , where $N(t)$ takes positive integer values and satisfies $N(0)=0$ . It is almost surely finite and a right-continuous step function with increments of size $+1$ .

Denote by $\mathcal{F}^{N}(t)$ , $t\geq 0$ , the history of the arrival up to time t, that is, $\mathcal{F}^{N}(t)$ , $t\geq 0$ , is a filtration (an increasing sequence of $\sigma$ -algebras).

A counting process $N(t)$ can be interpreted as a cumulative count of the number of arrivals into a system up to the current time $t$ . The counting process can also be characterized by the sequence of random arrival times $(T_{1},\ T_{2},\ \dots)$ at which the counting process $N(t)$ has jumped. The process defined by these arrival times is called a point process (see [17]).

Definition 2.2 (Point Process).

If a sequence of random variables $(T_{1},\ T_{2},\ \dots)$ , taking values in $[0,\infty)$ has $P(0\leq T_{1}\leq T_{2}\leq\dots)=1$ , and the number of points in a bounded region is almost surely finite, then $(T_{1},\ T_{2},\ \dots)$ is called a point process.

Definition 2.3 (Conditional Intensity Function).

Consider a counting process $N(t)$ with associated histories $\mathcal{F}^{N}(t)$ , $t\geq 0$ . If a non-negative function $\lambda(t)$ exists such that

[TABLE]

then it is called the conditional intensity function of $N(t)$ (see [35]). We note that originally this function was called the hazard function (see [16]).

Definition 2.4 (One-dimensional Hawkes Process).

The one-dimensional Hawkes process (see [35], [28]) is a point process $N(t)$ which is characterized by its intensity $\lambda(t)$ with respect to its natural filtration:

[TABLE]

where $\lambda>0$ , and the response function $\mu(t)$ is a positive function that satisfies $\int_{0}^{\infty}\mu(s)ds<1$ .

The constant $\lambda$ is called the background intensity and the function $\mu(t)$ is sometimes called the excitation function. To avoid the trivial case of a homogeneous Poisson process, we assume $\mu(t)\neq 0$ . Thus, the Hawkes process is a non-Markovian extension of the Poisson process.

With respect to the Definitions of $\lambda(t)$ in 2.3 and $N(t)$ in 2.4, it follows that

[TABLE]

The interpretation of Equation (2) is that the events occur according to an intensity with a background intensity $\lambda$ which increases by $\mu(0)$ at each new event, eventually decaying back to the background intensity value according to the evolution of the function $\mu(t)$ . Choosing $\mu(0)>0$ leads to a jolt in the intensity at each new event, and this feature is often called the self-exciting feature. In other words, if an arrival causes the conditional intensity function $\lambda(t)$ in Equations (1)-(2) to increase then the process is called self-exciting.

We would like to mention that the conditional intensity function $\lambda(t)$ in Equations (1)-(2) can be associated with the compensator $\Lambda(t)$ of the counting process $N(t)$ , that is

[TABLE]

We note that $\Lambda(t)$ is the unique non-decreasing, $\mathcal{F}^{N}(t)$ , $t\geq 0$ , predictable function, with $\Lambda(0)=0$ such that

[TABLE]

where $M(t)$ is an $\mathcal{F}^{N}(t)$ , $t\geq 0$ , local martingale (existence of which is guaranteed by the Doob-Meyer decomposition).

A common choice for the function $\mu(t)$ in Equation (2) is the one of exponential decay (see [27])

[TABLE]

where parameters $\alpha$ , $\beta>0$ . In this case, the Hawkes process is called the Hawkes process with exponentially decaying intensity.

In the case of Equation (4), Equation (2) becomes

[TABLE]

We note that in the case of Equation (4), the process $(N(t),\lambda(t))$ is a continuous-time Markov process, which is not the case for a general choice of excitation function in Equation (1).

With some intitial condition $\lambda(0)=\lambda_{0}$ , the conditional intensity $\lambda(t)$ in Equation (5) with exponential decay in Equation (4) satisfies the SDE

[TABLE]

which can be solved using stochastic calculus as

[TABLE]

which is an extension of Equation (5).

Another choice for $\mu(t)$ is a power law function

[TABLE]

with positive parameters $(c,k,p)$ . This power law form for $\lambda(t)$ in Equation (8) was applied in the geological model called Omori’s law, and used to predict the rate of aftershocks caused by an earthquake.

Definition 2.5 (D-dimensional Hawkes Process).

The D-dimensional Hawkes process (see [23]) is a point process $\vec{N}(t)=(N^{i}(t))_{i=1}^{D}$ which is characterized by its intensity vector $\vec{t}=(\lambda^{i}(t))_{i=1}^{D}$ such that:

[TABLE]

where $\lambda^{i}>0$ , and $M(t)=(\mu^{ij}(t))$ is a matrix-valued kernel such that:

(1)

it is component-wise non-negative: $(\mu^{ij}(t))\geq 0$ for each $1\leq i,j\leq D$ 2. (2)

it is component-wise $L^{1}$ -integrable

In matrix-convolution form, Equation (9) can be written as

[TABLE]

where $\vec{\lambda}(t)=(\lambda^{i})_{i=1}^{D}$ .

Definition 2.6 (Non-linear Hawkes Process).

The non-linear Hawkes process (see, e.g., [48]) is defined by the intensity function in the following form:

[TABLE]

where $h(\cdot)$ is a non-linear function with support in $\mathbb{R}^{+}$ . Typical examples for $h(\cdot)$ are $h(x)=\mathbf{1}_{x\in\mathbb{R}^{+}}$ and $h(x)=e^{x}$ .

Remark 1.

Many other generalizations of Hawkes processes have been proposed. They include mixed diffusion-Hawkes models [24], Hawkes models with shot noise exogenous events [18] and Hawkes processes with generation dependent kernels [37], to name a few.

3. Compound Hawkes Processes

In this section we define non-linear compound Hawkes process with $N$ -state dependent orders. We also consider special cases of this general compound Hawkes process.

Definition 3.1 (Non-Linear Compound Hawkes Process with $n$ -state Dependent Orders (NLCHPnSDO) in Limit Order Books).

Consider the mid-price process $S_{t}$

[TABLE]

where $X_{k}$ is a continuous time $n$ -state Markov chain, $a(x)$ is a continuous and bounded function on the state space $X:=\{1,\ 2,\ ...,\ n\}$ , $N(t)$ is the non-linear Hawkes process (see, e.g., [48] defined by the intensity function in the following form (see Equation (11)):

[TABLE]

where $h(\cdot)$ is a non-linear increasing function with support in $\mathbb{R}^{+}$ . We note that in [4] it was shown that if $h(\cdot)$ is $\alpha$ -Lipschitz (see [4]) such that $\alpha||h||_{L^{1}}<1$ then there exists a unique stationary and ergodic Hawkes process satisfying the dynamics of Equation (11). We shall refer to the process in Equation (12) as a Non-linear Compound Hawkes Process with $n$ -State Dependent Orders (NLCHPnSDO).

This non-linear compound Hawkes process will be the foundation for our studies throughout this paper. In the following subsection we will introduce four specific examples, which will be used for our empirical investigations of the mid-price processes.

3.1. Special Cases of Compound Hawkes Processes in Limit Order Books

Definition 3.2 (General Compound Hawkes Process With N-state Dependent Orders (GCHPnSDO)).

Suppose that $X_{k}$ is an ergodic continuous-time Markov chain, independent of $N(t)$ , with state space $X=\{1,\ 2,\ ...,\ n\}$ , $N(t)$ is a one-dimensional Hawkes process defined in Definition 2.4 and $a(x)$ is any bounded and continuous function on $X$ . We define the General Compound Hawkes process with N-state Dependent Orders (GCHPnSDO) by the following process

[TABLE]

Note that this process can be recovered from Equation (12) by letting $h(x)=x$ .

Definition 3.3 (General Compound Hawkes Process with Two-State Dependent Orders (GCHP2SDO)).

Suppose that $X_{k}$ is an ergodic continuous time Markov chain, independent of $N(t)$ , with two states $\{1,\ 2\}$ . Then Equation (13) becomes

[TABLE]

where $a(X_{k})$ takes only the values $a(1)$ and $a(2)$ . Of course we can view this as a special case of the n-state case, where $n=2$ . This model was used in [45] for the mid-price process in limit order books with non-fixed tick $\delta$ and two-valued price changes.

Definition 3.4 (General Compound Hawkes Process with Dependent Orders (GCHPDO)).

Suppose that $X_{k}\in\{-\delta,\ \delta\}$ and that $a(x)=x$ , then $S_{t}$ in Equation (13) becomes

[TABLE]

This type of process can be a model for the mid-price in limit order books, where $\delta$ is a fixed tick size and $N(t)$ is the number of order arrivals up to time $t$ . We shall call this process a General Compound Hawkes Process with Dependent Orders (GCHPDO). This is a generalization of the previous process, obtained by letting $a(1)=-\delta$ and $a(2)=\delta$ .

Having defined several mid-price processes, we now prove diffusion limit theorems and LLNs for each price process in the following Section. These diffusion processes will be used for our exploration of the applicability of this model to real world limit order book data.

4. Diffusion Limits and LLNs

4.1. Diffusion Limit and LLN for NLCHPnSDO

We consider the mid-price process $S_{t}$ defined in Definition 3.1, namely

[TABLE]

where $X_{k}$ is a continuous time n-state Markov chain and a(x) is a continuous bounded function on the state space $X=\{1,\,2,\ ...,\ n\}$ . $N_{t}$ is the number of price changes up to time $t$ , described by the non-linear Hawkes process given in Equation (11).

Theorem 4.1 (Diffusion Limit for NLCHPnSDO).

Let $X_{k}$ be an ergodic Markov chain with $n$ states $\{1,\ 2,\ ...,\ n\}$ and with ergodic probabilities $(\pi^{*}_{1},\ \pi^{*}_{2},\ ...,\ \pi^{*}_{n})$ . Let also $S_{t}$ be as defined in Definition 3.1, then

[TABLE]

where $W_{t}$ is a standard Wiener process and $E[N[0,1]]$ is the mean of the number of arrivals on a unit interval under the stationary and ergodic measure. Furthermore

[TABLE]

P is the transition probability matrix for $X_{k}$ , i.e. $P(i,j)=P(X_{k+1}=j\ |\ X_{k}=i)$ . $\Pi^{*}$ denotes the matrix of stationary distributions of $P$ and $g(j)$ is the $j$ th entry of $g$ .

Proof.

From Equation (13) we have

[TABLE]

and

[TABLE]

therefore

[TABLE]

As long as $\frac{S_{0}}{\sqrt{n}}\xrightarrow{n\to\infty}0$ , we need only find the limit for

[TABLE]

when $n\to+\infty$ . Consider the following sums

[TABLE]

and

[TABLE]

where $\lfloor\cdot\rfloor$ is the floor function. Following the martingale method from [45], we have the following weak convergence in the Skorokhod topology (see [39]):

[TABLE]

We note that the results from [4] imply by the ergodic theorem that

[TABLE]

or

[TABLE]

Using the change of time $t\to N_{nt}/n$ , we find that

[TABLE]

or

[TABLE]

The result now follows from Equations (19)-(21) ∎

Lemma 4.2 (LLN for NLCHPnSDO).

The process $S_{nt}$ in Equation (19) satisfies the following weak convergence in the Skorokhod topology (see [40]):

[TABLE]

where $\hat{a}^{*}$ is defined in Equation (18) respectively.

Proof.

From Equation (12) we have

[TABLE]

The first term goes to zero when $n\to\infty$ . On the right hand side, with respect to the strong LLN for Markov chains (see, e.g. [38])

[TABLE]

Then taking into account Equation (26), we obtain

[TABLE]

from which the desired result follows. ∎

4.2. Diffusion Limit and LLN for GCHPnSDO

We consider here the mid-price process $S_{t}$ which was defined in Definition 3.2, namely

[TABLE]

where $X_{k}$ is a continuous time n-state Markov chain, a(x) is a continuous and bounded function on the state space $X=\{1,\ 2,\ ...,\ n\}$ , and $N(t)$ is the number of price changes up to time t, described by a one-dimensional Hawkes process defined in Definition 2.4. This can be interpreted as the case of non-fixed tick sizes, n-valued price changes and dependent orders.

Theorem 4.3 (Diffusion limit for GCHPnSDO).

Let $X_{k}$ be an ergodic Markov chain with $n$ states $\{1,\ 2,\ ...,\ n\}$ and with ergodic probabilities $(\pi^{*}_{1},\ \pi^{*}_{2},\ ...,\ \pi^{*}_{n})$ . Let also $S_{t}$ be as defined in Definition 3.2, then

[TABLE]

where W(t) is a standard Wiener process,

[TABLE]

P is the transition probability matrix for $X_{k}$ , i.e. $P(i,j)=P(X_{k+1}=j\ |\ X_{k}=i)$ . $\Pi^{*}$ denotes the matrix of stationary distributions of $P$ and $g(j)$ is the $j$ th entry of $g$ .

Proof.

As in the previous theorem we have that

[TABLE]

and

[TABLE]

where $\hat{a}^{*}$ is as defined above.

Therefore, we can conclude that

[TABLE]

as long as $\frac{S_{0}}{\sqrt{n}}\xrightarrow{n\to\infty}0$ , we need only find the limit for

[TABLE]

as $n\to\infty$ . We consider the following sums

[TABLE]

and

[TABLE]

where $\lfloor\cdot\rfloor$ is the floor function.

Then following the martingale method from [45], we have the following weak convergence in the Skorokhod topology (see [39])

[TABLE]

where $\hat{\sigma}^{*}$ is as defined above.

We note again that with respect to the LLN for Hawkes processes (see, e.g., [17]) we have

[TABLE]

or

[TABLE]

where $\hat{\mu}$ is as defined above. Then using the change of time defined before where $t\to N(nt)/n$ we can find from Equations (42)-(44):

[TABLE]

or

[TABLE]

The result in Equation (34) now follows from Equations (37)-(44) ∎

Lemma 4.4 (LLN for GCHPnSDO).

The process $S_{nt}$ defined in Definition 3.2, satisfies the following weak convergence in the Skorokhod topology (see [40])

[TABLE]

where $\hat{\mu}$ and $\hat{a}^{*}$ are defined in Equations (35) and (36) respectively.

Proof.

From Equation (13) we have

[TABLE]

The first term goes to zero when $n\to\infty$ . On the right hand side, with respect to the strong LLN for Markov chains (see, e.g., [38])

[TABLE]

Then taking into account Equation (44) we obtain

[TABLE]

from which the desired result follows. ∎

4.3. Diffusion Limits and LLNs for Special Cases of GCHPnSDO

Theorem 4.3 can be reduced to some of the special cases we outlined previously in Subsection3.1. Specifically in Definitions 3.3 and 3.4, we consider the case of a 2-state Markov chain for which we provide the diffusion limit and LLN result as Corollaries below.

We begin by considering the mid-price process $S_{t}$ (GCHP2SD) which was defined in 3.3, namely

[TABLE]

where $X_{k}$ is a continuous-time 2-state Markov chain, a(x) is a continuous and bounded function on $X=\{1,\ 2\}$ and $N(t)$ is the number of price changes up to moment t, described by a one-dimensional Hawkes process defined in Definition 2.4. This can be interpreted as the case of non-fixed tick sizes, two-valued price changes and dependent orders.

Corollary 1 (Diffusion Limit for GCHP2SDO).

Let $X_{k}$ be an ergodic Markov chain with two states $\{1,\ 2\}$ and with ergodic probabilities $(\pi^{*}_{1},\ \pi^{*}_{2})$ . Further, let $S_{t}$ be defined as in Definition 3.3. Then

[TABLE]

where $W(t)$ is standard Wiener process,

[TABLE]

where $(p,p^{\prime})$ are the transition probabilities of the Markov chain.

Corollary 2 (LLN for GCHP2SDO).

The process $S_{nt}$ defined in Definition 3.3 satisfies the following weak convergence in the Skorokhod topology (see [40])

[TABLE]

where $\hat{\mu}$ and $\hat{a}^{*}$ are defined in Equations (51) and (52) respectively.

Now let us consider the process $S_{t}$ defined in Definition 3.4, specifically

[TABLE]

where $X_{k}\in\{-\delta,\delta\}$ is a continuous-time 2-state Markov chain, $\delta$ is the fixed tick size, and $N(t)$ is the number of price changes up to moment t, described by a one-dimensional Hawkes process defined in Definition 2.4. This case we have a fixed tick size, two-valued price changes and dependent orders.

Corollary 3 (Diffusion Limit for CHPDO).

Let $X_{k}$ be an ergodic Markov chain with two states $\{\delta,\ -\delta\}$ and with ergodic probabilities $(\pi^{*},\ 1-\pi^{*})$ , then taking $S_{t}$ as defined in Definition 3.4 then

[TABLE]

where $W(t)$ is a standard Wiener process,

[TABLE]

and $(p,p^{\prime})$ are the transition probabilities of the Markov chain $X_{k}$ . We note that $\lambda$ and $\mu(t)$ are defined in Equation (2).

We note that the LLN for both GCHP2SDO and CHPDO is identitical to the result given in Lemma 4.4, after some simplification.

5. Empirical Results

In order to test the validity of our models and determine which best fits empirical data, we have considered level 1 LOB data for Apple, Amazon, Google, Microsoft and Intel on June 21st, 2012 [53]. We first verify that the data is reasonable for our model by checking it’s liquidity, this is illustrated in Table 1.

The high number of daily price changes motivates the idea that we can use asymptotic analysis in order to approximate long-run volatility using order flow by finding the diffusion limit of the price process. Because we do not want to include opening and closing auctions we omit the first and last fifteen minutes of our data. We motivate the arrival process by analyzing the inter-arrival times and clustering to a ensure the arrival process is not Poisson and exhibits the characteristics of a Hawkes Process, this is illustrated in Figures 1 and 2.

Furthermore, the relationship between our diffusion coefficient and arrival process is limited to the expected number of arrivals on a unit interval. This implies that results for a simple exponential model can be easily generalized to a non-linear one. This makes it possible to work with a simplified model for our Hawkes process which is still rich enough to capture our observations. Keeping this in mind, we restrict ourselves to an exponential kernel and estimate parameters using a MLE [35]. We provide these estimates in Table 2 and compare the empirical expected number of arrivals and compare with the MLE estimate in Table 3

Obviously the MLE method accurately estimates the expected number of arrivals on the unit interval. This means we can confidently say for our data that our parameters will work reasonably with our models. We provide the estimated parameters in Table 2

5.1. CHPDO

We first consider a compound Hawkes process with dependent orders defined in Definition 3.4, namely

[TABLE]

where $X_{k}\in\{+\delta,-\delta\}$ is a continuous-time two state Markov chain, $\delta$ is of fixed size and $N(t)$ is the number of mid-price movements up to time $t$ described by a Hawkes process.

We have opted to study the mid-price changes of our model. Thus $S_{t}$ can be computed by averaging the best bid and ask price. Noting that the price is recorded in cents the smallest possible jump in the mid-price is a half a cent which we will use as $\delta$ . Furthermore, in order to estimate the transition matrix for the Markov chain $X_{k}$ we count the absolute frequencies of upward and downward price movements and from this calculate the relative frequencies giving an estimate for $p_{uu}$ and $p_{dd}$ which represent the conditional probabilities of an up/down movement given an up/down movement. Later we will consider several different sizes of mid-price movements and will work with the convention that each movement will be assigned a state based off of its ordering in the reals. In this case, $-\delta$ will be state one, and $\delta$ with be state two. This results in the transition matrix $P$ given below.

[TABLE]

After determining our parameters and transition probabilities we calculate $a^{*}$ and $\sigma$ in Table 4 together with $p_{uu}$ and $p_{dd}$ .

Provided these values we can test our claim that our model accurately describes the mid-price process satisfies we will use the diffusion limit proved earier, namely

[TABLE]

If the data satisfies our proposed model then when considering large windows of time (5min, 10min, 20min), then when we would expect to see the empirical and theoretical standard deviations to follow each other closely. To test this we compare the equivalent process, constructed by multiplying the LHS and RHS by $\sqrt{n}$ . Then cutting our data into disjoint windows of size $n$ , specifically $[in,(i+1)n]$ with $t=1$ and by setting the left bound as our starting time we can calculate $S_{nt}-N(nt)a^{*}$ for each individual window and give a generalized formula for this below.

[TABLE]

This gives a collection of values $\{S^{*}_{i}\}$ over which we compute the standard deviation. If our model is accurate would would expect that

[TABLE]

We plot the empirical standard deviation against the theoretical one for various window sizes starting at 10 seconds and increasing in steps of 10 seconds until we reach 20 minutes, this is illustrated in Figure 3.

Several important remarks should be made at this point. It is clear that while the model accurately predicts the overall trend for MSFT and INTC, we severely underestimate the variability in the mid-price process for APPL, AMZN and GOOG. Furthermore, as the window size increases the overall spread in the data increases. We attribute this to the decreasing sample size imposed on us as we increase the window size. For example when we consider a 20 minute window, we can only construct 27 disjoint windows in the 9 hour trading day forcing us to deal with the problem of predicting a ’population’ standard deviation from a increasingly small sample. We remedy this by using a variance stabilizing transformation in later sections. Specifically, a popular method for a Poisson process is to take the square root of our empirical and theoretical standard deviations. This makes it possible to qualitatively view the overall trend in the data, gaining a clearer idea of goodness of fit from there.

5.2. GCHP2SDO

As of now we have considered a fixed delta related to the trading tick size. However, if we consider the mid-price changes for APPL, AMZN and GOOG the assumption of a fixed tick size is violated. In fact, we observe that approximately approximately 61%, 53% and 71% of all mid-price changes are larger than half a tick size, which is opposed to what we observe for MSFT and INTC where all mid-price changes occur at the half tick size, we illustrate this for AAPL, AMZN and GOOG in Figure 4.

It is clear in Figure 4 additional considerations need to be made. A simple way to include the variability in mid-price movements in our model is to introduce $a(X_{i})$ as described in Definition 3.3. It is of course necessary to determine the values of $a(\cdot)$ for each state of our Markov chain. A naive method is to take the mean of the downward and upward mid-price movements and assign them to $a(1)$ and $a(2)$ respectively. We provide these values in Table 5.

In this step we have only endeavoured to better realize the actual price movements in our data. Therefore, when we observe a downward mid-price movement we continue to assign it to state one and similarly for an upward price movement we continue to assign it to state two. It follows that our transition matrix will remain the same. Then using these new state values we recalculate $a^{*}$ and $\sigma$ , providing them in Table 6. The effect of these changes is investigated in Figure 5. Note that in Figure 5 we have used the variance stabilizing transformation discussed earlier in order to better visualize the overall trend in our data.

Notice that there is a significant qualitative improvement in the fits for AAPL and GOOG in Figure 5 but the variability in mid-price movements for AMZN is still clearly underestimated by our model. The unexplained variance may be captured by investigating an n-state Markov chain since the additional transition probabilities could explain the variability missing in the 2-state case.

5.3. GCHPNSDO

We recall the N-state model described in Definition 3.2. The immediate question becomes how best to choose the state values. We modify the quantile based approach from [45]. After calculating the mid-price changes we separate the data into upward and downward price movements. Then we calculate evenly distributed quantiles for both data sets. Depending on the data, several quantiles may be identical, we reject any duplicates. We thus obtain a list of bounds which we complete by adding the minimum observed value if necessary.

To determine the state values $a(X_{i})$ , we take the average of all mid-price changes located between two neighbouring boundary values. Furthermore, we assign a mid-price change to state $i$ if it is greater than or equal to the $(i-1)$ th boundary and strictly less than the $i$ th boundary. An exception is made for the largest upper bound where equality is permitted at both ends.

As we could not capture the full variability of mid-price changes for AMZN in the previous method we investigate for this case. Furthermore, for tractability we only consider 14 boundary values from which we obtain a 12-state Markov chain. Instead of providing the transition matrix, we provide the ergodic probabilities for the transition matrix and the associated states in Table 7.

In order to compare the two state and N-state approaches we first take a qualitative approach and plot the two theoretical and empirical standard deviations against each other in Figure 6. When we compare the mean squared residuals, the 2-state model discussed before has mean squared error $0.0208$ while our 12-state Markov chain brings that down to $0.0125$ . Considering an even larger Markov chain with 24-states we are only able to obtain a meager improvement to $0.0123$ , suggesting that there is some underlying variance in the mid-price process not captured by our model. We investigate these more quantitative measures in the following Subsection. First comparing the models against a numerical best fit and then investigating the mean squared error to gain a better quantitative understanding of the overall improvement obtained from each model as we increase the number of quantiles.

5.4. Quantitative Analysis

While our model does visually appear to fit the expected variability in four of the five cases, it still fails to capture the complete dynamics of mid-price changes seen in our AMZN data. We investigate the mean square error of our models with a varying number of quantiles in Table 8, this gives a good indication as to whether the N-state model is a better fit for our data. If we look closely at AAPL and GOOG we see the N-state case can still improve our results from the two state case. For AAPL we constructed a 17 state Markov chain by taking 16 quantiles on the downward movements and 16 quantiles upward movements. This resulted in a mean squared error of $0.0036$ which is approximately a 28% improvement to the two state case where the mean squared error was $0.0050$ . Even more extreme, using a 25 state Markov chain for GOOG which was constructed similarly, we observed a mean squared error of $0.0046$ which is a 60% improvement from the two state case with a mean squared error of $0.0115$ . We conclude with Table 8 which provides the mean residuals for AAPL, AMZN and GOOG with several of Markov chains constructed from various numbers of quantiles. We also include mean residuals for INTC and MSFT for comparison.

Another quantitative measure of our the fit would be to compare them to the best theoretical one available given the data. Notice that each of our models assumes that the standard deviation evolves according to the square root of the time step times some determinable coefficient. Therefore we can estimate the best possible coefficient by minimizing the mean squared residual, we provide plots of these hypothetical best fits against the empirical data and theoretical fits in Figure 7. We also provide the coefficients for the theoretical fits and hypothetical best fit in Table 9 in order to have a more quantitative comparison.

We notice that in each case the errors are close to, or under five percent with AMZN being the biggest offender. This is consistent with the discussion provided throughout our analysis and highlights the general applicability of our model.

5.5. Remarks

Overall the N state model out performs the others, generating fits for four out of the five datasets that are reasonable and providing reductions in the mean squared error by upwards of 25%. While not able to capture the full dynamics observed in AMZN it appears to be a strong candidate for a simple model of price dynamics observed in our data. Further investigation would be necessary to determine what causes the additional volatility observed in AMZN, and potentially implement a more robust model which captures this.

Bibliography55

The reference list from the paper itself. Each links out to its DOI / PubMed record.

1[1] Ait-Sahalia, Y., Cacho-Diaz, J. and Laeven, R. (2010): Modelling of financial contagion using mutually exciting jump processes. Tech. Rep. , 15850, Nat. Bureau of Ec. Res., USA.
2[2] Buffington, J., Elliott, R. J. (2000): Regime Switching and European Options. Lawrence, K.S. (ed.) Stochastic Theory and Control. Proceedings of a Workshop, 73-81. Berlin Heidelberg New York: Springer. (2002).
3[3] Buffington, J., Elliott, R.J. (2002): American Options with Regime Switching. International Journal of Theoretical and Applied Finance 5, 497-514.
4[4] Brémaud, P. and Massoulié, L. (1996): Stability of nonlinear Hawkes processes. The Annals of Probab., 24(3), 1563.
5[5] Bacry, E., Mastromatteo, I. and Muzy, J.-F. (2015): Hawkes processes in finance. ar Xiv:1502.04592 v 2 [q-fin.TR] 17 May 2015.
6[6] Bowsher, C. (2007): Modelling security market events in continuous time: intensity based, multivariate point process models. J. Econometrica , 141 (2): 876-912.
7[7] Bauwens, L. and Hautsch, N. (2009): Modelling Financial High Frequency Data Using Point Processes. Springer.
8[8] Bartlett, M. (1963): The spectral analysis of point processes. J. R. Stat. Soc., ser. B, 25 (2), 264-296.

TL;DR

Contribution

Findings

Abstract

Peer Reviews

Videos

General Compound Hawkes Processes in Limit Order Books

Abstract.

Key words and phrases:

1991 Mathematics Subject Classification:

1. Introduction

2. Definition of Hawkes Processes (HPs)

Definition 2.1** (Counting Process).**

Definition 2.2** (Point Process).**

Definition 2.3** (Conditional Intensity Function).**

Definition 2.4** (One-dimensional Hawkes Process).**

Definition 2.5** (D-dimensional Hawkes Process).**

Definition 2.6** (Non-linear Hawkes Process).**

Remark 1**.**

3. Compound Hawkes Processes

Definition 3.1** (Non-Linear Compound Hawkes Process with nnn-state Dependent Orders (NLCHPnSDO) in Limit Order Books).**

3.1. Special Cases of Compound Hawkes Processes in Limit Order Books

Definition 3.2** (General Compound Hawkes Process With N-state Dependent Orders (GCHPnSDO)).**

Definition 3.3** (General Compound Hawkes Process with Two-State Dependent Orders (GCHP2SDO)).**

Definition 3.4** (General Compound Hawkes Process with Dependent Orders (GCHPDO)).**

4. Diffusion Limits and LLNs

4.1. Diffusion Limit and LLN for NLCHPnSDO

Theorem 4.1** (Diffusion Limit for NLCHPnSDO).**

Proof.

Lemma 4.2** (LLN for NLCHPnSDO).**

Proof.

4.2. Diffusion Limit and LLN for GCHPnSDO

Theorem 4.3** (Diffusion limit for GCHPnSDO).**

Proof.

Lemma 4.4** (LLN for GCHPnSDO).**

Proof.

4.3. Diffusion Limits and LLNs for Special Cases of GCHPnSDO

Corollary 1** (Diffusion Limit for GCHP2SDO).**

Corollary 2** (LLN for GCHP2SDO).**

Corollary 3** (Diffusion Limit for CHPDO).**

5. Empirical Results

5.1. CHPDO

5.2. GCHP2SDO

5.3. GCHPNSDO

5.4. Quantitative Analysis

5.5. Remarks

Definition 2.1 (Counting Process).

Definition 2.2 (Point Process).

Definition 2.3 (Conditional Intensity Function).

Definition 2.4 (One-dimensional Hawkes Process).

Definition 2.5 (D-dimensional Hawkes Process).

Definition 2.6 (Non-linear Hawkes Process).

Remark 1.

Definition 3.1 (Non-Linear Compound Hawkes Process with $n$ -state Dependent Orders (NLCHPnSDO) in Limit Order Books).

Definition 3.2 (General Compound Hawkes Process With N-state Dependent Orders (GCHPnSDO)).

Definition 3.3 (General Compound Hawkes Process with Two-State Dependent Orders (GCHP2SDO)).

Definition 3.4 (General Compound Hawkes Process with Dependent Orders (GCHPDO)).

Theorem 4.1 (Diffusion Limit for NLCHPnSDO).

Lemma 4.2 (LLN for NLCHPnSDO).

Theorem 4.3 (Diffusion limit for GCHPnSDO).

Lemma 4.4 (LLN for GCHPnSDO).

Corollary 1 (Diffusion Limit for GCHP2SDO).

Corollary 2 (LLN for GCHP2SDO).

Corollary 3 (Diffusion Limit for CHPDO).