A Composite Likelihood-based Approach for Change-point Detection in Spatio-temporal Processes

Zifeng Zhao; Ting Fung Ma; Wai Leong Ng; Chun Yip Yau

arXiv:1904.06340·stat.ME·October 9, 2023


TL;DR

This paper introduces a computationally efficient composite likelihood-based method for detecting change-points in non-stationary spatio-temporal processes, achieving exact recovery and consistency without penalties.

Contribution

It develops a unified approach for change-point detection in spatio-temporal data, with theoretical guarantees and a practical algorithm, extending classical time series results to spatio-temporal settings.

Findings

01

Exact change-point recovery in spatio-temporal data

02

Consistency of change-point estimation without penalty

03

Effective application to precipitation data


Tables

Table 1: Percentage of estimated change-points $\hat{m}$ among 1000 replications under various spatial sizes $S$, temporal sizes $T$, and signal levels $(\delta_{\phi},\delta_{\rho})$.

$T$	$δ_{ϕ} \times 10$	$δ_{ρ} \times 10$	% of $\hat{m}$
			$S = 6^{2}$			$S = 8^{2}$			$S = 10^{2}$
			0	1	$\geq 2$	0	1	$\geq 2$	0	1	$\geq 2$
100	0	0	100	0	0	100	0	0	100	0	0
	2	0	63	37	0	29	71	1	2	98	0
	3	0	22	79	0	1	99	1	0	100	0
	0	6	81	18	2	39	60	2	7	91	2
	0	10	16	81	3	1	95	4	1	96	3
	2	2	54	47	0	14	86	1	0	100	0
	3	3	11	89	1	1	99	1	0	100	0
200	0	0	100	0	0	100	0	0	100	0	0
	2	0	20	81	0	1	99	0	0	100	0
	3	0	0	100	0	0	100	0	0	100	0
	0	6	38	61	1	13	82	5	0	98	2
	0	10	3	96	2	1	98	2	0	100	0
	2	2	10	91	0	0	100	0	0	100	0
	3	3	4	95	0	0	100	0	0	100	0

Table 2: Percentage of $\hat{m}=1$, and percentage of $\hat{\lambda}=0.5$, mean, empirical standard deviation (esd), mean of 90% confidence interval (CI) of $\hat{\lambda}$, and empirical coverage probability (CP) of the 90% CI (given $\hat{m}=m_{o}=1$) under various settings.

$δ_{ϕ} \times 10$	$δ_{ρ} \times 10$	$S$	% of $\hat{m}$ = 1	$\hat{λ}$ (given $\hat{m} = 1$)
				% of $\hat{λ}$ = 0.5	mean	esd	90% CI	CP
2	0	$6^{2}$	81	37	0.4923	0.0513	[0.4611, 0.5426]	88.0
		$8^{2}$	99	63	0.4964	0.0310	[0.4538, 0.5279]	87.6
		$10^{2}$	100	79	0.4979	0.0171	[0.4726, 0.5189]	91.6
3	0	$6^{2}$	100	81	0.4944	0.0326	[0.4695, 0.5368]	92.4
		$8^{2}$	100	81	0.5013	0.0137	[0.4722, 0.5145]	89.1
		$10^{2}$	100	100	0.5000	-	-	-
2	2	$6^{2}$	91	82	0.5046	0.0275	[0.4827, 0.5114]	90.6
		$8^{2}$	100	100	0.5000	-	-	-
		$10^{2}$	100	100	0.5000	-	-	-

Table 3: Percentage of $\hat{m}$ among 1000 replications, and percentage of $\hat{\lambda}=0.5$, mean and empirical standard deviation (esd) of $\hat{\lambda}$ (given $\hat{m}=m_{o}=1$) under various vanishing change sizes.

	$S$	$\delta_{\phi} \times 10$	% of $\hat{m}$			$\hat{\lambda}$ (given $\hat{m} = 1$)
			0	1	$\geq 2$	% of $\hat{\lambda} = 0.5$	mean	esd
$\delta_{\phi} = S^{-0.4}$	$6^{2}$	2.38	40	60	0	43	0.4895	0.0314
	$10^{2}$	1.58	25	75	0	68	0.4960	0.0168
	$30^{2}$	0.66	0	100	0	100	0.5000	-
$\delta_{\phi} = S^{-0.5}$	$6^{2}$	1.67	74	26	0	27	0.4773	0.0595
	$10^{2}$	1.00	68	32	0	57	0.5025	0.0314
	$30^{2}$	0.33	14	86	0	100	0.5000	-

Table 4: Percentage of estimated change-points $\hat{m}$ among 1000 replications by the CL-based estimator (without penalty terms) and the CLMDL estimator under various change sizes.

$S$	$\delta_{\phi} \times 10$	CL: % of $\hat{m}$			CLMDL: % of $\hat{m}$			$\delta_{\phi} \times 10$	CL: % of $\hat{m}$			CLMDL: % of $\hat{m}$
		0	1	$\geq 2$	0	1	$\geq 2$		0	1	$\geq 2$	0	1	$\geq 2$
$30^{2}$	0	96	3	1	100	0	0	0	96	3	1	100	0	0
	1	0	96	4	0	100	0	0.33 ($S^{-0.5}$)	0	100	0	14	86	0
	2	0	81	19	0	100	0	0.66 ($S^{-0.4}$)	0	100	0	0	100	0
$60^{2}$	0	100	0	0	100	0	0	0	100	0	0	100	0	0
	1	0	100	0	0	100	0	0.17 ($S^{-0.5}$)	0	100	0	0	100	0
	2	0	100	0	0	100	0	0.38 ($S^{-0.4}$)	0	100	0	0	100	0

Table 5: Percentage of estimated change-points $\hat{m}$ among 1000 replications, and mean, empirical standard deviation (esd) of $\hat{\lambda}$ (given $\hat{m}=m_{o}=1$) using misspecified model and true model.

$S$	$δ \times 10$	Misspecified model					True model
		% of $\hat{m}$			$\hat{λ}$ (given $\hat{m} = 1$)		% of $\hat{m}$			$\hat{λ}$ (given $\hat{m} = 1$)
		0	1	$\geq 2$	mean	esd	0	1	$\geq 2$	mean	esd
$6^{2}$	0	97	2	0	-	-	97	2	0	-	-
	5	94	6	0	0.4244	0.1135	93	7	0	0.4237	0.1136
	10	33	67	1	0.5109	0.0407	30	70	1	0.5097	0.0375
	15	1	98	1	0.5039	0.0214	1	98	1	0.5037	0.0199
$8^{2}$	0	99	1	0	-	-	99	1	0	-	-
	5	69	31	0	0.4886	0.0417	66	33	0	0.5104	0.0383
	10	0	99	1	0.5019	0.0111	0	99	1	0.4983	0.0109
	15	0	99	1	0.5008	0.0053	0	99	1	0.5007	0.0052

Equations

$$\mathbf{Y}=\{y_{t,\mathbf{s}}:t\in[1,T],\ \mathbf{s}\in\mathcal{S}\}=\{\mathbf{y}_{t}:1\leq t\leq T,\ t\in\mathbb{N}^{+}\},$$

$$\dot{\mathbf{x}}_{t-\tau_{j-1}^{o}}^{(j)}=\mathbf{y}_{t},\quad t=\tau_{j-1}^{o}+1,\ldots,\tau_{j}^{o}.$$

$$\mathbf{Y}=\big(\dot{\mathbf{x}}_{1}^{(1)},\ldots,\dot{\mathbf{x}}_{T_{1}^{o}}^{(1)},\dot{\mathbf{x}}_{1}^{(2)},\ldots,\dot{\mathbf{x}}_{T_{2}^{o}}^{(2)},\ldots,\dot{\mathbf{x}}_{1}^{(m_{o}+1)},\ldots,\dot{\mathbf{x}}_{T_{m_{o}+1}^{o}}^{(m_{o}+1)}\big).$$

$$y_{t,\mathbf{s}}=\mu_{t,\mathbf{s}}+\varepsilon_{t,\mathbf{s}}.$$

$$y_{t,\mathbf{s}}=\mu+\sum_{i=1}^{q}\rho_{i}y_{t-i,\mathbf{s}}+\varepsilon_{t,\mathbf{s}},$$

$$L_{P}(\theta;\{x_{i}\}_{i=1}^{n})=\prod_{i,j}L(\theta;x_{i},x_{j})^{w_{i,j}},$$

$$P_{i,\mathcal{N}}^{(j)}=\bigcup_{t=1}^{T_{j}-i}\big\{(t,i,\mathbf{s}_{1},\mathbf{s}_{2}):\mathbf{s}_{1}\in\mathcal{S},\ \mathbf{s}_{2}\in\{\mathbf{s}_{1}\}\cup\mathcal{N}(\mathbf{s}_{1})\big\},$$

$$\mathcal{D}_{k,\mathcal{N}}^{(j)}=\bigcup_{i=0}^{k}P_{i,\mathcal{N}}^{(j)},$$

$$\mathrm{PL}(\psi_{j};\mathbf{X}_{j})=\sum_{(t,i,\mathbf{s}_{1},\mathbf{s}_{2})\in\mathcal{D}_{k,\mathcal{N}}^{(j)}}\log f(x_{t,\mathbf{s}_{1}}^{(j)},x_{t+i,\mathbf{s}_{2}}^{(j)};\psi_{j})=\sum_{(t,i,\mathbf{s}_{1},\mathbf{s}_{2})\in\mathcal{D}_{k,\mathcal{N}}^{(j)}}l_{\mathrm{pair}}(\psi_{j};x_{t,\mathbf{s}_{1}}^{(j)},x_{t+i,\mathbf{s}_{2}}^{(j)}).$$

$$L_{ST}^{(j)}(\psi_{j};\mathbf{X}_{j})=\mathrm{PL}(\psi_{j};\mathbf{X}_{j})+\sum_{(i,\mathbf{s})\in\mathcal{E}_{k,\mathcal{N}}}\log f(x_{i,\mathbf{s}}^{(j)};\psi_{j})+\sum_{(i,\mathbf{s})\in\mathcal{E}_{k,\mathcal{N}}}\log f(x_{T_{j}-i+1,\mathbf{s}}^{(j)};\psi_{j})$$

:=

$$\mathrm{CL}(\mathbf{Y})=\mathrm{CL}(M)+\mathrm{CL}(\mathbf{Y}\mid M),$$

$$\mathrm{CL}(M)=\mathrm{CL}(m)+\mathrm{CL}(\lambda_{1})+\cdots+\mathrm{CL}(\lambda_{m})+\mathrm{CL}(\xi_{1})+\cdots+\mathrm{CL}(\xi_{m+1})+\mathrm{CL}(\hat{\theta}_{1})+\cdots+\mathrm{CL}(\hat{\theta}_{m+1}).$$

$$\mathrm{CL}(M)=\log_{2}(m+1)+\sum_{j=1}^{m+1}\log_{2}T_{j}+\sum_{j=1}^{m+1}\sum_{i=1}^{c_{j}}\log_{2}\xi_{i,j}+\sum_{j=1}^{m+1}\frac{d_{j}}{2}\big(\log_{2}T_{j}+\log_{2}S\big),$$

$$C_{k,\mathcal{N}}=\frac{2\,\mathrm{Card}(\mathcal{D}_{k,\mathcal{N}}^{(j)})+2\,\mathrm{Card}(\mathcal{E}_{k,\mathcal{N}})}{ST_{j}}=\frac{\sum_{\mathbf{s}\in\mathcal{S}}\big(2k+(2k+2)|\mathcal{N}(\mathbf{s})|\big)}{S},$$

$$\mathrm{CLMDL}(m,\Lambda,\Psi)$$

$$\mathcal{A}_{\epsilon_{\lambda}}^{m}=\{\Lambda\in(0,1)^{m}:0=\lambda_{0}<\lambda_{1}<\cdots<\lambda_{m}<\lambda_{m+1}=1,\ \lambda_{j}-\lambda_{j-1}\geq\epsilon_{\lambda},\ j=1,\ldots,m+1\}.$$

$$(\hat{m},\hat{\Lambda}_{ST},\hat{\Psi}_{ST})=\mathop{\arg\min}_{m\leq M_{\lambda},\ \Lambda\in\mathcal{A}_{\epsilon_{\lambda}}^{m},\ \Psi\in\mathcal{M}^{m+1}}\mathrm{CLMDL}(m,\Lambda,\Psi),$$

$$\hat{\theta}_{j}=\mathop{\arg\max}_{\theta\in\Theta(\hat{\xi}_{j})}L_{ST}^{(j)}\{(\hat{\xi}_{j},\theta);\hat{\mathbf{X}}_{j}\}$$

$$\sup_{S,T}\ \sup_{(t,i,\mathbf{s}_{1},\mathbf{s}_{2})\in\mathcal{D}_{k,\mathcal{N}}^{(j)}}E\Big[\sup_{\theta\in\Theta(\xi)}\big|l_{\mathrm{pair}}^{[a]}\{(\xi,\theta);\dot{x}_{t,\mathbf{s}_{1}}^{(j)},\dot{x}_{t+i,\mathbf{s}_{2}}^{(j)}\}\big|^{r+\epsilon}\Big]<\infty,$$

$$\sup_{S,T}\ \sup_{(t,\mathbf{s})\in\mathcal{E}_{k,\mathcal{N}}}E\Big[\sup_{\theta\in\Theta(\xi)}\big|l_{\mathrm{marg}}^{[a]}\{(\xi,\theta);\dot{x}_{t,\mathbf{s}}^{(j)}\}\big|^{r+\epsilon}\Big]<\infty,$$

ψ_{j}^{o} = (ξ_{j}^{o}, θ_{j}^{o}) = ξ \in M, θ \in Θ (ξ) arg max \accentset L_{S T}^{(j)} {(ξ, θ)} .


$$\alpha_{\mathbf{X}_{j}^{o}}(U,V)=\sup\big\{|P(A\cap B)-P(A)P(B)|:A\in\sigma_{\mathbf{X}_{j}^{o}}(U),\ B\in\sigma_{\mathbf{X}_{j}^{o}}(V)\big\}.$$

$$\alpha_{\mathbf{X}_{j}^{o}}(d;u,v)=\sup\big\{\alpha_{\mathbf{X}_{j}^{o}}(U,V):|U|\leq u,\ |V|\leq v,\ \rho(U,V)\geq d\big\}.$$

$$\sum_{d=1}^{\infty}(d+1)^{3(c-u+1)-1}\big[\alpha_{\mathbf{X}_{j}^{o}}(d;Mu,Mv)\big]^{\epsilon/(c+\epsilon)}<\infty.$$

$$\alpha_{\mathbf{X}_{j}^{o}}^{*}(d)=\sup\Big\{\alpha_{\mathbf{X}_{j}^{o}}\big([t_{1},t_{2}]\times\mathcal{S}^{\prime},\ [t_{2}+d,t_{3}]\times\mathcal{S}^{\prime}\big):1\leq t_{1}\leq t_{2},\ t_{2}+d\leq t_{3}<\infty,\ \mathcal{S}^{\prime}\subset\mathcal{S}\Big\},$$

(i) $\sup_{S,T}\sup_{(t,i,\mathbf{s}_{1},\mathbf{s}_{2})\in\mathcal{D}_{k,\mathcal{N}}^{(j)}}E\big[\sup_{\theta\in\Theta(\xi)}|l_{\mathrm{pair}}^{[1]}\{(\xi,\theta);\dot{x}_{t,\mathbf{s}_{1}}^{(j)},\dot{x}_{t+i,\mathbf{s}_{2}}^{(j)}\}|^{r+\delta}\big]<\infty$, and (ii) $\alpha_{\mathbf{X}_{j}^{o}}^{*}(d)\leq Cd^{-\tau}$.

$$\text{(i)}\ \frac{1}{ST}\mathrm{Var}\big(L_{ST}^{\prime(j)}(\psi_{j}^{o};\mathbf{X}_{j}^{o})\big)\longrightarrow\Sigma_{1}^{(j)},\qquad\text{(ii)}\ -\frac{1}{ST}E\big(L_{ST}^{\prime\prime(j)}(\psi_{j}^{o};\mathbf{X}_{j}^{o})\big)\longrightarrow\Sigma_{2}^{(j)},$$

$$(\tilde{m},\tilde{\Lambda}_{ST},\tilde{\Psi}_{ST})=\mathop{\arg\min}_{m\leq M_{\lambda},\ \Lambda\in\mathcal{A}_{\epsilon_{\lambda}}^{m},\ \Psi\in\mathcal{M}^{m+1}}\ -\sum_{j=1}^{m+1}L_{ST}^{(j)}(\psi_{j};\mathbf{X}_{j}),$$

$$\tilde{m}=m_{o},\quad[T\tilde{\Lambda}_{ST}]=[T\Lambda^{o}],$$

$$\hat{m}=m_{o},\quad[T\hat{\Lambda}_{ST}]=[T\Lambda^{o}],\quad\text{and}\quad\hat{\xi}_{j}=\xi_{j}^{o},\ \hat{\theta}_{j}\longrightarrow\theta_{j}^{o}\ \text{for all}\ j=1,\ldots,m_{o}+1,$$



Full text


Zifeng Zhao$^{1}$, Ting Fung Ma$^{2}$, Wai Leong Ng$^{3}$, Chun Yip Yau$^{4}$

$^{1}$University of Notre Dame, $^{2}$University of South Carolina,

$^{3}$Hang Seng University of Hong Kong and $^{4}$Chinese University of Hong Kong

Abstract

This paper develops a unified and computationally efficient method for change-point estimation along the time dimension in a non-stationary spatio-temporal process. By modeling a non-stationary spatio-temporal process as a piecewise stationary spatio-temporal process, we consider simultaneous estimation of the number and locations of change-points and of the model parameters in each segment. A composite likelihood-based criterion is developed for change-point and parameter estimation. Under the framework of increasing domain asymptotics, theoretical results including consistency and the asymptotic distribution of the estimators are derived under mild conditions. In contrast to the classical result for fixed-dimensional time series, where the localization error of the change-point estimator is $O_{p}(1)$, exact recovery of the true change-points is possible in the spatio-temporal setting. More surprisingly, consistent change-point estimation can be achieved without any penalty term in the criterion function. In addition, we establish consistency of the change-point estimator under the infill asymptotics framework, where the time domain is increasing while the spatial sampling domain is fixed. A computationally efficient pruned dynamic programming algorithm is developed for the challenging criterion optimization problem. Extensive simulation studies and an application to U.S. precipitation data demonstrate the effectiveness and practicality of the proposed method.

Keywords: Dynamic programming; increasing domain asymptotics; infill asymptotics; multiple change-points; pairwise likelihood; asymptotic distribution.

1 Introduction

With advances in data technology, large datasets observed sequentially over long periods are becoming increasingly available. For such data, stationary models are often inadequate. Instead of developing complicated models to describe non-stationary behavior, it is often more intuitive and effective to use a change-point model that segments the data into stationary pieces. This has made change-point analysis increasingly popular in recent decades across many applications, such as climate science (Killick et al., 2010; Lu et al., 2010; Kelly and Ó Gráda, 2014), finance (Ang and Timmermann, 2012; Fryzlewicz, 2014), genetics (Shen and Zhang, 2012; Fearnhead and Rigaill, 2018), and signal processing (Wang et al., 2004; Harle et al., 2016).

Change-point estimation has been extensively studied for time series data; see, e.g., Davis et al. (2006); Aue et al. (2009); Shao and Zhang (2010); Matteson and James (2014); Preuss et al. (2015); Jiang et al. (2021). Recently, change-point analysis has also gained popularity in high-dimensional statistics (Cho and Fryzlewicz, 2015; Wang and Samworth, 2018; Wang et al., 2021; Chen et al., 2022) and functional data analysis (Berkes et al., 2009; Aston and Kirch, 2012; Aue et al., 2018). However, change-point estimation for spatio-temporal processes remains largely unexplored, with only a handful of works in the literature, and existing methods are subject to various limitations. For example, Bayesian methods such as Majumdar et al. (2005) and Altieri et al. (2015) usually require very specific model structures, lack theoretical guarantees, and are computationally intensive. Gromenko et al. (2017) focuses on at most one change-point in the mean function of a spatio-temporal process. Furthermore, all existing works require a separable space-time covariance. The major difficulty of change-point analysis for spatio-temporal processes stems from the theoretical and computational challenges of spatio-temporal modeling when the dimensions of both space and time grow in the presence of unknown change-points.

In this paper, we propose a likelihood-based procedure for multiple change-point estimation along the time dimension in a spatio-temporal process observed at spatial locations on a possibly irregular grid. We take this approach because parametric models, such as regression-based mean functions and Matérn-class space-time covariance functions, are among the main workhorses of the spatio-temporal literature. Our procedure adopts the pairwise likelihood to alleviate the computational difficulty of the full likelihood for spatio-temporal data while maintaining statistical efficiency. The proposed method is a general approach that can handle a wide range of spatio-temporal models with either separable or non-separable space-time covariance. Furthermore, it works under model misspecification and can detect changes beyond the first and second moments. In spatial statistics, there are two main asymptotic frameworks, namely increasing domain asymptotics and infill asymptotics. The choice of asymptotic framework plays an important role in establishing the theoretical properties of a statistical method, and both frameworks are common in practice; see Stein (1999), Zhang and Zimmerman (2005) and Bevilacqua et al. (2020) for more discussion. Thus, for completeness, we establish theoretical results for the proposed change-point estimation procedure under both increasing domain and infill asymptotics in the spatial dimension.

The contribution of this paper is two-fold. First, in terms of statistical theory, we show that, in contrast to the commonly used likelihood-based change-point estimation approach for multivariate time series (e.g. Davis et al., 2006; Ma and Yau, 2016), the pairwise likelihood for spatio-temporal data induces an edge effect around each change-point. This edge effect is non-ignorable in the spatio-temporal setting as the spatial dimension grows, and it can cause inconsistent change-point estimation. To tackle this problem, we design a compensating mechanism that modifies the pairwise likelihood by adding a marginal likelihood term to correct the edge effect.

Interestingly, unlike traditional criterion functions such as BIC and minimum description length (MDL), which involve a penalty term, consistent change-point estimation can be achieved solely by the modified pairwise likelihood. Moreover, in contrast to the classical result for fixed-dimensional time series that the asymptotic error of change-point estimation is $O_{p}(1)$, we show that exact recovery of the true change-points is possible in the spatio-temporal setting under increasing domain asymptotics. To further achieve consistent model selection for each segment and to enhance finite-sample performance, we develop an MDL-based criterion function. We prove that, even under possible model misspecification, the number and locations of change-points can be consistently estimated under mild conditions. The asymptotic distributions of the estimated change-points and of the spatio-temporal model parameters in each stationary segment are also derived. In addition, we establish consistency of the estimated number and locations of change-points under infill asymptotics, where the time domain is increasing while the spatial sampling domain is fixed. To the best of our knowledge, this paper is the first to study change-point estimation for spatio-temporal data in the spatial infill setting.

Second, in terms of statistical computing, we develop a computationally efficient algorithm for optimizing the criterion function. Computational feasibility is a major challenge in change-point estimation, since it involves optimization over a large number of change-point configurations. Popular optimization methods, such as binary segmentation (Vostrikova, 1981) and the genetic algorithm (Davis et al., 2006), are fast but provide only approximate solutions. On the other hand, dynamic programming (Jackson et al., 2005) provides exact solutions but incurs a quadratic computational cost. In this paper, we adapt the pruned exact linear time (PELT) algorithm of Killick et al. (2012), originally designed for univariate time series, to the spatio-temporal setting. The new algorithm is computationally efficient and is shown to provide an asymptotically exact solution.
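For intuition about the pruning idea, here is a minimal sketch of the original univariate PELT recursion of Killick et al. (2012) for changes in mean, using a squared-error segment cost and a constant penalty `beta`. It is not the paper's spatio-temporal adaptation, which optimizes the composite likelihood criterion instead; the function name and the cost function are illustrative assumptions.

```python
import numpy as np


def pelt_mean_change(y, beta):
    """Univariate PELT (Killick et al., 2012) for changes in mean.

    Segment cost: sum of squared deviations from the segment mean.
    beta: penalty per change-point. Returns sorted 0-based change-points tau,
    meaning a new segment starts at y[tau].
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    c1 = np.concatenate([[0.0], np.cumsum(y)])        # prefix sums of y
    c2 = np.concatenate([[0.0], np.cumsum(y ** 2)])   # prefix sums of y^2

    def cost(a, b):
        """Squared-error cost of the segment y[a:b]."""
        return c2[b] - c2[a] - (c1[b] - c1[a]) ** 2 / (b - a)

    F = np.empty(n + 1)
    F[0] = -beta
    last = np.zeros(n + 1, dtype=int)                 # start of the last segment
    candidates = [0]
    for t in range(1, n + 1):
        vals = [F[tau] + cost(tau, t) + beta for tau in candidates]
        best = int(np.argmin(vals))
        F[t], last[t] = vals[best], candidates[best]
        # Pruning with K = 0: keep tau only if F(tau) + C(y[tau:t]) <= F(t).
        candidates = [tau for tau, v in zip(candidates, vals) if v - beta <= F[t]]
        candidates.append(t)

    cps, t = [], n
    while last[t] > 0:                                # backtrack through segment starts
        t = int(last[t])
        cps.append(t)
    return cps[::-1]


print(pelt_mean_change([0.0] * 50 + [5.0] * 50, beta=10.0))   # [50]
```

The pruning rule is valid with the constant $K=0$ here because splitting a segment never increases its squared-error cost; candidates that fail the test can never become optimal later, which is what keeps the recursion close to linear in practice.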

We remark that change-point estimation based on pairwise likelihood with an MDL penalty has previously been studied by Ma and Yau (2016) in the multivariate time series setting. However, as the discussion above makes clear, there are notable differences between the two works. First, our work focuses on the more complex spatio-temporal setting with a growing spatial dimension. Importantly, we show that the pairwise likelihood procedure of Ma and Yau (2016) leads to inconsistent change-point estimation in our setting because of an edge effect (see details later), and we instead design a new criterion function combining pairwise and marginal likelihoods to achieve consistency. Second, we work under both increasing domain and spatial infill asymptotics and uncover new phenomena in change-point estimation, such as exact recovery and consistency without a penalty, which require substantially different technical arguments from those in Ma and Yau (2016).

The rest of the paper is organized as follows. Section 2 provides the background and derivation of the composite likelihood-based criterion for change-point estimation. The main results under increasing domain asymptotics, including estimation consistency and the asymptotic distribution of the estimators, are presented in Section 3. Numerical experiments and an application to U.S. precipitation data are given in Section 4. Section 5 concludes. All technical proofs, detailed results under infill asymptotics, and additional numerical studies can be found in the supplement.

2 Background

2.1 Settings and notations

On a set of spatial locations $\mathcal{S}$ with cardinality $S=\text{Card}(\mathcal{S})$ , consider a spatio-temporal process

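$$\mathbf{Y}=\{y_{t,\mathbf{s}}:t\in[1,T],\ \mathbf{s}\in\mathcal{S}\}=\{\mathbf{y}_{t}:1\leq t\leq T,\ t\in\mathbb{N}^{+}\},$$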

where $\mathbf{y}_{t}=\{y_{t,{\textbf{s}}}:{\textbf{s}}\in\mathcal{S}\}$ denotes the observations at all spatial locations at time $t$, and, for any two positive integers $t_{1}\leq t_{2}$, we denote by $[t_{1},t_{2}]$ the set $\{t_{1},t_{1}+1,\cdots,t_{2}\}$. There are $S\cdot T$ observations in total. We focus on $\mathcal{S}\subset\mathbb{R}^{2}$, although our results can be readily generalized to $\mathbb{R}^{d}$ with $d\geq 3$.

Data generating process: We assume that $\mathbf{Y}$ can be partitioned into $m_{o}+1$ stationary segments along the time dimension.¹ In other words, there are $m_{o}$ unknown change-points $0<\tau_{1}^{o}<\cdots<\tau_{m_{o}}^{o}<T$ in the spatio-temporal process. Our asymptotic results hold provided that, as $S,T$ increase, the normalized change-point $\tau_{j}^{o}/T$ converges to a limit $\lambda_{j}^{o}$ for all $j$, where $0<\lambda_{1}^{o}<\cdots<\lambda_{m_{o}}^{o}<1$. However, for clarity of presentation, in the rest of the paper we simply set $\tau_{j}^{o}=[T\lambda_{j}^{o}]$ for $j=1,\cdots,m_{o}$, where for $z\in\mathbb{R}_{\geq 0}$ we denote by $[z]$ the integer closest to $z$. For notational convenience, we further define $\tau_{0}^{o}=0$, $\tau_{m_{o}+1}^{o}=T$, $\lambda_{0}^{o}=0$ and $\lambda_{m_{o}+1}^{o}=1$.

¹Potentially, there can also be structural breaks across the space $\mathcal{S}$. However, since $\mathcal{S}$ has no natural order, the search space is much larger, so we do not consider this case here.

Let $T_{j}^{o}=\tau_{j}^{o}-\tau_{j-1}^{o}$ be the length of the $j$ th stationary segment for $j=1,\ldots,m_{o}+1$ . For convenience, we can re-index the $j$ th stationary segment of $\mathbf{Y}$ as a stationary process $\mathbf{X}_{j}^{o}=\{\dot{\mathbf{x}}_{t}^{(j)}:t\in[1,T_{j}^{o}]\}$ with $\dot{\mathbf{x}}_{t}^{(j)}=\{\dot{x}_{t,{\textbf{s}}}^{(j)}:{\textbf{s}}\in\mathcal{S}\}$ , such that

$$\dot{\mathbf{x}}_{t-\tau_{j-1}^{o}}^{(j)}=\mathbf{y}_{t},\quad t=\tau_{j-1}^{o}+1,\ldots,\tau_{j}^{o}.$$

The observed spatio-temporal process $\mathbf{Y}$ can then be written as

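$$\mathbf{Y}=\big(\dot{\mathbf{x}}_{1}^{(1)},\ldots,\dot{\mathbf{x}}_{T_{1}^{o}}^{(1)},\dot{\mathbf{x}}_{1}^{(2)},\ldots,\dot{\mathbf{x}}_{T_{2}^{o}}^{(2)},\ldots,\dot{\mathbf{x}}_{1}^{(m_{o}+1)},\ldots,\dot{\mathbf{x}}_{T_{m_{o}+1}^{o}}^{(m_{o}+1)}\big).$$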
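As a small illustration of this indexing, the sketch below computes $\tau_j=[T\lambda_j]$ and splits a $T\times S$ data array into the re-indexed segments $\mathbf{X}_j^o$. It assumes the data are stored with one row per time point; the helper names (`nearest_int`, `change_points`, `split_segments`) are hypothetical, and rounding ties upward in $[z]$ is an assumption, since the text only specifies "the closest integer".

```python
import numpy as np


def nearest_int(z):
    """[z]: the integer closest to z >= 0 (ties rounded up; tie rule is an assumption)."""
    return int(np.floor(z + 0.5))


def change_points(T, lambdas):
    """Return [tau_0, tau_1, ..., tau_{m_o}, tau_{m_o+1}] with tau_j = [T * lambda_j]."""
    return [0] + [nearest_int(T * lam) for lam in lambdas] + [T]


def split_segments(Y, lambdas):
    """Split a (T, S) array Y, whose row t-1 holds y_t, into the segments X_j^o.

    Row r (0-based) of the j-th block is x_{r+1}^{(j)} = y_{tau_{j-1} + r + 1}.
    """
    taus = change_points(Y.shape[0], lambdas)
    return [Y[taus[j - 1]:taus[j]] for j in range(1, len(taus))]


# Example: T = 200 time points, S = 4 locations, two change-points.
Y = np.arange(200 * 4, dtype=float).reshape(200, 4)
segments = split_segments(Y, [0.3, 0.75])
print(change_points(200, [0.3, 0.75]))    # [0, 60, 150, 200]
print([seg.shape for seg in segments])    # [(60, 4), (90, 4), (50, 4)]
```

Stacking the segments back in order recovers $\mathbf{Y}$ exactly, which is the decomposition written above.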

As in Davis et al. (2008) and Aue et al. (2009), we first assume for simplicity that the data in different segments $\mathbf{X}_{j}^{o}$ ($j=1,\ldots,m_{o}+1$) are independent. This assumption is common in the spatio-temporal change-point literature (e.g. Altieri et al., 2015; Gromenko et al., 2017). We refer to Remark 3 and the supplement for discussions of its relaxation.

Model and parameterization: We adopt a parametric approach and model each stationary segment by a member of a pre-specified finite class of models, $\mathcal{M}$ . Each element in $\mathcal{M}$ is a model indexed by an integer-valued vector $\xi$ that represents the model order. In other words, $\xi$ determines the form of the parametric model. Given $\xi$ , the model can be fully specified by a $d(\xi)$ -dimensional parameter $\theta(\xi)$ in a compact parameter space $\Theta(\xi)\subset\mathbb{R}^{d(\xi)}$ . We refer to $\xi$ as the model order and $\theta(\xi)$ as the model parameter.

For the $j$th stationary segment $\mathbf{X}_{j}^{o}$, we assume there exists a pseudo-true parametric model in $\mathcal{M}$, indexed by a model order vector $\xi_{j}^{o}$ of dimension $c_{j}^{o}$, that provides the best fit for the data (see Section 3 for the technical definition). Given $\xi_{j}^{o}$, the model for $\mathbf{X}_{j}^{o}$ is fully specified by a model parameter $\theta_{j}^{o}=\theta_{j}(\xi_{j}^{o})$ of dimension $d_{j}^{o}=d_{j}(\xi_{j}^{o})$. Importantly, we do not require $\mathcal{M}$ to contain the true data generating process of $\mathbf{X}_{j}^{o}$. This modeling framework is flexible. In particular, on each stationary segment, it allows a general spatio-temporal model of the form

$$y_{t,\mathbf{s}}=\mu_{t,\mathbf{s}}+\varepsilon_{t,\mathbf{s}}.$$

Here, $\mu_{t,{\textbf{s}}}$ is the mean process which can take various regression forms such as $z_{t,{\textbf{s}}}^{\top}\beta$ with $z_{t,{\textbf{s}}}$ being the covariate associated with $(t,{\textbf{s}})$ , and $\varepsilon_{t,{\textbf{s}}}$ is the error process whose space-time covariance can take various parametric forms of separable or non-separable spatio-temporal dependence.

Notation: Denote $\psi_{j}^{o}=(\xi_{j}^{o},\theta_{j}^{o})$ as the true model parameter set for the $j$ th stationary segment $\mathbf{X}_{j}^{o}=\{\dot{\mathbf{x}}_{1}^{(j)},\ldots,\dot{\mathbf{x}}_{T_{j}^{o}}^{(j)}\}$ and denote $\Psi^{o}=\{\psi_{1}^{o},\ldots,\psi_{m_{o}+1}^{o}\}$ . Furthermore, denote $\Lambda^{o}=\{\lambda_{1}^{o},\ldots,\lambda_{m_{o}}^{o}\}$ as the set of true (normalized) change-points. To avoid confusion, in the following, we use $\Lambda=(\lambda_{1},\cdots,\lambda_{m})$ to denote a generic set of $m$ change-points. Furthermore, we denote $\mathbf{X}_{1},\mathbf{X}_{2},\cdots,\mathbf{X}_{m+1}$ as a generic partition of $\mathbf{Y}$ imposed by $\Lambda$ , where $\mathbf{X}_{j}=\{\mathbf{x}_{1}^{(j)},\ldots,\mathbf{x}_{T_{j}}^{(j)}\}$ is of length $T_{j}=\tau_{j}-\tau_{j-1}$ with $\tau_{j}=[T\lambda_{j}].$ Note that $\mathbf{X}_{1},\mathbf{X}_{2},\cdots,\mathbf{X}_{m+1}$ depend on $\Lambda$ implicitly, which is suppressed for notational simplicity. In particular, we have that $\mathbf{X}_{j}=\mathbf{X}_{j}^{o}$ for all $j=1,2,\cdots,m_{o}+1$ if $\Lambda=\Lambda^{o}$ . We use $\psi_{j}=(\xi_{j},\theta_{j})$ to denote a generic model parameter set for the $j$ th segment $\mathbf{X}_{j}$ , where $\xi_{j}$ is of dimension $c_{j}$ and $\theta_{j}$ is of dimension $d_{j}.$ Denote $\Psi=\{\psi_{1},\cdots,\psi_{m+1}\}$ .

An illustrative example: To build more intuition, we conclude this section with a concrete example where the model class $\mathcal{M}$ consists of Gaussian space-time AR (STAR) models. On a stationary segment, an STAR model of order $q$ takes the form

$$y_{t,\mathbf{s}}=\mu+\sum_{i=1}^{q}\rho_{i}y_{t-i,\mathbf{s}}+\varepsilon_{t,\mathbf{s}},$$

where $\varepsilon_{t}=\{\varepsilon_{t,{\textbf{s}}},{\textbf{s}}\in\mathcal{S}\}$ is a Gaussian process that can take $K$ possible parametric forms of spatial dependence, such as the exponential or Matérn covariance. Thus, the model order can be specified by $\xi=(i_{1},i_{2})$, where $i_{1}\in\{1,\cdots,p\}$ indicates the temporal autoregressive order $q$ (up to a maximum order $p$) and $i_{2}\in\{1,\cdots,K\}$ indicates the parametric form of the spatial dependence. Given $\xi$, the model parameter $\theta(\xi)$ consists of the intercept and temporal AR coefficients $(\mu,\rho_{1},\cdots,\rho_{q})$ and the spatial covariance parameters of $\varepsilon_{t}$.
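A Gaussian STAR model of this kind can be simulated in a few lines, which is convenient for experimenting with piecewise stationary data. This is a sketch under stated assumptions: a unit-spaced square grid, an exponential spatial covariance $\sigma^2\exp(-\|\mathbf{s}-\mathbf{s}'\|/\phi)$ for $\varepsilon_t$, errors independent over time, zero initial values followed by a burn-in period, and hypothetical helper names (`grid_locations`, `exp_cov`, `simulate_star`). It is not the simulation design used in the paper's numerical studies.

```python
import numpy as np


def grid_locations(n):
    """Coordinates of an n x n grid with unit spacing, shape (n * n, 2)."""
    xs, ys = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.column_stack([xs.ravel(), ys.ravel()]).astype(float)


def exp_cov(locs, phi, sigma2=1.0):
    """Exponential covariance sigma2 * exp(-||s - s'|| / phi) between all locations."""
    dist = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)
    return sigma2 * np.exp(-dist / phi)


def simulate_star(T, locs, mu, rho, phi, sigma2=1.0, burn_in=100, seed=None):
    """Simulate y_{t,s} = mu + sum_{i=1}^q rho_i y_{t-i,s} + eps_{t,s}, shape (T, S).

    eps_t is a zero-mean Gaussian field with exponential covariance, drawn
    independently over t; the recursion starts from zeros and the first
    `burn_in` values are discarded.
    """
    rng = np.random.default_rng(seed)
    q, S = len(rho), locs.shape[0]
    chol = np.linalg.cholesky(exp_cov(locs, phi, sigma2))
    y = np.zeros((q + burn_in + T, S))
    for t in range(q, y.shape[0]):
        ar = sum(rho[i] * y[t - 1 - i] for i in range(q))
        y[t] = mu + ar + chol @ rng.standard_normal(S)
    return y[q + burn_in:]


# Piecewise stationary example on a 6 x 6 grid: rho_1 changes at tau = 100.
locs = grid_locations(6)
Y = np.vstack([
    simulate_star(100, locs, mu=0.0, rho=[0.3], phi=1.0, seed=1),
    simulate_star(100, locs, mu=0.0, rho=[0.5], phi=1.0, seed=2),
])
```

Simulating each segment separately and stacking them matches the working assumption that different stationary segments are independent.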

2.2 Composite likelihood and pairwise likelihood

Although (full) likelihood-based methods generally achieve high statistical efficiency, computation can become infeasible when the likelihood function involves high-dimensional inverse covariance matrices or integrals. To overcome this limitation, Lindsay (1988) considers the composite likelihood, which is a weighted product of likelihoods for some subsets of the data. Different choices of subsets yield different classes of composite likelihood. One popular class is the pairwise likelihood (PL), which is the product of the bivariate densities of all possible pairs of observations,

$$L_{P}(\theta;\{x_{i}\}_{i=1}^{n})=\prod_{i,j}L(\theta;x_{i},x_{j})^{w_{i,j}},$$

where $w_{i,j}$ are the weights. Composite likelihood often enjoys computational efficiency while retaining statistical efficiency; see Lindsay (1988); Varin et al. (2011). Owing to its flexibility and attractive asymptotic properties, composite likelihood has been widely used in genetics (Larribe and Fearnhead, 2011), longitudinal data (Bartolucci and Lupparelli, 2016), time series (Davis and Yau, 2011), and spatio-temporal statistics (Bevilacqua et al., 2012; Huser and Davison, 2014).
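To make the weighted product concrete, the following sketch evaluates $\log L_P$ for a Gaussian vector as a sum of weighted bivariate log-densities, taking the product over distinct pairs $i<j$ (an assumption about how pairs are enumerated). With a diagonal covariance each bivariate density factorizes, so the pairwise log-likelihood equals $(n-1)$ times the full log-likelihood, which gives a simple sanity check. The function names are hypothetical.

```python
import itertools

import numpy as np


def bivariate_normal_logpdf(x1, x2, m1, m2, c11, c22, c12):
    """log density of (x1, x2) ~ N((m1, m2), [[c11, c12], [c12, c22]])."""
    det = c11 * c22 - c12 ** 2
    d1, d2 = x1 - m1, x2 - m2
    quad = (c22 * d1 ** 2 - 2 * c12 * d1 * d2 + c11 * d2 ** 2) / det
    return -np.log(2 * np.pi) - 0.5 * np.log(det) - 0.5 * quad


def pairwise_loglik(x, mean, cov, weights=None):
    """log L_P = sum_{i<j} w_{ij} log f(x_i, x_j) for a Gaussian vector x."""
    total = 0.0
    for i, j in itertools.combinations(range(len(x)), 2):
        w = 1.0 if weights is None else weights[i, j]
        if w != 0:   # zero-weight pairs are skipped entirely
            total += w * bivariate_normal_logpdf(
                x[i], x[j], mean[i], mean[j], cov[i, i], cov[j, j], cov[i, j])
    return total


def full_loglik(x, mean, cov):
    """Full Gaussian log-likelihood, for comparison."""
    _, logdet = np.linalg.slogdet(cov)
    d = x - mean
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(cov, d))
```

Only $2\times 2$ determinants and inverses are needed, which is the computational advantage over the $n\times n$ solve in `full_loglik`.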

Given a generic segment $\mathbf{X}_{j}=\{\mathbf{x}_{1}^{(j)},\ldots,\mathbf{x}_{T_{j}}^{(j)}\}$ and a model parameter $\psi_{j}$ , the classical pairwise likelihood is defined as the product of the bivariate densities $f(\cdot,\cdot;\psi_{j})$ of each distinct pair $(x_{t,{\textbf{s}}}^{(j)},x_{t^{\prime},{\textbf{s}}^{\prime}}^{(j)})$ in $\mathbf{X}_{j}$ . However, in many spatio-temporal processes, the dependence between observations diminishes quickly as the time lag or spatial distance increases. Therefore, it suffices to consider pairs in $\mathbf{X}_{j}$ that are up to a small time lag $k$ and a small spatial distance $d$ apart.

In particular, define $\mathcal{N}\equiv\mathcal{N}({\textbf{s}})=\{{\textbf{s}}^{\prime}\mid{\textbf{s}}^{\prime}\in\mathcal{S},{\textbf{s}}^{\prime}\neq{\textbf{s}},\text{dist}({\textbf{s}},{\textbf{s}}^{\prime})\leq d\}$ as the distance-based neighborhood of location ${\textbf{s}}$, where $\text{dist}(\cdot,\cdot)$ can be, for example, the Euclidean distance. Note that $\mathcal{N}$ depends implicitly on $d$. Given $(k,\mathcal{N})$ and a time lag $0<i\leq k$, define the index set

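$$P_{i,\mathcal{N}}^{(j)}=\bigcup_{t=1}^{T_{j}-i}\big\{(t,i,{\textbf{s}}_{1},{\textbf{s}}_{2}):{\textbf{s}}_{1}\in\mathcal{S},\ {\textbf{s}}_{2}\in\mathcal{N}({\textbf{s}}_{1})\cup\{{\textbf{s}}_{1}\}\big\},$$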

which collects all pairs of observations $(x_{t,{\textbf{s}}_{1}}^{(j)},x_{t+i,{\textbf{s}}_{2}}^{(j)})$ in $\mathbf{X}_{j}$ that are exactly $i$ time units apart and at most $d$ spatial distance away. For the time lag $i=0$ , we further define $P_{0,\mathcal{N}}^{(j)}=\bigcup_{t=1}^{T_{j}}\big{\{}(t,0,{\textbf{s}}_{1},{\textbf{s}}_{2}):{\textbf{s}}_{1}\in\mathcal{S},{\textbf{s}}_{2}\in\mathcal{N}({\textbf{s}}_{1})\big{\}}$ . We then define

$$\mathcal{D}_{k,\mathcal{N}}^{(j)}=\bigcup_{i=0}^{k}P_{i,\mathcal{N}}^{(j)},$$

which is the collection of pairs of distinct observations in $\mathbf{X}_{j}$ that are at most $k$ time units and $d$ spatial distance apart. The pairwise log-likelihood of $\mathbf{X}_{j}$ is then defined as
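To make the index sets concrete, the Python sketch below enumerates $\mathcal{D}_{k,\mathcal{N}}^{(j)}$ on a small regular grid. It is an illustration under stated assumptions, not the authors' code: times are 1-indexed, $\text{dist}$ is Euclidean, and for lags $i\geq 1$ a location is also paired with itself, so the partners of ${\textbf{s}}_{1}$ are $\mathcal{N}({\textbf{s}}_{1})\cup\{{\textbf{s}}_{1}\}$. Following the definition of $P_{0,\mathcal{N}}^{(j)}$ literally, each lag-0 pair is listed in both orders. The helper names neighborhood and pair_index_set are hypothetical.

```python
import math
from itertools import product


def neighborhood(sites, s, d):
    """N(s): all other sites within Euclidean distance d of s."""
    return [u for u in sites if u != s and math.dist(u, s) <= d]


def pair_index_set(T_j, sites, k, d):
    """Enumerate tuples (t, i, s1, s2) pairing x_{t,s1} with x_{t+i,s2}, 0 <= i <= k."""
    D = []
    for i in range(k + 1):
        for t in range(1, T_j - i + 1):  # keep t + i <= T_j
            for s1 in sites:
                partners = neighborhood(sites, s1, d)
                if i > 0:
                    partners.append(s1)  # same location at a positive lag (assumption)
                D.extend((t, i, s1, s2) for s2 in partners)
    return D


sites = list(product(range(3), range(3)))  # 3-by-3 grid with unit spacing
D = pair_index_set(T_j=5, sites=sites, k=1, d=1.0)
print(len(D))  # 5 * 24 lag-0 tuples + 4 * 33 lag-1 tuples = 252
```

On this grid with $d=1$ the neighborhood sizes sum to 24, which gives 120 lag-0 tuples over five time points and 132 lag-1 tuples over four.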

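$$\mathrm{PL}(\psi_{j};\mathbf{X}_{j})=\sum_{(t,i,{\textbf{s}}_{1},{\textbf{s}}_{2})\in\mathcal{D}_{k,\mathcal{N}}^{(j)}}\log f\big(x_{t,{\textbf{s}}_{1}}^{(j)},x_{t+i,{\textbf{s}}_{2}}^{(j)};\psi_{j}\big).$$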

Regarding the choice of $k$ and $\mathcal{N}$ (i.e., $d$), a large neighborhood can be used when there is strong spatial correlation across $\mathcal{S}$, whereas a small neighborhood should be favored when the spatial correlation is weak; see Varin and Vidoni, (2005) and Bai et al., (2012). On the other hand, similar to Ma and Yau, (2016), if the main focus is estimating change-points rather than model parameters, it usually suffices to use the smallest $k$ and $d$ that ensure identifiability of the models in the candidate model set $\mathcal{M}$. See the discussion following Assumption 3 in Section 3.

2.3 Edge effect and a remedial composite likelihood

By the definition of $\mathcal{D}_{k,\mathcal{N}}^{(j)}$ in (4), the data points in a generic segment $\mathbf{X}_{j}=\{\mathbf{x}_{1}^{(j)},\ldots,\mathbf{x}_{T_{j}}^{(j)}\}$ do not all appear in $\mathrm{PL}(\psi_{j};\mathbf{X}_{j})$ the same number of times. For example, $\mathbf{x}_{1}^{(j)}$ can only be paired with $\{\mathbf{x}_{t}^{(j)}:t=1,\ldots,k+1\}$, and thus appears roughly half as often as an observation $\mathbf{x}_{\tilde{t}}^{(j)}$ with $k+1\leq\tilde{t}\leq T_{j}-k$, which can be paired with $\{\mathbf{x}_{t}^{(j)}:\tilde{t}-k\leq t\leq\tilde{t}+k\}$. In effect, different weights are implicitly assigned to the observations in $\mathbf{X}_{j}$, and observations near the edges of a segment receive less weight. We refer to this phenomenon of the pairwise likelihood (PL) defined in (4) as the edge effect.
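The edge effect can be seen by counting how many pairwise terms each observation enters. This sketch uses illustrative conventions for a direct enumeration of $\mathcal{D}_{k,\mathcal{N}}^{(j)}$ (Euclidean distance, 1-indexed times, a location paired with itself at positive lags, lag-0 pairs in both orders); the function name appearance_counts is hypothetical.

```python
import math
from collections import Counter
from itertools import product


def appearance_counts(T_j, sites, k, d):
    """Count how many pairwise terms each observation (t, s) enters."""
    counts = Counter()
    for i in range(k + 1):
        for t in range(1, T_j - i + 1):
            for s1 in sites:
                partners = [u for u in sites if u != s1 and math.dist(u, s1) <= d]
                if i > 0:
                    partners.append(s1)  # same location at a positive lag (assumption)
                for s2 in partners:
                    counts[(t, s1)] += 1      # first member of the pair
                    counts[(t + i, s2)] += 1  # second member of the pair
    return counts


sites = list(product(range(3), range(3)))
counts = appearance_counts(T_j=5, sites=sites, k=1, d=1.0)
print([counts[(t, (1, 1))] for t in range(1, 6)])  # [13, 18, 18, 18, 13]
```

With $k=1$, the centre location ($|\mathcal{N}({\textbf{s}})|=4$) enters 18 terms at interior times but only 13 at $t=1$ and $t=T_{j}$: each edge observation loses $k(1+|\mathcal{N}({\textbf{s}})|)=5$ terms.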

To determine whether a change-point $\tau$ exists in a segment $\mathbf{X}_{j}$, we need to compare two log-likelihood quantities: the pairwise log-likelihood formed by $\{\mathbf{x}_{t}^{(j)}:1\leq t\leq T_{j}\}$, and the sum of the pairwise log-likelihoods formed by $\{\mathbf{x}_{t}^{(j)}:1\leq t\leq\tau\}$ and $\{\mathbf{x}_{t}^{(j)}:\tau+1\leq t\leq T_{j}\}$. In the latter quantity, all observations within $k$ time units of $\tau$ suffer from the edge effect and receive less weight, so the latter quantity has $O(S)$ fewer terms than the former.

The edge effect is negligible if the spatial dimension $S$ is fixed, since it is then of order $O_{p}(1)$, as in the multivariate time series setting. However, when $S\longrightarrow\infty$, the edge effect is non-negligible and can cause inconsistency of the PL-based method. In the supplement, through a simple and intuitive example, we show that because $S\longrightarrow\infty$ in the spatio-temporal setting, the edge effect can cause false positives in PL-based change-point estimation with probability tending to 1, which is further confirmed by an accompanying simulation study.

To correct the edge effect of PL and achieve consistent change-point estimation, we design a compensating mechanism for the missing pairwise log-likelihoods at the edges of each segment, so that each data point appears the same number of times in the likelihood function. In particular, the mechanism introduces additional marginal log-likelihoods for data points observed at the edge times $t\in\{1,\ldots,k\}\cup\{T_{j}-k+1,\ldots,T_{j}\}$. Based on this compensating mechanism, we propose a new composite likelihood of the form

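$$L_{ST}^{(j)}(\psi_{j};\mathbf{X}_{j})=\sum_{(t,i,{\textbf{s}}_{1},{\textbf{s}}_{2})\in\mathcal{D}_{k,\mathcal{N}}^{(j)}}l_{pair}\big(\psi_{j};x_{t,{\textbf{s}}_{1}}^{(j)},x_{t+i,{\textbf{s}}_{2}}^{(j)}\big)+\sum_{(i,{\textbf{s}})\in\mathcal{E}_{k,\mathcal{N}}}\Big\{l_{marg}\big(\psi_{j};x_{i,{\textbf{s}}}^{(j)}\big)+l_{marg}\big(\psi_{j};x_{T_{j}-i+1,{\textbf{s}}}^{(j)}\big)\Big\},$$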

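where $l_{marg}(\psi;x)=\log f(x;\psi)$, $l_{pair}(\psi;x_{1},x_{2})=\log f(x_{1},x_{2};\psi)$, and $\mathcal{E}_{k,\mathcal{N}}=\bigcup_{i=1}^{k}\{(i,{\textbf{s}}):{\textbf{s}}\in\mathcal{S}\}$, in which each $(i,{\textbf{s}})$ is repeated $(k-i+1)(1+|\mathcal{N}({\textbf{s}})|)$ times, denotes the collection of marginal log-likelihoods used to correct the edge effect. This multiplicity equals the number of pairs, at time lags $i,\ldots,k$, that an observation at location ${\textbf{s}}$ and time $i$ (or, symmetrically, time $T_{j}-i+1$) would have formed with observations outside the segment.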
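A quick way to check the compensating mechanism is to count, for every observation, its pairwise terms plus its copies of marginal terms from $\mathcal{E}_{k,\mathcal{N}}$ at both ends of the segment. The sketch below does this under illustrative assumptions (Euclidean distance, 1-indexed times, a location paired with itself at positive lags, lag-0 pairs counted in both orders, $T_{j}\geq 2k$); the function name compensated_counts is hypothetical.

```python
import math
from collections import Counter
from itertools import product


def compensated_counts(T_j, sites, k, d):
    """Appearances of each (t, s) in the compensated composite likelihood."""
    nbrs = {s: [u for u in sites if u != s and math.dist(u, s) <= d] for s in sites}
    counts = Counter()
    # pairwise terms over D_{k,N}
    for i in range(k + 1):
        for t in range(1, T_j - i + 1):
            for s1 in sites:
                partners = nbrs[s1] + ([s1] if i > 0 else [])
                for s2 in partners:
                    counts[(t, s1)] += 1
                    counts[(t + i, s2)] += 1
    # marginal terms from E_{k,N}, applied at both ends of the segment
    for i in range(1, k + 1):
        for s in sites:
            reps = (k - i + 1) * (1 + len(nbrs[s]))
            counts[(i, s)] += reps
            counts[(T_j - i + 1, s)] += reps
    return counts


sites = list(product(range(3), range(3)))
counts = compensated_counts(T_j=6, sites=sites, k=2, d=1.0)
for s in [(1, 1), (0, 1), (0, 0)]:
    print(s, [counts[(t, s)] for t in range(1, 7)])
```

At every location the total is constant over time and, under these conventions, equals $2k(1+|\mathcal{N}({\textbf{s}})|)+2|\mathcal{N}({\textbf{s}})|$ (28 at the centre with $k=2$). The total still differs across locations with different neighborhood sizes, so the equalization is along the time dimension.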

2.4 Derivation of the criterion

With the modified composite likelihood, in this section we derive a criterion function for estimating the change-points and the model parameters in each segment. The criterion is based on the minimum description length (MDL) principle, which selects the model that yields the shortest code length for storing the data (Rissanen, 2012), and has shown promising performance for change-point estimation; see Davis et al., (2006, 2008); Lu et al., (2010).

One classical way to construct the MDL is the two-stage approach (Hansen and Yu, 2001; Lee, 2001), which splits the code length $\mathrm{CL}(\mathbf{Y})$ into two components:

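$$\mathrm{CL}(\mathbf{Y})=\mathrm{CL}(\widehat{\mathcal{M}})+\mathrm{CL}(\mathbf{Y}\mid\widehat{\mathcal{M}}),$$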

where $\mathrm{CL}(\widehat{\mathcal{M}})$ is the code length for the fitted model $\widehat{\mathcal{M}}$ and $\mathrm{CL}(\mathbf{Y}\mid\widehat{\mathcal{M}})$ is the information in $\mathbf{Y}$ unexplained by $\widehat{\mathcal{M}}$ . Recall that the model parameter set of the $j$ th segment is specified by $\psi_{j}=(\xi_{j},\theta_{j})$ , where $\xi_{j}$ is the model order and $\theta_{j}$ is the model parameter. Given $\xi_{j}$ , the composite likelihood estimator of $\theta_{j}$ is obtained by $\hat{\theta}_{j}=\operatorname*{arg\,max}_{\theta\in\Theta(\xi_{j})}L_{ST}^{(j)}\{(\xi_{j},\theta);\mathbf{X}_{j}\}$ , where $L_{ST}^{(j)}$ is defined in (2.3). Since the fitted model $\widehat{\mathcal{M}}$ can be completely described by $m$ , $\lambda_{j}$ ’s and $\psi_{j}$ ’s, we have

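$$\mathrm{CL}(\widehat{\mathcal{M}})=\mathrm{CL}(m)+\mathrm{CL}(\lambda_{1},\ldots,\lambda_{m})+\sum_{j=1}^{m+1}\mathrm{CL}(\xi_{j},\hat{\theta}_{j}).$$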

Note that $(\lambda_{1},\ldots,\lambda_{m})$ carries information equivalent to the integer-valued vector $(T_{1},\ldots,T_{m+1})$, and the code length for an integer $I$ is approximately $\log_{2}I$. From Hansen and Yu, (2001) and Lee, (2001), the code length for an estimate of a real-valued parameter depends on the precision of the optimal quantization of the parameter space, which is related to the standard error of the estimate. In particular, under increasing domain asymptotics and some regularity conditions, a parameter estimator computed from $N$ observations is $\sqrt{N}$-consistent, and hence the code length for each estimated parameter is approximately $(\log_{2}N)/2$. Thus, we have

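$$\mathrm{CL}(\widehat{\mathcal{M}})=\log_{2}m+\sum_{j=1}^{m+1}\log_{2}T_{j}+\sum_{j=1}^{m+1}\sum_{i=1}^{c_{j}}\log_{2}\xi_{i,j}+\sum_{j=1}^{m+1}\frac{d_{j}}{2}\log_{2}(ST_{j}),$$

where $\xi_{j}=(\xi_{1,j},\ldots,\xi_{c_{j},j})$ and $d_{j}=d_{j}(\xi_{j})$ denotes the dimension of $\theta_{j}$.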

We remark that since the segment length $T_{j}$ is bounded above by $T$ and $\mathcal{M}$ contains a finite number of models, the uniform codes $(m+1)\log_{2}T$ and $(m+1)\log_{2}|\mathcal{M}|$ can instead be used for the second and third terms in (6), leading to the same asymptotic results. However, in finite samples we find that $\log_{2}T_{j}$ achieves higher detection power owing to its smaller magnitude, and $\sum_{i=1}^{c_{j}}\log_{2}\xi_{i,j}$ encourages parsimony because lower-index models in $\mathcal{M}$ are less complex.
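The bookkeeping of the model code length can be sketched in a few lines. This is an illustration, not the authors' implementation: it assumes, in line with the description above, that $\mathrm{CL}(\widehat{\mathcal{M}})$ adds $\log_{2}m$ bits for the number of change-points, $\log_{2}T_{j}$ bits per segment length, $\log_{2}\xi_{i,j}$ bits per model-order index, and $(d_{j}/2)\log_{2}(ST_{j})$ bits for the $d_{j}$ parameters of segment $j$, which are estimated from $ST_{j}$ observations. The function name and the convention for $m=0$ are hypothetical.

```python
import math


def model_code_length(T_seg, orders, dims, S):
    """Illustrative two-stage code length CL(M_hat) in bits.

    T_seg  : segment lengths T_1, ..., T_{m+1}
    orders : model orders xi_j, each a tuple of positive integers
    dims   : parameter dimensions d_j of theta_j
    S      : number of spatial locations
    """
    m = len(T_seg) - 1
    bits = math.log2(m) if m > 0 else 0.0  # number of change-points (m = 0 convention is hypothetical)
    bits += sum(math.log2(T) for T in T_seg)  # segment lengths, equivalent to the lambdas
    bits += sum(math.log2(x) for xi in orders for x in xi)  # model-order indices
    bits += sum(0.5 * d * math.log2(S * T) for d, T in zip(dims, T_seg))  # (log2 N)/2 per parameter
    return bits


# Two segments of length 64 on S = 16 sites, with 3 and 4 parameters.
print(model_code_length([64, 64], [(1, 1), (2, 1)], [3, 4], S=16))  # 48.0
```

Here $0+(6+6)+1+(15+20)=48$ bits; with the uniform codes mentioned above, the second and third sums would be replaced by $(m+1)\log_{2}T$ and $(m+1)\log_{2}|\mathcal{M}|$.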

As demonstrated by Rissanen, (2012), $\mathrm{CL}(\mathbf{Y}\mid\widehat{\mathcal{M}})=-\log_{2}L$, where $L$ is the maximized full likelihood. Treating the composite likelihood as a proxy for the full likelihood and using the natural logarithm instead of base $2$, we can define $\mathrm{CL}(\mathbf{Y}\mid\widehat{\mathcal{M}})$ as the sum over segments of the negative of (2.3). However, by construction, each data point may appear several times in the composite likelihood function. In particular, for any $\mathbf{X}_{j}$ with length $T_{j}\geq 2k$, the average number of times that an observation is used in the composite log-likelihood $L_{ST}^{(j)}(\psi_{j},\mathbf{X}_{j})$ equals

[TABLE]

where $\mbox{Card}(\cdot)$ denotes the cardinality of a set. Indeed, if the data are independent and identically distributed, each bivariate log-density splits into the sum of two marginal log-densities, so the composite log-likelihood is a weighted sum of marginal log-densities and $L_{ST}^{(j)}(\psi_{j},\mathbf{X}_{j})$ is essentially $C_{k,\mathcal{N}}$ times the full log-likelihood. Thus, we compensate the code length by multiplying $\mathrm{CL}(\widehat{\mathcal{M}})$ by the factor $C_{k,\mathcal{N}}$ and define the composite likelihood-MDL (CLMDL) criterion as

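$$\mathrm{CLMDL}(m,\Lambda,\Psi)=C_{k,\mathcal{N}}\cdot\mathrm{CL}(\widehat{\mathcal{M}})-\sum_{j=1}^{m+1}L_{ST}^{(j)}(\psi_{j};\mathbf{X}_{j}).$$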

We remark that, unlike the classical MDL criterion, the proposed CLMDL criterion in (8) may no longer be interpreted as a code-length function, since an adjusted composite likelihood is used. Nevertheless, as will be seen, this construction still balances lack of fit against model complexity and yields consistent estimation of change-points and model parameters.
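The role of the scaling factor $C_{k,\mathcal{N}}$ can be checked numerically. Under independence, each bivariate log-density splits into two marginal log-densities, so the compensated composite log-likelihood is a multiple of the full log-likelihood whenever every observation has the same multiplicity. The sketch assumes i.i.d. $N(0,1)$ data with known parameters and a hypothetical ring of $S$ sites in which $\mathcal{N}({\textbf{s}})$ consists of the two adjacent sites; with a location also paired with itself at positive lags and lag-0 pairs counted in both orders, every observation then appears $C_{k,\mathcal{N}}=6k+4$ times.

```python
import math
import random


def norm_logpdf(x):
    """Log-density of N(0, 1)."""
    return -0.5 * (x * x + math.log(2 * math.pi))


def composite_loglik_iid(x, k):
    """Compensated composite log-likelihood for i.i.d. N(0, 1) data x[t][s].

    Hypothetical ring geometry: N(s) = {s - 1, s + 1} (mod S), so |N(s)| = 2.
    """
    T_j, S = len(x), len(x[0])
    total = 0.0
    for i in range(k + 1):
        for t in range(T_j - i):  # 0-indexed time
            for s1 in range(S):
                partners = [(s1 - 1) % S, (s1 + 1) % S] + ([s1] if i > 0 else [])
                for s2 in partners:
                    # independence: log f(x1, x2) = log f(x1) + log f(x2)
                    total += norm_logpdf(x[t][s1]) + norm_logpdf(x[t + i][s2])
    for i in range(1, k + 1):  # edge compensation at both ends
        reps = (k - i + 1) * (1 + 2)
        for s in range(S):
            total += reps * (norm_logpdf(x[i - 1][s]) + norm_logpdf(x[T_j - i][s]))
    return total


random.seed(0)
k, S, T_j = 2, 6, 10
x = [[random.gauss(0.0, 1.0) for _ in range(S)] for _ in range(T_j)]
full = sum(norm_logpdf(v) for row in x for v in row)
C = 6 * k + 4  # 4 lag-0 terms plus 6 terms per positive lag
print(math.isclose(composite_loglik_iid(x, k), C * full))  # True
```

On a grid whose boundary sites have smaller neighborhoods, the multiplicity varies across locations and the relation holds only approximately, which matches the word "essentially" in the text.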

CLMDL-based estimation: We estimate the unknown true parameters $(m_{o},\Lambda^{o},\Psi^{o})$ by minimizing the CLMDL criterion (8). To ensure identifiability of the change-points, we impose a tuning parameter $\epsilon_{\lambda}\in(0,1/2)$ such that $\min_{1\leq j\leq m+1}|\lambda_{j}-\lambda_{j-1}|\geq\epsilon_{\lambda}$. In other words, we only consider change-point configurations satisfying

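$$0=\lambda_{0}<\lambda_{1}<\cdots<\lambda_{m}<\lambda_{m+1}=1,\qquad\lambda_{j}-\lambda_{j-1}\geq\epsilon_{\lambda}\ \text{ for }j=1,\ldots,m+1.$$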

Thus, the number of change-points is bounded above by $M_{\lambda}:=[1/\epsilon_{\lambda}-1]$. For theoretical validity, we require $\epsilon_{\lambda}\leq\epsilon_{\lambda}^{o}$, where $\epsilon_{\lambda}^{o}=\min_{1\leq j\leq m_{o}+1}(\lambda_{j}^{o}-\lambda_{j-1}^{o})$ is the minimum spacing between true change-points. This is a common assumption in the change-point estimation literature for parametric models; see, e.g., Andrews, (1993), Davis et al., (2006), Ling, (2014), Ma and Yau, (2016), and Romano et al., (2022). A sensitivity analysis in the supplement shows that CLMDL is robust to the choice of $\epsilon_{\lambda}$. Denote by $\mathcal{M}^{m+1}$ the $(m+1)$-fold Cartesian product of $\mathcal{M}$.

The estimated number and locations of change-points and parameters in each segment are thus

[TABLE]

where $\hat{\Lambda}_{ST}=(\hat{\lambda}_{1},\ldots,\hat{\lambda}_{\hat{m}})$ and $\hat{\Psi}_{ST}=(\hat{\psi}_{1},\ldots,\hat{\psi}_{\hat{m}+1})$ with $\hat{\psi}_{j}=(\hat{\xi}_{j},\hat{\theta}_{j})$ . Note that

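$$\hat{\theta}_{j}=\operatorname*{arg\,max}_{\theta\in\Theta(\hat{\xi}_{j})}L_{ST}^{(j)}\{(\hat{\xi}_{j},\theta);\hat{\mathbf{X}}_{j}\}$$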

is the composite likelihood estimator of the model parameters of the $j$ th estimated segment of the spatio-temporal process, i.e. $\hat{\mathbf{X}}_{j}=\left\{\mathbf{y}_{t}:[T\hat{\lambda}_{j-1}]+1\leq t\leq[T\hat{\lambda}_{j}]\right\}$ .
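For small problems, the constrained minimization can be carried out exactly. If every term of the criterion except the one that depends only on $m$ is a sum of per-segment contributions, a segment-neighbourhood dynamic program over the number of segments finds the global minimizer. The sketch below is a generic routine written for illustration, not necessarily the search algorithm used by the authors: seg_cost(a, b) is a hypothetical callable returning the per-segment part of the criterion for 0-indexed times $a,\ldots,b-1$, m_penalty(m) holds the term depending on $m$, and the spacing constraint is imposed approximately as a minimum segment length of $\lceil\epsilon_{\lambda}T\rceil$ time points.

```python
import math


def best_partition(T, seg_cost, eps, m_penalty):
    """Exactly minimize m_penalty(m) + sum of seg_cost over the m + 1 segments.

    Segments are half-open ranges [a, b) of 0-indexed times, each containing at
    least ceil(eps * T) points; at most floor(1/eps - 1) change-points are
    allowed, mirroring M_lambda. Returns (m_hat, sorted change-point indices).
    """
    min_len = max(1, math.ceil(eps * T - 1e-9))  # tiny offsets guard against float error
    M = math.floor(1 / eps - 1 + 1e-9)
    INF = float("inf")
    # best[j][b]: smallest total segment cost for splitting [0, b) into j + 1 segments
    best = [[INF] * (T + 1) for _ in range(M + 1)]
    back = [[0] * (T + 1) for _ in range(M + 1)]
    for b in range(min_len, T + 1):
        best[0][b] = seg_cost(0, b)
    for j in range(1, M + 1):
        for b in range((j + 1) * min_len, T + 1):
            for a in range(j * min_len, b - min_len + 1):
                cost = best[j - 1][a] + seg_cost(a, b)
                if cost < best[j][b]:
                    best[j][b], back[j][b] = cost, a
    # add the term that depends only on m, then pick the best number of change-points
    m_hat = min(range(M + 1), key=lambda m: best[m][T] + m_penalty(m))
    change_points, b = [], T
    for j in range(m_hat, 0, -1):  # trace the optimal split locations backwards
        b = back[j][b]
        change_points.append(b)
    return m_hat, sorted(change_points)


# Toy usage: piecewise-constant data and a hypothetical least-squares segment cost.
y = [0.0] * 10 + [5.0] * 10


def sse_cost(a, b):
    seg = y[a:b]
    mu = sum(seg) / len(seg)
    return sum((v - mu) ** 2 for v in seg) + 1.0  # + 1.0: hypothetical per-segment penalty


print(best_partition(len(y), sse_cost, eps=0.2, m_penalty=lambda m: 0.0))  # (1, [10])
```

The cost grows as $O(M_{\lambda}T^{2})$ segment evaluations, which is affordable only for moderate $T$; in practice a faster heuristic search would typically replace the exhaustive recursion.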

3 Main Results under Increasing Domain Asymptotics

In this section, we first impose some mild regularity conditions on the composite log-likelihood and the strong-mixing coefficients of the spatio-temporal process, and then present the main results.

For the asymptotic theory, we assume that the piecewise stationary spatio-temporal process is generated by a random field in a (possibly unevenly spaced) lattice $\mathcal{Z}=\mathbb{N}^{+}\times\mathcal{S}\subset\mathbb{N}^{+}\times\mathbb{R}^{2}$ . The data $\mathbf{Y}$ is observed on $\mathcal{Z}_{n}=\mathcal{T}_{n}\times\mathcal{S}_{n}$ with $\mathcal{T}_{n}=[1,T_{n}]$ and $|\mathcal{S}_{n}|=S_{n}$ . In other words, we have $\mathbf{Y}=\{y_{t,{\textbf{s}}}:t\in[1,T_{n}],~{}{\textbf{s}}\in\mathcal{S}_{n}\}$ . The asymptotic theory is based on $n\longrightarrow\infty$ , where the number of observations $|\mathcal{Z}_{n}|=S_{n}T_{n}>|\mathcal{Z}_{n^{\prime}}|=S_{n^{\prime}}T_{n^{\prime}}$ whenever $n>n^{\prime}$ . For notational simplicity, in the following we use $(S,T)$ instead of $(S_{n},T_{n})$ when there is no possibility of confusion.

We define a metric $\rho$ on $\mathcal{Z}$ by $\rho({\textbf{d}}_{1},{\textbf{d}}_{2})=\max(|t_{2}-t_{1}|,|s_{2}^{1}-s_{1}^{1}|,|s_{2}^{2}-s_{1}^{2}|)$, where ${\textbf{d}}_{i}=(t_{i},{\textbf{s}}_{i})$, ${\textbf{s}}_{i}=(s_{i}^{1},s_{i}^{2})$, $i=1,2$, denote any two points in $\mathcal{Z}$. The distance between any two subsets $U,V\subseteq\mathcal{Z}$ is further defined as $\rho(U,V)=\inf\{\rho({\textbf{d}}_{1},{\textbf{d}}_{2}):{\textbf{d}}_{1}\in U\text{ and }{\textbf{d}}_{2}\in V\}$. In this section, the theoretical results are established under the increasing domain asymptotics framework as in Jenish and Prucha, (2009) and Bai et al., (2012), which is made explicit by Assumption 1.

Assumption 1.

The lattice $\mathcal{Z}\subset\mathbb{N}^{+}\times\mathbb{R}^{2}$ is countably infinite. All elements in $\mathcal{Z}$ are located at distances of at least $\rho_{0}>0$ from each other, i.e., for all distinct ${\textbf{d}}_{1},{\textbf{d}}_{2}\in\mathcal{Z}$, we have $\rho({\textbf{d}}_{1},{\textbf{d}}_{2})\geq\rho_{0}$.

Assumption 1 allows for unevenly spaced locations and general forms of sampling regions, which are often encountered in real data. Under Assumption 1, the cardinality of every neighborhood set $\mathcal{N}({\textbf{s}})$ is bounded by a constant $B_{\mathcal{N}}$, since only finitely many points separated by at least $\rho_{0}$ can lie within distance $d$ of ${\textbf{s}}$. Throughout this section, we assume the time lag used in CLMDL is $k$ and the maximum cardinality of $\mathcal{N}({\textbf{s}})$ in CLMDL is $B_{\mathcal{N}}$.
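A direct transcription of the metric $\rho$, of the set distance $\rho(U,V)$ for finite sets (where the infimum is a minimum), and of the minimum separation that Assumption 1 bounds below by $\rho_{0}$. The tuple layout (t, (s1, s2)) and the function names are choices made for this illustration.

```python
def rho(d1, d2):
    """Max-metric between space-time points written as (t, (s1, s2))."""
    (t1, (a1, b1)), (t2, (a2, b2)) = d1, d2
    return max(abs(t2 - t1), abs(a2 - a1), abs(b2 - b1))


def rho_sets(U, V):
    """rho(U, V) for finite index sets, where the infimum is a minimum."""
    return min(rho(u, v) for u in U for v in V)


def min_separation(points):
    """Smallest distance between distinct points; Assumption 1 needs it >= rho_0 > 0."""
    pts = list(points)
    return min(rho(p, q) for i, p in enumerate(pts) for q in pts[i + 1:])


print(rho((1, (0.0, 0.0)), (3, (0.5, -1.0))))  # 2
print(rho_sets([(1, (0, 0)), (2, (0, 0))], [(5, (0, 1)), (4, (3, 3))]))  # 3
grid = [(t, (x, y)) for t in (1, 2) for x in (0, 1) for y in (0.0, 0.5)]
print(min_separation(grid))  # 0.5
```

Because time is an integer coordinate, two observations at different times are always at least one unit apart, so in practice $\rho_{0}$ is driven by the closest pair of spatial locations.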

Assumption 2 ( $r$ ).

There exists an $\epsilon>0$ such that for any fixed model order $\xi$, we have

(i) for $a=0$ and each stationary segment $\mathbf{X}_{j}^{o}$, $j=1,\ldots,m_{o}+1$,

[TABLE]

(ii) for $a=1,2$, the above moment conditions hold with $r=2$,

where $l_{marg}$ and $l_{pair}$ are defined in (2.3) and $[a]$ stands for the $a$th order derivative w.r.t. $\theta$.

Assumption 2( $r$ ) requires that the composite log-likelihood (2.3) is twice continuously differentiable and has a finite $(r+\epsilon)$ th moment, and its first and second order derivatives have a finite $(2+\epsilon)$ th moment. Note that if the data $\mathbf{Y}$ is observed on a regular lattice, by the piecewise stationarity assumption, the supremum w.r.t. $(S,T)$ can be dropped in Assumption 2( $r$ ).

Given $\psi$, define the expected composite log-likelihood of each stationary segment as $\bar{L}_{ST}^{(j)}(\psi)=\mathbb{E}\{L_{ST}^{(j)}(\psi;\mathbf{X}_{j}^{o})\}$. The derivatives $\bar{L}_{ST}^{\prime(j)}(\psi)=\mathbb{E}\{L_{ST}^{\prime(j)}(\psi;\mathbf{X}_{j}^{o})\}$ and $\bar{L}_{ST}^{\prime\prime(j)}(\psi)=\mathbb{E}\{L_{ST}^{\prime\prime(j)}(\psi;\mathbf{X}_{j}^{o})\}$ are defined similarly.

Assumption 3.

For each stationary segment $\mathbf{X}_{j}^{o}$ of the random field, where $j=1,\ldots,m_{o}+1$ ,

(i) there exists a model $\xi_{j}^{o}\in\mathcal{M}$ with a parameter $\theta_{j}^{o}\in\mathbb{R}^{d_{j}^{o}}$ satisfying

[TABLE]

The model order $\xi_{j}^{o}$ is uniquely identifiable in the sense that if there exists another model $(\xi_{j}^{*},\theta_{j}^{*})\neq(\xi_{j}^{o},\theta_{j}^{o})$ with $\theta_{j}^{*}\in\mathbb{R}^{d_{j}^{*}}$ and $\bar{L}^{(j)}_{ST}\{(\xi_{j}^{*},\theta_{j}^{*})\}=\bar{L}^{(j)}_{ST}\{(\xi_{j}^{o},\theta_{j}^{o})\}$, then $d_{j}^{*}>d_{j}^{o}$. Moreover, for any $\delta>0$, we have $\sup\limits_{S,T}\frac{1}{ST}\left(\sup_{\|\theta-\theta^{o}_{j}\|_{2}>\delta}\bar{L}^{(j)}_{ST}\{(\xi_{j}^{o},\theta)\}-\bar{L}^{(j)}_{ST}\{(\xi_{j}^{o},\theta_{j}^{o})\}\right)<0$. The same holds for $(\xi_{j}^{*},\theta_{j}^{*})$.

(ii) $\sup\limits_{S,T}\frac{1}{ST}\left(\bar{L}^{(j)}_{ST}(\psi_{j-1}^{o})-\bar{L}^{(j)}_{ST}(\psi_{j}^{o})\right)<0$ and $\sup\limits_{S,T}\frac{1}{ST}\left(\bar{L}^{(j-1)}_{ST}(\psi_{j}^{o})-\bar{L}^{(j-1)}_{ST}(\psi_{j-1}^{o})\right)<0$, where $\psi_{j-1}^{o}$ and $\psi_{j}^{o}$ are defined in (i).

As above, if the data $\mathbf{Y}$ is observed on a regular lattice, then by the piecewise stationarity assumption the supremum w.r.t. $(S,T)$ can be dropped. Note that Assumption 3 does not require the stationary process to belong to the model class $\mathcal{M}$. Instead, Assumption 3(i) only assumes the existence of a pseudo-true model $\psi_{j}^{o}$ in $\mathcal{M}$ in its simplest form; that is, the model cannot be expressed by another model $\psi_{j}^{*}=(\xi_{j}^{*},\theta_{j}^{*})$ in $\mathcal{M}$ with $\theta_{j}^{*}$ of smaller dimension. The last statement in Assumption 3(i) asserts that, for each of the model orders $\xi^{o}_{j}$ and $\xi^{*}_{j}$, the corresponding point $\theta^{o}_{j}$ or $\theta^{*}_{j}$ is the unique, well-separated maximizer of the expected composite likelihood.

Assumption 3(ii) rules out the degenerate case in which the $(j-1)$th and $j$th stationary segments are indistinguishable by the composite likelihood. Assumption 3(ii) may fail if two adjacent segments follow different models but have the same expected composite likelihood. In the supplement, we give two pathological examples where such scenarios may occur. In principle, this situation can always be avoided by using a larger time lag $k$ and spatial neighborhood $\mathcal{N}$ when defining (2.3). As noted by Ma and Yau, (2016), this situation seldom occurs in practice, and $k=1$ or $2$ is usually sufficient to distinguish between stationary segments.

For more intuition regarding Assumption 3(ii), consider the case where two adjacent segments share the same model order $\xi_{j-1}^{o}=\xi_{j}^{o}$, so that $\theta_{j-1}^{o}$ and $\theta_{j}^{o}$ can be compared directly. In view of Assumption 3(i) and the boundedness of the first order derivatives in Assumption 2(ii), Assumption 3(ii) is then equivalent to requiring $\|\theta_{j-1}^{o}-\theta_{j}^{o}\|_{2}>\epsilon$ for some $\epsilon>0$. In other words, the change size of the model parameter is non-vanishing. In Section 3.1.1, we further show that Assumption 3(ii) can be relaxed to allow vanishing change sizes under additional conditions.

The next two assumptions regulate the dependence structure of the spatio-temporal process by imposing mild $\alpha$ -mixing conditions on the underlying random field. For the $j$ th stationary segment $\mathbf{X}_{j}^{o}$ of the random field, denote $\sigma_{\mathbf{X}_{j}^{o}}(U)=\sigma(\dot{x}_{t,{\textbf{s}}}^{(j)}:(t,{\textbf{s}})\in U)$ as the sigma field generated by the random variables in the index set $U\subset\mathbb{N}^{+}\times\mathcal{S}.$ Define

[TABLE]

The $\alpha$ -mixing coefficient for the $j$ th stationary segment $\mathbf{X}_{j}^{o}$ is defined as

[TABLE]

Also, denote $M=(1+k)(1+B_{\mathcal{N}})$, where $2M$ is an upper bound on the number of pairwise components $l_{pair}$ in $\mathcal{D}_{k,\mathcal{N}}^{(j)}$ in which any point $(t,{\textbf{s}})$ can appear.

Assumption 4 ( $r$ ).

For each stationary segment $\mathbf{X}_{j}^{o}$ , where $j=1,\ldots,m_{o}+1$ , there exist some $\epsilon>0$ and $c\in 2\mathbb{N}^{+}$ where $c>r$ , such that for all $u,v\in\mathbb{N}^{+}$ , $u+v\leq c$ , $u,v\geq 2$ , we have

[TABLE]

Assumption 4 requires a polynomial decay rate for the $\alpha$-mixing coefficient of the random field, which is mild and is used to invoke the moment inequality in Doukhan, (1994) to control the asymptotic size of the deviation of the composite likelihood from its expectation. Note that the mixing rate in Assumption 4 depends on $M=(1+k)(1+B_{\mathcal{N}})$. This is intuitive, since a longer time lag $k$ and a larger neighborhood size $B_{\mathcal{N}}$ induce stronger dependence among the composite likelihood components and thus require stronger conditions on the mixing coefficients. Define also

[TABLE]

which characterizes the dependence of the spatio-temporal process along the time dimension, and can be regarded as an analogue to the classical $\alpha$ -mixing coefficient in the time series setting.

Assumption 5.

For each stationary segment $\mathbf{X}_{j}^{o}$ of the random field, where $j=1,\ldots,m_{o}+1$ , there exist $r>2$ , $\delta>0$ , $\tau>r(r+\delta)/(2\delta)$ and $C>0$ , such that for any fixed model order $\xi$ ,

[TABLE]

Assumption 5(i) is a moment condition on the first order derivative of the composite likelihood. Assumption 5(ii) is a mild mixing condition along the time dimension and is used to invoke a maximal moment inequality in Yang, (2007) that controls the asymptotic size of the first order derivative of the composite likelihood at the true parameter value. Note that for a fixed moment order $c=r+\delta$, the slowest admissible polynomial decay rate for the mixing condition is $\tau>\frac{c}{c-2}$, which is approached as $r\to 2$. Thus, a higher order moment condition on the first order derivative allows a weaker mixing condition. Next, we impose Assumptions 6 and 7, which are standard in establishing the asymptotic distribution of the parameter estimator $\hat{\theta}_{j}$ in a stationary segment; see Assumption 3 in Jenish and Prucha, (2009) and Assumptions 6–8 in Bai et al., (2012).
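The trade-off between moments and mixing in Assumption 5 can be tabulated directly. The sketch fixes the moment order $c=r+\delta$ (the value $c=6$ is arbitrary) and evaluates the threshold $r(r+\delta)/(2\delta)$ that $\tau$ must exceed as $r$ decreases towards 2, where it approaches $c/(c-2)$.

```python
def min_tau(r, delta):
    """Threshold r (r + delta) / (2 delta) that the mixing exponent tau must exceed."""
    return r * (r + delta) / (2 * delta)


c = 6.0  # an arbitrary fixed moment order c = r + delta
for r in (4.0, 3.0, 2.5, 2.01):
    print(f"r = {r}: tau must exceed {min_tau(r, c - r):.3f}")
print(f"limit as r -> 2: c / (c - 2) = {c / (c - 2)}")  # 1.5
```

Writing the threshold as $rc/\{2(c-r)\}$ shows it is increasing in $r$ for fixed $c$, which is why the infimum is attained in the limit $r\to 2$.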

Assumption 6.

For each stationary segment $\mathbf{X}_{j}^{o}$ of the random field, where $j=1,\ldots,m_{o}+1$ , there exists some $\delta>0$ such that (i) $\sum_{d=1}^{\infty}\alpha_{\mathbf{X}_{j}^{o}}(d;M,M)d^{3(2+\delta)/\delta-1}<\infty$ , (ii) $\sum_{d=1}^{\infty}d^{2}\alpha_{\mathbf{X}_{j}^{o}}(d;Mu,Mv)<\infty$ for $u+v\leq 4$ , and (iii) $\alpha_{\mathbf{X}_{j}^{o}}(d;M,\infty)=O(d^{-3-\epsilon})$ for some $\epsilon>0$ .

Assumption 7.

For each stationary segment $\mathbf{X}_{j}^{o}$ of the random field, we have

[TABLE]

where $j=1,\ldots,m_{o}+1$ , and $\Sigma_{1}^{(j)}$ and $\Sigma_{2}^{(j)}$ are positive definite matrices.

Remark 1 (Mixing conditions).

Define $\alpha_{\mathbf{X}_{j}^{o}}(d):=\sup_{u\geq 1}\sup_{v\geq 1}\alpha_{\mathbf{X}_{j}^{o}}(d;u,v)$, which is a mixing coefficient commonly used in the random field literature; see Berkes and Morrow, (1981) and Doukhan, (1994) (Section 1.3). Clearly, $\alpha_{\mathbf{X}_{j}^{o}}^{*}(d)\leq\alpha_{\mathbf{X}_{j}^{o}}(d)$. Thus, all mixing conditions in Assumptions 4, 5(ii) and 6 hold if $\alpha_{\mathbf{X}_{j}^{o}}(d)$ decays at a sufficiently fast polynomial rate. This is a mild condition that is satisfied by many widely used spatio-temporal parametric models, such as a Gaussian random field with a separable space-time Matérn covariance function or with other popular non-separable space-time covariance functions proposed in Stein, (2005) and Fuentes et al., (2008). To conserve space, we refer to the supplement for more details.

3.1 Consistency of CLMDL under increasing domain asymptotics

We now present the theoretical results. We begin with a somewhat surprising finding on the consistency of change-point estimation using the composite likelihood function alone without any penalty for model complexity. It requires the existence of the $(2+\epsilon)$ th moments of the composite likelihood in (2.3) and its derivatives, and a mild divergence rate requirement between $T$ and $S$ .

Analogous to the CLMDL based estimator in (10), define the CL based estimator

[TABLE]

where $\tilde{\Lambda}_{ST}=(\tilde{\lambda}_{1},\ldots,\tilde{\lambda}_{\tilde{m}})$ , $\tilde{\Psi}_{ST}=(\tilde{\psi}_{1},\ldots,\tilde{\psi}_{\tilde{m}+1})$ with $\tilde{\psi}_{j}=(\tilde{\xi}_{j},\tilde{\theta}_{j})$ , and the estimated model parameter is $\tilde{\theta}_{j}=\operatorname*{arg\,max}_{\theta\in\Theta(\tilde{\xi}_{j})}L_{ST}^{(j)}\{(\tilde{\xi}_{j},\theta);{\tilde{\mathbf{X}}_{j}}\}$ with $\tilde{\mathbf{X}}_{j}=\Big{\{}\mathbf{y}_{t}:[T\tilde{\lambda}_{j-1}]+1\leq t\leq[T\tilde{\lambda}_{j}]\Big{\}}$ .

Proposition 1.

Let $\mathbf{Y}$ be a piecewise stationary spatio-temporal process specified by $(m_{o},\Lambda^{o},\Psi^{o})$ , and Assumptions 1, 2( $r$ ), 3 and 4( $r$ ) hold with some $r>2$ .

(i). If $S,T\longrightarrow\infty$ , with probability going to 1, we have $\tilde{m}\geq m_{o}$ and for each $j=1,\ldots,m_{o}$ , there exists a $\tilde{\lambda}_{i_{j}}\in\tilde{\Lambda}_{ST},1\leq i_{j}\leq\tilde{m}$ , such that $[T\lambda_{j}^{o}]=[T\tilde{\lambda}_{i_{j}}]$ .

(ii). If, in addition, $T\cdot S^{-r/2}\longrightarrow 0$ and Assumption 5 holds, with probability going to 1, we have

[TABLE]

and $\tilde{\theta}_{j}$ has a dimension greater than or equal to $d_{j}^{o}=d_{j}(\xi^{o}_{j})$ for $j=1,\ldots,m_{o}+1$ .

The consistency of the CL-based change-point estimator in Proposition 1(ii) can be viewed as the opposite phenomenon to the inconsistency caused by the edge effect discussed in Section 2.3. It is due to the compensating mechanism of the proposed composite likelihood (which is designed to correct the edge effect) and the fact that the spatial dimension $S$ diverges (i.e. $S\longrightarrow\infty$ ).

The high-level intuition is as follows. Consider the $j$th stationary segment $\mathbf{X}_{j}^{o}$. Assigning one false change-point to $\mathbf{X}_{j}^{o}$ changes the CL criterion function in (12) by the difference between the dropped pairwise log-likelihoods and the corresponding compensating marginal log-likelihoods evaluated at $\theta_{j}^{o}$ (plus an $O_{p}(1)$ term due to the estimation error of the model parameters). Compared to the dropped pairwise log-likelihoods, the compensating marginal log-likelihoods incorrectly impose temporal independence on the stationary segment. Thus, by the definition of $\theta_{j}^{o}$ in Assumption 3(i), the expected value of this difference is positive and of order $O(S)$. Based on a moment inequality, we can further show that the actual value of this difference concentrates around its expectation, and is therefore positive and of order $O_{p}(S)$. Consequently, a model with an over-estimated change-point has a significantly larger criterion function value than the true model and will not be selected. In contrast, in the classical time series setting, $S$ is fixed and consistent estimation of $m_{o}$ requires an extra penalty term in the criterion function to suppress over-estimation.
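The sign of this difference can be illustrated in the simplest possible case. Suppose, purely for illustration, that a dropped pair consists of two standard Gaussian observations with correlation $\rho$ and that the parameters are known. The expected gap between the pairwise log-density and the two compensating marginal log-densities is then the mutual information $-\tfrac{1}{2}\log(1-\rho^{2})$, which is positive whenever $\rho\neq 0$; with one such pair per location (e.g., $k=1$ and no spatial neighbors), a false change-point lowers the expected composite log-likelihood by about $S$ times this amount. The Monte Carlo sketch below checks the closed form; the function name is hypothetical.

```python
import math
import random


def mean_pair_minus_marginals(rho, n=100_000, seed=1):
    """Monte Carlo mean of log f(x1, x2) - log f(x1) - log f(x2) for a standard
    bivariate Gaussian pair with correlation rho (parameters treated as known)."""
    rng = random.Random(seed)
    one_minus = 1.0 - rho * rho
    total = 0.0
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rho * x1 + math.sqrt(one_minus) * rng.gauss(0.0, 1.0)
        q = (x1 * x1 - 2.0 * rho * x1 * x2 + x2 * x2) / one_minus
        log_pair = -math.log(2.0 * math.pi) - 0.5 * math.log(one_minus) - 0.5 * q
        log_margs = -math.log(2.0 * math.pi) - 0.5 * (x1 * x1 + x2 * x2)
        total += log_pair - log_margs
    return total / n


rho = 0.5
print(mean_pair_minus_marginals(rho), -0.5 * math.log(1 - rho ** 2))  # both close to 0.144
```

When $\rho=0$ the two terms coincide exactly, which mirrors the fact that compensation with marginal log-likelihoods is harmless only under temporal independence.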

Proposition 1 only guarantees consistency in change-point estimation but not model selection. Indeed, introducing the MDL penalty term ensures consistency of model selection in each segment and improves the finite sample performance in terms of reducing false positives when there is no change-point. Theorem 1 states the consistency of our procedure based on the CLMDL criterion.

Theorem 1.

Let $\mathbf{Y}$ be a piecewise stationary spatio-temporal process specified by $(m_{o},\Lambda^{o},\Psi^{o})$ , and Assumptions 1, 2( $r$ ), 3 and 4( $r$ ) hold with some $r>2$ . Consider the CLMDL based estimator $(\hat{m},\hat{\Lambda}_{ST},\hat{\Psi}_{ST})$ defined in (10).

(i). If $S,T\longrightarrow\infty$ , with probability going to 1, we have $\hat{m}\geq m_{o}$ and for each $j=1,\ldots,m_{o}$ , there exists a $\hat{\lambda}_{i_{j}}\in\hat{\Lambda}_{ST},1\leq i_{j}\leq\hat{m}$ , such that $[T\lambda_{j}^{o}]=[T\hat{\lambda}_{i_{j}}]$ .

(ii). If, in addition, $T\cdot S^{-r/2}\longrightarrow 0$ and Assumption 5 holds, we further have

[TABLE]

with probability going to 1.

In Proposition 1 and Theorem 1, we have $\mathbb{P}([T\hat{\Lambda}_{ST}]=[T\Lambda^{o}])\longrightarrow 1$ , which indicates that asymptotically we can recover the exact location of the change-points without error. This result is different from the one under the multivariate time series setting, where an $O_{p}(1)$ error is observed asymptotically. This difference is due to the expansion of the spatial domain $\mathcal{S}$ under the spatio-temporal setting, which intuitively provides more information for the detection of change-points.

Remark 2 (Infill asymptotics).

An alternative asymptotic framework commonly used in the spatial statistics literature is infill asymptotics, where the time domain increases while the spatial sampling domain is fixed. In the supplement, we modify the CLMDL criterion to suit the infill setting and establish its estimation consistency. Intuitively, under the infill setting, since we sample from a bounded spatial domain, increasing the sample size $S$ along the space dimension may or may not accumulate more information for change-point estimation. In particular, we show that exact recovery of the change-point is possible if the change occurs in a microergodic parameter. Roughly speaking, a parameter is microergodic if a higher spatial sampling resolution improves its estimation accuracy (see Stein, (1999) and Zhang, (2004)). For changes in a non-microergodic parameter, the localization error attains the classical $O_{p}(1)$ rate. To conserve space, we refer to the supplement for more details.

To guarantee consistency of the estimated number of change-points, Proposition 1 and Theorem 1 require an additional polynomial condition on the relative divergence rates of $S$ and $T$, namely $T\cdot S^{-r/2}\longrightarrow 0$. This technical condition is needed to control, via a union bound, the asymptotic size of the terms arising from the compensating mechanism. Note that a higher moment order $r$ implies a less restrictive rate requirement. When $\mathbf{Y}$ is Gaussian, all moments of the composite likelihood exist, so the requirement holds whenever $T$ grows at most polynomially in $S$. Moreover, if the spatio-temporal process $\mathbf{Y}$ is observed on a regular lattice, then under an additional assumption on the mixing coefficients, a finer result in terms of the divergence rate between $S$ and $T$ can be obtained.

Theorem 2.

Let $\mathbf{Y}$ be a piecewise stationary spatio-temporal process specified by $(m_{o},\Lambda^{o},\Psi^{o})$ , and Assumptions 1, 2( $r$ ), 3, 4( $r$ ) and 5 hold with $r>2$ . Moreover, assume that

[TABLE]

for some $C>0$ , $\tau>2(1+\eta)(q+\delta)/(q-2+\delta)$ with some $q>2$ , $\delta>0$ and $0<\eta<1/2$ . Then, for the estimator $(\hat{m},\hat{\Lambda}_{ST},\hat{\Psi}_{ST})$ defined in (10), with probability going to 1, we have

[TABLE]

provided that $S,T\longrightarrow\infty$ with $\log T/S\longrightarrow 0$ .
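To compare the rate conditions of Proposition 1, Theorem 1 and Theorem 2 concretely, suppose, purely for illustration, that $T\asymp S^{a}$ for some $a>0$. Then

```latex
T\cdot S^{-r/2}\asymp S^{\,a-r/2}\longrightarrow 0 \iff r>2a,
\qquad
\frac{\log T}{S}\asymp\frac{a\log S}{S}\longrightarrow 0 \quad\text{for every } a>0 .
```

Thus the condition in Proposition 1 and Theorem 1 needs more moments the faster $T$ grows relative to $S$, whereas the condition of Theorem 2 holds for any polynomial growth, and even for $T=\exp(S^{b})$ with $b<1$, since then $\log T/S=S^{b-1}\to 0$.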

Remark 3.

The consistency results in Proposition 1 and Theorems 1 and 2 are based on the assumption of independence across stationary segments, which is common in the change-point estimation literature. In the supplement, we further relax this assumption and show that, similar to the classical time series setting, CLMDL can still achieve consistency with an $O_{p}(1)$ localization error rate when there is weak dependence across segments.

3.1.1 Vanishing change sizes

The exact recovery phenomenon in Theorems 1 and 2 suggests that CLMDL may also work with vanishing change sizes under increasing domain asymptotics, which is confirmed in this subsection.

To facilitate a transparent and intuitive definition of change sizes, in this subsection we assume that the model order of the pseudo-true parametric model in $\mathcal{M}$ is the same across the $m_{o}+1$ stationary segments, i.e., $\xi_{1}^{o}=\cdots=\xi_{m_{o}+1}^{o}=\xi^{o}$. Therefore, the pseudo-true model parameters $\theta_{j}^{o}$ of all $m_{o}+1$ stationary segments share the same meaning and the same dimension $d^{o}=d(\xi^{o})$, and can be compared directly. Specifically, the parameter change at the $j$th true change-point is defined as $\Delta_{j}=\theta_{j+1}^{o}-\theta_{j}^{o}$, and the change size is measured by its $l_{2}$-norm $\|\Delta_{j}\|_{2}$. Otherwise, if $\xi_{j}^{o}\neq\xi_{j+1}^{o}$, the difference between $\theta_{j}^{o}$ and $\theta_{j+1}^{o}$ is not well defined, and we would need to quantify the change size via the expected composite log-likelihood, which is less intuitive and interpretable.²

² With more tedious notation, we can extend our result to the case where the pseudo-true model orders $\{\xi_{1}^{o},\cdots,\xi_{m_{o}+1}^{o}\}$ differ but form nested models, such as space-time AR($q$) models with $q=1,\cdots,p$. In that case, $\theta_{j}^{o}$ can have different dimensions, but we can define the change size using the pseudo-true parameter $\theta_{j}^{*}$ at the highest model order, i.e., $\max\{\xi_{1}^{o},\cdots,\xi_{m_{o}+1}^{o}\}$, which shares the same meaning and dimension thanks to the nested structure.

To model the vanishing change size, we assume that $\Delta_{j}=\kappa\delta_{j}$ , where $\delta_{j}\in\mathbb{R}^{d^{o}}$ is a $d^{o}$ -dimensional vector with $\|\delta_{j}\|_{2}>0$ for $j=1,\cdots,m_{o}$ . The decay rate of the change size is controlled by $\kappa\in\mathbb{R}^{+}$ , which vanishes as $S,T$ increase, i.e. $\kappa=\kappa_{ST}\longrightarrow 0.$ Under this setting, it is easy to see that Assumption 3(ii) no longer holds, as the difference between $\theta_{j}^{o}$ and $\theta_{j+1}^{o}$ (and thus $\psi_{j}^{o}$ and $\psi_{j+1}^{o}$ ) converges to 0 as $S,T$ increase. Thus, a more delicate technical argument based on Taylor expansion is needed to quantify the difference between the composite likelihood functions in CLMDL, which requires the following stronger moment conditions on its derivatives.

Assumption 8 ( $r$ ).

Assumption 2(ii) holds for $a=1,2$ with some $r>2$.

Theorem 3 states that the CLMDL-based estimator $(\hat{m},\hat{\Lambda},\hat{\Psi})$ defined in (10) can still consistently estimate the change-points, provided that the change size $\kappa$ does not vanish too fast.

Theorem 3.

Let $\mathbf{Y}$ be a piecewise stationary spatio-temporal process specified by $(m_{o},\Lambda^{o},\Psi^{o})$, let Assumptions 1, 2( $r$ ), 3(i), 4( $r$ ), 5 and 8( $r$ ) hold with some $r>2$, and suppose that $T\cdot S^{-r/2}\longrightarrow 0$. Suppose further that the change size satisfies $T\cdot(S\kappa^{2})\longrightarrow\infty$.

(i). If $\liminf\limits_{S,T\to\infty}S\kappa^{2}>0$ , with probability going to 1, we have

[TABLE]

(ii). If $S\kappa^{2}\longrightarrow 0$ , with probability going to 1, we have

[TABLE]

and $\hat{\theta}_{j}$ has dimension greater than or equal to $d^{o}=d(\xi^{o})$ for $j=1,\cdots,m_{o}+1$. If, in addition, $S=O(T)$, we further have $\hat{\xi}_{j}=\xi^{o}$ and $\hat{\theta}_{j}\longrightarrow\theta_{j}^{o}$ for $j=1,\cdots,m_{o}+1$ with probability going to 1.

Theorem 3 suggests that in some sense $S\kappa^{2}$ can be viewed as the effective (squared) change size of the problem. In particular, Theorem 3(i) states that as long as $S\kappa^{2}$ does not vanish as $S,T$ increase, CLMDL can still provide consistent estimation for $m_{o}$ and achieve exact recovery of all change-points $[T\Lambda^{o}]$ , which resembles Theorem 1(i). Interestingly, Theorem 3(i) indicates that exact recovery can be achieved not only for a diverging $S\kappa^{2}\longrightarrow\infty$ but also for a constant/converging $S\kappa^{2}\longrightarrow c>0$ . This in fact can be attributed to the compensating mechanism of the proposed composite likelihood, which is designed to correct the edge effect, and shares a similar intuition as that of Proposition 1. We refer to Remark LABEL:remark:hdmean of the supplement for more details.

Theorem 3(ii) states that under a vanishing $S\kappa^{2}$, as long as $T\cdot(S\kappa^{2})\longrightarrow\infty$, CLMDL again consistently estimates $m_{o}$, though it no longer achieves exact recovery of $[T\Lambda^{o}]$. However, CLMDL still consistently estimates the normalized change-points, since by (17) we have $\big{|}\hat{\lambda}_{j}-\lambda_{j}^{o}\big{|}=O_{p}(\sqrt{1/(ST\kappa^{2})})=o_{p}(1)$. Because of the bias caused by the localization error of $[T\hat{\lambda}_{j}]$, CLMDL may over-estimate the model order $\xi^{o}$ on each segment unless $S=O(T)$.

Theorem 3(ii) also indicates that the existence of higher moments (a larger $r$) leads to a faster localization error rate. In particular, if the moment conditions hold for all $r>2$, then (17) implies that $\big{|}[T\hat{\lambda}_{j}]-[T\lambda_{j}^{o}]\big{|}=O_{p}((S\kappa^{2})^{-1-\delta})$ for any $\delta>0$. The dependence on $r$ stems from the use of a moment inequality in Doukhan, (1994), which holds for a general random field under mild $\alpha$-mixing conditions. Indeed, we show in LABEL:remark:sharper_rate of the supplement that, under some additional martingale difference or linear process assumptions on the score function, the generalized Hájek-Rényi inequality in Bai, (1994) sharpens the localization error rate (17) to $O_{p}((S\kappa^{2})^{-1})$ as long as $r>2$. However, it seems unnatural and difficult to cast a general spatio-temporal process into the framework of linear processes, and we thus opt to use a general (albeit less sharp) moment inequality for CLMDL.

To the best of our knowledge, the consistency and exact recovery properties of CLMDL for multiple change-point estimation in a parametric spatio-temporal model are new to the literature. This is significant, as parametric modeling is one of the main workhorses of spatio-temporal analysis. For simplicity and clarity of presentation, we assume that all changes $\Delta_{j},~{}j=1,\cdots,m_{o}$, vanish. Indeed, by combining the arguments in Theorems 1 and 3, we can show that CLMDL works when both vanishing and non-vanishing changes exist. We omit the details to conserve space.

We remark that the exact recovery phenomenon has been observed before in the change-point literature. For example, Bai, (2010) and its extension Bhattacharjee et al., (2019) study single change-point estimation for a non-parametric high-dimensional mean change and show that exact recovery can be achieved when the change size is large in a certain sense. However, the technical arguments used to establish the results for CLMDL are substantially different. In particular, unlike the mean change problem, which retains linearity, the parametric nature of our problem makes the objective function of CLMDL non-linear and more complex, with no closed-form solution. On the other hand, there is also some intuitive connection between the localization error rate of CLMDL in Theorem 3 and the rate derived in the high-dimensional mean change literature. We refer to LABEL:remark:hdmean of the supplement for a more detailed discussion.

3.2 Asymptotic distribution under increasing domain asymptotics

We now investigate the asymptotic distribution of the change-point estimator. Theorems 1(ii), 2 and 3(i) indicate that, under the increasing domain asymptotics, the integer-valued change-points can be recovered exactly. Thus, there is no non-degenerate asymptotic distribution for the estimated change-points $\hat{\Lambda}_{ST}$. Nevertheless, Theorem 4 below provides some insight into the finite-sample behavior of $\hat{\Lambda}_{ST}$. For simplicity, we only present the result for $k=1$; the result for $k>1$ is similar but notationally more complicated. Define $E_{1}=\{({\textbf{s}}_{1},{\textbf{s}}_{2}):{\textbf{s}}_{1}\in\mathcal{S},{\textbf{s}}_{2}\in{\textbf{s}}_{1}\cup\mathcal{N}({\textbf{s}}_{1})\}$ and $E_{2}=\{{\textbf{s}}\in\mathcal{S}:\text{repeat }{\textbf{s}}\text{ by }1+|\mathcal{N}({\textbf{s}})|\text{ times}\}$.

For $q>0$ , define

[TABLE]

where $D_{1}(q)=D_{10}(q)\cup D_{11}(q)$ with $D_{10}(q)=\bigcup_{t=1}^{q}\{(t,0,{\textbf{s}}_{1},{\textbf{s}}_{2}):{\textbf{s}}_{1}\in\mathcal{S},{\textbf{s}}_{2}\in\mathcal{N}({\textbf{s}}_{1})\}$ and $D_{11}(q)=\bigcup_{t=1}^{q-1}\{(t,1,{\textbf{s}}_{1},{\textbf{s}}_{2}):{\textbf{s}}_{1}\in\mathcal{S},{\textbf{s}}_{2}\in{\textbf{s}}_{1}\cup\mathcal{N}({\textbf{s}}_{1})\}$ .
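The index sets above are easiest to read on a toy example. The following Python sketch enumerates $E_{1}$, $E_{2}$ and $D_{1}(q)$ on a small regular grid. It assumes, purely for illustration, that $\mathcal{N}({\textbf{s}})$ contains every other grid point within Euclidean distance $d$ of ${\textbf{s}}$; the helper names `grid_neighbors`, `build_E` and `build_D1` are hypothetical.

```python
import math

def grid_neighbors(s, d):
    """Hypothetical neighborhood on the s x s grid: N(p) = {q != p : ||p - q||_2 <= d}."""
    locs = [(a, b) for a in range(1, s + 1) for b in range(1, s + 1)]
    return {p: [q for q in locs if q != p and math.dist(p, q) <= d] for p in locs}

def build_E(nbrs):
    # E1: ordered pairs (s1, s2) with s2 in {s1} U N(s1)
    E1 = [(s1, s2) for s1, nb in nbrs.items() for s2 in [s1] + nb]
    # E2: each location s1 repeated 1 + |N(s1)| times
    E2 = [s1 for s1, nb in nbrs.items() for _ in range(1 + len(nb))]
    return E1, E2

def build_D1(q, nbrs):
    # D10(q): time-lag-0 pairs (no self-pairs) for t = 1, ..., q
    D10 = [(t, 0, s1, s2) for t in range(1, q + 1)
           for s1, nb in nbrs.items() for s2 in nb]
    # D11(q): time-lag-1 pairs (self-pairs included) for t = 1, ..., q - 1
    D11 = [(t, 1, s1, s2) for t in range(1, q)
           for s1, nb in nbrs.items() for s2 in [s1] + nb]
    return D10 + D11

nbrs = grid_neighbors(3, 1.0)   # 3 x 3 grid; d = 1 gives the four nearest neighbors
E1, E2 = build_E(nbrs)
print(len(E1), len(E2), len(build_D1(2, nbrs)))  # 33 33 81
```

On the 3×3 grid with $d=1$ there are 24 ordered neighbor pairs, so $E_{1}$ has $24+9=33$ elements and $D_{1}(2)$ has $2\cdot 24+33=81$. $E_{2}$ is exactly the multiset of first coordinates of $E_{1}$, which is why the two sets have the same size.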

For $q<0$ , define

[TABLE]

where $D_{2}(q)=D_{20}(q)\cup D_{21}(q)$ with $D_{20}(q)=\bigcup_{t=T_{j}^{o}+q+1}^{T_{j}^{o}}\{(t,0,{\textbf{s}}_{1},{\textbf{s}}_{2}):{\textbf{s}}_{1}\in\mathcal{S},{\textbf{s}}_{2}\in\mathcal{N}({\textbf{s}}_{1})\}$ and $D_{21}(q)=\bigcup_{t=T_{j}^{o}+q+1}^{T_{j}^{o}-1}\{(t,1,{\textbf{s}}_{1},{\textbf{s}}_{2}):{\textbf{s}}_{1}\in\mathcal{S},{\textbf{s}}_{2}\in{\textbf{s}}_{1}\cup\mathcal{N}({\textbf{s}}_{1})\}$. Note that the $A_{i}$'s and $B_{i}$'s quantify the effect of the estimation error $[T\hat{\lambda}_{j}]-[T\lambda_{j}^{o}]=q$ on CLMDL. For $j=1,\ldots,m_{o}$, we define a double-sided random walk for the $j$th change-point,

[TABLE]

Theorem 4 gives an approximation of the finite-sample behavior of $\hat{\Lambda}_{ST}$ and the asymptotic distribution of the estimated parameters $\hat{\theta}_{j}$ .

Theorem 4.

Suppose that the conditions of Theorem 1(ii), Theorem 2 or Theorem 3(i) are satisfied, and that Assumptions 6 and 7 hold. Then

[TABLE]

as $S,T\longrightarrow\infty$ . If, additionally, $S=o(T)$ , we have for $j=1,\ldots,m_{o}+1$ ,

[TABLE]

Moreover, $\{\hat{\lambda}_{1},\ldots,\hat{\lambda}_{m_{o}},\hat{\theta}_{1},\ldots,\hat{\theta}_{m_{o}+1}\}$ are asymptotically independent.

From the proof of Theorems 1 and 3, it can be shown that

[TABLE]

which again indicates that the true change-points can be recovered without error. Although $[T\hat{\lambda}_{j}]-[T\lambda_{j}^{o}]$ eventually converges to a degenerate distribution, the numerical experiments in Section 4 show that $\arg\max_{q\in\mathbb{Z}}W_{ST}^{(j)}(q;\psi_{j}^{o},\psi_{j+1}^{o})$ can still give a reasonably accurate approximation to the finite-sample behavior of $\hat{\Lambda}_{ST}$. The finite-sample approximation of $[T\hat{\lambda}_{j}]-[T\lambda_{j}^{o}]$ in Theorem 4 requires $S=o(T)$. When $S$ is not $o(T)$, the approximation in (20) becomes inaccurate. Intuitively, the distribution of $[T\hat{\lambda}_{j}]-[T\lambda_{j}^{o}]$ then converges too fast towards its degenerate limit, because more information is available from the spatial dimension.

Since no closed-form expression for the distribution function of $W_{ST}^{(j)}(\cdot;\psi^{o}_{j},\psi^{o}_{j+1})$ is available, we need to simulate replicates of $W_{ST}^{(j)}(\cdot;\hat{\psi}_{j},\hat{\psi}_{j+1})$ to conduct inference. However, the double-sided random walk $W_{ST}^{(j)}(\cdot;\psi^{o}_{j},\psi^{o}_{j+1})$ depends not only on the pseudo-true parameters $\psi^{o}_{j}$ and $\psi^{o}_{j+1}$, but also on the true distributions of the $j$th and $(j+1)$th segments. Hence, this simulation procedure is valid only if the true models of the $j$th and $(j+1)$th segments are known. In other words, if the true model is not included in $\mathcal{M}$, $\hat{\Lambda}_{ST}$ is still consistent, but inference cannot be conducted via Theorem 4.
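The simulation-based inference described above can be sketched in Python as follows. The increments of $W_{ST}^{(j)}$ are built from the composite likelihood terms $A_{i}$ and $B_{i}$, which are not reproduced here, so the sketch replaces them with i.i.d. Gaussian increments that have negative drift on each side of $q=0$. The drift and standard deviation values, and the helper names `simulate_argmax` and `argmax_ci`, are assumptions for illustration only. Because $q$ plays the role of $[T\hat{\lambda}_{j}]-[T\lambda_{j}^{o}]$, an interval for the true change-point follows from quantiles of the simulated argmax.

```python
import numpy as np

def simulate_argmax(n_rep, q_max, drift, sd, rng):
    """Draw argmax_q W(q) for a Gaussian double-sided random walk with W(0) = 0.

    drift = (left, right): expected decrease of W per step away from q = 0 (assumed > 0).
    sd = (left, right): standard deviation of each increment (an assumed stand-in).
    """
    left = np.cumsum(rng.normal(-drift[0], sd[0], size=(n_rep, q_max)), axis=1)   # q = -1, ..., -q_max
    right = np.cumsum(rng.normal(-drift[1], sd[1], size=(n_rep, q_max)), axis=1)  # q = 1, ..., q_max
    # Columns ordered as q = -q_max, ..., -1, 0, 1, ..., q_max
    W = np.concatenate([left[:, ::-1], np.zeros((n_rep, 1)), right], axis=1)
    return np.argmax(W, axis=1) - q_max

def argmax_ci(cp_hat, draws, level=0.90):
    """q approximates cp_hat - cp_true, so the interval is [cp_hat - upper, cp_hat - lower]."""
    lower, upper = np.quantile(draws, [(1 - level) / 2, (1 + level) / 2])
    return cp_hat - upper, cp_hat - lower

rng = np.random.default_rng(1)
draws = simulate_argmax(n_rep=2000, q_max=50, drift=(0.5, 0.5), sd=(1.0, 1.0), rng=rng)
print(argmax_ci(cp_hat=100, draws=draws, level=0.90))
```

Increasing the drift relative to the increment noise concentrates the draws at $q=0$ and shrinks the interval, mimicking the degeneration described above.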

4 Numerical Experiments

We begin this section by discussing the algorithm for minimizing CLMDL in (10), and then present extensive simulation studies and a real data application.

Owing to the additive form of CLMDL in (8), its minimization can be performed via dynamic programming (Jackson et al., 2005), which incurs a quadratic computational complexity of $O(ST^{2})$. To further lower the computational cost, we adapt the pruned exact linear time (PELT) algorithm of Killick et al., (2012), originally designed for univariate time series, to the spatio-temporal setting. A key component of PELT is a suitable threshold $K$, which is used to prune unnecessary candidates in the recursive computation of the dynamic program and thus lowers the computational complexity to between $O(ST)$ and $O(ST^{2})$. In the spatio-temporal setting, the threshold $K$ is more challenging to derive due to the non-ignorable edge effect when the spatial dimension $S\longrightarrow\infty$. Nevertheless, in LABEL:PELTK of the supplement, we provide a suitable choice of $K$ and establish the asymptotic validity of applying PELT to the minimization of CLMDL. We refer to LABEL:sec:PELT of the supplement for more details.
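To make the pruning idea concrete, here is a minimal Python sketch of PELT on a $T\times S$ array. It is not CLMDL. The segment cost is a hypothetical stand-in (the within-segment residual sum of squares for a mean change at every location, summed over locations), the per-segment penalty $\beta$ is a user-supplied constant rather than the MDL penalty, and the threshold $K$ from the supplement is not reproduced. With this cost, $K=0$ is valid because splitting a segment never increases its residual sum of squares.

```python
import numpy as np

def make_rss_cost(y):
    """Stand-in segment cost for a T x S array: within-segment RSS summed over locations."""
    zeros = np.zeros((1, y.shape[1]))
    c1 = np.vstack([zeros, np.cumsum(y, axis=0)])       # prefix sums of y
    c2 = np.vstack([zeros, np.cumsum(y ** 2, axis=0)])  # prefix sums of y^2

    def cost(a, b):  # rows a, ..., b - 1
        s1 = c1[b] - c1[a]
        return float(np.sum(c2[b] - c2[a] - s1 ** 2 / (b - a)))

    return cost

def pelt(y, beta, K=0.0):
    """Minimize sum of segment costs + beta per extra segment, pruning hopeless candidates."""
    T = y.shape[0]
    cost = make_rss_cost(y)
    F = np.empty(T + 1)
    F[0] = -beta                       # so that F(t) = total cost + beta * (number of change-points)
    last = np.zeros(T + 1, dtype=int)  # optimal last change-point before t
    candidates = [0]
    for t in range(1, T + 1):
        vals = {tau: F[tau] + cost(tau, t) for tau in candidates}
        best = min(vals, key=vals.get)
        F[t] = vals[best] + beta
        last[t] = best
        # PELT pruning: tau can never again be optimal if F(tau) + C(tau:t) + K > F(t)
        candidates = [tau for tau in candidates if vals[tau] + K <= F[t]]
        candidates.append(t)
    cps, t = [], T
    while t > 0:                       # backtrack through the stored last change-points
        t = int(last[t])
        if t > 0:
            cps.append(t)
    return sorted(cps)

rng = np.random.default_rng(0)
T, S = 100, 16
y = rng.standard_normal((T, S))
y[50:] += 2.0                            # mean shift at every location from row 50 on
print(pelt(y, beta=S * np.log(T * S)))   # BIC-style penalty, an assumption
```

Because discarded candidates never return, the candidate set tends to stay small, which is the source of PELT's speed-up over plain dynamic programming.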

4.1 Simulation studies

Throughout the numerical experiments, we mainly consider the following four-parameter autoregressive spatial model,

[TABLE]

where $\mathbf{y}_{t}=\{y_{t,{\textbf{s}}}:{\textbf{s}}\in\mathcal{S}\}$ is defined on a regular two-dimensional grid $\mathcal{S}$ (defined below) and $\boldsymbol{\varepsilon}_{t}=\{\varepsilon_{t,{\textbf{s}}}:{\textbf{s}}\in\mathcal{S}\}$ is a Gaussian process with exponential covariance function ${\rm Cov}(\varepsilon_{t,\textbf{s}},\varepsilon_{t,\textbf{s}^{\prime}})=\sigma^{2}\exp\{-\left\lVert\textbf{s}-\textbf{s}^{\prime}\right\rVert_{2}/\rho\}$ and ${\rm Cov}(\varepsilon_{t,\textbf{s}},\varepsilon_{t^{\prime},\textbf{s}^{\prime}})=0$ when $t\neq t^{\prime}$. The model is specified by $\theta=(\mu,\phi,\rho,\sigma^{2})^{\top}$, where $\phi\in(-1,1)$, $\rho>0$ and $\sigma^{2}>0$. Spatial and temporal dependence are determined by $\rho$ and $\phi$, respectively, while $\mu$ and $\sigma^{2}$ control the overall mean and variance.

We investigate the performance of CLMDL under the increasing domain setting and define the spatial domain as $\mathcal{S}=\{(s_{1},s_{2}):s_{1},s_{2}\in\{1,2,\ldots,s\}\}$, where the spatial sample size $S=s^{2}$ grows by increasing $s$. For all simulation studies, we set $k=1$ and $d=2$ in defining the composite likelihood, set $\epsilon_{\lambda}=0.1$ in the optimization, and set the number of replications to 1000.

Competing methods: To our knowledge, there is no natural competing method for CLMDL in the literature. Nevertheless, for illustration purposes, we compare CLMDL with Davis et al., (2006), which is an important work for multiple change-point estimation in parametric models of univariate time series, and with SBS in Cho and Fryzlewicz, (2015) and DCBS in Cho, (2016), which are important works that allow multiple change-point estimation in both the (non-parametric) mean and second-order structure of a high-dimensional time series. To conserve space, we refer to Section LABEL:sec:add_num of the supplement (LABEL:add_simu_hd and LABEL:add_simu_univariate) for the detailed comparison.

Additional simulation studies: In the supplement, we further examine the performance of CLMDL for multiple change-point estimation and model selection within each stationary segment, for change-point estimation under partial changes and under the spatial infill setting, as well as its robustness with respect to the tuning parameters $(k,d)$ and $\epsilon_{\lambda}$. We refer to Section LABEL:sec:add_num of the supplement (Simulation LABEL:add_simu_multiCP-LABEL:add_simu_epsilon) for more details.

Simulation 1.

In this simulation, we examine the estimation accuracy of CLMDL under various sample sizes and signal levels. The underlying data generating process (DGP) in each stationary segment follows (22) with $\mu=0$ , hence the process is specified by $\theta=(\phi,\rho,\sigma^{2})^{\top}$ .

Let $\theta_{1}=(-0.5,0.6,1)^{\top}$ and $\theta_{2}=(-0.5+\delta_{\phi},0.6+\delta_{\rho},1)^{\top}$ be the underlying parameter vectors for the segments before and after the change-point, respectively. When there is no change-point (i.e. $\delta_{\phi}=\delta_{\rho}=0$ ), the entire process is simulated from $\theta_{1}$ , otherwise there is a change-point at $\lambda_{1}^{o}=0.5$ . We consider four scenarios corresponding to no change-point, change in temporal dependence ( $\delta_{\phi}$ ), change in spatial dependence ( $\delta_{\rho}$ ) and change in both spatial and temporal dependence. Note that the signal levels $\delta_{\phi},\delta_{\rho}$ are pre-fixed and do not vary with the sample sizes $S,T$ (i.e. non-vanishing).
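For readers who want to reproduce the flavor of Simulation 1, the following Python sketch generates data on the grid $\mathcal{S}$ with the exponential covariance described above. The display (22) is not reproduced here, so the sketch assumes the AR(1) form $\mathbf{y}_{t}-\mu=\phi(\mathbf{y}_{t-1}-\mu)+\boldsymbol{\varepsilon}_{t}$. It also simulates the two segments independently, in line with the independence assumption across stationary segments used in the consistency results, and discards a burn-in period. The function names and the burn-in length are hypothetical.

```python
import numpy as np

def exp_cov(s, rho, sigma2):
    """Exponential covariance sigma2 * exp(-||s - s'||_2 / rho) on the grid {1, ..., s}^2."""
    g = np.arange(1, s + 1, dtype=float)
    locs = np.array([(a, b) for a in g for b in g])
    dist = np.linalg.norm(locs[:, None, :] - locs[None, :, :], axis=-1)
    return sigma2 * np.exp(-dist / rho)

def simulate_segment(T, s, theta, rng, mu=0.0, burn_in=200):
    """Assumed form of (22): y_t - mu = phi * (y_{t-1} - mu) + eps_t, theta = (phi, rho, sigma2)."""
    phi, rho, sigma2 = theta
    chol = np.linalg.cholesky(exp_cov(s, rho, sigma2))
    dev = np.zeros(s * s)                    # y_t - mu, started at zero
    out = np.empty((T, s * s))
    for t in range(burn_in + T):
        dev = phi * dev + chol @ rng.standard_normal(s * s)
        if t >= burn_in:
            out[t - burn_in] = mu + dev
    return out

def simulate_sim1(T, s, delta_phi, delta_rho, rng):
    """theta1 = (-0.5, 0.6, 1); one change at lambda = 0.5 unless both deltas are zero."""
    theta1 = (-0.5, 0.6, 1.0)
    theta2 = (-0.5 + delta_phi, 0.6 + delta_rho, 1.0)
    if delta_phi == 0 and delta_rho == 0:
        return simulate_segment(T, s, theta1, rng)
    half = T // 2
    return np.vstack([simulate_segment(half, s, theta1, rng),
                      simulate_segment(T - half, s, theta2, rng)])

rng = np.random.default_rng(2)
y = simulate_sim1(T=100, s=6, delta_phi=0.2, delta_rho=0.0, rng=rng)  # S = 6^2, T = 100
print(y.shape)  # (100, 36)
```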

Table 1 reports the estimated number of change-points under various settings. Under the no-change scenario, there are no false positives even when the sample size is small. Some over-estimation is observed for small samples when there is a change in spatial dependence, which is probably due to the larger variation in estimating $\rho$. The detection power improves when $S$, $T$ or the signal level $(\delta_{\phi},\delta_{\rho})$ increases. For example, for $\delta_{\phi}=0.2$, $S=6^{2}$ and $T=100$, the detection power increases from 37% to 81% when $S$ increases to $10^{2}$, and to 98% when $T$ increases to 200. As expected, the proposed procedure is most powerful when both $\delta_{\phi}$ and $\delta_{\rho}$ are non-zero.

Simulation 2.

In this simulation, we compare the empirical distribution of the change-point estimator and its asymptotic distribution as stated in Theorem 4, and further illustrate the exact recovery property of CLMDL in Theorem 1 under non-vanishing change sizes. The underlying DGP follows (22) with $\mu=0$ and $\theta_{1}=(-0.5,0.6,1)^{\top}$ and $\theta_{2}=(-0.5+\delta_{\phi},0.6+\delta_{\rho},1)^{\top}$ . We vary $S=6^{2},8^{2},10^{2}$ and fix $T_{1}^{o}=T_{2}^{o}=100$ . When the number of change-points is correctly estimated, 100 replicates of $W_{ST}(\cdot)$ are simulated using $\hat{\psi}_{1}$ and $\hat{\psi}_{2}$ to compute $\arg\max_{q\in\mathbb{Z}}W_{ST}(q;\hat{\psi}_{1},\hat{\psi}_{2})$ .

Table 2 summarizes the detailed simulation results. The percentage of replications in which the estimated change-point equals the true one, i.e. $\{\hat{\lambda}=\lambda^{o}\}$, increases to 100% as the sample size increases, which demonstrates exact recovery of the true change-point. Table 2 further reports the performance of the 90% confidence interval (CI) obtained from the quantiles of $\arg\max_{q\in\mathbb{Z}}W_{ST}(q;\hat{\psi}_{1},\hat{\psi}_{2})$. Both the width of the CI and the empirical standard deviation (esd) decrease as the sample size increases, and the empirical coverage probabilities are close to the nominal level. For more intuition, Figure 1 (top panel) gives QQ plots of the empirical quantiles of $\hat{\lambda}$ against the theoretical quantiles based on $\arg\max_{q\in\mathbb{Z}}W_{ST}(q;\hat{\psi}_{1},\hat{\psi}_{2})$ in Theorem 4. The QQ plots closely align with the 45-degree line. Figure 1 (bottom panel) depicts histograms of $\hat{\lambda}$. The distribution of $\hat{\lambda}$ is non-standard and becomes degenerate as the sample size increases, which agrees with the asymptotic results.

Simulation 3.

In this simulation, we further examine the performance of CLMDL under vanishing change sizes as studied in Theorem 3. The underlying DGP follows (22) with $\mu=0$ and $\theta_{1}=(-0.5,0.6,1)^{\top}$ and $\theta_{2}=(-0.5+\delta_{\phi},0.6,1)^{\top}$ . We fix $T_{1}^{o}=T_{2}^{o}=50$ and vary $S=6^{2},10^{2},30^{2}$ . We set the change size as $\delta_{\phi}=S^{-0.4}$ or $S^{-0.5}$ , which is a function of the sample size $S$ and vanishes as $S$ increases. To conserve space, we refer to LABEL:add_simu_vanishing in LABEL:sec:add_num of the supplement for a more thorough numerical study regarding CLMDL under vanishing change sizes.
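A quick calculation links this design to Theorem 3. Only $\phi$ changes, so in the parameterization $\theta=(\phi,\rho,\sigma^{2})^{\top}$ one can take $\delta_{1}=(1,0,0)^{\top}$ and $\kappa=\delta_{\phi}$. The effective squared change size is then $S\kappa^{2}=S^{0.2}$ for $\delta_{\phi}=S^{-0.4}$ and exactly $1$ for $\delta_{\phi}=S^{-0.5}$. The Python sketch below tabulates $S\kappa^{2}$ and $T\cdot(S\kappa^{2})$ with $T=T_{1}^{o}+T_{2}^{o}=100$. Reading both designs as having $S\kappa^{2}$ bounded away from zero, i.e. the regime of Theorem 3(i), is our interpretation of the arithmetic; the function name is hypothetical.

```python
def effective_change_size(S, exponent):
    """S * kappa^2 when the only change is delta_phi = S**(-exponent), so kappa = delta_phi."""
    kappa = S ** (-exponent)
    return S * kappa ** 2

T = 100  # T_1^o + T_2^o in Simulation 3
for exponent in (0.4, 0.5):
    for s in (6, 10, 30):
        S = s * s
        eff = effective_change_size(S, exponent)
        print(f"delta_phi = S^-{exponent}, S = {S:4d}: "
              f"S*kappa^2 = {eff:.3f}, T*S*kappa^2 = {T * eff:.1f}")
```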

Table 3 summarizes the simulation results. For both $\delta_{\phi}=S^{-0.4}$ and $S^{-0.5}$, the detection power and estimation accuracy of CLMDL improve as $S$ increases, even though the change size vanishes. As one would expect, CLMDL performs better when the change size decays more slowly (i.e. $S^{-0.4}$). Note that exact recovery of $\lambda_{1}^{o}=0.5$ is possible even for $\delta_{\phi}=S^{-0.5}$ (though it requires a large sample size), which provides further numerical evidence for the theoretical results in Theorem 3.

Simulation 4.

In this simulation, we examine the consistency of the CL based estimator defined in (12), which has no penalty terms. The underlying DGP follows (22) with $\mu=0$ , and we set $\theta_{1}=(-0.5,0.6,1)^{\top}$ and $\theta_{2}=(-0.5+\delta_{\phi},0.6,1)^{\top}$ as the true parameters for the segments before and after the change-point. We consider change sizes of both fixed $\delta_{\phi}=0.1,0.2$ and vanishing $\delta_{\phi}=S^{-0.4},S^{-0.5}$ . We fix $T=100$ and vary $S=30^{2},60^{2}$ . When there is no change-point (i.e. $\delta_{\phi}=0$ ), the data is simulated from $\theta_{1}$ , otherwise there is a change-point at $\lambda_{1}^{o}=0.5$ .

Table 4 reports the estimated number of change-points by CLMDL and CL. Due to the lack of a penalty, CL does produce false positives for $S=30^{2}$. However, under both fixed and vanishing change sizes, the false detections disappear as $S$ increases to $60^{2}$, which indicates the consistency of CL (without penalty) and agrees with the theoretical results in Proposition 1 and LABEL:lem_rate_m_unknown. On the other hand, CL requires a very large sample size to achieve such consistency, which is often unrealistic for real data and thus makes CL less practical. In contrast, thanks to the MDL penalty, CLMDL is robust to false positives and outperforms CL under constant change sizes. Compared with CL, it has slightly less power under vanishing change sizes for $S=30^{2}$ when $\delta_{\phi}$ is extremely small (i.e. $\delta_{\phi}=S^{-0.5}$). However, this power loss disappears as $S$ increases to $60^{2}$. We therefore prefer CLMDL, as the MDL penalty guards against false positives in finite samples without a significant negative impact on detection power.

Simulation 5.

In this simulation, we examine the robustness of CLMDL against model misspecification. In particular, we consider the non-separable space-time covariance function in Cressie and Huang, (1999),

[TABLE]

where $h$ and $u$ are the space and time distance, $\nu>0$ is the smoothness parameter, $K_{\nu}$ is the modified Bessel function, $a\geq 0$ is the time scaling parameter, $b\geq 0$ is the space scaling parameter, $c>0$ is the space-time interaction parameter, and $\sigma^{2}=C(0,0|\theta)>0$ is the variance. We collect the parameter $\theta=(a,b,c,\nu,\sigma^{2})$ . Note that $C(h,u|\theta)$ in (25) generalizes various popular covariance functions. For example, if $u=0$ , $C(h,0|\theta)$ becomes the Matérn spatial covariance function. For $\nu=0.5$ , it further reduces to the exponential covariance function. Moreover, a separable space-time covariance is obtained when $c=1$ . See Cressie and Huang, (1999) for details.
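The reductions mentioned above can be checked numerically. The Python sketch below uses the standard Matérn form $\sigma^{2}\,2^{1-\nu}\Gamma(\nu)^{-1}(bh)^{\nu}K_{\nu}(bh)$ with an inverse range $b$; this parameterization is an assumption and is not necessarily how $b$ enters (25). It verifies that $\nu=0.5$ gives the exponential covariance $\sigma^{2}e^{-bh}$ and, as a second check, that $\nu=1.5$ gives $\sigma^{2}(1+bh)e^{-bh}$. It relies on `scipy.special.kv` (the modified Bessel function of the second kind) and `scipy.special.gamma`.

```python
import numpy as np
from scipy.special import gamma, kv

def matern(h, nu, b, sigma2):
    """Matern covariance sigma2 * 2^(1-nu) / Gamma(nu) * (b h)^nu * K_nu(b h), with C(0) = sigma2."""
    z = b * np.atleast_1d(np.asarray(h, dtype=float))
    out = np.full_like(z, sigma2)        # limiting value at h = 0
    pos = z > 0
    out[pos] = sigma2 * 2 ** (1 - nu) / gamma(nu) * z[pos] ** nu * kv(nu, z[pos])
    return out

h = np.linspace(0.0, 5.0, 51)
print(np.allclose(matern(h, 0.5, 1 / 0.6, 2.0), 2.0 * np.exp(-h / 0.6)))  # True
```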

Let $\theta_{1}=(1,1,3,0.2,1)^{\top}$ and $\theta_{2}=(1+\delta,1+\delta,3,0.2,1)^{\top}$ be the parameters for the segments before and after the change-point, respectively. When there is no change-point (i.e. $\delta=0$ ), the entire process is simulated from $\theta_{1}$ , otherwise there is a change-point at $\lambda_{1}^{o}=0.5$ . We set $T=100$ and vary $S=6^{2},8^{2}$ . We conduct the change-point estimation using both the separable model (22) and the true model (25). Table 5 summarizes the numerical result. It shows that CLMDL works well under model misspecification, with a low false positive rate when there is no change-point and high detection power when change-point exists. Moreover, compared to the true model, the loss in detection power or estimation accuracy due to model misspecification is small.

4.2 Application to the U.S. precipitation data

Change-point detection in the amount of precipitation has been recognized as an important problem in climate and environmental science; see Gallagher et al., (2012) for a review of some common approaches. However, the existing literature, e.g. Gromenko et al., (2017), seems to mainly consider the at-most-one change-point scenario and requires space-time separability of the covariance function.

We consider data from the Global Historical Climatology Network (GHCN) database, which is a main database for global climate monitoring. In particular, key climate variables such as the amount of precipitation are collected from stations located all over the world. The documentation and datasets are available from Menne et al., (2012) and the official GHCN website.³ Similar to Gromenko et al., (2017), we consider selected precipitation data from the Midwest region of the U.S., including Illinois, Indiana, Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, North Dakota, Ohio, South Dakota and Wisconsin.

³ ftp://ftp.ncdc.noaa.gov/pub/data/ghcn

In the GHCN database, daily precipitation is recorded at each station. However, many stations have missing data. We focus on the land surface stations that provide at least 5 daily records in each month over the entire period, and compute the monthly average precipitation for the analysis. In summary, we have the monthly average precipitation $\{y^{*}_{t,{\textbf{s}}}\}$, in tenths of a millimeter (mm), from January 1941 to December 2010 for 76 stations (i.e. $S=76$ and $T=720$). We refer to Figure LABEL:US.map of the supplement for the locations of the 76 stations.

To alleviate the heavy-tailed behavior of the data, we log-transform the monthly average precipitation. In addition, to remove seasonality in the mean and variance of the data, we follow the stationarizing transform of Bloomfield et al., (1994) and Lund et al., (1995) and set

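$$y_{t,{\textbf{s}}}=\frac{\log(y^{*}_{t,{\textbf{s}}}+1)-\hat{\mu}_{\nu(t),{\textbf{s}}}}{\hat{\sigma}_{\nu(t),{\textbf{s}}}},$$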

where $\nu(t)\in\{1,\cdots,12\}$ denotes the month that time $t$ is in, and $\hat{\mu}_{\nu,{\textbf{s}}}$ and $\hat{\sigma}_{\nu,{\textbf{s}}}$ are the sample mean and standard deviation (over time) of the $\log(y^{*}_{t,{\textbf{s}}}+1)$ value for month $\nu$ at station ${\textbf{s}}.$ To conserve space, we refer to LABEL:subsec:addrealdata of the supplement for a sample plot of the data.
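The stationarizing transform amounts to standardizing $\log(y^{*}_{t,{\textbf{s}}}+1)$ within each (calendar month, station) cell. Below is a minimal Python sketch, assuming a $T\times S$ array whose first row is a January (as with the January 1941 start above) and using the $n-1$ denominator for $\hat{\sigma}_{\nu,{\textbf{s}}}$, which the text does not specify. The function name and the synthetic gamma-distributed data are hypothetical.

```python
import numpy as np

def stationarize(y_raw, first_month=1):
    """Standardize log(y* + 1) by month-by-station sample means and standard deviations.

    y_raw: T x S array of monthly averages; first_month: calendar month of row 0 (1 = January).
    """
    z = np.log(np.asarray(y_raw, dtype=float) + 1.0)
    month = (np.arange(z.shape[0]) + first_month - 1) % 12   # 0 = January, ..., 11 = December
    out = np.empty_like(z)
    for m in range(12):
        rows = month == m
        mu_hat = z[rows].mean(axis=0)                 # hat{mu}_{nu, s}
        sigma_hat = z[rows].std(axis=0, ddof=1)       # hat{sigma}_{nu, s}, n - 1 denominator (assumed)
        out[rows] = (z[rows] - mu_hat) / sigma_hat
    return out

# Synthetic stand-in for the precipitation data: 720 months at 5 stations
rng = np.random.default_rng(0)
y_star = rng.gamma(shape=2.0, scale=50.0, size=(720, 5))
y = stationarize(y_star)
```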

For the mean function $\mathbb{E}(y_{t,{\textbf{s}}})$, we adopt a linear regression with an intercept and three station-level covariates: the latitude, longitude and elevation of the corresponding station. See Erhardt et al., (2015) and references therein for similar treatments of spatial regression. For robustness, we employ the non-separable Gaussian space-time covariance function (25) of Cressie and Huang, (1999) with $\nu=0.5$ in the formulation of the composite likelihood. For change-point estimation, we employ the CLMDL proposed in the main text, which is derived under the increasing domain setting. We also implement the modified CLMDL derived in the supplement, which is tailored to the infill setting. In the composite likelihood, we include all pairs with a time lag of at most 3 months $(k=3)$ and a spatial distance of at most 500 kilometers. For the modified CLMDL tailored to the infill setting, we further set $B_{\mathcal{N}}=4$. The geodesic distance (Karney, 2013), i.e. the length of the shortest path between two points on the WGS84 ellipsoid, is used as the spatial distance. We note that the estimation results remain similar for different choices of the time lag and spatial distance that define the neighborhood.
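Selecting the composite-likelihood pairs is a small bookkeeping step. The Python sketch below builds the set of (time lag, station, station) triples with lag at most $k=3$ and distance at most 500 km. As a simplification it uses the great-circle (haversine) distance on a sphere of radius 6371.0088 km instead of the WGS84 geodesic used above; the ellipsoidal distance of Karney, (2013) is available, for example, through `Geodesic.WGS84.Inverse` in the `geographiclib` Python package. Treating pairs as ordered and excluding only lag-0 self-pairs is an assumption about the pair set, and the function names and toy coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088   # mean Earth radius: a spherical approximation, not WGS84

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def pair_set(coords, max_km=500.0, max_lag=3):
    """(lag, i, j) with lag in 0..max_lag and distance(i, j) <= max_km; self-pairs only for lag >= 1."""
    n = len(coords)
    close = [(i, j) for i in range(n) for j in range(n)
             if haversine_km(*coords[i], *coords[j]) <= max_km]
    return [(lag, i, j) for lag in range(max_lag + 1) for i, j in close
            if lag > 0 or i != j]

stations = [(0.0, 0.0), (0.0, 1.0), (0.0, 10.0)]   # toy (lat, lon) coordinates
print(round(haversine_km(0.0, 0.0, 0.0, 1.0), 3))  # about 111.195 km per degree on the equator
print(len(pair_set(stations)))
```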

The proposed CLMDL detects two change-points, at June 1953 and March 1968. Interestingly, the modified CLMDL tailored to the infill setting detects the same change-points, which suggests that the finding is robust. The first change-point falls within the great drought and prolonged heatwave that had a major impact on the Midwestern U.S. (Mishra and Singh, 2010; Westcott, 2011). The second change-point is close to the 1970 change in North American climate proposed by Bartomeus et al., (2011). Moreover, the second change-point matches the one detected by Gromenko et al., (2017), who analyze a similar dataset at an annual resolution and allow at most one change-point. Based on Theorem 4, the 90% CIs for the two change-points are (Nov. 1950, Sep. 1955) and (Dec. 1966, Nov. 1971).

In LABEL:subsec:addrealdata of the supplement, we report the estimated model parameters for each stationary segment and provide visualization based on the estimation result. In addition, a robustness check is conducted for the change-point estimation result, which suggests that the changes are mainly due to the mean function of the spatio-temporal precipitation data.

5 Conclusion

In this paper, by combining composite likelihood and the MDL principle, we propose CLMDL, a unified and computationally efficient method for multiple change-point estimation in a piecewise stationary spatio-temporal process. CLMDL allows for non-separable space-time covariance specification and can detect changes in both mean and covariance functions. Moreover, it works under both the increasing domain and infill asymptotics. We show that exact recovery of true change-points can be achieved in the spatio-temporal setting under mild conditions. Furthermore, the effectiveness and practicality of CLMDL are demonstrated via extensive numerical studies.

For future research, one interesting direction is to consider change-point estimation in a locally stationary environment, where the model parameter on each segment is allowed to vary smoothly instead of being constant; see Wu and Zhao, (2007) and Chen et al., (2022) for work on non-parametric mean change in such a setting. In LABEL:subsec:localvarying of the supplement, we give a road map for achieving this within the CLMDL framework, and preliminary results are promising. Another interesting direction is to allow the number of change-points to diverge in the theoretical results, which is feasible under stronger assumptions on the Hessian matrix of the log-likelihood function. We give a detailed discussion in LABEL:subsec:div_cp of the supplement. Lastly, most MDL-based change-point estimation procedures in the literature (including this work) use the two-part code of standard MDL, as it suffices for the purpose of change-point estimation. It would be interesting to design an algorithm based on the one-part code of refined MDL (Grünwald, 2007) and compare the two strategies in terms of theoretical, computational and numerical performance.

Bibliography

Altieri, L., Scott, E. M., Cocchi, D., and Illian, J. B. (2015). A changepoint analysis of spatio-temporal point processes. Spatial Statistics, 14:197–207.
Andrews, D. W. K. (1993). Tests for parameter instability and structural change with unknown change point. Econometrica, 61(4):821–856.
Ang, A. and Timmermann, A. (2012). Regime changes and financial markets. Annu. Rev. Financ. Econ., 4(1):313–337.
Aston, J. A. D. and Kirch, C. (2012). Evaluating stationarity via change-point alternatives with applications to fMRI data. The Annals of Applied Statistics, 6(4):1906–1948.
Aue, A., Hörmann, S., Horváth, L., and Reimherr, M. (2009). Break detection in the covariance structure of multivariate time series models. The Annals of Statistics, 37(6B):4046–4087.
Aue, A., Rice, G., and Sönmez, O. (2018). Detecting and dating structural breaks in functional data without dimension reduction. Journal of the Royal Statistical Society: Series B, 80(3):509–529.
Bai, J. (1994). Least squares estimation of a shift in linear processes. Journal of Time Series Analysis, 15(5):453–472.
Bai, J. (2010). Common breaks in means and variances for panel data. Journal of Econometrics, 157(1):78–92.
