Inference for Change Points in High Dimensional Data via Self-Normalization

Runmin Wang; Changbo Zhu; Stanislav Volgushev; Xiaofeng Shao

arXiv:1905.08446·math.ST·August 10, 2021


TL;DR

This paper introduces a new self-normalized testing method for detecting change points in high-dimensional data, applicable to both independent and time series data, with theoretical guarantees and practical estimation procedures.

Contribution

It develops a novel self-normalized test for high-dimensional change points that requires no tuning parameters and extends to dependent data with a trimming approach.

Findings

01

The proposed tests are theoretically justified under null and alternative hypotheses.

02

Numerical simulations show the methods outperform existing approaches.

03

The approach can accurately estimate multiple change points using wild binary segmentation.



Tables

Table 1: Simulated quantiles of the limit $T$

$γ$	80%	90%	95%	99%	99.5%
$Q_{T} (γ)$	603.72	881.78	1177.45	2026.28	2443.27
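As a rough illustration of how the quantiles in Table 1 might be used, the sketch below compares an observed value of the statistic with the simulated $\gamma$-quantile of the limit $T$ and rejects when the statistic is larger. The dictionary values are copied from Table 1; the names `Q_T` and `reject_null` are hypothetical, and the one-sided "reject for large values" rule is an assumption based on the supremum form of the statistic.

```python
# Simulated quantiles of the limit T, copied from Table 1 (gamma -> Q_T(gamma)).
Q_T = {0.80: 603.72, 0.90: 881.78, 0.95: 1177.45, 0.99: 2026.28, 0.995: 2443.27}


def reject_null(t_n, level=0.05):
    """Return True when t_n exceeds the simulated (1 - level) quantile of T.

    Hypothetical helper: only the five levels tabulated in Table 1 are supported.
    """
    gamma = round(1.0 - level, 3)  # rounding avoids float noise such as 0.9500000001
    if gamma not in Q_T:
        raise ValueError(f"no tabulated quantile for level {level}")
    return t_n > Q_T[gamma]


# Usage: a statistic of 1200 is significant at the 5% level but not at 1%.
print(reject_null(1200.0, level=0.05), reject_null(1200.0, level=0.01))
```

Levels other than those tabulated would require simulating the limiting distribution, which this sketch does not attempt.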

Table 2: Simulated quantiles of the limit $T^{\diamond}$

$γ$	80%	90%	95%	99%	99.5%
$Q_{T^{⋄}} (γ)$	7226.18	8762.45	10410.19	14603.51	16608.86

Table 3: Simulated $100(1-\alpha)\%$ quantiles of $T(\eta)$

$η$	$α = 0.2$	$α = 0.1$	$α = 0.05$	$α = 0.01$	$α = 0.005$
$0.01$	$795.017$	$1198.187$	$1639.631$	$2758.508$	$3561.943$
$0.02$	$1069.060$	$1635.213$	$2203.461$	$3788.201$	$4601.053$
$0.03$	$1465.636$	$2268.597$	$3137.729$	$5571.669$	$6599.765$
$0.04$	$2058.443$	$3142.107$	$4441.035$	$8294.564$	$10027.183$
$0.05$	$2969.278$	$4541.604$	$6396.979$	$12103.096$	$15131.248$
$0.06$	$4471.923$	$6915.262$	$9934.987$	$18447.155$	$22879.406$
$0.07$	$6640.513$	$10263.074$	$14819.331$	$26974.749$	$32808.194$
$0.08$	$11555.69$	$18099.74$	$25834.53$	$45742.71$	$55384.87$
$0.09$	$20332.71$	$32633.35$	$46290.22$	$84578.12$	$106325.07$
$0.10$	$37737.27$	$59394.68$	$84389.98$	$152412.61$	$194372.67$

Table 4: Empirical Rejection Rates (in percentage) for One Change Point in Mean (Gaussian Error)

			ID					AR(1)
	$n$	$p$	$T_{n}$	$KS_{n,\mathrm{Inf}}$	$KS_{n,1}$	$KS_{n,2}$	$EH$	$T_{n}$	$KS_{n,\mathrm{Inf}}$	$KS_{n,1}$	$KS_{n,2}$	$EH$
$ℋ_{0}$	$100$	$100$	5.6	2.2	2.3	2.6	1.7	6.3	3.3	3.6	3.7	10.8
		$200$	4.9	3.4	3.3	3.3	1.4	4.7	3.1	2.9	2.9	11.7
		$500$	5.3	2.1	2.2	2.0	1.1	6.1	3.3	3.4	3.2	10.8
	$200$	$100$	5.8	4.0	4.0	4.3	1.2	5.9	4.2	4.2	4.2	9.4
		$200$	5.1	4.3	4.4	4.6	0.4	4.6	3.1	3.2	3.4	11.3
		$500$	6.0	3.7	3.6	3.6	0.8	5.8	3.7	3.9	3.8	10.8
	$500$	$100$	6.3	4.9	5.0	5.1	0.5	5.8	6.7	7.0	7.0	10.4
		$200$	6.2	5.3	5.6	5.6	0.7	5.6	5.0	4.9	5.0	9.8
		$500$	6.0	4.5	4.3	4.3	0.4	6.2	4.7	4.5	4.5	9.2
$ℋ_{1, 1}$	$100$	$100$	34.5	30.0	30.0	30.5	11.4	27.0	27.2	27.8	28.5	31.1
		$200$	51.9	49.8	49.5	49.4	24.4	37.4	37.4	38.1	38.0	44.5
		$500$	82.5	85.5	85.6	84.7	64.5	64.8	65.7	66.3	65.3	72.1
	$200$	$100$	77.5	81.5	82.0	82.1	42.6	61.4	62.1	62.1	62.1	59.0
		$200$	94.7	96.3	96.3	96.5	81.2	79.3	83.1	83.8	83.6	79.6
		$500$	100.0	100.0	100.0	100.0	99.9	98.5	99.3	99.3	99.3	99.0
	$500$	$100$	99.8	100.0	100.0	100.0	99.5	97.9	98.3	98.3	98.3	98.2
		$200$	100.0	100.0	100.0	100.0	100.0	99.9	99.9	99.9	99.9	99.9
		$500$	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0
$ℋ_{1, 2}$	$100$	$100$	99.3	100.0	100.0	100.0	97.8	100.0	100.0	100.0	100.0	93.9
		$200$	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	99.9
		$500$	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0
	$200$	$100$	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0
		$200$	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0
		$500$	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0
	$500$	$100$	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0
		$200$	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0
		$500$	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0
			BD					CS
$ℋ_{0}$	$100$	$100$	5.8	3.5	3.5	3.5	10.2	11.4	11.4	12.6	12.4	89.2
		$200$	4.6	3.1	3.2	3.2	10.8	9.0	10.2	12.4	12.4	94.8
		$500$	5.5	3.4	3.4	3.3	10.3	9.5	11.7	14.1	14.0	97.9
	$200$	$100$	6.1	3.7	3.7	3.8	7.9	10.3	13.6	15.9	15.7	92.6
		$200$	4.3	3.3	3.4	3.2	9.3	11.9	13.9	15.4	14.9	96.3
		$500$	5.5	4.1	4.2	4.2	9.6	10.3	13.1	14.9	14.2	98.5
	$500$	$100$	6.3	6.7	6.9	7.0	9.5	12.3	16.5	16.8	16.8	97.2
		$200$	6.3	5.5	5.6	5.7	7.9	10.4	14.5	15.1	14.6	98.5
		$500$	5.7	4.7	4.7	4.7	7.2	14.1	18.1	18.3	18.2	99.4
$ℋ_{1, 1}$	$100$	$100$	26.5	24.8	25.2	25.2	29.4	18.0	19.3	20.7	20.7	91.1
		$200$	35.3	36.6	36.7	36.7	42.3	18.1	18.0	20.2	20.2	95.7
		$500$	67.1	66.4	66.3	65.8	70.6	18.5	17.5	19.5	20.5	98.0
	$200$	$100$	60.7	63.2	62.9	63.9	56.1	26.9	28.4	30.5	30.5	94.6
		$200$	82.9	87.1	87.2	87.2	84.5	27.3	29.3	30.9	30.6	97.6
		$500$	99.1	99.6	99.6	99.6	99.5	26.4	27.8	28.6	29.0	99.1
	$500$	$100$	98.7	99.4	99.4	99.4	98.0	48.1	51.6	51.0	52.0	98.3
		$200$	100.0	100.0	100.0	100.0	100.0	47.0	49.7	50.2	50.2	99.4
		$500$	100.0	100.0	100.0	100.0	100.0	46.1	48.4	49.2	49.9	99.7
$ℋ_{1, 2}$	$100$	$100$	95.1	95.8	95.7	95.8	95.3	38.6	38.3	39.6	40.1	94.6
		$200$	99.7	99.9	99.9	99.9	99.9	41.3	38.4	40.2	40.2	98.1
		$500$	100.0	100.0	100.0	100.0	100.0	38.4	38.4	40.4	40.8	99.3
	$200$	$100$	100.0	100.0	100.0	100.0	100.0	61.8	62.6	63.2	63.3	98.2
		$200$	100.0	100.0	100.0	100.0	100.0	60.0	62.9	63.7	63.5	99.5
		$500$	100.0	100.0	100.0	100.0	100.0	62.5	63.9	63.8	63.9	99.9
	$500$	$100$	100.0	100.0	100.0	100.0	100.0	90.2	92.1	91.8	91.8	100.0
		$200$	100.0	100.0	100.0	100.0	100.0	91.4	93.0	93.7	93.8	100.0
		$500$	100.0	100.0	100.0	100.0	100.0	89.9	91.2	91.4	91.5	99.9

Table 5: Empirical Rejection Rates (in percentage) for One Change Point in Mean (Non-Gaussian Error)

			ID					AR(1)
	$n$	$p$	$T_{n}$	$KS_{n,\mathrm{Inf}}$	$KS_{n,1}$	$KS_{n,2}$	$EH$	$T_{n}$	$KS_{n,\mathrm{Inf}}$	$KS_{n,1}$	$KS_{n,2}$	$EH$
$ℋ_{0}$	$100$	$100$	5.0	3.7	3.3	3.1	84.3	7.0	3.9	3.2	3.0	76.9
		$200$	5.4	3.6	2.3	2.9	97.1	5.7	3.4	3.3	2.9	92.9
		$500$	5.2	2.7	2.4	1.7	100.0	5.2	2.2	2.3	1.9	100.0
	$200$	$100$	5.5	4.8	4.8	4.3	84.6	5.4	4.5	4.5	4.6	78.1
		$200$	5.1	4.0	4.3	4.2	97.3	6.1	4.6	3.9	4.3	93.5
		$500$	6.2	4.2	3.9	3.7	100.0	6.4	4.3	3.8	3.8	99.7
	$500$	$100$	4.1	4.8	4.7	5.0	88.1	4.9	5.5	5.4	5.9	81.9
		$200$	5.2	3.6	3.4	3.4	97.9	6.4	5.4	5.2	5.4	95.6
		$500$	5.4	3.4	3.6	3.3	100.0	5.6	3.7	4.1	4.2	99.9
$ℋ_{1, 1}$	$100$	$100$	35.0	29.8	33.2	32.8	85.6	28.8	25.6	26.9	27.5	81.1
		$200$	54.2	52.1	52.5	51.6	96.9	40.1	36.5	38.2	38.1	94.5
		$500$	87.4	87.1	87.2	84.9	100.0	66.4	67.9	67.5	65.9	100.0
	$200$	$100$	75.9	79.2	80.4	80.0	100.0	58.8	60.2	61.4	61.9	88.5
		$200$	94.2	97.4	97.3	97.0	99.5	80.2	83.9	83.7	83.8	98.5
		$500$	100.0	100.0	100.0	99.9	100.0	98.3	99.1	98.8	98.6	100.0
	$500$	$100$	99.9	100.0	100.0	99.9	100.0	97.4	98.9	98.7	98.7	99.5
		$200$	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0
		$500$	100.0	100.0	100.0	99.9	100.0	100.0	100.0	100.0	99.9	100.0
$ℋ_{1, 2}$	$100$	$100$	99.2	99.6	99.4	99.2	99.3	93.4	93.9	93.9	93.9	97.5
		$200$	99.8	100.0	99.9	99.6	100.0	99.5	99.8	99.5	99.2	99.9
		$500$	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0
	$200$	$100$	100.0	100.0	100.0	99.9	100.0	99.9	100.0	99.9	99.9	100.0
		$200$	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0
		$500$	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0
	$500$	$100$	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0
		$200$	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0
		$500$	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0
			BD					CS
$ℋ_{0}$	$100$	$100$	6.7	3.7	2.9	3.0	76.4	11.8	10.6	14.4	14.1	90.6
		$200$	5.2	3.9	3.4	3.0	93.0	11.2	11.5	14.0	13.6	97.2
		$500$	4.7	2.4	2.3	1.8	100.0	11.9	12.6	15.3	15.3	99.5
	$200$	$100$	5.9	4.4	4.2	4.7	77.0	12.3	15.1	16.1	16.1	94.4
		$200$	6.0	4.1	4.1	4.2	93.4	12.3	15.5	16.2	16.6	98.2
		$500$	5.5	4.3	4.0	3.9	99.9	12.2	13.3	15.3	15.1	100.0
	$500$	$100$	4.6	5.4	5.1	5.5	81.7	11.5	15.9	16.1	15.7	96.5
		$200$	5.9	5.3	5.0	5.1	95.5	12.0	15.1	15.9	16.0	99.2
		$500$	6.0	3.4	3.7	3.8	99.9	12.9	16.5	16.8	17.1	100.0
$ℋ_{1, 1}$	$100$	$100$	29.3	24.9	26.3	26.3	80.5	19.6	19.9	21.4	22.1	92.1
		$200$	40.6	35.9	38.5	38.1	94.6	17.9	17.5	18.4	18.8	97.5
		$500$	66.9	67.8	67.4	66.7	100.0	20.1	20.1	22.2	22.4	99.8
	$200$	$100$	60.5	61.3	61.8	62.1	87.4	25.0	27.1	27.3	27.3	96.3
		$200$	81.3	84.9	84.5	84.2	98.7	28.8	30.4	31.4	31.4	98.8
		$500$	98.4	99.2	99.3	98.9	100.0	26.8	28.0	29.7	29.1	99.9
	$500$	$100$	97.8	99.4	99.1	99.1	99.7	46.9	49.5	49.7	50.0	98.6
		$200$	99.9	100.0	100.0	100.0	100.0	44.9	48.6	49.1	49.3	99.4
		$500$	100.0	100.0	100.0	99.9	100.0	44.4	47.4	48.6	48.9	99.9
$ℋ_{1, 2}$	$100$	$100$	94.3	95.2	94.8	94.7	97.5	39.0	37.5	40.0	40.5	95.9
		$200$	99.6	99.9	99.6	99.4	100.0	36.7	36.1	37.4	37.2	98.5
		$500$	100.0	100.0	100.0	100.0	100.0	41.1	38.5	39.1	38.9	99.9
	$200$	$100$	99.9	100.0	99.9	99.9	100.0	58.3	60.2	61.8	61.5	98.5
		$200$	100.0	100.0	100.0	100.0	100.0	62.4	63.4	64.4	63.7	99.7
		$500$	100.0	100.0	100.0	100.0	100.0	60.5	61.4	62.3	62.1	100.0
	$500$	$100$	100.0	100.0	100.0	100.0	100.0	91.2	92.3	92.2	92.4	100.0
		$200$	100.0	100.0	100.0	100.0	100.0	90.5	92.3	92.8	92.9	99.9
		$500$	100.0	100.0	100.0	100.0	100.0	88.8	91.4	91.5	91.6	100.0

Table 6: Empirical Rejection Rates (in percentage) for Multiple Change Points (Gaussian Error)

			$n = 100$			$n = 200$			$n = 500$
			$p = 100$	$p = 200$	$p = 500$	$p = 100$	$p = 200$	$p = 500$	$p = 100$	$p = 200$	$p = 500$
ID	$ℋ_{0}$	$T_{n}$	4.1	4.1	4.3	6.1	5.2	4.7	5.9	5.7	6.3
	$ℋ_{0}$	$T_{n}^{⋄}$	14.1	12.5	13.6	7.6	7.6	5.9	6.4	7.0	4.8
	$ℋ_{1, 1}$	$T_{n}$	99.6	100.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0
	$ℋ_{1, 1}$	$T_{n}^{⋄}$	51.6	82.4	99.6	97.2	99.9	100.0	100.0	100.0	100.0
	$ℋ_{1, 2}$	$T_{n}$	0.3	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0
	$ℋ_{1, 2}$	$T_{n}^{⋄}$	83.0	97.0	100.0	100.0	100.0	100.0	100.0	100.0	100.0
	$ℋ_{1, 3}$	$T_{n}$	0.3	0.3	0.1	0.2	0.1	0.0	0.0	0.0	0.0
	$ℋ_{1, 3}$	$T_{n}^{⋄}$	72.3	94.6	100.0	99.8	100.0	100.0	100.0	100.0	100.0
AR	$ℋ_{0}$	$T_{n}$	4.7	4.7	4.5	5.7	5.6	5.0	6.0	5.6	5.9
	$ℋ_{0}$	$T_{n}^{⋄}$	17.3	15.6	15.0	7.9	7.9	6.8	7.5	7.9	6.3
	$ℋ_{1, 1}$	$T_{n}$	92.8	99.4	100.0	99.9	100.0	100.0	100.0	100.0	100.0
	$ℋ_{1, 1}$	$T_{n}^{⋄}$	38.7	62.5	94.2	84.0	98.3	100.0	100.0	100.0	100.0
	$ℋ_{1, 2}$	$T_{n}$	1.5	0.5	0.0	0.2	0.0	0.0	0.0	0.0	0.0
	$ℋ_{1, 2}$	$T_{n}^{⋄}$	65.6	86.2	99.8	97.1	100.0	100.0	100.0	100.0	100.0
	$ℋ_{1, 3}$	$T_{n}$	2.3	0.6	0.6	0.8	0.5	0.0	0.1	0.0	0.0
	$ℋ_{1, 3}$	$T_{n}^{⋄}$	58.4	81.1	99.5	96.1	99.8	100.0	100.0	100.0	100.0
BD	$ℋ_{0}$	$T_{n}$	4.8	4.9	4.4	5.8	5.6	5.5	6.1	6.0	5.4
	$ℋ_{0}$	$T_{n}^{⋄}$	16.2	14.3	13.0	8.3	7.6	7.2	7.1	6.5	6.3
	$ℋ_{1, 1}$	$T_{n}$	93.7	99.7	100.0	100.0	100.0	100.0	100.0	100.0	100.0
	$ℋ_{1, 1}$	$T_{n}^{⋄}$	39.4	62.6	94.9	86.7	99.0	100.0	100.0	100.0	100.0
	$ℋ_{1, 2}$	$T_{n}$	0.9	0.1	0.0	0.0	0.0	0.0	0.0	0.0	0.0
	$ℋ_{1, 2}$	$T_{n}^{⋄}$	65.2	84.0	99.7	97.1	99.9	100.0	100.0	100.0	100.0
	$ℋ_{1, 3}$	$T_{n}$	1.1	2.0	0.4	0.8	0.2	0.0	0.0	0.0	0.0
	$ℋ_{1, 3}$	$T_{n}^{⋄}$	60.3	81.3	98.7	96.4	99.9	100.0	100.0	100.0	100.0
CS	$ℋ_{0}$	$T_{n}$	11.6	11.2	11.4	10.6	10.8	11.0	11.2	11.4	11.3
	$ℋ_{0}$	$T_{n}^{⋄}$	44.7	44.5	44.7	39.3	38.7	39.4	32.6	33.5	33.6
	$ℋ_{1, 1}$	$T_{n}$	41.0	39.4	37.3	57.8	60.2	60.4	89.2	90.0	91.3
	$ℋ_{1, 1}$	$T_{n}^{⋄}$	52.5	51.4	54.4	59.3	57.2	57.5	79.9	81.9	81.9
	$ℋ_{1, 2}$	$T_{n}$	11.6	10.1	9.5	9.0	11.2	9.3	4.8	7.6	7.3
	$ℋ_{1, 2}$	$T_{n}^{⋄}$	57.9	58.5	58.9	64.6	66.6	65.6	88.2	88.3	88.2
	$ℋ_{1, 3}$	$T_{n}$	10.9	11.6	12.4	13.1	12.1	15.0	11.3	14.5	12.8
	$ℋ_{1, 3}$	$T_{n}^{⋄}$	53.9	58.3	55.9	65.1	63.9	64.2	87.7	87.7	88.5

Table 7: Empirical Rejection Rates (in percentage) for Single Change Point Testing: Sizes for Example 6.1 with $\mu=0$ and $p=2n$.

	$ρ$	$n$	$T_{n}$		DCBS	HH
	$ρ$	$n$	$η = 0.02$	$η = 0.05$	DCBS	$h = 3$	$h = 6$
(a)	$0.2$	$200$	$0.042$	$0.051$	$0$	$0.326$	$0.264$
	$0.2$	$400$	$0.048$	$0.047$	$0$	$0.377$	$0.219$
	$0.2$	$800$	$0.062$	$0.062$	$0$	$0.488$	$0.236$
	$0.5$	$200$	$0.067$	$0.063$	$0$	$0.994$	$0.363$
	$0.5$	$400$	$0.051$	$0.052$	$0$	$1$	$0.482$
	$0.5$	$800$	$0.062$	$0.066$	$0$	$1$	$0.643$
	$0.7$	$200$	$0.011$	$0.094$	$0$	$1$	$0.980$
	$0.7$	$400$	$0.066$	$0.058$	$0.0005$	$1$	$1.000$
	$0.7$	$800$	$0.061$	$0.069$	$0$	$1$	$1$
	- $0.5$	$200$	$0.020$	$0.018$	$0$	$0.562$	$0.964$
	- $0.5$	$400$	$0.034$	$0.030$	$0$	$0.382$	$0.906$
	- $0.5$	$800$	$0.057$	$0.053$	$0$	$0.336$	$0.947$
(b)	$0.2$	$200$	$0.052$	$0.058$	$0$	$0.204$	$0.084$
	$0.2$	$400$	$0.053$	$0.049$	$0$	$0.204$	$0.084$
	$0.2$	$800$	$0.044$	$0.050$	$0$	$0.312$	$0.083$
	$0.5$	$200$	$0.076$	$0.074$	$0$	$0.998$	$0.210$
	$0.5$	$400$	$0.052$	$0.057$	$0$	$1$	$0.311$
	$0.5$	$800$	$0.044$	$0.051$	$0$	$1$	$0.538$
	$0.7$	$200$	$0.002$	$0.103$	$0$	$1$	$0.978$
	$0.7$	$400$	$0.060$	$0.064$	$0.0005$	$1$	$1.000$
	$0.7$	$800$	$0.046$	$0.054$	$0$	$1$	$1$
	- $0.5$	$200$	$0.012$	$0.022$	$0$	$0.370$	$0.944$
	- $0.5$	$400$	$0.039$	$0.032$	$0$	$0.210$	$0.878$
	- $0.5$	$800$	$0.040$	$0.040$	$0$	$0.150$	$0.932$

Table 8: Empirical Rejection Rates (in percentage) for Single Change Point Testing: Powers for Example 6.1 (i) with $p=2n$.

			Case (i)					Case (ii)
	$ρ$	$n$	$T_{n}$		DCBS	HH		$T_{n}$		DCBS	HH
	$ρ$	$n$	$η = 0.02$	$η = 0.05$	DCBS	$h = 3$	$h = 6$	$η = 0.02$	$η = 0.05$	DCBS	$h = 3$	$h = 6$
(a)	$0.2$	$200$	$0.728$	$0.700$	$0.058$	$0.972$	$0.956$	$0.908$	$0.878$	$0.102$	$0.998$	$0.996$
	$0.2$	$400$	$1$	$1$	$0.906$	$1$	$1$	$1$	$1$	$0.999$	$1$	$1$
	$0.2$	$800$	$1$	$1$	$1$	$1$	$1$	$1$	$1$	$1$	$1$	$1$
	$0.5$	$200$	$0.317$	$0.252$	$0$	$1$	$0.742$	$0.392$	$0.316$	$0$	$1$	$0.835$
	$0.5$	$400$	$0.799$	$0.774$	$0$	$1$	$0.996$	$0.967$	$0.948$	$0$	$1$	$1$
	$0.5$	$800$	$1$	$1$	$0.046$	$1$	$1$	$1$	$1$	$0.337$	$1$	$1$
	$0.7$	$200$	$0.020$	$0.160$	$0.0005$	$1$	$0.989$	$0.023$	$0.176$	$0$	$1$	$0.994$
	$0.7$	$400$	$0.302$	$0.237$	$0.004$	$1$	$1$	$0.390$	$0.337$	$0.005$	$1$	$1$
	$0.7$	$800$	$0.840$	$0.813$	$0$	$1$	$1$	$0.970$	$0.958$	$0$	$1$	$1$
	- $0.5$	$200$	$0.990$	$1.000$	$0$	$1$	$1$	$1$	$1$	$0$	$1$	$1$
	- $0.5$	$400$	$1$	$1$	$0.188$	$1$	$1$	$1$	$1$	$0.9995$	$1$	$1$
	- $0.5$	$800$	$1$	$1$	$1$	$1$	$1$	$1$	$1$	$1$	$1$	$1$
(b)	$0.2$	$200$	$0.906$	$0.888$	$0.010$	$0.991$	$0.983$	$0.983$	$0.976$	$0.064$	$1$	$0.999$
	$0.2$	$400$	$1$	$1$	$0.964$	$1$	$1$	$1$	$1$	$1$	$1$	$1$
	$0.2$	$800$	$1$	$1$	$1$	$1$	$1$	$1$	$1$	$1$	$1$	$1$
	$0.5$	$200$	$0.409$	$0.314$	$0$	$1$	$0.687$	$0.532$	$0.456$	$0$	$1$	$0.808$
	$0.5$	$400$	$0.954$	$0.927$	$0$	$1$	$1$	$0.996$	$0.994$	$0.0065$	$1$	$1$
	$0.5$	$800$	$1$	$1$	$0.041$	$1$	$1$	$1$	$1$	$0.5675$	$1$	$1$
	$0.7$	$200$	$0.006$	$0.178$	$0$	$1$	$0.992$	$0.006$	$0.220$	$0$	$1$	$0.996$
	$0.7$	$400$	$0.349$	$0.328$	$0.002$	$1$	$1$	$0.489$	$0.484$	$0.005$	$1$	$1$
	$0.7$	$800$	$0.960$	$0.946$	$0$	$1$	$1$	$0.995$	$0.994$	$0$	$1$	$1$
	- $0.5$	$200$	$0.999$	$1$	$0$	$1$	$1.000$	$1$	$1$	$0$	$1$	$1$
	- $0.5$	$400$	$1$	$1$	$0.524$	$1$	$1$	$1$	$1$	$0.9955$	$1$	$1$
	- $0.5$	$800$	$1$	$1$	$1$	$1$	$1$	$1$	$1$	$1$	$1$	$1$

Table 9: Empirical Rejection Rates for Tests of Covariance Matrix Change

		Diagonal		AR
		$ℋ_{0}$	$ℋ_{1}$	$ℋ_{0}$	$ℋ_{1}$
$N = 100, p = 10$	SN	5.0	91.4	6.5	90.0
$N = 100, p = 10$	AB	66.6	100	73.8	100
$N = 200, p = 10$	SN	4.3	100	4	100
$N = 200, p = 10$	AB	46	100	83	100

Table 10: Estimation Result for Multiple Change Points in Mean of High-dimensional Independent Data

		$\hat{N} - N$							MSE	ARI
		-3	-2	-1	0	1	2	3
Sparse( $\sqrt{2.5 / 5}$ )	WBS-SN	2	12	38	48	0	0	0	1.04	0.75
	BS-SN	100	0	0	0	0	0	0	9	0
	INSPECT	0	16	1	76	7	0	0	0.72	0.85
Sparse ( $\sqrt{4 / 5}$ )	WBS-SN	0	0	1	96	3	0	0	0.04	0.95
	BS-SN	100	0	0	0	0	0	0	9	0
	INSPECT	0	0	0	83	17	0	0	0.17	0.96
Dense( $\sqrt{2.5 / p}$ )	WBS-SN	2	13	36	49	0	0	0	1.06	0.70
	BS-SN	100	0	0	0	0	0	0	9	0
	INSPECT	0	30	2	45	19	4	0	1.57	0.69
Dense( $\sqrt{4 / p}$ )	WBS-SN	0	0	1	92	7	0	0	0.08	0.95
	BS-SN	100	0	0	0	0	0	0	9	0
	INSPECT	0	6	0	72	17	5	0	0.61	0.92

Table 11: Estimation Result for Multiple Change Points in Mean of High-dimensional Time Series: Example 6.2 with $n=500$ and $p=250,500$ based on 200 simulations. $\text{WBS-SN}^{1}$ and $\text{WBS-SN}^{2}$ correspond to trimming $\eta=0.01$ and $\eta=0.02$ respectively. The minimal length is $L_{0}=6\tau+7+\lfloor\theta n\rfloor$, where $\tau=\lfloor n\eta\rfloor$ and $\theta=0.1,0.15,0.2$. The rows labeled DCBS-Li correspond to the double CUSUM binary segmentation algorithm of Cho (2016) (left) and the segmentation algorithm based on a bias-corrected statistic of Li et al. (2019) (right).

				$\text{WBS-SN}^{1}$					$\text{WBS-SN}^{2}$
Case	$p$	$ρ$	$θ$	# of change points (%)				ARI	# of change points (%)				ARI
Case	$p$	$ρ$	$θ$	$\leq 2$	3	4	$\geq 5$	ARI	$\leq 2$	3	4	$\geq 5$	ARI
(i)	$250$	$0.3$	$0.1$	$0$	$193$	$6$	$1$	$0.932$	$0$	$200$	$0$	$0$	$0.919$
			$0.15$	$0$	$200$	$0$	$0$	$0.933$	$0$	$199$	$1$	$0$	$0.914$
			$0.2$	$0$	$199$	$1$	$0$	$0.934$	$0$	$199$	$1$	$0$	$0.916$
			DCBS-Li	$0$	$200$	$0$	$0$	$0.999$	$0$	$168$	$9$	$23$	$0.923$
		$0.6$	$0.1$	$14$	$164$	$21$	$1$	$0.876$	$14$	$178$	$8$	$0$	$0.865$
			$0.15$	$12$	$175$	$13$	$0$	$0.887$	$4$	$195$	$1$	$0$	$0.878$
			$0.2$	$13$	$176$	$10$	$1$	$0.882$	$5$	$194$	$1$	$0$	$0.887$
			DCBS-Li	$182$	$18$	$0$	$0$	$0.633$	$0$	$3$	$0$	$197$	$0.477$
	$500$	$0.3$	$0.1$	$0$	$194$	$6$	$0$	$0.937$	$0$	$198$	$2$	$0$	$0.917$
			$0.15$	$0$	$198$	$2$	$0$	$0.937$	$0$	$200$	$0$	$0$	$0.913$
			$0.2$	$0$	$198$	$2$	$0$	$0.936$	$0$	$200$	$0$	$0$	$0.913$
			DCBS-Li	$0$	$200$	$0$	$0$	$1.000$	$0$	$176$	$2$	$22$	$0.927$
		$0.6$	$0.1$	$3$	$170$	$25$	$2$	$0.907$	$1$	$192$	$7$	$0$	$0.894$
			$0.15$	$1$	$186$	$13$	$0$	$0.915$	$0$	$197$	$3$	$0$	$0.891$
			$0.2$	$3$	$187$	$10$	$0$	$0.914$	$0$	$199$	$1$	$0$	$0.893$
			DCBS-Li	$178$	$22$	$0$	$0$	$0.655$	$0$	$5$	$2$	$193$	$0.438$
(ii)	$250$	$0.3$	$0.1$	$0$	$194$	$6$	$0$	$0.936$	$0$	$197$	$3$	$0$	$0.915$
			$0.15$	$0$	$196$	$4$	$0$	$0.934$	$0$	$198$	$2$	$0$	$0.917$
			$0.2$	$0$	$198$	$2$	$0$	$0.934$	$0$	$199$	$1$	$0$	$0.915$
			DCBS-Li	$0$	$200$	$0$	$0$	$0.998$	$0$	$172$	$10$	$18$	$0.938$
		$0.6$	$0.1$	$27$	$154$	$18$	$1$	$0.869$	$23$	$168$	$8$	$1$	$0.854$
			$0.15$	$26$	$165$	$9$	$0$	$0.869$	$10$	$188$	$2$	$0$	$0.879$
			$0.2$	$26$	$172$	$2$	$0$	$0.874$	$14$	$185$	$1$	$0$	$0.880$
			DCBS-Li	$147$	$53$	$0$	$0$	$0.742$	$0$	$3$	$1$	$196$	$0.435$
	$500$	$0.3$	$0.1$	$0$	$195$	$5$	$0$	$0.934$	$0$	$198$	$2$	$0$	$0.921$
			$0.15$	$0$	$197$	$3$	$0$	$0.934$	$0$	$199$	$1$	$0$	$0.917$
			$0.2$	$0$	$197$	$3$	$0$	$0.933$	$0$	$200$	$0$	$0$	$0.917$
			DCBS-Li	$0$	$200$	$0$	$0$	$1.000$	$0$	$175$	$0$	$25$	$0.917$
		$0.6$	$0.1$	$7$	$171$	$21$	$1$	$0.907$	$0$	$190$	$10$	$0$	$0.891$
			$0.15$	$8$	$182$	$10$	$0$	$0.908$	$0$	$199$	$1$	$0$	$0.893$
			$0.2$	$7$	$188$	$5$	$0$	$0.913$	$0$	$198$	$2$	$0$	$0.895$
			DCBS-Li	$187$	$13$	$0$	$0$	$0.554$	$0$	$1$	$3$	$196$	$0.414$
(iii)	$250$	$0.3$	$0.1$	$0$	$183$	$16$	$1$	$0.925$	$0$	$196$	$4$	$0$	$0.908$
			$0.15$	$0$	$193$	$7$	$0$	$0.928$	$0$	$200$	$0$	$0$	$0.909$
			$0.2$	$0$	$199$	$1$	$0$	$0.930$	$0$	$199$	$1$	$0$	$0.905$
			DCBS-Li	$0$	$198$	$2$	$0$	$0.998$	$0$	$162$	$21$	$17$	$0.983$
		$0.6$	$0.1$	$29$	$140$	$28$	$3$	$0.848$	$41$	$155$	$4$	$0$	$0.812$
			$0.15$	$26$	$158$	$15$	$1$	$0.864$	$24$	$172$	$4$	$0$	$0.846$
			$0.2$	$26$	$161$	$12$	$1$	$0.868$	$22$	$175$	$3$	$0$	$0.848$
			DCBS-Li	$171$	$28$	$0$	$1$	$0.656$	$0$	$14$	$9$	$177$	$0.675$
	$500$	$0.3$	$0.1$	$0$	$195$	$5$	$0$	$0.937$	$0$	$196$	$4$	$0$	$0.913$
			$0.15$	$0$	$195$	$5$	$0$	$0.935$	$0$	$199$	$1$	$0$	$0.914$
			$0.2$	$0$	$198$	$2$	$0$	$0.937$	$0$	$200$	$0$	$0$	$0.913$
			DCBS-Li	$0$	$200$	$0$	$0$	$1.000$	$0$	$175$	$19$	$6$	$0.977$
		$0.6$	$0.1$	$9$	$153$	$35$	$3$	$0.895$	$3$	$192$	$5$	$0$	$0.883$
			$0.15$	$4$	$174$	$22$	$0$	$0.903$	$0$	$199$	$1$	$0$	$0.890$
			$0.2$	$4$	$184$	$12$	$0$	$0.909$	$0$	$200$	$0$	$0$	$0.890$
			DCBS-Li	$190$	$10$	$0$	$0$	$0.632$	$1$	$9$	$0$	$190$	$0.515$
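The minimal segment length in the caption of Table 11, $L_{0}=6\tau+7+\lfloor\theta n\rfloor$ with $\tau=\lfloor n\eta\rfloor$, can be checked with a few lines of arithmetic. The helper name `minimal_length` is hypothetical, and the sketch uses exact fractions so that the floor operations are not affected by floating-point rounding.

```python
from fractions import Fraction
import math


def minimal_length(n, eta, theta):
    """L0 = 6*tau + 7 + floor(theta*n), with tau = floor(n*eta).

    eta and theta are passed as strings (or Fractions) so the floors are exact.
    """
    tau = math.floor(n * Fraction(eta))
    return 6 * tau + 7 + math.floor(Fraction(theta) * n)


# Settings from the caption: n = 500, eta in {0.01, 0.02}, theta in {0.1, 0.15, 0.2}.
for eta in ("0.01", "0.02"):
    print(eta, [minimal_length(500, eta, theta) for theta in ("0.1", "0.15", "0.2")])
# 0.01 [87, 112, 137]
# 0.02 [117, 142, 167]
```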

Equations

$$H_{0}: \mu_{1} = \mu_{2} = \dots = \mu_{n} \quad \text{v.s.} \quad H_{1}: \mu_{1} = \dots = \mu_{k_{1}} \neq \mu_{k_{1}+1} = \dots = \mu_{k_{s}} \neq \mu_{k_{s}+1} = \dots = \mu_{n},$$

$$H_{1}^{'}: \mu_{1} = \mu_{2} = \dots = \mu_{k} \neq \mu_{k+1} = \dots = \mu_{n}, \quad \text{for some } 1 \leq k \leq n-1.$$

$$E[h((X, X^{'}), (Y, Y^{'}))] = \|E(X) - E(Y)\|^{2},$$

$$G_{n}(k) = \frac{1}{k(k-1)} \frac{1}{(n-k)(n-k-1)} \sum_{\substack{1 \leq j_{1}, j_{3} \leq k \\ j_{1} \neq j_{3}}} \; \sum_{\substack{k+1 \leq j_{2}, j_{4} \leq n \\ j_{2} \neq j_{4}}} (Y_{j_{1}} - Y_{j_{2}})^{T} (Y_{j_{3}} - Y_{j_{4}}).$$

$$\sup_{1 \leq k \leq n} w_{n}(k) \, |G_{n}(k)|$$

$$\sup_{1 \leq k \leq n} \|\Sigma\|_{F}^{-1} \left( \frac{2}{k(k-1)} + \frac{2}{(n-k)(n-k-1)} + \frac{4}{k(n-k)} \right)^{-1/2} |G_{n}(k)| \overset{D}{\longrightarrow} W,$$

$$D(k; \ell, m) := \sum_{\substack{\ell \leq j_{1}, j_{3} \leq k \\ j_{1} \neq j_{3}}} \; \sum_{\substack{k+1 \leq j_{2}, j_{4} \leq m \\ j_{2} \neq j_{4}}} (Y_{j_{1}} - Y_{j_{2}})^{T} (Y_{j_{3}} - Y_{j_{4}})$$

$$W_{n}(k; \ell, m) := \frac{1}{n} \sum_{t=\ell+1}^{k-2} D(t; \ell, k)^{2} + \frac{1}{n} \sum_{t=k+2}^{m-2} D(t; k+1, m)^{2},$$

$$T_{n} := \sup_{k=4,\dots,n-4} \frac{D(k; 1, n)^{2}}{W_{n}(k; 1, n)}.$$
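The definitions of $D(k;\ell,m)$, $W_{n}(k;\ell,m)$ and $T_{n}$ can be turned into a short numerical sketch. The code below is not the authors' implementation; it is a direct evaluation under the assumption that the data are stored as an $(n, p)$ NumPy array, and the names `make_D` and `sn_statistic` are hypothetical. Expanding the inner product shows that $D$ depends on the data only through the Gram matrix $Y Y^{T}$: with $a = k-\ell+1$ and $b = m-k$, it equals $b(b-1)$ times the off-diagonal sum of the left block, plus $a(a-1)$ times the off-diagonal sum of the right block, minus $2(a-1)(b-1)$ times the cross-block sum. Two-dimensional prefix sums make each block sum an $O(1)$ lookup.

```python
import numpy as np


def make_D(Y):
    """Return a function D(k, l, m) using 1-indexed, inclusive indices as in the text."""
    n = Y.shape[0]
    K = Y @ Y.T                                   # Gram matrix: K[i, j] = Y_i^T Y_j
    P = np.zeros((n + 1, n + 1))                  # P[i, j] = sum of K[:i, :j]
    P[1:, 1:] = K.cumsum(axis=0).cumsum(axis=1)
    d = np.concatenate(([0.0], np.cumsum(np.diag(K))))  # prefix sums of the diagonal

    def block(r0, r1, c0, c1):
        # Sum of K over rows r0..r1 and columns c0..c1 (1-indexed, inclusive).
        return P[r1, c1] - P[r0 - 1, c1] - P[r1, c0 - 1] + P[r0 - 1, c0 - 1]

    def D(k, l, m):
        a, b = k - l + 1, m - k                              # left and right segment sizes
        left = block(l, k, l, k) - (d[k] - d[l - 1])         # pairs j1 != j3 in [l, k]
        right = block(k + 1, m, k + 1, m) - (d[m] - d[k])    # pairs j2 != j4 in [k+1, m]
        cross = block(l, k, k + 1, m)                        # one index on each side
        return b * (b - 1) * left + a * (a - 1) * right - 2 * (a - 1) * (b - 1) * cross

    return D


def sn_statistic(Y):
    """Evaluate T_n = max over k = 4..n-4 of D(k;1,n)^2 / W_n(k;1,n)."""
    n = Y.shape[0]
    if n < 8:
        raise ValueError("need n >= 8 so that k = 4, ..., n - 4 is non-empty")
    D = make_D(Y)

    def W(k, l, m):
        first = sum(D(t, l, k) ** 2 for t in range(l + 1, k - 1))       # t = l+1, ..., k-2
        second = sum(D(t, k + 1, m) ** 2 for t in range(k + 2, m - 1))  # t = k+2, ..., m-2
        return (first + second) / n

    return max(D(k, 1, n) ** 2 / W(k, 1, n) for k in range(4, n - 3))


# Usage on simulated data with no change point.
rng = np.random.default_rng(1)
print(sn_statistic(rng.normal(size=(50, 100))))
```

Because the numerator and the self-normalizer $W_{n}$ are built from the same contrast $D$, rescaling the data by a constant leaves the ratio unchanged, which is what makes the statistic free of an explicit variance estimate. The sketch costs $O(n^{2})$ evaluations of $D$ after an $O(n^{2}p)$ Gram computation.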

$D(k; \ell, m) =$

$-2(k-\ell+1)(m-k)\,(S_{n}(\ell, m) - S_{n}(\ell, k) - S_{n}(k+1, m)),$

$\Omega(\epsilon)$

$\Omega_{n}(\epsilon)$

$G_{\epsilon}$

$G_{\epsilon, n, f}$

$G_{\epsilon, n, b}$

$$T_{n}^{*} := \max_{(l_{1}, l_{2}) \in \Omega_{n}(\epsilon)} \frac{D(l_{1}; 1, l_{2})^{2}}{W_{n}(l_{1}; 1, l_{2})} + \max_{(m_{1}, m_{2}) \in \Omega_{n}(\epsilon)} \frac{D(m_{2}; m_{1}, n)^{2}}{W_{n}(m_{2}; m_{1}, n)}.$$

$$T_{n}^{\diamond} := \max_{(l_{1}, l_{2}) \in G_{\epsilon, n, f}} \frac{D(l_{1}; 1, l_{2})^{2}}{W_{n}(l_{1}; 1, l_{2})} + \max_{(m_{1}, m_{2}) \in G_{\epsilon, n, b}} \frac{D(m_{2}; m_{1}, n)^{2}}{W_{n}(m_{2}; m_{1}, n)}.$$

$$\sum_{l_{1}, \dots, l_{h} = 1}^{p} \mathrm{cum}^{2}(X_{0, l_{1}, n}, \dots, X_{0, l_{h}, n}) \leq C \|\Sigma_{n}\|_{F}^{h},$$

$$|\mathrm{cum}(X_{0, m_{1}, n}, \dots, X_{0, m_{h}, n})| \leq C_{h} \left(1 \vee \max_{1 \leq i, j \leq h} |m_{i} - m_{j}|\right)^{-r}.$$

$$T_{n} \overset{D}{\longrightarrow} T = \sup_{r \in [0, 1]} \frac{G(r; 0, 1)^{2}}{\int_{0}^{r} G(u; 0, r)^{2} \, du + \int_{r}^{1} G(u; r, 1)^{2} \, du},$$

$G(r; a, b)$

$$\mathrm{Cov}(Q(a_{1}, b_{1}), Q(a_{2}, b_{2})) = (b_{1} \wedge b_{2} - a_{1} \vee a_{2})^{2} \, \mathbf{1}\{b_{1} \wedge b_{2} > a_{1} \vee a_{2}\}.$$

$$T_{n} \overset{D}{\longrightarrow} \sup_{r \in [0, 1]} \frac{\{2 G(r; 0, 1) + c \Delta(r, 0, 1)\}^{2}}{\int_{0}^{r} \{2 G(u; 0, r) + c \Delta(u, 0, r)\}^{2} \, du + \int_{r}^{1} \{2 G(u; r, 1) + c \Delta(u, r, 1)\}^{2} \, du},$$

$$\Delta(r, a, b) := \begin{cases} (b^{*} - a)^{2} (b - r)^{2}, & a < b^{*} \leq r < b, \\ (r - a)^{2} (b - b^{*})^{2}, & a < r < b^{*} < b, \\ 0, & b^{*} \leq a \text{ or } b^{*} \geq b. \end{cases}$$
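The piecewise drift term $\Delta(r, a, b)$ is easy to misread in flattened form, so a small sketch may help. The function name `delta` and the argument `b_star` (standing for $b^{*}$) are hypothetical. Returning 0 for any configuration not covered by the first two cases is an assumption that matches the stated third case and the boundary values of the other two.

```python
def delta(r, a, b, b_star):
    """Piecewise term Delta(r, a, b) for a fixed b_star, following the three cases above."""
    if a < b_star <= r < b:
        # b_star lies inside (a, b), at or before r
        return (b_star - a) ** 2 * (b - r) ** 2
    if a < r < b_star < b:
        # b_star lies inside (a, b), strictly after r
        return (r - a) ** 2 * (b - b_star) ** 2
    # b_star outside (a, b), or a boundary configuration (assumed to be 0)
    return 0.0


# With b_star = 0.5 on the full interval [0, 1]:
print(delta(0.5, 0.0, 1.0, 0.5))   # (0.5)^2 * (0.5)^2 = 0.0625
print(delta(0.25, 0.0, 1.0, 0.5))  # (0.25)^2 * (0.5)^2 = 0.015625
print(delta(0.2, 0.0, 0.4, 0.5))   # b_star >= b, so 0.0
```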

$T_{1}(r_{1}, r_{2})$

$T_{2}(s_{1}, s_{2})$

$T_{n}^{*}$

$T_{n}^{\diamond}$

$$\mu_{t} = \mu_{k}^{*}, \quad \lfloor n b_{k}^{*} \rfloor + 1 \leq t \leq \lfloor n b_{k+1}^{*} \rfloor, \quad k = 0, \dots, M$$

$$\sum_{l_{1}, \dots, l_{4} = 1}^{p} \mathrm{cum}^{2}(X_{0, l_{1}}, \dots, X_{0, l_{4}}) = o(\|\Sigma_{n}\|_{F}^{4}).$$

$$|\mathrm{cum}(X_{0, m_{1}, n}, \dots, X_{0, m_{h}, n})| \leq C_{h} \left(1 \vee \max_{1 \leq i, j \leq h} |m_{i} - m_{j}|\right)^{-r}.$$

$D(k; l, m \mid \tau) =$


Taxonomy

Topics: Statistical Methods and Inference · Bayesian Methods and Mixture Models

Full text

Inference for Change-Points in High-dimensional Data via Self-normalization

Runmin Wang (Southern Methodist University), Changbo Zhu (University of California at Davis), Stanislav Volgushev (University of Toronto) and Xiaofeng Shao (University of Illinois at Urbana-Champaign)

Abstract

This article considers change point testing and estimation for a sequence of high-dimensional data. In the case of testing for a mean shift for high-dimensional independent data, we propose a new test which is based on the $U$-statistic in Chen and Qin (2010) and utilizes the self-normalization principle [Shao (2010), Shao and Zhang (2010)]. Our test targets dense alternatives in the high-dimensional setting and involves no tuning parameters. To extend to change point testing for high-dimensional time series, we introduce a trimming parameter and formulate a self-normalized test statistic with trimming to accommodate the weak temporal dependence. On the theory front, we derive the limiting distributions of self-normalized test statistics under both the null and alternatives for both independent and dependent high-dimensional data. At the core of our asymptotic theory, we obtain weak convergence of a sequential U-statistic based process for high-dimensional independent data, and weak convergence of sequential trimmed U-statistic based processes for high-dimensional linear processes, both of which are of independent interest. Additionally, we illustrate how our tests can be used in combination with wild binary segmentation to estimate the number and location of multiple change points. Numerical simulations demonstrate the competitiveness of our proposed testing and estimation procedures in comparison with several existing methods in the literature.

MSC classes: 62H15, 60K35, 62G10, 62G20

Keywords: CUSUM, Segmentation, Self-Normalization, Structural Break, Time Series, U-Statistic

Runmin Wang is Assistant Professor at Southern Methodist University, Department of Statistical Science; Changbo Zhu is Postdoctoral Scholar, Department of Statistics, University of California at Davis; Stanislav Volgushev is Assistant Professor at Department of Statistical Sciences, University of Toronto; Xiaofeng Shao is Professor, Department of Statistics, University of Illinois at Urbana-Champaign. Wang and Zhu are joint first authors and made equal contributions to the paper.

1 Introduction

Suppose that we have a sequence of $\mathbb{R}^{p}$-valued observations $\{Y_{t}\}_{t=1}^{n}$ which share the same distribution, except for possible change points in the mean vector $\mu_{t}=E(Y_{t})$. We are interested in testing

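$$H_{0}: \mu_{1} = \mu_{2} = \dots = \mu_{n} \quad \text{v.s.} \quad H_{1}: \mu_{1} = \dots = \mu_{k_{1}} \neq \mu_{k_{1}+1} = \dots = \mu_{k_{s}} \neq \mu_{k_{s}+1} = \dots = \mu_{n},$$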

for some unknown $s$ and $k_{j}$, $j=1,...,s$. Change point testing is a classical problem in statistics and econometrics, and it has been extensively studied when the dimension $p$ is low and fixed. For univariate and low/fixed dimensional multivariate data, we refer the readers to Aue et al. (2009), Shao and Zhang (2010), Matteson and James (2014), Kirch et al. (2015), Zhang and Lavitas (2018) (among many others) for some recent work, and to Perron (2006) and Aue and Horváth (2013) for excellent reviews and the huge literature cited therein. A related problem is to estimate the number $s$ and the locations ($k_{j}$, $j=1,...,s$) of change points, which is also addressed in this paper.

Owing to the advances in science and technology, high-dimensional data is now produced in many areas, such as neuroscience, genomics and finance, among others. Structural change detection and estimation for high-dimensional data are of prime importance to understand the heterogeneity in the data as well as facilitate statistical modeling and inference. Among recent work that tackles change point testing and estimation for the mean of high-dimensional data and large panel data (allowing growing dimension), we mention Horváth and Hušková, (2012), Chan et al., (2013), Jirak, (2012, 2015), Cho, (2016), Yu and Chen, (2017), Wang and Samworth, (2018), Dette and Gösmann, (2018), Enikeeva and Harchaoui, (2019). In the high-dimensional environment, we often classify the alternatives into two types: sparse and dense alternatives. In the change-point context, a sparse change means that only a few components of the vector change their mean, i.e. the $L^{0}$ -norm of the mean change vector is much smaller than $p$ ; whereas dense change corresponds to the case that a change occurs for a substantial portion of the components. Several of the above-mentioned tests, including Chan et al., (2013), Jirak, (2012, 2015), Yu and Chen, (2017), Dette and Gösmann, (2018) and Wang and Samworth, (2018), specifically target sparse alternatives. For example, the test proposed by Wang and Samworth, (2018) is based on projection under a sparsity assumption; the test by Jirak, (2015) is based on taking maximum of componentwise CUSUM statistics. On the other hand, the test by Horváth and Hušková, (2012) aggregates the componentwise CUSUM statistic by using the sum, and is thus expected to have power against dense alternatives. 
However, their asymptotic theory is mostly based on an independent panel/component assumption and imposes the restrictive growth rate assumption $p/n^{2}=o(1)$ . The test developed by Enikeeva and Harchaoui, (2019) is adaptive in the sense that it can capture both sparse and dense alternatives. However, the latter paper imposes Gaussian and independent components assumptions, and the validity of their method seems questionable when these strong assumptions are violated (see Section 6 for numerical evidence). The test by Cho, (2016) is based on the double CUSUM statistic, which utilizes the cross-sectional change-point structure by examining the cumulative sums of ordered CUSUMs at each point. A standard binary segmentation procedure was used to estimate the multiple change points and its consistency was shown for high-dimensional time series. Note that several tuning parameters need to be chosen for the double CUSUM based procedure and the computational cost is high due to the use of the bootstrap; see Section 6 for some comparisons.

In this paper, we propose a new class of test statistics that target dense alternatives in the high-dimensional setting with either one single change point or multiple change points, a setting which has received relatively less attention in the literature. The focus on the dense alternative can be well motivated by real data and is often the type of alternative we are interested in. For example, copy number variations in cancer cells are commonly manifested as change-points occurring at the same positions across many related data sequences corresponding to cancer samples and biologically-related individuals; see Fan and Mackey (2017). As a second example, the financial crisis is expected to have an impact on a large number of sectors and their stock returns, so a dense change is expected if we study the stock returns time series for many sectors. Our approach is nonparametric, requires quite mild structural assumptions on the data generating process, and does not impose any sparsity assumptions. Due to the use of self-normalization, the limiting distributions of the proposed tests are pivotal. We note that, while self-normalized change point tests with pivotal limit were also obtained in Shao and Zhang, (2010) and Zhang and Lavitas, (2018), the test statistics in the latter papers cannot be used when $p\geq n$ . Even when $p<n$ but $p$ is moderately large relative to $n$ , those tests typically do not work well, as shown in some unreported simulations.

To fix ideas, we begin by considering the setting of one single change point alternative for high-dimensional independent data. To construct a procedure that works under mild assumptions on $p$ , we build upon the insights from Chen and Qin, (2010) who demonstrated that U-statistics provide a very effective means of comparing two high-dimensional mean vectors. Deriving the limiting distribution of our tests requires control over a collection of high-dimensional sequential U-statistics computed from a growing number of different sub-samples. This is achieved by establishing the weak convergence of a two-parameter stochastic process in the form of sequential U-statistic under sensible and mild assumptions. Given this crucial theoretical ingredient, we are able to derive the limiting null distribution of our test for a single change point. Practically, critical values of the proposed test can be obtained by simulation as the limiting null distribution is pivotal, and the procedure is rather straightforward to implement as no tuning parameter is involved. We further derive the power under local alternatives.

Next, we present extensions of this approach to testing against an unknown number of change-points in the spirit of Zhang and Lavitas, (2018) (who only considered fixed $p$ ) and consider the problem of testing for a change point in the covariance matrix. As in the single change point setting we obtain tests with pivotal limits. All tests are examined in the simulation studies and exhibit quite accurate size and decent power properties relative to some existing ones.

To extend our U-statistic based approach to high-dimensional time series, we introduce a trimmed version of the original U-statistic. As suggested by preliminary simulations and theoretical calculations, this is crucial in the high-dimensional regime in order to alleviate the impact of temporal dependence on the bias of the U-statistic. This trimmed statistic provides a basic ingredient for a self-normalized test under simple and multiple change-point alternatives. We derive the limiting distributions under both the null and alternatives for high-dimensional linear processes and under fixed- $b$ asymptotics [Kiefer and Vogelsang, (2005)], i.e., we assume that the trimming parameter $\tau$ satisfies $\tau/n=\eta\in(0,1)$ , and show how the resulting limiting null distribution depends on $\eta$ . This provides a better approximation to the finite sample distribution than the conventional small- $b$ counterpart. Finally, we combine the idea of wild binary segmentation [Fryzlewicz, (2014)] with the SN-based test statistic to estimate the number and location of change points, and demonstrate its effectiveness as compared to several competitors in the literature.

The rest of the paper is structured as follows. Section 2 introduces our SN-based test statistics for both one single change point and multiple change points alternatives. A rigorous theoretical justification for their limiting properties under the null and alternatives is provided in Section 3, which also contains a theoretical extension to test for covariance matrix change. Section 4 presents an extension of the U-statistic based approach to the high-dimensional time series setting to test for a single mean shift. In Section 5, we present an algorithm based on wild binary segmentation and our SN-based test to estimate the number and locations of change points. Section 6 contains all simulation results. Section 7 concludes. The technical proofs and some additional simulation results are relegated to supplementary material.

A word about notation. For any real-valued vector $\delta=(\delta_{1},\delta_{2},...,\delta_{p})^{T}\in\mathbb{R}^{p}$ , its $L^{1}$ -norm and $L^{2}$ -norm are denoted as $\|\delta\|_{1}:=\sum_{i=1}^{p}|\delta_{i}|$ and $\|\delta\|_{2}:=(\sum_{i=1}^{p}\delta_{i}^{2})^{1/2}$ . For any matrix $A=(a_{i,j})_{i=1,...,n;j=1,...,m}\in\mathbb{R}^{n\times m}$ , its $L_{1}$ norm is denoted as $\|A\|_{1}:=\max_{j}\sum_{i=1}^{n}|a_{i,j}|$ , its $L_{\infty}$ norm as $\|A\|_{\infty}:=\max_{i}\sum_{j=1}^{m}|a_{i,j}|$ , its spectral norm as $\|A\|_{2}:=\sigma_{max}(A)$ , with $\sigma_{max}$ denoting the largest singular value, and its Frobenius norm as $\|A\|_{F}:=\{\sum_{i=1}^{n}\sum_{j=1}^{m}a_{i,j}^{2}\}^{1/2}$ . We denote the trace of a symmetric matrix $A$ as $tr(A)$ . The joint cumulant of $n$ random variables $Z_{1},...,Z_{n}$ is denoted as $cum(Z_{1},Z_{2},...,Z_{n})$ . The notation ${{\bf{1}}}_{E}$ equals $1$ if condition $E$ is satisfied and zero otherwise. We use “ $\overset{\mathcal{D}}{\rightarrow}$ ” to denote the convergence in distribution for random vectors, and “ $\rightsquigarrow$ ” to denote the weak convergence for stochastic processes.

2 Test statistics for high-dimensional independent data

2.1 Single change-point

To introduce our test statistic, we shall first focus on the single change point alternative, i.e.,

[TABLE]

An extension to the general case (i.e., $\mathcal{H}_{1}$ ) will be made later. Assume that we observe a sample $Y_{1},...,Y_{n}$ . We shall describe the underlying rationale in forming our test in two steps. We begin by recalling the U-statistic approach pioneered by Chen and Qin, (2010) for comparing high-dimensional means from two samples. For $x_{1},...,x_{4}\in\mathbb{R}^{p}$ define $h((x_{1},x_{2}),(x_{3},x_{4}))=(x_{1}-x_{3})^{T}(x_{2}-x_{4})$ . Then

[TABLE]

where $(X^{\prime},Y^{\prime})$ is an i.i.d. copy of $(X,Y)$ . In other words the parameter $\|E(X)-E(Y)\|^{2}$ can be estimated by a two-sample U-statistic with kernel $h$ . This insight provides the basic building block for the following approach.
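The two-sample U-statistic with kernel $h$ can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: the function names `u_stat_bruteforce` and `u_stat_fast` are hypothetical, and the normalization (averaging over pairs of distinct indices within each sample) is an assumption in the spirit of Chen and Qin, (2010). The fast version expands the kernel into inner products of sample sums so that no quadruple loop is needed.

```python
import numpy as np

def h(x1, x2, x3, x4):
    """Kernel h((x1, x2), (x3, x4)) = (x1 - x3)^T (x2 - x4)."""
    return float((x1 - x3) @ (x2 - x4))

def u_stat_bruteforce(X, Y):
    """Average of h over distinct rows (i1 != i2) of X and distinct rows (j1 != j2) of Y."""
    a, b = len(X), len(Y)
    total = 0.0
    for i1 in range(a):
        for i2 in range(a):
            if i1 == i2:
                continue
            for j1 in range(b):
                for j2 in range(b):
                    if j1 == j2:
                        continue
                    total += h(X[i1], X[i2], Y[j1], Y[j2])
    return total / (a * (a - 1) * b * (b - 1))

def u_stat_fast(X, Y):
    """Same quantity in O((a + b) p) time via sums of rows (assumed normalization)."""
    a, b = len(X), len(Y)
    sx, sy = X.sum(axis=0), Y.sum(axis=0)
    A = sx @ sx - np.sum(X * X)   # sum over i1 != i2 of X_i1^T X_i2
    B = sy @ sy - np.sum(Y * Y)   # sum over j1 != j2 of Y_j1^T Y_j2
    C = sx @ sy                   # sum over all (i, j) of X_i^T Y_j
    return A / (a * (a - 1)) + B / (b * (b - 1)) - 2.0 * C / (a * b)
```

Because the diagonal terms $X_{i}^{T}X_{i}$ and $Y_{j}^{T}Y_{j}$ are excluded, the statistic does not pick up a $tr(\Sigma)$ bias term, which is what makes it usable when $p$ is large.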

Step 1: Form U-statistic based process. For any given candidate change point location $k$ compute the two-sample U-Statistic

[TABLE]

It is not hard to see that under $\mathcal{H}_{0}$ , $\mathbb{E}[G_{n}(k)]=0~{}\forall k$ while $\sup_{k}\mathbb{E}[G_{n}(k)]>0$ under $\mathcal{H}_{1}^{\prime}$ . This suggests that a consistent test for $\mathcal{H}_{1}^{\prime}$ can be constructed by considering the statistic

[TABLE]

with $w_{n}(k)$ denoting suitable weights. The first challenge in applying this test in practice lies in deriving the limiting distribution of $\sup_{1\leq k\leq n}w_{n}(k)|G_{n}(k)|$ under the null. The results in Chen and Qin, (2010) suggest that each individual $G_{n}(k)$ is asymptotically normal, but that is insufficient to find the asymptotic distribution of $\sup_{1\leq k\leq n}w_{n}(k)|G_{n}(k)|$ . The process convergence theory that we develop in this paper enables us to overcome this challenge, and given our results it is possible to show that

[TABLE]

where $W$ denotes a pivotal random variable and $\Sigma:=Cov(Y_{1})$ . However, this does not directly lead to an applicable test since the scaling $\|\Sigma\|_{F}^{-1}$ is unknown. Ratio-consistent estimation of $\|\Sigma\|_{F}^{2}$ is a difficult problem when $p$ is large, and this is particularly true in the change point testing context. The estimator used in Chen and Qin, (2010) is consistent under the null, but no longer consistent under the alternative due to a change point in mean. It is possible to formulate a Kolmogorov-Smirnov type test with consistent estimation of $\|\Sigma\|_{F}$ (see Section 6.1 for the details and simulation comparisons), but we will next propose to use an approach that completely avoids consistent estimation.

Step 2: Self-normalization. The essence of SN is to avoid using a consistent estimator of the unknown parameter in the scale, which is $\|\Sigma\|_{F}^{2}$ in the present setting. As we mentioned before, consistent estimation of $\|\Sigma\|_{F}$ is difficult in the change point setting (especially with multiple unknown change points). The approach in Shao and Zhang, (2010) is not applicable in the present setting, however the basic strategy to use estimators from sub-samples still works after a suitable adaptation. Define

[TABLE]

for $1\leq\ell\leq k<m\leq n$ and $D(k;\ell,m)=0$ otherwise. Note that $D(k;1,n)$ is simply a scaled version of $G_{n}(k)$ defined previously while $D(k;\ell,m)$ can hence be interpreted as a scaled version of the U-Statistic $G_{n}$ computed on the sub-sample $Y_{\ell},Y_{\ell+1},...,Y_{m}$ . Letting

[TABLE]

the self-normalized test statistic for the presence of a single change point takes the form

[TABLE]

Heuristically, the fact that $D$ computed on various sub-samples appears both in the numerator and denominator, means that the unknown factor $\|\Sigma\|_{F}^{2}$ in their variance cancels out and the limit becomes pivotal; see Theorem 3.4 for a formal statement. The key to deriving the asymptotic distribution of $T_{n}$ defined above is to establish the joint behavior of the collection of statistics $D(k;\ell,m)$ indexed by $k,\ell,m$ . Due to the U-Statistic nature of our problem this result does not follow from statements about $G_{n}(k)$ and involves additional technical difficulties.
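The self-normalization idea can be made concrete with a small sketch. Since the displays defining $D$ , $W_{n}$ and $T_{n}$ are not reproduced in this text, the code below rests on stated assumptions: $D(k;\ell,m)$ is taken to be the unnormalized sum of $(Y_{j_{1}}-Y_{j_{2}})^{T}(Y_{j_{3}}-Y_{j_{4}})$ over $j_{1}\neq j_{3}$ in $\{\ell,...,k\}$ and $j_{2}\neq j_{4}$ in $\{k+1,...,m\}$ , the normalizer is $n^{-1}$ times the sum of squared $D$ values computed within the segments before and after $k$ , and the index ranges are chosen so that every segment has at least two observations on each side. The names `make_D` and `sn_single_change_stat` are hypothetical, and indices are 0-based.

```python
import numpy as np

def make_D(Y):
    """Return D(k, l, m) (0-based, inclusive) built from prefix sums of the Gram matrix."""
    n = Y.shape[0]
    G = Y @ Y.T                                    # Gram matrix, O(n^2 p)
    P = np.zeros((n + 1, n + 1))
    P[1:, 1:] = G.cumsum(axis=0).cumsum(axis=1)    # P[i, j] = sum of G[:i, :j]
    dg = np.concatenate(([0.0], np.cumsum(np.diag(G))))

    def block(r0, r1, c0, c1):
        # Sum of G[r0..r1, c0..c1], inclusive bounds.
        return P[r1 + 1, c1 + 1] - P[r0, c1 + 1] - P[r1 + 1, c0] + P[r0, c0]

    def D(k, l, m):
        a, b = k - l + 1, m - k                    # sizes of left and right segments
        if a < 2 or b < 2:
            return 0.0
        A = block(l, k, l, k) - (dg[k + 1] - dg[l])              # left, off-diagonal
        B = block(k + 1, m, k + 1, m) - (dg[m + 1] - dg[k + 1])  # right, off-diagonal
        C = block(l, k, k + 1, m)                                # cross terms
        return b * (b - 1) * A + a * (a - 1) * B - 2 * (a - 1) * (b - 1) * C

    return D

def sn_single_change_stat(Y):
    """Assumed form: max over k of D(k; full)^2 / W(k), with W built from sub-samples."""
    n = Y.shape[0]
    D = make_D(Y)
    best = -np.inf
    for k in range(1, n - 3):
        left = sum(D(t, 0, k) ** 2 for t in range(1, k - 1))
        right = sum(D(t, k + 1, n - 1) ** 2 for t in range(k + 2, n - 2))
        W = (left + right) / n
        if W > 0:
            best = max(best, D(k, 0, n - 1) ** 2 / W)
    return best
```

Under these assumptions the statistic is unchanged when the data are multiplied by a constant, since numerator and denominator scale identically. This is the finite-sample counterpart of the cancellation of $\|\Sigma\|_{F}^{2}$ described above.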

Note that our test statistic can be computed at the cost of $O(n^{2}p)$ . To this end, observe that

[TABLE]

where $S_{n}(k,m)=\sum_{i=k}^{m}\sum_{j=k}^{i}Y_{i+1}^{T}Y_{j}$ . Many quantities in $\{S_{n}(k,m)\}_{k<m}$ are repeatedly used in the calculation of our test statistic $T_{n}$ . The trick is to calculate $S_{n}(k,m)$ for all $1\leq k<m\leq n$ first, which can be done with the cost $O(n^{2}p)$ . Once $S_{n}(k,m)$ is available for all $k<m$ , $D(k;\ell,m)$ can be computed at the cost of $O(1)$ for fixed $k,l,m$ , and $T_{n}$ at the cost of $O(n^{2})$ . Hence the total computation cost is of order $O(n^{2}p)$ .
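The precomputation of $S_{n}(k,m)$ can be sketched as follows. This is an illustrative implementation under assumptions rather than the authors' code: the function name `all_partial_sums` is hypothetical, indices are 0-based, and the range is restricted to $m\leq n-2$ (0-based) so that $Y_{i+1}$ always exists. The Gram matrix costs $O(n^{2}p)$ , and cumulative sums then give every $S_{n}(k,m)$ in $O(n^{2})$ additional operations.

```python
import numpy as np

def all_partial_sums(Y):
    """S[k, m] = sum_{i=k}^{m} sum_{j=k}^{i} Y[i+1] @ Y[j] for 0 <= k <= m <= n-2."""
    n = Y.shape[0]
    G = Y @ Y.T                                  # G[r, c] = Y[r] @ Y[c]
    P = np.zeros((n, n + 1))
    P[:, 1:] = np.cumsum(G, axis=1)              # P[r, c] = sum_{j < c} G[r, j]
    d = P[np.arange(n), np.arange(n)]            # d[r] = sum_{j < r} G[r, j]
    cd = np.concatenate(([0.0], np.cumsum(d)))   # cd[t] = sum_{r < t} d[r]
    Q = np.zeros((n + 1, n + 1))
    Q[1:, :] = np.cumsum(P, axis=0)              # Q[t, c] = sum_{r < t} P[r, c]
    S = np.full((n, n), np.nan)                  # entries outside the range stay NaN
    for k in range(n - 1):
        m = np.arange(k, n - 1)
        # Inner sum for row r = i + 1 is P[r, r] - P[r, k]; accumulate over r = k+1..m+1.
        S[k, m] = (cd[m + 2] - cd[k + 1]) - (Q[m + 2, k] - Q[k + 1, k])
    return S
```

Once this table is stored, any quantity that is a fixed linear combination of a few entries of it can be read off in constant time, which is the source of the $O(1)$ cost per $D(k;\ell,m)$ mentioned above.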

2.2 Extension to multiple change-points

In practice, the number of change points under the alternative is often unknown, which is the ‘unsupervised’ case considered in Zhang and Lavitas, (2018). It is expected that the SN-based test developed in the previous section may lose power when the number of change points is more than one; see Section 6.1 for simulation evidence. Thus it is desirable to develop a test that is adaptive, i.e., has reasonable power without the need to specify the number of change points under the alternative. Here, we propose to combine the scanning idea in Zhang and Lavitas, (2018) and the SN-based test proposed above to form our unsupervised test statistic. To this end, we consider the following additional notation. Following Zhang and Lavitas, (2018) define the sets

[TABLE]

and

[TABLE]

The first test statistic now takes the form

[TABLE]

One potential issue with this definition is that it involves the computation of $D_{n}(l_{1};1,l_{2})^{2}$ for $O(n^{2})$ combinations of $l_{1},l_{2}$ which can be expensive, especially when $n$ and $p$ are both large. To relax the computational burden, Zhang and Lavitas, (2018) also consider a discretised version. In our setting it takes the form

[TABLE]

It is worth noting that $\epsilon$ is a trimming parameter that needs to be specified by the user. We set $\epsilon=0.1$ following the practice of Zhang and Lavitas, (2018), who also provided some discussion on the role of $\epsilon$ in the testing.

3 Theoretical properties

Asymptotic properties of the proposed tests will be derived in a triangular array setting where $p=p_{n}$ , the dimension of $X_{0}$ , diverges to infinity. We will need the following regularity assumptions.

Assumption 3.1.

The observations are $Y_{t,n}=\mu_{t,n}+X_{t,n},t=1,...,n$ . $X_{1,n},...,X_{n,n}$ are i.i.d. copies of the $\mathbb{R}^{p_{n}}$ -valued random vector $X_{0,n}$ with $\mathbb{E}[X_{0,n}]=0$ and $\mathbb{E}[X_{0,n}X_{0,n}^{T}]=\Sigma_{n}$ . Moreover

A.1 $tr(\Sigma_{n}^{4})=o(\|\Sigma_{n}\|_{F}^{4})$ .

A.2 There exists a constant $C$ independent of $n$ such that

[TABLE]

for $h=2,3,4,5,6$ .

We remark that the dimension $p=p_{n}$ of the vector $X_{0}$ , the vectors $\mu_{i}$ , and the covariance matrix $\Sigma_{n}$ change with $n$ . To keep the notation simple this dependence will be dropped in all of the following results whenever there is no risk of confusion.

Remark 3.2 (Discussion of Assumptions).

Simple computation shows that Assumption A.1 is equivalent to $\|\Sigma_{n}\|_{2}=o(\|\Sigma_{n}\|_{F})$ , see section S8.5 in the supplement for details. Hence Assumption A.1 can only hold if $p=p_{n}\to\infty$ as $n\to\infty$ . All other conditions can be satisfied under uniform bounds on moments and ‘short-range’ dependence type conditions on the entries of the vector $(X_{0,1,n},...,X_{0,p_{n},n})$ . For illustration purposes, consider the following conditions.

(i) There exists $c_{0}>0$ independent of $n$ such that $\inf_{i=1,...,p_{n}}Var(X_{0,i})\geq c_{0}$ .

(ii) For $h=2,...,6$ there exist constants $C_{h}$ depending on $h$ only and a constant $r>2$ independent of $n,h,m_{1},...,m_{h}$ such that

[TABLE]

Note that this assumption is trivially satisfied if the entries of $(X_{0,1,n},...,X_{0,p_{n},n})$ are m-dependent over $i$ , i.e., if two groups $\{X_{0,i,n}:i\in J_{1}\},\{X_{0,i,n}:i\in J_{2}\}$ are independent whenever $\inf_{i\in J_{1},j\in J_{2}}|i-j|>m$ and if moments of order $h$ are uniformly bounded. It can also be verified under other conditions such as mixing plus moment assumptions [Zhurbenko and Zuev, (1975)] or physical dependence measures, see for instance Proposition 2 of Wu and Shao, (2004) and Theorem 4.1 of Shao and Wu, (2007) for the latter.

Now it is easy to prove (see section S8.5 in the supplement for details) that if $p_{n}\to\infty$ , (i) holds and (ii) holds for some $r>3/2$ then Assumption 3.1 holds.
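Assumption A.1 can be checked numerically for a given covariance model. The sketch below is an illustration only; the two covariance families (an AR(1)-type Toeplitz matrix and an equicorrelation matrix) and the function names are choices made here and do not come from the text. It computes the ratio $tr(\Sigma^{4})/\|\Sigma\|_{F}^{4}$ , which A.1 requires to vanish as $p$ grows.

```python
import numpy as np

def a1_ratio(Sigma):
    """tr(Sigma^4) / ||Sigma||_F^4; Assumption A.1 asks this to tend to zero."""
    S2 = Sigma @ Sigma
    return float(np.trace(S2 @ S2) / np.linalg.norm(Sigma, "fro") ** 4)

def ar1_cov(p, rho):
    """Toeplitz covariance with entries rho^{|i - j|} (weak cross-sectional dependence)."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def equicorr_cov(p, rho):
    """Unit variances and common correlation rho (strong cross-sectional dependence)."""
    return (1.0 - rho) * np.eye(p) + rho * np.ones((p, p))

if __name__ == "__main__":
    for p in (50, 200, 800):
        print(p, a1_ratio(ar1_cov(p, 0.5)), a1_ratio(equicorr_cov(p, 0.5)))
```

For the Toeplitz family the eigenvalues stay bounded, so the ratio decays roughly like $1/p$ . For the equicorrelation family a single eigenvalue of order $p$ dominates and the ratio stays close to one, so A.1 fails. This matches the equivalent formulation $\|\Sigma_{n}\|_{2}=o(\|\Sigma_{n}\|_{F})$ .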

Remark 3.3 (Comparison with Chen and Qin, (2010)).

Although Chen and Qin, (2010) studied a two-sample mean testing problem which is different from the change point setting we consider here, the weak cross-sectional dependence condition was also required in their theory to obtain a Gaussian limit. To quantify the dependence among different components of the vector $X_{1}$ , Chen and Qin, (2010) proposed a factor model. More precisely they assume that $X_{i}=\Gamma Z_{i}$ where $Z_{i}$ are $m$ -dimensional random vectors with the additional property $\mathbb{E}[Z_{t,l_{1}}^{\alpha_{1}}\cdots Z_{t,l_{q}}^{\alpha_{q}}]=\mathbb{E}[Z_{t,l_{1}}^{\alpha_{1}}]\cdots\mathbb{E}[Z_{t,l_{q}}^{\alpha_{q}}]$ for all $l_{1}\neq...\neq l_{q}$ and integers $\alpha_{k}\leq 4$ with $\sum_{k}\alpha_{k}\leq 8$ . In contrast, we assume A.2 without imposing a factor model structure. As we shall prove in section S8.6, the factor model structure of Chen and Qin, (2010) together with finite moments of order $6$ implies our condition A.2. Moreover, a close look at the proofs reveals that for proving finite-dimensional convergence we only require A.2 with $h\leq 4$ , which follows from the assumptions of Chen and Qin, (2010). Hence, we prove a result which corresponds to that of Chen and Qin, (2010) under strictly weaker assumptions on the dependence structure and provide process convergence results under only slightly stronger moment conditions and still weaker structural assumptions.

3.1 Properties of the test for a single change-point

We begin by deriving the limiting distribution of the test statistic $T_{n}$ defined in (2.3).

Theorem 3.4.

Let Assumption 3.1 hold. If $\mu_{t}\equiv\mu$ for a vector $\mu\in\mathbb{R}^{p}$ (i.e. under $\mathcal{H}_{0}$ ) then

[TABLE]

where

[TABLE]

and $Q$ is a centered Gaussian process on $[0,1]^{2}$ with covariance structure given by

[TABLE]

The limiting distribution $T$ is pivotal, and an asymptotic level $\alpha$ test for $\mathcal{H}_{0}:\mu_{t}\equiv\mu$ is thus given by the decision: reject $\mathcal{H}_{0}$ if $T_{n}>Q_{T}(1-\alpha)$ where $Q_{T}(1-\alpha)$ denotes the $1-\alpha$ quantile of the distribution of $T$ . Simulated quantiles from this distribution (based on 10000 Monte Carlo replications) are provided in Table 1.

Note that the above limiting null distribution requires that $p\wedge n\rightarrow\infty$ (this must hold for Assumption A.1 to be satisfied), and does not hold when $p$ is fixed and $n\rightarrow\infty$ . Our SN-based test statistic $T_{n}$ builds on the two sample test statistic proposed by Chen and Qin, (2010), whose limit under the fixed $p$ paradigm is expected to be non-Gaussian, as their test statistic is a degenerate $U$ -statistic under the null. Here the assumption $p\rightarrow\infty$ is essential to our Gaussian process limit for the two-parameter process $\Big\{\frac{\sqrt{2}}{n\|\Sigma\|_{F}}\widetilde{S}_{n}(\lfloor an\rfloor+1,\lfloor bn\rfloor-1)\Big\}_{(a,b)\in[0,1]^{2}}$ , which is the key to deriving the limiting null distribution of $T_{n}$ ; see Section S8 in the supplement.

Next we consider the behavior of the test under alternatives. The following result shows that the test is consistent against local alternatives of a certain order if there is exactly one change-point.

Theorem 3.5.

Let Assumption 3.1 hold. Assume that there exists $b^{*}\in(0,1)$ such that $\mu_{t}=\mu,t=1,...,\lfloor b^{*}n\rfloor$ and $\mu_{t}=\mu+\delta_{n},t=\lfloor b^{*}n\rfloor+1,...,n$ . Then

1. If $\sqrt{n}\|\delta_{n}\|_{2}/\|\Sigma\|_{F}^{1/2}\to\infty$ then $T_{n}\to\infty$ in probability.

2. If $\sqrt{n}\|\delta_{n}\|_{2}/\|\Sigma\|_{F}^{1/2}\to 0$ then $T_{n}\overset{\mathcal{D}}{\rightarrow}T$ .

3. If $\sqrt{n}\|\delta_{n}\|_{2}/\|\Sigma\|_{F}^{1/2}\to c\in(0,\infty)$ then

[TABLE]

where

[TABLE]
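To illustrate the rates in Theorem 3.5, consider the simplified case $\Sigma=I_{p}$ (an assumption made here purely for illustration; it is not singled out in the text). Then $\|\Sigma\|_{F}^{1/2}=p^{1/4}$ , and for a dense shift $\delta_{n}=c_{n}(1,...,1)^{T}$ and a one-coordinate shift $\delta_{n}=c_{n}(1,0,...,0)^{T}$ the signal strength becomes, respectively,

$$\frac{\sqrt{n}\,\|\delta_{n}\|_{2}}{\|\Sigma\|_{F}^{1/2}}=c_{n}\sqrt{n}\,p^{1/4}\qquad\text{and}\qquad\frac{\sqrt{n}\,\|\delta_{n}\|_{2}}{\|\Sigma\|_{F}^{1/2}}=\frac{c_{n}\sqrt{n}}{p^{1/4}}.$$

Under this assumption, the first case of the theorem applies to the dense shift as soon as $c_{n}\sqrt{n}\,p^{1/4}\to\infty$ , so the per-coordinate shift may be smaller than $n^{-1/2}$ when $p$ is large, whereas the one-coordinate shift requires $c_{n}\sqrt{n}/p^{1/4}\to\infty$ . This is consistent with the test being aimed at dense alternatives.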

3.2 Properties of the tests for multiple change-points

To describe the properties of the test statistics $T_{n}^{*},T_{n}^{\diamond}$ under the null, define for $0\leq r_{1}<r_{2}\leq 1$ and $0\leq s_{1}<s_{2}\leq 1$ ,

[TABLE]

Theorem 3.6.

Let Assumption 3.1 hold and assume $\epsilon<1/4$ . If $\mu_{t}\equiv\mu$ for a vector $\mu\in\mathbb{R}^{p}$ (i.e. under $\mathcal{H}_{0}$ ) then

[TABLE]

The distributions of $T^{*},T^{\diamond}$ are again pivotal but depend on $\epsilon$ (which is known since it is chosen by the user). For $\epsilon=0.1$ used in the paper, the critical values of $T^{\diamond}$ are tabulated in Table 2 below.

To describe the properties of the tests based on $T_{n}^{*},T_{n}^{\diamond}$ under the alternative (where we could have several change-points), assume that for some $\epsilon<b_{1}^{*}<b_{2}^{*}<...<b_{M}^{*}<1-\epsilon$ we have

[TABLE]

where we defined $b_{0}^{*}=0,b_{M+1}^{*}=1$ and $\mu_{0}\neq\mu_{1}\neq...\neq\mu_{M}$ denote vectors in $\mathbb{R}^{p}$ .

Theorem 3.7.

Let Assumption 3.1 hold and assume $\epsilon<1/4$ . Additionally, assume that in the setting given above we have $\inf_{k}|b_{k}^{*}-b_{k+1}^{*}|\geq\epsilon,\sup_{k}\sqrt{n}\|\mu_{k}^{*}-\mu_{k+1}^{*}\|_{2}/\|\Sigma\|_{F}^{1/2}\to\infty$ . Then $T_{n}^{*}\to\infty$ in probability and $T_{n}^{\diamond}\to\infty$ in probability.

3.3 Application to testing for changes in the covariance structure

In this subsection, we shall focus on testing for a change in the covariance matrix, which is an important problem in the analysis of multivariate data, and has applications in many areas, such as economics and finance. Aue et al., (2009) proposed a CUSUM-based test in the low dimensional time series setting and documented the early literature, which is mostly focused on the low dimension high sample size setting. In the high dimensional environment, the only work we are aware of is Avanesov and Buzun, (2018), which will be introduced and compared in our simulation studies; see Section S10 of the supplement. Following the latter paper, we assume $\mu_{t,n}=0,t=1,...,n$ . Define $Z_{0}=vech(X_{0}X_{0}^{T})$ as the half-vectorization of $X_{0}X_{0}^{T}$ , i.e. the vectorization of the lower triangular part (including the diagonal) of $X_{0}X_{0}^{T}$ . If $\mathbb{E}[X_{0}]=0$ then $\mathbb{E}(Z_{0})=vech(\Sigma_{X})$ . Tests for changes in $\Sigma_{X}$ can thus be constructed by applying the test statistics from the previous sections to the transformed observations $Z_{t}:=vech(X_{t}X_{t}^{T}),t=1,...,n$ . In what follows we provide a result that allows one to verify Assumption 3.1 for $Z_{0}$ from properties of $X_{0}$ .
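The transformation $Z_{t}=vech(X_{t}X_{t}^{T})$ is straightforward to compute for all $t$ at once. The following is a minimal sketch with a hypothetical function name `vech_outer`; it lists the lower-triangular entries in the row-major order returned by `np.tril_indices`, which differs from the column-stacking convention sometimes used for $vech$ . The ordering is immaterial for statistics that depend on the $Z_{t}$ only through inner products $Z_{i}^{T}Z_{j}$ .

```python
import numpy as np

def vech_outer(X):
    """Map each row X_t (length p) to the lower triangle of X_t X_t^T, diagonal included.

    Returns an array of shape (n, p * (p + 1) // 2).
    """
    p = X.shape[1]
    r, c = np.tril_indices(p)          # index pairs with r >= c
    return X[:, r] * X[:, c]           # entry (r, c) of X_t X_t^T is X_t[r] * X_t[c]
```

The transformed sample has dimension $p(p+1)/2$ , so even a moderate $p$ leads to a high-dimensional mean change problem for $Z_{t}$ .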

Proposition 3.8.

The vector $Z_{0}:=vech(X_{0}X_{0}^{T})$ satisfies Assumption 3.1 provided that the following conditions hold for $X_{0}$ with $\mathbb{E}[X_{0}]=0$ and $\Sigma_{n}:=\mathbb{E}[X_{0}X_{0}^{T}].$

B.1 $\|\Sigma_{n}\|_{1}=o(\|\Sigma_{n}\|_{F})$ .

B.2 $\max_{l_{1},l_{2}=1,...,p}\sum_{l_{3},l_{4}=1}^{p}|cum(X_{0,l_{1}},X_{0,l_{2}},X_{0,l_{3}},X_{0,l_{4}})|=o(\|\Sigma_{n}\|_{F}^{2})$ .

B.3 There exists a constant $C$ such that $\sum_{l_{1},...,l_{h}=1}^{p}cum^{2}(X_{0,l_{1}},...,X_{0,l_{h}})\leq C\|\Sigma_{n}\|_{F}^{h}$ for $h=2,...,12$ . Moreover

[TABLE]

Remark 3.9 (Discussion of Assumptions).

Similar to Remark 3.2, Assumptions B.1 - B.3 can be verified by considering the following conditions: (1) $p_{n}\to\infty$ ; (2) there exists $c_{0}>0$ independent of $n$ such that $\inf_{i=1,...,p_{n}}Var(X_{0,i})\geq c_{0}$ ; (3) there exists $c_{1}>0$ such that $Var(X_{0,i}X_{0,j})\geq c_{1}$ , $\forall 1\leq i\leq j\leq p$ ; (4) for $h=2,...,12$ there exist constants $C_{h}$ depending on $h$ only and a constant $r>2$ independent of $n,h,m_{1},...,m_{h}$ such that

[TABLE]

This can be easily satisfied if the entries of $(X_{0,1,n},...,X_{0,p_{n},n})$ are m-dependent and moments of order $12$ are uniformly bounded or under suitable conditions on short-range dependence; see Remark 3.2 for additional details. A proof of this statement is given in Section S8.5.

Remark 3.10.

As pointed out by a referee, we vectorize the covariance matrix and apply the mean change point test, which may not be efficient, since we ignore certain structures of covariance matrices such as symmetry and positive definiteness. In the two sample testing context, Li and Chen, (2012) proposed a novel test for the equality of two high-dimensional covariance matrices by using a U-statistic for the scalar parameter $tr\{(\Sigma_{1}-\Sigma_{2})^{2}\}$ , where $\Sigma_{j}$ denotes the covariance matrix for the $j$ th population, $j=1,2$ . The test by Li and Chen, (2012) can be naturally viewed as an extension of Chen and Qin, (2010) from mean testing to covariance matrix testing. Given this connection, it is indeed possible to build on Li and Chen, (2012) to propose an SN-based test for a change-point in the covariance matrix, following the developments presented in Section 2.1. However, the associated theory seems fairly complex and we shall leave it for future investigation.

4 Test statistics for high-dimensional time series

In this section, we assume that $\{Y_{t}\}_{t=1}^{n}$ is a realization of an $\mathbb{R}^{p}$ -valued time series with weak temporal dependence. To extend the U-statistic based approach from high-dimensional independent data to weakly dependent high-dimensional time series, we formulate a trimmed version of the $U$ -statistic that excludes pairs of points that are close in time. Trimming is crucial in the high-dimensional context to remove the bias caused by weak temporal dependence and is common for the use of U-statistics in the time series setting. It is also routinely applied in fixed dimensions; see Lee, (1990). To confirm the need for trimming, we implemented the untrimmed test statistic $T_{n}$ for the VAR $(1)$ model in Example 6.1 for both $n=p=100$ and $n=p=200$ with $\rho=-0.5,0.5,0.7$ , and the empirical sizes are uniformly zero for all cases (results based on 2000 replications). This is due to the fact that the temporal dependence incurs a non-negligible bias for the denominator $D(k;1,n)$ (and more generally $D(k;l,m)$ ) as under the null and for stationary time series, $E\{D(k;l,m)\}$ is a linear combination of the auto-covariance based terms $E\{(Y_{0}-\mu)^{T}(Y_{h}-\mu)\}$ , $h=1,2,...$ , which vanish under the i.i.d. assumption. As an alternative approach, Li et al., (2019) proposed to estimate the bias explicitly, and we shall compare the two approaches in terms of estimation accuracy in Section S11.2 of the supplement.

Motivated by the discussion above, we modify the statistic $D$ in equation (2.1) by removing all terms of the form $Y_{i}^{T}Y_{j}$ for which $|i-j|\leq\tau$ . This considerably reduces the bias which is introduced by weak temporal dependence of the $Y_{i}$ . The resulting trimmed statistic is of the form

[TABLE]

where $\tau$ is a given positive integer such that $l+\tau+1\leq k\leq m-2\tau-2$ . It is clear that when $\tau=0$ , $D(k;l,m|0)=D(k;l,m)$ , where $D(k;l,m)$ is defined in Equation (2.1). Furthermore, we let

[TABLE]

where $l+\tau+1\leq k-2\tau-2$ and $k+\tau+2\leq m-2\tau-2$ . The self-normalized statistic is then defined as

[TABLE]
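The effect of trimming on a single building block can be illustrated as follows. This sketch does not reproduce the trimmed statistic $D(k;l,m|\tau)$ itself, whose display is not available in this text. It only shows the trimming rule described above, applied to the plain sum of cross-products $Y_{i}^{T}Y_{j}$ . The function names and the AR(1) panel used to generate data are hypothetical choices made for illustration.

```python
import numpy as np

def trimmed_pair_sum(Y, tau):
    """Sum of Y_i^T Y_j over ordered pairs (i, j) with |i - j| > tau."""
    n = Y.shape[0]
    G = Y @ Y.T
    idx = np.arange(n)
    keep = np.abs(idx[:, None] - idx[None, :]) > tau   # tau = 0 drops only the diagonal
    return float(G[keep].sum())

def ar1_panel(n, p, rho, rng, burn=100):
    """p independent AR(1) series of length n with coefficient rho (illustrative model)."""
    eps = rng.standard_normal((n + burn, p))
    X = np.zeros((n + burn, p))
    for t in range(1, n + burn):
        X[t] = rho * X[t - 1] + eps[t]
    return X[burn:]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for tau in (0, 5, 20):
        vals = [trimmed_pair_sum(ar1_panel(100, 50, 0.5, rng), tau) for _ in range(200)]
        print(tau, np.mean(vals))
```

For a zero-mean stationary series, the expectation of the untrimmed sum is a combination of autocovariances at all nonzero lags, whereas after trimming only lags larger than $\tau$ contribute. Under geometric decay of the autocovariances, the remaining bias therefore shrinks quickly as $\tau$ grows.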

In the theoretical developments that follow, we assume $\tau=\lfloor\eta n\rfloor,\eta\in(0,1)$ and fix $\eta$ in our asymptotic framework, in other words we consider fixed- $\eta$ asymptotics [this type of approach is termed fixed- $b$ asymptotics in Kiefer and Vogelsang, (2005)]. This is motivated by preliminary simulations, where we found that the limiting null distribution derived under the small- $\eta$ asymptotics (i.e., $\eta\rightarrow 0$ as $n\rightarrow\infty$ ) provides a poor approximation to the finite sample distribution under the null, especially when $\eta$ is not very small, which is required when the temporal dependence is moderate or strong. Explicitly taking into account the effect of trimming through fixed- $\eta$ asymptotics results in a much more accurate size as seen in our simulations. Note that fixed- $b$ asymptotics and self-normalization are quite related in many ways and for some problems, self-normalization is a special case of fixed- $b$ asymptotics; see Shao, (2010) and Shao, (2015) for more discussions about the connection and difference.

Compared to the analysis in Section 3, the present setting involves two major challenges. First, adopting the fixed- $\eta$ framework results in a more complex statistic and the simple representation of the process $D$ without trimming (see equation (S8.2)) does not hold anymore. A somewhat more involved representation needs to be derived instead (see the first two pages in Section S9.2 and in particular equation (S9.2) therein). Second, each of the four U-processes in the new decomposition is now based on dependent rather than independent data and involves additional weighting. This considerably complicates their asymptotic analysis.

To overcome the technical difficulties described above, we will limit our attention to linear processes. In particular, we assume $Y_{t}=\mu_{t}+X_{t}$ , $t=1,...,n$ , where $X_{t}=\sum_{j=0}^{\infty}c_{j}\epsilon_{t-j}$ , $\{\epsilon_{t}\}$ are i.i.d. $p$ -dimensional innovations with mean [math] and $c_{j}$ are $p\times p$ coefficient matrices. Let

[TABLE]

be the corresponding long run variance matrix. The linear processes framework is quite general and it includes the well-known ARMA models. From a technical point of view, we are able to take advantage of the Beveridge-Nelson (BN) decomposition [Phillips and Solo, (1992)], which can be shown to work in the high-dimensional setting.
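As a concrete instance of the linear process framework, a VAR(1) model $X_{t}=AX_{t-1}+\epsilon_{t}$ has coefficient matrices $c_{j}=A^{j}$ . The sketch below is illustrative only. Because the display defining $\Gamma$ is not reproduced in this text, it assumes the standard long run variance $\Gamma=(\sum_{j}c_{j})\,Cov(\epsilon_{0})\,(\sum_{j}c_{j})^{T}$ , and all function names are hypothetical.

```python
import numpy as np

def ma_coefficients_var1(A, J):
    """Coefficient matrices c_j = A^j, j = 0, ..., J, of the VAR(1) linear representation."""
    coefs = [np.eye(A.shape[0])]
    for _ in range(J):
        coefs.append(coefs[-1] @ A)
    return coefs

def long_run_variance(coefs, Sigma_eps):
    """Assumed form: (sum_j c_j) Sigma_eps (sum_j c_j)^T, truncated at len(coefs) terms."""
    C1 = sum(coefs)
    return C1 @ Sigma_eps @ C1.T

def simulate_var1(n, A, rng, burn=200):
    """Simulate n observations of X_t = A X_{t-1} + eps_t with standard normal innovations."""
    p = A.shape[0]
    eps = rng.standard_normal((n + burn, p))
    X = np.zeros((n + burn, p))
    for t in range(1, n + burn):
        X[t] = A @ X[t - 1] + eps[t]
    return X[burn:]
```

When the spectral radius of $A$ is below one, the truncated sum converges to $(I-A)^{-1}$ , so the long run variance under this assumption has the closed form $(I-A)^{-1}Cov(\epsilon_{0})(I-A)^{-T}$ .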

The following assumptions are imposed to study the asymptotic distribution of $T_{n}$ .

Assumption 4.1.

Suppose the following assumptions hold.

C.1 $\sup_{l=1,...,p}\|\epsilon_{0,l}\|_{8}<\infty$ .

C.2 For any $m\geq 0$ ,

[TABLE]

where $C>0$ and $0<\rho<1$ are some constants.

C.3 $tr(\Gamma^{4})=o(\|\Gamma\|_{F}^{4})$ .

C.4 $p^{6}\rho^{\lfloor\eta n\rfloor}/\|\Gamma\|_{F}^{6}=O(1)$ .

C.5 For any $h=2,3,4,5,6$ , $\sum_{k_{1},...,k_{h}=1}^{p}|cum(\epsilon_{0,k_{1}},\cdots,\epsilon_{0,k_{h}})|\leq C^{\prime}\|\Gamma\|_{F}^{h}$ , where $C^{\prime}$ is some constant independent of $n,p$ .

Remark 4.2.

Assumptions C.1 and C.2 imply the Uniform Geometric Moment Contraction (UGMC( $8$ )) property in Wang and Shao, (2020). The UGMC condition is a generalization of Geometric Moment Contraction in Hsing and Wu, (2004) and Wu and Shao, (2004) to the high-dimensional setting and its equivalent form has been used in Zhang and Cheng, (2018). Assumption C.3 is commonly assumed for covariance matrix [e.g., Chen and Qin, (2010)] and it can be satisfied under some weak cross-sectional and temporal dependence conditions. Assumption C.4 implies that the bias caused by temporal dependence is asymptotically negligible. Assumption C.5 holds under mild conditions, see Section 3 in Wang and Shao, (2020) for some verified examples.

Remark 4.3.

Recently, Wang and Shao, (2020) proposed a new way of doing self-normalization for inference of high-dimensional time series. They dealt with one sample testing problem, and also used the trimming technique in their U-statistic. Their asymptotic theory was developed for a broad class of nonlinear causal processes using martingale approximation. To develop our asymptotic theory for nonlinear processes would be desirable but seems very challenging as we are dealing with a two-sample testing problem with unknown break date, and the process convergence theory we develop seems considerably more involved.

Now we are ready to state the asymptotic null distribution of $T_{n}$ .

Theorem 4.4.

Suppose Assumption 4.1 is true. Then,

[TABLE]

where

[TABLE]

For $u,v=1,2,3,4,$

[TABLE]

and $V_{1},V_{2},V_{3},V_{4}$ are Gaussian processes with covariance structures

[TABLE]

where $C_{u,v}(a,b)$ is defined as

[TABLE]

with $w_{i,j}^{u}={\bf{1}}_{\{u=1\}}+\frac{j}{n}{\bf{1}}_{\{u=2\}}+\frac{i+\lfloor\eta n\rfloor+1}{n}{\bf{1}}_{\{u=3\}}+\frac{i+\lfloor\eta n\rfloor+1}{n}\frac{j}{n}{\bf{1}}_{\{u=4\}}$ .

The limiting distribution $T(\eta)$ derived above is considerably more complicated than in the independent case but still pivotal for given $\eta$ . This is because the cross-covariance of the centered processes $V_{1},...,V_{4}$ depends only on $\eta$ and not on any unknown quantities. In other words, our test involves only one trimming parameter, whose impact is captured to the first order by the limiting null distribution. Simulated quantiles of $T(\eta)$ are tabulated in Table 3.

Remark 4.5.

The main reason for the rather involved structure of $T(\eta)$ above is the effect of the trimming parameter $\eta$ . Indeed, if $\eta=0$ ,

[TABLE]

which is identical to $G(r;a,b)$ in Theorem 3.4.

Next we present the asymptotic distribution under some local alternatives.

Theorem 4.6.

Suppose Assumption 4.1 holds. Assume that there exists $\phi\in(3\eta,1-3\eta)$ such that $\mu_{t}=\mu^{*}$ for $t=1,2,...,\lfloor\phi n\rfloor$ and $\mu_{t}=\mu^{*}+\delta_{n}$ for $t=\lfloor\phi n\rfloor+1,...,n$ . Then,

1. If $n^{1/2}\|\delta_{n}\|_{2}/\|\Gamma\|_{F}^{1/2}\rightarrow\infty$ , then $T_{n}\rightarrow\infty$ in probability.

2. If $n^{1/2}\|\delta_{n}\|_{2}/\|\Gamma\|_{F}^{1/2}\rightarrow 0$ , then $T_{n}\rightarrow T$ .

3. If $n^{1/2}\|\delta_{n}\|_{2}/\|\Gamma\|_{F}^{1/2}\rightarrow c\in(0,\infty)$ , then

[TABLE]

where $\widetilde{G}(r;a,b|\eta,\phi):=\sqrt{2}G(r;a,b|\eta)+c\Delta(r;a,b|\eta,\phi),$ and $\Delta(r;a,b|\eta,\phi)$ is defined similarly to $G(r;a,b|\eta)$ but with $\triangledown_{u}(\cdot,\cdot|\eta,\phi)$ , $\square_{u}(\cdot,\cdot;\cdot,\cdot)$ replacing all instances of $V_{u}(\cdot,\cdot|\eta),U_{u}(\cdot,\cdot;\cdot,\cdot)$ , where we define

[TABLE]

and

[TABLE]

Remark 4.7.

If $\eta=0$ , we have

[TABLE]

It can be easily seen that $\triangledown_{1}(a,b|0,\phi)=(b-(\phi\vee a))^{2}{\bf{1}}_{\{\phi<b\}}$ . Then, some algebra shows that

[TABLE]

Thus, we have that $\Delta(r;a,b|0,\phi)$ is equal to $\Delta(r,a,b)$ with $b^{*}=\phi$ defined in Theorem 3.5.

Remark 4.8.

It is quite straightforward to mimic the test we develop for the unsupervised case in the setting of high-dimensional independent data, and develop an SN-based test for multiple change points alternative in the high-dimensional time series setting. Details are omitted for the sake of brevity.

5 Wild binary segmentation and multiple change-point estimation

In practice, an important problem is to estimate the number and location of change points. A classical testing-based method is binary segmentation: run a test over the full sample, and if the test rejects the null, then split the sample into two segments (with the location of first change point estimated by the $k$ where the maximum is achieved in the test statistic), and then continue to test for change points for each segment. The algorithm stops when there is no rejection for each segment. A problem with binary segmentation is that it does not work well when there are multiple change points with changes exhibiting a non-monotonic pattern; see our simulation results. To overcome this drawback, Fryzlewicz, (2014) proposed a new approach called Wild Binary Segmentation (WBS, hereafter). The main idea of WBS is to calculate the CUSUM statistic for many random sub-intervals to allow at least one of them to be localized around a change point (with high probability), so this change point can be identified. It overcomes the weakness of binary segmentation, where the CUSUM statistic computed on the full sample is unsuitable for certain configurations of multiple change-points. It seems natural to combine the WBS with our SN-based test statistic and see whether we can estimate the number and location of change points accurately.
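The recursion behind WBS can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure used in the paper: `cusum_stat`, `draw_intervals` and `wbs` are hypothetical names, the data are univariate, and a plain CUSUM statistic stands in for the SN-based statistic so that the example stays self-contained. Draws shorter than `min_len` are simply dropped.

```python
import numpy as np

def cusum_stat(x, s, e):
    """Stand-in statistic (NOT the SN statistic): max |CUSUM| on x[s..e] and its location."""
    seg = np.asarray(x[s:e + 1], dtype=float)
    m = len(seg)
    if m < 2:
        return 0.0, s
    k = np.arange(1, m)                      # number of observations left of the split
    left = np.cumsum(seg)[:-1]               # partial sums of the left part
    right = seg.sum() - left                 # sums of the right part
    vals = np.abs(np.sqrt((m - k) / (m * k)) * left
                  - np.sqrt(k / (m * (m - k))) * right)
    j = int(np.argmax(vals))
    return float(vals[j]), s + j             # s + j = last index of the left segment

def draw_intervals(n, M, min_len, rng):
    """Draw M index pairs uniformly with replacement; keep pairs with e - s >= min_len."""
    pairs = np.sort(rng.integers(0, n, size=(M, 2)), axis=1)
    return [(int(a), int(b)) for a, b in pairs if b - a >= min_len]

def wbs(x, s, e, intervals, threshold, stat=cusum_stat):
    """Return sorted change-point estimates (last index before each change) in [s, e]."""
    if e - s < 1:
        return []
    inside = [(a, b) for a, b in intervals if s <= a and b <= e]
    # Largest statistic over the random sub-intervals and the segment itself.
    val, loc = max(stat(x, a, b) for a, b in inside + [(s, e)])
    if val <= threshold:
        return []
    return (wbs(x, s, loc, intervals, threshold, stat) + [loc]
            + wbs(x, loc + 1, e, intervals, threshold, stat))

# Non-monotone mean pattern: 0, then 2, then 0, on three blocks of length 50.
x = np.concatenate([np.zeros(50), 2.0 * np.ones(50), np.zeros(50)])
intervals = draw_intervals(len(x), 100, 10, np.random.default_rng(0))
print(wbs(x, 0, len(x) - 1, intervals, threshold=1.0))
```

The statistic is passed in as an argument, so any interval statistic returning a value and a location could be substituted without changing the recursion.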

We begin by introducing some additional notation. For arbitrary integers $4\leq s+3\leq e-4\leq n-4$ define

[TABLE]

where $D(b;\ell,m)$ was defined in (2.1) and

[TABLE]

Note that $Q(s,e)$ is simply the statistic $T_{n}$ from (2.3) computed pretending that the available sample consists of $X_{s},...,X_{e}$ .

Now WBS-SN is applied as follows. Denote by $F_{n}^{M}$ a set of $M$ pairs of integers $(s_{m},e_{m})$ which satisfy $1\leq s_{m}<e_{m}\leq n$ and $e_{m}-s_{m}\geq L_{0}$ with numbers $s_{m},e_{m}$ drawn uniformly from the set $\{1,...,n\}$ (independently with replacement) and $L_{0}$ denoting a minimal interval length. Given this sample, apply Algorithm 1 with initialization WBS-SN $(1,n,\xi_{n},L_{0},F_{n}^{M})$ . Here, the threshold parameter $\xi_{n}$ is determined by simulations as follows: generate $R$ samples of i.i.d multivariate normal random variables with constant mean zero and identity covariance matrix, with the same $n$ and $p$ as $Y_{1},...,Y_{n}$ . For the $i^{th}$ sample, calculate

[TABLE]

Given the $R$ values $\hat{\xi}^{i}_{n},i=1,...,R$ above pick $\xi_{n}$ as the $95\%$ quantile of the values $\hat{\xi}^{1}_{n},...,\hat{\xi}^{R}_{n}$ . Since the SN test statistic is asymptotically pivotal, this threshold is expected to well approximate the 95% quantile of the finite sample distribution of the maximum SN test statistic on the $M$ random intervals under the null. The detailed algorithm is presented below.
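The simulation-based choice of $\xi_{n}$ described above can be sketched as follows. This is a minimal sketch under stated assumptions: `draw_intervals`, `calibrate_threshold` and `toy_stat` are hypothetical names, and `toy_stat` is only a placeholder for $Q(s,e)$ (any function mapping a sample and an interval to a real number can be passed instead). Intervals use the 1-based convention $1\leq s<e\leq n$.

```python
import numpy as np

def draw_intervals(n, M, L0, rng):
    """Collect M pairs (s, e) with 1 <= s < e <= n and e - s >= L0."""
    if L0 > n - 1:
        raise ValueError("L0 is too large for the sample size")
    out = []
    while len(out) < M:
        s, e = np.sort(rng.integers(1, n + 1, size=2))   # uniform on {1, ..., n}
        if e - s >= L0:
            out.append((int(s), int(e)))
    return out

def toy_stat(Z, s, e):
    """Placeholder for Q(s, e): squared distance between the two half-sample means."""
    seg = Z[s - 1:e]
    h = len(seg) // 2
    return float(np.sum((seg[:h].mean(axis=0) - seg[h:].mean(axis=0)) ** 2))

def calibrate_threshold(stat, n, p, intervals, R=200, level=0.95, seed=0):
    """Quantile of the maximum of `stat` over `intervals` under i.i.d. N(0, I_p) data."""
    rng = np.random.default_rng(seed)
    maxima = np.empty(R)
    for i in range(R):
        Z = rng.standard_normal((n, p))                  # null sample, no change point
        maxima[i] = max(stat(Z, s, e) for s, e in intervals)
    return float(np.quantile(maxima, level))

intervals = draw_intervals(n=50, M=20, L0=8, rng=np.random.default_rng(1))
xi_n = calibrate_threshold(toy_stat, n=50, p=3, intervals=intervals, R=50)
```

The same set of intervals is reused for every simulated sample, mirroring the description above in which the threshold is tied to the $M$ random intervals actually used.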

The same approach can be applied to multiple change point detection for high-dimensional time series, with an incorporation of a trimming parameter in our SN-based test statistic. To obtain the threshold $\xi_{n}$ , we can apply the same $M$ random intervals and the trimmed SN-based test statistic with the same trimming parameter $\eta$ to i.i.d standard normal distributed data with the same $(n,p)$ , as done for the independent data case. Similarly, we also adopt a bound $L_{0}$ for the minimal length of random intervals which now depends on $\eta,n$ . Some investigations of the sensitivity with respect to the choice of $L_{0}$ and some practical recommendations are provided in the simulation section in the supplement.

6 Numerical Results

In this section, we examine the finite sample performance of our proposed tests and estimation methods via simulations. In Section 6.1, we present the size and power for our SN-based test in comparison with Kolmogorov-Smirnov type test for a single change point in high-dimensional independent data and also examine the behavior of the test developed for the unsupervised case. In Section 6.2, we show the size and power for the test for a single change point in the mean of high-dimensional time series. Section S11.1 and Section S11.2 in the supplement contain the WBS-based estimation result in comparison with some existing methods for independent and dependent data, respectively.

6.1 Testing for high-dimensional independent data

In this subsection we investigate the finite sample behavior of our test statistic for a mean shift. We shall first focus on the supervised case, i.e., under the alternative that there is one change point in the mean. Consider the data generating process

[TABLE]

where ${\delta}$ is a $p$ -dimensional vector representing the mean shift, and $\{\epsilon_{t}\}$ are i.i.d samples from a multivariate normal distribution with common mean ${0}$ and covariance matrix $\Sigma$ . The null hypothesis of no change point corresponds to ${\delta}={0}$ , whereas under the alternative (there is one change point), we let ${\delta}=\kappa(1,1,...,1)^{T}$ with $\kappa\in\{0.1,0.2\}$ . For $\Sigma$ , we consider four scenarios:

a) Independent. $\Sigma=I_{p}$ (i.e., identity matrix).

b) AR(1)-type correlation. The $(i,j)$ element in $\Sigma$ is $\sigma_{ij}=0.5^{|i-j|}$ .

c) Banded. Specifically, the main diagonal elements are all 1. The first off-diagonal elements are all 0.5 and the second off-diagonal elements are all 0.25. All other elements are zero.

d) Compound Symmetric. The main diagonal elements are all 1 and all remaining elements are 0.5.
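For concreteness, the four covariance scenarios can be constructed as in the following sketch. The function name `make_sigma` and the string labels are hypothetical and only serve the illustration.

```python
import numpy as np

def make_sigma(kind, p):
    """Covariance matrix for one of the four scenarios a)-d)."""
    idx = np.arange(p)
    lag = np.abs(idx[:, None] - idx[None, :])      # |i - j|
    if kind == "independent":                      # a) identity
        return np.eye(p)
    if kind == "ar1":                              # b) sigma_ij = 0.5^{|i-j|}
        return 0.5 ** lag
    if kind == "banded":                           # c) 1, 0.5, 0.25 on the first three bands
        return np.where(lag == 0, 1.0,
                        np.where(lag == 1, 0.5,
                                 np.where(lag == 2, 0.25, 0.0)))
    if kind == "cs":                               # d) compound symmetric
        return np.where(lag == 0, 1.0, 0.5)
    raise ValueError(f"unknown scenario: {kind}")

sigmas = {k: make_sigma(k, 100) for k in ("independent", "ar1", "banded", "cs")}
```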

We also tried non-Gaussian errors $\epsilon_{t}=\Sigma^{1/2}\widetilde{\epsilon}_{t}$ , where $\widetilde{\epsilon}_{t}$ has i.i.d components following a scaled $t(3)$ distribution with mean zero and variance one. We let $p\in\{100,200,500\}$ and $n\in\{100,200,500\}$ .
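A sketch of how such errors could be generated is given below. Since a $t(3)$ variable has variance $3/(3-2)=3$ , dividing by $\sqrt{3}$ gives unit variance. The names `sym_sqrt` and `scaled_t3_errors` are hypothetical, and the symmetric square root is an assumption about how $\Sigma^{1/2}$ is computed; any root $R$ with $RR^{T}=\Sigma$ yields the same covariance.

```python
import numpy as np

def sym_sqrt(Sigma):
    """Symmetric square root of a positive semi-definite matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(Sigma)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T

def scaled_t3_errors(n, Sigma, rng):
    """n rows of Sigma^{1/2} z, where z has i.i.d. t(3) entries rescaled to variance one."""
    p = Sigma.shape[0]
    z = rng.standard_t(3, size=(n, p)) / np.sqrt(3.0)    # Var of t(3) equals 3
    return z @ sym_sqrt(Sigma)                           # row t is (Sigma^{1/2} z_t)^T

idx = np.arange(5)
Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])       # AR(1)-type scenario
eps = scaled_t3_errors(200, Sigma, np.random.default_rng(0))
```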

We shall formulate an extension of the classical Kolmogorov-Smirnov (KS) test statistic in the current context and compare with SN-based test via simulations. Let $\widehat{k}=\mbox{argmax}_{k=2,...,n-2}D(k;1,n)^{2}$ which is an estimate of change point location without self-normalization. We can then define an estimator of $\|\Sigma\|_{F}^{2}$ using the Jackknife-based approach as presented on page 814 of Chen and Qin, (2010) in two ways. On one hand, we can obtain a pre-break estimate and a post-break estimate of $\|\Sigma\|_{F}^{2}$ and then take the average of them, i.e.,

[TABLE]

where $\bar{X}_{(j_{1},j_{2});a:b}$ denotes the average of the sample $X_{a},...,X_{b}$ without $X_{j_{1}}$ and $X_{j_{2}}$ . On the other hand, we can form a demeaned sample by subtracting $\bar{X}_{1:\widehat{k}}$ from $(X_{1},...,X_{\widehat{k}})$ and $\bar{X}_{(\widehat{k}+1):n}$ from $(X_{\widehat{k}+1},...,X_{n})$ , and then apply the jackknife-based estimator to the full demeaned sample; we denote the resulting estimator by $\widehat{\|\Sigma\|_{F,2}^{2}}$ . Then we can define the following two statistics

[TABLE]

To facilitate the comparison, we also introduce an infeasible version,

[TABLE]

The limiting null distributions of the above three statistics are expected to be $\sup_{r\in[0,1]}|G(r;0,1)|^{2}$ , the critical values of which can be obtained by simulations. It is worth noting that the limiting null for the infeasible test statistic can be easily derived from our Theorem 3.4.

Below we compare five tests, $T_{n}$ , $KS_{n,1}$ , $KS_{n,2}$ , $KS_{n,Inf}$ and $EH$ , based on 5000 Monte Carlo replications with the nominal level $0.05$ . Here $EH$ refers to the adaptive change point test developed by Enikeeva and Harchaoui, (2019), which requires Gaussian and independent components assumptions. Table 4 below shows the rejection rate in percentage under $\mathcal{H}_{0}:\kappa=0$ , $\mathcal{H}_{1,1}:\kappa=0.1$ and $\mathcal{H}_{1,2}:\kappa=0.2$ for Gaussian errors and Table 5 is for the non-Gaussian case.

Please insert Table 4 here!

Please insert Table 5 here!

The above simulation results demonstrate that when the error is Gaussian, the SN-based test has accurate size for the independent, AR(1) and banded correlation models, whereas the test appears quite distorted in the compound symmetric case. This finding is not surprising as the compound symmetric case violates the theoretical assumptions imposed (see Assumption 3.1), whereas the independent, AR(1) and banded cases satisfy those assumptions. In a sense, this shows that our (weak componentwise dependence) assumptions are to a certain extent necessary. The KS tests (both infeasible and feasible ones) show similar size behavior except that they are noticeably undersized for the $n=100$ case, and their size distortion in the compound symmetric case is even greater than that of our test. The test by Enikeeva and Harchaoui, (2019) exhibits size distortion in all cases (undersized for the independent case, and oversized for the AR(1) and banded correlation models) and its size for the compound symmetric case is far too high. When the error is non-Gaussian, our SN-based test and all KS tests appear to have rejection rates similar to those in the Gaussian case, indicating the robustness of our SN-based test with respect to heavy-tailed errors. By contrast, the size for EH in the non-Gaussian case is very high, implying the sensitivity/non-robustness of their test with respect to non-Gaussianity.
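One way to see why the compound symmetric design behaves differently is to look at the ratio $\|\Sigma\|_{2}/\|\Sigma\|_{F}$ , which must vanish under the condition $\|\Sigma_{n}\|_{2}=o(\|\Sigma_{n}\|_{F})$ discussed in Remark 3.2 and Section S8.5. The following sketch (with hypothetical helper names `ar1_cov`, `cs_cov` and `norm_ratio`) evaluates this ratio for growing $p$ ; it is a numerical illustration, not a proof.

```python
import numpy as np

def ar1_cov(p, rho=0.5):
    """AR(1)-type covariance with entries rho^{|i-j|}."""
    idx = np.arange(p)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def cs_cov(p, rho=0.5):
    """Compound symmetric covariance: ones on the diagonal, rho elsewhere."""
    return np.full((p, p), rho) + (1.0 - rho) * np.eye(p)

def norm_ratio(Sigma):
    """Spectral norm divided by Frobenius norm."""
    return np.linalg.norm(Sigma, 2) / np.linalg.norm(Sigma, "fro")

for p in (100, 200, 400):
    print(p, round(norm_ratio(ar1_cov(p)), 3), round(norm_ratio(cs_cov(p)), 3))
```

For the compound symmetric matrix with off-diagonal value $0.5$ the ratio equals $(p+1)/\sqrt{p(p+3)}$ , which tends to one rather than zero, whereas for the AR(1)-type matrix the spectral norm stays bounded while the Frobenius norm grows like $\sqrt{p}$ .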

A comparison of the powers for the SN-based and KS tests shows that our test is very comparable to all three KS tests, which perform similarly. Overall, the finite sample size and power performance of the four tests (SN and three KS tests) are very much comparable, with no single test dominating the others. Note that the feasible KS tests assume there is one change point, and they may perform very poorly when there is more than one change point (results not shown). Methodologically, it seems desirable to develop a test that does not involve explicit estimation of change points, which is itself a difficult problem, especially when there are multiple change points. The power of EH is hard to interpret given its distorted size, and we shall not look into the size-adjusted power as we would not recommend the EH test for non-Gaussian and cross-sectionally dependent high-dimensional independent data.

We further examine the finite sample performance of the test we develop for the unsupervised case (i.e., there could be multiple change points under the alternative), in comparison with the SN-based test aimed for one change point only. Three different data generating processes are considered below:

( $\mathcal{H}_{1,1}$ ) (one change-point alternative): $\mu_{t}={\delta}{\bf{1}}_{t/n>1/2}$ ;

( $\mathcal{H}_{1,2}$ ) (two change-point alternative): $\mu_{t}={\delta}{\bf{1}}_{t/n>1/3}-{\delta}{\bf{1}}_{t/n>2/3}$ ;

( $\mathcal{H}_{1,3}$ ) (three change-point alternative): $\mu_{t}={\delta}{\bf{1}}_{t/n>1/4}-{\delta}{\bf{1}}_{3/4\geq t/n>1/2}$ ;
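The three mean functions can be written down directly, as in the sketch below. The names `mean_weights` and `mean_path` are hypothetical; the indicators are evaluated with integer arithmetic (for instance $t/n>1/3$ as $3t>n$ ) to avoid floating-point ties at the boundaries.

```python
import numpy as np

def mean_weights(n, alt):
    """Scalar multiplier of delta at t = 1, ..., n under the given alternative."""
    t = np.arange(1, n + 1)
    if alt == "H11":      # one change point at 1/2
        return (2 * t > n).astype(float)
    if alt == "H12":      # two change points at 1/3 and 2/3
        return (3 * t > n).astype(float) - (3 * t > 2 * n).astype(float)
    if alt == "H13":      # three change points at 1/4, 1/2 and 3/4
        return ((4 * t > n).astype(float)
                - ((4 * t > 2 * n) & (4 * t <= 3 * n)).astype(float))
    raise ValueError(f"unknown alternative: {alt}")

def mean_path(n, delta, alt):
    """n-by-p array whose t-th row is mu_t."""
    return np.outer(mean_weights(n, alt), np.asarray(delta, dtype=float))

mu = mean_path(12, [0.2, 0.2, 0.2], "H13")
```

Under all three alternatives the mean alternates between $0$ and ${\delta}$ , so the changes under $\mathcal{H}_{1,2}$ and $\mathcal{H}_{1,3}$ are non-monotone.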

Under the null hypothesis, ${\delta}=0$ , whereas under the alternative we let ${\delta}=(0.2,0.2,...,0.2)^{T}$ . Following the practice in Zhang and Lavitas, (2018), we set $\epsilon=0.1$ . The empirical rejection rates (in percentage) are summarized in Table 6 below for several combinations of $(n,p)$ , where we denote the statistic developed for the supervised case as $T_{n}$ and for the unsupervised case as $T_{n}^{\diamond}$ .

Please insert Table 6 here!

From Table 6, we can observe that $T_{n}$ has empirical rejection rates close to $5\%$ under the null for all cases except the compound symmetric case, whereas $T_{n}^{\diamond}$ exhibits quite a bit of distortion when $n=100,200$ and its size appears accurate for $n=500$ in the independent, AR(1) and banded cases. When the error has compound symmetric covariance, the size distortion for $T_{n}^{\diamond}$ is considerably higher than that for $T_{n}$ , showing the difficulty brought by the strong componentwise dependence. Under the alternative, we can see that the supervised test statistic has much higher power in the single change point case, but its power drops drastically when there are two or three change points, suggesting the inability of the supervised test that targets one change point to accommodate more than one. By contrast, the unsupervised test still preserves a reasonable amount of power, which is consistent with our theory. The results for the non-Gaussian case are qualitatively similar and so are not included here to conserve space.

6.2 Testing for high-dimensional time series

We consider the following single change point model.

Example 6.1.

Consider the following VAR(1) model,

[TABLE]

where $\{\epsilon_{t}\}$ are the temporally independent errors and we consider $\rho\in\{0.2,0.5,0.7,-0.5\}$ . Under the null hypothesis, ${\delta}=0$ . Under the alternative hypothesis, we examine the following two types of mean shift, i.e.,

(i)

Homogeneous alternative: ${\delta}^{T}=0.1(1,1,...,1).$

(ii)

Inhomogeneous alternative:

[TABLE]

where $(|\delta_{1}|,...,|\delta_{p}|)\overset{i.i.d}{\sim}Uniform(0,1)$ and the signs of $(\delta_{1},...,\delta_{p})$ are randomly sampled with equal probability.

Also, for the innovations $\{\epsilon_{t}\}$ , we consider the following two scenarios

(a)

Gaussian errors with AR(1) type covariance structure: $\epsilon_{t}\overset{i.i.d}{\sim}N(0,\Sigma_{\epsilon})$ , where $\Sigma_{\epsilon}=(0.5^{|i-j|})_{i,j=1}^{p}$ .

(b)

Non-Gaussian errors: $\{\epsilon_{t}\}_{t=1}^{n}$ are i.i.d and each entry of $\epsilon_{t}=(\epsilon_{t,1},...,\epsilon_{t,p})^{T}$ is generated independently from $Uniform[-\sqrt{3},\sqrt{3}]$ .
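The two innovation scenarios, together with a first-order autoregressive recursion, can be sketched as follows. This sketch rests on an assumption that should be checked against the model display above: the recursion is taken to be $X_{t}=\rho X_{t-1}+\epsilon_{t}$ with a scalar coefficient $\rho$ applied to every coordinate. The names `innovations` and `simulate_var1` and the burn-in length are hypothetical, and no mean shift is added here.

```python
import numpy as np

def innovations(n, p, kind, rng):
    """Innovations for scenario (a) 'gaussian_ar1' or scenario (b) 'uniform'."""
    if kind == "gaussian_ar1":
        idx = np.arange(p)
        Sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])
        L = np.linalg.cholesky(Sigma)                    # L @ L.T equals Sigma
        return rng.standard_normal((n, p)) @ L.T
    if kind == "uniform":
        # Uniform[-sqrt(3), sqrt(3)] has mean zero and variance one.
        return rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), size=(n, p))
    raise ValueError(f"unknown scenario: {kind}")

def simulate_var1(eps, rho, burn=50):
    """Assumed recursion X_t = rho * X_{t-1} + eps_t; the first `burn` rows are dropped."""
    X = np.zeros_like(eps, dtype=float)
    X[0] = eps[0]
    for t in range(1, len(eps)):
        X[t] = rho * X[t - 1] + eps[t]
    return X[burn:]

rng = np.random.default_rng(0)
n, p = 200, 20
X = simulate_var1(innovations(n + 50, p, "gaussian_ar1", rng), rho=0.5)
```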

To illustrate the finite sample performance of our statistic $T_{n}$ , we compare with the methods described in Horváth and Hušková, (2012) (denoted as HH) and the double CUSUM binary segmentation algorithm (denoted as DCBS) [Cho, (2016)]. HH works for independent panel time series, targets dense alternatives and involves a bandwidth parameter $h$ , which is used in the kernel estimator of the long run variance. DCBS contains several tuning parameters and requires the use of the bootstrap. We shall implement DCBS using the R package “hdbinseg” and the default tuning parameter values. Note that the method proposed by Jirak, (2015) targets sparse alternatives in the mean of high-dimensional time series, so it is not included in our comparison.

Please insert Table 7 here!

As can be seen from Table 7, the size of our statistic $T_{n}$ can depend on the amount of trimming $\eta$ , magnitude and sign of temporal dependence $\rho$ , sample size $n$ and the dimension $p=2n$ . When the temporal dependence is weak, i.e., $\rho=0.2$ , the size is fairly accurate for both trimming levels ( $\eta=0.02$ and $0.05$ ) and all sample sizes ( $n=200,400,800$ ). As the temporal dependence gets stronger, especially for $\rho=0.7$ , we see some size distortion at small sample size $n=200$ for both trimming levels, but the size distortion is much reduced with larger sample size $n=400,800$ . The above comment applies to both Gaussian model (a) and non-Gaussian setting (b). By contrast, the rejection rates of DCBS are almost always equal to zero. This may be due to the default tuning parameters used in “hdbinseg”, which aim to make Type I error zero in large samples to be consistent with the consistency results stated in Cho, (2016). The HH method is apparently oversized in all settings, which is presumably due to the cross-sectional dependence. Therefore, the size results demonstrate that our limiting null distribution (under fixed- $\eta$ asymptotics) provides a decent approximation, and they show its practical usefulness in accommodating weak dependence across the panel and over time.

Please insert Table 8 here!

The power results are collected in Table 8. Our test exhibits quite reasonable power, which could depend on the choice of trimming parameter, whereas HH method’s raw power is high due to the (sometimes severe) oversize under the null and DCBS exhibits lower power, which is presumably due to the large critical values used to control Type I error (to make it zero in large sample).

7 Summary and Conclusion

In this paper, we propose a non-parametric methodology for testing and estimation of change-points in the mean of a sequence of high-dimensional data. Our methodological developments start with the relatively simple testing problem: testing for one change-point in the mean of high dimensional independent data, by marrying the self-normalization idea in Shao and Zhang, (2010) and U-statistic based approach of Chen and Qin, (2010) for high dimensional two-sample testing. Our test differs from most existing ones in the literature by targeting the dense alternative, allowing weak cross-sectional dependence, and imposing no particular rate constraints on the dimension $p$ as a function of sample size $n$ . It is worth noting that our test does not involve a tuning parameter and is based on critical values tabulated in the paper, which could be appealing for practitioners.

On the testing front, several extensions were pursued in the paper, including (1) change point testing in the presence of multiple change points in mean; (2) testing for a change-point in covariance matrix assuming zero mean; (3) change point testing for the mean of high-dimensional time series. In particular, the extension to high-dimensional time series is highly nontrivial and theoretically challenging. To attenuate the bias caused by weak temporal dependence, we introduce a trimmed U-statistic and adopt the fixed- $b$ asymptotic framework [Kiefer and Vogelsang, (2005)] to derive the limiting null distribution of the self-normalized test statistic, which appears to approximate the finite sample distribution well for a broad range of time series dependence, as demonstrated in the simulations. On the estimation front, we propose to combine the idea of wild binary segmentation [Fryzlewicz, (2014)] with our SN-based test to estimate the number and location of change points. Simulations show that our method can be more effective when the mean shift is dense as compared to the INSPECT algorithm [Wang and Samworth, (2018)] for high-dimensional independent data, and is at least comparable to the procedures used in Cho, (2016) and Li et al., (2019) for high-dimensional time series. On the theory front, we show the weak convergence of the sequential U-statistic based processes for both independent and dependent high-dimensional data, which can be of independent interest.

There are a number of topics that are worth investigating. Firstly, it would be interesting to extend the asymptotic theory for high-dimensional time series to a more general setting, such as nonlinear causal process [Wu, (2005)]; see Wang and Shao, (2020) for a recent extension of SN to high-dimensional time series under the framework of nonlinear causal process. Secondly, while we consider a shift in mean in this paper, it is also of great value to study change point detection for other high-dimensional parameters, such as the vector of marginal quantiles; see Shao and Zhang, (2010) for a more general framework but in a low-dimensional time series setting. Thirdly, selecting the trimming parameter $\eta$ for real applications can be nontrivial and it would be interesting to develop a data-driven procedure to be adaptive to the magnitude of temporal dependence. Lastly, there is no theory available for the WBS-SN method used here. It would be interesting to provide some theoretical justifications, as done in Fryzlewicz, (2014) in a much simpler setting, and this seems very challenging. Further research along some of these directions is well underway.

Acknowledgements: We would like to thank the three reviewers for their constructive comments, which led to substantial improvements. We are grateful to Dr. Farida Enikeeva for sending us the code used in Enikeeva and Harchaoui (2019). Shao’s research is partially supported by NSF-DMS 1807023 and NSF-DMS-2014018. Volgushev’s research is partially supported by a discovery grant from NSERC of Canada.

Supplement to “Inference for Change Points in High-Dimensional Data via Self-normalization”

The supplementary material contains all the proofs for theoretical results stated in the paper. Additional simulation results are also included.

S8 Proofs for high-dimensional independent data

Throughout this section, $X_{t}$ , $t=1,...,n$ are i.i.d. We begin by proving an intermediate technical result which provides the crucial ingredient for all subsequent developments. Define

[TABLE]

for any $1\leq k<m\leq n$ and let $\widetilde{S}_{n}(k,m)=0$ for $k\geq m$ or $k<1$ or $m>n$ .

Theorem S8.1.

Under Assumption 3.1 we have as $n\to\infty$

[TABLE]

where $Q$ is a centered Gaussian process with covariance structure given by (3.2). Moreover, the sample paths of $\frac{\sqrt{2}}{n\|\Sigma\|_{F}}\widetilde{S}_{n}(\lfloor an\rfloor+1,\lfloor bn\rfloor-1)$ are asymptotically uniformly equicontinuous in probability.

The proof of this Theorem is long and technical. We postpone it to Section S8.8.

Next we present some basic results that will be used throughout the following proofs. Define

[TABLE]

for $1\leq\ell\leq k<m\leq n$ and $D^{X}(k;\ell,m)=0$ otherwise. Observe that under the null of constant mean function we have $D^{X}\equiv D$ and we always have the representation

[TABLE]

Theorem S8.1 and uniform asymptotic equi-continuity of the sample paths of $S$ in probability together with some simple calculations yield

[TABLE]

in $\ell^{\infty}([0,1]^{3})$ where for $0\leq a<b<r\leq 1$

[TABLE]

and $G(r;a,b)=0$ otherwise. Note that this is the process $G$ appearing in (3.1). Since the sample paths of $Q$ are uniformly continuous with respect to the Euclidean metric on $[0,1]^{2}$ , a simple computation shows that the sample paths of $G$ are uniformly continuous with respect to the Euclidean metric on $[0,1]^{3}$ .

S8.1 Proof of Theorem 3.4

For $n\geq 1$ consider the maps

[TABLE]

defined for functions $f:[0,1]^{3}\to\mathbb{R}$ such that the denominator is non-zero. With this definition we have $T_{n}=\Phi_{n}(H_{n}^{X})$ for the process $H_{n}^{X}$ defined in (S8.3). Let $D_{\Phi}$ denote the set of all continuous functions $f$ in $\ell^{\infty}([0,1]^{3})$ with the property $\inf_{r\in[0,1]}\int_{0}^{r}f(u,0,r)^{2}du+\int_{r}^{1}f(u,r,1)^{2}du>0$ and consider the map $\Phi:D_{\Phi}\to\mathbb{R}$ given by

[TABLE]

Straightforward arguments show that for any sequence of functions $f_{n}$ with $\|f_{n}-f\|_{\infty}=o(1)$ for some function $f\in D_{\Phi}$ we have $\Phi_{n}(f_{n})\to\Phi(f)$ . Observe that

[TABLE]

This follows from continuity of the sample paths of $G$ , the fact that $P(G(u,r,1)^{2}>0)=1$ for any $0<r<u<1$ and $P(G(u,0,r)^{2}>0)=1$ for any $0<u<r<1$ . Hence $2G\in D_{\Phi}$ with probability one. Combined with the fact that $H_{n}^{X}\rightsquigarrow 2G$ and the extended continuous mapping theorem (see Theorem 1.11.1 in Van Der Vaart and Wellner, (1996)) this implies $T_{n}=\Phi_{n}(H_{n}^{X})\rightsquigarrow\Phi(2G)=\Phi(G)=T$ , where $\Phi(2G)=\Phi(G)$ because the self-normalized ratio defining $\Phi$ is invariant to rescaling of its argument. This completes the proof. $\Box$

S8.2 Proof of Theorem 3.5

A key step in the proof of this Theorem is an expansion for $D$ from (2.1) in terms of $D^{X}$ from (S8.1) in the setting where $\mathbb{E}[Y_{i}]=\mu$ for $i=1,...,\lfloor nb^{*}\rfloor$ and $\mathbb{E}[Y_{i}]=\mu+\delta_{n}$ for $i=\lfloor nb^{*}\rfloor+1,...,n$ . To shorten notation let $k^{*}:=\lfloor nb^{*}\rfloor$ . We will only provide a detailed derivation in the case $\ell<k<k^{*}<m$ , all other cases can be handled similarly. Observe that

[TABLE]

Now some straightforward algebraic manipulations show that

[TABLE]

Let $s_{n}(k):=\sum_{j=1}^{k}X_{j}^{T}\delta_{n}$ . Then

[TABLE]

Observing that $s_{n}$ is a sum of centered i.i.d. random variables, Kolmogorov’s inequality implies

[TABLE]

where we used the fact $\|\Sigma_{n}\|_{2}=o(\|\Sigma_{n}\|_{F})$ , see Remark 3.2. This implies that, uniformly in $k,\ell,m$ we have for $\ell<k<k^{*}<m$

[TABLE]

Similar arguments show that for $\ell<k^{*}<k<m$

[TABLE]

while for $k^{*}\leq\ell$ or $k^{*}\geq m$ we have

[TABLE]

Now assuming that $n\|\delta_{n}\|_{2}^{2}/\|\Sigma\|_{F}\to c^{2}\in[0,\infty)$ , and hence $n^{7/2}\|\delta_{n}\|\|\Sigma\|_{F}^{1/2}=O(n^{3}\|\Sigma\|_{F})$ , it follows that

[TABLE]

where

[TABLE]

The remaining proof in the case $n\|\delta_{n}\|_{2}^{2}/\|\Sigma\|_{F}\to c^{2}\in[0,\infty)$ follows by exactly the same arguments as given in the proof of Theorem 3.4 after replacing (S8.3) by (S8.4) and the limit $2G$ by $2G+\sqrt{2}c\Delta$ .

Next consider the case $n\|\delta_{n}\|_{2}^{2}/\|\Sigma\|_{F}\to\infty$ . Observe that

[TABLE]

Since by assumption $\eta_{i}$ are constant for $i=1,...,\lfloor b^{*}n\rfloor$ and $i=\lfloor b^{*}n\rfloor+1,...,n$ , respectively, we have

[TABLE]

for $H_{n}^{X}$ defined in (S8.3). Uniform asymptotic equicontinuity of the sample paths of $H_{n}^{X}$ together with similar arguments as given in the proof of Theorem 3.4 implies that

[TABLE]

where the limit is non-zero and finite almost surely. Next we will analyze the numerator. From the expansions given above we obtain

[TABLE]

This implies that $\frac{D_{n}(k^{*};1,n)}{n^{3}\|\Sigma\|_{F}}\to\infty$ in probability. Combined with (S8.6) and the fact that the limit in (S8.6) is finite almost surely, the convergence $T_{n}\to\infty$ in probability follows. This completes the proof of Theorem 3.5. $\Box$

S8.3 Proof of Theorem 3.6

The proof is similar to the proof of Theorem 3.4, and the proofs of the weak convergence of $T_{n}^{*},T_{n}^{\diamond}$ are also similar to each other. For the sake of brevity we provide a brief outline for $T_{n}^{*}$ and omit all other details. Define the maps

[TABLE]

for all $f$ for which the expression is well-defined and $\Phi^{*}:D_{\Phi^{*}}\to\mathbb{R}$

[TABLE]

where $D_{\Phi^{*}}$ denotes the set of all continuous functions such that all denominators in the fraction above are non-zero. Similarly to the proof of Theorem 3.4 we have $P(2G\in D_{\Phi^{*}})=1,P(2G\in D_{\Phi^{\diamond}})=1$ , and straightforward calculations show that all other conditions of the extended continuous mapping theorem are also satisfied. $\Box$

S8.4 Proof of Theorem 3.7

We begin by proving the statement about $T^{*}_{n}$ . Define $\delta_{k}:=\mu_{k+1}^{*}-\mu_{k}^{*}$ . Let $k_{0}=k_{0,n}$ be a sequence such that

[TABLE]

and

[TABLE]

Further let $r_{k}^{*}:=\lfloor nb_{k}^{*}\rfloor$ , $k=0,...,M$ . By assumption $(r_{k_{0}}^{*},r_{k_{0}+1}^{*})\in\Omega_{n}(\epsilon)$ (for $n$ sufficiently large, where ‘sufficiently large’ depends on $\epsilon,b_{1}^{*},...,b_{M}^{*}$ only). Thus for sufficiently large $n$

[TABLE]

Recall the definition of $D^{X}$ in (S8.1) and observe that for all $k,\ell,m$

[TABLE]

Similar arguments as in the proof of Theorem 3.5 show that

[TABLE]

since $\delta_{k}^{T}\Sigma\delta_{k}\leq\|\Sigma\|_{2}\|\delta_{k}\|^{2}=o(\|\Sigma\|_{F}\|\delta_{k}\|^{2})$ . Moreover, straightforward calculations show that

[TABLE]

where we used the fact that by definition of $k_{0}$ one has $\max_{k<k_{0}}\|\delta_{k}\|/\|\delta_{k_{0}}\|=o(1)$ for the last representation. Combining the findings above with process convergence of $D_{n}^{X}(r,a,b)/(n^{3}\|\Sigma\|_{F})$ indexed in $a,b,r\in[0,1]$ it follows that

[TABLE]

which converges to $+\infty$ in probability under the assumptions made since by construction $\max_{k<k_{0}}\|\delta_{k}\|=o(\|\delta_{k_{0}}\|)$ . Moreover,

[TABLE]

where we used that by construction $\max_{k<k_{0}}\|\delta_{k}\|^{2}=O(n^{-1}\|\Sigma\|_{F}^{-1})$ . Combining the above expansions for $D_{n}(r_{k_{0}}^{*};1,r_{k_{0}+1}^{*}),W_{n}(r_{k_{0}}^{*};1,r_{k_{0}+1}^{*})$ it follows that

[TABLE]

This proves the claim for $T^{*}_{n}$ . To prove the claim for $T^{\diamond}_{n}$ , consider $k_{0}$ as above and define $r^{*}:=(\lceil 2b_{k_{0}}^{*}/\epsilon\rceil+1)\epsilon/2$ . Note that by construction $r^{*}\epsilon/2\in\mathcal{G}_{\epsilon}$ and

[TABLE]

Hence for $n$ sufficiently large $(\lfloor r^{*}n\rfloor,\lfloor nk^{*}\epsilon/2\rfloor)\in\mathcal{G}_{\epsilon,n,f}$ and thus (for sufficiently large $n$ )

[TABLE]

From here on the arguments are very similar to the ones for $T^{*}_{n}$ and details are omitted for the sake of brevity. $\Box$

S8.5 Proofs for Remark 3.2 and Remark 3.9

For the equivalence between A.1 and $\|\Sigma_{n}\|_{2}=o(\|\Sigma_{n}\|_{F})$ denote by $\lambda_{1}\geq\lambda_{2}\geq\cdots\geq\lambda_{p}\geq 0$ the ordered eigenvalues of $\Sigma_{n}$ . Then

[TABLE]

so $\|\Sigma_{n}\|_{2}=o(\|\Sigma_{n}\|_{F})$ implies A.1. We also have

[TABLE]

so A.1 implies $\|\Sigma_{n}\|_{2}=o(\|\Sigma_{n}\|_{F})$ . For the remaining part of Remark 3.2, observe that

[TABLE]

by (i). We also have by symmetry of $\Sigma_{n}$ and by (ii) since $\Sigma_{n}(i,j)=cum(X_{0,i,n},X_{0,j,n})$

[TABLE]

where the last bound follows since $r>1$ . Since $p_{n}\to\infty$ , this combined with (S8.7) shows A.1 by using the inequality

[TABLE]

For A.2 note that for $2\leq h\leq 6$ we have by (ii)

[TABLE]

where

[TABLE]

Now the sum is of order $O(p_{n}^{h-1-2r})$ if $h-2-2r>-1$ and of order $O(1)$ if $h-2-2r<-1$ . Now a simple computation shows that (A.2) is satisfied if $h-2r<h/2$ for $h=2,...,6$ , which is equivalent to $r>6/4=3/2$ .
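The two-sided comparison behind the equivalence at the start of this proof depends on $\Sigma_{n}$ only through its eigenvalues, so it can be checked numerically on eigenvalue lists alone. The sketch below assumes that A.1 is the condition $tr(\Sigma_{n}^{4})=o(\|\Sigma_{n}\|_{F}^{4})$; the function names `ratios` and `sandwich_holds` and the two eigenvalue configurations are illustrative choices, not taken from the paper.

```python
def ratios(eigs):
    """Return (||S||_2 / ||S||_F, tr(S^4) / ||S||_F^4) for a PSD matrix S with eigenvalues eigs."""
    lam_max = max(eigs)                      # spectral norm of a PSD matrix
    frob_sq = sum(l ** 2 for l in eigs)      # ||S||_F^2 = sum of squared eigenvalues
    tr4 = sum(l ** 4 for l in eigs)          # tr(S^4) = sum of fourth powers
    return lam_max / frob_sq ** 0.5, tr4 / frob_sq ** 2


def sandwich_holds(eigs, tol=1e-12):
    """Check (||S||_2/||S||_F)^4 <= tr(S^4)/||S||_F^4 <= (||S||_2/||S||_F)^2."""
    s, t = ratios(eigs)
    return s ** 4 <= t + tol and t <= s ** 2 + tol


for p in (10, 100, 1000):
    identity = [1.0] * p                     # Sigma = I_p: both ratios vanish as p grows
    spiked = [float(p)] + [1.0] * (p - 1)    # one eigenvalue of order p: neither ratio vanishes
    print(p, ratios(identity), ratios(spiked),
          sandwich_holds(identity), sandwich_holds(spiked))
```

Because the trace ratio is squeezed between the fourth and the second power of the norm ratio, one of them tends to zero exactly when the other does.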

For Remark 3.9, all arguments are similar to those in the proof of Remark 3.2, except for the verification of assumption B.2. Consider

[TABLE]

where the last line uses the fact that $r>2$ ,

[TABLE]

and $|S_{m,4}(l_{1},l_{2})|=O(m\vee 1)$ whenever $m>|l_{1}-l_{2}|$ . This completes the proof. $\Box$

S8.6 Proof of Remark 3.3

We begin by introducing the following proposition.

Proposition S8.2.

Assume the model $X_{t}=\Gamma Z_{t}$ , where $\Gamma$ is a $p$ -by- $m$ real matrix such that $\Sigma=\Gamma\Gamma^{T}$ , and the $Z_{t}$ are i.i.d. random $m$ -dimensional vectors with $\mathbb{E}[Z_{t}]=0$ and $Var(Z_{t})=I_{m}$ . Furthermore, for any $t>0$ ,

[TABLE]

for any positive integer $q$ such that $\sum_{l=1}^{q}\alpha_{l}\leq Q$ , where $Q$ is a fixed positive constant, and $l_{1}\neq\cdots\neq l_{q}$ . Then for any $j_{1},\ldots,j_{k}=1,\ldots,p$

[TABLE]

for any $1\leq k\leq Q$ , where $cum_{k}(Z_{t,l})$ denotes the joint cumulants of $k$ identical random variables $Z_{t,l}$ .

Proof.

By definition of joint cumulants we know

[TABLE]

Hence it suffices to show that $cum(Z_{t,l_{1}},\ldots,Z_{t,l_{k}})=0$ if not all indices $l_{1},\ldots,l_{k}$ are identical. By standard properties of cumulants this would be true if $Z_{t,l_{1}},\ldots,Z_{t,l_{k}}$ were independent; indeed, if there existed $l_{i}\neq l_{j}$ this would imply that $Z_{t,l_{1}},\ldots,Z_{t,l_{k}}$ would consist of at least two independent groups. Next, define $\tilde{Z}_{t,l},l=1,...,m$ such that each $\tilde{Z}_{t,l}$ has the same distribution as $Z_{t,l}$ but $\tilde{Z}_{t,l_{1}},\ldots,\tilde{Z}_{t,l_{k}}$ are independent. By (S8.8) we have

[TABLE]

and thus $cum(Z_{t,l_{1}},\ldots,Z_{t,l_{k}})=cum(\tilde{Z}_{t,l_{1}},\ldots,\tilde{Z}_{t,l_{k}})$ by expressing cumulants through moments. Since $cum(\tilde{Z}_{t,l_{1}},\ldots,\tilde{Z}_{t,l_{k}})=0$ if $l_{1},...,l_{k}$ are not identical, this completes the proof.

∎
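The diagonal structure of the cumulants in Proposition S8.2 can be verified exactly on a small example. The sketch below is a special case under stronger assumptions than the proposition: the coordinates of $Z_{t}$ are fully independent two-point variables (not merely satisfying the moment factorization (S8.8)), only the order $k=3$ is checked, and the matrix `GAMMA`, the laws in `Z_LAWS` and all function names are hypothetical. It assumes the conclusion has the form $cum(X_{t,j_{1}},X_{t,j_{2}},X_{t,j_{3}})=\sum_{l}\Gamma_{j_{1},l}\Gamma_{j_{2},l}\Gamma_{j_{3},l}\,cum_{3}(Z_{t,l})$.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical toy setup: m = 3 independent, standardized two-point coordinates.
# Each law is a list of (value, probability) pairs with mean 0 and variance 1.
Z_LAWS = [
    [(F(-1, 2), F(4, 5)), (F(2), F(1, 5))],   # third cumulant 3/2
    [(F(-1), F(1, 2)), (F(1), F(1, 2))],      # third cumulant 0
    [(F(-2), F(1, 5)), (F(1, 2), F(4, 5))],   # third cumulant -3/2
]
GAMMA = [[F(1), F(2), F(0)],                  # p = 2 rows, m = 3 columns
         [F(-1), F(1), F(3)]]


def third_cumulant_Z(law):
    # For a mean-zero variable the third cumulant equals the third moment.
    return sum(prob * val ** 3 for val, prob in law)


def third_cumulant_X(j1, j2, j3):
    """Exact cum(X_{j1}, X_{j2}, X_{j3}) for X = Gamma Z, enumerating all outcomes of Z."""
    total = F(0)
    for outcome in product(*Z_LAWS):
        prob = F(1)
        for _, q in outcome:
            prob *= q
        z = [val for val, _ in outcome]
        x = [sum(g * zl for g, zl in zip(row, z)) for row in GAMMA]
        total += prob * x[j1] * x[j2] * x[j3]  # E[X] = 0, so this moment is the cumulant
    return total


def diagonal_formula(j1, j2, j3):
    """sum_l Gamma[j1][l] * Gamma[j2][l] * Gamma[j3][l] * cum_3(Z_l)."""
    return sum(GAMMA[j1][l] * GAMMA[j2][l] * GAMMA[j3][l] * third_cumulant_Z(Z_LAWS[l])
               for l in range(len(Z_LAWS)))


for idx in product(range(2), repeat=3):
    print(idx, third_cumulant_X(*idx), diagonal_formula(*idx))
```

Exact rational arithmetic is used so that the two sides can be compared with `==` rather than up to Monte Carlo error.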

Note that Chen and Qin, (2010) assume $Q=8$ . Assuming that $\sup_{j=1,...,m}\mathbb{E}[|Z_{1,j}|^{6}]=O(1)$ , we have for any $2\leq h\leq 6$ ,

[TABLE]

for some positive constant $C$ . By simple manipulation we get for any $h\geq 2$

[TABLE]

This implies that

[TABLE]

Hence we have proved that (S8.8) with $1\leq\alpha_{k}\leq 6$ and $\sum_{k=1}^{q}\alpha_{k}\leq 6$ implies condition A.2.

S8.7 Proof of Proposition 3.8

It suffices to verify that $Z$ satisfies A.1 and A.2. We begin by deriving a useful preliminary result which will be used in both proofs. Observe that

[TABLE]

for sufficiently large $n$ by condition B.3, where the second inequality follows since

[TABLE]

Hence we have proved

[TABLE]

Verification of A.1 It is easy to see that $\Sigma_{Z}=\mathbb{E}[Z_{0}Z_{0}^{T}]-vech(\Sigma_{n})vech(\Sigma_{n})^{T}$ . Specifically any element in $\Sigma_{Z}$ is in the form of $(\mathbb{E}[X_{1,l_{1}}X_{1,l_{2}}X_{1,l_{3}}X_{1,l_{4}}]-\mathbb{E}[X_{1,l_{1}}X_{1,l_{2}}]\mathbb{E}[X_{1,l_{3}}X_{1,l_{4}}])$ and the diagonal elements are of the form $Var(X_{1,l_{1}}X_{1,l_{2}})$ . Recall that $\|\Sigma_{Z}\|_{2}\leq\sqrt{\|\Sigma_{Z}\|_{1}\|\Sigma_{Z}\|_{\infty}}$ , where

[TABLE]

and

[TABLE]

Since $\Sigma_{Z}$ is symmetric, we have $\|\Sigma_{Z}\|_{1}=\|\Sigma_{Z}\|_{\infty}$ and thus $\|\Sigma_{Z}\|_{2}\leq\|\Sigma_{Z}\|_{1}$ . Observe that

[TABLE]

Thus by condition B.1 and B.2, we have $\|\Sigma_{Z}\|_{2}\leq\|\Sigma_{Z}\|_{1}=o(\|\Sigma_{n}\|_{F}^{2})$ . Together with (S8.10) this yields $\|\Sigma_{Z}\|_{2}/\|\Sigma_{Z}\|_{F}\leq\|\Sigma_{Z}\|_{1}/\|\Sigma_{n}\|_{F}^{2}\rightarrow 0$ .

Verification of A.2 By Theorem 2 in Rosenblatt, (2012), we know that

[TABLE]

where the summation is over all indecomposable partitions $\nu_{1}\cup\cdots\cup\nu_{L}=\nu$ of the two way table,

[TABLE]

Note that for fixed $h$ , there is a finite number of indecomposable partitions of the $h\times 2$ table. Denote the total number of such partitions by $M$ . We have

[TABLE]

by condition B.3 where the third line in the above derivation follows by the Cauchy-Schwarz inequality. The desired result follows by (S8.10). This completes the proof of Proposition 3.8. $\Box$

S8.8 Proof of Theorem S8.1

The proof relies on the following technical result, which will be proved in Section S8.8.3.

Lemma S8.3.

Under assumption A.2 there exists a constant $C_{6}<\infty$ such that for all $j_{1}\leq i_{1},\ldots,j_{6}\leq i_{6}$ ,

[TABLE]

In what follows define

[TABLE]

To prove process convergence in Theorem S8.1, we need to establish two results: convergence of the finite-dimensional distributions, i.e.

[TABLE]

for any fixed points $(a_{1},b_{1}),...,(a_{S},b_{S})$ , and tightness of the sequence $\frac{\sqrt{2}}{n\|\Sigma\|_{F}}S_{n}$ . The latter will be established by showing asymptotic equicontinuity in probability, i.e. we will prove that for any $x>0$

[TABLE]

S8.8.1 Proof of (S8.11)

To simplify notation, we only consider the case $S=2$ ; the general case follows by similar arguments. It suffices to show that $\forall\alpha_{1},\alpha_{2}\in\mathbb{R}$ , $a_{1}\leq b_{1},a_{2}\leq b_{2},a_{1},a_{2},b_{1},b_{2}\in[0,1]$

[TABLE]

By symmetry it suffices to consider the following three cases: $a_{1}\leq a_{2}\leq b_{2}\leq b_{1}$ , $a_{1}\leq a_{2}\leq b_{1}\leq b_{2}$ and $a_{1}\leq b_{1}\leq a_{2}\leq b_{2}$ . We will discuss the case $a_{1}\leq a_{2}\leq b_{1}\leq b_{2}$ first. Consider the decomposition

[TABLE]

where

[TABLE]

and

[TABLE]

Define $\mathcal{F}_{i}=\sigma(X_{i},X_{i-1},\ldots)$ . A simple calculation shows that for any fixed $n$ the triangular array $(\sum_{i=1}^{n-1}\xi_{n,\lfloor na_{1}\rfloor+i})_{1\leq i\leq\lfloor nb_{2}\rfloor-\lfloor na_{1}\rfloor-1}$ is a mean zero martingale difference sequence with respect to $\mathcal{F}_{i}$ . To show weak convergence in (S8.13) we will apply the martingale CLT (Theorem 35.12 in Billingsley (2008)). To this end we need to verify the following two conditions:

(1) $\forall\epsilon>0,\ \sum_{i=1}^{\lfloor nb_{2}\rfloor-\lfloor na_{1}\rfloor-1}\mathbb{E}[\widetilde{\xi}_{n,\lfloor na_{1}\rfloor+i}^{2}\mathbf{1}\{|\widetilde{\xi}_{n,\lfloor na_{1}\rfloor+i}|>\epsilon\}|\mathcal{F}_{\lfloor na_{1}\rfloor+i-1}]\overset{p}{\rightarrow}0$ ,

(2) $V_{n}=\sum_{i=1}^{\lfloor nb_{2}\rfloor-\lfloor na_{1}\rfloor-1}\mathbb{E}[\widetilde{\xi}_{n,\lfloor na_{1}\rfloor+i}^{2}|\mathcal{F}_{\lfloor na_{1}\rfloor+i-1}]\overset{p}{\rightarrow}\alpha_{1}^{2}(b_{1}-a_{1})^{2}+\alpha_{2}^{2}(b_{2}-a_{2})^{2}+2\alpha_{1}\alpha_{2}(b_{1}-a_{2})^{2}$ .
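A heuristic way to see where the limit in condition (2) comes from is to compute unconditional second moments. The display below is a sketch under the assumption that $S_{n}(a,b)$ sums $X_{i}^{T}X_{j}$ over pairs $\lfloor na\rfloor<j<i\leq\lfloor nb\rfloor$ of i.i.d. mean-zero vectors with covariance $\Sigma$ ; the exact index conventions may differ by boundary terms that do not affect the limit. It treats the case $a_{1}\leq a_{2}\leq b_{1}\leq b_{2}$ , where the two windows share the indices in $(\lfloor na_{2}\rfloor,\lfloor nb_{1}\rfloor]$ .

```latex
\begin{align*}
\mathbb{E}\big[X_{i}^{T}X_{j}\,X_{i'}^{T}X_{j'}\big]
  &= \begin{cases}
       tr(\Sigma^{2})=\|\Sigma\|_{F}^{2}, & (i,j)=(i',j'),\\
       0, & \text{otherwise},
     \end{cases}
  \qquad j<i,\ j'<i', \\
\frac{2}{n^{2}\|\Sigma\|_{F}^{2}}\,
  Cov\big(S_{n}(a_{1},b_{1}),S_{n}(a_{2},b_{2})\big)
  &= \frac{2}{n^{2}}\binom{\lfloor nb_{1}\rfloor-\lfloor na_{2}\rfloor}{2}
  \longrightarrow (b_{1}-a_{2})^{2}.
\end{align*}
```

Counting pairs within a single window in the same way gives $(b_{k}-a_{k})^{2}$ for the two variance terms, and expanding the variance of $\alpha_{1}S_{n}(a_{1},b_{1})+\alpha_{2}S_{n}(a_{2},b_{2})$ yields the constant on the right-hand side of (2).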

We will prove (1) and (2) above in several steps. First, to prove (1), we shall establish that

[TABLE]

For a proof of (2), consider the decomposition

[TABLE]

We will show that

[TABLE]

and

[TABLE]

For the other orderings of $a_{1},a_{2},b_{1},b_{2}$ , the arguments are similar. For example, if $a_{1}\leq a_{2}\leq b_{2}\leq b_{1}$ , then

[TABLE]

where

[TABLE]

Then similar arguments can be applied. The same holds for the case $a_{1}\leq b_{1}\leq a_{2}\leq b_{2}$ .

Proof of (S8.14) Observe that

[TABLE]

Since the $X_{i}$ are i.i.d., it follows that

[TABLE]

Next observe that

[TABLE]

where $\Sigma_{l_{1},l_{2}}$ is the $(l_{1},l_{2})$ component of $\Sigma$ and $cum(X_{i+1,l_{1}},X_{i+1,l_{2}},X_{i+1,l_{3}},X_{i+1,l_{4}})$ is the fourth order joint cumulant of $X_{i+1,l_{1}},X_{i+1,l_{2}},X_{i+1,l_{3}},X_{i+1,l_{4}}$ . Thus by the Cauchy-Schwarz inequality and Assumption A.2 we have for $C$ from A.2

[TABLE]

where the last line follows from Assumption A.2 with $C$ from that assumption. Similarly we have

[TABLE]

Combining the above results we have

[TABLE]

The bound

[TABLE]

follows by similar arguments and this completes the proof of (S8.14).

Proof of (S8.15) and (S8.16) Since both statements follow by the same arguments we will only give details for the proof of (S8.15). Observe that

[TABLE]

For $V_{n,1}^{(1)}$ we have

[TABLE]

Note that

[TABLE]

where the last inequality is a direct consequence of the Cauchy-Schwarz inequality and Assumption A.2. Combining with previous results we have

[TABLE]

This implies $V_{n,1}^{(1)}\overset{p}{\rightarrow}(b_{1}-a_{1})^{2}$ . Moreover, for $j_{1}\neq j_{2},j_{3}\neq j_{4}$ we have $\mathbb{E}\left[X_{j_{2}}^{T}\Sigma X_{j_{1}}X_{j_{4}}^{T}\Sigma X_{j_{3}}\right]=0$ if $j_{1}\notin\{j_{3},j_{4}\}$ or $j_{2}\notin\{j_{3},j_{4}\}$ and $\mathbb{E}\left[X_{j_{2}}^{T}\Sigma X_{j_{1}}X_{j_{4}}^{T}\Sigma X_{j_{3}}\right]=tr(\Sigma^{4})$ otherwise. Hence

[TABLE]

Combining these results, (S8.15) follows.

Proof of (S8.17) Observe the decomposition

[TABLE]

Note that for $j_{1}\neq j_{2},j_{3}\neq j_{4}$ we have $\mathbb{E}\left[X_{j_{2}}^{T}\Sigma X_{j_{1}}X_{j_{4}}^{T}\Sigma X_{j_{3}}\right]=0$ if $j_{1}\notin\{j_{3},j_{4}\}$ or $j_{2}\notin\{j_{3},j_{4}\}$ and $\mathbb{E}\left[X_{j_{2}}^{T}\Sigma X_{j_{1}}X_{j_{4}}^{T}\Sigma X_{j_{3}}\right]=tr(\Sigma^{4})$ otherwise. Hence we obtain for the first term

[TABLE]

Thus the first term in the decomposition of $V_{3,n}$ is $o_{p}(1)$ . The second term is of the same structure as $V_{1,n}$ , and it follows that

[TABLE]

This yields (S8.17). Thus (S8.14)-(S8.17) are established and this completes the proof of (S8.11). $\Box$

S8.8.2 Proof of (S8.12)

The proof will rely on the following bound for the increments of $S_{n}$ : there exists a constant $\tilde{C}<\infty$ such that for all $n\geq 2$ and all $a,b,c,d\in[0,1]$ we have

[TABLE]

This bound will be established at the end of the proof. For the remainder of the proof, define

[TABLE]

Note that $B_{n}(u)$ has a piece-wise constant structure, more precisely we have for any $u\in[0,1]^{2}$ , $B_{n}(u)=B_{n}(\lfloor nu\rfloor/n)$ (here, $\lfloor nu\rfloor$ is understood component-wise). Define the index set $T_{n}:=\{(i/n,j/n):i,j=0,...,n\}$ . Then

[TABLE]

Consider the metric (on the set $T_{n}$ ) $d(u,v)=\|u-v\|^{1/2}$ . From (S8.18) and the definition of $B_{n}$ we obtain the existence of a constant $C<\infty$ such that for all $n\geq 2$ and all $u,v\in T_{n}$

[TABLE]

which implies

[TABLE]

Note that the packing number of $T_{n}$ with respect to the metric $d$ satisfies

[TABLE]

for some constant $C_{D}<\infty$ . Now apply Lemma A.1 from Kley et al., (2016) with $\Psi(x)=x^{6}$ , $T=T_{n}$ , $d(u,v)=\|u-v\|^{1/2}$ , $\bar{\eta}=n^{-1/2}/2$ to find that for any $\eta\geq\bar{\eta}$ there exists a random variable $R_{n}(\eta,\delta)$ such that

[TABLE]

and

[TABLE]

for some constant $K$ independent of $\delta,\eta,n$ . Next, observe that

[TABLE]

and since $\inf_{u,v\in T_{n},u\neq v}\|u-v\|\geq n^{-1}$ it follows that $u,v\in T_{n}:~{}d(u,v)\leq\bar{\eta}$ implies $u=v$ (recall that $T_{n}$ is discrete) and thus the supremum vanishes and we obtain

[TABLE]

Now a simple computation shows that

[TABLE]

Apply the Markov inequality to find that for any $x>0$

[TABLE]

Since $\eta$ was arbitrary, it follows that

[TABLE]

Combined with (S8.19) this implies (S8.12). Hence it remains to establish (S8.18).

Proof of (S8.18) We shall assume $a<c<d<b$ ; the proofs in all other cases are similar. By definition of $S_{n}$ ,

[TABLE]

Note that $A=S_{n}(a,c)$ and $E=S_{n}(b,d)$ and that $B$ , $C$ and $D$ share the same structure. Applying Hölder’s inequality yields

[TABLE]

and thus it suffices to show that

[TABLE]

Apply Lemma S8.3 to obtain

[TABLE]

By definition, $S_{n}(a,c)=0$ if $\lfloor nc\rfloor-\lfloor na\rfloor<2$ . If $\lfloor nc\rfloor-\lfloor na\rfloor\geq 2$ , which implies $c-a>1/n$ ,

[TABLE]

Thus

[TABLE]

Exactly the same argument can be used to bound $\mathbb{E}[S_{n}(a,c)^{6}]$ . Next observe that we have for $\lfloor nd\rfloor-\lfloor nc\rfloor>2$ ,

[TABLE]

Thus by summarizing the above steps, we have

[TABLE]

where the last inequality in the previous line follows from

[TABLE]

which implies

[TABLE]

$\Box$

S8.8.3 Proof of Lemma S8.3

By the generalized Hölder’s inequality, we have

[TABLE]

Let $\pi$ be any disjoint partition over the set $\{l_{1},l_{2},l_{3},l_{4},l_{5},l_{6}\}$ such that for any $B\in\pi$ , $|B|\neq 1$ . Thus,

[TABLE]

where $C_{6}>0$ is a generic constant that varies from line to line. This completes the proof.

S9 Proofs of results for high-dimensional time series

Throughout this section, we assume that the process $X_{t}$ admits a linear process representation.

S9.1 Properties of Linear Process

Firstly, applying the Beveridge-Nelson (BN) decomposition in Phillips and Solo (1992), we have

[TABLE]

where $D_{i}=(\sum_{u=0}^{\infty}c_{u})\epsilon_{i}$ , $\widetilde{D}_{i}=\sum_{j=0}^{\infty}(\sum_{u=j+1}^{\infty}c_{u})\epsilon_{i-j}$ and $\varepsilon_{i}=\widetilde{D}_{i}-\widetilde{D}_{i-1}$ . We then state three useful auxiliary lemmas.

Lemma S9.1.

Suppose Assumption 4.1 (C.1, C.2, C.5) is true. Then, for any $h=2,3,4,5,6$ and $j=0,1,\ldots,h$ , we have

[TABLE]

Lemma S9.2.

Suppose Assumption 4.1 (C.1, C.2) is true. Then, for some constant $C$ and $0<\rho<1$ , we have for $k\leq 7$

[TABLE]

where $i_{max}=\max\{i_{0},\ldots,i_{k}\}$ , $i_{min}=\min\{i_{0},\ldots,i_{k}\}$ and for each $i\in\{i_{0},\ldots,i_{k}\}$ , $Z_{i}$ can be any element from the set $\{X_{i,j},D_{i,j},\widetilde{D}_{i,j},\varepsilon_{i,j}\}_{i=i_{0},\ldots,i_{k},j=1,\ldots,p}$ .

Lemma S9.3.

Under Assumption 4.1, for any $i\neq j$ , we have

[TABLE]
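The Beveridge-Nelson identity $X_{i}=D_{i}-(\widetilde{D}_{i}-\widetilde{D}_{i-1})$ used throughout this section can be checked numerically. The sketch below is a simplification under stated assumptions: it uses a scalar process with finitely many coefficients $c_{0},\ldots,c_{q}$ (an MA( $q$ ) process) instead of the matrix-valued coefficients of the paper, Gaussian innovations, and a hypothetical function name `bn_decomposition_check`.

```python
import random


def bn_decomposition_check(c, n=50, seed=0):
    """Return max_i |X_i - (D_i - (Dt_i - Dt_{i-1}))| for a scalar MA(q) process.

    c : list of coefficients c_0, ..., c_q (finite, so all tail sums are finite).
    """
    rng = random.Random(seed)
    q = len(c) - 1
    # Innovations eps_t for t = -q-1, ..., n-1, stored with offset q + 1.
    eps = [rng.gauss(0.0, 1.0) for _ in range(n + q + 1)]

    def e(t):
        return eps[t + q + 1]

    c_sum = sum(c)                                     # sum_{u >= 0} c_u
    c_tail = [sum(c[j + 1:]) for j in range(q + 1)]    # sum_{u > j} c_u (zero for j = q)

    def X(i):    # linear process X_i = sum_u c_u eps_{i-u}
        return sum(c[u] * e(i - u) for u in range(q + 1))

    def D(i):    # D_i = (sum_u c_u) eps_i
        return c_sum * e(i)

    def Dt(i):   # D-tilde_i = sum_j (sum_{u > j} c_u) eps_{i-j}
        return sum(c_tail[j] * e(i - j) for j in range(q + 1))

    return max(abs(X(i) - (D(i) - (Dt(i) - Dt(i - 1)))) for i in range(n))


print(bn_decomposition_check([1.0, 0.5, 0.25]))  # should be at floating-point rounding level
```

The identity is algebraic: the coefficient of $\epsilon_{i}$ on the right-hand side is $\sum_{u}c_{u}-\sum_{u>0}c_{u}=c_{0}$ , and the coefficient of $\epsilon_{i-k}$ for $k\geq 1$ is $\sum_{u>k-1}c_{u}-\sum_{u>k}c_{u}=c_{k}$ .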

S9.2 Proof of Theorem 4.4

Recall that

[TABLE]

For $u=1,2,3,4$ , define

[TABLE]

where

[TABLE]

Expanding the inner product $(X_{j_{1}}-X_{j_{2}})^{T}(X_{j_{3}}-X_{j_{4}})$ leads to

[TABLE]

where the vectors in the last term admit the representations

[TABLE]

and

[TABLE]

Thus, we can decompose the last term as

[TABLE]

where for $w_{1}<w_{2}$ , $h_{1}<h_{2}$ , $w_{2}\leq h_{1}-\tau$ ,

[TABLE]

otherwise $Q_{X_{n}}^{u}(w_{1},w_{2};h_{1},h_{2})=0$ . See Figure 1 for an illustration of $Q_{X_{n}}^{u}$ and $\widetilde{S}_{X_{n}}^{u}$ . The above decomposition suggests writing $\widetilde{H}_{X_{n}}(k;l,m|\tau)$ as a continuous functional of $\widetilde{S}_{X_{n}}^{u}(k,m|\tau)$ .

Thus, for fixed $\eta\in(0,1)$ and $u=1,2,3,4$ , the key step is to study the following two parameter processes

[TABLE]

where

[TABLE]

Recall that the Beveridge-Nelson (BN) decomposition in Phillips and Solo (1992) implies $X_{i}=D_{i}-\varepsilon_{i},$ where $D_{i}=(\sum_{u=0}^{\infty}c_{u})\epsilon_{i}$ , $\widetilde{D}_{i}=\sum_{j=0}^{\infty}(\sum_{u=j+1}^{\infty}c_{u})\epsilon_{i-j}$ and $\varepsilon_{i}=\widetilde{D}_{i}-\widetilde{D}_{i-1}$ . Applying the BN decomposition, we have for any $u=1,2,3,4$

[TABLE]

where $S_{D_{n}}^{u}(a,b|\eta)$ is defined similarly as in equation (S9.3) and it holds in $l^{\infty}([0,1]^{2})$ that $R_{u}\rightsquigarrow 0$ . The proof is postponed to Section S9.2.2. Consequently, it holds in $l^{\infty}([0,1]^{3})$ that

[TABLE]

The convergence of marginals $\left(H_{D_{n}}(r_{1};a_{1},b_{1}|\eta),\ldots,H_{D_{n}}(r_{K};a_{K},b_{K}|\eta)\right)$ is shown in Section S9.2.1 and the tightness of $H_{D_{n}}(r;a,b|\eta)$ follows from the tightness of each $\frac{\sqrt{2}}{n\|\Gamma\|_{F}}S_{D_{n}}^{u}(a,b|\eta)$ . When $\eta=0$ , the tightness of $\frac{\sqrt{2}}{n\|\Gamma\|_{F}}S_{D_{n}}^{u}(a,b|0)$ is given by Equation (S8.12). When $\eta>0$ , consider $0<a<c<d<b<1-\eta$ such that $a-c<\eta$ and $b-d<\eta$ , we get

[TABLE]

Since $\{D_{i}\}_{i=1}^{n}$ are independent, by applying Lemma S9.3, we can obtain

[TABLE]

Similarly, we can show that

[TABLE]

Then, the asymptotic tightness of $\frac{\sqrt{2}}{n\|\Gamma\|_{F}}S_{D_{n}}^{u}(a,b|\eta)$ follows similarly from the proof of Equation (S8.12). So, we have $H_{D_{n}}(r;a,b|\eta)\rightsquigarrow G(r;a,b|\eta)\text{ in }l^{\infty}([0,1]^{3})$ and Theorem 4.4 can be proved similarly as Theorem 3.4.

S9.2.1 Convergence of Marginals

It suffices to show that for any fixed intervals $(a_{u,k},b_{u,k})\in(0,1)^{2}$ and constants $\alpha_{u,k}\in\mathbb{R}$ , where $k=1,2,\ldots,K$ , $u=1,2,3,4$ , it holds that

[TABLE]

Some algebra shows that

[TABLE]

where $a_{\min}=\min_{u,k}a_{u,k}$ , $b_{\max}=\max_{u,k}b_{u,k}$ ,

[TABLE]

and

[TABLE]

Similarly, $\sum_{i=\lfloor a_{\min}n\rfloor}^{j}\widetilde{\xi}_{i}$ is a martingale with respect to $\mathcal{F}_{j-1}=\sigma(X_{j+\lfloor\eta n\rfloor},X_{j+\lfloor\eta n\rfloor-1},\ldots)$ . Then, the conditional variance is calculated as

[TABLE]

It can be shown that under Assumption 4.1, for $a^{\prime}\leq a\leq b-\eta\leq 1-\eta$

[TABLE]

where, $C_{u,v}(a,b)=0$ if $a>b-\eta$ ; otherwise, it is given as

[TABLE]

The proof is postponed to Section S9.2.3. Thus, we have

[TABLE]

Next, we check the conditional Lindeberg condition. To this end, it suffices to show that for any fixed interval $(a,b)$ and $u\in\{1,2,3,4\}$

[TABLE]

Due to the independence of $\{D_{i}\}_{i=1}^{n}$ , we have

[TABLE]

only if $j_{1},j_{2},j_{3},j_{4}$ are pair-wise equal. In addition, from Lemma S9.3

[TABLE]

Thus, we have $\sum_{i=\lfloor an\rfloor}^{\lfloor bn\rfloor-\lfloor\eta n\rfloor-1}E[(\xi_{a,i}^{u})^{4}]\lesssim O(1/n).$

S9.2.2 Proof of Equation (S9.4)

First, write $S_{X_{n}}^{u}(a,b|\eta)$ as

[TABLE]

where

[TABLE]

We then show that each of the terms $R_{u,1},R_{u,2},R_{u,3}$ converges weakly to 0 in $l^{\infty}([0,1]^{2})$ . The proof techniques for $R_{u,1}$ and $R_{u,2}$ are very similar, so we only give details showing that $R_{u,2}\rightsquigarrow 0$ . Some algebra shows that

[TABLE]

The above decomposition for $R_{u,2}$ is complex, but it can be simplified with the following lemma.

Lemma S9.4.

Under Assumption 4.1, it holds in $l^{\infty}([0,1]^{2})$ that

[TABLE]

where $\{v_{i}\}$ is a sequence of constants such that $\sup_{i}|v_{i}|\leq 1$ .

Thus, we can throw away the following terms

[TABLE]

Next, we focus on the following term

[TABLE]

Then, applying the triangle inequality

[TABLE]

Due to the following lemma, terms (S9.7) and (S9.8) are both of order $o_{p}(1)$ in metric space $l^{\infty}([0,1]^{2})$ .

Lemma S9.5.

Under Assumption 4.1, it holds in $l^{\infty}([0,1]^{2})$ that

[TABLE]

where $\{v_{j}\}$ is a sequence of constants such that $\sup_{j}|v_{j}|\leq 1$ .

Next, denote the $L^{p}$ -norm of a random variable $X$ as $\|X\|_{p}:=\left(E[|X|^{p}]\right)^{1/p}$ . For any two parameter process $W(a,b)$ , if $\|W(a,b)\|_{6}\lesssim 1/\sqrt{n}$ , then its marginals $\left(W(a_{1},b_{1}),\ldots,W(a_{k},b_{k})\right)$ converge to $0$ and, by the argument in the proof of Equation (S8.12), the process is asymptotically tight. Thus, we have $W(a,b)\rightsquigarrow 0$ . With this logic and the following lemma, term (S9.9) is also asymptotically negligible.

Lemma S9.6.

Under Assumption 4.1, it holds in $l^{\infty}([0,1]^{2})$ that

[TABLE]

where $\{v_{i}\}$ is a sequence of constants such that $\sup_{i}|v_{i}|\leq 1$ .

Finally, using the same logic, it can be seen from the lemma below that all the other terms in $R_{u,2}$ converge weakly to $0$ .

Lemma S9.7.

Under Assumption 4.1, it holds in $l^{\infty}([0,1]^{2})$ that

[TABLE]

where $\{v_{i}\}$ is a sequence of constants such that $\sup_{i}|v_{i}|\leq 1$ .

This concludes the proof that $R_{u,2}\rightsquigarrow 0$ . For $R_{u,3}$ , notice that

[TABLE]

Comparing the above two terms with $R_{u,2}$ , we have $\widetilde{D}_{i+\lfloor\eta n\rfloor}$ instead of $D_{i+\lfloor\eta n\rfloor}$ . Since both Lemma S9.1 and S9.2 hold for any combination of $\widetilde{D}_{i}$ and $D_{j}$ , we can show similarly as in the proof for $R_{u,2}$ that these two terms are of order $o_{p}(1)$ .

S9.2.3 Proof of Equation (S9.5)

Proof.

Notice that

[TABLE]

where

[TABLE]

We then show that $\widetilde{R}$ is negligible.

[TABLE]

Next, we focus on the first term

[TABLE]

whose mean can be calculated as

[TABLE]

Next, we show that the variance of the first term is asymptotically 0. Observe that

[TABLE]

Thus, if $j_{1}=j_{2},j_{1}^{\prime}=j_{2}^{\prime},j_{1}\neq j_{1}^{\prime}$ ,

[TABLE]

where the above inequality holds true since there are at most $O(n^{4})$ non-zero terms. The case that $j_{1}=j_{2}^{\prime},j_{1}^{\prime}=j_{2},j_{1}\neq j_{1}^{\prime}$ can be shown similarly. When $j_{1}=j_{1}^{\prime}=j_{2}=j_{2}^{\prime}$ , it has been shown in the proof of (S8.15) and (S8.16) that $E[D_{1}^{T}\Gamma D_{1}D_{1}^{T}\Gamma D_{1}]\lesssim\|\Gamma\|_{F}^{4}$ , thus

[TABLE]

∎

S9.3 Proof of Theorem 4.6

We first state a lemma.

Lemma S9.8.

Under Assumption 4.1, for any deterministic sequence of vectors $\delta_{n}\in\mathbb{R}^{p}$ ,

[TABLE]

and

[TABLE]

Given the bounds above we have

[TABLE]

Recall that in the proof of Theorem 4.4, we decompose $\widetilde{H}_{Y_{n}}(\lfloor\phi n\rfloor;1,n|\lfloor\eta n\rfloor)$ as a continuous functional of $\frac{\sqrt{2}}{n\|\Gamma\|_{F}}S_{Y_{n}}^{u}(a,b|\eta)$ , then by replacing each $\frac{\sqrt{2}}{n\|\Gamma\|_{F}}S_{Y_{n}}^{u}(a,b|\eta)$ with the above decomposition, Theorem 4.6 follows straight-forwardly under the case that $n\|\delta_{n}\|_{2}^{2}/\|\Gamma\|_{F}\rightarrow c^{2}\in(0,\infty)$ or $n\|\delta_{n}\|_{2}^{2}/\|\Gamma\|_{F}\rightarrow 0$ . For the case that $n\|\delta_{n}\|_{2}^{2}/\|\Gamma\|_{F}\rightarrow\infty$ , it follows from Lemma S9.8 that

[TABLE]

which implies that $\widetilde{H}_{Y_{n}}(\lfloor\phi n\rfloor;1,n|\tau)$ goes to infinity in probability. Then, the result follows similarly as in the proof of Theorem 3.5. $\Box$

S9.4 Proof of Auxiliary Lemmas

S9.4.1 Proof of Lemma S9.1

Proof.

Firstly, for the case $j=0$ , let $i_{min}=\min\{i_{1},i_{2},\ldots,i_{h}\}$ , $\widetilde{c}_{i}=\sum_{u=i+1}^{\infty}c_{u}$ , $\widetilde{c}_{i,(l,\cdot)}$ be the $l$ -th row of $\widetilde{c}_{i}$ and $\widetilde{c}_{i,(l,k)}$ be the $(l,k)$ -th entry of $\widetilde{c}_{i}$ . The absolute value of the cumulant can be bounded as

[TABLE]

which proves the case $j=0$ . For other cases, the results can be shown similarly. To bound the square cumulant, notice that under assumption 4.1 (C.1), it can be easily shown that there exists a constant $C$ such that

[TABLE]

Next, for any $(l_{1},\ldots,l_{h})$ , we can bound $|cum(\widetilde{D}_{i_{1},l_{1}},\ldots,\widetilde{D}_{i_{h},l_{h}})|$ as follows

[TABLE]

The proof is similar when $j\neq 0$ , thus there exists a constant $C^{\prime}$ such that for any $(l_{1},\ldots,l_{h})$ , we have $|cum(D_{i_{1},l_{1}},\ldots,D_{i_{j},l_{j}},\widetilde{D}_{i_{j+1},l_{j+1}},\ldots,\widetilde{D}_{i_{h},l_{h}})|\leq C^{\prime}$ . As a consequence, it holds that

[TABLE]

∎

S9.4.2 Proof of Lemma S9.2

Proof.

We show that $\{X_{i}\}$ , $\{D_{i}\}$ , $\{\widetilde{D}_{i}\}$ , $\{\varepsilon_{i}\}$ are all UGMC(8); the result then follows from Remark 9.6 in Wang and Shao (2020). Firstly, we have

[TABLE]

Next, we bound the term $E[(c_{j_{1},(l,\cdot)}^{T}(\epsilon_{i-j_{1}}-\epsilon_{i-j_{1}}^{\prime}))^{8}]$ as an example.

[TABLE]

Thus, we have $E[(X_{i,l}-X_{i,l}^{\prime})^{8}]\leq(\sum_{j=i}^{\infty}\|c_{j}\|_{\infty})^{8}.$ It can be shown similarly that $\sup_{l}E[|X_{0,l}|^{8}]\leq C^{8}$ for some constant $C$ , which concludes that $\{X_{i}\}$ is UGMC(8). From Lemma 9.4 [Wang and Shao, (2020)], $\{D_{i}\}$ , $\{\widetilde{D}_{i}\}$ , $\{\varepsilon_{i}\}$ are also UGMC(8). ∎

S9.4.3 Proof of Lemma S9.3

Proof.

Let $\pi$ be any disjoint partition over the set $\{l_{1},l_{2},l_{3},l_{4},l_{5},l_{6}\}$ such that for any $B\in\pi$ , $|B|\neq 1$ . Under Assumption 4.1,

[TABLE]

∎

S9.4.4 Proof of Lemma S9.4

Proof.

The triangle inequality implies that

[TABLE]

It is sufficient to show that

[TABLE]

To prove this, the idea is to use Proposition 1 in Wu, (2007), for any $n=2^{d}$

[TABLE]

Applying Lemma S9.2 and S9.1,

[TABLE]

To continue the calculation,

[TABLE]

which concludes the case when $n=2^{d}$ . For arbitrary integer $n$ , the statement follows from the fact that there exists $d$ such that $2^{d-1}\leq n<2^{d}$ and

[TABLE]

∎
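The reduction to $n=2^{d}$ in the proof above reflects the dyadic structure of the maximal inequality. The sketch below does not reproduce Proposition 1 of Wu (2007); it only illustrates the pathwise fact on which dyadic bounds of this kind are typically built (an assumption about the technique, since the displayed inequality is not reproduced here): every partial sum $S_{k}$ with $k\leq 2^{d}$ is a sum of at most one aligned block of length $2^{r}$ per level $r=0,\ldots,d$ , so $\max_{k}|S_{k}|$ is bounded by the sum over levels of the largest absolute block sum. The function names are hypothetical.

```python
import random


def dyadic_bound(x):
    """Sum over levels r = 0..d of the largest absolute dyadic block sum of length 2^r."""
    n = len(x)
    d = n.bit_length() - 1
    assert n == 2 ** d, "this sketch assumes n is a power of two"
    prefix = [0.0]
    for v in x:
        prefix.append(prefix[-1] + v)          # prefix[k] = S_k
    total = 0.0
    for r in range(d + 1):
        block = 2 ** r
        total += max(abs(prefix[block * m] - prefix[block * (m - 1)])
                     for m in range(1, n // block + 1))
    return total


def max_partial_sum(x):
    """max over 1 <= k <= n of |S_k|."""
    best, s = 0.0, 0.0
    for v in x:
        s += v
        best = max(best, abs(s))
    return best


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(2 ** 8)]
print(max_partial_sum(x), dyadic_bound(x))     # the first value never exceeds the second
```

Taking $L^{p}$ norms level by level in this pathwise bound is what turns a moment bound on block sums into a moment bound on the maximum.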

S9.5 Proof of Lemma S9.5

Using Proposition 1 in Wu, (2007), for any $n=2^{d}$

[TABLE]

For the summands inside the bracket,

[TABLE]

Express the expectation using cumulants

[TABLE]

By using Lemma S9.2 and S9.1, we have

[TABLE]

Since $p^{2}\rho^{\lfloor\eta n\rfloor}=O(\|\Gamma\|_{F}^{2})$ , some straightforward calculation shows that

[TABLE]

Plugging the above bound into Equation (S9.10) yields

[TABLE]

which concludes Equation (S9.5) when $n=2^{d}$ . For arbitrary $n$ , there exists $d$ such that $2^{d-1}\leq n<2^{d}$ and

[TABLE]

S9.5.1 Proof of Lemma S9.6

For term (S9.9), we prove that

[TABLE]

Firstly, notice that

[TABLE]

Then, let $\pi$ be any partition of the index set

[TABLE]

such that $|B|>1$ for any $B\in\pi$ . By the moment-cumulant formula,

[TABLE]

where $Z_{i,l}=D_{i,l}\text{ if }(i,l)\in\{(j_{k},l_{k})\}_{k=1}^{6};$ otherwise $Z_{i,l}=\widetilde{D}_{i,l}$ . Set $\mathbb{I}_{1}=\{(i_{k}+\lfloor\eta n\rfloor+1,l_{k})\}_{k=1}^{6}$ , $\mathbb{I}_{2}=\{(j_{k},l_{k})\}_{k=1}^{6}$ and

[TABLE]

Here, we note that $\pi=\pi_{1}\cup\pi_{2}\cup\pi_{3}$ . For notational convenience, write

[TABLE]

and if $B\in\pi_{2}$ , we can represent the set $B$ as

[TABLE]

where $0<\tilde{k}<|B|$ . Then, we can decompose the product of cumulants as

[TABLE]

Apply Lemma S9.2 and S9.1,

[TABLE]

where $i_{max}^{B}=\max\{\tilde{i}_{1},\ldots,\tilde{i}_{|B|}\}$ and $i_{min}^{B}=\min\{\tilde{i}_{1},\ldots,\tilde{i}_{|B|}\}$ . If $B\in\pi_{2}$ ,

[TABLE]

where $i_{max}^{B}=\max\{\tilde{i}_{1},\ldots,\tilde{i}_{\tilde{k}}\}$ and $j_{min}^{B}=\min\{\tilde{j}_{1},\ldots,\tilde{j}_{|B|-\tilde{k}}\}$ . Thus,

[TABLE]

To continue the proof, notice that there exists a constant $N_{\rho}$ such that $m^{6}\rho^{m/2}<1$ if $m>N_{\rho}$ . Thus, for any $B\in\pi_{1}$ , we have

[TABLE]

Also, it can be easily seen that

[TABLE]

In addition, for any $B\in\pi_{2}$

[TABLE]

It can be shown similarly that $\sum_{\tilde{j}_{1},\ldots,\tilde{j}_{|B|-\tilde{k}}=1}^{\lfloor an\rfloor-1}\rho^{\lfloor an\rfloor-j_{min}^{B}}=O(1)$ . Thus,

[TABLE]

In conclusion, we have

[TABLE]

S9.5.2 Proof of Lemma S9.7

Proof.

For notational convenience, denote

[TABLE]

Let $\pi$ be any disjoint partition over the set $\mathbb{I}$ such that $|B|>1$ for any $B\in\pi$ , where $\mathbb{I}$ is defined as

[TABLE]

Any such $\pi$ is a disjoint union of 3 sets as $\pi=\pi_{1}\cup\pi_{2}\cup\pi_{3}$ , where $\pi_{1}:=\{A|A\in\pi,A\subseteq\mathbb{I}_{1}\},\pi_{2}:=\{A|A\in\pi,A\nsubseteq\mathbb{I}_{1},A\nsubseteq\mathbb{I}_{2}\},\pi_{3}:=\{A|A\in\pi,A\subseteq\mathbb{I}_{2}\}$ and $\mathbb{I}_{1},\mathbb{I}_{2}$ are defined as

[TABLE]

For notational convenience, denote

[TABLE]

Then, we have

[TABLE]

Similarly, we write

[TABLE]

and if $B\in\pi_{2}$ , we can represent the set $B$ as

[TABLE]

where $0<\tilde{k}<|B|$ . Then, we can decompose the product of cumulants as

[TABLE]

The following bounds on the cumulants are from Lemma S9.2 and S9.1

[TABLE]

where $i_{max}^{B}=\max\{\tilde{i}_{1},\ldots,\tilde{i}_{|B|}\}$ and $i_{min}^{B}=\min\{\tilde{i}_{1},\ldots,\tilde{i}_{|B|}\}$ . If $B\in\pi_{2}$ ,

[TABLE]

where $i_{min}^{B}=\min\{\tilde{i}_{1},\ldots,\tilde{i}_{\tilde{k}}\}$ . Thus, we have

[TABLE]

We know from Equation (S9.11) that for $B\in\pi_{1}$ ,

[TABLE]

For any $B\in\pi_{2}$ ,

[TABLE]

As a consequence, for any $\pi=\pi_{1}\cup\pi_{2}\cup\pi_{3}$ , we have

[TABLE]

and

[TABLE]

In conclusion, we have

[TABLE]

∎

S9.5.3 Proof of Lemma S9.8

Observe that under C.3 we have $\delta_{n}^{T}\Gamma\delta_{n}=o(\|\delta_{n}\|^{2}\|\Gamma\|_{F})$ , see the proof of Remark 3.2 for a detailed derivation of this type of bound.

We only give details for the second inequality, as the others can be shown similarly. Note that for any $\delta_{n}\in\mathbb{R}^{p}$

[TABLE]

Then, apply the BN decomposition to obtain

[TABLE]

For the first term on the right hand side, Kolmogorov’s inequality implies that

[TABLE]

Next, we bound $\sup_{1\leq k\leq n}|\sum_{i=1}^{k}(i+\lfloor\eta n\rfloor+1)/n\varepsilon_{i+\lfloor\eta n\rfloor+1}^{T}\delta_{n}|$. Some algebra shows that

[TABLE]

According to Proposition 1 in Wu (2007), for any $n=2^{d}$,

[TABLE]

For each term in the square bracket

[TABLE]

So, we have

[TABLE]

For general $n$ , there exists $d$ such that $2^{d-1}\leq n<2^{d}$ and

[TABLE]

The above inequality implies

[TABLE]

Next, the maximal inequality implies that

[TABLE]

Then, by Cauchy’s inequality

[TABLE]

Applying Lemma S9.1,

[TABLE]

which entails that $E[(\widetilde{D}_{k+\lfloor\eta n\rfloor+1}^{T}\delta_{n})^{4}]\lesssim\|\delta_{n}\|_{2}^{4}\|\Gamma\|_{F}^{2}$ and

[TABLE]

$\Box$

S10 Testing for Covariance Matrix Change

In this section, we examine the finite sample performance of our test when applied to testing for a change in the covariance matrix, in comparison with a recent method developed by Avanesov and Buzun (2018). They proposed a high-dimensional covariance change point detection scheme that involves the choice of several tuning parameters. For completeness, we present their method in detail below.

They first consider a set of window sizes $\mathcal{N}\subset\mathbb{N}$. For each window size $n\in\mathcal{N}$, define a set of central points $\mathbb{T}_{n}:=\{n+1,\ldots,N-n+1\}$, where $N$ is the sample size. For $n\in\mathcal{N}$, define the set of indices belonging to the window on the left side of the central point $t\in\mathbb{T}_{n}$ as $\mathcal{I}_{n}^{l}(t):=\{t-n,\ldots,t-1\}$ and the set of indices on the right side as $\mathcal{I}_{n}^{r}(t):=\{t+1,\ldots,t+n\}$. Denote the total number of central points over all window sizes $n\in\mathcal{N}$ as $T:=\sum_{n\in\mathcal{N}}|\mathbb{T}_{n}|$.
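The index sets above can be sketched in a few lines of Python. This is only an illustration of the definitions as written, not code from Avanesov and Buzun (2018); the function names are hypothetical, and the values $n=30$, $N=100$ in the usage line are illustrative.

```python
def central_points(n, N):
    """Central points T_n = {n+1, ..., N-n+1}, using the 1-based indexing of the text."""
    return list(range(n + 1, N - n + 2))


def left_window(t, n):
    """Left window I_n^l(t) = {t-n, ..., t-1}."""
    return list(range(t - n, t))


def right_window(t, n):
    """Right window I_n^r(t) = {t+1, ..., t+n}, exactly as written in the text."""
    return list(range(t + 1, t + n + 1))


def total_central_points(window_sizes, N):
    """T = sum over window sizes n of |T_n|."""
    return sum(len(central_points(n, N)) for n in window_sizes)


# Illustrative values only: a single window size n = 30 and sample size N = 100.
points = central_points(30, 100)
print(points[0], points[-1], total_central_points([30], 100))
```

Each $\mathbb{T}_{n}$ contains $N-2n+1$ points, so $T$ grows with the number of window sizes considered.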

For each window size $n$, each central point $t$ and either side $\mathfrak{G}\in\{l,r\}$ (left or right), they define a de-sparsified estimator of the precision matrix as

[TABLE]

where

[TABLE]

and $\hat{\Theta}_{n}^{\mathfrak{G}}$ is the precision matrix estimated by Graphical Lasso. Define a $p\times p$ matrix with elements

[TABLE]

where $\Theta^{*}:=\mathbb{E}[X_{i}X_{i}^{T}]^{-1}$ for all data before the change point location, $\Theta_{u}^{*}$ is its $u$-th row, $\sigma_{u,v}:=Var(Z_{1,uv})$ denotes the variance, and ${S}=diag(\sigma_{1,1},\sigma_{1,2},\ldots,\sigma_{p,p-1},\sigma_{p,p})$ is a diagonal matrix. Finally, their test statistic is

[TABLE]

where $\bar{M}$ denotes the vector composed of the stacked columns of the matrix $M$. Their test rejects the null hypothesis when the above statistic is greater than some critical value, which is determined via a bootstrap procedure. More details can be found in Avanesov and Buzun (2018).

Here we let the $X_{t}$'s be $p$-dimensional multivariate normal random vectors with mean ${0}$ and covariance matrix $\boldsymbol{\Sigma}_{t}$. We fix $p=10$ and the sample size as $n=100$ or $200$. Under the null, we set the common covariance matrix as (1) $0.8I_{p}$ or (2) $AR(0.4)$. Under the alternative, we let $\boldsymbol{\Sigma}_{1}=\cdots=\boldsymbol{\Sigma}_{n/2}\neq\boldsymbol{\Sigma}_{n/2+1}=\cdots=\boldsymbol{\Sigma}_{n}$, where $\boldsymbol{\Sigma}_{n/2}$ is (1) $0.8I_{p}$ or (2) $AR(0.8)$ and $\boldsymbol{\Sigma}_{n/2+1}$ is (1) $0.4I_{p}$ or (2) $AR(0.4)$.
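A minimal sketch of this data-generating design is given below. It assumes that $AR(r)$ denotes the $p\times p$ matrix with entries $r^{|i-j|}$, which is not spelled out in this section; the function names `ar_cov` and `simulate_cov_change` and the seed are hypothetical.

```python
import numpy as np


def ar_cov(p, r):
    """AR-type covariance matrix with entries r^{|i-j|} (assumed meaning of AR(r))."""
    idx = np.arange(p)
    return r ** np.abs(idx[:, None] - idx[None, :])


def simulate_cov_change(n, sigma_before, sigma_after, rng):
    """Draw n mean-zero Gaussian vectors whose covariance switches after n/2 observations."""
    p = sigma_before.shape[0]
    half = n // 2
    first = rng.multivariate_normal(np.zeros(p), sigma_before, size=half)
    second = rng.multivariate_normal(np.zeros(p), sigma_after, size=n - half)
    return np.vstack([first, second])


rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
p, n = 10, 100

# Null setting (2): common AR(0.4) covariance over the whole sample.
x_null = simulate_cov_change(n, ar_cov(p, 0.4), ar_cov(p, 0.4), rng)
# Alternative (1): 0.8 I_p before the change, 0.4 I_p after.
x_alt1 = simulate_cov_change(n, 0.8 * np.eye(p), 0.4 * np.eye(p), rng)
# Alternative (2): AR(0.8) before the change, AR(0.4) after.
x_alt2 = simulate_cov_change(n, ar_cov(p, 0.8), ar_cov(p, 0.4), rng)
print(x_null.shape, x_alt1.shape, x_alt2.shape)
```

Passing the same matrix twice reproduces the null setting, so a single generator covers both hypotheses.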

The results are summarized in Table 9, where our method is denoted as “SN” and the other method as “AB”. Our method has no tuning parameter, whereas a few tuning parameters need to be specified for “AB”. In particular, the window size was chosen as 30, and the stable set used to estimate the precision matrix was chosen as $\{1,2,\ldots,40\}$. As Table 9 shows, the “AB” test suffers from a huge size distortion, which could be due to the way the tuning parameters are selected. By contrast, our SN method has fairly accurate size and is powerful under the alternative. The “AB” test has a perfect rejection rate under the alternative, but this should not be taken too seriously given its huge over-rejection under the null. Overall, the SN method seems quite favorable given its accurate size, reasonable power, and tuning-free implementation. It is worth mentioning that Avanesov and Buzun (2018) provide no guidance or data-driven formula for the choice of tuning parameters. We tried several choices, but all of them delivered large size distortion, which indicates that the choice of tuning parameters is indeed a difficult issue for their test.

S11 Simulation results for change-point estimation

In this section, we present the WBS-based estimation results and compare them with a few alternative methods via simulations, for independent data in Section S11.1 and for time series in Section S11.2.

S11.1 Change-point estimation: independent data

As described in Section 5, we can combine the WBS idea with the self-normalized statistics to estimate the number and locations of change points in the mean of high-dimensional independent data. In this subsection, we compare our WBS method (denoted as WBS-SN, with $L_{0}=10$) with binary segmentation (BS-SN) and INSPECT, the latter of which was developed by Wang and Samworth (2018) and targets sparse and strong changes.

Following Wang and Samworth (2018), we consider a model with three change points located at $[n/4]$, $2[n/4]$, and $3[n/4]$. The mean vectors for the four zones are $\mu_{1},\ldots,\mu_{4}$, and we draw $\lfloor n/4\rfloor$ i.i.d. samples from $N(\mu_{i},\sigma^{2}I_{p})$ for each zone. We define $\theta_{1},\theta_{2},\theta_{3}$ as the three signals at the change points, i.e. $\theta_{i}=\mu_{i+1}-\mu_{i}$, and $\nu_{i}=\|\theta_{i}\|_{2}$ as the signal strength for $i=1,2,3$. Denote $s=\|\theta_{i}\|_{0}$ for all $i$ as the sparsity level. Specifically, we let $n=120$, $p=50$ and set $\sigma=1$. The total number of random segments used in WBS-SN is fixed as $M=1000$. As described before, we choose the threshold for WBS based on the reference sample. For INSPECT, we use all default parameters in the “InspectChangepoint” package in R. We consider two cases for the alternative: a sparse one where $s=5$ and a dense one where $s=p=50$. We denote the true number of change points as $N=3$ and the estimated number as $\hat{N}$. The true locations of the change points are 30, 60 and 90.

For the sparse case, we set $\theta_{1}=2(k,k,k,k,k,0,\ldots,0)^{T}$, $\theta_{2}=-2(k,k,k,k,k,0,\ldots,0)^{T}$, and $\theta_{3}=2(k,k,k,k,k,0,\ldots,0)^{T}$, where $k\in\{\sqrt{2.5/5},\sqrt{4/5}\}$. For the dense alternative, we set $\theta_{1}=2k\times\boldsymbol{1}_{p}$, $\theta_{2}=-2k\times\boldsymbol{1}_{p}$ and $\theta_{3}=2k\times\boldsymbol{1}_{p}$, and let $k\in\{\sqrt{2.5/p},\sqrt{4/p}\}$. To measure the estimation accuracy for the number of change points, we simply use the mean squared error between the estimated number and the truth. For the location estimate, since change point location estimation can be viewed as a special case of classification, we use the Adjusted Rand Index (ARI) to quantify the accuracy; see Rand (1971), Hubert and Arabie (1985) and Wang and Samworth (2018). The ARI is a value between 0 and 1. When the estimation is perfect, the ARI is 1. If no change point is estimated, the corresponding ARI is 0. The higher the ARI, the more accurate the estimation. Here we compute the ARI for each replicate and then take the average over replicates to get the averaged ARI.
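The design and the location metric above can be sketched as follows. The sketch assumes $\mu_{1}=0$ (only the differences $\theta_{i}$ are specified) and assumes that the ARI is computed with the standard adjusted Rand index formula of Hubert and Arabie (1985) applied to segment labels; all function names and the seed are hypothetical.

```python
from math import comb

import numpy as np


def simulate_three_changes(n, p, k, s, rng, sigma=1.0):
    """Four zones of length floor(n/4); jumps theta, -theta, theta with theta = 2k on s coordinates."""
    seg = n // 4
    theta = np.zeros(p)
    theta[:s] = 2.0 * k
    jumps = [theta, -theta, theta]
    mu = np.zeros(p)  # assumption: the first mean vector is zero
    blocks = []
    for zone in range(4):
        blocks.append(mu + sigma * rng.standard_normal((seg, p)))
        if zone < 3:
            mu = mu + jumps[zone]
    return np.vstack(blocks)


def segment_labels(n, change_points):
    """Label observations 1..n (stored at positions 0..n-1) by the segment they fall in."""
    cps = np.asarray(sorted(change_points))
    return np.searchsorted(cps, np.arange(n), side="right")


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings, from their contingency table."""
    _, a_inv = np.unique(np.asarray(labels_a), return_inverse=True)
    _, b_inv = np.unique(np.asarray(labels_b), return_inverse=True)
    a_inv, b_inv = a_inv.ravel(), b_inv.ravel()
    table = np.zeros((a_inv.max() + 1, b_inv.max() + 1), dtype=int)
    np.add.at(table, (a_inv, b_inv), 1)
    sum_cells = sum(comb(int(x), 2) for x in table.ravel())
    sum_rows = sum(comb(int(x), 2) for x in table.sum(axis=1))
    sum_cols = sum(comb(int(x), 2) for x in table.sum(axis=0))
    expected = sum_rows * sum_cols / comb(a_inv.size, 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:  # both labelings are trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


rng = np.random.default_rng(0)  # arbitrary seed
n, p = 120, 50
x_dense = simulate_three_changes(n, p, k=np.sqrt(2.5 / p), s=p, rng=rng)
x_sparse = simulate_three_changes(n, p, k=np.sqrt(2.5 / 5), s=5, rng=rng)

truth = segment_labels(n, [30, 60, 90])
print(adjusted_rand_index(truth, segment_labels(n, [30, 60, 90])))
print(adjusted_rand_index(truth, segment_labels(n, [])))
print(adjusted_rand_index(truth, segment_labels(n, [28, 63, 90])))
```

In both the sparse and the dense configuration the signal strength is $\|\theta_{i}\|_{2}=2k\sqrt{s}$, so the two cases with the same numerator ($2.5$ or $4$) share the same $\nu_{i}$ and differ only in sparsity.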

As suggested by a referee, we further implemented another method based on consistent estimation of $\|\Sigma\|_{F}$, obtained by dividing the sample into three equal parts and taking the median of the Jackknife-based estimators from the three parts, mimicking an idea first proposed in Liu et al. (2021). We then coupled the studentized test statistic with this consistent estimator and WBS, but did not find substantial gain in the (unreported) simulation studies. Note that an extension of this idea to the time series setting seems nontrivial, as it would involve bandwidths when forming a consistent estimator of $\|\Gamma\|_{F}$, and theoretical justification in the testing context is expected to be challenging.

Please insert Table 10 here!

As seen from Table 10, binary segmentation does not work at all in any of the cases due to the non-monotonic change in the mean, whereas both WBS-SN and INSPECT provide more sensible estimates. For estimating the number of change points, WBS-SN outperforms INSPECT in the two dense cases and the Sparse $(\sqrt{4/5})$ case, whereas INSPECT is superior in the Sparse $(\sqrt{2.5/5})$ case. For change point location estimation, WBS-SN is inferior to INSPECT in the Sparse $(\sqrt{2.5/5})$ case, which is probably not surprising; in the other three cases, their performance is comparable. These findings are in general consistent with our intuition that WBS-SN targets dense alternatives and INSPECT targets sparse alternatives. They suggest that WBS-SN can be a useful complement to INSPECT, as in practice we may not know a priori whether the change is sparse or dense.

S11.2 Change-point estimation: time series

For multiple change point estimation in the mean of high-dimensional time series, we compare our WBS-SN [see Algorithm 2] with the double CUSUM binary segmentation algorithm (denoted as DCBS) [Cho (2016)] and the segmentation algorithm based on a bias-corrected statistic in Li et al. (2019) (denoted as Li). The latter two methods have been implemented in the R packages “hdbinseg” and “HdcpDetect”, respectively.

Example S11.1.

Consider the model $Y_{t}=\mu_{t}+X_{t}$, where $X_{t}=(X_{t,1},X_{t,2},\ldots,X_{t,p})^{T}$ is generated from one of the following three models.

(i)

Gaussian errors with AR(1)-type covariance structure: set $\Sigma_{\epsilon}=(0.5^{|i-j|})_{i,j=1}^{p}$. For $t=1,2,\ldots,n$, let $\epsilon_{t}\overset{i.i.d}{\sim}N(0,\Sigma_{\epsilon})$ and $X_{t}=\rho X_{t-1}+\epsilon_{t}.$

(ii)

Non-Gaussian errors: for $t=1,2,\ldots,n$ and $j=1,2,\ldots,p$, let $\epsilon_{t,j}\overset{i.i.d}{\sim}Uniform(-2,2)$ and $X_{t}=\rho X_{t-1}+\epsilon_{t}.$

(iii)

Motivated by the simulation models used in Cho (2016), we let $\varrho_{k}=0.6(k+1)^{-1}$ and define $\epsilon_{t,j}=\sum_{k=0}^{99}\varrho_{k}v_{t,j-k},\text{ where }v_{t,j}\overset{i.i.d}{\sim}N(0,1)$, and $X_{t,j}=\rho X_{t-1,j}+\epsilon_{t,j}+0.2\epsilon_{t-1,j}$ for $t=1,2,\ldots,n$, $j=1,2,\ldots,p$.
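Models (i) and (ii) can be simulated along the following lines; model (iii) is omitted from this sketch. The initialization is not specified in the text, so the sketch assumes $X_{0}=0$ followed by a burn-in period that is discarded; the function name `ar1_errors`, the `burn_in` argument and the seed are hypothetical.

```python
import numpy as np


def ar1_errors(n, p, rho, model, rng, burn_in=100):
    """Simulate X_t = rho * X_{t-1} + eps_t for t = 1..n under error model 'i' or 'ii'."""
    total = n + burn_in
    if model == "i":
        # Gaussian innovations with covariance (0.5^{|i-j|}).
        idx = np.arange(p)
        sigma_eps = 0.5 ** np.abs(idx[:, None] - idx[None, :])
        eps = rng.multivariate_normal(np.zeros(p), sigma_eps, size=total)
    elif model == "ii":
        # Independent Uniform(-2, 2) innovations in every coordinate.
        eps = rng.uniform(-2.0, 2.0, size=(total, p))
    else:
        raise ValueError("model must be 'i' or 'ii'")
    x = np.zeros(p)  # assumption: start from X_0 = 0
    out = np.empty((total, p))
    for t in range(total):
        x = rho * x + eps[t]
        out[t] = x
    return out[burn_in:]  # drop the burn-in so the start value has little influence


rng = np.random.default_rng(0)  # arbitrary seed
x_gauss = ar1_errors(n=500, p=250, rho=0.3, model="i", rng=rng)
x_unif = ar1_errors(n=500, p=250, rho=0.6, model="ii", rng=rng)
print(x_gauss.shape, x_unif.shape)
```

The values $\rho=0.3$ and $\rho=0.6$ are the two dependence levels reported for Table 11.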

Further, we let

[TABLE]

where for $r=1,2,3$, $\boldsymbol{\delta}_{r}=(\delta_{r,1},\delta_{r,2},\ldots,\delta_{r,p})^{T}\in\mathbb{R}^{p}$. Denote $\Pi_{r}=\{j:|\delta_{r,j}|>0,\ j=1,2,\ldots,p\}$ and let $|\delta_{r,j}|\overset{i.i.d}{\sim}Uniform(0.75\theta_{r},1.25\theta_{r})$ for $j\in\Pi_{r}$. The signs of $\{\delta_{r,j}\}$ are randomly sampled with equal probability. Here, we set $(k_{1},|\Pi_{1}|,\theta_{1})=(\lfloor 0.3n\rfloor,\lfloor 0.75p\rfloor,0.4)$, $(k_{2},|\Pi_{2}|,\theta_{2})=(\lfloor 0.6n\rfloor,\lfloor 0.25p\rfloor,0.696)$, and $(k_{3},|\Pi_{3}|,\theta_{3})=(\lfloor 0.8n\rfloor,\lfloor 0.1p\rfloor,1.12)$.
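The shift vectors $\delta_{r}$ can be drawn as in the sketch below. The text does not say how the support $\Pi_{r}$ is chosen, so the sketch assumes it is sampled uniformly at random without replacement; the function name `draw_shift` and the seed are hypothetical, and the floors are written with integer arithmetic for $n=500$, $p=250$.

```python
import numpy as np


def draw_shift(p, support_size, theta, rng):
    """Draw a shift vector with `support_size` nonzero entries of random sign and
    magnitude Uniform(0.75 * theta, 1.25 * theta)."""
    delta = np.zeros(p)
    # Assumption: the support is a uniformly random subset of the coordinates.
    support = rng.choice(p, size=support_size, replace=False)
    magnitudes = rng.uniform(0.75 * theta, 1.25 * theta, size=support_size)
    signs = rng.choice([-1.0, 1.0], size=support_size)
    delta[support] = signs * magnitudes
    return delta


rng = np.random.default_rng(0)  # arbitrary seed
n, p = 500, 250

# (k_r, |Pi_r|, theta_r) as listed in the text, with floors via integer division.
settings = [
    ((3 * n) // 10, (3 * p) // 4, 0.4),
    ((3 * n) // 5, p // 4, 0.696),
    ((4 * n) // 5, p // 10, 1.12),
]
shifts = [draw_shift(p, size, theta, rng) for _, size, theta in settings]
for (k, size, theta), delta in zip(settings, shifts):
    print(k, size, theta, np.count_nonzero(delta))
```

The three settings trade density for magnitude: the first change affects many coordinates with small shifts, the last affects few coordinates with larger shifts.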

Please insert Table 11 here!

Table 11 reports the estimation results, namely the frequency of the estimated number of change points among 200 simulations and the ARI, for WBS-SN (based on two trimming levels $\eta=0.01$ and $0.02$ and three choices of $L_{0}=6\lfloor n\eta\rfloor+7+\lfloor\theta n\rfloor$ with $\theta=0.1,0.15,0.2$), DCBS and Li. We fix the sample size $n=500$ and let $p=250$ and $500$. When the temporal dependence is weak (i.e., $\rho=0.3$), DCBS performs the best in terms of estimation accuracy for the number and location of change points in all settings, while WBS-SN with either trimming level and any choice of $L_{0}$ is comparable to Li in terms of ARI but outperforms Li in estimating the number of change points. Li’s method tends to overestimate the number of change points in all settings. When the temporal dependence is moderately strong (i.e., $\rho=0.6$), WBS-SN outperforms both DCBS and Li according to both criteria. In this case, DCBS tends to underestimate the number of change points, resulting in a small ARI. Overall, our WBS-SN method seems fairly competitive and performs quite stably across the two levels of temporal dependence. The trimming level $\eta$ could have an impact on the estimation accuracy, and the magnitude of this impact could depend on the data generating process (in particular, the strength of temporal dependence), the dimensionality, the sample size, etc. The choice of $L_{0}$ seems to have little impact on the performance in most cases, suggesting that an optimal choice may not be necessary as long as $L_{0}$ lies in a certain range.
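For concreteness, the six values of $L_{0}$ implied by the formula above at $n=500$ can be tabulated with a short script. The function name `l0_value` is hypothetical; exact rational arithmetic is used so that the floors are not affected by floating-point rounding.

```python
from fractions import Fraction
from math import floor


def l0_value(n, eta, theta):
    """L_0 = 6 * floor(n * eta) + 7 + floor(theta * n); eta and theta are given as decimal strings."""
    eta, theta = Fraction(eta), Fraction(theta)
    return 6 * floor(n * eta) + 7 + floor(theta * n)


n = 500
for eta in ("0.01", "0.02"):
    for theta in ("0.1", "0.15", "0.2"):
        print(f"eta={eta}, theta={theta}: L0={l0_value(n, eta, theta)}")
```

With $n=500$ this gives $L_{0}=87,112,137$ for $\eta=0.01$ and $L_{0}=117,142,167$ for $\eta=0.02$.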

Bibliography

Aue, A., Hörmann, S., Horváth, L., and Reimherr, M. (2009). Break detection in the covariance structure of multivariate time series models. The Annals of Statistics, 37(6B):4046–4087.
Aue, A. and Horváth, L. (2013). Structural breaks in time series. Journal of Time Series Analysis, 34(1):1–16.
Avanesov, V. and Buzun, N. (2018). Change-point detection in high-dimensional covariance structure. Electronic Journal of Statistics, 12(2):3254–3294.
Billingsley, P. (2008). Probability and Measure. John Wiley & Sons.
Chan, J., Horváth, L., and Hušková, M. (2013). Darling–Erdős limit results for change-point detection in panel data. Journal of Statistical Planning and Inference, 143(5):955–970.
Chen, S. X. and Qin, Y.-L. (2010). A two-sample test for high-dimensional data with applications to gene-set testing. The Annals of Statistics, 38(2):808–835.
Cho, H. (2016). Change-point detection in panel data via double CUSUM statistic. Electronic Journal of Statistics, 10(2):2000–2038.
Dette, H. and Gösmann, J. (2018). Relevant change points in high dimensional time series. Electronic Journal of Statistics, 12(2):2578–2636.
