Change-point Detection by the Quantile LASSO Method

Gabriela Ciuperca; Matúš Maciak

arXiv:1901.04691 · math.ST · January 16, 2019

TL;DR

This paper introduces a quantile LASSO method for change-point detection in piece-wise constant models that is robust to heavy-tailed errors and can target any conditional quantile of the data, not just the conditional mean.

Contribution

The paper presents a novel quantile LASSO approach for change-point detection that does not require the Normal or sub-Gaussian error assumptions of standard LASSO approaches and provides consistent change-point estimates.

Findings

1. The method effectively handles heavy-tailed error distributions.
2. It provides consistent change-point estimates when the number of change-points is estimated correctly.
3. Numerical simulations demonstrate its robustness and empirical performance.

Tables

Table 1: Simulation results for the quantile LASSO performance for the model in (12) for various quantile levels and sample sizes, based on 1000 Monte Carlo repetitions. Two models are always considered: the first one uses the prior knowledge that there are two change-points in the true model (the corresponding regularization parameter is denoted as $\lambda_{(2)}$), and the second model is based on the asymptotically appropriate value $\lambda_{AS}=C(n^{-1}\log n)^{1/2}$. The estimation bias and the Mean Squared Error (MSE) are provided with the corresponding standard errors. For the model with the regularization parameter $\lambda_{AS}$ we also provide information about the estimated number of change-points ("[min | med | max]" stands for the minimum, median, and maximum number of change-points estimated over 1000 Monte Carlo simulations). The model with $\lambda_{(2)}$ always contains two change-points and thus three segments.

$n$	$\tau$	$\lambda_{AS}$	$\lambda_{(2)}$		Model with $\lambda_{(2)}$				Model with $\lambda_{AS}$				$|\widehat{\mathcal{A}}_{n}|$ (with $\lambda_{AS}$)
		Value	Mean	(Std.Err.)	Est. Bias	(Std.Err.)	MSE	(Std.Err.)	Est. Bias	(Std.Err.)	MSE	(Std.Err.)	[min | med | max]
20	0.05	3.87	0.30	(0.09)	-0.32	(0.41)	0.66	(0.35)	0.49	(0.46)	1.06	(0.55)	[0 | 0 | 0]
	0.10	3.87	0.56	(0.18)	-0.02	(0.40)	0.55	(0.28)	0.39	(0.39)	0.92	(0.38)	[0 | 0 | 0]
	0.25	3.87	1.08	(0.29)	0.03	(0.33)	0.45	(0.24)	0.20	(0.33)	0.76	(0.21)	[0 | 0 | 0]
	0.50	3.87	1.47	(0.46)	-0.01	(0.31)	0.52	(0.22)	-0.04	(0.30)	0.70	(0.13)	[0 | 0 | 0]
	0.75	3.87	1.04	(0.18)	-0.03	(0.32)	0.56	(0.27)	-0.21	(0.32)	0.76	(0.23)	[0 | 0 | 0]
	0.90	3.87	0.52	(0.17)	0.07	(0.41)	0.80	(0.35)	-0.33	(0.40)	0.88	(0.40)	[0 | 0 | 0]
	0.95	3.87	0.28	(0.09)	0.34	(0.44)	1.04	(0.45)	-0.40	(0.49)	1.01	(0.60)	[0 | 0 | 0]
100	0.05	2.15	1.67	(0.57)	0.19	(0.25)	0.45	(0.22)	0.31	(0.26)	0.61	(0.27)	[0 | 1 | 6]
	0.10	2.15	2.63	(0.85)	0.17	(0.20)	0.40	(0.17)	0.09	(0.19)	0.31	(0.14)	[1 | 3 | 10]
	0.25	2.15	4.73	(1.31)	0.10	(0.15)	0.35	(0.13)	0.00	(0.14)	0.14	(0.07)	[1 | 6 | 15]
	0.50	2.15	6.22	(1.51)	-0.01	(0.14)	0.41	(0.15)	-0.01	(0.13)	0.12	(0.06)	[2 | 9 | 21]
	0.75	2.15	4.40	(0.91)	-0.12	(0.16)	0.44	(0.18)	0.00	(0.14)	0.16	(0.09)	[1 | 6 | 15]
	0.90	2.15	2.29	(0.45)	-0.19	(0.19)	0.56	(0.19)	-0.17	(0.19)	0.53	(0.18)	[0 | 3 | 9]
	0.95	2.15	1.36	(0.27)	-0.20	(0.22)	0.64	(0.20)	-0.37	(0.22)	0.80	(0.18)	[0 | 0 | 3]
500	0.05	1.11	7.73	(2.51)	0.24	(0.13)	0.41	(0.15)	-0.06	(0.09)	0.10	(0.03)	[7 | 15 | 25]
	0.10	1.11	12.82	(4.44)	0.19	(0.10)	0.36	(0.13)	-0.06	(0.07)	0.09	(0.03)	[15 | 24 | 41]
	0.25	1.11	22.60	(6.33)	0.10	(0.07)	0.33	(0.11)	-0.04	(0.06)	0.10	(0.02)	[37 | 53 | 69]
	0.50	1.11	30.23	(6.61)	-0.01	(0.06)	0.38	(0.13)	0.00	(0.05)	0.09	(0.02)	[55 | 79 | 103]
	0.75	1.11	20.26	(3.35)	-0.13	(0.08)	0.41	(0.15)	0.04	(0.06)	0.10	(0.02)	[38 | 53 | 71]
	0.90	1.11	9.95	(1.07)	-0.23	(0.10)	0.52	(0.17)	0.06	(0.08)	0.09	(0.03)	[14 | 24 | 40]
	0.95	1.11	5.52	(0.79)	-0.27	(0.13)	0.58	(0.18)	0.06	(0.09)	0.10	(0.04)	[6 | 15 | 27]
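The $\lambda_{AS}$ values reported in Table 1 (3.87, 2.15 and 1.11 for $n=20,100,500$) are reproduced by $\lambda_{AS}=C(n^{-1}\log n)^{1/2}$ with the natural logarithm and $C=10$. The constant $C$ is not stated in this section, so $C=10$ is an assumption inferred from the tabulated numbers, not the authors' stated choice. A quick Python check:

```python
import math


def lambda_as(n: int, c: float = 10.0) -> float:
    """lambda_AS = C * (n^{-1} log n)^{1/2}, using the natural logarithm.

    C = 10 is an assumption inferred from the tabulated values,
    not a constant stated in the text.
    """
    return c * math.sqrt(math.log(n) / n)


for n in (20, 100, 500):
    print(n, round(lambda_as(n), 2))  # 20 3.87 / 100 2.15 / 500 1.11
```

Because $\lambda_{AS}$ decreases like $(\log n/n)^{1/2}$, the penalty weakens as $n$ grows, which is consistent with the growing number of change-points reported for $\lambda_{AS}$ at $n=500$.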

Table 2: Comparison of the quantile LASSO performance (QLasso) with the standard LASSO approach (SLasso) and the SMUCE method. The results are given for the model in (12) for three different (symmetric) error distributions with various signal-to-noise ratios ($N\equiv N(0,1)$, $t_{3}\equiv$ Student's distribution with three degrees of freedom, and, finally, $C\equiv Cauchy(0,1)$) and various sample sizes. Three models are considered: the model with the true number of change-points and the corresponding regularization parameter $\lambda_{(2)}$, the model with the asymptotically appropriate value $\lambda_{AS}=C\cdot(n^{-1}\log n)^{1/2}$, and the model with $\lambda_{MS}$ given by minimizing the mean squared error $\sum_{i=1}^{n}(\widehat{u}_{i}^{*}-u_{i}^{*})^{2}$. The reported values are given with the corresponding standard errors over 1000 Monte Carlo simulations.

Distr.	Method	$n$	Model with $\lambda_{(2)}$				Model with $\lambda_{AS}$				Model with $\lambda_{MS}$ / SMUCE
			Est. Bias	(Std.Err.)	MSE	(Std.Err.)	Est. Bias	(Std.Err.)	MSE	(Std.Err.)	Est. Bias	(Std.Err.)	MSE	(Std.Err.)
N	SLasso	20	0.00	(0.23)	0.43	(0.16)	0.00	(0.23)	0.65	(0.08)	0.00	(0.23)	0.27	(0.15)
		100	0.00	(0.10)	0.35	(0.12)	0.00	(0.10)	0.09	(0.04)	0.00	(0.10)	0.13	(0.08)
		500	0.00	(0.04)	0.33	(0.11)	0.00	(0.04)	0.06	(0.02)	0.00	(0.04)	0.02	(0.01)
	QLasso	20	-0.01	(0.31)	0.53	(0.21)	-0.04	(0.29)	0.70	(0.13)	0.00	(0.24)	0.31	(0.17)
		100	-0.01	(0.14)	0.43	(0.14)	0.00	(0.13)	0.12	(0.06)	0.00	(0.13)	0.20	(0.11)
		500	-0.01	(0.06)	0.40	(0.12)	0.00	(0.05)	0.09	(0.02)	0.00	(0.05)	0.03	(0.01)
	SMUCE	20									-0.01	(0.23)	0.55	(0.21)
		100									0.01	(0.10)	0.12	(0.10)
		500									0.00	(0.04)	0.02	(0.01)
$t_{3}$	SLasso	20	0.00	(0.38)	0.61	(0.46)	0.00	(0.38)	0.73	(0.36)	0.00	(0.38)	0.47	(0.31)
		100	0.00	(0.17)	0.41	(0.14)	0.00	(0.17)	0.33	(0.81)	0.00	(0.17)	0.20	(0.11)
		500	0.00	(0.08)	0.36	(0.11)	0.00	(0.08)	0.69	(2.32)	0.00	(0.08)	0.05	(0.03)
	QLasso	20	-0.01	(0.35)	0.59	(0.28)	-0.04	(0.34)	0.73	(0.16)	0.01	(0.28)	0.39	(0.21)
		100	-0.01	(0.15)	0.44	(0.14)	0.00	(0.14)	0.15	(0.08)	0.00	(0.14)	0.24	(0.12)
		500	-0.02	(0.07)	0.42	(0.13)	0.00	(0.06)	0.11	(0.03)	0.00	(0.06)	0.04	(0.02)
	SMUCE	20									-0.01	(0.39)	1.33	(2.06)
		100									0.01	(0.19)	1.08	(1.60)
		500									0.00	(0.09)	0.95	(2.52)
C	SLasso	20	-1.57	(74.19)	5867	(154600)	-1.57	(74.19)	109736	(3089261)	-1.57	(74.19)	5785	(154589)
		100	-1.15	(22.21)	524	(7333)	-1.15	(22.21)	50589	(765657)	-1.15	(22.21)	512	(7322)
		500	-1.65	(36.73)	1354	(24483)	-1.65	(36.72)	671194	(12245127)	-1.65	(36.72)	1353	(24477)
	QLasso	20	-0.02	(0.46)	0.75	(0.48)	-0.03	(0.44)	0.81	(0.30)	0.01	(0.36)	0.53	(0.30)
		100	-0.02	(0.20)	0.49	(0.16)	-0.02	(0.18)	0.20	(0.12)	-0.01	(0.18)	0.28	(0.15)
		500	-0.02	(0.09)	0.44	(0.14)	0.00	(0.08)	0.18	(0.06)	0.00	(0.07)	0.05	(0.03)
	SMUCE	20									-1.58	(74.19)	109953	(3091453)
		100									-1.16	(22.21)	50683	(766016)
		500									-1.65	(36.72)	671259	(12245434)
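Under Cauchy(0,1) errors, the SLasso and SMUCE rows of Table 2 show very large and unstable MSE values, while QLasso stays stable. One heuristic reason, offered here only as an illustration and not as the authors' explanation, is that the Cauchy distribution has no finite mean, whereas its quantiles (in particular its median) are well defined. A minimal NumPy illustration; the seed and the sample size are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2019)   # arbitrary seed
eps = rng.standard_cauchy(10_000)   # i.i.d. Cauchy(0, 1) errors, as in the "C" rows

# The sample mean of Cauchy(0, 1) draws is itself Cauchy(0, 1) distributed,
# so averaging more observations does not make it concentrate.
print("sample mean:  ", eps.mean())
# The sample median (the tau = 0.5 quantile) concentrates around the true median 0.
print("sample median:", np.median(eps))
```

Re-running with other seeds makes the point: the printed mean can change wildly from run to run, while the median stays close to zero.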

Table 3: Comparison of the quantile LASSO performance (QLasso) with the standard LASSO approach (SLasso) and the SMUCE method. Three different (symmetric) error distributions are considered ($N\equiv N(0,1)$, $t_{3}\equiv$ Student's distribution with three degrees of freedom, and, finally, $C\equiv Cauchy(0,1)$). The number of estimated change-points ("[min | med | max]" stands for the minimum, median, and maximum number of change-points estimated over 1000 Monte Carlo simulations) and the change-point detection rate, together with the corresponding standard errors, are provided. Models with three different values of $\lambda_{n}>0$ are considered: the model with $\lambda_{(2)}$, the model with $\lambda_{AS}$, and the model with $\lambda_{MS}$. The change-point detection rate is calculated only for models where at least two change-points were discovered; otherwise NA values are reported.

Distr.	Method	$n$	$\lambda_{AS}$	$\lambda_{(2)}$	$\lambda_{MS}$	Number of jumps $|\widehat{\mathcal{A}}_{n}|$		Change-point detection error
			Value	Avg.	Avg.	$\lambda_{AS}$	$\lambda_{MS}$ / SMUCE	(with $\lambda_{(2)}$)	(Std.Err.)	(with $\lambda_{AS}$)	(Std.Err.)	(with $\lambda_{MS}$ / SMUCE)	(Std.Err.)
N	SLasso	20	3.87	1.68	0.98	[0 | 0 | 3]	[0 | 0 | 11]	0.08	(0.06)	0.17	(0.03)	0.04	(0.04)
		100	2.15	7.77	3.29	[2 | 2 | 14]	[2 | 2 | 21]	0.02	(0.02)	0.01	(0.01)	0.01	(0.01)
		500	1.11	38.09	4.54	[32 | 32 | 66]	[3 | 3 | 23]	0.00	(0.00)	0.00	(0.00)	0.00	(0.00)
	QLasso	20	3.87	1.58	1.17	[0 | 0 | 0]	[0 | 0 | 13]	0.10	(0.07)	NaN	(NA)	0.04	(0.04)
		100	2.15	6.54	4.00	[2 | 2 | 21]	[1 | 1 | 28]	0.03	(0.04)	0.01	(0.01)	0.01	(0.03)
		500	1.11	31.31	4.68	[35 | 35 | 70]	[4 | 4 | 65]	0.00	(0.00)	0.00	(0.00)	0.00	(0.00)
	SMUCE	20					[0 | 0 | 4]					0.07	(0.04)
		100					[0 | 1 | 4]					0.02	(0.03)
		500					[2 | 2 | 3]					0.00	(0.00)
$t_{3}$	SLasso	20	3.87	2.14	2.25	[0 | 0 | 5]	[0 | 0 | 9]	0.10	(0.06)	0.12	(0.06)	0.06	(0.05)
		100	2.15	8.86	4.02	[3 | 3 | 23]	[1 | 1 | 14]	0.05	(0.05)	0.02	(0.02)	0.02	(0.03)
		500	1.11	41.10	7.61	[67 | 67 | 124]	[3 | 3 | 23]	0.01	(0.02)	0.00	(0.00)	0.00	(0.00)
	QLasso	20	3.87	1.52	1.55	[0 | 0 | 0]	[0 | 0 | 13]	0.11	(0.06)	NaN	(NA)	0.05	(0.05)
		100	2.15	6.03	3.91	[2 | 2 | 20]	[0 | 0 | 26]	0.04	(0.05)	0.01	(0.02)	0.02	(0.04)
		500	1.11	28.98	4.59	[54 | 54 | 104]	[4 | 4 | 36]	0.01	(0.01)	0.00	(0.00)	0.00	(0.00)
	SMUCE	20					[0 | 0 | 6]					0.09	(0.05)
		100					[0 | 2 | 10]					0.05	(0.05)
		500					[2 | 2 | 23]					0.02	(0.02)
C	SLasso	20	3.87	112.40	83.71	[0 | 0 | 17]	[0 | 1 | 19]	0.12	(0.06)	0.08	(0.06)	0.10	(0.06)
		100	2.15	386.53	237.47	[7 | 7 | 96]	[0 | 0 | 96]	0.12	(0.06)	0.01	(0.01)	0.07	(0.06)
		500	1.11	1414.77	1353.09	[164 | 164 | 493]	[0 | 0 | 499]	0.10	(0.06)	0.00	(0.00)	0.08	(0.06)
	QLasso	20	3.87	1.45	2.66	[0 | 0 | 1]	[0 | 0 | 12]	0.12	(0.06)	0.07	(NA)	0.06	(0.05)
		100	2.15	5.27	3.65	[2 | 2 | 22]	[0 | 0 | 39]	0.05	(0.05)	0.02	(0.02)	0.03	(0.04)
		500	1.11	24.58	4.86	[54 | 54 | 192]	[2 | 2 | 42]	0.01	(0.01)	0.00	(0.00)	0.00	(0.00)
	SMUCE	20					[0 | 0 | 7]					0.10	(0.05)
		100					[1 | 1 | 22]					0.05	(0.04)
		500					[25 | 25 | 72]					0.01	(0.01)

Equations

Y_{t}=\mu_{k}^{*}+\varepsilon_{t},\qquad\textrm{ for }t=1,\dots,n,\;k=1,\dots,K^{*}+1,\;t_{k-1}^{*}\leq t\leq t_{k}^{*}-1,

Y_{t}=\sum_{k=1}^{K^{*}+1}\mu_{k}^{*}1\!\!1_{\{t_{k-1}^{*}\leq t\leq t_{k}^{*}-1\}}+\varepsilon_{t},\qquad\textrm{ for }t=1,\dots,n,

Y_{t}=u_{t}^{*}+\varepsilon_{t},\qquad\textrm{ for }t=1,\dots,n,

\widehat{\boldsymbol{u}}=\mathop{\mathrm{arg\,min}}_{(u_{1},\dots,u_{n})\in\mathbb{R}^{n}}\bigg{(}\sum^{n}_{i=1}\rho_{\tau}(Y_{i}-u_{i})+n\lambda_{n}\sum^{n-1}_{i=1}|u_{i+1}-u_{i}|\bigg{)},

{\bf Y}^{n}=\mathbb{X}_{n}\textrm{$\mathbf{\beta}$}^{n}+\textrm{$\mathbf{\varepsilon}$}^{n},

\mathbb{X}_{n}\equiv\left[\begin{array}[]{ccccccccc}1&&0&&0&&\cdots&&0\\ 1&&1&&0&&\cdots&&0\\ 1&&1&&1&&\cdots&&0\\ \vdots&&\vdots&&\vdots&&\cdots&&0\\ 1&&1&&1&&\cdots&&1\\ \end{array}\right].

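The lower-triangular matrix of ones above turns the level parameters $u_{t}$ into increments: with $\beta_{1}=u_{1}$ and $\beta_{i}=u_{i}-u_{i-1}$ for $i\geq 2$, we get $\mathbb{X}_{n}\boldsymbol{\beta}=\boldsymbol{u}$, and the LASSO penalty $\sum_{i=2}^{n}|\beta_{i}|$ in the next display equals the fused penalty $\sum_{i=1}^{n-1}|u_{i+1}-u_{i}|$. A small NumPy sketch of this substitution; the function names and the example levels are hypothetical:

```python
import numpy as np


def design_matrix(n: int) -> np.ndarray:
    """X_n: lower-triangular n x n matrix of ones, (X_n)_{ij} = 1 if j <= i."""
    return np.tril(np.ones((n, n)))


def to_increments(u: np.ndarray) -> np.ndarray:
    """beta = (u_1, u_2 - u_1, ..., u_n - u_{n-1}) so that X_n @ beta = u."""
    return np.concatenate(([u[0]], np.diff(u)))


# hypothetical piece-wise constant levels with jumps at t = 4 and t = 6
u = np.array([1.0, 1.0, 1.0, 4.0, 4.0, 2.0])
beta = to_increments(u)
X = design_matrix(u.size)

print(np.allclose(X @ beta, u))                                      # True
print(np.flatnonzero(beta) + 1)                                      # [1 4 6]
print(np.isclose(np.abs(beta[1:]).sum(), np.abs(np.diff(u)).sum()))  # True
```

The non-zero entries of $\boldsymbol{\beta}$ sit at position 1 and at the change-points, which is the sparsity the LASSO penalty exploits.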

\widehat{\textrm{$\mathbf{\beta}$}^{n}}=\mathop{\mathrm{arg\,min}}_{\textrm{$\mathbf{\beta}$}\in\mathbb{R}^{n}}\bigg{[}\sum^{n}_{i=1}\rho_{\tau}(Y_{i}-(\mathbb{X}_{n}\textrm{$\mathbf{\beta}$})_{i})+n\lambda_{n}\sum_{i=2}^{n}|\beta_{i}|\bigg{]},

\widehat{u}_{i}\equiv\big{(}\mathbb{X}_{n}\widehat{\textrm{$\mathbf{\beta}$}^{n}}\big{)}_{i}={\bf X}_{i}\widehat{\textrm{$\mathbf{\beta}$}^{n}},\qquad\textrm{ for }i=1,\cdots,n.

\frac{n\delta_{n}}{\log n}\underset{n\rightarrow\infty}{\longrightarrow}\infty.

\mathbb{P}\bigg{[}\max_{1\leqslant k\leqslant K^{*}}|\widehat{t}_{k}-t^{*}_{k}|\geq n\delta_{n}\bigg{]}\rightarrow 0,\qquad\textrm{ for }n\rightarrow\infty.

{\cal E}\big{(}A||B\big{)}\equiv\sup_{b\in B}\inf_{a\in A}|a-b|.

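The quantity ${\cal E}(A||B)$ is the largest distance from a point of $B$ to its nearest point in $A$. It is not symmetric: with $A$ the estimated and $B$ the true change-points, it stays small whenever every true change-point has an estimate nearby, even if extra change-points are estimated. A direct Python transcription for finite, non-empty sets (the example sets are hypothetical):

```python
def one_sided_distance(A, B):
    """E(A || B) = sup_{b in B} inf_{a in A} |a - b| for finite non-empty sets."""
    return max(min(abs(a - b) for a in A) for b in B)


true_cps = {30, 70}          # hypothetical true change-points
est_cps = {29, 45, 71, 90}   # hypothetical estimates, including two spurious ones

print(one_sided_distance(est_cps, true_cps))  # 1: every true change-point is matched
print(one_sided_distance(true_cps, est_cps))  # 20: spurious estimates are far from the truth
```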

\frac{1}{n}\textbf{S}^{\top}\textbf{S}=\left[\begin{array}[]{ccccccccc}1&&1-\frac{t^{*}_{1}}{n}&&1-\frac{t^{*}_{2}}{n}&&\cdots&&1-\frac{t^{*}_{K^{*}}}{n}\\ 1-\frac{t^{*}_{1}}{n}&&1-\frac{t^{*}_{1}}{n}&&1-\frac{t^{*}_{2}}{n}&&\cdots&&1-\frac{t^{*}_{K^{*}}}{n}\\ \vdots&&\vdots&&\vdots&&\cdots&&\vdots\\ 1-\frac{t^{*}_{K^{*}}}{n}&&1-\frac{t^{*}_{K^{*}}}{n}&&1-\frac{t^{*}_{K^{*}}}{n}&&\cdots&&1-\frac{t^{*}_{K^{*}}}{n}\\ \end{array}\right]

0<C\leq\lambda_{min}\Big{(}\frac{1}{n}\textbf{S}^{\top}\textbf{S}\Big{)}\leq\lambda_{max}\Big{(}\frac{1}{n}\textbf{S}^{\top}\textbf{S}\Big{)}\leq\frac{1}{C},

\Big{\|}\frac{1}{n}\textbf{Q}^{\top}\textbf{S}\Big{\|}_{2,\infty}<C,

\mu_{k}-\mu_{k}^{*}=O_{P}\Big(\frac{\log n}{n}\Big),\qquad\textrm{ for any }k=1,\dots,K^{*}+1,

\mathbb{P}\big{[}|\widehat{\cal A}_{n}|\leq CK^{*}\big{]}{\underset{n\rightarrow\infty}{\longrightarrow}}1,

\mathbb{P}\bigg{[}{\cal E}\big{(}\widehat{{\cal T}}_{|\widehat{\cal A}_{n}|}||{\cal T}^{*}\big{)}\leq n\delta_{n}\bigg{]}\rightarrow 1,\qquad\textrm{ as }n\rightarrow\infty.

(\log n)^{-1}I^{*}_{min}\rightarrow\infty,\qquad\textrm{ for }n\rightarrow\infty.

\mathbb{P}\big{[}|\widehat{\cal A}_{n}|<K^{*}\big{]}\rightarrow 0,\qquad\textrm{ as }n\rightarrow\infty.

Y_{t}=\left\{\begin{array}[]{cc}\mu_{1}^{*}+\varepsilon_{t}&\textrm{for $0\leq t\leq t_{1}^{*}$}\\ \mu_{2}^{*}+\varepsilon_{t}&\textrm{for $t_{1}^{*}\leq t\leq t_{2}^{*}$}\\ \mu_{3}^{*}+\varepsilon_{t}&\textrm{for $t_{2}^{*}\leq t\leq 1$}\end{array}\right.,\quad\textrm{for $t=1,\dots,n$, and $n\in\mathbb{N}$,}

\tau(n-\widehat{t}_{l})-\sum_{i=\widehat{t}_{l}}^{n}1\!\!1_{\{Y_{i}<\widehat{u}_{i}\}}=n\lambda_{n}\alpha_{l},\qquad\forall l\in\{1,\dots,|\widehat{\cal A}_{n}|\},

\tau(n-j)-\sum_{i=j}^{n}1\!\!1_{\{Y_{i}<\widehat{u}_{i}\}}\leq n\lambda_{n},\qquad\forall j\in\{1,\dots,n\},

\tau\sum^{n}_{i=1}X_{ij}-\sum^{n}_{i=1}X_{ij}1\!\!1_{\big{\{}Y_{i}<(\mathbb{X}_{n}\widehat{\textrm{$\mathbf{\beta}$}^{n}})_{i}\big{\}}}=n\lambda_{n}sign(\widehat{\beta}_{j}).

1\leq\mathbb{P}\bigg{[}x\geq\frac{|A|}{v}\bigg{]}+\mathbb{P}\bigg{[}|B|\geq\frac{v-1}{v}|A|\bigg{]}.

\mathbb{P}\bigg{[}\max_{\begin{subarray}{c}1\leq r_{n}<s_{n}\leq n\\ s_{n}-r_{n}\geq v_{n}\end{subarray}}\;\sup_{t\in\mathbb{R}}\bigg{|}\frac{1}{s_{n}-r_{n}}\sum^{s_{n}-1}_{i=r_{n}}1\!\!1_{\{\varepsilon_{i}\leq t\}}-F(t)\bigg{|}\geq x_{n}\bigg{]}{\underset{n\rightarrow\infty}{\longrightarrow}}0,

\mathbb{P}\bigg{[}\max_{\begin{subarray}{c}1\leq r_{n}<s_{n}\leq n\\ s_{n}-r_{n}\geq v_{n}\end{subarray}}\;\sup_{t\in\mathbb{R}}\bigg{|}\frac{1}{s_{n}-r_{n}}\sum^{s_{n}-1}_{i=r_{n}}1\!\!1_{\{\varepsilon_{i}\leq t\}}-F(t)\bigg{|}\geq x_{n}\bigg{]}\leq\sum_{\begin{subarray}{c}1\leq r_{n}<s_{n}\leq n\\ s_{n}-r_{n}\geq v_{n}\end{subarray}}\mathbb{P}\bigg{[}\sup_{t\in\mathbb{R}}\bigg{|}\frac{1}{s_{n}-r_{n}}\sum^{s_{n}-1}_{i=r_{n}}1\!\!1_{\{\varepsilon_{i}\leq t\}}-F(t)\bigg{|}\geq x_{n}\bigg{]}.

\mathbb{P}\bigg{[}\sup_{t\in\mathbb{R}}\bigg{|}\frac{1}{s_{n}-r_{n}}\sum^{s_{n}-1}_{i=r_{n}}1\!\!1_{\varepsilon_{i}\leq t}-F(t)\bigg{|}\geq x_{n}\bigg{]}\leq 2\exp\big{(}-2(s_{n}-r_{n})x^{2}_{n}\big{)}.

\mathbb{P}\bigg{[}\max_{\begin{subarray}{c}1\leq r_{n}<s_{n}\leq n\\ s_{n}-r_{n}\geq v_{n}\end{subarray}}\;\sup_{t\in\mathbb{R}}\bigg{|}\frac{1}{s_{n}-r_{n}}\sum^{s_{n}-1}_{i=r_{n}}1\!\!1_{\varepsilon_{i}\leq t}-F(t)\bigg{|}\geq x_{n}\bigg{]}\leq 2n^{2}\exp\big{(}-2(s_{n}-r_{n})x^{2}_{n}\big{)}{\underset{n\rightarrow\infty}{\longrightarrow}}0,

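The bound above combines a union bound over the pairs $(r_{n},s_{n})$ with the Dvoretzky-Kiefer-Wolfowitz inequality of the previous display. Written out step by step, and using $s_{n}-r_{n}\geq v_{n}$ inside the maximum, it gives the sufficient condition stated after the block. This is an illustrative reading of the step, not necessarily the exact condition used in the authors' proof:

```latex
\begin{aligned}
\mathbb{P}\bigg[\max_{\substack{1\leq r_{n}<s_{n}\leq n\\ s_{n}-r_{n}\geq v_{n}}}\sup_{t\in\mathbb{R}}\bigg|\frac{1}{s_{n}-r_{n}}\sum_{i=r_{n}}^{s_{n}-1}1\!\!1_{\{\varepsilon_{i}\leq t\}}-F(t)\bigg|\geq x_{n}\bigg]
&\leq\sum_{\substack{1\leq r_{n}<s_{n}\leq n\\ s_{n}-r_{n}\geq v_{n}}}2\exp\big(-2(s_{n}-r_{n})x_{n}^{2}\big)\\
&\leq n^{2}\cdot 2\exp\big(-2v_{n}x_{n}^{2}\big)
=2\exp\big(2\log n-2v_{n}x_{n}^{2}\big).
\end{aligned}
```

Hence the probability tends to zero as soon as $v_{n}x_{n}^{2}-\log n\to\infty$, for instance when $v_{n}x_{n}^{2}/\log n$ converges to a constant larger than one or diverges.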

V_{n,k}\equiv\big{\{}|\widehat{t}_{k}-t^{*}_{k}|\geq n\delta_{n}\big{\}}\qquad\textrm{and}\qquad W_{n}\equiv\Big{\{}\max_{1\leqslant j\leqslant K^{*}}|\widehat{t}_{j}-t^{*}_{j}|<\frac{I^{*}_{min}}{2}\Big{\}}.

\lim_{n\rightarrow\infty}\mathbb{P}\big{[}V_{n,k}\big{]}=0.

Taxonomy

Topics: Statistical Methods and Inference · Control Systems and Identification · Statistical and numerical algorithms

Full text

Gabriela CIUPERCA1 and Matúš MACIAK2

Abstract

Simultaneous change-point detection and estimation in a piece-wise constant model is a common task in modern statistics. If, in addition, the whole estimation can be performed automatically, in one single step, without going through hypothesis tests for non-identifiable models or unwieldy classical a posteriori methods, it becomes an interesting but also challenging idea. In this paper we introduce an estimation method based on the quantile LASSO approach. Unlike standard LASSO approaches, our method does not rely on the typical assumptions required for the model errors, such as a sub-Gaussian or Normal distribution. The proposed quantile LASSO method can effectively handle heavy-tailed random error distributions and, in general, it offers a more complete view of the data, as one can obtain any conditional quantile of the target distribution, not just the conditional mean. It is proved that, under some reasonable assumptions, the number of change-points is not underestimated with probability tending to one, and, in addition, when the number of change-points is estimated correctly, the change-point estimates provided by the quantile LASSO are consistent. Numerical simulations are used to demonstrate these results and to illustrate the robust empirical performance of the proposed quantile LASSO method.

1 Université de Lyon, Université Lyon 1, CNRS, UMR 5208, Institut Camille Jordan, Bat. Braconnier, 43, blvd du 11 novembre 1918, F - 69622 Villeurbanne Cedex, France

2 Charles University, Faculty of Mathematics and Physics, Department of Probability and Mathematical Statistics, Sokolovská 83, Prague, 186 75, Czech Republic

Keywords: quantile LASSO; change-points; sparsity; piece-wise constant model; automatic detection; consistency.

1 Introduction

Change-points in statistical models have attracted a lot of attention in recent years. The reason is that the continuous or even smooth behavior assumed by standard modeling approaches is not what we usually observe in real-life situations. In many applications it is quite common that the mechanism producing the data changes suddenly, usually due to some known or unknown event. In such situations we speak of change-points, and we are interested in their detection, estimation, and statistical inference.

Change-point detection and estimation is typically performed using a standard $L_{2}$-norm minimization; therefore, the estimated structure can be interpreted as the conditional mean of the target variable. Many approaches have been proposed in the statistical literature to handle structural breaks (change-points) from various perspectives (e.g., [1, 7, 8, 12, 15, 26], to name a few). Such methods are based either on a segmentation principle (e.g., [17]) or on a two-stage approach (e.g., [6, 28]). In both cases one first needs to detect potential change-point locations, and later, in the second phase (if some change-points were detected), the overall dependence structure is estimated using the $L_{2}$-norm objective function and the knowledge about the change-points gained in the first phase. An alternative idea was recently proposed in [11], where the authors used a two-stage non-convex minimization based on a likelihood approach to recover a piece-wise constant trend in exponential family models.

In order to avoid the two-stage (and multi-stage) estimation techniques mentioned above, an effective algorithm can be obtained by taking advantage of some recent developments in machine learning and atomic pursuit techniques, in particular the LASSO regularization.

Although the pioneering idea of the LASSO penalization originates in sparse signal recovery problems (see [3] and [29]), it can also be used effectively for change-point detection and estimation. There is an enormous literature on the LASSO in general (see [31] for a nice summary), with various LASSO modifications (e.g., the fused LASSO proposed in [30], the adaptive LASSO introduced in [35], or the elastic LASSO presented in [34]), which can also be used for trend filtering (e.g., [32]). On the other hand, there is only very little work on automatic change-point detection using LASSO-type methods. A simple change-point in location problem in a piece-wise constant model within the LASSO estimation framework was first considered in [13], and alternative insights on the same model can be found in [2] and [24]. A generalization of the piece-wise constant change-point model to a piece-wise linear and continuous case was considered in [23], and more general linear scenarios are presented in [4], [27], and [32]. Some post-selection inference tools for such models are discussed in [11] and [16]. In addition, a high-dimensional regression scenario for detecting change-points by employing the LASSO penalty is investigated in [20]. However, in all the aforementioned situations the authors consider the standard $L_{2}$-norm based approach for estimating the conditional mean and, moreover, the results are derived under the assumption of Gaussian (or sub-Gaussian) random errors.

On the other hand, modeling the conditional mean may not be sufficient from a practical point of view, because the mean provides only limited information about the target distribution. Ideally, one would estimate the whole conditional distribution which, unfortunately, turns out to be quite a complex problem. Instead, the quantile LASSO approach allows us to estimate any conditional quantile, and therefore we can still obtain a comprehensive insight into the distribution of the data. The main idea presented in this paper is a generalization of the approach presented in [13] and further elaborated in [21]. We consider the same model, however, with one key difference: the authors of both aforementioned papers work either with normally distributed random errors or with zero-mean errors with a sub-Gaussian distribution. Unlike their work, the results derived in this paper are free of such distributional assumptions on the random error terms. We use the LASSO regularized estimation approach together with the standard check function $\rho_{\tau}(v)=v(\tau-1\!\!1(v<0))$, for $v\in\mathbb{R}$ and $\tau\in(0,1)$ (see [19]), which allows us to work with various error distributions, including random errors with outliers or heavy-tailed distributions with no direct specification of their moments.
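The check function $\rho_{\tau}$ weights positive residuals by $\tau$ and negative residuals by $1-\tau$, so minimizing $\sum_{i}\rho_{\tau}(Y_{i}-c)$ over a constant $c$ returns a sample $\tau$-quantile instead of the sample mean. A minimal Python sketch; the function names and the toy sample are illustrative, not taken from the paper:

```python
def check_loss(v: float, tau: float) -> float:
    """Check function rho_tau(v) = v * (tau - 1{v < 0})."""
    return v * (tau - (1.0 if v < 0 else 0.0))


def constant_quantile_fit(y, tau):
    """Minimize sum_i rho_tau(y_i - c) over a constant c.

    The objective is convex and piecewise linear with kinks at the data points,
    so a minimizer is attained at one of the observations; brute force suffices here.
    """
    return min(y, key=lambda c: sum(check_loss(yi - c, tau) for yi in y))


y = [1.0, 2.0, 3.0, 4.0, 100.0]          # hypothetical sample with one outlier
print(sum(y) / len(y))                   # 22.0: the mean is pulled by the outlier
print(constant_quantile_fit(y, 0.5))     # 3.0: the median ignores it
print(constant_quantile_fit(y, 0.25))    # 2.0: a lower quantile
```

The brute-force search is only meant to make the definition tangible; with a penalty added, as in the model below, the minimization is typically handed to a linear-programming solver.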

A posteriori detection of change-points (their number and locations) by the quantile LASSO model was already considered in [5], but with a technique that is rather unwieldy to put into practice: in order to find the number of change-points, one first needs to minimize a Schwarz-type criterion, then locate the change-points, and finally estimate the model between each two consecutive change-points. Moreover, the approach presented in [5] does not cover the piece-wise constant model due to the non-singularity of the design matrix. The method presented in this paper has the advantage of overcoming this issue and, in addition, it simultaneously estimates the number of change-points, detects their locations, and recovers the overall quantile structure in a robust manner.

Considering the quantile LASSO estimator, we mainly focus on the precision of the change-point location estimates, similarly to [13], rather than on proving consistency in terms of sign consistency or the oracle properties as considered, for instance, in [25] or [33]. This allows us to use less strict assumptions on the design matrix, which has a very specific form in our case and does not satisfy the stricter irrepresentable conditions or the eigenvalue restrictions required for sign consistency or mean consistency in the standard $L_{2}$-norm sense (see [13, 25] or [33] for more details).

The main contribution of this paper lies in the new robust quantile LASSO proposal for simultaneous change-point detection and estimation: this method is free of the restrictive distributional assumptions common for the standard LASSO approach, which is possible due to the different loss function employed in the minimization problem, analogously to [14] or [22]. Moreover, the estimation method presented in this paper is proved to be consistent with respect to change-point detection and estimation, and the consistency results do not depend on assumptions as strict as those required for sign consistency or the oracle properties. Therefore, the modeling framework presented in this paper is much more widely applicable in practical situations, and the final model can be easily obtained using common estimation techniques and standard optimization toolboxes.

This paper is organized as follows: in the next section we introduce the quantile LASSO model and propose the estimation approach for fitting the model. The main theoretical results are presented in Section 3, and the empirical performance is investigated via an extensive simulation study in Section 4. Some remarks and comments are given in Section 5. All proofs are given in the Appendix.

2 Model and Notations

Let us consider a sample $Y_{1},\dots,Y_{n}$, for $n\in\mathbb{N}$, with a specific location structure with $K^{*}\in\mathbb{N}$ change-points located at $t_{1}^{*},\dots,t_{K^{*}}^{*}\in\{1,\dots,n\}$, such that $1<t_{1}^{*}<t_{2}^{*}<\dots<t_{K^{*}}^{*}<n$, and

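$$Y_{t}=\mu_{k}^{*}+\varepsilon_{t},\qquad\textrm{ for }t=1,\dots,n,\;k=1,\dots,K^{*}+1,\;t_{k-1}^{*}\leq t\leq t_{k}^{*}-1,\tag{1}$$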

where $t_{0}^{*}=1$ and $t_{K^{*}+1}^{*}=n+1$. The model can be equivalently expressed as

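$$Y_{t}=\sum_{k=1}^{K^{*}+1}\mu_{k}^{*}1\!\!1_{\{t_{k-1}^{*}\leq t\leq t_{k}^{*}-1\}}+\varepsilon_{t},\qquad\textrm{ for }t=1,\dots,n,\tag{2}$$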

with $K^{*}+1$ unknown parameters (phases) to be estimated and the corresponding change-point locations $t_{1}^{*},\dots,t_{K^{*}}^{*}$, which are also unknown. Alternatively, we can use the formulation

$$Y_{t}=u_{t}^{*}+\varepsilon_{t},\qquad\textrm{ for }t=1,\dots,n,\tag{3}$$

where $u^{*}_{t}=\mu^{*}_{k}$, for $t=t^{*}_{k-1},\dots,t^{*}_{k}-1$ and $k=1,\dots,K^{*}+1$ (see Figure 1 for an illustration). The random error terms $\{\varepsilon_{t}\}_{t=1}^{n}$ are assumed to be independent and identically distributed random variables with some (unknown) continuous distribution function $F$.
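To make the model concrete, the sketch below simulates $Y_{t}=u_{t}^{*}+\varepsilon_{t}$ with $K^{*}=2$ change-points. The relative positions, the levels, the Cauchy error distribution, and the mapping $t_{k}^{*}=\lfloor n\tau_{k}^{*}\rfloor+1$ are illustrative assumptions, not settings taken from the paper. Since Cauchy(0,1) has median zero, $u_{t}^{*}$ is then the conditional median of $Y_{t}$, that is, the target for $\tau=0.5$:

```python
import numpy as np


def simulate_piecewise_constant(n, rel_positions, levels, rng):
    """Simulate Y_t = u*_t + eps_t, t = 1, ..., n, with i.i.d. Cauchy(0, 1) errors.

    rel_positions: hypothetical relative change-point locations in (0, 1),
    mapped to t*_k = floor(rel * n) + 1; levels: the K* + 1 values mu*_k.
    Returns (y, u_star, change_points) with 1-based change-point indices.
    """
    if len(levels) != len(rel_positions) + 1:
        raise ValueError("need K* + 1 levels for K* change-points")
    cps = [int(np.floor(r * n)) + 1 for r in rel_positions]
    bounds = [1] + cps + [n + 1]           # t*_0 = 1 and t*_{K*+1} = n + 1
    u_star = np.empty(n)
    for k, mu in enumerate(levels):
        # segment k + 1 covers t = t*_k, ..., t*_{k+1} - 1 (shifted to 0-based slices)
        u_star[bounds[k] - 1 : bounds[k + 1] - 1] = mu
    eps = rng.standard_cauchy(n)           # heavy-tailed errors with no finite mean
    return u_star + eps, u_star, cps


rng = np.random.default_rng(1)
y, u_star, cps = simulate_piecewise_constant(20, (0.25, 0.75), (0.0, 2.0, 0.0), rng)
print(cps)  # [6, 16]
```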

Remark 2.1

The model above can also be seen as a sampling scheme on some fixed domain, for instance, the interval $(0,1)$. In such a case, the change-point locations can be understood as specific points $\tau_{k}^{*}$, for $k=1,\dots,K^{*}$, such that ${t_{k}^{*}}/{n}\to\tau_{k}^{*}\in(0,1)$ for $n\to\infty$ and any $k\in\{1,\dots,K^{*}\}$. The unknown model segments $\mu_{1}^{*},\dots,\mu_{K^{*}+1}^{*}$ are determined by a fixed sequence of true change-point locations $0<\tau_{1}^{*}<\dots<\tau_{K^{*}}^{*}<1$ and by $K^{*}\in\mathbb{N}$, which is also fixed.

The formulation in (3) introduces a sparsity principle for the parameters $u_{t}^{*}$, $t=1,\dots,n$: we assume that $u_{t}^{*}=u_{t-1}^{*}$ for all $t=2,\dots,n$, with only $K^{*}$ exceptions at $t\in\{t_{1}^{*},\dots,t_{K^{*}}^{*}\}$. In order to estimate the vector of unknown parameters $\boldsymbol{u}^{*}=(u_{1}^{*},\dots,u_{n}^{*})^{\top}\in\mathbb{R}^{n}$, and the locations where $u_{t}^{*}\neq u_{t-1}^{*}$, we solve the minimization problem

[TABLE]

with $\widehat{\boldsymbol{u}}=(\widehat{u}_{1},\dots,\widehat{u}_{n})^{\top}$, and $\rho_{\tau}(v)=v(\tau-1\!\!1(v<0))$, for some $\tau\in(0,1)$ and any $v\in\mathbb{R}$. The regularization parameter $\lambda_{n}>0$ controls the overall number of change-points in the final model: for $\lambda_{n}=0$ the minimization in (4) results in $\widehat{\boldsymbol{u}}$ with $\widehat{u}_{t}\neq\widehat{u}_{t-1}$ for each $t=2,\dots,n$, while for $\lambda_{n}\to\infty$ we have $\widehat{u}_{t}=\widehat{u}_{t-1}$ for all $t=2,\dots,n$, and thus the final model corresponds to an intercept-only quantile regression model for the given $\tau\in(0,1)$ (the common value $\widehat{u}_{t}$ is a sample $\tau$-quantile of $Y_{1},\dots,Y_{n}$).
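The check loss $\rho_{\tau}$ and the two limiting cases of $\lambda_{n}$ can be illustrated numerically. The following Python sketch (the helper names `check_loss` and `constant_fit` are hypothetical, not taken from the paper) evaluates $\sum_{t}\rho_{\tau}(Y_{t}-u)$ for a constant fit, which is the $\lambda_{n}\to\infty$ case, and verifies that a sample $\tau$-quantile minimizes it; for $\lambda_{n}=0$ the loss part of (4) is zero at $\widehat{u}_{t}=Y_{t}$.

```python
import numpy as np

def check_loss(v, tau):
    """Quantile check loss rho_tau(v) = v * (tau - 1{v < 0}), applied elementwise."""
    v = np.asarray(v, dtype=float)
    return v * (tau - (v < 0))

def constant_fit(y, tau):
    """Minimizer of sum_t rho_tau(y_t - u) over a constant u (the lambda_n -> infinity limit).

    The objective is convex and piecewise linear with kinks at the data points, so the
    order statistic y_(k) with k = ceil(tau * n), a sample tau-quantile, is a minimizer.
    """
    y_sorted = np.sort(np.asarray(y, dtype=float))
    k = int(np.ceil(tau * y_sorted.size))
    return y_sorted[k - 1]

rng = np.random.default_rng(1)
y = rng.standard_cauchy(40)   # heavy-tailed sample: no moment assumptions are needed
tau = 0.25

u_const = constant_fit(y, tau)
obj_const = check_loss(y - u_const, tau).sum()
# The minimum over constants is attained at one of the data points, so compare with all of them.
obj_candidates = np.array([check_loss(y - c, tau).sum() for c in y])
print(obj_const, obj_candidates.min())

# lambda_n = 0: u_hat = y gives zero loss and, almost surely, a jump at every t.
print(check_loss(y - y, tau).sum())  # 0.0
```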

Using a parameter substitution and some algebra (analogously to [32], where it was applied to linear and higher-order trend filtering), we can rewrite the model in (1) as an ordinary linear regression model

[TABLE]

where ${\bf Y}^{n}\equiv(Y_{1},\cdots,Y_{n})^{\top}$, $\boldsymbol{\beta}^{n}\equiv(d_{t^{*}_{0}},0,\cdots,0,d_{t^{*}_{1}},0,\cdots,0,d_{t^{*}_{K^{*}}},0,\cdots,0)^{\top}$, and $\boldsymbol{\varepsilon}^{n}\equiv(\varepsilon_{1},\cdots,\varepsilon_{n})^{\top}$, with $d_{t^{*}_{k}}$ at position $t^{*}_{k}$, for $k=0,\dots,K^{*}$, $d_{t^{*}_{0}}=\mu^{*}_{1}$, and $d_{t^{*}_{k}}=\mu^{*}_{k+1}-\mu^{*}_{k}$, for $k=1,\cdots,K^{*}$. The $n\times n$ model matrix takes the form

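$$\mathbb{X}_{n}=\begin{pmatrix}1&0&0&\cdots&0\\1&1&0&\cdots&0\\1&1&1&\cdots&0\\\vdots&\vdots&\vdots&\ddots&\vdots\\1&1&1&\cdots&1\end{pmatrix}.$$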

Let ${\bf X}_{i}$ denote the $i$-th row of $\mathbb{X}_{n}$ and let $\widehat{\boldsymbol{\beta}}^{n}\equiv\big{(}\widehat{\beta}_{1},\cdots,\widehat{\beta}_{n}\big{)}^{\top}$ be the solution of the quantile LASSO minimization problem

[TABLE]

where $(\mathbb{X}_{n}\boldsymbol{\beta})_{i}={\bf X}_{i}\boldsymbol{\beta}$. Let $\widehat{\cal A}_{n}\equiv\big{\{}i\in\{2,\cdots,n\};\;\;\widehat{\beta}_{i}\neq 0\big{\}}=\big{\{}\widehat{t}_{1},\cdots,\widehat{t}_{|\widehat{\cal A}_{n}|}\big{\}}$ be the set of estimated change-point locations; the corresponding estimates of $u_{i}$ are defined as

[TABLE]
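The substitution behind (5) and (6) can be checked numerically: with the lower-triangular matrix of ones, $\mathbb{X}_{n}\boldsymbol{\beta}^{n}$ is the cumulative sum of $\boldsymbol{\beta}^{n}$, so $\boldsymbol{\beta}^{n}$ collects the first level $\mu_{1}^{*}$ followed by the jumps, and its non-zero entries with index $i\geq 2$ give the change-point set. The sketch below uses hypothetical values ($n=10$, change-points at $t=4$ and $t=8$, levels $0,2,1$) and 1-based indices as in the text; the estimated set $\widehat{\cal A}_{n}$ is obtained in the same way from $\widehat{\boldsymbol{\beta}}^{n}$.

```python
import numpy as np

n = 10
t_star = [4, 8]                # hypothetical change-points (1-based)
mu_star = [0.0, 2.0, 1.0]      # hypothetical segment levels mu_1*, mu_2*, mu_3*

# Piecewise constant signal u*_t = mu*_k for t_{k-1}* <= t < t_k*, with t_0* = 1 and t_{K*+1}* = n + 1.
bounds = [1] + t_star + [n + 1]
u_star = np.concatenate([np.full(b - a, m) for a, b, m in zip(bounds[:-1], bounds[1:], mu_star)])

X = np.tril(np.ones((n, n)))          # model matrix: row i has ones in columns 1..i
beta = np.diff(u_star, prepend=0.0)   # beta_1 = u_1*, beta_i = u_i* - u_{i-1}* for i >= 2

# X beta reproduces the piecewise constant signal.
print(np.allclose(X @ beta, u_star))  # True

# Change-point set: indices i in {2, ..., n} with beta_i != 0 (converted to 1-based).
A_set = [int(i) + 1 for i in np.flatnonzero(beta) if i >= 1]
print(A_set)           # [4, 8]
print(beta[[3, 7]])    # jumps mu_2* - mu_1* = 2 and mu_3* - mu_2* = -1
```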

Remark 2.2

For brevity, we use a notation in which we suppress the dependence of the estimates $\widehat{\boldsymbol{\beta}}^{n}$, $\widehat{\cal A}_{n}$, and $\widehat{u}_{i}$ on the value of the regularization parameter $\lambda_{n}>0$.

The minimization problem defined in (6) is convex and can be solved efficiently using standard optimization toolboxes. However, the estimates of the parameter vector $\boldsymbol{\beta}^{n}$ are not given explicitly, and iterative algorithms need to be employed to obtain the final solution. In the next section we consider the model defined in (1) and derive theoretical properties of the estimation procedure defined by the minimization problem in (6).
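Because both the check loss and the absolute-value penalty are piecewise linear, (4), or equivalently (6), can be written as a linear program. The sketch below is one possible formulation, solved with SciPy's `scipy.optimize.linprog` using `method="highs"`; it is not the authors' implementation. It works with $\boldsymbol{u}$ directly instead of $\boldsymbol{\beta}^{n}$ and assumes that the penalty enters as $n\lambda_{n}\sum_{t=2}^{n}|u_{t}-u_{t-1}|$ (so $\beta_{1}$ is unpenalized); if (6) is scaled differently, only the constant `c_pen` changes. The function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def quantile_fused_lasso(y, tau, lam):
    """Solve min_u sum_t rho_tau(y_t - u_t) + n*lam*sum_{t>=2} |u_t - u_{t-1}| as an LP.

    Variable layout: [u (n), r_plus (n), r_minus (n), d_plus (n-1), d_minus (n-1)],
    with y - u = r_plus - r_minus and u_t - u_{t-1} = d_plus - d_minus.
    The n*lam scaling of the penalty is an assumption.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    c_pen = n * lam
    n_var = 3 * n + 2 * (n - 1)
    cost = np.concatenate([np.zeros(n), np.full(n, tau), np.full(n, 1.0 - tau),
                           np.full(2 * (n - 1), c_pen)])
    eye_n, eye_d = np.eye(n), np.eye(n - 1)
    D = np.diff(eye_n, axis=0)   # row t is e_{t+1} - e_t, so D @ u gives first differences
    A_fit = np.hstack([eye_n, eye_n, -eye_n, np.zeros((n, 2 * (n - 1)))])
    A_diff = np.hstack([D, np.zeros((n - 1, 2 * n)), -eye_d, eye_d])
    A_eq = np.vstack([A_fit, A_diff])
    b_eq = np.concatenate([y, np.zeros(n - 1)])
    bounds = [(None, None)] * n + [(0, None)] * (n_var - n)
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:n]

def change_points(u_hat, tol=1e-6):
    """1-based indices t in {2, ..., n} with |u_t - u_{t-1}| > tol (the analogue of A_n)."""
    return [int(i) + 2 for i in np.flatnonzero(np.abs(np.diff(u_hat)) > tol)]

y = np.concatenate([np.zeros(20), np.full(20, 2.0)])   # noiseless step with one jump at t = 21
u_hat = quantile_fused_lasso(y, tau=0.5, lam=0.5 / y.size)
print(change_points(u_hat))                               # [21]
```

A dense constraint matrix keeps the sketch short; for large $n$ a sparse matrix (SciPy's `linprog` also accepts `scipy.sparse` input for HiGHS) or a specialized algorithm would be preferable.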

3 Theoretical Results

We start by introducing some notation which will be used throughout this paper. Let $I^{*}_{min}\equiv\min_{1\leqslant k\leqslant K^{*}}(t^{*}_{k+1}-t^{*}_{k})$ and $I^{*}_{max}\equiv\max_{1\leqslant k\leqslant K^{*}}(t^{*}_{k+1}-t^{*}_{k})$. Analogously, for the change-point magnitudes, we define $J^{*}_{min}\equiv\min_{1\leqslant k\leqslant K^{*}}|\mu^{*}_{k+1}-\mu^{*}_{k}|$ and $J^{*}_{max}\equiv\max_{1\leqslant k\leqslant K^{*}}|\mu^{*}_{k+1}-\mu^{*}_{k}|$. By definition of a change-point, $\mu^{*}_{k}\neq\mu^{*}_{k+1}$ for any $k=1,\cdots,K^{*}$. Moreover, $C$ denotes a universal positive constant which does not depend on the sample size and which may take different values in different formulas. Let the model in (1) hold. The results in this section are proved under the following assumptions:

(A1)

The true parameters $\mu^{*}_{k}\in\mathbb{R}$, for $k=1,\cdots,K^{*}+1$, do not depend on $n\in\mathbb{N}$;

(A2)

Random error terms $\{\varepsilon_{i}\}$ are i.i.d., with some absolutely continuous distribution function $F(x)$ , such that $\mathbb{P}[\varepsilon<0]=\tau$ , for the given quantile level $\tau\in(0,1)$ , with the corresponding density function $f(x)>0$ , for all $x\in\mathbb{R}$ , which is continuously differentiable, such that $|f^{\prime}(x)|<\infty$ ;

(A3)

Let $I^{*}_{min}\geq n\delta_{n}$ , for some decreasing sequence $\{\delta_{n}\}$ , such that $\delta_{n}\rightarrow 0$ , for $n\rightarrow\infty$ ;

(A4)

Let, in addition, the following holds: $\lambda_{n}/\delta_{n}\rightarrow 0$ , for $n\rightarrow\infty$ ;

(A5)

We assume that the number of change-points $K^{*}\in\mathbb{N}$ is fixed and does not depend on the sample size $n\in\mathbb{N}$;

(A6)

Let $\lambda_{n}=C(n^{-1}\log n)^{1/2}$ , for some $C>0$ .

The assumption in (A1) specifies the model defined in (1), while Assumption (A2) is standard for high-dimensional quantile regression models (see [19]). Assumption (A3) is considered, for instance, by [13] and [27] to ensure proper change-point detection by the classical LASSO estimation approach: the authors of both papers assume, among other conditions, that $(n\delta_{n}J^{*}_{min})^{-1}n\lambda_{n}\rightarrow 0$, for $n\rightarrow\infty$. Thus, for $0<J^{*}_{min}<\infty$ fixed, Assumption (A4) in our paper corresponds to Assumption (A4) of [13] and also to Assumption (A3)(iii) of [27]. Assumption (A5) on the true number of jumps $K^{*}\in\mathbb{N}$ is considered, for instance, in [13] for a least squares model with an $L_{1}$-penalty; it is also reasonable in practical applications. Assumption (A6) is needed in order to apply the results of [10] on the convergence rate of the quantile LASSO estimator. Assumptions (A4) and (A6) imply, for the sequence $(\delta_{n})$ from Assumption (A3), that

[TABLE]

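This last relation implies that $n\delta_{n}\rightarrow\infty$ as $n\rightarrow\infty$: indeed, $n\delta_{n}/(n\lambda_{n})\to\infty$ by (A4), and $n\lambda_{n}=C(n\log n)^{1/2}\to\infty$ by (A6).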
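To see that (A3), (A4), and (A6) can hold simultaneously, the following sketch uses the hypothetical choice $\delta_{n}=(n^{-1}\log n)^{1/4}$ (not taken from the paper) and $C=1$. Then $\lambda_{n}/\delta_{n}=(n^{-1}\log n)^{1/4}\to 0$, so (A4) holds, while $n\delta_{n}=n^{3/4}(\log n)^{1/4}\to\infty$, so (A3) only asks that $I^{*}_{min}$ grows at least at that rate.

```python
import numpy as np

C = 1.0                                  # hypothetical constant in lambda_n = C (log n / n)^{1/2}
ns = np.array([1e2, 1e3, 1e4, 1e5, 1e6])

lam = C * np.sqrt(np.log(ns) / ns)       # Assumption (A6)
delta = (np.log(ns) / ns) ** 0.25        # hypothetical decreasing sequence delta_n for (A3)

ratio = lam / delta                      # (A4): should decrease to 0
n_delta = ns * delta                     # lower bound on I*_min in (A3): should diverge

for n, r, nd in zip(ns, ratio, n_delta):
    print(f"n={int(n):>8d}  lambda_n/delta_n={r:.4f}  n*delta_n={nd:.1f}")
```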

Remark 3.1

Concerning the jump magnitudes, the assumptions imposed on $\{\delta_{n}\}$, $\{\lambda_{n}\}$, and $J_{min}^{*}$ in [13] are the following: $n\delta_{n}(J_{min}^{*})^{2}/\log n\to\infty$ and $(n\delta_{n}J_{min}^{*})^{-1}n\lambda_{n}\to 0$, for $n\to\infty$. It is then easy to see that for $n\delta_{n}=n\lambda_{n}=\log n$ the second condition reduces to $(J_{min}^{*})^{-1}\to 0$, so it is necessary that $J_{min}^{*}\to\infty$. Thus, the smallest jump magnitude cannot be bounded from above, which obviously facilitates the detection of changes. Therefore, in this setting the method presented in [13] requires the jump sizes to diverge as the sample size increases. In the present paper, the jump magnitudes are all fixed.

The main results of this paper are presented in the next three theorems. Let $\widehat{K}\equiv|\widehat{\cal A}_{n}|$ denote the estimated number of change-points. Theorem 3.1 gives the convergence rate of the change-point location estimates if the number of estimated change-points coincides with the true number of change-points. Theorem 3.2 covers the situation when the estimated number of change-points is greater than $K^{*}$, and, finally, Theorem 3.3 deals with the scenario where $\widehat{K}$ is smaller than $K^{*}$. All proofs are postponed to Appendix A. Let us first consider the situation when the estimated number of change-points coincides with the true number of change-points $K^{*}$. In this case, with probability converging to 1 as $n\rightarrow\infty$, the distance between the true location $t^{*}_{k}$ and the estimated location $\widehat{t}_{k}$ is smaller than $I^{*}_{min}$, the smallest distance between two consecutive true change-points.

Theorem 3.1

Let $|\widehat{\cal A}_{n}|=K^{*}$ . Then, under Assumptions (A1) – (A6), it holds that

[TABLE]

For the purpose of the second theorem, let us introduce, similarly as in [13], a distance between two sets $A$ and $B$, defined as

[TABLE]

Let us also define two sets ${\cal T}^{*}\equiv\{t^{*}_{1},\cdots,t^{*}_{K^{*}}\}$ and $\widehat{{\cal T}}_{|\widehat{\cal A}_{n}|}\equiv\{\widehat{t}_{1},\cdots,\widehat{t}_{|\widehat{\cal A}_{n}|}\}$. In fact, the set $\widehat{{\cal T}}_{|\widehat{\cal A}_{n}|}$ is identical to $\widehat{\cal A}_{n}$. In the following theorem we show that if the estimated number of change-points is greater than $K^{*}$, then the distance between $\widehat{{\cal T}}_{|\widehat{\cal A}_{n}|}$ and ${\cal T}^{*}$ is, with probability converging to one, less than $n\delta_{n}$. In this sense, $\widehat{{\cal T}}_{|\widehat{\cal A}_{n}|}$ is a weakly consistent estimator of ${\cal T}^{*}$. Let us start by studying the cardinality of the set $\widehat{\cal A}_{n}$.

We suppose that $\mu^{*}_{1}\neq 0$; otherwise, the reasoning is analogous. Let ${\cal A}\equiv\{1,t^{*}_{1},\cdots,t^{*}_{K^{*}}\}$ be the set of indices where the vector $\boldsymbol{\beta}^{n}$ has non-zero components; the elements $t^{*}_{1},\cdots,t^{*}_{K^{*}}$ are also the observations where the model (1) changes.

Consider the following $n\times(K^{*}+1)$ matrix: $\textbf{S}\equiv\mathbb{X}_{n,{\cal A}}$, where $\mathbb{X}_{n,{\cal A}}$ is the submatrix formed by the columns of $\mathbb{X}_{n}$ with indices in ${\cal A}$. Then, the $(K^{*}+1)\times(K^{*}+1)$ matrix

[TABLE]

has all the leading principal minors equal to $1$ , $\left(1-\frac{t^{*}_{1}}{n}\right)\frac{t^{*}_{1}}{n}$ , $\left(1-\frac{t^{*}_{2}}{n}\right)\left(\frac{t^{*}_{2}}{n}-\frac{t^{*}_{1}}{n}\right)\frac{t^{*}_{1}}{n}$ , $\cdots$ , $\left(1-\frac{t^{*}_{K^{*}}}{n}\right)\left(\frac{t^{*}_{K^{*}}}{n}-\frac{t^{*}_{K^{*}-1}}{n}\right)\left(\frac{t^{*}_{K^{*}-1}}{n}-\frac{t^{*}_{K^{*}-2}}{n}\right)\cdots\frac{t^{*}_{1}}{n}$ .

By Sylvester's criterion for symmetric matrices, $n^{-1}\textbf{S}^{\top}\textbf{S}$ is positive definite. Moreover, since ${t^{*}_{k}}/{n}{\underset{n\rightarrow\infty}{\longrightarrow}}\tau^{*}_{k}$, for all $k=1,\cdots,K^{*}$ (see Remark 2.1), there exists a constant $C>0$ such that

[TABLE]

where $\lambda_{min}(n^{-1}\textbf{S}^{\top}\textbf{S})$ and $\lambda_{max}(n^{-1}\textbf{S}^{\top}\textbf{S})$ are the smallest and largest eigenvalues of the matrix $n^{-1}\textbf{S}^{\top}\textbf{S}$. Let us also consider the $n\times(n-K^{*}-1)$ matrix $\textbf{Q}\equiv\mathbb{X}_{n,\overline{\cal A}}$, where $\overline{\cal A}$ is the complement of ${\cal A}$ in $\{1,\dots,n\}$. Then, there again exists a positive constant $C$ such that:

[TABLE]

with $\|\textbf{A}\|_{2,\infty}=\sup_{\textbf{x}\neq\textbf{0}}\|\textbf{A}\textbf{x}\|_{\infty}/\|\textbf{x}\|_{2}$ . Taking also into account Assumptions (A2), (A5), and (A6), and applying Theorem 2 of [10], we have

[TABLE]

and

[TABLE]

for some constant $0<C<\infty$ . Therefore, using Assumption (A5), we can conclude that the number of estimated change-points, $|\widehat{\cal A}_{n}|$ , is bounded with probability converging to one.
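Both matrix facts used above can be checked numerically. The sketch assumes that $\mathbb{X}_{n}$ is the lower-triangular matrix of ones implied by the parametrization in (5) (so that $(\mathbb{X}_{n}\boldsymbol{\beta}^{n})_{i}=\beta_{1}+\dots+\beta_{i}$) and uses hypothetical change-points $t_{1}^{*}=40$, $t_{2}^{*}=140$ with $n=200$. For this structure, the $m$-th leading principal minor of $n^{-1}\textbf{S}^{\top}\textbf{S}$ equals $a_{m-1}\prod_{k=1}^{m-1}(a_{k-1}-a_{k})$ with $a_{k}=(n-t_{k}^{*}+1)/n$ and $a_{0}=1$, and the expressions listed above agree with it up to terms of order $n^{-1}$. For the second part, the Cauchy-Schwarz inequality shows that $\|\textbf{A}\|_{2,\infty}$ equals the largest Euclidean norm of a row of $\textbf{A}$; for $\textbf{Q}$ the last row contains $n-K^{*}-1$ ones, so $\|\textbf{Q}\|_{2,\infty}=(n-K^{*}-1)^{1/2}$ under these assumptions, and random directions can only give smaller ratios.

```python
import numpy as np

n = 200
A_set = [1, 40, 140]                       # hypothetical {1, t_1*, t_2*} (1-based)
X = np.tril(np.ones((n, n)))               # assumed lower-triangular model matrix of ones

# --- n^{-1} S^T S and its leading principal minors ---
S = X[:, [i - 1 for i in A_set]]
G = S.T @ S / n                            # entry (j, k) = (n - max(A_j, A_k) + 1) / n
a = np.array([(n - t + 1) / n for t in A_set])       # a_0 = 1 > a_1 > ... > a_K*
tau = np.array([0.0] + [t / n for t in A_set[1:]])   # t/n, with t_0 treated as 0

minors, exact, approx = [], [], []
for m in range(1, len(A_set) + 1):
    minors.append(np.linalg.det(G[:m, :m]))
    exact.append(a[m - 1] * np.prod(a[:m - 1] - a[1:m]))         # closed form
    approx.append((1 - tau[m - 1]) * np.prod(np.diff(tau[:m])))  # expressions in the text
eig_min = np.linalg.eigvalsh(G).min()

# --- ||Q||_{2,inf} for the columns outside the index set ---
def norm_2_inf(M):
    """sup_{x != 0} ||M x||_inf / ||x||_2, which equals the largest row norm of M."""
    return np.sqrt((M ** 2).sum(axis=1)).max()

Q = np.delete(X, [i - 1 for i in A_set], axis=1)
q_norm = norm_2_inf(Q)
rng = np.random.default_rng(3)
xs = rng.standard_normal((Q.shape[1], 1000))         # random directions (columns)
ratios = np.abs(Q @ xs).max(axis=0) / np.linalg.norm(xs, axis=0)

print(np.round(minors, 6), np.round(approx, 6), eig_min)
print(q_norm, np.sqrt(n - len(A_set)), ratios.max())
```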

Theorem 3.2

Let $|\widehat{\cal A}_{n}|\geq K^{*}$ . Then, under Assumptions (A1) – (A6), it holds that

[TABLE]

Taking into account Assumptions (A3), (A4), and (A6), we obtain that the minimum distance between two consecutive change-points, $I_{min}^{*}$, has to satisfy $(n\log n)^{-1/2}I_{min}^{*}\rightarrow\infty$: indeed, $I_{min}^{*}\geq n\delta_{n}$ and $\delta_{n}/\lambda_{n}\to\infty$ with $n\lambda_{n}=C(n\log n)^{1/2}$. Thus, since $(n\log n)^{-1/2}I_{min}^{*}=\frac{I_{min}^{*}}{\log n}\sqrt{\frac{\log n}{n}}$ and $\sqrt{\log n/n}\to 0$, we have

[TABLE]

Note that (11) indicates that, in order to avoid underestimation of the number of change-points, the minimum distance between two consecutive change-points must grow faster than $\log n$.

Finally, the last theorem proves that the number of change-points is, with probability tending to one as $n$ increases, not underestimated; the estimated number may, however, be larger than $K^{*}$. In such cases, for each true change-point location $t_{k}^{*}\in{\cal T}^{*}$ there is at least one change-point location estimate in $\widehat{{\cal T}}_{|\widehat{\cal A}_{n}|}$ such that the distance between the true location and the corresponding estimate is less than $n\delta_{n}$, again with probability tending to 1 for $n\to\infty$ (assertion of Theorem 3.2).

Theorem 3.3

Under Assumptions (A1) – (A6), we have that

[TABLE]

As mentioned above, the theorem suggests that the number of change-points is more likely to be overestimated than underestimated. This is a well-known feature of standard LASSO regularization approaches. The overestimation can be reduced, for instance, by adopting the adaptive LASSO approach, which is known to mitigate the overestimation problem and, under suitable conditions, achieves the oracle properties.

In the next section we compare the proposed quantile LASSO method with some other common estimation techniques and the presented theoretical results will be illustrated in terms of the empirical performance.

4 Simulation Study

In this section we investigate the finite sample properties of the proposed quantile LASSO estimator. For the simulation purposes we consider a location model defined as

[TABLE]

with two distant change-points $t_{1}^{*},t_{2}^{*}\in\{1,\dots,n\}$ and the corresponding parameters $\mu_{1}^{*}=0$, $\mu_{2}^{*}=2$, and $\mu_{3}^{*}=1$. The random error terms $\{\varepsilon_{t}\}_{t=1}^{n}$ are independent; in order to investigate the signal-to-noise performance and the robust flavor of the proposed quantile LASSO method, we consider various error distributions (the standard normal distribution, Student's $t$ distribution with three degrees of freedom, and the Cauchy distribution).

Three different sample sizes $n\in\{20,100,500\}$ are used. In order to easily compare models with different sample sizes, the model is rescaled in terms of Remark 2.1, such that the observation $Y_{t}$ is indexed by $\tilde{t}=t/n\in(0,1]$, with the corresponding change-points located at $t_{1}^{*}/n\to\tau_{1}^{*}=0.2$ and $t_{2}^{*}/n\to\tau_{2}^{*}=0.7$.

A set of quantile levels $\tau\in\{0.05,0.10,0.25,0.50,0.75,0.90,0.95\}$ is considered and the final model is estimated using Equation (6), while three different approaches are applied to determine the value of the regularization parameter $\lambda_{n}>0$. First, we consider the asymptotically appropriate value fulfilling Assumption (A6), denoted as $\lambda_{AS}=C\cdot(n^{-1}\log n)^{1/2}$. For the second model, we use the prior knowledge that there are two true change-points in the model (thus, the final model always contains two change-points and three segments; the corresponding regularization parameter is denoted as $\lambda_{(2)}$). Finally, for the third model, we consider the parameter $\lambda_{MS}$, which minimizes the Mean Squared Error (MSE) quantity $n^{-1}\sum_{i=1}^{n}(\widehat{u}_{i}-u_{i}^{*})^{2}$. In addition, we compare the quantile LASSO method with the standard LASSO approach and the SMUCE estimator proposed by [11]. The final models are compared with respect to the averaged estimation bias $n^{-1}\sum_{i=1}^{n}(\widehat{u}_{i}-u_{i}^{*})$, the MSE quantity, and the change-point detection error $\frac{1}{2}\sum_{k=1}^{2}|\widehat{t}_{k}-t_{k}^{*}|$. The change-point detection error is only computed for models where at least two change-points are detected (otherwise, the quantity is not reported). The simulations are based on 1000 Monte Carlo repetitions for every model scenario, and the results are reported in Tables 1, 2, and 3.
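The data-generating process of the simulation design and the three evaluation criteria can be sketched as follows. The rounding rule $t_{k}^{*}=\mathrm{round}(\tau_{k}^{*}n)+1$ is an assumption (the text only states $t_{k}^{*}/n\to\tau_{k}^{*}$), and the helper names are hypothetical. The detection error below is returned only when exactly two change-points are estimated, because the text does not specify how estimates are matched to the true locations when more than two are detected; any estimator of $\boldsymbol{u}^{*}$ can be plugged into the metrics.

```python
import numpy as np

MU_STAR = (0.0, 2.0, 1.0)   # segment levels mu_1*, mu_2*, mu_3* of model (12)
TAU_STAR = (0.2, 0.7)       # relative change-point locations tau_1*, tau_2*

def simulate(n, error="normal", rng=None):
    """Draw one sample from model (12); assumes t_k* = round(tau_k* n) + 1 (1-based)."""
    rng = np.random.default_rng() if rng is None else rng
    t_star = [int(round(tk * n)) + 1 for tk in TAU_STAR]
    bounds = [1] + t_star + [n + 1]
    u_star = np.concatenate([np.full(b - a, m) for a, b, m in zip(bounds[:-1], bounds[1:], MU_STAR)])
    draw = {
        "normal": lambda: rng.standard_normal(n),
        "t3": lambda: rng.standard_t(3, n),        # Student's t with three degrees of freedom
        "cauchy": lambda: rng.standard_cauchy(n),
    }[error]
    return u_star + draw(), u_star, t_star

def bias(u_hat, u_star):
    return np.mean(u_hat - u_star)

def mse(u_hat, u_star):
    return np.mean((u_hat - u_star) ** 2)

def detection_error(t_hat, t_star):
    """(1/2) * sum_k |t_hat_k - t_k*|, computed here only for exactly two estimates."""
    if len(t_hat) != 2:
        return None
    return 0.5 * sum(abs(a - b) for a, b in zip(sorted(t_hat), t_star))

rng = np.random.default_rng(4)
y, u_star, t_star = simulate(100, "cauchy", rng)
print(t_star)                               # [21, 71]
print(detection_error([20, 73], t_star))    # 1.5
```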

First of all, we are primarily interested in the quantile LASSO performance when estimating different quantile levels (the results are summarized in Table 1 and illustrated in Figures 2 and 3). From the asymptotic point of view, the model with $\lambda_{AS}$ outperforms the model with two change-points (the model with the regularization parameter $\lambda_{(2)}$): the estimation bias and the MSE quantity are both much smaller for larger sample sizes. The model with $\lambda_{AS}$ selects more than two change-points and therefore allows for a less sparse vector of parameters and, thus, a smaller bias. On the other hand, the model with two change-points is more reliable when detecting the true change-point locations: the model with $\lambda_{AS}$ tends to select more non-zero parameters (change-points) than actually needed. This is, however, a common property of LASSO methods in general, and it could be reduced by adopting, for instance, an adaptive LASSO approach. It is also worth mentioning that the quantile LASSO performs much better when estimating quantile levels close to the median than levels in the tails (see Table 1). Such behavior is, however, expected.

The proposed quantile LASSO method is also compared with the standard LASSO approach and the SMUCE estimator. The quantile LASSO is used to estimate a stepwise conditional median function, while the standard LASSO approach and the SMUCE method estimate the conditional mean instead. However, all error distributions are symmetric about zero, so the conditional median coincides with the center of symmetry, which equals the conditional mean whenever the mean exists, and a mutual comparison of the three methods is straightforward. The results are summarized in Tables 2 and 3.

The performance of all three methods is very similar for normally distributed random errors, but the quantile LASSO (denoted as QLASSO) clearly outperforms the standard LASSO (denoted as SLASSO) and SMUCE in the case of heavy-tailed error distributions (Student's $t$ distribution and the Cauchy distribution). The robust flavor of the quantile LASSO is evident at first glance: while the quantile LASSO performs reasonably and convergence is observed in all scenarios, the standard LASSO fails for non-normal distributions: the estimation bias seems to increase with larger sample sizes and the corresponding variance terms explode. Thus, no convergence can be observed for the standard LASSO estimates. The SMUCE method performs better than the standard LASSO, but it is still outperformed by the quantile LASSO for heavy-tailed distributions (see Figure 4).

The reason for the differences in the reported MSE values among the three standard LASSO models for heavy-tailed distributions in Table 2 becomes clear when considering also Table 3. The standard LASSO models with $\lambda_{AS}$ and $\lambda_{MS}$ heavily overfit the data with respect to the number of detected change-points and, therefore, the bias and variance terms are partly suppressed by the huge number of change-points present in the model. The quantile LASSO, however, seems to perform more reasonably even for heavy-tailed error distributions: the median number of detected change-points is roughly 2 for the quantile LASSO, while the number of change-points for the standard LASSO is very unstable and can range from zero up to the maximum number of parameters (see Table 3 for more details). The parameter vector is less sparse for the models with $\lambda_{AS}$ and $\lambda_{MS}$ and, therefore, the reported bias terms are slightly smaller than for the model with $\lambda_{(2)}$. The robust nature of the proposed quantile LASSO approach is also visualized in Figure 4, where the difference in the estimation of the conditional median/mean by the quantile LASSO, the standard LASSO, and the SMUCE method is obvious. While all three methods perform with roughly the same quality for normally distributed error terms, only the quantile LASSO can handle heavy-tailed distributions, the Cauchy distribution in particular. Unlike the conditional median estimate produced by the quantile LASSO, the conditional mean estimates produced by the standard LASSO approach and SMUCE are completely unrealistic (with huge bias and variability and a too high and unstable number of estimated change-points).

The same holds for the change-point detection performance. If we use the prior knowledge that two change-points (three segments) are to be estimated, then all three methods perform well when the error terms are normally distributed; for the Cauchy distribution, however, the change-points detected by the standard LASSO and SMUCE approaches are far from the true change-point locations, while the quantile LASSO still provides a valid detection.

The behavior of the quantile LASSO estimator observed in the simulation results agrees with the theoretical results proved in Section 3 and with the common knowledge of LASSO performance. First, the LASSO penalty is well known for typically selecting more non-zero coefficients than really needed, which is also confirmed by the simulation study. Second, the estimated parameters are shrunk towards zero and, thus, the estimates tend to underestimate the underlying structure, introducing a systematic bias, which is also observed in the simulation study.

5 Conclusion and Final Remarks

In this paper we proposed the quantile LASSO estimator and investigated its main theoretical and empirical properties. The quantile LASSO is robust with respect to outlying observations and heavy-tailed random error distributions: under heavy-tailed error distributions it clearly outperforms the standard LASSO method both in the estimation of the unknown underlying structure and in the detection of the unknown change-point locations.

From the theoretical point of view, the main advantage of the proposed method lies in much weaker distributional assumptions: the quantile LASSO performance does not rely on the normal or sub-Gaussian distributions which are typically required for the standard LASSO approach and, moreover, a more complete insight into the data can be obtained by estimating an arbitrary quantile rather than the mean value only. Another convenient property of the proposed method is that, instead of proving its oracle properties or sign consistency, which would require strong assumptions on the design matrix, we show its performance with respect to change-point detection; therefore, only mild assumptions are required and the method is, in general, more widely applicable.

The simulation study confirms the theoretical results and markedly emphasizes the robust nature of the quantile LASSO estimator.

Acknowledgement

The work was partially supported by a bilateral grant between France and the Czech Republic provided by the PHC Barrande 2017 grant of Campus France (CG, grant number 38105NM) and the Ministry of Education, Youth and Sports in the Czech Republic (MM, Mobility grant 7AMB17FR030).

A APPENDIX: Proofs

A.1 Auxiliary lemmas and their proofs

In this section we state three lemmas which are crucial for proving the results from Section 3. The first lemma is a direct consequence of the Karush-Kuhn-Tucker (KKT) optimality conditions. It is useful not only for proving the asymptotic behavior of the estimator of the number of change-points, but also for deriving the properties of the change-point location estimators given by the sequence $\widehat{t}_{1}<\widehat{t}_{2}<\dots<\widehat{t}_{|\widehat{\cal A}_{n}|}$.

Lemma A.1

For the model described in (1) and any solution $\widehat{\boldsymbol{\beta}}^{n}\in\mathbb{R}^{n}$ of the minimization problem in (6), it holds, with probability one, for any $n\in\mathbb{N}$ and $\lambda_{n}>0$, that

[TABLE]

and

[TABLE]

with $\widehat{\alpha}_{l}\equiv 1\!\!1_{\{\widehat{u}_{\widehat{t}_{l}}>\widehat{u}_{\widehat{t}_{l}-1}\}}-1\!\!1_{\{\widehat{u}_{\widehat{t}_{l}}\leq\widehat{u}_{\widehat{t}_{l}-1}\}}$ .

**Proof of Lemma A.1.** By the Karush-Kuhn-Tucker (KKT) optimality conditions, we have, for all $j\in\widehat{\cal A}_{n}$, that

[TABLE]

Taking into account the form of $\mathbb{X}_{n}$ , we obtain the relation in (13). Similarly we also obtain the relation in (14). $\blacksquare$

Lemma A.2

Let $A$ and $B$ be two random variables and let $x>0$ be a real number such that $\mathbb{P}[|A+B|\leq x]=1$. Then, for any constant $v>1$, we have that

[TABLE]

**Proof of Lemma A.2.** Obviously, it holds that $1=\mathbb{P}\big{[}x\geq\frac{|A|}{v}\big{]}+\mathbb{P}\big{[}x<\frac{|A|}{v}\big{]}$. The inequality $|A+B|\geq|A|-|B|$ implies that $\mathbb{P}[|A+B|\leq x]\leq\mathbb{P}[|A|-|B|\leq x]$. Then, using the fact that $\mathbb{P}[|A+B|\leq x]=1$, we can write $\mathbb{P}\big{[}x<\frac{|A|}{v}\big{]}=\mathbb{P}\big{[}\big{\{}x<\frac{|A|}{v}\big{\}}\cap\big{\{}|A+B|\leq x\big{\}}\big{]}\leq\mathbb{P}\big{[}\big{\{}x<\frac{|A|}{v}\big{\}}\cap\big{\{}|A|-|B|\leq x\big{\}}\big{]}\leq\mathbb{P}\big{[}|B|\geq\frac{v-1}{v}|A|\big{]}$, where the last inequality holds because, on the intersection, $|B|\geq|A|-x>|A|-\frac{|A|}{v}$. The lemma follows. $\blacksquare$
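Lemma A.2 rests on the event inclusion $\{|A|/v>x\}\cap\{|A+B|\leq x\}\subseteq\{|B|\geq\frac{v-1}{v}|A|\}$. The following Monte Carlo sketch constructs a hypothetical pair with $|A+B|\leq x$ almost surely ($A$ Gaussian and $B=-A+xU$ with $U$ uniform on $[-1,1]$; these choices are illustrative only) and checks the inclusion sample by sample, which implies the inequality between the empirical probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)
x, v, m = 1.0, 3.0, 100_000

A = rng.normal(scale=5.0, size=m)
B = -A + x * rng.uniform(-1.0, 1.0, size=m)   # ensures |A + B| <= x (up to rounding)

event_left = np.abs(A) / v > x                # {|A| / v > x}
event_right = np.abs(B) >= (v - 1) / v * np.abs(A)

p_left, p_right = event_left.mean(), event_right.mean()
print(p_left, p_right)                        # empirical P[|A|/v > x] <= P[|B| >= (v-1)|A|/v]
```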

Lemma A.3

Let $\{v_{n}\}$ and $\{x_{n}\}$ be two positive sequences such that $v_{n}x^{2}_{n}(\log n)^{-1}{\underset{n\rightarrow\infty}{\longrightarrow}}\infty$. Then, under Assumption (A2) imposed on the error terms $\{\varepsilon_{i}\}_{1\leqslant i\leqslant n}$, we have

[TABLE]

for $F$ being the distribution function of $\varepsilon_{i}$ .

**Proof of Lemma A.3.** Firstly, we have that

[TABLE]

By the Dvoretzky-Kiefer-Wolfowitz inequality (see [9]) applied to the independent Bernoulli random variables $1\!\!1_{\{\varepsilon_{i}\leq t\}}$, we obtain, for all $\epsilon>0$, that

[TABLE]

Then, taking into account (A.1) and the fact that $v_{n}x^{2}_{n}(\log n)^{-1}{\underset{n\rightarrow\infty}{\longrightarrow}}\infty$ , we also obtain that

[TABLE]

which proves the assertion of Lemma A.3. $\blacksquare$
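The Dvoretzky-Kiefer-Wolfowitz inequality, in the form with constant 2 established by Massart, states that $\mathbb{P}[\sup_{t}|F_{n}(t)-F(t)|>\epsilon]\leq 2e^{-2n\epsilon^{2}}$ for the empirical distribution function $F_{n}$ of $n$ i.i.d. observations with continuous $F$. The sketch below compares this bound with a Monte Carlo estimate for uniform samples (an illustrative choice), where the supremum can be computed exactly from the order statistics. The bound is nearly tight for these values, so the comparison in the check allows for Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps, eps = 100, 4000, 0.1

U = np.sort(rng.random((reps, n)), axis=1)   # uniform samples, so F(t) = t on [0, 1]
i = np.arange(1, n + 1)
# sup_t |F_n(t) - t| is attained at the jumps of F_n (the order statistics):
D = np.maximum((i / n - U).max(axis=1), (U - (i - 1) / n).max(axis=1))

freq = np.mean(D > eps)
bound = 2 * np.exp(-2 * n * eps ** 2)
print(f"P[D_n > {eps}] ~ {freq:.3f}, DKW bound = {bound:.3f}")
```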

A.2 Proofs of Theorems

In this section we prove the main results formulated in the three theorems of Section 3.

**Proof of Theorem 3.1.** Let us start by defining two random events, for any $k=1,\cdots,K^{*}$:

[TABLE]

By Assumption (A5), since $K^{*}<\infty$ , the theorem is proved if we show that for any $k=1,\cdots,K^{*}$ , it holds that

[TABLE]

In order to prove the relation in (16), we suppose that the random event $V_{n,k}$ occurs. The event $V_{n,k}$, for any $k=1,\cdots,K^{*}$, can also be expressed as $V_{n,k}=\big{(}V_{n,k}\cap W_{n}\big{)}\cup\big{(}V_{n,k}\cap\overline{W}_{n}\big{)}$, with $\overline{W}_{n}$ being the complement of $W_{n}$.

If event $V_{n,k}$ occurs, without any loss of generality, we can assume that $t^{*}_{k}-\widehat{t}_{k}\geq[n\delta_{n}]$ . The opposite case for $\widehat{t}_{k}-t^{*}_{k}\geq[n\delta_{n}]$ follows similarly. We now consider two steps for proving (16): firstly, we study $\mathbb{P}[V_{n,k}\cap W_{n}]$ and, later, we focus on $\mathbb{P}[V_{n,k}\cap\overline{W}_{n}]$ .

Step 1. We will show that for any $k=1,\cdots,K^{*}$ , it holds that

[TABLE]

Let us start by considering the relation in (14), for $j=t^{*}_{k}$ ,

[TABLE]

and the relation in (13), for $l=k$ :

[TABLE]

where we assume, without any loss of generality, that $t^{*}_{k-1}\leq\widehat{t}_{k}\leq t^{*}_{k}$ . Thus, we have

[TABLE]

Next, we apply the following general result: for any $a,b,c\in\mathbb{R}$ , such that

[TABLE]

Using (19) for $a=\Big{[}\tau(t^{*}_{k}-\widehat{t}_{k})-\sum^{t^{*}_{k}-1}_{i=\widehat{t}_{k}}1\!\!1_{\{Y_{i}\leq\widehat{u}_{i}\}}\Big{]}$ , $b=\Big{[}\tau(n-t^{*}_{k})-\sum^{n}_{i=t^{*}_{k}}1\!\!1_{\{Y_{i}\leq\widehat{u}_{i}\}}\Big{]}$ , and $c=n\lambda_{n}$ , we have that event $U_{n,k}$ occurs with probability 1, where

[TABLE]

Next, we use Lemma A.2, for $x=2n\lambda_{n}$ , some constant $v$ such that $\displaystyle{v>\frac{\max(\tau,F(\mu^{*}_{k+1}-\mu^{*}_{k}))}{|\tau-F(\mu^{*}_{k+1}-\mu^{*}_{k})|}}$ and random variables $A$ and $B$ defined as follows:

•

if $\tau<F(\mu^{*}_{k+1}-\mu^{*}_{k})$ , then $A=\sum^{t^{*}_{k}-1}_{i=\widehat{t}_{k}}1\!\!1_{\{Y_{i}\leq\widehat{u}_{i}\}}$ , $B=\tau(t^{*}_{k}-\widehat{t}_{k})$ ;

•

if $\tau>F(\mu^{*}_{k+1}-\mu^{*}_{k})$ , then $A=\tau(t^{*}_{k}-\widehat{t}_{k})$ , $B=\sum^{t^{*}_{k}-1}_{i=\widehat{t}_{k}}1\!\!1_{\{Y_{i}\leq\widehat{u}_{i}\}}$ .

Then, for the probability $\mathbb{P}[V_{n,k}\cap W_{n}]$ we obtain

[TABLE]

with ${\cal P}_{1}\equiv\mathbb{P}\big{[}\big{\{}\frac{|A|}{v}\leq x\big{\}}\cap V_{n,k}\cap W_{n}\big{]}$ and ${\cal P}_{2}\equiv\mathbb{P}\big{[}\big{(}|B|\geq\frac{v-1}{v}|A|\big{)}\cap V_{n,k}\cap W_{n}\big{]}$, and we distinguish two cases: either $\tau<F(\mu^{*}_{k+1}-\mu^{*}_{k})$ or $\tau>F(\mu^{*}_{k+1}-\mu^{*}_{k})$.

We start with the situation for which $\tau<F(\mu^{*}_{k+1}-\mu^{*}_{k})$ . We consider the first term in (20) where we have ${\cal P}_{1}=\mathbb{P}\bigg{[}\bigg{\{}\sum^{t^{*}_{k}-1}_{i=\widehat{t}_{k}}1\!\!1_{\{\varepsilon_{i}+\mu^{*}_{k}<\widehat{\mu}_{k+1}\}}\leq 2n\lambda_{n}v\bigg{\}}\cap V_{n,k}\cap W_{n}\bigg{]}$ , with the constant $v$ , such that $v>\frac{F(\mu^{*}_{k+1}-\mu^{*}_{k})}{F(\mu^{*}_{k+1}-\mu^{*}_{k})-\tau}$ .

Under Assumptions (A2), (A5), and (A6), by applying Theorem 2 of [10], we obtain that the relation in (8) holds. Then, we have that

[TABLE]

which implies that there exists a constant $c_{1}>0$, not depending on $n$, such that

[TABLE]

Next, we recall two general results, which are needed to complete the proof:

(i) Let $X$ and $Z$ be two real random variables and $x\in\mathbb{R}$ . Then the following holds:

(i1) If $\mathbb{P}[Z\geq x]=1$ , then $1\!\!1_{\{X\leq Z\}}\geq 1\!\!1_{\{X\leq x\}}$ with probability 1.

(i2) If $\mathbb{P}[Z\leq x]=1$ , then $1\!\!1_{\{X\leq Z\}}\leq 1\!\!1_{\{X\leq x\}}$ with probability 1.

(ii) Let $S_{1}$ and $S_{2}$ be two real random variables such that $S_{1}\leq S_{2}$ with probability one. Then for any $x\in\mathbb{R}$ we have that $\mathbb{P}[S_{1}\leq x]\geq\mathbb{P}[S_{2}\leq x]$ .

Using now relation (i1) together with (22), we have, with probability converging to 1,

[TABLE]

Using this last inequality together with (ii), we obtain for ${\cal P}_{1}$ , that

[TABLE]

where for the last inequality we used the fact that, on $V_{n,k}$, $t^{*}_{k}-\widehat{t}_{k}\geq[n\delta_{n}]$, so the bound is attained for the smallest possible value $\widehat{t}_{k}=t^{*}_{k}-[n\delta_{n}]$, with $[n\delta_{n}]$ being the integer part of $n\delta_{n}$.

On the event $V_{n,k}\cap W_{n}$ we have that $n\delta_{n}<t^{*}_{k}-\widehat{t}_{k}<{I^{*}_{min}}/{2}$. By Assumption (A5) and the Strong Law of Large Numbers for the independent $\varepsilon_{i}$, we obtain

[TABLE]

Since Assumption (A2) gives $f(x)>0$, and hence $F(x)>0$, for all $x\in\mathbb{R}$, there exists a constant $C>0$ such that $F\big{(}\mu^{*}_{k+1}-\mu^{*}_{k}-c_{1}\sqrt{\frac{\log n}{n}}\big{)}>C$. Thus, there also exists a positive constant $\tilde{C}>0$ such that

[TABLE]

with probability converging to one as $n$ tends to infinity. Taking into account Assumption (A4), we finally get

[TABLE]

Analogously, for ${\cal P}_{2}=\mathbb{P}\Big{[}\big{(}\tau(t^{*}_{k}-\widehat{t}_{k})\geq\frac{v-1}{v}\sum^{t^{*}_{k}-1}_{i=\widehat{t}_{k}}1\!\!1_{\{\varepsilon_{i}\leq\widehat{\mu}_{k+1}-\mu^{*}_{k}\}}\big{)}\cap V_{n,k}\cap W_{n}\Big{]}$ , we have

[TABLE]

Using Lemma A.3 for $v_{n}=[n\delta_{n}]$ and $x_{n}=\big{|}\frac{\tau v}{v-1}-F(\mu^{*}_{k+1}-\mu^{*}_{k})\big{|}$ , and due to Assumption (A4) where we have $\delta_{n}/\lambda_{n}\rightarrow\infty$ , we get that

[TABLE]

Thus, we obtain

[TABLE]

Here we used the fact that $F\Big{(}\mu^{*}_{k+1}-\mu^{*}_{k}-c_{1}\sqrt{\frac{\log n}{n}}\Big{)}\rightarrow F(\mu^{*}_{k+1}-\mu^{*}_{k})$ as $n\rightarrow\infty$, and that ${\tau v}(v-1)^{-1}-F(\mu^{*}_{k+1}-\mu^{*}_{k})<0$. Finally, we have

[TABLE]

and combining relations (20), (23), and (24), we get that (17) holds true.

Let us now focus on the second case, where $\tau>F(\mu^{*}_{k+1}-\mu^{*}_{k})$ . For ${\cal P}_{1}$ , we can write

[TABLE]

So, we only need to deal with ${\cal P}_{2}=\mathbb{P}\big{[}\big{\{}\frac{1}{t^{*}_{k}-\widehat{t}_{k}}\sum^{t^{*}_{k}-1}_{i=\widehat{t}_{k}}1\!\!1_{\{\varepsilon_{i}\leq\widehat{\mu}_{k+1}-\mu^{*}_{k}\}}\geq\frac{v-1}{v}\tau\big{\}}\cap V_{n,k}\cap W_{n}\big{]}$ . Using (i2) together with (22), we have that

[TABLE]

with probability converging to 1. Thus, by (ii) we obtain

[TABLE]

By Lemma A.3 we obtain

[TABLE]

and, since $\frac{v-1}{v}\tau-F(\mu^{*}_{k+1}-\mu^{*}_{k})>0$ , we also have

[TABLE]

Combining the last expression with (25) and (A.2), we conclude that (17) also holds in the case $\tau>F(\mu^{*}_{k+1}-\mu^{*}_{k})$.

Step 2. We now study the probability $\mathbb{P}[V_{n,k}\cap\overline{W}_{n}]$, where $\overline{W}_{n}\equiv\left\{\max_{1\leqslant j\leqslant K^{*}}|\widehat{t}_{j}-t^{*}_{j}|>\frac{I^{*}_{min}}{2}\right\}$. Using the same notation as in [13], we consider the following three random events:

[TABLE]

Then $\mathbb{P}[V_{n,k}\cap\overline{W}_{n}]=\mathbb{P}[V_{n,k}\cap D_{n}^{(l)}]+\mathbb{P}[V_{n,k}\cap D_{n}^{(m)}]+\mathbb{P}[V_{n,k}\cap D_{n}^{(r)}]$, and we treat each term on the right-hand side separately. For $\mathbb{P}[V_{n,k}\cap D_{n}^{(m)}]$ we have

[TABLE]

Here $t^{*}_{k-1}<\widehat{t}_{k}<t^{*}_{k}<\widehat{t}_{k+1}<t^{*}_{k+2}$, and applying (14) with $j=t^{*}_{k}$ and (13) with $l=k$, we obtain that

[TABLE]

On the other hand, using (14) for $j=t^{*}_{k}$ and (13) for $l=k+1$ , we also get that

[TABLE]

Therefore,

[TABLE]

Since $\mu^{*}_{k}$ and $\mu^{*}_{k+1}$ do not depend on $n$ and $\mu^{*}_{k}\neq\mu^{*}_{k+1}$, at least one of the differences $\widehat{\mu}_{k+1}-\mu^{*}_{k}$ or $\widehat{\mu}_{k+1}-\mu^{*}_{k+1}$ does not converge to 0 as $n\rightarrow\infty$. Suppose it is $\widehat{\mu}_{k+1}-\mu^{*}_{k}$. Then

[TABLE]

As in Step 1, we obtain that the last probability converges to 0 as $n\rightarrow\infty$. Analogously, we can show that $\lim_{n\rightarrow\infty}\mathbb{P}[\{t^{*}_{i}-\widehat{t}_{i}\geq I^{*}_{min}/2\}\cap\{\widehat{t}_{i+1}-t^{*}_{i}\geq I^{*}_{min}/2\}\cap D_{n}^{(m)}]=0$ for any $i=k+1,\cdots,K^{*}$, and since $K^{*}$ is bounded, we obtain that $\lim_{n\rightarrow\infty}\mathbb{P}[V_{n,k}\cap D_{n}^{(m)}]=0.$

As in [13], for $\mathbb{P}[D_{n}^{(l)}]$ we have
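The existence of a non-vanishing difference in the argument above is a triangle-inequality step. Written out, for every realization,

$$
|\mu^{*}_{k+1}-\mu^{*}_{k}|\leq|\widehat{\mu}_{k+1}-\mu^{*}_{k}|+|\widehat{\mu}_{k+1}-\mu^{*}_{k+1}|\leq 2\max\big(|\widehat{\mu}_{k+1}-\mu^{*}_{k}|,\,|\widehat{\mu}_{k+1}-\mu^{*}_{k+1}|\big).
$$

Hence, in absolute value, at least one of the two differences is at least $|\mu^{*}_{k+1}-\mu^{*}_{k}|/2>0$. This bound does not depend on $n$.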

[TABLE]

Repeating the same arguments as above, we can also show that $\lim_{n\rightarrow\infty}\mathbb{P}[D_{n}^{(l)}]=0$ and $\lim_{n\rightarrow\infty}\mathbb{P}[D_{n}^{(r)}]=0$. Therefore, also $\lim_{n\rightarrow\infty}\mathbb{P}[V_{n,k}\cap D_{n}^{(l)}]=0$ and $\lim_{n\rightarrow\infty}\mathbb{P}[V_{n,k}\cap D_{n}^{(r)}]=0$. Putting everything together, we have $\mathbb{P}\left[V_{n,k}\cap\overline{W}_{n}\right]{\underset{n\rightarrow\infty}{\longrightarrow}}0$, which completes the proof. $\blacksquare$

Proof of Theorem 3.2.

To prove the theorem, we use the relation in (9) and study the probability

[TABLE]

where we used conditional probabilities, conditioned on the number of estimated jumps $|\widehat{\cal A}_{n}|$. For the first term in (26) we have

[TABLE]

and, by Theorem 3.1, the last probability converges to 0 as $n\rightarrow\infty$. Therefore

[TABLE]

For the second term in (26) we have

[TABLE]

where, using the same notation as in [13], the random events $E_{n,k,K,1}$, $E_{n,k,K,2}$, and $E_{n,k,K,3}$ are defined as follows:

[TABLE]

Let us start with $\mathbb{P}[E_{n,k,K,1}]$. Since $\mu^{*}_{K^{*}}\neq\mu^{*}_{K^{*}+1}$, we can deduce from relation (8) and Assumption (A1) the following. For fixed $K$, the only way for $E_{n,k,K,1}$ to occur with probability not converging to zero as $n\rightarrow\infty$ is $k=K^{*}$ and $t^{*}_{K^{*}-1}<\widehat{t}_{K}<t^{*}_{K^{*}}$. Therefore, we study the random event $E_{n,K^{*},K,1}=\bigg{\{}\big{\{}t^{*}_{K^{*}}-\widehat{t}_{K}>n\delta_{n}\big{\}}\cap\big{\{}t^{*}_{K^{*}-1}<\widehat{t}_{K}<t^{*}_{K^{*}}\big{\}}\cap\big{\{}\widehat{t}_{K}-t^{*}_{K^{*}-1}\geq n\delta_{n}\big{\}}\bigg{\}}$.

Applying relation (14) of Lemma A.1 with $j=t^{*}_{K^{*}}$, we have

[TABLE]

and, analogously, using relation (13) with $l=K$, we obtain

[TABLE]

Next, the expression in (30) can also be rewritten as

[TABLE]

and we can use property (19) with $a=\tau(t^{*}_{K^{*}}-\widehat{t}_{K})-\sum^{t^{*}_{K^{*}}-1}_{i=\widehat{t}_{K}}1\!\!1_{\{Y_{i}<\widehat{u}_{i}\}}$, $b=\tau(n-t^{*}_{K^{*}})-\sum^{n}_{i=t^{*}_{K^{*}}}1\!\!1_{\{Y_{i}<\widehat{u}_{i}\}}$, and $c=n\lambda_{n}$. Then, taking into account (29), (31), and (19), we have, with probability one, that

[TABLE]

By Lemma A.2, applied with $x=2n\lambda_{n}$, with some constant $v$ such that $\displaystyle{v>\frac{\max(\tau,F(\mu^{*}_{K^{*}+1}-\mu^{*}_{K^{*}}))}{|\tau-F(\mu^{*}_{K^{*}+1}-\mu^{*}_{K^{*}})|}}$, and with random variables $A$ and $B$ defined as

• $A=\sum^{t^{*}_{K^{*}}-1}_{i=\widehat{t}_{K}}1\!\!1_{\{Y_{i}\leq\widehat{u}_{i}\}}$ and $B=\tau(t^{*}_{K^{*}}-\widehat{t}_{K})$, if $\tau<F(\mu^{*}_{K^{*}+1}-\mu^{*}_{K^{*}})$;

• $A=\tau(t^{*}_{K^{*}}-\widehat{t}_{K})$ and $B=\sum^{t^{*}_{K^{*}}-1}_{i=\widehat{t}_{K}}1\!\!1_{\{Y_{i}\leq\widehat{u}_{i}\}}$, if $\tau>F(\mu^{*}_{K^{*}+1}-\mu^{*}_{K^{*}})$,

we have,

[TABLE]

To show that ${\cal P}_{1,K^{*}}\underset{n\rightarrow\infty}{\longrightarrow}0$, we can use the same argument as for the probability in (23). Similarly, to show that ${\cal P}_{2,K^{*}}\underset{n\rightarrow\infty}{\longrightarrow}0$, we proceed as in (24). Finally, by Assumption (A5), we have

[TABLE]
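The lower bound on $v$ in the application of Lemma A.2 is exactly what makes the sign conditions of Step 1 in the previous proof hold:

- for $\tau<F(\cdot)$, the condition $\tau v(v-1)^{-1}-F(\cdot)<0$ rearranges to $v>F(\cdot)/(F(\cdot)-\tau)$;
- for $\tau>F(\cdot)$, the condition $\frac{v-1}{v}\tau-F(\cdot)>0$ rearranges to $v>\tau/(\tau-F(\cdot))$.

A small Python check sets $v$ to a multiple of the bound and tests the relevant inequality. In it, `tau` and the hypothetical `F_jump` are placeholders for $\tau$ and $F(\mu^{*}_{K^{*}+1}-\mu^{*}_{K^{*}})$, assumed to lie in $(0,1)$ and to differ.

```python
def sign_condition(tau, F_jump, margin):
    """Return True if the relevant sign condition holds for v = margin * bound.

    tau    : quantile level in (0, 1)
    F_jump : placeholder for F(mu*_{K*+1} - mu*_{K*}), in (0, 1), != tau
    margin : factor applied to the lower bound max(tau, F) / |tau - F|
    """
    bound = max(tau, F_jump) / abs(tau - F_jump)  # always >= 1 here
    v = margin * bound
    if tau < F_jump:
        # case tau < F: need tau * v / (v - 1) - F < 0
        return tau * v / (v - 1) - F_jump < 0
    # case tau > F: need (v - 1) / v * tau - F > 0
    return (v - 1) / v * tau - F_jump > 0


grid = [i / 20 for i in range(1, 20)]
# Slightly above the bound, the condition holds on the whole grid.
print(all(sign_condition(t, f, 1.01) for t in grid for f in grid if t != f))  # True
# Slightly below the bound, it can fail.
print(sign_condition(0.3, 0.6, 0.99), sign_condition(0.7, 0.4, 0.99))  # False False
```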

Next, we consider $E_{n,k,K,2}$. Again, the only way for $E_{n,k,K,2}$ to occur with probability not converging to zero is $k=1$ and $t^{*}_{1}<\widehat{t}_{1}<t^{*}_{2}$. Therefore, we only need to focus on $E_{n,1,K,2}=\bigg{\{}\big{\{}t^{*}_{2}-\widehat{t}_{1}>n\delta_{n}\big{\}}\cap\big{\{}t^{*}_{1}<\widehat{t}_{1}<t^{*}_{2}\big{\}}\cap\big{\{}\widehat{t}_{1}-t^{*}_{1}\geq n\delta_{n}\big{\}}\bigg{\}}$. Applying Lemma A.1 with $j=t^{*}_{2}$ and $l=1$, we obtain, as before, that

[TABLE]

Finally, we deal with $E_{n,k,K,3}$. We apply Lemma A.1 with the same indices $j$ and $l$ as in the proof of Proposition 4 in [13] and, following the same idea as above, we get that

[TABLE]

Using the relations in (33), (34), and (35), and taking into account the expression in (28), we obtain that

[TABLE]

which, together with (27) and (26), gives

[TABLE]

which also implies the relation in (10). $\blacksquare$

Proof of Theorem 3.3.

Let $\widehat{{\cal T}}_{\widehat{K}}\equiv\{\widehat{t}_{1},\cdots,\widehat{t}_{\widehat{K}}\}$ be the set of change-point location estimates obtained by the quantile LASSO method, so that $|\widehat{\cal A}_{n}|=\widehat{K}$. Let us consider two quantile processes

[TABLE]

where $\boldsymbol{\mu}(K)=(\mu_{1},\dots,\mu_{K+1})^{\top}$ for some fixed $K\in\mathbb{N}$, and the sequences $(\beta_{i})_{1\leqslant i\leqslant n}$ and $(u_{i})_{1\leqslant i\leqslant n}$ are defined in Section 2. We define the quantile LASSO estimator of the $(K+1)$-dimensional vector $\boldsymbol{\mu}(K)$ and of the change-point number $K$ as

[TABLE]

with $\widehat{\boldsymbol{\mu}}(\widehat{K})=\big{(}\widehat{\mu}_{1},\cdots,\widehat{\mu}_{\widehat{K}+1}\big{)}^{\top}$ obtained by estimating $K$ and $\boldsymbol{\mu}(K)$ simultaneously. We also define another estimator of the same vector $\boldsymbol{\mu}(K)=(\mu_{1},\cdots,\mu_{K+1})^{\top}$, this time for a fixed $K\in\mathbb{N}$, as

[TABLE]

where $\overset{\vee}{\boldsymbol{\mu}}(K)=\big{(}\overset{\vee}{\mu}_{1},\cdots,\overset{\vee}{\mu}_{K+1}\big{)}^{\top}$. The assertion of the theorem follows if we show that, under the supposition $\widehat{K}<K^{*}$, we have

[TABLE]

For $\widehat{K}<K^{*}$, consider the difference

[TABLE]

where the $(K^{*}+1)$-vector of true values $(\mu^{*}_{k})_{1\leqslant k\leqslant K^{*}+1}$ is denoted by $\boldsymbol{\mu}^{*}$. To study the difference $D_{1}$, we rewrite it as a sum of two terms

[TABLE]

and, similarly, the difference $D_{2}$ can be further rewritten as

[TABLE]

We start with the difference $D_{2}$, first focusing on $D_{2,2}$ and then on $D_{2,1}$. Using the inequality $\big{|}|a|-|b|\big{|}\leq|a-b|$, Assumption (A5), and the relation in (21), we have

[TABLE]

On the other hand, from [18], we have for any $x,y\in\mathbb{R}$ that

[TABLE]

Using this relation for $x=\varepsilon_{i}$ and $y=C\sqrt{\frac{\log n}{n}}$ , we obtain that $D_{2,1}$ can be expressed as

[TABLE]

By the Central Limit Theorem for i.i.d. Bernoulli random variables, we obtain $\sum^{n}_{i=1}(1\!\!1_{\{\varepsilon_{i}<0\}}-\tau)=O_{\mathbb{P}}(n^{1/2})$ and, thus, $D_{2,11}=O_{\mathbb{P}}\big{(}\sqrt{\log n}\big{)}$.

For $D_{2,12}$, we use the following identity: for all $a>0$ (the case $a<0$ is analogous), $I\!\!E[\int^{a}_{0}(1\!\!1_{\{\varepsilon<t\}}-1\!\!1_{\{\varepsilon<0\}})dt]=\int^{a}_{0}I\!\!E[1\!\!1_{\{0<\varepsilon<t\}}]dt=\int^{a}_{0}[F(t)-F(0)]dt$. For $t$ in a neighborhood of zero, we can write $F(t)-F(0)=tf(\tilde{t})$ for some $\tilde{t}\in(0,t)$. Using the fact that $f(t)>0$ for all $t\in\mathbb{R}$, which follows from Assumption (A2), we have:

[TABLE]

Similarly, we obtain that the variance of $D_{2,12}$ is $C_{+}K^{*}\log n$. Thus, by the Bienaymé-Tchebychev inequality applied to ${n}^{-1}D_{2,12}$, we have, with probability converging to 1, that $D_{2,12}=C_{+}K^{*}\log n$ and $D_{2,1}=D_{2,12}\big{(}1+o_{\mathbb{P}}(1)\big{)}=C_{+}K^{*}\log n$. Then, also taking into account relation (38) and Assumption (A6), we obtain
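The decomposition of $D_{2,1}$ into $D_{2,11}$ and $D_{2,12}$ is driven by the relation from [18]. The sketch below assumes that this relation is Knight's identity in the form

$$\rho_{\tau}(x-y)-\rho_{\tau}(x)=y\,(1\!\!1_{\{x<0\}}-\tau)+\int_{0}^{y}(1\!\!1_{\{x<t\}}-1\!\!1_{\{x<0\}})\,dt,$$

with the usual check function $\rho_{\tau}(x)=x(\tau-1\!\!1_{\{x<0\}})$. This form is consistent with the terms $1\!\!1_{\{\varepsilon_{i}<0\}}-\tau$ and $1\!\!1_{\{\varepsilon<t\}}-1\!\!1_{\{\varepsilon<0\}}$ appearing in $D_{2,11}$ and $D_{2,12}$. The Python check below does two things:

- it evaluates both sides on a grid that includes $x=0$, where strict inequalities in both indicators keep the identity exact;
- it cross-checks the closed form of the integral against a midpoint rule.

```python
import math


def check_loss(x, tau):
    """Check function rho_tau(x) = x * (tau - 1{x < 0})."""
    return x * (tau - (1.0 if x < 0 else 0.0))


def knight_integral(x, y):
    """Closed form of the signed integral int_0^y (1{x < t} - 1{x < 0}) dt."""
    if y >= 0:
        return max(0.0, y - x) if x >= 0 else 0.0
    return max(0.0, x - y) if x < 0 else 0.0


def riemann_integral(x, y, steps=20_000):
    """Midpoint-rule value of the same integral, to cross-check the closed form."""
    h = y / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += (1.0 if x < t else 0.0) - (1.0 if x < 0 else 0.0)
    return total * h


def knight_holds(x, y, tau, tol=1e-12):
    """Compare both sides of the assumed form of Knight's identity."""
    lhs = check_loss(x - y, tau) - check_loss(x, tau)
    rhs = y * ((1.0 if x < 0 else 0.0) - tau) + knight_integral(x, y)
    return math.isclose(lhs, rhs, abs_tol=tol)


grid = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
print(all(knight_holds(x, y, tau) for tau in (0.1, 0.5, 0.9) for x in grid for y in grid))  # True
print(abs(knight_integral(-0.7, -1.5) - riemann_integral(-0.7, -1.5)) < 1e-3)  # True
```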

[TABLE]

with $C_{+}^{(3)}>0$ bounded and not depending on $n$ .
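The expansion $F(t)-F(0)=tf(\tilde{t})$ behind the bound for $D_{2,12}$ implies $\int_{0}^{a}[F(t)-F(0)]\,dt\approx f(0)a^{2}/2$ for small $|a|$. This quantity is positive for either sign of $a$. The Python check below assumes standard Cauchy errors as a heavy-tailed stand-in, so that $F(0)=1/2$ and $f(0)=1/\pi$. It compares a midpoint-rule value of the integral with this quadratic approximation. Heuristically, with $a=C\sqrt{\log n/n}$ each term is of order $\log n/n$, so a sum of $n$ such terms is of order $\log n$. This agrees with the order of $D_{2,12}$ obtained above.

```python
import math


def cauchy_cdf(t):
    """CDF of the standard Cauchy distribution: F(0) = 1/2, f(0) = 1/pi."""
    return 0.5 + math.atan(t) / math.pi


def expected_increment(a, steps=10_000):
    """Midpoint-rule value of the signed integral int_0^a [F(t) - F(0)] dt."""
    h = a / steps
    f0 = cauchy_cdf(0.0)
    return h * sum(cauchy_cdf((k + 0.5) * h) - f0 for k in range(steps))


def quadratic_approx(a):
    """Second-order approximation f(0) * a**2 / 2 with f(0) = 1/pi."""
    return a * a / (2.0 * math.pi)


# The ratio approaches 1 as |a| shrinks.
for a in (1.0, 0.1, 0.01, -0.01):
    print(a, expected_increment(a) / quadratic_approx(a))
```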

Finally, we study $D_{1}$ from (37). Recall that $\widehat{\cal T}_{\widehat{K}}\equiv\{\widehat{t}_{1},\cdots,\widehat{t}_{\widehat{K}}\}$. Then $D_{1,2}$ can be rewritten as $D_{1,2}=n\lambda_{n}\big{[}{\cal R}(\widehat{\cal T}_{\widehat{K}})-{\cal R}({\cal T}^{*})\big{]}$, with

[TABLE]

Thus, since $\widehat{\mu}_{k}$ is bounded for all $k$ , and since $\widehat{K}<K^{*}$ , we have $D_{1,2}=O_{\mathbb{P}}(K^{*}(n\lambda_{n}))=O_{\mathbb{P}}(n\lambda_{n})$ .

We now study $D_{1,1}$. To this end, consider also the following sets and vector:

- ${\cal T}^{*}\equiv\{t^{*}_{1},\cdots,t^{*}_{K^{*}}\}$;
- $\overset{\sim}{\cal T}\equiv\widehat{\cal T}_{\widehat{K}}\cup{\cal T}^{*}=\{\overset{\sim}{t}_{1},\cdots,\overset{\sim}{t}_{|\overset{\sim}{\cal T}|}\}$;
- $\overset{\sim}{\boldsymbol{\mu}}(|\overset{\sim}{\cal T}|)\equiv\mathop{\mathrm{arg\,min}}_{\boldsymbol{\mu}(|\overset{\sim}{\cal T}|)\in\mathbb{R}^{|\overset{\sim}{\cal T}|+1}}S\big{(}\boldsymbol{\mu}(|\overset{\sim}{\cal T}|),|\overset{\sim}{\cal T}|\big{)}=\big{(}\overset{\sim}{\mu}_{1},\cdots,\overset{\sim}{\mu}_{|\overset{\sim}{\cal T}|+1}\big{)}$.

Then we can write

[TABLE]

For a better illustration, when dealing with $D_{1,11}$ we take the particular case $K^{*}=2$ and $K=1$ (see the illustration in Figure 5). The other cases are similar, but more tedious to write out.

We start by expressing $D_{1,11}$ as a sum of three terms where

[TABLE]

For $D_{1,111}$ , since $|\widehat{\mu}_{1}-\mu^{*}_{1}|=O_{\mathbb{P}}(|\widetilde{\mu}_{1}-\mu^{*}_{1}|)=O_{\mathbb{P}}\bigg{(}\sqrt{\frac{\log n}{n}}\bigg{)}$ , we have,

[TABLE]

Then, as for $D_{2,12}$, we have $D_{1,111}=O_{\mathbb{P}}\left(I^{*}_{max}n^{-1}\log n\right)$. For $D_{1,112}$, the estimator $\widehat{\mu}_{2}$ differs from at least one of the true values $\mu^{*}_{2}$ or $\mu^{*}_{3}$. Suppose that it differs from $\mu^{*}_{2}$. Then, since $\mu^{*}_{k}$ does not depend on $n$, for all $\epsilon>0$ there exists some constant $C>0$ such that $\mathbb{P}\big{[}|\widehat{\mu}_{2}-\mu^{*}_{2}|>C\big{]}>1-\epsilon$.

Let us now define ${\cal D}\equiv\rho_{\tau}(\varepsilon_{i}+\mu^{*}_{2}-\mu^{*}_{2}+C_{(+)})-\rho_{\tau}(\varepsilon_{i}+\mu^{*}_{2}-\mu^{*}_{2}+C\sqrt{\frac{\log n}{n}})$, with $C_{(+)}$ a positive constant not depending on $n$. By Assumption (A2), the density $f$ is bounded and $f(x)>0$ for all $x\in\mathbb{R}$; therefore

[TABLE]

with $C_{(++)}>0$ some positive constant. Then, with probability converging to 1 as $n\rightarrow\infty$, we also have $D_{1,112}\geq C_{+}^{(1)}I^{*}_{min}$ for some $C_{+}^{(1)}>0$ not depending on $n$.

If $\widehat{\mu}_{3}-\overset{\sim}{\mu}_{3}=O_{\mathbb{P}}\left(\sqrt{\frac{\log n}{n}}\right)$, then $D_{1,113}$ is handled in the same way as $D_{1,111}$. Otherwise, since $\mu^{*}_{k}$ does not depend on $n$, there exists $C>0$ such that for all $\epsilon>0$, $\mathbb{P}\big{[}|\widehat{\mu}_{2}-\widetilde{\mu}_{3}|>C\big{]}>1-\epsilon$. Then $D_{1,113}>0$, and it is handled in the same way as $D_{1,112}$. To conclude, the following holds
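The positive constant $C_{(++)}$ reflects a general property of the check function. Shifting its argument by a fixed amount away from the $\tau$-quantile costs a positive amount in expectation, while a vanishing shift costs almost nothing. The Monte Carlo sketch below rests on two assumptions:

- the errors are standard Cauchy, recentred so that their $\tau$-quantile is zero;
- the shifts are the hypothetical values `c_plus` (fixed) and `c_n` (small).

It estimates the mean of $\rho_{\tau}(\varepsilon+c_{+})-\rho_{\tau}(\varepsilon+c_{n})$. For Cauchy errors the expected check loss itself is infinite. The difference of two losses, however, is bounded by $\max(\tau,1-\tau)|c_{+}-c_{n}|$, so its mean exists.

```python
import math
import random


def check_loss(x, tau):
    """Check function rho_tau(x) = x * (tau - 1{x < 0})."""
    return x * (tau - (1.0 if x < 0 else 0.0))


def mean_loss_gap(c_plus, c_n, tau, m=100_000, seed=2):
    """Monte Carlo mean of rho_tau(eps + c_plus) - rho_tau(eps + c_n).

    eps: standard Cauchy shifted so that its tau-quantile is zero (assumption).
    c_plus: hypothetical fixed shift; c_n: hypothetical vanishing shift.
    """
    rng = random.Random(seed)
    q = math.tan(math.pi * (tau - 0.5))  # tau-quantile of the standard Cauchy
    total = 0.0
    for _ in range(m):
        eps = math.tan(math.pi * (rng.random() - 0.5)) - q
        total += check_loss(eps + c_plus, tau) - check_loss(eps + c_n, tau)
    return total / m


# The gap stays bounded away from zero as the small shift c_n shrinks.
for c_n in (0.1, 0.01, 0.001):
    print(c_n, round(mean_loss_gap(1.0, c_n, tau=0.5), 4))
```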

[TABLE]

with probability converging to 1, as $n\rightarrow\infty$ .

For $D_{1,12}$ , as for $D_{2,1}$ , we have with probability converging to 1, as $n\rightarrow\infty$ , that

[TABLE]

On the other hand, the relation in (11) implies that $I^{*}_{min}/(I^{*}_{max}\frac{\log n}{n})\rightarrow\infty$ for $n\rightarrow\infty$ , and thus, we have with probability converging to one, that

[TABLE]

Finally, $D_{1,2}=O_{\mathbb{P}}(n\lambda_{n})$, and by Assumptions (A3) and (A4), $I^{*}_{min}/(n\lambda_{n})\rightarrow\infty$ as $n\rightarrow\infty$. Hence, we obtain that

[TABLE]

holds with probability converging to one as $n$ tends to infinity. Taking now into account the expression in (39), we have

[TABLE]

where the inequality holds with probability converging to 1 as $n\rightarrow\infty$, with some $C^{(4)}_{+}\geq 0$. By relation (11), the right-hand side of the last relation is dominated by $C_{+}^{(1)}I^{*}_{min}$, which is positive. Thus, we obtain (36) when $\widehat{K}<K^{*}$, which completes the proof. $\blacksquare$

Bibliography


[1] Antoch, J., Gregoire, G., and Hušková, M. (2006). Test for Continuity of Regression Function. Journal of Statistical Planning and Inference, 137(1), 753–777.
[2] Boysen, L., Kempe, A., Munk, A., Liebscher, V., and Wittich, O. (2009). Consistencies and rates of convergence of jump penalized least squares estimators. Annals of Statistics, 37(1), 157–183.
[3] Chen, S., Donoho, D., and Saunders, M.A. (2001). Atomic decomposition by basis pursuit. SIAM Review, 43(1), 129–159.
[4] Ciuperca, G. (2014). Model selection by LASSO methods in a change-point model. Statistical Papers, 55(1), 349–374.
[5] Ciuperca, G. (2016). Adaptive LASSO model selection in a multiphase quantile regression. Statistics, 50(5), 1100–1131.
[6] Csörgő, M. and Horváth, L. (1988). Nonparametric methods for changepoint problems (Chapter 20). Handbook of Statistics, 7, 403–425.
[7] Csörgő, M. and Horváth, L. (1997). Limit Theorems in Change-Point Analysis. Wiley Series in Probability & Statistics, Chichester, England.
[8] Desmet, L. and Gijbels, I. (2011). Curve Fitting Under Jump and Peak Irregularities Using Local Linear Regression. Communications in Statistics - Theory and Methods, 40, 4001–4020.