Detecting multiple generalized change-points by isolating single ones
Andreas Anastasiou, Piotr Fryzlewicz

TL;DR
The paper presents Isolate-Detect (ID), a novel method for accurately identifying multiple change-points in noisy data sequences, capable of handling various signal changes and outperforming existing methods.
Contribution
Introduction of the Isolate-Detect (ID) method, a new approach for consistent estimation of multiple change-points with enhanced accuracy and minimal parameter tuning.
Findings
ID outperforms state-of-the-art methods in various scenarios.
ID effectively detects change-points with small magnitudes.
The method is implemented in R packages IDetect and breakfast.
Abstract
We introduce a new approach, called Isolate-Detect (ID), for the consistent estimation of the number and location of multiple generalized change-points in noisy data sequences. Examples of signal changes that ID can deal with are changes in the mean of a piecewise-constant signal and changes, continuous or not, in the linear trend. The number of change-points can increase with the sample size. Our method is based on an isolation technique, which prevents the consideration of intervals that contain more than one change-point. This isolation enhances ID's accuracy as it allows for detection in the presence of frequent changes of possibly small magnitudes. In ID, model selection is carried out via thresholding, or an information criterion, or SDLL, or a hybrid involving the former two. The hybrid model selection leads to a general method with very good practical performance and minimal…
| • | |||||||
| Signal | Method | MSE | |||||
| ID | 0 | 0 | 100 | 0 | 0 | ||
| (S1) | NOT | 5 | 86 | 9 | 0 | 0 | |
| MARS | 100 | 0 | 0 | 0 | 0 | ||
| ID | 0 | 1 | 97 | 2 | 0 | ||
| (S2) | NOT | 100 | 0 | 0 | 0 | 0 | |
| PELT | 78 | 22 | 0 | 0 | 0 | ||
| WBS | 27 | 71 | 2 | 0 | 0 | ||
| Type of signal | Method notation | Reference | R package |
|---|---|---|---|
| PELT | Killick et al. (2012) | changepoint | |
| NP.PELT | Haynes et al. (2017) | changepoint.np | |
| S3IB | Rigaill (2015) | Segmentor3IsBack | |
| CumSeg | Muggeo and Adelfio (2011) | cumSeg | |
| Piecewise-constant | CPM | Ross (2015) | cpm |
| WBS | Fryzlewicz (2014) | wbs | |
| WBS2 | Fryzlewicz (2020) | breakfast | |
| NOT | Baranowski et al. (2019) | not | |
| FDR | Li et al. (2016) | FDRSeg | |
| TGUH | Fryzlewicz (2018) | breakfast | |
| NOT | Baranowski et al. (2019) | not | |
| TF | Kim et al. (2009) | - | |
| Continuous piecewise-linear | CPOP | Maidstone et al. (2019) | - |
| MARS | Friedman (1991) | earth | |
| FKS | Spiriti et al. (2013) | freeknotsplines |
| • | |||||||||||
| Method | Model | 0 | 1 | 2 | MSE | Time (ms) | |||||
| PELT | 6 | 32 | 50 | 12 | 0 | 0 | 0 | 3.23 | 0.14 | 3 | |
| NP.PELT | 0 | 2 | 27 | 49 | 15 | 5 | 2 | 2.82 | 0.10 | 211.8 | |
| S3IB | 0 | 7 | 38 | 54 | 1 | 0 | 0 | 2.49 | 0.08 | 343.2 | |
| CumSeg | 39 | 21 | 38 | 2 | 0 | 0 | 0 | 6.37 | 0.20 | 62.3 | |
| CPM.500 | 0 | 0 | 0 | 3 | 3 | 4 | 90 | 4.45 | 0.44 | 2.3 | |
| CPM.3000 | 0 | 0 | 8 | 41 | 26 | 19 | 6 | 3.03 | 0.19 | 3.3 | |
| WBSC1 | (M1) | 0 | 0 | 11 | 32 | 27 | 19 | 11 | 2.79 | 0.25 | 99.3 |
| WBSIC | 0 | 3 | 37 | 53 | 7 | 0 | 0 | 2.59 | 0.08 | 99.3 | |
| WBS2 | 0 | 3 | 54 | 31 | 8 | 2 | 2 | 2.64 | 0.09 | 623.3 | |
| NOT | 0 | 3 | 51 | 43 | 3 | 0 | 0 | 2.61 | 0.10 | 80.7 | |
| FDR | 0 | 0 | 33 | 54 | 12 | 1 | 0 | 2.51 | 0.09 | - | |
| TGUH | 0 | 5 | 37 | 49 | 7 | 1 | 1 | 3.30 | 0.08 | 127.4 | |
| ID | 0 | 3 | 30 | 62 | 5 | 0 | 0 | 2.66 | 0.08 | 23.9 | |
| ID.SDLL | 1 | 2 | 59 | 28 | 5 | 3 | 2 | 2.80 | 0.10 | 20 | |
| ID | 0 | 9 | 62 | 28 | 1 | 0 | 0 | 2.75 | 0.09 | 22.3 | |
| PELT | 85 | 6 | 0 | 9 | 0 | 0 | 0 | 181 | 6.62 | 1.1 | |
| NP.PELT | 84 | 12 | 3 | 1 | 0 | 0 | 0 | 165 | 4.26 | 3.1 | |
| S3IB | 41 | 15 | 1 | 43 | 0 | 0 | 0 | 117 | 3.73 | 15.2 | |
| CumSeg | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 251 | - | 3.9 | |
| CPM.500 | 78 | 4 | 15 | 3 | 0 | 0 | 0 | 145 | 2.96 | 0.4 | |
| WBSC1 | (M2) | 1 | 2 | 7 | 72 | 12 | 6 | 0 | 53 | 0.33 | 38.2 |
| WBSIC | 7 | 8 | 1 | 68 | 13 | 3 | 0 | 64 | 1.00 | 38.2 | |
| WBS2 | 3 | 3 | 4 | 71 | 10 | 4 | 5 | 58 | 0.363 | 30.5 | |
| NOT | 9 | 7 | 4 | 73 | 6 | 1 | 0 | 65 | 0.97 | 43.4 | |
| FDR | 14 | 11 | 11 | 55 | 7 | 2 | 0 | 71 | 0.80 | - | |
| TGUH | 4 | 18 | 3 | 68 | 7 | 0 | 0 | 64 | 0.47 | 22.8 | |
| ID | 7 | 7 | 1 | 74 | 11 | 0 | 0 | 60 | 0.87 | 8.8 | |
| ID.SDLL | 5 | 5 | 6 | 63 | 8 | 4 | 9 | 62 | 0.43 | 3.7 | |
| ID | 28 | 13 | 9 | 47 | 3 | 0 | 0 | 84 | 0.90 | 5.3 | |
| PELT | 0 | 2 | 7 | 90 | 1 | 0 | 0 | 23 | 0.15 | 1.1 | |
| NP.PELT | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 781 | 1.78 | 4.2 | |
| S3IB | 98 | 1 | 1 | 0 | 0 | 0 | 0 | 213 | 0.91 | 20.2 | |
| CumSeg | 0 | 3 | 16 | 72 | 9 | 0 | 0 | 65 | 0.32 | 5.2 | |
| CPM.500 | 1 | 6 | 87 | 6 | 0 | 0 | 0 | 51 | 0.85 | 0.2 | |
| WBSC1 | (M3) | 0 | 0 | 0 | 66 | 26 | 7 | 1 | 24 | 0.19 | 37.3 |
| WBSIC | 0 | 0 | 0 | 64 | 27 | 9 | 0 | 24 | 0.18 | 37.3 | |
| WBS2 | 0 | 0 | 1 | 87 | 8 | 2 | 2 | 25 | 0.17 | 34.7 | |
| NOT | 0 | 0 | 0 | 93 | 7 | 0 | 0 | 21 | 0.13 | 118.3 | |
| FDR | 0 | 0 | 2 | 77 | 15 | 5 | 1 | 23 | 0.17 | - | |
| TGUH | 0 | 0 | 1 | 91 | 6 | 2 | 0 | 25 | 0.15 | 25.2 | |
| ID | 0 | 0 | 0 | 91 | 8 | 1 | 0 | 22 | 0.13 | 9.8 | |
| ID.SDLL | 0 | 0 | 1 | 97 | 1 | 0 | 1 | 24 | 0.14 | 6.8 | |
| ID | 0 | 0 | 2 | 94 | 3 | 1 | 0 | 23 | 0.15 | 4.4 | |
| PELT | - | 53 | 0 | 47 | 0 | 0 | 0 | 14 | 0.54 | 6.7 | |
| NP.PELT | - | 0 | 0 | 21 | 3 | 34 | 42 | 14 | 0.47 | 395.2 | |
| S3IB | - | 12 | 0 | 87 | 1 | 0 | 0 | 7 | 0.12 | 292.1 | |
| CumSeg | - | 100 | 0 | 0 | 0 | 0 | 0 | 23 | - | 84.6 | |
| CPM.500 | - | 0 | 0 | 0 | 0 | 6 | 94 | 31 | 0.76 | 14 | |
| CPM.2000 | - | 0 | 0 | 35 | 11 | 22 | 32 | 13 | 0.39 | 20.4 | |
| WBSC1 | (M4) | - | 0 | 0 | 23 | 20 | 17 | 40 | 13 | 0.50 | 120.8 |
| WBSIC | - | 4 | 0 | 96 | 1 | 0 | 0 | 5 | 0.04 | 119.2 | |
| WBS2 | - | 0 | 1 | 83 | 10 | 4 | 2 | 5 | 0.09 | 666.4 | |
| NOT | - | 8 | 0 | 92 | 0 | 0 | 0 | 6 | 0.08 | 61.8 | |
| FDR | - | 0 | 19 | 70 | 10 | 1 | 0 | 9 | 0.07 | - | |
| TGUH | - | 0 | 51 | 40 | 7 | 2 | 0 | 23 | 0.28 | 169.2 | |
| ID | - | 7 | 0 | 93 | 0 | 0 | 0 | 6 | 0.07 | 42.3 | |
| ID.SDLL | - | 0 | 0 | 81 | 4 | 10 | 5 | 7 | 0.10 | 28.7 | |
| ID | - | 1 | 0 | 98 | 1 | 0 | 0 | 5 | 0.05 | 66.4 | |
| • | ||||||||
|---|---|---|---|---|---|---|---|---|
| Method | MSE | Time (s) | ||||||
| PELT | 100 | 0 | 0 | 0 | 0 | 1.97 | 114.92 | 0.033 |
| NP.PELT | 100 | 0 | 0 | 0 | 0 | 2.25 | 551.89 | 8.976 |
| S3IB | 99 | 1 | 0 | 0 | 0 | 2.23 | 1979.95 | 332.841 |
| CumSeg | 100 | 0 | 0 | 0 | 0 | 2.25 | 1999 | 0.551 |
| CPM.500 | 0 | 45 | 54 | 1 | 0 | 0.19 | 9.00 | 0.002 |
| CPM.20000 | 100 | 0 | 0 | 0 | 0 | 2.23 | 1999 | 1.245 |
| WBSC1 | 100 | 0 | 0 | 0 | 0 | 1.51 | 35.26 | 12.272 |
| WBSIC | 100 | 0 | 0 | 0 | 0 | 2.25 | 1999 | 12.272 |
| WBS2 | 0 | 0 | 0 | 100 | 0 | 0.14 | 0.54 | 5.796 |
| NOT | 100 | 0 | 0 | 0 | 0 | 2.25 | 1999 | 0.484 |
| FDR | 0 | 0 | 0 | 5 | 95 | 0.14 | 0.51 | - |
| TGUH | 0 | 0 | 0 | 100 | 0 | 0.16 | 0.84 | 0.794 |
| ID | 0 | 0 | 0 | 100 | 0 | 0.14 | 0.99 | 0.785 |
| ID.SDLL | 0 | 0 | 0 | 100 | 0 | 0.14 | 0.71 | 120.601 |
| ID | 0 | 82 | 18 | 0 | 0 | 0.22 | 2.48 | 1.363 |
| • | ||||||
|---|---|---|---|---|---|---|
| Method | 0 | 1 | 2 | MSE | Time (s) | |
| PELT | 100 | 0 | 0 | 0 | 39 | 0.004 |
| NP.PELT | 8 | 1 | 23 | 68 | 999 | 1.077 |
| S3IB | 100 | 0 | 0 | 0 | 39 | 0.715 |
| CumSeg | 100 | 0 | 0 | 0 | 39 | 0.115 |
| CPM.500 | 0 | 0 | 0 | 100 | 2957 | 0.011 |
| CPM.3000 | 28 | 6 | 39 | 27 | 628 | 0.031 |
| WBSC1 | 15 | 18 | 20 | 47 | 653 | 0.149 |
| WBSIC | 99 | 1 | 0 | 0 | 44 | 0.149 |
| WBS2 | 89 | 5 | 4 | 2 | 82 | 0.958 |
| NOT | 99 | 1 | 0 | 0 | 44 | 0.089 |
| FDR | 96 | 4 | 0 | 0 | 47 | - |
| TGUH | 100 | 0 | 0 | 0 | 39 | 0.217 |
| ID | 100 | 0 | 0 | 0 | 39 | 0.172 |
| ID.SDLL | 90 | 4 | 0 | 6 | 182 | 0.069 |
| ID | 99 | 0 | 1 | 0 | 41 | 0.259 |
| • | |||||||||||
| Method | Model | 0 | 1 | 2 | MSE | Time (s) | |||||
| NOT | 0 | 0 | 0 | 99 | 1 | 0 | 0 | 0.016 | 0.063 | 0.343 | |
| TF | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 0.029 | 0.451 | 1.125 | |
| CPOP | 0 | 0 | 0 | 99 | 1 | 0 | 0 | 0.013 | 0.055 | 23.190 | |
| MARS | (W1) | 0 | 0 | 2 | 9 | 42 | 39 | 8 | 0.034 | 0.200 | 0.011 |
| FKS | 0 | 0 | 0 | 72 | 22 | 6 | 0 | 0.015 | 0.109 | 270.385 | |
| ID | 0 | 0 | 0 | 91 | 9 | 0 | 0 | 0.030 | 0.104 | 0.036 | |
| ID.SDLL | 0 | 0 | 0 | 98 | 0 | 1 | 1 | 0.033 | 0.098 | 0.030 | |
| NOT | 0 | 0 | 27 | 0 | 6 | 18 | 49 | 0.035 | 0.571 | 0.163 | |
| TF | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 606.523 | 0.432 | 0.117 | |
| CPOP | 0 | 0 | 0 | 90 | 6 | 2 | 2 | 0.010 | 0.097 | 0.078 | |
| MARS | (W3) | 91 | 0 | 7 | 2 | 0 | 0 | 0 | 3.991 | 2.258 | 0.008 |
| FKS | 0 | 0 | 0 | 90 | 9 | 1 | 0 | 0.010 | 0.097 | 67.582 | |
| ID | 0 | 0 | 0 | 99 | 1 | 0 | 0 | 0.013 | 0.101 | 0.017 | |
| ID.SDLL | 0 | 0 | 0 | 93 | 4 | 1 | 2 | 0.022 | 0.130 | 0.010 | |
| NOT | 0 | 1 | 14 | 20 | 16 | 20 | 29 | 0.109 | 0.998 | 0.958 | |
| TF | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 660.399 | 0.465 | 1.349 | |
| CPOP | (W4) | 0 | 0 | 0 | 92 | 8 | 0 | 0 | 0.015 | 0.084 | 1.627 |
| MARS | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 22.058 | 1.609 | 0.019 | |
| ID | 0 | 0 | 0 | 92 | 8 | 0 | 0 | 0.038 | 0.123 | 0.045 | |
| ID.SDLL | 0 | 0 | 0 | 92 | 4 | 1 | 3 | 0.062 | 0.120 | 0.025 | |
| • | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Method | 0 | MSE | Time (s) | |||||||
| NOT | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 4.731 | 99 | 0.869 |
| TF | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 212.547 | 0.387 | 0.863 |
| CPOP | 0 | 0 | 0 | 97 | 3 | 0 | 0 | 0.162 | 0.189 | 1.161 |
| MARS | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 4.703 | 98.523 | 0.009 |
| ID | 0 | 0 | 0 | 98 | 2 | 0 | 0 | 0.201 | 0.242 | 0.589 |
| ID.SDLL | 0 | 0 | 0 | 98 | 2 | 0 | 0 | 0.256 | 0.287 | 0.097 |
| • | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Model | 0 | 1 | 2 | MSE | Time (ms) | ||||||
| (M2) | 6 | 2 | 2 | 74 | 9 | 5 | 2 | 0.86 | 9.7 | ||
| 5 | (M3) | 0 | 0 | 0 | 75 | 16 | 5 | 4 | 0.16 | 9.2 | |
| (W1) | 0 | 0 | 0 | 86 | 12 | 2 | 0 | 0.23 | 32.8 | ||
| (M2) | 7 | 1 | 2 | 52 | 21 | 8 | 9 | 1.18 | 8.7 | ||
| 3 | (M3) | 0 | 1 | 0 | 59 | 20 | 13 | 7 | 0.22 | 9.8 | |
| (W1) | 0 | 0 | 0 | 62 | 28 | 4 | 6 | 0.25 | 22.6 | ||
| Method | Model | MSE | Time (s) | ||
|---|---|---|---|---|---|
| WID | 10 | - | 2.39 | ||
| ID | (D1) | 10 | - | 79.40 | |
| WID | 10 | 2.28 | |||
| ID | (D2) | 10 | 13.06 | ||
| WID | 10 | 0 | 2.17 | ||
| ID | (D3) | 10 | 0 | 6.34 |
| • | ||||||||
| Method | MSE | Time (s) | ||||||
| PELT | 0 | 100 | 0 | 0 | 0 | 0.67 | 1.02 | 0.024 |
| NP.PELT | 100 | 0 | 0 | 0 | 0 | 9655.28 | 113.76 | 11.302 |
| S3IB | 0 | 13 | 87 | 0 | 0 | 0.44 | 1.01 | 111.143 |
| CumSeg | 100 | 0 | 0 | 0 | 0 | 87.11 | 15.08 | 0.323 |
| CPM.500 | 0 | 0 | 0 | 100 | 0 | 0.19 | 0.75 | 0.002 |
| CPM.10000 | 0 | 0 | 74 | 25 | 0 | 0.22 | 1 | 0.002 |
| WBSC1 | 0 | 0 | 80 | 20 | 0 | 0.24 | 1 | 1.293 |
| WBSIC | 0 | 0 | 22 | 78 | 0 | 0.22 | 0.99 | 1.293 |
| WBS2 | 0 | 0 | 29 | 58 | 13 | 0.22 | 0.99 | 1.293 |
| NOT | 100 | 0 | 0 | 0 | 0 | 9.31 | 5.27 | 4.330 |
| FDR | 0 | 0 | 2 | 98 | 0 | 0.19 | 0.83 | - |
| TGUH | 0 | 0 | 100 | 0 | 0 | 0.29 | 1 | 0.484 |
| ID | 0 | 0 | 14 | 86 | 0 | 0.21 | 1.00 | 0.460 |
| • | ||||||||
|---|---|---|---|---|---|---|---|---|
| Method | MSE | Time (s) | ||||||
| PELT | 100 | 0 | 0 | 0 | 0 | 0.55 | 130.64 | 0.016 |
| NP.PELT | 0 | 12 | 88 | 0 | 0 | 0.21 | 4.95 | 0.496 |
| S3IB | 0 | 1 | 54 | 45 | 0 | 0.13 | 2.58 | 33.410 |
| CumSeg | 100 | 0 | 0 | 0 | 0 | 0.56 | 249 | 0.428 |
| CPM.500 | 0 | 0 | 0 | 9 | 91 | 0.12 | 0.59 | 0.008 |
| CPM.10000 | 0 | 0 | 0 | 100 | 0 | 0.12 | 1.21 | 0.008 |
| WBSC1 | 0 | 84 | 16 | 0 | 0 | 0.27 | 4.22 | 0.631 |
| WBSIC | 7 | 0 | 0 | 88 | 5 | 0.16 | 15.92 | 1.051 |
| NOT | 100 | 0 | 0 | 0 | 0 | 0.56 | 240.08 | 0.610 |
| FDR | 0 | 0 | 0 | 99 | 1 | 0.11 | 0.72 | - |
| TGUH | 0 | 0 | 2 | 98 | 0 | 0.14 | 1.43 | 0.580 |
| ID | 0 | 0 | 0 | 100 | 0 | 0.11 | 0.76 | 0.139 |
| • | |||||
|---|---|---|---|---|---|
| Method | MSE | Time (s) | |||
| PELT | 100 | 0 | 0 | 0.94 | 0.086 |
| NP.PELT | 100 | 0 | 0 | 1 | 136.154 |
| CumSeg | 100 | 0 | 0 | 1 | 4.831 |
| CPM.500 | 100 | 0 | 0 | 1 | 59.936 |
| WBSC1 | 100 | 0 | 0 | 0.92 | 6.346 |
| NOT | 100 | 0 | 0 | 1 | 2.873 |
| TGUH | 0 | 0 | 100 | 0.02 | 4.751 |
| ID | 0 | 10 | 90 | 0.02 | 3.693 |
| • | ||||||
|---|---|---|---|---|---|---|
| Method | 0 | 1 | 2 | MSE | Time (ms) | |
| PELT | 100 | 0 | 0 | 0 | 28 | 12.3 |
| NP.PELT | 56 | 13 | 23 | 8 | 269 | 30.4 |
| S3IB | 95 | 3 | 2 | 0 | 57 | 35.7 |
| CumSeg | 100 | 0 | 0 | 0 | 28 | 19.3 |
| CPM.500 | 54 | 11 | 15 | 20 | 294 | 1.5 |
| WBSC1 | 17 | 16 | 19 | 48 | 578 | 62.7 |
| WBSIC | 95 | 3 | 1 | 1 | 59 | 61.3 |
| NOT | 99 | 0 | 0 | 1 | 39 | 37.6 |
| FDR | 90 | 7 | 2 | 1 | 66 | - |
| TGUH | 83 | 0 | 12 | 5 | 147 | 49.6 |
| ID | 95 | 4 | 1 | 0 | 60 | 1.1 |
| • | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| Method | 0 | MSE | Time (s) | ||||||
| NOT | 100 | 0 | 0 | 0 | 0 | 0 | 52.352 | 119 | 6.061 |
| TF | 0 | 0 | 0 | 0 | 0 | 100 | 107.463 | 0.381 | 2.309 |
| CPOP | 0 | 0 | 96 | 4 | 0 | 0 | 1.063 | 0.146 | 3.327 |
| ID | 0 | 0 | 90 | 10 | 0 | 0 | 1.781 | 0.254 | 0.041 |
| • | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Method | 0 | 1 | 2 | MSE | Time (s) | ||||||
| NOT | 0 | 5 | 1 | 1 | 1 | 3 | 25 | 64 | 0.24 | 0.95 | 0.412 |
| TF | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 0.76 | 0.33 | 1.417 |
| CPOP | 0 | 0 | 0 | 97 | 3 | 0 | 0 | 0 | 0.05 | 0.17 | 8.728 |
| ID | 0 | 0 | 0 | 97 | 2 | 1 | 0 | 0 | 0.07 | 0.27 | 0.049 |
| • | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Method | 0 | MSE | Time (s) | |||||||
| NOT | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 1.063 | 119 | 0.485 |
| TF | 0 | 0 | 0 | 0 | 0 | 0 | 100 | 217868.2 | 0.324 | 0.632 |
| CPOP | 0 | 0 | 0 | 98 | 2 | 0 | 0 | 0.027 | 0.154 | 0.438 |
| ID | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0.039 | 0.210 | 0.055 |
| • | ||||||
|---|---|---|---|---|---|---|
| Method | 0 | 1 | 2 | MSE | Time (s) | |
| ID | 100 | 0 | 0 | 0 | 31 | 0.116 |
| ID | 100 | 0 | 0 | 0 | 31 | 0.033 |
| ID | 100 | 0 | 0 | 0 | 31 | 0.012 |
| • | |||||||||||
| Method | Model | 0 | 1 | 2 | MSE | Time (ms) | |||||
| ID | 0 | 3 | 40 | 56 | 1 | 0 | 0 | 2.49 | 0.06 | 26.8 | |
| ID | (M1) | 0 | 3 | 40 | 57 | 0 | 0 | 0 | 2.44 | 0.06 | 17.4 |
| ID | 2 | 8 | 47 | 43 | 0 | 0 | 0 | 2.92 | 0.08 | 11.6 | |
| ID | 7 | 8 | 3 | 72 | 9 | 1 | 0 | 67 | 0.88 | 10.9 | |
| ID | (M2) | 83 | 7 | 8 | 4 | 0 | 0 | 0 | 185 | 6.80 | 8.1 |
| ID | 91 | 2 | 0 | 7 | 0 | 0 | 0 | 212 | 8.75 | 5.7 | |
| ID | 0 | 0 | 0 | 82 | 14 | 4 | 0 | 23 | 0.16 | 9.9 | |
| ID | (M3) | 20 | 38 | 30 | 10 | 2 | 0 | 0 | 86 | 0.82 | 9.2 |
| ID | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 784 | 2.73 | 5 | |
| • | ||||||||
|---|---|---|---|---|---|---|---|---|
| Method | MSE | Time (ms) | ||||||
| ID | 12 | 0 | 87 | 1 | 0 | 7 | 0.13 | 31.5 |
| ID | 11 | 0 | 89 | 0 | 0 | 6 | 0.11 | 11.7 |
| ID | 28 | 0 | 72 | 0 | 0 | 11 | 0.29 | 5.8 |
| • | ||||||||
|---|---|---|---|---|---|---|---|---|
| Method | MSE | Time (s) | ||||||
| ID | 0 | 0 | 0 | 100 | 0 | 0.14 | 0.99 | 0.772 |
| ID | 100 | 0 | 0 | 0 | 0 | 1.28 | 3.45 | 0.395 |
| ID | 100 | 0 | 0 | 0 | 0 | 1.53 | 10.34 | 0.308 |
| • | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Method | 0 | 1 | 2 | MSE | Time (s) | |||||
| ID | 0 | 0 | 0 | 91 | 9 | 0 | 0 | 0.031 | 0.104 | 0.036 |
| ID | 0 | 0 | 0 | 92 | 8 | 0 | 0 | 0.027 | 0.099 | 0.020 |
| ID | 0 | 0 | 0 | 99 | 1 | 0 | 0 | 0.020 | 0.067 | 0.017 |
| • | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|
| Method | 0 | MSE | Time (s) | |||||||
| ID | 0 | 0 | 0 | 97 | 3 | 0 | 0 | 0.227 | 0.272 | 0.690 |
| ID | 0 | 0 | 0 | 100 | 0 | 0 | 0 | 0.364 | 0.328 | 0.722 |
| ID | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 4.730 | 98.955 | 0.102 |
| • | |||||||||||
| Method | Model | 0 | 1 | 2 | MSE | Time (s) | |||||
| ID | 0 | 0 | 0 | 94 | 6 | 0 | 0 | 0.020 | 0.122 | 0.014 | |
| ID | (W3) | 0 | 0 | 0 | 99 | 1 | 0 | 0 | 0.009 | 0.067 | 0.012 |
| ID | 100 | 0 | 0 | 0 | 0 | 0 | 0 | 3.917 | 1.875 | 0.009 | |
| ID | 0 | 0 | 0 | 77 | 22 | 1 | 0 | 0.055 | 0.134 | 0.040 | |
| ID | (W4) | 0 | 0 | 0 | 98 | 2 | 0 | 0 | 0.029 | 0.106 | 0.032 |
| ID | 0 | 36 | 50 | 14 | 0 | 0 | 0 | 1.603 | 0.950 | 0.030 | |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMetabolomics and Mass Spectrometry Studies · Statistical Methods and Inference · Fault Detection and Control Systems
∎
11institutetext: A. Anastasiou 22institutetext: Department of Mathematics and Statistics, University of Cyprus, P.O. Box: 20537, 1678, Nicosia, Cyprus.
22email: [email protected] 33institutetext: P. Fryzlewicz 44institutetext: Department of Statistics, The London School of Economics and Political Science, Columbia House, Houghton Street, London, WC2A 2AE.
Detecting multiple generalized change-points by isolating single ones
Andreas Anastasiou111On behalf of all authors, the corresponding author states that there is no conflict of interest.
Piotr Fryzlewicz222Piotr Fryzlewicz’s work was supported by the Engineering and Physical Sciences Research Council grant No. EP/L014246/1.
(Received: date / Accepted: date)
Abstract
We introduce a new approach, called Isolate-Detect (ID), for the consistent estimation of the number and location of multiple generalized change-points in noisy data sequences. Examples of signal changes that ID can deal with are changes in the mean of a piecewise-constant signal and changes, continuous or not, in the linear trend. The number of change-points can increase with the sample size. Our method is based on an isolation technique, which prevents the consideration of intervals that contain more than one change-point. This isolation enhances ID’s accuracy as it allows for detection in the presence of frequent changes of possibly small magnitudes. In ID, model selection is carried out via thresholding, or an information criterion, or SDLL, or a hybrid involving the former two. The hybrid model selection leads to a general method with very good practical performance and minimal parameter choice. In the scenarios tested, ID is at least as accurate as the state-of-the-art methods; most of the times it outperforms them. ID is implemented in the R packages IDetect and breakfast, available from CRAN.
Keywords:
Segmentation symmetric interval expansion threshold criterion Schwarz information criterion SDLL
1 Introduction
Change-point detection is an active area of statistical research that has attracted a lot of interest in recent years. Our work’s focus is on a posteriori change-point detection, where the aim is to estimate the number and locations of certain changes in the behaviour of the data. We work in the model
[TABLE]
where are the observed data and is a one-dimensional, deterministic signal with structural changes at certain points. Two examples are: change-points in the level when is seen as piecewise-constant, and change-points in the first derivative when is piecewise-linear. We highlight, however, that our methodology and analysis apply to more general scenarios, for instance the detection of knots in a piecewise polynomial signal of order , where is not necessarily equal to zero (piecewise-constant mean) or one (piecewise-linear mean). The number of change-points as well as their locations are unknown and our aim is to estimate them. In addition, can grow with . The random variables in (1) have mean zero and variance one; further assumptions will be given in Section 3.2.
When is assumed to be piecewise-constant, the existing change-point detection techniques are mainly split into two categories based on whether the change-points are detected all at once or one at a time. The former category mainly includes optimization-based methods, in which the estimated signal is chosen based on its least squares or log-likelihood fit to the data, penalized by a complexity rule in order to avoid overfitting. The most common example of a penalty function is the Schwarz Information Criterion (SIC); see Yao (1988) for details. To solve the implied penalization problem, dynamic programming approaches, such as the Segment Neighborhood (SN) and Optimal Partitioning (OP) methods of Auger and Lawrence (1989) and Jackson et al. (2005), have been developed. In an attempt to improve on OP’s computational cost, Killick et al. (2012) introduce the PELT method, based on a pruning step applied to OP’s dynamic programming approach. A non-parametric adaptation of PELT is given in Haynes et al. (2017). Rigaill (2015) introduces an improvement over classical SN algorithms, through a pruning approach called PDPa, while Maidstone et al. (2017) give two algorithms by combining ideas from PELT and PDPa. Frick et al. (2014) propose the simultaneous multiscale change-point estimator (SMUCE) for the change-point problem in the case of exponential family regression; solving an optimization problem is also required. The FDRSeg method of Li et al. (2016) is a combination of False Discovery Rate (FDR) control and global segmentation methods in a multiscale way; the change-points are again detected all at once.
In the latter category, in which change-points are detected one at a time, a popular method is binary segmentation, which performs an iterative binary splitting of the data on intervals determined by the previously obtained splits. Vostrikova (1981) introduces and proves the validity of binary segmentation in the setting of change-point detection for piecewise-constant signals. The main advantages of binary segmentation are its conceptual simplicity and low computational cost. However, at each step of the algorithm, binary segmentation looks for a single change-point, which leads to its suboptimality in terms of accuracy, especially for signals with frequent change-points. Some variants of binary segmentation that work towards solving this issue are the Circular Binary Segmentation (CBS) of Olshen et al. (2004), the Wild Binary Segmentation (WBS) of Fryzlewicz (2014) as well as its second version (WBS2) of Fryzlewicz (2020), the Narrowest-Over-Threshold (NOT) method of Baranowski et al. (2019), and the Seeded Binary Segmentation (SeedBS) of Kovács et al. (2020). CBS searches for at most two change-points at each step. Instead of initially calculating the contrast value for the whole data sequence, WBS and NOT are based on a random draw of subintervals of the domain of the data, on which an appropriate statistic is tested against a threshold. The draw of all the subintervals takes place at the beginning of the algorithm. In contrast, WBS2 draws first only a small number, , of data subsamples. It then uses the first change-point candidate to split the data into two parts, and again recursively draws the same number of subsamples to the left and to the right of this change-point candidate, and so on. A major difference between WBS and WBS2 is that the latter adaptively decides where to recursively draw the next subsamples, based on the change-point candidates detected so far; this adds to the detection power of the method. SeedBS is an approach, similar to WBS and NOT, that relies instead on a deterministic construction of background intervals in which single change points are searched. Apart from binary-segmentation-related approaches, the category in which the change-points are detected one at a time also includes methods that control the False Discovery Rate. For instance, the “pseudo-sequential” (PS) procedure of Venkatraman (1992), as well as the CPM method of Ross (2015) are based on an adaptation of online detection algorithms to a posteriori situations and work by bounding the Type I error rate of falsely detecting change-points. Some methods do not fall in either category. For example, the tail-greedy algorithm in Fryzlewicz (2018) achieves a multiscale decomposition of the data using Unbalanced Haar wavelets in an agglomerative way. In addition, Eichinger and Kirch (2018) use moving sum (MOSUM) statistics in order to detect multiple change-points. For a more thorough review of the literature on the detection of multiple change-points in the mean of univariate data sequences, see Cho and Kirch (2020) and Yu (2020). Truong et al. (2020) also present a survey of various a posteriori change-point detection algorithms; the focus is, however, on multivariate time series.
Beyond the piecewise-constant signal model, existing methods mainly minimize the residual sum of squares taking into account a penalty, with the most common being the SIC. This is used in Bai and Perron (1998), in the trend filtering (TF) approach (Kim et al., 2009; Tibshirani, 2014), and in the dynamic programming algorithm CPOP (Maidstone et al., 2019). Friedman (1991) introduces the Multivariate Adaptive Regression Splines (MARS) method for regression analysis based on splines with the number and the location of the knots being determined by the data. Spiriti et al. (2013) propose two methods for optimizing knot locations in spline smoothing, where either the number of knots is fixed or an upper bound for it needs to be given. The NOT approach (Baranowski et al., 2019) detects change-points one at a time in various scenarios including piecewise-linear mean signals.
In general, change-point detection becomes easier in situations where there is at most one change-point to be detected in a given interval; in such cases the detection power of the contrast function (more details are in Section 3.2) is maximised. Therefore, it makes sense to decouple the multiple change-point detection problem into many single change-point detections. To achieve this, we propose a generic technique, Isolate-Detect (ID), for generalized change-point detection in various different structures, such as piecewise-constant or piecewise-linear signals. The concept behind ID is simple and is split into two stages; firstly, the isolation of each of the true change-points within subintervals of the domain , and secondly their detection. From now on, the terms subinterval and interval will be used interchangeably. Although a detailed explanation of our methodology is provided in Section 3.1, the basic idea is that for an observed data sequence of length and with a positive constant, ID first creates two ordered sets of right- and left-expanding intervals as follows. The right-expanding interval is , while the left-expanding interval is . We collect these intervals in the ordered set . For a suitably chosen contrast function (more details are in Section 3.2), ID identifies the point with the maximum contrast value in . If its value exceeds a threshold, denoted by , then it is taken as a change-point. If not, then the next interval in is tested. Upon detection, ID makes a new start from the end-point (or start-point) of the right- (or left-) expanding interval where the detection occurred. Upon correct choice of , ID ensures that we work on intervals with at most one change-point, which was our aim.
We would like to highlight the importance of the change-point isolation aspect present in our method as explained in the previous paragraph. There are various advantages. First, it enables detection in higher-order polynomial signals. Second, it is carried out in a fixed and systematic way, which eliminates any randomness in the selection of the intervals and, by extension, in the final results. Third, the way the isolation is carried out in ID makes it quicker than other localisation-focused algorithms, such as NOT, due to the fact that it needs to work on fewer intervals; more details on this advantage of our proposed methodology are in Section 4.1. We note here that, even though the default methodology described in Fryzlewicz (2014) and Baranowski et al. (2019) is based on the construction of random intervals, the same approaches can be applied to a fixed grid of intervals. However, as noted in Kovács et al. (2020), the latter implementation can be quite slow. Fourth, the pseudo-sequential nature of the attempted isolation, makes our proposed methodology suitable for online change-point detection. This is one of the various different ways that ID is different from existing techniques in the literature which also attempt change-point isolation; a more thorough comparison with seemingly similar, but still different, methods is given in the next section.
The paper is organized as follows. Section 2 is a motivating illustration of our proposed method through examples. Section 3 gives a formal explanation of the ID methodology along with two different scenarios of use and the associated theory. In Section 4, we first discuss the computational aspects of ID and the choice of parameter values. ID variants which lead to improved practical performance are also explained. In Section 5, we provide a thorough simulation study to compare ID with state-of-the-art methods. Real-life data examples are provided in Section 6. The paper is concluded with reflections on the proposed method. The theoretical, as well as practical, merits and weaknesses of ID when compared against state-of-the-art methods are discussed throughout the paper. However, for the sake of clarity these are also brought together in Section 7. ID is implemented in the R packages IDetect and breakfast, available from CRAN.
2 Motivating illustration of Isolate-Detect
The fact that each change-point is sequentially detected using an interval that contains no other change-points leads to high detection power, especially in difficult structures, such as limited spacings between consecutive change-points and/or higher-order piecewise-polynomial signals. Two examples follow in order to make clear the importance of the isolation step and to illustrate the power of ID compared to other change-point detection methods (some of those also attempt localisation) in capturing even small movements in the data that are close to each other. Table 1 provides results on 100 replications of the continuous piecewise-linear signal (S1) and the piecewise-constant signal (S2), where
- (S1)
, with 21 change-points in the slope at locations . The standard deviation is ;
- (S2)
, with 21 change-points in the mean at locations . The standard deviation is .
As a measure of the accuracy of the estimated number we give , while as a measure of the accuracy of the detected locations, we give Monte-Carlo estimates of the mean square error, . The methods compared are ID, NOT, and MARS for (S1) and ID, WBS, NOT, and PELT for (S2). For the ID related results in Table 1, we used the hybrid version of ID explained in Section 4.4. The choice of the parameters is described in Section 4.2. As already mentioned, WBS and NOT also work on subintervals of the data, chosen though in a completely different manner than in ID. More comparative simulation and real-life studies will be given in Sections 5 and 6, respectively.
We notice from Table 1 that ID offers an important increase in the change-point detection power, especially under limited spacings between consecutive change-points. Figures 1 and 2 give a graphical representation of the results for the first out of the 100 repetitions for signals (S1) and (S2), respectively. For better presentation of the results, in (S1) the signals are presented up to , since after there is no change-point and in all methods the estimated signal continues linearly beyond that point. For the same reason, in Figure 2 which is related to (S2), the results are presented up to .
The NOT and WBS methods also operate on sub-intervals of the data. However, the nature of the fixed, certain (we can expand one data point at each time), localization in ID means that it is of an order of magnitude faster than the aforementioned methods, which have high computational cost that increases linearly with the number of the randomly drawn intervals. This is an issue of fundamental importance, especially in signals with a large number of change-points, in which NOT and WBS need to increase the number of intervals drawn. However, doing this also increases the computational cost. More specifically, one could try and draw all possible combinations of start- and end-points of the intervals; however, the computational complexity turns out to be cubic in . In contrast, due to the explained interval expansion approach, in ID no choice of is required, which leads to better practical performance with more predictable execution times, while at the same time ID examines all possible change-point locations. We recall that unlike ID and NOT, the principle of WBS does not extend to models other than piecewise-constant. To be more precise, this generality of Isolate-Detect with respect to its applicability in many different signal structures is a main distinction between our method and recently published competing methods which, with the exception of NOT, have been developed to cover only the detection of level-changes.
3 Methodology and Theory
3.1 Methodology
The model is given in (1) and the unknown number, , of change-points can possibly grow with . Let and and let . For clarity of exposition, we start with a simple example before providing a more thorough explanation of how ID works. Figure 3 covers a specific case of two change-points, and . We will be referring to Phases 1 and 2 involving six and four intervals, respectively. These are clearly indicated in the figure and they are only related to this specific example, as for cases with more change-points we would have more such phases. At the beginning, , , and we take (how to choose will be described in Section 4.2). Suppose the threshold has been chosen well enough (more details in Section 4.2) so that gets detected in , where . After the detection, is updated as the start-point of the interval where the detection occurred; therefore, . In Phase 2 indicated in the figure, ID is applied in . Intervals 1, 3 and 5 of Phase 1 will not be re-examined in Phase 2 and gets, upon a good choice of , detected in , where . After the detection, is updated as the end-point of the interval where the detection occurred; therefore, . Our method is then applied in ; supposing there is no interval on which the contrast function value exceeds , the process will terminate.
We now describe ID more generically. For each change-point, , ID works in two stages: Firstly, we isolate in an interval that contains no other change-point. To ensure this, the expansion parameter can be taken to be as small as equal to 1. If , then isolation is guaranteed with high probability. Theoretically for large , the chosen value for (this typically will be small; see Section 4.2 for more details) is guaranteed to be smaller than the minimum distance (which has to grow with ) between two consecutive change-points and isolation will be guaranteed. For an explanation on the rate of with respect to the sample size , see the discussion that follows Theorem 3.1. (Of course when asymptotics is put aside, in finite samples anything can happen, and in some configurations no method can be guaranteed to detect change-points if they are arbitrarily close.) The second stage is to detect through the use of an appropriate contrast function. This function is, from now on, denoted by , and it is defined for any integer triple , with . Heuristically, the value of is small if is not a change-point and large otherwise. In piecewise-constant signals, the contrast function reduces to the absolute value of the CUSUM statistic defined in (4), while for continuous, piecewise-linear signals, the contrast function is given in Section 3.2. For the better understanding of the method, we provide its step-by-step simple outline through pseudocode, followed by a succinct narrative of the purpose of each step. The threshold to be used, in order to decide if a change has occurred at a specific data point, is denoted by . Practical choices for and are given in Section 4.2. For , let and for , while and . For a generic interval , define the sequences
[TABLE]
where and . Denoting by , the cardinality of any sequence , and by its element, the pseudocode of the main function is as below:
**Pseudocode explaining the proposed ID algorithm
function** ISOLATEDETECT()
if then
STOP
else
For denote For denote \left[s_{2j},e_{2j}\right]:=\left[{\rm L}_{s,e}(j),e\right]$$i=1(Main part)
if then
add to the set of estimated change-points.ISOLATEDETECT()
else
if then
add to the set of estimated change-points.ISOLATEDETECT()
else
if then
Go back to (Main part) and repeat
else
STOP
end if
end if
end if
end if
end function
A brief explanation of the pseudocode follows. With already defined above, the intervals are those used for the isolation step. Notice that in the odd intervals the start-point is fixed, unchanged, and equal to , meaning that . In the even intervals , it is the end-point that is kept fixed and equal to , meaning that . The process will follow until there are intervals to check. The term “expanding intervals” that is used throughout the paper is due to this one-sided expansion (of magnitude ) of the intervals. The pseudocode makes it also clear that ID is looking for change-points interchangeably in right- and left-expanding intervals which, with high probability, contain at most one change-point. The Isolate-Detect procedure is launched by the call ISOLATEDETECT().
The idea of a-posteriori change-point detection, in which change-points are detected sequentially, has appeared previously in the literature. The PS method of Venkatraman (1992) studies the multiple change-point detection problem for the case of piecewise-constant mean signals, as well as for changes in the rate of an exponential process. The CPM method of Ross (2015) treats change-point detection in the mean or variance of a sequence of random variables when their distribution is known. In addition, CPM can be used for distributional changes. Fang and Siegmund (2020), in a work completed after the first version of the current paper appeared on arXiv, search for significant change-points in settings such as piecewise-linear, and one of their algorithms, labelled Seq, bears some resemblance to ID; we note, however, that in addition to some algorithmic differences our aim is different as we focus on consistent estimation while Fang and Siegmund (2020) on testing.
ID is conceptually and in practice different from these methods in a number of ways related to the threshold choice, the construction of the estimated change-point locations as well as the way PS, CPM, and Seq restart upon detection. Furthermore, ID’s isolation technique does not appear in CPM. By contrast, we use this isolation property of ID as a device enabling its use in piecewise-(higher-order-) polynomial models. Indeed, as shown in Baranowski et al. (2019), fast segmentation of signals of the latter type is difficult to achieve unless any change-point present can be isolated away from neighbouring change-points before detection is performed, which is exactly what ID sets out to do. In particular, this paper demonstrates the use of ID in continuous piecewise-linear models. A comparison between the performance of ID and that of state-of-the-art methods is given in Section 5.
3.2 Theoretical behavior of ID
The assumption of the random sequence being independent and identically distributed (i.i.d.) from the Gaussian distribution is widely used in the literature. In this paper, the Gaussianity assumption is only made for technical convenience with respect to the proofs of Theorems 3.1 and 3.2. Relaxing both the Gaussianity and the independence assumptions in order to have time-dependent errors is a more complicated issue in terms of theory development. Recently, Dette et al. (2018) have attempted to treat this issue, specifically for the SMUCE approach of Frick et al. (2014), using a reliable estimate for the long run variance, , of the error distribution, which is not necessarily Gaussian.
Apart from the well-studied i.i.d. Gaussian noise structure, Isolate-Detect is explored under a variety of settings including i.i.d. non-Gaussian (see Section 4.5), and auto-correlated noise structures; see Fearnhead and Rigaill (2020) who conclude that “IDetect has very strong performance for many scenarios when either we have auto-correlated or heavy-tailed noise”.
If the standard deviation, , of is unknown, then we need to estimate it and in the cases of independent errors with the signal being piecewise-constant or piecewise-linear, can be estimated via the Median Absolute Deviation (MAD) method proposed in Hampel (1974). For , the proposed estimator, denoted by , has been shown to be, for , a consistent estimator of the population standard deviation in the case of Gaussian data (Rousseeuw, 1993). It is very robust as evidenced by its bounded influence function and its 50% breakdown point. For simplicity, let , and (1) becomes
[TABLE]
With and , and for , we examine the theoretical behaviour of ID in the following two illustration cases:
Piecewise-constant signals: for , and .
Continuous, piecewise-linear signals: , for with the additional constraint of for . The change-points, , satisfy .
The above scenarios are only examples of settings in which the ID methodology can be applied. The isolation aspect of the method allows its application to various different cases, such as the estimation of the number and the position of knots in piecewise polynomial signals (with or without the continuity constraint).
Piecewise-constant signals. Under piecewise-constancy, the contrast function used is the absolute value of the CUSUM statistic, the latter being
[TABLE]
where and . Under the i.i.d. Gaussian framework used for the theoretical results presented in this paper, it can be shown that , where is the generalized log-likelihood ratio statistic for all potential single change-points within . For the main result of Theorem 3.1, we also make the following assumption.
- (A1)
The minimum distance, , between two change-points and the minimum magnitude of jumps, , are connected by , for a large enough constant .
The number of change-points, , is assumed to be neither known nor fixed. It can grow with and the only indirect assumption on is due to the minimum distance, , between two change-points in the sense that . Below, we give the theoretical result for the consistency of the number and location of the estimated change-points. The proof is in Section 8 of the supplementary material.
Theorem 3.1
Let follow model (3), with being a piecewise-constant signal and assume that the random sequence is independent and identically distributed (i.i.d.) from the normal distribution with mean zero and variance one and also that (A1) holds. Let and be the number and locations of the change-points, while and are their estimates sorted in increasing order. In addition, , . Then, there exist positive constants , which do not depend on , such that for and for a sufficiently large , we obtain
[TABLE]
The isolation aspect of Isolate-Detect helps us to prove consistency under the conditions used in Theorem 3.1 (and later in Theorem 3.2). From (5), we notice that in order to be able to match the estimated change-point locations with the true ones, should be larger than , meaning that must be at least . For this order of , Chan and Walther (2013) argue that the smallest possible that allows change-point detection is . In our case, assumption (A1) ensures that the rate for is attained, which is nearly optimal (up to the double logarithmic term). This provides evidence that ID allows for detection in complex scenarios, such as limited spacings between change-points. We mention that if is of higher order than , then Assumption (A1) implies that could decrease with .
The quantity on the right-hand side of (5) is ; the same order as in WBS and NOT. However, ID gives a provably lower constant for the bound. To understand this consistency advantage of our method over, for example, NOT see our proof in Section 8 of the supplement and compare (17) with the result in Equation (19), p.28 in the online supplementary material of Baranowski et al. (2019). The rate of the lower bound for the threshold is and this is what will be used in practice as the default rate: we use
[TABLE]
and the choice of the constant will be explained in Section 4. Furthermore, (5) indicates that does not affect the rate of convergence of the estimated change-point locations; these only depend on .
Continuous, piecewise-linear signals. Under Gaussianity and with being the generalized log-likelihood ratio for all possible single change-points within , the idea is to find a contrast function , which is maximized at the same point as . The contrast function is constructed by taking inner products of the data with a contrast vector. In the case of continuous piecewise-linear signals, Baranowski et al. (2019) show that the contrast vector to be used is , where
[TABLE]
where , and . The contrast function is . To explain the reasoning behind the choice of the triangular function , we define, for the interval , the linear vector , (and 0 otherwise) as well as the constant vector , (and 0 otherwise). On the vector , (and 0 otherwise), which is linear with a kink at , we apply the Gram-Schmidt orthogonalization with respect to and . Normalizing the obtained vector such that returns the contrast vector defined in (7). The best approximation, in terms of the Euclidean distance, of in is a linear combination of , , and , which are mutually orthonormal (Baranowski et al., 2019). This orthonormality leads to . For the consistency of ID in continuous piecewise-linear signals, we make the following assumption.
- (A2)
The minimum distance, , between two change-points and the minimum magnitude of jumps, , are connected by the requirement , for a large enough constant .
The term characterizes the difficulty level of the detection problem and is analogous to in the scenario of piecewise-constant signals. Theorem 3.2 gives the consistency result for the case of continuous piecewise-linear signals. The proof is in Section 8 of the supplement.
Theorem 3.2
Let follow model (3) with being a continuous, piecewise-linear signal and assume that the random sequence is independent and identically distributed (i.i.d.) from the normal distribution with mean zero and variance one and that (A2) holds. We denote by and the number and locations of the change-points, while and are their estimates sorted in increasing order. Also, we denote . Then, there exist positive constants , which do not depend on , such that for and for sufficiently large ,
[TABLE]
The quantity on the right-hand side of (8) is . In addition, in the case of , ID’s change-point detection accuracy is , as can be seen from (8). This differs from the rate derived in Raimondo (1998) only by the logarithmic factor. The lower bound of the threshold is . Therefore,
[TABLE]
where is a constant and we will comment on its choice in Section 4.2.
ID is flexible because it does not depend on the structure of the signal; what changes is the choice of an appropriate contrast function. Adopting a similar approach as the one for the case of continuous piecewise-linear signals, one can construct contrast functions for the detection of other types of features.
3.3 Information Criterion approach
Misspecification of the threshold in the ID algorithm can lead to the misestimation of the number of change-points. To remedy this, we develop an approach which starts by possibly overestimating the number of change-points and then creates a solution path, with the estimates ordered according to a certain predefined criterion. The best fit is then chosen, based on the optimization of a model selection criterion.
The solution path algorithm: The estimated number of change-points depends on and this allows us to denote . For given data, we employ ID using first and then , where . Let and be the -associated constants in (6) and (9), respectively. With , we estimate , which are sorted in increasing order in . Our aim is to prune the estimates through an iterative procedure, where at each iteration the estimation most likely to be spurious is removed. The algorithm is split into four parts, with their descriptions being fairly technical. We note however that the different parts are very similar and are based on the idea of removing change-points according to their contrast function values as well as their distance to neighbouring estimates. Even though the full explanation of each part is in Section 1 of the supplement, we now provide a brief summary for the framework of the solution path algorithm. With and , we first collect triplets , and we calculate with being the relevant contrast function. For we check whether , for ; in the proofs of Theorems 3.3 and 3.4, but smaller values could be sufficient; see for example Corollary 1. If , we remove from , reduce by 1, relabel the remaining estimates (in increasing order) in , and repeat this estimate removal process, which is carried out in a way such that once the set contains estimates, then for , each is within a distance of from the true change-point . We keep removing estimates until .
At the end of this change-point removal approach, we collect the estimates in
[TABLE]
where is the estimate that was removed first, is the one that was removed second, and so on. From now on, the vector is called the solution path and is used to give a range of different fits. We define the collection where and . For , let be the sorted elements of . Among the collection of models , we propose to select the one that minimizes the strengthened Schwarz Information Criterion (Liu et al., 1997; Fryzlewicz, 2014), defined as
[TABLE]
where and for each collection , and are the maximum likelihood estimators of the segment parameters for the model (3) with change-point locations . The quantity is the total number of estimated parameters related to . For example, if we do not consider the change-point locations as free parameters, then in the scenario of piecewise-constant mean (the constant values for each of the segments), while in the scenario of continuous and piecewise-linear signals (the starting intercept and slope and the changes in the slope). We mention that if the continuity constraint is to be removed, then would be equal to (the constant and slope values for the segments). If now we consider the change-point locations to be free parameters, then we just need to add in the above values for in the different scenarios.
In the algorithm we have referred to three parameters: , and . Although we do not give a recipe for the choice of and , Section 4.3 describes how to circumvent their choice. With respect to , taking its value to be equal to 1 in (11) gives the standard SIC penalty, but our theory requires . In practice we use in order to remain close to SIC. Theorems 3.3 and 3.4 below give the consistency results for the piecewise-constant and continuous piecewise-linear models, based on the sSIC approach. The proof of Theorem 3.3 is in the supplementary material and the same approach can be followed to prove Theorem 3.4.
Theorem 3.3
Let follow model (3) under piecewise-constancy and let the assumptions of Theorem 3.1 hold. Let and be the number and locations of the change-points. Let , where can also grow with . In addition, let be such that is satisfied, where and are defined in (A1). With being the set of candidate models obtained by the solution path algorithm, we define . Then, there exist positive constants , which do not depend on , such that for ,
[TABLE]
Theorem 3.4
Let follow model (3) under continuous piecewise-linearity and let the assumptions of Theorem 3.2 hold. Let and be the number and locations of the change-points. Let , where can also grow with . In addition, let be such that is satisfied, where and are defined in (A2). With being the set of candidate models obtained by the solution path algorithm, we define . Then, there exist positive constants , which do not depend on , such that for ,
[TABLE]
We note that our solution path algorithm, explained in detail in Section 1 of the supplementary material, allows , the number of the detections from the already explained overestimation process, to grow with . The quantities on the right hand sides of (12) and (13) are ; the same order as those in (5) and (8). The lowest admissible and in Theorems 3.3 and 3.4, respectively, are slightly larger than the same quantities in the thresholding approach. Our empirical expertise suggests that SIC-based approaches tend to exhibit better practical behaviour for signals that have a moderate number of change-points and/or large spacings between them. A hybrid that combines the advantages of the thresholding and the SIC-based approach is introduced in Section 4.4.
4 Computational complexity and practicalities
4.1 Computational cost
With being the minimum distance between two change-points, and the interval-expansion parameter, we use . We note that while is unknown, choosing small enough guarantees with high probability that this requirement holds; see Section 4.2 for how to choose in order to obtain good accuracy performance and at the same time low computational cost. Now, since and the total number, , of intervals required to scan the data is no more than ( intervals from each expanding direction), in the worst case scenario we have . As a comparison, in WBS and NOT one needs to draw at least intervals where . The lower bound for in WBS and NOT is up to a logarithmic factor, whereas the lower bound for is . This results in great speed gains of ID over WBS and NOT. The reason behind this significant difference in the computational complexity of the methods is that in WBS and NOT both the start- and end-points of the randomly drawn intervals have to be chosen, whereas in ID, depending on the expanding direction, we keep the start- or the end-point fixed.
4.2 Parameter choice
Choice of the threshold constant. We start with an upper bound on the constant , as defined in (6), for the case of piecewise-constant signals when the error terms are i.i.d. from the Gaussian distribution. We note that this result is of independent interest. Our model is as in (1) for stationary . For any vector , we define
[TABLE]
where and . It can be shown that if , are serially independent and their distribution is symmetric about zero (for example i.i.d. standard Gaussian random variables), then the sequence satisfies
[TABLE]
The following corollary indicates that as , we have that , meaning that the threshold can be taken to be at most . This value of is smaller than the constant used in the solution path algorithm of Section 3.3 (), which can however be used to give explicit upper bounds on the consistency results as explained in Theorems 3.1 and 3.3; in contrast, Corollary 1 does not give an explicit upper bound for the probability related to the consistency result as expressed in (16). We highlight that the aforementioned bound on the constant and its proof are simpler than the results presented in Fang et al. (2020) which involve the manipulation of complex distributions. The proof is in the supplementary material.
Corollary 1
Let be i.i.d. . For any ,
[TABLE]
For the practical choice of the values of and , in (6) and (9), respectively, we ran a large-scale simulation study involving a wide range of signals. The number of change-points, , was generated from the Poisson distribution with rate parameter . For , we uniformly distributed the change-points in . Then, for piecewise-constant (or continuous piecewise-linear) signals, at each change-point location we introduced a jump (or a slope change) which followed the normal distribution with mean zero and variance . Standard Gaussian noise was then added onto the simulated signal. For each value of , and we generated 1000 replicates and estimated the number of change-points using ID with threshold as in (6) and (9) for a variety of constant values and . The best behaviour occurred when, approximately, and . These values will be referred to as the default constants and they hold true for all signals that satisfy the assumption of the error terms being i.i.d. Gaussian. We note that the value of does not violate Corollary 1 because the result expressed in the latter is only for piecewise-constant signals, while the constant applies to the scenario of continuous, piecewise-linear signals. Due to the fact that the contrast function used is based on local averaging, the CLT can be used to show that for sufficiently large sample size , ID is robust when the normality assumption is not satisfied; this has also been explored in Fearnhead and Rigaill (2020). Also, pre-averaging is a practical approach that we employed in Section 4.5 for such cases with error departures from Gaussianity.
In the SIC-based approach of Section 3.3, we started by detecting change-points using threshold . In practice, we take the constants related to , namely and as defined in Section 3.3, to be 0.9 and 1.25, respectively.
Choice of the expansion parameter . We start by highlighting that our numerical experience suggests that ID is robust to small changes in the value of ; for a small-scale simulation study when the value of changes significantly (), see Section 6 of the supplementary material. Theoretically, for a given signal, the change-point detection results obtained from ID are the same for any value of used which is less than the minimum spacing between two successive change-points. The computational cost of running ID is inversely proportional to the size of the expansion parameter; the smaller the , the more intervals we need to work on. However, the low computational complexity of our algorithm allows us to take to be as small as the value of three leading to very good accuracy even for signals with frequent change-points. We now give example execution times for two models, (T1) and (T2) defined below, on a 3.60GHz CPU with 16 GB of RAM. We employed the ID-variant for long signals explained in Section 4.3.
- (T1)
Length , with change-points at and values between them . The standard deviation is . Execution times: 0.31s (), 2.25s (), 26.41s ().
- (T2)
Length , with no change-points. We use . Execution times: 0.64s (), 3.01s (), 30.35s ().
4.3 Variants
Here, we describe three different ways to further improve ID’s practical performance.
Long signals: If is large, we split the given data sequence uniformly into smaller parts (windows), to which ID is then applied. In practical implementations, the length of the window is 3000 and we apply this structure only when , because for smaller values of there are no significant differences in the execution times of ID and its window-based variant. The computational improvement that this structure offers is explained in Section 3 of the supplement.
Restarting after detection: In practice, instead of starting from the end-point (or start-point ) of the right-expanding (or left-expanding) interval where a detection occurred, we could start from the estimated change-point, . This alternative, labelled , leads to accuracy improvement without affecting the speed of the method.
Faster solution path algorithm: In practice, we use only Part 4 of the solution path algorithm described in Section 1 of the supplement because it is quicker and conceptually simpler; it requires only the choice of and tends not to affect ID’s accuracy.
4.4 Alternative model selection criteria
A hybrid between thresholding and SIC stopping rules: For signals with a large number of regularly occurring change-points, the threshold-based ID tends to behave better than the SIC-based procedure. As explained after Theorems 3.3 and 3.4, this is unsurprising because SIC-based approaches typically perform better on signals with a moderate number of change-points separated by larger spacings. This difference in ID’s behaviour between the threshold- and SIC-based versions is what motivates us to introduce a hybrid of these two stopping rules with minimal parameter choice, which works as follows. Firstly, we estimate the change-points using the threshold approach with . If the estimated change-points are more than a constant , then the result is accepted and we stop. Otherwise, the hybrid method proceeds to detect the change-points using the SIC-based approach with , since the already-applied thresholding rule has not suggested a signal with many change-points. In the simulations, we use , .
Steepest Drop to Low Levels (SDLL): We also combine ID with the SDLL model selection method introduced in Fryzlewicz (2020).
4.5 Extension to different noise structures
This section describes how to use ID when the noise is not Gaussian. We pre-process the data in order to obtain a noise structure that is closer to Gaussianity. For a given scale number and data , let and , for , while . We apply ID on to obtain the estimated change-points, namely , in increasing order. To estimate the original locations of the change-points we define . The larger the value of , the closer the distribution of the noise to normal, but the more the amount of pre-processing. In simulations presented in Section 5, we use for the case of Student- distributed noise, while if the tails are heavier (Student-), we set . The hybrid version of ID will be employed on and in order to be consistent with the choice of the expansion parameter, we take . In practice, for unknown noise, our recommendation is to set .
5 Simulations
This section compares the performance of ID with competitors. The main change-point detection functions in the competing packages were called using their default input arguments, which does not always allow direct like-for-like comparisons of the methods. Whenever needed (difficult signal structures), and in order to help the competitors capture their best possible performance, the input values were adjusted accordingly. The code used for the simulation study is available from Github at https://github.com/Anastasiou-Andreas/IDetect/blob/master/R/Simulations_used.R. Table 2 shows the competitors used.
CPOP is employed based on code found in http://www.research.lancs.ac.uk/portal/en/datasets/cpop(56c07868-3fe9-4016-ad99-54439ec03b6c).html and TF in https://stanford.edu/~boyd/l1_tf. For WBS, we give results based on both the information criterion and the thresholding (for ) stopping rules. The notation is WBSIC and WBSC1, respectively. With respect to WBS2, its performance is investigated based on the SDLL model selection criterion introduced in Fryzlewicz (2020). In the cpm package, the threshold is decided through the average run length (ARL) until a false positive occurs. In our simulations, we give results for (the default value) and if the signal length, , is greater than 500, results are also given for . The notation is CPM., with the value of ARL. For FKS, when the number of knots is unknown (the scenario we work in), we need to specify the maximum allowed number of knots. We take this to be 2, with the true number of change-points. Also, the estimated change-points by FKS are positive real numbers; we take as estimation the closest integer. The proposed ID version is the hybrid described in Section 4.4. However, we also present the results for two more variants: SDLL and thresholding with constant (see (6)), which is the upper bound proven in Corollary 1. The notation for these variants is ID.SDLL and ID, respectively.
A seemingly difficult structure for ID: Signals that present the most difficulty to ID are ones in which change-points are concentrated in the middle part of the data and offset each other, as in Figure 4. The reason is that due to the left- and right- expanding feature of ID, where one of the two end-points of the interval is kept fixed, the change-points need to be detectable based on relatively “unbalanced” (explanation follows directly below) tests, which typically tend to offer poor power. For example, referring again to Figure 4, the change-point at 490 will need to be isolated and detected by comparing the means of the data over the long interval and a short interval of the form , where is the end-point of a right-expanding interval . To be more precise, if the expansion parameter , then and therefore our procedure will have seven opportunities to detect the change-point 490 while it is still isolated in intervals that do not contain any other change-points. Even though ID would be expected to struggle in detecting the change-points in such unbalanced intervals, our numerical experience suggests that its performance on such challenging signals is in fact very good and matches or surpasses that of the best competitors; see for example the results in Table 3 for the model (M4), which follows this structure.
All the signals are fully specified in Section 2 of the supplementary material. Figure 5 shows examples of the data generated by models (M1) blocks, (M2) teeth, (M4) middle-points, and (W1) wave 1. Tables 3–7 summarize the results in the case of i.i.d. Gaussian noise. Table 8 presents the behaviour of ID under the setting of i.i.d. scaled Student- noise, where . More examples are in the supplement.
We highlight that the NOT, WBSIC, and S3IB methods require the specification of the maximum number, , of change-points allowed to be detected. If the default values in these methods are lower than the true number of change-points in the simulated examples, then we take , where is the minimum distance between two change-points. We ran 100 replications for each signal and the frequency distribution of for each method is presented. The methods with the highest empirical frequency of (or in a neighbourhood of zero, depending on the example) and those within off the highest are given in bold. As a measure of the accuracy of the detected locations, we provide Monte-Carlo estimates of the mean squared error, , where is the ordinary least square approximation of between two successive change-points. In continuous piecewise-linear signals, is the splines fit obtained using the splines package in R. The scaled Hausdorff distance, where is the length of the largest segment, is also given in all examples apart from the signal (NC) in Table 5, which is a constant-mean signal with no change-points.
The average computational time for all methods, apart from FDR, is also provided. FDR is excluded due to its non-uniform procedure in terms of the execution speed for each signal (if a newly obtained signal has length greater than previously treated signals, then FDR estimates the threshold by 5000 Monte-Carlo simulations, which makes it slow). In some cases the average computational time for FKS is not given. We have already explained that we need to pre-specify the maximum allowed number of knots in order for FKS to work. The method is somewhat slow and we exclude the results for FKS when the true change-points are more than 10, as in such cases it would take a significant amount of time to finish all the 100 simulations.
With regards to piecewise-constancy, ID is always in the top of the best methods when considering accuracy in any aspect (estimation of , MSE, ); in most cases it is the best method overall. ID.SDLL is also, in most cases, in the top 10% of the best performing methods; this provides evidence that the Isolate-Detect algorithm can be combined with various model selection criteria (thresholding, SIC, SDLL) and maintain a good practical behaviour. When the threshold constant, , is equal to , the behaviour of ID remains good for signals that have a moderate number of change-points that are not near each other. As we can see from Table 4, ID seems to struggle in scenarios with a large number of frequently occurring change-points. In continuous piecewise-linear signals, CPOP, ID, and ID.SDLL are in all cases in the top of the best methods in terms of the accurate estimation of . In terms of the MSE and , CPOP is by a narrow margin the overall best method, with ID and ID.SDLL coming second and third, respectively. We can deduce that our method exhibits uniformity in detecting with high accuracy the change-points for various different signal structures, a characteristic which is at least partly absent from the majority of its competitors. Furthermore, ID’s behaviour is particularly impressive in extremely long signals with a large number of frequently occurring change-points; see Tables 4 and 7. Compared to other well-behaved methods, such as NOT for piecewise-constancy and CPOP for continuous piecewise-linear signals, our methodology has by far the lowest computational cost. To conclude, the simulation study provides evidence that Isolate-Detect is an accurate, reliable, and quick method for generalized change-point detection.
The results of Table 8 are very good for and not too different from those under Gaussian noise. For , there is a slight overestimation of the number of change-points. When the tails of the distribution of the noise are significantly heavier than those of the normal distribution, one can obtain better results by increasing the threshold constant. For example, the results in Table 8 for were improved when the threshold constant was slightly increased. We highlight that more thorough simulations can be done using our R packages IDetect and breakfast and code available from https://github.com/Anastasiou-Andreas/IDetect/blob/master/R/Simulations_used.R.
6 Real data examples
6.1 UK House Price Index
We investigate the performance of ID on monthly percentage changes in the UK House price index from January 1995 to December 2020 in two London Boroughs: Tower Hamlets and Hackney. The data are available from http://landregistry.data.gov.uk/app/ukhpi and they were accessed in March 2021. Figure 6 shows the fits of ID, ID.SDLL, NOT, and TGUH. In both data sets, ID behaves similarly to NOT whereas ID.SDLL’s performance is closer to that of TGUH where we detect more change-points. This difference between the examined methods is, in our opinion, due to the fact that ID in this example and NOT detect change-points based on the Schwarz Information Criterion, so fewer estimated change-points can be expected. The detection of two change-points near March 2008 and September 2009 for both boroughs may be related to the financial crisis during that time, which led to a decrease in house prices. As explained in Section 3.3, our methodology returns the solution path defined in (10), which can be used to obtain different fits; see Section 7 in the supplement for more details and for a real-data example where this is useful.
Residual diagnostics have indicated that the behaviour of the raw residuals, , in relation to normality and independence is good for all methods.
6.2 The COVID-19 outbreak in the UK
The performance of ID is investigated on data from the recent COVID-19 pandemic; we employ a continuous piecewise-linear model on the daily number of lab-confirmed cases in England, as well as on the daily additional COVID-19 associated UK deaths. The data concern the period from the beginning of March 2020 until the end of February 2021 and they are available from https://coronavirus.data.gov.uk. The data were accessed on the of March 2021. Before applying the various methods to the data, we bring the distribution closer to Gaussian with constant variance. To achieve this we perform the Anscombe transform, , with as described in Anscombe (1948). We denote the transformed number of COVID-19 cases by and the transformed number of COVID-19 associated deaths by . Figure 7 presents the results of ID, ID.SDLL, CPOP, and NOT for the transformed data.
We observe that ID, ID.SDLL, and NOT have a similar behaviour, while CPOP gives a higher estimated number of change-points. In an attempt to date the detected change-points by ID, we provide a possible explanation of their location with respect to the outbreak of the pandemic in the UK; this discussion is given in Section 4 of the supplementary material.
For another example related to the continuous, piecewise-linear case, see Section 7 of the supplement where we explore the behaviour of Isolate-Detect and two competitors, CPOP and NOT, on the daily closing stock prices of Samsung Electronics Co. from July 2012 until June 2020.
7 Concluding reflections on ID
In this paper, we have proposed Isolate-Detect which is a new, generic technique for multiple generalized change-point detection in noisy data sequences. The method is based on a change-point isolation approach which seems to provide an advantage in detection power, especially in complex structures where most state-of-the-art competitors seem to suffer (see the simulations in Section 5) such as limited spacings between change-points. In addition, the aforementioned isolation aspect allows the extension of our method to the detection of knots in higher-order polynomial signals. As already mentioned in Section 1, NOT, WBS, and WBS2 also work on sub-intervals of the data, but the way the isolation is carried out in ID, where one of the end-points of the subintervals is kept fixed, provides predictable execution times for the analysis of a given data sequence, which are faster than the aforementioned competitors; see Sections 4.1 and 5. Another advantage of our method over NOT, WBS and WBS2 is that, due to its pseudo-sequential interval expansion character, it can easily be applied for online change-point detection.
In Section 4.4, a variant of ID was introduced that combines the threshold- and SIC-based versions of our proposed method with the aim to enhance its accuracy (both in terms of the estimated number and the estimated change-point locations) for signals of different structures with respect to the true number of change-points and the distance between them. In addition, due to the way that the relevant hybrid approach has been developed in Section 4.4, we manage to offer, for ease of execution, minimal parameter choice. Apart from thresholding and SIC, we have also combined ID with the SDLL model selection criterion.
In the practical applications of Sections 5 and 6, compared to the state-of-the-art competitors, ID lies in the top 10% (in terms of the accurate estimation of the number and the location of the change-points) of the best methods. Furthermore, it exhibits a notable advantage over other techniques in long signals with many change-points that occur frequently. In addition, ID’s pseudo-sequential character assists in attaining a low computational time; our method can accurately analyse signals of tens of thousands with thousands of change-points in less than a second; see for example Table 4. In cases where the normality assumption for the error terms is violated, Section 4.5 provides a practical solution where pre-processing allows us to use ID without altering the proposed parameter values. The results of simulations from a Student- distribution with two options for the degrees of freedom are in Table 8.
Since no method has a uniformly best behaviour, it is natural to also highlight the weaknesses of our method in terms of its practical behaviour. To start with, ID can be slow in long and constant signals in which change-points do not occur. This is because of the expanding intervals attribute, which in the case of no change-points will push the method to keep testing for change-points in growing, overlapping intervals. This is inevitably going to lead to high computational costs. We tried to eliminate this weakness by introducing a window-based variant, as explained in Section 4.3. Another drawback of the method is that, due to its left- and right-expanding feature, the change-points need to be detectable based on relatively unbalanced intervals. This could lead to accuracy issues in signals where the change-points are in the middle of the data sequence and offset each other. In practice, we have not encountered this type of behaviour in ID; in particular it accurately detects the change-points for the model (M4) in Table 3, which is an example of the aforementioned structure with two nearby change-points in the middle of the data sequence.
1 The different parts in the removal process of the solution path algorithm
The four different parts of the removal process applied in order to create the solution path explained in Section 3.3 of the paper are very similar and are based on the idea of removing change-points according to their contrast function values as well as their distance to neighbouring estimates. We mention that if the algorithm proceeds from Part 1 to Part 2 as below, then it is guaranteed that it will also proceed up to Part 4. All events below occur with probability tending to one with .
Part 1: With being a positive constant, the aim is to prune the estimates in , such that, for each true change-point, there are at most four and at least one estimated change-point within a distance of . To achieve this, and with , , we collect triplets and we calculate , with being the relevant contrast function. For , firstly we check whether , for ; in the proofs of Theorems 3.3 and 3.4, , but smaller values could be sufficient. If and also , we remove from , reduce by 1, relabel the remaining estimates (in increasing order) in , and repeat this estimate removal process. We proceed to Part 2 when . If this is not satisfied at any point of this part, then we conclude that there are no change-points in the data sequence and we stop.
Part 2: The aim is to continue the pruning process of Part 1, in a way that at the end of Part 2 there is at least one estimate within a distance of from each true change-point, but also there are at most two estimates between any pair of consecutive true change-points. For the relabelled estimates in after the completion of Part 1, if , then we remove , relabel the remaining estimates, and keep removing the estimates until there is no pair , such that . We then calculate as in Part 1 and for , if , then we remove and relabel the remaining elements of . This removal process is repeated and we proceed to Part 3 only when .
Part 3: We need to ensure that once contains estimates, then for , each is within a distance of from . To achieve this, for the remaining estimated change-points after Part 2, we use triplets , with and . For , if , then we remove and relabel the remaining estimates in in increasing order. We repeat this removal procedure until , which is when we proceed to Part 4.
Part 4: For the estimated change-points that are in after Part 3 is completed, we use again the triplets in order to find and then remove from . This estimates removal approach is repeated until .
2 Models used in the simulation study of the paper
The characteristics of the test signals as well as the standard deviations of the noise , which were used in the simulation study are given in the list below.
- (M1)
blocks: length 2048 with change-points at 205, 267, 308, 472, 512, 820, 902, 1332, 1557, 1598, 1659 with values between change-points 0, 14.64, 3.66, 7.32, 7.32, 10.98, 4.39, 3.29, 19.03, 7.68, 15.37, 0. The standard deviation is .
- (M2)
teeth: length 140 with change-points at 11, 21, 31, 41, 51, 61, 71, 81, 91, 101, 111, 121, 131 with values between change-points 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1. The standard deviation of the noise is .
- (M3)
stairs: length 150 with change-points at 11, 21, 31, 41, 51, 61, 71, 81, 91, 101, 111, 121, 131, 141 with values between change-points 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15. The standard deviation of the noise is .
- (M4)
middle-points: length 2000 with change-points at 1000 and 1020 with values between change-points 0, 1.5, 0. The standard deviation of the noise is .
- (M5)
long teeth: length 20000 with 1999 change-points at with values between change-points . The standard deviation is .
- (NC)
constant signal: length 3000 with no change-points. The standard deviation is .
- (W1)
wave 1: piecewise-linear signal without jumps in the intercept, , with 7 change-points at with the corresponding changes in slopes , starting intercept and slope = . The standard deviation of the noise is .
- (W2)
wave 2: piecewise-linear signal without jumps in the intercept, , with 99 change-points at . The corresponding changes in the slope are , while the starting intercept is and the starting slope is = . The standard deviation is .
- (W3)
smoother signal 1: piecewise-linear signal without jumps in the intercept, , with 9 change-points at with the corresponding changes in slopes . The starting intercept is and slope = . The standard deviation of the noise is .
- (W4)
smoother signal 2: piecewise-linear signal without jumps in the intercept, , with 19 change-points at with the corresponding changes in slopes , starting intercept and slope . The standard deviation of the noise is .
3 Improvement of ID in the case of big data
In this section, we show through simulations that applying ID on a fixed window grid improves its speed in large data sets, without affecting its accuracy. We compare the classic ID method as explained in Section 3.1 of the main paper with the new window-grid-based version (WID) of Section 4.3 in the case of three data sequences of length , each with standard Gaussian noise. We work under the scenario of piecewise-constant mean. The three signals are
- (D1)
No change-points;
- (D2)
three change-points at 25000, 55000, 85000 and the values between change-points are 0,3,-3,2;
- (D3)
seven change-points at 16000, 22000, 28000, 46000, 62000, 74000, 86000 and the values between change-points are 0,4,-4,4,-4,4,-4,4.
We took the expansion parameter to be equal to 10 and the results are shown in Table 1. As a measure of the accuracy of the detected locations, we provide Monte-Carlo estimates of the mean squared error, . The scaled Hausdorff distance,
[TABLE]
where is the length of the largest segment, is also given for (D2) and (D3); for (D1), is not informative. In terms of accuracy, both methods exhibit excellent behaviour. However, in terms of speed, the advantage of the windows-based approach is obvious. Note the decrease in the computational time of ID when the number of change-points gets larger. This is expected because the worst case in terms of computational complexity is when there are no change-points because ID will then be forced to calculate the contrast function on quite large intervals, even on , which is computationally more expensive.
4 A justification of the estimations in the example of Section 6.2
Here, we provide a possible explanation of the three most important (based on the solution path) detected change-points by ID in the real data related to the COVID-19 outbreak in the UK. We focus on the signal obtained with respect to the daily number of deaths (ID and ID.SDLL give exactly the same outcome), but similar conclusions can be extracted for the daily number of cases. The three most important changes in the behaviour of the data sequence have been detected on the of April, the of July and the of January. Some justification of these results is as follows:
- •
The change-point on the of April, shows that the upward trend vanishes and a negative slope takes its place leading to a vast decrease in the number of reported deaths. There is an obvious connection of this change-point with the British Government’s important decision in late March to impose a general lockdown. On 9 April, the Secretary of State for Foreign Affairs stated that the UK was starting to see the impact of the restrictions; our second detection is in full agreement with the aforementioned date and statement.
- •
With respect to the change-point on the of July, it indicates a stabilisation, at a low level, for the number of deaths. This is, of course, expected because the downward trend could not continue indefinitely.
- •
The last change-point on the of January indicates that an upward trend on the daily number of deaths, which was apparent for a period of almost 3.5 months, vanishes and its place takes a significantly negative slope. There is an obvious connection of this change-point with the vaccination programme in the country. More specifically, on 8 January 2021, MRNA-1273 (commonly known as the Moderna vaccine) was the third COVID-19 vaccine approved for use in the UK.
5 Additional simulation results
Here we present the results of simulations for various signals other than those presented in Section 5 of the main article. Tables 3-7 summarize the results for the following models.
- (LS)
long stairs: length 10000 with 499 change-points at with values between change-points . The standard deviation is .
- (LT2)
long teeth 2: length 10000 with 249 change-points at with values between change-points . The standard deviation is .
- (ELT)
extremely long teeth: length 100000 with 19999 change-points at with values between change-points . The standard deviation is .
- (NC2)
constant signal 2: length 300 with no change-points. The standard deviation is .
- (SW1)
wave 5: piecewise-linear signal without jumps in the intercept, , with 119 change-points at with the corresponding changes in slopes , starting intercept , slope = and .
- (SW2)
wave 6: piecewise-linear signal without jumps in the intercept, , with 29 change-points at with the corresponding changes in slopes , starting intercept , slope = and .
- (SW3)
wave 7: piecewise-linear signal without jumps in the intercept, , with 119 change-points at with the corresponding changes in slopes , starting intercept and slope = . The standard deviation is .
The signals (LS), (LT2), (ELT), and (NC2) are treated under piecewise-constancy, while (SW1), (SW2) and (SW3) under the continuous and piecewise-linear case. FDR, WBSIC and S3IB are excluded from the comparative study for the extremely long signal (ELT). For FDR, we had to interrupt the execution after 10 hours, while for WBSIC and S3IB, in order to have a fair comparison of the methods with the rest, we had to increase the default value of the maximum number of change-points allowed to be detected to be greater than 20000. For a single iteration we had to stop the execution for WBSIC and S3IB after 30 minutes.
In all examples, the ID methodology is within of the best method. Once again, it exhibits remarkable behaviour when it comes to very long signals with a large number of frequently appearing change-points; see Tables 3, 4 and 6.
6 Investigation of the impact of different values of on accuracy
In this section, we provide a small-scale simulation study in order to examine the behaviour of ID on different values of the expansion parameter . The simulation set up is as in Section 5 of the main paper and the signals used are those explained in Section 2 of the supplement. For the expansion parameter, we take . The results are in Tables 9 - 15 below.
In terms of accuracy, we notice that in most cases, the best method is ID. In addition, as long as the different values of are all less than the minimum distance, , between two successive change-points, then the results are extremely similar (if not identical); see for example Tables 9 and 13. On the other hand, when the accuracy is getting worse; compare for example the results for the three different values of in Models (M2), (M3), (M5), (W2), and (W3). In terms of speed, as expected, the larger the value of , the quicker the method in general.
7 Additional real-data example
In this section we explore the behaviour of our method and two competitors, CPOP and NOT, to the daily closing stock prices of Samsung Electronics Co. from July 2012 until June 2020. The data are available from https://finance.yahoo.com/quote/005930.KS/history?p=005930.KS and they were accessed in July 2020. We look for changes in a continuous piecewise-linear mean signal. Figure 8 shows the results for the ID, ID.SDLL, NOT and CPOP methods, which detect change-points, respectively. From both the fit and the residuals given in Figure 8, it is not easy to say which of the three methods gives the “best” number of change-points.
ID can return a range of different fits providing users with the flexibility to choose according to their preference. In Figure 9, we use the solution path and we obtain the estimated signal and the raw residuals of ID for and , which are the estimated change-point numbers through NOT and CPOP, respectively. The fit is similar to those obtained by the aforementioned methods as presented in Figure 8. However, we note that both those competitors are significantly slower than ID; see Tables 6 and 7 in the main paper and Tables 6 - 8 in the supplement for a comparison. To conclude, apart from returning the estimated fit, the ID methodology can directly, and without any extra effort, produce a series of estimated signals based on the solution path defined in (10).
8 Proof of the theorems in the paper and of Corollary 1
From now on, the contrast vector is defined through the contrast function
[TABLE]
where . Notice that for any vector , we have that We first present Lemma 1, which is partly used for the proof of Theorem 3.1,
Lemma 1
Suppose is a piecewise-constant vector. Pick any interval such that contains exactly one change-point . Let , , and . Then,
[TABLE]
In addition,
for any , ; 2. 2.
for any , ;
Proof
See Lemma 4 from Baranowski et al. (2019).
**Brief discussion of the steps of the proof of Theorem 3.1
**Before proceeding with the thorough mathematical proof, we give an informal explanation of the main steps. In the main part of the proof, we derive results for the signal . However, the consistency is concerned with the estimated number and locations of the change-points in the observed process . Therefore, in order to be able to deduce consistency related to from our -reliant proof, we need first to show that for all , the observed quantity is uniformly close to the unobserved ; this is achieved in Step 1. In Step 2, for , we control the distance between the noised and its noiseless equivalent for all possible combinations of . This allows us to transfer the decision on whether or is more suitable as a change-point, from to the calculable . Step 3 is the main part of our proof, where we first show that as the ID algorithm proceeds, each change-point will get isolated in an interval where its detection will occur with high probability. Therefore, it suffices to restrict our proof to a single change-point detection framework, and the convergence rate is proved to hold for each estimated location. Because upon detection ID proceeds from the end-point (or start-point) of the interval where the detection occurred, we also show that with probability one there is no change-point in those bypassed points (between the detection and the new start- or end-point). Furthermore, in Step 3 it is shown that, the new start- and end-points are at places that allow the detection of the next change-point. In Step 4, we conclude the proof by showing that after detecting all change-points, then ID, with high probability, will terminate after scanning all the remaining data. We mention that for our proof, we employ Lemma 1 given in the online supplement.
Proof of Theorem 3.1. We will prove the more specific result
[TABLE]
which implies the result in (5).
Step 1: Allow us to denote by
[TABLE]
We will show that . From (3) and (4), simple steps yield , where . Thus, for , using the Bonferroni inequality we get that
[TABLE]
where is the probability density function of the standard normal distribution.
Step 2: For intervals that contain only one true change-point , we denote by
[TABLE]
Because follows the standard normal distribution, then we use a similar approach as in Step 1, to show that . Therefore, Steps 1 and 2 lead to
[TABLE]
Step 3: This is the main part of our proof, where we explain in detail how to get the result in (17). For ease of understanding, we split this step into two smaller parts. From now on, we assume that and both hold. The constants we use are
[TABLE]
where is as in condition (A1).
Step 3.1: For ease of presentation, we take ; see Remark 1 for comments in regards to the general case of , for an . Allow us now , to define the intervals
[TABLE]
In order for and to have at least one point, we actually implicitly require that , which is the case for sufficiently large ; see assumption (A1). Since the length of the intervals in (21) is equal to and , then ID ensures that for and , there exists at least one and at least one that are in and , . At the beginning of our algorithm, , and depending on whether then or will get isolated in a right- or left-expanding interval, respectively. W.l.o.g., assume that . As already mentioned, ID naturally ensures that such that . There is no other change-point in apart from . We will show that for , then . Using (18), we have that
[TABLE]
But,
[TABLE]
By the definition of and from our notation of , we know that . In addition, since , then , meaning that
[TABLE]
The result in (22), the assumption (A1) and the application of (24) in (8) yield
[TABLE]
Therefore, there will be an interval of the form , with , such that contains only and . Let us, for , to denote by the first right-expanding point where this happens and let with . Our aim now is to find such that for any with , we have that
[TABLE]
Proving (25) and using the definition of we can conclude that . Now, since , then (25) can be expressed as
[TABLE]
W.l.o.g. assume that and a similar approach as below holds when . Lemma 1, gives for the left-hand side of the inequality in (8) that
[TABLE]
For the terms on the right-hand side of (8), using (18) we obtain that
[TABLE]
[TABLE]
Therefore (8) is satisfied if the stronger inequality is satisfied, which has solution
[TABLE]
From (27) and since , we deduce that (25) is implied by
[TABLE]
However,
[TABLE]
and this is because if we assume that , then
[TABLE]
This comes to a contradiction to . Therefore, (29) holds and (28) is restricted to , which implies (25). Thus, we conclude that necessarily,
[TABLE]
So far, for we have proven that working under the assumption that and hold, there will be an interval , with , where is an estimation of that satisfies (30).
Step 3.2: After detecting the first change-point, ID follows the same process as in Step 3.1 but in the set , which contains . This means that we bypass, without checking for possible change-points, the interval and we need to prove that:
- (S.1)
There is no change-point in , apart from maybe the already detected ;
- (S.2)
is at a location which allows for detection of .
For (S.1): We will split the explanation into two cases with respect to the location of .
Case 1: . Using (30) and imposing the condition
[TABLE]
then since , we have that
[TABLE]
Since and is already in , then there is no other change-point in apart from . Actually, the result in (31) is not an extra assumption and we will briefly explain the reason at the end of our proof.
Case 2: . Since , then , which means that apart from there is no other change-point in . With , then does not have any change-point.
Cases 1 and 2 above show that no matter the location of , there is no change-point in other than possibly the previously detected . Similarly to the approach in Step 3.1, our method applied now in , will first isolate or depending on whether is smaller or larger than . If then will get isolated first in a left-expanding interval and the procedure to show its detection is exactly the same as for the detection of in Step 3.1. Therefore, for the sake of showing (S.2) let us assume that .
For (S.2): With as in (2), there exists such that , with defined in (21). We will show that gets detected in , for and its detection is , which satisfies . Following similar steps as in (8), we have that for ,
[TABLE]
By construction, and
[TABLE]
which means that and therefore continuing from (32),
[TABLE]
Therefore, for a we have shown that there exists an interval of the form , with . Let us denote by the first right-expanding point where this occurs and let with .
We will now show that . Following exactly the same process as in Step 3.1 and assuming now w.l.o.g. that , we have that for ,
[TABLE]
is implied by . In the same way as in Step 3.1 and by contradiction we can show that and (33) is implied by . Therefore would mean that , which is not true by the definition of . Having said this, we conclude that . Having detected , then our algorithm will proceed in the interval and all the change-points will get detected one by one since Step 3.2 will be applicable as long as there are undetected change-points in .
Denoting by the estimation of as we did in the statement of the theorem, then we conclude that all change-points will get detected one by one and . In addition, as one can see from (31), our process imposes that , which, by the definition of , is implied by
[TABLE]
We will now explain why (34) is not actually an extra assumption but it is implied by our assumption (A1), which requires . Proving that would mean that indeed (A1) implies (34). Due to , we require to be such that . Simple steps yield
[TABLE]
We conclude that , meaning that (34) is something already satisfied due to (A1).
Step 4: The arguments given in Steps 1-3 hold in . At the beginning of the algorithm, and for , there exist such that and such that . As in our previous steps, w.l.o.g. assume that and gets isolated and detected first in an interval , where and it is less than or equal to . Then, is the estimated location for and . After this, the method continues in and keeps detecting all the change-points as explained in Step 3. There will not be any double detection issues because naturally, at each step of the algorithm, the new interval does not include any previously detected change-points. Once all the change-points have been detected one by one, then will contain no other change-points. ID will keep checking for possible change-points in intervals of the form and for and . We denote by any of these intervals. ID will not detect anything in since ,
[TABLE]
After not detecting anything in all intervals of the above form, then the algorithm concludes that there are not any change-points in and stops.
Remark 1
It is interesting to explore what happens when instead of , we use the more general case of , for . The adjustments need to be made are
- (Adj.1)
Instead of the definition in (21), we now have
[TABLE]
Note that the length of the above intervals is , meaning that with probability one there will be at least one left and one right expanding point in each of them because the distance between two consecutive right (left) expanding points is .
- (Adj.2)
Instead of as in (20), we should now use that . This is easy to prove and it will not be shown here.
- (Adj.3)
In (35), we give a lower bound for . Following similar steps, this now becomes
[TABLE]
We see from (Adj.3) that the higher the value of , the smaller the lower bound will be, meaning that the assumption on gets possibly relaxed for larger values of . On the other hand, the results above hold for an expanding level of and thus, we notice that the smaller the value of , the larger the upper bound for the acceptable -values. Our choice of gives a more symmetric aspect to our approach as the length of the intervals and is the same as the minimum distance of their start- and end-points from possible change-points, which is .
We now proceed to prove the result in Theorem 3.2. For the continuous piecewise-linear case, the contrast function values at for the observed data, the signal, and the noise are denoted by , and , respectively. We have and as in the case of piecewise-constancy, . The contrast vector is defined through the contrast function
[TABLE]
where and , with . For any vector , we have that
[TABLE]
Towards the proof of Theorem 3.2, we use Lemmas 2 and 3 given below.
Lemma 2
Suppose is piecewise-linear vector and are the locations of the change-points. Suppose , such that , for some . Let . Then,
[TABLE]
Proof
See Lemma 5 from Baranowski et al. (2019).
Lemma 3
Suppose is piecewise-linear vector. With the locations of the change-points, suppose that , such that for some . Let , , and . Then,
[TABLE]
Furthermore,
for any , ; 2. 2.
for any , .
Proof
See Lemma 7 from Baranowski et al. (2019), where the approach is similar as to the one for Lemma 1.
The steps we follow for the proof of Theorem 3.2 are the same as those explained for the proof of Theorem 3.1.
Proof of Theorem 3.2. We will prove the more specific result
[TABLE]
which implies the result in (8).
Steps 1 and 2: As in Theorem 3.1, let
[TABLE]
The same reasoning as in the proof of Theorem 3.1 leads to and . Therefore, Steps 1 and 2 lead to
[TABLE]
Step 3: This is the main part of our proof, where we explain in detail how to get the result in (36). From now on, we assume that and both hold. The constants we use are
[TABLE]
where is as in assumption (A2).
Step 3.1: First, , we define and as in (21). At the beginning of our algorithm, , and depending on whether then or will get isolated first, respectively. W.l.o.g., assume that . Our aim is to first show that there will be at least an interval of the form , for , which contains only and no other change-point, such that . Due to ID’s nature, for , then such that and there is no other change-point in apart from . We will now show that for , then . Firstly, we have that
[TABLE]
From Lemma 2, we know that . Now, , because for continuous piecewise-linear signals we have that for identifiability purposes. In addition, since , then , meaning that
[TABLE]
The result in (37), the assumption (A2) and (38) yield
[TABLE]
Therefore, there will be an interval of the form , with , such that contains only and . Let us, for , denote by the first right-expanding point where this happens and let with . Note that can not be an estimation of any other change-point as includes only .
Our aim now is to find such that for any with , we have
[TABLE]
Proving (40) and using the definition of we can conclude that . Since , then (40) can be expressed as
[TABLE]
W.l.o.g. assume that and a similar approach as below holds when . We denote by
[TABLE]
and for the terms in the right-hand side of (8), we get that
[TABLE]
while from Lemma 3,
[TABLE]
Therefore (8) is satisfied if the stronger inequality
[TABLE]
is satisfied, which has solution
[TABLE]
Using Lemma 3, we have that (42) is implied by
[TABLE]
However,
[TABLE]
and this is because if we assume that
[TABLE]
yields
[TABLE]
This comes to a contradiction to . Therefore, (44) holds and for sufficiently large ,
[TABLE]
From (45) we deduce that (8) is restricted to
[TABLE]
which implies (40). Therefore, necessarily,
[TABLE]
So far, for we have proven that working under the sets and , there will be an interval of the form , with , where is an estimation of that satisfies (46).
Step 3.2: After detecting the first change-point, ID follows the same process as in Step 3.1 in the set , which contains . This means that we do not check for possible change-points in the interval . Therefore, we need to prove that:
- (S.1)
There is no other change-point in , apart from possibly the already detected ;
- (S.2)
is at a location which allows for detection of .
For (S.1): The approach is the same as the one in Step 3.2 in the proof of Theorem 3.1 and will not be repeated here.
Similarly to the approach in Step 3.1, our method applied now to , will first detect or depending on whether is smaller or larger than . If then will get isolated first and the procedure to show its detection is exactly the same as in Step 3.1 where we explained the detection of . Therefore, w.l.o.g. and also for the sake of showing (S.2) let us assume that .
For (S.2): With as in (2), there exists such that . We will show that gets detected in , for and its detection is , which satisfies . Using again Lemma 3 and for , we have that
[TABLE]
By construction,
[TABLE]
which means that . Therefore, continuing from (8) and using the exact same calculations as in (8), we have that
[TABLE]
Therefore, for a we have shown that there exists an interval of the form , with . Let us denote by the first right-expanding point where this occurs and let with . We will now show that . Following the same process as in Step 3.1 and assuming now that , we have that for ,
[TABLE]
is implied by . However, following the same procedure as in Step 3.1 we can show that for sufficiently large ,
[TABLE]
Thus, (48) is implied by . Therefore, would necessarily mean that , which is not true by the definition of . Having said this, we conclude that .
Having detected , then our algorithm will proceed in the interval and all the change-points will get detected one by one since Step 3.2 will be applicable as long as there are previously undetected change-points in . Denoting by the estimation of as we did in the statement of the theorem, then we conclude that all change-points will first get isolated and then detected one by one and
[TABLE]
Step 4: The arguments given in Steps 1-3 hold in . At the beginning of the algorithm, and for , there exist such that and such that . As in our previous steps, w.l.o.g. assume that , meaning that gets isolated and detected first in an interval , where and it is less than or equal to . Then, is the estimated location for and . After this, the algorithm continues in and keeps detecting all the change-points as explained in Step 3. It is important to note that there will not be any double detection issues because naturally, at each step of the algorithm, the new interval does not include any previously detected change-points.
Once all the change-points have been detected one by one, then will have no other change-points in it. Our method will keep interchangeably checking for possible change-points in intervals of the form and for and . Allow us to denote by any of these intervals. Our algorithm will not detect anything in since ,
[TABLE]
After not detecting anything in all intervals of the above form, then the algorithm concludes that there are not any change-points in and stops.
**Brief discussion of the steps of the proof of Theorem 3.3
**Before the thorough mathematical proof of Theorem 3.3, we provide an informal explanation of the three main steps in our proof. The notation is as in the main paper with denoting the ordered set with the remaining and relabelled estimated change-points, , after each estimation is removed. At the beginning of the change-point removal approach, . In Step 1 of the proof, we show that for each true change-point , there is at least one and at most four estimated change-points, within a distance equal to , where . In Step 2, we show that there are at most two estimated change-points between two consecutive true change-points. In Step 3, we prove that as the algorithm proceeds, then , the only remaining change-point for is within a distance of from and it cannot be removed whilst there are still more than estimated change-points in . Step 4 shows that the sSIC penalty as defined in (11), proposes a solution with estimated change-points.
**Proof of Theorem 3.3
**Allow us first to denote by
[TABLE]
We will show that . For , using the Bonferroni inequality we get that
[TABLE]
where is the probability density function of the standard normal distribution. Therefore, . With as in (18), the work that follows is valid on the set with . We take from the main paper to be equal to .
Step 1: When the algorithm moves from Part 1 to Part 2, as described in Subsection 3.3, then we are under a structure described by the following three characteristics:
(P1) For , there is at least one estimation within a distance of from , . We know that this is true at the beginning of Part 1 due to calling the ID algorithm with threshold . This continues to be the case when the algorithm proceeds to Part 2, because if is the last estimation within from , then and cannot be removed in Part 1.
(P2) For each , there are at most four estimated change-points within a distance of from . We can not have more than four estimations as if this was the case then at least three of them, let’s denote them by , would be either on the right or the left of , which would then mean that both and are satisfied and therefore would have been removed in Part 1 of the algorithm as explained in Subsection 3.3 of the paper.
(P3) There is an unknown, possibly large, number of estimated change-points (which tends to infinity as goes to infinity) between any two true change-points, namely and . This issue is solved in Part 2 of the algorithm.
Step 2: We are now in Part 2 of the algorithm as explained in Subsection 3.3, which guarantees that the minimum distance between two estimated change-points is , and also that there exists , such that there is at least one estimation within a distance of from . After that, in Part 2 we collect the triplets and we calculate and for , if , then is removed and the process is repeated for the remaining estimated change-points. By doing this it is easy to see that, first of all, between and , there will be at most two estimated change-points, since if there were more, then we would have triplets with and would have been removed. Secondly, for each there is still at least an estimation within a distance of from . If there is exactly one estimated change-point in the area , namely , then this cannot be removed in Part 2 of the algorithm because and therefore for ,
[TABLE]
which for sufficiently large is greater than . Therefore, will not be removed and there is at least one estimation within a distance of from , . This estimation will be in the set of estimated change-points that continue in Part 3 of the algorithm.
Step 3: We are in Part 3 of the algorithm as explained in Subsection 3.3. In this step, we will show that with , then once we will be at the stage where contains estimated change-points; one estimated change-point within a distance of from each , , where . W.l.o.g. let be between and . From Step 2 we know that there is a finite number (no more than four) of estimated change-points in . It is straightforward that at the beginning of Part 3 we have at most estimations and either of the following two cases is possible:
Case 1: is not the closest change-point to either the true change-point on its left ( for a ) or the true change-point on its right (). Since, and , it is straightforward to see that necessarily is on the right of and is on the left of . This means that there are not any true change-points in and because we are working in the set , we have that
[TABLE]
Therefore, will be removed from the set .
Case 2: is the closest change-point to a true change-point, namely for a , and from what has been discussed in Steps 1 and 2, is within a distance of . If , then there is at least another estimated change-point within a distance of from (and therefore from too), where is a constant that does not depend on . If this was not the case, then since we are working under ,
[TABLE]
for a constant and for sufficiently large . Therefore, would not get removed. Since there are at most estimations, then the constants are upper bounded by a general constant and therefore, when Part 3 terminates, each true change-point will have only one estimated change-point within the distance of . This is exactly the stage in our algorithm, where , with denoting the number of estimated change-points in the set . There are not any change-point identifiability issues, because and for large enough
[TABLE]
The algorithm will then proceed to Part 4 explained in Subsection 3.3 and each of the remaining estimated change-points will be removed one by one until is the empty set.
Step 4: In this last step we will prove that the sSIC penalty indicates the solution obtained when Part 3 terminates, where the number of estimated change-points is equal to . We have already explained in Section 3.3 of the paper that in the scenario of piecewise-constant mean signals,
[TABLE]
where for any candidate model , we have , for , and is the maximum likelihood estimator of the residual variance associated with model . It has also been proven in Fryzlewicz (2014) that in an interval ,
[TABLE]
where . With the number of estimated change-points related to , if , it means that all the change-points have been detected (see explanation in Step 2) and therefore since we are working under . Therefore, . In addition, since we are working in the set , we have that , for a positive constant . Using these results, the definition of in (50), and a first order Taylor expansion, we conclude that
[TABLE]
where and are positive constants. The lower bound in (8) is positive for large enough. Now, if , then from the proof of Theorem 3.1, we know that in the interval , we have that , leading to . Therefore,
[TABLE]
where and are positive constants. The lower bound in (8) is positive for large enough because . The results in (8) and (8) show that for large enough and on the set , we have that , for . Therefore, is minimized for , showing that .
Proof of Corollary 1. From now on, we denote by
[TABLE]
where and . Using also the notation in (14), we have
[TABLE]
Define
[TABLE]
and consider any straight line in the -plane, defined by the equation , that is tangent to the curve in the quadrant . By elementary calculus, which we do not repeat here, the tangency is attained when . By elementary geometry,
[TABLE]
Consider the particular values of given by
[TABLE]
and note that by the definition of we have
[TABLE]
Therefore, from the definitions of , and , the implication in (54) and the identity in (55), we have
[TABLE]
Considering again (53), this implies
[TABLE]
By (15), we have
[TABLE]
From (56), we have
[TABLE]
Therefore, from (57), and using (58) and (59) in turn, we have
[TABLE]
Now, for any , taking , we have that
[TABLE]
and the statement is a consequence of Lemma 1 in Yao (1988).
References
- Baranowski et al. (2019)
Baranowski, R., Chen, Y. and Fryzlewicz, P. (2019). Narrowest-over-threshold detection of multiple change points and change-point-like features.
Journal of the Royal Statistical Society Series B, 81, 649–672.
- Fryzlewicz (2014)
Fryzlewicz, P. (2014).
Wild binary segmentation for multiple change-point detection.
Annals of Statistics 42, 2243–2281.
- Yao (1988)
Yao, Y.-C. (1988). Estimating the number of change-points via Schwarz’ criterion.
Statistics & Probability Letters, 6, 181–189.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1Anscombe (1948) Anscombe, F. J. (1948). The transformation of Poisson, binomial and negative-binomial data. Biometrika , 35 , 246–254.
- 2Auger and Lawrence (1989) Auger, I. E. and Lawrence, C. E. (1989). Algorithms for the Optimal Identification of Segment Neighborhoods. Bulletin of Mathematical Biology , 51 , 39–54.
- 3Bai and Perron (1998) Bai, J. and Perron, P. (1998). Estimating and testing linear models with multiple structural changes. Econometrica , 66 , 47–78.
- 4Baranowski et al. (2019) Baranowski, R., Chen, Y. and Fryzlewicz, P. (2019). Narrowest-over-threshold detection of multiple change points and change-point-like features. Journal of the Royal Statistical Society Series B , 81 , 649–672.
- 5Chan and Walther (2013) Chan, H. P. and Walther, G. (2013). Detection with the scan and the average likelihood ratio. Statistica Sinica , 23 , 409–428.
- 6Cho and Kirch (2020) Cho, H., and Kirch, C. (2020). Data segmentation algorithms: Univariate mean change and beyond. https://arxiv.org/pdf/2012.12814.pdf .
- 7Dette et al. (2018) Dette, H., Eckle, T., and Vetter, M. (2020). Multiscale change point detection for dependent data. Scandinavian Journal of Statistics , 47 , 1243–1274.
- 8Eichinger and Kirch (2018) Eichinger, B. and C. Kirch (2018). A MOSUM procedure for the estimation of multiple random change points. Bernoulli 24 , 526–564.
