Quickest Change Detection in the Presence of a Nuisance Change
Tze Siong Lau, Wee Peng Tay

TL;DR
This paper introduces a recursive, asymptotically optimal quickest change detection method that effectively distinguishes critical changes from nuisance changes, outperforming existing procedures in simulations and real-world bearing failure data.
Contribution
A novel window-limited sequential detection procedure based on the generalized likelihood ratio test that handles nuisance changes and is proven to be asymptotically optimal.
Findings
Proposed method outperforms FMA and 2-stage procedures in simulations.
The recursive update scheme is computationally efficient.
Real data experiments confirm the method's effectiveness.
Tze Siong Lau and Wee Peng Tay
This research is supported by the Singapore Ministry of Education Academic Research Fund Tier 1 grant 2017-T1-001-059 (RG20/17) and Tier 2 grant MOE2018-T2-2-019. T. S. Lau and W. P. Tay are with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore.
Abstract
In the quickest change detection problem in which both nuisance and critical changes may occur, the objective is to detect the critical change as quickly as possible without raising an alarm when either there is no change or a nuisance change has occurred. A window-limited sequential change detection procedure based on the generalized likelihood ratio test statistic is proposed. A recursive update scheme for the proposed test statistic is developed and is shown to be asymptotically optimal under mild technical conditions. In the scenario where the post-change distribution belongs to a parametrized family, a generalized stopping time and a lower bound on its average run length are derived. The proposed stopping rule is compared with the finite moving average (FMA) stopping time and the naive 2-stage procedure that detects the nuisance or critical change using separate CuSum stopping procedures for the nuisance and critical changes. Simulations demonstrate that the proposed rule outperforms the FMA stopping time and the 2-stage procedure, and experiments on a real dataset on bearing failure verify the performance of the proposed stopping time.
Index Terms:
Quickest change detection, nuisance change, Generalized Likelihood Ratio Test (GLRT), average run length, average detection delay
I Introduction
The problem of detecting a change in the statistical properties of a signal with the shortest possible delay after the change is known as quickest change detection (QCD). Given a sequence of observations that are independent and identically distributed (i.i.d.) according to one distribution up to an unknown change point and i.i.d. according to a different distribution afterwards, we aim to detect this change as quickly as possible while maintaining a false alarm constraint. Change detection has applications in many areas, including manufacturing quality control [1, 2], fraud detection [3], cognitive radio [4], network surveillance [5, 6, 7], structural health monitoring [8], spam detection [9, 10, 11], bioinformatics [12], power system line outage detection [13], and sensor networks [14, 15, 16].
For the non-Bayesian formulation of QCD, the change-point is assumed to be unknown but deterministic. When both the pre- and post-change distributions are known, Page [17] developed the Cumulative Sum Control Chart (CuSum) for quickest change detection. Lorden[18] proved that the CuSum test has asymptotically optimal worst-case average detection delay as the false alarm rate goes to zero. Moustakides [19] later established that the CuSum test is exactly optimal under Lorden’s optimality criterion. Later, Lai showed in [20] that the CuSum test is asymptotically optimal under Pollak’s criterion[21], as the false alarm rate goes to zero. For the case where the post-change distribution is unknown, Lorden[18] showed that the generalized likelihood-ratio (GLR) CuSum is asymptotically optimal for the case of finite multiple post-change distributions. Other methods were also proposed for the case when the post-change distribution is unknown to a certain degree [22, 20, 23, 24, 25, 26]. We refer the reader to [27, 28, 29] and the references therein for an overview of the QCD problem.
In many practical applications, the signal of interest may undergo different types of change. However, only a subset of these changes may be of interest to the user. One example is the problem of bearing failure detection using accelerometer readings[30]. During normal operations, the bearings are driven at two different activity levels, idle or active. In a typical bearing failure detection scenario, the bearing is initially driven at the idle state. A change to the active state results in a change in the statistical properties of the accelerometer readings. However, this change is not of interest to us and is called a nuisance change. We are only interested in the change arising from the failure of the bearing, which is known as a critical change. Furthermore, the statistical properties of the observations obtained when the bearing is faulty depend on the activity level that it is driven at. The traditional QCD framework does not allow us to distinguish between critical and nuisance changes. Furthermore, due to the nuisance change, the observations are no longer i.i.d. either in the pre-change or post-change regime, depending on when the nuisance change occurs. In this paper, we investigate the non-Bayesian formulation of the QCD problem under a nuisance change, and propose a window-limited stopping time that ignores the nuisance change but detects the critical change as quickly as possible.
I-A Related Work
Existing works in QCD that consider the problem where observations are not generated i.i.d. before and after the change-point can be categorized into three main categories. In the first category, the papers[31, 32, 33] consider the problem where the pre-change distribution and the post-change distribution are modeled as hidden Markov models (HMMs). In [31], the authors proved the asymptotic optimality of the CuSum procedure for the HMM signal model in the sense of Lorden. In [32], the authors developed the Shiryayev-Roberts-Pollak (SRP) rule for the HMM signal model and proved its optimality in the sense of Pollak. The authors of [33] consider the problem where the vector parameter of a two-state HMM changes at some unknown time. The second category of papers[34, 35] considers a QCD problem which relaxes the i.i.d. assumption. In [34], the authors established the optimality of CuSum and the Shiryayev-Roberts stopping rule in the class of random processes with likelihood ratios that satisfy certain independence and stationary conditions. The class of random processes includes Markov chains, AR processes, and processes evolving on a circle. In [35], the authors considered the Bayesian QCD problem where conditions on the asymptotic behavior of the likelihood process are assumed. Unlike all the aforementioned papers, the signal model in our QCD problem with nuisance change cannot be modeled by an HMM, and the likelihood ratios generated by our signal model are non-stationary. In the third category, the papers [36, 37, 38, 39, 40, 41] consider QCD of transient changes, where the change is either not persistent or multiple changes occur throughout the monitoring process. Unlike our QCD problem which allows some changes to be considered nuisance, all changes are considered critical in the aforementioned papers.
I-B Our Contributions
In this paper, we consider the non-Bayesian QCD problem where both nuisance and critical changes may occur, and our objective is to detect the critical change as quickly as possible while ignoring the nuisance change. Our goal is to develop a sequential algorithm with computational complexity that increases linearly with the number of samples observed. Our main contributions are as follows:
1. We formulate the QCD problem with a nuisance change and propose a window-limited simplified GLR (W-SGLR) stopping time.
2. We derive a lower bound for the average run length (ARL) to a false alarm, and the asymptotic upper bound of the worst-case average detection delay (WADD) for our proposed test.
3. We prove the asymptotic optimality of the W-SGLR stopping time under mild technical assumptions.
4. We provide simulation and experimental results that verify the theoretical guarantees of our proposed test and also illustrate the performance of our proposed test on a real dataset.
A preliminary version of this work was presented in [42, 43]. To the best of our knowledge, there are no existing works that consider the QCD problem for a signal that may undergo a nuisance change.
The rest of this paper is organized as follows. In Section II, we present our signal model and problem formulation. We propose the W-SGLR stopping time and derive the theoretical properties of our test statistics in Section III. In Section IV, we discuss a modification of the proposed stopping time when the post-change distribution belongs to a parametrized family. We present numerical simulations and experiments on a real dataset to illustrate the performance of our proposed stopping time in Section V. We conclude in Section VI.
Notations: The operator denotes mathematical expectation with respect to (w.r.t.) the probability density (pdf) , and means that the random variable has distribution with pdf . If the nuisance change point is at , and the critical change point is at , we let and be the probability measure and mathematical expectation, respectively. The Gaussian distribution with mean and variance is denoted as . Convergence in -probability is denoted as . We use as the indicator function of the set , and to denote the Kullback-Leibler (KL) divergence. We use , and to denote the set of positive integers, real numbers and positive real numbers, respectively.
II Problem formulation
In many applications, the statistical distribution of the observed signal may undergo different changes over time. For example, in the application of fault detection in motor bearings [30], we aim to raise an alarm as soon as possible after a bearing fault has occurred (critical change). This is done by monitoring the accelerometer readings from the motor to detect any changes in the signal statistics. However, the accelerometer readings are also affected by non-critical or nuisance changes, such as variation in the motor load of the bearing. It would be undesirable to declare that a fault has taken place whenever the motor load changes. This motivates the need to define a change-point model that allows both critical and nuisance changes, and to develop change detection techniques that can effectively ignore nuisance changes while efficiently identifying critical changes.
In this paper, we assume that the observed signal may undergo two types of change: a critical change and a nuisance change, each occurring at its own change point. Both the critical and nuisance change points are unknown a priori. We are interested in detecting the critical change, while the nuisance change is not of interest. The model involves four probability distributions, and at each time the distribution that generates the observation is determined by where the nuisance change point and the critical change point lie relative to that time:
[TABLE]
Thus, in our model (cf. Fig. 1), is the pre-change distribution. If , the signal distribution first changes to at and then to at . If , the distribution first changes to at and then to at . If , then the distribution changes from to at the common change point.
The sequence of observations is a sequence of random variables satisfying where are mutually independent given . The quickest change detection problem is to detect the critical change through observing as quickly as possible while ignoring the nuisance change and keeping the false alarm rate low. In our signal model, the nuisance change also changes the distribution that generates the observations after the critical change point. This creates a dependence between the nuisance change point and the distribution after the critical change point. Our formulation is different from assuming composite pre-change and post-change distribution families[44] since the nuisance change leads to non-stationarity in the distribution of before or after the critical change, depending on whether the nuisance change occurs before or after the critical change, respectively.
In a typical sequential change detection procedure, at each time , a test statistic is computed based on the currently available observations , and the observer decides that a change has occurred at a stopping time
In the traditional QCD framework[18], the rate of false alarms is quantified by the mean time between false alarms. Since the nuisance change-point affects the distributions generating the signal, this quantity varies with the nuisance change point. In this paper, we consider the worst-possible rate as the nuisance change point varies by considering the smallest mean time between false alarms for all possible nuisance change points. A similar generalization can be made to quantify the detection delay by taking the largest detection delay over all possible nuisance change points.
Mathematically, our QCD problem can be formulated as a minimax problem similar to Lorden’s formulation[18], where we seek a stopping time that minimizes the WADD subject to an ARL constraint:
[TABLE]
where is a predefined threshold, is a stopping time w.r.t. the filtration ,
[TABLE]
and is the essential supremum operator. In the next section, we propose a stopping time for (2).
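For orientation, a Lorden-style formulation consistent with the verbal description above can be written as follows. This is a hedged sketch in notation of our own choosing (nuisance change point \nu_n, critical change point \nu_c, stopping time \tau, ARL threshold \gamma), not a verbatim copy of the displayed equations, which are not reproduced in this text.

```latex
\begin{align*}
\mathrm{ARL}(\tau)  &= \inf_{\nu_n \in \mathbb{N} \cup \{\infty\}} \mathbb{E}_{\nu_n,\infty}[\tau], \\
\mathrm{WADD}(\tau) &= \sup_{\nu_n \in \mathbb{N} \cup \{\infty\}} \; \sup_{\nu_c \geq 1} \;
  \operatorname*{ess\,sup} \;
  \mathbb{E}_{\nu_n,\nu_c}\!\left[ (\tau - \nu_c + 1)^{+} \,\middle|\, X_1, \ldots, X_{\nu_c - 1} \right], \\
&\min_{\tau} \; \mathrm{WADD}(\tau) \quad \text{subject to} \quad \mathrm{ARL}(\tau) \geq \gamma .
\end{align*}
```

Here the infimum in the ARL reflects the smallest mean time to a false alarm over all nuisance change points, and the outer supremum in the WADD reflects the largest detection delay over all nuisance change points.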
A closely related topic is transient change detection (TCD) [45, 46, 47, 48], where the change occurs only for a finite period of time and the objective is to decide whether such a change has occurred within a predefined window, rather than to detect the change as quickly as possible. There are two widely adopted methods for the TCD problem: the window-limited CuSum stopping time [37] and the FMA stopping time [48]. The FMA stopping time has been shown to perform well for the TCD problem, and we will use the FMA stopping time as a comparison in Section V. When , our system model can be seen to be a generalization of the TCD problem variant where one seeks to detect the transient change as quickly as possible by letting . In (16) below, we propose a test statistic and stopping time for (2). By setting and in our test statistic, our proposed stopping time reduces to the window-limited CuSum stopping time with pre-change distribution and post-change distribution .
III Test Statistic for QCD with Nuisance Change
In this section, we derive a test statistic and stopping time for QCD under a nuisance change. Suppose that we observe the sequence and know a priori that the nuisance change does not take place (i.e., ), then Page’s CuSum test statistic [17] given as can be used and we declare that a critical change has taken place at
[TABLE]
where is a pre-determined threshold. The CuSum test statistic has a convenient recursion which allows the CuSum stopping time to be implemented efficiently.
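The recursion mentioned above can be sketched in a few lines of Python. This is a minimal illustration, assuming known Gaussian pre- and post-change densities; the function names `gaussian_logpdf` and `cusum_stopping_time`, and the specific means, variances and threshold in the usage example, are hypothetical choices for this sketch and do not come from the paper.

```python
import numpy as np


def gaussian_logpdf(x, mean, var):
    """Log-density of a univariate Gaussian with the given mean and variance."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def cusum_stopping_time(xs, log_lr, threshold):
    """First 1-based time at which Page's CuSum statistic reaches `threshold`.

    `log_lr(x)` is the log-likelihood ratio of post-change to pre-change
    density at sample x. Returns None if the threshold is never reached.
    """
    stat = 0.0
    for t, x in enumerate(xs, start=1):
        # Recursion: S_t = max(S_{t-1}, 0) + log_lr(x_t), which equals
        # max over k <= t of the summed log-likelihood ratios from k to t.
        stat = max(stat, 0.0) + log_lr(x)
        if stat >= threshold:
            return t
    return None


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Assumed toy model: N(0, 1) before the change, N(1, 1) from sample 201 on.
    xs = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(1.0, 1.0, 200)])
    log_lr = lambda x: gaussian_logpdf(x, 1.0, 1.0) - gaussian_logpdf(x, 0.0, 1.0)
    print(cusum_stopping_time(xs, log_lr, threshold=10.0))
```

Each new sample costs a constant amount of work and only the previous value of the statistic is stored, which is the property the later window-limited construction tries to retain.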
If the nuisance change takes place at a time and is known, a modification of Page’s test statistic gives the following:
[TABLE]
where and are as defined in (1) and are the probability distributions corresponding to the cases where the critical change has already occurred or will never occur, respectively. Similar to the case where , the CuSum test statistic admits a convenient recursion for efficient implementation. Furthermore, for both the cases mentioned above, was shown to be asymptotically optimal by [18].
A naive approach is to utilize four variants of , one for detecting a change in each of the cases: from to , from to , from to , and from to . In the first stage, we monitor for changes from to either , or . If a change to is detected, then we monitor for a change from to . The difficulty in such an approach is that any false alarm or missed detection in the first stage propagates to the second stage. We demonstrate that such an approach is suboptimal in Section V-A.
In our problem formulation, the nuisance change-point is unknown. Replacing with its maximum likelihood estimator in both the numerator and denominator, we obtain the following GLR test statistic and stopping time:
[TABLE]
From our simulations in Section V-A, it turns out that (12) does not achieve the best trade-off between average detection delay (ADD) and ARL to false alarm over a wide range of threshold values . Furthermore, its ARL is challenging to characterize theoretically since the GLR test statistic is not a likelihood ratio and standard techniques in the QCD literature (e.g., Theorem 6.16 of [29]) cannot be used to analyze its ARL. This is a critical problem for practical applications that require us to pre-determine a suitable threshold to achieve a desired ARL.
To develop a stopping time with ARL that can be characterized theoretically, we simplify the maximum likelihood estimation in the numerator of (10) to be the maximum of only two cases and . This gives us the Simplified GLR (SGLR) test statistic and stopping time as follows:
[TABLE]
Unlike the CuSum test statistic, the SGLR test statistic does not have a convenient recursion. Any implementation of the SGLR stopping time would require computational resources that increase with the number of samples observed. This requirement would be a significant limitation for many practical applications. To limit the computational resources required, in the same spirit as [20], we propose the Window-Limited SGLR (W-SGLR) test statistic and stopping time as follows:
[TABLE]
where the window size is chosen such that
[TABLE]
with
[TABLE]
and denoting a term that goes to zero as . Window-limited test statistics were first introduced by [49]. The paper [20] further discussed their properties and the choice of window size and thresholds. We make the following assumption.
Assumption 1.
The first four moments of w.r.t. both and are finite, and , where we define , , , , and .
In Theorem 3 of Section III-B, we show that the proposed is asymptotically optimal as under Assumption 1 and an additional technical assumption. To do that, we first analyze the asymptotic properties of . We let
[TABLE]
and study their properties in Section III-A. Then, using the relationships
[TABLE]
where
[TABLE]
we finally show the asymptotic optimality of under mild technical conditions in Section III-B.
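To make the window-limited computation concrete, the following Python sketch evaluates an SGLR-type statistic over a sliding window. It is an illustration under our own assumptions rather than the exact definition in the paper: the displayed formulas are not reproduced in this text, so the statistic follows the verbal description only (a numerator maximized over the two cases "no nuisance change" and "nuisance change already in effect", and a denominator maximized over the position of the nuisance change point). The function names and the log-density arguments `log_f`, `log_fn`, `log_g`, `log_gn` (pre-change, nuisance only, critical only, both changes) are hypothetical.

```python
from collections import deque

import numpy as np


def gaussian_logpdf(x, mean, var):
    """Log-density of a univariate Gaussian with the given mean and variance."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def w_sglr_statistic(window, log_f, log_fn, log_g, log_gn):
    """SGLR-type statistic over `window` (oldest sample first).

    For each candidate critical change point k inside the window:
      numerator   = max(sum of log_g, sum of log_gn) over samples k..t
      denominator = max over j of [sum of log_f over k..j-1
                                   + sum of log_fn over j..t]
    and the statistic is the maximum over k of numerator - denominator.
    """
    num_g = num_gn = sum_fn = den = 0.0
    best = -np.inf
    for x in reversed(window):  # extend the segment one sample to the left
        num_g += log_g(x)
        num_gn += log_gn(x)
        sum_fn += log_fn(x)
        # Either the nuisance change is at the segment start (all log_fn),
        # or x is still pre-nuisance and the best split lies further right.
        den = max(sum_fn, log_f(x) + den)
        best = max(best, max(num_g, num_gn) - den)
    return best


def w_sglr_stopping_time(xs, window_size, threshold, log_f, log_fn, log_g, log_gn):
    """First 1-based time at which the windowed statistic reaches `threshold`."""
    window = deque(maxlen=window_size)
    for t, x in enumerate(xs, start=1):
        window.append(x)
        if w_sglr_statistic(window, log_f, log_fn, log_g, log_gn) >= threshold:
            return t
    return None
```

With this arrangement each time step costs a number of operations proportional to the window size and only the samples in the window are stored, so the total cost grows linearly with the number of observations.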
III-A Log Likelihood Ratio Growth Rates
In this subsection, we derive properties of and as defined in (20) and (21), respectively. The stopping times and are defined by the first time the test statistics and cross the threshold respectively. The rates of growth, and , allow us to understand the detection delay of these stopping times. We show that these rates of growth converge in probability as . In particular, the limit that the rate of growth converges to depends on the sign of and .
As the nuisance change point is unknown, the denominator of both and contains a maximization of the likelihood. If the first moment , the distribution is closer to the distribution as compared to in the KL divergence sense. When the critical change point is at and no nuisance change has taken place, we expect the denominator to approach . Thus, our statistic can be approximated by . A similar argument can be made for when . This observation is made precise in the following two propositions.
Proposition 1.
Suppose that Assumption 1 holds, and . For any and , we have
[TABLE]
Proof:
See Appendix A. ∎
Proposition 2.
Suppose that Assumption 1 holds, and . For any , , and , we have
[TABLE]
Proof:
See Appendix A. ∎
Using Propositions 1 and 2 together with the weak law of large numbers, we obtain the following result.
Theorem 1.
Suppose that Assumption 1 holds, , and . For any ,
[TABLE]
as . Furthermore, for any ,
[TABLE]
as .
In Theorem 1, we have assumed that and . If we vary the signs of and , a similar argument to that provided in Theorem 1 gives us the following result.
Theorem 2.
Suppose that Assumption 1 holds. For any and , we have the following convergences in probability as shown in the table below.
[TABLE]
Theorem 2 gives us the average rate of growth of the statistics and . Since in (19) is the minimum of the growth rates in Theorem 2, we see that the average growth rate of these statistics is at least regardless of the signs of and . This suggests that the WADD of grows linearly with respect to with a gradient bounded above by . This observation is made precise in the next subsection.
III-B Conditions for Asymptotic Optimality
In this subsection, we establish the asymptotic WADD-ARL trade-off under Assumption 1 and provide a sufficient condition for to be asymptotically optimal. In particular, we show that is asymptotically optimal if in addition to Assumption 1, the following assumption holds.
Assumption 2.
The KL divergences , , and satisfy
[TABLE]
Assumption 2 essentially says that cannot be too similar to , which makes intuitive sense as otherwise it is difficult to distinguish the critical change from the nuisance change (see Fig. 1). A sufficient condition for Assumption 2 is as assumed in Theorem 1. For example, in the problem of spectrum sensing in cognitive radio[50], we are often interested in detecting a variance change of a signal generated by independent Gaussian distributions. Furthermore, in many signal processing applications, a change in mean may be due to sensor drift as a result of long duration monitoring. This change in mean is usually not of interest and interferes with the actual signal processing task[51, 52, 53]. A typical signal model of this type is given by , , , and with , , . While Assumption 2 may seem artificial, it is shown in Lemma 3 that this model satisfies and hence Assumption 2. In this case, the W-SGLR stopping time achieves asymptotic optimality.
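For Gaussian models such as the mean-change and variance-change example above, the KL divergences that enter Assumption 2 have a closed form. The identity below is the standard expression for two univariate Gaussians; the symbols \mu_1, \sigma_1, \mu_2, \sigma_2 are generic placeholders of ours and are not the specific parameters of the paper's example.

```latex
\begin{align*}
D\!\left( \mathcal{N}(\mu_1, \sigma_1^2) \,\middle\|\, \mathcal{N}(\mu_2, \sigma_2^2) \right)
  &= \ln\frac{\sigma_2}{\sigma_1}
   + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2}
   - \frac{1}{2}, \\
\text{mean change only } (\sigma_1 = \sigma_2 = \sigma): \quad
  & \frac{(\mu_1 - \mu_2)^2}{2\sigma^2}, \\
\text{variance change only } (\mu_1 = \mu_2): \quad
  & \ln\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2}{2\sigma_2^2} - \frac{1}{2}.
\end{align*}
```

Evaluating these expressions for a candidate set of parameters gives a quick numerical check of whether the divergences are ordered as the assumption requires.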
We use techniques from the proof of Theorem 6.16 in [29] to obtain a lower bound for the ARL of in (15). Since , the same lower bound also applies for the ARL of in (17). In the previous subsection, we have shown that the rates of growth of the statistics and converge to constants as . This means that, asymptotically, and grow linearly w.r.t. . Heuristically, this implies that the WADDs of the stopping times, and , grow linearly w.r.t. the threshold in (25).
Lemma 1 derives an upper-bound for the probability of a false alarm for a stopping time related to . Following Theorem 6.16 in [29], this upper bound then yields a lower bound for the ARL of in Theorem 3.
Lemma 1.
For , let be stopping times defined by
[TABLE]
so that and in (25). For any , we have
[TABLE]
and
[TABLE]
Proof:
See Appendix B. ∎
The next lemma checks that our proposed stopping time satisfies the assumption required in [20] to relate the asymptotic upper-bound for the WADD to the threshold in Proposition 3.
Lemma 2.
Suppose that Assumption 1 holds. For any , we have
- (i)
, and 2. (ii)
.
Proof:
See Appendix C. ∎
Proposition 3.
Suppose that Assumption 1 holds. There exists a such that for all , we have
- (i)
, and 2. (ii)
.
Proof:
See Appendix D. ∎
Finally, we show the asymptotic optimality of in the following result.
Theorem 3.
Suppose that Assumption 1 holds. For any ,
[TABLE]
where is a term going to zero as . Furthermore, if Assumption 2 holds, then the stopping time is asymptotically optimal for the problem (2) as .
Proof:
See Appendix E. ∎
In Theorem 3, we have shown that is asymptotically optimal under Assumption 1 and Assumption 2. In the next lemma, we derive sufficient conditions for Assumption 2 when belong to an exponential family.
Lemma 3.
Suppose that , , , , an exponential family of distributions on with parameters respectively. Here, and for . If any of the following inequalities hold:
[TABLE]
then Assumption 2 holds.
In particular, if , and with , , , and , Assumption 2 holds.
Proof:
To show that (34) implies Assumption 2, we rearrange the terms on the left-hand side (L.H.S.) of (34) to obtain
[TABLE]
This implies that
[TABLE]
and hence Assumption 2 holds. A similar argument shows that (35) and (36) imply and , respectively.
If , and with , , , and , we can define , with the functions , , , and . The L.H.S. of (34) becomes . Simplifying, the L.H.S. of (34) becomes . Thus, for any and , the inequality (34) holds. The proof is now complete. ∎
IV Parametrized Families of Post-Change Distributions
In many applications, the post-change distribution and nuisance post-change distribution may contain unknown parameters. In this section, we modify and in (25) to obtain a Generalized Likelihood Ratio Test (GLRT)-based stopping time for the following signal model: Let be a set with non-empty interior and be a sequence of independent random variables satisfying: where
[TABLE]
and , the interior of . We derive a lower bound for the ARL of under the following assumption.
Assumption 3.
 is a compact -dimensional sub-manifold of . The pdfs of the post-change distributions and nuisance post-change distribution are twice continuously differentiable w.r.t. .
A commonly used method to handle unknown parameters is to replace the likelihood ratio with the generalized likelihood ratio. We define the generalized W-SGLR test statistic as
[TABLE]
where the minimal delay is required to prevent difficulties of under-determination when performing maximum likelihood estimation of the parameter . While (40) is commonly used, the maximization over makes it difficult to theoretically quantify the ARL of the stopping time . To work around this problem, we modify the stopping times and as follows. Let denote the largest eigenvalue of the symmetric matrix . Fix . We let
[TABLE]
We define the generalized W-SGLR stopping time as
[TABLE]
Note that is a modification of with additional conditions required for stopping.
The paper [49] first introduced window-limited generalized detection rules. We compute the false alarm probability of and . We then use this false alarm probability to obtain a lower bound for the ARL of and in Proposition 4.
Lemma 4.
Suppose that Assumption 3 holds. Given any , there exists such that and for any and .
Proof:
As the proof is similar to Lemma 2 in [20], we omit it here and refer the reader to the extended version in [54]. ∎
Proposition 4.
Suppose that Assumption 3 holds. For any , there exists such that for all and , we have
[TABLE]
Proof:
Fix . By Lemma 4, there exists such that for all . Applying results from Theorem 6.16 in [29], we obtain
[TABLE]
for all . Taking infimum over , we have and the proof is complete. ∎
V Numerical Results
In this section, we first illustrate the performance of the proposed W-SGLR stopping time under the assumption that the distributions and are known. Next, we illustrate the performance of the proposed generalized W-SGLR stopping time when and belong to a parametrized family of distributions. Finally, we evaluate the performance of the proposed W-SGLR stopping time on real data from the Case Western Reserve University Bearing Dataset [30].
V-A W-SGLR on Synthetic Data Satisfying Assumption 2
In our first set of simulations, we let , , , and where the critical change is a change in variance and the nuisance change is a change in mean (see example after Assumption 2 for motivation). We ran the simulations with two change-point configurations to illustrate the behaviour of the W-SGLR test statistic for different window sizes. In Fig. 2(a), we set , while in Fig. 2(b), we set . In Fig. 2(a) and Fig. 2(b), the test statistic remains low before the critical change point and grows linearly with a gradient of at least , as described in (19), after the critical change point. This trend continues until the test statistic approximately achieves the value of . From our choice of in (18), we see that for large, i.e., our is able to detect the critical change given sufficient delay for every choice of sufficiently large. However, choosing a larger is more resistant to outlier noise. For example, in Fig. 2(a), when , we note that the test statistic continues to grow linearly with the gradient even after the nuisance change point. The trade-off is the increase in memory requirement and computational complexity. In Fig. 2(b), we note that the test statistic continues to remain low during the period between the nuisance and the critical change points. This demonstrates that is oblivious to the nuisance change.
Next, we compare , the GLRT stopping time developed in [42], the finite moving average (FMA) stopping time and a naive 2-stage CuSum stopping time denoted as . Following ideas from the TCD literature [48], the FMA stopping time is constructed by replacing the maximum in the test statistic (16) with a sum across the entire window, i.e., setting . It should be noted that while the FMA stopping time has been shown to perform well for the TCD problem, there are no guarantees that it will perform as well for the QCD problem. The naive stopping time is constructed from stopping times based on the CuSum stopping time described in (6) with for any pair of pdfs with . We consider four stopping times: , , , and , where the threshold for declaring a critical change is and the threshold for declaring a nuisance change is . In the first stage, we apply the stopping times , and to the observations. If or stops the process before , we declare that a critical change has occurred and set . Otherwise, we apply to the rest of the observations after the stopping time and set .
In our simulations, our signal is generated using , , , . Here, the critical change is a change in mean, and the nuisance change is a change in variance. We generate a signal of length and independently select the nuisance change point and critical change point with uniform probability on the possible data points. A total of signals are generated. We compare the trade-off between the ADD and the empirical ARL of the proposed , , and in Fig. 3. We observe that our proposed achieves a lower ADD than , and for large empirical ARL.
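The Monte Carlo procedure described above can be sketched as follows. This is a hedged illustration, not the authors' simulation code: the Gaussian regime table, the convention that a change takes effect at the change-point index itself, the censoring of runs that never stop, and all function names (`draw_signal`, `estimate_add`, `estimate_arl`, `first_crossing`) are assumptions made for this sketch. Any stopping rule that maps a signal to a 1-based stopping index (or `None`) can be plugged in.

```python
import numpy as np


def draw_signal(rng, length, nu_n, nu_c, regimes):
    """Draw independent Gaussian samples whose (mean, variance) depends on
    whether the nuisance change (t >= nu_n) and the critical change
    (t >= nu_c) are in effect. `regimes` has keys "f", "fn", "g", "gn"."""
    t = np.arange(1, length + 1)
    nuisance_on = (t >= nu_n).astype(int)
    critical_on = (t >= nu_c).astype(int)
    table = np.array([[regimes["f"], regimes["g"]],
                      [regimes["fn"], regimes["gn"]]], dtype=float)
    params = table[nuisance_on, critical_on]  # shape (length, 2)
    return rng.normal(params[:, 0], np.sqrt(params[:, 1]))


def estimate_add(rng, stopping_rule, length, regimes, runs):
    """Average delay tau - nu_c + 1 over runs that stop at or after nu_c."""
    delays = []
    for _ in range(runs):
        nu_n, nu_c = rng.integers(1, length + 1, size=2)
        tau = stopping_rule(draw_signal(rng, length, nu_n, nu_c, regimes))
        if tau is not None and tau >= nu_c:
            delays.append(tau - nu_c + 1)
    return float(np.mean(delays)) if delays else float("nan")


def estimate_arl(rng, stopping_rule, length, regimes, runs):
    """Average stopping time when no critical change occurs; runs that
    never stop are censored at `length`, so this underestimates long ARLs."""
    run_lengths = []
    for _ in range(runs):
        nu_n = rng.integers(1, length + 1)
        tau = stopping_rule(draw_signal(rng, length, nu_n, length + 1, regimes))
        run_lengths.append(length if tau is None else tau)
    return float(np.mean(run_lengths))


def first_crossing(xs, level):
    """Toy stand-in stopping rule: first 1-based index with xs >= level."""
    hits = np.flatnonzero(np.asarray(xs) >= level)
    return int(hits[0]) + 1 if hits.size else None
```

Sweeping the detection threshold of a given stopping rule and plotting the resulting pairs of estimates traces out a trade-off curve of the kind compared in the figures.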
In the next set of simulations, we let , , , and where unlike the first set of simulations, the change in mean before and after the critical change point differs, and the change in variance before and after the nuisance change point differs. In Fig. 4(a), we set . We see that the test statistic remains low before the critical change point and grows linearly with a gradient of at least (cf. (19)) after the critical change point. This trend continues until the nuisance change point, where the rate of growth changes to until it approximately achieves the value of . While the growth of the test statistic after the change point is not linear, the observation that the overall rate of growth from the critical change point is at least is consistent with Lemma 2. In Fig. 4(b), we set and note that the test statistic continues to remain low during the period between the nuisance and the critical change points. This demonstrates that is oblivious to the nuisance change prior to the critical change point.
V-B W-SGLR on Synthetic Data Violating Assumption 2
When Assumption 2 is violated, Theorem 3 still provides the asymptotic trade-off between the ARL and the WADD. However, the asymptotic optimality of the W-SGLR stopping time is not guaranteed. Here, we provide discussions and numerical simulations that suggest that the W-SGLR stopping time outperforms the two-stage stopping time and the FMA stopping time with respect to 2.
If Assumption 2 is violated, from Theorem 3, we have
[TABLE]
This worst-case performance of the W-SGLR stopping time is achieved when . For the rest of this discussion, we let to compare the two-stage stopping time with our proposed W-SGLR stopping time under this worst-case scenario.
Since , the CuSum
[TABLE]
associated with the stopping time experiences a positive drift when . Thus, for any finite threshold and sufficiently large , the two-stage stopping time declares that a nuisance change has taken place and transits into the second stage after the critical change point. The CuSum associated with the stopping time in the second stage is expected to grow at a rate of for when . In contrast, the W-SGLR test statistic, from Theorem 2, is expected to grow at a rate of . Heuristically, this means that, when , the as the . It should also be noted that it is possible that the stopping time fails completely when is negative and .
In Figs. 5(b) and 5(a), we compare the trade-off between the ADD and the ARL of the different stopping times when Assumption 2 is violated under the cases and , respectively. To estimate the empirical ARL, the stopping times are applied to a set of signals each of length with nuisance change point independently selected with uniform probability on the possible data points. To compute the corresponding ADD for the stopping times, they are applied to a set of signals of length . It can be seen from both Figs. 5(b) and 5(a) that the W-SGLR stopping time achieves a lower ADD as compared to both and for large empirical ARL. Consistent with our intuition, it can be seen that significantly outperforms when .
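The Monte Carlo estimation of the empirical ARL and ADD described above can be sketched as follows. This is a sketch under assumptions that the text does not fix: a run that never raises an alarm is censored at the signal length when estimating the ARL, and the ADD is averaged only over runs whose alarm occurs at or after the critical change point. The names `empirical_arl` and `empirical_add` are hypothetical, and `stopping_rule` stands for any function that maps a signal to a 1-based alarm time or `None`.

```python
def empirical_arl(stopping_rule, null_signals):
    """Mean alarm time over signals that contain no critical change.

    Assumption: runs without an alarm are censored at the signal length.
    """
    times = []
    for xs in null_signals:
        tau = stopping_rule(xs)
        times.append(len(xs) if tau is None else tau)
    return sum(times) / len(times)


def empirical_add(stopping_rule, signals, change_points):
    """Mean delay (tau - nu) over runs that alarm at or after the change point nu.

    Assumption: false alarms (tau < nu) and missed detections are excluded.
    Change points are 1-based, matching the alarm times.
    """
    delays = []
    for xs, nu in zip(signals, change_points):
        tau = stopping_rule(xs)
        if tau is not None and tau >= nu:
            delays.append(tau - nu)
    return sum(delays) / len(delays) if delays else float("nan")
```

Sweeping the detection threshold and plotting the resulting pairs of values gives a trade-off curve of the kind shown in Fig. 5.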
V-C Parametrized Post-Change Distributions
In this set of simulations, we let , the critical change to be a change in variance where , and the nuisance change to be a change in the mean where and . The parameters are unknown to the change detection algorithms. This corresponds to the case where the transmission power is unknown in the problem of spectrum sensing [50]. We ran the simulations with two change-point configurations to demonstrate the behavior of the generalized W-SGLR test statistic used in as described in (42) and (45) for window-sizes and . In Fig. 6(a), we set the critical change point to be and nuisance change point to be . It can be observed that our proposed generalized W-SGLR test statistic remains low during the pre-change regime, increases in the post-change regime and continues to increase in the nuisance post-change regime when the window is sufficiently large. This demonstrates that our stopping time is able to detect the critical change even in the nuisance critical change region. In Fig. 6(b), we set the critical change point to be and nuisance change point to be .
We see that our stopping time is effective in detecting critical changes while ignoring the nuisance change in the pre-change regime for window sizes as small as . In practice, we can use graphs like Fig. 6(a) and Fig. 6(b) to check whether the increase in the test statistic after the critical change is discernible from the test statistic in the pre-change regime. This helps in determining whether the chosen window size is suitable.
Next, we compare the generalized W-SGLR stopping time with the W-SGLR stopping time. In our simulations, our signal is generated using the following distributions , , , . Here we set and assume that the condition that is always satisfied. We generate a signal of length and independently select the nuisance change point and critical change point with uniform probability on the possible data points. A total of signals are generated. We compare the trade-off between the ADD and the ARL of the proposed W-SGLR stopping time when and are known against the generalized W-SGLR stopping time when and are unknown in Fig. 7. We observe that the generalized W-SGLR stopping time has a higher ADD as compared to the W-SGLR stopping time. Our experiments suggest that the difference in ADD is bounded as the ARL becomes large.
V-D Real Data
In this subsection, we test our proposed stopping time on the Case Western Reserve University Bearing Dataset [30]. The dataset is collected from experiments conducted using an electric motor with accelerometer data measured at locations near to and remote from the motor bearings. Samples were collected at 12 kHz. We pre-process the signal by de-trending the signal using a first order finite difference: for each signal sample time , let where is the observed raw signal sample at time .
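The de-trending step can be illustrated with a short sketch. It assumes the backward difference, in which each de-trended sample is the current raw sample minus the previous one, so that the output is one sample shorter than the input. The function name `first_difference` is hypothetical.

```python
def first_difference(raw):
    """First-order finite difference: output[t-1] = raw[t] - raw[t-1] for t >= 1."""
    return [raw[t] - raw[t - 1] for t in range(1, len(raw))]


# A linear trend becomes a constant sequence after differencing.
detrended = first_difference([2.0, 4.0, 6.0, 8.0])
```

Differencing removes slowly varying offsets in the accelerometer signal, which is why a linear trend in the input maps to a constant in the output.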
We consider signals obtained at a motor load of 1hp and 2hp with normal bearings and also faulty bearings with a 0.007-inch fault diameter. We assume that the critical change would be the transition from a normal to faulty bearing, and a nuisance change would be a change in the motor load. We use the first 12,000 samples as training data to build a model for each of the following scenarios: normal bearings under a motor load of 1hp, normal bearings under a motor load of 2hp, faulty bearings under a motor load of 1hp, and faulty bearings under a motor load of 2hp. Fig. 8 shows the learned distributions of the de-trended signals observed in each scenario.
There are two challenges faced in testing our proposed stopping time on real data: (i) we lack theoretical results for the ARL of the 2-stage stopping times for the selection of appropriate thresholds for comparison and (ii) real run-to-failure data is difficult to obtain. We divide the remaining samples into 3 disjoint sets to address the above challenges.
For the first set, we create a training set of 1000 signals each with length 36,000 with a randomly selected nuisance change point for each signal such that there is a period of samples for a normal bearing under a motor load of 1hp, and a period of samples for a normal bearing under a motor load of 2hp. We select appropriate thresholds for each of the stopping times so that the empirical ARL varies between 1200 and 18,000.
The next two sets are testing sets. We create 1000 signals of length 3600 each with (i) a period of 1200 samples for a normal bearing under a motor load of 1hp, which transitions to (ii) a period of 1200 samples for a normal bearing under a motor load of 2hp, which finally transitions to (iii) a period of 1200 samples for a faulty bearing under a motor load of 2hp.
Similarly, we create 1000 signals for the scenario where a normal bearing under a motor load of 1hp transitions to a faulty bearing under a motor load of 1hp and finally a faulty bearing under a motor load of 2hp.
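The construction of the two testing sets can be sketched as below. The text does not state how each segment is drawn from the held-out recordings, so the sketch assumes a contiguous segment taken at a uniformly random offset, and it assumes that the second testing set also uses 1200 samples per regime. The function name `build_test_signal`, the dictionary keys and the placeholder recordings are hypothetical.

```python
import random


def build_test_signal(recordings, regimes, segment_len, rng):
    """Concatenate one contiguous segment per regime, in the given order.

    Assumption: each segment starts at a uniformly random offset in its recording.
    """
    signal = []
    for name in regimes:
        rec = recordings[name]
        start = rng.randrange(len(rec) - segment_len + 1)
        signal.extend(rec[start:start + segment_len])
    return signal


# Placeholder recordings; real use would load the de-trended held-out samples.
recordings = {
    "normal_1hp": [0.0] * 5000,
    "normal_2hp": [1.0] * 5000,
    "faulty_1hp": [2.0] * 5000,
    "faulty_2hp": [3.0] * 5000,
}
rng = random.Random(0)
# Nuisance change (load) first, then the critical change (fault).
set_one = build_test_signal(recordings, ["normal_1hp", "normal_2hp", "faulty_2hp"], 1200, rng)
# Critical change (fault) first, then the nuisance change (load).
set_two = build_test_signal(recordings, ["normal_1hp", "faulty_1hp", "faulty_2hp"], 1200, rng)
```

Under these assumptions the critical change is the first sample after index 2400 in the first testing set and after index 1200 in the second, which is the reference point for computing the empirical ADD.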
Finally, we apply the selected thresholds obtained from the first training set to the two testing sets to compute the stopping times’ empirical ADD performance. The window size of for the FMA stopping time is selected to minimize its empirical ADD on the test set. For this dataset, if is chosen to be or , the empirical ADD of the FMA stopping time becomes much larger compared to the empirical ADD of the W-SGLR and 2-stage stopping times. Thus, we only present the ARL-ADD trade-off for .
In Figs. 9(a) and 9(b), we present some examples of the performance of the W-SGLR test statistic. It can be seen that in both cases, the test-statistic remains low before the bearing failure and quickly rises after the bearing fails even as the motor load changes.
In Fig. 10(a) and Fig. 10(b), we present the trade-off between the empirical ADD and ARL for the proposed W-SGLR stopping time with , the 2-stage stopping times with different thresholds and the FMA stopping time. It can be seen that our proposed stopping time achieves a better ADD-ARL trade-off compared to the other stopping times. However, as the KL divergences are large, the empirical ADD for all the algorithms remains low across the range of ARLs tested. In this case, the reduction in empirical ADD is small, between to samples, over the range of ARLs tested. In terms of computational complexity, up to sample , the W-SGLR stopping time requires operations [43], which is slightly more than both the two-stage stopping time and the FMA stopping time, both of which require operations. Thus, for applications that have limited computational resources and large differences between their pre- and post-change distributions, we may want to consider using the FMA or the 2-stage stopping time, as the degradation in performance is small.
VI Discussions and Conclusions
We have studied the non-Bayesian QCD problem where the signal may be subjected to a nuisance change. We proposed the W-SGLR stopping time that quickly detects the critical change while ignoring the nuisance change. The limited window size ensures that the W-SGLR stopping time does not require increasing computational resources as more samples are observed. We also derived the stopping time’s asymptotic behavior and showed that it is asymptotically optimal under mild technical assumptions. A generalized W-SGLR stopping time is also proposed for the case where the critical and nuisance post-change distributions are unknown but belong to a parametrized family. Numerical simulations and experiments on a real dataset demonstrated that the W-SGLR stopping time achieves better ADD-ARL trade-off than various other competing stopping times.
In this paper, we have assumed that if both the critical and nuisance changes occur, the eventual distribution that generates the signal is the same, regardless of which change comes first. A more general model would be to allow the eventual distribution to depend on the order of the change points. An easy generalization of the W-SGLR stopping time would be to include all the different eventual distributions in the numerator of 13. The asymptotic trade-off between the WADD and ARL can be derived using techniques similar to those in Section III-A. However, deriving the conditions for asymptotic optimality of this stopping time is more complicated and would be a possible direction for future research.
Another possible future research direction is to consider a modification of the W-SGLR stopping time for the TCD problem under the possibility of a nuisance change. As the performance metrics of the TCD problem are different from the QCD problem, its asymptotic trade-off between the worst-case false alarms and missed detection within a specified window needs to be studied. Also, as the FMA performs well in the TCD problem, it will be interesting to consider if the FMA stopping time can be adapted to solve our QCD problem.
Appendix A Proof of Propositions 1 and 2
We start off with some notation definitions. Let . For any , let For any such that , we define the following averages:
[TABLE]
We have
[TABLE]
For the case where , we let . Finally, we define the random variable
[TABLE]
An outline of the proof of Propositions 1 and 2 is as follows. Lemma A.2 and Lemma A.4 provide the results required for controlling the error bound in Proposition 1. Similarly, Lemma A.3 and Lemma A.4 provide the results required for controlling the error bound in Proposition 2. Lemmas A.2, A.3 and A.4 require the decay in the tail probabilities of the average log-likelihood ratio to be at most , which is shown in Lemma A.1.
Lemma A.1.
For any such that and , and , we have
[TABLE]
where and .
Proof:
As the proof is elementary, we omit it here and refer the reader to the extended version in [54]. ∎
From Lemma A.1, for any , we have for ,
[TABLE]
Similarly, for , and , we have
[TABLE]
For the next two lemmas, we use bounds on the tail probability of to derive asymptotic properties of the random variable under the distributions and for any .
Lemma A.2.
For any , , and , we have
[TABLE]
Proof:
We have
[TABLE]
as . The inequality (58) follows from 55. The proof is now complete. ∎
Lemma A.3.
For any and , let . If , we have
[TABLE]
Proof:
For , we have and
[TABLE]
as , where 60 follows from 56. The proof is now complete. ∎
Lemma A.4.
Suppose that and . For any and , there exist such that for any ,
[TABLE]
Proof:
Given any , since the fourth moment of exists, by the monotone convergence theorem, there exists and such that
[TABLE]
where . Applying Markov’s inequality, we obtain
[TABLE]
Next, we derive an upper bound for . For any , we have
[TABLE]
where 66 follows from Jensen’s inequality, and 67 follows from 63. We obtain
[TABLE]
where (68) is because , 69 follows from Hölder’s inequality, 70 from 67, 71 from 55, and 72 from the definition of . From (65), we have and 61 is proved. The proof of (62) is similar and the lemma is proved. ∎
A-A Proof of Proposition 1
It suffices to show that for any , there exists such that for all we have
[TABLE]
For any and , the left-hand side of (73) becomes
[TABLE]
From Lemma A.4, there exists such that . Next, by choosing , we have Finally, from Lemma A.2, there exists such that for all , we have The right-hand side of 74 is then upper bounded by , and the proof is complete.
A-B Proof of Proposition 2
It suffices to show that for any , there exists such that for all we have
[TABLE]
Let . The left-hand side of (75) can be written as
[TABLE]
Applying Markov’s inequality to 76, there exists such that for all , we have
[TABLE]
For any and , 77 becomes
[TABLE]
From Lemma A.4, there exists such that . Next, by choosing , we have Finally, from Lemma A.3, there exists such that for all , we have The right-hand side of 78 is then upper bounded by , and the proof is complete.
Appendix B Proof of Lemma 1
For any , we have
[TABLE]
The proof that is similar. We then have Since , applying [29, Theorem 6.16], we obtain and 31 follows from . The proof is now complete.
Appendix C Proof of Lemma 2
It suffices to show that for any , there exists such that for all we have
[TABLE]
The set over which the supremum in 79 is taken can be divided into two subsets: and . We have
[TABLE]
where the last inequality follows from Theorem 2 for sufficiently large.
If , we obtain
[TABLE]
From Theorem 2, there exists such that for all , we have
[TABLE]
From Markov’s inequality and Assumption 1, there exists such that for all and , we have
[TABLE]
Next, we show that for any , both 81 and 82 are bounded by . There are three possible cases:
1. and ,
2. and ,
3. and .
Applying 83 and 84 in the first case, 84 and 85 in the second case, and 83 and 86 in the third case to 81 and 82, respectively, we obtain The proof for ii is similar and the proof is now complete.
Appendix D Proof of Proposition 3
From 18, there exists such that for all sufficiently large. For any , let and . There exists such that for all . From Lemma 2, by choosing sufficiently large, we have for all ,
[TABLE]
Let be such that . Then, for , we have
[TABLE]
For any , we then have
[TABLE]
where the last equality follows from independence and the last inequality from 87. Therefore, for any , we have
[TABLE]
which yields i. The proof for ii is similar and the proposition is proved.
Appendix E Proof of Theorem 3
From Lemma 1, taking infimum on both sides of 31, we obtain Since and , by Proposition 3, we have as .
To see that is asymptotically optimal when Assumption 2 is satisfied, let be the set of stopping times satisfying . By expanding using 3, we obtain
[TABLE]
where 88 is due to the min-max inequality[55]. For each of the cases , by Theorem 6.17 in [29], we have
[TABLE]
Since Assumption 2 is satisfied, we have
[TABLE]
Therefore, from 89 and 90, we obtain
[TABLE]
and the proof is now complete.
Appendix F Proof of Lemma 4
We use techniques similar to those in [20] to prove Lemma 4. To analyze the probability , we use a change-of-measure argument. For any , choose so that for any
[TABLE]
where is the volume or Lebesgue measure of and is the gamma function. From Kolmogorov’s Consistency Theorem, there is a probability measure for the stochastic process under which the pdf of each is . Define a measure . Since is compact in , the measure is finite. For each , the Radon-Nikodym derivative of the law of under w.r.t. is
[TABLE]
which follows from Fubini’s Theorem. By Wald’s likelihood ratio identity,
[TABLE]
Suppose . Since , from Taylor series, there exists such that
[TABLE]
Thus, for , we have
[TABLE]
where the last inequality follows from . We obtain
[TABLE]
Therefore, we have
[TABLE]
This yields the upper bound
[TABLE]
Applying this upper bound to (91), we obtain
[TABLE]
for all . The proof that is similar, and the lemma is proved.
References
- [1] W. H. Woodall, D. J. Spitzner, D. C. Montgomery, and S. Gupta, “Using control charts to monitor process and product quality profiles,” J. of Quality Technology, vol. 36, no. 3, p. 309, 2004.
- [2] T. L. Lai, “Sequential changepoint detection in quality control and dynamical systems,” J. of the Roy. Statistical Soc., pp. 613–658, 1995.
- [3] R. J. Bolton and D. J. Hand, “Statistical fraud detection: A review,” Statistical Sci., pp. 235–249, 2002.
- [4] L. Lai, Y. Fan, and H. V. Poor, “Quickest detection in cognitive radio: A sequential change detection framework,” in IEEE Conf. Global Telecommun. IEEE, 2008, pp. 1–5.
- [5] K. Sequeira and M. Zaki, “ADMIT: anomaly-based data mining for intrusions,” in Proc. Conf. Knowl. Discovery and Data Mining. ACM, 2002, pp. 386–395.
- [6] A. G. Tartakovsky, B. L. Rozovskii, R. B. Blazek, and H. Kim, “A novel approach to detection of intrusions in computer networks via adaptive sequential and batch-sequential change-point detection methods,” IEEE Trans. Signal Process., vol. 54, no. 9, pp. 3372–3382, 2006.
- [7] W. Luo, W. P. Tay, and M. Leng, “Infection spreading and source identification: A hide and seek game,” IEEE Trans. Signal Process., vol. 64, no. 16, pp. 4228–4243, Aug. 2016.
- [8] H. Sohn, J. A. Czarnecki, and C. R. Farrar, “Structural health monitoring using statistical process control,” J. Structural Eng., vol. 126, no. 11, pp. 1356–1363, 2000.
