Quickest Change Detection in the Presence of a Nuisance Change

Tze Siong Lau; Wee Peng Tay

arXiv:1902.03460 · math.ST · October 23, 2019 · IEEE Trans. Signal Process.


TL;DR

This paper introduces a recursive, asymptotically optimal quickest change detection method that distinguishes critical changes from nuisance changes. In simulations it outperforms the FMA stopping time and a naive 2-stage CuSum procedure, and experiments on real bearing-failure data verify its performance.

Contribution

A novel window-limited sequential detection procedure based on the generalized likelihood ratio test that handles nuisance changes and is proven to be asymptotically optimal.

Findings

1. The proposed method outperforms the FMA and 2-stage procedures in simulations.
2. The recursive update scheme is computationally efficient.
3. Real-data experiments confirm the method's effectiveness.


Equations

$$h_{\nu_{n},\nu_{c},t}=\begin{cases}f, & \text{if } t<\min\{\nu_{c},\nu_{n}\},\\ f_{n}, & \text{if } \nu_{n}\leq t<\nu_{c},\\ g, & \text{if } \nu_{c}\leq t<\nu_{n},\\ g_{n}, & \text{if } \max\{\nu_{c},\nu_{n}\}\leq t.\end{cases}$$

$$\min_{\tau}\ \mathrm{WADD}(\tau)\quad\text{s.t.}\quad\mathrm{ARL}(\tau)\geq\gamma,$$

$$\mathrm{WADD}(\tau)=\sup_{\nu_{n}\in\mathbb{N}\cup\{\infty\}}\mathrm{WADD}_{\nu_{n}}(\tau),$$

$$\mathrm{WADD}_{\nu_{n}}(\tau)=\sup_{\nu_{c}\geq 1}\operatorname*{ess\,sup}\mathbb{E}_{\nu_{n},\nu_{c}}\left[(\tau-\nu_{c}+1)^{+}\,\middle|\,X_{1},\ldots,X_{\nu_{c}-1}\right],$$

$$\mathrm{ARL}(\tau)=\inf_{\nu_{n}\in\mathbb{N}\cup\{\infty\}}\mathbb{E}_{\nu_{n},\infty}[\tau],$$

$$\tau_{\text{CuSum}}(b)=\inf\Big\{t:\ \max_{1\leq k\leq t+1}\sum_{i=k}^{t}\log\frac{g(X_{i})}{f(X_{i})}\geq b\Big\},$$

$$\Lambda_{\text{GLR}}(k,t)=\frac{\max_{k\leq j\leq t+1}\prod_{i=k}^{j-1}g(X_{i})\prod_{i=j}^{t}g_{n}(X_{i})}{\max_{k\leq j\leq t+1}\prod_{i=k}^{j-1}f(X_{i})\prod_{i=j}^{t}f_{n}(X_{i})},$$

$$\liminf_{b\to\infty}\frac{m_{b}}{b}>I^{-1}\quad\text{and}\quad\log m_{b}=o(b),$$

$$I=\min\{D(g\,||\,f),\,D(g\,||\,f_{n}),\,D(g_{n}\,||\,f),\,D(g_{n}\,||\,f_{n})\},$$

$$\tau(b)=\inf\Big\{t:\max_{k\leq t}\log\Lambda(k,t)\geq b\Big\},\qquad\tau_{n}(b)=\inf\Big\{t:\max_{k\leq t}\log\Lambda_{n}(k,t)\geq b\Big\},$$

$$\widetilde{\tau}(b)=\inf\Big\{t:\max_{t-m_{b}\leq k\leq t}\log\Lambda(k,t)\geq b\Big\},\qquad\widetilde{\tau}_{n}(b)=\inf\Big\{t:\max_{t-m_{b}\leq k\leq t}\log\Lambda_{n}(k,t)\geq b\Big\},$$

$$\lim_{t\to\infty}\mathbb{P}_{\infty,\nu_{c}}\left(\left|\frac{\log\Lambda(k,t)}{t-k+1}-\frac{1}{t-k+1}\sum_{i=k}^{t}\log\frac{g(X_{i})}{f(X_{i})}\right|\geq\epsilon\right)=0,$$

$$\lim_{t\to\infty}\mathbb{P}_{\infty,\nu_{c}}\left(\left|\frac{\log\Lambda_{n}(k,t)}{t-k+1}-\frac{1}{t-k+1}\sum_{i=k}^{t}\log\frac{g_{n}(X_{i})}{f(X_{i})}\right|\geq\epsilon\right)=0,$$

$$\lim_{t\to\infty}\mathbb{P}_{\nu_{n},\nu_{c}}\left(\left|\frac{\log\Lambda(k,t)}{t-k+1}-\frac{1}{t-k+1}\sum_{i=k}^{t}\log\frac{g(X_{i})}{f_{n}(X_{i})}\right|\geq\epsilon\right)=0,$$

$$\lim_{t\to\infty}\mathbb{P}_{\nu_{n},\nu_{c}}\left(\left|\frac{\log\Lambda_{n}(k,t)}{t-k+1}-\frac{1}{t-k+1}\sum_{i=k}^{t}\log\frac{g_{n}(X_{i})}{f_{n}(X_{i})}\right|\geq\epsilon\right)=0,$$

$$D(g\,||\,f_{n})>\min\{D(g\,||\,f),\,D(g_{n}\,||\,f),\,D(g_{n}\,||\,f_{n})\},$$

$$\mathbb{P}_{\nu_{n},\infty}(\eta^{1}<\infty)\leq e^{-b}\quad\text{and}\quad\mathbb{P}_{\nu_{n},\infty}(\eta_{n}^{1}<\infty)\leq e^{-b},$$

$$\mathbb{E}_{\nu_{n},\infty}[\tau_{\text{W-SGLR}}(b)]\geq\mathbb{E}_{\nu_{n},\infty}[\tau_{\text{SGLR}}(b)]\geq\tfrac{1}{2}e^{b}.$$


Full text


Tze Siong Lau and Wee Peng Tay

This research is supported by the Singapore Ministry of Education Academic Research Fund Tier 1 grant 2017-T1-001-059 (RG20/17) and Tier 2 grant MOE2018-T2-2-019. T. S. Lau and W. P. Tay are with the School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore.

Abstract

In the quickest change detection problem in which both nuisance and critical changes may occur, the objective is to detect the critical change as quickly as possible without raising an alarm when either there is no change or a nuisance change has occurred. A window-limited sequential change detection procedure based on the generalized likelihood ratio test statistic is proposed. A recursive update scheme for the proposed test statistic is developed and is shown to be asymptotically optimal under mild technical conditions. In the scenario where the post-change distribution belongs to a parametrized family, a generalized stopping time and a lower bound on its average run length are derived. The proposed stopping rule is compared with the finite moving average (FMA) stopping time and the naive 2-stage procedure that detects the nuisance or critical change using separate CuSum stopping procedures for the nuisance and critical changes. Simulations demonstrate that the proposed rule outperforms the FMA stopping time and the 2-stage procedure, and experiments on a real dataset on bearing failure verify the performance of the proposed stopping time.

Index Terms:

Quickest change detection, nuisance change, Generalized Likelihood Ratio Test (GLRT), average run length, average detection delay

I Introduction

The problem of detecting a change in the statistical properties of a signal with the shortest possible delay after the change is known as quickest change detection (QCD). Given a sequence of observations $\{x_{t}:t\in\mathbb{N}\}$ that are independent and identically distributed (i.i.d.) with distribution $f$ up to an unknown change point $\nu$ and i.i.d. with distribution $g\neq f$ after $\nu$, we aim to detect this change as quickly as possible while satisfying a false alarm constraint. Change detection has applications in many areas, including manufacturing quality control [1, 2], fraud detection [3], cognitive radio [4], network surveillance [5, 6, 7], structural health monitoring [8], spam detection [9, 10, 11], bioinformatics [12], power system line outage detection [13], and sensor networks [14, 15, 16].

For the non-Bayesian formulation of QCD, the change point is assumed to be unknown but deterministic. When both the pre- and post-change distributions are known, Page [17] developed the Cumulative Sum Control Chart (CuSum) for quickest change detection. Lorden [18] proved that the CuSum test has asymptotically optimal worst-case average detection delay as the false alarm rate goes to zero, and Moustakides [19] later established that the CuSum test is exactly optimal under Lorden's criterion. Lai [20] showed that the CuSum test is also asymptotically optimal under Pollak's criterion [21] as the false alarm rate goes to zero. When the post-change distribution is unknown but belongs to a finite set of candidate distributions, Lorden [18] showed that the generalized likelihood ratio (GLR) CuSum is asymptotically optimal. Other methods have been proposed for post-change distributions that are only partially known [22, 20, 23, 24, 25, 26]. We refer the reader to [27, 28, 29] and the references therein for an overview of the QCD problem.

In many practical applications, the signal of interest may undergo different types of change, only a subset of which are of interest to the user. One example is bearing failure detection using accelerometer readings [30]. During normal operation, a bearing is driven at one of two activity levels, idle or active. In a typical bearing failure detection scenario, the bearing is initially driven in the idle state. A switch to the active state changes the statistical properties of the accelerometer readings, but this change is not of interest to us and is called a nuisance change. We are only interested in the change arising from the failure of the bearing, which is called a critical change. Moreover, the statistical properties of the observations obtained when the bearing is faulty depend on the activity level at which it is driven. The traditional QCD framework does not distinguish between critical and nuisance changes. In addition, because of the nuisance change, the observations are no longer i.i.d. in the pre-change or the post-change regime, depending on whether the nuisance change occurs before or after the critical change. In this paper, we investigate the non-Bayesian formulation of the QCD problem under a nuisance change and propose a window-limited stopping time that ignores the nuisance change but detects the critical change as quickly as possible.

I-A Related Work

Existing works in QCD in which the observations are not i.i.d. before and after the change point fall into three main categories. In the first category, the papers [31, 32, 33] model the pre-change and post-change distributions as hidden Markov models (HMMs). In [31], the authors proved the asymptotic optimality of the CuSum procedure for the HMM signal model in the sense of Lorden. In [32], the authors developed the Shiryayev-Roberts-Pollak (SRP) rule for the HMM signal model and proved its optimality in the sense of Pollak. The authors of [33] consider the problem where the vector parameter of a two-state HMM changes at some unknown time. The second category of papers [34, 35] relaxes the i.i.d. assumption more generally. In [34], the authors established the optimality of the CuSum and Shiryayev-Roberts stopping rules for random processes whose likelihood ratios satisfy certain independence and stationarity conditions; this class includes Markov chains, AR processes, and processes evolving on a circle. In [35], the authors considered the Bayesian QCD problem under conditions on the asymptotic behavior of the likelihood process. Unlike all the aforementioned papers, the signal model in our QCD problem with a nuisance change cannot be modeled by an HMM, and the likelihood ratios it generates are non-stationary. In the third category, the papers [36, 37, 38, 39, 40, 41] consider QCD of transient changes, where the change is not persistent or multiple changes occur throughout the monitoring process. Unlike our QCD problem, which allows some changes to be treated as nuisances, all changes are considered critical in these papers.

I-B Our Contributions

In this paper, we consider the non-Bayesian QCD problem where both nuisance and critical changes may occur, and our objective is to detect the critical change as quickly as possible while ignoring the nuisance change. Our goal is to develop a sequential algorithm with computational complexity that increases linearly with the number of samples observed. Our main contributions are as follows:

1. We formulate the QCD problem with a nuisance change and propose a window-limited simplified GLR (W-SGLR) stopping time.
2. We derive a lower bound on the average run length (ARL) to a false alarm and an asymptotic upper bound on the worst-case average detection delay (WADD) of our proposed test.
3. We prove the asymptotic optimality of the W-SGLR stopping time under mild technical assumptions.
4. We provide simulation and experimental results that verify the theoretical guarantees of our proposed test and also illustrate its performance on a real dataset.

A preliminary version of this work was presented in [42, 43]. To the best of our knowledge, there are no existing works that consider the QCD problem for a signal that may undergo a nuisance change.

The rest of this paper is organized as follows. In Section II, we present our signal model and problem formulation. We propose the W-SGLR stopping time and derive the theoretical properties of our test statistics in Section III. In Section IV, we discuss a modification of the proposed stopping time when the post-change distribution belongs to a parametrized family. We present numerical simulations and experiments on a real dataset to illustrate the performance of our proposed stopping time in Section V. We conclude in Section VI.

Notations: The operator $\mathbb{E}_{f}$ denotes mathematical expectation with respect to (w.r.t.) the probability density function (pdf) $f$, and $X\sim f$ means that the random variable $X$ has distribution with pdf $f$. If the nuisance change point is at $\nu_{n}$ and the critical change point is at $\nu_{c}$, we let $\mathbb{P}_{\nu_{n},\nu_{c}}$ and $\mathbb{E}_{\nu_{n},\nu_{c}}$ be the probability measure and mathematical expectation, respectively. The Gaussian distribution with mean $\mu$ and variance $\sigma^{2}$ is denoted as $\mathcal{N}(\mu,\sigma^{2})$. Convergence in $\mathbb{P}$-probability is denoted as $\xrightarrow{\ \mathbb{P}\ }$. We use ${\bf 1}_{E}$ as the indicator function of the set $E$, and ${D({\cdot}\,||\,{\cdot})}$ to denote the Kullback-Leibler (KL) divergence. We use $\mathbb{N}$, $\mathbb{R}$ and $\mathbb{R}_{>0}$ to denote the set of positive integers, real numbers and positive real numbers, respectively.

II Problem formulation

In many applications, the statistical distribution of the observed signal may undergo different changes over time. For example, in fault detection for motor bearings [30], we aim to raise an alarm as soon as possible after a bearing fault has occurred (critical change). This is done by monitoring the accelerometer readings from the motor for changes in the signal statistics. However, the accelerometer readings are also affected by non-critical or nuisance changes, such as variations in the motor load of the bearing. It would be undesirable to declare that a fault has occurred whenever the motor load changes. This motivates the need for a change-point model that allows both critical and nuisance changes, and for change detection techniques that ignore nuisance changes while efficiently identifying critical changes.

In this paper, we assume that the observed signals $X_{1},X_{2},\ldots$ may undergo two types of change: a critical change at $\nu_{c}\geq 0$ and a nuisance change at $\nu_{n}\geq 0$. Both change points are unknown a priori. We are interested in detecting the critical change, while the nuisance change is not of interest. Let $f,f_{n},g,g_{n}$ be probability distributions. At each time $t$, we let $h_{\nu_{n},\nu_{c},t}$ be the distribution that generates the observation $X_{t}$ when the nuisance change point is at $\nu_{n}$ and the critical change point is at $\nu_{c}$:

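$$h_{\nu_{n},\nu_{c},t}=\begin{cases}f, & \text{if } t<\min\{\nu_{c},\nu_{n}\},\\ f_{n}, & \text{if } \nu_{n}\leq t<\nu_{c},\\ g, & \text{if } \nu_{c}\leq t<\nu_{n},\\ g_{n}, & \text{if } \max\{\nu_{c},\nu_{n}\}\leq t.\end{cases}\tag{1}$$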

Thus, in our model (cf. Fig. 1), $f$ is the pre-change distribution. If $\nu_{c}<\nu_{n}<\infty$, the signal distribution first changes to $g$ at $\nu_{c}$ and then to $g_{n}$ at $\nu_{n}$. If $\nu_{n}<\nu_{c}<\infty$, the distribution first changes to $f_{n}$ at $\nu_{n}$ and then to $g_{n}$ at $\nu_{c}$. If $\nu_{n}=\nu_{c}$, the distribution changes from $f$ to $g_{n}$ at the common change point.

The sequence of observations $X_{1},X_{2},\ldots$ is a sequence of random variables satisfying $X_{t}\sim h_{\nu_{n},\nu_{c},t}$, where $\{X_{t}\}_{t\in\mathbb{N}}$ are mutually independent given $\nu_{n},\nu_{c}$. The quickest change detection problem is to detect the critical change at $\nu_{c}$ by observing $X_{1},X_{2},\ldots$ as quickly as possible, while ignoring the nuisance change and keeping the false alarm rate low. In our signal model, the nuisance change also changes the distribution that generates the observations after the critical change point, which creates a dependence between the nuisance change point and the distribution after the critical change point. Our formulation differs from assuming composite pre-change and post-change distribution families [44], since the nuisance change leads to non-stationarity in the distribution of $X_{t}$ before or after the critical change, depending on whether the nuisance change occurs before or after the critical change, respectively.
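As an illustration of the signal model, the following Python sketch draws observations according to (1). It is a minimal sketch under assumptions of our own: the four distributions are taken to be the Gaussian example of Lemma 3 with hypothetical values $\mu_0=0$, $\mu_1=1$, $\sigma_0=1$, $\sigma_1=2$, and `math.inf` encodes a change point that never occurs. The names `PARAMS`, `regime`, and `simulate` are illustrative only.

```python
import math
import random

# Hypothetical Gaussian instance, as (mean, standard deviation) pairs:
# f = N(mu0, s0^2), f_n = N(mu1, s0^2), g = N(mu0, s1^2), g_n = N(mu1, s1^2)
# with mu0 = 0, mu1 = 1, s0 = 1, s1 = 2.
PARAMS = {"f": (0.0, 1.0), "f_n": (1.0, 1.0), "g": (0.0, 2.0), "g_n": (1.0, 2.0)}


def regime(t, nu_n, nu_c):
    """Name of the distribution h_{nu_n, nu_c, t} in (1); math.inf means 'never'."""
    if t < min(nu_c, nu_n):
        return "f"
    if nu_n <= t < nu_c:
        return "f_n"
    if nu_c <= t < nu_n:
        return "g"
    return "g_n"  # remaining case: max(nu_c, nu_n) <= t


def simulate(T, nu_n, nu_c, seed=0):
    """Draw independent X_1, ..., X_T with X_t ~ h_{nu_n, nu_c, t}."""
    rng = random.Random(seed)
    return [rng.gauss(*PARAMS[regime(t, nu_n, nu_c)]) for t in range(1, T + 1)]


if __name__ == "__main__":
    # Nuisance change at 50, critical change at 120.
    print([regime(t, 50, 120) for t in (1, 50, 119, 120)])  # ['f', 'f_n', 'f_n', 'g_n']
    print(regime(10, math.inf, math.inf))  # f: no change ever occurs
    print(len(simulate(200, 50, 120)))  # 200
```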

In a typical sequential change detection procedure, at each time $t$ , a test statistic $S(t)$ is computed based on the currently available observations $X_{1},\ldots,X_{t}$ , and the observer decides that a change has occurred at a stopping time $\tau(b)=\inf\{t:S(t)\geq b\}.$

In the traditional QCD framework[18], the rate of false alarms is quantified by the mean time between false alarms. Since the nuisance change-point affects the distributions generating the signal, this quantity varies with the nuisance change point. In this paper, we consider the worst-possible rate as the nuisance change point varies by considering the smallest mean time between false alarms for all possible nuisance change points. A similar generalization can be made to quantify the detection delay by taking the largest detection delay over all possible nuisance change points.

Mathematically, our QCD problem can be formulated as a minimax problem similar to Lorden’s formulation[18], where we seek a stopping time that minimizes the WADD subject to an ARL constraint:

$$\min_{\tau}\ \mathrm{WADD}(\tau)\quad\text{s.t.}\quad\mathrm{ARL}(\tau)\geq\gamma,\tag{2}$$

where $\gamma$ is a predefined threshold, $\tau$ is a stopping time w.r.t. the filtration $\{\sigma(X_{1},X_{2},\ldots,X_{t}):\ t\geq 0\}$ ,

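$$\mathrm{WADD}(\tau)=\sup_{\nu_{n}\in\mathbb{N}\cup\{\infty\}}\mathrm{WADD}_{\nu_{n}}(\tau),$$

$$\mathrm{WADD}_{\nu_{n}}(\tau)=\sup_{\nu_{c}\geq 1}\operatorname*{ess\,sup}\mathbb{E}_{\nu_{n},\nu_{c}}\left[(\tau-\nu_{c}+1)^{+}\,\middle|\,X_{1},\ldots,X_{\nu_{c}-1}\right],$$

$$\mathrm{ARL}(\tau)=\inf_{\nu_{n}\in\mathbb{N}\cup\{\infty\}}\mathbb{E}_{\nu_{n},\infty}[\tau],$$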

and $\operatorname*{ess\,sup}$ is the essential supremum operator. In the next section, we propose a stopping time for (2).

A closely related topic is transient change detection (TCD) [45, 46, 47, 48], where the change persists only for a finite period of time and the objective is to detect whether such a change has occurred within a predefined window, rather than to detect the change as quickly as possible. Two widely adopted methods for the TCD problem are the window-limited CuSum stopping time [37] and the FMA stopping time [48]. The FMA stopping time has been shown to perform well for the TCD problem, and we use it as a comparison in Section V. When $\nu_{c}<\nu_{n}$, our system model can be seen as a generalization of the TCD variant in which one seeks to detect the transient change as quickly as possible, obtained by letting $g_{n}=f$. In (16) below, we propose a test statistic and stopping time for (2). By setting $g_{n}=f$ and $f_{n}=f$ in our test statistic, our proposed stopping time reduces to the window-limited CuSum stopping time with pre-change distribution $f$ and post-change distribution $g$.

III Test Statistic for QCD with Nuisance Change

In this section, we derive a test statistic and stopping time for QCD under a nuisance change. Suppose that we observe the sequence $X_{1},X_{2},\ldots$ and know a priori that the nuisance change does not take place (i.e., $\nu_{n}=\infty$). Then Page's CuSum test statistic [17], $S_{\text{CuSum}}(t)=\max_{1\leq k\leq t+1}\sum_{i=k}^{t}\log\tfrac{g(X_{i})}{f(X_{i})}$, can be used, and we declare that a critical change has taken place at

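$$\tau_{\text{CuSum}}(b)=\inf\{t:\ S_{\text{CuSum}}(t)\geq b\}=\inf\Big\{t:\ \max_{1\leq k\leq t+1}\sum_{i=k}^{t}\log\frac{g(X_{i})}{f(X_{i})}\geq b\Big\},$$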

where $b$ is a pre-determined threshold. The CuSum test statistic has the convenient recursion $S_{\text{CuSum}}(t+1)=\max\left\{S_{\text{CuSum}}(t)+\log\tfrac{g(X_{t+1})}{f(X_{t+1})},0\right\}$ with $S_{\text{CuSum}}(0)=0$, which allows the CuSum stopping time to be implemented efficiently.
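The recursion can be checked numerically against the defining maximum over $k$. The sketch below assumes the per-sample log-likelihood ratios $\log\tfrac{g(X_i)}{f(X_i)}$ are already available as a list; the function names are hypothetical.

```python
import math


def cusum_path(llrs):
    """Return [S(0), S(1), ..., S(T)] via S(t+1) = max{S(t) + llr_{t+1}, 0}, S(0) = 0."""
    s, path = 0.0, [0.0]
    for z in llrs:  # z = log g(X_{t+1}) / f(X_{t+1})
        s = max(s + z, 0.0)
        path.append(s)
    return path


def cusum_direct(llrs, t):
    """S(t) = max_{1<=k<=t+1} sum_{i=k}^{t} llr_i; k = t+1 gives the empty sum 0."""
    return max(sum(llrs[k - 1:t]) for k in range(1, t + 2))


def cusum_stopping_time(llrs, b):
    """First t >= 1 with S(t) >= b, or math.inf if the threshold is never crossed."""
    for t, s in enumerate(cusum_path(llrs)[1:], start=1):
        if s >= b:
            return t
    return math.inf


llrs = [0.5, -1.0, 2.0, 0.3, -0.2]
print(cusum_path(llrs))
print(cusum_stopping_time(llrs, 2.2))  # 4
```

The recursive form needs $O(1)$ work per sample, whereas `cusum_direct` recomputes $O(t)$ partial sums; this efficiency is exactly what the SGLR statistic introduced later in this section lacks.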

If the nuisance change takes place at a time $\nu_{n}<\infty$ and $\nu_{n}$ is known, a modification of Page's test statistic gives the following:

[TABLE]

where $h_{\nu_{n},1,i}(x)$ and $h_{\nu_{n},\infty,i}(x)$ are as defined in (1) and are the probability distributions corresponding to the cases where the critical change has already occurred and will never occur, respectively. As in the case $\nu_{n}=\infty$, this CuSum test statistic admits a convenient recursion for efficient implementation. Furthermore, Lorden [18] showed that $\tau_{\text{CuSum}}$ is asymptotically optimal in both of the cases above.

A naive approach is to use four variants of $\tau_{\text{CuSum}}$, one for each of the following changes: from $f$ to $f_{n}$, from $f$ to $g$, from $f$ to $g_{n}$, and from $f_{n}$ to $g_{n}$. In the first stage, we monitor for a change from $f$ to $f_{n}$, $g$, or $g_{n}$. If a change to $f_{n}$ is detected, we then monitor for a change from $f_{n}$ to $g_{n}$. The difficulty with this approach is that any false alarm or missed detection in the first stage propagates to the second stage. We demonstrate that such an approach is suboptimal in Section V-A.
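One possible reading of this naive 2-stage procedure is sketched below. Several details are our assumptions rather than the paper's: all four CuSum statistics share one threshold $b$; an alarm is raised as soon as the $f\to g$ or $f\to g_n$ statistic crosses $b$, even if the $f\to f_n$ statistic crosses at the same time; and the second-stage CuSum starts from zero at the sample after the nuisance change is declared. The function name and argument layout are hypothetical.

```python
import math


def two_stage_alarm(llr_f_fn, llr_f_g, llr_f_gn, llr_fn_gn, b):
    """Hypothetical sketch of the naive 2-stage CuSum procedure.

    llr_f_fn[i], llr_f_g[i], llr_f_gn[i]: log f_n/f, log g/f, log g_n/f at X_{i+1};
    llr_fn_gn[i]: log g_n/f_n at X_{i+1}. Returns the alarm time (1-indexed)
    or math.inf if no alarm is raised.
    """
    s_fn = s_g = s_gn = s2 = 0.0
    stage = 1
    for t in range(1, len(llr_f_g) + 1):
        i = t - 1
        if stage == 1:
            s_fn = max(s_fn + llr_f_fn[i], 0.0)
            s_g = max(s_g + llr_f_g[i], 0.0)
            s_gn = max(s_gn + llr_f_gn[i], 0.0)
            if s_g >= b or s_gn >= b:  # change to g or g_n: declare a critical change
                return t
            if s_fn >= b:  # nuisance change declared: switch to monitoring f_n -> g_n
                stage = 2
        else:
            s2 = max(s2 + llr_fn_gn[i], 0.0)
            if s2 >= b:  # change from f_n to g_n declared
                return t
    return math.inf
```

If the first-stage $f\to f_n$ statistic never reaches $b$, the second stage is never entered and a change from $f_n$ to $g_n$ can only be flagged through the $f\to g$ or $f\to g_n$ statistics, which is one concrete form of the error propagation described above.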

In our problem formulation, the nuisance change-point $\nu_{n}$ is unknown. Replacing $\nu_{n}$ with its maximum likelihood estimator in both the numerator and denominator, we obtain the following GLR test statistic and stopping time:

[TABLE]

From our simulations in Section V-A, it turns out that (12) does not achieve the best trade-off between average detection delay (ADD) and ARL to false alarm over a wide range of threshold values $b$. Furthermore, its ARL is difficult to characterize theoretically because the GLR test statistic $\Lambda_{\text{GLR}}(k,t)$ is not a likelihood ratio, so standard techniques in the QCD literature (e.g., Theorem 6.16 of [29]) cannot be used to analyze its ARL. This is a critical problem for practical applications, which require the threshold $b$ to be pre-determined to achieve a desired ARL.

To develop a stopping time whose ARL can be characterized theoretically, we simplify the maximum likelihood estimation in the numerator of (10) to a maximum over only the two cases $j=k$ and $j=t+1$ (i.e., all of $X_{k},\ldots,X_{t}$ are generated by $g_{n}$, or all by $g$). This gives the Simplified GLR (SGLR) test statistic and stopping time:

[TABLE]
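To make the difference between the GLR and SGLR numerators concrete, the sketch below evaluates both on a single window $[k,t]$ by brute force. We assume here, consistently with the discussion in Section III-A, that $\Lambda(k,t)$ and $\Lambda_n(k,t)$ share the GLR denominator and have numerators $\prod_{i=k}^{t}g(X_i)$ (the case $j=t+1$) and $\prod_{i=k}^{t}g_n(X_i)$ (the case $j=k$), so that $\Lambda_{\text{SGLR}}=\max\{\Lambda,\Lambda_n\}$. The per-sample log-densities are hypothetical numbers, and the function names are ours.

```python
def log_mixture(pre, post, k, t):
    """max over k <= j <= t+1 of sum_{i=k}^{j-1} pre_i + sum_{i=j}^{t} post_i.

    Indices i are 1-based as in the text; the lists themselves are 0-based.
    """
    return max(sum(pre[k - 1:j - 1]) + sum(post[j - 1:t]) for j in range(k, t + 2))


def sglr_terms(lf, lfn, lg, lgn, k, t):
    """Return (log Lambda, log Lambda_n, log Lambda_SGLR, log Lambda_GLR) on [k, t]."""
    log_den = log_mixture(lf, lfn, k, t)            # nuisance change point profiled out
    log_lam = sum(lg[k - 1:t]) - log_den            # numerator case j = t+1: all from g
    log_lam_n = sum(lgn[k - 1:t]) - log_den         # numerator case j = k: all from g_n
    log_glr = log_mixture(lg, lgn, k, t) - log_den  # full maximization over j (GLR)
    return log_lam, log_lam_n, max(log_lam, log_lam_n), log_glr


# Hypothetical per-sample log-densities for X_1, X_2, X_3.
lf, lfn = [-1.0, -0.5, -2.0], [-0.7, -1.5, -0.3]
lg, lgn = [-1.2, -0.4, -1.0], [-0.9, -0.8, -0.2]
print(sglr_terms(lf, lfn, lg, lgn, 1, 3))  # approximately (-0.8, -0.1, -0.1, 0.0)
```

Because the GLR numerator maximizes over every $j$, `log_glr` is never smaller than the SGLR value; the SGLR statistic gives up this extra maximization in exchange for an ARL that can be analyzed.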

Unlike the CuSum test statistic, the SGLR test statistic does not have a convenient recursion, so any implementation of the SGLR stopping time requires computational resources that increase with the number of samples observed. This would be a significant limitation in many practical applications. To limit the computational resources required, in the same spirit as [20], we propose the Window-Limited SGLR (W-SGLR) test statistic and stopping time:

[TABLE]

where the window size $m_{b}$ is chosen such that

$$\liminf_{b\to\infty}\frac{m_{b}}{b}>I^{-1}\quad\text{and}\quad\log m_{b}=o(b),$$

with

$$I=\min\{D(g\,||\,f),\,D(g\,||\,f_{n}),\,D(g_{n}\,||\,f),\,D(g_{n}\,||\,f_{n})\},$$

and $o(b)$ denoting a term satisfying $o(b)/b\to 0$ as $b\to\infty$. Window-limited test statistics were first introduced in [49], and [20] further discussed their properties and the choice of window size and thresholds. We make the following assumption.

Assumption 1.

The first four moments of $\log\tfrac{f_{n}(X)}{f(X)}$ w.r.t. both $g$ and $g_{n}$ are finite, and $\rho_{g},\rho_{g_{n}}\neq 0$ , where we define $\rho_{g}=\mathbb{E}_{g}\left[{\log\tfrac{f_{n}(X)}{f(X)}}\right]$ , $\sigma_{g}^{2}=\mathbb{E}_{g}\left[{\left(\log\tfrac{f_{n}(X)}{f(X)}-\rho_{g}\right)^{2}}\right]$ , $\omega_{g}^{4}=\mathbb{E}_{g}\left[{\left(\log\tfrac{f_{n}(X)}{f(X)}-\rho_{g}\right)^{4}}\right]$ , $\rho_{g_{n}}=\mathbb{E}_{g_{n}}\left[{\log\tfrac{f_{n}(X)}{f(X)}}\right]$ , $\sigma_{g_{n}}^{2}=\mathbb{E}_{g_{n}}\left[{\left(\log\tfrac{f_{n}(X)}{f(X)}-\rho_{g_{n}}\right)^{2}}\right]$ and $\omega_{g_{n}}^{4}=\mathbb{E}_{g_{n}}\left[{\left(\log\tfrac{f_{n}(X)}{f(X)}-\rho_{g_{n}}\right)^{4}}\right]$ .
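For the Gaussian example of Lemma 3, the quantities in Assumption 1 and in the window-size condition have closed forms. The sketch below assumes hypothetical values $\mu_0=0$, $\mu_1=1$, $\sigma_0=1$, $\sigma_1=2$. It computes $\rho_g$ and $\rho_{g_n}$ through the identity $\rho_g=D(g\,||\,f)-D(g\,||\,f_n)$ (and its analogue for $g_n$) stated in Theorem 1, computes $I$, and picks $m_b=\lceil c\,b/I\rceil$ with $c>1$. This particular choice of $m_b$ is ours; any $m_b$ with $\liminf_{b\to\infty}m_b/b>I^{-1}$ and $\log m_b=o(b)$ would do.

```python
import math


def kl_gauss(mu_p, sd_p, mu_q, sd_q):
    """KL divergence D(N(mu_p, sd_p^2) || N(mu_q, sd_q^2)) in nats."""
    return math.log(sd_q / sd_p) + (sd_p**2 + (mu_p - mu_q) ** 2) / (2 * sd_q**2) - 0.5


def design_quantities(f, f_n, g, g_n):
    """f, f_n, g, g_n are (mean, std) pairs. Returns (rho_g, rho_gn, I)."""
    d_g_f, d_g_fn = kl_gauss(*g, *f), kl_gauss(*g, *f_n)
    d_gn_f, d_gn_fn = kl_gauss(*g_n, *f), kl_gauss(*g_n, *f_n)
    rho_g = d_g_f - d_g_fn      # = E_g[log f_n(X) / f(X)]
    rho_gn = d_gn_f - d_gn_fn   # = E_{g_n}[log f_n(X) / f(X)]
    I = min(d_g_f, d_g_fn, d_gn_f, d_gn_fn)
    return rho_g, rho_gn, I


def window_size(b, I, c=1.5):
    """Hypothetical rule m_b = ceil(c * b / I), c > 1: m_b / b -> c / I > 1 / I, log m_b = o(b)."""
    return math.ceil(c * b / I)


rho_g, rho_gn, I = design_quantities((0.0, 1.0), (1.0, 1.0), (0.0, 2.0), (1.0, 2.0))
print(rho_g, rho_gn, I, window_size(10.0, I))  # rho_g ≈ -0.5, rho_gn ≈ 0.5, I ≈ 0.807, m_b = 19
```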

In Theorem 3 of Section III-B, we show that the proposed $\tau_{\text{W-SGLR}}(b)$ is asymptotically optimal as $b\to\infty$ under Assumption 1 and an additional technical assumption. To do that, we first analyze the asymptotic properties of $\tau_{\text{W-SGLR}}$ . We let

[TABLE]

and study their properties in Section III-A. Then, using the relationships

[TABLE]

where

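$$\tau(b)=\inf\Big\{t:\max_{k\leq t}\log\Lambda(k,t)\geq b\Big\},\qquad\tau_{n}(b)=\inf\Big\{t:\max_{k\leq t}\log\Lambda_{n}(k,t)\geq b\Big\},$$

$$\widetilde{\tau}(b)=\inf\Big\{t:\max_{t-m_{b}\leq k\leq t}\log\Lambda(k,t)\geq b\Big\},\qquad\widetilde{\tau}_{n}(b)=\inf\Big\{t:\max_{t-m_{b}\leq k\leq t}\log\Lambda_{n}(k,t)\geq b\Big\},$$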

we finally show the asymptotic optimality of $\tau_{\text{W-SGLR}}$ under mild technical conditions in Section III-B.
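The window-limited rule can be sketched in Python as follows. This is a minimal sketch under stated assumptions: we take $\Lambda(k,t)$ and $\Lambda_n(k,t)$ to have numerators $\prod_{i=k}^{t}g(X_i)$ and $\prod_{i=k}^{t}g_n(X_i)$ and the common denominator $\max_{k\leq j\leq t+1}\prod_{i=k}^{j-1}f(X_i)\prod_{i=j}^{t}f_n(X_i)$, and we stop at the first $t$ at which $\max\{\log\Lambda(k,t),\log\Lambda_n(k,t)\}$ reaches $b$ for some $\max\{1,t-m_b\}\leq k\leq t$. Writing $F$ and $F_n$ for prefix sums of $\log f(X_i)$ and $\log f_n(X_i)$, the log-denominator equals $F_n(t)-F(k-1)+\max_{k-1\leq m\leq t}[F(m)-F_n(m)]$, so sweeping $k$ downward from $t$ costs $O(m_b)$ operations per sample. Function and variable names are hypothetical.

```python
import math
from itertools import accumulate


def w_sglr_stop(lf, lfn, lg, lgn, b, m_b):
    """Hypothetical sketch of the W-SGLR stopping rule.

    lf, lfn, lg, lgn hold log f(X_i), log f_n(X_i), log g(X_i), log g_n(X_i)
    for i = 1..T (stored 0-based). Returns (t, statistic) at the first
    threshold crossing, or (math.inf, None) if b is never reached.
    """
    # Prefix sums with a leading 0, so that sum_{i=a}^{c} x_i = X[c] - X[a-1].
    F, Fn = [0.0, *accumulate(lf)], [0.0, *accumulate(lfn)]
    G, Gn = [0.0, *accumulate(lg)], [0.0, *accumulate(lgn)]
    for t in range(1, len(lf) + 1):
        stat = -math.inf
        run_max = F[t] - Fn[t]  # running max of F[m] - Fn[m] over k-1 <= m <= t
        for k in range(t, max(1, t - m_b) - 1, -1):  # window max(1, t - m_b) <= k <= t
            run_max = max(run_max, F[k - 1] - Fn[k - 1])
            # log of max_{k<=j<=t+1} prod_{i=k}^{j-1} f(X_i) prod_{i=j}^{t} f_n(X_i)
            log_den = Fn[t] - F[k - 1] + run_max
            log_lam = G[t] - G[k - 1] - log_den      # all of X_k..X_t from g
            log_lam_n = Gn[t] - Gn[k - 1] - log_den  # all of X_k..X_t from g_n
            stat = max(stat, log_lam, log_lam_n)
        if stat >= b:
            return t, stat
    return math.inf, None
```

Setting `lfn` and `lgn` equal to `lf` makes every $\log\Lambda_n(k,t)$ zero and every $\log\Lambda(k,t)$ a window-limited CuSum sum of $\log\tfrac{g(X_i)}{f(X_i)}$, matching the reduction to the window-limited CuSum stopping time noted in Section II.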

III-A Log Likelihood Ratio Growth Rates

In this subsection, we derive properties of $\Lambda$ and $\Lambda_{n}$ as defined in (20) and (21), respectively. The stopping times $\tau_{\text{SGLR}}(b)$ and $\tau_{\text{W-SGLR}}(b)$ are the first times the test statistics $S_{\text{SGLR}}$ and $S_{\text{W-SGLR}}$, respectively, cross the threshold $b$. The growth rates $\tfrac{1}{t-k+1}\log\Lambda(k,t)$ and $\tfrac{1}{t-k+1}\log\Lambda_{n}(k,t)$ allow us to understand the detection delay of these stopping times. We show that these growth rates converge in probability as $t\to\infty$, and that their limits depend on the signs of $\rho_{g}$ and $\rho_{g_{n}}$.

As the nuisance change point is unknown, the denominator of both $\Lambda$ and $\Lambda_{n}$ contains the maximized likelihood $\max_{k\leq j\leq t+1}\prod_{i=k}^{j-1}f(X_{i})\prod_{i=j}^{t}f_{n}(X_{i})$. If $\rho_{g}<0$, then $D(g\,||\,f)<D(g\,||\,f_{n})$, i.e., the distribution $g$ is closer to $f$ than to $f_{n}$ in the KL divergence sense. Hence, when the critical change point is at $\nu_{c}=1$ and no nuisance change has taken place, we expect the maximizing $j$ to lie close to $t+1$, so that the denominator behaves like $\prod_{i=k}^{t}f(X_{i})$. Thus, our statistic $\Lambda(k,t)$ can be approximated by $\prod_{i=k}^{t}\tfrac{g(X_{i})}{f(X_{i})}$. A similar argument can be made for $\Lambda_{n}$ when $\nu_{n}=\nu_{c}=1$. This observation is made precise in the following two propositions.

Proposition 1.

Suppose that Assumption 1 holds, and $\rho_{g}<0$ . For any $\nu_{c}\leq k<\infty$ and $\epsilon>0$ , we have

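$$\lim_{t\to\infty}\mathbb{P}_{\infty,\nu_{c}}\left(\left|\frac{\log\Lambda(k,t)}{t-k+1}-\frac{1}{t-k+1}\sum_{i=k}^{t}\log\frac{g(X_{i})}{f(X_{i})}\right|\geq\epsilon\right)=0,$$

$$\lim_{t\to\infty}\mathbb{P}_{\infty,\nu_{c}}\left(\left|\frac{\log\Lambda_{n}(k,t)}{t-k+1}-\frac{1}{t-k+1}\sum_{i=k}^{t}\log\frac{g_{n}(X_{i})}{f(X_{i})}\right|\geq\epsilon\right)=0.$$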

Proof:

See Appendix A. ∎

Proposition 2.

Suppose that Assumption 1 holds, and $\rho_{g_{n}}>0$ . For any $\nu_{c}\leq k<\infty$ , $\nu_{n}<\infty$ , and $\epsilon>0$ , we have

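$$\lim_{t\to\infty}\mathbb{P}_{\nu_{n},\nu_{c}}\left(\left|\frac{\log\Lambda(k,t)}{t-k+1}-\frac{1}{t-k+1}\sum_{i=k}^{t}\log\frac{g(X_{i})}{f_{n}(X_{i})}\right|\geq\epsilon\right)=0,$$

$$\lim_{t\to\infty}\mathbb{P}_{\nu_{n},\nu_{c}}\left(\left|\frac{\log\Lambda_{n}(k,t)}{t-k+1}-\frac{1}{t-k+1}\sum_{i=k}^{t}\log\frac{g_{n}(X_{i})}{f_{n}(X_{i})}\right|\geq\epsilon\right)=0.$$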

Proof:

See Appendix A. ∎
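Proposition 1 can be illustrated by a short Monte Carlo experiment with $k=\nu_c=1$ and $\nu_n=\infty$, so that every observation is drawn from $g$. The sketch assumes a hypothetical Gaussian instance, $f=\mathcal{N}(0,1)$, $f_n=\mathcal{N}(1,1)$, $g=\mathcal{N}(0,4)$, for which $\rho_g=-1/2<0$, and it takes the numerator of $\Lambda(1,t)$ to be $\prod_{i=1}^{t}g(X_i)$. It returns the normalized gap $\tfrac{1}{t}\big(\log\Lambda(1,t)-\sum_{i=1}^{t}\log\tfrac{g(X_i)}{f(X_i)}\big)$, which is never positive and should be close to zero for large $t$, together with $\tfrac{1}{t}\log\Lambda(1,t)$, which should be close to $D(g\,||\,f)=\tfrac{3}{2}-\log 2$ by the weak law of large numbers. Function names are ours.

```python
import math
import random


def log_normal_pdf(x, mu, sd):
    """log of the N(mu, sd^2) density at x."""
    return -0.5 * math.log(2 * math.pi) - math.log(sd) - (x - mu) ** 2 / (2 * sd**2)


def prop1_rates(t, seed=1):
    """Simulate X_1..X_t ~ g (nu_c = 1, nu_n = inf) and return
    (gap, rate) = ((log Lambda(1,t) - sum_i log g/f) / t, log Lambda(1,t) / t)."""
    rng = random.Random(seed)
    F = Fn = G = 0.0  # running sums of log f, log f_n, log g
    best = 0.0        # max over 0 <= m <= t of (F_m - Fn_m); m = 0 contributes 0
    for _ in range(t):
        x = rng.gauss(0.0, 2.0)            # g = N(0, 2^2)
        F += log_normal_pdf(x, 0.0, 1.0)   # f = N(0, 1)
        Fn += log_normal_pdf(x, 1.0, 1.0)  # f_n = N(1, 1)
        G += log_normal_pdf(x, 0.0, 2.0)
        best = max(best, F - Fn)
    log_den = Fn + best  # log max_j prod_{i<j} f(X_i) prod_{i>=j} f_n(X_i), with k = 1
    log_lam = G - log_den
    return (log_lam - (G - F)) / t, log_lam / t


gap, rate = prop1_rates(5000)
print(gap, rate, 1.5 - math.log(2))  # gap near 0, rate near D(g || f)
```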

Using Propositions 1 and 2 together with the weak law of large numbers, we obtain the following result.

Theorem 1.

Suppose that Assumption 1 holds, $\rho_{g}={D({g}\,||\,{f})}-{D({g}\,||\,{f_{n}})}<0$ , and $\rho_{g_{n}}={D({g_{n}}\,||\,{f})}-{D({g_{n}}\,||\,{f_{n}})}>0$ . For any $\nu_{c}\leq k<\infty$ ,

[TABLE]

as $t\to\infty$ . Furthermore, for any $\nu_{n}<\infty$ ,

[TABLE]

and

[TABLE]

as $t\to\infty$.

Proof:

See Appendix B. ∎
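The identities for $\rho_g$ and $\rho_{g_n}$ used in Theorem 1 follow by splitting the logarithm, assuming all the KL divergences involved are finite:

```latex
\begin{aligned}
\rho_{g} &= \mathbb{E}_{g}\left[\log\frac{f_{n}(X)}{f(X)}\right]
          = \mathbb{E}_{g}\left[\log\frac{g(X)}{f(X)}\right]-\mathbb{E}_{g}\left[\log\frac{g(X)}{f_{n}(X)}\right]
          = D(g\,||\,f)-D(g\,||\,f_{n}),\\
\rho_{g_{n}} &= \mathbb{E}_{g_{n}}\left[\log\frac{g_{n}(X)}{f(X)}\right]-\mathbb{E}_{g_{n}}\left[\log\frac{g_{n}(X)}{f_{n}(X)}\right]
          = D(g_{n}\,||\,f)-D(g_{n}\,||\,f_{n}).
\end{aligned}
```

Hence $\rho_g<0$ means that $g$ is closer to $f$ than to $f_n$ in KL divergence, while $\rho_{g_n}>0$ means that $g_n$ is closer to $f_n$ than to $f$.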

The next lemma verifies that our proposed stopping time satisfies the assumption required in [20] to relate the asymptotic upper bound on the WADD to the threshold $b$ in Proposition 3.

Lemma 2.

Suppose that Assumption 1 holds. For any $\delta>0$ , we have

(i) $\displaystyle\lim_{t\to\infty}\sup_{\nu_{n}\in\mathbb{N},\,1\leq\nu_{c}\leq k}\mathbb{P}_{\nu_{n},\nu_{c}}\left(\tfrac{1}{t}\log\Lambda(k,k+t-1)-I\leq-\delta\right)=0$, and

(ii) $\displaystyle\lim_{t\to\infty}\sup_{1\leq\nu_{c}\leq k}\mathbb{P}_{\infty,\nu_{c}}\left(\tfrac{1}{t}\log\Lambda(k,k+t-1)-I\leq-\delta\right)=0$.

Proof:

See Appendix C. ∎

Proposition 3.

Suppose that Assumption 1 holds. There exists a $B$ such that for all $b\geq B$ , we have

(i) $\displaystyle\sup_{\nu_{n},\nu_{c}\geq 1}\operatorname*{ess\,sup}\mathbb{E}_{\nu_{n},\nu_{c}}\left[(\widetilde{\tau}_{n}(b)-\nu_{c}+1)^{+}\,\middle|\,X_{1},\ldots,X_{\nu_{c}-1}\right]\leq(I^{-1}+o(1))b$, and

(ii) $\displaystyle\sup_{\nu_{c}\geq 1}\operatorname*{ess\,sup}\mathbb{E}_{\infty,\nu_{c}}\left[(\widetilde{\tau}(b)-\nu_{c}+1)^{+}\,\middle|\,X_{1},\ldots,X_{\nu_{c}-1}\right]\leq(I^{-1}+o(1))b$.

Proof:

See Appendix D. ∎

Finally, we show the asymptotic optimality of $\tau_{\text{W-SGLR}}$ in the following result.

Theorem 3.

Suppose that Assumption 1 holds. For any $b>0$ ,

[TABLE]

where $o(1)$ is a term going to zero as $b\to\infty$ . Furthermore, if Assumption 2 holds, then the stopping time $\tau_{\text{W-SGLR}}(b)$ is asymptotically optimal for the problem (2) as $b\to\infty$ .

Proof:

See Appendix E. ∎

In Theorem 3, we have shown that $\tau_{\text{W-SGLR}}$ is asymptotically optimal under Assumption 1 and Assumption 2. In the next lemma, we derive sufficient conditions for Assumption 2 when $f,f_{n},g,g_{n}$ belong to an exponential family.

Lemma 3.

Suppose that $f$, $f_{n}$, $g$, $g_{n}\in\{\phi:\ \phi(x)=h(x)\exp\left(\sum_{i=1}^{s}B_{i}(\theta)T_{i}(x)-A(\theta)\right)\}$, an exponential family of distributions on $\mathbb{R}^{N}$ with parameters $\theta=\theta_{f},\theta_{f_{n}},\theta_{g},\theta_{g_{n}}$, respectively. Here, $T_{i}:\mathbb{R}^{N}\to\mathbb{R}$ and $A,B_{i}:\mathbb{R}^{M}\to\mathbb{R}$ for $i=1,\ldots,s$. If any of the following inequalities hold:

[TABLE]

then Assumption 2 holds.

In particular, if $f=\mathcal{N}(\mu_{0},\sigma_{0}^{2}),f_{n}=\mathcal{N}(\mu_{1},\sigma_{0}^{2}),g=\mathcal{N}(\mu_{0},\sigma_{1}^{2})$ , and $g_{n}=\mathcal{N}(\mu_{1},\sigma_{1}^{2})$ with $\mu_{0},\mu_{1}\in\mathbb{R}$ , $\mu_{0}\neq\mu_{1}$ , $\sigma_{0},\sigma_{1}\in\mathbb{R}_{>0}$ , and $\sigma_{0}\neq\sigma_{1}$ , Assumption 2 holds.

Proof:

To show that (34) implies Assumption 2, we rearrange the terms on the left-hand side (L.H.S.) of (34) to obtain

[TABLE]

This implies that

[TABLE]

and hence Assumption 2 holds. A similar argument shows that (35) and (36) imply ${D({g_{n}}\,||\,{f})}<{D({g}\,||\,{f_{n}})}$ and ${D({g_{n}}\,||\,{f_{n}})}<{D({g}\,||\,{f_{n}})}$, respectively.

If $f=\mathcal{N}(\mu_{0},\sigma_{0}^{2}),\ f_{n}=\mathcal{N}(\mu_{1},\sigma_{0}^{2}),\ g=\mathcal{N}(\mu_{0},\sigma_{1}^{2})$, and $g_{n}=\mathcal{N}(\mu_{1},\sigma_{1}^{2})$ with $\mu_{0},\mu_{1}\in\mathbb{R}$, $\mu_{0}\neq\mu_{1}$, $\sigma_{0},\sigma_{1}\in\mathbb{R}_{>0}$, and $\sigma_{0}\neq\sigma_{1}$, we can define $\theta_{f}=[\mu_{0},\sigma_{0}^{2}],\ \theta_{f_{n}}=[\mu_{1},\sigma_{0}^{2}],\ \theta_{g}=[\mu_{0},\sigma_{1}^{2}],\ \theta_{g_{n}}=[\mu_{1},\sigma_{1}^{2}]$, with the functions $B_{1}(\mu,\sigma^{2})=\mu/\sigma^{2}$, $B_{2}(\mu,\sigma^{2})=\tfrac{-1}{2\sigma^{2}}$, $T_{1}(X)=X$, $T_{2}(X)=X^{2}$ and $A(\mu,\sigma^{2})=\tfrac{\mu^{2}}{2\sigma^{2}}+\log\sigma$. The L.H.S. of (34) becomes $\tfrac{\mu_{1}^{2}-\mu_{0}^{2}}{2\sigma_{0}^{2}}-\tfrac{\mu_{1}-\mu_{0}}{\sigma_{0}^{2}}\,\mu_{0}$, which simplifies to $\tfrac{(\mu_{1}-\mu_{0})^{2}}{2\sigma^{2}_{0}}$. Thus, for any $\mu_{1}\neq\mu_{0}$ and $\sigma_{0},\sigma_{1}\in\mathbb{R}_{>0}$, inequality (34) holds. The proof is now complete. ∎
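The Gaussian case can also be checked numerically. The sketch below evaluates the margin $D(g\,||\,f_n)-\min\{D(g\,||\,f),D(g_n\,||\,f),D(g_n\,||\,f_n)\}$, i.e., the inequality we take to be Assumption 2, using the closed-form Gaussian KL divergence for a few hypothetical parameter choices, and compares it with $\tfrac{(\mu_1-\mu_0)^2}{2\sigma_0^2}$. In this example $D(g_n\,||\,f)=D(g\,||\,f_n)$ and $D(g_n\,||\,f_n)=D(g\,||\,f)$, which follow directly from the Gaussian KL formula, so the minimum is $D(g\,||\,f)$. The function names are ours.

```python
import math


def kl_gauss(mu_p, sd_p, mu_q, sd_q):
    """KL divergence D(N(mu_p, sd_p^2) || N(mu_q, sd_q^2)) in nats."""
    return math.log(sd_q / sd_p) + (sd_p**2 + (mu_p - mu_q) ** 2) / (2 * sd_q**2) - 0.5


def assumption2_margin(mu0, mu1, s0, s1):
    """D(g||f_n) - min{D(g||f), D(g_n||f), D(g_n||f_n)} for the Gaussian example of Lemma 3."""
    f, f_n, g, g_n = (mu0, s0), (mu1, s0), (mu0, s1), (mu1, s1)
    others = (kl_gauss(*g, *f), kl_gauss(*g_n, *f), kl_gauss(*g_n, *f_n))
    return kl_gauss(*g, *f_n) - min(others)


for mu0, mu1, s0, s1 in [(0.0, 1.0, 1.0, 2.0), (2.0, -1.0, 0.5, 3.0), (0.3, 0.4, 2.0, 0.1)]:
    # The margin should equal (mu1 - mu0)^2 / (2 s0^2) > 0, so the inequality holds.
    print(assumption2_margin(mu0, mu1, s0, s1), (mu1 - mu0) ** 2 / (2 * s0**2))
```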

IV Parametrized Families of Post-Change Distributions

In many applications, the post-change distribution $g$ and nuisance post-change distribution $g_{n}$ may contain unknown parameters. In this section, we modify $\tau$ and $\tau_{n}$ in 25 to obtain a Generalized Likelihood Ratio Test (GLRT)-based stopping time $\widehat{\tau}_{\text{W-SGLR}}$ for the following signal model: Let $\Theta\subseteq\mathbb{R}^{d}$ be a set with non-empty interior and $X_{1},X_{2},\ldots$ be a sequence of independent random variables satisfying: $X_{t}\sim h_{\nu_{n},\nu_{c},\theta,\theta_{n},t}$ where

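$$h_{\nu_{n},\nu_{c},\theta,\theta_{n},t}=\begin{cases}f & \text{if } t<\min\{\nu_{c},\nu_{n}\},\\ f_{n} & \text{if } \nu_{n}\leq t<\nu_{c},\\ g(\cdot;\theta) & \text{if } \nu_{c}\leq t<\nu_{n},\\ g_{n}(\cdot;\theta_{n}) & \text{if } \max\{\nu_{c},\nu_{n}\}\leq t,\end{cases}$$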

and $\theta,\theta_{n}\in\text{Int}(\Theta)$ , the interior of $\Theta$ . We derive a lower bound for the ARL of $\widehat{\tau}_{\text{W-SGLR}}$ under the following assumption.

Assumption 3.

$\Theta$ is a compact $d$-dimensional sub-manifold of $\mathbb{R}^{d}$. The pdfs of the post-change distribution $g(\cdot;\theta)$ and the nuisance post-change distribution $g_{n}(\cdot;\theta)$ are twice continuously differentiable w.r.t. $\theta$.

A commonly used method to handle unknown parameters is to replace the likelihood ratio $\Lambda(k,t)$ with the generalized likelihood ratio. We define the generalized W-SGLR test statistic $\widehat{S}_{\text{W-SGLR}}$ as

[TABLE]

where the minimal delay $m_{b}^{\prime}$ is required to avoid an under-determined maximum likelihood estimate of the parameter $\theta$. While (40) is commonly used, the maximization over $\Theta$ makes it difficult to quantify the ARL of the stopping time $\inf\{t:\ \widehat{S}_{\text{W-SGLR}}(t)\geq b\}$ theoretically. To work around this problem, we modify the stopping times $\tau$ and $\tau_{n}$ as follows. Let $\lambda_{\max}(A)$ denote the largest eigenvalue of a symmetric matrix $A$. Fix $m_{b}^{\prime}\geq 0$. We let

[TABLE]

We define the generalized W-SGLR stopping time as

[TABLE]

Note that $\widehat{\tau}_{\text{W-SGLR}}$ is a modification of $\tau_{\text{W-SGLR}}$ with additional conditions required for stopping.
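To make the roles of the window size and the minimal delay concrete, here is a minimal Python sketch of a generic window-limited GLR rule for a simpler, single-change problem. It assumes $f=\mathcal{N}(0,1)$, $g(\cdot;\theta)=\mathcal{N}(\theta,1)$ and a compact interval $\Theta=[\texttt{theta\_lo},\texttt{theta\_hi}]$, and it ignores the nuisance change entirely, so it is not the W-SGLR statistic. The parameter `m_b_min` stands in for $m_{b}^{\prime}$, and all function and parameter names are hypothetical. For a Gaussian mean, the maximizing $\theta$ over a window is the sample mean clipped to $\Theta$. With prefix sums, each window then costs $O(1)$ and each time step costs $O(m_{b})$.

```python
import numpy as np

def window_glr_stat(csum, t, m_b, m_b_min, theta_lo, theta_hi):
    """Window-limited GLR statistic at 0-based time t.

    csum[i] holds x[0] + ... + x[i-1], so each window sum costs O(1).
    Windows end at t and contain n samples with m_b_min <= n <= m_b.
    """
    best = -np.inf
    for n in range(max(1, m_b_min), min(m_b, t + 1) + 1):
        xbar = (csum[t + 1] - csum[t + 1 - n]) / n
        # MLE of theta over a compact interval: the sample mean clipped to Theta.
        theta_hat = min(max(xbar, theta_lo), theta_hi)
        # sum_i log[N(x_i; theta, 1) / N(x_i; 0, 1)] = n * (theta * xbar - theta^2 / 2)
        best = max(best, n * (theta_hat * xbar - 0.5 * theta_hat ** 2))
    return best

def window_glr_stopping_time(x, b, m_b, m_b_min=1, theta_lo=0.5, theta_hi=3.0):
    """First 0-based index t at which the statistic reaches b, or None."""
    x = np.asarray(x, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    for t in range(len(x)):
        if window_glr_stat(csum, t, m_b, m_b_min, theta_lo, theta_hi) >= b:
            return t
    return None

# Deterministic example: a mean shift from 0 to 2 at 0-based index 50.
x = np.concatenate((np.zeros(50), np.full(30, 2.0)))
print(window_glr_stopping_time(x, b=5.0, m_b=20))  # prints 52
```

Raising `m_b_min` discards the shortest windows. This delays the alarm slightly in the example above, mirroring the cost of the minimal delay.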

Window-limited generalized detection rules were first introduced in [49]. We compute the false alarm probabilities of $\widehat{\eta}_{l}$ and $\widehat{\eta}_{n,l}$, and then use them to obtain a lower bound for the ARL of $\widehat{\tau}$ and $\widehat{\tau}_{n}$ in Proposition 4.

Lemma 4.

Suppose that Assumption 3 holds. Given any $0<\delta<1$ , there exists $b_{\delta}>0$ such that $\mathbb{P}_{\nu_{n},\infty}\left({\widehat{\eta}_{k}<\infty}\right)\leq\exp\left(-(1-\delta)b\right)$ and $\mathbb{P}_{\nu_{n},\infty}\left({\widehat{\eta}_{n,l}<\infty}\right)\leq\exp\left(-(1-\delta)b\right)$ for any $\nu_{n}\in\mathbb{N}\cup\{\infty\}$ and $b\geq b_{\delta}$ .

Proof:

As the proof is similar to that of Lemma 2 in [20], we omit it here and refer the reader to the extended version in [54]. ∎

Proposition 4.

Suppose that Assumption 3 holds. For any $0<\delta<1$ , there exists $b_{\delta}>0$ such that for all $b\geq b_{\delta}$ and $\nu_{n}\in\mathbb{N}\cup\{\infty\}$ , we have

[TABLE]

Proof:

We see that our stopping time $\widehat{\tau}_{\text{W-SGLR}}$ is effective in detecting critical changes while ignoring the nuisance change in the pre-change regime for window sizes as small as $m_{b}=16$. In practice, plots such as Fig. 6(a) and Fig. 6(b) can be used to check whether the increase in the test statistic after the critical change is discernible from its level in the pre-change regime, which helps determine whether the chosen window size is suitable.

Next, we compare the generalized W-SGLR stopping time with the W-SGLR stopping time. In our simulations, the signal is generated using the distributions $f=\mathcal{N}(0,1)$, $f_{n}=\mathcal{N}(0,2)$, $g=\mathcal{N}(\theta,1)$, $g_{n}=\mathcal{N}(\theta_{n},2)$. Here we set $\theta=\theta_{n}=2$ and assume that the condition $\widehat{\theta}\in\text{Int}(\Theta)$ is always satisfied. We generate signals of length $2^{16}=65,536$ and, for each signal, independently select the nuisance change point and the critical change point uniformly at random from the $2^{16}$ possible sample times. A total of $2^{12}=4096$ signals are generated. In Fig. 7, we compare the trade-off between the ADD and the ARL of the proposed W-SGLR stopping time when $\theta$ and $\theta_{n}$ are known against that of the generalized W-SGLR stopping time when $\theta$ and $\theta_{n}$ are unknown. We observe that the generalized W-SGLR stopping time has a higher ADD than the W-SGLR stopping time. Our experiments suggest that this difference in ADD remains bounded as the ARL becomes large.
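The simulated signal model can be generated as in the minimal NumPy sketch below. It encodes the piecewise model with $f=\mathcal{N}(0,1)$, $f_{n}=\mathcal{N}(0,2)$, $g=\mathcal{N}(\theta,1)$ and $g_{n}=\mathcal{N}(\theta_{n},2)$, reading the second argument of $\mathcal{N}(\cdot,\cdot)$ as a variance. It draws $\nu_{n}$ and $\nu_{c}$ independently and uniformly from $\{1,\ldots,T\}$. The function names are hypothetical. In this setup, the critical change shifts the mean and the nuisance change doubles the variance.

```python
import numpy as np

def segment_params(T, nu_n, nu_c, theta=2.0, theta_n=2.0):
    """Per-sample mean and variance for t = 1..T (1-based change points)."""
    t = np.arange(1, T + 1)
    nuisance = t >= nu_n  # the nuisance change has occurred
    critical = t >= nu_c  # the critical change has occurred
    # f and f_n have mean 0; g has mean theta; g_n has mean theta_n.
    mean = np.where(critical, np.where(nuisance, theta_n, theta), 0.0)
    # f and g have variance 1; f_n and g_n have variance 2.
    var = np.where(nuisance, 2.0, 1.0)
    return mean, var

def simulate_signal(T=2**16, theta=2.0, theta_n=2.0, rng=None):
    """Draw nu_n, nu_c uniformly on {1, ..., T} and sample one signal."""
    rng = np.random.default_rng() if rng is None else rng
    nu_n, nu_c = (int(v) for v in rng.integers(1, T + 1, size=2))
    mean, var = segment_params(T, nu_n, nu_c, theta, theta_n)
    x = mean + np.sqrt(var) * rng.standard_normal(T)
    return x, nu_n, nu_c

# One signal of length 2^16 with a fixed (hypothetical) seed.
x, nu_n, nu_c = simulate_signal(rng=np.random.default_rng(0))
```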

V-D Real Data

In this subsection, we test our proposed stopping time $\tau_{\text{W-SGLR}}$ on the Case Western Reserve University Bearing Dataset [30]. The dataset was collected from experiments on an electric motor, with accelerometer data measured at locations near to and remote from the motor bearings. Samples were collected at 12 kHz. We pre-process the signal by de-trending it using a first-order finite difference: for each sample time $t$, let $X_{t}=Y_{t}-Y_{t-1}$, where $Y_{t}$ is the observed raw signal sample at time $t$.
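The first-order differencing step is a one-liner in NumPy. The sketch below drops the first raw sample, which has no predecessor. How the original processing handled that boundary is not stated, so this is an assumption, as is the function name.

```python
import numpy as np

def detrend_first_difference(y):
    """Return X_t = Y_t - Y_{t-1} for t = 2..len(y); one sample shorter than y."""
    return np.diff(np.asarray(y, dtype=float))

# A linear trend becomes a constant after differencing.
y = 5.0 + 2.0 * np.arange(10)
x = detrend_first_difference(y)
```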

We consider signals $X_{t}$ obtained at motor loads of 1 hp and 2 hp, with normal bearings and with faulty bearings having a 0.007-inch fault diameter. We assume that the critical change is the transition from a normal to a faulty bearing, and that a nuisance change is a change in the motor load. We use the first 12,000 samples as training data to build a model for each of the following scenarios: normal bearings under a motor load of 1 hp, normal bearings under a motor load of 2 hp, faulty bearings under a motor load of 1 hp, and faulty bearings under a motor load of 2 hp. Fig. 8 shows the learned distributions of the de-trended signals observed in each scenario.

There are two challenges in testing our proposed stopping time on real data: (i) there are no theoretical results on the ARL of the 2-stage stopping times to guide the selection of thresholds for a fair comparison, and (ii) real run-to-failure data is difficult to obtain. To address these challenges, we divide the remaining samples into 3 disjoint sets.

For the first set, we create a training set of 1000 signals, each of length 36,000, with a randomly selected nuisance change point $\nu_{n}$ for each signal, such that there is a period of $\nu_{n}-1$ samples for a normal bearing under a motor load of 1 hp, followed by a period of $36,000-\nu_{n}+1$ samples for a normal bearing under a motor load of 2 hp. We select the threshold of each stopping time so that the empirical ARL varies between 1200 and 18,000.

The next two sets are testing sets. We create 1000 signals of length 3600, each with (i) a period of 1200 samples for a normal bearing under a motor load of 1 hp, which transitions to (ii) a period of 1200 samples for a normal bearing under a motor load of 2 hp, which finally transitions to (iii) a period of 1200 samples for a faulty bearing under a motor load of 2 hp.

Similarly, we create 1000 signals for the scenario where a normal bearing under a motor load of 1 hp transitions to a faulty bearing under a motor load of 1 hp, and finally to a faulty bearing under a motor load of 2 hp.
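One way to assemble such piecewise signals is sketched below in Python. The text does not specify how excerpts are drawn from each recording, so the sketch assumes that each segment is a contiguous excerpt taken at a random offset from the corresponding held-out recording. The helper names are hypothetical. `change_points` returns the 1-based index at which each new segment starts. For three 1200-sample periods this gives 1201 and 2401, and for the training-set layout with lengths $[\nu_{n}-1,\ 36,000-\nu_{n}+1]$ it returns $\nu_{n}$.

```python
import numpy as np

def splice_segments(recordings, lengths, rng):
    """Concatenate a contiguous excerpt of lengths[j] samples from recordings[j]."""
    pieces = []
    for rec, n in zip(recordings, lengths):
        rec = np.asarray(rec, dtype=float)
        if n > len(rec):
            raise ValueError("segment longer than recording")
        # Random offset into the recording (an assumption; not stated in the text).
        start = int(rng.integers(0, len(rec) - n + 1))
        pieces.append(rec[start:start + n])
    return np.concatenate(pieces)

def change_points(lengths):
    """1-based start index of every segment after the first."""
    return [int(c) + 1 for c in np.cumsum(lengths)[:-1]]

# Stand-ins for held-out recordings (placeholders, not the bearing data).
rng = np.random.default_rng(0)
normal_1hp, normal_2hp, faulty_2hp = (rng.standard_normal(5000) for _ in range(3))
test_signal = splice_segments([normal_1hp, normal_2hp, faulty_2hp], [1200, 1200, 1200], rng)
```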

Finally, we apply the thresholds selected on the training set to the two testing sets to compute the empirical ADD of each stopping time. The window size of $m_{b}=8$ for the FMA stopping time is selected to minimize its empirical ADD on the test set. For this dataset, if $m_{b}$ is chosen to be $128$ or $1024$, the empirical ADD of the FMA stopping time becomes much larger than the empirical ADD of the W-SGLR and 2-stage stopping times. Thus, we only present the ARL-ADD trade-off for $m_{b}=8$.

In Figs. 9(a) and 9(b), we present some examples of the performance of the W-SGLR test statistic. It can be seen that in both cases, the test-statistic remains low before the bearing failure and quickly rises after the bearing fails even as the motor load changes.

In Fig. 10(a) and Fig. 10(b), we present the trade-off between the empirical ADD and ARL for the proposed W-SGLR stopping time with $m_{b}=1024$, the 2-stage stopping times with different thresholds $b_{n}$, and the FMA stopping time. Our proposed stopping time $\tau_{\text{W-SGLR}}$ achieves a better ADD-ARL trade-off than the other stopping times. However, as the KL divergences ${D({g_{n}}\,||\,{f_{n}})},{D({g}\,||\,{f_{n}})},{D({g_{n}}\,||\,{f})},{D({g}\,||\,{f})}$ are large, the empirical ADD of all the algorithms remains low, and the reduction in empirical ADD is small, between $1$ and $4$ samples, over the range of ARLs tested. In terms of computational complexity, up to sample $t$, the W-SGLR stopping time requires $O(m_{b}t)$ operations [43], which is more than both the 2-stage stopping time and the FMA stopping time, each of which requires $O(t)$ operations. Thus, for applications that have limited computational resources and large differences between their pre- and post-change distributions, the FMA or the 2-stage stopping time may be preferable, as the degradation in performance is small.

VI Discussions and Conclusions

We have studied the non-Bayesian QCD problem where the signal may be subjected to a nuisance change. We proposed the W-SGLR stopping time that quickly detects the critical change while ignoring the nuisance change. The limited window size ensures that the W-SGLR stopping time does not require increasing computational resources as more samples are observed. We also derived the stopping time’s asymptotic behavior and showed that it is asymptotically optimal under mild technical assumptions. A generalized W-SGLR stopping time is also proposed for the case where the critical and nuisance post-change distributions are unknown but belong to a parametrized family. Numerical simulations and experiments on a real dataset demonstrated that the W-SGLR stopping time achieves better ADD-ARL trade-off than various other competing stopping times.

In this paper, we have assumed that if both the critical and nuisance changes occur, the eventual distribution that generates the signal is the same regardless of which change comes first. A more general model would allow the eventual distribution to depend on the order of the change points. A straightforward generalization of the W-SGLR stopping time would be to include all the different eventual distributions in the numerator of (13). The asymptotic trade-off between the WADD and ARL can be derived using techniques similar to those in Section III-A. However, deriving the conditions for asymptotic optimality of this stopping time is more complicated and is a possible direction for future research.

Another possible future research direction is to consider a modification of the W-SGLR stopping time for the TCD problem under the possibility of a nuisance change. As the performance metrics of the TCD problem are different from those of the QCD problem, the asymptotic trade-off between the worst-case false alarms and missed detection within a specified window needs to be studied. Also, as the FMA performs well in the TCD problem, it would be interesting to investigate whether the FMA stopping time can be adapted to solve our QCD problem.

Appendix A Proof of Propositions 1 and 2

We begin with some notation. Let $L_{i}=\log\tfrac{f_{n}(X_{i})}{f(X_{i})}$. For any $N\geq 0$, let $L_{i,>N}=L_{i}\mathbf{1}_{\{|L_{i}|>N\}}$ and $L_{i,\leq N}=L_{i}\mathbf{1}_{\{|L_{i}|\leq N\}}$. For any $k,t\in\mathbb{N}$ such that $k\leq t$, we define the following averages:

[TABLE]

We have

[TABLE]
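The truncation split and the window averages can be sketched as follows in Python. The display defining the averages is not reproduced here, so the sketch assumes they are the sample means $(t-k+1)^{-1}\sum_{i=k}^{t}$ of the corresponding quantities. It also follows the convention, stated in the text, that an average over an empty range ($k>t$) is zero. The function names are hypothetical. Because the two truncated parts sum to $L_{i}$, the averages decompose in the same way.

```python
import numpy as np

def truncate(L, N):
    """Split L_i into L_{i,>N} = L_i 1{|L_i| > N} and L_{i,<=N} = L_i 1{|L_i| <= N}."""
    L = np.asarray(L, dtype=float)
    big = np.abs(L) > N
    return np.where(big, L, 0.0), np.where(big, 0.0, L)

def window_average(values, k, t):
    """Average of values_k, ..., values_t (1-based, inclusive); 0 when k > t."""
    if k > t:
        return 0.0
    return float(np.mean(np.asarray(values, dtype=float)[k - 1:t]))

# Hypothetical log-likelihood ratios and truncation level.
L = np.array([-3.0, 0.5, 2.0, -1.0, 4.0])
L_big, L_small = truncate(L, N=1.5)
```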

For the case where $k>t$ , we let $\overline{L^{k:t}}=\overline{L^{k:t}_{>N}}=\overline{L^{k:t}_{\leq N}}=0$ . Finally, we define the random variable

Next, we derive an upper bound for $\mathbb{E}_{\infty,\nu_{c}}\left[{\left|\overline{L^{V_{k,t}:t}_{>N_{g}}}\right|}\right]$ . For any $v\leq t$ , we have

[TABLE]

[TABLE]

From Theorem 2, there exists $N_{1}$ such that for all $n\geq N_{1}$ , we have

[TABLE]

From Markov’s inequality and Assumption 1, there exists $N_{2}$ such that for all $1\leq n<N_{1}$ and $m\geq N_{2}$ , we have

[TABLE]

Next, we show that for any $t\geq T=2N_{1}+N_{2}$, both (81) and (82) are bounded by $\epsilon/2$. Since $t\geq 2N_{1}$, the quantities $\nu_{n}-k$ and $t-(\nu_{n}-k)$ cannot both be smaller than $N_{1}$, so there are three possible cases:

1. $\nu_{n}-k\geq N_{1}$ and $t-(\nu_{n}-k)\geq N_{1}$,
2. $\nu_{n}-k<N_{1}$ and $t-(\nu_{n}-k)\geq N_{1}$,
3. $\nu_{n}-k\geq N_{1}$ and $t-(\nu_{n}-k)<N_{1}$.

Applying (83) and (84) in the first case, (84) and (85) in the second case, and (83) and (86) in the third case to (81) and (82), respectively, we obtain $\sup_{A_{2}}\mathbb{P}_{\nu_{n},\nu_{c}}\left({\tfrac{1}{t}\log\Lambda(k,k+t-1)-I\leq-\delta}\right)\leq\epsilon.$ The proof of ii is similar, and the proof is now complete.

Appendix D Proof of Proposition 3

From 18, there exists $\gamma>0$ such that $m_{b}\geq(1+\gamma)b/I$ for all $b$ sufficiently large. For any $0<\epsilon<\gamma/(1+\gamma)$ , let $n_{b}=\left\lceil{\tfrac{b}{(1-\epsilon)I}}\right\rceil$ and $\delta=\epsilon I$ . There exists $b_{1}>0$ such that $n_{b}(I-\delta)\geq b$ for all $b\geq b_{1}$ . From Lemma 2, by choosing $b_{1}$ sufficiently large, we have for all $b\geq b_{1}$ ,

[TABLE]

Let $b_{2}\geq b_{1}$ be such that $I/b_{2}\leq 1+\gamma-(1-\epsilon)^{-1}$ . Then, for $b\geq b_{2}$ , we have

[TABLE]

For any $k,\nu_{n},\nu_{c}\geq 1$ , we then have

[TABLE]

where the last equality follows from independence and the last inequality from 87. Therefore, for any $b\geq b_{2}$ , we have

[TABLE]

which yields i. The proof for ii is similar and the proposition is proved.
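The choices of $n_{b}$ and $b_{2}$ in this proof can be checked numerically. The Python sketch below uses hypothetical values of $I$, $\gamma$ and $\epsilon$ with $\epsilon<\gamma/(1+\gamma)$, as required. For several thresholds $b\geq b_{2}$, it verifies that $n_{b}(I-\delta)\geq b$ and $n_{b}\leq(1+\gamma)b/I$, so that $n_{b}$ does not exceed the window size $m_{b}$ whenever $m_{b}\geq(1+\gamma)b/I$. The second inequality follows from $n_{b}\leq b/((1-\epsilon)I)+1$ together with the condition $I/b\leq 1+\gamma-(1-\epsilon)^{-1}$. The function name is hypothetical.

```python
import math

def proof_quantities(b, I, gamma, eps):
    """n_b and the two inequalities used in the proof of Proposition 3."""
    n_b = math.ceil(b / ((1.0 - eps) * I))
    delta = eps * I
    growth_ok = n_b * (I - delta) >= b        # n_b (I - delta) >= b
    window_ok = n_b <= (1.0 + gamma) * b / I  # n_b <= (1 + gamma) b / I
    return n_b, growth_ok, window_ok

I, gamma, eps = 0.5, 0.25, 0.1  # hypothetical values
assert eps < gamma / (1.0 + gamma)
# Smallest b satisfying I / b <= 1 + gamma - 1 / (1 - eps).
b2 = I / (1.0 + gamma - 1.0 / (1.0 - eps))

for b in (5.0, 10.0, 50.0, 100.0, 1000.0):
    n_b, growth_ok, window_ok = proof_quantities(b, I, gamma, eps)
    print(b, n_b, growth_ok, window_ok)
```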

Appendix E Proof of Theorem 3

From Lemma 1, taking the infimum on both sides of (31), we obtain $\text{ARL}(\tau_{\text{W-SGLR}}(b))=\inf_{\nu_{n}\in\mathbb{N}\cup\{\infty\}}\mathbb{E}_{\nu_{n},\infty}\left[{\tau_{\text{W-SGLR}}(b)}\right]\geq\tfrac{1}{2}e^{b}.$ Since $\tau_{\text{W-SGLR}}\leq\widetilde{\tau}$ and $\tau_{\text{W-SGLR}}\leq\widetilde{\tau}_{n}$, by Proposition 3, we have $\text{WADD}(\tau_{\text{W-SGLR}}(b))\leq(I^{-1}+o(1))b$ as $b\to\infty$.

To see that $\tau_{\text{W-SGLR}}(b)$ is asymptotically optimal when Assumption 2 is satisfied, let $C_{\gamma}=\{\tau\ :\ \text{ARL}(\tau)\geq\gamma\}$ be the set of stopping times satisfying $\text{ARL}(\tau)\geq\gamma$ . By expanding $\text{WADD}(\tau)$ using 3, we obtain

[TABLE]

where (88) is due to the min-max inequality [55]. For each of the cases $\nu_{n}\in\{0,\nu_{c},\infty\}$, by Theorem 6.17 in [29], we have

[TABLE]

Since Assumption 2 is satisfied, we have

[TABLE]

Therefore, from 89 and 90, we obtain

[TABLE]

and the proof is now complete.
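The bound $\text{ARL}(\tau_{\text{W-SGLR}}(b))\geq\tfrac{1}{2}e^{b}$ from the start of this proof gives a simple way to pick a threshold. Choosing $b=\log(2\gamma)$ guarantees $\text{ARL}\geq\gamma$, and the WADD bound then grows like $\log(2\gamma)/I$ to first order. The Python sketch below computes both quantities. The value of $I$, the target ARL and the function names are hypothetical, and the $o(1)$ term in the WADD bound is ignored, so the second number is only a first-order approximation.

```python
import math

def threshold_for_arl(target_arl):
    """Smallest b with exp(b) / 2 >= target_arl, i.e. b = log(2 * target_arl)."""
    return math.log(2.0 * target_arl)

def first_order_wadd(b, I):
    """Leading term b / I of the bound WADD <= (1/I + o(1)) b."""
    return b / I

gamma_target, I = 1e4, 0.5  # hypothetical target ARL and rate I
b = threshold_for_arl(gamma_target)
print(b, first_order_wadd(b, I))
```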

Appendix F Proof of Lemma 4

We use techniques similar to those in [20] to prove Lemma 4. To analyze the probability $\mathbb{P}_{\nu_{n},\infty}\left({\widehat{\eta}_{k}<\infty}\right)$, we use a change-of-measure argument. For any $0<\delta<1$, choose $b_{\delta}\geq 0$ so that for any $b\geq b_{\delta}$

[TABLE]

where $|\Theta|$ is the volume or Lebesgue measure of $\Theta\subset\mathbb{R}^{d}$ and $\Gamma(\cdot)$ is the gamma function. From Kolmogorov’s Consistency Theorem, there is a probability measure $G_{\theta}$ for the stochastic process $(X_{i})_{i\geq k}$ under which the pdf of each $X_{i}$ is $g(\cdot;\theta)$ . Define a measure $H(\cdot)=\int_{\Theta}G_{\theta}(\cdot)\ \mathrm{d}\theta$ . Since $\Theta$ is compact in $\mathbb{R}^{d}$ , the measure $H$ is finite. For each $t\geq k$ , the Radon-Nikodym derivative of the law of $(X_{k},X_{k+1},\ldots,X_{t})$ under $H$ w.r.t. $\mathbb{P}_{\nu_{n},\infty}$ is

[TABLE]

which follows from Fubini’s Theorem. By Wald’s likelihood ratio identity,

[TABLE]

Suppose $\widehat{\eta}_{k}=t$. Since $\widehat{\theta}=\operatorname*{arg\,max}_{\theta}\log\widehat{\Lambda}(k,t,\theta)\in\text{Int}(\Theta)$, the gradient of $\log\widehat{\Lambda}(k,t,\cdot)$ vanishes at $\widehat{\theta}$, and by Taylor's theorem there exists $\theta^{*}\in\Theta$ such that

[TABLE]
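The Taylor step is not displayed in this version. A standard form of it is sketched below, together with the bound it yields on the ball $\|\theta-\widehat{\theta}\|<1/\sqrt{b}$. The sketch assumes that $\theta^{*}$ lies on the segment joining $\theta$ and $\widehat{\theta}$, that the gradient vanishes at the interior maximizer $\widehat{\theta}$, and that the Hessian condition $\sup_{\|\theta-\widehat{\theta}\|<1/\sqrt{b}}\lambda_{\max}\left(-\nabla^{2}\log\widehat{\Lambda}(k,t,\theta)\right)\leq b$ holds.

```latex
\begin{aligned}
\log\widehat{\Lambda}(k,t,\theta)
&= \log\widehat{\Lambda}(k,t,\widehat{\theta})
 + \tfrac{1}{2}(\theta-\widehat{\theta})^{\top}\,\nabla^{2}\log\widehat{\Lambda}(k,t,\theta^{*})\,(\theta-\widehat{\theta}) \\
&\geq \log\widehat{\Lambda}(k,t,\widehat{\theta})
 - \tfrac{1}{2}\,\lambda_{\max}\!\left(-\nabla^{2}\log\widehat{\Lambda}(k,t,\theta^{*})\right)\|\theta-\widehat{\theta}\|^{2} \\
&\geq \log\widehat{\Lambda}(k,t,\widehat{\theta}) - \tfrac{1}{2}\, b\cdot\tfrac{1}{b}
 = \log\widehat{\Lambda}(k,t,\widehat{\theta}) - \tfrac{1}{2}.
\end{aligned}
```

The second line uses $x^{\top}Hx\geq-\lambda_{\max}(-H)\|x\|^{2}$ for a symmetric matrix $H$. The third line uses the fact that $\theta^{*}$ stays inside the (convex) ball whenever $\theta$ does.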

Thus, for $\|\theta-\widehat{\theta}\|<1/\sqrt{b}$ , we have

[TABLE]

where the last inequality follows from $\sup_{\|\theta-\widehat{\theta}\|<1/\sqrt{b}}\lambda_{\max}\left(-\nabla^{2}\log\widehat{\Lambda}(k,t,\theta)\right)\leq b$ . We obtain

[TABLE]

Therefore, we have

[TABLE]

This yields the upper bound

[TABLE]

Applying this upper bound to (91), we obtain

[TABLE]

for all $b\geq b_{\delta}$ . The proof that $\mathbb{P}_{\nu_{n},\infty}\left({\widehat{\eta}_{n,l}<\infty}\right)\leq\exp\left(-(1-\delta)b\right)$ is similar, and the lemma is proved.
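The gamma function in the choice of $b_{\delta}$ plausibly enters through the volume of the $d$-dimensional Euclidean ball $\{\theta:\|\theta-\widehat{\theta}\|<1/\sqrt{b}\}$ used in the proof. That volume is $\pi^{d/2}r^{d}/\Gamma(d/2+1)$ with $r=1/\sqrt{b}$. Since the display defining $b_{\delta}$ is not shown here, this connection is an assumption. The Python helper below (hypothetical name) evaluates that volume, which scales as $b^{-d/2}$.

```python
import math

def ball_volume(d, r):
    """Lebesgue measure of a d-dimensional Euclidean ball of radius r."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * r ** d

b, d = 100.0, 3  # hypothetical threshold and parameter dimension
vol = ball_volume(d, 1.0 / math.sqrt(b))
```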

Bibliography


[1] W. H. Woodall, D. J. Spitzner, D. C. Montgomery, and S. Gupta, “Using control charts to monitor process and product quality profiles,” J. of Quality Technology, vol. 36, no. 3, p. 309, 2004.
[2] T. L. Lai, “Sequential changepoint detection in quality control and dynamical systems,” J. of the Roy. Statistical Soc., pp. 613–658, 1995.
[3] R. J. Bolton and D. J. Hand, “Statistical fraud detection: A review,” Statistical Sci., pp. 235–249, 2002.
[4] L. Lai, Y. Fan, and H. V. Poor, “Quickest detection in cognitive radio: A sequential change detection framework,” in IEEE Conf. Global Telecommun. IEEE, 2008, pp. 1–5.
[5] K. Sequeira and M. Zaki, “ADMIT: anomaly-based data mining for intrusions,” in Proc. Conf. Knowl. Discovery and Data Mining. ACM, 2002, pp. 386–395.
[6] A. G. Tartakovsky, B. L. Rozovskii, R. B. Blazek, and H. Kim, “A novel approach to detection of intrusions in computer networks via adaptive sequential and batch-sequential change-point detection methods,” IEEE Trans. Signal Process., vol. 54, no. 9, pp. 3372–3382, 2006.
[7] W. Luo, W. P. Tay, and M. Leng, “Infection spreading and source identification: A hide and seek game,” IEEE Trans. Signal Process., vol. 64, no. 16, pp. 4228–4243, Aug. 2016.
[8] H. Sohn, J. A. Czarnecki, and C. R. Farrar, “Structural health monitoring using statistical process control,” J. Structural Eng., vol. 126, no. 11, pp. 1356–1363, 2000.
