Active Hypothesis Testing: Beyond Chernoff-Stein

Dhruva Kartik; Ashutosh Nayyar; Urbashi Mitra

arXiv:1901.06795·cs.IT·January 23, 2019

TL;DR

This paper formulates an active hypothesis testing problem allowing a fixed number of experiments and inconclusive decisions, deriving bounds on misclassification probability and proposing a heuristic strategy for optimal decision-making.

Contribution

It introduces a new active hypothesis testing framework with bounds on misclassification, extending the Chernoff-Stein lemma and proposing a heuristic strategy.

Findings

01

Derived asymptotically tight bounds on misclassification probability.

02

Formulated a generalized Chernoff-Stein lemma for the problem.

03

Proposed a heuristic strategy with analyzed performance.

Abstract

An active hypothesis testing problem is formulated. In this problem, the agent can perform a fixed number of experiments and then decide on one of the hypotheses. The agent is also allowed to declare its experiments inconclusive if needed. The objective is to minimize the probability of making an incorrect inference (misclassification probability) while ensuring that the true hypothesis is declared conclusively with moderately high probability. For this problem, lower and upper bounds on the optimal misclassification probability are derived and these bounds are shown to be asymptotically tight. In the analysis, a sub-problem, which can be viewed as a generalization of the Chernoff-Stein lemma, is formulated and analyzed. A heuristic approach to strategy design is proposed and its relationship with existing heuristic strategies is discussed.

Equations

$$D(p\,\|\,q)=\sum_{y\in\mathcal{Y}}p(y)\log\frac{p(y)}{q(y)}.$$

$$Y_{n}=\xi(H,U_{n},W_{n}).$$

$$p_{h}^{u}(y):=\mathbb{P}[Y_{n}=y\mid H=h,\,U_{n}=u].$$

$$I_{n}=\{U_{1:n-1},Y_{1:n-1}\}.$$

$$U_{n}\sim g_{n}(I_{n}).$$

$$\hat{H}_{N}=f(I_{N+1}).$$

$$\gamma_{N}=\sum_{i\in\mathcal{H}}\phi_{N}(i)(1-\rho_{1}(i)).$$

$$\min_{f\in\mathcal{F},\,g\in\mathcal{G}}\ \gamma_{N}\quad\text{subject to}\quad\psi_{N}(i)\leq\epsilon_{N},\ \forall i\in\mathcal{H}.$$

$$\rho_{n}(i)=\mathbb{P}[H=i\mid U_{1:n-1},Y_{1:n-1}]=\mathbb{P}[H=i\mid I_{n}].$$

$$\mathcal{C}_{i}(\bm{\rho}):=\log\frac{\rho(i)}{1-\rho(i)}.$$

$$J_{N}^{g}(i):=\frac{1}{N}\mathbb{E}_{i}^{g}[\mathcal{C}_{i}(\bm{\rho}_{N+1})-\mathcal{C}_{i}(\bm{\rho}_{1})].$$

$$\lambda_{j}^{i}(u,y):=\log\frac{p_{i}^{u}(y)}{p_{j}^{u}(y)}.$$

$$D(p_{i}^{u}\,\|\,p_{j}^{u})>0.$$

$$D^{*}(i)=\min_{\bm{\beta}\in\Delta\tilde{\mathcal{H}}_{i}}\max_{u\in\mathcal{U}}\sum_{j\neq i}\beta(j)D(p_{i}^{u}\,\|\,p_{j}^{u}),$$

$$\lim_{N\to\infty}\frac{-\log\epsilon_{N}}{N}=0.$$

$$\gamma_{N}^{*}\leq\sum_{i\in\mathcal{H}}(1-\rho_{1}(i))\exp(-N(D^{*}(i)-\delta)).$$

$$\lim_{N\to\infty}-\frac{1}{N}\log\gamma_{N}^{*}=\min_{i\in\mathcal{H}}D^{*}(i).$$

$$\mathcal{C}_{i}(\bm{\rho}_{N+1})-\mathcal{C}_{i}(\bm{\rho}_{1})=-\log\sum_{j\neq i}\exp\Big(\log\tilde{\rho}_{1}(j)+\sum_{n=1}^{N}\lambda_{i}^{j}(U_{n},Y_{n})\Big),$$

$$\mathcal{C}_{i}(\bm{\rho}_{N+1})-\mathcal{C}_{i}(\bm{\rho}_{1})=\log\frac{\rho_{N+1}(i)}{1-\rho_{N+1}(i)}-\log\frac{\rho_{1}(i)}{1-\rho_{1}(i)}$$

$$|\mathcal{C}_{i}(\bm{\rho}_{N+1})-\mathcal{C}_{i}(\bm{\rho}_{1})|<NB,$$

$$\begin{aligned}\mathcal{C}_{i}(\bm{\rho}_{N+1})-\mathcal{C}_{i}(\bm{\rho}_{1})&\leq-\log\sum_{j\neq i}\exp(\log\tilde{\rho}_{1}(j)-NB)\\&=-\log\sum_{j\neq i}\exp(\log\tilde{\rho}_{1}(j))+NB\\&=-\log\sum_{j\neq i}\tilde{\rho}_{1}(j)+NB\end{aligned}$$

Full text

Active Hypothesis Testing: Beyond Chernoff-Stein

Dhruva Kartik, Ashutosh Nayyar and Urbashi Mitra

Ming Hsieh Department of Electrical Engineering

University of Southern California, Los Angeles, CA, USA

Email: {mokhasun, ashutosh.nayyar, ubli}@usc.edu

I Introduction

We frequently encounter scenarios wherein we would like to deduce whether one of several hypotheses is true by gathering data or evidence. This problem is referred to as multi-hypothesis testing. If we have access to multiple candidate experiments or data sources, we can adaptively select more informative experiments to infer the true hypothesis. This leads to a joint control and inference problem commonly referred to as active hypothesis testing. There are numerous ways of formulating this problem and the precise mathematical formulation depends on the target application.

In this paper, we consider a scenario in which there is an agent that can perform a fixed number of experiments. Subsequently, the agent can decide on one of the hypotheses using the collected data. The agent is also allowed to declare the experiments inconclusive if needed. The objective is to minimize the probability of making an incorrect inference (misclassification probability) while ensuring that the true hypothesis is declared conclusively with moderately high probability. This formulation is of particular interest when the agent is time-constrained and the penalty for making an incorrect inference is significantly higher than the penalty for making no decision. In such cases, it is reasonable for the agent to abstain from drawing conclusions unless there is strong evidence supporting one of the hypotheses.

For example, consider a decentralized system in which an agent needs to perform experiments and convey its results to another decision-maker (such as a fusion center). Due to communication constraints, the agent can only communicate its estimate of the hypothesis or remain silent. The agent incurs a heavy penalty for transmitting an incorrect hypothesis. However, the agent is also constrained to transmit the true hypothesis with moderately high probability. Thus, we would like to design an experiment selection strategy and an inference (transmission) strategy for the agent that minimize the misclassification probability while ensuring that the correct estimate is transmitted with sufficiently high probability.

Our contributions in this paper can be summarized as follows. We find lower and upper bounds on the optimal misclassification probabilities in our constrained problem. These bounds are asymptotically tight under some mild assumptions. In our analysis, we formulate a sub-problem and use the results from the sub-problem to solve our original problem. This sub-problem can be viewed as a generalization of the Chernoff-Stein lemma [1] to a setting with multiple hypotheses and multiple experiments. Thereby, we describe an alternate approach to finding a lower bound on the optimal error probability in the Chernoff-Stein lemma. Further, we show that the experiment selection strategy described in [2, 3] in a sequential setting is asymptotically optimal for our fixed horizon problem. We also describe an alternate heuristic approach to strategy design that might improve the performance in the non-asymptotic regime. This approach is based on what we call the expected confidence rate, which arises naturally from our analysis.

The rest of the paper is organized as follows. In Section I-A, we summarize key prior literature on hypothesis testing and discuss how our problem is related to various other formulations. In Section I-B, we describe our notation and in Section II, we formulate our problem. We state the main results in Section III and sketch the proof of our results in Section IV. In Section V, we discuss some heuristic approaches for strategy design. We conclude the paper in Section VI.

I-A Prior Work

Hypothesis testing is a long-standing problem and has been addressed in various settings. The classical formulations have been described in [1],[2],[4]. More recently, active hypothesis testing has been addressed in [3],[5]. The key difference between our formulation and the fixed horizon formulations in [1], [3] is that unlike our agent, the agents in these works are compelled to decide on a hypothesis after performing all the experiments. Our analysis shows that this modification significantly alters the optimal error exponents and the strategy design for inference and experiment selection. Another common formulation is the sequential setting in which the agent can perform experiments until sufficiently strong evidence is gathered [2], [3], [5]. The objective in the sequential setting is to minimize a combination of Bayesian error probability and expected stopping time. Interestingly, the analysis and results in our fixed horizon problem have a strong overlap with those in the sequential setting. As mentioned earlier, a sub-problem in our analysis is a generalization of the Chernoff-Stein lemma [1] and our original problem can be seen as a symmetric version of this lemma. To the best of our knowledge, our formulation has not been considered before. The analysis involved in obtaining the upper bound for our problem borrows from prior works [2],[3]. However, our approach for obtaining lower bounds is different from the approach used in all the aforementioned works.

I-B Notation

Random variables are denoted by upper case letters and their realizations by the corresponding lower case letters. We use calligraphic fonts to denote sets (e.g. $\mathcal{U}$) and $\Delta\mathcal{U}$ is the probability simplex over a finite set $\mathcal{U}$. In general, subscripts denote time indices unless stated otherwise. For time indices $n_{1}\leq n_{2}$, ${{Y}}_{n_{1}:n_{2}}$ is shorthand for the variables $({{Y}}_{n_{1}},{{Y}}_{n_{1}+1},...,{{Y}}_{n_{2}})$. For a strategy $g$, we use ${\mathbb{P}}^{g}[\cdot]$ and ${\mathbb{E}}^{g}[\cdot]$ to indicate that the probability and expectation depend on the choice of $g$. For a hypothesis $i$, ${\mathbb{E}}_{i}^{g}[\cdot]$ denotes the expectation conditioned on hypothesis $i$. The Kullback-Leibler divergence between distributions $p$ and $q$ over a finite space $\mathcal{Y}$ is given by

$$D(p\,\|\,q)=\sum_{y\in\mathcal{Y}}p(y)\log\frac{p(y)}{q(y)}.$$
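As a small illustration of the Kullback-Leibler divergence over a finite space, the following sketch computes $D(p\,\|\,q)$ in nats. The function name `kl_divergence` and the list-based representation of distributions are assumptions made for this example, not notation from the paper.

```python
import math

def kl_divergence(p, q):
    """Return D(p || q) in nats for two distributions over the same finite space."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_y, q_y in zip(p, q):
        if p_y == 0.0:
            continue  # convention: 0 * log(0 / q) = 0
        if q_y == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_y * math.log(p_y / q_y)
    return total
```

The divergence is zero when the two distributions coincide and is not symmetric in its arguments.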

II Problem Formulation

Let $\mathcal{H}=\{1,2,\ldots,M\}$ be a finite set of hypotheses and let the random variable ${{H}}$ denote the true hypothesis. The prior probability on ${{H}}$ is $\bm{\rho}_{1}$ . At each time $n=1,2,\ldots$ , an agent can perform an experiment ${{U}}_{n}\in\mathcal{U}$ and obtain an observation ${{Y}}_{n}\in\mathcal{Y}$ . We assume that the sets $\mathcal{U}$ and $\mathcal{Y}$ are finite. The observation ${{Y}}_{n}$ at time $n$ is given by

$${{Y}}_{n}=\xi({{H}},{{U}}_{n},{{W}}_{n}),$$

where $\{{{W}}_{n}:n=1,2,\dots\}$ is a collection of mutually independent and identically distributed primitive random variables. The probability of observing $y$ after performing an experiment $u$ under hypothesis $h$ is denoted by $p_{h}^{u}(y)$ , that is,

$$p_{h}^{u}(y):={\mathbb{P}}[{{Y}}_{n}=y\mid{{H}}=h,\,{{U}}_{n}=u].$$

The time horizon, that is the total number of experiments performed, is fixed a priori to $N<\infty$ .

At time $n=1,2,\ldots$ , the information available to the agent, denoted by ${{I}}_{n}$ , is the collection of all experiments performed and the corresponding observations up to time $n-1$ , i.e.

$${{I}}_{n}=\{{{U}}_{1:n-1},{{Y}}_{1:n-1}\}.$$

At time $n$ , the agent selects a distribution over the set of actions $\mathcal{U}$ according to an experiment selection rule $g_{n}$ and the action ${{U}}_{n}$ is randomly drawn from this distribution, that is

$${{U}}_{n}\sim g_{n}({{I}}_{n}).$$

The sequence $\{g_{n},n=1,\ldots,N\}$ is denoted by $g$ and referred to as the experiment selection strategy. Let the collection of all such strategies be $\mathcal{G}$ .

After performing $N$ experiments, the agent can declare one of the hypotheses to be true or it can declare that its experiments were inconclusive. We refer to this final declaration as the agent’s inference decision and denote it by $\hat{{{H}}}_{N}$ . The inference decision can take values in $\mathcal{H}\cup\{\varnothing\}$ , where $\varnothing$ denotes the inconclusive declaration. $\hat{{{H}}}_{N}$ is chosen according to an inference strategy $f$ , i.e.

$$\hat{{{H}}}_{N}=f({{I}}_{N+1}).$$

Let the set of all inference strategies be $\mathcal{F}$ .
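The interaction described above (select an experiment from the history, observe, repeat $N$ times, then infer) can be sketched as a simple simulation loop. This is a minimal sketch under stated assumptions: hypotheses, experiments and observations are 0-based integers, `p[h][u][y]` stores $p_{h}^{u}(y)$, and the names `run_episode`, `select_experiment`, `infer` and `P_DEMO` are hypothetical. `None` stands in for the inconclusive declaration $\varnothing$.

```python
import random

def run_episode(true_h, p, select_experiment, infer, horizon, rng):
    """Simulate `horizon` experiments and return (history, decision).

    p[h][u] is the list of observation probabilities p_h^u(y).
    select_experiment(history) returns a distribution over experiments (the rule g_n).
    infer(history) returns a hypothesis index, or None for "inconclusive" (the rule f).
    """
    history = []  # plays the role of I_n: the (u, y) pairs seen so far
    for _ in range(horizon):
        dist = select_experiment(history)
        u = rng.choices(range(len(dist)), weights=dist)[0]
        y = rng.choices(range(len(p[true_h][u])), weights=p[true_h][u])[0]
        history.append((u, y))
    return history, infer(history)

# Toy model (illustrative numbers only): 2 hypotheses, 2 experiments, binary observations.
P_DEMO = [
    [[0.8, 0.2], [0.6, 0.4]],  # hypothesis 0 under experiments 0 and 1
    [[0.3, 0.7], [0.5, 0.5]],  # hypothesis 1 under experiments 0 and 1
]
history, decision = run_episode(
    true_h=0,
    p=P_DEMO,
    select_experiment=lambda hist: [0.5, 0.5],  # ignore the history, pick uniformly
    infer=lambda hist: None,                    # always abstain
    horizon=5,
    rng=random.Random(0),
)
```

Passing the random generator explicitly keeps runs reproducible, which helps when estimating error probabilities by repeated simulation.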

For an experiment selection strategy $g$ and an inference strategy $f$ , we define the following error probabilities.

Definition 1.

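Let $\psi_{N}(i)$ be the probability that the agent does not infer $i$ when the true hypothesis is indeed $i$, i.e.

$$\psi_{N}(i):={\mathbb{P}}^{f,g}[\hat{{{H}}}_{N}\neq i\mid{{H}}=i].$$

Let $\phi_{N}(i)$ be the probability that the agent infers $i$ when the true hypothesis is not $i$, i.e.

$$\phi_{N}(i):={\mathbb{P}}^{f,g}[\hat{{{H}}}_{N}=i\mid{{H}}\neq i].$$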

Remark 1.

Note that when there are only two hypotheses and the agent is forced to decide on one of the two hypotheses, $\psi_{N}(1)$ and $\psi_{N}(2)$ are type I and type II errors, respectively. In this case, $\psi_{N}(1)=\phi_{N}(2)$ and $\psi_{N}(2)=\phi_{N}(1)$ .

In this paper, we will be interested in the event that the agent declares an incorrect hypothesis to be true. That is, we will consider the event $\cup_{i\in\mathcal{H}}\{\hat{{{H}}}_{N}=i,{{H}}\neq i\}$ . We refer to this event as the misclassification event. Let $\gamma_{N}$ be the probability of this event. Using the definitions above, this probability can be written as

$$\gamma_{N}=\sum_{i\in\mathcal{H}}\phi_{N}(i)(1-\rho_{1}(i)).$$

We will consider the problem of designing the experiment selection and inference strategies to minimize $\gamma_{N}$ (the probability of declaring an incorrect hypothesis) while satisfying constraints on the type- $i$ error probabilities. That is, we are interested in the following optimization problem:

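$$\min_{f\in\mathcal{F},\,g\in\mathcal{G}}\ \gamma_{N}\quad\text{subject to}\quad\psi_{N}(i)\leq\epsilon_{N},\ \forall i\in\mathcal{H},\tag{P1}$$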

where $0<\epsilon_{N}<1$ . Let $\gamma^{*}_{N}$ denote the infimum value of this optimization problem. We define $\gamma^{*}_{N}:=\infty$ if the optimization problem is infeasible.

The above formulation is intended for scenarios where the penalty for declaring an incorrect hypothesis to be true is much higher than the penalty for making no decision about the hypothesis. In such cases, it is reasonable for the agent to abstain from drawing conclusions when the evidence is not strong enough. The constraints on type- $i$ error probabilities ensure that the agent does not abstain from drawing conclusions too often. The optimization problem seeks to minimize the probability of declaring an incorrect hypothesis while satisfying the type- $i$ error probability constraints.

III Main Results

In this section, we will describe asymptotically tight lower and upper bounds on the optimal error probability $\gamma_{N}^{*}$ in Problem (P1). We will first define some useful quantities and then state the assumptions we make to prove our results.

The posterior belief $\bm{\rho}_{n}$ on the hypothesis ${{H}}$ based on information ${{I}}_{n}$ is given by

$$\rho_{n}(i)={\mathbb{P}}[{{H}}=i\mid{{U}}_{1:n-1},{{Y}}_{1:n-1}]={\mathbb{P}}[{{H}}=i\mid{{I}}_{n}].$$

Note that given a realization of the experiments and observations until time $n$ , the posterior belief does not depend on the experiment selection strategy $g$ .
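The posterior belief can be computed recursively with Bayes' rule. The sketch below assumes 0-based indices and the hypothetical layout `p[h][u][y]` for $p_{h}^{u}(y)$; the function names are illustrative. Because the update only multiplies likelihoods of the realized pairs $(u,y)$, the result depends on the realization and not on how the experiments were chosen.

```python
def posterior_update(rho, p, u, y):
    """One Bayes step: return rho_{n+1} from rho_n after observing y under experiment u."""
    weights = [rho[h] * p[h][u][y] for h in range(len(rho))]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("observation has zero probability under every hypothesis")
    return [w / total for w in weights]

def posterior(rho1, p, history):
    """Apply the Bayes step along a history of (u, y) pairs, starting from the prior rho1."""
    rho = list(rho1)
    for u, y in history:
        rho = posterior_update(rho, p, u, y)
    return rho
```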

Definition 2 (Bayesian Log-Likelihood Ratio & Expected Confidence Rate).

The Bayesian log-likelihood ratio $\mathcal{C}_{i}(\bm{\rho})$ associated with a hypothesis $i\in\mathcal{H}$ is defined as

$$\mathcal{C}_{i}(\bm{\rho}):=\log\frac{\rho(i)}{1-\rho(i)}.$$

The Bayesian log-likelihood ratio (BLLR) is the logarithm of the ratio of the probability that hypothesis $i$ is true versus the probability that hypothesis $i$ is not true. The BLLR can be interpreted as a confidence level on hypothesis $i$ . For a hypothesis $i$ and a strategy $g\in\mathcal{G}$ , we define the expected confidence rate $J_{N}^{g}(i)$ as

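$$J_{N}^{g}(i):=\frac{1}{N}{\mathbb{E}}_{i}^{g}[\mathcal{C}_{i}(\bm{\rho}_{N+1})-\mathcal{C}_{i}(\bm{\rho}_{1})].\tag{12}$$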
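To make the two quantities in Definition 2 concrete, the sketch below evaluates the BLLR for a given belief and the per-run quantity inside the expectation that defines the expected confidence rate. The names `bllr` and `confidence_rate_sample` are hypothetical, indices are 0-based, and averaging the second function over many simulated runs under hypothesis $i$ is only one possible (Monte Carlo) way to estimate $J_{N}^{g}(i)$.

```python
import math

def bllr(rho, i):
    """Bayesian log-likelihood ratio C_i(rho) = log(rho(i) / (1 - rho(i)))."""
    if not 0.0 < rho[i] < 1.0:
        raise ValueError("C_i is finite only when 0 < rho(i) < 1")
    return math.log(rho[i] / (1.0 - rho[i]))

def confidence_rate_sample(rho_first, rho_last, i, horizon):
    """Single-run value of (C_i(rho_{N+1}) - C_i(rho_1)) / N.

    Averaging this over runs generated under hypothesis i estimates J_N^g(i).
    """
    return (bllr(rho_last, i) - bllr(rho_first, i)) / horizon
```

A belief of one half on hypothesis $i$ gives a BLLR of zero, and the BLLR grows without bound as the belief approaches one.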

Assumption 1 (Full support).

There exists a constant $B>0$ such that $|\lambda_{j}^{i}(u,y)|<B$ for every experiment $u\in\mathcal{U}$ , observation $y\in\mathcal{Y}$ and pair of hypotheses $i,j\in\mathcal{H}$ , where

$$\lambda_{j}^{i}(u,y):=\log\frac{p_{i}^{u}(y)}{p_{j}^{u}(y)}.$$

Assumption 2.

For each experiment $u\in\mathcal{U}$ and any pair of hypotheses $i,j\in\mathcal{H}$ such that $i\neq j$ , we have

$$D(p_{i}^{u}\,\|\,p_{j}^{u})>0.$$

Remark 2.

We make Assumption 2 for ease of exposition. Techniques for relaxing this assumption have been discussed in [3] and [5].

For each hypothesis $i\in\mathcal{H}$ , define

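$$D^{*}(i):=\max_{\bm{\alpha}\in\Delta\mathcal{U}}\min_{j\neq i}\sum_{u\in\mathcal{U}}\alpha(u)D(p_{i}^{u}\,\|\,p_{j}^{u})=\min_{\bm{\beta}\in\Delta\tilde{\mathcal{H}}_{i}}\max_{u\in\mathcal{U}}\sum_{j\neq i}\beta(j)D(p_{i}^{u}\,\|\,p_{j}^{u}),\tag{14}$$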

where $\tilde{\mathcal{H}}_{i}=\mathcal{H}\setminus\{i\}$ . The equality of the min-max and max-min values follows from the minimax theorem [6] because the sets $\mathcal{U}$ and $\mathcal{H}$ are finite and the Kullback-Leibler divergences are bounded by $B$ .
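For small models, the max-min value $D^{*}(i)$ can be approximated numerically. The sketch below searches a grid on the simplex $\Delta\mathcal{U}$ for the experiment distribution that maximizes the worst-case divergence against the alternatives $j\neq i$. It is an illustrative approximation, not a method from the paper: an exact solution would use a linear program, the grid resolution `steps` is an arbitrary choice, and the names `kl` and `d_star_grid` are hypothetical. Full support (Assumption 1) is assumed so that every likelihood ratio is finite.

```python
import itertools
import math

def kl(p, q):
    """D(p || q) in nats; assumes q(y) > 0 wherever p(y) > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0.0)

def d_star_grid(p, i, steps=100):
    """Approximate max over alpha of min_{j != i} sum_u alpha(u) D(p_i^u || p_j^u).

    Returns (value, alpha) for the best grid point, where alpha has entries
    that are multiples of 1 / steps.
    """
    num_h, num_u = len(p), len(p[i])
    div = {j: [kl(p[i][u], p[j][u]) for u in range(num_u)]
           for j in range(num_h) if j != i}
    best_value, best_alpha = -math.inf, None
    for counts in itertools.product(range(steps + 1), repeat=num_u - 1):
        if sum(counts) > steps:
            continue  # not a point of the simplex
        alpha = [c / steps for c in counts] + [(steps - sum(counts)) / steps]
        value = min(sum(a * d for a, d in zip(alpha, div[j])) for j in div)
        if value > best_value:
            best_value, best_alpha = value, alpha
    return best_value, best_alpha
```

With only two hypotheses the inner minimum has a single term, so the optimum puts all mass on the most discriminating experiment. With more hypotheses a genuinely randomized $\bm{\alpha}$ can be needed.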

Assumption 3.

For every $N\geq 1$ , we have that the bound on the type- $i$ error satisfies $0<\epsilon_{N}\leq 1/2N$ . Further,

$$\lim_{N\to\infty}\frac{-\log\epsilon_{N}}{N}=0.$$

Theorem 1 (Lower bound).

There exists a positive constant $K_{1}$ that does not depend on $N$ such that for every $N\geq 1$ the following statements are true.

a)

For any experiment selection strategy $g$ and inference strategy $f$ that satisfy the constraints $\psi_{N}(i)\leq\epsilon_{N}$ for every $i\in\mathcal{H}$ , we have the lower bound

[TABLE]

where $J^{g}_{N}(i)$ is given by (12).

b)

The optimal misclassification probability $\gamma_{N}^{*}$ in Problem (P1) satisfies

[TABLE]

where $D^{*}(i)$ is given by (14).

Theorem 2 (Upper bound).

For any $\delta>0$ , there exists an integer $N_{\delta}$ such that for every $N\geq N_{\delta}$ , we have

$$\gamma_{N}^{*}\leq\sum_{i\in\mathcal{H}}(1-\rho_{1}(i))\exp(-N(D^{*}(i)-\delta)).$$

Using Theorems 1 and 2, we can therefore conclude that

$$\lim_{N\to\infty}-\frac{1}{N}\log\gamma_{N}^{*}=\min_{i\in\mathcal{H}}D^{*}(i).$$

IV Proof of Main Results

IV-A Supporting Lemmas

In this section, we describe some important properties of the confidence level $\mathcal{C}_{i}(\bm{\rho})$ which will be used in the proof of our main results.

Lemma 1.

For any experiment selection strategy $g$ , we have

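$$\mathcal{C}_{i}(\bm{\rho}_{N+1})-\mathcal{C}_{i}(\bm{\rho}_{1})=-\log\sum_{j\neq i}\exp\Big(\log\tilde{\rho}_{1}(j)+\sum_{n=1}^{N}\lambda_{i}^{j}({{U}}_{n},{{Y}}_{n})\Big),$$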

where $\tilde{\rho}_{1}(j)=\rho_{1}(j)/(1-\rho_{1}(i)).$

Proof.

We have

[TABLE]

∎
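The identity in Lemma 1 can be checked numerically on a small example. The sketch below computes the left-hand side directly from Bayes' rule and the right-hand side from the log-likelihood ratios $\lambda_{i}^{j}(u,y)=\log\big(p_{j}^{u}(y)/p_{i}^{u}(y)\big)$. The function names and the layout `p[h][u][y]` are assumptions of this example, and a numerical check on one instance is an illustration, not a proof.

```python
import math

def lemma1_rhs(rho1, p, history, i):
    """-log sum_{j != i} exp(log rho~_1(j) + sum_n lambda_i^j(U_n, Y_n))."""
    total = 0.0
    for j in range(len(rho1)):
        if j == i:
            continue
        log_prior = math.log(rho1[j] / (1.0 - rho1[i]))  # log rho~_1(j)
        llr_sum = sum(math.log(p[j][u][y] / p[i][u][y]) for u, y in history)
        total += math.exp(log_prior + llr_sum)
    return -math.log(total)

def lemma1_lhs(rho1, p, history, i):
    """C_i(rho_{N+1}) - C_i(rho_1), with rho_{N+1} obtained from Bayes' rule."""
    weights = [rho1[h] * math.prod(p[h][u][y] for u, y in history)
               for h in range(len(rho1))]
    # rho_{N+1}(i) / (1 - rho_{N+1}(i)) equals weights[i] over the sum of the other weights.
    c_last = math.log(weights[i] / (sum(weights) - weights[i]))
    c_first = math.log(rho1[i] / (1.0 - rho1[i]))
    return c_last - c_first
```

Evaluating both functions on the same prior, model and history should give values that agree up to floating-point error.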

Corollary 1 (Bounded increments).

For any experiment selection strategy $g\in\mathcal{G}$ , we have

$$|\mathcal{C}_{i}(\bm{\rho}_{N+1})-\mathcal{C}_{i}(\bm{\rho}_{1})|<NB,$$

with probability 1.

Proof.

Using Lemma 1, we have

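$$\begin{aligned}\mathcal{C}_{i}(\bm{\rho}_{N+1})-\mathcal{C}_{i}(\bm{\rho}_{1})&\leq-\log\sum_{j\neq i}\exp(\log\tilde{\rho}_{1}(j)-NB)\\&=-\log\sum_{j\neq i}\exp(\log\tilde{\rho}_{1}(j))+NB\\&=-\log\sum_{j\neq i}\tilde{\rho}_{1}(j)+NB\\&=NB.\end{aligned}$$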

Similarly,

[TABLE]

The same arguments can be used to show that for any experiment selection strategy $g\in\mathcal{G}$ and $1\leq n\leq N$ , we have

[TABLE]

with probability 1. ∎

Corollary 2.

If for every $j\neq i$, $\sum_{n=1}^{N}\lambda_{j}^{i}({{U}}_{n},{{Y}}_{n})\geq\theta$ for some $\theta\in{\mathbb{R}}$, then

[TABLE]

Proof.

We have

[TABLE]

∎

Lemma 2.

For any experiment selection strategy $g$ ,

[TABLE]

Proof.

[TABLE]

The last inequality follows from the fact that the observations ${{Y}}_{n}$ are independent conditioned on the experiment ${{U}}_{n}$ . ∎

Definition 3.

We define the following distributions:

[TABLE]

where $\tilde{\mathcal{H}}_{i}=\mathcal{H}\setminus\{i\}$ .

Lemma 3.

For any experiment selection strategy $g$ , we have

[TABLE]

where $\tilde{\rho}_{1}(j)=\rho_{1}(j)/(1-\rho_{1}(i)).$

Proof.

For every $j\neq i$ , we have the following since $\log x$ is an increasing function

[TABLE]

Therefore,

[TABLE]

Further,

[TABLE]

∎

Lemma 4.

Let $f$ be an inference strategy in which hypothesis $i$ is decided if and only if $\mathcal{C}_{i}(\bm{\rho}_{N+1})-\mathcal{C}_{i}(\bm{\rho}_{1})\geq\theta$ . Then

[TABLE]

Proof.

In this proof, we write $n$ in place of $N$. Let $Z_{n}$ be the region in which the inference policy $f$ selects hypothesis $i$, that is

[TABLE]

We have

[TABLE]

Therefore,

[TABLE]

∎

IV-B Sub-problem vis-à-vis Chernoff-Stein

We formulate a sub-problem in this section that will be useful for analyzing Problem (P1). For hypothesis $i\in\mathcal{H}$ , consider the following optimization problem:

[TABLE]

Let the infimum value of this optimization problem be $\phi^{*}_{N}(i)$ . Note that this problem is always feasible because the agent can trivially satisfy the type- $i$ error constraint by always declaring hypothesis $i$ .

Remark 3.

When there is only one experiment, two hypotheses and the inconclusive decision $\varnothing$ is not allowed, this formulation is identical to that of the Chernoff-Stein lemma [1].

We follow the proof methodology of the Chernoff-Stein lemma in [1], but with some important modifications. For each experiment selection strategy $g$ and inference strategy $f$ that satisfy the type- $i$ error constraint, we first establish a lower bound on the error probability $\phi_{N}(i)$ based on the expected confidence rate $J_{N}^{g}$ . In [1], the lower bound is obtained using a typicality argument. However, such typicality properties may not hold for every experiment selection strategy $g$ . Thus, we use a different approach to obtain a similar lower bound. We then use Lemma 3 to obtain a lower bound on $\phi_{N}(i)$ that does not depend on the strategies $g$ and $f$ . Further, we construct strategies that asymptotically achieve this strategy-independent lower bound. The construction of these strategies and the analysis thereof builds on the achievability proofs in [1] and [2].

Lemma 5.

Let $g$ be any experiment selection strategy and let $f$ be any inference strategy such that $\psi_{N}(i)\leq\epsilon_{N}.$ Then

[TABLE]

Proof.

In this proof, we write $n$ in place of $N$ and $\epsilon$ in place of $\epsilon_{N}$. Also, if the belief $\bm{\rho}_{n+1}$ is formed using information $\iota_{n+1}$, we denote $\mathcal{C}_{i}(\bm{\rho}_{n+1})$ by $\mathcal{C}_{i}(\iota_{n+1})$ to emphasize its dependence on $\iota_{n+1}$. Let $Z_{n}$ be the region in which the inference policy $f$ selects hypothesis $i$, that is

[TABLE]

We have

[TABLE]

The function $-\log x$ is convex in $x$ and thus, using Jensen’s inequality, we have

[TABLE]

Further, we have

[TABLE]

Therefore,

[TABLE]

The last inequality follows once again from Corollary 1. Hence using inequality (83), we have

[TABLE]

Therefore,

[TABLE]

∎

Lemma 6.

Let $g$ be any experiment selection strategy and let $f$ be any inference strategy such that $\psi_{N}(i)\leq\epsilon_{N}.$ Then there exist positive constants $K_{1}(i)\leq K^{\prime}_{1}(i)$ that do not depend on $N$ such that

[TABLE]

Proof.

This follows directly from Lemmas 3 and 5, and the fact that $\epsilon_{N}\leq 1/2N$ . ∎

Lemma 7.

There exists an integer $N_{i}$ such that for every $N\geq N_{i}$

[TABLE]

Proof.

We prove this by constructing an experiment selection strategy and an inference strategy that achieve the rate and constraints. Let the agent select experiments randomly and independently from the distribution $\bm{\alpha}^{i*}$. Under this strategy, we have for every $j\neq i$

[TABLE]

Using Hoeffding’s inequality, we have

[TABLE]

Let the inference policy be as follows. If $\mathcal{C}_{i}(\bm{\rho}_{N+1})\geq ND^{*}(i)-2B\sqrt{N\log\frac{M}{\epsilon_{N}}}+\mathcal{C}_{i}(\bm{\rho}_{1})$ , decide hypothesis $i$ . Otherwise, declare $\varnothing$ . From Lemma 4, we have

[TABLE]

And thus,

[TABLE]

Now we need to show that ${\mathbb{P}}^{f,g}[\hat{{{H}}}_{N+1}=i\mid{{H}}=i]\geq 1-\epsilon_{N}$ . Notice that under the inference strategy, $\hat{{{H}}}_{N+1}\neq i$ if and only if $\mathcal{C}_{i}(\bm{\rho}_{N+1})<ND^{*}(i)-2B\sqrt{N\log\frac{M}{\epsilon_{N}}}+\mathcal{C}_{i}(\bm{\rho}_{1})$ . Therefore $\hat{{{H}}}_{N+1}\neq i$ , by Corollary 2, implies that for some $j\neq i$

[TABLE]

Using the inequality established in (101) and a union bound, the probability of this event conditioned on hypothesis $i$ is at most $\epsilon_{N}$ . Therefore, ${\mathbb{P}}^{f,g}[\hat{{{H}}}_{N+1}\neq i\mid{{H}}=i]\leq\epsilon_{N}$ . ∎

Using Lemmas 6 and 7 we can therefore conclude that

[TABLE]

IV-C Proof of Theorem 1

In problem (P1), the strategies $f,g$ are required to satisfy the constraints $\psi_{N}(i)\leq\epsilon_{N}$ for every $i\in\mathcal{H}$ . For any pair $f,g$ of strategies that do satisfy all these constraints, we have the following because of Lemma 6.

[TABLE]

With $K_{1}$ defined as $K_{1}:=\max_{i\in\mathcal{H}}K^{\prime}_{1}(i)\geq\max_{i\in\mathcal{H}}K_{1}(i)$ , this proves Theorem 1.

Remark 4.

We can obtain a tighter lower bound on $\gamma_{N}$ using Lemma 5 without any restrictions on $\epsilon_{N}$ , that is

[TABLE]

IV-D Proof of Theorem 2

To prove Theorem 2, we construct appropriate experiment selection and inference strategies. We then show that these strategies achieve the desired bound on misclassification probability while satisfying the constraints in problem (P1). The construction is almost identical to the strategies in [2].

IV-D1 Experiment selection strategy

Let the maximum a posteriori (MAP) estimate at time $n$ be

[TABLE]

If $\bar{i}_{n}=i$ , then an experiment is selected randomly with distribution $\bm{\alpha}^{i*}$ . We denote this experiment selection strategy by $\bar{g}$ .
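The experiment selection rule $\bar{g}$ can be sketched as follows: compute the MAP estimate from the current belief, then sample an experiment from the distribution associated with that estimate. In this sketch `alpha_star[i]` is assumed to hold a precomputed $\bm{\alpha}^{i*}$ as a list of probabilities, indices are 0-based, and breaking MAP ties toward the smallest index is an assumption of the example rather than something the text specifies.

```python
import random

def map_estimate(rho):
    """Index of the most likely hypothesis; ties go to the smallest index (an assumption)."""
    return max(range(len(rho)), key=lambda h: rho[h])

def select_experiment(rho, alpha_star, rng):
    """Draw U_n from alpha^{i*}, where i is the current MAP estimate."""
    dist = alpha_star[map_estimate(rho)]
    return rng.choices(range(len(dist)), weights=dist)[0]
```

For example, with `alpha_star = [[1.0, 0.0], [0.0, 1.0]]` the rule always runs experiment 0 while hypothesis 0 is the MAP estimate and experiment 1 otherwise.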

IV-D2 Inference strategy

Consider the strategy $\bar{f}$ where

[TABLE]

Let us assume that $\delta<D^{*}(i)$ for every $i\in\mathcal{H}$ without loss of generality. This ensures that for large enough $N$ , the threshold condition above can be satisfied by at most one hypothesis.
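A threshold rule of this kind can be sketched in a few lines. The exact definition of $\bar{f}$ is not reproduced here, so the sketch assumes the threshold $N(D^{*}(i)-\delta)$ on the confidence increment $\mathcal{C}_{i}(\bm{\rho}_{N+1})-\mathcal{C}_{i}(\bm{\rho}_{1})$, in line with the rule of Lemma 4. The names `threshold_inference` and `d_star` are hypothetical, indices are 0-based, `None` stands for $\varnothing$, and abstaining when more than one hypothesis passes is a choice made for this example.

```python
import math

def bllr(rho, i):
    """C_i(rho) = log(rho(i) / (1 - rho(i))); assumes 0 < rho(i) < 1."""
    return math.log(rho[i] / (1.0 - rho[i]))

def threshold_inference(rho_first, rho_last, d_star, horizon, delta):
    """Return a hypothesis index, or None for the inconclusive declaration."""
    passing = [
        i for i in range(len(rho_last))
        if bllr(rho_last, i) - bllr(rho_first, i) >= horizon * (d_star[i] - delta)
    ]
    # The text argues that at most one hypothesis passes for large N;
    # this sketch abstains if that fails to hold.
    return passing[0] if len(passing) == 1 else None
```

With a uniform prior over two hypotheses, `d_star = [0.4, 0.4]`, `horizon = 10` and `delta = 0.1`, the threshold is 3 nats, so a final belief of 0.99 on hypothesis 0 is declared while a final belief of 0.6 is reported as inconclusive.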

Using Lemma 4, we can conclude that for each hypothesis $i\in\mathcal{H}$ , $\phi_{N}(i)\leq e^{-(ND^{*}(i)-N\delta)}$ . Therefore, under these strategies, we have

[TABLE]

If we can show that the strategies also satisfy the type- $i$ error constraints in problem (P1), then clearly, $\gamma_{N}^{*}\leq\gamma_{N}$ . To prove that these strategies do satisfy the constraints for large values of $N$ , we use the arguments in [2] and the details of this proof are as follows.

Proof.

We now need to verify if the proposed strategy achieves the type- $i$ error constraints, that is $\psi_{N}(i)\leq\epsilon_{N}$ for each hypothesis $i\in\mathcal{H}$ . Let us examine the evolution of the log likelihood ratio $\lambda_{j}^{i}$ associated with a pair of hypotheses $i$ and $j$ under the hypothesis $i$ . Consider the following Doob decomposition

[TABLE]

Note that ${{X}}_{n}$ is a martingale difference sequence with respect to the filtration ${{I}}_{n}$ and $|{{X}}_{n}|<2B$ with probability 1. Using Azuma’s inequality [7], we have

[TABLE]

We can choose $K_{3}>0$ such that

[TABLE]

Let ${{T}}$ be the smallest time index such that $\bar{i}_{n}=i$ for every $n\geq{{T}}$ . Notice that ${{T}}$ is a random variable. Under Assumption 2, it was shown in [2] (Lemma 1) that there exist constants $b,K>0$ such that for every $i\in\mathcal{H}$ , we have ${\mathbb{P}}_{i}[{{T}}>n]\leq Ke^{-bn}$ . Notice that

[TABLE]

Therefore, if

[TABLE]

Therefore,

[TABLE]

We can pick a $K_{2}$ such that for large enough $N$

[TABLE]

Using inequalities (114) and (119), we can conclude that

[TABLE]

Because of Assumption 3, for large enough $N$ , we have

[TABLE]

Using this fact and Corollary 2, we have

[TABLE]

Therefore, we can conclude that $\psi_{N}(i)\leq\epsilon_{N}$ for every $i\in\mathcal{H}$ . ∎

V Discussion on Strategy Design

In Section IV-D, we described an experiment selection strategy $\bar{g}$. The agent starts with a prior belief on the set of hypotheses and, as it performs experiments, its confidence in the true hypothesis improves. We refer to this initial phase of experimentation as the exploration phase. Soon enough, the MAP estimate satisfies $\bar{i}_{n}={{H}}$. From then on, the agent selects experiments using the distribution $\bm{\alpha}^{{{H}}*}$, which rapidly improves its confidence in ${{H}}$. We refer to this subsequent phase of experimentation as the verification phase.

According to Lemma 1 in [2], the exploration phase terminates in $\mathcal{O}(\log N)$ time with high probability under Assumption 2. We can relax Assumption 2 using the technique in [3] and show that the exploration phase terminates in sublinear time with high probability. Therefore, in the asymptotic analysis, the impact of exploration on the overall performance is negligible. However, in the non-asymptotic regime, the exploration performance may have a significant impact on the overall performance, especially in problems like dynamic search over trees. This issue was discussed in [5] and [8] in a stopping time setting and heuristic strategies were proposed to improve the exploration performance. One such heuristic is based on Extrinsic Jensen-Shannon (EJS) divergence [9]. Using our notation, the EJS divergence associated with an experiment $u$ and posterior belief $\bm{\rho}_{n}$ is the expected increment in confidence level on ${{H}}$ , that is

[TABLE]

The heuristic strategy in [9] is to greedily select the experiment that maximizes $EJS(\bm{\rho}_{n},u)$ at time $n$ .
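The greedy rule can be sketched directly from the verbal description of EJS as the expected one-step increment of the confidence level on ${{H}}$. Under belief $\bm{\rho}$ and experiment $u$, the increment $\mathcal{C}_{i}(\bm{\rho}_{n+1})-\mathcal{C}_{i}(\bm{\rho}_{n})$ simplifies to the log-ratio of $p_{i}^{u}(y)$ to the mixture of the other hypotheses' likelihoods weighted by $\rho(k)/(1-\rho(i))$. The function names, 0-based indices and layout `p[h][u][y]` are assumptions of this sketch, and every belief entry is assumed to lie strictly between 0 and 1.

```python
import math

def ejs(rho, p, u):
    """Expected one-step increase of C_H under belief rho and experiment u."""
    total = 0.0
    for i, rho_i in enumerate(rho):
        for y, p_iy in enumerate(p[i][u]):
            # Likelihood of y under "not i": mixture weighted by rho(k) / (1 - rho(i)).
            others = sum(rho[k] * p[k][u][y]
                         for k in range(len(rho)) if k != i) / (1.0 - rho_i)
            # C_i(rho') - C_i(rho) reduces to log(p_i^u(y) / others).
            total += rho_i * p_iy * math.log(p_iy / others)
    return total

def greedy_ejs_experiment(rho, p):
    """Experiment index maximizing EJS at the current belief (ties: smallest index)."""
    return max(range(len(p[0])), key=lambda u: ejs(rho, p, u))
```

With two hypotheses, `ejs` reduces to a belief-weighted sum of the two Kullback-Leibler divergences between the hypotheses' observation distributions under experiment $u$.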

We described a lower bound on the error probability $\gamma_{N}$ in Remark 4. Using Jensen’s inequality, we can further weaken the bound (109) to obtain the following lower bound

[TABLE]

Therefore, as a heuristic, one can use the expected confidence level ${\mathbb{E}}^{g}[\mathcal{C}_{{{H}}}(\bm{\rho}_{N+1})]$ as a proxy for $-\log\gamma_{N}$ and try to maximize the confidence level instead of the error rate. This is equivalent to maximizing $J_{N}^{g}$ where

[TABLE]

It was shown in [10] that maximizing $J_{N}^{g}$ can be formulated as a Partially Observable Markov Decision Problem (POMDP). Using heuristics for solving POMDPs, we can approximately optimize $J_{N}^{g}$ and one such heuristic was presented in [10]. The strategy based on EJS divergence [9] happens to be a one-step greedy policy with respect to this POMDP.

VI Conclusions

We formulated a fixed horizon active hypothesis testing problem in which the agent can decide on one of the hypotheses or declare its experiments inconclusive. For analyzing this problem, we formulated a sub-problem which is a generalization of the Chernoff-Stein lemma [1] to a setting with multiple hypotheses and multiple experiments. We obtained lower bounds on the optimal error probability in the sub-problem and used them to obtain lower bounds on the misclassification probability in our original problem. We also derived upper bounds by constructing appropriate strategies and analyzing their performance. We defined a quantity called the expected confidence rate and, based on it, proposed a heuristic approach for strategy design.

Acknowledgments

This research was supported, in part, by the National Science Foundation under Grants CNS-1213128, CCF-1410009, and CPS-1446901, by ONR under Grant N00014-15-1-2550, and by AFOSR under Grant FA9550-12-1-0215.

Bibliography

[1] T. M. Cover and J. A. Thomas, Elements of information theory. John Wiley & Sons, 2012.
[2] H. Chernoff, “Sequential design of experiments,” The Annals of Mathematical Statistics, vol. 30, no. 3, pp. 755–770, 1959.
[3] S. Nitinawarat, G. K. Atia, and V. V. Veeravalli, “Controlled sensing for multihypothesis testing,” IEEE Transactions on Automatic Control, vol. 58, no. 10, pp. 2451–2464, 2013.
[4] A. Wald, Sequential analysis. Courier Corporation, 1973.
[5] M. Naghshvar, T. Javidi et al., “Active sequential hypothesis testing,” The Annals of Statistics, vol. 41, no. 6, pp. 2703–2738, 2013.
[6] M. J. Osborne and A. Rubinstein, A course in game theory. MIT press, 1994.
[7] K. Azuma, “Weighted sums of certain dependent random variables,” Tohoku Mathematical Journal, Second Series, vol. 19, no. 3, pp. 357–367, 1967.
[8] C. Wang, K. Cohen, and Q. Zhao, “Active hypothesis testing on a tree: Anomaly detection under hierarchical observations,” in Information Theory (ISIT), 2017 IEEE International Symposium on. IEEE, 2017, pp. 993–997.
