Sequential Experiment Design for Hypothesis Verification

Dhruva Kartik; Ashutosh Nayyar; Urbashi Mitra

arXiv:1812.01137·stat.ML·December 5, 2018


TL;DR

This paper formulates active hypothesis testing as a confidence maximization problem in a POMDP setting, shows that the verification phase can be cast as an MDP, and proposes a heuristic verification strategy that outperforms existing methods in some numerical scenarios.

Contribution

It introduces a confidence maximization framework for active hypothesis testing, characterizes optimal solutions of the resulting verification MDP, and proposes a simple heuristic strategy with a zero-sum game interpretation.

Findings

1. Heuristic strategy outperforms some existing methods in numerical experiments.

2. Verification problem formulated as a Markov Decision Process.

3. Relationship between hypothesis testing and verification established.


Equations

$$H(p)=-\sum_{y\in\mathcal{Y}}p(y)\log p(y).$$

$$D(p||q)=\sum_{y\in\mathcal{Y}}p(y)\log\frac{p(y)}{q(y)}.$$

$$\bm{\mathrm{Y}}=\xi(\bm{\mathrm{H}},u,\bm{\mathrm{W}}_{k}^{u}),$$

$$\bm{\mathrm{Y}}_{n}=\xi(\bm{\mathrm{H}},\bm{\mathrm{U}}_{n},\bm{\mathrm{W}}_{n}).$$

$$\bm{\mathrm{I}}_{n}=\{\bm{\mathrm{U}}_{1:n-1},\bm{\mathrm{Y}}_{1:n-1}\}.$$

$$\bm{\mathrm{U}}_{n}=g_{n}(\bm{\mathrm{I}}_{n}).$$

$$\rho_{h}(n)={\mathbb{P}}[\bm{\mathrm{H}}=h\mid\bm{\mathrm{Y}}_{1:n-1},\bm{\mathrm{U}}_{1:n-1}].$$

$$\mathcal{C}_{h}(\bm{\rho}):=\log\frac{\rho_{h}}{1-\rho_{h}}.$$

$$\frac{\mathcal{C}_{\bm{\mathrm{H}}}(\bm{\rho}(N+1))-\mathcal{C}_{\bm{\mathrm{H}}}(\bm{\rho}(1))}{N}.$$

$$K(g)$$

$$\liminf_{N\to\infty}\frac{1}{N}{\mathbb{E}}^{g}[\mathcal{C}_{\bm{\mathrm{H}}}(\bm{\rho}(N+1))-\mathcal{C}_{\bm{\mathrm{H}}}(\bm{\rho}(1))\mid\bm{\mathrm{H}}=h].$$

$$J^{*}(h)=\sup_{g\in\mathcal{G}}J(g,h).$$

$$K(g)\leq\sum_{h\in\mathcal{H}}\rho_{h}(1)J^{*}(h).$$

$$K(g)$$

$$\bar{g}(\bm{\rho})=\begin{cases}g^{*}(h)(\bm{\rho})&\text{if for some }h,\ \rho_{h}>\bar{\rho}\\ g^{e}(\bm{\rho})&\text{otherwise,}\end{cases}$$

$$\rho_{h}(n+1)=\frac{\rho_{h}(n)p_{h}^{u}(y)}{\sum_{h^{\prime}}\rho_{h^{\prime}}(n)p_{h^{\prime}}^{u}(y)}.$$

$$\bm{\rho}(n+1)$$

$$\begin{aligned}J_{N}(g)&:=\frac{1}{N}{\mathbb{E}}^{g}\sum_{n=1}^{N}[\mathcal{C}_{1}(\bm{\rho}(n+1))-\mathcal{C}_{1}(\bm{\rho}(n))]\\&=\frac{1}{N}{\mathbb{E}}^{g}\sum_{n=1}^{N}{\mathbb{E}}[\mathcal{C}_{1}(\bm{\rho}(n+1))-\mathcal{C}_{1}(\bm{\rho}(n))\mid\bm{\mathrm{I}}_{n},\bm{\mathrm{U}}_{n}]\\&=\frac{1}{N}{\mathbb{E}}^{g}\sum_{n=1}^{N}{\mathbb{E}}[\mathcal{C}_{1}(\bm{\rho}(n+1))-\mathcal{C}_{1}(\bm{\rho}(n))\mid\bm{\rho}(n),\bm{\mathrm{U}}_{n}]\\&=:\frac{1}{N}{\mathbb{E}}^{g}\sum_{n=1}^{N}r(\bm{\rho}(n),\bm{\mathrm{U}}_{n}).\end{aligned}$$

$$r(\bm{\rho},u)=\sum_{y\in\mathcal{Y}}p_{1}^{u}(y)\log\frac{p_{1}^{u}(y)}{\sum_{j\neq 1}\tilde{\rho}_{j}p_{j}^{u}(y)},$$

$$J(g):=\liminf_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}{\mathbb{E}}^{g}(r(\bm{\rho}(n),\bm{\mathrm{U}}_{n})).$$

$$J^{\prime}+w(\bm{\rho})=\max_{u}\Big\{r(\bm{\rho},u)+\sum_{y}p_{1}^{u}(y)w(F(\bm{\rho},u,y))\Big\},$$

$$\limsup_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}{\mathbb{E}}^{g}(r(\bm{\rho}(n),\bm{\mathrm{U}}_{n}))\leq$$

$$\limsup_{N\to\infty}\frac{1}{N}({\mathbb{E}}^{g}w(\bm{\rho}(1))-{\mathbb{E}}^{g}w(\bm{\rho}(N+1)))\leq 0,$$

$$\liminf_{N\to\infty}\frac{1}{N}({\mathbb{E}}^{g^{*}}w(\bm{\rho}(1))-{\mathbb{E}}^{g^{*}}w(\bm{\rho}(N+1)))=0$$

$$\lambda_{j}^{i}(u,y):=\log\frac{p_{i}^{u}(y)}{p_{j}^{u}(y)}.$$

$$\bm{\alpha}^{*}$$

$$\bm{\beta}^{*}$$

$$\max_{\bm{\alpha}\in\Delta\mathcal{U}}\min_{j\neq 1}\sum_{u}\alpha_{u}D(p_{1}^{u}||p_{j}^{u})$$


Full text


D. Kartik, A. Nayyar and U. Mitra are with the Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089. This research was supported, in part, by the National Science Foundation under Grants NSF CNS-1213128, CCF-1410009 and CPS-1446901, Grant ONR N00014-15-1-2550, and Grant AFOSR FA9550-12-1-0215.

Abstract

Hypothesis testing is an important problem with applications in target localization, clinical trials etc. Many active hypothesis testing strategies operate in two phases: an exploration phase and a verification phase. In the exploration phase, selection of experiments is such that a moderate level of confidence on the true hypothesis is achieved. Subsequent experiment design aims at improving the confidence level on this hypothesis to the desired level. In this paper, the focus is on the verification phase. A confidence measure is defined and active hypothesis testing is formulated as a confidence maximization problem in an infinite-horizon average-reward Partially Observable Markov Decision Process (POMDP) setting. The problem of maximizing confidence conditioned on a particular hypothesis is referred to as the hypothesis verification problem. The relationship between hypothesis testing and verification problems is established. The verification problem can be formulated as a Markov Decision Process (MDP). Optimal solutions for the verification MDP are characterized and a simple heuristic adaptive strategy for verification is proposed based on a zero-sum game interpretation of Kullback-Leibler divergences. It is demonstrated through numerical experiments that the heuristic performs better in some scenarios compared to existing methods in literature.

I Introduction

Hypothesis testing is a classical problem and has been addressed in various settings. The problem can be described qualitatively as follows. An agent is interested in a phenomenon, and wants to test if the phenomenon conforms to any one of the hypotheses from a known class. The agent can perform various experiments and based on the observations from these experiments, it needs to infer the true hypothesis. As opposed to the one-shot hypothesis testing problem, an active agent can choose which experiment to perform based on the observations made in the past. The agent seeks to select experiments such that all false hypotheses are eliminated as quickly as possible.

Many active hypothesis testing strategies [1, 2] operate in two phases. The first phase is an exploration phase in which the experiment design is such that a moderate level of confidence is achieved on the true hypothesis. In most cases, this phase terminates in finite time almost surely [3]. The second is a verification phase in which the agent has a moderate level of confidence on some hypothesis and experiments are selected such that confidence on this hypothesis is improved to the desired level. When the desired confidence level is very high, the verification cost dominates the performance. In this paper, we make the notions of exploration and verification more formal and focus on analyzing the verification phase.

Active hypothesis testing finds applications in many areas such as sensor selection for target detection and localization, state tracking, design of clinical trials and learning unknown functions from queries [4]. Consequently, the verification phase plays an important role in all these applications.

We consider a slightly different mathematical formulation for hypothesis testing than previously explored [1, 2]. Using posterior belief on the set of hypotheses, we define a confidence level called Bayesian log-likelihood ratio. The objective is to design an experiment selection strategy that maximizes the expected rate of increase in the confidence level. Our contributions in this paper can be summarized as follows:

1. We formulate the verification problem as an infinite-horizon average-reward Markov Decision Process (MDP) problem.

2. We characterize the optimal rate using infinite-horizon Dynamic Programming (DP).

3. We identify a set of critical experiments. We then show that any strategy that selects these experiments while satisfying a stability criterion is asymptotically optimal.

4. We design a new heuristic experiment selection strategy and numerically show that it achieves better performance compared to existing methods in some scenarios.

The rest of the paper is organized as follows. In Section I-A, we discuss the relation between our problem and those in prior works. Section II formulates the problem. Section III relates the problem to the MDP framework and defines critical experiments. In Section IV, we solve the DP and in Section V, we describe an adaptive strategy and numerically compare it with existing policies. We conclude the paper in Section VI.

I-A Prior Work

The simplest active hypothesis testing problem was first formulated by Chernoff in [3], inspired by Wald’s analysis of the sequential probability ratio test [5]. Thereafter, it has been generalized in different ways depending on the target application [1, 2]. A major difference between our formulation and the formulation in these works is the reward structure. Prior works consider a combination of expected stopping time and Bayesian error probability. Fixed-horizon problems have also been considered, in which the objective is to minimize the Bayesian error probability or the maximal error probability [1]. We define a notion of confidence and maximize the expected rate of increase in confidence over long horizons. In prior formulations, if the agent makes an error in guessing the true hypothesis, it incurs a cost of 1 (or some constant $c$ ) irrespective of its confidence level. In our formulation, by contrast, we reward the agent for generating observations that result in a high confidence level on the true hypothesis. We believe that our formulation is related to the stopping time formulation because of the strong similarity in the results. In [6, 3, 1, 2], the authors obtain asymptotically tight performance bounds and design policies that are asymptotically optimal. When the policies in these works are adapted to the verification problem defined herein, they turn out to be open-loop and randomized. A closed-loop policy was designed in [7], but it may not always be asymptotically optimal. In this paper, we design a strategy for verification that is more adaptive and conjecture that it is asymptotically optimal.

I-B Notation

Random variables/vectors are denoted by upper case boldface letters, their realization by the corresponding lower case letter. We use calligraphic fonts to denote sets (e.g. $\mathcal{U}$ ) and $\Delta\mathcal{U}$ is the probability simplex over a finite set $\mathcal{U}$ . In general, subscripts are used as time index. There are two exceptions ( ${\rho}_{j}(n),\bm{\mathrm{X}}_{j}(n)$ ) to this convention where the subscript denotes the hypothesis and $n$ denotes time. For time indices $n_{1}\leq n_{2}$ , $\bm{\mathrm{Y}}_{n_{1}:n_{2}}$ is the short hand notation for the variables $(\bm{\mathrm{Y}}_{n_{1}},\bm{\mathrm{Y}}_{n_{1}+1},...,\bm{\mathrm{Y}}_{n_{2}})$ . For a strategy $g$ , we use ${\mathbb{P}}^{g}[\cdot]$ and ${\mathbb{E}}^{g}[\cdot]$ to indicate that the probability and expectation depend on the choice of $g$ . The Shannon entropy of a discrete distribution $p$ over a finite space $\mathcal{Y}$ is given by

$$H(p)=-\sum_{y\in\mathcal{Y}}p(y)\log p(y).$$

And the Kullback-Leibler divergence between distributions $p$ and $q$ is given by

$$D(p||q)=\sum_{y\in\mathcal{Y}}p(y)\log\frac{p(y)}{q(y)}.$$
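As an illustration of these two quantities, here is a minimal Python sketch. It assumes natural logarithms (the text does not fix the base) and represents a distribution as a list of probabilities; the function names `entropy` and `kl_divergence` are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy H(p) in nats; terms with p(y) = 0 contribute 0."""
    return -sum(py * math.log(py) for py in p if py > 0)

def kl_divergence(p, q):
    """KL divergence D(p || q) in nats; infinite if q(y) = 0 where p(y) > 0."""
    total = 0.0
    for py, qy in zip(p, q):
        if py == 0:
            continue
        if qy == 0:
            return math.inf
        total += py * math.log(py / qy)
    return total

p = [0.5, 0.5]
q = [0.9, 0.1]
print(entropy(p))           # equals log 2 for the uniform binary distribution
print(kl_divergence(p, q))  # strictly positive because p differs from q
print(kl_divergence(p, p))  # 0.0
```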

II Problem Formulation

Let $\mathcal{H}\subset{\mathbb{N}}$ be a finite set of hypotheses and let $\bm{\mathrm{H}}$ be the true hypothesis. At each time $n\in{\mathbb{N}}$ , the agent can perform an experiment $\bm{\mathrm{U}}_{n}\in\mathcal{U}$ and obtain an observation $\bm{\mathrm{Y}}_{n}\in\mathcal{Y}$ . For simplicity, let us also assume that the sets $\mathcal{U}$ and $\mathcal{Y}$ are finite. When an experiment $u\in\mathcal{U}$ is performed for the $k$ th time, the observation $\bm{\mathrm{Y}}$ obtained is given by

$$\bm{\mathrm{Y}}=\xi(\bm{\mathrm{H}},u,\bm{\mathrm{W}}_{k}^{u}),$$

where $\{\bm{\mathrm{W}}_{k}^{u}:u\in\mathcal{U},k\in{\mathbb{N}}\}$ is a collection of mutually independent and identically distributed primitive random variables. The observation $\bm{\mathrm{Y}}_{n}$ at time $n$ can be expressed as

$$\bm{\mathrm{Y}}_{n}=\xi(\bm{\mathrm{H}},\bm{\mathrm{U}}_{n},\bm{\mathrm{W}}_{n}).$$

The probability of observing $y$ after performing an experiment $u$ under hypothesis $h$ is denoted by $p_{h}^{u}(y)$ .

The information available at time $n$ , denoted by $\bm{\mathrm{I}}_{n}$ , is the collection of all experiments performed and the corresponding observations up to time $n-1$ , i.e.

$$\bm{\mathrm{I}}_{n}=\{\bm{\mathrm{U}}_{1:n-1},\bm{\mathrm{Y}}_{1:n-1}\}.$$

Actions of the agent at time $n$ can be functions of $\bm{\mathrm{I}}_{n}$ . Let the policy used for selecting the experiment be $g_{n}$ , i.e.

$$\bm{\mathrm{U}}_{n}=g_{n}(\bm{\mathrm{I}}_{n}).$$

The sequence of all the policies $\{g_{n}\}$ is denoted by $g$ which is referred to as a strategy. Let the collection of all such strategies be $\mathcal{G}$ .

Using the available information, the agent forms a posterior belief $\bm{\rho}(n)$ on $\bm{\mathrm{H}}$ at time $n$ which is given by

$$\rho_{h}(n)={\mathbb{P}}[\bm{\mathrm{H}}=h\mid\bm{\mathrm{Y}}_{1:n-1},\bm{\mathrm{U}}_{1:n-1}].$$

Definition II.1 (Bayesian Log-Likelihood Ratio).

The Bayesian log-likelihood ratio $\mathcal{C}_{h}(\bm{\rho})$ associated with a hypothesis $h\in\mathcal{H}$ is defined as

$$\mathcal{C}_{h}(\bm{\rho}):=\log\frac{\rho_{h}}{1-\rho_{h}}.$$

The Bayesian log-likelihood ratio (BLLR) is the logarithm of the ratio of the probability that hypothesis $h$ is true versus the probability that hypothesis $h$ is not true. BLLR is obtained by applying the logit function (also referred to as log-odds in statistics [8]) on the posterior belief $\rho_{h}$ . The logit function amplifies increments in $\rho_{h}$ when $\rho_{h}$ is close to $0$ or $1$ . We can interpret BLLR as a measure of confidence on hypothesis $h$ and thus, we refer to it as confidence level.
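The amplification near the boundary can be seen in a small numerical sketch. The function name `confidence` and the belief values below are illustrative assumptions, and the belief on $h$ is assumed to lie strictly between 0 and 1.

```python
import math

def confidence(rho, h):
    """Bayesian log-likelihood ratio C_h(rho) = log(rho_h / (1 - rho_h))."""
    return math.log(rho[h] / (1.0 - rho[h]))

# The same increment of 0.009 in rho_h, once near 0.5 and once near 1.
gain_mid = confidence([0.509, 0.491], 0) - confidence([0.5, 0.5], 0)
gain_high = confidence([0.999, 0.001], 0) - confidence([0.99, 0.01], 0)
print(gain_mid, gain_high)  # the second gain is much larger than the first
```

This is why a rate measured in BLLR keeps rewarding informative experiments even when the posterior is already close to 1.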

The objective is to design an experiment selection strategy $g$ such that the confidence level $\mathcal{C}_{\bm{\mathrm{H}}}$ on the true hypothesis $\bm{\mathrm{H}}$ increases as quickly as possible. In other words, the total reward after acquiring $N$ observations is the average rate of increase in the confidence level on the true hypothesis $\bm{\mathrm{H}}$ and is given by

$$\frac{\mathcal{C}_{\bm{\mathrm{H}}}(\bm{\rho}(N+1))-\mathcal{C}_{\bm{\mathrm{H}}}(\bm{\rho}(1))}{N}.$$

More explicitly, we seek to design a strategy $g$ that maximizes the asymptotic expected reward $K(g)$ which is defined as

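$$K(g):=\liminf_{N\to\infty}\frac{1}{N}{\mathbb{E}}^{g}[\mathcal{C}_{\bm{\mathrm{H}}}(\bm{\rho}(N+1))-\mathcal{C}_{\bm{\mathrm{H}}}(\bm{\rho}(1))].$$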

Henceforth, we refer to this problem as the Expected Confidence Maximization (ECM) problem for hypothesis testing. For a hypothesis $h$ and a strategy $g\in\mathcal{G}$ , define $J(g,h)$ as

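$$J(g,h):=\liminf_{N\to\infty}\frac{1}{N}{\mathbb{E}}^{g}[\mathcal{C}_{\bm{\mathrm{H}}}(\bm{\rho}(N+1))-\mathcal{C}_{\bm{\mathrm{H}}}(\bm{\rho}(1))\mid\bm{\mathrm{H}}=h].$$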

The value $J(g,h)$ represents the performance of a strategy $g$ conditioned on the hypothesis $h$ . Let

$$J^{*}(h)=\sup_{g\in\mathcal{G}}J(g,h).\tag{10}$$

For a given hypothesis $h$ , we refer to the problem of maximizing $J(g,h)$ as the hypothesis verification problem. Let $g^{*}(h)$ be an optimal verification strategy, i.e. it achieves the supremum in equation (10). We will later show that the existence of an optimal strategy $g^{*}(h)$ is guaranteed under a mild assumption.

II-A Hypothesis Testing vs Hypothesis Verification

The optimal verification cost $J^{*}(h)$ can be used to obtain an upper bound on the expected reward $K(g)$ in the hypothesis testing problem.

Lemma II.1.

For any experiment selection strategy $g\in\mathcal{G}$ , we have

$$K(g)\leq\sum_{h\in\mathcal{H}}\rho_{h}(1)J^{*}(h).$$

Proof.

For any strategy $g\in\mathcal{G}$ , we have

[TABLE]

The last inequality follows from the definition of $J^{*}(h)$ . ∎

It is clear from the proof of Lemma II.1 that this upper bound is achieved by employing the strategy $g^{*}(h)$ when hypothesis $h$ is true. However, the agent cannot use different strategies under different hypotheses because it does not know the true hypothesis $\bm{\mathrm{H}}$ . Therefore, we propose an experiment selection strategy of the following form. Similar strategies have also been used in [2].

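$$\bar{g}(\bm{\rho})=\begin{cases}g^{*}(h)(\bm{\rho})&\text{if for some }h,\ \rho_{h}>\bar{\rho}\\ g^{e}(\bm{\rho})&\text{otherwise,}\end{cases}$$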

where $0.5<\bar{\rho}<1$ is a constant and $g^{e}$ is an exploration strategy. The interpretation of the strategy $\bar{g}$ is that when the agent has a moderate level of confidence on some hypothesis $h$ , it employs the corresponding verification strategy $g^{*}(h)$ . This is to verify if hypothesis $h$ is indeed true by further improving its confidence level. When the agent is not very confident about any particular hypothesis, the agent employs an exploration strategy $g^{e}$ . The primary purpose of the exploration strategy is to ensure that $\rho_{\bm{\mathrm{H}}}$ eventually crosses the threshold $\bar{\rho}$ . A naive exploration strategy is to randomly select every experiment uniformly. Better exploration strategies do exist [2, 7]. It remains to show that a strategy like $\bar{g}$ can indeed achieve the upper bound in Lemma II.1. In this paper, we focus on the hypothesis verification problem. We derive sufficient conditions for an experiment selection strategy to be an optimal verification strategy.
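The switching rule described above can be sketched as follows. This is only an illustration: the names `switching_strategy`, `verify` and `explore`, the default threshold of 0.9, and the placeholder policies are assumptions, not part of the paper.

```python
def switching_strategy(rho, verify, explore, threshold=0.9):
    """Return the experiment chosen by the threshold rule.

    rho       : posterior belief, a list of probabilities over hypotheses
    verify    : dict mapping hypothesis index h to a verification policy for h
    explore   : exploration policy
    threshold : the constant rho-bar, assumed to lie strictly in (0.5, 1)
    """
    # Because threshold > 0.5, at most one hypothesis can exceed it.
    for h, belief in enumerate(rho):
        if belief > threshold:
            return verify[h](rho)
    return explore(rho)

# Placeholder policies that return experiment labels.
verify = {0: lambda rho: "u1", 1: lambda rho: "u2"}
explore = lambda rho: "u_explore"
print(switching_strategy([0.95, 0.05], verify, explore))  # u1
print(switching_strategy([0.6, 0.4], verify, explore))    # u_explore
```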

III Markov Decision Process Formulation

In this section, we show that the verification problem can be formulated as an infinite-horizon average-reward MDP problem. All of the following analysis is for $h=1$ and with slight abuse of notation, we henceforth refer to $g^{*}(1)$ and $J(g,1)$ as $g^{*}$ and $J(g)$ , respectively. The same analysis can be repeated for any other $h$ to obtain similar results.

The state of the MDP is the posterior belief $\bm{\rho}(n)$ . The posterior belief is updated using Bayes’ rule. Thus, if $\bm{\mathrm{U}}_{n}=u$ and $\bm{\mathrm{Y}}_{n}=y$ , we have

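$$\rho_{h}(n+1)=\frac{\rho_{h}(n)p_{h}^{u}(y)}{\sum_{h^{\prime}}\rho_{h^{\prime}}(n)p_{h^{\prime}}^{u}(y)}.\tag{14}$$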

For convenience, we denote the Bayes’ update in (14) by

$$\bm{\rho}(n+1)=F(\bm{\rho}(n),u,y).$$
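A minimal sketch of the Bayes update, assuming beliefs and likelihoods are stored as plain Python lists; the function name `bayes_update` and the numbers are illustrative.

```python
def bayes_update(rho, likelihoods):
    """Posterior belief after one observation.

    rho         : prior belief over hypotheses
    likelihoods : likelihoods[h] = p_h^u(y) for the experiment u that was
                  performed and the observation y that was obtained
    """
    joint = [r * l for r, l in zip(rho, likelihoods)]
    total = sum(joint)
    if total == 0:
        raise ValueError("observation has zero probability under the belief")
    return [j / total for j in joint]

rho = [0.5, 0.25, 0.25]
posterior = bayes_update(rho, [0.8, 0.1, 0.1])
print(posterior)  # belief shifts towards the first hypothesis
```

The update depends on the past only through the current belief, which is what allows the belief to serve as the state.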

Since $\bm{\mathrm{H}}=1$ , we have $\bm{\mathrm{Y}}_{n}=\xi(1,\bm{\mathrm{U}}_{n},\bm{\mathrm{W}}_{n})$ . Clearly, the dynamics of this system are Markovian. The expectation of average confidence rate under a strategy ${g}$ is given by

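$$\begin{aligned}J_{N}(g)&:=\frac{1}{N}{\mathbb{E}}^{g}\sum_{n=1}^{N}[\mathcal{C}_{1}(\bm{\rho}(n+1))-\mathcal{C}_{1}(\bm{\rho}(n))]\\&=\frac{1}{N}{\mathbb{E}}^{g}\sum_{n=1}^{N}{\mathbb{E}}[\mathcal{C}_{1}(\bm{\rho}(n+1))-\mathcal{C}_{1}(\bm{\rho}(n))\mid\bm{\mathrm{I}}_{n},\bm{\mathrm{U}}_{n}]\\&=\frac{1}{N}{\mathbb{E}}^{g}\sum_{n=1}^{N}{\mathbb{E}}[\mathcal{C}_{1}(\bm{\rho}(n+1))-\mathcal{C}_{1}(\bm{\rho}(n))\mid\bm{\rho}(n),\bm{\mathrm{U}}_{n}]\\&=:\frac{1}{N}{\mathbb{E}}^{g}\sum_{n=1}^{N}r(\bm{\rho}(n),\bm{\mathrm{U}}_{n}).\end{aligned}$$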

Instantaneous reward for this MDP is the expected instantaneous increase in the confidence level and is given by

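$$r(\bm{\rho},u)=\sum_{y\in\mathcal{Y}}p_{1}^{u}(y)\log\frac{p_{1}^{u}(y)}{\sum_{j\neq 1}\tilde{\rho}_{j}p_{j}^{u}(y)},$$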

where $\tilde{\rho}_{j}=\rho_{j}/(1-\rho_{1})$ . Note that $\tilde{\rho}_{j}$ is a probability distribution over the set of alternate hypotheses $\tilde{\mathcal{H}}=\mathcal{H}\setminus\{1\}$ . Also, notice that $r(\bm{\rho},u)$ is a KL-divergence between two distributions and hence, is always non-negative. The objective is to find a strategy $g^{*}$ that maximizes the following average reward

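$$J(g):=\liminf_{N\to\infty}\frac{1}{N}\sum_{n=1}^{N}{\mathbb{E}}^{g}(r(\bm{\rho}(n),\bm{\mathrm{U}}_{n})).$$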
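The instantaneous reward can be checked numerically. The sketch below computes it in two ways: as the KL-divergence between $p_{1}^{u}$ and the $\tilde{\rho}$-mixture of the alternate distributions, and as the expected one-step increase in $\mathcal{C}_{1}$ under hypothesis 1. List index 0 plays the role of hypothesis 1; the function names and distributions are hypothetical, and all likelihoods are assumed positive.

```python
import math

def reward(rho, p_u):
    """r(rho, u): KL divergence between p_1^u and the mixture of alternates.

    rho : belief over hypotheses, with rho[0] < 1
    p_u : p_u[h][y] = probability of observation y under hypothesis h
          for the fixed experiment u (all entries assumed positive)
    """
    rest = 1.0 - rho[0]
    tilde = [r / rest for r in rho[1:]]  # belief over alternate hypotheses
    total = 0.0
    for y, p1 in enumerate(p_u[0]):
        mix = sum(t * p_u[j + 1][y] for j, t in enumerate(tilde))
        total += p1 * math.log(p1 / mix)
    return total

def expected_confidence_gain(rho, p_u):
    """E[C_1(F(rho, u, Y)) - C_1(rho)] with Y drawn from p_1^u."""
    def conf(belief):
        return math.log(belief[0] / (1.0 - belief[0]))
    gain = 0.0
    for y, p1 in enumerate(p_u[0]):
        joint = [r * p_u[h][y] for h, r in enumerate(rho)]
        z = sum(joint)
        posterior = [j / z for j in joint]
        gain += p1 * (conf(posterior) - conf(rho))
    return gain

rho = [0.6, 0.2, 0.2]
p_u = [[0.8, 0.2], [0.3, 0.7], [0.5, 0.5]]  # hypothetical distributions
print(reward(rho, p_u), expected_confidence_gain(rho, p_u))  # values agree
```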

We use Dynamic Programming (DP) to characterize optimal solutions for this infinite-horizon problem. In this framework, it can be shown that the randomized strategies used in [3, 1, 2] asymptotically achieve optimal rate $J^{*}$ . Additionally, we identify a class of strategies that also achieve optimal rate and possibly, converge faster to the optimal rate than policies used in prior works.

Consider the following fixed point equation for the infinite horizon MDP

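$$J^{\prime}+w(\bm{\rho})=\max_{u}\Big\{r(\bm{\rho},u)+\sum_{y}p_{1}^{u}(y)w(F(\bm{\rho},u,y))\Big\},\tag{21}$$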

where $J^{\prime}\in{\mathbb{R}}$ is some constant and $w:\Delta{\mathcal{H}}\rightarrow{\mathbb{R}}$ is some mapping. If such $J^{\prime}$ and $w$ exist, then with some algebra (see [9] for details), we can conclude the following for any experiment selection strategy $g$ (possibly non-stationary)

[TABLE]

If we can show that

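$$\limsup_{N\to\infty}\frac{1}{N}({\mathbb{E}}^{g}w(\bm{\rho}(1))-{\mathbb{E}}^{g}w(\bm{\rho}(N+1)))\leq 0,\tag{24}$$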

for every strategy $g$ , then clearly the optimal rate $J^{*}\leq J^{\prime}$ . Additionally, if for some strategy $g^{*}$ ,

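$$\liminf_{N\to\infty}\frac{1}{N}({\mathbb{E}}^{g^{*}}w(\bm{\rho}(1))-{\mathbb{E}}^{g^{*}}w(\bm{\rho}(N+1)))=0$$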

is satisfied and the experiment selected by $g^{*}$ is a maximizer in the fixed point equation (21), then $g^{*}$ is indeed an optimal strategy and $J^{*}=J^{\prime}$ [9]. Our objective now is to find $J^{\prime}$ and a function $w$ that satisfy these conditions. We make the following assumption on the conditional distributions $p_{h}^{u}(y)$ .

Assumption 1.

There exists a constant $B>0$ such that $|\lambda_{j}^{i}(u,y)|<B$ for every experiment $u$ , observation $y$ and hypotheses $i,j\in\mathcal{H}$ , where

$$\lambda_{j}^{i}(u,y):=\log\frac{p_{i}^{u}(y)}{p_{j}^{u}(y)}.$$

We use the following defined quantities throughout our proofs. Let

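$$\bm{\alpha}^{*}:=\arg\max_{\bm{\alpha}\in\Delta\mathcal{U}}\min_{j\neq 1}\sum_{u}\alpha_{u}D(p_{1}^{u}||p_{j}^{u}),\qquad\bm{\beta}^{*}:=\arg\min_{\bm{\beta}\in\Delta\tilde{\mathcal{H}}}\max_{u\in\mathcal{U}}\sum_{j\neq 1}\beta_{j}D(p_{1}^{u}||p_{j}^{u}).$$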

Since the sets $\mathcal{U}$ and $\mathcal{H}$ are finite, existence of $\bm{\alpha}^{*}$ and $\bm{\beta}^{*}$ is guaranteed and also, by minimax theorem [10]

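$$\max_{\bm{\alpha}\in\Delta\mathcal{U}}\min_{j\neq 1}\sum_{u}\alpha_{u}D(p_{1}^{u}||p_{j}^{u})=\min_{\bm{\beta}\in\Delta\tilde{\mathcal{H}}}\max_{u\in\mathcal{U}}\sum_{j\neq 1}\beta_{j}D(p_{1}^{u}||p_{j}^{u})=R^{*}.$$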

We refer to the elements in the support of $\bm{\beta}^{*}$ as critical hypotheses and those in the support of $\bm{\alpha}^{*}$ as critical experiments. In particular, we show that the optimal rate $J^{*}=R^{*}$ .
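To make the max-min quantity concrete, the sketch below approximates $\bm{\alpha}^{*}$ and $R^{*}$ by a brute-force grid search. It assumes that $R^{*}$ is the value of $\max_{\bm{\alpha}}\min_{j\neq 1}\sum_{u}\alpha_{u}D(p_{1}^{u}||p_{j}^{u})$, and it handles only the special case of exactly two experiments; a linear program would be the usual choice in general. The distributions are hypothetical and list index 0 plays the role of hypothesis 1.

```python
import math

def kl(p, q):
    """KL divergence D(p || q); all entries of q are assumed positive."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def maxmin_two_experiments(p, steps=1000):
    """Grid approximation of (alpha*, R*) for exactly two experiments.

    p[u][h] is the observation distribution of experiment u under
    hypothesis h; hypothesis index 0 is the one being verified.
    """
    n_hyp = len(p[0])
    d = [[kl(p[u][0], p[u][j]) for j in range(1, n_hyp)] for u in range(2)]
    best_alpha, best_value = None, -math.inf
    for k in range(steps + 1):
        a = k / steps
        value = min(a * d[0][j] + (1 - a) * d[1][j] for j in range(n_hyp - 1))
        if value > best_value:
            best_alpha, best_value = (a, 1 - a), value
    return best_alpha, best_value

# Hypothetical setup: each experiment cannot separate hypothesis 0 from
# one of the two alternates, so both experiments are needed.
p = [
    [[0.8, 0.2], [0.2, 0.8], [0.8, 0.2]],  # experiment 0
    [[0.8, 0.2], [0.8, 0.2], [0.2, 0.8]],  # experiment 1
]
alpha_star, r_star = maxmin_two_experiments(p)
print(alpha_star, r_star)
```

In this symmetric example both experiments receive equal weight and both alternates are critical, since dropping either experiment would drive the minimum to zero.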

IV Dynamic Programming Solution

In this section, we solve the MDP formulated in Section III. Lemma IV.1 identifies a solution for the fixed point equation (21) and the subsequent Corollary IV.1 is used to obtain an upper bound on $J^{*}$ . We then show that this upper bound can indeed be achieved.

Lemma IV.1.

The fixed point equation (21) is satisfied with $J^{\prime}=R^{*}$ and

[TABLE]

Also, any critical experiment is a maximizer in the fixed point equation (21).

Proof.

Define $v(\bm{\rho}):=w(\bm{\rho})+\mathcal{C}_{1}(\bm{\rho})$ , that is

[TABLE]

Therefore, we have for every $u$

[TABLE]

This is because $r(\bm{\rho},u)$ is equal to the expected increase in the confidence level $\mathcal{C}_{1}(\bm{\rho})$ after performing the experiment $u$ . Hence,

[TABLE]

The last equality follows from the fact that $\bm{\beta}^{*}$ is a solution for the minimax problem and the minimax value is equal to $R^{*}$ . Therefore, $J^{\prime}$ and $w$ satisfy the fixed point equation (21). Note that any critical experiment $u$ is a maximizer in (37).∎

Corollary IV.1.

For any strategy $g$ , we have

[TABLE]

Proof.

This is simply because $\tilde{\rho}_{j}(N+1)\leq 1$ . ∎

Theorem IV.1.

The optimal average rate $J^{*}\leq R^{*}.$

Proof.

This directly follows from the fact that $w$ defined in Lemma IV.1 satisfies inequality (24) and with $J^{\prime}=R^{*}$ , the fixed point equation (21) is satisfied. ∎

Theorem IV.2.

The optimal average rate $J^{*}=R^{*}$ .

Proof.

It is sufficient to show that there exists a strategy $g^{*}$ that satisfies

[TABLE]

and the strategy $g^{*}$ selects only critical experiments. Let

[TABLE]

where $\bm{\mathrm{X}}_{j}(1)=\log\rho_{j}(1)$ . If $\bm{\mathrm{X}}_{j}(N+1)=x_{j}$ and $\tilde{\rho}_{j}(N+1)=\tilde{\rho}_{j}$ , we have

[TABLE]

Consider an open-loop randomized strategy where at each time, the experiment is selected independently using the distribution $\bm{\alpha}^{*}$ . Clearly, this strategy selects only critical experiments. Under this open-loop strategy, we have for any $j\neq 1$

[TABLE]

Notice that for every critical hypothesis $j$ , $R_{j}=R^{*}$ and for every non-critical alternate hypothesis, $R_{j}>R^{*}$ . This follows from the definition of $\bm{\alpha}^{*}$ . Further, we have

[TABLE]

As $N\to\infty$ , the term $\bm{\mathrm{X}}_{j}(1)/N\to 0$ and we can ignore it. Thus, for every critical hypothesis $j$ ,

[TABLE]

We can ignore the non-critical hypotheses because $\beta_{j}^{*}=0$ for non-critical hypotheses. If we can show that the second term approaches $-R^{*}$ as $N\to\infty$ , then clearly, the condition (40) is satisfied with equality. Using Strong Law of Large Numbers (SLLN) [11], we can conclude that for every alternate hypothesis $j$ ,

[TABLE]

with probability 1. We can use SLLN because of Assumption 1. Therefore,

[TABLE]

Further, because of Assumption 1, $\bm{\mathrm{X}}_{j}(N+1)/N$ is uniformly bounded by $B$ for every alternate hypothesis $j$ . Thus, using bounded convergence theorem [11], we have

[TABLE]

For the log sum exponential function, we have the following

[TABLE]

Therefore,

[TABLE]

Thus, the open-loop randomized policy $\bm{\alpha}^{*}$ is asymptotically optimal and $J^{*}=R^{*}$ . ∎

To summarize, the following conditions are sufficient for a stationary verification strategy $g$ to be asymptotically optimal:

1. The strategy $g$ only selects critical experiments, i.e. experiments from the support of $\bm{\alpha}^{*}$ .

2. The stability criterion in (40) is satisfied, i.e.

[TABLE]

These conditions suggest that there could be many strategies other than the open-loop randomized strategy used in Theorem IV.2 that achieve asymptotic optimality.

V Numerical Results

In this section, we propose a new heuristic based on a Kullback-Leibler divergence zero-sum game and demonstrate numerically that this heuristic’s performance is close to the maximum achievable confidence rate $R^{*}$ . We first briefly describe all the strategies used in our experiments.

V-1 Extrinsic Jensen-Shannon (EJS) Divergence

Extrinsic Jensen-Shannon divergence as a notion of information was first introduced in [7]. Using our notation, EJS for a query $u$ at some belief state $\bm{\rho}$ is given by

[TABLE]

where

[TABLE]

Notice that the only random variable in the expression above is $\bm{\mathrm{Y}}$ and the expectation is with respect to the distribution $\sum_{h\in\mathcal{H}}\rho_{h}p^{u}_{h}(y)$ on $\mathcal{Y}$ . The EJS heuristic selects the experiment $u$ that maximizes $EJS(\bm{\rho},u)$ for a given state $\bm{\rho}$ .

V-2 Open Loop Verification (OPE)

As discussed earlier, the strategies in [2, 1, 3] when specialized to verification are open-loop and randomized. According to this strategy, the queries are randomly selected independently in an open-loop manner from the distribution $\bm{\alpha}^{*}$ . Recall that this strategy is asymptotically optimal as shown in Theorem IV.2.
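A sketch of the open-loop rule: each experiment is drawn independently from a fixed distribution, with no dependence on the belief. The value used for $\bm{\alpha}^{*}$ and the function name `open_loop_experiment` are placeholders.

```python
import random

def open_loop_experiment(alpha, rng):
    """Draw one experiment index from the fixed distribution alpha."""
    return rng.choices(range(len(alpha)), weights=alpha, k=1)[0]

rng = random.Random(0)
alpha_star = [0.5, 0.5]  # hypothetical alpha*, not taken from the paper
draws = [open_loop_experiment(alpha_star, rng) for _ in range(10000)]
print(sum(draws) / len(draws))  # empirical frequency of experiment 1
```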

V-3 KL-divergence Zero-sum Game (KLZ)

We design the following heuristic. Consider a zero-sum game [10] in which the first player (maximizing) selects an experiment $u\in\mathcal{U}$ and the second player (minimizing) selects an alternate hypothesis $j\in\tilde{\mathcal{H}}$ . The payoff for this zero-sum game is the KL-divergence $D(p_{1}^{u}||p_{j}^{u})$ . The agent picks an experiment $u$ that maximizes

$$\sum_{j\in\tilde{\mathcal{H}}}\tilde{\rho}_{j}D(p_{1}^{u}||p_{j}^{u}).$$

This strategy can be interpreted as the first player’s best-response when the second player uses the mixed strategy $\tilde{{\rho}}_{j}$ to select an alternate hypothesis. Note that the mixed strategy $\bm{\alpha}^{*}$ used in OPE is an equilibrium strategy for the maximizing player.
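The best-response interpretation can be sketched in a few lines. The sketch assumes that the maximized quantity is the expected payoff $\sum_{j}\tilde{\rho}_{j}D(p_{1}^{u}||p_{j}^{u})$ when the alternate hypothesis is drawn from $\tilde{\bm{\rho}}$; the function name `klz_experiment` and the distributions are hypothetical, and list index 0 is the hypothesis being verified.

```python
import math

def kl(p, q):
    """KL divergence D(p || q); all entries of q are assumed positive."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def klz_experiment(rho, p):
    """Best response to the mixed strategy tilde_rho over alternates.

    rho     : belief over hypotheses, with rho[0] < 1
    p[u][h] : observation distribution of experiment u under hypothesis h
    """
    rest = 1.0 - rho[0]
    tilde = [r / rest for r in rho[1:]]

    def payoff(u):
        return sum(t * kl(p[u][0], p[u][j + 1]) for j, t in enumerate(tilde))

    return max(range(len(p)), key=payoff)

p = [
    [[0.8, 0.2], [0.2, 0.8], [0.8, 0.2]],  # experiment 0 separates alternate 1
    [[0.8, 0.2], [0.8, 0.2], [0.2, 0.8]],  # experiment 1 separates alternate 2
]
print(klz_experiment([0.9, 0.08, 0.02], p))  # 0
print(klz_experiment([0.9, 0.02, 0.08], p))  # 1
```

Unlike the open-loop rule, the choice tracks whichever alternate currently carries the most residual belief.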

V-A Simulation Setup

To simulate these heuristics, we first consider a simple setup with three hypotheses and two queries. The conditional distributions $p_{i}^{u}(y)$ for each of these queries are illustrated in Figure 3.

The queries are designed such that when $\bm{\mathrm{H}}=h_{0}$ , the agent is forced to make both queries $u^{1}$ and $u^{2}$ . This is because hypotheses $h_{0}$ and $h_{2}$ are indistinguishable under query $u^{1}$ and similarly, hypotheses $h_{0}$ and $h_{1}$ are indistinguishable under query $u^{2}$ . We illustrate the evolution of expected confidence rate $J_{N}$ under hypothesis $h_{0}$ in Figure 4. The heuristics EJS and KLZ come very close to the maximum achievable rate. OPE eventually achieves maximal rate but very slowly.

In the second experimental setup, we include two additional queries $u^{3}$ and $u^{4}$ characterized by the distributions in Figure 5. When $\bm{\mathrm{H}}=h_{0}$ , the queries $u^{3}$ and $u^{4}$ together can eliminate the alternate hypotheses at a much faster rate than $u^{1}$ and $u^{2}$ . Intuitively, this is because when the agent performs $u^{3}$ and observes $y=1$ , the belief on $h_{1}$ decreases drastically because $y=1$ is extremely unlikely under hypothesis $h_{1}$ . Similarly, $u^{4}$ is very effective in eliminating $h_{2}$ . The evolution of expected confidence rate under hypothesis $h_{0}$ with additional experiments $u^{3}$ and $u^{4}$ is shown in Figure 6. The heuristics KLZ and OPE select queries $u^{3}$ and $u^{4}$ under hypothesis $h_{0}$ . But the greedy heuristic EJS usually selects only $u^{1}$ and $u^{2}$ and fails to realize that queries $u^{3}$ and $u^{4}$ are more effective under hypothesis $h_{0}$ . The greedy EJS approach fails because queries $u^{3}$ and $u^{4}$ are constructed in such a way that they are optimal over longer horizons but are sub-optimal over shorter horizons. Thus, the assumption required for asymptotic optimality of EJS in [7] does not hold in this setup.

V-B Stopping Time Formulation

In [3, 1, 12], a stopping time formulation for hypothesis testing is considered. The sampling process stops when the belief on some hypothesis exceeds a threshold or equivalently, when the confidence $\mathcal{C}_{h}(\bm{\rho})>\log L$ , where $L$ is a parameter. Let this stopping time be $\bm{\mathrm{N}}$ . Under this stopping criterion, we numerically study the expected stopping time for all the strategies discussed. The plots in Figures 7 and 8 depict the quantity ${\mathbb{E}}[\bm{\mathrm{N}}]/\log L$ as a function of the parameter $L$ . Numerical results suggest that our heuristic performs better even in the stopping time formulation.
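The stopping rule can be simulated with a short Monte Carlo sketch. It uses the equivalence between $\mathcal{C}_{h}(\bm{\rho})>\log L$ and $\rho_{h}>L/(1+L)$. The distributions, the uniform prior, the uniformly random policy and the function name are illustrative assumptions and do not reproduce the paper's setup.

```python
import math
import random

def simulate_stopping_time(p, true_h, policy, L, rng, max_steps=100000):
    """Run experiments until some hypothesis has confidence above log L.

    p[u][h][y] : probability of observation y for experiment u under h
    policy     : function mapping the current belief to an experiment index
    Returns (stopping time, declared hypothesis).
    """
    n_hyp = len(p[0])
    rho = [1.0 / n_hyp] * n_hyp          # uniform prior (an assumption)
    belief_threshold = L / (1.0 + L)     # same event as C_h(rho) > log L
    for n in range(1, max_steps + 1):
        u = policy(rho)
        dist = p[u][true_h]
        y = rng.choices(range(len(dist)), weights=dist, k=1)[0]
        joint = [r * p[u][h][y] for h, r in enumerate(rho)]
        z = sum(joint)
        rho = [j / z for j in joint]
        for h, r in enumerate(rho):
            if r > belief_threshold:
                return n, h
    raise RuntimeError("threshold not reached within max_steps")

rng = random.Random(0)
p = [
    [[0.8, 0.2], [0.2, 0.8], [0.8, 0.2]],  # hypothetical experiment 0
    [[0.8, 0.2], [0.8, 0.2], [0.2, 0.8]],  # hypothetical experiment 1
]
policy = lambda rho: rng.randrange(len(p))  # uniformly random experiments
L = 100.0
times = [simulate_stopping_time(p, 0, policy, L, rng)[0] for _ in range(200)]
print(sum(times) / len(times) / math.log(L))  # estimate of E[N] / log L
```

Swapping in a different `policy` allows the same quantity to be compared across strategies.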

VI Conclusion

In this paper, we formulate the problem of quickly verifying a given hypothesis using observations from experiments as an infinite-horizon average-reward MDP. We characterize the optimal rate of this MDP using infinite-horizon dynamic programming. A stability criterion arises out of the DP equations. We show that any strategy that satisfies this stability criterion while selecting experiments from a critical set is asymptotically optimal. We propose a heuristic adaptive strategy and numerically demonstrate that it performs better than open-loop policies in the non-asymptotic regime. For future work, we intend to use this stability criterion, perhaps with additional penalty terms, to design strategies with better non-asymptotic performance.

Bibliography

[1] Sirin Nitinawarat, George K Atia, and Venugopal V Veeravalli, “Controlled sensing for multihypothesis testing,” IEEE Transactions on Automatic Control, vol. 58, no. 10, pp. 2451–2464, 2013.
[2] Mohammad Naghshvar, Tara Javidi, et al., “Active sequential hypothesis testing,” The Annals of Statistics, vol. 41, no. 6, pp. 2703–2738, 2013.
[3] Herman Chernoff, “Sequential design of experiments,” The Annals of Mathematical Statistics, vol. 30, no. 3, pp. 755–770, 1959.
[4] Mohammad Naghshvar, Tara Javidi, and Kamalika Chaudhuri, “Bayesian active learning with non-persistent noise,” IEEE Transactions on Information Theory, vol. 61, no. 7, pp. 4080–4098, 2015.
[5] Abraham Wald, Sequential analysis, Courier Corporation, 1973.
[6] Stuart Alan Bessler, Theory and applications of the sequential design of experiments, k-actions and infinitely many experiments, Department of Statistics, Stanford University, 1960.
[7] Mohammad Naghshvar and Tara Javidi, “Extrinsic Jensen-Shannon divergence with application in active hypothesis testing,” in Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on. IEEE, 2012, pp. 2191–2195.
[8] David W Hosmer Jr, Stanley Lemeshow, and Rodney X Sturdivant, Applied logistic regression, vol. 398, John Wiley & Sons, 2013.
