Sequential Experiment Design for Hypothesis Verification
Dhruva Kartik, Ashutosh Nayyar, Urbashi Mitra

D. Kartik, A. Nayyar and U. Mitra are with the Department of Electrical Engineering, University of Southern California, Los Angeles, CA 90089. This research was supported, in part, by the National Science Foundation under Grants CNS-1213128, CCF-1410009 and CPS-1446901, by ONR Grant N00014-15-1-2550, and by AFOSR Grant FA9550-12-1-0215.
Abstract
Hypothesis testing is an important problem with applications in target localization, clinical trials, etc. Many active hypothesis testing strategies operate in two phases: an exploration phase and a verification phase. In the exploration phase, experiments are selected so that a moderate level of confidence on the true hypothesis is achieved. Subsequent experiment design aims at improving the confidence level on this hypothesis to the desired level. In this paper, the focus is on the verification phase. A confidence measure is defined, and active hypothesis testing is formulated as a confidence maximization problem in an infinite-horizon average-reward Partially Observable Markov Decision Process (POMDP) setting. The problem of maximizing confidence conditioned on a particular hypothesis is referred to as the hypothesis verification problem. The relationship between the hypothesis testing and verification problems is established. The verification problem can be formulated as a Markov Decision Process (MDP). Optimal solutions for the verification MDP are characterized, and a simple heuristic adaptive strategy for verification is proposed based on a zero-sum game interpretation of Kullback-Leibler divergences. It is demonstrated through numerical experiments that, in some scenarios, the heuristic performs better than existing methods in the literature.
I Introduction
Hypothesis testing is a classical problem that has been addressed in various settings. The problem can be described qualitatively as follows. An agent is interested in a phenomenon and wants to determine which hypothesis from a known class the phenomenon conforms to. The agent can perform various experiments and, based on the observations from these experiments, it needs to infer the true hypothesis. Unlike in the one-shot hypothesis testing problem, an active agent can choose which experiment to perform based on past observations. The agent seeks to select experiments such that all false hypotheses are eliminated as quickly as possible.
Many active hypothesis testing strategies [1, 2] operate in two phases. The first phase is an exploration phase in which the experiment design is such that a moderate level of confidence is achieved on the true hypothesis. In most cases, this phase terminates in finite time almost surely [3]. The second is a verification phase in which the agent has a moderate level of confidence on some hypothesis and experiments are selected such that confidence on this hypothesis is improved to the desired level. When the desired confidence level is very high, the verification cost dominates the performance. In this paper, we make the notions of exploration and verification more formal and focus on analyzing the verification phase.
Active hypothesis testing finds applications in many areas such as sensor selection for target detection and localization, state tracking, design of clinical trials and learning unknown functions from queries [4]. Consequently, the verification phase plays an important role in all these applications.
We consider a slightly different mathematical formulation of hypothesis testing from those previously explored [1, 2]. Using the posterior belief on the set of hypotheses, we define a confidence level called the Bayesian log-likelihood ratio. The objective is to design an experiment selection strategy that maximizes the expected rate of increase in the confidence level. Our contributions in this paper can be summarized as follows:
1. We formulate the verification problem as an infinite-horizon average-reward Markov Decision Process (MDP) problem.
2. We characterize the optimal rate using infinite-horizon Dynamic Programming (DP).
3. We identify a set of critical experiments. We then show that any strategy that selects these experiments while satisfying a stability criterion is asymptotically optimal.
4. We design a new heuristic experiment selection strategy and numerically show that it achieves better performance than existing methods in some scenarios.
The rest of the paper is organized as follows. In Section I-A, we discuss the relation between our problem and those in prior works. Section II formulates the problem. Section III relates the problem to the MDP framework and defines critical experiments. In Section IV, we solve the DP and in Section V, we describe an adaptive strategy and numerically compare it with existing policies. We conclude the paper in Section VI.
I-A Prior Work
The simplest active hypothesis testing problem was first formulated by Chernoff [3], inspired by Wald's analysis of the sequential probability ratio test [5]. It has since been generalized in different ways depending on the target application [1, 2]. A major difference between our formulation and those in these works is the reward structure. Prior works consider a combination of the expected stopping time and the Bayesian error probability. Fixed-horizon problems have also been considered, in which the Bayesian error probability or the maximal error probability is minimized [1]. We instead define a notion of confidence and maximize the expected rate of increase in confidence over long horizons. In prior formulations, if the agent makes an error in guessing the true hypothesis, it incurs a cost of 1 (or some other constant) irrespective of its confidence level, whereas in our formulation the agent is rewarded for generating observations that result in a high confidence level on the true hypothesis. We believe that our formulation is related to the stopping-time formulation because of the strong similarity in the results. In [6, 3, 1, 2], the authors obtain asymptotically tight performance bounds and design asymptotically optimal policies. When the policies in these works are adapted to the verification problem defined herein, they turn out to be open-loop and randomized. A closed-loop policy was designed in [7], but it may not always be asymptotically optimal. In this paper, we design a more adaptive verification strategy and conjecture that it is asymptotically optimal.
I-B Notation
Random variables/vectors are denoted by upper case boldface letters and their realizations by the corresponding lower case letters. We use calligraphic fonts to denote sets (e.g., $\mathcal{U}$), and $\Delta\mathcal{U}$ is the probability simplex over a finite set $\mathcal{U}$. In general, subscripts are used as time indices. There are two exceptions ($\rho_i(t)$ and $C_i(t)$) to this convention, where the subscript $i$ denotes the hypothesis and $t$ denotes time. For time indices $t_1 \le t_2$, $\mathbf{Z}_{t_1:t_2}$ is shorthand for the variables $\mathbf{Z}_{t_1}, \mathbf{Z}_{t_1+1}, \dots, \mathbf{Z}_{t_2}$. For a strategy $g$, we use $\mathbb{P}^g[\cdot]$ and $\mathbb{E}^g[\cdot]$ to indicate that the probability and expectation depend on the choice of $g$. The Shannon entropy of a discrete distribution $p$ over a finite space $\mathcal{Y}$ is given by
$$H(p) = -\sum_{y \in \mathcal{Y}} p(y) \log p(y),$$
and the Kullback-Leibler divergence between distributions $p$ and $q$ over $\mathcal{Y}$ is given by
$$D(p \,\|\, q) = \sum_{y \in \mathcal{Y}} p(y) \log \frac{p(y)}{q(y)}.$$
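Both definitions translate directly into code. The following minimal Python sketch assumes natural logarithms and the usual conventions $0 \log 0 = 0$ and $D(p\|q) = \infty$ when $p$ puts mass where $q$ does not; the function names are ours, not the paper's.

```python
import math


def entropy(p):
    """Shannon entropy H(p) in nats of a finite distribution given as a list of probabilities."""
    # Outcomes with zero probability contribute nothing (convention 0 log 0 = 0).
    return -sum(pk * math.log(pk) for pk in p if pk > 0)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats between two finite distributions."""
    total = 0.0
    for pk, qk in zip(p, q):
        if pk == 0:
            continue  # 0 log(0 / q) = 0
        if qk == 0:
            return math.inf  # p puts mass where q does not
        total += pk * math.log(pk / qk)
    return total


print(entropy([0.5, 0.5]))                    # log 2, about 0.693
print(kl_divergence([0.8, 0.2], [0.2, 0.8]))  # 0.6 log 4, about 0.832
```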
II Problem Formulation
Let $\mathcal{X}$ be a finite set of hypotheses and let $\mathbf{X}$ be the true hypothesis. At each time $t$, the agent can perform an experiment $\mathbf{U}_t \in \mathcal{U}$ and obtain an observation $\mathbf{Y}_t \in \mathcal{Y}$. For simplicity, let us also assume that the sets $\mathcal{U}$ and $\mathcal{Y}$ are finite. When an experiment $u$ is performed for the $n$th time, the observation obtained is given by
[TABLE]
where the primitive random variables driving the observations form a collection of mutually independent and identically distributed random variables. The observation at time $t$ can be expressed as
[TABLE]
The probability of observing $y$ after performing an experiment $u$ under hypothesis $i$ is denoted by $p^u_i(y)$.
The information available at time $t$, denoted by $\mathbf{I}_t$, is the collection of all experiments performed and the corresponding observations before time $t$, i.e.,
[TABLE]
Actions of the agent at time $t$ can be functions of $\mathbf{I}_t$. Let the policy used for selecting the experiment $\mathbf{U}_t$ be $g_t$, i.e.,
[TABLE]
The sequence of all the policies, $g = (g_1, g_2, \dots)$, is referred to as a strategy. Let the collection of all such strategies be $\mathcal{G}$.
Using the available information, the agent forms a posterior belief $\boldsymbol{\rho}(t)$ on $\mathbf{X}$ at time $t$, which is given by
$$\rho_i(t) = \mathbb{P}^g[\mathbf{X} = i \mid \mathbf{I}_t], \quad i \in \mathcal{X}.$$
Definition II.1 (Bayesian Log-Likelihood Ratio).
The Bayesian log-likelihood ratio $C_i(t)$ associated with a hypothesis $i$ is defined as
$$C_i(t) := \log \frac{\rho_i(t)}{1 - \rho_i(t)}.$$
The Bayesian log-likelihood ratio (BLLR) is the logarithm of the ratio of the probability that hypothesis $i$ is true to the probability that it is not true. The BLLR is obtained by applying the logit function (also referred to as log-odds in statistics [8]) to the posterior belief $\rho_i(t)$. The logit function amplifies increments in $\rho_i(t)$ when $\rho_i(t)$ is close to 0 or 1. We can interpret the BLLR as a measure of confidence in hypothesis $i$, and thus we refer to it as the confidence level.
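The amplification of posterior increments near 0 and 1 is easy to see numerically. A minimal sketch (the function name `bllr` is ours) compares the confidence gained from the same posterior increment of 0.009 in the middle of the range and near 1:

```python
import math


def bllr(rho_i):
    """Confidence level C = log(rho_i / (1 - rho_i)), i.e. the logit of a posterior in (0, 1)."""
    # log1p(-rho_i) = log(1 - rho_i), computed accurately when rho_i is small.
    return math.log(rho_i) - math.log1p(-rho_i)


# The same +0.009 change in the posterior gives very different confidence gains:
gain_middle = bllr(0.509) - bllr(0.5)   # about 0.036
gain_edge = bllr(0.999) - bllr(0.99)    # about 2.31
print(f"{gain_middle:.3f} {gain_edge:.3f}")
```

Moving the posterior from 0.99 to 0.999 raises the confidence level roughly 60 times more than moving it from 0.5 to 0.509, which is why the BLLR is a natural scale for the verification phase.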
The objective is to design an experiment selection strategy such that the confidence level on the true hypothesis increases as quickly as possible. In other words, the total reward after acquiring $N$ observations is the average rate of increase in the confidence level on the true hypothesis and is given by
[TABLE]
More explicitly, we seek to design a strategy that maximizes the asymptotic expected reward which is defined as
[TABLE]
Henceforth, we refer to this problem as the Expected Confidence Maximization (ECM) problem for hypothesis testing. For a hypothesis $i$ and a strategy $g$, define $R_i(g)$ as
[TABLE]
The value $R_i(g)$ represents the performance of strategy $g$ conditioned on hypothesis $i$. Let
[TABLE]
For a given hypothesis $i$, we refer to the problem of maximizing $R_i(g)$ as the hypothesis verification problem. Let $g_i^*$ be an optimal verification strategy, i.e., it achieves the supremum in equation (10). We will later show that the existence of an optimal strategy is guaranteed under a mild assumption.
II-A Hypothesis Testing vs Hypothesis Verification
The optimal verification rate $R_i^*$ can be used to obtain an upper bound on the expected reward in the hypothesis testing problem.
Lemma II.1.
For any experiment selection strategy $g$, we have
[TABLE]
Proof.
For any strategy $g$, we have
[TABLE]
The last inequality follows from the definition of $R_i^*$. ∎
It is clear from the proof of Lemma II.1 that this upper bound is achieved by employing the strategy $g_i^*$ when hypothesis $i$ is true. However, the agent cannot use different strategies under different hypotheses because it does not know the true hypothesis $\mathbf{X}$. Therefore, we propose an experiment selection strategy of the following form; similar strategies have also been used in [2].
[TABLE]
where $\bar{\pi}$ is a constant threshold and $g^e$ is an exploration strategy. The interpretation of this strategy is that when the agent has a moderate level of confidence in some hypothesis $i$, it employs the corresponding verification strategy $g_i^*$ to verify whether hypothesis $i$ is indeed true by further improving its confidence level. When the agent is not very confident about any particular hypothesis, it employs the exploration strategy $g^e$, whose primary purpose is to ensure that the agent's confidence in some hypothesis eventually crosses the threshold $\bar{\pi}$. A naive exploration strategy selects every experiment uniformly at random; better exploration strategies exist [2, 7]. It remains to show that a strategy of this form can indeed achieve the upper bound in Lemma II.1. In this paper, we focus on the hypothesis verification problem and derive sufficient conditions for an experiment selection strategy to be an optimal verification strategy.
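The switching logic of this two-phase strategy can be sketched as a single selection function. This is an illustration under assumptions, not the paper's implementation: we assume the switch is triggered when the largest posterior exceeds a threshold, and `two_phase_action`, `verify` and `explore` are hypothetical names standing in for the combined strategy, the verification strategies $g_i^*$ and the exploration strategy.

```python
import random


def two_phase_action(rho, threshold, verify, explore):
    """Pick the next experiment from the posterior rho (a list indexed by hypothesis).

    If the most likely hypothesis i has posterior above `threshold`, run the
    verification policy verify(i, rho); otherwise run explore(rho).
    With threshold >= 1/2, at most one hypothesis can exceed it.
    """
    i_hat = max(range(len(rho)), key=lambda i: rho[i])
    if rho[i_hat] > threshold:
        return verify(i_hat, rho)
    return explore(rho)


# Hypothetical usage: two experiments, naive uniform exploration, and a placeholder
# verification policy that only reports which hypothesis it would verify.
experiments = ["u1", "u2"]
rng = random.Random(0)


def explore(rho):
    """Naive exploration (hypothetical): pick an experiment uniformly at random."""
    return rng.choice(experiments)


def verify(i, rho):
    """Placeholder verification policy (hypothetical)."""
    return f"verify hypothesis {i}"


print(two_phase_action([0.9, 0.05, 0.05], 0.8, verify, explore))  # verification phase
print(two_phase_action([0.4, 0.3, 0.3], 0.8, verify, explore))    # exploration phase
```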
III Markov Decision Process Formulation
In this section, we show that the verification problem can be formulated as an infinite-horizon average-reward MDP problem. All of the following analysis is for hypothesis $i = 0$ and, with slight abuse of notation, we henceforth refer to $R_0^*$ and $g_0^*$ as $R^*$ and $g^*$, respectively. The same analysis can be repeated for any other hypothesis $i$ to obtain similar results.
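The state of the MDP is the posterior belief $\boldsymbol{\rho}(t)$. The posterior belief is updated using Bayes' rule. Thus, if $\mathbf{U}_t = u$ and $\mathbf{Y}_t = y$, we have
$$\rho_j(t+1) = \frac{\rho_j(t)\, p^u_j(y)}{\sum_{k \in \mathcal{X}} \rho_k(t)\, p^u_k(y)}, \quad j \in \mathcal{X}. \tag{14}$$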
For convenience, we denote the Bayes’ update in (14) by
[TABLE]
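Conditioned on $\mathbf{X} = 0$, the observation $\mathbf{Y}_t$ is drawn from $p^{\mathbf{U}_t}_0$ independently of the past, so $\boldsymbol{\rho}(t+1)$ depends only on $\boldsymbol{\rho}(t)$, $\mathbf{U}_t$ and $\mathbf{Y}_t$. Clearly, the dynamics of this system are Markovian. The expected average confidence rate under a strategy $g$ is given by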
[TABLE]
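The instantaneous reward for this MDP is the expected instantaneous increase in the confidence level. By (14), observing $y$ after performing $u$ changes the confidence level on hypothesis 0 by $\log\big(p^u_0(y) / \sum_{j \in \tilde{\mathcal{X}}} \tilde{\rho}_j\, p^u_j(y)\big)$, so the instantaneous reward is given by
$$r(\boldsymbol{\rho}, u) := \mathbb{E}\big[C_0(t+1) - C_0(t) \,\big|\, \boldsymbol{\rho}(t) = \boldsymbol{\rho}, \mathbf{U}_t = u, \mathbf{X} = 0\big] = \sum_{y \in \mathcal{Y}} p^u_0(y) \log \frac{p^u_0(y)}{\sum_{j \in \tilde{\mathcal{X}}} \tilde{\rho}_j\, p^u_j(y)} = D\Big(p^u_0 \,\Big\|\, \sum_{j \in \tilde{\mathcal{X}}} \tilde{\rho}_j\, p^u_j\Big),$$
where $\tilde{\mathcal{X}} := \mathcal{X} \setminus \{0\}$ and $\tilde{\rho}_j := \rho_j / (1 - \rho_0)$ for $j \in \tilde{\mathcal{X}}$. Note that $\tilde{\boldsymbol{\rho}}$ is a probability distribution over the set of alternate hypotheses $\tilde{\mathcal{X}}$. Also, notice that $r(\boldsymbol{\rho}, u)$ is a KL-divergence between two distributions and hence is always non-negative. The objective is to find a strategy that maximizes the following average reward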
[TABLE]
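The identity between the expected one-step confidence increase and a KL-divergence can be checked numerically. The sketch below is ours: it implements the Bayes update and the mixture-KL reward for hypothesis 0, assumes all likelihoods are positive (as Assumption 1 guarantees), and uses a hypothetical binary experiment under which hypotheses 0 and 1 produce identical observations.

```python
import math


def bayes_update(rho, p_u, y):
    """Posterior after observing y from experiment u; p_u[i][y] is the likelihood under hypothesis i."""
    unnormalized = [r * p_u[i][y] for i, r in enumerate(rho)]
    z = sum(unnormalized)
    return [w / z for w in unnormalized]


def confidence0(rho):
    """C_0 = log rho_0 - log(sum of the alternate posteriors)."""
    return math.log(rho[0]) - math.log(sum(rho[1:]))


def instantaneous_reward(rho, p_u):
    """D(p_0^u || sum_{j != 0} rho_tilde_j p_j^u) with rho_tilde_j = rho_j / (1 - rho_0)."""
    rest = sum(rho[1:])
    mix = [sum(rho[j] / rest * p_u[j][y] for j in range(1, len(rho)))
           for y in range(len(p_u[0]))]
    return sum(p0 * math.log(p0 / m) for p0, m in zip(p_u[0], mix) if p0 > 0)


def expected_increase(rho, p_u):
    """Same quantity by brute force: E[C_0(t+1) - C_0(t)] with Y drawn from p_0^u."""
    c_now = confidence0(rho)
    return sum(p_u[0][y] * (confidence0(bayes_update(rho, p_u, y)) - c_now)
               for y in range(len(p_u[0])) if p_u[0][y] > 0)


# Hypothetical binary experiment under which hypotheses 0 and 1 look identical.
p_u = [[0.8, 0.2], [0.8, 0.2], [0.2, 0.8]]
rho = [0.5, 0.25, 0.25]
print(instantaneous_reward(rho, p_u), expected_increase(rho, p_u))  # both about 0.193
```

If the only remaining alternative is indistinguishable from hypothesis 0 under this experiment (for instance `rho = [0.5, 0.5, 0.0]`), the reward drops to zero, which is why no single experiment suffices in such a setting.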
We use Dynamic Programming (DP) to characterize optimal solutions for this infinite-horizon problem. In this framework, it can be shown that the randomized strategies used in [3, 1, 2] asymptotically achieve the optimal rate $R^*$. Additionally, we identify a class of strategies that also achieve the optimal rate and possibly converge to it faster than the policies used in prior works.
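Consider the following fixed point equation for the infinite-horizon MDP
$$\lambda + V(\boldsymbol{\rho}) = \max_{u \in \mathcal{U}} \mathbb{E}\big[C_0(t+1) - C_0(t) + V(\boldsymbol{\rho}(t+1)) \,\big|\, \boldsymbol{\rho}(t) = \boldsymbol{\rho}, \mathbf{U}_t = u, \mathbf{X} = 0\big], \tag{21}$$
where $\lambda$ is some constant and $V$ is some mapping. If such $\lambda$ and $V$ exist, then with some algebra (see [9] for details), we can conclude the following for any experiment selection strategy $g$ (possibly non-stationary)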
[TABLE]
If we can show that
[TABLE]
for every strategy $g$, then clearly the optimal rate $R^* \le \lambda$. Additionally, if for some strategy $g$,
[TABLE]
is satisfied and the experiment selected by $g$ is a maximizer in the fixed point equation (21), then $g$ is indeed an optimal strategy and $R^* = \lambda$ [9]. Our objective now is to find $\lambda$ and a function $V$ that satisfy these conditions. We make the following assumption on the conditional distributions $p^u_i$.
Assumption 1.
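There exists a constant $B > 0$ such that for every experiment $u$, observation $y$ and hypotheses $i, j \in \mathcal{X}$, $|\ell^u_{ij}(y)| \le B$, where
$$\ell^u_{ij}(y) := \log \frac{p^u_i(y)}{p^u_j(y)}.$$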
We use the following quantities throughout our proofs. Let
[TABLE]
Since the sets $\mathcal{U}$ and $\tilde{\mathcal{X}}$ are finite, the existence of $\alpha^*$ and $\beta^*$ is guaranteed and, by the minimax theorem [10],
[TABLE]
We refer to the elements in the support of $\beta^*$ as critical hypotheses and those in the support of $\alpha^*$ as critical experiments. We will show that the optimal rate $R^* = D^*$, the value of this minimax problem.
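The quantities $\alpha^*$ and $D^*$ can be approximated numerically. The sketch below assumes, consistent with the KL-divergence zero-sum game described in Section V, that the payoff of experiment $u$ against alternate hypothesis $j$ is $D(p^u_0 \| p^u_j)$. It uses multiplicative weights (Hedge) for the experiment player against a best-responding hypothesis player, a standard way to approximate a maximin strategy; an exact linear program would work equally well. The distributions in the example are hypothetical, not those of the paper's figures.

```python
import math


def kl(p, q):
    """D(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def verification_game(payoff, iters=20000, eta=0.01):
    """Approximately solve max_alpha min_j sum_u alpha(u) * payoff[u][j].

    Rows are experiments and columns are alternate hypotheses. The row player runs
    Hedge while the column player best-responds; the time-averaged row strategy
    approximates alpha*, and `value` is a lower bound that approaches D*.
    """
    n_u, n_j = len(payoff), len(payoff[0])
    log_w = [0.0] * n_u
    avg_alpha = [0.0] * n_u
    for _ in range(iters):
        top = max(log_w)
        w = [math.exp(v - top) for v in log_w]
        total = sum(w)
        alpha = [x / total for x in w]
        # Best response: the alternate hypothesis hardest to rule out under alpha.
        j = min(range(n_j), key=lambda k: sum(alpha[u] * payoff[u][k] for u in range(n_u)))
        for u in range(n_u):
            avg_alpha[u] += alpha[u] / iters
            log_w[u] += eta * payoff[u][j]
    value = min(sum(avg_alpha[u] * payoff[u][k] for u in range(n_u)) for k in range(n_j))
    return value, avg_alpha


# Hypothetical example: p[u][i] is the observation distribution of experiment u under
# hypothesis i. Hypotheses 0 and 1 coincide under u1; 0 and 2 coincide under u2.
p = [
    [[0.8, 0.2], [0.8, 0.2], [0.2, 0.8]],  # u1
    [[0.8, 0.2], [0.2, 0.8], [0.8, 0.2]],  # u2
]
payoff = [[kl(p[u][0], p[u][j]) for j in (1, 2)] for u in range(2)]
value, alpha = verification_game(payoff)
print(round(value, 3), [round(a, 2) for a in alpha])  # value near 0.6 * log(4) / 2
```

Here both experiments receive weight close to 1/2, so both are critical, and the value is close to half of $0.6 \log 4$. With finitely many iterations the support of the averaged strategy is only approximate, so small weights should be read with a tolerance.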
IV Dynamic Programming Solution
In this section, we solve the MDP formulated in Section III. Lemma IV.1 identifies a solution of the fixed point equation (21), and Corollary IV.1 is then used to obtain an upper bound on $R^*$. We then show that this upper bound can indeed be achieved.
Lemma IV.1.
The fixed point equation (21) is satisfied with $\lambda = D^*$ and
[TABLE]
Also, any critical experiment is a maximizer in the fixed point equation (21).
Proof.
Define , that is
[TABLE]
Therefore, we have for every
[TABLE]
This is because equal to the expected increase in the confidence level after performing the experiment . Hence,
[TABLE]
The last equality follows from the fact that $(\alpha^*, \beta^*)$ is a solution of the minimax problem and the minimax value is equal to $D^*$. Therefore, $\lambda = D^*$ and $V$ satisfy the fixed point equation (21). Note that any critical experiment is a maximizer in (37). ∎
Corollary IV.1.
For any strategy $g$, we have
[TABLE]
Proof.
This is simply because . ∎
Theorem IV.1.
The optimal average rate $R^* \le D^*$.
Proof.
This follows directly from the fact that $V$ defined in Lemma IV.1 satisfies inequality (24) and that, with $\lambda = D^*$, the fixed point equation (21) is satisfied. ∎
Theorem IV.2.
The optimal average rate $R^* = D^*$.
Proof.
It is sufficient to show that there exists a strategy that satisfies
[TABLE]
and the strategy selects only critical experiments. Let
[TABLE]
where . If and , we have
[TABLE]
Consider an open-loop randomized strategy in which, at each time, the experiment is selected independently according to the distribution $\alpha^*$. Clearly, this strategy selects only critical experiments. Under this open-loop strategy, we have for any
[TABLE]
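Notice that for every critical hypothesis $j$, $\sum_{u \in \mathcal{U}} \alpha^*(u)\, D(p^u_0 \| p^u_j) = D^*$, and for every non-critical alternate hypothesis $j$, $\sum_{u \in \mathcal{U}} \alpha^*(u)\, D(p^u_0 \| p^u_j) \ge D^*$. This follows from the definition of $(\alpha^*, \beta^*)$. Further, we have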
[TABLE]
As , the term and we can ignore it. Thus, for every critical hypothesis ,
[TABLE]
We can ignore the non-critical hypotheses because for non-critical hypotheses. If we can show that the second term approaches as , then clearly, the condition (40) is satisfied with equality. Using Strong Law of Large Numbers (SLLN) [11], we can conclude that for every alternate hypothesis ,
[TABLE]
with probability 1. We can use SLLN because of Assumption 1. Therefore,
[TABLE]
Further, because of Assumption 1, is uniformly bounded by for every alternate hypothesis . Thus, using bounded convergence theorem [11], we have
[TABLE]
For the log sum exponential function, we have the following
[TABLE]
Therefore,
[TABLE]
Thus, the open-loop randomized policy is asymptotically optimal and $R^* = D^*$. ∎
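Arguments of this kind rest on the elementary log-sum-exp sandwich $\max_k x_k \le \log\sum_{k=1}^n e^{x_k} \le \max_k x_k + \log n$. Since $C_0(t) = \log\rho_0(t) - \log\sum_{j\neq 0}\rho_j(t)$ is itself a log-sum-exp of log-posteriors, the same function also gives a numerically stable way to compute confidence levels that are far too large to obtain from $\rho_0(t)$ directly. A minimal sketch with our own function names:

```python
import math


def logsumexp(xs):
    """log(sum(exp(x))) computed without overflow or underflow."""
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))


def confidence_from_log_posterior(log_rho, i=0):
    """C_i = log rho_i - log sum_{j != i} rho_j.

    log_rho may be unnormalized (shifted by a common constant); C_i is unaffected.
    """
    others = [v for j, v in enumerate(log_rho) if j != i]
    return log_rho[i] - logsumexp(others)


# Near-certain belief on hypothesis 0: 1 - rho_0 is about 4e-31, below double-precision
# resolution near 1, so log(rho_0 / (1 - rho_0)) cannot be formed from rho_0 directly.
log_rho = [0.0, -70.0, -75.0]
print(confidence_from_log_posterior(log_rho))  # about 69.99

# The sandwich max(x) <= logsumexp(x) <= max(x) + log(n):
xs = [-70.0, -75.0]
print(max(xs) <= logsumexp(xs) <= max(xs) + math.log(len(xs)))  # True
```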
To summarize, the following conditions are sufficient for a stationary verification strategy to be asymptotically optimal:
1. The strategy only selects critical experiments, i.e., experiments from the support of $\alpha^*$.
2. The stability criterion in (40) is satisfied, i.e.,
[TABLE]
These conditions suggest that there could be many strategies other than the open-loop randomized strategy used in Theorem IV.2 that achieve asymptotic optimality.
V Numerical Results
In this section, we propose a new heuristic based on a Kullback-Leibler divergence zero-sum game and demonstrate numerically that its performance is close to the maximum achievable confidence rate $D^*$. We first briefly describe all the strategies used in our experiments.
V-1 Extrinsic Jensen-Shannon (EJS) Divergence
The Extrinsic Jensen-Shannon (EJS) divergence was first introduced as a notion of information in [7]. In our notation, the EJS of a query $u$ at a belief state $\boldsymbol{\rho}$ is given by
[TABLE]
where
[TABLE]
Notice that the only random variable in the expression above is $\mathbf{Y}$, and the expectation is with respect to the distribution of $\mathbf{Y}$. The EJS heuristic selects the experiment $u$ that maximizes $\mathrm{EJS}(\boldsymbol{\rho}, u)$ for the given state $\boldsymbol{\rho}$.
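The greedy EJS rule can be sketched in a few lines. The sketch assumes the closed form of EJS from [7], $\mathrm{EJS}(\boldsymbol{\rho}, u) = \sum_i \rho_i\, D\big(p^u_i \,\|\, \sum_{j \neq i} \frac{\rho_j}{1-\rho_i} p^u_j\big)$; the function names and the two example experiments are hypothetical.

```python
import math


def kl(p, q):
    """D(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def ejs(rho, p_u):
    """EJS of one experiment at belief rho, in the closed form
    sum_i rho_i * D(p_i^u || sum_{j != i} rho_j / (1 - rho_i) * p_j^u)."""
    total = 0.0
    for i, r_i in enumerate(rho):
        rest = sum(r for j, r in enumerate(rho) if j != i)  # 1 - rho_i
        if r_i == 0 or rest == 0:
            continue  # zero weight, or no alternatives left to compare against
        mix = [sum(rho[j] * p_u[j][y] for j in range(len(rho)) if j != i) / rest
               for y in range(len(p_u[i]))]
        total += r_i * kl(p_u[i], mix)
    return total


def ejs_action(rho, p):
    """Greedy EJS rule: the experiment with the largest EJS at the current belief."""
    return max(range(len(p)), key=lambda u: ejs(rho, p[u]))


# Hypothetical experiments: u0 is uninformative, u1 separates hypothesis 2 from the rest.
p = [
    [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],
    [[0.8, 0.2], [0.8, 0.2], [0.2, 0.8]],
]
rho = [0.5, 0.3, 0.2]
print([round(ejs(rho, p_u), 4) for p_u in p], ejs_action(rho, p))
```

Because the rule maximizes a one-step quantity, it can miss experiments whose value only shows up over longer horizons, which is the failure mode examined in the second simulation setup.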
V-2 Open Loop Verification (OPE)
As discussed earlier, the strategies in [2, 1, 3], when specialized to verification, are open-loop and randomized. Under this strategy, the queries are selected independently, in an open-loop manner, from the distribution $\alpha^*$. Recall that this strategy is asymptotically optimal, as shown in Theorem IV.2.
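The following Monte Carlo sketch estimates the expected average confidence rate of OPE under hypothesis 0. It assumes a uniform prior, a hypothetical symmetric pair of experiments for which $\alpha^* = (1/2, 1/2)$ and $D^* = 0.3 \log 4$, and tracks unnormalized log-posteriors to avoid underflow; all names are ours.

```python
import math
import random


def logsumexp(xs):
    top = max(xs)
    return top + math.log(sum(math.exp(x - top) for x in xs))


def average_confidence_rate(p, select, horizon, runs, seed=0):
    """Monte Carlo estimate of E[(C_0(horizon) - C_0(0)) / horizon] under hypothesis 0.

    p[u][i][y]: likelihood of observation y for experiment u under hypothesis i.
    select(log_rho, rng) -> index of the next experiment.
    """
    rng = random.Random(seed)
    n_h = len(p[0])
    total = 0.0
    for _ in range(runs):
        log_rho = [0.0] * n_h  # uniform prior, up to an additive constant
        c_start = log_rho[0] - logsumexp(log_rho[1:])
        for _ in range(horizon):
            u = select(log_rho, rng)
            y = rng.choices(range(len(p[u][0])), weights=p[u][0])[0]  # Y ~ p_0^u
            log_rho = [lr + math.log(p[u][i][y]) for i, lr in enumerate(log_rho)]
        total += (log_rho[0] - logsumexp(log_rho[1:]) - c_start) / horizon
    return total / runs


p = [
    [[0.8, 0.2], [0.8, 0.2], [0.2, 0.8]],  # u1: hypotheses 0 and 1 coincide
    [[0.8, 0.2], [0.2, 0.8], [0.8, 0.2]],  # u2: hypotheses 0 and 2 coincide
]
alpha_star = [0.5, 0.5]  # maximin distribution for this symmetric example


def ope(log_rho, rng):
    """Open-loop rule: ignore the belief and sample an experiment from alpha*."""
    return rng.choices(range(len(alpha_star)), weights=alpha_star)[0]


rate = average_confidence_rate(p, ope, horizon=1000, runs=100)
print(round(rate, 3), "vs D* =", round(0.3 * math.log(4), 3))
```

At a horizon of 1000 the estimate typically sits slightly below $D^*$: the confidence level is governed by the slower of the two log-likelihood ratio walks, and that finite-horizon gap shrinks as the horizon grows.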
V-3 KL-divergence Zero-sum Game (KLZ)
We design the following heuristic. Consider a zero-sum game [10] in which the first (maximizing) player selects an experiment $u$ and the second (minimizing) player selects an alternate hypothesis $j \in \tilde{\mathcal{X}}$. The payoff of this zero-sum game is the KL-divergence $D(p^u_0 \| p^u_j)$. The agent picks an experiment that maximizes
$$\sum_{j \in \tilde{\mathcal{X}}} \tilde{\rho}_j(t)\, D(p^u_0 \,\|\, p^u_j), \qquad \tilde{\rho}_j(t) = \frac{\rho_j(t)}{1 - \rho_0(t)}.$$
This strategy can be interpreted as the first player's best response when the second player uses the mixed strategy $\tilde{\boldsymbol{\rho}}(t)$ to select an alternate hypothesis. Note that the mixed strategy $\alpha^*$ used in OPE is an equilibrium strategy for the maximizing player.
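The KLZ rule is a one-line best response once the payoffs are known. In this minimal sketch (names and example distributions are hypothetical), the opponent's mixed strategy is the normalized posterior over the alternate hypotheses, so KLZ targets whichever alternative currently looks most threatening:

```python
import math


def kl(p, q):
    """D(p || q) in nats; assumes q > 0 wherever p > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def klz_action(rho, p):
    """KLZ rule for verifying hypothesis 0.

    The opponent picks alternate hypothesis j with probability rho_j / (1 - rho_0);
    return the experiment u maximizing sum_j rho_tilde_j * D(p_0^u || p_j^u).
    """
    rest = sum(rho[1:])

    def score(u):
        return sum(rho[j] / rest * kl(p[u][0], p[u][j]) for j in range(1, len(rho)))

    return max(range(len(p)), key=score)


p = [
    [[0.8, 0.2], [0.8, 0.2], [0.2, 0.8]],  # u1: cannot separate 0 from 1
    [[0.8, 0.2], [0.2, 0.8], [0.8, 0.2]],  # u2: cannot separate 0 from 2
]
print(klz_action([0.5, 0.4, 0.1], p))  # 1: hypothesis 1 is the main rival, so use u2
print(klz_action([0.5, 0.1, 0.4], p))  # 0: hypothesis 2 is the main rival, so use u1
```

Unlike OPE, which samples from the fixed distribution $\alpha^*$, this rule reacts to the current belief, and unlike EJS, its payoffs are the per-experiment divergences that determine the asymptotic rate.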
V-A Simulation Setup
To simulate these heuristics, we first consider a simple setup with three hypotheses and two queries. The conditional distributions for each of these queries are illustrated in Figure 3.
The queries are designed such that when $\mathbf{X} = 0$, the agent is forced to use both queries $u_1$ and $u_2$. This is because hypotheses 0 and 1 are indistinguishable under query $u_1$ and, similarly, hypotheses 0 and 2 are indistinguishable under query $u_2$. We illustrate the evolution of the expected confidence rate under hypothesis 0 in Figure 4. The heuristics EJS and KLZ come very close to the maximum achievable rate. OPE eventually achieves the maximal rate, but very slowly.
In the second experimental setup, we include two additional queries $u_3$ and $u_4$ characterized by the distributions in Figure 5. When $\mathbf{X} = 0$, the queries $u_3$ and $u_4$ together can eliminate the alternate hypotheses at a much faster rate than $u_1$ and $u_2$. Intuitively, this is because when the agent performs $u_3$ and observes a particular outcome, the belief on one of the alternate hypotheses decreases drastically, since that outcome is extremely unlikely under that hypothesis. Similarly, $u_4$ is very effective in eliminating the other alternate hypothesis. The evolution of the expected confidence rate under hypothesis 0 with the additional experiments $u_3$ and $u_4$ is shown in Figure 6. The heuristics KLZ and OPE select queries $u_3$ and $u_4$ under hypothesis 0, but the greedy heuristic EJS usually selects only $u_1$ and $u_2$ and fails to realize that queries $u_3$ and $u_4$ are more effective under hypothesis 0. The greedy EJS approach fails because queries $u_3$ and $u_4$ are constructed in such a way that they are optimal over longer horizons but suboptimal over shorter horizons. Thus, the assumption required for the asymptotic optimality of EJS in [7] does not hold in this setup.
V-B Stopping Time Formulation
In [3, 1, 12], a stopping-time formulation of hypothesis testing is considered. The sampling process stops when the belief on some hypothesis exceeds a threshold or, equivalently, when the confidence level on that hypothesis exceeds a corresponding threshold determined by a parameter. Let this stopping time be $\tau$. Under this stopping criterion, we numerically study the expected stopping time for all the strategies discussed. The plots in Figures 7 and 8 show how the expected stopping time $\mathbb{E}[\tau]$ behaves as a function of this parameter. Numerical results suggest that our heuristic performs better even in the stopping-time formulation.
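The equivalence between a belief threshold and a confidence threshold, and the resulting stopping time, can be sketched as follows. We assume the belief threshold is written as $\rho_i(t) \ge 1 - \delta$, so that the confidence threshold is $\log((1-\delta)/\delta)$; the parameter name $\delta$, the helper names and the example distributions are ours. As a simplification of the rule in the text, the simulation only checks hypothesis 0, which is the true hypothesis being verified.

```python
import math
import random


def confidence_threshold(delta):
    """rho_i >= 1 - delta  <=>  C_i >= log((1 - delta) / delta)."""
    return math.log((1.0 - delta) / delta)


def stopping_time(p, select, delta, rng, max_steps=100000):
    """Steps until C_0 first reaches the threshold when hypothesis 0 is true.

    p[u][i][y]: likelihoods; select(log_rho, rng) -> experiment index.
    Beliefs are unnormalized log-posteriors starting from a uniform prior.
    """
    target = confidence_threshold(delta)
    log_rho = [0.0] * len(p[0])
    for t in range(1, max_steps + 1):
        u = select(log_rho, rng)
        y = rng.choices(range(len(p[u][0])), weights=p[u][0])[0]  # Y ~ p_0^u
        log_rho = [lr + math.log(p[u][i][y]) for i, lr in enumerate(log_rho)]
        others = log_rho[1:]
        top = max(others)
        c0 = log_rho[0] - (top + math.log(sum(math.exp(v - top) for v in others)))
        if c0 >= target:
            return t
    return max_steps


p = [
    [[0.8, 0.2], [0.8, 0.2], [0.2, 0.8]],
    [[0.8, 0.2], [0.2, 0.8], [0.8, 0.2]],
]


def uniform(log_rho, rng):
    """OPE for this symmetric example, where alpha* is uniform."""
    return rng.randrange(len(p))


rng = random.Random(1)
for delta in (1e-2, 1e-4, 1e-6):
    mean_tau = sum(stopping_time(p, uniform, delta, rng) for _ in range(200)) / 200
    print(delta, round(mean_tau, 1), round(mean_tau / math.log(1 / delta), 2))
```

For a rule that achieves the rate $D^*$, one would expect the last column to settle roughly near $1/D^*$ as $\delta$ shrinks, since the confidence level grows at about that rate.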
VI Conclusion
In this paper, we formulate the problem of quickly verifying a given hypothesis using observations from experiments as an infinite-horizon average-reward MDP. We characterize the optimal rate of this MDP using infinite-horizon dynamic programming. A stability criterion arises from the DP equations, and we show that any strategy that satisfies this stability criterion while selecting experiments from a critical set is asymptotically optimal. We propose a heuristic adaptive strategy and numerically demonstrate that it performs better than open-loop policies in the non-asymptotic regime. In future work, we intend to use this stability criterion, perhaps with additional penalty terms, to design strategies with better non-asymptotic performance.
[1] Sirin Nitinawarat, George K. Atia, and Venugopal V. Veeravalli, “Controlled sensing for multihypothesis testing,” IEEE Transactions on Automatic Control, vol. 58, no. 10, pp. 2451–2464, 2013.
[2] Mohammad Naghshvar, Tara Javidi, et al., “Active sequential hypothesis testing,” The Annals of Statistics, vol. 41, no. 6, pp. 2703–2738, 2013.
[3] Herman Chernoff, “Sequential design of experiments,” The Annals of Mathematical Statistics, vol. 30, no. 3, pp. 755–770, 1959.
[4] Mohammad Naghshvar, Tara Javidi, and Kamalika Chaudhuri, “Bayesian active learning with non-persistent noise,” IEEE Transactions on Information Theory, vol. 61, no. 7, pp. 4080–4098, 2015.
[5] Abraham Wald, Sequential Analysis, Courier Corporation, 1973.
[6] Stuart Alan Bessler, Theory and Applications of the Sequential Design of Experiments, k-Actions and Infinitely Many Experiments, Department of Statistics, Stanford University, 1960.
[7] Mohammad Naghshvar and Tara Javidi, “Extrinsic Jensen-Shannon divergence with application in active hypothesis testing,” in 2012 IEEE International Symposium on Information Theory Proceedings (ISIT), IEEE, 2012, pp. 2191–2195.
[8] David W. Hosmer Jr., Stanley Lemeshow, and Rodney X. Sturdivant, Applied Logistic Regression, vol. 398, John Wiley & Sons, 2013.
