Rates of adaptive group testing in the linear regime
Matthew Aldridge

TL;DR
This paper analyzes an adaptive group testing algorithm in the linear regime, achieving high information rates for identifying defective items with fewer tests, especially when defectives are less than half of the total.
Contribution
It provides a detailed analysis of a generalized binary splitting algorithm, demonstrating near-optimal testing rates in the linear regime for both zero-error and small-error scenarios.
Findings
Achieves over 0.9 bits/test for zero-error testing
Achieves over 0.95 bits/test for small-error testing
Effective when fewer than half the items are defective
Abstract
We consider adaptive group testing in the linear regime, where the number of defective items scales linearly with the number of items. We analyse an algorithm based on generalized binary splitting. Provided fewer than half the items are defective, we achieve rates of over 0.9 bits per test for combinatorial zero-error testing, and over 0.95 bits per test for probabilistic small-error testing.
Rates of Adaptive Group Testing
in the Linear Regime
Matthew Aldridge
School of Mathematics
University of Leeds
Email: [email protected]
I Introduction
Group testing is this problem: Given a collection of items some of which are defective, how many pooled tests are required to recover the defective set? A pooled test is performed on some subset of the items: the test is negative if all items in the test are nondefective, and is positive if at least one item in the test is defective.
In Dorfman’s original work [1], the application was to test men enlisting into the U.S. army for syphilis using a blood test. Dorfman noted that testing pools of mixed blood samples could use fewer tests than testing each blood sample individually. The test result from such a pool should be negative if every blood sample in the pool is free of the disease, while the test result should be positive if at least one of the blood samples is contaminated.
Different group testing models are discussed in the recent survey [2]. The most important distinction is between:
- Adaptive testing, where the items placed in a test can depend on the results of previous tests.
- Nonadaptive testing, where all the tests are designed in advance.
This paper concerns adaptive testing, and will examine some cases where adaptive group testing provides large improvements over the nonadaptive case.
Another consideration is how many defective items there are. In this paper, we consider the linear regime, where the number of defective items $k$ is a constant proportion of the $n$ items. A lot of group testing work has concerned the very sparse regime where $k$ is constant as $n \to \infty$ [3, 4, 5] or the sparse regime $k = \Theta(n^\alpha)$ as $n \to \infty$ for some $\alpha < 1$ [6, 7, 8]. However, we argue that the linear regime is more appropriate for many applications. For example, in Dorfman’s original set-up, we might expect each person joining the army to have a similar prior probability $p$ of having the disease, and that this probability should remain roughly constant as more people join, rather than tending towards $0$; thus one expects $k$ to grow linearly with $n$.
For group testing in the linear regime, two cases have received most consideration in the literature:
- Combinatorial zero-error testing: The defective set is any subset of $\{1, \dots, n\}$ with given size $k$, and one wishes to find the defective set with certainty, whichever such set it is. One assumes that $k/n$ tends to a constant $p$ as $n \to \infty$. [9, 10, 11]
- Probabilistic small-error testing: We assume each item is defective with probability $p$, independent of all other items, where $p$ stays fixed as $n \to \infty$. We want to find the defective set with arbitrarily small error probability (averaged over the random defective set). [12, 13, 14, 15]
For group testing in the linear regime, it is easy to see that the optimal scaling is for the number of tests $T$ to scale linearly with $n$. A simple counting bound (see, for example, [6]) shows that we require, to leading order, $T \geq H(p)n$ for large enough $n$, where $H(p) = -p \log_2 p - (1-p) \log_2 (1-p)$ is the binary entropy. Meanwhile, testing each item individually requires $T = n$ tests, and succeeds with certainty. (In the combinatorial case, $T = n - 1$ suffices, as the status of the final item can be inferred from whether $k$ or $k - 1$ defective items have already been discovered from individual tests.) The goal of this paper is to analyse algorithms that require a number of tests very close to the lower bound $H(p)n$.
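As a rough numerical illustration of the gap between these two benchmarks, the sketch below compares the leading-order counting bound $H(p)n$ with the $n$ tests of individual testing. The function names and the example values of `n` and `p` are illustrative assumptions, not taken from the paper.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def counting_bound(n: int, p: float) -> float:
    """Leading-order lower bound H(p) * n on the number of tests."""
    return binary_entropy(p) * n


if __name__ == "__main__":
    n = 1000  # hypothetical number of items
    for p in (0.05, 0.1, 0.25, 0.5):
        print(f"p={p}: counting bound ~{counting_bound(n, p):.0f} tests, "
              f"individual testing {n} tests")
```

The gap between the two columns is largest for small $p$, which is where group testing has the most room to improve on individual testing.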
In the sparse regime $k = \Theta(n^\alpha)$ for $\alpha < 1$, it is known that adaptive testing achieves the counting bound, for both small-error and zero-error criteria, using the generalised binary splitting algorithm of Hwang [16, 17, 6].
For nonadaptive testing in the linear regime, it is well known that individual testing is optimal for all $p$ in the combinatorial zero-error case [3, 18, 11, 17], and it was recently shown that this is also the case for probabilistic small-error testing [15]. Thus, for small $p$, the benefit provided by the adaptive algorithms of this paper will be considerable.
Adaptive group testing in the linear regime has received some attention in the literature. The main point of study has been the question of when individual testing is optimal or not. In the combinatorial zero-error case, Riccio and Colbourn [10] showed that individual testing cannot be improved on for $p \geq 1 - \log_3 2 \approx 0.369$, and it is conjectured that this holds for $p \geq 1/3$ [9]. In the probabilistic case, if one considers the average number of tests required, Ungar [12] showed that individual testing cannot be improved on for
$$p \geq \frac{3 - \sqrt{5}}{2} \approx 0.382.$$
In the linear regime, we are not aware of any work that has aimed to get a number of tests close to optimal over the whole range of $p$, as we do here. (Zaman and Pippenger [19] do consider this in the limit as $p \to 0$.) Another novelty of ours is that we analyse small-error behaviour, not just average-case behaviour, which allows a direct comparison to nonadaptive results.
The goal of this paper is to achieve performance that is close to optimal for adaptive testing under both the zero-error and small-error criteria. We do this using an algorithm similar to that of Hwang [16] (see Algorithm 5), and examining both its worst-case and average-case behaviour. Recall that the counting bound tells us we require at least $H(p)n$ tests. Our main results are the following, which show very close to optimal performance:
- In the zero-error case, we give an algorithm that uses fewer than $H(p)n/0.9 \approx 1.11\,H(p)n$ tests for all $p < 1/2$. (Theorem 2)
- In the small-error case, we give an algorithm that uses fewer than $H(p)n/0.95 \approx 1.05\,H(p)n$ tests for all $p < 1/2$. (Theorem 3)
II Definitions and main results
We propose two figures of merit for assessing group testing in the linear regime.
First we have the aspect ratio $A = T/n$ (as considered by, for example, [14]). We want the aspect ratio to be as small as possible. Individual testing achieves $A = 1$, while the counting bound tells us that we must have $A \geq H(p)$.
Second, we have the rate $R = H/T$, where $H$ is the entropy of the defective set (as considered by [6, 7, 20] and many others). Since $H$ is the number of bits required to define the defective set, we can think of the rate as the average number of bits of information learned per test. For combinatorial testing in the linear regime we have
$$H = \log_2 \binom{n}{k} \sim nH(p)$$
asymptotically, while for probabilistic testing $H = nH(p)$ exactly. Hence we can define the rate to be
$$R = \frac{nH(p)}{T} = \frac{H(p)}{A}.$$
We want the rate to be as big as possible. Individual testing achieves $R = H(p)$, while the counting bound tells us that we must have $R \leq 1$.
As a rule of thumb, we recommend the aspect ratio for measuring how much better an algorithm is than individual testing, and recommend the rate for measuring how close an algorithm is to the counting bound or comparing with results from the sparse regime.
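To make the two figures of merit concrete, here is a small sketch that computes both from a test count. It assumes the aspect ratio is $T/n$ and the rate is $nH(p)/T$; the helper names and the example figure of 500 tests are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def aspect_ratio(tests: float, n: int) -> float:
    """Tests per item; smaller is better, individual testing gives 1."""
    return tests / n


def rate(tests: float, n: int, p: float) -> float:
    """Bits learned per test; larger is better, at most 1 by the counting bound."""
    return binary_entropy(p) * n / tests


if __name__ == "__main__":
    n, p = 1000, 0.1
    tests = 500  # hypothetical test count, for illustration only
    print(f"aspect ratio {aspect_ratio(tests, n):.3f}, rate {rate(tests, n, p):.3f}")
    print(f"individual testing: aspect ratio 1, rate {rate(n, n, p):.3f}")
```

The aspect ratio reads directly as a saving over individual testing, while the rate reads as a fraction of the counting bound.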
Definition 1
We say that an aspect ratio $A$ is zero-error achievable if there is an algorithm with aspect ratio $A$ and error probability $0$ for $n$ sufficiently large. We say that $A$ is average-case achievable if there is an algorithm with average-case aspect ratio $A$ and error probability $0$ for $n$ sufficiently large. We say that $A$ is small-error achievable if, for any $\epsilon > 0$, there exists an algorithm with aspect ratio $A$ and average error probability less than $\epsilon$ for $n$ sufficiently large.
The equivalent definitions hold for achievable rates, mutatis mutandis.
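We now state our two main results. We write $\lfloor x \rfloor$ for the greatest integer less than or equal to $x$, and $\lfloor x \rfloor_2$ for the greatest power of $2$ less than or equal to $x$; so, for example, $\lfloor 5.5 \rfloor = 5$ and $\lfloor 5.5 \rfloor_2 = 4$. We write $q = 1 - p$.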
Theorem 2
Consider adaptive group testing in the linear regime. Using Algorithm 5, all aspect ratios down to
[TABLE]
and rates up to are zero-error achievable, where
[TABLE]
Theorem 3
Consider adaptive group testing in the linear regime. Using Algorithm 5, all aspect ratios down to
[TABLE]
and rates up to are small-error achievable, where
[TABLE]
These aspect ratios and rates are illustrated in Fig. 1. Note that, for zero-error, we have $R > 0.9$ for all $p < 1/2$, and for small-error, $R > 0.95$ for all $p < 1/2$. The ‘bumpy’ behaviour arises where the optimal value of the parameter $m$ of Algorithm 5 switches to the next integer or power of $2$.
III Algorithm
Our algorithm is based on the idea of binary splitting. Binary splitting was first introduced for group testing by Sobel and Groll [21], and our algorithms here are inspired by Hwang’s generalized binary splitting [16].
Binary splitting is particularly simple when the size of the set is known to be a power of $2$.
Algorithm 4
Let $S$ be a set of items known to contain at least one defective item. Suppose $|S| = m$ where $m$ is a power of $2$.
1. If $|S| = 1$, then that item is defective. Stop.
2. Otherwise, let $S'$ consist of the first $|S|/2$ items of $S$. Test $S'$.
   - (a) If the test is positive: Set $S := S'$, and return to step 1.
   - (b) If the test is negative: All items in $S'$ are nondefective. Set $S := S \setminus S'$, and return to step 1.
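The halving procedure of Algorithm 4 can be sketched as below. This is a minimal illustration that assumes the set has power-of-2 size and is ordered by label; the function name `binary_split` and the `test` callback (which returns whether a pool contains a defective) are hypothetical names chosen for this sketch.

```python
from typing import Callable, List, Sequence, Tuple


def binary_split(items: Sequence[int],
                 test: Callable[[Sequence[int]], bool]) -> Tuple[int, List[int], int]:
    """Binary splitting on a set known to contain at least one defective.

    Returns (defective item found, nondefective items discovered, tests used).
    """
    size = len(items)
    assert size > 0 and (size & (size - 1)) == 0, "size must be a power of 2"
    current = list(items)
    cleared: List[int] = []
    tests = 0
    while len(current) > 1:
        half = len(current) // 2
        first_half = current[:half]
        tests += 1
        if test(first_half):
            current = first_half          # a defective lies in the first half
        else:
            cleared.extend(first_half)    # the first half is all nondefective
            current = current[half:]
    return current[0], cleared, tests


if __name__ == "__main__":
    defective = {2, 6}
    pooled = lambda pool: any(i in defective for i in pool)
    print(binary_split(range(8), pooled))  # finds the first defective, item 2
```

On a set of size $m$ the loop always uses exactly $\log_2 m$ tests, and it finds the first defective in label order together with every nondefective that precedes it.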
Binary splitting where $m$ is a power of $2$ will suffice to prove the most important claims of this paper, of rates above $0.9$ and $0.95$ for zero- and small-error respectively. However, in the small-error case, for some $p$ it will be possible to slightly improve the rate by allowing $m$ to be any integer. We postpone discussion of this until Section V-B.
We now explain our main algorithm.
Algorithm 5
Let $N$ be the set of items. We fix an integer parameter $m \geq 1$.
1. If $|N| < m$, test the items individually, then halt.
2. Otherwise, remove the first $m$ items from $N$, and call these items $S$. Test $S$.
   - (a) If the test is negative: All items in $S$ are nondefective. Return to step 1.
   - (b) If the test is positive: Perform binary splitting on $S$ (using Algorithm 4 if $m$ is a power of $2$ and Algorithm 6 otherwise). This will discover $1$ defective item and between $0$ and $m - 1$ nondefective items. Return the remaining items whose statuses are not discovered to $N$. Return to step 1.
Since we will choose $m$ independently of $n$ and consider asymptotics as $n \to \infty$, the small number of individual tests incurred at step 1 (which will happen at the end of the algorithm) will be negligible for our calculations here, so we will ignore them in our analysis.
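The following sketch simulates Algorithm 5 for a block size that is a power of $2$, counting tests as it goes. Several details are assumptions made for illustration: the function name `adaptive_group_test`, the reading of step 1 as "fewer than the block size remain", and the choice to return unresolved items to the front of the remaining list so that label order is preserved.

```python
from typing import List, Set, Tuple


def adaptive_group_test(n: int, defective: Set[int], m: int) -> Tuple[Set[int], int]:
    """Sketch of Algorithm 5 with block size m a power of 2.

    Returns (set of items declared defective, number of tests used).
    """
    assert m >= 1 and (m & (m - 1)) == 0, "this sketch assumes m is a power of 2"
    tests = 0

    def test(pool: List[int]) -> bool:
        # A pooled test is positive iff the pool contains a defective item.
        nonlocal tests
        tests += 1
        return any(i in defective for i in pool)

    remaining = list(range(n))  # items of unknown status, in label order
    found: Set[int] = set()
    while remaining:
        if len(remaining) < m:
            # Step 1 (assumed condition): too few items left, test one by one.
            found.update(i for i in remaining if test([i]))
            break
        block, remaining = remaining[:m], remaining[m:]
        if not test(block):
            continue  # Step 2(a): the whole block is nondefective.
        # Step 2(b): binary splitting on the positive block.
        unresolved: List[int] = []
        while len(block) > 1:
            half = len(block) // 2
            if test(block[:half]):
                unresolved = block[half:] + unresolved  # status still unknown
                block = block[:half]
            else:
                block = block[half:]  # first half is nondefective
        found.add(block[0])
        remaining = unresolved + remaining  # return unresolved items to the pool
    return found, tests


if __name__ == "__main__":
    print(adaptive_group_test(16, {3, 10}, 4))
```

With a block size of 1 every item is tested on its own, so the sketch reduces to individual testing and uses exactly `n` tests.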
We note that, in the special case $m = 1$, this algorithm is equivalent to individual testing; while in the special case $m = 2$, we recover an algorithm studied by Fischer, Klasner and Wegener [22]. We discuss connections with the work of Zaman and Pippenger [19] in Section V-B.
IV Worst-case analysis and zero-error rate
We will use a worst-case analysis of our algorithm to find a zero-error achievable aspect ratio.
Proof:
We perform Algorithm 5 with $m$ a power of $2$ to be fixed later.
In each pass through step 2 of Algorithm 5, one of two things can happen:
- a) The tested set contains no defectives, in which case we discover $m$ nondefectives with $1$ test.
- b) The tested set contains at least one defective, in which case we discover $1$ defective and between $0$ and $m - 1$ nondefectives with $1 + \log_2 m$ tests.
For the purposes of worst-case analysis, we assume that in case b) we never get lucky, and only ever find the defective with $0$ bonus nondefectives. Thus in our $T$ tests we must discover all $n - k$ nondefectives from case a) and all $k$ defectives from case b). This gives a worst-case number of tests as
$$T = \frac{n - k}{m} + k\,(1 + \log_2 m).$$
This has an aspect ratio of
$$A = \frac{1 - p}{m} + p\,(1 + \log_2 m).$$
Choosing $m$ as in the statement of the theorem gives the result, and this is easily checked to be the optimal choice of $m$. ∎
When $m = 1$, we have individual testing with
$$A = 1.$$
When $m = 2$, we have
$$A = \frac{1 + 3p}{2},$$
recovering a result of [22]. The case $m = 2$ beats individual testing when $p < 1/3$, recovering a result of Hu, Hwang and Wang [9], also noted in [22].
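The worst-case argument can be checked numerically. The sketch below assumes the worst-case aspect ratio for block size $m$ is $(1-p)/m + p(1 + \log_2 m)$, which is a reconstruction from the two cases in the proof rather than a formula quoted from the paper. It minimises over powers of $2$ and reports the resulting rate $H(p)/A$; all function names are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def worst_case_aspect_ratio(p: float, m: int) -> float:
    """Assumed worst case: (1-p)/m negative blocks plus 1 + log2(m) tests per defective."""
    return (1.0 - p) / m + p * (1.0 + math.log2(m))


def best_zero_error_aspect_ratio(p: float, max_exponent: int = 40) -> float:
    """Minimise the worst-case aspect ratio over m = 1, 2, 4, ..., 2**max_exponent."""
    return min(worst_case_aspect_ratio(p, 2 ** s) for s in range(max_exponent + 1))


def zero_error_rate(p: float) -> float:
    """Rate H(p) / A for the best power-of-2 block size."""
    return binary_entropy(p) / best_zero_error_aspect_ratio(p)


if __name__ == "__main__":
    for p in (0.01, 0.1, 0.2, 1 / 3, 0.5):
        print(f"p={p:.3f}: A={best_zero_error_aspect_ratio(p):.4f}, "
              f"R={zero_error_rate(p):.4f}")
```

Under this assumed formula, the rate stays above 0.9 on a fine grid of $p$ in $(0, 1/2]$, and the block size 2 curve meets individual testing at $p = 1/3$.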
V Average-case analysis and small-error rate
To get a small-error achievability result, we start with an average-case analysis, and later twin this with a concentration of measure argument.
V-A Powers of 2 algorithm: average-case analysis
We begin with average-case analysis of the simpler case when $m$ is a power of $2$.
Again, we look at the outcomes for a pass through step 2.
1. With probability $q^m$, where $q = 1 - p$, all items in the test are nondefective, and we discover their nondefective statuses with $1$ test.
2. With probability $1 - q^m$ there is at least one defective in the test. Let $j$ be the first-numbered defective in the set. We discover the defective status of item $j$ and the nondefective statuses of items $1, \dots, j - 1$ in $1 + \log_2 m$ tests.
The expected number of tests in one pass through step 2 is
$$q^m + (1 - q^m)(1 + \log_2 m) = 1 + (1 - q^m)\log_2 m.$$
The expected number of items whose status we discover is
$$m\,q^m + \sum_{j=1}^{m} j\,p\,q^{j-1} = \frac{1 - q^m}{p}.$$
(The sum here is an arithmetico-geometric series, so has an explicit form.)
Since the average aspect ratio is the ratio of the average number of tests to the number of items, it seems plausible that it equals the ratio of the two expectations above. To prove this rigorously, note that the number of tests the algorithm takes on average is, by considering one pass through step 2,
[TABLE]
where is the number of items not considered in the pass, and is the number of items not classified by the pass. This is solved by . Thus
[TABLE]
When optimised over $m$ a power of $2$, this achieves rates of over $0.95$ for all $p < 1/2$.
As before, setting $m = 1$ recovers individual testing, and we indeed get $A = 1$. Setting $m = 2$, we have
$$A = \frac{1 + 2p - p^2}{2 - p}.$$
We have $A < 1$, therefore outperforming individual testing, when $p < (3 - \sqrt{5})/2 \approx 0.382$, recovering a result of [12].
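A short numerical check of the power-of-2 average-case analysis follows. It assumes the expected tests per pass are $1 + (1 - q^m)\log_2 m$ and the expected items classified per pass are $(1 - q^m)/p$, with $q = 1 - p$; both are reconstructions from the two outcomes described above, and the function names are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def average_aspect_ratio_pow2(p: float, s: int) -> float:
    """Average aspect ratio for block size m = 2**s (assumed formulas)."""
    m = 2 ** s
    hit = 1.0 - (1.0 - p) ** m          # probability the block test is positive
    expected_tests = 1.0 + s * hit       # one block test, plus s splits if positive
    expected_items = hit / p             # expected number of items classified
    return expected_tests / expected_items


def small_error_rate_pow2(p: float, max_exponent: int = 30) -> float:
    """Rate H(p) / A using the best power-of-2 block size."""
    best = min(average_aspect_ratio_pow2(p, s) for s in range(max_exponent + 1))
    return binary_entropy(p) / best


if __name__ == "__main__":
    for p in (0.01, 0.1, 0.2, 0.382, 0.5):
        print(f"p={p:.3f}: R={small_error_rate_pow2(p):.4f}")
```

The lowest rates on the grid occur near $p \approx 0.382$, where the best block size switches between 1 and 2, consistent with the 'bumpy' behaviour described for Fig. 1.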
V-B General algorithm: average-case analysis
When considering analysis in the average case, the rate for some $p$ can be improved by considering $m$ to be any integer, not just a power of $2$ (see the right-hand side of Fig. 1). We now explain how to perform binary splitting in this general case. We write $a = \lfloor \log_2 m \rfloor$ and $b = m - 2^a$, so that $m = 2^a + b$ for integers $a$ and $b$ with $0 \leq b < 2^a$.
Algorithm 6
We wish to binary split a set of size $m$ that contains at least one defective. We use a Huffman tree for the uniform distribution on the $m$ items. The $i$th test pool consists of the remaining items that have the $i$th bit of their Huffman codeword equal to $0$; if the test is positive, the untested items are removed, while if the test is negative, the tested items are removed. When one item remains, it is defective.
It is a standard result that Huffman coding for the uniform distribution results in $2^a - b$ items with wordlength $a$ and the remaining $2b$ items with wordlength $a + 1$. It will be convenient for the purposes of a later proof for the items of the set in label order to be given Huffman codewords that are in lexicographic order, and that the shorter words are given to the first $2^a - b$ of the items. This means that we discover the status of the first defective item in the set and all the preceding nondefective items.
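The codeword-based splitting of Algorithm 6 can be sketched as follows. The code builds lexicographically ordered prefix-free codewords with the wordlengths of a uniform Huffman code (shorter words first) and then tests, bit by bit, the remaining items whose current bit is `0`. All function names are hypothetical, and the canonical codeword construction is one convenient choice for this sketch rather than the paper's stated construction.

```python
from typing import Callable, List, Sequence, Tuple


def uniform_huffman_lengths(m: int) -> List[int]:
    """Wordlengths of a Huffman code for the uniform distribution on m items."""
    a = m.bit_length() - 1          # a = floor(log2 m)
    b = m - 2 ** a                  # so that m = 2**a + b with 0 <= b < 2**a
    return [a] * (2 ** a - b) + [a + 1] * (2 * b)


def canonical_codewords(m: int) -> List[str]:
    """Prefix-free codewords in lexicographic order, shorter words first."""
    words: List[str] = []
    code, prev = 0, None
    for length in uniform_huffman_lengths(m):
        if prev is not None:
            code = (code + 1) << (length - prev)
        words.append(format(code, "b").zfill(length) if length else "")
        prev = length
    return words


def general_binary_split(items: Sequence[int],
                         test: Callable[[Sequence[int]], bool]) -> Tuple[int, int]:
    """Split a set containing a defective; returns (defective found, tests used)."""
    words = canonical_codewords(len(items))
    alive = list(range(len(items)))  # positions still under consideration
    tests, bit = 0, 0
    while len(alive) > 1:
        pool = [i for i in alive if words[i][bit] == "0"]
        tests += 1
        if test([items[i] for i in pool]):
            alive = pool                                      # keep tested items
        else:
            alive = [i for i in alive if words[i][bit] == "1"]  # keep untested items
        bit += 1
    return items[alive[0]], tests


if __name__ == "__main__":
    print(canonical_codewords(5))  # three 2-bit words, then two 3-bit words
```

The number of tests used equals the wordlength of the first defective item's codeword, which is why the later analysis distinguishes the items with shorter words from those with longer words.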
It can be verified without too much difficulty that using $m$ that is not a power of $2$ does not improve the performance of Algorithm 5 in the zero-error case, but for reasons of space we do not give the calculations here.
It appears that Algorithm 5 when used with Algorithm 6 for binary splitting is equivalent to an algorithm studied by Zaman and Pippenger [19]. Their algorithm was defined in terms of optimal prefix-free codes for the geometric and truncated geometric distributions, and they used known results on such codes to prove that this algorithm is optimal among a set of algorithms called ‘nested algorithms’. They also studied the asymptotics of the quantity , which, in our notation, corresponds to where is the average-case achievable aspect ratio. They did not look at the rate for all or consider small-error testing.
We now analyse the average-case number of tests of Algorithm 5 for arbitrary $m = 2^a + b$, with $0 \leq b < 2^a$ as before and $q = 1 - p$. Again, we look at the outcomes for a pass through step 2 of Algorithm 5.
- a) With probability $q^m$, all items in the test are nondefective, and we discover their nondefective statuses with $1$ test.
- b1) With probability $1 - q^{2^a - b}$ there is at least one defective in the first $2^a - b$ items in the test. Let $j$ be the first-numbered defective in the set. We discover the defective status of item $j$ and the nondefective statuses of items $1, \dots, j - 1$ in $a + 1$ tests.
- b2) With probability $q^{2^a - b} - q^m$ there are no defectives in the first $2^a - b$ items in the test, but there is at least one defective in the test. Let $j$ be the first-numbered defective in the set. We discover the defective status of item $j$ and the nondefective statuses of items $1, \dots, j - 1$ in $a + 2$ tests.
The expected number of tests in one pass through step 2 is
$$q^m + \big(1 - q^{2^a - b}\big)(a + 1) + \big(q^{2^a - b} - q^m\big)(a + 2) = 1 + a\,(1 - q^m) + q^{2^a - b} - q^m.$$
The expected number of items whose status we discover is the same as before,
$$\frac{1 - q^m}{p}.$$
The same argument as before shows that the average aspect ratio is again the ratio of these two expectations, and it’s easy to check, as noted in [19], that
[TABLE]
is the optimal value of . The average number of tests required is .
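The general-block-size analysis can be explored numerically with the sketch below. It assumes that, writing the block size as $2^a + b$ with $0 \leq b < 2^a$, a positive block costs $a + 1$ tests when the first defective is among the first $2^a - b$ items and $a + 2$ tests otherwise; these counts are reconstructed from the case analysis above. Rather than using a closed form for the best block size, the sketch simply searches over a range; the function names and the search limit `max_m` are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def average_aspect_ratio(p: float, m: int) -> float:
    """Average aspect ratio of Algorithm 5 for any block size m >= 1 (assumed formulas)."""
    q = 1.0 - p
    a = m.bit_length() - 1          # a = floor(log2 m)
    b = m - 2 ** a
    short = 2 ** a - b              # number of items with the shorter codeword
    expected_tests = (q ** m                                # case a): 1 test
                      + (1.0 - q ** short) * (a + 1)        # case b1)
                      + (q ** short - q ** m) * (a + 2))    # case b2)
    expected_items = (1.0 - q ** m) / p
    return expected_tests / expected_items


def best_block_size(p: float, max_m: int = 256) -> int:
    """Brute-force search for the block size minimising the average aspect ratio."""
    return min(range(1, max_m + 1), key=lambda m: average_aspect_ratio(p, m))


if __name__ == "__main__":
    for p in (0.05, 0.1, 0.2, 0.3, 0.5):
        m = best_block_size(p)
        a_ratio = average_aspect_ratio(p, m)
        print(f"p={p}: best m={m}, A={a_ratio:.4f}, R={binary_entropy(p) / a_ratio:.4f}")
```

Because powers of $2$ are included in the search, the general optimum can never be worse than the power-of-2 optimum; any gain comes from values of $p$ where an intermediate block size fits better.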
V-C Small-error rate
We now wish to prove Theorem 3 by converting the above average-case result into a small-error result. To do this we will use a concentration of measure argument.
Proof:
Let $\bar{T}$ be the average number of tests used, as calculated in the previous section. We will show that there is concentration of measure of the actual number of tests required, which, for any $\delta > 0$, is in the interval $\big((1-\delta)\bar{T}, (1+\delta)\bar{T}\big)$ with probability tending to $1$ as $n \to \infty$.
We then define an algorithm using $(1+\delta)\bar{T}$ tests as follows. We run Algorithm 5 with the optimal value of $m$ as in (1). If the algorithm takes fewer than $(1+\delta)\bar{T}$ tests, we add extra arbitrary tests until it does, while if it takes more than $(1+\delta)\bar{T}$ tests, we stop at that point and guess the defective set arbitrarily. Clearly we can only make an error in the second case, and, once we have proved concentration of measure, that probability can be made arbitrarily small. By picking $\delta$ sufficiently small, we ensure that all aspect ratios down to that of Theorem 3 are achievable.
To prove concentration, we use McDiarmid’s inequality [23], which gives concentration of measure when a bounded difference property holds. Let $f(x_1, \dots, x_n)$ be the number of tests used by Algorithm 5 when $x_i = 1$ denotes that item $i$ is defective and $x_i = 0$ denotes it is nondefective. The random variable counting the number of tests used is $T = f(X_1, \dots, X_n)$, where the $X_i$ are independent Bernoulli($p$) random variables.
To see that we have the necessary bounded difference property, we claim that, for , we have
[TABLE]
Note from Algorithms 4 and 6 that we discover the statuses of items in increasing order of their labels. Thus changing $x_i$ will only change the number of tests between the last defective before $i$ and the first defective after $i$; outside that interval, the algorithm proceeds exactly the same. Thus changing $x_i$ might affect the number of tests for the first set that covers $i$ after the previous defective is discovered, potentially increasing or decreasing the number of tests by an amount that depends only on $m$ and not on $n$. The same thing could happen when reaching the next defective after $i$, with a similarly bounded change. This proves the bounded difference claim. McDiarmid’s inequality then says that
[TABLE]
where we used the fact that .
Thus we have the desired concentration, and we are done. ∎
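For reference, the generic shape of the bound used in this proof can be written out as follows. This is the standard statement of McDiarmid's inequality with a placeholder constant $c$ for the bounded difference (the paper's specific constant is not reproduced here) and $\bar{A} = \bar{T}/n$ for the average aspect ratio; both symbols are notational assumptions for this sketch.

```latex
% Bounded differences: changing one coordinate changes f by at most c.
\bigl| f(x_1,\dots,x_i,\dots,x_n) - f(x_1,\dots,x_i',\dots,x_n) \bigr| \le c
\quad\Longrightarrow\quad
\mathbb{P}\bigl( |f(X) - \mathbb{E} f(X)| \ge t \bigr)
  \le 2\exp\!\left( -\frac{2t^2}{n c^2} \right).

% Taking t = \delta \bar{T} with \bar{T} = \bar{A} n:
\mathbb{P}\bigl( |T - \bar{T}| \ge \delta \bar{T} \bigr)
  \le 2\exp\!\left( -\frac{2\delta^2 \bar{A}^2}{c^2}\, n \right)
  \to 0 \quad \text{as } n \to \infty .
```

The exponent grows linearly in $n$ because $\bar{T}$ scales linearly with $n$ while $c$ does not depend on $n$, which is what makes the error probability vanish.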
VI Acknowledgements
The author thanks Oliver Johnson and Jonathan Scarlett for useful comments.
- [1] R. Dorfman, “The detection of defective members of large populations,” Ann. Math. Statist., vol. 14, no. 4, pp. 436–440, Dec. 1943.
- [2] M. Aldridge, O. Johnson, and J. Scarlett, “Group testing: an information theory perspective,” 2019, arXiv:1902.06002 [cs.IT].
- [3] A. G. D’yachkov and V. V. Rykov, “Bounds on the length of disjunctive codes,” Problemy Peredachi Informatsii, vol. 18, no. 3, pp. 7–13, 1982, translation: Prob. Inf. Transmission, vol. 18, no. 3, pp. 166–171, 1982.
- [4] M. Malyutov, “Search for sparse active inputs: a review,” in Information Theory, Combinatorics, and Search Theory: In Memory of Rudolf Ahlswede, H. Aydinian, F. Cicalese, and C. Deppe, Eds. Springer, 2013, pp. 609–647.
- [5] G. K. Atia and V. Saligrama, “Boolean compressed sensing and noisy group testing,” IEEE Trans. Inf. Th., vol. 58, no. 3, pp. 1880–1901, 2012.
- [6] L. Baldassini, O. Johnson, and M. Aldridge, “The capacity of adaptive group testing,” in 2013 IEEE Int. Symp. Inf. Th. Proc. (ISIT), 2013, pp. 2676–2680.
- [7] M. Aldridge, L. Baldassini, and O. Johnson, “Group testing algorithms: bounds and simulations,” IEEE Trans. Inf. Th., vol. 60, no. 6, pp. 3671–3687, 2014.
- [8] J. Scarlett and V. Cevher, “Limits on support recovery with probabilistic models: an information-theoretic framework,” IEEE Trans. Inf. Th., vol. 63, no. 1, pp. 593–620, 2017.
