Rates of adaptive group testing in the linear regime

Matthew Aldridge

arXiv:1901.09687·cs.IT·April 7, 2020


TL;DR

This paper analyzes an adaptive group testing algorithm in the linear regime, achieving high information rates for identifying defective items with fewer tests, especially when defectives are less than half of the total.

Contribution

It provides a detailed analysis of a generalized binary splitting algorithm, demonstrating near-optimal testing rates in the linear regime for both zero-error and small-error scenarios.

Findings

01

Achieves over 0.9 bits/test for zero-error testing

02

Achieves over 0.95 bits/test for small-error testing

03

Effective when fewer than half the items are defective

Abstract

We consider adaptive group testing in the linear regime, where the number of defective items scales linearly with the number of items. We analyse an algorithm based on generalized binary splitting. Provided fewer than half the items are defective, we achieve rates of over 0.9 bits per test for combinatorial zero-error testing, and over 0.95 bits per test for probabilistic small-error testing.


Equations

$$p>p^{*}:=\frac{3-\sqrt{5}}{2}\approx 0.382.$$

$$H(\mathcal{K})=\log_{2}\binom{n}{k}\sim nH\Big(\frac{k}{n}\Big)\sim nH(p)$$

$$R=\frac{nH(p)}{T}=\frac{H(p)}{A}.$$

$$A=\frac{1}{m}+\Big(1+\log_{2}m-\frac{1}{m}\Big)p,$$

$$m=\Big\lfloor\frac{1}{p}-1\Big\rfloor_{2}.$$

$$A=\frac{q^{m}+(1-q^{m-2b})(a+1)+(q^{m-2b}-q^{m})(a+2)}{mq^{m}+\frac{1}{p}\big(1+mq^{m+1}-(m+1)q^{m}\big)},$$

$$m=\Big\lceil-\frac{\log(2-p)}{\log(1-p)}\Big\rceil,\qquad a=\lfloor\log_{2}m\rfloor,\qquad b=m-\lfloor m\rfloor_{2}.$$

$$T=\frac{1}{m}(n-k)+(1+\log_{2}m)k.$$

$$A=\frac{1}{m}(1-p)+(1+\log_{2}m)p=\frac{1}{m}+\Big(1+\log_{2}m-\frac{1}{m}\Big)p.$$

$$T=(n-k)+k=n.$$

$$T=\frac{1}{2}(n-k)+2k=\frac{1}{2}n+\frac{3}{2}k=\Big(\frac{1}{2}+\frac{3}{2}p\Big)n,$$

$$F=q^{m}\cdot 1+(1-q^{m})(1+\log_{2}m)=1+(1-q^{m})\log_{2}m.$$

$$G=mq^{m}+\frac{1}{p}\big(1+mq^{m+1}-(m+1)q^{m}\big).$$

$$\begin{aligned}
An=\mathbb{E}T&=\mathbb{E}\,\#\,\text{tests in one pass through step 2}+\mathbb{E}\,\#\,\text{tests to deal with all remaining items}\\
&=F+A\,\mathbb{E}\,\#\,\text{remaining items}\\
&=F+A\big((n-m)+(m-G)\big)\\
&=F+An-AG,
\end{aligned}$$

$$A=\frac{F}{G}=\frac{1+(1-q^{m})\log_{2}m}{mq^{m}+\frac{1}{p}\big(1+mq^{m+1}-(m+1)q^{m}\big)}.$$

$$A=\frac{1+(1-q^{2})}{\frac{1}{p}\big(1+2q^{3}-3q^{2}\big)+2q^{2}}=\frac{2-q^{2}}{1+q}=\frac{1+2p-p^{2}}{2-p}.$$

$$F=q^{m}\cdot 1+(1-q^{m-2b})(a+1)+(q^{m-2b}-q^{m})(a+2).$$

$$G=mq^{m}+\frac{1}{p}\big(1+mq^{m+1}-(m+1)q^{m}\big).$$

$$m=\Big\lceil-\frac{\log(1+q)}{\log q}\Big\rceil=\Big\lceil-\frac{\log(2-p)}{\log(1-p)}\Big\rceil$$

$$\big|T(x_{1},x_{2},\dots,x_{i},\dots,x_{n})-T(x_{1},x_{2},\dots,x_{i}^{\prime},\dots,x_{n})\big|\leq 2m.$$

$$\mathbb{P}\big(|T-\bar{T}|>\delta\bar{T}\big)\leq\exp\Big(-\frac{2(\delta\bar{T})^{2}}{n(2m)^{2}}\Big)\leq\exp\Big(-\frac{\delta^{2}H(p)^{2}}{2m^{2}}\,n\Big),$$


Full text

Rates of Adaptive Group Testing in the Linear Regime

Matthew Aldridge

School of Mathematics

University of Leeds

Email: [email protected]


I Introduction

Group testing is this problem: Given a collection of items some of which are defective, how many pooled tests are required to recover the defective set? A pooled test is performed on some subset of the items: the test is negative if all items in the test are nondefective, and is positive if at least one item in the test is defective.

In Dorfman’s original work [1], the application was to test men enlisting into the U.S. army for syphilis using a blood test. Dorfman noted that testing pools of mixed blood samples could use fewer tests than testing each blood sample individually. The test result from such a pool should be negative if every blood sample in the pool is free of the disease, while the test result should be positive if at least one of the blood samples is contaminated.

Different group testing models are discussed in the recent survey [2]. The most important distinction is between:

•

Adaptive testing, where the items placed in a test can depend on the results of previous tests.

•

Nonadaptive testing, where all the tests are designed in advance.

This paper concerns adaptive testing, and will examine some cases where adaptive group testing provides large improvements over the nonadaptive case.

Another consideration is how many defective items there are. In this paper, we consider the linear regime, where the number of defective items $k$ is a constant proportion $p\in(0,1)$ of the $n$ items. A lot of group testing work has concerned the very sparse regime where $k$ is constant as $n\to\infty$ [3, 4, 5] or the sparse regime $k=\Theta(n^{\alpha})$ as $n\to\infty$ for some $\alpha<1$ [6, 7, 8]. However, we argue that the linear regime is more appropriate for many applications. For example, in Dorfman’s original set-up, we might expect each person joining the army to have a similar prior probability $p$ of having the disease, and that this probability should remain roughly constant as more people join, rather than tending towards $0$; thus one expects $k\approx pn$ to grow linearly with $n$.

For group testing in the linear regime, two cases have received most consideration in the literature:

•

Combinatorial zero-error testing: The defective set is any subset of $\{1,2,\dots,n\}$ with given size $k$ , and one wishes to find the defective set with certainty, whichever such set it is. One assumes that $k/n$ tends to a constant $p\in(0,1)$ as $n\to\infty$ . [9, 10, 11]

•

Probabilistic small-error testing: We assume each item is defective with probability $p$ , independent of all other items, where $p\in(0,1)$ stays fixed as $n\to\infty$ . We want to find the defective set with arbitrarily small error probability (averaged over the random defective set). [12, 13, 14, 15]

For group testing in the linear regime, it is easy to see that the optimal scaling is the number of tests $T$ scaling linearly with $n$ . A simple counting bound (see, for example, [6]) shows that we require $T\geq H(p)n$ for large enough $n$ , where $H(p)$ is the binary entropy. Meanwhile, testing each item individually requires $T=n$ tests, and succeeds with certainty. (In the combinatorial case, $T=n-1$ suffices, as the status of the final item can be inferred from whether $k$ or $k-1$ defective items have been already discovered from individual tests.) The goal of this paper is to analyse algorithms that require a number of tests very close to the lower bound $H(p)n$ .

In the sparse regime $k=\Theta(n^{\alpha})$ for $\alpha\in[0,1)$ , it is known that adaptive testing achieves the counting bound, for both small-error and zero-error criteria, using the generalised binary splitting algorithm of Hwang [16, 17, 6].

For nonadaptive testing in the linear regime, it is well known that individual testing is optimal for all $p\in(0,1)$ in the combinatorial zero-error case [3, 18, 11, 17], and it was recently shown that this is also the case for probabilistic small-error testing [15]. Thus, for small $p$, the benefit provided by the adaptive algorithms of this paper will be considerable.

Adaptive group testing in the linear regime has received some attention in the literature. The main point of study has been the question of when individual testing is optimal or not. In the combinatorial zero-error case, Riccio and Colbourn [10] showed that individual testing cannot be improved on for $p>1-\log_{3}2\approx 0.369$, and it is conjectured that this holds for $p>1/3$ [9]. In the probabilistic case, if one considers the average number of tests required, Ungar [12] showed that individual testing cannot be improved on for

$$p>p^{*}:=\frac{3-\sqrt{5}}{2}\approx 0.382.$$

In the linear regime, we are not aware of any work that has aimed to get a number of tests close to optimal over the whole range of $p$ , as we do here. (Zaman and Pippenger [19] do consider this in the limit as $p\to 0$ .) Another novelty of ours is that we analyse small-error behaviour, not just average-case behaviour, which allows a direct comparison to nonadaptive results.

The goal of this paper is to achieve performance that is close to optimal for adaptive testing under both the zero-error and small-error criteria. We do this using an algorithm similar to that of Hwang [16] (see Algorithm 5), and examining both its worst-case and average-case behaviour. Recall that the counting bound tells us we require at least $T\geq H(p)n$ tests. Our main results are the following, which show very close to optimal performance:

•

In the zero-error case, we give an algorithm that uses $T<1.11H(p)n$ tests for all $p\leq\frac{1}{2}$ . (Theorem 2)

•

In the small-error case, we give an algorithm that uses $T<1.05H(p)n$ tests for all $p\leq\frac{1}{2}$ . (Theorem 3)

II Definitions and main results

We propose two figures of merit for assessing group testing in the linear regime.

First we have the aspect ratio $A=T/n$ (as considered by, for example, [14]). We want the aspect ratio to be as small as possible. Individual testing achieves $A=1$ , while the counting bound tells us that we must have $A\geq H(p)$ .

Second, we have the rate $H(\mathcal{K})/T$ , where $H(\mathcal{K})$ is the entropy of the defective set (as considered by [6, 7, 20] and many others). Since $H(\mathcal{K})$ is the number of bits required to define the defective set, we can think of the rate as the average number of bits of information learned per test. For combinatorial testing in the linear regime we have

$$H(\mathcal{K})=\log_{2}\binom{n}{k}\sim nH\Big(\frac{k}{n}\Big)\sim nH(p)$$

asymptotically, while for probabilistic testing $H(\mathcal{K})=nH(p)$ exactly. Hence we can define the rate to be

$$R=\frac{nH(p)}{T}=\frac{H(p)}{A}.$$

We want the rate to be as big as possible. Individual testing achieves $R=H(p)$ , while the counting bound tells us that we must have $R\leq 1$ .

As a rule of thumb, we recommend the aspect ratio for measuring how much better an algorithm is than individual testing, and recommend the rate for measuring how close an algorithm is to the counting bound or comparing with results from the sparse regime.
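As a small numerical illustration of the two figures of merit, the sketch below computes the binary entropy $H(p)$ and converts an aspect ratio into a rate via $R=H(p)/A$. The function names `binary_entropy` and `rate_from_aspect_ratio` are illustrative choices and do not come from the paper.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy H(p) in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def rate_from_aspect_ratio(p: float, aspect_ratio: float) -> float:
    """Rate R = H(p) / A in bits per test, where A = T / n."""
    return binary_entropy(p) / aspect_ratio


if __name__ == "__main__":
    p = 0.1
    # Individual testing has A = 1, so its rate is just H(p).
    print(f"individual testing rate: {rate_from_aspect_ratio(p, 1.0):.4f}")
    # An algorithm meeting the counting bound A = H(p) would have rate 1.
    print(f"rate at counting bound:  {rate_from_aspect_ratio(p, binary_entropy(p)):.4f}")
```

For small $p$ the gap between these two numbers is large, which is why the rate is the more natural yardstick for closeness to the counting bound.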

Definition 1

We say that an aspect ratio $A$ is zero-error achievable if there is an algorithm with aspect ratio $T/n\leq A$ and error probability $0$ for $n$ sufficiently large. We say that $A$ is average-case achievable if there is an algorithm with average-case aspect ratio $\bar{T}/n\leq A$ and error probability $0$ for $n$ sufficiently large. We say that $A$ is small-error achievable if, for any $\delta>0$, there exists an algorithm with aspect ratio $T/n\leq A$ and average error probability less than $\delta$ for $n$ sufficiently large.

The equivalent definitions hold for achievable rates, mutatis mutandis.

We now state our two main results. We write $\lfloor x\rfloor$ for the greatest integer less than or equal to $x$ , and $\lfloor x\rfloor_{2}=2^{\lfloor\log_{2}x\rfloor}$ for the greatest power of $2$ less than or equal to $x$ ; so $\lfloor 5.7\rfloor=5$ and $\lfloor 5.7\rfloor_{2}=4$ . We write $q=1-p$ .

Theorem 2

Consider adaptive group testing in the linear regime. Using Algorithm 5, all aspect ratios down to

$$A=\frac{1}{m}+\Big(1+\log_{2}m-\frac{1}{m}\Big)p,$$

and rates up to $H(p)/A$ are zero-error achievable, where

$$m=\Big\lfloor\frac{1}{p}-1\Big\rfloor_{2}.$$

Theorem 3

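Consider adaptive group testing in the linear regime. Using Algorithm 5, all aspect ratios down to

$$A=\frac{q^{m}+(1-q^{m-2b})(a+1)+(q^{m-2b}-q^{m})(a+2)}{mq^{m}+\frac{1}{p}\big(1+mq^{m+1}-(m+1)q^{m}\big)},$$

and rates up to $H(p)/A$ are small-error achievable, where

$$m=\Big\lceil-\frac{\log(2-p)}{\log(1-p)}\Big\rceil,\qquad a=\lfloor\log_{2}m\rfloor,\qquad b=m-\lfloor m\rfloor_{2}.$$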

These aspect ratios and rates are illustrated in Fig. 1. Note that, for zero-error, we have $R>0.9$ for all $p\leq 1/2$, and for small-error, $R>0.95$ for all $p\leq 1/2$. The ‘bumpy’ behaviour arises at the values of $p$ where the optimal value of $m$ switches to the next integer or power of $2$.
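The rate claims of Theorems 2 and 3 can be spot-checked numerically. The sketch below evaluates the two aspect-ratio formulas on a grid of $p\leq 1/2$ and reports the smallest rate $H(p)/A$ found; the function names and the grid are illustrative assumptions, and a finite grid is of course only a sanity check, not a proof.

```python
import math


def binary_entropy(p: float) -> float:
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def zero_error_aspect_ratio(p: float) -> float:
    """Theorem 2: A = 1/m + (1 + log2 m - 1/m) p with m = floor_2(1/p - 1)."""
    # Assumes 0 < p <= 1/2, so that 1/p - 1 >= 1 and m >= 1.
    log2_m = max(0, math.floor(math.log2(1 / p - 1)))
    m = 2**log2_m
    return 1 / m + (1 + log2_m - 1 / m) * p


def small_error_aspect_ratio(p: float) -> float:
    """Theorem 3: A = F / G with m = ceil(-log(2 - p) / log(1 - p))."""
    q = 1 - p
    m = math.ceil(-math.log(2 - p) / math.log(1 - p))
    a = m.bit_length() - 1  # a = floor(log2 m)
    b = m - 2**a            # b = m - floor_2(m)
    short = q ** (m - 2 * b)
    f = q**m + (1 - short) * (a + 1) + (short - q**m) * (a + 2)
    g = m * q**m + (1 + m * q ** (m + 1) - (m + 1) * q**m) / p
    return f / g


if __name__ == "__main__":
    grid = [i / 100 for i in range(1, 51)]
    zero = min(binary_entropy(p) / zero_error_aspect_ratio(p) for p in grid)
    small = min(binary_entropy(p) / small_error_aspect_ratio(p) for p in grid)
    print(f"smallest zero-error rate on the grid:  {zero:.4f}")
    print(f"smallest small-error rate on the grid: {small:.4f}")
```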

III Algorithm

Our algorithm is based on the idea of binary splitting. Binary splitting was first introduced for group testing by Sobel and Groll [21], and our algorithms here are inspired by Hwang’s generalized binary splitting [16].

Binary splitting is particularly simple when the size of the set is known to be a power of $2$ .

Algorithm 4

Let $\mathcal{B}$ be a set of items known to contain at least one defective item. Suppose $|\mathcal{B}|=m$ where $m$ is a power of $2$ .

1. If $|\mathcal{B}|=1$, then that item is defective. Stop.

2. Otherwise, let $\mathcal{C}$ consist of the first $|\mathcal{B}|/2$ items of $\mathcal{B}$. Test $\mathcal{C}$.

(a) If the test is positive: Set $\mathcal{B}:=\mathcal{C}$, and return to step 1.

(b) If the test is negative: All items in $\mathcal{C}$ are nondefective. Set $\mathcal{B}:=\mathcal{B}\setminus\mathcal{C}$, and return to step 1.
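The steps of Algorithm 4 can be sketched in a few lines of Python. This is a minimal illustration, assuming a caller-supplied oracle `test(pool)` that returns `True` when the pool contains at least one defective; the names `binary_split` and `test` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def binary_split(
    items: Sequence[int],
    test: Callable[[Sequence[int]], bool],
) -> Tuple[int, List[int], int]:
    """Binary splitting on a set known to contain at least one defective.

    The number of items must be a power of 2. Returns the defective item
    found, the items shown to be nondefective, and the number of tests used.
    """
    size = len(items)
    if size == 0 or (size & (size - 1)) != 0:
        raise ValueError("the number of items must be a power of 2")
    remaining = list(items)
    nondefective: List[int] = []
    tests_used = 0
    while len(remaining) > 1:
        half = len(remaining) // 2
        first_half, second_half = remaining[:half], remaining[half:]
        tests_used += 1
        if test(first_half):
            # Positive: a defective lies in the first half; the second
            # half stays unclassified.
            remaining = first_half
        else:
            # Negative: the whole first half is nondefective.
            nondefective.extend(first_half)
            remaining = second_half
    return remaining[0], nondefective, tests_used


if __name__ == "__main__":
    defectives = {5, 6}
    found, cleared, used = binary_split(
        list(range(8)), lambda pool: any(i in defectives for i in pool)
    )
    print(found, cleared, used)
```

Note that the procedure always uses exactly $\log_{2}m$ tests and returns the first defective in label order, together with every item that precedes it.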

Binary splitting where $m$ is a power of $2$ will suffice to prove the most important claims of this paper, of rates above $0.9$ and $0.95$ for zero- and small-error respectively. However, in the small-error case, for some $p<1/4$ it will be possible to slightly improve the rate by allowing $m$ to be any integer. We postpone discussion of this until Section V-B.

We now explain our main algorithm.

Algorithm 5

Let $\mathcal{A}=\{1,2,\dots,n\}$ be the set of items. We fix an integer parameter $m$ .

1. If $|\mathcal{A}|<m$, test the items individually, then halt.

2. Otherwise, remove the first $m$ items from $\mathcal{A}$, and call these items $\mathcal{B}$. Test $\mathcal{B}$.

(a) If the test is negative: All items in $\mathcal{B}$ are nondefective. Return to step 1.

(b) If the test is positive: Perform binary splitting on $\mathcal{B}$ (using Algorithm 4 if $m$ is a power of $2$ and Algorithm 6 otherwise). This will discover $1$ defective item and between $0$ and $m-1$ nondefective items. Return the remaining items whose statuses are not discovered to $\mathcal{A}$. Return to step 1.

Since we will choose $m$ independently of $n$ and consider asymptotics as $n\to\infty$ , the small number of individual tests incurred at step 1 (which will happen at the end of the algorithm) will be negligible for our calculations here, so we will ignore them in our analysis.

We note that, in the special case $m=1$, this algorithm is equivalent to individual testing; while in the special case $m=2$, we recover an algorithm studied by Fischer, Klasner and Wegener [22]. We discuss connections with the work of Zaman and Pippenger [19] in Section V-B.
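A minimal sketch of Algorithm 5 for $m$ a power of $2$ follows. It assumes the hidden statuses are given as a list of booleans that is only accessed through a pooled-test oracle, and that unclassified items are returned to the front of $\mathcal{A}$ so that items are handled in label order. The function name `adaptive_group_test` is hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def adaptive_group_test(statuses: Sequence[bool], m: int) -> Tuple[Set[int], int]:
    """Run Algorithm 5 with m a power of 2; return (defectives found, tests used)."""
    if m < 1 or (m & (m - 1)) != 0:
        raise ValueError("this sketch only handles m a power of 2")
    tests_used = 0

    def test(pool: Sequence[int]) -> bool:
        nonlocal tests_used
        tests_used += 1
        return any(statuses[i] for i in pool)

    unclassified: List[int] = list(range(len(statuses)))
    defectives: Set[int] = set()
    while unclassified:
        if len(unclassified) < m:
            # Step 1: too few items left for a full pool; test individually.
            defectives.update(i for i in unclassified if test([i]))
            break
        # Step 2: pool the first m unclassified items.
        block, rest = unclassified[:m], unclassified[m:]
        if not test(block):
            unclassified = rest  # (a) all m items are nondefective
            continue
        # (b) binary splitting (Algorithm 4) locates the first defective.
        while len(block) > 1:
            half = len(block) // 2
            if test(block[:half]):
                rest = block[half:] + rest  # second half goes back, unclassified
                block = block[:half]
            else:
                block = block[half:]        # first half is nondefective
        defectives.add(block[0])
        unclassified = rest
    return defectives, tests_used


if __name__ == "__main__":
    hidden = [False, False, True, False, False, False, False, False]
    print(adaptive_group_test(hidden, 4))
```

With `m=1` the inner splitting loop never runs, so the sketch degenerates to one test per item, matching the remark above about individual testing.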

IV Worst-case analysis and zero-error rate

We will use a worst-case analysis of our algorithm to find a zero-error achievable aspect ratio.

Proof:

We perform Algorithm 5 with $m$ a power of $2$ to be fixed later.

In each pass through step 2 of Algorithm 5, one of two things can happen:

a) The set contains no defectives, in which case we discover $m$ nondefectives with $1$ test.

b) The set contains at least one defective, in which case we discover $1$ defective and between $0$ and $m-1$ nondefectives with $1+\log_{2}m$ tests.

For the purposes of worst-case analysis, we assume that in case b) we never get lucky, and only ever find the $1$ defective with $0$ bonus nondefectives. Thus in our $T$ tests we must discover all $n-k$ nondefectives from case a) and all $k$ defectives from case b). This gives a worst-case number of tests of

$$T=\frac{1}{m}(n-k)+(1+\log_{2}m)k.$$

This has an aspect ratio of

$$A=\frac{1}{m}(1-p)+(1+\log_{2}m)p=\frac{1}{m}+\Big(1+\log_{2}m-\frac{1}{m}\Big)p.$$

Choosing $m$ as in the statement of the theorem gives the result, and this is easily checked to be the optimal choice of $m$ . ∎

When $m=1$ , we have individual testing with

$$T=(n-k)+k=n.$$

When $m=2$ , we have

$$T=\frac{1}{2}(n-k)+2k=\frac{1}{2}n+\frac{3}{2}k=\Big(\frac{1}{2}+\frac{3}{2}p\Big)n,$$

recovering a result of [22]. The $m=2$ case beats individual testing when $p<1/3$ , recovering a result of Hu, Hwang and Wang [9], also noted in [22].
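The choice of $m$ in the worst-case analysis can be checked by brute force. The sketch below evaluates the worst-case aspect ratio $\frac{1}{m}(1-p)+(1+\log_{2}m)p$ over powers of $2$ and picks the minimiser; the function names and the search cap `max_log2_m` are illustrative assumptions.

```python
import math


def worst_case_aspect_ratio(p: float, m: int) -> float:
    """A = (1/m)(1 - p) + (1 + log2 m) p for m a power of 2."""
    return (1 - p) / m + (1 + math.log2(m)) * p


def best_power_of_two(p: float, max_log2_m: int = 20) -> int:
    """Brute-force search for the power of 2 minimising the worst-case A."""
    candidates = [2**a for a in range(max_log2_m + 1)]
    return min(candidates, key=lambda m: worst_case_aspect_ratio(p, m))


if __name__ == "__main__":
    for p in (0.07, 0.15, 0.3, 0.4):
        # Closed form from Theorem 2: greatest power of 2 not exceeding 1/p - 1.
        closed_form = 2 ** math.floor(math.log2(1 / p - 1))
        print(p, best_power_of_two(p), closed_form)
```

At values of $p$ where $1/p-1$ is exactly a power of $2$ (for example $p=1/3$ or $p=1/5$), two consecutive powers of $2$ give the same aspect ratio, so the brute-force search and the closed form may legitimately disagree there.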

V Average-case analysis and small-error rate

To get a small-error achievability result, we start with an average-case analysis, and later twin this with a concentration of measure argument.

V-A Powers of $2$ algorithm: average-case analysis

We begin with average-case analysis of the simpler case when $m$ is a power of $2$ .

Again, we look at the outcomes for a pass through step 2.

1. With probability $q^{m}$, all items in the test are nondefective, and we discover their nondefective statuses with $1$ test.

2. With probability $1-q^{m}$ there is at least one defective in the test. Let $j$ be the first-numbered defective in the set. We discover the defective status of item $j$ and the nondefective statuses of items $1,2,\dots,j-1$ in $1+\log_{2}m$ tests.

The expected number of tests in one pass through step 2 is

$$F=q^{m}\cdot 1+(1-q^{m})(1+\log_{2}m)=1+(1-q^{m})\log_{2}m.$$

The expected number of items whose status we discover is

$$G=mq^{m}+\sum_{j=1}^{m}j\,pq^{j-1}=mq^{m}+\frac{1}{p}\big(1+mq^{m+1}-(m+1)q^{m}\big).$$

(The sum here has an explicit form since $\sum_{j}jq^{j-1}=\frac{\mathrm{d}}{\mathrm{d}q}\sum_{j}q^{j}$.)

Since the average aspect ratio $A$ is the ratio of the average number of tests to the number of items, it seems plausible that $A=F/G$ . To prove this rigorously, note that the number of tests the algorithm takes on average is, by considering one pass through step 2,

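$$\begin{aligned}
An=\mathbb{E}T&=\mathbb{E}\,\#\,\text{tests in one pass through step 2}+\mathbb{E}\,\#\,\text{tests to deal with all remaining items}\\
&=F+A\,\mathbb{E}\,\#\,\text{remaining items}\\
&=F+A\big((n-m)+(m-G)\big)\\
&=F+An-AG,
\end{aligned}$$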

where $n-m$ is the number of items not considered in the pass, and $m-G$ is the number of items not classified by the pass. This is solved by $A=F/G$ . Thus

$$A=\frac{F}{G}=\frac{1+(1-q^{m})\log_{2}m}{mq^{m}+\frac{1}{p}\big(1+mq^{m+1}-(m+1)q^{m}\big)}.$$

When optimised over $m$ a power of $2$ , this achieves rates $H(p)/A$ of over $0.95$ for all $p\leq 1/2$ .
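The identity $A=F/G$ can be sanity-checked by simulation. The sketch below counts the tests Algorithm 5 would use on i.i.d. Bernoulli$(p)$ items and compares the empirical aspect ratio with $F/G$. It is an illustration under stated assumptions: it uses the shortcut that each pass classifies items in label order up to the first defective rather than running the pooled tests explicitly, and the function names, seed and sample size are arbitrary choices.

```python
import math
import random


def expected_aspect_ratio(p: float, m: int) -> float:
    """A = F / G for m a power of 2."""
    q = 1 - p
    f = 1 + (1 - q**m) * math.log2(m)
    g = m * q**m + (1 + m * q ** (m + 1) - (m + 1) * q**m) / p
    return f / g


def simulated_aspect_ratio(p: float, m: int, n: int, seed: int = 0) -> float:
    """Count the tests Algorithm 5 uses on n i.i.d. Bernoulli(p) items."""
    rng = random.Random(seed)
    defective = [rng.random() < p for _ in range(n)]
    log2_m = m.bit_length() - 1
    tests, position = 0, 0
    while n - position >= m:
        block = defective[position:position + m]
        if not any(block):
            tests += 1                         # one negative pooled test
            position += m
        else:
            tests += 1 + log2_m                # pooled test plus binary splitting
            position += block.index(True) + 1  # items up to the first defective
    tests += n - position                      # leftover items tested individually
    return tests / n


if __name__ == "__main__":
    p, m = 0.2, 4
    print(f"F/G       = {expected_aspect_ratio(p, m):.4f}")
    print(f"simulated = {simulated_aspect_ratio(p, m, 100_000):.4f}")
```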

As before, setting $m=1$ recovers individual testing, and we indeed get $A=1$ . Setting $m=2$ , we have

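$$A=\frac{1+(1-q^{2})}{\frac{1}{p}\big(1+2q^{3}-3q^{2}\big)+2q^{2}}=\frac{2-q^{2}}{1+q}=\frac{1+2p-p^{2}}{2-p}.$$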

We have $A<1$, therefore outperforming individual testing, when $p<p^{*}=(3-\sqrt{5})/2$, recovering a result of [12].
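The two claims for $m=2$ can be checked by hand. The following worked derivation uses only the formulas for $F$ and $G$ with $m=2$ and $q=1-p$; the factorisation in the second line is a routine step filled in here rather than quoted from the paper.

```latex
\begin{align*}
F &= 1+(1-q^{2})\log_{2}2 = 2-q^{2},\\
1+2q^{3}-3q^{2} &= (1-q)^{2}(1+2q) = p^{2}(1+2q),\\
G &= 2q^{2}+\tfrac{1}{p}\,p^{2}(1+2q) = 2q^{2}+(1-q)(1+2q) = 1+q,\\
A &= \frac{F}{G} = \frac{2-q^{2}}{1+q} = \frac{1+2p-p^{2}}{2-p},\\
A<1 &\iff 1+2p-p^{2}<2-p \iff p^{2}-3p+1>0 \iff p<\frac{3-\sqrt{5}}{2}
\quad\text{(for } p\in(0,1)\text{)}.
\end{align*}
```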

V-B General algorithm: average-case analysis

When considering analysis in the average case, the rate for some $p<1/4$ can be improved by considering $m$ to be any integer, not just a power of $2$ (see the right-hand side of Fig. 1). We now explain how to perform binary splitting in this general case. We write $2^{a}=\lfloor m\rfloor_{2}$ and $b=m-2^{a}$ , so that $m=2^{a}+b$ for integers $a$ and $b$ with $0\leq b<2^{a}$ .

Algorithm 6

We wish to binary split a set $\mathcal{B}$ of size $m$ that contains at least one defective. We use a Huffman tree for the uniform distribution $(\frac{1}{m},\frac{1}{m},\dots,\frac{1}{m})$. The $k$th test pool consists of the remaining items that have $k$th bit of their Huffman codeword equal to $0$; if the test is positive, the untested items are removed, while if the test is negative, the tested items are removed. When one item remains, it is defective.

It is a standard result that Huffman coding for the uniform distribution results in $2^{a}-b=m-2b$ items with wordlength $a$ and the remaining $2b$ items with wordlength $a+1$ . It will be convenient for the purposes of a later proof for the items of $\mathcal{B}$ in label order to be given Huffman codewords that are in lexicographic order, and that the shorter words are given to the first $m-2b$ of the items. This means that we discover the status of the first defective item in $\mathcal{B}$ and all the preceding nondefective items.
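The wordlength claim is easy to confirm computationally. The sketch below builds a Huffman code for $m$ equal weights using Python's `heapq` and compares the resulting lengths with the predicted $m-2b$ words of length $a$ and $2b$ words of length $a+1$. The function names are illustrative, and the construction makes no attempt to reproduce the lexicographic ordering of codewords described above; it only checks the lengths.

```python
import heapq
from typing import List


def huffman_lengths_uniform(m: int) -> List[int]:
    """Sorted codeword lengths of a Huffman code for m equally likely items."""
    if m == 1:
        return [0]
    # Heap entries: (weight, unique tie-breaker, leaves in this subtree).
    heap = [(1, i, [i]) for i in range(m)]
    heapq.heapify(heap)
    depth = [0] * m
    counter = m
    while len(heap) > 1:
        w1, _, leaves1 = heapq.heappop(heap)
        w2, _, leaves2 = heapq.heappop(heap)
        for leaf in leaves1 + leaves2:
            depth[leaf] += 1  # merging pushes these leaves one level down
        heapq.heappush(heap, (w1 + w2, counter, leaves1 + leaves2))
        counter += 1
    return sorted(depth)


def predicted_lengths(m: int) -> List[int]:
    """m - 2b words of length a and 2b words of length a + 1, where m = 2^a + b."""
    a = m.bit_length() - 1
    b = m - 2**a
    return [a] * (m - 2 * b) + [a + 1] * (2 * b)


if __name__ == "__main__":
    for m in (3, 5, 6, 8, 11):
        print(m, huffman_lengths_uniform(m), predicted_lengths(m))
```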

It can be verified without too much difficulty that using $m$ that is not a power of $2$ does not improve the performance of Algorithm 5 in the zero-error case, but for reasons of space we do not give the calculations here.

It appears that Algorithm 5 when used with Algorithm 6 for binary splitting is equivalent to an algorithm studied by Zaman and Pippenger [19]. Their algorithm was defined in terms of optimal prefix-free codes for the geometric and truncated geometric distributions, and they used known results on such codes to prove that this algorithm is optimal among a set of algorithms called ‘nested algorithms’. They also studied the asymptotics of the quantity $\lim_{p\to 0}\lim_{n\to\infty}\bar{T}/k$ , which, in our notation, corresponds to $\lim_{p\to 0}A/p$ where $A$ is the average-case achievable aspect ratio. They did not look at the rate for all $p$ or consider small-error testing.

We now analyse the average-case number of tests $\bar{T}$ of Algorithm 5 for arbitrary $m$ . Again, we look at the outcomes for a pass through step 2 of Algorithm 5.

a) With probability $q^{m}$, all items in the test are nondefective, and we discover their nondefective statuses with $1$ test.

b1) With probability $1-q^{m-2b}$ there is at least one defective in the first $m-2b$ items in the test. Let $j$ be the first-numbered defective in the set. We discover the defective status of item $j$ and the nondefective statuses of items $1,2,\dots,j-1$ in $a+1$ tests.

b2) With probability $q^{m-2b}-q^{m}$ there are no defectives in the first $m-2b$ items in the test, but there is at least one defective in the test. Let $j$ be the first-numbered defective in the set. We discover the defective status of item $j$ and the nondefective statuses of items $1,2,\dots,j-1$ in $a+2$ tests.

The expected number of tests in one pass through step 2 is

$$F=q^{m}\cdot 1+(1-q^{m-2b})(a+1)+(q^{m-2b}-q^{m})(a+2).$$

The expected number of items whose status we discover is the same as before,

$$G=mq^{m}+\frac{1}{p}\big(1+mq^{m+1}-(m+1)q^{m}\big).$$

The same argument as before shows that $A=F/G$ , and it’s easy to check, as noted in [19], that

$$m=\Big\lceil-\frac{\log(1+q)}{\log q}\Big\rceil=\Big\lceil-\frac{\log(2-p)}{\log(1-p)}\Big\rceil$$

is the optimal value of $m$ . The average number of tests required is $\bar{T}=An$ .
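The optimality of this choice of $m$ can be checked numerically against a brute-force search. The sketch below evaluates $A=F/G$ for every integer $m$ up to a cap and compares the minimum with the value at the closed-form $m$. The function names and the cap `max_m` are illustrative assumptions, and a small tolerance is used because two neighbouring values of $m$ tie at the switching points.

```python
import math


def average_aspect_ratio(p: float, m: int) -> float:
    """A = F / G for Algorithm 5 with an arbitrary integer m >= 1."""
    q = 1 - p
    a = m.bit_length() - 1     # a = floor(log2 m)
    b = m - 2**a               # m = 2^a + b with 0 <= b < 2^a
    short = q ** (m - 2 * b)   # P(no defective among the first m - 2b items)
    f = q**m + (1 - short) * (a + 1) + (short - q**m) * (a + 2)
    g = m * q**m + (1 + m * q ** (m + 1) - (m + 1) * q**m) / p
    return f / g


def optimal_m(p: float) -> int:
    """Closed form m = ceil(-log(2 - p) / log(1 - p))."""
    return math.ceil(-math.log(2 - p) / math.log(1 - p))


def brute_force_minimum(p: float, max_m: int = 64) -> float:
    """Smallest A over all integers 1 <= m <= max_m."""
    return min(average_aspect_ratio(p, m) for m in range(1, max_m + 1))


if __name__ == "__main__":
    for p in (0.05, 0.1, 0.2, 0.3, 0.4):
        m = optimal_m(p)
        print(p, m, round(average_aspect_ratio(p, m), 5), round(brute_force_minimum(p), 5))
```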

V-C Small-error rate

We now wish to prove Theorem 3 by converting the above average-case result into a small-error result. To do this we will use a concentration of measure argument.

Proof:

Let $\bar{T}$ be the average number of tests used, as calculated in the previous section. We will show that there is concentration of measure of the actual number of tests required $T$, which, for any $\delta>0$, lies in the interval $\big((1-\delta)\bar{T},(1+\delta)\bar{T}\big)$ with probability tending to $1$ as $n\to\infty$.

We then define an algorithm using $(1+\delta)\bar{T}$ tests as follows. We run Algorithm 5 with the optimal value of $m$ as in (1). If the algorithm takes fewer than $(1+\delta)\bar{T}$ tests, we add extra arbitrary tests until it does, while if it takes more than $(1+\delta)\bar{T}$ tests, we stop at that point and guess the defective set arbitrarily. Clearly we can only make an error in the second case, and, once we have proved concentration of measure, that probability can be made arbitrarily small. By picking $\delta>0$ sufficiently small, we ensure that all aspect ratios down to $A$ as in Theorem 3 are achievable.

To prove concentration, we use McDiarmid’s inequality [23], which gives concentration of measure when a bounded difference property holds. Let $T(x_{1},x_{2},\dots,x_{n})$ be the number of tests used by Algorithm 5 when $x_{i}=1$ denotes that item $i$ is defective and $x_{i}=0$ denotes it is nondefective. The random variable counting the number of tests used is $T=T(X_{1},X_{2},\dots,X_{n})$ , where the $X_{i}$ are independent Bernoulli $(p)$ random variables.

To see that we have the necessary bounded difference property, we claim that, for $x_{1},x_{2},\dots,x_{i},x_{i}^{\prime},\dots,x_{n}\in\{0,1\}$ , we have

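$$\big|T(x_{1},x_{2},\dots,x_{i},\dots,x_{n})-T(x_{1},x_{2},\dots,x_{i}^{\prime},\dots,x_{n})\big|\leq 2m.$$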

Note from Algorithms 4 and 6 that we discover the statuses of items in increasing order of their labels. Thus changing $x_{i}$ to $x_{i}^{\prime}$ will only change the number of tests between the last defective before $i$ and the first defective after $i$; outside that interval, the algorithm proceeds exactly the same. Thus changing $x_{i}$ might affect the number of tests for the first set $\mathcal{B}$ that covers $i$ after the previous defective has been discovered: potentially an increase or decrease of $a+1$ tests, which we can bound by $a+1\leq m$. The same thing could happen when reaching the next defective after $i$, for a potential change of $a+1\leq m$ tests again. This proves the bounded difference claim. McDiarmid’s inequality then says that

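$$\mathbb{P}\big(|T-\bar{T}|>\delta\bar{T}\big)\leq\exp\Big(-\frac{2(\delta\bar{T})^{2}}{n(2m)^{2}}\Big)\leq\exp\Big(-\frac{\delta^{2}H(p)^{2}}{2m^{2}}\,n\Big),$$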

where we used the fact that $\bar{T}\geq H(p)n$ .

Thus we have the desired concentration, and we are done. ∎
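To get a feel for the final bound $\exp\big(-\delta^{2}H(p)^{2}n/(2m^{2})\big)$, the sketch below evaluates it and inverts it to find how many items are needed before the bound drops below a target $\varepsilon$. The function names and the example values ($\delta=0.05$, $p=0.2$, for which the optimal $m$ is $3$, and $\varepsilon=0.01$) are illustrative assumptions. The bound is generally loose, so the numbers should be read as sufficient rather than necessary.

```python
import math


def binary_entropy(p: float) -> float:
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def concentration_bound(delta: float, p: float, m: int, n: int) -> float:
    """The bound exp(-delta^2 H(p)^2 n / (2 m^2)) from the proof of Theorem 3."""
    return math.exp(-(delta**2) * binary_entropy(p) ** 2 * n / (2 * m**2))


def items_needed(delta: float, p: float, m: int, epsilon: float) -> int:
    """Smallest n for which the bound above is at most epsilon."""
    return math.ceil(
        2 * m**2 * math.log(1 / epsilon) / (delta * binary_entropy(p)) ** 2
    )


if __name__ == "__main__":
    delta, p, m, epsilon = 0.05, 0.2, 3, 0.01
    n = items_needed(delta, p, m, epsilon)
    print(f"n = {n} gives bound {concentration_bound(delta, p, m, n):.5f}")
```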

VI Acknowledgements

The author thanks Oliver Johnson and Jonathan Scarlett for useful comments.

Bibliography

[1] R. Dorfman, “The detection of defective members of large populations,” Ann. Math. Statist., vol. 14, no. 4, pp. 436–440, Dec. 1943.
[2] M. Aldridge, O. Johnson, and J. Scarlett, “Group testing: an information theory perspective,” 2019, arXiv:1902.06002 [cs.IT].
[3] A. G. D’yachkov and V. V. Rykov, “Bounds on the length of disjunctive codes,” Problemy Peredachi Informatsii, vol. 18, no. 3, pp. 7–13, 1982, translation: Prob. Inf. Transmission, vol. 18, no. 3, pp. 166–171, 1982.
[4] M. Malyutov, “Search for sparse active inputs: a review,” in Information Theory, Combinatorics, and Search Theory: In Memory of Rudolf Ahlswede, H. Aydinian, F. Cicalese, and C. Deppe, Eds. Springer, 2013, pp. 609–647.
[5] G. K. Atia and V. Saligrama, “Boolean compressed sensing and noisy group testing,” IEEE Trans. Inf. Th., vol. 58, no. 3, pp. 1880–1901, 2012.
[6] L. Baldassini, O. Johnson, and M. Aldridge, “The capacity of adaptive group testing,” in 2013 IEEE Int. Symp. Inf. Th. Proc. (ISIT), 2013, pp. 2676–2680.
[7] M. Aldridge, L. Baldassini, and O. Johnson, “Group testing algorithms: bounds and simulations,” IEEE Trans. Inf. Th., vol. 60, no. 6, pp. 3671–3687, 2014.
[8] J. Scarlett and V. Cevher, “Limits on support recovery with probabilistic models: an information-theoretic framework,” IEEE Trans. Inf. Th., vol. 63, no. 1, pp. 593–620, 2017.
