Sparse Graph Codes for Non-adaptive Quantitative Group Testing
Esmaeil Karimi, Fatemeh Kazemi, Anoosheh Heidarzadeh, Krishna R. Narayanan, and Alex Sprintson

TL;DR
This paper introduces a non-adaptive quantitative group testing algorithm using sparse graph codes and BCH codes, achieving near-optimal test efficiency and high probability of exact defective item recovery in large-scale scenarios.
Contribution
The paper proposes a novel non-adaptive QGT scheme with sparse graph codes and BCH codes, providing probabilistic guarantees and analyzing test complexity.
Findings
Requires at most c(t)K(t log2(ℓN/(c(t)K)+1)+1)+1 tests for large N,K
Achieves minimum tests with t=2
Decoding complexity is O(K log(N/K)) for t ≤ 4
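Table I: Values of $c(t)$ and the associated optimal left degree $\ell^*$ for $t \in \{1, 2, \dots, 8\}$.
| $t$ | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| $c(t)$ | 1.222 | 0.597 | 0.388 | 0.294 | 0.239 | 0.202 | 0.176 | 0.156 |
| $\ell^*$ | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |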
The authors are with the Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843 USA (E-mail: {esmaeil.karimi, fatemeh.kazemi, anoosheh, krn, spalex}@tamu.edu).
Abstract
This paper considers the problem of Quantitative Group Testing (QGT). Consider a set of $N$ items among which $K$ items are defective. The QGT problem is to identify (all or a sufficiently large fraction of) the defective items, where the result of a test reveals the number of defective items in the tested group. In this work, we propose a non-adaptive QGT algorithm using sparse graph codes over bi-regular bipartite graphs with left-degree $\ell$ and right-degree $r$ and binary $t$-error-correcting BCH codes. The proposed scheme provides exact recovery with a probabilistic guarantee, i.e., it recovers all the defective items with high probability. In particular, we show that for the sub-linear regime where $K/N$ vanishes as $K, N \to \infty$, the proposed algorithm requires at most $m = c(t)K\left(t\log_2\left(\frac{\ell N}{c(t)K}+1\right)+1\right)+1$ tests to recover all the defective items with probability approaching one as $K, N \to \infty$, where $c(t)$ depends only on $t$. The results of our theoretical analysis reveal that the minimum number of required tests is achieved by $t = 2$. The encoding and decoding of the proposed algorithm for any $t \leq 4$ have the computational complexity of $O(K\log^2(N/K))$ and $O(K\log(N/K))$, respectively. Our simulation results also show that the proposed algorithm significantly outperforms a non-adaptive semi-quantitative group testing algorithm recently proposed by Abdalla et al. in terms of the required number of tests for identifying all the defective items with high probability.
I Introduction
In this work, we consider the problem of Quantitative Group Testing (QGT). Consider a set of $N$ items among which $K$ items are defective. The QGT problem is to identify (all or a sufficiently large fraction of) the defective items, where the result of a test reveals the number of defective items in the tested group. The key difference between the QGT problem and the original group testing problem is that, unlike the former, in the latter the result of each test is either $1$ or $0$, depending on whether the tested group contains any defective items or not. The objective of QGT is to design a test plan with the minimum number of tests that identifies (all or a sufficiently large fraction of) the defective items.
There are two general categories of test strategies: non-adaptive and adaptive. In an adaptive scheme, each test depends on the outcomes of the previous tests. On the other hand, in a non-adaptive scheme, all tests are planned in advance. In other words, the result of one test does not affect the design of another test. Although, in general, adaptive algorithms require fewer tests, in most practical applications non-adaptive algorithms are preferred since they allow one to perform all tests at once in parallel.
Let $\mathcal{K}$ be the index set of the defective items and $\hat{\mathcal{K}}$ be an estimate of $\mathcal{K}$. Depending on the application at hand, there can be different requirements for the closeness of $\hat{\mathcal{K}}$ to $\mathcal{K}$ [1, 2]. The strongest condition for closeness is exact recovery, when it is required that $\hat{\mathcal{K}} = \mathcal{K}$. Two weaker conditions are partial recovery without false detections, when it is required that $\hat{\mathcal{K}} \subseteq \mathcal{K}$ and $|\hat{\mathcal{K}}| \geq (1-\epsilon)|\mathcal{K}|$, and partial recovery without missed detections, when it is required that $\mathcal{K} \subseteq \hat{\mathcal{K}}$ and $|\hat{\mathcal{K}}| \leq (1+\epsilon)|\mathcal{K}|$. There are also different types of recovery guarantees [2]. The strongest guarantee is the perfect recovery guarantee, when the exact or partial recovery needs to be achieved with probability $1$ (over the space of all problem instances). A slightly weaker guarantee is the probabilistic recovery guarantee, when it suffices to achieve the exact or partial recovery with high probability only (and not necessarily with probability $1$). In this work, we are interested in the exact recovery of all defective items with the probabilistic recovery guarantee.
I-A Related Work and Applications
The QGT problem has been extensively studied for a wide range of applications, e.g., multi-access communication, spectrum sensing, and network tomography, see, e.g., [3, 4, 5], and references therein. This problem was first introduced by Shapiro in [6]. Several non-adaptive and adaptive QGT strategies have been previously proposed, see, e.g., [7, 3, 8]. It was shown in [9] that any non-adaptive algorithm must perform at least tests. Various order optimal or near-optimal non-adaptive strategies were previously proposed, see, e.g., [9, 8, 7]. The best known polynomial-time non-adaptive algorithms require tests [10, 9]. Recently, a semi-quantitative group testing scheme based on sparse graph codes was proposed in [11], where the result of each test is an integer in the set . This strategy identifies a fraction of defective items using tests with high probability, where depends only on and .
I-B Connection with Compressed Sensing
A closely related problem to QGT is the problem of compressed sensing (CS), in which the goal is to recover a sparse signal from a set of (linear) measurements. Given an $N$-dimensional sparse signal with a support size up to $K$, the objective is to identify the indices and the values of the non-zero elements of the signal with the minimum number of measurements. The main differences between the CS problem and the QGT problem are in the signal model and the constraints on the measurement matrix. Most of the existing works on the CS problem consider real-valued signals and measurement matrices. The QGT problem, however, deals with binary signals and requires the measurement matrix to be binary-valued.
There are a number of CS algorithms in the literature that use binary-valued measurement matrices, see, e.g. [12, 13] and references therein. However, these strategies either use techniques which are not applicable to binary signals, or provide different types of closeness and guarantee than those required in this work. There are also several CS algorithms for the support recovery where the objective is to determine the indices of the non-zero elements of the signal but not their values [14, 15, 16]. The support recovery problem is indeed equivalent to the QGT problem. Notwithstanding, the existing schemes for support recovery rely on non-binary measurement matrices, and hence are not suitable for the QGT problem.
Last but not least, to the best of our knowledge, the majority of works on the CS problem focus mainly on the order optimality of the number of measurements, whereas in this work for the QGT problem we are also interested in minimizing the constant factor hidden in the order.
I-C Main Contributions
In this work, we propose a non-adaptive quantitative group testing strategy for the sub-linear regime where $K/N$ vanishes as $K, N \to \infty$. We utilize sparse graph codes over bi-regular bipartite graphs with left-degree $\ell$ and right-degree $r$ and binary $t$-error-correcting BCH codes for the design of the proposed strategy. Leveraging powerful density evolution techniques for the analysis enables us not only to determine the exact value of the constants in the number of tests needed but also to provide provable performance guarantees. We show that the proposed scheme provides exact recovery with a probabilistic guarantee, i.e., it recovers all the defective items with high probability. In particular, for the sub-linear regime, the proposed algorithm requires at most $m = c(t)K\left(t\log_2\left(\frac{\ell N}{c(t)K}+1\right)+1\right)+1$ tests to recover all defective items with probability approaching one as $K, N \to \infty$, where $c(t)$ depends only on $t$.
The results of our theoretical analysis reveal that the minimum number of required tests for the proposed algorithm is achieved by $t = 2$. Moreover, for any $t \leq 4$, the encoding and decoding of the proposed algorithm have the computational complexity of $O(K\log^2(N/K))$ and $O(K\log(N/K))$, respectively.
II Problem Setup and Notation
Throughout the paper, we use bold-face small and capital letters to denote vectors and matrices, respectively.
In this work, we consider the problem of quantitative group testing (QGT) with exact recovery and probabilistic guarantee, defined as follows. Consider a set of $N$ items among which $K$ items are defective. We focus on the sub-linear regime where the ratio $K/N$ vanishes as $K, N \to \infty$. The problem is to identify all the defective items with high probability while using the minimum number of tests on subsets (groups) of the items, where the result of each test shows the number of defective items in the tested group.
Let the vector $\mathbf{x} \in \{0,1\}^N$ represent the set of $N$ items, in which the coordinates with value $1$ correspond to the defective items. A non-adaptive group testing problem consisting of $m$ tests can be represented by a measurement matrix $\mathbf{A} \in \{0,1\}^{m \times N}$, where the $i$-th row of the matrix corresponds to the $i$-th test. That is, the coordinates with value $1$ in the $i$-th row correspond to the items in the $i$-th test. The results of the $m$ tests are expressed in the test vector $\mathbf{y}$, i.e.,
$$\mathbf{y} = \mathbf{A}\mathbf{x}. \tag{1}$$
The goal is to design a testing matrix $\mathbf{A}$ that has a small number of rows (tests), $m$, and can identify with high probability all the defective items given the test vector $\mathbf{y}$.
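To make the measurement model concrete, the following minimal Python sketch computes the test vector as the product of a binary measurement matrix and a binary defect indicator vector. The 3-by-6 matrix, the defective set, and the function name `run_tests` are hypothetical choices made for illustration only; they do not come from the paper.

```python
def run_tests(A, x):
    """Return the quantitative test results: entry i counts the defective items in test i."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

# Hypothetical toy instance: 6 items, 3 tests, items 1 and 4 (0-indexed) are defective.
A = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 0, 1, 1, 1],
]
x = [0, 1, 0, 0, 1, 0]

y = run_tests(A, x)
print(y)  # [1, 2, 1]
```

In classical (binary) group testing, the same three tests would only report whether each count is non-zero, so the second test would not reveal that it contains two defective items.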
III Proposed Algorithm
III-A Binary $t$-error-correcting codes and $t$-separable matrices
Definition 1.
(-separable matrix) A binary matrix (for ) is -separable over field if the sum (over field ) of any set of columns is distinct.
Example 1.
Consider the following matrix,
[TABLE]
The matrix is $2$-separable over the real field $\mathbb{R}$, but it is not $2$-separable over $\mathbb{F}_2$ since, for instance, the sum of the first and second columns over $\mathbb{F}_2$ is the same as the sum of the third and fourth columns over $\mathbb{F}_2$.
[TABLE]
From the definition, it can be easily seen that if a matrix (with columns) is -separable over a field , then is also -separable over for any .
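The separability property can be checked by brute force on small matrices. The sketch below is illustrative only: the function `is_separable` and the 3-by-4 matrix are hypothetical (the matrix is not the one from Example 1), and the `modulus` argument is an assumption used to switch between sums over the reals (`None`) and sums over the binary field (`2`).

```python
from itertools import combinations

def is_separable(M, t, modulus=None):
    """Check that the sums of all t-column subsets of M are pairwise distinct.

    modulus=None adds over the reals (integers); modulus=2 adds over the binary field.
    """
    n = len(M[0])
    seen = set()
    for cols in combinations(range(n), t):
        s = tuple(sum(row[j] for j in cols) for row in M)
        if modulus is not None:
            s = tuple(v % modulus for v in s)
        if s in seen:
            return False
        seen.add(s)
    return True

# Hypothetical matrix with columns (1,0,0), (0,1,0), (0,0,1), (1,1,1).
M = [
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
]

print(is_separable(M, 2))             # True: all pair sums differ over the reals
print(is_separable(M, 2, modulus=2))  # False: columns 1+2 and 3+4 coincide modulo 2
```

The brute-force check enumerates all subsets of $t$ columns, so it is only practical for small matrices; it is meant to illustrate the definition, not to be used in the construction.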
The vector of test results, $\mathbf{y}$, is the sum of the columns in the testing matrix corresponding to the coordinates of the defective items. When a $t$-separable matrix over $\mathbb{R}$ is used as the testing matrix, the vector $\mathbf{y}$ will be distinct for any set of $t$ defective items. Thus, a $t$-separable matrix over $\mathbb{R}$ can be used as the testing matrix for identifying $t$ defective items. However, the construction of $t$-separable matrices for arbitrary $t$ with the minimum number of rows is an open problem. Instead, we can leverage the idea that the parity-check matrix of any binary $t$-error-correcting code is a $t$-separable matrix over $\mathbb{F}_2$. Note that $t$-separability over $\mathbb{F}_2$ results in $t$-separability over $\mathbb{R}$. Hence, a possible choice for designing a $t$-separable matrix over $\mathbb{R}$ is utilizing the parity-check matrix of a binary $t$-error-correcting code.
In this work, we use binary BCH codes for this purpose. The key feature of BCH codes that makes them suitable for designing $t$-separable matrices is that it is possible to design binary BCH codes capable of correcting any combination of $t$ or fewer errors.
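Definition 2.
[17] (Binary BCH codes) For any positive integers $m \geq 3$ and $t < 2^{m-1}$ (here $m$ denotes the degree of the field extension, not the number of tests), there exists a binary $t$-error-correcting BCH code with the following parameters:
$$n = 2^m - 1 \ \text{(block length)}, \qquad n - k \leq mt \ \text{(number of parity-check digits)}, \qquad d_{\min} \geq 2t + 1 \ \text{(minimum distance)}.$$
The parity-check matrix of such a code is given by
$$\mathbf{H}_t = \begin{bmatrix} 1 & \alpha & \alpha^2 & \cdots & \alpha^{n-1} \\ 1 & \alpha^3 & (\alpha^3)^2 & \cdots & (\alpha^3)^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \alpha^{2t-1} & (\alpha^{2t-1})^2 & \cdots & (\alpha^{2t-1})^{n-1} \end{bmatrix},$$
where $\alpha$ is a primitive element in $\mathbb{F}_{2^m}$.
Since each entry of $\mathbf{H}_t$ is an element in $\mathbb{F}_{2^m}$, it can be represented by an $m$-tuple over $\mathbb{F}_2$. Thus, the number of rows in the binary representation of $\mathbf{H}_t$ is given by
$$mt = t\log_2(n+1). \tag{2}$$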
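As an illustration of the BCH parity-check construction, the following sketch builds the binary representation of the matrix whose rows are the powers of $\alpha, \alpha^3, \dots, \alpha^{2t-1}$ and checks pairwise-sum distinctness over the binary field. The function names, the bit ordering of the binary expansion, and the choice of the primitive polynomial $x^4 + x + 1$ for a length-15 code are assumptions made for this example.

```python
from itertools import combinations

def gf_powers(m, prim_poly):
    """Return alpha^0 .. alpha^(2^m - 2) as m-bit integers, alpha a root of prim_poly."""
    n = (1 << m) - 1
    powers, x = [], 1
    for _ in range(n):
        powers.append(x)
        x <<= 1                 # multiply by alpha
        if x >> m:              # degree reached m: reduce modulo the primitive polynomial
            x ^= prim_poly
    return powers

def bch_parity_check(m, t, prim_poly):
    """Binary (m*t) x (2^m - 1) parity-check matrix with rows from alpha^(i*j), i = 1, 3, ..., 2t-1."""
    n = (1 << m) - 1
    alpha = gf_powers(m, prim_poly)
    H = []
    for i in range(1, 2 * t, 2):
        for b in range(m):      # one binary row per bit of the field element
            H.append([(alpha[(i * j) % n] >> b) & 1 for j in range(n)])
    return H

def pair_sums_distinct(H):
    """True if the mod-2 sums of all pairs of columns are distinct (2-separability over F_2)."""
    n = len(H[0])
    sums = {tuple(row[a] ^ row[b] for row in H) for a, b in combinations(range(n), 2)}
    return len(sums) == n * (n - 1) // 2

# Assumed example: m = 4, t = 2, primitive polynomial x^4 + x + 1 (0b10011).
H = bch_parity_check(4, 2, 0b10011)
print(len(H), len(H[0]))        # 8 15, i.e. t*log2(n+1) rows and n columns
print(pair_sums_distinct(H))    # True
```

The row count matches $t\log_2(n+1) = 2 \cdot 4 = 8$, and the distinct pair sums reflect the fact that a double-error-correcting code assigns different syndromes to different error patterns of weight two.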
III-B Encoding algorithm
The design of the measurement matrix in our scheme is based on an architectural philosophy that was proposed in [2] and [18]. The key idea is to design using a sparse bi-regular bipartite graph and to apply a peeling-based iterative algorithm for recovering the defective items given .
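Let $\mathcal{G}$ be a randomly generated bipartite graph where each of the $N$ left nodes is connected to $\ell$ right nodes uniformly at random, and each of the $M$ right nodes is connected to $r$ left nodes uniformly at random. Note that there are $N\ell$ edge connections from the left side and $Mr$ edge connections from the right side,
$$N\ell = Mr. \tag{3}$$
Let $\mathbf{T}_{\mathcal{G}} \in \{0,1\}^{M \times N}$ be the adjacency matrix of the graph $\mathcal{G}$, where each column in $\mathbf{T}_{\mathcal{G}}$ corresponds to a left node and has exactly $\ell$ ones, and each row corresponds to a right node and has exactly $r$ ones. Let $\mathbf{t}_i$ denote the $i$-th row of $\mathbf{T}_{\mathcal{G}}$, i.e., $\mathbf{T}_{\mathcal{G}} = [\mathbf{t}_1^{\mathsf{T}}, \mathbf{t}_2^{\mathsf{T}}, \dots, \mathbf{t}_M^{\mathsf{T}}]^{\mathsf{T}}$. We assign $s$ tests to each right node based on a signature matrix $\mathbf{U} \in \{0,1\}^{s \times r}$. The signature matrix is chosen as $\mathbf{U} = [\mathbf{1}_{1 \times r}^{\mathsf{T}}, \mathbf{H}_t^{\mathsf{T}}]^{\mathsf{T}}$, where $\mathbf{1}_{1 \times r}$ is an all-ones row of length $r$, and $\mathbf{H}_t \in \{0,1\}^{t\log_2(r+1) \times r}$ is the parity-check matrix of a binary $t$-error-correcting BCH code of length $r$. From (2), it can be easily seen that $s = t\log_2(r+1) + 1$.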
The measurement matrix is given by where is a matrix that defines the tests at the -th right node. There are exactly ones in each row of , and the signature matrix has columns. Note that is the -th column of , where is the -th column of . is obtained by placing the columns of at the coordinates of the ones of the row vector , and replacing zeros by all-zero columns,
[TABLE]
where .
The number of rows in the measurement matrix $\mathbf{A} \in \{0,1\}^{m \times N}$, where $m = M \times s$, represents the total number of tests in the proposed scheme.
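The construction above can be sketched in a few lines of Python: for each right node, the columns of the signature matrix are placed at the positions of the ones in the corresponding adjacency row, and all other columns are zero. The graph below (6 items, 4 right nodes, left degree 2, right degree 3) is hypothetical, and the signature matrix uses the binary expansions of $1, \dots, r$ as a stand-in for the single-error-correcting BCH parity-check matrix; both are assumptions made for illustration.

```python
def signature_matrix(r):
    """All-ones row stacked on the binary expansions of 1..r (distinct non-zero columns).

    Assumption: this Hamming-style matrix stands in for the t = 1 BCH parity-check matrix.
    """
    bits = r.bit_length()       # equals log2(r + 1) when r = 2^b - 1
    rows = [[1] * r]
    for b in range(bits):
        rows.append([((j + 1) >> b) & 1 for j in range(r)])
    return rows

def measurement_matrix(T, U):
    """Stack the per-node blocks: columns of U at the ones of each adjacency row, zeros elsewhere."""
    A = []
    for t_row in T:
        support = [j for j, bit in enumerate(t_row) if bit]
        for u_row in U:
            a_row = [0] * len(t_row)
            for pos, j in enumerate(support):
                a_row[j] = u_row[pos]
            A.append(a_row)
    return A

# Hypothetical bi-regular graph: N = 6, M = 4, left degree 2, right degree 3 (6 * 2 = 4 * 3).
T = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
U = signature_matrix(3)          # s = 1 * log2(3 + 1) + 1 = 3 rows
A = measurement_matrix(T, U)
print(len(A), len(A[0]))         # 12 6, i.e. m = M * s tests on N items
```

Each block of three consecutive rows of `A` corresponds to one right node, and the first row of every block is the adjacency row itself, because the first row of the signature matrix is all ones.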
Example 2.
Let be the total number of items. Let be a randomly generated left-and-right-regular graph with left nodes of degree and right nodes of degree . For this example, suppose that the adjacency matrix of the graph is given by
[TABLE]
Consider the parity-check matrix of a binary -error-correcting BCH code of length given by
[TABLE]
where is a root of the primitive polynomial . The signature matrix is then given by
[TABLE]
Following the construction procedure explained earlier, the testing matrix is then given by
[TABLE]
III-C Decoding algorithm
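Let the observation vector corresponding to the $i$-th right node be defined as
$$\mathbf{z}_i = \mathbf{A}_i \mathbf{x},$$
where $\mathbf{A}_i$ is the block of the measurement matrix that defines the tests at the $i$-th right node.
Note that $\mathbf{y} = [\mathbf{z}_1^{\mathsf{T}}, \mathbf{z}_2^{\mathsf{T}}, \dots, \mathbf{z}_M^{\mathsf{T}}]^{\mathsf{T}}$.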
Definition 3.
($t$-resolvable right node) A right node is called $t$-resolvable if it is connected to $t$ or fewer defective items.
The following lemma is useful for resolving the right nodes. (The proofs of all lemmas can be found in the appendix.)
Lemma 1.
The proposed algorithm detects and resolves all the $t$-resolvable right nodes.
The decoding algorithm proceeds in rounds as follows. In each round, the decoding algorithm first iterates through all the right-node observation vectors $\mathbf{z}_i$ and resolves all $t$-resolvable right nodes (by BCH decoding, as discussed in the proof of Lemma 1). Then, given the identities of the recovered left nodes, the edges connected to these defective items are peeled off the graph. That is, the contributions of the recovered defective items are removed from the unresolved right nodes so that new right nodes may become $t$-resolvable for the next round. The decoding algorithm terminates when there are no more $t$-resolvable right nodes.
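The peeling procedure can be illustrated for the simplest case $t = 1$. The sketch below is not the paper's BCH decoder: it assumes a signature matrix made of an all-ones row stacked on the binary expansions of $1, \dots, r$, so that a right node with exactly one unresolved defective item reveals that item's position directly. The graph, the defective set, and all function names are hypothetical, and the loop peels sequentially rather than in synchronized rounds, which recovers the same set of items.

```python
def signature_matrix(r):
    """All-ones row stacked on the binary expansions of 1..r (assumed stand-in for t = 1)."""
    rows = [[1] * r]
    for b in range(r.bit_length()):
        rows.append([((j + 1) >> b) & 1 for j in range(r)])
    return rows

def observe(T, U, x):
    """Observation vector of each right node: U applied to x restricted to its neighbours."""
    Z = []
    for t_row in T:
        support = [j for j, bit in enumerate(t_row) if bit]
        Z.append([sum(u_row[pos] * x[j] for pos, j in enumerate(support)) for u_row in U])
    return Z

def peel(T, U, Z):
    """Recover defective items by repeatedly resolving 1-resolvable right nodes."""
    Z = [list(z) for z in Z]
    supports = [[j for j, bit in enumerate(row) if bit] for row in T]
    found = set()
    progress = True
    while progress:
        progress = False
        for i, z in enumerate(Z):
            if z[0] != 1:        # z[0] counts the unresolved defective items at node i
                continue
            # With a single defective item, z[1:] is the binary expansion of (position + 1).
            pos = sum(bit << b for b, bit in enumerate(z[1:])) - 1
            item = supports[i][pos]
            found.add(item)
            # Peel: subtract the item's signature column from every right node it touches.
            for k, sup in enumerate(supports):
                if item in sup:
                    p = sup.index(item)
                    for row in range(len(U)):
                        Z[k][row] -= U[row][p]
            progress = True
    return sorted(found)

# Hypothetical graph (N = 6, M = 4, left degree 2, right degree 3) and defective set {0, 1, 5}.
T = [
    [1, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1],
]
U = signature_matrix(3)
x = [1, 1, 0, 0, 0, 1]
print(peel(T, U, observe(T, U, x)))  # [0, 1, 5]
```

In this toy instance, the first and third right nodes each see two defective items and cannot be resolved at first; they become resolvable only after items 0 and 5 are recovered from the other two right nodes and peeled off.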
Example 3.
Consider the group testing problem in the Example 2. Let the number of defective items be and let , i.e., item , item , and item are defective items. We show how the proposed scheme can identify the defective items. The result of the tests can be expressed as follows,
[TABLE]
Then, the right-node observation vectors are given by
[TABLE]
[TABLE]
[TABLE]
[TABLE]
Because the signature matrix is built using a $1$-separable matrix, each right node can be resolved if it is connected to at most one defective item.
Iteration : we first find the -resolvable right nodes. The first and second right nodes are -resolvable because . Using a BCH decoding algorithm, one can find that the defective items connected to the first and second right nodes are item and item , respectively. Next, we remove the contributions of the items and from the unresolved right nodes. The new observation vectors will be as follows,
[TABLE]
[TABLE]
Iteration 2: it can be easily observed that the third and fourth right nodes are $1$-resolvable since . Using a BCH decoding algorithm, it follows that the item is the defective item connected to both right nodes 3 and 4. Since all the defective items are identified, the decoding algorithm terminates.
IV Main Results
In this section, we present our main results. Theorem 1 characterizes the required number of tests that guarantees the identification of all defective items with probability approaching one as $K, N \to \infty$. Theorem 2 presents the computational complexity of the proposed algorithm. The proofs of Theorems 1 and 2 are given in Section V.
Theorem 1.
For the sub-linear regime, the proposed scheme recovers all defective items with probability approaching one (as $K, N \to \infty$) with at most $m = c(t)K\left(t\log_2\left(\frac{\ell N}{c(t)K}+1\right)+1\right)+1$ tests, where $c(t)$ depends only on $t$. Table I shows the values of $c(t)$ for $t \in \{1, 2, \dots, 8\}$.
Theorem 2.
The encoding and decoding of the proposed algorithm for any $t \leq 4$ have the computational complexity of $O(K\log^2(N/K))$ and $O(K\log(N/K))$, respectively.
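The bound in Theorem 1 can be evaluated numerically with the constants from Table I. The following sketch is illustrative: the function name `num_tests` and the problem size ($N = 2^{16}$ items, $K = 100$ defective items) are hypothetical choices, and the bound is left as a real number rather than rounded up to an integer.

```python
import math

# Values of c(t) and the optimal left degree, as listed in Table I.
C = {1: 1.222, 2: 0.597, 3: 0.388, 4: 0.294, 5: 0.239, 6: 0.202, 7: 0.176, 8: 0.156}
ELL = {1: 3, 2: 2, 3: 2, 4: 2, 5: 2, 6: 2, 7: 2, 8: 2}

def num_tests(N, K, t):
    """Upper bound c(t) K (t log2(l N / (c(t) K) + 1) + 1) + 1 on the number of tests."""
    M = C[t] * K                      # number of right nodes
    r = ELL[t] * N / M                # right degree, from N * l = M * r
    s = t * math.log2(r + 1) + 1      # tests assigned to each right node
    return M * s + 1                  # plus one initial test that counts the defective items

N, K = 2**16, 100                     # hypothetical problem size
bounds = {t: num_tests(N, K, t) for t in C}
best_t = min(bounds, key=bounds.get)
for t in sorted(bounds):
    print(t, round(bounds[t], 1))
print("smallest bound at t =", best_t)
```

For this hypothetical problem size the smallest bound is obtained at $t = 2$, in line with the observation that the minimum number of required tests is achieved by $t = 2$.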
V Proofs of Main Theorems
V-A Proof of Theorem 1
Let $N$ be the total number of items, out of which $K$ items are defective. Note that in the QGT problem, performing one initial test (on all items) suffices to obtain the number of defective items. As mentioned in Section III-C, our scheme employs an iterative decoding algorithm. In each iteration, the algorithm finds and resolves all the $t$-resolvable right nodes. At the end of each iteration, the decoder subtracts the contribution of the identified defective items from the unresolved right nodes. This process is repeated until there are no $t$-resolvable right nodes left in the graph. The fraction of defective items that remain unidentified when the decoding algorithm terminates can be analyzed using density evolution as follows.
Assuming that the exact number of the defective items, $K$, is known and the values assigned to the defective and non-defective items are one and zero, respectively, the left-and-right-regular bipartite graph can be pruned. All the zero left nodes and their respective edges are removed from the graph. The number of left nodes in the pruned graph is $K$, but the degree of these nodes remains unchanged. On the other hand, the number of right nodes remains unchanged, but the resulting graph is no longer right-regular.
Let be the average right degree, i.e., . Let be the right edge degree distribution, where is the probability that a randomly picked edge in the pruned graph is connected to a right node of degree , and is the maximum degree of a right node. As shown in [18], as , we have .
The following lemma is useful for computing the fraction of unidentified defective items at each iteration of the decoding algorithm.
Lemma 2.
Let $p_j$ be the probability that a randomly chosen defective item is not recovered at iteration $j$ of the decoding algorithm; and let $q_j$ be the probability that a randomly picked right node is resolved at iteration $j$ of the decoding algorithm. The relation between $p_j$ and $q_j$ is determined by the following density evolution equations:
[TABLE]
[TABLE]
where $t$ is the level of separability, and $\rho_i$ is the probability that a randomly picked edge in the pruned graph is connected to a right node of degree $i$.
Note that is only a function of the variables , , and when . Recall that the goal is to minimize the total number of tests, i.e., , where is the number of right nodes, and is the number of rows in the signature matrix. The number of rows, , in the signature matrix depends only on the level of separability, . For a given , we can minimize the number of right nodes subject to the constraint , so as to minimize the total number of the tests. The constraint guarantees that running the decoding algorithm for sufficiently large number of iterations, the probability that a randomly chosen defective item remains unidentified approaches zero. For any , let . Then, for any and , we have . Accordingly, for any and , it follows that . Our goal is then to compute
[TABLE]
We can solve this problem numerically and attain the optimal value of , i.e., . Let . The number of right nodes can then be chosen as for any to guarantee that . Substituting in (3) results in . Therefore, the total number of tests will become .
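The numerical optimization can be illustrated with a short density-evolution sketch. It rests on two assumptions that are our reading of the analysis rather than a transcription of it: (i) in the large-system limit, the number of other unresolved defective items attached to the right node of a random edge is Poisson with mean $\lambda p$, where $\lambda = K\ell/M$ is the average right degree of the pruned graph and $p$ is the current fraction of unresolved defective items; (ii) a right node resolves a given defective item when at most $t - 1$ of its other defective neighbours are still unresolved. All function names are hypothetical.

```python
import math

def de_step(p, lam, t, ell):
    """Map the unresolved fraction at one iteration to the next (Poisson approximation)."""
    x = lam * p                 # mean number of other unresolved defectives at the right node
    q = math.exp(-x) * sum(x**k / math.factorial(k) for k in range(t))
    return (1.0 - q) ** (ell - 1)

def converges_to_zero(lam, t, ell, grid=2000):
    """True if the update strictly decreases p on a grid of (0, 1], so p_j decays to zero."""
    return all(de_step(i / grid, lam, t, ell) < i / grid for i in range(1, grid + 1))

def threshold_c(t, ell, iters=40):
    """Bisection on lam = K * ell / M; returns the smallest ratio M / K that still decodes."""
    lo, hi = 1e-3, 50.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if converges_to_zero(mid, t, ell):
            lo = mid
        else:
            hi = mid
    return ell / lo

# Compare with c(1) = 1.222 (left degree 3) and c(2) = 0.597 (left degree 2) in Table I.
print(round(threshold_c(1, 3), 3), round(threshold_c(2, 2), 3))
```

Repeating the computation over several left degrees for a fixed $t$ and keeping the degree with the smallest ratio $M/K$ is one way to search for the optimal left degree; the grid-based convergence test is a simplification and only approximates the exact threshold.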
Lemma 3.
There exist some such that
[TABLE]
By combining the result of Lemma 3 and the preceding arguments, it follows that with probability approaching one as $K, N \to \infty$, $m = c(t)K\left(t\log_2\left(\frac{\ell N}{c(t)K}+1\right)+1\right)+1$ tests suffice for the proposed algorithm to recover all defective items. This completes the proof.
V-B Proof of Theorem 2
Lemma 4.
For any , the computational complexity of resolving each -resolvable right node is .
The total number of right nodes, , is . From Lemma 4, it then follows that the complexity of the decoding algorithm is . Using (3), it is easy to see that for any the decoding algorithm has complexity . The total number of measurements is and for each measurement summations are performed. Hence, the complexity of the encoding algorithm is , which becomes equivalent to for any .
VI Evaluation of $c(t)$
In this section, we present the complete analysis for the case of $t = 2$, and show how one can evaluate $c(t)$ at $t = 2$, i.e., $c(2)$. The same procedure can be used for evaluating $c(t)$ at any $t$.
To compute , we compute the ratio for each and its corresponding . The optimal , i.e., , is the one that yields the minimum value for .
For the case of , the density evolution equations (6) and (7) can be combined as
[TABLE]
Obviously, . Substituting , we can rewrite (9) as
[TABLE]
For the sub-linear regime, (by definition) as , and hence, (by (3)). Thus, in the asymptotic regime of our interest, . Letting , the equation (10) reduces to
[TABLE]
Using (11), we can write
[TABLE]
The following two lemmas are useful for computing for each .
Lemma 5.
For any and any , the infinite sequence converges.
Lemma 6.
Let be the limit of the sequence , and let
[TABLE]
Then, for any , we have
[TABLE]
By the result of Lemma 6, for any the value of can be computed numerically. One can then obtain the optimal value of , i.e., , which minimizes the ratio of , and accordingly can be computed.
VII Comparison Results
In this section, we evaluate the performance of the proposed algorithm based on our theoretical analysis and Monte Carlo simulations.
Based on the results in Theorem 1 and Table I, Fig. 2 depicts the total number of tests () required to identify all the defective items for different values of . The number of items is assumed to be . As it can be seen, when the required number of tests for identifying all the defective items is less than that for larger values of .
Using the Monte Carlo simulation, we also compare the performance of the proposed scheme for with the performance of the Multi-Level Group Testing (MLGT) algorithm from [11]. The MLGT scheme is a semi-quantitative group testing scheme where the result of each test is an integer in the set . Letting , the MLGT scheme becomes a QGT scheme. Based on the optimization that we have performed, the optimal left degree for the MLGT scheme is when . For defective items among a population of items, the average fraction of unidentified defective items for the MLGT scheme and the proposed scheme are shown in Fig. 3 for different values of . As it can be observed, the proposed scheme for all the three tested values of outperforms the MLGT scheme significantly. For instance, when the fraction of unidentified defective items is , the required number of tests for the MLGT scheme (for ) is times, times, and times more than that of the proposed scheme for , , and , respectively.
References
- [1] J. Scarlett and V. Cevher, “How little does non-exact recovery help in group testing?” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2017, pp. 6090–6094.
- [2] K. Lee, R. Pedarsani, and K. Ramchandran, “SAFFRON: A fast, efficient, and robust framework for group testing based on sparse-graph codes,” CoRR, vol. abs/1508.04485, 2015.
- [3] C. Wang, Q. Zhao, and C. N. Chuah, “Optimal nested test plan for combinatorial quantitative group testing,” IEEE Transactions on Signal Processing, vol. PP, no. 99, 2017.
- [4] A. Heidarzadeh, E. Karimi, F. Kazemi, and A. Sprintson, “Fast localization of multiple users in mm-wave cells.”
- [5] A. Heidarzadeh, E. Karimi, F. Kazemi, K. Narayanan, and A. Sprintson, “User localization in mmwave cells: A non-adaptive quantitative group testing approach based on sparse graph codes.”
- [6] H. S. Shapiro, “Problem E 1399,” Amer. Math. Monthly, vol. 67, no. 82, pp. 697–697, 1960.
- [7] N. H. Bshouty, “Optimal algorithms for the coin weighing problem with a spring scale,” in Conference on Learning Theory, 2009.
- [8] E. Karimi, F. Kazemi, A. Heidarzadeh, and A. Sprintson, “A simple and efficient strategy for the coin weighing problem with a spring scale,” in 2018 IEEE International Symposium on Information Theory (ISIT), June 2018, pp. 1730–1734.
