Explicit Formulas for the Weight Enumerators of Some Classes of Deletion Correcting Codes
Khodakhast Bibak, Olgica Milenkovic

TL;DR
This paper introduces a broad class of deletion/insertion correcting codes, providing explicit formulas for their weight enumerators and sizes, which unify and extend many known codes and have potential applications in various fields.
Contribution
The paper presents a general code framework with explicit weight enumerator formulas, unifying several known codes and deriving new size formulas and solutions to linear congruences.
Findings
Explicit formulas for weight enumerators of the new code class
Unified formulas for sizes of multiple deletion/insertion codes
First explicit formula for the number of binary solutions of an arbitrary linear congruence
Khodakhast Bibak and Olgica Milenkovic
An extended abstract of this paper was presented at ISIT 2018, Colorado, USA [7]. K. Bibak is with the Department of Computer Science and Software Engineering, Miami University, Oxford, Ohio, 45056, USA. O. Milenkovic is with the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, Illinois, 61801, USA.
Abstract
We introduce a general class of codes which includes several well-known classes of deletion/insertion correcting codes as special cases. For example, the Helberg code, the Levenshtein code, the Varshamov–Tenengolts code, and most variants of these codes including most of those which have been recently used in studying DNA-based data storage systems are all special cases of our code. Then, using a number theoretic method, we give an explicit formula for the weight enumerator of our code which in turn gives explicit formulas for the weight enumerators and so the sizes of all the aforementioned codes. We also obtain the size of the Shifted Varshamov–Tenengolts code. Another application which automatically follows from our result is an explicit formula for the number of binary solutions of an arbitrary linear congruence which, to the best of our knowledge, is the first result of its kind in the literature and might be also of independent interest. Our general result might have more applications/implications in information theory, computer science, and mathematics.
Index Terms:
Binary solution, BLCC, weight enumerator, deletion correcting code, linear congruence.
I Introduction
Deletions or insertions can occur in many systems; for example, they can occur in some communication and storage channels, in biological sequences, etc. Therefore, studying deletion/insertion correcting codes may lead to important insight into genetic processes and into many communication problems. Deletion correcting codes have been the subject of intense research for more than fifty years [36, 37, 44], with recent results settling long standing open problems regarding constructions of multiple deletion correcting codes with low redundancy [10, 9]. Nevertheless, our understanding about these codes and channels with this type of errors is still very limited and many open problems in the area remain, especially when considering constructions of deletion correcting codes that satisfy additional constraints, such as weight or parity constraints. Examples include codes in the Damerau distance [20], based on single deletion correcting codes with even weight, and Shifted Varshamov–Tenengolts codes [42] used for burst deletion correction. In such settings, one important question is to determine the weight enumerators of the component deletion correcting codes in order to estimate the size [13, 28] of the weight-constrained deletion correcting codes. The component deletion correcting code is frequently defined in terms of a linear congruence for which the number of solutions of some fixed weight determines the size of the constrained code.
Here, we introduce a general class of codes which includes several well-known classes of deletion/insertion correcting codes as special cases. Then, using a number theoretic method, we give an explicit formula for the weight enumerator of our code, which in turn gives explicit formulas for the weight enumerators and for the sizes of the aforementioned codes (see also [13, 28] for some general upper bounds on the size of deletion correcting codes). Our initial motivation for studying this problem comes from number theory, and pertains to a possible $q$-ary generalization of Lehmer’s Theorem (see Section II).
Before we proceed with our technical exposition, we review some well-known classes of deletion correcting codes.
Throughout the paper, we let $\mathbb{Z}_n = \{0, 1, \ldots, n-1\}$. Varshamov and Tenengolts [49] in 1965 introduced an important class of codes, known as the Varshamov–Tenengolts codes (henceforth, VT-codes), and proved that these codes are capable of correcting single asymmetric errors on a $Z$-channel.
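Definition I.1.
Let $n$ be a positive integer and $0 \leq b \leq n$. The Varshamov–Tenengolts code $VT_b(n)$ is the set of all binary $n$-tuples $\langle s_1, \ldots, s_n \rangle$ such that
$$\sum_{i=1}^{n} i s_i \equiv b \pmod{n+1}.$$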
A generalization of VT-codes to Abelian groups where the code length is one less than the order of the group was proposed by Constantin and Rao [12]; the size and weight distribution of the latter codes were studied in [16, 24, 26, 34]. Despite the fact that the VT codes can correct only a single deletion [32], the codes and their variants have found many applications, including DNA-based data storage [20, 31] and distributed message synchronization [51, 52].
Levenshtein [32] proved that any code that can correct $s$ deletions (or $s$ insertions) can also correct a total of $s$ deletions and insertions. In the same paper, he also proposed the following important generalization of VT codes.
Definition I.2.
Let $n$, $k$ be positive integers and $b \in \mathbb{Z}_k$. The Levenshtein code $L_b(n,k)$ is the set of all binary $n$-tuples $\langle s_1, \ldots, s_n \rangle$ such that
$$\sum_{i=1}^{n} i s_i \equiv b \pmod{k}.$$
By giving an elegant decoding algorithm, Levenshtein [32] showed that if $k \geq n+1$, then the code $L_b(n,k)$ can correct a single deletion (and consequently, can correct a single insertion). Furthermore, Levenshtein [32] proved that if $k \geq 2n$ then the code $L_b(n,k)$ can correct either a single deletion/insertion error or a single substitution error. The Levenshtein code has found many interesting applications and is considered to be one of the most important examples of deletion/insertion correcting codes.
Motivated by applications in burst-deletion correction, a variant of the Levenshtein code was introduced in [42] under the name of Shifted Varshamov–Tenengolts codes. Gabrys et al. [20] used Shifted VT-codes to construct codes in the Damerau distance. Shifted VT-codes combine a linear congruence constraint with a parity constraint, as stated in the next definition.
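Definition I.3.
Let $n$, $k$ be positive integers, $b \in \mathbb{Z}_k$, and $r \in \{0, 1\}$. The Shifted Varshamov–Tenengolts code $SVT_{b,r}(n,k)$ is the set of all binary $n$-tuples $\langle s_1, \ldots, s_n \rangle$ such that
$$\sum_{i=1}^{n} i s_i \equiv b \pmod{k}, \qquad \sum_{i=1}^{n} s_i \equiv r \pmod{2}.$$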
The reason why these codes are called “shifted” is that they can correct a single deletion where the location of the deleted bit is known to be within certain consecutive positions. A variation of the Shifted VT-codes appeared in [14, 15].
Helberg and Ferreira [23] introduced a generalization of the Levenshtein code, referred to as the Helberg code, by replacing the coefficients (weights) with modified versions of the Fibonacci numbers.
Definition I.4.
Let $s$, $n$ be positive integers. The Helberg code $H_b(n,s)$ is the set of all binary $n$-tuples $\langle s_1, \ldots, s_n \rangle$ such that
$$\sum_{i=1}^{n} v_i s_i \equiv b \pmod{m},$$
where $v_i = 0$, for $i \leq 0$, $v_i = 1 + \sum_{j=1}^{s} v_{i-j}$, for $i \geq 1$, $m = v_{n+1}$, and $b \in \mathbb{Z}_m$. Note that the multipliers $v_i$ depend on $s$, and $m$ depends on both $s$ and $n$.
Clearly, the Helberg code with $s = 1$ coincides with the VT code. Helberg and Ferreira [23] gave numerical values for the maximum cardinality of this code for some special parameter choices. Abdel-Ghaffar et al. [1] proved that the Helberg code can correct multiple deletion/insertion errors (see also [22] for a short proof of this result). Furthermore, multiple deletion correcting codes over nonbinary alphabets generalizing the Helberg code were recently proposed by Le and Nguyen [29]. The Helberg code constraint was combined with the parity constraint of Shifted VT-codes for the purpose of devising special types of DNA-based data storage codes in [20].
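The multipliers of the Helberg code can be generated directly from their recurrence. The sketch below assumes the recurrence $v_i = 1 + \sum_{j=1}^{s} v_{i-j}$ with $v_i = 0$ for $i \leq 0$ and modulus $m = v_{n+1}$; the function name `helberg_weights` is only illustrative and does not come from the paper.

```python
def helberg_weights(n, s):
    """Return ([v_1, ..., v_n], m) for the Helberg code.

    Assumes v_i = 0 for i <= 0, v_i = 1 + v_{i-1} + ... + v_{i-s} for i >= 1,
    and m = v_{n+1}.
    """
    v = [0] * s                       # v_{1-s}, ..., v_0 are all zero
    for _ in range(n + 1):            # append v_1, ..., v_{n+1}
        v.append(1 + sum(v[-s:]))
    weights = v[s:]                   # drop the zero padding
    return weights[:n], weights[n]


# With s = 1 the multipliers are 1, 2, ..., n and m = n + 1 (the VT code).
print(helberg_weights(4, 1))
# With s = 2 the multipliers grow like shifted Fibonacci numbers.
print(helberg_weights(4, 2))
```

For $s = 1$ the recurrence collapses to $v_i = i$, which is the coincidence with the VT code noted above.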
We now introduce our general code family which includes the above codes as special cases.
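Definition I.5.
Let $n$, $k$ be positive integers, $\mathbf{a} = \langle a_1, \ldots, a_n \rangle \in \mathbb{Z}^n$, and $b \in \mathbb{Z}_k$. We define the Binary Linear Congruence Code (BLCC) $C$ as the set of all binary $n$-tuples $\langle s_1, \ldots, s_n \rangle$ such that
$$\sum_{i=1}^{n} a_i s_i \equiv b \pmod{k}.$$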
The Hamming weight of a string $s$ over an alphabet, denoted by $w(s)$, is the number of non-zero symbols in $s$. Equivalently, the Hamming weight of a string is the Hamming distance between that string and the all-zero string of the same length. The weight enumerator of a code is defined as follows.
Definition I.6.
Let $n$ be a positive integer, $\mathbb{F}_q$ be a finite field, and let $C \subseteq \mathbb{F}_q^n$. Then the weight enumerator of the code $C$ is defined as
$$W_C(z) = \sum_{c \in C} z^{w(c)} = \sum_{t=0}^{n} N_t z^t,$$
where $w(c)$ is the Hamming weight of $c$, and $N_t$ is the number of codewords in $C$ of Hamming weight $t$. Also, the homogeneous weight enumerator of the code $C$ is defined as
$$W_C(x, y) = \sum_{t=0}^{n} N_t x^{t} y^{n-t}.$$
Clearly, by setting $z = 1$ in the weight enumerator (or $x = y = 1$ in the homogeneous weight enumerator) we obtain the size of the code $C$.
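For small parameters, the definitions above can be checked by exhaustive search. The following sketch assumes that a BLCC is given by an integer coefficient vector `a`, a residue `b`, and a modulus `k`; the helper names `blcc` and `weight_distribution` are hypothetical. It lists the codewords and the coefficients $N_0, \ldots, N_n$ of the weight enumerator.

```python
from itertools import product


def blcc(a, b, k):
    """All binary tuples s with sum(a_i * s_i) congruent to b mod k (brute force)."""
    a = list(a)
    return [s for s in product((0, 1), repeat=len(a))
            if (sum(ai * si for ai, si in zip(a, s)) - b) % k == 0]


def weight_distribution(code, n):
    """Coefficients N_0, ..., N_n of the weight enumerator W_C(z)."""
    counts = [0] * (n + 1)
    for c in code:
        counts[sum(c)] += 1           # for binary words, sum(c) is the Hamming weight
    return counts


n = 4
vt0 = blcc(range(1, n + 1), 0, n + 1)     # VT code: a_i = i, modulus n + 1
print(vt0)
print(weight_distribution(vt0, n))        # its sum is the code size, i.e. W_C(1)
```

Choosing `a = range(1, n + 1)` with modulus `k` gives a Levenshtein code, and `k = n + 1` gives a VT code, so the same routine covers the special cases discussed earlier.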
What can we say about the size, or more generally, about the weight enumerator of the Binary Linear Congruence Code (BLCC) $C$? In the next section, we review linear congruences, exponential sums and in particular, Ramanujan sums. Then, in Section III, we give an explicit formula for the weight enumerator of $C$. In Section IV, we derive explicit formulas for the weight enumerators and for the sizes of the previously described deletion correcting codes. We also obtain a formula for the size of the Shifted Varshamov–Tenengolts codes.
II Linear congruences and Ramanujan sums
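Let $a_1, \ldots, a_k, b, n \in \mathbb{Z}$, $n \geq 1$. Throughout the paper, an ordered $k$-tuple of integers is denoted by $\langle a_1, \ldots, a_k \rangle$. Also, by $\mathbf{a} \cdot \mathbf{x}$ we mean the scalar product of the vectors $\mathbf{a}$ and $\mathbf{x}$. A linear congruence in $k$ unknowns $x_1, \ldots, x_k$ is of the form
$$a_1 x_1 + \cdots + a_k x_k \equiv b \pmod{n}. \tag{II.1}$$
A solution of (II.1) is an ordered $k$-tuple of integers $\langle x_1, \ldots, x_k \rangle \in \mathbb{Z}_n^k$ that satisfies (II.1). The following result, proved by Lehmer [30], gives the number of solutions of the above linear congruence.
Theorem II.1.
Let $a_1, \ldots, a_k, b, n \in \mathbb{Z}$, $n \geq 1$. The linear congruence $a_1 x_1 + \cdots + a_k x_k \equiv b \pmod{n}$ has a solution $\langle x_1, \ldots, x_k \rangle \in \mathbb{Z}_n^k$ if and only if $\ell \mid b$, where $\ell = \gcd(a_1, \ldots, a_k, n)$. Furthermore, if this condition is satisfied, then there are $\ell n^{k-1}$ solutions.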
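Lehmer's count is easy to confirm numerically for small moduli. The sketch below assumes the statement in its standard form, namely $\ell n^{k-1}$ solutions in $\mathbb{Z}_n^k$ when $\ell = \gcd(a_1, \ldots, a_k, n)$ divides $b$ and none otherwise; the function names are illustrative.

```python
from functools import reduce
from itertools import product
from math import gcd


def count_solutions(a, b, n):
    """Count x in Z_n^k with a . x congruent to b mod n, by exhaustive search."""
    return sum(1 for x in product(range(n), repeat=len(a))
               if (sum(ai * xi for ai, xi in zip(a, x)) - b) % n == 0)


def lehmer_count(a, b, n):
    """Closed-form count: l * n^(k-1) if l = gcd(a_1, ..., a_k, n) divides b, else 0."""
    l = reduce(gcd, a, n)
    return l * n ** (len(a) - 1) if b % l == 0 else 0


a, n = (2, 4), 6
for b in range(n):
    print(b, count_solutions(a, b, n), lehmer_count(a, b, n))
```

The brute-force search ranges over all of $\mathbb{Z}_n^k$, so it is only practical for small $n$ and $k$; restricting the unknowns to a smaller alphabet is exactly what changes the problem.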
Lehmer’s Theorem and its variants have been studied extensively and have found intriguing applications in several areas of mathematics, computer science, and physics (see [2, 3, 4, 5, 6, 11, 25] and the references therein).
Now, we pose the following problem that asks for a $q$-ary generalization of Lehmer’s Theorem:
Problem II.2.
Let $a_1, \ldots, a_k, b, n \in \mathbb{Z}$, $n \geq 1$, and $q \geq 2$. Give an explicit formula for the number of solutions of the linear congruence $a_1 x_1 + \cdots + a_k x_k \equiv b \pmod{n}$ with $\langle x_1, \ldots, x_k \rangle \in \mathbb{Z}_q^k$.
Note that we have only changed $\mathbb{Z}_n^k$ to $\mathbb{Z}_q^k$. For example, when $q = 2$, the problem is asking for an explicit formula for the number of binary solutions of an arbitrary linear congruence. This is a very natural problem and might lead to interesting applications. In Section III, we solve the binary version of the above problem as an immediate consequence of our main result.
Remark II.3.
A solution to Problem II.2 automatically gives the size of a multiple insertion/deletion correcting code recently proposed by Le and Nguyen [29], which generalizes the Helberg code.
Next, we review some properties of exponential sums and in particular, Ramanujan sums. Throughout the paper, we let $e(x) = \exp(2\pi i x)$ denote the complex exponential with period $1$.
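Lemma II.4.
Let $n$ be a positive integer and $x$ be a real number. Then we have
$$\sum_{m=1}^{n} e(mx) = \begin{cases} n, & \text{if } x \in \mathbb{Z}, \\ \dfrac{\sin(\pi n x)}{\sin(\pi x)}\, e\!\left(\dfrac{(n+1)x}{2}\right), & \text{if } x \in \mathbb{R} \setminus \mathbb{Z}. \end{cases} \tag{II.2}$$
Proof.
When $x \in \mathbb{Z}$ the result is clear because in this case $e(mx) = 1$. So, we let $x \in \mathbb{R} \setminus \mathbb{Z}$. Since $e(x) \neq 1$, summing the geometric progression gives
$$\sum_{m=1}^{n} e(mx) = \frac{e(x)\,(e(nx) - 1)}{e(x) - 1} = \frac{e(nx/2) - e(-nx/2)}{e(x/2) - e(-x/2)}\, e\!\left(\frac{(n+1)x}{2}\right) = \frac{\sin(\pi n x)}{\sin(\pi x)}\, e\!\left(\frac{(n+1)x}{2}\right).$$
∎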
For integers $m$ and $n$ with $n \geq 1$ the quantity
$$c_n(m) = \sum_{\substack{j=1 \\ \gcd(j,n)=1}}^{n} e\!\left(\frac{jm}{n}\right) \tag{II.3}$$
is called a Ramanujan sum. It is the sum of the $m$-th powers of the primitive $n$-th roots of unity, and is also denoted by $c(m,n)$ in the literature. Even though the Ramanujan sum $c_n(m)$ is defined as a sum of some complex numbers, it is integer-valued (see Theorem II.5 below). From (II.3), it is clear that $c_n(-m) = c_n(m)$.
Ramanujan sums and some of their properties were certainly known before Ramanujan’s paper [41], as Ramanujan himself declared [41]; nonetheless, probably the reason that these sums bear Ramanujan’s name is that “Ramanujan was the first to appreciate the importance of the sum and to use it systematically”, according to Hardy (see, [19] for a discussion about this).
Ramanujan sums have important applications in additive number theory, for example, in the context of the Hardy-Littlewood circle method, Waring’s problem, and sieve theory (see, e.g., [38, 39, 50] and the references therein). As a major result in this direction, one can mention Vinogradov’s theorem (in its proof, Ramanujan sums play a key role) stating that every sufficiently large odd integer is the sum of three primes, and so every sufficiently large even integer is the sum of four primes (see, e.g., [39, Chapter 8]). Ramanujan sums have also interesting applications in cryptography [6, 43], coding theory [4, 21], combinatorics [5, 35], graph theory [18, 33], signal processing [47, 48], and physics [2, 40].
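Clearly, $c_n(0) = \varphi(n)$, where $\varphi(n)$ is Euler’s totient function. Also, by Theorem II.5 (see below), $c_n(1) = \mu(n)$, where $\mu(n)$ is the Möbius function defined by
$$\mu(n) = \begin{cases} 1, & \text{if } n = 1, \\ 0, & \text{if } n \text{ is not square-free}, \\ (-1)^{\kappa}, & \text{if } n \text{ is the product of } \kappa \text{ distinct primes}. \end{cases} \tag{II.4}$$
The following theorem, attributed to Kluyver [27], gives an explicit formula for $c_n(m)$:
Theorem II.5.
For integers $m$ and $n$, with $n \geq 1$,
$$c_n(m) = \sum_{d \,\mid\, \gcd(m,n)} \mu\!\left(\frac{n}{d}\right) d. \tag{II.5}$$
Thus, $c_n(m)$ can be easily computed provided $n$ can be factored efficiently. One should compare (II.5) with the formula
$$\varphi(n) = \sum_{d \,\mid\, n} \mu\!\left(\frac{n}{d}\right) d. \tag{II.6}$$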
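Kluyver's formula can be compared numerically with the definition of the Ramanujan sum as a sum over primitive roots of unity. This is a minimal sketch assuming the formula $c_n(m) = \sum_{d \mid \gcd(m,n)} \mu(n/d)\, d$; the helper names and the trial-division Möbius function are illustrative choices, not part of the paper.

```python
import cmath
from math import gcd


def mobius(n):
    """Moebius function by trial division (fine for small n)."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:            # a squared prime factor
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def ramanujan_kluyver(n, m):
    """c_n(m) as the sum of mu(n/d) * d over the divisors d of gcd(m, n)."""
    g = gcd(m, n)                     # gcd(0, n) == n, so c_n(0) equals phi(n)
    return sum(mobius(n // d) * d for d in range(1, g + 1) if g % d == 0)


def ramanujan_direct(n, m):
    """c_n(m) from the definition, rounded to the nearest integer."""
    total = sum(cmath.exp(2j * cmath.pi * r * m / n)
                for r in range(1, n + 1) if gcd(r, n) == 1)
    return round(total.real)


for m in range(7):
    print(m, ramanujan_kluyver(6, m), ramanujan_direct(6, m))
```

The divisor-sum version uses only integer arithmetic, so it avoids the floating-point rounding that the direct evaluation relies on.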
III Weight enumerator of the Binary Linear Congruence Code
Using a simple number theoretic argument, we give an explicit formula for the weight enumerator (and the size) of the Binary Linear Congruence Code (BLCC) $C$. Another result which automatically follows from our result is an explicit formula for the number of binary solutions of an arbitrary linear congruence which, to the best of our knowledge, is the first result of its kind in the literature and may be of independent interest.
The following lemma is useful for proving our main result.
Lemma III.1.
Let , be positive integers. For any -tuple , we have
[TABLE]
Proof.
Expand the left-hand side of (III.1) and note that . ∎
Now we are ready to state and prove our main result.
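Theorem III.2.
Let $n$, $k$ be positive integers, $\mathbf{a} = \langle a_1, \ldots, a_n \rangle \in \mathbb{Z}^n$, and $b \in \mathbb{Z}_k$. The weight enumerator of the Binary Linear Congruence Code (BLCC) $C$ is
$$W_C(z) = \frac{1}{k} \sum_{j=1}^{k} e\!\left(\frac{-bj}{k}\right) \prod_{i=1}^{n} \left(1 + z\, e\!\left(\frac{a_i j}{k}\right)\right). \tag{III.2}$$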
Proof.
By Lemma III.1, for any -tuple we have
[TABLE]
Let be a solution of the linear congruence . Then we have
[TABLE]
Let and . Note that since is a solution of the linear congruence , we get , for some . Similarly, , for some and .
Therefore,
[TABLE]
Thus,
[TABLE]
[TABLE]
Thus,
[TABLE]
By Lemma II.4,
[TABLE]
Note that if then (and so ), and if then (and so because ). This implies that
[TABLE]
and
[TABLE]
Consequently,
[TABLE]
∎
Setting $z = 1$ in (III.2) gives the size of the Binary Linear Congruence Code (BLCC) $C$. Equivalently, it solves Problem II.2 when $q = 2$, that is, it gives an explicit formula for the number of binary solutions of an arbitrary linear congruence.
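Corollary III.3.
Let $n$, $k$ be positive integers, $\mathbf{a} = \langle a_1, \ldots, a_n \rangle \in \mathbb{Z}^n$, and $b \in \mathbb{Z}_k$. The number of solutions of the linear congruence $a_1 x_1 + \cdots + a_n x_n \equiv b \pmod{k}$ in $\mathbb{Z}_2^n$ is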
[TABLE]
where . This implies that
[TABLE]
Proof.
We have
[TABLE]
where . Consequently, we have
[TABLE]
∎
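The exponential-sum expression for the weight enumerator can be checked against exhaustive enumeration. The sketch below assumes the formula has the character-sum form $W_C(z) = \frac{1}{k} \sum_{j=1}^{k} e(-bj/k) \prod_{i=1}^{n} (1 + z\, e(a_i j / k))$ with $e(x) = \exp(2\pi i x)$; the function names are hypothetical, and setting `z = 1` gives the number of binary solutions of the congruence.

```python
import cmath
from itertools import product


def e(x):
    """Complex exponential with period 1."""
    return cmath.exp(2j * cmath.pi * x)


def enumerator_formula(a, b, k, z):
    """Evaluate the assumed exponential-sum formula for W_C(z)."""
    total = 0
    for j in range(1, k + 1):
        term = e(-b * j / k)
        for ai in a:
            term *= 1 + z * e(ai * j / k)
        total += term
    return total / k


def enumerator_brute(a, b, k, z):
    """Evaluate W_C(z) by listing every codeword of the BLCC."""
    return sum(z ** sum(s) for s in product((0, 1), repeat=len(a))
               if (sum(ai * si for ai, si in zip(a, s)) - b) % k == 0)


a, k = (3, 1, 4, 1, 5), 7
for b in range(k):
    size = enumerator_formula(a, b, k, 1)
    print(b, round(size.real), enumerator_brute(a, b, k, 1))
```

The formula needs roughly $nk$ complex multiplications, whereas the brute-force search visits all $2^n$ binary tuples, which is why the closed form remains usable for long codes.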
Remark III.4.
Recently, Gabrys et al. [20] proposed several variants of the Levenshtein code which are all special cases of our Binary Linear Congruence Code (BLCC) $C$. Theorem III.2 hence provides explicit formulas for the weight enumerators of such codes.
IV Weight enumerators of the aforementioned codes
Using Theorem III.2, we now describe explicit formulas for the weight enumerators (and the sizes) of the Helberg code, the Levenshtein code, and the Varshamov–Tenengolts code. Note that the same approach may be used to derive the weight enumerators of most variants of these codes since they are special cases of the Binary Linear Congruence Code (BLCC) $C$. In addition, we derive a formula for the size of the Shifted Varshamov–Tenengolts code.
IV-A Weight enumerator of the Helberg code
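The Helberg code $H_b(n,s)$ has the same structure as the Binary Linear Congruence Code (BLCC) $C$ but with some additional restrictions on the coefficients and the modulus. So, Theorem III.2 immediately gives the following result.
Theorem IV.1.
The weight enumerator of the Helberg code $H_b(n,s)$ is
$$W_{H_b(n,s)}(z) = \frac{1}{m} \sum_{j=1}^{m} e\!\left(\frac{-bj}{m}\right) \prod_{i=1}^{n} \left(1 + z\, e\!\left(\frac{v_i j}{m}\right)\right). \tag{IV.1}$$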
As the coefficients in the Helberg code are a modified version of the Fibonacci numbers, it may be possible to connect trigonometric sums as described above with the Fibonacci and Lucas numbers [8], and hence simplify (IV.1).
Corollary IV.2.
The size of the Helberg code $H_b(n,s)$ equals
[TABLE]
where . This implies that
[TABLE]
IV-B Weight enumerator of the Levenshtein code
Theorem III.2 also allows for deriving an explicit formula for the weight enumerator of the Levenshtein code.
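Theorem IV.3.
The weight enumerator of the Levenshtein code $L_b(n,k)$ is
$$W_{L_b(n,k)}(z) = \frac{1}{k} \sum_{j=1}^{k} e\!\left(\frac{-bj}{k}\right) \prod_{i=1}^{n} \left(1 + z\, e\!\left(\frac{ij}{k}\right)\right).$$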
Corollary IV.4.
The size of the Levenshtein code $L_b(n,k)$ equals
[TABLE]
where . This implies that
[TABLE]
IV-C The size of the Shifted Varshamov–Tenengolts code
Next, using Theorem IV.3 once again, we give an explicit formula for the size of the Shifted Varshamov–Tenengolts code $SVT_{b,r}(n,k)$. Note that $SVT_{b,r}(n,k)$ represents the set of codewords in the Levenshtein code $L_b(n,k)$ with even Hamming weight (when $r = 0$) or with odd Hamming weight (when $r = 1$).
Theorem IV.5.
If $r = 0$ then the size of the Shifted Varshamov–Tenengolts code $SVT_{b,r}(n,k)$ is
[TABLE]
and if $r = 1$ then the size of $SVT_{b,r}(n,k)$ is
[TABLE]
where ,
[TABLE]
Proof.
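To find the number of codewords in the Levenshtein code $L_b(n,k)$ with even Hamming weight (when $r = 0$) and with odd Hamming weight (when $r = 1$), we proceed as follows. If $r = 0$ then the size of $SVT_{b,r}(n,k)$ equals $\frac{1}{2}\left(W_{L_b(n,k)}(1) + W_{L_b(n,k)}(-1)\right)$, and if $r = 1$, the size of $SVT_{b,r}(n,k)$ equals $\frac{1}{2}\left(W_{L_b(n,k)}(1) - W_{L_b(n,k)}(-1)\right)$. Invoking Theorem IV.3 proves the claimed result. ∎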
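The parity argument in this proof can be illustrated on small instances: evaluating the weight enumerator at $z = 1$ and $z = -1$ separates the even-weight from the odd-weight codewords. The sketch below computes the enumerator by brute force rather than from the closed form, and the function names are illustrative.

```python
from itertools import product


def levenshtein_code(n, k, b):
    """Binary n-tuples with sum(i * s_i) congruent to b mod k (brute force)."""
    return [s for s in product((0, 1), repeat=n)
            if (sum(i * si for i, si in enumerate(s, start=1)) - b) % k == 0]


def enumerator(code, z):
    """Weight enumerator of a binary code evaluated at z."""
    return sum(z ** sum(c) for c in code)


def svt_size(n, k, b, r):
    """Size of the Shifted VT code via (W(1) + W(-1)) / 2 or (W(1) - W(-1)) / 2."""
    code = levenshtein_code(n, k, b)
    w_plus, w_minus = enumerator(code, 1), enumerator(code, -1)
    return (w_plus + w_minus) // 2 if r == 0 else (w_plus - w_minus) // 2


n, k = 6, 4
for b in range(k):
    print(b, svt_size(n, k, b, 0), svt_size(n, k, b, 1))
```

Here $W(1)$ counts all codewords while $W(-1)$ counts even-weight codewords minus odd-weight ones, so their half-sum and half-difference give the two parity classes.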
IV-D Weight enumerators of VT codes
Using Theorem III.2 we re-derive the formula for the weight enumerator of the Varshamov–Tenengolts code $VT_b(n)$. Due to the special structure of the coefficients in the congruences, our formula simplifies significantly.
We start with the following lemma.
Lemma IV.6.
Let be a positive integer and be a non-negative integer. Then, we have
[TABLE]
where .
Proof.
It is well-known that (see, e.g., [45, p. 167])
[TABLE]
Letting , we obtain
[TABLE]
∎
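Theorem IV.7.
The weight enumerator of the VT code $VT_b(n)$ is
$$W_{VT_b(n)}(z) = \frac{1}{(z+1)(n+1)} \sum_{d \,\mid\, n+1} c_d(b) \left(1 - (-z)^{d}\right)^{\frac{n+1}{d}}. \tag{IV.9}$$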
Proof.
Using Theorem III.2 we get
[TABLE]
Therefore,
[TABLE]
Now, using Lemma IV.6 we get
[TABLE]
∎
Based on Theorem IV.7, one can easily obtain the following explicit formula for the general term of the weight distribution of VT codes. This result was recently proved using a different method by Bibak et al. [4] (for a related earlier result, see also [17]).
Theorem IV.8.
The number of codewords with Hamming weight $t$ in the Varshamov–Tenengolts code $VT_b(n)$ equals
[TABLE]
Proof.
The proof reduces to using the binomial theorem to find the coefficient of $z^t$ in the sum of (IV.9). ∎
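Corollary IV.9.
The size of the VT code $VT_b(n)$ equals
$$|VT_b(n)| = \frac{1}{2(n+1)} \sum_{\substack{d \,\mid\, n+1 \\ d \text{ odd}}} c_d(b)\, 2^{\frac{n+1}{d}}. \tag{IV.11}$$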
Remark IV.10.
Ginzburg [21] in 1967 proved the following explicit formula for the size of the $q$-ary, rather than binary, Varshamov–Tenengolts code $VT_b(n,q)$, where $q$ is an arbitrary positive integer:
$$|VT_b(n,q)| = \frac{1}{q(n+1)} \sum_{\substack{d \,\mid\, n+1 \\ \gcd(d,q)=1}} c_d(b)\, q^{\frac{n+1}{d}}. \tag{IV.12}$$
Formula (IV.12) (in fact, a more complicated version of it) was later rediscovered by Stanley and Yoder [46] in 1973. Formula (IV.12) for the binary case was also rediscovered by Sloane [44] in 2002. Bibak et al. [4] derived the binary case formula as a corollary of a general number theory problem.
Remark IV.11.
Since for all integers $m$ and $n$ with $n \geq 1$ one has $c_n(m) \leq \varphi(n)$, from (IV.12) it is clear that the maximum number of codewords in the $q$-ary Varshamov–Tenengolts code $VT_b(n,q)$ is obtained for $b = 0$, that is,
$$|VT_0(n,q)| \geq |VT_b(n,q)|$$
for all $b$. This result was originally proved by Ginzburg [21].
Remark IV.12.
Setting $b = 0$ in Formula (IV.11) gives the bound
$$|VT_0(n)| \geq \frac{2^n}{n+1}.$$
On the other hand, by a result of Levenshtein [32], the size of the largest single deletion correcting binary code of length $n$, where $n$ is sufficiently large, is roughly $2^n/n$. Therefore, as is well known, the VT-codes $VT_0(n)$, for sufficiently large $n$, are close to optimal.
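The size formula for VT codes and the resulting lower bound can be verified for short lengths. This sketch assumes the binary size formula $|VT_b(n)| = \frac{1}{2(n+1)} \sum_{d \mid n+1,\ d \text{ odd}} c_d(b)\, 2^{(n+1)/d}$, with the Ramanujan sum evaluated directly from its definition; all function names are illustrative.

```python
import cmath
from itertools import product
from math import gcd


def ramanujan(n, m):
    """c_n(m) from the definition, rounded to the nearest integer."""
    total = sum(cmath.exp(2j * cmath.pi * r * m / n)
                for r in range(1, n + 1) if gcd(r, n) == 1)
    return round(total.real)


def vt_size_formula(n, b):
    """Assumed closed form: sum over the odd divisors d of n + 1."""
    total = sum(ramanujan(d, b) * 2 ** ((n + 1) // d)
                for d in range(1, n + 2, 2) if (n + 1) % d == 0)
    return total // (2 * (n + 1))


def vt_size_brute(n, b):
    """Count binary n-tuples with sum(i * s_i) congruent to b mod n + 1."""
    return sum(1 for s in product((0, 1), repeat=n)
               if (sum(i * si for i, si in enumerate(s, start=1)) - b) % (n + 1) == 0)


for n in range(1, 9):
    sizes = [vt_size_formula(n, b) for b in range(n + 1)]
    print(n, sizes, sizes[0] * (n + 1) >= 2 ** n)
```

In each printed row the first entry of the list (the case $b = 0$) is the largest, in line with Remark IV.11, and the final flag confirms the bound $|VT_0(n)| \geq 2^n/(n+1)$.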
Acknowledgements
The authors would like to thank the reviewers for helpful comments that improved the presentation of this paper. This work was supported in part by the Center for Science of Information (CSoI), an NSF Science and Technology Center, under grant agreement CCR-0939370, and by the NSF grant CCF1618366.
- [1] K. A. S. Abdel-Ghaffar, F. Paluncic, H. C. Ferreira, and W. A. Clarke, On Helberg’s generalization of the Levenshtein code for multiple deletion/insertion error correction, IEEE Trans. Inform. Theory 58 (2012), 1804–1808.
- [2] K. Bibak, B. M. Kapron, and V. Srinivasan, Counting surface-kernel epimorphisms from a co-compact Fuchsian group to a cyclic group with motivations from string theory and QFT, Nuclear Phys. B 910 (2016), 712–723.
- [3] K. Bibak, B. M. Kapron, and V. Srinivasan, MMH* with arbitrary modulus is always almost-universal, Inform. Process. Lett. 116 (2016), 481–483.
- [4] K. Bibak, B. M. Kapron, and V. Srinivasan, Unweighted linear congruences with distinct coordinates and the Varshamov–Tenengolts codes, Des. Codes Cryptogr. 86 (2018), 1893–1904.
- [5] K. Bibak, B. M. Kapron, V. Srinivasan, R. Tauraso, and L. Tóth, Restricted linear congruences, J. Number Theory 171 (2017), 128–144.
- [6] K. Bibak, B. M. Kapron, V. Srinivasan, and L. Tóth, On an almost-universal hash function family with applications to authentication and secrecy codes, Internat. J. Found. Comput. Sci. 29 (2018), 357–375.
- [7] K. Bibak and O. Milenkovic, Weight enumerators of some classes of deletion correcting codes, ISIT 2018, 431–435.
- [8] K. Bibak and M. H. Shirdareh Haghighi, Some trigonometric identities involving Fibonacci and Lucas numbers, J. Integer Seq. 12 (2009), Article 09.8.4.
