Generalization of the Ball-Collision Algorithm
Carmelo Interlando, Karan Khathuria, Nicole Rohrer, Joachim Rosenthal, Violetta Weger

TL;DR
This paper extends the Ball-Collision Algorithm from binary fields to general finite fields, providing a complexity analysis and comparison with other decoding algorithms.
Contribution
It generalizes the Ball-Collision Algorithm from the binary field to arbitrary finite fields and analyzes the resulting complexity.
Findings
Algorithm successfully generalized to finite fields
Complexity analysis provided and compared with existing algorithms
Asymptotically outperforms the generalized Stern and Stern-MO algorithms over non-binary fields, but not the generalized BJMM-MO algorithm
Abstract
In this paper we generalize the Ball-Collision Algorithm by Bernstein, Lange, Peters from the binary field to a general finite field. We also provide a complexity analysis and compare the asymptotic complexity to other generalized information set decoding algorithms.
| q | Stern | Stern-MO | BJMM-MO | Ball-collision |
|---|---|---|---|---|
| 2 | | | | |
| 3 | | | | |
| 4 | | | | |
| 5 | | | | |
| 7 | | | | |
| 8 | | | | |
| 11 | | | | |
Carmelo Interlando
Department of Mathematics and Statistics
San Diego State University
San Diego, CA 92182-7720
Karan Khathuria
Institute of Mathematics
University of Zurich
Winterthurerstrasse 190
8057 Zurich, Switzerland
Nicole Rohrer
Institute of Mathematics
University of Zurich
Winterthurerstrasse 190
8057 Zurich, Switzerland
Joachim Rosenthal
Institute of Mathematics
University of Zurich
Winterthurerstrasse 190
8057 Zurich, Switzerland
and
Violetta Weger
Institute of Mathematics
University of Zurich
Winterthurerstrasse 190
8057 Zurich, Switzerland
Key words and phrases:
Coding Theory; ISD; Ball-Collision.
2010 Mathematics Subject Classification:
The fourth author is thankful to the Swiss National Science Foundation for grant number 169510.
1. Introduction
Since 1978 it has been known that decoding a random linear code is an NP-complete problem; this was shown in [7] by Berlekamp, McEliece and van Tilborg. Therefore the interesting task arises of determining the complexity of decoding a random linear code using the best algorithms available. To date, two main methods for decoding have been proposed: information set decoding (ISD) and the generalized birthday algorithm (GBA). ISD is more efficient if the decoding problem has only a small number of solutions, whereas GBA is efficient when there are many solutions. Other ideas such as statistical decoding [1], gradient decoding [2] and supercode decoding [5] have also been proposed, but they fail to outperform ISD algorithms. An ISD algorithm is given a corrupted codeword and recovers the message or, equivalently, finds the error vector. ISD algorithms are often formulated via the parity check matrix, since it is enough to find a vector of a certain weight which has the same syndrome as the corrupted codeword; this problem is also referred to as the syndrome decoding problem. ISD algorithms are based on a decoding algorithm proposed by Prange [29] in 1962, and their structure has not changed much from the original: as a first step one chooses an information set; then Gaussian elimination brings the parity check matrix into a standard form, and, assuming that the errors lie outside of the information set, the same row operations applied to the syndrome reveal the error vector, provided its weight does not exceed the given error capacity.
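The following Python sketch illustrates the Prange-style procedure just described. It is a minimal illustration under stated assumptions, not code from the paper: the field is a prime field $\mathbb{F}_p$ (arithmetic mod `p`, inverses via Fermat's little theorem), `H` is a full-rank parity check matrix given as a list of rows, and the helper names `solve_mod_p` and `prange_isd` are hypothetical. Each iteration picks $n-k$ random columns (the complement of a candidate information set), solves for the error on those columns by Gaussian elimination, and accepts if the resulting weight is at most $w$.

```python
import random


def solve_mod_p(M, b, p):
    """Solve M x = b over F_p (p prime) for square M; return None if M is singular."""
    n = len(M)
    aug = [list(row) + [b[i]] for i, row in enumerate(M)]  # augmented matrix [M | b]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] % p != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col], p - 2, p)  # inverse via Fermat's little theorem
        aug[col] = [(v * inv) % p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] % p != 0:
                f = aug[r][col]
                aug[r] = [(a - f * c) % p for a, c in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


def prange_isd(H, s, w, p, max_iter=10_000, rng=random):
    """Prange ISD: look for e with H e^T = s and wt(e) <= w, all errors outside the info set."""
    r, n = len(H), len(H[0])  # r = n - k redundant positions
    for _ in range(max_iter):
        J = rng.sample(range(n), r)  # complement of a random candidate information set
        H_J = [[row[j] for j in J] for row in H]
        e_J = solve_mod_p(H_J, s, p)
        if e_J is None:
            continue  # the complement of J is not an information set; try again
        if sum(1 for v in e_J if v) <= w:
            e = [0] * n
            for j, v in zip(J, e_J):
                e[j] = v
            return e
    return None
```

On a small instance the returned vector has the required syndrome and weight at most $w$, although it need not be the planted error.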
The problem of decoding random linear codes has recently gained prominence with the proposal of using code-based public key cryptosystems for an upcoming post-quantum cryptographic public key standard. The idea of using linear codes in public key cryptography was first formulated by Robert McEliece [25]. Since McEliece's work, a large amount of research has been done, and the interested reader will find more information in a recent survey [9].
If the secret code is hidden well enough, an adversary who wants to break a code-based cryptosystem faces the problem of decoding a random linear code. It is therefore of crucial importance to understand the complexity of the best algorithms capable of decoding a general linear code.
ISD algorithms are often considered when proposing a variant of the McEliece cryptosystem, in order to find the key size needed for a given security level. Hence ISD algorithms do not break a code-based cryptosystem, but they determine the choice of secure parameters. Since some of the new proposals (for example [3, 4, 19]) involve codes over general finite fields, generalizing efficient ISD algorithms to $\mathbb{F}_q$ is an essential problem.
Bernstein, Lange and Peters found a clever improvement of the ISD algorithm, which they called ball-collision decoding [8]. The algorithm of Bernstein et al. was presented for random binary linear codes. The main contribution of our paper is a generalization of the ball-collision decoding algorithm to arbitrary finite fields.
The paper is structured as follows: in Section 2 we discuss the previous work on ISD algorithms, focusing on those which have been generalized to an arbitrary finite field. In Section 3 we describe the ball-collision algorithm over the binary field together with the notation and concepts involved in the algorithm. In Section 4 we present the ball-collision algorithm over $\mathbb{F}_q$, and in Section 5 we perform the complexity analysis of our algorithm, including numerical parameter optimization and asymptotic analysis.
2. Related work
Many improvements have been suggested to Prange’s simplest form of ISD (see for example [10, 12, 13, 14, 20, 22, 31]). They can be split into two types: improvements of the Gaussian elimination step, and assumptions of a more probable and more elaborate weight distribution of the error vector. The former includes the work of Canteaut and Chabaud [11], who show that the information set should not be chosen at random after an unsuccessful iteration; rather, a part of the previous information set should be reused, so that a part of the Gaussian elimination step is already performed. Finiasz and Sendrier [15] showed that a complete Gaussian elimination is not necessary. Both of these improvements bring down the cost of the Gaussian elimination step.
Now we focus on the second type of improvements, which were first proposed for codes over the binary field and later generalized to an arbitrary finite field. The first improvement of Prange’s ISD was given by Lee and Brickell [21] in 1988, where $p$ errors are assumed in the information set and $w-p$ outside of it. In 1993 Stern [30] proposed to partition the information set into two sets and to ask for $p$ errors in each part and $w-2p$ errors outside the information set. Both Lee-Brickell’s and Stern’s algorithms were generalized to a general finite field by Peters [28] in 2010.
Niebuhr, Persichetti, Cayrel, Bulygin and Buchmann [27] in 2010 improved the performance of ISD algorithms over $\mathbb{F}_q$, based on the idea of Finiasz and Sendrier [15] to allow the errors to overlap in the information set.
In the past 10 years many other improvements were proposed for ISD over $\mathbb{F}_2$, for example the ball-collision algorithm by Bernstein, Lange and Peters [8] in 2011, which splits the information set into two sets, having $p_1$ and $p_2$ errors, respectively, and also splits the rest of the positions into three disjoint sets, having $q_1$, $q_2$ and $w - p_1 - p_2 - q_1 - q_2$ errors, respectively. The algorithm’s name comes from a collision check, which forms the most crucial part of the algorithm.
Later in 2011, May, Meurer and Thomae [23] proposed an improvement using the representation technique introduced by Howgrave-Graham and Joux [18]. In 2012, Becker, Joux, May and Meurer [6] (BJMM) introduced further improvements to this algorithm. In the same year, Meurer [26] proposed in his dissertation a new generalized ISD algorithm based on these two papers.
In 2015, May and Ozerov [24] used the nearest neighbor algorithm to improve the BJMM version of ISD. In 2016, Hirose [17] generalized the nearest neighbor algorithm to $\mathbb{F}_q$ and applied it to the generalized Stern algorithm. Later, in 2017, Gueye, Klamti and Hirose [16] applied it to the generalized BJMM algorithm.
In this paper we provide the missing generalization of the ball-collision algorithm. The ranking of the ISD algorithms over $\mathbb{F}_2$ by their complexity is preserved by their generalizations over $\mathbb{F}_q$.
3. Preliminaries
3.1. Notation
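We first fix some notation: let $q$ be a prime power and let $k \le n$ be positive integers. We denote by $\mathbf{1}_k$ the $k \times k$ identity matrix.
For an $m \times n$ matrix $A$ and a set $S \subseteq \{1, \dots, n\}$ of size $k$, we denote by $A_S$ the $m \times k$ matrix consisting of the columns of $A$ indexed by $S$.
For a set $S \subseteq \{1, \dots, n\}$ of size $k$, we denote by $\mathbb{F}_q^n(S)$ the vectors in $\mathbb{F}_q^n$ having support in $S$. The projection of $x \in \mathbb{F}_q^n(S)$ to $\mathbb{F}_q^k$ is then canonical and denoted by $\pi_S(x)$.
On the other hand, we denote by $\sigma_S(x)$ the canonical embedding of a vector $x \in \mathbb{F}_q^k$ into $\mathbb{F}_q^n(S)$, where $S$ is again of size $k$.
For an $[n,k]$ linear code $\mathcal{C}$ over $\mathbb{F}_q$ we denote by $H$ the parity check matrix of size $(n-k) \times n$ and by $G$ the $k \times n$ generator matrix. We denote the Hamming weight of a vector $x \in \mathbb{F}_q^n$ by $\mathrm{wt}(x)$. The corrupted codeword is given by $c = mG + e$, where $m \in \mathbb{F}_q^k$ is the message and $e \in \mathbb{F}_q^n$ is the error vector. The syndrome of $c$ is then defined as $s = Hc^{\top}$ and coincides with the syndrome of the error vector, since $GH^{\top} = 0$ implies $Hc^{\top} = HG^{\top}m^{\top} + He^{\top} = He^{\top}$.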
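As a small sanity check of the syndrome identity above, the sketch below uses a hypothetical toy code over the prime field $\mathbb{F}_5$ (not taken from the paper). It assumes a systematic generator matrix $G = [\mathbf{1}_k \mid M]$ and uses the standard choice $H = [-M^{\top} \mid \mathbf{1}_{n-k}]$, so that $GH^{\top} = 0$; the helper names are made up for illustration.

```python
def parity_check_from_systematic(M, p):
    """For G = [1_k | M] over F_p (M is k x (n-k)), return H = [-M^T | 1_{n-k}]."""
    k, r = len(M), len(M[0])
    return [[(-M[i][j]) % p for i in range(k)] + [1 if t == j else 0 for t in range(r)]
            for j in range(r)]


def syndrome(H, v, p):
    """Compute H v^T over F_p."""
    return [sum(h * x for h, x in zip(row, v)) % p for row in H]


# Hypothetical toy code over F_5 with k = 2, n = 5.
p = 5
M = [[1, 2, 3], [4, 0, 1]]
G = [[1, 0] + M[0], [0, 1] + M[1]]
H = parity_check_from_systematic(M, p)
m = [3, 1]
c = [sum(m[i] * G[i][j] for i in range(2)) % p for j in range(5)]  # codeword m G
e = [0, 0, 0, 4, 0]                                               # error of weight 1
y = [(a + b) % p for a, b in zip(c, e)]                           # corrupted codeword
print(syndrome(H, y, p), syndrome(H, e, p))  # prints [0, 4, 0] [0, 4, 0]
```

The receiver only sees `y`, yet its syndrome already equals that of the unknown error `e`, which is why ISD can work entirely with $H$ and $s$.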
3.2. Ball-collision algorithm over the binary field
In what follows we describe the ball-collision algorithm over the binary field, as proposed in [8] by Bernstein, Lange and Peters.
Remark 1.
Note that if is already in standard form, then and . In this case and can be written as
[TABLE]
3.3. Concepts
We first present a few concepts, introduced in [8], that we use to compute the complexity of the ball-collision algorithm. In general, the complexity of an ISD attack is the cost of one iteration times the expected number of iterations, which is the inverse of the success probability of one iteration. In the following, the cost refers to the number of operations, i.e. additions or multiplications, over the given field.
Over the binary field, the success probability is the probability of having chosen the correct weight distribution of the error vector. For example, let the error vector have length $n$ and weight $w$, and assume that it has weight $p$ in the information set, i.e. in $k$ positions, and weight $w - p$ in the remaining $n - k$ redundant positions; then the success probability is given by
$$\binom{k}{p}\binom{n-k}{w-p}\binom{n}{w}^{-1}.$$
This does not change over $\mathbb{F}_q$, since the algorithm runs through all vectors over the finite field having support in the chosen sets.
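The success probability above depends only on the supports, so it can be computed exactly with binomial coefficients. This Python sketch (the function name is hypothetical) returns it as an exact fraction; because only positions are counted, the same value applies over any $\mathbb{F}_q$.

```python
from fractions import Fraction
from math import comb


def weight_distribution_probability(n, k, w, p):
    """Probability that a uniformly random weight-w support in {1..n} has exactly p
    positions inside a fixed set of k positions (the information set); needs 0 <= p <= w."""
    return Fraction(comb(k, p) * comb(n - k, w - p), comb(n, w))


print(weight_distribution_probability(10, 5, 2, 1))  # prints 5/9
```

Summing over all admissible $p$ gives 1 (Vandermonde's identity), which is a convenient check when experimenting with parameters.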
The concept of intermediate sums is important whenever one wants to compute something for all vectors in a certain space. For example, we are given an $\ell \times k$ matrix $A$ over $\mathbb{F}_2$ and want to compute $Ax$ for all $x \in \mathbb{F}_2^k$ of weight $p$. Computed naively, each $Ax$ costs $(p-1)\ell$ additions. But if we first compute $Ax$ where $x$ has weight one, this only outputs the corresponding column of $A$ and has no cost. From there we can compute the sums of two columns of $A$; there are $\binom{k}{2}$ of these sums and each one costs $\ell$ additions. Next we compute all sums of three columns of $A$, of which there are $\binom{k}{3}$; since the sums of two columns have already been computed, we only need to add one more column, costing $\ell$ additions each. Proceeding in this way until one reaches weight $p$, computing $Ax$ for all $x \in \mathbb{F}_2^k$ of weight $p$ costs $\ell\,(L(k,p) - k)$ additions, where
$$L(k,p) = \sum_{i=1}^{p}\binom{k}{i}.$$
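This changes slightly over a general finite field $\mathbb{F}_q$. As a first step one computes $Ax$ for all $x \in \mathbb{F}_q^k$ of weight one. Hence this step is no longer free, but rather means computing $\lambda a_i$ for all $\lambda \in \mathbb{F}_q^{\times}$ and all columns $a_i$ of $A$, costing at most $(q-1)k\ell$ multiplications. From there on one computes the sums of two multiples of columns; there are $\binom{k}{2}(q-1)^2$ of them and each sum costs $\ell$ additions. Proceeding in the same manner, the cost turns out to be at most $(q-1)k\ell$ multiplications and $\ell\,(L_q(k,p) - k(q-1))$ additions, where
$$L_q(k,p) = \sum_{i=1}^{p}\binom{k}{i}(q-1)^i.$$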
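A Python sketch of intermediate sums over $\mathbb{F}_q$, assuming $q$ prime so that field arithmetic is integer arithmetic mod $q$. It first builds all multiples $\lambda a_i$ of the columns, then extends each weight-$(j-1)$ sum by one scaled column with a larger index, and counts operations so the totals can be compared with $\ell\,(L_q(k,p) - k(q-1))$ additions, where $L_q(k,p) = \sum_{i=1}^{p}\binom{k}{i}(q-1)^i$. Multiplications by $\lambda = 1$ are counted too, so the multiplication count is the upper bound $(q-1)k\ell$. All names are hypothetical.

```python
from math import comb


def L_q(k, p, q):
    """sum_{i=1}^{p} C(k, i) (q - 1)^i: number of nonzero x in F_q^k of weight <= p."""
    return sum(comb(k, i) * (q - 1) ** i for i in range(1, p + 1))


def products_by_intermediate_sums(A, p, q):
    """Compute A x for all x in F_q^k of weight exactly p (q prime, p >= 1).

    Returns (table, mults, adds), where table maps x (as a tuple) to A x."""
    l, k = len(A), len(A[0])
    mults = adds = 0
    # Weight one: all multiples lam * a_i of the columns a_i (free over F_2, not over F_q).
    scaled = {}
    for i in range(k):
        for lam in range(1, q):
            scaled[(i, lam)] = tuple((lam * A[r][i]) % q for r in range(l))
            mults += l
    level = {((i, lam),): v for (i, lam), v in scaled.items()}
    # Weight j from weight j - 1: add one more scaled column with a larger index (l additions).
    for _ in range(p - 1):
        nxt = {}
        for support, v in level.items():
            for i in range(support[-1][0] + 1, k):
                for lam in range(1, q):
                    col = scaled[(i, lam)]
                    nxt[support + ((i, lam),)] = tuple((a + b) % q for a, b in zip(v, col))
                    adds += l
        level = nxt
    table = {}
    for support, v in level.items():
        x = [0] * k
        for i, lam in support:
            x[i] = lam
        table[tuple(x)] = v
    return table, mults, adds


A = [[1, 2, 0, 1], [0, 1, 1, 2]]  # hypothetical 2 x 4 matrix over F_3
table, mults, adds = products_by_intermediate_sums(A, p=2, q=3)
print(len(table), mults, adds)  # prints 24 16 48
print(2 * (L_q(4, 2, 3) - 4 * (3 - 1)))  # prints 48, i.e. l * (L_q(k, p) - k(q - 1))
```

Extending only by larger column indices ensures each support is produced exactly once, so every weight-$j$ vector for $j \ge 2$ costs exactly one vector addition of length $\ell$.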
The next concept, called early abort, is important whenever a computation is done while checking the weight of the result. For example, one wants to compute $x + y$, where $x, y \in \mathbb{F}_2^n$, which usually costs $n$ additions, but we only proceed in the algorithm if $\mathrm{wt}(x + y) = w$. Hence we compute the result and check its weight simultaneously, and if the weight of the partial result exceeds $w$, we do not need to continue. Over the binary field one expects a randomly chosen bit to be $1$ with probability $1/2$; hence after $2w$ bits we should reach the wanted weight $w$, and after $2(w+1)$ bits we should exceed it. Hence on average we expect to compute only $2(w+1)$ bits of the result before we can abort. Over $\mathbb{F}_q$, we expect a randomly chosen entry to be nonzero with probability $(q-1)/q$; therefore we need to compute on average $\frac{q}{q-1}(w+1)$ entries before we can abort.
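The early-abort estimate can be checked with a short simulation. It assumes, as the text does, that the entries of the result behave like independent uniform elements of $\mathbb{F}_q$; under that assumption the number of entries read until the weight first exceeds $w$ is negative-binomially distributed with mean $q(w+1)/(q-1)$, which equals $2(w+1)$ for $q = 2$. The function names are hypothetical.

```python
import random


def expected_entries_before_abort(q, w):
    """Mean number of entries computed until the partial weight exceeds w, if each
    entry is nonzero independently with probability (q - 1) / q (negative binomial)."""
    return q * (w + 1) / (q - 1)


def simulate_early_abort(q, w, n, trials, rng):
    """Average number of uniformly random F_q entries read (out of n) before aborting."""
    total = 0
    for _ in range(trials):
        weight = 0
        for count in range(1, n + 1):
            if rng.randrange(q) != 0:  # entry is nonzero with probability (q - 1) / q
                weight += 1
                if weight > w:
                    break  # early abort: the weight already exceeds w
        total += count
    return total / trials


print(expected_entries_before_abort(2, 5), expected_entries_before_abort(3, 5))  # prints 12.0 9.0
print(simulate_early_abort(3, 5, 1000, 20000, random.Random(0)))
```

The vector length `n` only caps the count; for $n$ much larger than $q(w+1)/(q-1)$ the cap is essentially never reached.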
An important step in the ball-collision algorithm is to check for a collision, i.e. one continues only if $x = y$, where again $x$ and $y$ live in some sets $S$ and $T$, respectively. There are $|S| \cdot |T|$ choices for the pair $(x, y)$; assuming that the elements are distributed uniformly over $\mathbb{F}_2^{\ell}$, one expects on average $\frac{|S| \cdot |T|}{2^{\ell}}$ collisions. Similarly, over $\mathbb{F}_q$ the expected number of collisions is $\frac{|S| \cdot |T|}{q^{\ell}}$.
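A minimal Python sketch of the collision check and of the $|S| \cdot |T| / q^{\ell}$ estimate, assuming (as the text does) uniformly distributed elements. Collisions are found with a hash table built from $S$, a common way to implement this step; the function names and parameters are hypothetical.

```python
import random
from collections import Counter


def count_collisions(S, T):
    """Number of pairs (x, y) in S x T with x == y, using a hash table built from S."""
    table = Counter(S)
    return sum(table[y] for y in T)


def average_collisions(q, l, size_S, size_T, trials, rng):
    """Average collision count for uniformly random lists S, T of vectors in F_q^l."""
    def random_vector():
        return tuple(rng.randrange(q) for _ in range(l))

    total = 0
    for _ in range(trials):
        S = [random_vector() for _ in range(size_S)]
        T = [random_vector() for _ in range(size_T)]
        total += count_collisions(S, T)
    return total / trials


# Hypothetical parameters: q = 3, l = 4, |S| = |T| = 300; the prediction is 300 * 300 / 3**4.
print(average_collisions(3, 4, 300, 300, 20, random.Random(0)), 300 * 300 / 3**4)
```

The hash table makes the check cost roughly $|S| + |T|$ lookups plus the number of collisions, rather than $|S| \cdot |T|$ comparisons.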
4. Generalization of the Ball-Collision Algorithm
In this section we generalize the ball-collision algorithm over the binary field [8] to a general finite field.
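The algorithm requires a parity check matrix $H$. Notice that if the generator matrix $G$ is published, the easiest way to get $H$ is to choose an information set, bring $G$ by row operations into the systematic form $[\mathbf{1}_k \mid M]$ on these columns, and take $H = [-M^{\top} \mid \mathbf{1}_{n-k}]$ with the same column ordering.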
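Again, as in the binary case, the information set of size $k$ is partitioned into $X_1$ and $X_2$ of sizes $k_1$ and $k_2$, and the remaining positions are partitioned into $Y_1$, $Y_2$ and $Z$ of sizes $\ell_1$, $\ell_2$ and $n-k-\ell$, where $\ell = \ell_1 + \ell_2$. The idea of the algorithm is to solve $UHe^{\top} = Us$ instead of $He^{\top} = s$, where an invertible $U \in \mathbb{F}_q^{(n-k)\times(n-k)}$ is chosen such that, up to a permutation of the columns,
$$UH = \begin{bmatrix} A & \mathbf{1}_{n-k} \end{bmatrix}, \qquad Us = \begin{bmatrix} s_1 \\ s_2 \end{bmatrix},$$
with $s_1 \in \mathbb{F}_q^{\ell}$ and $s_2 \in \mathbb{F}_q^{n-k-\ell}$. Writing $A = \begin{bmatrix} A_1 \\ A_2 \end{bmatrix}$ with $A_1 \in \mathbb{F}_q^{\ell \times k}$ and $A_2 \in \mathbb{F}_q^{(n-k-\ell)\times k}$, and viewing all vectors as columns, we are therefore looking for a vector $e = (e_1, e_2, e_3)$ fulfilling
$$\begin{bmatrix} A_1 & \mathbf{1}_{\ell} & 0 \\ A_2 & 0 & \mathbf{1}_{n-k-\ell} \end{bmatrix}\begin{bmatrix} e_1 \\ e_2 \\ e_3 \end{bmatrix} = \begin{bmatrix} s_1 \\ s_2 \end{bmatrix},$$
with $e_1 \in \mathbb{F}_q^{k}$, $e_2 \in \mathbb{F}_q^{\ell}$ and $e_3 \in \mathbb{F}_q^{n-k-\ell}$. This leads to the following system of equations:
$$A_1 e_1 + e_2 = s_1, \qquad A_2 e_1 + e_3 = s_2.$$
The algorithm solves the above by finding
$$e_1 = x_1 + x_2, \qquad e_2 = y_1 + y_2, \qquad e_3 = s_2 - A_2 e_1,$$
where $x_1, x_2$ have weights $p_1, p_2$ and support in $X_1, X_2$, and $y_1, y_2$ have weights $q_1, q_2$ and support in $Y_1, Y_2$, such that
$$A_1 x_1 + y_1 = s_1 - A_1 x_2 - y_2.$$
This last condition is fulfilled by the collision between $A_1 x_1 + y_1 \in S$ and $s_1 - A_1 x_2 - y_2 \in T$ in Step 15.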
Observe that for $q = 2$ the above algorithm is equivalent to the one proposed over the binary field; hence we did not change its substance.
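The collision idea can be illustrated with a small, deliberately unoptimized Python sketch of one iteration. It is not the paper's algorithm listing and rests on these assumptions: $q$ is prime (arithmetic mod $q$); $H$ is already of the form $[A \mid \mathbf{1}_{n-k}]$, so $U$ is the identity and no column permutation or Gaussian elimination is performed; the information set is the first $k$ coordinates with $X_1$ its first $k_1$ positions, $Y_1$ and $Y_2$ are the next $\ell_1$ and $\ell_2$ coordinates, and $Z$ is the rest. The split of $UH$ is $A_1$ (first $\ell$ rows of $A$) and $A_2$ (remaining rows), with $Us = (s_1, s_2)$. The sets $S = \{A_1x_1 + y_1\}$ and $T = \{s_1 - A_1x_2 - y_2\}$ are enumerated directly, without intermediate sums or early abort. In the code, `q` is the field size and `q1`, `q2` are the weights on $Y_1$, $Y_2$; all function names are hypothetical.

```python
from itertools import combinations, product


def vectors_of_weight(positions, weight, length, q):
    """Yield every x in F_q^length with exactly `weight` nonzero entries, all in `positions`."""
    for support in combinations(positions, weight):
        for coeffs in product(range(1, q), repeat=weight):
            x = [0] * length
            for i, c in zip(support, coeffs):
                x[i] = c
            yield x


def matvec(M, x, q):
    """M x over F_q (q prime)."""
    return [sum(a * b for a, b in zip(row, x)) % q for row in M]


def ball_collision_iteration(A, s, q, k1, k2, l1, l2, p1, p2, q1, q2, w):
    """One iteration for H = [A | 1_{n-k}] (U = identity); returns e with H e = s, or None."""
    k, l = k1 + k2, l1 + l2
    A1, A2 = A[:l], A[l:]  # first l rows / remaining n - k - l rows
    s1, s2 = s[:l], s[l:]
    target = w - p1 - p2 - q1 - q2  # weight required on Z
    # Set S: A1 x1 + y1 with supp(x1) in X1 = {0..k1-1}, supp(y1) in Y1 = {0..l1-1}.
    S = {}
    for x1 in vectors_of_weight(range(k1), p1, k, q):
        base = matvec(A1, x1, q)
        for y1 in vectors_of_weight(range(l1), q1, l, q):
            key = tuple((a + b) % q for a, b in zip(base, y1))
            S.setdefault(key, []).append((x1, y1))
    # Set T: s1 - A1 x2 - y2 with supp(x2) in X2, supp(y2) in Y2; look up collisions in S.
    for x2 in vectors_of_weight(range(k1, k), p2, k, q):
        base = [(a - b) % q for a, b in zip(s1, matvec(A1, x2, q))]
        for y2 in vectors_of_weight(range(l1, l), q2, l, q):
            key = tuple((a - b) % q for a, b in zip(base, y2))
            for x1, y1 in S.get(key, []):
                e1 = [(a + b) % q for a, b in zip(x1, x2)]
                e3 = [(a - b) % q for a, b in zip(s2, matvec(A2, e1, q))]
                if sum(1 for v in e3 if v) == target:  # weight check on Z
                    e2 = [(a + b) % q for a, b in zip(y1, y2)]
                    return e1 + e2 + e3
    return None


# Hypothetical toy instance over F_3: n = 9, k = 4, n - k = 5, l = 2.
A = [[1, 2, 0, 1], [0, 1, 1, 2], [2, 2, 1, 0], [1, 0, 2, 1], [0, 1, 0, 2]]
H = [A[i] + [1 if j == i else 0 for j in range(5)] for i in range(5)]
planted = [1, 0, 0, 2] + [2, 0] + [0, 1, 0]  # p1 = p2 = 1, q1 = 1, q2 = 0, one error on Z
s = matvec(H, planted, 3)
e = ball_collision_iteration(A, s, 3, k1=2, k2=2, l1=1, l2=1, p1=1, p2=1, q1=1, q2=0, w=4)
print(e)
```

Any vector the sketch returns has the prescribed weight distribution and the right syndrome by construction; it may differ from the planted error if several such vectors exist.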
We now want to prove that the ball-collision algorithm over $\mathbb{F}_q$ works, i.e. that it finds any vector of the desired form, if such a vector exists. For this we follow the idea of [8].
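Theorem 2.
The ball-collision algorithm over $\mathbb{F}_q$ finds any vector $e$ that fulfills $He^{\top} = s$ and is of the desired form: of weight $w$, with $p_1$, $p_2$, $q_1$, $q_2$ and $w - p_1 - p_2 - q_1 - q_2$ nonzero entries in $X_1$, $X_2$, $Y_1$, $Y_2$ and $Z$, respectively.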
Proof.
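First, we prove that the output $e = (x_1 + x_2,\; y_1 + y_2,\; s_2 - A_2(x_1 + x_2))$ is of the desired form:
- •
$x_1$ is of weight $p_1$ and has support in $X_1$,
- •
$x_2$ is of weight $p_2$ and has support in $X_2$,
- •
$y_1$ is of weight $q_1$ and has support in $Y_1$,
- •
$y_2$ is of weight $q_2$ and has support in $Y_2$,
- •
and $s_2 - A_2(x_1 + x_2)$ is of weight $w - p_1 - p_2 - q_1 - q_2$ and has support in $Z$.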
As the sets $X_1$, $X_2$, $Y_1$, $Y_2$ and $Z$ are disjoint, $\mathrm{wt}(e)$ can be calculated by adding up the weights of the parts. Hence $\mathrm{wt}(e) = w$, and each part has the desired weight by construction.
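It remains to prove that $UHe^{\top} = Us$, which is equivalent to $He^{\top} = s$ since $U$ is invertible. Writing out the two block rows separately, we need
$$A_1(x_1 + x_2) + (y_1 + y_2) = s_1, \qquad A_2(x_1 + x_2) + \bigl(s_2 - A_2(x_1 + x_2)\bigr) = s_2.$$
The second equation holds trivially, and the first holds by the collision of $A_1 x_1 + y_1$ and $s_1 - A_1 x_2 - y_2$ in Step 15.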
We now want to prove that the algorithm returns every vector $e$ of the above form with $He^{\top} = s$, under the assumption that we worked with a correct partition into $X_1$, $X_2$, $Y_1$, $Y_2$ and $Z$. We do this by checking that the algorithm considers all possible combinations and does not exclude any possible solution.
Since $U$ is invertible, multiplying $H$ and $s$ by $U$ does not exclude any solution. In Step 11, where we build the sets $S$ and $T$, we go over all possible vectors $x_1$, $x_2$, $y_1$, $y_2$ with the prescribed supports and weights, which yields all possible vectors of the desired weight distribution. There are only two steps in the algorithm where we exclude certain vectors:
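- (1)
When we only keep the collisions between $S$ and $T$ in Step 15. But this is justified, as $A_1 e_1 + e_2 = s_1$, i.e.
$$A_1 x_1 + y_1 = s_1 - A_1 x_2 - y_2,$$
needs to be satisfied.
- (2)
When we check whether $\mathrm{wt}(s_2 - A_2(x_1 + x_2)) = w - p_1 - p_2 - q_1 - q_2$. This is also justified, as $e_3 = s_2 - A_2(x_1 + x_2)$ needs exactly this weight to complete the weight of $e$ to $w$.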
Hence we consider all possible error vectors that have the given weight distribution and satisfy $He^{\top} = s$. ∎
5. Complexity Analysis
In this section we analyze the complexity of the ball-collision algorithm over $\mathbb{F}_q$. Since the cost is given in operations over $\mathbb{F}_q$, we count the multiplications and the additions separately. Note that one addition over $\mathbb{F}_q$ costs $\lceil \log_2 q \rceil$ bit operations and one multiplication over $\mathbb{F}_q$ costs $\lceil \log_2 q \rceil^2$ bit operations.
Success Probability of one Iteration
We follow the idea of [8], as the success probability does not depend on the base field: we have the same success probability over $\mathbb{F}_q$ as over $\mathbb{F}_2$, since it only depends on choosing the correct partition of the positions. The success probability of one iteration equals the probability that there are $p_i$ errors in $X_i$ and $q_i$ errors in $Y_i$, for $i \in \{1, 2\}$, and the remaining $w - p_1 - p_2 - q_1 - q_2$ errors in $Z$. If this distribution is assumed correctly, then the algorithm will find the error vector, as it goes over all possible combinations of vectors in each of the mentioned subspaces. Hence the iteration succeeds with probability
$$\binom{k_1}{p_1}\binom{k_2}{p_2}\binom{\ell_1}{q_1}\binom{\ell_2}{q_2}\binom{n-k-\ell}{w-p_1-p_2-q_1-q_2}\binom{n}{w}^{-1}.$$
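The success probability of one ball-collision iteration is again a ratio of binomial coefficients, so it can be evaluated exactly. A short Python sketch (hypothetical function name) with $k = k_1 + k_2$ and $\ell = \ell_1 + \ell_2$; `math.comb` returns 0 when the lower index exceeds the upper one, which matches the convention that impossible distributions have probability 0.

```python
from fractions import Fraction
from math import comb


def ball_collision_success_probability(n, k1, k2, l1, l2, p1, p2, q1, q2, w):
    """Exact success probability of one iteration; it does not depend on the field size q."""
    k, l = k1 + k2, l1 + l2
    rest = w - p1 - p2 - q1 - q2  # errors that must land in Z
    if rest < 0:
        return Fraction(0)
    favourable = (comb(k1, p1) * comb(k2, p2) * comb(l1, q1) * comb(l2, q2)
                  * comb(n - k - l, rest))
    return Fraction(favourable, comb(n, w))


print(ball_collision_success_probability(12, 3, 3, 2, 2, 1, 1, 0, 0, 3))  # prints 9/110
```

Summing over all $(p_1, p_2, q_1, q_2)$ gives exactly 1, since every weight-$w$ support falls into exactly one distribution.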
Cost of one Iteration
In Step 4 of the algorithm, one uses Gaussian elimination to find an invertible matrix $U$ bringing $H$ into systematic form. Since we also need to compute $Us$, we directly perform Gaussian elimination on the matrix $[H \mid s]$, where we adjoin the vector $s$ as a column to $H$. A crude estimate of the cost for this step is $(n-k)^2(n+1)$ operations.
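To build the set $S$ we use the concept of intermediate sums over $\mathbb{F}_q$ described before. Hence, to compute $A_1 x_1$ for all $x_1$ of weight $p_1$ with support in $X_1$, we need at most $(q-1)k_1\ell$ multiplications and $\ell\,(L_q(k_1,p_1) - k_1(q-1))$ additions. To each fixed $A_1 x_1$ we then add $y_1$, again using intermediate sums; this costs $L_q(\ell_1, q_1)$ additions for each of the $A_1 x_1$, of which there are $\binom{k_1}{p_1}(q-1)^{p_1}$. This results in a total cost of
$$(q-1)k_1\ell \text{ multiplications and } \ell\,\bigl(L_q(k_1,p_1) - k_1(q-1)\bigr) + \binom{k_1}{p_1}(q-1)^{p_1} L_q(\ell_1,q_1) \text{ additions}.$$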
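To build the set $T$ we proceed similarly, the only difference being that $s_1$ needs to be added in the first step of the intermediate sums over $\mathbb{F}_q$, which adds a cost of $(q-1)k_2\ell$ additions. The total cost of this step is hence given by
$$(q-1)k_2\ell \text{ multiplications and } \ell\,L_q(k_2,p_2) + \binom{k_2}{p_2}(q-1)^{p_2} L_q(\ell_2,q_2) \text{ additions}.$$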
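In Step 15, when checking for collisions between $S$ and $T$, we want to calculate the number of collisions we can expect on average. The elements of $S$ and $T$ all lie in $\mathbb{F}_q^{\ell}$, hence there are $q^{\ell}$ possible elements. $S$ has $\binom{k_1}{p_1}\binom{\ell_1}{q_1}(q-1)^{p_1+q_1}$ elements and $T$ has $\binom{k_2}{p_2}\binom{\ell_2}{q_2}(q-1)^{p_2+q_2}$ elements, so the expected number of collisions is
$$\frac{\binom{k_1}{p_1}\binom{k_2}{p_2}\binom{\ell_1}{q_1}\binom{\ell_2}{q_2}(q-1)^{p_1+p_2+q_1+q_2}}{q^{\ell}}.$$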
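The sizes of $S$ and $T$ and the expected number of collisions can be evaluated exactly. A small Python sketch (hypothetical names) under the same uniformity assumption, with $\ell = \ell_1 + \ell_2$; for $q = 2$ the factors $(q-1)^{\dots}$ vanish and it reduces to the binary count $\binom{k_1}{p_1}\binom{k_2}{p_2}\binom{\ell_1}{q_1}\binom{\ell_2}{q_2}/2^{\ell}$.

```python
from fractions import Fraction
from math import comb


def set_sizes(q, k1, k2, l1, l2, p1, p2, q1, q2):
    """|S| and |T|: choices of support and nonzero coefficients for (x1, y1) and (x2, y2)."""
    size_S = comb(k1, p1) * comb(l1, q1) * (q - 1) ** (p1 + q1)
    size_T = comb(k2, p2) * comb(l2, q2) * (q - 1) ** (p2 + q2)
    return size_S, size_T


def expected_collisions(q, k1, k2, l1, l2, p1, p2, q1, q2):
    """|S| * |T| / q^l, assuming the elements of S and T are uniform in F_q^l."""
    size_S, size_T = set_sizes(q, k1, k2, l1, l2, p1, p2, q1, q2)
    return Fraction(size_S * size_T, q ** (l1 + l2))


print(set_sizes(3, 3, 3, 2, 2, 1, 1, 1, 1))            # prints (24, 24)
print(expected_collisions(3, 3, 3, 2, 2, 1, 1, 1, 1))  # prints 64/9
print(expected_collisions(2, 3, 3, 2, 2, 1, 1, 1, 1))  # prints 9/4
```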
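For each collision we check whether $\mathrm{wt}(s_2 - A_2(x_1 + x_2)) = w - p_1 - p_2 - q_1 - q_2$. For this we use the method of early abort: computing one entry of $s_2 - A_2(x_1 + x_2)$ costs $p_1 + p_2$ multiplications and $p_1 + p_2$ additions, since $x_1 + x_2$ has $p_1 + p_2$ nonzero entries. Hence, on average, this step costs
$$\frac{q}{q-1}\,(w - p_1 - p_2 - q_1 - q_2 + 1)\,(p_1 + p_2)$$
multiplications and as many additions per collision, to be multiplied by the expected number of collisions.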
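Hence the total cost of one iteration is the sum of the cost of the Gaussian elimination, the costs of building $S$ and $T$, and the expected cost of checking the collisions, where additions and multiplications are converted into bit operations as described at the beginning of this section.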
Overall cost
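Combining the success probability with the cost of one iteration, the overall cost of the ball-collision algorithm over $\mathbb{F}_q$ amounts to the cost of one iteration divided by the success probability of one iteration.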
5.1. Asymptotic Complexity
In this subsection we determine the asymptotic complexity of the ball-collision algorithm over $\mathbb{F}_q$.
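Fix real numbers $R$ and $W$ with
[TABLE]
We consider codes of large length $n$, and we fix functions $k, w \colon \mathbb{N} \to \mathbb{N}$ which satisfy $\lim_{n \to \infty} k(n)/n = R$ and $\lim_{n \to \infty} w(n)/n = W$.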
We fix real numbers with and
[TABLE]
We fix the parameters of the ball-collision algorithm over $\mathbb{F}_q$ such that
- i)
- ii)
- iii)
- iv)
for . We use the convention that , for . In what follows we will use the following asymptotic formula for binomial coefficients:
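$$\lim_{n \to \infty} \frac{1}{n}\log_q\binom{\alpha n + o(n)}{\beta n + o(n)} = \alpha\log_q\alpha - \beta\log_q\beta - (\alpha - \beta)\log_q(\alpha - \beta), \qquad 0 \le \beta \le \alpha.$$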
With this formula we get the following:
- i)
- ii)
- iii)
- iv)
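The binomial asymptotic used in this subsection can be checked numerically. The Python sketch below (hypothetical names; it assumes the standard Stirling-type estimate and takes $0\log_q 0 = 0$) compares $\frac{1}{n}\log_q\binom{\alpha n}{\beta n}$ with $\alpha\log_q\alpha - \beta\log_q\beta - (\alpha-\beta)\log_q(\alpha-\beta)$; the gap shrinks roughly like $(\log n)/n$.

```python
from math import comb, log


def exact_rate(a, b, n, q):
    """(1/n) log_q C(a n, b n), with a n and b n rounded to integers."""
    return log(comb(round(a * n), round(b * n)), q) / n


def asymptotic_rate(a, b, q):
    """a log_q a - b log_q b - (a - b) log_q (a - b), with 0 log 0 taken as 0."""
    def xlogx(x):
        return 0.0 if x == 0 else x * log(x, q)
    return xlogx(a) - xlogx(b) - xlogx(a - b)


for n in (100, 1000, 10000):
    print(n, exact_rate(0.5, 0.1, n, 3), asymptotic_rate(0.5, 0.1, 3))
```

Python's `math.log` accepts arbitrarily large integers, so the exact binomial coefficient can be used even for large `n`.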
Success probability
We will denote by the asymptotic exponent of the success probability:
[TABLE]
Cost of one iteration
We will denote by the asymptotic exponent of the cost of one iteration.
[TABLE]
Overall cost
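The overall asymptotic cost exponent of the ball-collision algorithm over $\mathbb{F}_q$ is given by the difference of the asymptotic exponent of the cost of one iteration and the asymptotic exponent of the success probability: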
[TABLE]
The asymptotic complexity is then given by .
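Asymptotically, we assume that the code attains the Gilbert-Varshamov bound, i.e. the code rate $R$ and the relative minimum distance $\delta$ are related via
$$R = 1 - H_q(\delta), \qquad \text{where } H_q(x) = x\log_q(q-1) - x\log_q(x) - (1-x)\log_q(1-x)$$
is the $q$-ary entropy function.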
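A Python sketch that inverts the $q$-ary Gilbert-Varshamov relation numerically, which is what one needs before fixing the half-distance error rate $\delta/2$. It assumes the standard $q$-ary entropy function and uses bisection on $[0, 1 - 1/q]$, where $H_q$ is increasing from 0 to 1; the function names are hypothetical.

```python
from math import log


def entropy_q(x, q):
    """q-ary entropy H_q(x) = x log_q(q-1) - x log_q x - (1-x) log_q(1-x), with H_q(0) = 0."""
    if x == 0:
        return 0.0
    return x * log(q - 1, q) - x * log(x, q) - (1 - x) * log(1 - x, q)


def gv_relative_distance(R, q, tol=1e-12):
    """Solve 1 - H_q(delta) = R for delta in [0, 1 - 1/q] by bisection."""
    lo, hi = 0.0, 1.0 - 1.0 / q
    target = 1.0 - R
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if entropy_q(mid, q) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


R, q = 0.5, 2
delta = gv_relative_distance(R, q)
print(delta, delta / 2)  # relative GV distance and the half-distance error rate
```

For $q = 2$ and $R = 1/2$ this gives $\delta \approx 0.11$, the familiar binary Gilbert-Varshamov distance.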
In order to compute the asymptotic complexity of half-distance decoding (i.e. $w/n \to \delta/2$, with $\delta$ the relative Gilbert-Varshamov distance) for a fixed rate, we performed a numerical optimization of the parameters of the algorithm such that the overall cost is minimized subject to the following constraints:
[TABLE]
Let be the exponent of the optimized asymptotic complexity. The asymptotic complexity of half-distance decoding at rate over is then given by .
In Table 1, the values refer to the exponent of the worst-case complexity of the different algorithms, i.e. at the code rate that maximizes the exponent. It compares Peters’ generalization of Stern’s algorithm to $\mathbb{F}_q$, Hirose’s generalization of Stern’s algorithm using May-Ozerov’s nearest neighbor algorithm (MO) to $\mathbb{F}_q$, Gueye et al.’s generalization of the BJMM algorithm using MO to $\mathbb{F}_q$, and the generalization of the ball-collision algorithm to $\mathbb{F}_q$.
We can observe that the ball-collision algorithm over $\mathbb{F}_q$ outperforms Peters’ generalization of Stern’s algorithm to $\mathbb{F}_q$ and Hirose’s ISD algorithm over $\mathbb{F}_q$ for all values of $q$ considered. As in the binary case, the ball-collision algorithm does not outperform Gueye et al.’s generalization of the BJMM algorithm using MO to $\mathbb{F}_q$.
Acknowledgments
The fourth author is thankful to the Swiss National Science Foundation for grant number 169510.
- [1] Abdulrahman Al Jabri. A statistical decoding algorithm for general linear block codes. In IMA International Conference on Cryptography and Coding, pages 1–8. Springer, 2001.
- [2] Alexei E. Ashikhmin and Alexander Barg. Minimal vectors in linear codes. IEEE Transactions on Information Theory, 44(5):2010–2017, 1998.
- [3] Marco Baldi, Marco Bianchi, Franco Chiaraluce, Joachim Rosenthal, and Davide Schipani. Enhanced Public Key Security for the McEliece Cryptosystem. Journal of Cryptology, pages 1–27, 2016.
- [4] Gustavo Banegas, Paulo SLM Barreto, Brice Odilon Boidje, Pierre-Louis Cayrel, Gilbert Ndollane Dione, Kris Gaj, Cheikh Thiécoumba Gueye, Richard Haeussler, Jean Belo Klamti, Ousmane N’diaye, et al. DAGS: Key encapsulation using dyadic GS codes. Journal of Mathematical Cryptology, 2018.
- [5] Alexander Barg, Evgueni Krouk, and Henk CA van Tilborg. On the complexity of minimum distance decoding of long linear codes. IEEE Transactions on Information Theory, 45(5):1392–1405, 1999.
- [6] Anja Becker, Antoine Joux, Alexander May, and Alexander Meurer. Decoding random binary linear codes in $2^{n/20}$: How 1 + 1 = 0 improves information set decoding. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pages 520–536. Springer, 2012.
- [7] Elwyn Berlekamp, Robert McEliece, and Henk van Tilborg. On the inherent intractability of certain coding problems (corresp.). IEEE Transactions on Information Theory, 24(3):384–386, 1978.
- [8] Daniel J Bernstein, Tanja Lange, and Christiane Peters. Smaller decoding exponents: ball-collision decoding. In Annual Cryptology Conference, pages 743–760. Springer, 2011.
