Statistical Decoding

Thomas Debris-Alazard; Jean-Pierre Tillich

arXiv:1701.07416·cs.CR·February 9, 2017

TL;DR

This paper analyzes statistical decoding, a randomized approach for code-based cryptography, providing its asymptotic complexity, efficient computation methods, and bounds, showing it cannot outperform Prange's algorithm at the Gilbert-Varshamov bound.

Contribution

It offers the first detailed complexity analysis of statistical decoding, introduces efficient computation techniques, and establishes lower bounds on its performance.

Findings

01. Provides asymptotic complexity of statistical decoding.

02. Develops efficient methods for computing parity-check equations.

03. Shows statistical decoding cannot outperform Prange's algorithm at the Gilbert-Varshamov bound.

Equations

$$\ln(z)=\ln|z|+\mathbf{i}\arg(z),\quad z\in\mathbb{C}\setminus(-\infty,0]$$

$$q_{1}(e,w,i)=\mathbb{P}_{h\sim S_{w,i}}\left(\langle e,h\rangle=1\right)\ \text{when } e_{i}=1$$

$$q_{0}(e,w,i)=\mathbb{P}_{h\sim S_{w,i}}\left(\langle e,h\rangle=1\right)\ \text{when } e_{i}=0$$

$$q_{1}(e,w,i)=\frac{\sum_{j\text{ even}}^{w-1}\binom{t-1}{j}\binom{n-t}{w-1-j}}{\binom{n-1}{w-1}}$$

$$q_{0}(e,w,i)=\frac{\sum_{j\text{ odd}}^{w-1}\binom{t}{j}\binom{n-t-1}{w-1-j}}{\binom{n-1}{w-1}}$$

$$q_{0}=\frac{1}{2}+\varepsilon_{0}\ ;\quad q_{1}=\frac{1}{2}+\varepsilon_{1}$$

$$\mathscr{H}_{0}:\ e_{i}=0\ ;\quad\mathscr{H}_{1}:\ e_{i}=1$$

$$V_{m}=\sum_{k=1}^{m}\operatorname{sgn}(\varepsilon_{1}-\varepsilon_{0})\cdot\langle y,h^{k}\rangle\in\mathbb{Z}$$

$$E_{l}=m\operatorname{sgn}(\varepsilon_{1}-\varepsilon_{0})\left(\frac{1}{2}+\varepsilon_{l}\right)$$

$$\forall\delta\geq 0,\quad\mathbb{P}\left(|Z_{m}-mp|\geq m\delta\right)\leq 2e^{-2m\delta^{2}}$$

$$\mathbb{P}\left(\left|V_{m}-m\operatorname{sgn}(\varepsilon_{1}-\varepsilon_{0})\cdot\left(\frac{1}{2}+\varepsilon_{l}\right)\right|\geq m\cdot\frac{|\varepsilon_{1}-\varepsilon_{0}|}{2}\right)\leq 2\cdot 2^{-m\cdot\frac{(\varepsilon_{1}-\varepsilon_{0})^{2}}{2\ln(2)}}$$

$$\frac{E_{1}+E_{0}}{2}=\frac{m}{2}\operatorname{sgn}(\varepsilon_{1}-\varepsilon_{0})(1+\varepsilon_{1}+\varepsilon_{0})$$

$$\pi(\omega,\tau)\mathop{=}\limits^{\triangle}\varliminf_{n\to+\infty}\frac{1}{n}\log_{2}P_{w}$$

$$\pi^{complete}(\omega,\tau)\mathop{=}\limits^{\triangle}\varliminf_{n\to+\infty}\frac{1}{n}\max\Big(\log_{2}P_{w},\log_{2}|\text{ParityCheckComputation}_{w}|\Big)$$

$$p_{v}^{m}(X)=\frac{(-1)^{v}}{2^{v}}\sum_{j=0}^{v}(-1)^{j}\binom{X}{j}\binom{m-X}{v-j}\quad\text{where}\quad\binom{X}{j}=\frac{1}{j!}X(X-1)\cdots(X-j+1)$$

$$-\frac{(-2)^{w-2}}{\binom{n-1}{w-1}}p_{w-1}^{n-1}(t-1)$$

$$\frac{(-2)^{w-2}}{\binom{n-1}{w-1}}p_{w-1}^{n-1}(t)$$

$$p(z)=\log_{2}z-\frac{\sigma}{\nu}\log_{2}(1+z)-\left(\alpha-\frac{\sigma}{\nu}\right)\log_{2}(1-z)$$

$$p_{v}^{m}(s)=Q_{\sigma,\nu}(v)\,2^{-(p(x_{1})+1)v}$$

$$p_{v}^{m}(s)=R_{\sigma,\nu}(v)\,\Im\left(\frac{2^{-(p(x_{1})+1)v}}{x_{1}\sqrt{2p^{\prime\prime}(x_{1})}}(1+\delta(v))\right)$$

$m$, $v$, $\nu$, $\alpha$, $\sigma_{0}$, $\sigma_{1}$, $p_{i}(z)$, $\Delta_{i}$, $D_{i}$, $z_{i}$

$$\frac{\varepsilon_{0}}{\varepsilon_{1}}=-\frac{1+z_{1}}{1-z_{1}}\left(1+O(w^{-1/2})\right)$$

$$\frac{\varepsilon_{0}}{\varepsilon_{1}}=-\frac{p_{w-1}^{n-1}(t)}{p_{w-1}^{n-1}(t-1)}$$

$$\frac{Q_{\sigma_{0},\nu}(v)}{Q_{\sigma_{1},\nu}(v)}=1+O(v^{-1/2})$$

Full text

Statistical Decoding

Thomas Debris-Alazard ∗

Jean-Pierre Tillich ∗

∗ Inria, SECRET Project, 2 Rue Simone Iff, 75012 Paris Cedex, France. Email: {thomas.debris,jean-pierre.tillich}@inria.fr. Part of this work was supported by the Commission of the European Communities through the Horizon 2020 program under project number 645622 PQCRYPTO.

Abstract

The security of code-based cryptography relies primarily on the hardness of generic decoding with linear codes. The best generic decoding algorithms are all improvements of an old algorithm due to Prange: they are known under the name of information set decoding techniques (ISD). A while ago a generic decoding algorithm which does not belong to this family was proposed: statistical decoding. It is a randomized algorithm that requires the computation of a large set of parity-check equations of moderate weight. We solve here several open problems related to this decoding algorithm. We give in particular the asymptotic complexity of this algorithm, give a rather efficient way of computing the parity-check equations needed for it inspired by ISD techniques and give a lower bound on its complexity showing that when it comes to decoding on the Gilbert-Varshamov bound it can never be better than Prange’s algorithm.

1 Introduction

Code-based cryptography relies crucially on the hardness of decoding generic linear codes. This problem has been studied for a long time and, despite many efforts [Pra62, Ste88, Dum91, Bar97, MMT11, BJMM12, MO15], the best algorithms for solving it [BJMM12, MO15] are exponential in the number of errors that have to be corrected: with the aforementioned algorithms, correcting $t$ errors in a binary linear code of length $n$ has a cost of $2^{ct(1+o(1))}$, where $c$ is a constant depending on the code rate $R$ and the algorithm. All the efforts that have been spent on this problem have only managed to decrease this exponent $c$ slightly. Let us emphasize that this exponent is the key for estimating the security level of any code-based cryptosystem.

All the aforementioned algorithms can be viewed as refinements of the original Prange algorithm [Pra62] and are actually all referred to as ISD algorithms. There is however an algorithm that does not rely at all on Prange’s idea and does not belong to the ISD family: statistical decoding, first proposed by Al Jabri in [Jab01] and improved a little by Overbeck in [Ove06]. Later on, [FKI07] proposed an iterative version of this algorithm. It is essentially a two-stage algorithm: the first step consists in computing an exponentially large number of parity-check equations of the smallest possible weight $w$, and the second step recovers the error from these parity-check equations by some kind of majority voting.

However, even if the study made by R. Overbeck in [Ove06] led to the conclusion that this algorithm did not allow better attacks on the cryptosystems he considered, he did not propose an asymptotic formula for its complexity that would have allowed a systematic study of the performance of this algorithm. Such an asymptotic formula has been proposed in [FKI07] through a simplified analysis of statistical decoding, but as we will see this analysis does not accurately capture the complexity of statistical decoding. Moreover, neither paper assessed in general the complexity of the first step of the algorithm, which consists in computing a large set of parity-check equations of moderate weight.

The primary purpose of this paper is to clarify this matter by giving three results. First, we give a rigorous asymptotic study of the exponent $c$ of statistical decoding by relying on asymptotic formulas for Krawtchouk polynomials [IS98]. The number of equations which are needed for this method turns out to be remarkably simple for a large set of parameters. In Theorem 2 we prove that the number of parity-check equations of weight $\omega n$ that are needed in a code of length $n$ to decode $\tau n$ errors is of order $O(2^{n(H(\omega)+H(\tau)-1)})$ (when we ignore polynomial factors), and this holds as soon as $\omega\geq\frac{1}{2}-\sqrt{\tau-\tau^{2}}$ . For instance, when we consider the hardest instances of the decoding problem, which correspond to the case where the number of errors is equal to the Gilbert-Varshamov bound, our results essentially indicate that we have to take all possible parity-checks of a given weight (when the code is assumed to be random) to perform statistical decoding. This asymptotic study also allows us to conclude that the modeling of iterative statistical decoding made in [FKI07] is too optimistic. Second, inspired by ISD techniques, we propose a rather efficient method for computing a huge set of parity-check equations of rather low weight. Finally, we give a lower bound on the complexity of this algorithm which shows that it cannot improve upon Prange’s algorithm for the hardest instances of decoding.

This lower bound follows by observing that the number $P_{w}$ of parity-check equations of weight $w$ that are needed for the second step of the algorithm is clearly a lower bound on the complexity of statistical decoding. What we actually prove in the last part of the paper is that, irrespective of the way we obtain these parity-check equations in the first step, the lower bound on the complexity of statistical decoding coming from the infimum of these $P_{w}$ ’s is always larger than the complexity of the Prange algorithm for the hardest instances of decoding.

2 Notation

As our study will be asymptotic, we neglect polynomial factors and use the following notation:

Notation 1.

Let $f,g:\mathbb{N}\rightarrow\mathbb{R}$ , we write $f=\tilde{O}(g)$ iff there exists a polynomial $P$ such that $f=O(Pg)$ .

Moreover, we will often use the classical result $\binom{n}{w}=\tilde{O}\left(2^{nH\left(\frac{w}{n}\right)}\right)$ where $H$ denotes the binary entropy. We will also have to deal with complex numbers and follow the convention of the article [IS98] we use here: i is the imaginary unit satisfying the equation ${\text{\bf i}}^{2}=-1$ , $\Re(z)$ is the real part of the complex number $z$ and we choose the branch of the complex logarithm with

$$\ln(z)=\ln|z|+{\text{\bf i}}\arg(z),\quad z\in\mathbb{C}\setminus(-\infty,0],$$

and $\arg(z)\in[-\pi,\pi)$ .
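The entropy estimate $\binom{n}{w}=\tilde{O}\left(2^{nH(w/n)}\right)$ used throughout can be checked numerically. The following is a minimal sketch; the function names `binary_entropy` and `binomial_exponent` are illustrative choices and do not come from the paper. It compares $\frac{1}{n}\log_{2}\binom{n}{w}$ with $H(w/n)$ for growing $n$ at a fixed ratio $w/n=0.3$.

```python
import math

def binary_entropy(x):
    """Binary entropy H(x) in bits, with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def binomial_exponent(n, w):
    """Normalized exponent (1/n) * log2(binom(n, w))."""
    return math.log2(math.comb(n, w)) / n

# Gap between H(w/n) and the true exponent for w/n = 0.3.
# The gap is only a polynomial factor, so it vanishes as n grows.
gaps = [binary_entropy(0.3) - binomial_exponent(n, 3 * n // 10)
        for n in (100, 1000, 10000)]
for n, gap in zip((100, 1000, 10000), gaps):
    print(n, gap)
```

The gap is always nonnegative because $\binom{n}{w}\leq 2^{nH(w/n)}$, and it shrinks like $\frac{\log n}{n}$, which is exactly what neglecting polynomial factors means.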

3 Statistical Decoding

In the whole paper we consider the computational decoding problem which we define as follows:

Problem 1.

Given a binary linear code of length $n$ of rate $R$ , a word $y\in\mathbb{F}_{2}^{n}$ at distance $t$ from the code, find a codeword $x$ such that $d_{H}(x,y)=t$ where $d_{H}$ denotes the Hamming distance.

Generally we will specify the code by an arbitrary generator matrix $G$ and we will denote by CSD $(G,t,y)$ a specific instance of this problem. We will be interested as is standard in cryptography in the case where $G\in\mathbb{F}_{2}^{Rn\times n}$ is supposed to be random.

The idea behind statistical decoding may be described as follows. We first compute a very large set ${\mathscr{S}}$ of parity-check equations of some weight $w$ and compute all scalar products $\langle y,h\rangle$ (the scalar product is taken modulo $2$ ) for $h\in{\mathscr{S}}$ . It turns out that if we consider only the parity-checks involving a given code position $i$ , the scalar products have a probability of being equal to $1$ which depends on whether there is an error in this position or not. Therefore counting the number of times when $\langle y,h\rangle=1$ allows us to recover the error in this position.
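The counting idea can be illustrated on a toy instance. The sketch below is not the paper's Algorithm 1: it uses the $[7,4]$ Hamming code (an assumption made for illustration, far from the random codes and exponential parameters considered here), whose seven nonzero dual codewords all have weight $4$, and a single error. The helper names `dual_codewords` and `statistical_decode` are hypothetical. For each position it counts, among the parity checks through that position, how many satisfy $\langle y,h\rangle=1$ and compares the fraction with a threshold.

```python
from itertools import product

# Parity-check matrix of the [7,4] Hamming code:
# column j is the binary expansion of j + 1.
HAMMING_H = [[((j + 1) >> b) & 1 for j in range(7)] for b in range(3)]

def dual_codewords(rows):
    """All nonzero linear combinations of the rows (the nonzero dual codewords)."""
    n = len(rows[0])
    words = []
    for coeffs in product([0, 1], repeat=len(rows)):
        if any(coeffs):
            words.append([sum(c * r[j] for c, r in zip(coeffs, rows)) % 2
                          for j in range(n)])
    return words

def statistical_decode(y, checks, threshold):
    """Guess e_i = 1 when the fraction of checks through i with <y, h> = 1
    exceeds the threshold."""
    guess = []
    for i in range(len(y)):
        through_i = [h for h in checks if h[i] == 1]
        ones = sum(sum(a * b for a, b in zip(y, h)) % 2 for h in through_i)
        guess.append(1 if ones > threshold * len(through_i) else 0)
    return guess

codeword = [1, 1, 1, 0, 0, 0, 0]   # columns 1, 2, 3 of HAMMING_H sum to zero
error = [0, 0, 0, 0, 1, 0, 0]      # a single error (t = 1)
y = [(c + e) % 2 for c, e in zip(codeword, error)]
checks = dual_codewords(HAMMING_H) # 7 parity checks, all of weight w = 4
recovered = statistical_decode(y, checks, threshold=0.75)
print(recovered)
```

In this toy case every check through the erroneous position gives $\langle y,h\rangle=1$, while exactly half of the checks through any other position do, so a threshold of $0.75$ (the midpoint between $1/2$ and $1$) separates the two situations.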

Let us analyze now this algorithm more precisely. To make this analysis tractable we will need to make a few simplifying assumptions. The first one we make is the same as the one made by R. Overbeck in [Ove06], namely that

Assumption 1.

The distribution of the $\langle y,h\rangle$ ’s when $h$ is drawn uniformly at random from the dual codewords of weight $w$ is approximated by the distribution of $\langle y,h\rangle$ when $h$ is drawn uniformly at random among the words of weight $w$ .

A much simpler model is given in [FKI07] and is based on modeling the distribution of the ${\langle y,h\rangle}$ ’s as the distribution of ${\langle y,h\rangle}$ where the coordinates of $h$ are i.i.d. and distributed as a Bernoulli variable of parameter $w/n$ . This has the advantage of making the analysis of statistical decoding much simpler and makes it possible to analyze more refined versions of statistical decoding. However, as we will show, this is an oversimplification and leads to an over-optimistic estimate of the complexity of statistical decoding. The following notation will be useful.

Notation 2.

$\cdot$ $S_{w}\mathop{=}\limits^{\triangle}\{x\in\mathbb{F}_{2}^{n}:w_{H}(x)=w\}$ denotes the set of binary words of length $n$ and weight $w$ ;

$\cdot$ $S_{w,i}\mathop{=}\limits^{\triangle}\{x\in S_{w}:x_{i}=1\}$ ;

$\cdot$ ${\mathscr{H}}_{w}\mathop{=}\limits^{\triangle}{\mathscr{C}}^{\perp}\cap S_{w}$ ;

$\cdot$ ${\mathscr{H}}_{w,i}\mathop{=}\limits^{\triangle}{\mathscr{C}}^{\perp}\cap S_{w,i}$ ;

$\cdot$ $X\sim\mathscr{B}(p)$ means that $X$ follows a Bernoulli law of parameter $p$ ;

$\cdot$ $h\sim S_{w,i}$ means we pick $h$ uniformly at random in $S_{w,i}$ .

3.1 Bias in the parity-check sum distribution

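We start the analysis of statistical decoding by computing the following probabilities, which under Assumption 1 approximate the true probabilities we are interested in (the latter correspond to choosing $h$ uniformly at random in ${\mathscr{H}}_{w,i}$ and not in $S_{w,i}$ ):

$$q_{1}(e,w,i)=\mathbb{P}_{h\sim S_{w,i}}\left(\langle e,h\rangle=1\right)\ \text{when } e_{i}=1,\qquad q_{0}(e,w,i)=\mathbb{P}_{h\sim S_{w,i}}\left(\langle e,h\rangle=1\right)\ \text{when } e_{i}=0.$$

These probabilities are readily seen to be equal to

$$q_{1}(e,w,i)=\frac{\sum_{j\text{ even}}^{w-1}\binom{t-1}{j}\binom{n-t}{w-1-j}}{\binom{n-1}{w-1}},\qquad q_{0}(e,w,i)=\frac{\sum_{j\text{ odd}}^{w-1}\binom{t}{j}\binom{n-t-1}{w-1-j}}{\binom{n-1}{w-1}}.$$

They are independent of the error and the position $i$ . So, in the following we will use the notation $q_{1}$ and $q_{0}$ . We will define the biases $\varepsilon_{0}$ and $\varepsilon_{1}$ of statistical decoding by

$$q_{0}=\frac{1}{2}+\varepsilon_{0}\ ;\quad q_{1}=\frac{1}{2}+\varepsilon_{1}.$$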
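As a sanity check on these quantities, the biases can be computed exactly for small parameters. With $h_{i}=1$ fixed, the remaining $w-1$ ones of $h$ fall among the other $n-1$ positions, and $\langle e,h\rangle$ is determined by the parity of the overlap of $h$ with the error positions. The sketch below (the function names `biases` and `brute_force_q` are illustrative, and position $i$ is taken to be $0$ without loss of generality) evaluates the resulting sums with exact rational arithmetic and compares them with a brute-force enumeration of $S_{w,i}$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def biases(n, w, t):
    """Return (eps0, eps1) exactly, for h uniform among words of weight w with h_i = 1."""
    total = comb(n - 1, w - 1)
    # e_i = 1: <e, h> = 1 iff h meets the other t - 1 error positions an even number of times.
    q1 = Fraction(sum(comb(t - 1, j) * comb(n - t, w - 1 - j)
                      for j in range(0, w, 2)), total)
    # e_i = 0: <e, h> = 1 iff h meets the t error positions an odd number of times.
    q0 = Fraction(sum(comb(t, j) * comb(n - t - 1, w - 1 - j)
                      for j in range(1, w, 2)), total)
    return q0 - Fraction(1, 2), q1 - Fraction(1, 2)

def brute_force_q(n, w, t, error_at_i):
    """P(<e, h> = 1) by enumerating every h of weight w with h_0 = 1."""
    support = set(range(t)) if error_at_i else set(range(1, t + 1))
    hits = total = 0
    for rest in combinations(range(1, n), w - 1):
        h = {0, *rest}
        total += 1
        hits += len(h & support) % 2
    return Fraction(hits, total)

eps0, eps1 = biases(10, 4, 3)
print(eps0, eps1)
```

The two biases differ, which is what makes the distinguisher possible; for cryptographic sizes the difference is exponentially small, so exact arithmetic (rather than floating point) is the safer choice when evaluating it.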

It will turn out, and this is essential, that $\varepsilon_{0}\neq\varepsilon_{1}$ . We can use these biases “as a distinguisher”. They are at the heart of statistical decoding. Statistical decoding is nothing but a statistical hypothesis testing algorithm distinguishing between two hypotheses:

$$\mathscr{H}_{0}:\ e_{i}=0\ ;\qquad\mathscr{H}_{1}:\ e_{i}=1$$

based on computing the random variable $V_{m}$ for $m$ uniform and independent draws of vectors $h^{1},\dots,h^{m}$ in ${\mathscr{H}}_{w,i}$ :

$$V_{m}=\sum_{k=1}^{m}\operatorname{sgn}(\varepsilon_{1}-\varepsilon_{0})\cdot\langle y,h^{k}\rangle\in\mathbb{Z}.$$

We have $\langle y,h^{k}\rangle\sim\mathscr{B}(1/2+\varepsilon_{l})$ under $\mathscr{H}_{l}$ . So the expectation of $V_{m}$ is given under $\mathscr{H}_{l}$ by:

$$E_{l}=m\operatorname{sgn}(\varepsilon_{1}-\varepsilon_{0})\left(\frac{1}{2}+\varepsilon_{l}\right).$$

We point out that we have $E_{1}>E_{0}$ regardless of the term $\operatorname{sgn}(\varepsilon_{1}-\varepsilon_{0})$ . In order to apply the following proposition, we make the following assumption:

Assumption 2.

$\langle y,h^{k}\rangle$ are independent variables.

Proposition 1 (Chernoff’s Bound).

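Let $0<p<1$ , let $Y_{1},\cdots,Y_{m}$ be i.i.d. $\sim\mathscr{B}(p)$ and set $Z_{m}=\sum_{k=1}^{m}Y_{k}$ . Then,

$$\forall\delta\geq 0,\quad\mathbb{P}\left(|Z_{m}-mp|\geq m\delta\right)\leq 2e^{-2m\delta^{2}}.$$

Consequences: Under $\mathscr{H}_{l}$ , we have

$$\mathbb{P}\left(\left|V_{m}-m\operatorname{sgn}(\varepsilon_{1}-\varepsilon_{0})\cdot\left(\frac{1}{2}+\varepsilon_{l}\right)\right|\geq m\cdot\frac{|\varepsilon_{1}-\varepsilon_{0}|}{2}\right)\leq 2\cdot 2^{-m\cdot\frac{(\varepsilon_{1}-\varepsilon_{0})^{2}}{2\ln(2)}}.$$

To take our decision we proceed as follows: if $V_{m}<\frac{E_{0}+E_{1}}{2}$ , where

$$\frac{E_{1}+E_{0}}{2}=\frac{m}{2}\operatorname{sgn}(\varepsilon_{1}-\varepsilon_{0})(1+\varepsilon_{1}+\varepsilon_{0}),$$

we choose $\mathscr{H}_{0}$ , and $\mathscr{H}_{1}$ otherwise. For the cases of interest to us (namely $w$ and $t$ linear in $n$ ) the bias $\varepsilon_{1}-\varepsilon_{0}$ is an exponentially small function of the codelength $n$ and it is obviously enough to choose $m$ to be of order $O\left(\frac{\log n}{(\varepsilon_{1}-\varepsilon_{0})^{2}}\right)$ to be able to make the right decisions on all $n$ positions simultaneously.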
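The order $O\left(\frac{\log n}{(\varepsilon_{1}-\varepsilon_{0})^{2}}\right)$ can be made concrete. Taking $\delta=\frac{|\varepsilon_{1}-\varepsilon_{0}|}{2}$ in Chernoff’s bound gives a per-position error probability of at most $2e^{-m(\varepsilon_{1}-\varepsilon_{0})^{2}/2}$, and a union bound over the $n$ positions then yields an explicit sufficient value of $m$. The sketch below assumes a target overall failure probability `failure`, a parameter introduced here for illustration; the function names are likewise hypothetical.

```python
import math

def chernoff_error_bound(m, eps0, eps1):
    """Upper bound 2 * exp(-m * (eps1 - eps0)^2 / 2) on a wrong decision at one position."""
    return 2.0 * math.exp(-m * (eps1 - eps0) ** 2 / 2.0)

def samples_needed(n, eps0, eps1, failure=0.01):
    """Smallest m such that n * chernoff_error_bound(m, ...) <= failure (union bound)."""
    gap_squared = (eps1 - eps0) ** 2
    return math.ceil(2.0 * math.log(2.0 * n / failure) / gap_squared)

# Illustrative values only: a bias gap of 0.003 over n = 1000 positions.
m = samples_needed(1000, -0.001, 0.002)
print(m, 1000 * chernoff_error_bound(m, -0.001, 0.002))
```

The dependence on $n$ and on the target failure probability is only logarithmic, whereas halving the gap $\varepsilon_{1}-\varepsilon_{0}$ multiplies $m$ by four; this is why the factor $\frac{1}{(\varepsilon_{1}-\varepsilon_{0})^{2}}$ governs the complexity.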

On the optimality of the decision. All the arguments used for distinguishing both hypotheses are very crude and this raises the question of whether a better test exists. It turns out that in the regime of interest to us, namely $t$ and $w$ linear in $n$ , the term $\tilde{O}\left(\frac{1}{(\varepsilon_{1}-\varepsilon_{0})^{2}}\right)$ is of the right order. Indeed, our statistical test actually amounts to the Neyman-Pearson test (with a threshold which in this case is not necessarily in the middle, i.e. equal to $m\frac{1+\varepsilon_{0}+\varepsilon_{1}}{2}$ ). In the case of interest to us, the bias $\varepsilon_{1}-\varepsilon_{0}$ between both distributions is exponentially small in $n$ and Chernoff’s bound accurately captures the large deviations of the random variable $V_{m}$ . Now we could wonder whether using some finer knowledge about the hypotheses ${\mathscr{H}}_{0}$ and ${\mathscr{H}}_{1}$ could do better. For instance, we know the a priori probabilities of these hypotheses since $\mathbb{P}(e_{i}=1)=\frac{t}{n}$ . It can be readily verified that Bayesian hypothesis testing based on these a priori probabilities does not change the order of the number of tests, which is still $\tilde{O}\left(\frac{1}{(\varepsilon_{1}-\varepsilon_{0})^{2}}\right)$ when $t$ and $w$ are linear in $n$ .

3.2 The statistical decoding algorithm

Statistical decoding is a randomized algorithm which uses the previous distinguisher. As we just noted, this distinguisher needs $\tilde{O}\left(\frac{1}{(\varepsilon_{1}-\varepsilon_{0})^{2}}\right)$ parity-check equations of weight $w$ to work. This number obviously depends on $w,R$ and $t$ and we use the notation:

Notation 3.

$P_{w}\mathop{=}\limits^{\triangle}\frac{1}{(\varepsilon_{1}-\varepsilon_{0})^{2}}$ .

Now we have two frameworks to present statistical decoding. We can either consider the computation of the $\tilde{O}(P_{w})$ parity-check equations as a pre-computation or consider it as a part of the algorithm. To consider the case of pre-computation, simply remove Line $4$ of Algorithm 1 and consider the ${\mathscr{S}}_{i}$ ’s as an additional input to the algorithm. ParityCheckComputation$_{w}$ will denote an algorithm which for an input $G,i$ outputs $\tilde{O}(P_{w})$ vectors of ${\mathscr{H}}_{w,i}$ .

Clearly the complexity of statistical decoding is given by

• When the ${\mathscr{S}}_{i}$ ’s are already stored and computed: $\tilde{O}\left(P_{w}\right)$ ;

• When the ${\mathscr{S}}_{i}$ ’s have to be computed: $\tilde{O}\Big(P_{w}+|\text{PC}_{w}|\Big)$ where $|\text{PC}_{w}|$ stands for the complexity of the call ParityCheckComputation$_{w}$ .

As explained in the introduction, our goal is to give the asymptotic complexity of statistical decoding. We introduce for this purpose the following notation:

Notation 4.

$\cdot$ $\omega\mathop{=}\limits^{\triangle}\frac{w}{n}$ ;

$\cdot$ $\tau\mathop{=}\limits^{\triangle}\frac{t}{n}$ .

The two following quantities will be the central object of our study.

Definition 1 (Asymptotic complexity of statistical decoding).

We define the asymptotic complexity of statistical decoding when the ${\mathscr{S}}_{i}$ ’s are already computed by

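$$\pi(\omega,\tau)\mathop{=}\limits^{\triangle}\varliminf_{n\to+\infty}\frac{1}{n}\log_{2}P_{w}$$

whereas the asymptotic complexity of the complete algorithm of statistical decoding (including the computation of the parity-check equations) is defined by

$$\pi^{complete}(\omega,\tau)\mathop{=}\limits^{\triangle}\varliminf_{n\to+\infty}\frac{1}{n}\max\Big(\log_{2}P_{w},\log_{2}|\text{ParityCheckComputation}_{w}|\Big).$$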

Remark 1.

One could wonder why these quantities are defined as infimum limits and not directly as limits. This is due to the fact that in certain regions of the error weight and parity-check weights the asymptotic bias may from time to time become much smaller than it typically is. This bias is indeed proportional to values taken by a Krawtchouk polynomial and for certain error weights and parity-check weights we may be close to a zero of the relevant Krawtchouk polynomial (this corresponds to the second case of Theorem 1).

We are looking for explicit formulas for $\pi(\omega,\tau)$ and $\pi^{complete}(\omega,\tau)$ . The second quantity depends on the algorithm which is used. We will come back to this issue in Subsection 7.1. For our purpose we will use Krawtchouk polynomials and asymptotic expansions for them coming from [IS98]. Let $m$ be a positive integer, we recall that the Krawtchouk polynomial of degree $v$ and order $m$ , $p_{v}^{m}(X)$ is defined for $v\in\{0,\cdots,m\}$ by:

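$$p_{v}^{m}(X)=\frac{(-1)^{v}}{2^{v}}\sum_{j=0}^{v}(-1)^{j}\binom{X}{j}\binom{m-X}{v-j}\quad\text{where}\quad\binom{X}{j}=\frac{1}{j!}X(X-1)\cdots(X-j+1).$$

These Krawtchouk polynomials are readily related to our biases. Namely, using $\sum_{j=0}^{w-1}\binom{t-1}{j}\binom{n-t}{w-1-j}=\binom{n-1}{w-1}$ , we can recast the following evaluation of a Krawtchouk polynomial as

$$-\frac{(-2)^{w-2}}{\binom{n-1}{w-1}}p_{w-1}^{n-1}(t-1)=\frac{\sum_{j=0}^{w-1}(-1)^{j}\binom{t-1}{j}\binom{n-t}{w-1-j}}{2\binom{n-1}{w-1}}=q_{1}-\frac{1}{2}=\varepsilon_{1}.\tag{1}$$

We have a similar computation for $\varepsilon_{0}$ :

$$\frac{(-2)^{w-2}}{\binom{n-1}{w-1}}p_{w-1}^{n-1}(t)=-\frac{\sum_{j=0}^{w-1}(-1)^{j}\binom{t}{j}\binom{n-t-1}{w-1-j}}{2\binom{n-1}{w-1}}=q_{0}-\frac{1}{2}=\varepsilon_{0}.\tag{2}$$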
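The link between the biases and Krawtchouk polynomials can be verified exactly on small parameters. The sketch below uses the normalization $p_{v}^{m}(X)=\frac{(-1)^{v}}{2^{v}}\sum_{j=0}^{v}(-1)^{j}\binom{X}{j}\binom{m-X}{v-j}$ and checks that $-\frac{(-2)^{w-2}}{\binom{n-1}{w-1}}p_{w-1}^{n-1}(t-1)$ and $\frac{(-2)^{w-2}}{\binom{n-1}{w-1}}p_{w-1}^{n-1}(t)$ coincide with $\varepsilon_{1}$ and $\varepsilon_{0}$ computed directly from the even and odd overlap sums. The function names are illustrative, and the code assumes $w\geq 2$ and $1\leq t\leq n-1$.

```python
from fractions import Fraction
from math import comb

def krawtchouk(m, v, x):
    """p_v^m(x) = (-1)^v / 2^v * sum_j (-1)^j binom(x, j) binom(m - x, v - j)."""
    s = sum((-1) ** j * comb(x, j) * comb(m - x, v - j) for j in range(v + 1))
    return Fraction((-1) ** v * s, 2 ** v)

def biases_from_krawtchouk(n, w, t):
    """(eps0, eps1) expressed through Krawtchouk polynomials (requires w >= 2)."""
    scale = Fraction((-2) ** (w - 2), comb(n - 1, w - 1))
    eps1 = -scale * krawtchouk(n - 1, w - 1, t - 1)
    eps0 = scale * krawtchouk(n - 1, w - 1, t)
    return eps0, eps1

def biases_direct(n, w, t):
    """(eps0, eps1) from the sums over even and odd overlaps with the error."""
    total = comb(n - 1, w - 1)
    q1 = Fraction(sum(comb(t - 1, j) * comb(n - t, w - 1 - j)
                      for j in range(0, w, 2)), total)
    q0 = Fraction(sum(comb(t, j) * comb(n - t - 1, w - 1 - j)
                      for j in range(1, w, 2)), total)
    return q0 - Fraction(1, 2), q1 - Fraction(1, 2)

for params in [(7, 4, 1), (10, 4, 3), (12, 5, 4)]:
    print(params, biases_from_krawtchouk(*params), biases_direct(*params))
```

Because both sides are computed as exact fractions, the comparison is an identity check rather than a numerical approximation.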

Let us recall Theorem 3.1 in [IS98].

Theorem 1 ([IS98, Th. 3.1]).

Let $m,v$ and $s$ be three positive integers. We set $\nu\mathop{=}\limits^{\triangle}\frac{v}{m},\alpha\mathop{=}\limits^{\triangle}\frac{1}{\nu}$ and $\sigma=\frac{s}{m}$ . We assume $\alpha\geq 2$ . Let

$$p(z)=\log_{2}z-\frac{\sigma}{\nu}\log_{2}(1+z)-\left(\alpha-\frac{\sigma}{\nu}\right)\log_{2}(1-z).$$

The equation $p^{\prime}(z)=0$ has two solutions $x_{1}$ and $x_{2}$ , which are the two roots of the equation $(\alpha-1)X^{2}+(\alpha-2\frac{\sigma}{\nu})X+1=0$ . Let $D\mathop{=}\limits^{\triangle}\left(\alpha-2\frac{\sigma}{\nu}\right)^{2}-4(\alpha-1)$ and $\Delta\mathop{=}\limits^{\triangle}\alpha-\frac{2\sigma}{\nu}$ . The two roots are equal to $\frac{-\Delta\pm\sqrt{D}}{2(\alpha-1)}$ and $x_{1}$ is defined to be the root $\frac{-\Delta+\sqrt{D}}{2(\alpha-1)}$ . There are two cases to consider:

• In the case $\frac{\sigma}{\nu}\in(0,\alpha/2-\sqrt{\alpha-1})$ , $D$ is positive, $x_{1}$ is a real negative number and we can write

$$p_{v}^{m}(s)=Q_{\sigma,\nu}(v)\,2^{-(p(x_{1})+1)v}$$

where $Q_{\sigma,\nu}(v)\mathop{=}\limits^{\triangle}-\sqrt{\frac{1-r^{2}}{2\pi rDv}}(1+O(v^{-1/2}))$ and $r\mathop{=}\limits^{\triangle}-x_{1}$ .

• In the case $\frac{\sigma}{\nu}\in(\alpha/2-\sqrt{\alpha-1},\alpha/2)$ , $D$ is negative, $x_{1}$ is a complex number and we have

$$p_{v}^{m}(s)=R_{\sigma,\nu}(v)\,\Im\left(\frac{2^{-(p(x_{1})+1)v}}{x_{1}\sqrt{2p^{\prime\prime}(x_{1})}}(1+\delta(v))\right)\tag{3}$$

where $\Im(z)$ denotes the imaginary part of the complex number $z$ , $\delta(v)$ denotes a function which is $o(1)$ uniformly in $v$ , and $R_{\sigma,\nu}(v)\mathop{=}\limits^{\triangle}\frac{1+O(v^{-1/2})}{\sqrt{\pi v}}$ .

The asymptotic formulas hold uniformly on the compact subsets of the corresponding open intervals.

Remark 1.

Note that strictly speaking (3) is incorrectly stated in [IS98, Th. 3.1]. The problem is that (3.20) is incorrect in [IS98], since both $p^{\prime\prime}(-r_{1})$ and $p^{(3)}(-r_{1})$ are negative and taking a square root of these expressions leads to a purely imaginary number in (3.20). This can be easily fixed since the expression which is just above (3.20) is correct and it just remains to take the imaginary part correctly to derive (3).

It will be helpful to use the following notation from now on.

Notation 5.

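$$m\mathop{=}\limits^{\triangle}n-1,\quad v\mathop{=}\limits^{\triangle}w-1,\quad\nu\mathop{=}\limits^{\triangle}\frac{v}{m},\quad\alpha\mathop{=}\limits^{\triangle}\frac{1}{\nu},\quad\sigma_{0}\mathop{=}\limits^{\triangle}\frac{t}{m},\quad\sigma_{1}\mathop{=}\limits^{\triangle}\frac{t-1}{m}$$

and for $i\in\{0,1\}$ we define the following quantities

$$p_{i}(z)\mathop{=}\limits^{\triangle}\log_{2}z-\frac{\sigma_{i}}{\nu}\log_{2}(1+z)-\left(\alpha-\frac{\sigma_{i}}{\nu}\right)\log_{2}(1-z),\quad\Delta_{i}\mathop{=}\limits^{\triangle}\alpha-\frac{2\sigma_{i}}{\nu},\quad D_{i}\mathop{=}\limits^{\triangle}\Delta_{i}^{2}-4(\alpha-1),\quad z_{i}\mathop{=}\limits^{\triangle}\frac{-\Delta_{i}+\sqrt{D_{i}}}{2(\alpha-1)}.$$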

We are now going to use these asymptotic expansions to derive explicit formulas for $\pi(\omega,\tau)$ . We start with the following lemma.

Lemma 2.

With the hypothesis of Proposition just above, we have

$$\frac{\varepsilon_{0}}{\varepsilon_{1}}=-\frac{1+z_{1}}{1-z_{1}}\left(1+O(w^{-1/2})\right).$$

Proof.

From (1) and (2) we have

$$\frac{\varepsilon_{0}}{\varepsilon_{1}}=-\frac{p_{w-1}^{n-1}(t)}{p_{w-1}^{n-1}(t-1)}\tag{5}$$

By using Theorem 1 we obtain when plugging the asymptotic expansions of the Krawtchouk polynomials into (5)

[TABLE]

We clearly have $\sigma_{1}=\sigma_{0}-\frac{1}{m}$ and $z_{1}=z_{0}+O\left(\frac{1}{m}\right)$ and therefore from the particular form of $Q_{\sigma_{i},\nu}(v)$ we deduce that

$$\frac{Q_{\sigma_{0},\nu}(v)}{Q_{\sigma_{1},\nu}(v)}=1+O(v^{-1/2}).$$

We observe now that

[TABLE]

and therefore

[TABLE]

It is insightful to express the term $\log_{2}\frac{z_{1}}{z_{0}}-\frac{\sigma_{0}}{\nu}\log_{2}\frac{1+z_{1}}{1+z_{0}}-(\alpha-\frac{\sigma_{0}}{\nu})\log_{2}\frac{1-z_{1}}{1-z_{0}}$ as

[TABLE]

The point is that $p_{0}^{\prime}(z_{0})=0$ and $z_{1}=z_{0}+\delta$ where $\delta=O(1/m)$ . Therefore

[TABLE]

Using this in (10) and then in (6) implies the lemma. ∎

From this lemma we can deduce that

Lemma 3.

Assume $\alpha\geq 2$ and $\frac{\sigma_{i}}{\nu}\in(0,\alpha/2-\sqrt{\alpha-1})$ for $i\in\{0,1\}$ . We have

[TABLE]

Proof.

We have

[TABLE]

where we used in (3.2)

[TABLE]

∎

The second case, corresponding to $\frac{\sigma_{i}}{\nu}\in(\alpha/2-\sqrt{\alpha-1},\alpha/2)$ , is handled by the following lemma (note that it is precisely the “sin” term that appears in it that led us to define $\pi(\omega,\tau)$ as an infimum limit and not as a limit).

Lemma 4.

When $\frac{\sigma_{i}}{\nu}\in(\alpha/2-\sqrt{\alpha-1},\alpha/2)$ for $i\in\{0,1\}$ we have

[TABLE]

where $\theta\mathop{=}\limits^{\triangle}\arg\left(2^{-p_{0}(z_{0})}\right)$ and $\theta_{0}\mathop{=}\limits^{\triangle}\arg\left((z_{0}-z_{0}^{2})\sqrt{p_{0}^{\prime\prime}(z_{0})}\right)$ .

Proof.

The proof of this lemma is very similar to the proof of Lemma 2. From (1) and (2) we have

[TABLE]

By plugging the asymptotic expansion of Krawtchouk polynomials given in Theorem 1 into (13) we obtain

[TABLE]

where the $\delta_{i}$ ’s are functions which are of order $o(1)$ uniformly in $v$ .

We clearly have $\sigma_{1}=\sigma_{0}-\frac{1}{m}$ and $z_{1}=z_{0}+O\left(\frac{1}{m}\right)$ and therefore from the particular form of $R_{\sigma_{i},\nu}(v)$ we deduce that

[TABLE]

From this we deduce that

[TABLE]

We now observe that

[TABLE]

where (16) follows from the observation

[TABLE]

Recall that $z_{1}=z_{0}+\delta$ where $\delta=O(1/m)$ and that

[TABLE]

The point is that $p_{0}^{\prime}(z_{0})=0$ and therefore

[TABLE]

Using this in (16) and then multiplying by $v$ gives

[TABLE]

We can substitute for this expression in (14) and obtain

[TABLE]

Recall that

[TABLE]

By using this in (18) we obtain

[TABLE]

∎

From Lemmas 3 and 4 we deduce immediately that

Corollary 5.

We set $\gamma=\frac{1}{\omega}$ ,

• If $\frac{\tau}{\omega}\in(0,\gamma/2-\sqrt{\gamma-1})$ :

[TABLE]

• If $\frac{\tau}{\omega}\in(\gamma/2-\sqrt{\gamma-1},\gamma/2)$ :

[TABLE]

Remark 2.

These asymptotic formulas turn out to be already accurate in the "cryptographic range", as shown in Figure 1.

Amazingly enough these formulas can be simplified a lot in the second case of the corollary as shown by the following theorem.

Theorem 2 (Asymptotic complexity of statistical decoding).

• If $\tau\in\left(0,\frac{1}{2}-\sqrt{\omega-\omega^{2}}\right)$ : $\pi(\omega,\tau)=2\omega\log_{2}(r)-2\tau\log_{2}(1-r)-2(1-\tau)\log_{2}(1+r)+2H(\omega)$ where $r$ is the smallest root of $(1-\omega)X^{2}-(1-2\tau)X+\omega=0$ .

• If $\tau\in\left(\frac{1}{2}-\sqrt{\omega-\omega^{2}},\frac{1}{2}\right)$ : $\pi(\omega,\tau)=H(\omega)+H(\tau)-1.$

Proof.

The first case is just a slight rewriting. To prove the formula corresponding to the second case let us recall that the $z$ that appears in the second case of Corollary 5 satisfies $p^{\prime}(z)=0$ where

[TABLE]

Let

[TABLE]

Let us first differentiate this expression with respect to $\omega$ :

[TABLE]

Since $z=re^{{\text{\bf i}}\varphi}$ with $r=\frac{1}{\sqrt{\gamma-1}}$ , we deduce that

[TABLE]

Substituting this expression for $2\Re(\log_{2}(z))$ in (21) yields

[TABLE]

We continue the proof by differentiating now $f(\omega,\tau)$ with respect to $\tau$ :

[TABLE]

Recall that $z$ is also given by one of the two roots of $(1-\omega)X^{2}+(1-2\tau)X+\omega=0$ (see Theorem 1 for the root which is actually chosen) and therefore

[TABLE]

From this we deduce that

[TABLE]

These two results on the derivative imply that

[TABLE]

for some constant $C$ which is easily seen to be equal to $-1$ by letting $\omega$ go to $0$ and $\tau$ go to $\frac{1}{2}$ in $f(\omega,\tau)$ .

∎
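The two regimes of Theorem 2 are straightforward to evaluate numerically. The sketch below implements the stated formulas for $0<\omega\leq\frac{1}{2}$ and $0<\tau<\frac{1}{2}$ (a restriction assumed here so that the smallest root $r$ lies in $(0,1)$), and compares the result with the finite-length quantity $\frac{1}{n}\log_{2}P_{w}$ obtained from exact binomial sums for $\varepsilon_{1}-\varepsilon_{0}$. The function names and the sample parameters $n=2000$, $w=400$, $t=100$ are illustrative choices, not values taken from the paper.

```python
import math
from fractions import Fraction

def entropy(x):
    """Binary entropy in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)

def pi_exponent(omega, tau):
    """Exponent of Theorem 2, assuming 0 < omega <= 1/2 and 0 < tau < 1/2."""
    threshold = 0.5 - math.sqrt(omega - omega * omega)
    if tau >= threshold:
        return entropy(omega) + entropy(tau) - 1.0
    disc = max((1 - 2 * tau) ** 2 - 4 * omega * (1 - omega), 0.0)
    r = ((1 - 2 * tau) - math.sqrt(disc)) / (2 * (1 - omega))  # smallest root
    return (2 * omega * math.log2(r) - 2 * tau * math.log2(1 - r)
            - 2 * (1 - tau) * math.log2(1 + r) + 2 * entropy(omega))

def finite_exponent(n, w, t):
    """(1/n) * log2(P_w) with P_w = 1 / (eps1 - eps0)^2, computed exactly."""
    total = math.comb(n - 1, w - 1)
    even = sum(math.comb(t - 1, j) * math.comb(n - t, w - 1 - j)
               for j in range(0, w, 2))
    odd = sum(math.comb(t, j) * math.comb(n - t - 1, w - 1 - j)
              for j in range(1, w, 2))
    gap = abs(Fraction(even - odd, total))  # |q1 - q0| = |eps1 - eps0|
    # Take logarithms of the big integers separately to avoid float underflow.
    return 2.0 * (math.log2(gap.denominator) - math.log2(gap.numerator)) / n

print(pi_exponent(0.2, 0.05), finite_exponent(2000, 400, 100))
print(pi_exponent(0.5, 0.25))
```

The two formulas agree on the boundary $\tau=\frac{1}{2}-\sqrt{\omega-\omega^{2}}$ (for $\omega=0.2$ this is $\tau=0.1$, where $r=\frac{1}{2}$), which gives a quick consistency check on any implementation. In the second regime the finite-length value can oscillate because of the sine term mentioned before Lemma 4, so a pointwise comparison is more meaningful in the first regime.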

4 The binomial model

[FKI07] introduced another model for the parity-check equations used in statistical decoding. Instead of assuming that they are chosen at random among the words of a given weight $w$ , the authors of [FKI07] assume that they are random binary words of length $n$ where the entries are chosen independently of each other according to a Bernoulli distribution of parameter $w/n$ . In other words, the expected weight is still $w$ but the weight of the parity-check equation is not fixed anymore and may vary. We will call it the binomial model of weight $w$ and length $n$ and refer to our model as the constant weight model of weight $w$ . The binomial model has the advantage of significantly simplifying the analysis of statistical decoding. It is easy to analyze the simple statistical decoding algorithm that we consider here and to compute asymptotically the number of parity-check equations that ensure successful decoding. We will do this in what follows. But the authors of [FKI07] went further since they were even able to analyze asymptotically an iterative version of statistical decoding by following some of the ideas of [SV04]. They showed that

Proposition 6 ([FKI07, Proposition 2.1 p.405]).

In the binomial model of weight $w$ and length $n$ , the number of check sums that are necessary to correct with large enough probability $t$ errors by using the iterative decoding algorithm of [FKI07] is well estimated by $O(J_{\text{min}})$ with

[TABLE]

where the constant in the “big O” depends on the ratio $t/n$ .

Let us first show that naive statistical decoding performs almost as well when we forget about polynomial factors. It makes sense in order to compare both models to introduce some additional notation.

[TABLE]

where $h$ is a parity-check equation chosen according to the binomial model and the probability is taken over the random choice of $h$ in this model (and $\mathbb{P}^{\text{bin}}$ means that we take the probabilities according to the binomial model). These quantities do not depend on $i$ . It will also be convenient to define $\varepsilon^{\text{bin}}_{0}$ and $\varepsilon^{\text{bin}}_{1}$ as

[TABLE]

The computations of [FKI07, Sec II. B] show that

[TABLE]

This implies that

[TABLE]

It is also convenient in order to distinguish both models to rename the quantities $q_{0}$ , $q_{1}$ , $\varepsilon_{0}$ and $\varepsilon_{1}$ that were introduced before by referring to them as $q^{\text{con}}_{0}$ , $q^{\text{con}}_{1}$ , $\varepsilon^{\text{con}}_{0}$ and $\varepsilon^{\text{con}}_{1}$ respectively. We can perform the same statistical test as before by computing from $m$ parity-check equations $h^{1},\dots,h^{m}$ all involving the bit $i$ we want to decode, the quantity

[TABLE]

The expectation of this quantity is $E_{b}\mathop{=}\limits^{\triangle}m\left(\frac{1}{2}+\varepsilon^{\text{bin}}_{b}\right)$ depending on the value $b\in\{0,1\}$ of the bit we want to decode. We decide that the bit we want to decode is equal to $0$ if $V_{m}<\frac{E_{0}+E_{1}}{2}$ and $1$ otherwise. As before, we observe that by Chernoff’s bound we make a wrong decision with probability at most $2\cdot 2^{-m\frac{(\varepsilon^{\text{bin}}_{1}-\varepsilon^{\text{bin}}_{0})^{2}}{2\ln(2)}}$ . This probability can be made to be of order $o(1/n)$ by choosing $m$ as $m=K\log n\frac{1}{(\varepsilon^{\text{bin}}_{1}-\varepsilon^{\text{bin}}_{0})^{2}}$ for a suitable constant $K$ . In this case, decoding the whole sequence succeeds with probability $1-o(1)$ . In other words, naive statistical decoding succeeds for $m=O\left(\log n\frac{1}{(\varepsilon^{\text{bin}}_{1}-\varepsilon^{\text{bin}}_{0})^{2}}\right)$ .

We may observe now that

[TABLE]

This means that naive statistical decoding needs only marginally more equations in the binomial model (namely a multiplicative factor of order $O(\log n)$ ). To summarize the whole discussion, the number of parity-checks needed for decoding is

• with iterative statistical decoding over the binomial model

[TABLE]

• with naive statistical decoding over the binomial model

[TABLE]

• with naive statistical decoding over the constant weight model

[TABLE]

One might wonder now whether there is a difference between both models. It is very tempting to conjecture that both models are very close to each other since the expected weight of the parity-checks is $w$ in both cases. However, this is not the case: we are really in a large deviation situation where the bias of some extreme weights takes over the bias corresponding to the typical weight of the parity-check equations. To illustrate this point, we choose the weight to be $w=\omega n$ and the number of errors to be $t=\tau n$ for some fixed $\omega$ and $\tau$, and then let $n$ go to infinity. The normalized exponent (here the number of equations is a function of the form $\tilde{O}\left(e^{\alpha(\omega,\tau)n}\right)$ and we mean the coefficient $\alpha(\omega,\tau)$) of the number of parity-check equations which is needed is

[TABLE]

in the binomial case, whereas $\lim\limits_{n\to+\infty}\frac{1}{n}\log_{2}\left(\frac{1}{(\varepsilon^{\text{con}}_{1}-\varepsilon^{\text{con}}_{0})^{2}}\right)$ is given by Theorem 2 in the constant weight case, and both terms are indeed different in general. One case which is particularly interesting is when $\tau$ and $\omega$ are chosen as $\tau=H^{-1}(1-R)$ and $\omega=R/2$, where $R$ is the code rate we consider. This corresponds to the hardest case of syndrome decoding, with parity-check equations of a weight that can be easily obtained, as we will see in Section 6. The two normalized exponents are compared in Figure 2 as a function of the rate $R$. As we see, there is a huge difference. The problem with the model chosen in [FKI07] is that it is a very favorable model for statistical decoding. To the best of our knowledge there are no efficient algorithms for producing such parity-checks when $\omega\leq R/2$. Note that even if such an algorithm were to exist, selecting appropriately only one weight would not change the exponential complexity of the algorithm (this will be proved in Section 5). In other words, in order to study statistical decoding we may restrict ourselves, as we do here, to considering only one weight and not a whole range of weights.
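The gap between the two models can already be observed for small parameters. The sketch below is an illustration under explicit assumptions, not the paper's computation: in the constant weight model $h$ is taken uniform among the words of weight $w$ with $h_{i}=1$, and in the binomial model the coordinates of $h$ other than $i$ are taken independent with probability $w/n$ of being $1$. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def odd_overlap_probability(population, marked, draws):
    """P(odd number of marked items) when drawing `draws` items without replacement."""
    odd = sum(comb(marked, k) * comb(population - marked, draws - k)
              for k in range(1, min(marked, draws) + 1, 2))
    return Fraction(odd, comb(population, draws))

def constant_weight_biases(n, t, w):
    """(eps0, eps1) when h is uniform among weight-w words with h_i = 1 (assumption)."""
    # e_i = 0: <e,h> is the parity of the overlap of the other w-1 ones with t errors.
    q0 = odd_overlap_probability(n - 1, t, w - 1)
    # e_i = 1: position i contributes 1, so the overlap with the other t-1 errors must be even.
    q1 = 1 - odd_overlap_probability(n - 1, t - 1, w - 1)
    return q0 - Fraction(1, 2), q1 - Fraction(1, 2)

def binomial_biases(n, t, w):
    """(eps0, eps1) when the other coordinates of h are i.i.d. Bernoulli(w/n) (assumption)."""
    p = Fraction(w, n)
    # A sum of t independent Bernoulli(p) bits is odd with probability (1 - (1-2p)^t) / 2.
    return -(1 - 2 * p) ** t / 2, (1 - 2 * p) ** (t - 1) / 2

def equations_order(eps0, eps1):
    """Order of magnitude 1 / (eps1 - eps0)^2 of the number of equations."""
    return 1 / (eps1 - eps0) ** 2

if __name__ == "__main__":
    n, t, w = 60, 6, 15   # illustrative values only
    print(float(equations_order(*constant_weight_biases(n, t, w))))
    print(float(equations_order(*binomial_biases(n, t, w))))
```

Exact rational arithmetic is used because the biases become very small as $n$ grows, and floating point subtraction of probabilities close to $1/2$ would lose precision.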

The difference between both formulas is even more apparent when considering the slopes at the origin as shown in Figure 3.

However, both models get closer when the error weight decreases. For instance, when considering a relative error $\tau=H^{-1}(1-R)/2$, we see in Figure 4 that the difference between both models gets significantly smaller. Actually the difference vanishes when the relative error tends to $0$, as shown by Proposition 7.

Proposition 7 (Asymptotic complexity of statistical decoding for a sub-linear error weight).


[TABLE]

Proof.

As $\tau$ decreases to $0$, we consider for $\pi(\omega,\tau)$ the first formula which is given in Theorem 2. We have:

[TABLE]

with

[TABLE]

Let us now compute the Taylor series expansion of $r$ when $\tau\rightarrow 0$. We start with

[TABLE]

Now using the fact that:

[TABLE]

we have:

[TABLE]

And we deduce that:

[TABLE]

and therefore

[TABLE]

Now using the fact that:

[TABLE]

we have the asymptotic expansions with the logarithms:

[TABLE]

So we deduce that:

[TABLE]

So by plugging this expression with (24) in (23) we have the result.

∎

The sublinear case is also relevant to cryptography since several McEliece cryptosystems actually operate in this regime: this is true for the original McEliece system with fixed rate binary Goppa codes [McE78] and for the MDPC-McEliece cryptosystem [MTSB13]. In this regime, [CTS16] showed that all ISD algorithms have the same asymptotic complexity when the number $t$ of errors to correct is equal to $o(n)$, and this complexity is given by:

[TABLE]

Let us compare the exponents of statistical decoding and the ISD algorithms when we want to correct a sub-linear error weight. When $t=o(n)$ the complexity we are after is subexponential in the length. The only algorithm we found that produces moderate weight parity-check equations in subexponential time is Algorithm 2. It produces parity-check equations of weight $Rn/2$ in amortized time $\tilde{O}(1)$. So with this algorithm, the exponent of statistical decoding is given by $-2\tau\log_{2}(1-R)$, which is twice the exponent of all the ISDs. We did not conclude for a relative weight $<R/2$ since, in any case, all the algorithms we found needed exponential time to output enough equations to perform statistical decoding. So unless one comes up with an algorithm that is able to produce parity-check equations of relative weight $<R/2$ in subexponential time, statistical decoding is not better than any ISD when we have to correct $t=o(n)$ errors.
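As a small numerical illustration of this factor of two, the Python sketch below evaluates both exponents. The function names are hypothetical, and the ISD exponent $-\tau\log_{2}(1-R)$ is inferred from the statement above that the statistical decoding exponent is twice as large.

```python
import math

def isd_sublinear_exponent(rate, tau):
    """Exponent -tau * log2(1 - R) shared by the ISD algorithms when t = o(n)."""
    return -tau * math.log2(1.0 - rate)

def statistical_sublinear_exponent(rate, tau):
    """Exponent -2 * tau * log2(1 - R) of statistical decoding with weight Rn/2."""
    return -2.0 * tau * math.log2(1.0 - rate)

if __name__ == "__main__":
    rate, tau = 0.5, 0.01   # illustrative values only
    print(isd_sublinear_exponent(rate, tau), statistical_sublinear_exponent(rate, tau))
```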

5 Studying the single weight case is sufficient

The previous section showed that, when it comes to performing statistical decoding, it is much more favorable to produce parity-check equations following the binomial model of weight $w$ rather than parity-checks of constant weight $w$. The problem is that, as far as we know, there is no efficient way of producing moderate weight parity-check equations (let us say that we call moderate any weight $\leq 1+Rn/2$) which would follow such a model. Even the “easy case”, where $w=1+Rn/2$ and where it is trivial to produce such equations by simply putting the parity-check matrix in systematic form and taking rows in this matrix (for more details see Section 6), does not follow the binomial model: the standard deviation of the parity-check equation weight is easily seen to be different between what is actually produced by the algorithm and the binomial model of weight $1+Rn/2$. Of course, this does not mean that we should rule out the possibility that such efficient algorithms might exist. We will however prove that, under very mild conditions, even if such an algorithm were to exist, it would by nature produce parity-checks of different weights, and we would have a statistical decoding algorithm of the same exponential complexity which would keep only one very specific weight. In other words, it is sufficient to care about the single weight case, as we do here, when we study just the exponential complexity of statistical decoding.

To verify this, we fix an arbitrary position we want to decode and assume that some algorithm has produced in time $T$ , $m=\sum_{j=1}^{n}m_{j}$ parity check equations involving this position where $m_{j}$ denotes the number of parity-check equations of weight $j$ . The equations of weight $j$ are denoted by $h_{1}^{j},\dots,h_{m_{j}}^{j}$ . Statistical decoding is based on simple statistics involving the values $\langle y,h_{s}^{j}\rangle$ . To simplify a little bit the expressions we are going to manipulate, let us introduce

[TABLE]

Similarly to Assumptions 1 and 2, we assume that the distribution of $\langle y,h_{s}^{j}\rangle$ is approximated by the distribution obtained when $h_{s}^{j}$ is drawn uniformly at random among the words of weight $j$, and that the $\langle y,h_{s}^{j}\rangle$’s are independent. So we have $X_{s}^{j}\sim\mathscr{B}(1/2+\varepsilon_{l}(j))$ under the hypothesis $\mathscr{H}_{l}$, where $\varepsilon_{l}(j)$ is the bias defined in Subsection 3.1 for a weight $j$. Our aim now is to find a test distinguishing both hypotheses $\mathscr{H}_{0}$ and $\mathscr{H}_{1}$. As in Subsection 3.1 it will be the Neyman-Pearson test. We define the following quantity, where $\mathbb{P}_{\mathscr{H}_{l}}$ denotes the probability under the hypothesis $\mathscr{H}_{l}$:

[TABLE]

The Neyman-Pearson lemma tells us to proceed as follows: if $q>\Theta$, where $\Theta$ is some threshold, choose $\mathscr{H}_{0}$, and choose $\mathscr{H}_{1}$ otherwise. In this case, no other statistical test will lead to lower false detection probabilities at the same time. In our case, it is enough to set the threshold $\Theta$ to [math] since it can be easily verified that other choices will not change the exponent of the number of samples we need for having vanishing false detection probabilities. We set $p_{l}(j)\mathop{=}\limits^{\triangle}1/2+\varepsilon_{l}(j)$, $I_{0}(j)=\#\{0\in\{x_{1}^{j},\cdots,x_{m_{j}}^{j}\}\}$ and $I_{1}(j)=\#\{1\in\{x_{1}^{j},\cdots,x_{m_{j}}^{j}\}\}$, and we have:

[TABLE]

Therefore, by taking the natural logarithm of this expression and using $\sum_{k=1}^{m_{j}}X_{k}^{j}=I_{1}(j)$ and $I_{1}(j)+I_{0}(j)=m_{j}$, we have:

[TABLE]
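The log-likelihood computation described above can be sketched in Python as follows. This is an illustration under assumptions: $q$ is taken to be the ratio of the likelihood under $\mathscr{H}_{0}$ to the likelihood under $\mathscr{H}_{1}$ (consistent with choosing $\mathscr{H}_{0}$ when $q>\Theta$), the default log-threshold $0$ is a placeholder choice, and all names and input formats are hypothetical.

```python
import math

def log_likelihood_ratio(samples_by_weight, eps0, eps1):
    """ln(P_H0(samples) / P_H1(samples)) for independent Bernoulli samples.

    samples_by_weight: dict mapping a weight j to the observed bits x_1^j, ..., x_{m_j}^j
    eps0, eps1:        dicts mapping j to the bias under H0 and H1 (assumed inputs)
    """
    total = 0.0
    for j, bits in samples_by_weight.items():
        p0 = 0.5 + eps0[j]
        p1 = 0.5 + eps1[j]
        ones = sum(bits)              # I_1(j)
        zeros = len(bits) - ones      # I_0(j) = m_j - I_1(j)
        total += ones * math.log(p0 / p1) + zeros * math.log((1 - p0) / (1 - p1))
    return total

def decide(samples_by_weight, eps0, eps1, log_threshold=0.0):
    """Return 0 (choose H0) if ln q exceeds the log-threshold, and 1 otherwise."""
    return 0 if log_likelihood_ratio(samples_by_weight, eps0, eps1) > log_threshold else 1

if __name__ == "__main__":
    samples = {3: [1, 1, 1, 0], 5: [0, 1, 0, 0, 0]}   # toy data
    eps0 = {3: -0.10, 5: -0.02}
    eps1 = {3: 0.10, 5: 0.02}
    print(log_likelihood_ratio(samples, eps0, eps1), decide(samples, eps0, eps1))
```

Working with the logarithm turns the product over all samples into a sum, which avoids numerical underflow when the number of equations is large.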

We now use the Taylor series expansion around $0$: $\ln(1/2+x)=-\ln(2)+2x-\frac{4x^{2}}{2}+\frac{8x^{3}}{3}+o(x^{3})$ and we deduce for $i$ in $\{0,1\}$:

[TABLE]

We have,

[TABLE]

where

[TABLE]

and $C$ is the constant defined by:

[TABLE]

This computation suggests using the random variables $Y_{s}^{j}$ to build our distinguisher with the Neyman-Pearson likelihood test. By the assumptions on the $X_{s}^{j}$’s, the $Y_{s}^{j}$’s are independent and we have under $\mathscr{H}_{l}$:

[TABLE]

The expectation of $Y_{s}^{j}$ under $\mathscr{H}_{l}$ is given by:

[TABLE]

As for our previous distinguisher we define the random variable $V_{m}$ for $m=\sum_{j=1}^{n}m_{j}$ uniform and independent draws of vectors $h_{s}^{j}$ in ${\mathscr{H}}_{w_{j},i}$ :

[TABLE]

The expectation of $V_{m}$ depends on which hypothesis $\mathscr{H}_{l}$ holds. When hypothesis $\mathscr{H}_{l}$ holds, we denote the expectation of $V_{m}$ by $E_{l}$. The difference $E_{0}-E_{1}$ is given by:

[TABLE]

The deviations of $V_{m}$ around its expectation will be quantified through Hoeffding’s bound, which in this case gives, up to constant factors in the exponent, the right behavior of the probability that $V_{m}$ deviates from its expectation.

Proposition 8 (Hoeffding’s Bound).

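Let $Y_{1},\cdots,Y_{m}$ be independent random variables and let $a_{1},\cdots,a_{m}$ and $b_{1},\cdots,b_{m}$ with $a_{s}<b_{s}$ be such that:

$$\forall s\in\{1,\cdots,m\},\quad\mathbb{P}\left(a_{s}\leq Y_{s}\leq b_{s}\right)=1.$$

We set $Z_{m}=\sum_{s=1}^{m}Y_{s}$, then:

$$\forall t>0,\quad\mathbb{P}\left(\left|Z_{m}-\mathbb{E}(Z_{m})\right|\geq t\right)\leq 2\exp\left(-\frac{2t^{2}}{\sum_{s=1}^{m}(b_{s}-a_{s})^{2}}\right).$$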

In order to distinguish both hypotheses, we set $t=\frac{E_{0}-E_{1}}{2}$ . So under $\mathscr{H}_{l}$ , we have

[TABLE]

We decide that hypothesis $\mathscr{H}_{1}$ holds if $V_{m}<\frac{E_{0}+E_{1}}{2}$ and that $\mathscr{H}_{0}$ holds otherwise. It is clear that the probability $P_{e}$ to make a wrong decision with this distinguisher is smaller than $2e^{-\frac{1}{2}\left(\sum_{j=1}^{n}m_{j}(\varepsilon_{0}(j)-\varepsilon_{1}(j))^{2}\right)}$ . If we want $P_{e}\leq 2e^{-\eta}$ for any fixed $\eta$ , $m_{1},\cdots,m_{n}$ have to be such that:

[TABLE]
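The requirement on $m_{1},\cdots,m_{n}$ follows directly from the bound $P_{e}\leq 2e^{-\frac{1}{2}\sum_{j}m_{j}(\varepsilon_{0}(j)-\varepsilon_{1}(j))^{2}}$: the target $2e^{-\eta}$ is reached as soon as $\sum_{j}m_{j}(\varepsilon_{0}(j)-\varepsilon_{1}(j))^{2}\geq 2\eta$. A minimal Python sketch of this check, with hypothetical names and toy inputs, is:

```python
import math

def weighted_gap_sum(counts, eps0, eps1):
    """sum_j m_j * (eps0(j) - eps1(j))^2 for counts = {weight j: m_j}."""
    return sum(m_j * (eps0[j] - eps1[j]) ** 2 for j, m_j in counts.items())

def error_bound(counts, eps0, eps1):
    """Upper bound 2 * exp(-sum / 2) on the probability of a wrong decision."""
    return 2.0 * math.exp(-0.5 * weighted_gap_sum(counts, eps0, eps1))

def meets_target(counts, eps0, eps1, eta):
    """True when the bound is at most 2 * exp(-eta), i.e. when sum >= 2 * eta."""
    return weighted_gap_sum(counts, eps0, eps1) >= 2.0 * eta

if __name__ == "__main__":
    counts = {4: 200, 6: 1000}          # toy numbers of equations per weight
    eps0 = {4: -0.10, 6: -0.01}
    eps1 = {4: 0.10, 6: 0.01}
    print(error_bound(counts, eps0, eps1), meets_target(counts, eps0, eps1, 3.0))
```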

Note that this is really the right order (up to some constant factor) for the number of equations which is needed (the Hoeffding bound captures well, up to constant factors, the probability of error of the distinguisher in this case), and using optimal Bayesian decision does not change, up to multiplicative factors, the number of equations that are needed for a fixed relative error weight. Now assume that

Assumption 3.

If we can compute $m$ parity-check equations of weight $w$ in time $T$ , we are able to compute $n\cdot m$ parity-check equations of this weight in time $O(nT)$ .

This assumption holds for all “reasonable” randomized algorithms producing random parity-checks with uniform/quasi uniform probability as long as $n\cdot m$ is at most some constant fraction (with a constant $<1$ ) of the total number of parity-check equations. Now we set $j_{0}$ such that:

[TABLE]

Clearly, if we now take, instead of the original $m$ parity-check equations, just the $n\cdot m_{j_{0}}$ parity-check equations of weight $j_{0}$, the bound on the probability of error does not get larger than the bound $2e^{-\eta}$ that we had before since

[TABLE]
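The reduction to a single weight can be illustrated with a short Python sketch. It assumes that $j_{0}$ is the weight maximizing $m_{j}(\varepsilon_{0}(j)-\varepsilon_{1}(j))^{2}$, which is our reading of condition (26); the names and the toy data are hypothetical.

```python
from fractions import Fraction

def contribution(counts, eps0, eps1, j):
    """m_j * (eps0(j) - eps1(j))^2, the term that weight j brings to the exponent."""
    return counts[j] * (eps0[j] - eps1[j]) ** 2

def best_single_weight(counts, eps0, eps1):
    """Weight j0 with the largest contribution (assumed reading of condition (26))."""
    return max(counts, key=lambda j: contribution(counts, eps0, eps1, j))

def single_weight_suffices(n, counts, eps0, eps1):
    """Check n * m_{j0} * gap(j0)^2 >= sum_j m_j * gap(j)^2 (at most n weights occur)."""
    j0 = best_single_weight(counts, eps0, eps1)
    total = sum(contribution(counts, eps0, eps1, j) for j in counts)
    return n * contribution(counts, eps0, eps1, j0) >= total

if __name__ == "__main__":
    counts = {3: 10, 4: 50, 5: 200}     # toy numbers m_j of equations per weight
    eps0 = {3: Fraction(-1, 10), 4: Fraction(-1, 20), 5: Fraction(-1, 100)}
    eps1 = {3: Fraction(1, 10), 4: Fraction(1, 20), 5: Fraction(1, 100)}
    print(best_single_weight(counts, eps0, eps1), single_weight_suffices(8, counts, eps0, eps1))
```

The inequality holds because a sum of at most $n$ non-negative terms is at most $n$ times its largest term.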

So, under Assumption 3, if our distinguisher with several weights has enough parity-check equations available, we are able to compute in polynomial time $n\cdot m_{j_{0}}$ parity-check equations of weight $w_{j_{0}}$, where $j_{0}$ is chosen such that (26) holds, and with these parity-check equations the distinguisher of Subsection 3.1 can work too. The complexity of statistical decoding without the phase of computation of the parity-check equations is the number of parity-check equations that are needed. So, under Assumption 3, its complexity with our first distinguisher will be, for each codelength $n$, the same up to a polynomial multiplicative factor as the complexity with the second distinguisher. Moreover, under Assumption 3 the complexity of the computation of the parity-check equations that are needed for both distinguishers is the same up to a polynomial factor. As the $\varepsilon_{1}(j)-\varepsilon_{0}(j)$ are exponentially small in $n$, in order to have a probability of success which tends to $1$, the $m_{j}$’s of both distinguishers have to be of order $\tilde{O}\left(\frac{1}{(\varepsilon_{0}(j)-\varepsilon_{1}(j))^{2}}\right)$. This leads to the conclusion that the asymptotic exponent of statistical decoding is the same whether we consider one well-chosen weight or several weights. We stress that this conclusion concerns the asymptotic complexity of statistical decoding. Indeed, in practice Algorithms 2 and 3 can output many parity-check equations of weight “close” to $Rn/2$ and $r+(Rn-l)/2$. It would be counter-productive not to keep them and use them with the distinguisher we just described.

6 A simple way of obtaining moderate weight parity-check equations

As we are now able to give a formula for $\pi(\omega,\tau)$, we come back to the algorithm ParityCheckComputation$_{w}$ in order to estimate $\pi^{complete}(\omega,\tau)$. There is an easy way of producing parity-check equations of moderate weight by Gaussian elimination. This is given in Algorithm 2, which provides a method for finding parity-check equations of weight $w=\frac{Rn}{2}$ of an $[n,Rn]$ random code. Gaussian elimination (GElim) of an $Rn\times n$ matrix $G_{0}$ consists in finding $U$ ($Rn\times Rn$ and non-singular) such that:

[TABLE]

$L_{j}(G)$ denotes the $j$-th row of $G$ in Algorithm 2.

Algorithm 2 is a randomized algorithm. Randomness comes from the choice of the permutation $P$ . It is straightforward to check that this algorithm returns $P_{Rn/2}$ parity-check equations of weight $Rn/2$ in time $\tilde{O}\left(P_{Rn/2}\right)$ .
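Algorithm 2 itself is not reproduced in this section, so the following Python sketch only illustrates the idea described in the text: apply a random column permutation, perform Gaussian elimination over $\mathbb{F}_{2}$, and keep the rows of the requested weight. The representation of rows as integer bitmasks, the function names and the parameters are assumptions made for this example.

```python
import random

def systematic_rows(rows, n, rng):
    """Gauss-Jordan elimination over F_2 with pivots searched in a random column order.

    rows: list of integers, bit c of a row being its entry in column c.
    The random order of the pivot columns plays the role of the permutation P.
    """
    rows = list(rows)
    order = list(range(n))
    rng.shuffle(order)
    pivot_row = 0
    for col in order:
        if pivot_row == len(rows):
            break
        bit = 1 << col
        found = next((r for r in range(pivot_row, len(rows)) if rows[r] & bit), None)
        if found is None:
            continue
        rows[pivot_row], rows[found] = rows[found], rows[pivot_row]
        for other in range(len(rows)):
            if other != pivot_row and rows[other] & bit:
                rows[other] ^= rows[pivot_row]   # clear this column in every other row
        pivot_row += 1
    return rows[:pivot_row]

def equations_of_weight(rows, n, weight, rng, iterations):
    """Collect the distinct reduced rows of Hamming weight `weight` over several permutations."""
    collected = set()
    for _ in range(iterations):
        for h in systematic_rows(rows, n, rng):
            if bin(h).count("1") == weight:
                collected.add(h)
    return collected

if __name__ == "__main__":
    rng = random.Random(0)
    n, dimension = 16, 8                              # toy sizes
    matrix = [rng.getrandbits(n) for _ in range(dimension)]
    print(len(equations_of_weight(matrix, n, 5, rng, 100)))
```

Every returned row is a linear combination of the input rows, so if the input rows are parity-check equations of a code, so are the outputs.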

Now we set $\tau=H^{-1}(1-R)$ . This relative weight, which corresponds to the Gilbert-Varshamov bound, is usually used to measure the efficiency of decoding algorithms. Indeed it corresponds to the critical error weight below which we still have with probability $1-o(1)$ a unique solution to the decoding problem. It can be viewed as the weight for which the decoding problem is the hardest, since the larger the weight the more difficult the decoding problem seems to be (this holds at least for all known decoding algorithms of generic linear codes). As a consequence of Propositions 2 and 4, we have the following theorem:

Theorem 3 (Naive Statistical Decoding’s asymptotic complexity).

With the computation of parity-check equations of weight $Rn/2$ thanks to ParityCheckComputation$_{Rn/2}$, we have:

[TABLE]

where $\pi(R/2,\tau)$ is given by Theorem 2.

Exponents (as a function of $R$) of Prange’s ISD and statistical decoding are given in Figure 5. As we see, the difference is huge: this version of statistical decoding cannot be considered as an improvement over ISDs. However, as $\omega\mapsto\pi(\omega,\tau)$ is, for fixed $\tau$, an increasing function of $\omega$, we have to study the case $\omega<R/2$. This is the subject of the next section, where we give an algorithm computing efficiently parity-check equations of weight smaller than $Rn/2$. However, we also prove there that no matter how efficiently we perform the pre-computation step, any version of statistical decoding is worse than Prange’s ISD.
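For readers who want to reproduce the Prange side of such a comparison, the Python sketch below computes the Gilbert-Varshamov relative weight $\tau=H^{-1}(1-R)$ by bisection and the classical Prange exponent $H(\tau)-(1-R)H\left(\frac{\tau}{1-R}\right)$. This formula is the standard one for Prange's algorithm and is not taken from this section; the exponent $\pi(R/2,\tau)$ of statistical decoding requires Theorem 2 and is not computed here. Function names are hypothetical.

```python
import math

def binary_entropy(x):
    """H(x) = -x log2(x) - (1-x) log2(1-x), with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def inverse_binary_entropy(y, iterations=60):
    """H^{-1}(y) in [0, 1/2], computed by bisection (H is increasing on this interval)."""
    low, high = 0.0, 0.5
    for _ in range(iterations):
        middle = (low + high) / 2.0
        if binary_entropy(middle) < y:
            low = middle
        else:
            high = middle
    return (low + high) / 2.0

def gv_relative_weight(rate):
    """Relative Gilbert-Varshamov weight tau = H^{-1}(1 - R)."""
    return inverse_binary_entropy(1.0 - rate)

def prange_exponent(rate, tau):
    """Classical exponent of Prange's ISD: H(tau) - (1 - R) H(tau / (1 - R))."""
    return binary_entropy(tau) - (1.0 - rate) * binary_entropy(tau / (1.0 - rate))

if __name__ == "__main__":
    for rate in (0.2, 0.5, 0.8):
        tau = gv_relative_weight(rate)
        print(rate, tau, prange_exponent(rate, tau))
```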

7 Improvements and limitations of statistical decoding

7.1 Framework

Before giving an improvement and lower bounds on the complexity of statistical decoding, we would like to come back to the problem of computing the ${\mathscr{S}}_{i}$’s in the complexity of statistical decoding. Our aim is to clarify the picture a little bit. We stress that the complexity of statistical decoding is, if the ${\mathscr{S}}_{i}$’s are already computed and stored, (up to a polynomial factor) the number of equations we use to make our decision. We denote by ${\mathscr{D}}_{w}$ the part of statistical decoding which uses these parity-check equations to perform the decoding and by ${\mathscr{A}}_{w}$ the randomized algorithm used for outputting a certain number of random parity-check equations of weight $w$. ParityCheckComputation$_{w}$ is assumed to make a certain number of calls to ${\mathscr{A}}_{w}$. It is assumed that ${\mathscr{A}}_{w}$ outputs $N_{w}$ parity-check equations of weight $w$ in time $T_{w}$ each time we run it. We assume that statistical decoding needs $\tilde{O}(P_{w})$ equations. If we consider the computation of parity-check equations as part of statistical decoding, its complexity is given by:

[TABLE]

When $\frac{T_{w}}{N_{w}}=\tilde{O}(1)$, we say that $\mathscr{A}_{w}$ gives equations in amortized time $\tilde{O}(1)$. Under this condition, if we assume $P_{w}\geq N_{w}$, the complexity is the number of equations needed.

In any case, the complexity of statistical decoding is lower-bounded by $\tilde{O}(P_{w})$, and the lower the equation weight $w$, the lower the number of equations $P_{w}$ we need for performing statistical decoding. The goal of this section is to show how to find many parity-check equations of weight $<Rn/2$ in an efficient way and to give a minimal weight for which it makes sense to perform this operation.
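The accounting of this framework can be sketched as a simple cost model. The Python function below is an assumption, not the paper's formula: it charges $T_{w}$ for each of the $\lceil P_{w}/N_{w}\rceil$ calls to $\mathscr{A}_{w}$ and one unit of work per equation used by $\mathscr{D}_{w}$, ignoring polynomial factors.

```python
def statistical_decoding_cost(equations_needed, equations_per_call, time_per_call):
    """Hypothetical cost model: calls to A_w plus one unit of work per equation used.

    equations_needed   ~ P_w
    equations_per_call ~ N_w
    time_per_call      ~ T_w
    """
    calls = -(-equations_needed // equations_per_call)   # integer ceiling of P_w / N_w
    return calls * time_per_call + equations_needed

if __name__ == "__main__":
    # Amortized time O(1): T_w is of the order of N_w, so the cost is of the order of P_w.
    print(statistical_decoding_cost(10**6, 10**3, 2 * 10**3))
    # One call already outputs more than needed (P_w < N_w): the cost is dominated by T_w.
    print(statistical_decoding_cost(10**3, 10**6, 2 * 10**6))
```

The second call shows why the condition $P_{w}\geq N_{w}$ matters: when a single run of $\mathscr{A}_{w}$ produces more equations than needed, its running time is not amortized.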

7.2 A lower bound on the complexity of statistical decoding

As we just pointed out, statistical decoding needs $\tilde{O}\left(P_{w}\right)$ parity-check equations of weight $w$ to work. Its complexity is therefore always greater than $\tilde{O}\left(P_{w}\right)$. We assume again the code we want to decode to be a random code. This assumption is standard in the cryptographic context. The expected number of parity-check equations of weight $w$ in an $[n,Rn]$ random binary linear code is $\frac{\binom{n}{w}}{2^{Rn}}$. Obviously, if $w$ is too small there are not enough equations for statistical decoding to work; namely, we need that

[TABLE]

The minimum $\omega_{0}(R,\tau)$ such that this holds is clearly given by the minimal $\omega$ such that the following expression holds

[TABLE]

So $\omega_{0}(R,\tau)$ gives the minimal relative weight such that asymptotically the number of parity-check equations needed for decoding is exactly the number of parity-check equations of weight $w_{0}(R,\tau)$ in the code, where $w_{0}(R,\tau)\mathop{=}\limits^{\triangle}\omega_{0}(R,\tau)n$. Below this weight, statistical decoding cannot work (at least not for random linear codes). In other words, the asymptotic exponent of statistical decoding is always lower-bounded by $\pi(\omega_{0}(R,\tau),\tau)$.
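A finite-length analogue of this minimal weight can be computed exactly for small parameters. The Python sketch below is an illustration under assumptions: $P_{w}$ is replaced by $1/(\varepsilon_{1}-\varepsilon_{0})^{2}$ without polynomial factors, the biases are computed for $h$ uniform among the words of weight $w$ with $h_{i}=1$, and $k$ stands for $Rn$. All names are hypothetical.

```python
from fractions import Fraction
from math import comb

def odd_overlap_probability(population, marked, draws):
    """P(odd number of marked items) when drawing `draws` items without replacement."""
    odd = sum(comb(marked, j) * comb(population - marked, draws - j)
              for j in range(1, min(marked, draws) + 1, 2))
    return Fraction(odd, comb(population, draws))

def bias_gap(n, t, w):
    """eps1 - eps0 for h uniform among weight-w words with h_i = 1 (assumption)."""
    q0 = odd_overlap_probability(n - 1, t, w - 1)
    q1 = 1 - odd_overlap_probability(n - 1, t - 1, w - 1)
    return q1 - q0

def enough_equations(n, k, t, w):
    """Check binom(n, w) / 2^k >= 1 / (eps1 - eps0)^2, ignoring polynomial factors."""
    gap = bias_gap(n, t, w)
    if gap == 0:
        return False
    return Fraction(comb(n, w), 2 ** k) >= 1 / gap ** 2

def minimal_weight(n, k, t):
    """Smallest w for which the expected number of weight-w equations suffices, or None."""
    for w in range(1, n + 1):
        if enough_equations(n, k, t, w):
            return w
    return None

if __name__ == "__main__":
    print(minimal_weight(40, 20, 3))   # toy parameters
```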

In the case of a relative error weight given by the Gilbert-Varshamov bound $\tau_{\text{DGV}}=H^{-1}(1-R)$ , Theorem 3 leads to the conclusion that

[TABLE]

Moreover, for all relative weights greater than $\omega_{0}(R,\tau_{\text{DGV}})$ the number of parity-check equations that are needed is exactly the number of parity-check equations of this weight that exist in a random code. This result is rather intriguing and does not seem to have a simple interpretation. The relative minimal weight $\omega_{0}(R,\tau_{\text{DGV}})$ is related to the first linear programming bound of McEliece-Rodemich-Rumsey-Welch and can be interpreted through its relationship with the zeros of Krawtchouk polynomials. This bound arises from the fact that, from Theorem 3, we know that $\omega_{0}(R,\tau_{\text{DGV}})$ corresponds to the relative weight where we switch from the complex case to the real case, and this happens precisely when we leave the region of zeros of the Krawtchouk polynomials.

Figure 6 compares Prange’s ISD, statistical decoding with parity-check equations of relative weight $R/2$, and statistical decoding with parity-check equations of relative weight $\omega_{0}(R,\tau)$, with $\tau=H^{-1}(1-R)$. We clearly see on the one hand that there is some room for improving upon naive statistical decoding based on parity-check equations of weight $Rn/2$, but on the other hand that even with the best improvement upon statistical decoding we might hope for, we will still be above the most naive information set decoding algorithm, namely Prange’s algorithm.

7.3 An improvement close to the lower bound

The goal of this subsection is to present an improvement to the computation of parity-check equations and to give its asymptotic complexity. R. Overbeck in [Ove06, Sec. 4] showed how to compute parity-check equations thanks to Stern’s algorithm. We are going to use this algorithm too. However, whereas Overbeck used many iterations of this algorithm to produce a few parity-check equations of small weight, we observe that this algorithm produces in a natural way during its execution a large number of parity-check equations of relative weight smaller than $R/2$ . We will analyze this process here and show that it yields an algorithm $\mathscr{A}_{w}$ that gives equations in amortized time $\tilde{O}(1)$ .

To find parity-check equations, we described an algorithm which just performs Gaussian elimination and selection of sufficiently sparse rows. In fact, this is the main idea of Prange’s algorithm. As we stressed in the introduction, this algorithm has been improved rather significantly over the years (ISD family). Our idea to improve the search for parity-check equations is to use precisely these improvements. The first significant improvement is due to Stern and Dumer [Ste88, Dum91]. The main idea is to solve a sub-problem with the birthday paradox. We are going to describe this process and show how it improves upon naive statistical decoding.

We begin by choosing a random permutation matrix $P\in\mathbb{F}_{2}^{n\times n}$ and putting the matrix $GP$ into the systematic form:

[TABLE]

1. We solve CSD($G_{2},r,0_{[l]}$).
2. For each solution $e$, we output $e_{s}=(eG_{1}^{T},e)P^{T}$.

Remark 3.

We recall that solving CSD($G_{2},r,0_{[l]}$) means finding $r$ columns of $G_{2}$ whose sum is $0_{[l]}$.

$\cdot$ Soundness: We have

[TABLE]

and therefore $e$ is a parity-check equation of $\mathscr{C}$ .

$\cdot$ Number of solutions: The number of solutions is given by the number of solutions of step 1. Furthermore, the complexity of this algorithm is, up to a polynomial factor, given by the complexity of step 1.

Remark 4.

This algorithm may not provide enough solutions in one step. In this case, we have to put $G$ in another systematic form (i.e. choose another permutation). The randomness of our algorithm comes from this choice of permutation matrix.

$\cdot$ Solutions’ weight: In our model $G$ is supposed to be random. So we can assume the same hypothesis for $G_{2}$ . As the length of its rows is $Rn-l$ , we get asymptotically parity-check equations of weight:

[TABLE]

The first part of this algorithm can be viewed as the first part of ISD algorithms. There is a general presentation of these algorithms in [FS09, Section 3]. All the efforts that have been spent to improve Prange’s ISD can be applied to solve the first point of our algorithm. To solve this point, Dumer suggested putting $G_{2}$ in the following form:

[TABLE]

and to build the lists:

[TABLE]

Then we intersect these two lists with respect to the second coordinate and we keep the associated first coordinate. In other words, we get:

[TABLE]

Remark 5.

This process is called a fusion.

Algorithm 3 summarizes this formally.
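Since Algorithm 3 is not reproduced in this section, the fusion step can be sketched in Python as follows. This is a generic birthday-style merge under assumptions: the two halves of $G_{2}$ are given as lists of columns encoded as $l$-bit integers, each half contributes `half_weight` columns (playing the role of $r/2$), and all names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def xor_columns(columns, subset):
    """Sum over F_2 of the selected columns (each column is an l-bit integer)."""
    total = 0
    for index in subset:
        total ^= columns[index]
    return total

def fusion(left_columns, right_columns, half_weight):
    """Pairs (left subset, right subset) whose selected columns sum to zero over F_2."""
    table = defaultdict(list)
    # First list: store every left subset under its partial sum (hash table construction).
    for subset in combinations(range(len(left_columns)), half_weight):
        table[xor_columns(left_columns, subset)].append(subset)
    matches = []
    # Second list: look up each right partial sum; equal sums cancel over F_2.
    for subset in combinations(range(len(right_columns)), half_weight):
        partial = xor_columns(right_columns, subset)
        for left_subset in table.get(partial, ()):
            matches.append((left_subset, subset))
    return matches

if __name__ == "__main__":
    left = [0b101, 0b011, 0b110, 0b001]    # toy columns with l = 3
    right = [0b111, 0b010, 0b100, 0b110]
    for left_subset, right_subset in fusion(left, right, 2):
        print(left_subset, right_subset)
```

The hash table avoids comparing every pair of subsets, so the cost is the size of one list plus the number of matches, rather than the product of the list sizes.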

As we neglect polynomial factors, the complexity of Algorithm 3 is given by:

[TABLE]

Indeed, we only have to account for the hash table construction (first term) and the construction of ${\mathscr{S}}$. In order to estimate $\#{\mathscr{S}}$ we use the following classical proposition:

Proposition 9.

Let $L_{1},L_{2}\subseteq\{0,1\}^{l}$ be two lists whose entries are assumed to be random and uniformly distributed. Then the expectation of the cardinality of their intersection is given by:

[TABLE]

As we supposed $G_{2}$ random, we can apply this proposition to DumerFusion. Therefore,

Proposition 10 (DumerFusion’s complexity).

DumerFusion’s complexity is given by:

[TABLE]

and it provides on average

[TABLE]

solutions.

In order to study this algorithm asymptotically, we introduce the following notations and relative parameters:

Notation 6.

$\cdot$ $N_{r,l}\mathop{=}\limits^{\triangle}\frac{\binom{(n(1-R)+l)/2}{r/2}^{2}}{2^{l}}$ ;

$\cdot$ $T_{r,l}\mathop{=}\limits^{\triangle}\binom{(n(1-R)+l)/2}{r/2}+\frac{\binom{(n(1-R)+l)/2}{r/2}^{2}}{2^{l}}$ ;

$\cdot$ $\rho=\frac{r}{n}$ ;

$\cdot$ $\lambda=\frac{l}{n}$ .

We may observe that $N_{r,l}$ gives the number of parity-check equations that DumerFusion outputs in one iteration and $T_{r,l}$ is the running time of one iteration. There are many ways of choosing $r$ and $l$. However, in any case (see Subsection 7.2), as the weight of the parity-check equations we get with DumerFusion is $\left(r+\frac{Rn-l}{2}\right)(1+o(1))$, we have to choose $r$ and $l$ such that

[TABLE]

which is equivalent to

[TABLE]

The following lemma gives an asymptotic choice of $\rho$ and $\lambda$ that yields parity-check equations in amortized time $\tilde{O}(1)$:

Lemma 11.

If

[TABLE]

DumerFusion provides parity-check equations of relative weight $\rho+\frac{R-\lambda}{2}$ in amortized time $\tilde{O}(1)$. Moreover, with this constraint we have asymptotically:

[TABLE]

Proof.

We remark that $T_{r,l}=N_{r,l}+\binom{(n(1-R)+l)/2}{r/2}$. Our goal is to find $\rho,\lambda$ such that asymptotically $\frac{T_{r,l}}{N_{r,l}}=\tilde{O}(1)$. The constraint (28) follows from $\binom{u}{v}=\tilde{O}\left(2^{u\cdot H(v/u)}\right)$.

∎
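The quantities of Notation 6 and the amortized cost $T_{r,l}/N_{r,l}$ used in this proof can be evaluated exactly for concrete parameters. The Python sketch below assumes that $k=Rn$ is an integer and that $n-k+l$ and $r$ are even; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, log2

def binary_entropy(x):
    """H(x) = -x log2(x) - (1-x) log2(1-x), with H(0) = H(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1.0 - x) * log2(1.0 - x)

def fusion_counts(n, k, r, l):
    """Return (N_{r,l}, T_{r,l}) of Notation 6, with k standing for Rn."""
    half_length = (n - k + l) // 2            # (n(1-R) + l) / 2
    list_size = comb(half_length, r // 2)     # size of each of the two lists
    outputs = Fraction(list_size ** 2, 2 ** l)
    return outputs, list_size + outputs

def amortized_cost(n, k, r, l):
    """T_{r,l} / N_{r,l}: time spent per parity-check equation in one iteration."""
    outputs, time = fusion_counts(n, k, r, l)
    return time / outputs

if __name__ == "__main__":
    n, k, r, l = 100, 50, 4, 10               # toy parameters
    print(float(amortized_cost(n, k, r, l)))
    # Entropy estimate used in the proof: log2 binom(u, v) is at most u * H(v / u).
    u, v = (n - k + l) // 2, r // 2
    print(log2(comb(u, v)), u * binary_entropy(v / u))
```

The amortized cost equals $1+2^{l}/\binom{(n(1-R)+l)/2}{r/2}$, so it stays bounded exactly when the list size is at least of the order of $2^{l}$.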

We are now able to give the asymptotic complexity of statistical decoding using the DumerFusion strategy.

Theorem 4.

With the constraints (27), (28) and

[TABLE]

for $(\rho,\lambda)$ we have:

[TABLE]

Proof.

Thanks to (28) and (29) we can use Subsection 7.1, and we conclude that under these constraints we have $\pi(\rho+(R-\lambda)/2,\tau)=\pi^{complete}(\rho+(R-\lambda)/2,\tau)$. ∎

Remark 6.

We summarize the meaning of the constraints as:

• With (27) we are sure that there exist enough parity-check equations for statistical decoding to work;

• With (28) DumerFusion gives parity-check equations in amortized time $\tilde{O}(1)$;

• With (29) DumerFusion never provides more equations in one iteration than we need.

In order to get the optimal statistical decoding complexity we minimize $\pi(\rho+(R-\lambda)/2,\tau)$ (with $\pi(\rho+(R-\lambda)/2,\tau)$ given by Theorem 2) under constraints (27), (28) and (29). The exponent of statistical decoding with this strategy is given in Figure 7.

As we see, DumerFusion with our strategy allows statistical decoding to be optimal for rates close to [math]. We can further improve DumerFusion with ideas of [MMT11] and [BJMM12]; however, this comes at the expense of a much more involved analysis and would not allow us to go beyond the lower bound on the complexity of statistical decoding given in the previous subsection. Nevertheless, with the same strategy, these improvements extend the range of rates for which statistical decoding is optimal.

8 Conclusion

In this article we have revisited statistical decoding with a rigorous study of its asymptotic complexity. We have shown that, under Assumptions 1 and 2, this algorithm is, regardless of the strategy we choose for producing the moderate weight parity-check equations it needs, always worse than Prange’s ISD for the hardest instances of decoding (i.e. for a number of errors equal to the Gilbert-Varshamov bound). In this case a very intriguing phenomenon happens: for a large range of parity-check weights, we need all the parity-checks available in the code to be able to decode with this technique. It seems very hard to come up with choices of rate, error weight and length for which statistical decoding might be able to compete with ISD, even if this cannot be totally ruled out by the study we have made here. However, there are clearly more sophisticated techniques which could be used to improve upon statistical decoding. For instance, grouping positions together and using all parity-check equations involving bits in this group could be another interesting generalization of statistical decoding.

Bibliography

[Bar97] Alexander Barg. Complexity issues in coding theory. Electronic Colloquium on Computational Complexity, October 1997.
[BJMM12] Anja Becker, Antoine Joux, Alexander May, and Alexander Meurer. Decoding random binary linear codes in $2^{n/20}$: How $1+1=0$ improves information set decoding. In Advances in Cryptology - EUROCRYPT 2012, Lecture Notes in Comput. Sci. Springer, 2012.
[CTS16] Rodolfo Canto-Torres and Nicolas Sendrier. Analysis of information set decoding for a sub-linear error weight. In Post-Quantum Cryptography 2016, Lecture Notes in Comput. Sci., pages 144–161, Fukuoka, Japan, February 2016.
[Dum91] Ilya Dumer. On minimum distance decoding of linear codes. In Proc. 5th Joint Soviet-Swedish Int. Workshop Inform. Theory, pages 50–52, Moscow, 1991.
[FKI07] Marc P. C. Fossorier, Kazukuni Kobara, and Hideki Imai. Modeling bit flipping decoding based on nonorthogonal check sums with application to iterative decoding attack of McEliece cryptosystem. IEEE Trans. Inform. Theory, 53(1):402–411, 2007.
[FS09] Matthieu Finiasz and Nicolas Sendrier. Security bounds for the design of code-based cryptosystems. In M. Matsui, editor, Advances in Cryptology - ASIACRYPT 2009, volume 5912 of Lecture Notes in Comput. Sci., pages 88–105. Springer, 2009.
[IS98] Mourad E.H. Ismail and Plamen Simeonov. Strong asymptotics for Krawtchouk polynomials. Journal of Computational and Applied Mathematics, pages 121–144, 1998.
[Jab01] Abdulrahman Al Jabri. A statistical decoding algorithm for general linear block codes. In Bahram Honary, editor, Cryptography and coding. Proceedings of the 8th IMA International Conference, volume 2260 of Lecture Notes in Comput. Sci., pages 1–8, Cirencester, UK, December 2001. Springer.
