The Velocity of the Propagating Wave for Spatially Coupled Systems with Applications to LDPC Codes

Rafah El-Khatib; Nicolas Macris

arXiv:1701.04318·cs.IT·January 17, 2017

TL;DR

This paper derives an analytical formula for the velocity of wave-like error profiles in spatially coupled codes, providing insights into decoding dynamics and potential applications to finite size scaling.

Contribution

It introduces a formalism to calculate the wave velocity in spatially coupled systems, applicable to LDPC codes and other scalar systems, enhancing understanding of decoding behavior.

Findings

1. The velocity formula matches numerical evaluations across various noise levels.
2. Application to finite size scaling laws offers new analytical tools.
3. Analysis extends to compressive sensing and generalized LDPC codes.

Tables

Table 1. Normalized velocities for LDPC($x^{4},x^{6}$) on the BEC with spatial length 1024, for several window sizes $w$, and $\epsilon=0.6$. The values in the table can be compared to $v_{\text{BEC}}=0.0333$ and $v_{l}=0.0293$.

	$w = 3$	$w = 5$	$w = 8$	$w = 16$
$v_{e}$	0.0325	0.0335	0.0337	0.0339
$v_{B} / α$	0.0473	0.0410	0.0380	0.0356

Table 2. Normalized velocities for LDPC($x^{3},x^{6}$) on the BEC for spatial length 1024, $w=8$, and several values of $\epsilon$.

	$ϵ = 0.45$	$ϵ = 0.46$	$ϵ = 0.47$	$ϵ = 0.48$
$v_{e}$	0.0667	0.0458	0.0267	0.0117
$v_{BEC}$	0.0660	0.0449	0.0272	0.0115
$v_{l}$	0.0506	0.0373	0.0240	0.0108
$v_{B} / α$	0.0781	0.0541	0.0332	0.0142
$v_{B_{2}} / α$	0.6970	0.5008	0.3068	0.1291
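The agreement between the columns of Table 2 can be quantified with a few lines of arithmetic. The sketch below copies the table values and computes relative deviations from $v_{e}$. Reading $v_{e}$ as the empirically measured velocity and $v_{BEC}$ as the prediction of the analytical formula is an interpretation based on the abstract, and the helper name `rel_dev` is illustrative.

```python
# Values copied from Table 2: LDPC(x^3, x^6) on the BEC, w = 8.
eps = [0.45, 0.46, 0.47, 0.48]
v_e = [0.0667, 0.0458, 0.0267, 0.0117]     # assumed: empirical velocity
v_bec = [0.0660, 0.0449, 0.0272, 0.0115]   # assumed: analytical formula on the BEC
v_l = [0.0506, 0.0373, 0.0240, 0.0108]     # second estimate reported in the table


def rel_dev(pred, ref):
    """Relative deviation |pred - ref| / ref, entry by entry."""
    return [abs(p - r) / r for p, r in zip(pred, ref)]


dev_bec = rel_dev(v_bec, v_e)
dev_l = rel_dev(v_l, v_e)
for e, a, b in zip(eps, dev_bec, dev_l):
    print(f"eps={e:.2f}  v_BEC off by {100 * a:.1f}%   v_l off by {100 * b:.1f}%")
```

For these four noise values $v_{BEC}$ stays within about 2% of $v_{e}$, while $v_{l}$ deviates by roughly 8% to 24%.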

Table 3. Normalized velocities of the waves within the Gaussian approximation on the $(3,6)$- and $(4,8)$-regular code ensembles with $L_{c}=100$, $w=3$.

$2 / σ_{n}^{2}$	2.33	2.35	2.38	2.40
$v_{GA}$ $(3, 6)$	0.0176	0.0205	0.0265	0.0310
$v_{e}$ $(3, 6)$	0.0150	0.0208	0.0300	0.0358
$v_{GA}$ $(4, 8)$	0.0237	0.0258	0.0312	0.0381
$v_{e}$ $(4, 8)$	0.0217	0.0250	0.0308	0.0342

Table 4. Values of $\gamma$ and $\bar{\gamma}$ for $\epsilon_{(\ell,r,L)}-\epsilon=0.04$, $L_{c}=100$, and several values of $\ell$ and $r$.

$\ell$	$r$	$ϵ_{MAP}$	$γ$	$\bar{γ}$
3	6	0.4881	2.155	1.960
4	8	0.4977	2.120	1.779
5	10	0.4994	2.095	1.733
6	12	0.4999	2.075	1.722
4	12	0.3302	2.140	1.778
5	15	0.3325	2.115	1.746
4	6	0.6656	2.100	1.735

Equations

$$H(\mathtt{x})=\int\mathrm{d}\mathtt{x}(\alpha)\,\log_{2}(1+e^{-\alpha}).$$

$$\alpha=\alpha_{1}+\alpha_{2},$$

$$\alpha=2\tanh^{-1}\Big(\tanh\frac{\alpha_{1}}{2}\,\tanh\frac{\alpha_{2}}{2}\Big).$$

$$\begin{cases}(\mathtt{x}_{1}\varoast\mathtt{x}_{2})(E)=\int\mathrm{d}\mathtt{x}_{2}(\alpha)\,\mathtt{x}_{1}(E-\alpha),\\ (\mathtt{x}_{1}\boxast\mathtt{x}_{2})(E)=\int\mathrm{d}\mathtt{x}_{2}(\alpha)\,\mathtt{x}_{1}\Big(2\tanh^{-1}\Big(\frac{\tanh(E/2)}{\tanh(\alpha/2)}\Big)\Big).\end{cases}$$

$$\begin{cases}\Delta_{\infty}\boxast\mathtt{x}=\mathtt{x},\quad\Delta_{0}\boxast\mathtt{x}=\Delta_{0},\\ \Delta_{0}\varoast\mathtt{x}=\mathtt{x},\quad\Delta_{\infty}\varoast\mathtt{x}=\Delta_{\infty}.\end{cases}$$

$$\begin{cases}H(\mathtt{x}\varoast\mathtt{y})+H(\mathtt{x}\boxast\mathtt{y})=H(\mathtt{x})+H(\mathtt{y}),\\ H(\mathtt{x}\varoast\mathtt{a})+H(\mathtt{x}\boxast\mathtt{a})=H(\mathtt{a}),\\ H(\mathtt{a}\varoast\mathtt{b})+H(\mathtt{a}\boxast\mathtt{b})=0,\end{cases}$$

$$\tilde{\mathtt{x}}^{(t+1)}=\mathtt{c}_{\mathtt{h}}\varoast\lambda^{\varoast}(\rho^{\boxast}(\tilde{\mathtt{x}}^{(t)})),$$

$$\mathtt{h}_{\text{BP}}=\{\mathtt{h}\in[0,1]:\tilde{\mathtt{x}}=\mathtt{c}_{\mathtt{h}}\varoast\lambda^{\varoast}(\rho^{\boxast}(\tilde{\mathtt{x}}))\implies\tilde{\mathtt{x}}=\Delta_{\infty}\}.$$

$$\mathtt{h}_{\text{MAP}}=\Big\{\mathtt{h}\in[0,1]:\liminf_{n\to\infty}\frac{1}{n}\mathbb{E}[H(X^{n}\mid Y^{n}(\mathtt{h}))]>0\Big\},$$

$$\lim_{\gamma\to 0}\frac{1}{\gamma}\big(W_{s}(\mathtt{x}+\gamma\mathtt{\eta})-W_{s}(\mathtt{x})\big)=0,$$

$$\tilde{\mathtt{x}}_{z}^{(t+1)}=\mathtt{c}_{z}\varoast\lambda^{\varoast}\Bigg(\frac{1}{w}\sum_{i=0}^{w-1}\rho^{\boxast}\Big(\frac{1}{w}\sum_{j=0}^{w-1}\tilde{\mathtt{x}}_{z+i-j}^{(t)}\Big)\Bigg).$$

$$\mathtt{x}_{z}^{(t+1)}=\frac{1}{w}\sum_{i=0}^{w-1}\mathtt{c}_{z-i}\varoast\lambda^{\varoast}\Bigg(\frac{1}{w}\sum_{j=0}^{w-1}\rho^{\boxast}\Big(\mathtt{x}_{z-i+j}^{(t)}\Big)\Bigg).$$

$$x^{(t+1)}=\epsilon\lambda(1-\rho(1-x^{(t)}))$$

$$x_{z}^{(t+1)}=\frac{1}{w}\sum_{i=0}^{w-1}\epsilon_{z-i}\,\lambda\Big(\frac{1}{w}\sum_{j=0}^{w-1}\big(1-\rho(1-x_{z-i+j}^{(t)})\big)\Big).$$

$$v=\frac{\Delta E}{\int_{\mathbb{R}}\mathrm{d}z\,H\Big(\rho^{\prime\boxast}(\mathtt{X}(z))\boxast\mathtt{X}^{\prime}(z)^{\boxast 2}\Big)},$$

$$\Delta E=W_{s}(\mathtt{x}_{\text{BP}})-W_{s}(\Delta_{\infty}),$$

$$\Delta\mathcal{W}(\mathtt{x})=\int_{\mathbb{R}}\mathrm{d}z\,\Big(P(z,\mathtt{x})-P(z,\mathtt{x}_{0})\Big),$$

$$P(z,\mathtt{x})=\frac{1}{R^{\prime}(1)}H(R^{\boxast}(\mathtt{x}(z,t)))$$

$$-\frac{1}{L^{\prime}(1)}H\Big(\mathtt{c}(z)\varoast L^{\varoast}\Big(\int_{0}^{1}\mathrm{d}s\,\rho^{\boxast}(\mathtt{x}(z+s,t))\Big)\Big).$$

$$\frac{\delta\Delta\mathcal{W}}{\delta\mathtt{x}}[\mathtt{\eta}(z,t)]=\frac{\mathrm{d}}{\mathrm{d}\gamma}\Delta\mathcal{W}(\mathtt{x}+\gamma\mathtt{\eta})\Big|_{\gamma=0},$$

$$\Delta\mathcal{W}(\mathtt{x})=\mathcal{W}_{s}(\mathtt{x})+\mathcal{W}_{i}(\mathtt{x}),$$

W (\underline{x}) = …

\frac{\delta\Delta\mathcal{W}}{\delta\mathtt{x}}[\eta(z,t)]=\int_{\mathbb{R}}\mathrm{d}z\,H\Bigg(\Big(\int_{0}^{1}\mathrm{d}u\, …

\int_{\mathbb{R}}\mathrm{d}z\,H\Big((\mathtt{x}(z,t+1) …

Full text

The Velocity of the Propagating Wave for Spatially Coupled Systems with Applications to LDPC Codes

Rafah El-Khatib and Nicolas Macris

R. El-Khatib and N. Macris are with the School of Computer and Communication Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland (e-mails: {rafah.el-khatib,nicolas.macris}@epfl.ch).

Abstract

We consider the dynamics of message passing for spatially coupled codes and, in particular, the set of density evolution equations that tracks the profile of decoding errors along the spatial direction of coupling. It is known that, for suitable boundary conditions and after a transient phase, the error profile exhibits a “solitonic behavior”. Namely, a uniquely-shaped wavelike solution develops, that propagates with constant velocity. Under this assumption we derive an analytical formula for the velocity in the framework of a continuum limit of the spatially coupled system. The general formalism is developed for spatially coupled low-density parity-check codes on general binary memoryless symmetric channels which form the main system of interest in this work. We apply the formula for special channels and illustrate that it matches the direct numerical evaluation of the velocity for a wide range of noise values. A possible application of the velocity formula to the evaluation of finite size scaling law parameters is also discussed. We conduct a similar analysis for general scalar systems and illustrate the findings with applications to compressive sensing and generalized low-density parity-check codes on the binary erasure or binary symmetric channels.

Index Terms:

Message passing, density evolution, potential functional, threshold saturation, soliton, wave propagation, compressive sensing.

I Introduction

Spatial coupling was first introduced by Felstrom and Zigangirov in the context of low-density parity-check (LDPC) codes [1]. Spatially coupled codes have been shown to be capacity-achieving on binary memoryless symmetric (BMS) channels under belief propagation (BP) decoding. The capacity-achieving property is due to the “threshold saturation” of the BP threshold of the coupled system towards the maximum a-posteriori (MAP) threshold of the uncoupled code ensemble [2], [3]. Spatial coupling has also been applied to several other problems besides channel coding [4], [5], such as lossy source compression [6], [7], compressive sensing [8], [9], [10], random constraint satisfaction problems [11], [12], [13], and a coupled Curie-Weiss (toy) model [14], [15].

Consider a single (uncoupled) LDPC code. To construct the corresponding coupled code of spatial length $L_{c}+w$, we take $L_{c}+w$ replicas of the single system and “couple” every $w$ adjacent single systems by means of a uniform window function. At every iteration of the BP algorithm, the variable and check nodes of the coupled graph exchange messages which are described by a set of coupled density evolution (DE) iterative equations. The solution to the DE equations, called the decoding profile $\mathtt{x}$, is a vector of “error distributions” (more precisely, distributions of the BP log-likelihood estimates) along the spatial axis of coupling. More specifically, let the integer $z\in\{-w+1,\dots,L_{c}\}$ denote the position along the spatial direction of the graph construction (on which the replicas are spread). Then the $z^{\text{th}}$ component of $\mathtt{x}$, call it $\mathtt{x}_{z}$, denotes the distribution of the BP log-likelihood estimate at the $z^{\text{th}}$ position. In the special case of the binary erasure channel (BEC) this component reduces to the usual scalar erasure probability $0\leq x_{z}\leq 1$ at position $z$ along the spatial axis of coupling.

Spatially coupled codes perform well, and are capacity achieving, due to the threshold saturation phenomenon that is proved for general BMS channels in [2], [3]. More specifically, as long as the channel noise is below the MAP threshold, the decoding profile of a spatially coupled code converges to the all-$\Delta_{\infty}$ vector after enough iterations of the BP algorithm, where $\Delta_{\infty}$ is the Dirac mass at infinite log-likelihood (i.e. perfect knowledge of the bits). In the special case of the BEC, the all-$\Delta_{\infty}$ vector corresponds to a vector of scalar erasure probabilities driven to zero by DE iterations. On the other hand, the distribution of the bit log-likelihoods of the corresponding uncoupled code converges to $\Delta_{\infty}$ only when the channel noise is below the BP threshold (which is lower than the MAP threshold).

The threshold saturation phenomenon is made possible by “seeding” at the boundaries of the spatially coupled code. Seeding means that we fix the bits at the boundaries so that the probability distributions of their log-likelihoods are $\Delta_{\infty}$. This facilitates BP decoding near the boundaries, and this effect is propagated along the rest of the coupled chain. The minimum size of the seed that guarantees the propagation of the decoding effect is of the same order as the size $w$ of the coupling window; however, an exact determination of the minimum possible size is still an interesting open question.

When the channel noise is between the BP and the MAP thresholds, and the underlying uncoupled ensemble has a unique non-trivial stable BP fixed point that blocks decoding, an interesting phenomenon that has been empirically observed is the appearance of a solitonic decoding wave after a certain number of transient iterations of the BP algorithm. This soliton is characterized by a fixed shape that seems independent of the initial condition and has a constant traveling velocity that we henceforth call $v$. This phenomenology is discussed in more detail in Section II-C. Figures 2 and 3 in Section II-C show an example of the transient phase and of the soliton in the case of spatially coupled codes on the BEC. The main goal of this work is to derive a formula for the velocity of the soliton for general BMS channels.

The decoding wave has recently been studied in the context of coding when transmission takes place over the BEC. In [16], it is proved that the solitonic wave solution exists and bounds on the velocity of the soliton are derived. However, the independence of the unique shape of the wave from the initial conditions remains an open question. In [17], more complex coupled systems are studied, where it is possible to have more than one non-trivial stable BP fixed point, and there again some bounds on the velocity of the soliton are provided. The solitonic behavior has also been studied for the coupled Curie-Weiss toy model [14] and in [15] a formula for the velocity, as well as an approximation, are derived and tested numerically.

In the first part of this work, we derive a general formula for the velocity of the wave in the asymptotic limit $L_{c}\gg w\gg 1$ in the context of coding when transmission takes place over general BMS channels (see Eq. (15)). This limit enables us to formulate the problem in a “continuum limit” which makes the derivations quite tractable. We show, with the use of numerical simulations, that this continuum limit yields good approximations for the velocity of the original discrete system. For simplicity, we limit ourselves to the case where the underlying uncoupled LDPC code has only one non-trivial stable BP fixed point.

Our derivation rests on the assumption that the soliton indeed appears. More precisely, we assume that after an initial transient phase, the decoding profile develops a unique shape, independent of the initial condition, and travels with a constant velocity $v$ . This assumption can be strictly true only in an asymptotic limit of a very large chain length and a large iteration number (or time). It is an interesting open problem to make this space-time asymptotic limit precise and rigorously prove that the soliton appears and is independent of the initial condition. We conjecture that our velocity formula is exact in such a limit.

The formula for the velocity of the wave greatly simplifies when we consider transmission over the BEC, because the decoding profile reduces to a scalar vector of erasure probabilities. For transmission over general BMS channels, we also simplify the analysis by applying the Gaussian approximation [18], [19]. This consists of approximating the DE densities and the channel by suitable “symmetric” Gaussian densities. Since the mean $m$ and the variance $\sigma^{2}$ of these symmetric Gaussian densities are related by $\sigma^{2}=2m$ , the analysis then reduces to that of a one-dimensional scalar system, whose technical difficulty is similar to that of the case of transmission over the BEC. We thus obtain a more tractable velocity formula and compare the numerical predictions of these velocity formulas with the empirical value of the velocity for finite values of $L_{c}$ and $w$ . Good agreement is found, on practically the whole range of values within $[\epsilon_{\text{\tiny BP}},\epsilon_{\text{\tiny MAP}}]$ , even for small values of the window size $w$ .

It is of theoretical as well as practical interest to have a hold on the analytical expression of the velocity of the wave. The velocity is also related to other fundamental quantities that describe a coding system, such as the finite-size scaling law that predicts the error probability of finite-length spatially coupled codes. In [20], the scaling law for a finite-length spatially coupled $(\ell,r,L_{c})$ code, when transmission takes place over the BEC, is derived. Involved in this scaling law are parameters that can be estimated using the value of the velocity of the decoding wave. Using values of the velocity computed in our work, we provide reasonably good estimates of these parameters.

In the second part of this work, we consider general spatially coupled scalar bipartite systems (that are not restricted to coding) governed by a general message passing algorithm. In this setting, the system is scalar (one-dimensional) since the messages exchanged between the nodes are scalar. Due to seeding at the boundary, the “profile” (we no longer call it the “decoding profile”) exhibits the same phenomenology as in coding. Namely, a solitonic behavior appears after a short transient phase. We derive a formula for the velocity of the soliton for such systems and illustrate it on two applications: compressive sensing and generalized LDPC (GLDPC) codes.

The derivations of the velocity formulas in both parts of the work use the same tools and assumptions. We combine the use of the “potential functional” introduced and used in a series of works [3], [11], [14], [21], [22], as well as the continuum limit $L_{c}\gg w\gg 1$ which makes the derivations analytically tractable. The potential is a “variational formulation” of the message passing algorithm on coding systems. It is a functional whose stationary points are the fixed points of the density evolution equations described by this algorithm, and has been used to prove threshold saturation in [3] for general BMS channels. A significant part of the formalism in [3] is used in the present work. We also note that potential formulations have been used to characterize the fixed point(s) of general scalar systems at the MAP threshold using displacement convexity in [23], [24]. An extension of the ideas in these works could shed some light on the question of the independence of the soliton’s shape from the initial conditions.

Section II introduces a few preliminary notions that we will need and reviews the phenomenology of the solitonic wave. In Section III, we formulate the continuum limit and state our main formula for the velocity of the soliton on general BMS channels; the derivation is presented in Section IV. Comparisons with numerical experiments are presented in Section V. These concern transmission over the BEC as well as general BMS channels in the so-called Gaussian approximation. We also discuss a possible application of our formula to scaling laws for finite-size ensembles in Section VI. The case of general scalar spatially coupled systems is treated in Section VII, and illustrated for the examples of generalized LDPC codes (on the BEC or the BSC) and compressive sensing. We present concluding remarks and propose further directions in Section VIII.

A summary of this work has appeared in [25], [26].

II Preliminaries

We consider (almost) the same setting as in [3] and adopt most of the notation introduced in that work. For more information about the formalism in these preliminaries, one can consult [27].

We denote by $\mathcal{M}(\bar{\mathbb{R}})$ the space of probability measures $\mathtt{x}$ on the extended real numbers $\alpha\in\bar{\mathbb{R}}=\mathbb{R}\cup\{\infty\}$. Here $\alpha\in\bar{\mathbb{R}}$ should be interpreted as a “log-likelihood variable”. We call the measure $\mathtt{x}$ symmetric if $\int_{E}\mathrm{d}\mathtt{x}(\alpha)=\int_{-E}e^{-\alpha}\mathrm{d}\mathtt{x}(\alpha)$ for all measurable sets $E\subset\bar{\mathbb{R}}$.
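For a measure with a density, the symmetry condition is equivalent to $\mathtt{x}(-\alpha)=e^{-\alpha}\mathtt{x}(\alpha)$ pointwise. The following sketch checks this numerically for the Gaussian densities with variance $\sigma^{2}=2m$ used in the Gaussian approximation mentioned in the introduction, and shows that a Gaussian with a different variance fails the test. The function names, the grid, and the value $m=1.5$ are illustrative choices.

```python
import numpy as np


def gaussian_pdf(alpha, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return np.exp(-(alpha - mean) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def symmetry_defect(pdf, alphas):
    """max |x(-a) - exp(-a) x(a)| over the grid; zero for a symmetric density."""
    return float(np.max(np.abs(pdf(-alphas) - np.exp(-alphas) * pdf(alphas))))


alphas = np.linspace(-10.0, 10.0, 2001)
m = 1.5  # illustrative mean

# Variance 2m: the "symmetric" Gaussian family.
defect_symmetric = symmetry_defect(lambda a: gaussian_pdf(a, m, 2.0 * m), alphas)
# Variance m: same mean, but no longer symmetric in the above sense.
defect_other = symmetry_defect(lambda a: gaussian_pdf(a, m, 1.0 * m), alphas)

print(defect_symmetric, defect_other)
```

The first defect vanishes up to rounding because $(\alpha+m)^{2}-(\alpha-m)^{2}=4\alpha m$, so the ratio of the two Gaussian factors is exactly $e^{-\alpha}$ when $\sigma^{2}=2m$.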

We define an entropy functional $H:\mathcal{M}(\bar{\mathbb{R}})\rightarrow\mathbb{R}$ that maps a finite probability measure from $\mathcal{M}(\bar{\mathbb{R}})$ to a real number. It is defined as

$$H(\mathtt{x})=\int\mathrm{d}\mathtt{x}(\alpha)\,\log_{2}(1+e^{-\alpha}).$$

Note that this is a linear functional. Linearity is used in an important way to compute the entropy of convex combinations of measures (which also yield probability measures). But we will also compute the “entropy” associated to differences of measures by setting $H(\mathtt{x}_{1}-\mathtt{x}_{2})\equiv H(\mathtt{x}_{1})-H(\mathtt{x}_{2})$ . In other words, the entropy functional is extended in an obvious way to the space of signed measures.

In the remainder of this work, we will use the Dirac masses $\Delta_{0}(\alpha)$ and $\Delta_{\infty}(\alpha)$ at zero and infinite log-likelihood, that have entropies $H(\Delta_{0})=1$ and $H(\Delta_{\infty})=0$ , respectively.
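For a measure supported on finitely many log-likelihood values, the entropy functional is a finite sum, which makes the values $H(\Delta_{0})=1$ and $H(\Delta_{\infty})=0$ easy to check. The sketch below is only an illustration; representing a measure by arrays of atoms and weights, and the function name `entropy`, are assumptions of this example.

```python
import numpy as np


def entropy(atoms, weights):
    """H(x) = sum_i w_i * log2(1 + exp(-alpha_i)) for a discrete (possibly signed) measure."""
    atoms = np.asarray(atoms, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # logaddexp(0, -a) = ln(1 + exp(-a)); it is stable for large |a| and exact for a = inf.
    return float(np.sum(weights * np.logaddexp(0.0, -atoms)) / np.log(2.0))


h_delta_0 = entropy([0.0], [1.0])        # Dirac mass at zero log-likelihood
h_delta_inf = entropy([np.inf], [1.0])   # Dirac mass at infinite log-likelihood

# By linearity, eps * Delta_0 + (1 - eps) * Delta_inf has entropy eps.
eps = 0.3
h_mixture = entropy([0.0, np.inf], [eps, 1.0 - eps])

print(h_delta_0, h_delta_inf, h_mixture)
```

Because the weights are allowed to be negative, the same function also evaluates the extension of $H$ to differences of measures.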

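We will also use the standard variable-node and check-node convolution operators $\varoast$ and $\boxast$ for log-likelihood ratio message distributions involved in DE equations [27]. For $\mathtt{x}_{1}$, $\mathtt{x}_{2}\in\mathcal{M}(\bar{\mathbb{R}})$, let $\alpha_{1}$ and $\alpha_{2}$ be independent log-likelihood variables distributed according to $\mathtt{x}_{1}$ and $\mathtt{x}_{2}$. The usual convolution $\mathtt{x}_{1}\varoast\mathtt{x}_{2}$ is the density of

$$\alpha=\alpha_{1}+\alpha_{2},$$

and $\mathtt{x}_{1}\boxast\mathtt{x}_{2}$ is the density of

$$\alpha=2\tanh^{-1}\Big(\tanh\frac{\alpha_{1}}{2}\,\tanh\frac{\alpha_{2}}{2}\Big).$$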

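More formally, for any measurable set $E\subset\bar{\mathbb{R}}$, the operators are defined by

$$\begin{cases}(\mathtt{x}_{1}\varoast\mathtt{x}_{2})(E)=\int\mathrm{d}\mathtt{x}_{2}(\alpha)\,\mathtt{x}_{1}(E-\alpha),\\ (\mathtt{x}_{1}\boxast\mathtt{x}_{2})(E)=\int\mathrm{d}\mathtt{x}_{2}(\alpha)\,\mathtt{x}_{1}\Big(2\tanh^{-1}\Big(\frac{\tanh(E/2)}{\tanh(\alpha/2)}\Big)\Big).\end{cases}$$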

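We note that the identities of the $\varoast$ and $\boxast$ operators are $\Delta_{0}$ and $\Delta_{\infty}$, respectively, and their annihilators are $\Delta_{\infty}$ and $\Delta_{0}$, respectively. More explicitly,

$$\begin{cases}\Delta_{\infty}\boxast\mathtt{x}=\mathtt{x},\quad\Delta_{0}\boxast\mathtt{x}=\Delta_{0},\\ \Delta_{0}\varoast\mathtt{x}=\mathtt{x},\quad\Delta_{\infty}\varoast\mathtt{x}=\Delta_{\infty}.\end{cases}$$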
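At the level of individual log-likelihood values, the two operations are the sum rule at variable nodes and the hyperbolic-tangent rule at check nodes. The sketch below implements both for a pair of values and checks the pointwise counterparts of the four relations $\Delta_{\infty}\boxast\mathtt{x}=\mathtt{x}$, $\Delta_{0}\boxast\mathtt{x}=\Delta_{0}$, $\Delta_{0}\varoast\mathtt{x}=\mathtt{x}$, and $\Delta_{\infty}\varoast\mathtt{x}=\Delta_{\infty}$. The function names and the explicit handling of infinite inputs are choices made for this illustration.

```python
import math


def var_node(a1, a2):
    """Variable-node rule: log-likelihoods of independent observations add."""
    return a1 + a2


def check_node(a1, a2):
    """Check-node rule: 2 * atanh(tanh(a1/2) * tanh(a2/2))."""
    t = math.tanh(a1 / 2.0) * math.tanh(a2 / 2.0)
    if abs(t) >= 1.0:  # both inputs infinite (or numerically saturated)
        return math.copysign(math.inf, t)
    return 2.0 * math.atanh(t)


a = 1.7  # an arbitrary finite log-likelihood value

print(check_node(math.inf, a))  # perfect knowledge on one edge leaves a unchanged
print(check_node(0.0, a))       # an erased edge erases the check-node output
print(var_node(0.0, a))         # an erased observation adds nothing
print(var_node(math.inf, a))    # perfect knowledge dominates the sum
```

Since `math.tanh` rounds to exactly 1.0 for large finite arguments, very reliable finite inputs are treated like infinite ones by this check-node rule, which is a limitation of the sketch rather than of the operators.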

Each operation, taken separately, is associative, commutative, and linear. However, when they are taken together, there is no distributive law; also, they don’t associate in the sense that $\mathtt{x}_{1}\varoast(\mathtt{x}_{2}\boxast\mathtt{x}_{3})\neq(\mathtt{x}_{1}\varoast\mathtt{x}_{2})\boxast\mathtt{x}_{3}$ and $\mathtt{x}_{1}\boxast(\mathtt{x}_{2}\varoast\mathtt{x}_{3})\neq(\mathtt{x}_{1}\boxast\mathtt{x}_{2})\varoast\mathtt{x}_{3}$ . We will also use the so-called duality rules

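$$\begin{cases}H(\mathtt{x}\varoast\mathtt{y})+H(\mathtt{x}\boxast\mathtt{y})=H(\mathtt{x})+H(\mathtt{y}),\\ H(\mathtt{x}\varoast\mathtt{a})+H(\mathtt{x}\boxast\mathtt{a})=H(\mathtt{a}),\\ H(\mathtt{a}\varoast\mathtt{b})+H(\mathtt{a}\boxast\mathtt{b})=0,\end{cases}$$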

where $\mathtt{x},\mathtt{y}\in M(\bar{\mathbb{R}})$ and $\mathtt{a}$ , $\mathtt{b}$ are differences of probability measures $\mathtt{a}=\mathtt{x}_{1}-\mathtt{x}_{2}$ , $\mathtt{b}=\mathtt{x}_{3}-\mathtt{x}_{4}$ , $\mathtt{x}_{i}\in M(\bar{\mathbb{R}})$ , $i=1,2,3,4$ .
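The three duality rules can be checked numerically on discrete symmetric measures such as those of the BEC and the BSC. The sketch below represents a measure as a list of (atom, weight) pairs, where negative weights encode differences of measures. This representation, and the helper names `combine`, `bec`, `bsc`, `minus` and `duality_lhs`, are assumptions of this example.

```python
import math
import operator
from itertools import product


def check_node(a1, a2):
    """Check-node rule: 2 * atanh(tanh(a1/2) * tanh(a2/2))."""
    t = math.tanh(a1 / 2.0) * math.tanh(a2 / 2.0)
    if abs(t) >= 1.0:
        return math.copysign(math.inf, t)
    return 2.0 * math.atanh(t)


def combine(x, y, op):
    """Measure of op(a1, a2) for independent a1 ~ x and a2 ~ y."""
    return [(op(a1, a2), p1 * p2) for (a1, p1), (a2, p2) in product(x, y)]


def entropy(x):
    """H(x) = sum of weight * log2(1 + exp(-atom))."""
    return sum(p * math.log2(1.0 + math.exp(-a)) for a, p in x)


def bec(e):
    """BEC(e) in the log-likelihood domain: e * Delta_0 + (1 - e) * Delta_inf."""
    return [(0.0, e), (math.inf, 1.0 - e)]


def bsc(p):
    """BSC(p) in the log-likelihood domain: atoms at +/- log((1 - p) / p)."""
    llr = math.log((1.0 - p) / p)
    return [(llr, 1.0 - p), (-llr, p)]


def minus(x1, x2):
    """Signed measure x1 - x2."""
    return x1 + [(a, -p) for a, p in x2]


def duality_lhs(u, v):
    """H(u (*) v) + H(u [*] v), with (*) the sum rule and [*] the tanh rule."""
    return entropy(combine(u, v, operator.add)) + entropy(combine(u, v, check_node))


x, y = bsc(0.1), bec(0.4)
a = minus(bsc(0.05), bec(0.3))
b = minus(bec(0.2), bsc(0.15))

gap_xy = duality_lhs(x, y) - (entropy(x) + entropy(y))
gap_xa = duality_lhs(x, a) - entropy(a)
gap_ab = duality_lhs(a, b)
print(gap_xy, gap_xa, gap_ab)
```

All three gaps vanish up to rounding. The second and third rules follow from the first by bilinearity of both operations, which is why signed weights are enough to test them.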

II-A Single System

Consider an (uncoupled) LDPC($\lambda,\rho$) code ensemble and transmission over a BMS channel. Here, $\lambda(y)=\sum_{l}\lambda_{l}y^{l-1}$ and $\rho(y)=\sum_{r}\rho_{r}y^{r-1}$ are the usual edge-perspective variable-node and check-node degree distributions, respectively. The node-perspective degree distributions $L$ and $R$ are defined by $L^{\prime}(y)=L^{\prime}(1)\lambda(y)$ and $R^{\prime}(y)=R^{\prime}(1)\rho(y)$, respectively. Moreover, consider communication over a family of BMS channels whose distribution $\mathtt{c}_{\mathtt{h}}(\alpha)$ in the log-likelihood domain is parametrized by the channel entropy $H(\mathtt{c}_{\mathtt{h}})=\mathtt{h}$ (in the literature, this distribution is often denoted by $\mathtt{c}(\mathtt{h})$).
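The relation $L^{\prime}(y)=L^{\prime}(1)\lambda(y)$, together with $L(0)=0$ and $L(1)=1$, determines $L$ from $\lambda$ by integration and normalization, and likewise $R$ from $\rho$. A minimal sketch for the $(3,6)$-regular ensemble follows; the helper name `node_perspective` and the use of `numpy.polynomial.Polynomial` are choices of this example.

```python
import numpy as np
from numpy.polynomial import Polynomial


def node_perspective(edge_poly):
    """Return (L, L'(1)) such that L'(y) = L'(1) * lambda(y), L(0) = 0 and L(1) = 1."""
    integral = edge_poly.integ()   # y -> int_0^y lambda(s) ds
    norm = integral(1.0)           # equals 1 / L'(1)
    return Polynomial(integral.coef / norm), 1.0 / norm


# (3,6)-regular ensemble: lambda(y) = y^2 and rho(y) = y^5 (coefficients in increasing degree).
lam = Polynomial([0.0, 0.0, 1.0])
rho = Polynomial([0.0, 0.0, 0.0, 0.0, 0.0, 1.0])

L, dL1 = node_perspective(lam)   # expect L(y) = y^3 and L'(1) = 3
R, dR1 = node_perspective(rho)   # expect R(y) = y^6 and R'(1) = 6

design_rate = 1.0 - dL1 / dR1    # 1 - (average variable degree) / (average check degree)
print(L.coef, dL1, R.coef, dR1, design_rate)
```

For regular ensembles this reproduces $L(y)=y^{\ell}$ and $R(y)=y^{r}$; for irregular ensembles the same function applies to any coefficient vector of $\lambda$ or $\rho$.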

Let $\tilde{\mathtt{x}}^{(t)}$ denote the variable-node output distribution of the BP algorithm at iteration $t\in\mathbb{N}$ . We can track the average behavior of the BP decoder by means of the DE iterative equations that are written as a recursion in terms of the variable-node output distribution as follows

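$$\tilde{\mathtt{x}}^{(t+1)}=\mathtt{c}_{\mathtt{h}}\varoast\lambda^{\varoast}(\rho^{\boxast}(\tilde{\mathtt{x}}^{(t)})),\tag{5}$$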

with initial condition $\tilde{\mathtt{x}}^{(0)}=\Delta_{0}$ (equivalently, we can take the perhaps more natural initial condition $\tilde{\mathtt{x}}^{(0)}=\mathtt{c}_{\mathtt{h}}$ ).

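There are two thresholds of interest for us. The first one is the algorithmic threshold; it is defined for a family of BMS channels whose channel distributions $\mathtt{c}_{\mathtt{h}}(\alpha):\mathbb{R}\rightarrow\mathcal{M}(\bar{\mathbb{R}})$ are ordered by degradation and parametrized by their entropy $H(\mathtt{c}_{\mathtt{h}})=\mathtt{h}$. It is also called the BP threshold of the family and is defined as

$$\mathtt{h}_{\text{BP}}=\sup\{\mathtt{h}\in[0,1]:\tilde{\mathtt{x}}=\mathtt{c}_{\mathtt{h}}\varoast\lambda^{\varoast}(\rho^{\boxast}(\tilde{\mathtt{x}}))\implies\tilde{\mathtt{x}}=\Delta_{\infty}\}.$$

The second threshold corresponds to optimal (MAP) decoding

$$\mathtt{h}_{\text{MAP}}=\inf\Big\{\mathtt{h}\in[0,1]:\liminf_{n\to\infty}\frac{1}{n}\mathbb{E}[H(X^{n}\mid Y^{n}(\mathtt{h}))]>0\Big\},$$

where $H(X^{n}|Y^{n}(\mathtt{h}))$ is the conditional Shannon entropy of the input given the channel observations, and $\mathbb{E}$ is the expectation over the code ensemble.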
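On the BEC, where DE reduces to the scalar recursion $x^{(t+1)}=\epsilon\lambda(1-\rho(1-x^{(t)}))$ discussed in Section II-C, the BP threshold can be estimated by bisection on $\epsilon$: DE either reaches $x=0$ or gets stuck at a non-trivial fixed point. The sketch below does this for regular ensembles. The function names, tolerances, iteration caps, and the decision rule "decoded if the limit is below $10^{-9}$" are assumptions of this illustration.

```python
def de_fixed_point(eps, l=3, r=6, max_iter=50_000, tol=1e-13):
    """Iterate x <- eps * (1 - (1 - x)^(r-1))^(l-1) from x = 1 until it stops moving."""
    x = 1.0
    for _ in range(max_iter):
        x_new = eps * (1.0 - (1.0 - x) ** (r - 1)) ** (l - 1)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


def bp_threshold(l=3, r=6, decoded_below=1e-9, steps=20):
    """Bisection for the largest eps at which DE is driven to x = 0."""
    lo, hi = 0.0, 1.0  # DE succeeds at lo and fails at hi
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if de_fixed_point(mid, l, r) < decoded_below:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


eps_bp = bp_threshold()
print(f"estimated BP threshold of the (3,6)-regular ensemble: {eps_bp:.4f}")
```

The estimate can be compared with the value $\epsilon_{\text{BP}}=0.4294$ quoted in Section II-C for the $(3,6)$-regular ensemble. Very close to the threshold DE slows down considerably, which is why the iteration cap is generous.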

The potential functional $W_{s}(\mathtt{x})$ , $\mathtt{x}\in\mathcal{M}(\bar{\mathbb{R}})$ , of the “single” or uncoupled system is

[TABLE]

The fixed point form of the DE equation (5) is obtained by setting to zero the functional derivative of $W_{s}(\mathtt{x};\mathtt{c})$ with respect to $\mathtt{x}$ . In other words, $\mathtt{x}=\mathtt{c}_{\mathtt{h}}\varoast\lambda^{\varoast}(\rho^{\boxast}(\mathtt{x}))$ is equivalent to

$$\lim_{\gamma\to 0}\frac{1}{\gamma}\big(W_{s}(\mathtt{x}+\gamma\mathtt{\eta})-W_{s}(\mathtt{x})\big)=0,$$

where $\mathtt{\eta}$ is a difference of two probability measures (see [3] for the proof of this statement). The BP and MAP thresholds, $\mathtt{h}_{\text{\tiny BP}}$ and $\mathtt{h}_{\text{\tiny MAP}}$ , respectively, can be obtained from the analysis of the stationary points of the potential function. See [3], [28] for more details and a rigorous discussion of this issue.

Remark about notation. In the remainder of this work, most of the time, we omit the subscript $\mathtt{h}$ from $\mathtt{c}_{\mathtt{h}}$ and the argument $\alpha$ from $\mathtt{x}(\alpha)$ . This is because we will need a subscript (resp. an argument) $z$ that represents the position along the chain in the discrete (resp. continuous) case.

II-B Spatially Coupled System

For standard LDPC codes, the BP threshold $\mathtt{h}_{\text{\tiny BP}}$ is, in general, lower than the MAP threshold $\mathtt{h}_{\text{\tiny MAP}}$. The definitions of the BP and MAP thresholds above extend to the spatially coupled setting. Spatial coupling exhibits two attractive properties. First, the MAP threshold is conserved under coupling in the limit $L_{c}\to+\infty$ and for all $w$. The proof of this statement is found in [28] (see also [29], [30]). Second, the BP threshold of the coupled system saturates to its MAP threshold as proved in [2], [3]. The main consequence of threshold saturation is that one can decode perfectly up to $\mathtt{h}_{\text{\tiny MAP}}$.

Let us now describe the density evolution and potential functional formalism for the spatially coupled code ensemble. Consider $L_{c}+w$ “replicas” of the single system described in Section II-A, placed on the spatial coordinates $z\in\{-w+1,\dots,L_{c}\}$ . The system at position $z$ is coupled to $w$ neighboring systems by means of a coupling window. For simplicity, we consider a uniform coupling window. We denote by $\tilde{\mathtt{x}}_{z}^{(t)}$ the variable-node output distribution at position $z\in\{-w+1,\dots,L_{c}\}$ on the spatial axis and at time $t\in\mathbb{N}$ . The DE equation of the coupled system takes the form

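$$\tilde{\mathtt{x}}_{z}^{(t+1)}=\mathtt{c}_{z}\varoast\lambda^{\varoast}\Bigg(\frac{1}{w}\sum_{i=0}^{w-1}\rho^{\boxast}\Big(\frac{1}{w}\sum_{j=0}^{w-1}\tilde{\mathtt{x}}_{z+i-j}^{(t)}\Big)\Bigg).\tag{8}$$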

In this equation, $\mathtt{c}_{z}=\mathtt{c}$ for $z\in\{1,\dots,L_{c}\}$ and $\mathtt{c}_{z}=\Delta_{\infty}$ for $z\in\{-w+1,\dots,0\}$. Furthermore, we fix the left boundary to $\mathtt{x}_{z}^{(t)}=\Delta_{\infty}$ for $z\in\{-w+1,\dots,0\}$, for all $t\in\mathbb{N}$. These conditions express perfect information at the left boundary, which is what enables seeding and the propagation of the decoding wave along the chain of coupled codes. The initial condition for (8) is $\mathtt{x}_{z}^{(0)}=\Delta_{0}$ for $z\in\{1,\dots,L_{c}+w-1\}$.

It will be convenient to work with a smoothed version of the profile $\tilde{\mathtt{x}}_{z}^{(t)}$ , namely $\mathtt{x}_{z}^{(t)}=\frac{1}{w}\sum\limits_{i=0}^{w-1}\tilde{\mathtt{x}}_{z-i}^{(t)}$ , which is the check-node input distribution. Then, using this change of variables, (8) can be rewritten as

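$$\mathtt{x}_{z}^{(t+1)}=\frac{1}{w}\sum_{i=0}^{w-1}\mathtt{c}_{z-i}\varoast\lambda^{\varoast}\Bigg(\frac{1}{w}\sum_{j=0}^{w-1}\rho^{\boxast}\Big(\mathtt{x}_{z-i+j}^{(t)}\Big)\Bigg).\tag{9}$$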

Just as in the single system case, this DE equation can be expressed as the stationarity condition of a potential functional (see [3])

[TABLE]

where $\underline{\mathtt{x}}=(\mathtt{x}_{-w+1},\dots,\mathtt{x}_{L+w-1})$ . The fixed point form of (9) is equivalent to $\lim_{\gamma\to 0}\gamma^{-1}(W(\underline{\mathtt{x}}+\gamma\underline{\mathtt{\eta}})-W(\underline{\mathtt{x}}))=0$ for $\underline{\mathtt{\eta}}=(\mathtt{\eta}_{-w+1},\dots,\mathtt{\eta}_{L+w-1})$ where $\mathtt{\eta}_{i}$ are differences of probability measures.

II-C Phenomenological observations

Our derivation is far from rigorous and is based on an assumption derived from a phenomenological picture observed in simulations. We summarize the main observations in this section for the case of transmission over the BEC. This channel also gives us the opportunity to illustrate the formalism outlined in Sections II-A and II-B in a concrete case.

The BEC has channel distribution $\mathtt{c}_{\epsilon}(\alpha)=\epsilon\Delta_{0}+(1-\epsilon)\Delta_{\infty}$ , where $\epsilon$ is the erasure probability, and $H(\mathtt{c}_{\epsilon})=\epsilon$ (hence $\mathtt{h}=\epsilon$ ). The density of the BP estimates of log-likelihood variables can be parametrized as $\mathtt{x}^{(t)}(\alpha)=x^{(t)}\Delta_{0}(\alpha)+(1-x^{(t)})\Delta_{\infty}(\alpha)$ , where $x^{(t)}\in[0,1]$ is interpreted as the erasure probability at iteration $t\in\mathbb{N}$ . The DE equation becomes a one-dimensional iterative map

$$x^{(t+1)}=\epsilon\lambda(1-\rho(1-x^{(t)}))$$

over scalars in $[0,1]$ . These iterations are always initialized with $x^{(0)}=1$ or, equivalently, $x^{(0)}=\epsilon$ . The corresponding fixed point equation is the stationarity condition for the potential function

[TABLE]

Note that the potential function is defined up to a constant which is set here so that $W_{\text{\tiny BEC}}(0)=0$ . Figure 1 illustrates the potential function for a $(3,6)$ -regular Gallager ensemble, for several values of $\epsilon$ . For $\epsilon<0.4294$ , the potential function (12) is strictly increasing, and equivalently the DE iterations are driven to the unique minimum at $x=0$ . At $\epsilon_{\text{\tiny BP}}=0.4294$ a horizontal inflexion point appears and a second non-trivial local minimum $x_{\text{\tiny BP}}$ appears; this minimum corresponds to the non-trivial fixed point reached by DE iterations. It is known that the MAP threshold is equal to the erasure probability where the non-trivial local minimum is at the same height as the trivial one and that decoding becomes impossible once the non-trivial minimum becomes a global minimum. For this example, this happens when $\epsilon_{\text{\tiny MAP}}=0.4881$ . Figure 1 also shows the energy gap that is defined for $\epsilon_{\text{\tiny BP}}\leq\epsilon\leq\epsilon_{\text{\tiny MAP}}$ as $\Delta E=W_{\text{\tiny BEC}}(x_{\text{\tiny BP}})-W_{\text{\tiny BEC}}(0)$ . At the MAP threshold, we have $\Delta E=0$ .
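The thresholds quoted above can be checked numerically. The following is a minimal sketch for the $(3,6)$ -regular ensemble, assuming the scalar DE map $x\mapsto\epsilon\lambda(1-\rho(1-x))$ and one standard closed form of the potential, normalized so that $W_{\text{\tiny BEC}}(0)=0$ . The closed form in `w_bec`, the function names, and the bisection bracket are illustrative assumptions, not taken from the text.

```python
# Illustrative sketch for the (3,6)-regular ensemble on the BEC.
# Assumed polynomials: lambda(x) = x^2, rho(x) = x^5 (edge perspective),
# L(x) = x^3, R(x) = x^6 (node perspective).

def de_fixed_point(eps, iters=20000):
    """Iterate x <- eps * lambda(1 - rho(1 - x)) starting from x = 1."""
    x = 1.0
    for _ in range(iters):
        x = eps * (1.0 - (1.0 - x) ** 5) ** 2
    return x

def w_bec(x, eps):
    """Assumed potential; dW/dx = rho'(1-x) * (x - eps*lambda(1-rho(1-x)))."""
    y = 1.0 - (1.0 - x) ** 5                      # 1 - rho(1 - x)
    return (1.0 - (1.0 - x) ** 6) / 6.0 - x * (1.0 - x) ** 5 - eps * y ** 3 / 3.0

def energy_gap(eps):
    """W(x_BP) - W(0), where W(0) = 0 by normalization."""
    return w_bec(de_fixed_point(eps), eps)

def map_threshold(lo=0.44, hi=0.50, steps=40):
    """Bisection on eps for the zero of the energy gap (hypothetical helper)."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if energy_gap(mid) > 0.0:   # gap still positive: below the MAP threshold
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x_bp = de_fixed_point(0.46)     # non-trivial fixed point between the two thresholds
eps_map = map_threshold()       # erasure probability at which the gap vanishes
```

Below the BP threshold the same iteration collapses to $x=0$ , and the bisection locates the value of $\epsilon$ at which the two minima are at the same height.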

Let us now describe the phenomenology of the soliton (decoding wave) for spatially coupled codes. Our discussion is limited to the case where the underlying code ensemble has a single non-trivial DE fixed point (equivalently, the potential function has a single non-trivial local minimum). One can show that this is always the case for regular code ensembles. For irregular degree distributions the situation may be more complicated, since several non-trivial fixed points may appear. For transmission over the BEC, equation (9) reads

[TABLE]

Here, $\epsilon_{z}=0$ for $z\in\{-w+1,\dots,0\}$ and $\epsilon_{z}=\epsilon$ for $z\in\{1,\dots,L_{c}\}$ . Furthermore, we fix the left boundary to $x_{z}^{(t)}=0$ for $z\in\{-w+1,\dots,0\}$ , for all $t\in\mathbb{N}$ . These are the “perfect seeding” conditions which enable the initiation of decoding. The initial condition for the iterations is $x_{z}^{(0)}=1$ (or $\epsilon$ ) for $z\in\{1,\dots,L_{c}+w-1\}$ .

The evolution of the decoding wave can be decomposed into two phases: a transient and a stationary phase. In the transient phase, we observe a profile of erasure probabilities $x_{z}$ changing shape. The segment initialized to $x_{z}^{(0)}=1$ quickly drops to $x_{z}\approx x_{\text{\tiny BP}}$ where it remains stuck on the far right for large values of $z$ . The seeding region, on the other hand, starts progressing towards the right-hand side and, after a few iterations, a fixed profile shape develops. This transient phase is illustrated in Figure 2 for an irregular code. Overall, it only lasts for a few iterations (of the order of $5$ iterations in this example). After this transient phase is over, one observes a stationary phase with a solitonic behavior, as depicted in Figure 3. The profile of erasure probabilities has a stationary shape with a front at position $z_{\text{\tiny front}}$ that moves, at a constant speed, towards the right. The soliton is relatively well-localized within approximately $2w$ positions and quickly approaches $x_{z}\to 0$ for $z<z_{\text{\tiny front}}$ and $x_{z}\to x_{\text{\tiny BP}}$ for $z>z_{\text{\tiny front}}$ . The stationary phase and its soliton are depicted in Figure 3 for a finite spatially coupled $(3,6)$ -regular ensemble with chain length $L_{c}=50$ and $w=3$ for $\epsilon=0.46$ . In this figure, we plot the decoding profile every 30 iterations starting at the $30^{th}$ iteration (the leftmost curve) and until the $150^{th}$ iteration (the rightmost curve). The kink increases sharply from $x_{z}=0$ to $x_{z}=x_{\text{\tiny BP}}=0.3789$ over a width of the order of $2w=6$ .
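The two phases can be reproduced with a few lines of code. The sketch below is only illustrative and rests on assumptions that are ours: the coupled recursion is taken to be $x_{z}^{(t+1)}=\frac{1}{w}\sum_{i=0}^{w-1}\epsilon_{z-i}\lambda\big(\frac{1}{w}\sum_{j=0}^{w-1}(1-\rho(1-x_{z-i+j}^{(t)}))\big)$ , the left boundary is seeded with $w-1$ known positions, and the right boundary is pinned to $x_{\text{\tiny BP}}$ (a simplification of the finite chain). All function names are hypothetical.

```python
import numpy as np

def coupled_de_step(x, eps, w, x_right, dl=3, dr=6):
    """One DE iteration for the smoothed erasure profile x_1..x_N (sketch)."""
    # Assumed boundaries: w-1 seeded positions (x = 0) on the left and
    # w-1 positions pinned to x_right on the right.
    pad = np.concatenate([np.zeros(w - 1), x, np.full(w - 1, x_right)])
    y = 1.0 - (1.0 - pad) ** (dr - 1)              # check-to-variable erasure prob.
    box = np.full(w, 1.0 / w)                      # uniform coupling window
    u = eps * np.convolve(y, box, mode="valid") ** (dl - 1)   # variable-node output
    u[: w - 1] = 0.0                               # seeded variable nodes are known
    return np.convolve(u, box, mode="valid")       # average into check-node input

def front_position(x, level):
    """Position where the profile first crosses `level` (linear interpolation)."""
    k = int(np.argmax(x >= level))
    if k == 0:
        return 0.0
    return (k - 1) + (level - x[k - 1]) / (x[k] - x[k - 1])

def run_wave(eps=0.46, w=3, n=120, iters=150, x_bp=0.3789):
    """Run DE from the all-erased profile and record the front every 30 iterations."""
    x = np.ones(n)
    fronts = []
    for t in range(1, iters + 1):
        x = coupled_de_step(x, eps, w, x_bp)
        if t % 30 == 0:
            fronts.append(front_position(x, x_bp / 2))
    return x, fronts

profile, fronts = run_wave()
```

After the transient, successive entries of `fronts` should be spaced almost evenly, which is the constant-speed behavior described above.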

III Continuum limit and main result

III-A Continuum Limit

We now consider the coupled system in the continuum limit, in which the length of the coupling chain $L_{c}$ is first taken very large $L_{c}\rightarrow+\infty$ , and then the window size is taken very large $w\rightarrow+\infty$ . The continuum limit has already been considered for the special case of the BEC in [16], [23], [24]. We slightly abuse notation by keeping the same symbols for the profile, the spatial position, and the channel distribution in the continuum limit. Thus, we denote by $\mathtt{x}(\cdot,\cdot)$ the continuous profile of distributions and set $\mathtt{x}(\frac{z}{w},t)\equiv\mathtt{x}_{z}^{(t)}$ . We then replace $\frac{z}{w}\to z$ so that the new $z$ is the continuous variable on the spatial axis, $z\in\mathbb{R}$ .

In view of the discussion of the phenomenology in Section II-C, we consider the class of profiles satisfying the “natural boundary conditions” $\mathtt{x}(z,t)\to\Delta_{\infty}$ when $z\to-\infty$ and $\mathtt{x}(z,t)\to\mathtt{x}_{\text{\tiny BP}}$ when $z\to+\infty$ , for all $t\in\mathbb{N}$ , where $\mathtt{x}_{\text{\tiny BP}}$ is the unique non-trivial stable fixed point of the single-system DE equation (5).

The BMS channel distribution is now also continuous, and we denote by $\mathtt{c}(z)$ the channel distribution at the continuous spatial position $z\in\mathbb{R}$ . The DE equation (9) then takes the form

[TABLE]

The initial condition at $t=0$ is given by a profile $\mathtt{x}(z,0)$ that interpolates between the two limiting values of the boundary condition, namely $\mathtt{x}(z,0)\to\Delta_{\infty}$ when $z\to-\infty$ and $\mathtt{x}(z,0)\to\mathtt{x}_{\text{\tiny BP}}$ when $z\to+\infty$ .

III-B Statement of Main Result

We consider the channel entropy $\mathtt{h}\in[\mathtt{h}_{\text{\tiny BP}},\mathtt{h}_{\text{\tiny MAP}}]$ . The phenomenology tells us that: (i) after a transient phase, the profile develops a fixed shape $\mathtt{X}(\cdot)$ ; (ii) the shape is independent of the initial condition; (iii) the shape travels at constant speed $v$ ; (iv) the shape satisfies the boundary conditions $\mathtt{X}(z)\to\Delta_{\infty}$ for $z\to-\infty$ and $\mathtt{X}(z)\to\mathtt{x}_{\text{\tiny BP}}$ for $z\to+\infty$ . We thus formalize these observations and make an ansatz:

Ansatz. For each $\mathtt{h}\in[\mathtt{h}_{\text{\tiny BP}},\mathtt{h}_{\text{\tiny MAP}}]$ there exist a constant $v\geq 0$ and a family of probability measures $\mathtt{X}(z)$ (indexed by $z\in\mathbb{R}$ ) satisfying the boundary conditions $\mathtt{X}(z)\to\Delta_{\infty}$ for $z\to-\infty$ and $\mathtt{X}(z)\to\mathtt{x}_{\text{\tiny BP}}$ for $z\to+\infty$ , such that, for $t\to+\infty$ and $|z-vt|=O(1)$ , the solution of DE (14) is independent of the initial condition and satisfies $\mathtt{x}(z,t)\to\mathtt{X}(z-vt)$ .

Implicit in this ansatz is that we restrict ourselves here to underlying code ensembles that have only one non-trivial (stable) BP fixed point. This is true for regular codes for example (but is not limited to this case). Ensembles with many non-trivial fixed points could lead to more complicated phenomenologies, as emphasized in [17], and would require modifying the ansatz.

Velocity of the soliton for general BMS channels. Under the assumption above the velocity of the soliton is given by

[TABLE]

where $\Delta E$ is the energy gap defined as

[TABLE]

and we recall that $W_{s}$ is the potential (6) of the uncoupled system, $\mathtt{x}_{\text{\tiny BP}}$ is the non-trivial BP fixed point to which the uncoupled system converges, and $\Delta_{\infty}$ is the trivial fixed point (Dirac mass at infinity).

Let us make a few remarks. In this formula, the prime denotes the derivative $\mathtt{X}^{\prime}(z)=\lim_{\delta\to 0}\delta^{-1}(\mathtt{X}(z+\delta)-\mathtt{X}(z))$ which is to be interpreted as a difference between two measures. The energy gap is only defined for $\mathtt{h}_{\text{\tiny BP}}\leq\mathtt{h}\leq\mathtt{h}_{\text{\tiny MAP}}$ , that is, when the single potential $W_{s}$ has a non-trivial non-negative local minimum (see e.g. Figure 1). It is exactly equal to zero when $\mathtt{h}=\mathtt{h}_{\text{\tiny MAP}}$ , which confirms the fact that the velocity of decoding is zero (no decoding occurs) in this case. Note also that, with our normalizations, $W_{s}(\Delta_{\infty})=0$ .

Formula (15) involves the shape $\mathtt{X}(\cdot)$ . Using the DE equation, the ansatz $\mathtt{x}(z,t)\to\mathtt{X}(z-vt)$ , and the approximation $\mathtt{x}(z,t+1)-\mathtt{x}(z,t)\approx-v\mathtt{X}^{\prime}(z-vt)$ , valid for small $v$ , we find after a change of variables that $\mathtt{X}(z)$ is the solution of

[TABLE]

To obtain the shape $\mathtt{X}(z)$ and the velocity $v$ , one must iteratively solve the closed system of equations formed by (15) and (17). Note that the assumption of small $v$ used above is strictly valid for $\mathtt{h}$ close to $\mathtt{h}_{\text{\tiny MAP}}$ . However, numerical simulations confirm that in practice the resulting velocity formula is precise over the whole range $[\mathtt{h}_{\text{\tiny BP}},\mathtt{h}_{\text{\tiny MAP}}]$ .
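The small- $v$ approximation used above can be made explicit. As a sketch, assuming that $\mathtt{X}$ is twice differentiable in $z$ (an assumption not stated in the ansatz), a Taylor expansion of the travelling-wave form gives

$$\mathtt{x}(z,t+1)-\mathtt{x}(z,t)\approx\mathtt{X}(z-vt-v)-\mathtt{X}(z-vt)=-v\,\mathtt{X}^{\prime}(z-vt)+\frac{v^{2}}{2}\mathtt{X}^{\prime\prime}(z-vt)+O(v^{3}),$$

so that keeping only the first term amounts to neglecting corrections of relative order $v$ . This is why the approximation is expected to be most accurate near $\mathtt{h}_{\text{\tiny MAP}}$ , where $v\to 0$ .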

IV Derivation of velocity formula for BMS channels

Let us briefly outline the main steps of the derivation. We first write down a potential functional which gives, in the continuous setting, the DE fixed point equation corresponding to (14). This enables us to formulate the DE iterations as a sort of gradient descent equation (Section IV-A). From there on, we use the ansatz in Section III-B to derive the velocity formula (15).

IV-A Density evolution as gradient descent

We call $\Delta\mathcal{W}(\mathtt{x})$ the potential functional of the coupled system in the continuum limit obtained from (10). This limit involves an integral over the spatial direction $z\in\mathbb{R}$ and, in order to get a convergent result, we must subtract a “reference energy”. Essentially, any static reference profile, here called $\mathtt{x}_{0}(z)$ , that satisfies the boundary conditions $\mathtt{x}_{0}(z)\to\Delta_{\infty}$ , $z\to-\infty$ and $\mathtt{x}_{0}(z)\to\mathtt{x}_{\text{\tiny BP}}$ , $z\to+\infty$ , will do the job. For concreteness, one can take a Heaviside-like profile $\mathtt{x}_{0}(z)=\Delta_{\infty}$ , $z<0$ , $\mathtt{x}_{0}(z)=\mathtt{x}_{\text{\tiny BP}}$ , $z\geq 0$ . The potential functional is thus defined as follows,

[TABLE]

where $P(z,\mathtt{x})$ is a $z$ -dependent functional of $\mathtt{x}$ equal to

[TABLE]

In Appendix A, we calculate the functional derivative of $\Delta\mathcal{W}(\mathtt{x})$ in a direction $\mathtt{\eta}(z,t)$ defined as

[TABLE]

and find

[TABLE]

From (14) and (21) we deduce that

[TABLE]

We note that, using the duality rule (4) for $\mathtt{a}=\mathtt{x}(z,t+1)-\mathtt{x}(z,t)$ and $b=\rho^{\prime\boxast}(\mathtt{x}(z,t))\boxast\eta(z,t)$ (recall that $\eta$ must be a difference of two measures so that $b$ also is such a difference) and the associativity of $\boxast$ , this equation can also be formulated as

[TABLE]

In this form, we recognize a sort of infinite-dimensional gradient descent equation in a space of measures. This reformulation of DE forms the basis of the derivation of the velocity formula.
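For intuition, the uncoupled BEC provides a scalar analogue of this statement. The following is a sketch under the assumption that the single-system potential takes the standard form $W_{\text{\tiny BEC}}(x)=\frac{1}{R^{\prime}(1)}(1-R(1-x))-x\rho(1-x)-\frac{\epsilon}{L^{\prime}(1)}L(1-\rho(1-x))$ , with $\lambda=L^{\prime}/L^{\prime}(1)$ and $\rho=R^{\prime}/R^{\prime}(1)$ . Differentiating, the first two contributions combine and one finds

$$\frac{\mathrm{d}W_{\text{\tiny BEC}}}{\mathrm{d}x}(x)=\rho^{\prime}(1-x)\big(x-\epsilon\lambda(1-\rho(1-x))\big).$$

The scalar DE map $x^{(t+1)}=\epsilon\lambda(1-\rho(1-x^{(t)}))$ therefore satisfies

$$\rho^{\prime}(1-x^{(t)})\big(x^{(t+1)}-x^{(t)}\big)=-\frac{\mathrm{d}W_{\text{\tiny BEC}}}{\mathrm{d}x}(x^{(t)}).$$

Since $\rho^{\prime}\geq 0$ on $[0,1]$ , each DE update moves $x$ in a descent direction of the potential, with $\rho^{\prime}(1-x)$ playing the role of a position-dependent step-size weight.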

IV-B Final steps of the derivation

The potential functional can be decomposed in a “single system” part and a contribution that contains the “interaction” due to coupling. We have

[TABLE]

with

[TABLE]

where

[TABLE]

and

[TABLE]

Note, for future use, that in fact $P_{s}(z,\mathtt{x})=W_{s}(\mathtt{x}(z,t))$ is the single system potential (6) “at position $z$ ”. With these definitions, the gradient descent equation (23) can be written as

[TABLE]

Now, we use the ansatz to compute the three terms in this equation in the regime $t\to+\infty$ , $z\to+\infty$ such that $|z-vt|=O(1)$ . We will choose the direction $\eta(z,t)=\mathtt{X}^{\prime}(z-vt)$ .

We start with the left-hand side of (29). From $\mathtt{x}(z,t)\to\mathtt{X}(z-vt)$ and the approximation $\mathtt{x}(z,t+1)-\mathtt{x}(z,t)\approx-v\mathtt{X}^{\prime}(z-vt)$ , together with the special choice of $\eta(z,t)$ , we can rewrite the left-hand side of (29) as

[TABLE]

Using the commutativity of the operator $\boxast$ , this is equal to

[TABLE]

Note that we can shift the argument in the integrals $z-vt\to z$ , and this term becomes independent of time.

Now, we consider the first functional derivative on the right hand side of (29), when $\eta(z,t)=\mathtt{X}^{\prime}(z-vt)$ . It should be clear that we can immediately make the change of variables in the integrals $z-vt\to z$ which simplifies the formulas. By the calculations in Appendix A, we find

[TABLE]

In order to simplify the above, we remark the following

[TABLE]

Noticing that the first two terms on the right-hand side cancel out, and using the duality rule (4) for the third term, we get the integrand in (32). In other words, the integrand in (32) equals $\frac{\mathrm{d}}{\mathrm{d}z}P_{s}(z,\mathtt{X})=\frac{\mathrm{d}}{\mathrm{d}z}W_{s}(z,\mathtt{X}(z))$ and

[TABLE]

We now show that the functional derivative of the interaction part in (29) does not contribute when $\eta(z,t)$ is replaced by $\mathtt{X}^{\prime}(z)$ . By directly applying the definition of the functional derivative, we find

[TABLE]

We notice that the integrand is a total derivative; namely, it is equal to

[TABLE]

Due to the boundary conditions, we have $\lim_{z\to-\infty}\mathtt{X}(z)=\lim_{z\to-\infty}\mathtt{X}(z+u)=\Delta_{\infty}$ and $\lim_{z\to+\infty}\mathtt{X}(z)=\lim_{z\to+\infty}\mathtt{X}(z+u)=\mathtt{x}_{\text{\tiny BP}}$ , and we can conclude that the total derivative integrates to zero, thus

[TABLE]

Finally, replacing (31), (33), and (35) in (29) we get the simple relationship

[TABLE]

which yields the velocity formula (15).

V Applications to specific channels and comparisons with Numerical Experiments

V-A Binary Erasure Channel (BEC)

The formula for the velocity, when transmission takes place over the BEC, can be obtained by directly simplifying the general formula in (15). We note that, since the BEC yields a scalar system, one can also use the formula for general scalar systems in Section VII (which also covers cases beyond coding theory). We will suppose that the underlying code LDPC $(\lambda,\rho)$ is such that the DE equation has a single non-trivial fixed point $x_{\text{\tiny BP}}\neq 0$ . Furthermore, we fix $\epsilon_{\text{\tiny BP}}\leq\epsilon\leq\epsilon_{\text{\tiny MAP}}$ (recall the channel entropy reduces to $H(\mathtt{c})=\mathtt{h}=\epsilon$ here).

The channel distribution can be written as $\mathtt{c}=\epsilon\Delta_{0}+(1-\epsilon)\Delta_{\infty}$ , and the profile is of the form $\mathtt{x}(z,t)=x(z,t)\Delta_{0}+(1-x(z,t))\Delta_{\infty}$ where $0\leq x(z,t)\leq 1$ is the scalar erasure probability at position $z$ and time $t$ . This tends to a fixed shape

[TABLE]

where $0\leq X(z)\leq 1$ satisfies $\lim_{z\to-\infty}X(z)=0$ , $\lim_{z\to+\infty}X(z)=x_{\text{\tiny BP}}$ . We have also

[TABLE]

We also note the following identities valid for scalar maps $f,g:\mathbb{R}\to[0,1]$ (such as $\lambda$ , $\rho$ , $L$ , $R$ and their derivatives)

[TABLE]

Let us compute the denominator of (15). Using (3), (38), and (39) we have

[TABLE]

and since $H(\Delta_{0})=1$ , $H(\Delta_{\infty})=0$ , and the entropy functional is linear, we obtain the denominator of (15) as

[TABLE]

For the numerator of (15), we have $\Delta E=W_{\text{\tiny BEC}}(x_{\text{\tiny BP}})-W_{\text{\tiny BEC}}(0)$ , where the single system potential on the BEC is obtained from (6) using again (3) and (39). The exercise yields

[TABLE]

Putting together these results, the velocity (15) becomes

[TABLE]

(Note that, with our normalizations, $W_{\text{\tiny BEC}}(0)=0$ for all $\epsilon$ .) The erasure profile $X(z)$ has to be computed from the one-dimensional integral equation

[TABLE]
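To make the BEC formula concrete, the sketch below evaluates it on a simulated profile and compares it with the measured speed of the front. It assumes that the scalar velocity has the form $v_{\text{\tiny BEC}}=\Delta E/\int\mathrm{d}z\,\rho^{\prime}(1-X(z))(X^{\prime}(z))^{2}$ , which is what the gradient-descent argument of Section IV gives in the scalar case. It also uses the discrete profile of a finite- $w$ simulation, with $z=k/w$ , as a crude stand-in for the continuum shape $X(z)$ . The boundary treatment, the closed form of the potential, and all names are illustrative assumptions.

```python
import numpy as np

DL, DR = 3, 6   # (3,6)-regular ensemble: lambda(x) = x^2, rho(x) = x^5

def de_step(x, eps, w, x_right):
    """Coupled DE step; left side seeded with zeros, right side pinned (assumed)."""
    pad = np.concatenate([np.zeros(w - 1), x, np.full(w - 1, x_right)])
    y = 1.0 - (1.0 - pad) ** (DR - 1)
    box = np.full(w, 1.0 / w)
    u = eps * np.convolve(y, box, mode="valid") ** (DL - 1)
    u[: w - 1] = 0.0
    return np.convolve(u, box, mode="valid")

def uncoupled_fixed_point(eps, iters=5000):
    x = 1.0
    for _ in range(iters):
        x = eps * (1.0 - (1.0 - x) ** (DR - 1)) ** (DL - 1)
    return x

def w_bec(x, eps):
    """Assumed single-system potential with W(0) = 0."""
    y = 1.0 - (1.0 - x) ** (DR - 1)
    return (1.0 - (1.0 - x) ** DR) / DR - x * (1.0 - x) ** (DR - 1) - eps * y ** DL / DL

def front(x, level):
    k = int(np.argmax(x >= level))
    return (k - 1) + (level - x[k - 1]) / (x[k] - x[k - 1])

def velocities(eps=0.46, w=3, n=200, t1=100, t2=200):
    """Return (formula estimate, empirical estimate), both normalized by w."""
    x_bp = uncoupled_fixed_point(eps)
    x = np.ones(n)
    pos = {}
    for t in range(1, t2 + 1):
        x = de_step(x, eps, w, x_bp)
        if t in (t1, t2):
            pos[t] = front(x, x_bp / 2)
    v_emp = (pos[t2] - pos[t1]) / (w * (t2 - t1))
    dx = np.diff(x)                                  # X'(z) ~ w * dx, dz = 1/w
    x_mid = 0.5 * (x[1:] + x[:-1])
    rho_prime = (DR - 1) * (1.0 - x_mid) ** (DR - 2)
    denom = np.sum(rho_prime * w * dx ** 2)          # Riemann sum of rho' * X'^2
    return w_bec(x_bp, eps) / denom, v_emp

v_formula, v_emp = velocities()
```

Because the integral is replaced by a coarse Riemann sum at $w=3$ , only rough agreement between the two numbers should be expected from this sketch; the comparison reported in the paper uses the continuum profile.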

Obviously, the velocity vanishes when $\epsilon\to\epsilon_{\text{\tiny MAP}}$ since then $W_{\text{\tiny BEC}}(x_{\text{\tiny BP}})\to W_{\text{\tiny BEC}}(0)=0$ . An important quantity is the slope of the velocity at $\epsilon_{\text{\tiny MAP}}$ . To compute it, we remark that $W_{\text{\tiny BEC}}$ has an explicit dependence on $\epsilon$ , as well as an implicit one through $x_{\text{\tiny BP}}(\epsilon)$ . Thus,

[TABLE]

so that, for $\epsilon\to\epsilon_{\text{\tiny MAP}}$ , the Taylor expansion to first order yields

[TABLE]

Note that we used $x_{\text{\tiny BP}}(\epsilon)\to x_{\text{\tiny BP}}(\epsilon_{\text{\tiny MAP}})=x_{\text{\tiny MAP}}$ where $x_{\text{\tiny MAP}}$ is defined as the point $x\neq 0$ where the potential is stationary and vanishes. This yields the linear approximation for the velocity

[TABLE]

where $X_{\text{\tiny MAP}}(\cdot)$ is the erasure probability profile obtained when $\epsilon=\epsilon_{\text{\tiny MAP}}$ .

It is interesting to compare (40) with the upper bound of Theorem 1 in [17] for a discrete system

[TABLE]

In [17] the derivation of the bound yields $\alpha\leq 2$ (for $L_{c}$ and $w$ large enough) but it is conjectured based on numerical simulations that $\alpha=1$ would be a tight bound. Obviously, (40) and (43) are consistent. We note for reference that another upper bound is also derived in [17], namely

[TABLE]

where $x_{u}$ and $x_{\text{\tiny BP}}$ are respectively the non-trivial unstable and stable fixed points of the potential of the uncoupled system $W_{s}(\cdot;\cdot)$ . We do not discuss this further because in practice this turns out to be a very loose bound.

We now compare the analytical velocity formula (40) with the empirical velocity (called $v_{e}$ below) obtained by simulating the discrete DE equation; we show that it provides a very good approximation of the true velocity even for relatively small values of $w$ . For the simulations, we consider the spatially coupled $(3,6)$ and $(4,6)$ -regular code ensembles, as well as two irregular LDPC codes (described later). We run the simulations for several values of the chain length $L_{c}=256,1024$ and the window size $w=3,5,8,16$ . The empirical velocity is the velocity calculated from erasure probability profiles of the discrete DE equation. Consider two (discrete) profiles $\underline{x}^{(t_{1})}$ and $\underline{x}^{(t_{2})}$ at any two iterations $t_{1}$ and $t_{2}$ , respectively, with $t_{1}<t_{2}$ . After the transient phase is over, the profiles are identical up to translation. We call a “kink” the part of the profile where there is a fast increase from $0$ to $x_{\text{\tiny BP}}$ in the erasure probability. The kink “position” is the coordinate such that the height is equal to $x_{\text{\tiny BP}}/2$ , and $\Delta z$ is the difference of two such positions (on two different profiles). Then, the empirical velocity $v_{e}$ is

$$v_{e}=\frac{\Delta z}{w\,(t_{2}-t_{1})}.$$

In practice, we get reliable results by taking pairs of profiles separated by $20$ iterations and averaging this ratio over every consecutive pair of profiles. Note that we normalize the velocity by $w$ to be able to compare systems with different window widths.
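The measurement procedure just described can be sketched as follows. The function names are hypothetical, the kink position is located by linear interpolation between neighboring positions (our choice), and the profiles used for the demonstration are synthetic logistic kinks translating at a known speed rather than DE output.

```python
import numpy as np

def kink_position(profile, x_bp):
    """Coordinate at which the profile height equals x_bp / 2."""
    level = 0.5 * x_bp
    k = int(np.argmax(profile >= level))        # first position at or above the level
    if k == 0:
        return 0.0
    return (k - 1) + (level - profile[k - 1]) / (profile[k] - profile[k - 1])

def empirical_velocity(profiles, times, x_bp, w):
    """Average of delta_z / (w * delta_t) over consecutive pairs of profiles."""
    pos = [kink_position(p, x_bp) for p in profiles]
    ratios = [
        (pos[i + 1] - pos[i]) / (w * (times[i + 1] - times[i]))
        for i in range(len(pos) - 1)
    ]
    return float(np.mean(ratios))

# Synthetic data: a logistic kink moving 0.3 positions per iteration.
z = np.arange(400)
x_bp, w = 0.3789, 3
times = [100, 120, 140, 160]                    # profiles separated by 20 iterations
profiles = [x_bp / (1.0 + np.exp(-(z - 0.3 * t) / 2.0)) for t in times]
v_e = empirical_velocity(profiles, times, x_bp, w)
```

With a displacement of $0.3$ positions per iteration and $w=3$ , the normalized velocity recovered from the synthetic profiles is close to $0.1$ .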

In Table I, we give empirical values $v_{e}$ of the normalized velocities for the spatially coupled $(4,6)$ -regular code ensemble, with transmission over the BEC(0.6), when the spatial length is $1024$ positions and the channel parameter is fixed to $\epsilon=0.6$ (between the BP and MAP thresholds), for different values of the window size $w$ . We observe that the result of our formula $v_{\text{\tiny BEC}}$ gives a good estimate of the empirical velocity $v_{e}$ for all the demonstrated values of the window size. We also observe that the linear approximation gives a good estimate when the channel parameter is not too far from the MAP threshold $\epsilon_{\text{\tiny MAP}}$ . The upper bound $v_{B}$ [17] gives a better estimate as the window size grows larger.

In Table II, we give empirical values of the normalized velocities for the spatially coupled $(3,6)$ -regular code ensemble, with transmission over the BEC( $\epsilon$ ), when the spatial length equals $1024$ and the window size $w=8$ , for different values of the channel parameter $\epsilon$ . One can compare these values with those in [17] (up to a factor equal to $w$ due to the normalization). The result of the formula $v_{\text{\tiny BEC}}$ gives the closest estimate to the empirical velocity $v_{e}$ for all values of $\epsilon$ .

Figures 4 and 5 show the empirical velocity $v_{e}$ , the analytical velocity $v_{\text{\tiny BEC}}$ , and the upper bound $v_{B}$ for the spatially coupled $(3,6)$ - and $(4,6)$ -regular code ensembles, with spatial length $256$ and window size $w=3$ . We remark that our formula fits very well, for the $(3,6)$ -regular code, with the empirical velocity for all values of the channel parameter $\epsilon\in[\epsilon_{\text{\tiny BP}},\epsilon_{\text{\tiny MAP}}]=[0.43,0.488]$ . The agreement is quite good also for the $(4,6)$ -regular code and very good for more than half of the interval $[\epsilon_{\text{\tiny BP}},\epsilon_{\text{\tiny MAP}}]\approx[0.515,0.719]$ .

We also illustrate the results for two irregular code ensembles in Figures 6 and 7. The first one has node degree distributions $L(x)=0.3x^{2}+0.6x^{3}+0.1x^{5}$ and $R(x)=x^{4}$ , spatial length $1024$ , and window size $w=4$ . The agreement between $v_{\text{\tiny BEC}}$ and $v_{e}$ is excellent for the whole range $\epsilon\in[\epsilon_{\text{\tiny BP}},\epsilon_{\text{\tiny MAP}}]=[0.657,0.719]$ . The second one has $L(x)=0.4x^{3}+0.3x^{4}+0.3x^{5}$ and $R(x)=0.5x^{8}+0.5x^{12}$ , spatial length $256$ , and window size $w=3$ . The agreement between the velocities is also very good for most of the range $\epsilon\in[\epsilon_{\text{\tiny BP}},\epsilon_{\text{\tiny MAP}}]=[0.311,0.385]$ .

V-B Gaussian Approximation (GA)

DE equations relate probability densities and as such we may need to track an infinite set of parameters (except for the BEC where the space of densities can be parametrized by a single real number). In many situations, for example when the degrees are large, the densities are well approximated by Gaussians, which enables us to project the DE equations down to a low dimensional space. There are several variants of the Gaussian approximation (see for example [31], [18], [19]), and here we use it in a form called the “reciprocal channel approximation” proposed in [18], [19].

The idea is to assume that the densities of the LLR messages appearing in the DE equations are symmetric Gaussian densities. Such densities take the form

[TABLE]

with the mean $m$ and variance $\sigma^{2}$ satisfying $\sigma^{2}=2m$ . Furthermore, the channel density $\mathtt{c}$ is replaced by that corresponding to a BIAWGNC( $\sigma_{n}^{2}$ ) with the same entropy $H(\mathtt{c})$ . Density evolution can then conveniently be expressed in terms of the entropies $p_{z}^{(t)}=H(\mathtt{x}_{z}^{(t)})$ . This is done as follows. Let $\psi(m)$ denote the entropy of a symmetric Gaussian density of mean $m$ (for indications on the numerical implementation of this function see [27], pp. 194 and 237), given by

[TABLE]

Thus $\psi^{-1}(p)$ denotes the mean of a symmetric Gaussian density $\mathtt{x}$ of entropy $p=H(\mathtt{x})$ . Take two symmetric Gaussian densities $\mathtt{x}_{1}$ and $\mathtt{x}_{2}$ of means $m_{1}$ and $m_{2}$ and entropies $p_{1}=\psi(m_{1})$ and $p_{2}=\psi(m_{2})$ . We have, in general,

[TABLE]

which just expresses the fact that a usual convolution of two Gaussian densities of means $m_{1}$ and $m_{2}$ is a Gaussian density of mean $m_{1}+m_{2}$ . On the other hand $\mathtt{x}_{1}\boxast\mathtt{x}_{2}$ is not exactly Gaussian so there is no exact formula but the idea here is to preserve the duality rule $H(\mathtt{x}_{1}\boxast\mathtt{x}_{2})+H(\mathtt{x}_{1}\otimes\mathtt{x}_{2})=H(\mathtt{x}_{1})+H(\mathtt{x}_{2})$ . Writing this relation as

[TABLE]

and noting that $H(\Delta_{0}-\mathtt{x}_{1})=1-p_{1}$ , $H(\Delta_{0}-\mathtt{x}_{2})=1-p_{2}$ suggests the approximation

[TABLE]

Taking the entropies of the DE equations (47) and (48) yields (we limit ourselves to regular codes for simplicity)

[TABLE]

and setting $p^{(t)}=H(\mathtt{x}^{(t)})$ , $q^{(t)}=H(\mathtt{y}^{(t)})$ we find

[TABLE]

These equations can be combined into

[TABLE]
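The scalar recursion of the Gaussian approximation is easy to experiment with numerically. The sketch below assumes that $\psi(m)=\mathbb{E}[\log_{2}(1+e^{-\ell})]$ for $\ell\sim\mathcal{N}(m,2m)$ , that the check-node rule is $q=1-\psi((r-1)\psi^{-1}(1-p))$ , and that the variable-node rule is $p=\psi(\psi^{-1}(H(\mathtt{c}))+(\ell-1)\psi^{-1}(q))$ for an $(\ell,r)$ -regular code. The quadrature grid, the bisection bracket, and the function names are our own choices and are not the implementation referred to in [27].

```python
import numpy as np

# Grid for integrating against the weight exp(-t^2)/sqrt(pi); tails are negligible.
_T = np.linspace(-10.0, 10.0, 4001)
_G = np.exp(-_T ** 2) / np.sqrt(np.pi) * (_T[1] - _T[0])

def psi(m):
    """Entropy in bits of a symmetric Gaussian LLR density (mean m, variance 2m)."""
    llr = m + 2.0 * np.sqrt(m) * _T               # LLR = m + sqrt(2m) * N(0, 1)
    return float(np.sum(_G * np.logaddexp(0.0, -llr)) / np.log(2.0))

def psi_inv(p, hi=200.0, steps=60):
    """Mean m with psi(m) = p, by bisection (psi is decreasing in m)."""
    lo = 0.0
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if psi(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ga_de(sigma, dl=3, dr=6, iters=200):
    """Iterate the assumed reciprocal-channel recursion on the entropy p."""
    m_ch = 2.0 / sigma ** 2                        # mean LLR of the BIAWGN channel
    p = psi(m_ch)                                  # start from the channel entropy
    for _ in range(iters):
        q = 1.0 - psi((dr - 1) * psi_inv(1.0 - p))     # check-node update
        p = psi(m_ch + (dl - 1) * psi_inv(q))          # variable-node update
        if p < 1e-12:                              # decoded: stop early
            break
    return p

p_low_noise = ga_de(0.8)     # noise level well below the threshold
p_high_noise = ga_de(1.0)    # noise level above the rate-1/2 Shannon limit
```

For the $(3,6)$ -regular code the entropy is driven to zero at low noise, while at high noise the recursion stalls at a non-trivial fixed point; the latter plays the role of $p_{\text{\tiny BP}}$ in the coupled analysis.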

The corresponding potential function is easily found from (6)

[TABLE]

For the coupled system, we denote by $p_{z}$ the average over positions $\{z,\dots,z+w\}$ of the entropy of symmetric Gaussian densities emanating from the variable nodes. The coupled DE equations then take the form

[TABLE]

This coupled recursion can be solved with appropriate boundary conditions and one observes a scalar wave propagation, as shown in Figure 8.

We are now ready to discuss the application to the velocity formula. The continuum limit is obtained exactly as in Section III-A. The assumption that the density $\mathtt{x}(z,t)$ tends to a fixed shape $\mathtt{X}(z-v_{\text{\tiny GA}}t)$ after the transient phase implies that its entropy $p(z,t)$ tends to $P(z-v_{\text{\tiny GA}}t)\equiv H(\mathtt{X}(z-v_{\text{\tiny GA}}t))$ , where $P(z)$ is a scalar function (independent of initial conditions) satisfying the integral equation

[TABLE]

and the boundary conditions $\lim_{z\to-\infty}P(z)=0$ and $\lim_{z\to+\infty}P(z)=p_{\text{\tiny BP}}$ where $p_{\text{\tiny BP}}$ is the non-trivial fixed point of (51). We will now show that the velocity formula reduces to

[TABLE]

To derive (55), we consider the denominator in (15) and write it as follows

[TABLE]

Computing each entropy in the Gaussian approximation, we find for the bracket on the right-hand side

[TABLE]

In Appendix B, we compute the limit of this term when $\delta\to 0$ by appropriate Taylor expansions and find

[TABLE]

This concludes the derivation of (55).

Table III gives a comparison of analytical and empirical velocities, $v_{\text{\tiny GA}}$ and $v_{e,\text{\tiny GA}}$ , obtained for the $(3,6)$ and the $(4,8)$ -regular ensembles, for a spatial length of $100$ and a window size $w=3$ for different values of $\psi^{-1}(H(\mathtt{c}))=2/\sigma_{n}^{2}$ (twice the signal to noise ratio). We also plot both velocities for the $(3,6)$ -regular ensemble for the same parameters in Figure 9. We conjecture that the discrepancies visible in these plots are due to numerical errors involved in computing the function $\psi$ and its inverse.

VI Application to Scaling Laws for Finite-Length Coupled Codes

The authors in [20] propose a scaling law to predict the error probability of a finite-length spatially coupled $(\ell,r,L_{c})$ code when transmission takes place over the BEC. The derived scaling law depends on scaling parameters, one of which we will relate to the velocity of the decoding wave. The $(\ell,r,L_{c})$ ensemble considered in [20] differs slightly from the purely random ensemble we consider in this work. However, as we will see, our formula for the velocity yields results that are reasonably good for this application. We briefly describe this ensemble and the scaling law.

The $(\ell,r,L_{c})$ ensemble combines the benefits of purely random codes (that we consider in this work) and protograph-based codes [32]. The randomness involved in the construction makes the ensemble relatively easy to analyze, and the structure added to the construction due to its similarity to protograph-based codes improves the performance of the code. The ensemble is constructed as follows: Make $L_{c}+w$ copies of an uncoupled code at positions $z=-w+1,\dots,L_{c}$ . All edges are erased then reconnected such that a variable node at position $z_{0}$ has exactly one edge with each set of check nodes at positions $z_{0}+i$ , where $i=0,\dots,\ell-1$ . The check nodes are chosen such that the regularity of their degree is maintained. Therefore, every variable node has $\ell$ emanating edges and every check node has $r$ such edges.

We consider transmission over the BEC. In this case, the BP decoder can be seen as a peeling decoder [33]. Whenever a variable node is decoded, it is removed from the graph along with its edges. One way to track this peeling process is to analyze the evolution of the degree distribution of the residual graph across iterations, which serves as a sufficient statistic. This statistic can be described by a system of differential equations, whose solution determines the mean of the fraction of degree-one check nodes and the variance around this mean at any time during the decoding process. We call $\hat{r}_{1}$ the mean.

It has been shown in [20] that there exists a steady state phase where the mean and the variance are constant. It is exactly during this phase that one can observe the progression of the soliton. We note that here we consider one-sided termination instead of two-sided termination (as considered in [20]), so the fraction $\hat{r}_{1}$ here is equal to half the fraction called $\hat{r}_{1}(*)$ in [20].

Let $\epsilon_{(\ell,r,L_{c})}$ denote the BP threshold of the finite-size $(\ell,r,L_{c})$ ensemble (for large $L_{c}$ this is close to $\epsilon_{\text{\tiny MAP}}$ due to threshold saturation). We can write the first-order Taylor expansion of $\hat{r}_{1}\big{|}_{\epsilon}$ around $\epsilon<\epsilon_{(\ell,r,L_{c})}$ as

[TABLE]

where $\Delta\epsilon=\epsilon_{(\ell,r,L_{c})}-\epsilon$ . Thus, for a given $\epsilon<\epsilon_{(\ell,r,L_{c})}$ and since $\hat{r}_{1}\big{|}_{\epsilon_{(\ell,r,L_{c})}}=0$ (by definition), we obtain

[TABLE]

This parameter $\gamma$ enters in the scaling law and is therefore quite important. So far, it has been determined only experimentally. It would clearly be desirable to have a theoretical handle on $\gamma$ . It is argued in [20] that $\gamma\approx\bar{\gamma}$ where $\bar{\gamma}=x_{\text{\tiny BP}}/c$ and $c$ is a real positive constant that behaves like $\Delta\epsilon/v_{\text{\tiny BEC}}$ , i.e.,

[TABLE]

It is expected that this formula becomes exact in an asymptotic limit where threshold saturation takes place $\epsilon_{(\ell,r,L_{c})}\to\epsilon_{\text{\tiny MAP}}$ . Using the linearization (42), we obtain

[TABLE]

The parameter $\bar{\gamma}$ is simply equal to the erasure probability times the slope of the velocity at $\epsilon_{\text{\tiny MAP}}$ .

We compare the values of $\gamma$ and $\bar{\gamma}$ for different values of $\ell$ and $r$ , at a channel parameter $\epsilon=\epsilon_{(\ell,r,L_{c})}-0.04$ , in Table IV. The experimental values of $\gamma$ are taken from [20] and, for those of $\bar{\gamma}$ , we use the analytical velocity (40). We observe that the numbers roughly agree. There are two reasons that can explain the discrepancies. Firstly, we derive the velocity for the purely random spatially coupled graph ensemble whereas the ensemble considered in [20] is more structured. Note also that as $\ell$ , $r$ increase, the window size of the structured ensemble increases, so the finite size effects at fixed $L_{c}=100$ may be more marked. Secondly, expression (59) is valid when $\epsilon\to\epsilon_{(\ell,r,L_{c})}$ , whereas in Table IV $\Delta\epsilon=0.04$ , which is relatively large (this choice in [20] is due to stability issues in numerical integration techniques when $\epsilon\to\epsilon_{(\ell,r,L_{c})}$ ). We conjecture that the second issue is the dominant reason for the difference between the values of $\gamma$ and $\bar{\gamma}$ and that, in fact, the velocity for the structured ensemble is not very different from the one predicted by our formula (40).

VII Velocity for General Scalar Coupled Systems

In this section, we consider general scalar spatially coupled systems. That is, we do not restrict ourselves to coding problems; however, we only consider systems in which the “density evolution type” analysis of message passing algorithms involves scalar values. Our main result is again a general formula for the velocity of the soliton in the framework of general scalar coupled systems. Numerous systems fall in this class (coding with transmission on the BEC being one of the simplest), and in Sections VII-D and VII-E we will illustrate our results with two examples, namely generalized LDPC codes on the BEC and BSC, as well as compressive sensing.

VII-A General scalar systems

We adopt the framework and notation in [21]. We denote by $[0,\epsilon_{\text{\tiny max}}]$, where $\epsilon_{\text{\tiny max}}\in(0,\infty)$, the interval of values for the control parameter $\epsilon$. Consider bounded, smooth functions that are increasing in both their arguments, $g:[0,x_{\text{\tiny max}}(\epsilon)]\times[0,\epsilon_{\text{\tiny max}}]\to[0,y_{\text{\tiny max}}(\epsilon)]$ and $f:[0,y_{\text{\tiny max}}(\epsilon)]\times[0,\epsilon_{\text{\tiny max}}]\to[0,x_{\text{\tiny max}}(\epsilon)]$, where $x_{\text{\tiny max}}(\epsilon)$, $y_{\text{\tiny max}}(\epsilon)\in(0,\infty)$ and $y_{\text{\tiny max}}(\epsilon)=g(x_{\text{\tiny max}}(\epsilon);\epsilon)$. The scalar recursions that interest us are

$$x^{(t+1)}=f\big(g(x^{(t)};\epsilon);\epsilon\big),\qquad(61)$$

where $t\in\mathbb{N}$ is the iteration number. The recursion is initialized with $x^{(0)}=x_{\text{\tiny max}}$. Since $f(g([0,x_{\text{\tiny max}}(\epsilon)]))\subset[0,x_{\text{\tiny max}}(\epsilon)]$, the initialization of (61) implies that $x^{(1)}\leq x^{(0)}=x_{\text{\tiny max}}$ and, because $f$ and $g$ are increasing, more generally $x^{(t+1)}\leq x^{(t)}$. Thus $x^{(t)}$, being non-increasing and bounded below by $0$, converges to a limiting value $x^{(\infty)}$, and this limit is a fixed point since $f$ and $g$ are continuous. The fixed points of the recursion (61) can be described as stationary points of a single system potential function $U_{s}$ defined as

[TABLE]

where $F(x;\epsilon)=\int_{g(0;\epsilon)}^{x}\mathrm{d}s\,f(s;\epsilon)$ and $G(x;\epsilon)=\int_{0}^{x}\mathrm{d}s\,g(s;\epsilon)$. Without loss of generality, this function is normalized so that $U_{s}(0)=0$.
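The monotone convergence of the scalar recursion, and the link between its fixed points and the stationary points of the potential, can be checked numerically. The following is a minimal sketch under stated assumptions: it takes the (3,6)-regular LDPC ensemble on the BEC in the standard density evolution form $f(y;\epsilon)=\epsilon y^{2}$, $g(x)=1-(1-x)^{5}$, and it assumes the potential has the form $U_{s}(x)=x\,g(x)-G(x)-F(g(x))$ used in the framework of [21]. The function names and the choice $\epsilon=0.45$ are illustrative only.

```python
# Illustrative assumption: (3,6)-regular LDPC ensemble on the BEC, with
# f(y; eps) = eps * y**(l-1) and g(x) = 1 - (1 - x)**(r-1).
L_DEG, R_DEG = 3, 6

def f(y, eps):
    return eps * y ** (L_DEG - 1)

def g(x):
    return 1.0 - (1.0 - x) ** (R_DEG - 1)

def potential(x, eps):
    # Assumed form U_s(x) = x g(x) - G(x) - F(g(x)), using the closed-form
    # integrals G(x) = int_0^x g(s) ds and F(y) = int_0^y f(s) ds (g(0) = 0 here).
    G = x - (1.0 - (1.0 - x) ** R_DEG) / R_DEG
    F = eps * g(x) ** L_DEG / L_DEG
    return x * g(x) - G - F

def iterate(eps, x0, n_iter=5000):
    """Return the trajectory x^(0), x^(1), ... of the recursion x <- f(g(x))."""
    xs = [x0]
    for _ in range(n_iter):
        xs.append(f(g(xs[-1]), eps))
    return xs

eps = 0.45
traj = iterate(eps, x0=1.0)
x_inf = traj[-1]

# Central finite difference of the potential at the limiting point:
# it should vanish if the limit is a stationary point of U_s.
h = 1e-6
dU = (potential(x_inf + h, eps) - potential(x_inf - h, eps)) / (2 * h)
print(f"limit x = {x_inf:.6f}, U_s'(x) ~ {dU:.2e}")
```

For this choice of $\epsilon$ the trajectory decreases to a non-zero limit, and the finite-difference derivative of the assumed potential at that limit is numerically zero.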

We define $x_{\text{\tiny good}}(\epsilon)$ as the fixed point of the recursion (61) that is reached with an initialization $x^{(0)}=0$. Furthermore, the algorithmic threshold (here, we mean the algorithmic threshold of the message passing algorithm underlying the recursion) is defined as

[TABLE]
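As a hedged illustration of the algorithmic threshold, the sketch below locates it by bisection for one concrete system. It assumes the (3,6)-regular LDPC ensemble on the BEC with $f(y;\epsilon)=\epsilon y^{2}$, $g(x)=1-(1-x)^{5}$ and $x_{\text{\tiny good}}=0$, and it reads the threshold as the largest $\epsilon$ for which the recursion started from the top of the interval still reaches $x_{\text{\tiny good}}$. The helper names, iteration budget, and tolerance are illustrative choices.

```python
# Illustrative assumption: (3,6)-regular LDPC ensemble on the BEC.
def f(y, eps):
    return eps * y ** 2

def g(x):
    return 1.0 - (1.0 - x) ** 5

def converges_to_good(eps, n_iter=20000, tol=1e-9):
    """True if the recursion started from x = 1 falls below tol (x_good = 0)."""
    x = 1.0
    for _ in range(n_iter):
        x = f(g(x), eps)
        if x < tol:
            return True
    return False

def algorithmic_threshold(lo=0.3, hi=0.6, steps=30):
    """Bisection on eps: `lo` always converges, `hi` never does."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if converges_to_good(mid):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

eps_a = algorithmic_threshold()
print(f"estimated algorithmic threshold: {eps_a:.4f}")
```

The finite iteration budget slightly underestimates the threshold, because the recursion slows down dramatically just below it; the bias shrinks as `n_iter` grows.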

The monotonicity of $f$ and $g$ implies that, for $\epsilon<\epsilon_{a}$, the basin of attraction of $x_{\text{\tiny good}}$ is the whole interval $[0,x_{\text{\tiny max}}(\epsilon)]$. Moreover, $x_{\text{\tiny good}}$ is the unique stationary point of the potential function and is a minimum since it is an attractive fixed point. For $\epsilon>\epsilon_{a}$ we have $x^{(\infty)}\neq x_{\text{\tiny good}}$ and we set $x^{(\infty)}=x_{\text{\tiny bad}}$ (where $x_{\text{\tiny bad}}$ depends on $\epsilon$). Note that this is an attractive fixed point and is thus a (local) minimum of $U_{s}(x)$. The two attractive fixed points are separated by at least one unstable fixed point $x_{\text{\tiny unst}}$, which is a local maximum of $U_{s}(x)$. We henceforth assume that there are no fixed points other than $x_{\text{\tiny good}}$, $x_{\text{\tiny unst}}$, and $x_{\text{\tiny bad}}$. With this assumption in mind, we define the energy gap as

$$\Delta E=U_{s}(x_{\text{\tiny bad}})-U_{s}(x_{\text{\tiny good}}),\qquad(64)$$

and the potential threshold as the unique value $\epsilon_{\text{\tiny pot}}$ such that $\Delta E=0$ (it can be shown that $\Delta E$ is non-increasing in $\epsilon$).
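The energy gap and the potential threshold can be evaluated numerically for a concrete system. The sketch below is an illustration under assumptions, not the procedure used in the paper: it takes the (3,6)-regular LDPC ensemble on the BEC with $f(y;\epsilon)=\epsilon y^{2}$, $g(x)=1-(1-x)^{5}$ and $x_{\text{\tiny good}}=0$, assumes the potential form $U_{s}(x)=x\,g(x)-G(x)-F(g(x))$ of [21], and finds the zero of $\Delta E$ by bisection. All function names and bracketing values are hypothetical.

```python
# Illustrative assumption: (3,6)-regular LDPC ensemble on the BEC.
def f(y, eps):
    return eps * y ** 2

def g(x):
    return 1.0 - (1.0 - x) ** 5

def potential(x, eps):
    # Assumed form U_s(x) = x g(x) - G(x) - F(g(x)) with closed-form integrals.
    G = x - (1.0 - (1.0 - x) ** 6) / 6.0
    F = eps * g(x) ** 3 / 3.0
    return x * g(x) - G - F

def bad_fixed_point(eps, n_iter=5000):
    """Limit of the recursion x <- f(g(x)) started from x = 1."""
    x = 1.0
    for _ in range(n_iter):
        x = f(g(x), eps)
    return x

def energy_gap(eps):
    # Delta E = U_s(x_bad) - U_s(x_good), with x_good = 0 for this system.
    return potential(bad_fixed_point(eps), eps) - potential(0.0, eps)

def potential_threshold(lo=0.44, hi=0.60, steps=40):
    """Bisection for Delta E = 0; the gap is positive at lo and negative at hi."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if energy_gap(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

eps_pot = potential_threshold()
print(f"estimated potential threshold: {eps_pot:.4f}")
```

The bracket `[0.44, 0.60]` is chosen above the algorithmic threshold of this ensemble so that $x_{\text{\tiny bad}}$ exists over the whole search interval.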

The corresponding spatially coupled recursions are obtained by placing $L_{c}+w$ replicas of the single system on the spatial positions $z\in\{-w+1,\dots,L_{c}\}$ and coupling them with a uniform coupling window of size $w$ . The coupled recursion takes the form

[TABLE]

Motivated by the phenomenology observed in many examples (e.g. for the BEC or for compressive sensing), in order to study the stationary phase where a soliton appears, we fix the boundary conditions as $x_{z}^{(t)}=x_{\text{\tiny good}}$ for $z\in\{-w+1,\dots,-1\}$ and all $t\in\mathbb{N}$. The initialization of the recursion is $x_{z}^{(0)}=x_{\text{\tiny bad}}$ for $z\in\{0,\dots,L_{c}\}$. The corresponding potential functional is given by

[TABLE]

where $\mathbf{x}=(x_{-w+1},\dots,x_{L_{c}})$ . The fixed point equation (65) can be obtained by setting the derivative with respect to $\mathbf{x}$ of the potential $U_{c}(\mathbf{x})$ to zero.

The spatially coupled recursions (65) display the threshold saturation property. Namely, for all $\epsilon<\epsilon_{\text{\tiny pot}}$ the fixed point $x_{z}^{(\infty)}$, $z=-w+1,\dots,L_{c}$, of the recursion (65) is equal to a constant profile $x_{\text{\tiny good}}$. In the remainder of this section, we consider the range $\epsilon\in[\epsilon_{a},\epsilon_{\text{\tiny pot}}]$. It is for these values of the parameter $\epsilon$ that a soliton propagating at finite speed is observed, after a transient phase lasting only for a few iterations. The soliton is a kink with a front at position $z_{\text{\tiny front}}$, making a quick transition of width $O(2w)$ between the two values $x_{z}^{(t)}\approx x_{\text{\tiny good}}$ for $z\ll z_{\text{\tiny front}}$ and $x_{z}^{(t)}\approx x_{\text{\tiny bad}}$ for $z\gg z_{\text{\tiny front}}$.
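The kink and its motion can be observed in a small simulation. The sketch below is an illustration under several assumptions: it uses the (3,6)-regular LDPC ensemble on the BEC ($f(y;\epsilon)=\epsilon y^{2}$, $g(x)=1-(1-x)^{5}$, $x_{\text{\tiny good}}=0$); it assumes the uniform-window coupled update $x_{z}\leftarrow\frac{1}{w}\sum_{k=0}^{w-1}f\big(\frac{1}{w}\sum_{j=0}^{w-1}g(x_{z+j-k})\big)$ of [21]; and it pins the positions beyond the right end to $x_{\text{\tiny bad}}$, which is a choice made here for convenience. The front is located by counting positions below $x_{\text{\tiny bad}}/2$, and all names and parameter values are hypothetical.

```python
# Illustrative assumption: (3,6)-regular LDPC ensemble on the BEC.
def f(y, eps):
    return eps * y ** 2

def g(x):
    return 1.0 - (1.0 - x) ** 5

def uncoupled_bad_fixed_point(eps, n_iter=5000):
    x = 1.0
    for _ in range(n_iter):
        x = f(g(x), eps)
    return x

def sliding_mean(values, w):
    """Means of all length-w windows of `values`."""
    return [sum(values[i:i + w]) / w for i in range(len(values) - w + 1)]

def coupled_step(x, eps, w, x_left, x_right):
    # Pad with the pinned boundary values (w - 1 positions on each side).
    padded = [x_left] * (w - 1) + x + [x_right] * (w - 1)
    y = sliding_mean([g(v) for v in padded], w)     # inner average of g
    return sliding_mean([f(v, eps) for v in y], w)  # outer average of f

def front_position(x, x_bad):
    """Number of positions already below half of x_bad."""
    return sum(1 for v in x if v < 0.5 * x_bad)

eps, w, length = 0.45, 3, 300
x_bad = uncoupled_bad_fixed_point(eps)
x = [x_bad] * length

for _ in range(100):                 # transient phase
    x = coupled_step(x, eps, w, 0.0, x_bad)
start = front_position(x, x_bad)

n_meas = 300
for _ in range(n_meas):              # stationary phase
    x = coupled_step(x, eps, w, 0.0, x_bad)
velocity = (front_position(x, x_bad) - start) / n_meas

print(f"empirical velocity ~ {velocity:.3f} positions per iteration")
```

Dividing `velocity` by `w` gives a normalized velocity of the kind reported in the figures; the integer front count makes the estimate coarse, so longer measurement windows give smoother values.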

The simplest example to keep in mind for the setting described above, as well as for the next paragraph, is the case of LDPC $(\lambda,\rho)$ codes with transmission over the BEC $(\epsilon)$ where $f(x;\epsilon)=\epsilon\lambda(x)$ and $g(x;\epsilon)=\rho(x)$ and $U_{s}(x)$ is equal to (12). Here $\epsilon_{a}=\epsilon_{\text{\tiny BP}}$ , $\epsilon_{\text{\tiny pot}}=\epsilon_{\text{\tiny MAP}}$ , $x_{\text{\tiny good}}=0$ and $x_{\text{\tiny bad}}=x_{\text{\tiny BP}}$ is the non-trivial BP fixed point.

VII-B Continuum limit and velocity formula for scalar systems

We consider the system in the limit $L_{c}\gg w\gg 1$ and formulate a continuum approximation. The coupled recursion (65) becomes

[TABLE]

We take the boundary condition $x(z,t)\to x_{\text{\tiny good}}$ when $z\to-\infty$ and $x(z,t)\to x_{\text{\tiny bad}}$ when $z\to+\infty$ . This boundary condition captures the profiles obtained after the transient phase has passed, and is well adapted to the study of the soliton propagation.

Velocity formula for scalar systems. As before, we assume that there exists a constant $v>0$ such that, for $t\to+\infty$ and $|z-vt|=O(1)$ , the profile $x(z,t)\to X(z-vt)$ , where $X(z)$ is independent of the initial condition $x(z,0)$ and satisfies $\lim_{z\to-\infty}X(z)=x_{\text{\tiny good}}$ , $\lim_{z\to+\infty}X(z)=x_{\text{\tiny bad}}$ . Under this assumption, the velocity of the soliton is

[TABLE]

where the shape $X(z)$ satisfies

[TABLE]

VII-C Derivation of the velocity formula (68)

The derivation of (68) follows closely that in Section IV-B, so we will be quite brief. The first step is to introduce a continuum version of $U_{c}(\mathbf{x})$, which we call $\Delta\mathcal{U}(x(\cdot,\cdot))$. We define $x_{0}(z)$ as a static (time-independent) profile that satisfies the boundary conditions $x_{0}(z)\to x_{\text{\tiny good}}$ when $z\to-\infty$ and $x_{0}(z)\to x_{\text{\tiny bad}}$ when $z\to+\infty$ (for example, one may take a Heaviside step function). It serves as a reference profile that makes the integrals in the following expression well defined:

[TABLE]

As long as $x(z,t)$ and $x_{0}(z)$ converge to their limiting values fast enough, the integrals over the spatial axis are well defined. Evaluating the functional derivative (defined as $\lim_{\gamma\to 0}\gamma^{-1}(\Delta\mathcal{U}(x(\cdot,\cdot)+\gamma\eta(\cdot,\cdot))-\Delta\mathcal{U}(x(\cdot,\cdot)))$) of $\Delta\mathcal{U}[x(\cdot,\cdot);\epsilon]$ in an arbitrary direction $\eta(\cdot,\cdot)$, we find that (67) is equivalent to a gradient descent equation

[TABLE]

Now we use the ansatz $x(z,t)\to X(z-vt)$ and apply (70) for the special direction $\eta(z,t)=X^{\prime}(z-vt)$ . Using also the approximation $X(z-vt)\approx X(z)-vX^{\prime}(z)$ for small $v$ we get (after a change of variables $z\to z+vt$ )

[TABLE]

We then proceed to compute the right-hand side of (70). We split the potential functional into two parts: the “single system potential” $\mathcal{U}_{s}(x(\cdot,\cdot))$ that remains if we ignore the coupling effect, and the “interaction potential” $\mathcal{U}_{i}(x(\cdot,\cdot))$ that captures the effect of coupling. That is, $\Delta\mathcal{U}=\mathcal{U}_{s}+\mathcal{U}_{i}$ , with

[TABLE]

The computation of each functional derivative at $x(z,t)\to X(z-vt)$ in the direction $X^{\prime}(z-vt)$ yields

[TABLE]

and

[TABLE]

Replacing (72) and (73) in (70), we obtain the velocity formula (68).

VII-D Generalized LDPC (GLDPC) Codes

A GLDPC code is a code represented by a bipartite graph in which the rules of the check nodes are not given by single parity checks (as in usual LDPC codes) but by a primitive BCH code. An attractive property of BCH codes is that they can be designed to correct a chosen number of errors. For instance, one can design a BCH code so that it corrects all patterns of at most $e$ erasures on the BEC, and all error patterns of weight at most $e$ on the binary symmetric channel (BSC). We consider a GLDPC code with degree-2 variable nodes and degree-$n$ check nodes whose rules are given by a primitive BCH code of blocklength $n$.

We give a short description of a BCH code of blocklength $n$ and minimum distance $d=2e+1$ (see [34] for more details). A BCH code is a cyclic code over a finite field GF($b^{\beta}$) where $b$ is a prime power and $\beta$ is an integer. Let $a$ be a primitive element of GF($b^{\beta}$). Then each nonzero element of GF($b^{\beta}$) can be written in the form $a^{i}$, $i\in\mathbb{N}$. For each element $a^{i}$ we can define a minimal polynomial $m_{i}(\cdot)$, which is the monic polynomial over GF($b$) of smallest degree that has $a^{i}$ as a root. The generator polynomial $\theta(\cdot)$ over GF($b$) of the BCH code is defined as the least common multiple of $m_{1}(\cdot),\dots,m_{d}(\cdot)$.

Consider transmission on the BEC or BSC and denote by $\epsilon$ the channel parameter. The density evolution recursions have been derived in [35] for both channels, based on a bounded distance decoder for the BCH code. For $n$ and $e$ fixed, we can write the update equations of the message passing algorithm as (61) with

[TABLE]

Here, we have $\epsilon_{\text{\tiny max}}=x_{\text{\tiny max}}=y_{\text{\tiny max}}=1$ . Moreover, one checks easily by differentiation (with respect to $x$ ) that the potential function $U_{\text{\tiny GLDPC}}(x)$ of the system is given by

[TABLE]

For numerical implementation purposes, it is useful to note that

[TABLE]

where $B(a,b)=\frac{(a-1)!(b-1)!}{(a+b-1)!}$ denotes the Beta function, and that $g(x)$ is equal to the regularized incomplete Euler Beta function so that

[TABLE]
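The connection between the Beta function and the check node update can be verified numerically. The sketch below is an illustration that rests on an assumption: it takes $g(x)$ to be the binomial tail $\sum_{i=e}^{n-1}\binom{n-1}{i}x^{i}(1-x)^{n-1-i}$, which is the natural form for a bounded distance decoder correcting up to $e$ erasures, and compares it with the regularized incomplete Beta function $I_{x}(e,n-e)$ computed by a plain Simpson rule. The factorial formula for $B(a,b)$ is the one quoted in the text; the function names and the quadrature are illustrative choices.

```python
from math import comb, factorial

def beta_fn(a, b):
    """Beta function for positive integers: (a-1)!(b-1)!/(a+b-1)!."""
    return factorial(a - 1) * factorial(b - 1) / factorial(a + b - 1)

def reg_inc_beta(x, a, b, n_steps=2000):
    """I_x(a, b) = (1/B(a,b)) * int_0^x s^(a-1) (1-s)^(b-1) ds (composite Simpson)."""
    def integrand(s):
        return s ** (a - 1) * (1.0 - s) ** (b - 1)
    h = x / n_steps  # n_steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(x)
    for k in range(1, n_steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    return total * h / 3.0 / beta_fn(a, b)

def g_binomial_tail(x, n, e):
    """Assumed check update: probability that at least e of n-1 inputs are erased."""
    return sum(comb(n - 1, i) * x ** i * (1.0 - x) ** (n - 1 - i)
               for i in range(e, n))

n, e = 15, 3
for x in (0.1, 0.35, 0.8):
    tail = g_binomial_tail(x, n, e)
    beta = reg_inc_beta(x, e, n - e)
    print(f"x = {x}: binomial tail = {tail:.10f}, I_x(e, n-e) = {beta:.10f}")
```

In practice a library routine for the regularized incomplete Beta function would replace the hand-written quadrature; the pure-Python version is used here only to keep the example dependency-free.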

This potential has $x=0$ as a trivial stationary point (equivalently, this is a trivial fixed point of DE as can be seen from the expressions of $f$ and $g$ ) and develops a non-trivial minimum at $x_{\text{\tiny BP}}\neq 0$ for $\epsilon>\epsilon_{\text{\tiny BP}}$ . Note that $\epsilon_{\text{\tiny BP}}$ is found as usual as the first horizontal inflexion point. The MAP threshold is given by $\epsilon_{\text{\tiny MAP}}$ such that $U_{\text{\tiny GLDPC}}(x_{\text{\tiny BP}})=U_{\text{\tiny GLDPC}}(0)=0$ .

The formula for the velocity of the soliton appearing for coupled GLDPC codes is found from (68). The energy gap for $\epsilon_{\text{\tiny BP}}\leq\epsilon\leq\epsilon_{\text{\tiny MAP}}$ is now $\Delta E=U_{\text{\tiny GLDPC}}(x_{\text{\tiny BP}})-U_{\text{\tiny GLDPC}}(0)$. Figure 10 shows the velocities (normalized by $w$) for the spatially coupled GLDPC code with $n=15$ and $e=3$, when the coupling parameters satisfy $L_{c}+w=500$ and $w=3$. We plot the velocities for $\epsilon\in[\epsilon_{\text{\tiny BP}},\epsilon_{\text{\tiny MAP}}]=[0.348,0.394]$. We observe that the formula for the velocity provides a very good estimate of the empirical velocity $v_{e}$.

VII-E Compressive Sensing

Let $\mathbf{s}$ be a length- $n$ signal vector where the components are i.i.d. copies of a random variable $S$ . We take $m$ linear measurements of the signal and assume that the measurement matrix has i.i.d Gaussian elements distributed like $\mathcal{N}(0,1/\sqrt{n})$ . We define $\delta=m/n$ as the measurement ratio and fix it to a constant value when $n\to\infty$ . The relation between $\delta$ and the parameter $\epsilon$ defined in Section VII-A is $\epsilon=\delta^{-1}$ . We assume that the power of the variable $S$ is normalized to 1; that is, $\mathbb{E}[S^{2}]=1$ . We also assume that each component of the signal $\mathbf{s}$ is corrupted by independent Gaussian noise of variance $\sigma^{2}=1/\mathtt{snr}$ . To recover $\mathbf{s}$ one implements the so-called approximate message passing (AMP) algorithm.

It is well known that the analysis of the AMP algorithm is given by state evolution [10]. Let $Y=\sqrt{\mathtt{snr}}\,S+Z$ where $Z\sim\mathcal{N}(0,1)$, let $\hat{S}(Y)=\mathbb{E}_{S|Y}[S|Y]$ be the minimum mean square error estimator, and set

[TABLE]

for the $\mathtt{mmse}$ function. The state evolution equations (which track the mean squared error of the AMP estimate) then correspond to the recursion (61) with

[TABLE]

Here $x$ is interpreted as the mean square error predicted by the AMP estimate of the signal. State evolution is initialized with $x=1$ which corresponds to no knowledge about the signal. We will take $\delta$ as the control parameter. Note that we have $\delta_{\text{\tiny max}}=1$ , $x_{\text{\tiny max}}=\mathtt{mmse}(0)$ , $y_{\text{\tiny max}}=g(x_{\text{\tiny max}})$ . The potential function is equal to

[TABLE]

where $I(A;B)$ is the mutual information between two random variables $A$ and $B$ . To check that this potential gives back the correct state evolution equation as a stationarity condition, we simply differentiate it with respect to $x$ thanks to the well known relation $\frac{\mathrm{d}}{\mathrm{d}\mathtt{snr}}I(S;\sqrt{\mathtt{snr}}S+Z)=\frac{1}{2}\mathtt{mmse}(\mathtt{snr})$ .
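The relation between mutual information and $\mathtt{mmse}$ can be checked in a case where both sides are known in closed form. The sketch below assumes a standard Gaussian prior $S\sim\mathcal{N}(0,1)$ purely for illustration (it is not the Bernoulli-Gaussian prior used for the figures), for which $I(S;\sqrt{\mathtt{snr}}S+Z)=\frac{1}{2}\ln(1+\mathtt{snr})$ in nats and $\mathtt{mmse}(\mathtt{snr})=1/(1+\mathtt{snr})$. The function names are hypothetical.

```python
from math import log

def mmse_gauss(snr):
    """MMSE of estimating S ~ N(0,1) from sqrt(snr) * S + Z, Z ~ N(0,1)."""
    return 1.0 / (1.0 + snr)

def mutual_info_gauss(snr):
    """Mutual information I(S; sqrt(snr) * S + Z) in nats for S ~ N(0,1)."""
    return 0.5 * log(1.0 + snr)

def derivative(fn, s, h=1e-5):
    """Central finite-difference approximation of fn'(s)."""
    return (fn(s + h) - fn(s - h)) / (2.0 * h)

for snr in (0.1, 1.0, 10.0):
    lhs = derivative(mutual_info_gauss, snr)
    rhs = 0.5 * mmse_gauss(snr)
    print(f"snr = {snr}: dI/dsnr = {lhs:.8f}, mmse/2 = {rhs:.8f}")
```

For a non-Gaussian prior such as the Bernoulli-Gaussian one, both quantities would have to be evaluated by numerical integration over $Y$, but the same finite-difference check applies.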

To illustrate the potential function in a concrete case we take the Bernoulli-Gaussian distribution as the prior distribution over the signal components

[TABLE]

Figure 11 shows the potential function for $\rho=0.1$ , $\mathtt{snr}=10^{5}$ , and several values of $\delta$ (the measurement fraction). We observe that, for $\delta>\delta_{\text{\tiny AMP}}=0.208$ , there is a unique minimum $x_{\text{\tiny good}}$ which is a fixed point of state evolution when it is initialized to $x=1$ . In this phase, there are enough measurements so that the reconstruction of the signal is good and the mean square error is small. At $\delta_{\text{\tiny AMP}}=0.208$ a horizontal inflexion point develops in the potential function. For $\delta<\delta_{\text{\tiny AMP}}$ a second minimum appears at a higher mean square error $x_{\text{\tiny bad}}$ and the reconstruction of the AMP algorithm is bad. The optimal threshold corresponding to the minimum mean square error estimator is found when $\delta$ is such that the two minima of the potential are at the same height, namely $\delta_{\text{\tiny opt}}$ is given by the solution of the equation $U_{\text{\tiny SC}}(x_{\text{\tiny bad}})=U_{\text{\tiny SC}}(x_{\text{\tiny good}})$ . For our example one finds $\delta_{\text{\tiny opt}}=0.157$ . This threshold is reached by the AMP algorithm on the spatially coupled system.

Fix $\delta\in[\delta_{\text{\tiny opt}},\delta_{\text{\tiny AMP}}]=[0.157,0.208]$. In this regime spatially coupled state evolution develops a soliton which represents the profile of mean square error along the spatial direction. The formula for the velocity $v_{\text{\tiny CS}}$ of this soliton is obtained from (68), where the energy gap is now $\Delta E=U_{\text{\tiny SC}}(x_{\text{\tiny bad}})-U_{\text{\tiny SC}}(x_{\text{\tiny good}})$. Figure 12 shows the velocities (normalized by $w$) for the spatially coupled compressive sensing system with $\mathtt{snr}=10^{5}$, $\rho=0.1$, and coupling parameters satisfying $L_{c}+w=250$ and $w=4$, over the whole range $\delta\in[0.157,0.208]$. The formula for the velocity provides a good estimate of the empirical velocity $v_{e}$.

VIII Conclusion

Our formulas for the velocities of the solitons that appear in spatially coupled codes for BMS channels and for general spatially coupled scalar systems (e.g. compressive sensing) rest on the approximation of the discrete system by a continuum one. We believe that this approximation becomes exact in an asymptotic limit of infinite spatial length and window size (keeping the order $L_{c}\gg w\gg 1$ ). It is an interesting open problem to quantify the quality of this approximation already for $L_{c}$ infinite and $w$ finite but large. The numerical results tend to indicate that the approximation is already quite good for small values of $w$ , when it is of the order of a few positions.

Another important and interesting open question concerns the proof of the ansatz, namely proving that the shape of the soliton is unique and independent of the initial condition on the profile. Settling this question would show that the velocity of the soliton is independent of the size of the seed that initiates decoding or signal reconstruction at the boundaries.

The formulas for the velocity involve the whole shape of the soliton in the denominator. For optimization purposes it would be desirable to have a good approximation (or bound) on the denominator that would involve only primitive quantities related to the underlying uncoupled ensemble (such as the degree distributions, the single system potential, etc). Such an approximation scheme has been proposed in [25] for the special case of the transmission over the BEC where it works quite well close to the MAP threshold. It would be desirable to find an extension to the more general situations considered in the present paper.

Appendix A Functional derivatives

In this appendix we derive equations (21) and (32).

A-A Derivation of Equ. (21)

We calculate the functional derivative of $\Delta\mathcal{W}(\mathtt{x})$ in a direction $\mathtt{\eta}(z,t)$ as follows.

[TABLE]

where the function $P(\cdot,\cdot)$ is defined in (19). Then, taking the derivative with respect to $\gamma$ yields

[TABLE]

We notice that the first and third terms in the integral cancel out due to the commutativity of the operator $\boxast$ . By rearranging the averaging functions in the last term, we obtain

[TABLE]

By noticing that $\mathtt{y}=\int_{0}^{1}\mathrm{d}u\,\mathtt{c}(z-u)\varoast\lambda^{\varoast}(\int_{0}^{1}\mathrm{d}s\,\rho^{\boxast}(\mathtt{x}(z-u+s,t)))$ is a probability measure and $\mathtt{a}=\rho^{\prime\boxast}(\mathtt{x}(z,t))\boxast\mathtt{\eta}(z,t)$ is a difference of probability measures, we can use the second duality rule in (4) $H(\mathtt{y}\boxast\mathtt{a})+H(\mathtt{y}\varoast\mathtt{a})=H(\mathtt{a})$ to rewrite the above as, freely using the commutativity of $\boxast$ ,

[TABLE]

A-B Derivation of Equ. (32)

We calculate the functional derivative of $\mathcal{W}_{s}(\mathtt{X})$ in the direction $\mathtt{X}^{\prime}(z)$ as follows.

[TABLE]

We notice here that on the right side of the last equality, the first and third terms under the integral cancel out. Using the second duality rule in (4), and noticing that $\mathtt{X}(z)$ is a probability measure and $\rho^{\prime\boxast}(\mathtt{X}(z))\boxast\mathtt{X}^{\prime}(z)$ is a difference of probability measures, we can rewrite the functional derivative as

[TABLE]

Appendix B Derivation of expression (58)

Our goal in this appendix is to show that (57) reduces to (58) when $\delta\to 0$ . We first reorganize (57) as follows (up to multiplication by $1/\delta^{2}$ )

[TABLE]

When we Taylor expand each entropy $\psi(\cdots)$ around

[TABLE]

to second order we observe that the first order terms cancel and what remains is

[TABLE]

This is equal to

[TABLE]

Next, we write $\psi^{-1}(1-p(z+\delta))$ as

[TABLE]

and Taylor expand $\psi^{-1}(\cdots)$ around $1-p(z)$ to obtain

[TABLE]

Multiplying by $1/\delta^{2}$ , taking the limit $\delta\to 0$ , and using the relation $(\psi^{-1})^{\prime}(\cdots)=1/(\psi^{\prime}(\psi^{-1}(\cdots)))$ we obtain

[TABLE]

Acknowledgment

The authors would like to thank Ruediger Urbanke and Markus Stinner for discussions and suggestions on applications to finite-length scaling laws, and Tongxin Li for interactions on compressive sensing during his summer internship at EPFL.

Bibliography


[1] A. J. Felstrom and K. S. Zigangirov, “Time-varying periodic convolutional codes with low-density parity-check matrix,” IEEE Transactions on Information Theory, pp. 2181–2190, 1999.
[2] S. Kudekar, T. Richardson, and R. Urbanke, “Spatially coupled ensembles universally achieve capacity under belief propagation,” IEEE Transactions on Information Theory, vol. 59, no. 12, pp. 7761–7813, 2013.
[3] S. Kumar, A. J. Young, N. Macris, and H. D. Pfister, “Threshold saturation for spatially coupled LDPC and LDGM codes on BMS channels,” IEEE Transactions on Information Theory, vol. 60, no. 12, pp. 7389–7415, 2014.
[4] M. Lentmaier, A. Sridharan, K. S. Zigangirov, and D. J. Costello Jr, “Terminated LDPC convolutional codes with thresholds close to capacity,” in International Symposium on Information Theory Proceedings (ISIT). IEEE, 2005, pp. 1372–1376.
[5] M. Lentmaier, A. Sridharan, D. J. Costello Jr, and K. Zigangirov, “Iterative decoding threshold analysis for LDPC convolutional codes,” IEEE Transactions on Information Theory, vol. 56, no. 10, pp. 5274–5289, 2010.
[6] V. Aref, N. Macris, R. Urbanke, and M. Vuffray, “Lossy source coding via spatially coupled LDGM ensembles,” in Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on. IEEE, 2012, pp. 373–377.
[7] V. Aref, N. Macris, and M. Vuffray, “Approaching the Rate-Distortion Limit With Spatial Coupling, Belief Propagation, and Decimation,” IEEE Transactions on Information Theory, vol. 61, no. 7, pp. 3954–3979, 2015.
[8] S. Kudekar and H. D. Pfister, “The effect of spatial coupling on compressive sensing,” in 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton). IEEE, 2010, pp. 347–353.