Arithmetic Coding Based Multi-Composition Codes for Bit-Level Distribution Matching
Marcin Pikus, Wen Xu

TL;DR
This paper introduces multi-composition codes and an arithmetic coding scheme to enhance distribution matching efficiency, especially for short message lengths, outperforming constant-composition methods in rate and divergence.
Contribution
It proposes a novel multi-composition coding approach with an efficient arithmetic coding scheme, improving performance over traditional CCDM for short blocks.
Findings
MCDM encodes more data than CCDM.
Lower KL divergence achieved with MCDM.
Enhanced performance for short block messages.
Abstract
A distribution matcher (DM) encodes a binary input data sequence into a sequence of symbols (codeword) with desired target probability distribution. The set of the output codewords constitutes a codebook (or code) of a DM. Constant-composition DM (CCDM) uses arithmetic coding to efficiently encode data into codewords from a constant-composition (CC) codebook. The CC constraint limits the size of the codebook, and hence the coding rate of the CCDM. The performance of CCDM degrades with decreasing output length. To improve the performance for short transmission blocks we present a class of multi-composition (MC) codes and an efficient arithmetic coding scheme for encoding and decoding. The resulting multi-composition DM (MCDM) is able to encode more data into distribution matched codewords than the CCDM and achieves lower KL divergence, especially for short block messages.
Marcin Pikus¹,² and Wen Xu¹
¹ Huawei Technologies, Munich Research Center, Riesstr. 25, 80992 Munich, Germany
² Institute for Communications Engineering, Technische Universität München, Arcisstr. 21, 80290 Munich, Germany
M. Pikus is with the Huawei Technologies Duesseldorf GmbH, D-80992 Munich, Germany, and also with the Institute for Communications Engineering, Technische Universität München, D-80333 Munich, Germany (e-mail: [email protected]). W. Xu is with the Huawei Technologies Duesseldorf GmbH, D-80992 Munich, Germany (e-mail: [email protected]).
Index Terms:
distribution matching, arithmetic coding, multi-composition, probabilistic shaping.
I Introduction
A distribution matcher (DM) reversibly maps a sequence of independent and uniformly distributed bits $\mathbf{b}$ into a sequence of symbols $\mathbf{c}$ to emulate a target distribution $P_A$. The output of the DM approximates a sequence of independent and identically distributed (IID) symbols, each distributed according to $P_A$. The accuracy of the approximation is measured by the Kullback–Leibler (KL) divergence between the probability distribution of the DM's output and the probability distribution of the IID sequence. An inverse distribution matcher (DM$^{-1}$) performs the inverse operation, recovering $\mathbf{b}$ from $\mathbf{c}$. DMs can be used in communication systems, such as probabilistic amplitude shaping (PAS) [1], to adjust the distribution of transmitted symbols to a distribution beneficial for a certain channel, e.g., a distribution achieving capacity or reducing the peak-to-average-power ratio. PAS was recently proposed for the 5G mobile system [2].
We focus here on block-to-block (b2b) DMs, where the input and output sequences have fixed lengths denoted by $k$ and $n$, respectively. The ratio $R = k/n$ is called the matching rate. Variable-length DMs, i.e., non-b2b DMs, lead to synchronization problems, error propagation, and variable transmission rate [3]. A DM can be seen as an encoder which maps $k$ input bits to a non-binary codeword of length $n$. The set of possible codewords, i.e., the codebook (or code), is chosen such that a certain output distribution is emulated. To achieve good performance, i.e., a high matching rate and a low normalized KL divergence, relatively long output sequences are needed [4]. Long sequences imply a large codebook which may not fit in memory. DMs which need to store their codebooks, e.g., look-up-table mappings, thus have limited performance.
A Constant-Composition DM (CCDM) [4] is a DM which employs arithmetic coding to generate codewords on-the-fly. In this way, a large codebook can be used without the need for storage. The CCDM is asymptotically optimal for output length $n \to \infty$, i.e., it achieves the maximal matching rate and vanishing normalized KL divergence. For finite $n$, however, the CCDM suffers from rate loss and high divergence, both of which increase with decreasing $n$. The CCDM uses a constant-composition (CC) codebook, which constrains each codeword to have a fixed number of occurrences of each of the symbols from the alphabet, i.e., each codeword has the same composition. The CC constraint contributes to increased KL divergence and decreased matching rate by limiting the number of codewords. It is therefore of interest to improve the DM performance for smaller values of $n$ for at least two reasons: 1) a low-performance DM, e.g., one with high rate loss, can negate the benefits of PAS; 2) throughput and parallelization: it allows one DM with a long output length to be replaced by several parallel DMs with shorter output lengths without a significant decrease in performance. This leads to increased throughput of PAS systems, as the DM is the throughput-limiting component in the PAS transmission chain.
In this work, we introduce Multi-Composition (MC) codes, which can be used to build an MC DM (MCDM). An MC codebook is a codebook which contains codewords of multiple compositions. By a proper construction of the MC codebook, the MCDM is able to employ an arithmetic coding algorithm to generate codewords from the MC codebook on-the-fly. The MCDM generalizes the CCDM, and the aforementioned asymptotic optimality of the CCDM holds for the MCDM. By relaxing the CC constraint, we obtain a DM which achieves a higher matching rate and a lower KL divergence for any output length $n$. Recently, two other solutions have also been proposed in the literature, i.e., partition-based DM [5] and shell mapping DM (SMDM) [6]. Roughly speaking, our MCDM is able to index more codewords than [5], thus leading to better performance. The proposed MCDM can also be implemented with low-complexity arithmetic coding requiring computational complexity [7], and can operate with $k$ and $n$ of arbitrary length. For the binary case, we present an MCDM whose performance can approach that of the optimal b2b DM.
The SMDM is an optimal DM but, owing to its complexity, it is suitable only for short output lengths [6]. The CCDM is asymptotically optimal for long output lengths [4]. The proposed MCDM is based on arithmetic coding, just as the CCDM is. This way, one algorithm (with different parameters) can be used for both short and long sequences, which reduces the complexity of the system.
This work is organized as follows. In Sec. II we introduce distribution matching and the CCDM. In Sec. III we briefly describe how arithmetic coding can be applied for distribution matching. In Sec. IV we present the MCDM, which can be efficiently implemented with the arithmetic coding scheme. Simulation results are given in Sec. V. Finally, conclusions are drawn in Sec. VI.
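We use the following notation. We denote random variables (RVs) by capital letters, such as $X$, and realizations by small letters, such as $x$. A row vector is denoted by a bold symbol, e.g., $\mathbf{x}$. The $i$-th entry in the vector $\mathbf{x}$ is denoted by $x_i$, and a subvector of $\mathbf{x}$ is denoted by $\mathbf{x}_i^j = [x_i, x_{i+1}, \ldots, x_j]$. The length of a vector is denoted by $\ell(\mathbf{x})$. E.g., we have $\ell(\mathbf{x}_i^j) = j - i + 1$. A RV uniformly distributed on a set $\mathcal{S}$ is denoted by $U_{\mathcal{S}}$, i.e., $X \sim U_{\mathcal{S}}$ means that $P_X(x) = 1/|\mathcal{S}|$ for $x \in \mathcal{S}$.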
II Distribution Matching
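A one-to-one b2b DM is an injective function $f_{\mathrm{DM}}$ from binary input sequences $\mathbf{b} \in \{0,1\}^k$ to codewords $\mathbf{c}$ from the codebook $\mathcal{C}$, i.e.,
$$f_{\mathrm{DM}}\colon \{0,1\}^k \to \mathcal{C} \subseteq \mathcal{A}^n,$$
where $\mathcal{A}$ is the output alphabet. We assume that the input sequence is a random vector consisting of IID Bernoulli($1/2$) distributed bits. The output sequence of the DM is thus a random vector $\mathbf{C}$ uniformly distributed on $\mathcal{C}$. The goal of the DM is to make its output "look" as if it were a sequence of IID RVs, each distributed according to the target probability distribution $P_A$. This is usually performed by minimizing the normalized KL divergence between the DM's output and the IID sequence
$$\frac{\mathbb{D}(P_{\mathbf{C}} \,\|\, P_A^n)}{n} = \frac{1}{n} \sum_{\mathbf{c} \in \mathcal{C}} \frac{1}{|\mathcal{C}|} \log_2 \frac{1/|\mathcal{C}|}{P_A^n(\mathbf{c})}. \tag{2}$$
The divergence can thus be minimized by choosing a proper codebook $\mathcal{C}$. The empirical probability of a single symbol output by a DM is defined as
$$\bar{P}(a) = \frac{1}{n\,|\mathcal{C}|} \sum_{\mathbf{c} \in \mathcal{C}} n_a(\mathbf{c}), \quad a \in \mathcal{A},$$
where we use $n_a(\mathbf{c})$ to denote the number of occurrences of $a$ in the sequence $\mathbf{c}$, and $|\mathcal{C}|$ denotes the number of codewords in the codebook (size of the codebook). In the literature, it is often believed that we need $\bar{P} = P_A$ (or $\bar{P} \approx P_A$) to minimize the divergence for finite $n$. However, this may not necessarily be true, as we shall see in Sec. V or as pointed out by the authors in [6, Example 2]. The divergence (2) can be equivalently written as
$$\frac{\mathbb{D}(P_{\mathbf{C}} \,\|\, P_A^n)}{n} = \mathbb{H}(\bar{P}) + \mathbb{D}(\bar{P} \,\|\, P_A) - \frac{\log_2 |\mathcal{C}|}{n}, \tag{4}$$
where $\mathbb{H}(\bar{P})$ is the entropy of a RV with distribution $\bar{P}$. We observe that a necessary condition for vanishing normalized divergence is that $\mathbb{D}(\bar{P} \,\|\, P_A) \to 0$ (since $\mathbb{H}(\bar{P}) - \log_2 |\mathcal{C}| / n \ge 0$); however, the divergence can vanish only for $n \to \infty$ [8]. Thus, we can only expect $\bar{P} \approx P_A$ for large $n$. Equation (4) also suggests that, for a given $\bar{P}$, larger codebooks, i.e., with greater $|\mathcal{C}|$, are preferred.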
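As a numerical sanity check of the two ways of writing the normalized divergence, the following minimal sketch evaluates both for a small binary codebook and compares them. It assumes a codebook given as a list of 0/1 tuples with a uniform distribution over its codewords; the function names and the example codebook are hypothetical and not taken from the paper.

```python
import math
from itertools import product

def normalized_divergence(codebook, p1):
    """Direct form: D(P_C || P_A^n) / n with P_C uniform on the codebook."""
    n, size = len(codebook[0]), len(codebook)
    total = 0.0
    for c in codebook:
        ones = sum(c)
        # log2 of the IID probability P_A^n(c) for a binary target with P_A(1) = p1
        log_pa = ones * math.log2(p1) + (n - ones) * math.log2(1 - p1)
        total += (1 / size) * (math.log2(1 / size) - log_pa)
    return total / n

def normalized_divergence_split(codebook, p1):
    """Split form: H(Pbar) + D(Pbar || P_A) - log2|C| / n."""
    n, size = len(codebook[0]), len(codebook)
    pbar1 = sum(sum(c) for c in codebook) / (n * size)  # empirical P(1)
    pbar, pa = [1 - pbar1, pbar1], [1 - p1, p1]
    entropy = -sum(q * math.log2(q) for q in pbar if q > 0)
    kl = sum(q * math.log2(q / p) for q, p in zip(pbar, pa) if q > 0)
    return entropy + kl - math.log2(size) / n

# Illustrative codebook: all length-4 words of Hamming weight 1, target P_A(1) = 0.25.
codebook = [c for c in product((0, 1), repeat=4) if sum(c) == 1]
direct = normalized_divergence(codebook, 0.25)
split = normalized_divergence_split(codebook, 0.25)
```

In this example the empirical distribution matches the target exactly, yet the divergence is still positive because the codebook is small, which illustrates the role of the codebook-size term.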
In [9] it was shown that non-binary distribution matching, i.e., with $|\mathcal{A}| > 2$, can be well approximated by multiple binary distribution matchings, i.e., with $|\mathcal{A}| = 2$, for typical use cases. For simplicity, we focus on binary distribution matching in the following sections. The MCDM can also be used directly for non-binary distribution matching, as explained in Sec. IV-B.
CCDM was introduced in [4]. It uses a modified coding scheme [7] based on arithmetic coding to efficiently encode data into the codewords from a CC codebook. In the CC codebook each codeword has the same composition.
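Definition 1.
Assume $\mathcal{A} = \{a_1, \ldots, a_{|\mathcal{A}|}\}$. A composition of a vector $\mathbf{c} \in \mathcal{A}^n$ is a vector containing the numbers of occurrences in $\mathbf{c}$ of each of the symbols from the alphabet $\mathcal{A}$. We denote a composition by
$$\mathrm{comp}(\mathbf{c}) = [n_{a_1}(\mathbf{c}), \ldots, n_{a_{|\mathcal{A}|}}(\mathbf{c})],$$
where $n_a(\mathbf{c})$ is the number of occurrences of the symbol $a$ in $\mathbf{c}$.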
Example 1.
.
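That is, the CC codebook with the composition $\boldsymbol{\gamma} = [\gamma_1, \ldots, \gamma_{|\mathcal{A}|}]$ is the set of all length-$n$ sequences over $\mathcal{A}$ whose composition equals $\boldsymbol{\gamma}$,
$$\mathcal{C}_{\boldsymbol{\gamma}} = \{\mathbf{c} \in \mathcal{A}^n : \mathrm{comp}(\mathbf{c}) = \boldsymbol{\gamma}\},$$
and the size of the codebook can be expressed by the multinomial coefficient
$$|\mathcal{C}_{\boldsymbol{\gamma}}| = \binom{n}{\gamma_1, \ldots, \gamma_{|\mathcal{A}|}} = \frac{n!}{\gamma_1! \cdots \gamma_{|\mathcal{A}|}!}.$$
To guarantee a one-to-one mapping between the binary input sequences and the codewords, the CCDM can use at most $2^k$ input sequences of length $k$, where $k$ reads as
$$k = \lfloor \log_2 |\mathcal{C}_{\boldsymbol{\gamma}}| \rfloor,$$
where $\lfloor \cdot \rfloor$ is the floor function.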
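The codebook size and the resulting input length can be computed with exact integer arithmetic. The sketch below assumes a composition given as a list of symbol counts; the function names are hypothetical. It uses `int.bit_length()` to obtain the floor of the base-2 logarithm without floating-point rounding, which matters for the large codebooks that arise at long output lengths.

```python
import math

def cc_codebook_size(composition):
    """Multinomial coefficient n! / (g_1! * ... * g_m!) for composition [g_1, ..., g_m]."""
    size = math.factorial(sum(composition))
    for count in composition:
        size //= math.factorial(count)  # every intermediate quotient is an integer
    return size

def ccdm_input_length(composition):
    """Largest k with 2**k <= codebook size, i.e. floor(log2(size))."""
    return cc_codebook_size(composition).bit_length() - 1

# Illustrative binary composition: 6 zeros and 2 ones (n = 8).
size = cc_codebook_size([6, 2])
k = ccdm_input_length([6, 2])
```

For this illustrative composition only 16 of the 28 codewords can be addressed, so the matching rate is 4/8.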
III Arithmetic Coding in Distribution Matching
The coding scheme [7] based on arithmetic coding has been proposed to efficiently realize encoding and decoding for the so-called $w$-out-of-$n$ codebooks, i.e., the binary constant-weight codebooks of length $n$ and Hamming weight $w$, which are a special case of the CC codebooks for binary alphabets. In [4] it is shown that the arithmetic coding scheme from [7] can be utilized for the CC codebooks with non-binary alphabets, i.e., for the CCDM implementation. In what follows, we demonstrate that the arithmetic coding scheme presented in [7] can also be employed to efficiently implement our MCDM.
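For simplicity, we consider the binary output alphabet $\mathcal{A} = \{0, 1\}$. Assume an arbitrary codebook $\mathcal{C} \subseteq \{0,1\}^n$. Each input data sequence $\mathbf{b}$ corresponds to a distinct point $d(\mathbf{b})$ from the interval $[0, 1)$. On the other hand, each codeword $\mathbf{c} \in \mathcal{C}$ corresponds to a distinct subinterval $I(\mathbf{c})$ of the interval $[0, 1)$. The subintervals are chosen such that they partition the interval $[0, 1)$, i.e., they are pairwise disjoint and $\bigcup_{\mathbf{c} \in \mathcal{C}} I(\mathbf{c}) = [0, 1)$. At the encoder, an input data sequence $\mathbf{b}$ is mapped to a codeword $\mathbf{c}$ if the corresponding point $d(\mathbf{b})$ lies inside the corresponding interval $I(\mathbf{c})$. At the decoder, first an interval $I(\mathbf{c})$ is determined based on the received codeword $\mathbf{c}$. Then, a point $d(\mathbf{b}) \in I(\mathbf{c})$ is determined and decoded to the sequence $\mathbf{b}$.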
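Assume a binary input sequence $\mathbf{b} = [b_1, \ldots, b_k]$. Let $\mathrm{NBC}(\mathbf{b})$ denote a function which returns the natural binary code (NBC) number corresponding to the sequence $\mathbf{b}$, i.e.,
$$\mathrm{NBC}(\mathbf{b}) = \sum_{i=1}^{k} b_i \, 2^{k-i}.$$
The sequence $\mathbf{b}$ is mapped to a point $d(\mathbf{b}) \in [0, 1)$ via
$$d(\mathbf{b}) = \frac{\mathrm{NBC}(\mathbf{b})}{2^{k}}. \tag{10}$$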
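The mapping from input bits to a point in $[0, 1)$ can be sketched in a few lines. The sketch assumes that the first bit is the most significant one and that the point is the NBC number divided by $2^k$; both are assumptions about the elided formulas, and the function names are hypothetical. Exact rationals are used so that the points fall exactly on the grid of spacing $2^{-k}$.

```python
from fractions import Fraction

def nbc(bits):
    """Natural binary code number of a bit sequence, first bit most significant."""
    value = 0
    for bit in bits:
        value = 2 * value + bit
    return value

def point(bits):
    """Map a length-k bit sequence to the grid point NBC(bits) / 2**k in [0, 1)."""
    return Fraction(nbc(bits), 2 ** len(bits))

example_number = nbc([1, 0, 1])
example_point = point([1, 0, 1])
```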
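An interval $I(\mathbf{c})$ for a codeword $\mathbf{c}$ can be computed recursively using a chosen probability model on the codeword's bits. The model is specified in terms of the conditional probabilities (also called branching probabilities) of the next bit given the previous bits, i.e., $P(c_i \mid \mathbf{c}_1^{i-1})$, where $\mathbf{c}_1^{i-1}$ is a sequence denoting a prefix of the codeword. The beginning $x(\cdot)$ and the width $\Delta(\cdot)$ of the interval can be computed by iteratively applying equations (12) and (13) for $i = 1, \ldots, n$, starting from $x(\epsilon) = 0$ and $\Delta(\epsilon) = 1$:
$$x(\mathbf{c}_1^{i-1} \circ c_i) = x(\mathbf{c}_1^{i-1}) + c_i \, \Delta(\mathbf{c}_1^{i-1}) \, P(0 \mid \mathbf{c}_1^{i-1}), \tag{12}$$
$$\Delta(\mathbf{c}_1^{i-1} \circ c_i) = \Delta(\mathbf{c}_1^{i-1}) \, P(c_i \mid \mathbf{c}_1^{i-1}), \tag{13}$$
where $\mathbf{u} \circ \mathbf{v}$ denotes a concatenation of $\mathbf{u}$ and $\mathbf{v}$, $\epsilon$ denotes an empty sequence, and $\mathbf{c}_1^{0}$ stands for $\epsilon$. Equation (12) implies a lexicographical ordering of the codewords according to their natural binary code number $\mathrm{NBC}(\mathbf{c})$ with the most-significant bit $c_1$. That is, for two codewords $\mathbf{c}$ and $\mathbf{c}'$, if $\mathrm{NBC}(\mathbf{c}) < \mathrm{NBC}(\mathbf{c}')$, then $I(\mathbf{c})$ will be placed in the interval $[0, 1)$ below $I(\mathbf{c}')$. Applying the above equations results in partitioning $[0, 1)$ such that
$$I(\mathbf{c}) = [\,x(\mathbf{c}),\; x(\mathbf{c}) + \Delta(\mathbf{c})\,),$$
$$\Delta(\mathbf{c}) = \prod_{i=1}^{n} P(c_i \mid \mathbf{c}_1^{i-1}) = P(\mathbf{c}). \tag{15}$$
That is, the codewords' intervals partition $[0, 1)$ and are ordered according to the lexicographical ordering. E.g., see Fig. 1.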
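The interval recursion can be illustrated with a short sketch. It assumes the standard binary arithmetic-coding update in which the 0-branch occupies the lower part of the current interval and the 1-branch the upper part; the function names and the IID model used in the example are hypothetical and chosen only to make the partition property easy to check.

```python
from fractions import Fraction
from itertools import product

def interval(codeword, prob_zero):
    """Return (begin, width) of the interval assigned to `codeword`.

    `prob_zero(prefix)` is the branching probability of a 0 following `prefix`.
    """
    begin, width = Fraction(0), Fraction(1)
    for i, bit in enumerate(codeword):
        p0 = prob_zero(codeword[:i])
        if bit == 1:
            begin += width * p0   # skip over the 0-branch, using the old width
            width *= 1 - p0
        else:
            width *= p0
    return begin, width

def iid_model(prefix):
    """Illustrative model: independent bits with P(0) = 3/4."""
    return Fraction(3, 4)

# All length-3 words in lexicographical (NBC) order and their intervals.
words = sorted(product((0, 1), repeat=3))
intervals = [interval(c, iid_model) for c in words]
```

Walking through `intervals` in order shows that each interval begins where the previous one ends, so the intervals tile $[0, 1)$ in lexicographical order.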
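A one-to-one mapping between data sequences and codewords can be established if each interval $I(\mathbf{c})$ contains at most one point $d(\mathbf{b})$. Because the intervals are half-open, this can be guaranteed by letting the distance between two adjacent points, $2^{-k}$, be no smaller than the width of the largest interval, i.e.,
$$2^{-k} \ge \max_{\mathbf{c} \in \mathcal{C}} \Delta(\mathbf{c}), \tag{16}$$
where $\Delta(\mathbf{c})$ denotes the width of $I(\mathbf{c})$. Since we are interested in maximizing $k$, it is reasonable to choose intervals of equal length, i.e., $\Delta(\mathbf{c}) = 1/|\mathcal{C}|$ for all $\mathbf{c} \in \mathcal{C}$. In this case the greatest $k$ fulfilling (16) equals $\lfloor \log_2 |\mathcal{C}| \rfloor$.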
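From (15) we have that the length of the interval $I(\mathbf{c})$ is equal to the probability of the codeword $\mathbf{c}$ (by using the probability model on the codeword's bits). We are interested in finding the conditional probabilities $P(0 \mid \mathbf{c}_1^{i})$ and $P(1 \mid \mathbf{c}_1^{i})$, where $\mathbf{c}_1^{i}$ is a binary sequence constituting the prefix of the codeword, such that the probability of each codeword is equal. Let $N_{\mathcal{C}}(\mathbf{c}_1^{i})$ denote the number of codewords in $\mathcal{C}$ that have the prefix $\mathbf{c}_1^{i}$, i.e., $N_{\mathcal{C}}(\mathbf{c}_1^{i}) = |\{\mathbf{c}' \in \mathcal{C} : \mathbf{c}'^{\,i}_{1} = \mathbf{c}_1^{i}\}|$, with the empty prefix $\mathbf{c}_1^{0} = \epsilon$ giving $N_{\mathcal{C}}(\epsilon) = |\mathcal{C}|$. We define the following conditional probabilities
$$P(a \mid \mathbf{c}_1^{i}) = \frac{N_{\mathcal{C}}(\mathbf{c}_1^{i} \circ a)}{N_{\mathcal{C}}(\mathbf{c}_1^{i})}, \quad a \in \{0, 1\}, \tag{17}$$
where $\circ$ denotes concatenation. For any $\mathbf{c} \in \mathcal{C}$ we have
$$P(\mathbf{c}) = \prod_{i=0}^{n-1} \frac{N_{\mathcal{C}}(\mathbf{c}_1^{i+1})}{N_{\mathcal{C}}(\mathbf{c}_1^{i})} = \frac{N_{\mathcal{C}}(\mathbf{c})}{N_{\mathcal{C}}(\epsilon)} = \frac{1}{|\mathcal{C}|},$$
which shows that by employing the model as in (17) we can obtain intervals of equal length.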
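The telescoping argument can be checked by brute force on a tiny codebook. The sketch below assumes branching probabilities defined as ratios of prefix counts, as described above; the function names and the arbitrary example codebook are hypothetical. Counting by enumeration is only viable for toy sizes, which is exactly the limitation that motivates structured codebooks later on.

```python
from fractions import Fraction

def prefix_count(codebook, prefix):
    """Number of codewords in `codebook` that start with `prefix`."""
    return sum(1 for c in codebook if c[:len(prefix)] == prefix)

def codeword_probability(codebook, codeword):
    """Product of the branching probabilities along `codeword`."""
    prob = Fraction(1)
    for i in range(len(codeword)):
        prob *= Fraction(prefix_count(codebook, codeword[:i + 1]),
                         prefix_count(codebook, codeword[:i]))
    return prob

# An arbitrary, unstructured codebook of five length-3 words.
codebook = [(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1)]
probabilities = [codeword_probability(codebook, c) for c in codebook]
```

Every entry of `probabilities` equals one over the codebook size, regardless of how the codewords are chosen.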
To encode data into the codewords from $\mathcal{C}$ we apply arithmetic decompression of the input sequence $\mathbf{b}$ using the model (17), see Algorithm 1 for details. To decode the codeword, we apply arithmetic compression on the codeword using the same model (17), see Algorithm 2 for details. Retrieving $\mathbf{b}$ from $d(\mathbf{b})$ is the final step of Algorithm 2; it follows from (10) and the fact that $d(\mathbf{b}) \in I(\mathbf{c})$. In practice, to avoid numerical underflow, the intervals have to be rescaled during each step. The coding scheme can also be implemented using only integer calculations. For further implementation considerations, see e.g. [7].
Following the above steps we can efficiently encode/decode data into/from the codewords $\mathbf{c} \in \mathcal{C}$. In general, not all codewords from $\mathcal{C}$ will be used since, by (16), we can use at most $2^k$ codewords. The selection of the used codewords is done implicitly by the encoding/decoding algorithm. We introduce the notion of the base codebook, which contains all codewords, i.e., the selected and non-selected ones.
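A minimal end-to-end sketch of the encoder and decoder is given below. It is not a transcription of Algorithms 1 and 2: it uses exact rational arithmetic instead of the rescaling or integer implementation mentioned in the text, assumes the input bits map to the point NBC/2^k, and takes the prefix-count function as a parameter. All function names are hypothetical.

```python
from fractions import Fraction
from math import ceil, comb

def make_count(n, weights):
    """Prefix-count function for the base codebook holding all length-n
    binary words whose Hamming weight is in `weights`."""
    def count(prefix):
        rest, ones = n - len(prefix), sum(prefix)
        return sum(comb(rest, w - ones) for w in weights if 0 <= w - ones <= rest)
    return count

def encode(bits, n, count):
    """Arithmetic decompression: follow the branch whose interval contains the point."""
    k = len(bits)
    assert k >= 1 and 2 ** k <= count(()), "too many input bits for this codebook"
    d = Fraction(int("".join(map(str, bits)), 2), 2 ** k)
    begin, width, prefix = Fraction(0), Fraction(1), ()
    for _ in range(n):
        split = width * Fraction(count(prefix + (0,)), count(prefix))
        if d < begin + split:          # point lies in the 0-branch
            prefix, width = prefix + (0,), split
        else:                          # point lies in the 1-branch
            prefix, begin, width = prefix + (1,), begin + split, width - split
    return prefix

def decode(codeword, k, count):
    """Arithmetic compression: rebuild the interval, then read off its grid point."""
    begin, width = Fraction(0), Fraction(1)
    for i, bit in enumerate(codeword):
        prefix = tuple(codeword[:i])
        split = width * Fraction(count(prefix + (0,)), count(prefix))
        begin, width = (begin + split, width - split) if bit else (begin, split)
    index = ceil(begin * 2 ** k)       # the only multiple of 2**-k inside the interval
    return tuple(int(ch) for ch in format(index, f"0{k}b"))

# Illustrative base codebook: length 6, Hamming weight 0, 1 or 2 (22 codewords).
count = make_count(6, (0, 1, 2))
k = count(()).bit_length() - 1
codeword = encode((1, 0, 1, 1), 6, count)
recovered = decode(codeword, k, count)
```

With 22 codewords in the base codebook only 16 inputs of length 4 can be mapped, so 6 codewords remain unused; which ones are skipped is decided implicitly by where the grid points fall.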
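Definition 2.
A base codebook, denoted by $\mathcal{C}_{\mathrm{b}}$, for the coding scheme from Sec. III is a codebook which is used to compute the branching probabilities (17), i.e.,
$$P(a \mid \mathbf{c}_1^{i}) = \frac{N_{\mathcal{C}_{\mathrm{b}}}(\mathbf{c}_1^{i} \circ a)}{N_{\mathcal{C}_{\mathrm{b}}}(\mathbf{c}_1^{i})}$$
for any $a \in \{0, 1\}$ and any prefix $\mathbf{c}_1^{i}$, where $N_{\mathcal{C}_{\mathrm{b}}}(\cdot)$ counts the codewords of $\mathcal{C}_{\mathrm{b}}$ that begin with the given prefix.
The actual codebook, denoted by $\mathcal{C}_{\mathrm{a}}$, is the codebook actually used by the encoder/decoder, i.e.,
$$\mathcal{C}_{\mathrm{a}} = \{ f_{\mathrm{DM}}(\mathbf{b}) : \mathbf{b} \in \{0,1\}^k \},$$
where $f_{\mathrm{DM}}$ is the encoder function. The actual codebook is a subset of the base codebook implicitly chosen by the encoding/decoding algorithm.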
IV Multi-Composition Codebooks
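Assume an arbitrary base codebook $\mathcal{C}_{\mathrm{b}}$. Using the coding scheme from Sec. III, we are able to encode/decode data into/from codewords from $\mathcal{C}_{\mathrm{b}}$. This involves computing the probabilities (17). In the general case, finding (17) entails evaluating the prefix count $N_{\mathcal{C}_{\mathrm{b}}}(\mathbf{c}_1^{i})$ by counting the codewords from $\mathcal{C}_{\mathrm{b}}$ which have the prefix $\mathbf{c}_1^{i}$. This is not feasible for the large codebooks which we target. Therefore, we introduce a structure into the base codebook $\mathcal{C}_{\mathrm{b}}$.
We observe that the prefix count is easy to compute for base codebooks containing all codewords of a single composition. Assume a binary composition $[n - w, w]$ and the corresponding codebook, i.e., the so-called $w$-out-of-$n$ codebook containing all codewords of Hamming weight $w$. For any prefix $\mathbf{c}_1^{i}$, we have
$$N_{\mathcal{C}_{\mathrm{b}}}(\mathbf{c}_1^{i}) = \binom{n - i}{w - w_{\mathrm{H}}(\mathbf{c}_1^{i})},$$
where $w_{\mathrm{H}}(\cdot)$ denotes the Hamming weight and the binomial coefficient is taken as zero when its lower index is negative or exceeds $n - i$.
Consequently, the prefix count is simple to compute for base codebooks containing all codewords of multiple compositions. Assume a set of $m$ different binary compositions
$$\{ [n - w_1, w_1], \ldots, [n - w_m, w_m] \}, \tag{23}$$
where $0 \le w_j \le n$ for $j = 1, \ldots, m$, and the base codebook $\mathcal{C}_{\mathrm{b}}$ formed by the union of the corresponding $w_j$-out-of-$n$ codebooks. Here, we will refer to such a codebook as the MC base codebook. For any prefix $\mathbf{c}_1^{i}$, we have
$$N_{\mathcal{C}_{\mathrm{b}}}(\mathbf{c}_1^{i}) = \sum_{j=1}^{m} \binom{n - i}{w_j - w_{\mathrm{H}}(\mathbf{c}_1^{i})},$$
which is easy to evaluate. Note that the prefix count depends only on two parameters, $i$ and $w_{\mathrm{H}}(\mathbf{c}_1^{i})$; therefore, it is also possible to store the precomputed values in a look-up table (LUT). For implementation we need only to store the values of $P(0 \mid \mathbf{c}_1^{i})$ (or $P(1 \mid \mathbf{c}_1^{i})$). As $P(0 \mid \mathbf{c}_1^{i})$ (or $P(1 \mid \mathbf{c}_1^{i})$) depends only on $i$ and $w_{\mathrm{H}}(\mathbf{c}_1^{i})$, we need to store at most values. For an arbitrary base codebook, the branching probability depends on the whole sequence $\mathbf{c}_1^{i}$ and it is not feasible to store all values in a LUT. Here, we will refer to a DM using the MC base codebook and the coding scheme from Sec. III as the Multi-Composition Distribution Matcher (MCDM).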
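The closed-form prefix count for an MC base codebook can be compared against exhaustive enumeration on a small example. The sketch below assumes the count is a sum of binomial coefficients over the allowed Hamming weights, indexed by the prefix length and the number of ones in the prefix; the function names and the example weights are hypothetical. It also builds a table keyed by those two parameters to illustrate the look-up-table idea.

```python
from itertools import product
from math import comb

def mc_prefix_count(n, weights, length, ones):
    """Codewords of length n with weight in `weights` whose prefix has
    `length` bits, `ones` of which are 1."""
    rest = n - length
    return sum(comb(rest, w - ones) for w in weights if 0 <= w - ones <= rest)

def brute_force_count(n, weights, prefix):
    """Same quantity by enumerating all 2**n words (toy sizes only)."""
    return sum(1 for c in product((0, 1), repeat=n)
               if sum(c) in weights and c[:len(prefix)] == prefix)

def build_lut(n, weights):
    """Table of prefix counts keyed by (prefix length, ones in prefix)."""
    return {(length, ones): mc_prefix_count(n, weights, length, ones)
            for length in range(n + 1) for ones in range(length + 1)}

# Illustrative MC base codebook: length 7, Hamming weights 1, 3 and 4.
n, weights = 7, {1, 3, 4}
lut = build_lut(n, weights)
```

The table has one entry per (length, ones) pair rather than one per prefix, which is what makes storage practical compared with an unstructured codebook.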
IV-A Some Special Cases
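By selecting the set of compositions (23) we can obtain a specific base codebook.
Definition 3.
A $(w_{\mathrm{l}}, w_{\mathrm{u}})$-out-of-$n$ codebook is a codebook with codewords of Hamming weight at least $w_{\mathrm{l}}$ and at most $w_{\mathrm{u}}$, i.e.,
$$\mathcal{C}_{(w_{\mathrm{l}}, w_{\mathrm{u}})} = \{ \mathbf{c} \in \{0,1\}^n : w_{\mathrm{l}} \le w_{\mathrm{H}}(\mathbf{c}) \le w_{\mathrm{u}} \},$$
where $w_{\mathrm{H}}(\mathbf{c})$ denotes the Hamming weight of $\mathbf{c}$.
Based on Definition 3, we have the following special cases.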
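IV-A1 $(w, w)$-out-of-$n$ codebook
By selecting $w_{\mathrm{l}} = w_{\mathrm{u}} = w$ we obtain a CC codebook as in the CCDM. For the CC codebook the probability in (17) has a particularly simple form, i.e.,
$$P(0 \mid \mathbf{c}_1^{i}) = \frac{n - w - \big(i - w_{\mathrm{H}}(\mathbf{c}_1^{i})\big)}{n - i},$$
i.e., the number of zeros still to be placed divided by the number of remaining positions, where $w_{\mathrm{H}}(\mathbf{c}_1^{i})$ is the Hamming weight of the prefix.
IV-A2 $(w, w+1)$-out-of-$n$ codebook
By selecting $w_{\mathrm{l}} = w$ and $w_{\mathrm{u}} = w + 1$ we obtain a codebook which contains two adjacent compositions. We refer to such an MCDM as 2C-MCDM. The probability in (17) also admits a simple form
$$P(0 \mid \mathbf{c}_1^{i}) = \frac{n - w - \big(i - w_{\mathrm{H}}(\mathbf{c}_1^{i})\big)}{n - i + 1}.$$
As such, the CCDM can be changed into the 2C-MCDM by just changing the denominator in the applied model for arithmetic coding.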
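The two closed forms can be checked against the general ratio of prefix counts. The sketch below assumes the single-composition codebook has Hamming weight $w$ and the two-composition codebook has Hamming weights $w$ and $w+1$; under that assumption Pascal's rule collapses the sum of two binomials, and the two models differ only in the denominator. Function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def p0_from_counts(n, weights, length, ones):
    """P(0 | prefix) as a ratio of prefix counts for the given set of weights."""
    def count(l, o):
        return sum(comb(n - l, w - o) for w in weights if 0 <= w - o <= n - l)
    return Fraction(count(length + 1, ones), count(length, ones))

def p0_ccdm(n, w, length, ones):
    """Closed form for the weight-w codebook: remaining zeros / remaining positions."""
    return Fraction(n - w - (length - ones), n - length)

def p0_2c(n, w, length, ones):
    """Closed form for weights {w, w+1}: same numerator, denominator plus one."""
    return Fraction(n - w - (length - ones), n - length + 1)

def closed_forms_agree(n, w):
    """Compare both closed forms with the count ratio on every reachable prefix state."""
    for length in range(n):
        for ones in range(length + 1):
            zeros = length - ones
            if ones <= w and zeros <= n - w:
                if p0_ccdm(n, w, length, ones) != p0_from_counts(n, [w], length, ones):
                    return False
            if ones <= w + 1 and zeros <= n - w:
                if p0_2c(n, w, length, ones) != p0_from_counts(n, [w, w + 1], length, ones):
                    return False
    return True

agree = closed_forms_agree(9, 3)
```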
IV-A3 $(0, w)$-out-of-$n$ codebook
By selecting $w_{\mathrm{l}} = 0$ and $w_{\mathrm{u}} = w$ we obtain a codebook which contains all sequences up to Hamming weight $w$. This is the optimal codebook for an ideal DM which can encode into a codebook of arbitrary size.
Lemma 1.
Assume the output alphabet $\mathcal{A} = \{0, 1\}$, and the target probability $P_A$ such that $P_A(1) < P_A(0)$. Assume a DM with output length $n$. The actual codebook which minimizes the normalized KL divergence (2)
(a) consists of the $2^k$ most likely codewords according to $P_A^n$, if we require $|\mathcal{C}| = 2^k$ for some $k$ [3]. This DM can be implemented by the SMDM [6].
(b) is a $(0, w)$-out-of-$n$ codebook for some $w$, i.e., it contains all sequences of Hamming weight at most $w$, if $|\mathcal{C}|$ is not constrained [8].
Proof.
See [3, Sec. IV], [8, Lemma 5]. ∎
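The MCDM with the $(0, w)$-out-of-$n$ codebook, which we will refer to as Opt-MCDM, can be seen as an approximation of the optimal DM from Lemma 1b. Note that when the $(0, w)$-out-of-$n$ codebook size is equal to $2^k$, the Opt-MCDM is the optimal DM (as in Lemma 1a). In practice, the Opt-MCDM offers close to optimal performance. The probability in (17) is
$$P(0 \mid \mathbf{c}_1^{i}) = \frac{\sum_{j=0}^{w - w_{\mathrm{H}}(\mathbf{c}_1^{i})} \binom{n - i - 1}{j}}{\sum_{j=0}^{w - w_{\mathrm{H}}(\mathbf{c}_1^{i})} \binom{n - i}{j}},$$
which depends only on $i$ and the Hamming weight $w_{\mathrm{H}}(\mathbf{c}_1^{i})$ of the prefix, and it can therefore be precomputed and stored in a LUT for efficient implementation.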
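A look-up table for this case can be sketched as follows, assuming the base codebook holds every length-$n$ word of Hamming weight at most $w$ and that the table is keyed by the prefix length and the number of ones already emitted. The function name and the example parameters are hypothetical.

```python
from fractions import Fraction
from math import comb

def opt_mcdm_lut(n, w):
    """P(0 | prefix) for the codebook of all length-n words with weight <= w,
    keyed by (prefix length, ones in prefix)."""
    def count(length, ones):
        # Words completing the prefix: at most w - ones further ones in n - length slots.
        return sum(comb(n - length, j) for j in range(w - ones + 1))
    lut = {}
    for length in range(n):
        for ones in range(min(length, w) + 1):
            lut[(length, ones)] = Fraction(count(length + 1, ones), count(length, ones))
    return lut

# Illustrative parameters: length 4, at most one 1 per codeword (5 codewords).
lut = opt_mcdm_lut(4, 1)
```

Once the weight budget is exhausted the table returns probability one for a zero, so the encoder can only append zeros from that point on.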
IV-B Non-binary Case
The coding scheme described in Sec. III can be adapted for non-binary distribution matching. The probability model on the codewords' symbols can be obtained from the base codebook via equations analogous to (17). An MC codebook contains all codewords from multiple compositions, and the expression for the prefix count becomes a sum of multinomial coefficients, which is feasible to evaluate or store for shorter codewords or fewer compositions in the base codebook. However, for base codebooks with a large number of compositions or long codewords, the storage/computation requirements can become prohibitive for large alphabets. Unless some structure is added when choosing the compositions, the MCDM is better suited for binary distribution matching, e.g., it can be used for non-binary distribution matching in combination with the bit-level distribution matcher [9].
V Results
V-A Distribution Matching Performance
We compare the CCDM with the -out-of- codebook, MCDM with the -out-of- codebook (2C-MCDM), and MCDM with the -out-of- codebook (Opt-MCDM). We use the binary output alphabet and the target probability distribution with . We vary the output length . For each of the DMs we find the base codebook which minimizes the KL divergence . This is equivalent to finding
[TABLE]
where is the -out-of-, -out-of-, and -out-of- codebook for the CCDM, 2C-MCDM, and Opt-MCDM, respectively. Next, we apply the coding scheme as in Sec. III to build DMs using the aforementioned optimized base codebooks. The results are presented in Fig. 2. The divergence and the empirical output distribution were computed by enumerating all codewords for and for higher via Monte-Carlo sampling. The number of samples was chosen so that the relative error of estimates lies within with probability not smaller than . The matching rate is evaluated exactly. We also compute the parameters of the optimal DM from Lemma 1a implemented by SMDM [6].
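The codebook selection step can be sketched as a small search over Hamming weights. The sketch below is a simplification under stated assumptions: it scores each candidate base codebook as if all of its codewords were used (ignoring the truncation to $2^k$ codewords performed by the coding scheme), it uses the split form of the normalized divergence, and the values of $n$ and $P_A(1)$ are purely illustrative, not the ones used in the paper. All names are hypothetical.

```python
from math import comb, log2

def base_divergence(n, weights, p1):
    """Normalized divergence of a uniform distribution over all length-n words
    with Hamming weight in `weights`, against an IID target with P_A(1) = p1."""
    size = sum(comb(n, w) for w in weights)
    pbar1 = sum(w * comb(n, w) for w in weights) / (n * size)  # empirical P(1)
    cross_entropy = -(1 - pbar1) * log2(1 - p1) - pbar1 * log2(p1)
    return cross_entropy - log2(size) / n

def matching_rate(n, weights):
    """k / n with k = floor(log2(codebook size))."""
    size = sum(comb(n, w) for w in weights)
    return (size.bit_length() - 1) / n

# Candidate weight sets per codebook family.
FAMILIES = {
    "CCDM": lambda n: [[w] for w in range(n + 1)],
    "2C-MCDM": lambda n: [[w, w + 1] for w in range(n)],
    "Opt-MCDM": lambda n: [list(range(w + 1)) for w in range(n + 1)],
}

def best_codebook(n, p1, family):
    """Return (divergence, rate, weights) of the divergence-minimizing candidate."""
    return min((base_divergence(n, ws, p1), matching_rate(n, ws), ws)
               for ws in FAMILIES[family](n))

# Illustrative parameters only.
results = {name: best_codebook(16, 0.2, name) for name in FAMILIES}
```

Under these assumptions the family containing all weights up to a threshold can never score worse than the other two, in line with Lemma 1b.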
Fig. 2 shows the superior performance of the MC codebooks in terms of the matching rate and KL divergence. Opt-MCDM stays very close to the optimal DM from Lemma 1a. In Fig. 2(a) the optimal DM achieves higher rate than the entropy of the target distribution for . This is because for , the optimal base codebook is the -out-of- codebook which contains all codewords of length . This demonstrates that the lowest KL divergence can be achieved by performing no distribution matching at all, and hence the optimal codebook has . For large , for all DMs, as this is a necessary condition for vanishing divergence for , as observed in Sec. II. In Fig. 2(c) for , coincide for all DMs. However, in Fig. 2(b) the Opt-MCDM achieves the lowest KL divergence thanks to the largest codebook, confirming the observations from Sec. II.
V-B PAS Framework
In practice, applying the Opt-MCDM instead of the CCDM for in a PAS communication system would mean ca. increase in the transmission rate (see Fig. 2(a)). Alternatively, assume we target the KL divergence . Instead of using one CCDM with , we can use parallel Opt-MCDMs with to increase the throughput, as shown in Fig. 2(b).
Motivated by these considerations, we apply the Opt-MCDM to a PAS system in a bit-level setup as in [9]. We follow the steps exactly as in [9], but instead of using the CCDM as a building block, we employ the Opt-MCDM for each bit-level. For each bit-level's target probability we find the optimal codebook as in Sec. V-A. We compare the results with [9] employing the CCDMs, and with the bit-interleaved coded-modulation (BICM) scheme without shaping [10]. For a fair comparison, all schemes use the same WiMAX LDPC B-code of rate and codeword length of bits. The LDPC decoder performs iterations. BICM operates with constellations: -QAM, -QAM, and -QAM, which correspond to transmission rates of , , bits per channel use (b/CU), respectively. Shaping schemes use the -QAM constellation and match the BICM transmission rates by applying appropriate transmit signal distributions. Frame error rate (FER) versus signal-to-noise ratio (SNR) curves are presented in Fig. 3. By employing the MCDM instead of the CCDM, we gain , , and dB, at FER= for rates , , b/CU, respectively.
VI Conclusions
In this work, we presented an arithmetic-coding-based distribution matcher which uses multi-composition codes. The multi-composition distribution matcher generalizes the state-of-the-art constant-composition distribution matcher, and is able to achieve a higher matching rate and a lower KL divergence.
VII Acknowledgments
Part of this work has been performed in the framework of the Horizon 2020 project ONE5G (ICT-760809) receiving funds from the European Union. The authors would like to acknowledge the contributions of their colleagues in the project, although the views expressed in this contribution are those of the authors and do not necessarily represent the project.
The authors would like to thank Onurcan Iscan, Ronald Böhnke, and Najeeb Ul Hassan from Huawei Technologies for discussions and helpful comments for improving the manuscript.
References
[1] G. Böcherer et al., “Bandwidth efficient and rate-matched low-density parity-check coded modulation,” IEEE Trans. Commun., vol. 63, no. 12, pp. 4651–4665, Dec. 2015.
[2] R1-1700076, “Signal shaping for QAM constellations,” Huawei, HiSilicon, 3GPP TSG RAN1 NR Ad Hoc Meeting, Jan. 2017.
[3] G. Böcherer and R. A. Amjad, “Block-to-block distribution matching,” Jun. 2013. [Online]. Available: http://arxiv.org/abs/1302.1020
[4] P. Schulte and G. Böcherer, “Constant composition distribution matching,” IEEE Trans. Inf. Theory, vol. 62, pp. 430–434, Jan. 2016.
[5] T. Fehenberger et al., “Partition-based distribution matching,” Jan. 2018. [Online]. Available: https://arxiv.org/abs/1801.08445
[6] P. Schulte and F. Steiner, “Shell mapping for distribution matching,” Mar. 2018. [Online]. Available: https://arxiv.org/abs/1803.03614
[7] T. V. Ramabadran, “A coding scheme for m-out-of-n codes,” IEEE Trans. Commun., vol. 38, pp. 1156–1163, Aug. 1990.
[8] P. Schulte and B. C. Geiger, “Divergence scaling of fixed-length, binary-output, one-to-one distribution matching,” Aug. 2017. [Online]. Available: http://arxiv.org/abs/1701.07371
