Universal secure rank-metric coding schemes with optimal communication overheads
Umberto Martínez-Peñas

Parts of this paper have been accepted for presentation at the IEEE International Symposium on Information Theory, Aachen, Germany, June 2017. [19]
Department of Mathematical Sciences, Aalborg University, Denmark
Abstract
We study the problem of reducing the communication overhead from a noisy wire-tap channel or storage system where data is encoded as a matrix, when more columns (or their linear combinations) are available. We present its applications to reducing communication overheads in universal secure linear network coding and secure distributed storage with crisscross errors and erasures and in the presence of a wire-tapper. Our main contribution is a method to transform coding schemes based on linear rank-metric codes, with certain properties, to schemes with lower communication overheads. By applying this method to pairs of Gabidulin codes, we obtain coding schemes with optimal information rate with respect to their security and rank error correction capability, and with universally optimal communication overheads, when $n \leq m$, where $n$ and $m$ are the number of columns and number of rows, respectively. Moreover, our method can be applied to other families of maximum rank distance codes when $n > m$. The downside of the method is that it generally expands the packet length, but some practical instances come at no cost.
Keywords: Communication overheads, crisscross error-correction, decoding bandwidth, information-theoretical security, rank-metric codes.
MSC: 94A60, 94A62, 94B99.
1 Introduction
Universal secure linear network coding with errors and erasures was first studied in [22], where rank-metric coding schemes were proposed to protect messages sent over a linearly coded network from link errors, erasures and information leakage to a wire-tapper. Similarly, rank-metric codes have been applied to storage systems where data is stored as a matrix and where errors and erasures affect several rows and/or columns, also called crisscross errors and erasures [21]. These errors and erasures have been recently motivated by correlated and mixed failures in distributed storage systems where data is stored in several data centers (columns), which in turn store several blocks of data (rows). See [14].
In this paper, we study how to reduce the communication overhead from such a noisy wire-tap channel or storage system to the receiver, when more columns, or their linear combinations, are available: fewer ingoing links to the receiver fail in the network case, or more data centers are available and contacted in the distributed storage case. As has been noticed in the secret sharing literature [2, 13, 24], which corresponds to Hamming-metric erasure-correction and security, if more pieces of data (columns in our case) are available, they can be preprocessed via subpacketization so that the overall information transmitted from the channel or storage system to the receiver is reduced.
A similar concept of subpacketization has been recently developed for Reed-Solomon codes in [10]. In another direction, coding schemes recovering part of the encoded data (a node in a storage system, for instance), with respect to the Hamming metric, have already been studied, giving rise to regenerating codes [5, 6, 20], which reduce communication bandwidth, and locally repairable codes [9, 12, 23], which reduce the number of contacted nodes. The latter codes have been recently extended to the rank metric in [14]. In contrast, our aim is to recover the whole uncoded data while reducing the communication bandwidth, as in [2, 13, 24], but with respect to the rank metric and, as a consequence, with respect to the crisscross metric.
We illustrate and motivate the problem with a pair of examples. The details of the constructions will be given in Subsection 5.1.
Example 1.
Consider a linearly coded network, as in [22, Sec. VII-A], over a finite field of size (-bit symbols), with packet length , number of outgoing links from the source , at least ingoing links to the sink, and where links may be wire-tapped and ingoing links to the sink may fail.
In [22, Th. 11], a coding scheme is given with optimal information rate , able to correct the given number of erasures and secure under the given number of observations over such network, independently of its inner code (universally). The overall communication overhead from the last ingoing links to the sink is of packets: The source wants to transmit uncoded packets and the sink receives encoded packets.
Thanks to Theorem 2 and dividing each packet into subpackets of length each, we will obtain a coding scheme with the same parameters, but such that the overall communication overhead at the ingoing links to the sink is of packets (the minimum possible) if none of them fail (only packets are received by the sink).
Example 2.
Let again and , and consider a distributed storage system where data is stored as an matrix over the same finite field (), where each column corresponds to a data center that stores symbols over , that is, -bit symbols. Assume that data centers may fail or not be available, errors occur along rows and/or columns due to certain correlations, and a wire-tapper eavesdrops data centers. Assume also that and .
As in the previous example, the use of a pair of maximum rank distance codes allows one to obtain the desired reliability and security while achieving the optimal information rate (see [21]), with a communication overhead of packets from the contacted data centers to the receiver. Again, in this work we obtain a coding scheme with the same parameters but where the communication overhead is reduced to packets (the minimum possible) if no errors occur and all data centers are available and contacted.
The paper is organized as follows: In Section 2, we establish the information-theoretical setting, defining coherent linearized noisy wire-tap channels, which we take from [22], and we establish a method of subpacketization that allows us to use linear codes over the extension field. In Section 3, we define communication overheads for these linearized channels and give lower bounds on these parameters similar to those in [13]. In Section 4, we give the main contribution of this paper, which is a general method to transform coding schemes based on pairs of linear rank-metric codes, with certain properties, into coding schemes with lower communication overheads. In Section 5, we apply Gabidulin codes [8, 21] to obtain coding schemes with optimal information rates and communication overheads for $n \leq m$, which can be seen as a rank-metric analog of the constructions in [2, 13]. However, our method allows us to correct errors, and not only erasures as in the secret sharing case [2, 13], and can be applied to other families of maximum rank distance codes, such as those in [7] for $n > m$. Finally, in Section 6, we discuss the applications in universal secure linear network coding and secure distributed storage with crisscross errors and erasures.
Notation
Throughout the paper, we fix a prime power and positive integers , , , , , , and . We denote by the finite field with elements, denotes the set of row vectors of length over , and denotes the set of matrices over . In this paper, a code is a subset of either or , whose linearity properties are specified in each case. For an -linear code , we will denote by its dual code with respect to the usual -bilinear inner product. We also use the notation and whenever , and we denote by , and the entropy, conditional entropy and mutual information of the random variables and , respectively (see [3]), where logarithms will always be taken with base .
2 Information-theoretical setting and preliminaries
2.1 Coherent linearized channels and coset coding schemes
We will consider the secret message to be a uniform random variable in , and we will consider noisy wire-tap channels (which can also be thought of as distributed storage systems) as given in [22]:
Definition 1 (Coherent linearized channel [22]).
We define a coherent linearized noisy wire-tap channel with errors, erasures with erasure matrix of rank at least , and observations as a channel with input a variable , output to the receiver , and output to the eavesdropper , together with a conditional probability distribution such that
[TABLE]
where and , for a given .
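As an illustration, the following sketch simulates one use of such a channel over the binary field. The displayed equations are missing from this copy, so the sketch assumes the usual form from [22], namely that the receiver obtains $Y = C A^T + E$ and the eavesdropper obtains $W = C B^T$. That form, the helper names (`matmul_t`, `add`, `rank_f2`, `channel`) and the toy matrices are assumptions made for this example only.

```python
def matmul_t(C, A):
    """Return C * A^T over F_2 (C is m x n, A is N x n, the result is m x N)."""
    return [[sum(c * a for c, a in zip(row, arow)) % 2 for arow in A] for row in C]


def add(X, Y):
    """Entrywise sum of two matrices over F_2."""
    return [[(x + y) % 2 for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def rank_f2(M):
    """Rank over F_2, by elimination on the rows packed as integers."""
    rows = [int("".join(map(str, r)), 2) for r in M if any(r)]
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        top = pivot.bit_length() - 1
        # Clear the pivot's leading bit from every remaining row.
        rows = [r ^ pivot if (r >> top) & 1 else r for r in rows]
    return rank


def channel(C, A, B, E):
    """Assumed channel law: receiver gets C A^T + E, eavesdropper gets C B^T."""
    return add(matmul_t(C, A), E), matmul_t(C, B)


# Toy instance: m = 3 rows (packet length), n = 3 columns.
C = [[1, 0, 1],
     [0, 1, 1],
     [1, 1, 0]]
A = [[1, 0, 0],   # two received combinations, rank 2: one erasure
     [0, 1, 1]]
B = [[1, 0, 1]]   # one wire-tapped combination of columns
E = [[0, 0],
     [1, 0],
     [0, 0]]      # error matrix of rank 1
Y, W = channel(C, A, B, E)
```

Here each row of `A` (respectively `B`) selects one linear combination of the columns of `C` seen by the receiver (respectively the eavesdropper), which is how erasures and observations are counted in the definition.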
In [22], it is shown that a linearly coded network over with link errors, erasures and information leakage, and where the last coding coefficients are known to the receiver, can be modelled as a coherent linearized noisy wire-tap channel. We will focus on this scenario and discuss how to translate the results to the distributed storage scenario with crisscross errors and erasures in Subsection 6.2, since the latter can be seen as a simpler case.
As encoders, we consider coset coding schemes as in [16, Def. 7], which are a particular case of those in [22].
Definition 2 (Coset coding schemes [16]).
A coset coding scheme over the field with secret message set and coded message set is a randomized function
[TABLE]
where, for every , is the uniform random variable over a set . To allow correct decoding, we also assume that if . Finally, we define the information rate of the scheme as
[TABLE]
In linear network coding, universal reliability and security means correcting a number of link errors and erasures and being secure under a number of link observations, independently of the network inner code. This leads in [22] to the following definition:
Definition 3 (Universal schemes [22]).
We say that the coset coding scheme is:
1. Universally -error and -erasure-correcting if, for every coherent linearized channel with errors, erasures and erasure matrix , there exists a decoding function such that
[TABLE]
for all and all .
2. Universally secure under observations if, for every coherent linearized channel with observations, it holds that
[TABLE]
or equivalently , for all and all .
2.2 Using linear codes over the extension field
In what follows, we will make use of codes that are linear over the extension field . To that end, we need to see how to identify matrices in with matrices in :
Definition 4.
Fix a basis of as a vector space over , and define the map
[TABLE]
as follows: Given a matrix with entries , for and , we define as the unique matrix with coefficients , for and , such that
[TABLE]
for and . Finally, we define the rank over of a matrix as the rank over of the matrix , and we denote it by .
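The identification in Definition 4 can be sketched in code in its simplest case, a single row over $\mathbb{F}_{2^m}$. The sketch assumes $q = 2$, represents each field element as an integer whose bits are its coordinates in the polynomial basis $1, x, \ldots, x^{m-1}$, and uses hypothetical helper names (`expand`, `rank_f2`, `rank_weight`); the rank does not depend on which basis is chosen.

```python
def expand(vec, m):
    """Map a vector over F_{2^m} (entries as ints) to its m x n binary matrix.

    Column j holds the coordinates of vec[j] in the basis 1, x, ..., x^(m-1).
    """
    return [[(c >> i) & 1 for c in vec] for i in range(m)]


def rank_f2(M):
    """Rank over F_2, by elimination on the rows packed as integers."""
    rows = [int("".join(map(str, r)), 2) for r in M if any(r)]
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        top = pivot.bit_length() - 1
        # Clear the pivot's leading bit from every remaining row.
        rows = [r ^ pivot if (r >> top) & 1 else r for r in rows]
    return rank


def rank_weight(vec, m):
    """Rank over F_2 of a vector over F_{2^m}, computed on its expansion."""
    return rank_f2(expand(vec, m))


# In F_8 (m = 3): 1, x, x + 1 are encoded as 1, 2, 3.
M = expand([1, 2, 3], 3)      # third column is the sum of the first two
r = rank_weight([1, 2, 3], 3)
```

In this instance the third entry is the sum of the first two, so the expanded matrix has rank 2 even though all three entries are nonzero and distinct.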
The key result is that the effect of coherent linearized noisy wire-tap channels in Definition 1 remains unchanged by the map , as we will now see:
Lemma 1.
Let , and . It holds that
[TABLE]
and by definition.
Proof.
The additive property of is clear from the definition, so we may assume that . Denote the entries of and as in Definition 4, and let , and be the entries of , and , respectively, for , , and . It holds that
[TABLE]
[TABLE]
but it also holds that
[TABLE]
for and . Since is a basis of over and , for , and , we conclude that
[TABLE]
for , and , which means that , and the result follows. ∎
Hence we may identify the sets and , seen as -linear vector spaces together with the metric given by the rank and the function , respectively. We will do this repeatedly throughout the paper.
To conclude the section, we recall the construction of coset coding schemes in [16, Def. 4] based on pairs of -linear codes with (no subpacketization).
Definition 5 (Nested coset coding schemes [16]).
A nested coset coding scheme (with ) is a coset coding scheme such that , where are -linear codes and is a vector space isomorphism over , for an -linear space such that , where denotes the direct sum of vector spaces.
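To make the definition concrete, the following sketch builds a toy nested coset coding scheme over $\mathbb{F}_8$ ($q = 2$, $m = n = 3$). It assumes that the smaller code is spanned by $(1, x, x^2)$ and that the larger code is obtained by adding the Frobenius image of that vector. The modulus $x^3 + x + 1$, these generators and all function names are choices made for this illustration and are not taken from the paper.

```python
MOD = 0b1011  # x^3 + x + 1, irreducible over F_2, used to represent F_8


def gf8_mul(a, b):
    """Multiply two elements of F_8 given as 3-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= MOD
    return result


G2 = [1, 2, 4]                     # generator of the smaller code: (1, x, x^2)
GW = [gf8_mul(g, g) for g in G2]   # its Frobenius image, spanning the complement


def encode(secret, rand):
    """Coset encoding: secret * GW + rand * G2, with rand drawn uniformly from F_8."""
    return [gf8_mul(secret, w) ^ gf8_mul(rand, g) for w, g in zip(GW, G2)]


def observe(codeword, b):
    """One wire-tapped F_2-linear combination of the columns, selected by b."""
    out = 0
    for c, bit in zip(codeword, b):
        if bit:
            out ^= c
    return out


def view_distribution(secret, b):
    """Sorted list of observations over all choices of the random symbol."""
    return sorted(observe(encode(secret, r), b) for r in range(8))


# All nonzero selections b of columns.
selections = [[(k >> i) & 1 for i in range(3)] for k in range(1, 8)]
leaks = any(view_distribution(s, b) != list(range(8))
            for s in range(8) for b in selections)
codewords = {tuple(encode(s, r)) for s in range(8) for r in range(8)}
```

In this toy instance a single observed combination is uniformly distributed whatever the secret is (`leaks` is false), while the 64 pairs of secret and random symbol give 64 distinct codewords, so a receiver that sees the whole codeword can recover the secret.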
To measure the reliability and security of these coding schemes, we need the concept of relative minimum rank distance, which is a particular case of [16, Def. 2]:
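Definition 6 (Relative minimum rank distance [16]).
Given $\mathbb{F}_{q^m}$-linear codes $\mathcal{C}_2 \subsetneq \mathcal{C}_1 \subseteq \mathbb{F}_{q^m}^n$, we define their relative minimum rank distance as
$$ d_R(\mathcal{C}_1, \mathcal{C}_2) = \min \{ \operatorname{Rk}(\mathbf{c}) \mid \mathbf{c} \in \mathcal{C}_1, \ \mathbf{c} \notin \mathcal{C}_2 \}, $$
where $\operatorname{Rk}$ denotes the rank over $\mathbb{F}_q$ from Definition 4.
The minimum rank distance of a single code $\mathcal{C}$ is defined as $d_R(\mathcal{C}) = d_R(\mathcal{C}, \{\mathbf{0}\})$.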
The next result, which follows directly from [16, Cor. 5 and Th. 4], gives the mentioned reliability and security performance of nested coset coding schemes. Recall that we denote by the dual of an -linear code with respect to the usual -bilinear inner product in .
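Lemma 2 ([16]).
Given $\mathbb{F}_{q^m}$-linear codes $\mathcal{C}_2 \subsetneq \mathcal{C}_1 \subseteq \mathbb{F}_{q^m}^n$, the nested coset coding scheme in Definition 5 is universally $t$-error and $\rho$-erasure-correcting if, and only if, $2t + \rho < d_R(\mathcal{C}_1, \mathcal{C}_2)$, and is universally secure under $\mu$ observations if, and only if, $\mu < d_R(\mathcal{C}_2^\perp, \mathcal{C}_1^\perp)$.
Observe that $d_R(\mathcal{C}_1, \mathcal{C}_2) \geq d_R(\mathcal{C}_1)$ and $d_R(\mathcal{C}_2^\perp, \mathcal{C}_1^\perp) \geq d_R(\mathcal{C}_2^\perp)$, hence the minimum rank distances of $\mathcal{C}_1$ and $\mathcal{C}_2^\perp$ give sufficient conditions on the number of correctable errors and erasures and on the number of links that may be wire-tapped without information leakage, respectively.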
3 Communication overheads in coherent linearized channels
In this section we formalize how, as in communication efficient secret sharing [2, 13, 24], if a coset coding scheme is able to correct errors and erasures, but pieces of information are available (the rank of is at least ), then we may reduce the communication overhead from the channel to the receiver by making use of the additional linearly independent rows of . Observe that only erasures, and not errors, are considered in the Hamming analog described in [2, 13, 24].
Let be a coset coding scheme, let be of rank (if has rank , we may delete or ignore linearly dependent rows), let be its rows, and let be preprocessing functions, where , for . We define their correction capability with respect to as follows:
Definition 7.
For a full-rank matrix , the preprocessing functions , for , are -error-correcting with respect to the coset coding scheme if there exists a decoding function such that
[TABLE]
for all , all and all error matrices of rank at most with columns .
We define then the decoding bandwidth and communication overhead as -analogs of those in [13, Def. 2]:
Definition 8 (Decoding bandwidth and communication overhead).
For a full-rank matrix and functions , for , we define their decoding bandwidth and communication overhead, respectively, as
[TABLE]
Thus, if a packet is a vector in , then the decoding bandwidth is the amount (which need not be an integer due to the subpacketization) of packets that the receiver obtains, or needs to obtain, from the channel, and the communication overhead is the difference with respect to the original number of uncoded packets.
Observe that, fixing and (thus the information rate), we may only focus on communication overheads, since both behave equally.
To measure the quality of a coset coding scheme, we need the following two bounds. The first is given in [22, Th. 12] and can be seen as a -analog of the bound in [13, Prop. 1], although considering also errors and not only erasures:
Proposition 1 ([22]).
If the coset coding scheme is universally -error and -erasure-correcting, and universally secure under observations, then
[TABLE]
Next we give a -analog of the bound in [13, Th. 1], again adding the effect of errors, which was not considered in [13]:
Proposition 2.
If the coset coding scheme is universally secure under observations, then for a full-rank matrix and preprocessing functions , for , that are -error-correcting with respect to , it holds that:
[TABLE]
Proof.
We may assume without loss of generality that as in the proof of [13, Th. 1].
First, we prove that the preprocessing functions , for are [math]-error-correcting with respect to . If they were not, then there would exist and , with , such that
[TABLE]
On the other hand, there exist such that , for , and , for . Thus we see that the preprocessing functions , for , cannot be -error-correcting with respect to , which is a contradiction.
Next, defining , where , for and , we may prove exactly as in the proof of [13, Th. 1] that
[TABLE]
and also
[TABLE]
Now using that , and combining Equations (6) and (7), we conclude that
[TABLE]
[TABLE]
and the bound on follows by subtracting from this inequality. ∎
4 A general construction based on linear rank-metric codes
In this section, given a nested coset coding scheme (Definition 5) able to correct errors and erasures, for fixed positive integers and , and given an arbitrary set such that (in particular for ), we construct a coset coding scheme able to correct errors and any erasures with lower communication overheads than the original scheme, for all . Moreover, both the original scheme and the modified one are universally secure under the same number of observations. The downside of the method is multiplying the packet length of the original coset coding scheme by a parameter , depending on the involved codes, to achieve the desired subpacketization. The main result of the section is the following:
Theorem 1.
Take -linear codes , a positive integer such that , and choose any subset such that . Denote , where , and assume that there exists a sequence of nested -linear codes
[TABLE]
such that
[TABLE]
for . Define then , , , for , and
[TABLE]
There exists a coset coding scheme that is universally -error and -erasure-correcting if , and is universally secure under observations if .
In addition, for any and any full-rank matrix , there exist preprocessing functions , for , which are -error-correcting with respect to , whenever , and such that
[TABLE]
where , for such that .
4.1 Description of the construction for Theorem 1
Let the notation be as in Theorem 1 and take a generator matrix of and a generator matrix of of the form
[TABLE]
for some matrix . Decreasingly in , take a generator matrix of of the form
[TABLE]
for some matrix . Next define the following positive integers, which are analogous to the integers defined in [13, Eq. (11)]:
[TABLE]
Let be the secret message and generate uniformly at random a matrix . Divide and as follows:
[TABLE]
where
[TABLE]
for . Next, we define the matrices
[TABLE]
where , and where the matrices are defined iteratively as follows: For , the components of the -th column block
[TABLE]
are the components (after some fixed rearrangement) of
[TABLE]
whose size is (observe that ). For convenience, we define the matrices
[TABLE]
for .
Finally, we define the coset coding scheme by
[TABLE]
To conclude, we define as follows. For , for , and for a full-rank matrix , we define by restricting to its first rows.
4.2 Proof of Theorem 1
Let the notation be as in Theorem 1 and as in the previous subsection. We prove each statement in Theorem 1 separately:
1) The coset coding scheme is universally -error and -erasure-correcting if : Take of rank at least and an error matrix such that . Divide in the same way as and , that is,
[TABLE]
where , and observe that , for . From
[TABLE]
we obtain by Lemma 2, since and . By definition, we have obtained , for . Hence subtracting from , we may obtain , and thus we obtain again by Lemma 2. Now, we have also obtained , for . Proceeding iteratively in the same way, we see that we may obtain all the matrices , for , and thus we obtain the whole message .
2) The coset coding scheme is universally secure under any observations: We first need the following preliminary lemma, which follows from [18, Th. 3]:
Lemma 3.
Let and let be -linear codes. If , then
[TABLE]
where , for a code .
Proof.
See Appendix A. ∎
Take , and assume that the eavesdropper obtains
[TABLE]
The random variable has support inside the -linear vector space
[TABLE]
Recall from [3, Th. 2.6.4] that, if a random variable has support in the set , then . Hence
[TABLE]
where dimensions are taken over . On the other hand, using the analogous notation for instead of , it holds that
[TABLE]
since, given a value of , the variable is a uniform random variable over an -linear affine space obtained by translating the vector space . Hence we obtain that
[TABLE]
[TABLE]
where the last equality follows from Lemma 3. Thus and we are done.
3) The preprocessing functions are -error-correcting for any , where : Fix and a full-rank matrix , and let be preprocessing functions as in the previous subsection, for , and where is such that .
Let be an error matrix such that , and let be its columns. By definition, is the -th column of
[TABLE]
for , and a submatrix of , which thus satisfies that . Therefore, we may obtain the matrix as in item 1, since . By definition, the matrices are contained in . Moreover, the matrices are also contained in , and from them we obtain by definition and . Now, the matrices are contained in and we also have , hence we may obtain by definition and . Continuing iteratively in this way, we may obtain all and hence the message .
Finally, we have that
[TABLE]
[TABLE]
5 MRD codes and coset coding schemes with optimal communication overheads
In this section, we apply Theorem 1 to pairs of Gabidulin codes [8, 21] and their cartesian products [7]. The first family yields optimal coset coding schemes when $n \leq m$ in the sense of (4) and (5), and the second family constitutes a family of maximum rank distance (MRD) codes when $n > m$ [7, Cor. 1].
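We recall the definition of MRD codes for the convenience of the reader. The Singleton bound for an arbitrary (linear or not) code $\mathcal{C} \subseteq \mathbb{F}_q^{m \times n}$ was first given in [4, Th. 6.3]:
$$ |\mathcal{C}| \leq q^{\max\{m,n\} (\min\{m,n\} - d_R(\mathcal{C}) + 1)}. \tag{10} $$
We then say that $\mathcal{C}$ is MRD if equality holds in (10). In another direction, a Singleton bound on the relative minimum rank distance of a pair of $\mathbb{F}_{q^m}$-linear codes $\mathcal{C}_2 \subsetneq \mathcal{C}_1 \subseteq \mathbb{F}_{q^m}^n$ was first given in [16, Prop. 3]: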
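$$ d_R(\mathcal{C}_1, \mathcal{C}_2) \leq \min \left\{ 1, \frac{m}{n} \right\} (n - \dim(\mathcal{C}_1)) + 1. \tag{11} $$
Thus if $\mathcal{C}_1$ is MRD and $n \leq m$, then equality is satisfied in (11).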
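As a quick numerical check of the Singleton bound for the rank metric, the sketch below evaluates the standard form $|\mathcal{C}| \leq q^{\max\{m,n\}(\min\{m,n\} - d + 1)}$ for a code of $m \times n$ matrices over $\mathbb{F}_q$ with minimum rank distance $d$, and tests whether given parameters meet it. The function names are hypothetical.

```python
def singleton_bound(q, m, n, d):
    """Largest possible size of a code in F_q^{m x n} with minimum rank distance d."""
    return q ** (max(m, n) * (min(m, n) - d + 1))


def is_mrd(q, m, n, size, d):
    """True if a code with this size and minimum rank distance meets the bound."""
    return size == singleton_bound(q, m, n, d)


# An F_{q^m}-linear code of dimension k in F_{q^m}^n has q^(m*k) codewords.
q, m, n, k = 2, 4, 3, 2
size = q ** (m * k)
meets_bound = is_mrd(q, m, n, size, n - k + 1)
```

For $n \leq m$ the exponent becomes $m(n - d + 1)$, so a linear code of dimension $k$ over the extension field is MRD exactly when $d = n - k + 1$, which is the case checked here.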
5.1 Coset coding schemes based on Gabidulin codes
In this subsection we will make use of Gabidulin codes, which were introduced independently in [8, Sec. 4] and [21, Sec. III]. Throughout this subsection, we will assume that $n \leq m$.
Definition 9 ([8, 21]).
Fix a basis of as a vector space over , and let . The Gabidulin code of dimension and length over , constructed from the previous basis, is the -linear code with parity-check matrix given by
[TABLE]
It was proven in [8, Th. 6] and [21, Th. 2] that the code satisfies
[TABLE]
constituting thus a family of MRD codes covering all parameters when . Moreover it is clear from the definition that, for a fixed basis of over , they form a nested sequence of codes:
[TABLE]
Thus the next theorem follows directly from Theorem 1:
Theorem 2.
Choose integers and such that and , and choose any subset such that .
Now, fix a basis of over , let be -linear Gabidulin codes of dimensions and (that is, and ), respectively, and denote the elements in by .
The coset coding scheme in Theorem 1 based on this pair of codes and the subsequence of (13) given by the Gabidulin codes , that is, , for , satisfies , is universally -error and -erasure-correcting if , and is universally secure under observations if . In particular, the scheme is optimal in the sense of (4). Moreover, it holds that
[TABLE]
In addition, for any and any full-rank matrix , there exist preprocessing functions , for , which are -error-correcting and satisfying equality in (5), hence having optimal communication overheads for all .
Observe that the packet length of the original Gabidulin codes is multiplied by , which depends only on the maximum number of observations, the number of correctable errors and the set of possible erasures .
However, there are instances, such as Example 1, where, due to a particular subpacketization, we need not expand the packet length, hence we obtain a strict improvement on the communication overheads at no cost on the rest of the parameters.
We now give the details of Example 1 and Example 2, which share the same construction: With the given parameters, the construction in [22, Th. 11] gives by choosing and . However, decomposing the packet length as , with and , we may choose , , and , thus , , and , and the example follows.
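The MRD property of the Gabidulin codes used in this subsection can be checked by brute force on a toy instance. The sketch below assumes $q = 2$, $m = n = 3$, the field $\mathbb{F}_8$ represented modulo $x^3 + x + 1$, and the standard generator-matrix description of Gabidulin codes (rows are successive Frobenius powers of a vector with $\mathbb{F}_2$-linearly independent entries), rather than the parity-check description of Definition 9. All names are hypothetical.

```python
from itertools import product

MOD = 0b1011  # x^3 + x + 1, irreducible over F_2, used to represent F_8


def gf8_mul(a, b):
    """Multiply two elements of F_8 given as 3-bit integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b1000:
            a ^= MOD
    return result


def frobenius_power(a, i):
    """Return a^(2^i) in F_8 by squaring i times."""
    for _ in range(i):
        a = gf8_mul(a, a)
    return a


def rank_weight(vec):
    """Dimension of the F_2-span of the entries, viewed as 3-bit vectors."""
    basis = []
    for v in vec:
        for b in basis:
            v = min(v, v ^ b)  # XOR with b exactly when v has b's leading bit
        if v:
            basis.append(v)
    return len(basis)


def gabidulin_codewords(g, k):
    """All codewords of the code generated by the rows (g_j^(2^i))_j, i = 0..k-1."""
    rows = [[frobenius_power(x, i) for x in g] for i in range(k)]
    for msg in product(range(8), repeat=k):
        word = [0] * len(g)
        for coeff, row in zip(msg, rows):
            word = [w ^ gf8_mul(coeff, r) for w, r in zip(word, row)]
        yield word


def min_rank_distance(g, k):
    """Minimum rank weight over the nonzero codewords."""
    return min(rank_weight(c) for c in gabidulin_codewords(g, k) if any(c))


g = [1, 2, 4]  # the basis 1, x, x^2 of F_8 over F_2
distances = {k: min_rank_distance(g, k) for k in (1, 2, 3)}
```

For each dimension $k$ the computed distance equals $n - k + 1$, in agreement with (12), and the codes for $k = 1, 2, 3$ are nested because each generator matrix extends the previous one by one row.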
5.2 Coset coding schemes based on MRD cartesian products
In this subsection, we will make use of cartesian products of Gabidulin codes, which yield again MRD codes, but in the case $n > m$, in contrast with plain Gabidulin codes as in the previous subsection. To the best of our knowledge, this is the only known family of MRD $\mathbb{F}_{q^m}$-linear codes in $\mathbb{F}_{q^m}^n$ when $n > m$.
Throughout this subsection, we will assume that , for some positive integer . Take another integer , and consider the cartesian product
[TABLE]
where is a Gabidulin code as in Definition 9. It is proven in [7, Cor. 1] that
[TABLE]
and therefore is MRD. Since the codes can be taken in a nested sequence for a fixed basis of over , as in Equation (13), the next result also follows directly from Theorem 1:
Theorem 3.
Choose integers and such that and , and choose any subset with elements .
Define and the -linear codes
[TABLE]
for , and observe that , hence .
The coset coding scheme in Theorem 1 based on these codes satisfies , is universally -error and -erasure-correcting if , and is universally secure under observations if . Moreover, it holds that
[TABLE]
In addition, for any and any full-rank matrix , there exist preprocessing functions , for , which are -error-correcting and such that
[TABLE]
Observe that the particular case corresponds to the particular case in Theorem 2.
6 Applications
6.1 Universal secure linear network coding
Consider a network with outgoing links from a source and ingoing links to a sink, and where the source wants to transmit packets, encoded into packets (all of the same length), to the sink. Linear network coding, introduced in [1, 15, 17], consists in sending linear combinations over of the received packets at each node of the network, which increases throughput with respect to storing and forwarding.
In this scenario, link errors and erasures expand through the network and an eavesdropper may obtain linear combinations of the sent packets. Thus if the coefficients of the final linear combinations are known to the receiver, then a linearly coded network, with link errors, erasures and observations, can be modelled as a coherent linearized noisy wire-tap channel [22], as in Definition 1.
Assume that the packet length is at least , and fix positive integers , and with and . In [22, Th. 11] a construction (pairs of Gabidulin codes) is given such that , which is optimal due to (4).
However, assuming that is big enough and the erasure matrix (see Definition 1) is taken at random as in [11], then it will be full-rank with high probability and can be thought of as a number of erased ingoing links to the sink, due to noise, link failure or the action of the adversary.
Theorem 2 gives an alternative construction to [22, Th. 11] with optimal , where if more than ingoing links to the sink are available, the sink can contact the corresponding nodes after exchanging feedback on the number of available nodes, and reduce the communication overhead (hence the amount of packets received by the sink) to its optimal value in view of (5).
6.2 Secure distributed storage with crisscross errors and erasures
Errors and erasures occurring along several rows and/or columns of a matrix over are called crisscross errors and erasures in the literature, and can happen in memory chips and magnetic tapes, for instance (see [21]). Recently, crisscross error and erasure-correction has gained attention in the context of distributed storage where data is stored in several data centers (columns), which in turn store several blocks of data (rows), where mixed and/or correlated failures may occur (see [14]).
In this work, we consider a storage system where data is stored as an matrix over , where columns are thought of as data centers that are contacted to obtain information from, and rows are blocks of data expanding across the different data centers and sharing correlated errors. More formally, we consider column erasures (equivalently, data centers being available and contacted) together with crisscross errors and where an eavesdropper may listen to a number of columns (data centers).
We formalize crisscross error-correction in the following definitions, which we take from [21, Sec. I]:
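Definition 10 (Crisscross weights [21]).
A cover of a matrix $C \in \mathbb{F}_q^{m \times n}$ is a pair of sets $X \subseteq \{1, \ldots, m\}$ and $Y \subseteq \{1, \ldots, n\}$ such that if $c_{i,j} \neq 0$, then $i \in X$ or $j \in Y$. We then define the crisscross weight of $C$ as
$$ \operatorname{wt}_c(C) = \min \{ |X| + |Y| \mid (X, Y) \text{ is a cover of } C \}. $$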
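For small matrices, the crisscross weight can be computed by exhaustive search over row sets: once the rows in the cover are fixed, the columns that must be added are exactly those with a nonzero entry outside the chosen rows. The following is a minimal sketch of that search, with a hypothetical function name and 0-based indices.

```python
from itertools import combinations


def crisscross_weight(M):
    """Minimum of |X| + |Y| over all covers (X, Y) of the nonzero entries of M."""
    m = len(M)
    best = None
    for size in range(m + 1):
        for X in combinations(range(m), size):
            # Columns forced into Y: those with a nonzero entry in a row outside X.
            Y = {j
                 for i, row in enumerate(M) if i not in X
                 for j, entry in enumerate(row) if entry}
            weight = size + len(Y)
            if best is None or weight < best:
                best = weight
    return best


# One full row and one full column are hit: covered by X = {0}, Y = {0}.
cross = [[1, 1, 1],
         [1, 0, 0],
         [1, 0, 0]]
w = crisscross_weight(cross)
```

The search is exponential in the number of rows and is only meant for illustration. In the example, five entries are nonzero but one row and one column cover them all, so the weight is 2.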
We may then formalize crisscross error and erasure-correction, together with security, as follows:
Definition 11.
For a subset , define the matrix as that constituted by the rows of the identity matrix indexed by . We say that the coset coding scheme is:
1. Crisscross -error and -erasure-correcting if, for every with , there exists a decoding function such that
[TABLE]
for all , all with , and all .
2. Secure under column-observations if
[TABLE]
for any matrix constituted by columns of , for all .
In this scenario, pieces of data correspond to columns, instead of linear combinations of columns, hence we will consider preprocessing functions depending on a subset of columns , where . Hence we may formalize the crisscross error-correction capability of preprocessing functions as follows:
Definition 12.
For a subset with , the preprocessing functions , for , are -crisscross error-correcting with respect to if there exists a decoding function such that
[TABLE]
where and denotes the -th column of , for , for all and all error matrices of crisscross weight at most with columns .
The decoding bandwidth and communication overhead of such functions are defined as in Definition 8.
We now see that the bounds (4) and (5) also hold in this context:
Proposition 3.
If the coset coding scheme is crisscross -error and -erasure-correcting, and secure under column-observations, then
[TABLE]
Moreover, for a subset with and preprocessing functions , for , that are -crisscross error-correcting with respect to , it holds that:
[TABLE]
Proof.
Define . We will prove that is -crisscross erasure-correcting. If it is not, then there exists a subset with , and there exist and , where , such that
[TABLE]
Next take a set of the form , where and (recall that ). There exist matrices of crisscross weight at most such that
[TABLE]
Hence cannot be crisscross -error and -erasure-correcting, and we reach a contradiction. Now, this implies that is a classical secret sharing scheme with alphabet , reconstruction and privacy . Thus it follows directly from [13, Th. 1] that
[TABLE]
and we are done.
Finally, the bound (18) can be proven in the same way as the bound (5). ∎
To conclude, we observe that a coset coding scheme, together with preprocessing functions, that is universally (rank) error and erasure-correcting and universally secure in the sense of Definitions 3 and 7 is also crisscross error and erasure-correcting and secure under a given number of column observations in the sense of Definitions 11 and 12, with exactly the same parameters. Thus all constructions in this paper can be directly translated into the context of this subsection.
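The reason rank error-correction carries over to crisscross errors is that the rank of an error matrix never exceeds its crisscross weight: a matrix whose nonzero entries are covered by w rows and columns is a sum of w matrices, each supported on a single row or column and hence of rank at most 1. The sketch below checks this numerically. It assumes the usual cover-based definition of crisscross weight (the minimum number of rows and columns covering all nonzero entries) and works over the binary field for simplicity; the function names are hypothetical.

```python
def crisscross_weight(E):
    """Minimum number of rows and columns covering every nonzero entry of E.

    By Koenig's theorem this equals the size of a maximum matching in the
    bipartite graph joining row i to column j whenever E[i][j] != 0.
    """
    n_cols = len(E[0])
    match_col = [-1] * n_cols  # match_col[j] = row matched to column j, or -1

    def try_row(i, seen):
        # Look for an augmenting path starting at row i.
        for j in range(n_cols):
            if E[i][j] != 0 and not seen[j]:
                seen[j] = True
                if match_col[j] == -1 or try_row(match_col[j], seen):
                    match_col[j] = i
                    return True
        return False

    return sum(try_row(i, [False] * n_cols) for i in range(len(E)))


def rank_gf2(E):
    """Rank of a 0/1 matrix over GF(2), by Gaussian elimination."""
    rows = [row[:] for row in E]
    rank = 0
    for col in range(len(rows[0])):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                rows[r] = [a ^ b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


# One corrupted row plus one corrupted column: weight 2, rank 2.
E1 = [[1, 1, 1, 1],
      [0, 0, 1, 0],
      [0, 0, 1, 0],
      [0, 0, 1, 0]]

# An all-ones 2 x 2 block: weight 2, but rank only 1.
E2 = [[1, 1, 0],
      [1, 1, 0],
      [0, 0, 0]]

for E in (E1, E2):
    assert rank_gf2(E) <= crisscross_weight(E)
```

The second matrix shows that the inequality can be strict, so a scheme correcting all errors of a given rank corrects at least all crisscross errors of that weight.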
For illustration purposes, we show how to translate Theorem 2 to this context, thus obtaining coset coding schemes which are optimal in the sense of (17) and (18) for all parameters, whenever n ≤ m.
Corollary 1.
Assume , choose integers and such that and , and choose any subset with elements .
The coset coding scheme in Theorem 2 with these parameters satisfies , is crisscross -error and -erasure-correcting if , and is secure under column-observations if . In particular, the scheme is optimal in the sense of (17). Moreover, it holds that
[TABLE]
In addition, for any and any subset with , there exist preprocessing functions , for , which are -crisscross error-correcting and satisfy equality in (18), hence having optimal communication overheads for all .
Observe that optimal crisscross error and erasure-correcting coding schemes can also be obtained by using maximum distance separable (MDS) codes in , by identifying this vector space with , as noticed in [21]. However, such constructions may require extremely large finite fields, for instance for Reed-Solomon codes, whereas rank-metric codes yield optimal coding schemes under the sole constraint n ≤ m, with the field size q unrestricted, which in particular allows the use of binary fields (q = 2).
7 Conclusion and open problems
In this paper, we have studied the problem of reducing the communication overhead on a noisy wire-tap channel or storage system where data is encoded as a matrix. The method developed in Section 4 reduces the communication overhead when more columns are available, at the cost of expanding the packet length (number of rows). However, in the optimal case of pairs of Gabidulin codes (Section 5), strict improvements on the communication overheads are possible at no cost to the remaining parameters, as shown in Example 1 for practical instances in the applications. We leave as an open problem the study of when the packet length need not be expanded. Another interesting open problem is to extend our method to codes that are linear over the base field , instead of the extension field . This would allow the use of all possible MRD codes [4].
Appendix A Proof of Lemma 3
Fix -linear codes and a matrix in the rest of the appendix.
We start with an auxiliary result, which is a particular case of [18, Th. 3]:
Lemma 4 ([18]).
It holds that
[TABLE]
where denotes the -linear vector space generated by the rows of the matrix .
Given an -linear code , consider the map defined by , for . It is surjective and its kernel is , where . Therefore
[TABLE]
Using this equation and computing dimensions, it follows that
[TABLE]
Now, using that and the previous lemma, it holds that . Hence the result follows by (19).
Acknowledgement
The author gratefully acknowledges the support from The Danish Council for Independent Research (Grant No. DFF-4002-00367 and Grant No. DFF-5137-00076B “EliteForsk-Rejsestipendium”), and is thankful for the guidance of his advisors Olav Geil and Diego Ruano. This manuscript was written in part when the author was visiting the University of Toronto. He greatly appreciates the support and hospitality of Frank R. Kschischang, and is thankful for valuable discussions on this work.
References
- [1] R. Ahlswede, N. Cai, S. Y. R. Li, and R. W. Yeung, “Network information flow,” IEEE Trans. Inform. Theory, vol. 46, no. 4, pp. 1204–1216, 2000.
- [2] R. Bitar and S. E. Rouayheb, “Staircase codes for secret sharing with optimal communication and read overheads,” in Proc. 2016 IEEE International Symposium on Information Theory (ISIT), 2016, pp. 1396–1400.
- [3] T. M. Cover and J. A. Thomas, Elements of Information Theory (Wiley Series in Telecommunications and Signal Processing), 2nd Edition. Wiley-Interscience, 2006.
- [4] P. Delsarte, “Bilinear forms over a finite field, with applications to coding theory,” Journal of Combinatorial Theory, Series A, vol. 25, no. 3, pp. 226–241, 1978.
- [5] A. G. Dimakis, P. B. Godfrey, Y. Wu, M. J. Wainwright, and K. Ramchandran, “Network coding for distributed storage systems,” IEEE Trans. Inform. Theory, vol. 56, no. 9, pp. 4539–4551, 2010.
- [6] A. G. Dimakis, K. Ramchandran, Y. Wu, and C. Suh, “A survey on network codes for distributed storage,” Proceedings of the IEEE, vol. 99, no. 3, pp. 476–489, 2011.
- [7] E. M. Gabidulin, A. V. Ourivski, B. Honary, and B. Ammar, “Reducible rank codes and their applications to cryptography,” IEEE Trans. Inform. Theory, vol. 49, no. 12, pp. 3289–3293, 2003.
- [8] E. M. Gabidulin, “Theory of codes with maximum rank distance,” Probl. Inf. Transm., vol. 21, no. 1, pp. 1–12, 1985.
