Convertible Codes: Efficient Conversion of Coded Data in Distributed Storage

Francisco Maturana; K. V. Rashmi

arXiv:1907.13119·cs.IT·July 31, 2019


TL;DR

This paper introduces a new class of codes called convertible codes that enable resource-efficient conversion of encoded data in distributed storage, reducing overhead compared to traditional re-encoding methods.

Contribution

The authors formalize code conversion, define convertible codes, and provide optimal constructions with tight bounds on resource usage in the merge regime.

Findings

01. Achieved tight bounds on node accesses during code conversion.

02. Constructed explicit MDS convertible codes optimal in the merge regime.

03. Provided low-field-size constructions for a broad parameter range.

Abstract

Large-scale distributed storage systems typically use erasure codes to provide durability of data in the face of failures. A set of $k$ blocks to be stored is encoded using an $[n, k]$ code to generate $n$ blocks that are then stored on different storage nodes. The redundancy configuration is chosen based on the failure rates of storage devices, and is typically kept constant. However, recent work by Kadekodi et al. shows that the failure rates of storage devices vary significantly over time, and that adapting the redundancy configuration in response to such variations provides significant benefits. Converting the redundancy configuration of already encoded data by re-encoding requires significant overhead on resources such as accesses, device IO, network bandwidth, and compute cycles. In this work, we first present a framework to formalize the notion of code conversion: the process…

Equations

$|W_{2}|=\min\{k^{I}-|W_{1}|,\,r^{F}\}.$

$|D_{i}|\geq|W_{1}|+|W_{2}|.$

$|D_{i}|\geq|W_{1}|+r^{F}.$

$|D_{i}|\leq|W_{1}|+r^{I}.$

$|D_{i}|\geq k^{I}-\max\{|U_{i}|-r^{F},\,0\}\geq\min\{r^{F},\,k^{I}\},$

$|W_{3}|=\max\{0,\,|W_{2}|-r^{F}\}.$

$|D_{i}|+|W_{3}|\geq k^{I}.$

$|D_{i}|+|W_{2}|-r^{F}\geq k^{I}.$

$|D_{i}|+|U_{i}|-|W_{1}|\leq k^{I}+r^{I}.$

$P^{I}=\begin{bmatrix}1&1&1\\ 1&\theta&\theta^{2}\\ 1&\theta^{2}&\theta^{4}\end{bmatrix},\qquad P^{F}=\begin{bmatrix}1&1&1\\ 1&\theta&\theta^{2}\\ 1&\theta^{2}&\theta^{4}\\ 1&\theta^{3}&\theta^{6}\\ 1&\theta^{4}&\theta^{8}\\ 1&\theta^{5}&\theta^{10}\end{bmatrix}$

$P^{F}=\begin{bmatrix}P_{r^{F}}^{I}\\ P_{r^{F}}^{I}\operatorname{diag}(1,\theta^{k^{I}},\theta^{2k^{I}},\dots,\theta^{k^{I}(r^{F}-1)})\\ P_{r^{F}}^{I}\operatorname{diag}(1,\theta^{2k^{I}},\theta^{2\cdot 2k^{I}},\dots,\theta^{2k^{I}(r^{F}-1)})\\ \vdots\\ P_{r^{F}}^{I}\operatorname{diag}(1,\theta^{(\lambda-1)k^{I}},\theta^{2(\lambda-1)k^{I}},\dots,\theta^{(\lambda-1)k^{I}(r^{F}-1)})\end{bmatrix},$

$\det(R)=\sum_{\sigma\in\operatorname{Perm}(t)}\operatorname{sgn}(\sigma)\prod_{l=1}^{t}R[l,\sigma(l)]=\sum_{\sigma\in\operatorname{Perm}(t)}\operatorname{sgn}(\sigma)\,\theta^{E_{\sigma}},\quad\text{where } E_{\sigma}=\sum_{l=1}^{t}(i_{l}-1)(j_{\sigma(l)}-1),$

$E_{\sigma'}-E_{\sigma}=(i_{b}-i_{a})(j_{c}-j_{a})>0$

$E^{*}(\lambda,k^{I},r^{I},r^{F})=\frac{1}{6}\cdot\max\left\{r^{F}(r^{F}-1)(3\lambda k^{I}-r^{F}-1),\; r^{I}(r^{I}-1)(3k^{I}-r^{I}-1),\; k^{I}(k^{I}-1)(3r^{I}-k^{I}-1)\right\}.$

$T_{m}:\quad\begin{array}{cccccc}b_{1}&b_{2}&b_{3}&\cdots&b_{m-1}&b_{m}\\ b_{2}&b_{3}&\cdots&\cdots&b_{m}\\ b_{3}&\vdots&\text{\reflectbox{$\ddots$}}&\text{\reflectbox{$\ddots$}}\\ \vdots&\vdots&\text{\reflectbox{$\ddots$}}\\ b_{m-1}&b_{m}\\ b_{m}\end{array}$

$P^{I}=\begin{bmatrix}\vert&\vert&\vert&\vert\\ p_{1}&p_{2}&p_{3}&p_{4}\\ \vert&\vert&\vert&\vert\end{bmatrix},\qquad P^{F}=\begin{bmatrix}p_{1}&p_{2}\\ p_{3}&p_{4}\end{bmatrix}$

$P^{F}=\begin{bmatrix}b_{1}&\cdots&b_{r^{F}}\\ \vdots&\ddots&\vdots\\ b_{\lambda k^{I}}&\cdots&b_{\lambda k^{I}+r^{F}-1}\end{bmatrix},\qquad Q=\begin{bmatrix}b_{1}&\cdots&b_{(\lambda-1)k^{I}+r^{F}}\\ \vdots&\ddots&\vdots\\ b_{k^{I}}&\cdots&b_{\lambda k^{I}+r^{F}-1}\end{bmatrix}$

$P^{I}=\begin{bmatrix}\vert&\vert&\vert\\ p_{1}&p_{2}&p_{3}\\ \vert&\vert&\vert\end{bmatrix},\qquad P^{F}=\begin{bmatrix}p_{1}&p_{2}\\ p_{2}&p_{3}\end{bmatrix}$

$P^{I}=\begin{bmatrix}b_{1}&b_{k^{I}+1}&\cdots&b_{(r^{I}-1)k^{I}+1}\\ b_{2}&b_{k^{I}+2}&\cdots&b_{(r^{I}-1)k^{I}+2}\\ \vdots&\vdots&\ddots&\vdots\\ b_{k^{I}}&b_{2k^{I}}&\cdots&b_{r^{I}k^{I}}\end{bmatrix},\qquad P^{F}=\begin{bmatrix}b_{1}&b_{k^{I}+1}&\cdots&b_{(r^{F}-1)k^{I}+1}\\ b_{2}&b_{k^{I}+2}&\cdots&b_{(r^{F}-1)k^{I}+2}\\ \vdots&\vdots&\ddots&\vdots\\ b_{\lambda k^{I}}&b_{(\lambda+1)k^{I}}&\cdots&b_{(\lambda+r^{F}-1)k^{I}}\end{bmatrix}$

$r^{F}\leq(s-\lambda+1)\left\lfloor\frac{r^{I}}{s}\right\rfloor+\max\{(r^{I}\bmod s)-\lambda+1,\,0\},\quad\text{for } q\geq sk^{I}+\left\lfloor\frac{r^{I}}{s}\right\rfloor-1.$

Full text

Francisco Maturana and K. V. Rashmi

Computer Science Department

Carnegie Mellon University

{fmaturan, rvinayak}@cs.cmu.edu

Abstract

Large-scale distributed storage systems typically use erasure codes to provide durability of data in the face of failures. A set of $k$ blocks to be stored is encoded using an $[n,k]$ code to generate $n$ blocks that are then stored on different storage nodes. The redundancy configuration (that is, the parameters $n$ and $k$) is chosen based on the failure rates of storage devices, and is typically kept constant. However, recent work by Kadekodi et al. shows that the failure rates of storage devices vary significantly over time, and that adapting the redundancy configuration in response to such variations provides significant benefits: an $11\%$ to $44\%$ reduction in storage space requirement, which translates to enormous savings in resources and energy in large-scale storage systems. However, converting the redundancy configuration of already encoded data by simply re-encoding (the default approach) requires significant overhead on system resources such as accesses, device IO, network bandwidth, and compute cycles.

In this work, we first present a framework to formalize the notion of code conversion: the process of converting data encoded with an $[n^{I},k^{I}]$ code into data encoded with an $[n^{F},k^{F}]$ code while maintaining desired decodability properties, such as the maximum-distance-separable (MDS) property. We then introduce convertible codes, a new class of codes that allow for code conversions in a resource-efficient manner. For an important parameter regime (which we call the merge regime), along with the widely used linearity and MDS decodability constraints, we prove tight bounds on the number of nodes accessed during code conversion. In particular, our achievability result is an explicit construction of MDS convertible codes that are optimal for all parameter values in the merge regime, albeit with a high field size. We then present explicit low-field-size constructions of optimal MDS convertible codes for a broad range of parameters in the merge regime. Our results thus show that it is indeed possible to achieve code conversions with significantly fewer resources than the default approach of re-encoding.

I Introduction

Large-scale distributed storage systems form the bedrock of modern data processing systems. Such storage systems comprise hundreds of thousands of storage devices and routinely face failures in their day-to-day operation [1, 2, 3, 4]. In order to provide resiliency against such failures, storage systems employ redundancy, typically in the form of erasure codes [5, 6, 7, 8]. Under erasure coding, a set of $k$ data blocks to be stored is encoded using an $[n,\ k]$ code to generate $n$ coded blocks. A set of $n$ encoded blocks that correspond to the same $k$ original data blocks is called a “stripe”. Each of the $n$ coded blocks in a stripe is stored on a different storage node (typically chosen from different failure domains). The amount of redundancy added using an erasure code is a function of the redundancy configuration, that is, parameters $n$ and $k$ . These parameters are chosen so as to achieve predetermined thresholds on reliability and availability, such as the mean-time-to-data-loss (MTTDL).

The key factor that determines MTTDL for chosen parameters is the failure rate of the storage devices. In a recent work [9], Kadekodi et al. show that failure rates of storage devices in large-scale storage systems vary significantly over time (for example, by more than 3.5-fold for certain disk families). Thus, it is advantageous to change the redundancy configuration in response to such variations. Kadekodi et al. [9] present a case for tailoring erasure code parameters to the observed failure rates and show that an $11\%$ to $44\%$ reduction in storage space can be achieved by adapting the redundancy configuration according to the changing failure rates. Such a reduction in storage space requirement translates to significant savings in the cost of resources and energy consumed in large-scale storage systems.

In particular, disk failure rates exhibit a bathtub curve during the lifetime of disks, which is characterized by three phases: infancy, useful life, and wearout, in that order [9]. Disk failure rate during infancy and wearout can be multiple times higher than during useful life. As a consequence, the chosen redundancy setting will likely be too high for some periods, which is a waste of resources, and too low for other periods, which increases the risk of data loss. Kadekodi et al. [9] address this problem by changing the code rate (that is, the parameters of the erasure coding scheme) as the devices go through different phases of life. For example, given a group of nodes with certain failure characteristics, the system may use a $[14,10]$ code during infancy, then convert to a $[24,20]$ code during useful life, and finally convert back to a $[14,10]$ code during wearout. We refer the reader to [9] for an in-depth study on failure rate variations and the advantages of adapting the erasure-code parameters with these variations.
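To make the storage effect of the example above concrete, the following minimal sketch compares the two configurations mentioned, $[14,10]$ and $[24,20]$. The helper names are hypothetical, and the failure-tolerance figure assumes both codes are MDS, which the example above does not state explicitly.

```python
from fractions import Fraction


def storage_overhead(n: int, k: int) -> Fraction:
    """Blocks stored per block of user data for an [n, k] code."""
    return Fraction(n, k)


def failures_tolerated(n: int, k: int) -> int:
    """Assuming the [n, k] code is MDS, any n - k lost blocks of a stripe can be recovered."""
    return n - k


infancy = storage_overhead(14, 10)       # 7/5, i.e. 1.4x raw space
useful_life = storage_overhead(24, 20)   # 6/5, i.e. 1.2x raw space

# Fraction of raw storage freed by converting from [14, 10] to [24, 20].
saving = 1 - useful_life / infancy       # 1/7, roughly 14%

print(infancy, useful_life, saving)
print(failures_tolerated(14, 10), failures_tolerated(24, 20))
```

Under the MDS assumption, both configurations tolerate four block losses per stripe, but the longer code spreads that protection over twice as many data blocks, which is why it is reserved for the low-failure-rate phase.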

Adapting the redundancy configuration requires modifying the code rate for all the stripes that have at least one block stored on a certain disk group when the failure rate of that disk group changes by more than a threshold amount [9]. Changing the code rate (that is, the parameters of the erasure code) employed on already encoded data can be highly resource intensive, potentially requiring the system to access multiple storage devices, read large amounts of data, transfer it over the network, and re-encode it. Modifying the code parameters using the default approach requires reading at least $k$ blocks from each stripe, transferring them over the network, and re-encoding. In large-scale storage systems, disks are deployed in large batches, and hence a large number of disks go through failure-rate transitions concurrently. Thus, adapting the redundancy configuration by using the default approach of re-encoding generates highly varying and prohibitively large load spikes, which adversely affect the foreground traffic. This places a significant burden on precious cluster resources such as accesses, disk IO, network bandwidth, and computation cycles (CPU). Furthermore, in some cases these conversions need to be performed urgently, such as when there is an unexpected rise in failure rates and conversion is necessary to reduce the risk of data loss. In such cases, it is necessary to be able to perform fast conversions. Motivated by these applications, in this paper, we initiate a formal study of such code conversions by exploring the following questions:

• What are the fundamental limits on resource consumption of code conversions?

• How can one design codes that efficiently facilitate code conversions?

Formally, the goal is to convert data that is already encoded using an $[n^{I},k^{I}]$ code (denoted by $\mathcal{C}^{I}$) into data encoded using an $[n^{F},k^{F}]$ code (denoted by $\mathcal{C}^{F}$), with desired constraints on decodability such as both initial and final codes satisfying the maximum-distance-separable (MDS) property. (Footnote 1: The superscripts $I$ and $F$ stand for initial and final respectively, representing the initial and final state of the conversion.) Clearly, it is always possible to read the original data (and decode if needed) and re-encode according to $\mathcal{C}^{F}$. However, such a re-encoding approach requires accessing several nodes ($k^{I}$ nodes per stripe for MDS codes), reading out all the data, transferring it over the network, and re-encoding, which consumes large amounts of access, disk IO, network bandwidth, and CPU resources.

The question then is whether one can perform such conversions in a more resource-efficient manner, while satisfying the decodability constraints. We now present an example showing how resource-efficient conversion can be achieved in a simple manner for certain parameters.

Example 1.

Consider $n^{I}=k^{I}+1,\,n^{F}=k^{F}+1$, and $k^{F}=2k^{I}$, with the requirement that both $\mathcal{C}^{I}$ and $\mathcal{C}^{F}$ are MDS. This conversion can be achieved by “merging” two stripes of the initial code into one stripe of the final code. Let us focus on the number of blocks accessed during conversion. Using the default approach of re-encoding to achieve the conversion requires accessing $k^{I}$ blocks from each of two stripes of encoded data under $\mathcal{C}^{I}$ (initial stripes) to create one stripe of encoded data under $\mathcal{C}^{F}$ (final stripe). That is, each stripe of encoded data under the final code $\mathcal{C}^{F}$ requires accessing $2k^{I}$ blocks. Alternatively, as depicted in Figure 1, one can choose $\mathcal{C}^{I}$ and $\mathcal{C}^{F}$ to be systematic, single-parity-check codes, with the parity block holding the XOR of the data blocks in each stripe (shown with a shaded box in the figure). To convert from $\mathcal{C}^{I}$ to $\mathcal{C}^{F}$, one can compute the XOR of the single parity blocks of the two initial stripes, and store the result as the parity block for the stripe under $\mathcal{C}^{F}$. This alternative approach requires accessing only two blocks for each final stripe, and thus is significantly more efficient in the number of accessed blocks as compared to the default approach.
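The parity-merging idea of Example 1 can be checked with a short sketch. The function names, the block size, and the choice $k^{I}=4$ are illustrative assumptions, not taken from the paper; blocks are modeled as byte strings.

```python
import os
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_spc(data: list) -> list:
    """Systematic single-parity-check encoding: the data blocks followed by their XOR."""
    return list(data) + [xor_blocks(*data)]


k_i, block_size = 4, 16  # illustrative values (assumption)
stripe1 = encode_spc([os.urandom(block_size) for _ in range(k_i)])
stripe2 = encode_spc([os.urandom(block_size) for _ in range(k_i)])

# Conversion: read only the two initial parity blocks and XOR them.
merged_parity = xor_blocks(stripe1[-1], stripe2[-1])
final_stripe = stripe1[:-1] + stripe2[:-1] + [merged_parity]

# Re-encoding all 2 * k_i data blocks from scratch yields the same final stripe.
assert final_stripe == encode_spc(stripe1[:-1] + stripe2[:-1])
```

The data blocks are never read during the conversion; they are simply reinterpreted as belonging to the longer stripe, which is what makes the merge cheap.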

In this paper, we first propose a novel framework that formalizes the concept of code conversion, that is, the process of converting data encoded with an $[n^{I},k^{I}]$ code into data encoded with an $[n^{F},k^{F}]$ code while maintaining desired decodability properties, such as the maximum-distance-separable (MDS) property. We then introduce a new class of code pairs, which we call convertible codes, that allow for resource-efficient conversions. We begin the study of this new class of code pairs by focusing on an important regime where $k^{F}=\lambda k^{I}$ for any integer $\lambda\geq 2$ with arbitrary values of $n^{I}$ and $n^{F}$, which we call the merge regime. Furthermore, we focus on the access cost of code conversion, which corresponds to the total number of nodes that participate in the conversion. Keeping the number of nodes accessed small makes conversion less disruptive and allows the unaffected nodes to remain available for serving client requests. In addition, reducing the number of accesses also reduces disk IO, network bandwidth, and CPU consumed.

We prove tight bounds on the access cost of conversions for linear MDS codes in the merge regime. In particular, our achievability result is an explicit construction of MDS convertible codes that are access-optimal for all parameter values in the merge regime, albeit with a high field size. Finally, we present a sequence of practical low-field-size constructions of access-optimal MDS convertible codes in the merge regime based on Hankel arrays. These constructions lead to a tradeoff between field size and the parameter values they cover, with the two extreme points corresponding to (1) $n^{F}-k^{F}\leq\lfloor(n^{I}-k^{I})/\lambda\rfloor$ requiring a field size $q\geq\max\{n^{I}-1,n^{F}-1\}$, and (2) $n^{F}-k^{F}\leq n^{I}-k^{I}-\lambda+1$ requiring a field size $q\geq k^{I}r^{I}$, where $r^{I}=n^{I}-k^{I}$. Thus, our results show that code conversions can be achieved with significantly lower resource overhead than the default approach of re-encoding. Furthermore, all the constructions presented have the added benefit that they continue to be optimal for a wide range of parameters, which makes it possible to handle the case where the parameters of the final code are unknown a priori.
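The tradeoff between field size and parameter coverage can be tabulated with a small sketch based on the bound relating $r^{F}$, $s$, and $q$ that appears in the equation list near the top of this page. The function names are hypothetical, and letting $s$ range over the integers from $\lambda$ to $r^{I}$ is an assumption made here so that the two endpoints reproduce the extreme points (1) and (2) quoted above.

```python
def max_final_parities(lam: int, r_i: int, s: int) -> int:
    """Largest r^F covered for a given s, per the bound
    r^F <= (s - lam + 1) * floor(r^I / s) + max((r^I mod s) - lam + 1, 0)."""
    return (s - lam + 1) * (r_i // s) + max((r_i % s) - lam + 1, 0)


def field_size_bound(k_i: int, r_i: int, s: int) -> int:
    """Field-size requirement q >= s * k^I + floor(r^I / s) - 1 for the same s."""
    return s * k_i + r_i // s - 1


# Illustrative parameters (assumption): k^I = 10, r^I = 4, lambda = 2.
k_i, r_i, lam = 10, 4, 2
table = [(s, max_final_parities(lam, r_i, s), field_size_bound(k_i, r_i, s))
         for s in range(lam, r_i + 1)]  # assumed range of s
for s, r_f, q in table:
    print(f"s={s}: r^F <= {r_f}, q >= {q}")
```

For these parameters, $s=\lambda=2$ gives $r^{F}\leq 2=\lfloor r^{I}/\lambda\rfloor$ with $q\geq 21$, and $s=r^{I}=4$ gives $r^{F}\leq 3=r^{I}-\lambda+1$ with $q\geq 40=k^{I}r^{I}$. Note that these are lower bounds on $q$ only; an actual field size must additionally be a prime power.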

The rest of the paper is organized as follows. Section II discusses related work. Section III formalizes the notion of code conversions and presents a framework for studying convertible codes. Section IV shows the derivation of lower bounds on the access cost of conversions for linear MDS codes in the merge regime. Section V describes a general explicit construction for MDS codes in the merge regime that meets the access cost lower bounds, albeit with a high field size. Section VI describes low-field-size constructions for MDS codes in the merge regime, which provide a tradeoff between field size and the range of parameter values they cover. Finally, Section VII presents our conclusions and discusses future directions.

II Related work

There is extensive literature on the use of erasure codes for reliable data storage. In storage systems, failures can be effectively modeled as erasures, and thereby, erasure codes can be used to provide tolerance to failures, at the cost of some storage overhead [10, 11]. Maximum distance separable (MDS) codes are often used for this purpose, since they achieve the optimal tradeoff between failure tolerance and storage overhead. A well-known and often-used family of MDS codes is Reed-Solomon codes [12].

When using erasure codes in storage systems, a host of other overheads and performance metrics, in addition to storage overhead, come into the picture. Encoding/decoding complexity, node repair performance, degraded read performance, field size, and other metrics can significantly affect real system performance. Several works in the literature have studied these aspects.

The encoding and decoding of data, and the finite field arithmetic that they require, can be compute intensive. Motivated by this, array codes [13, 14, 15, 16] are designed to use XOR operations exclusively, which are typically faster to execute, and aim to decrease the complexity of encoding and decoding.

The repair of failed nodes can incur a large amount of data read and transfer, burdening device IO and network bandwidth. Several approaches have been proposed to alleviate the impact of repair operations. Dimakis et al. [17] proposed a new class of codes called regenerating codes that minimize the amount of network bandwidth consumed during repair operations. Several explicit constructions of regenerating codes have been proposed (for example, see [18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]) as well as generalizations (for example, see [29, 30, 31]). It has been shown that meeting the lower bound on the repair bandwidth requirement when the MDS property and high rate are desired necessitates a large value for the so-called “sub-packetization” [32, 33, 34, 35], which negatively affects certain key performance metrics in storage systems [3]. To overcome this issue, several works [36, 37, 38] have proposed code constructions that relax the requirement of meeting lower bounds on IO and bandwidth requirements for repair operations. For example, the Piggybacking framework [37] provides a general framework to construct repair-efficient codes by transforming any existing codes, while allowing a small sub-packetization (even as small as $2$). The works discussed above construct vector codes in order to improve the efficiency of the repair operation. The papers [39, 40, 41] propose repair algorithms for (scalar) Reed-Solomon codes that reduce the network bandwidth consumed during repair by downloading elements from a subfield rather than the finite field over which the code is constructed. Network bandwidth consumed is another metric to optimize for during conversion. In this paper, we focus only on the access cost.

Another class of codes, called local codes [42, 43, 44, 45, 46, 47, 48, 49, 50, 51], focuses on the locality of codeword symbols during repair, that is, the number of nodes that need to be accessed when repairing a single failure. Local codes improve repair and degraded read performance, since missing information can be recovered without having to recover the full data. The locality metric for repair that local codes optimize for is similar to the access cost metric for conversion that we optimize for in this work as both these metrics aim to minimize the number of nodes accessed.

There are several classical techniques for creating new codes from existing ones [12]. Examples include puncturing, extending, and shortening, among others, which can be used to modify codes. These techniques, however, do not consider the cost of performing such modifications to data that is already encoded, which is the focus of our work.

Several works [52, 53] study the problem of two-stage encoding: first generating a certain number of parities during the encoding process and then adding additional parities. As discussed in [52], adding additional parities can be conceptually viewed as a repair process by considering the new parity nodes to be generated as failed nodes. Furthermore, as shown in [19], for MDS codes, the bandwidth requirement for repair of even a single node is lower bounded by the same amount as in regenerating codes that require repair of all nodes. Thus one can always employ a regenerating code to add additional parities with minimum bandwidth overhead. However, when the MDS property and high rate are desired, as discussed above, using regenerating codes requires a large sub-packetization. The paper [53] employs the Piggybacking framework [36, 37] to construct codes that overcome the issue of a large sub-packetization factor. The scenario of adding a fixed number of additional parities, when viewed under the setting of conversions, corresponds to having $k^{I}=k^{F}$ and $n^{I}<n^{F}$.

Another related work [54] proposes a storage system that uses two erasure codes. One of the codes prioritizes the network bandwidth required for recovery, while the other prioritizes storage overhead, and data is converted between the two codes according to the workload. This application constitutes another motivation for resource-efficient conversions. To reduce the cost of code conversion, the system [54] uses product codes [12] and locally repairable codes [44], and the local parities are leveraged during conversion. The authors, however, choose codes from these two families ad hoc, and do not focus on the problem of designing these codes to minimize the cost of code conversion.

Several works [55, 56, 57] study the update operation in erasure-coded storage systems, and the problem of maintaining consistency in such mutable storage systems. The cost of updates is another metric to optimize for in convertible codes, which we do not consider in this paper. In the current paper, the focus is on immutable storage systems, which comprise the vast majority of large-scale storage systems.

III A framework for studying code conversions

In this section, we formally define and study code conversions and introduce convertible codes.

Suppose one wants to convert data that is already encoded using an $[n^{I},k^{I}]$ initial code $\mathcal{C}^{I}$ into data encoded using an $[n^{F},k^{F}]$ final code $\mathcal{C}^{F}$. Assume, without loss of generality, that each node has a fixed storage capacity $\alpha$. In the initial and final configurations, the system stores the same information, but encoded differently. In order to capture the changes in the dimension of the code during conversion, we consider $M=\operatorname{lcm}(k^{I},k^{F})$ “message” symbols (i.e., the data to be stored) over a finite field $\mathbb{F}_{q}$, denoted by $\mathbf{m}\in\mathbb{F}_{q}^{M}$. This corresponds to multiple stripes in the initial and final configurations. We note that this need to consider multiple stripes in order to capture the smallest instance of the problem deviates from the existing literature on the repair problem in distributed storage codes, where a single stripe is sufficient to capture the problem.

Since there are multiple stripes, we first specify an initial partition $\mathcal{P}_{I}$ and a final partition $\mathcal{P}_{F}$ of the set $[M]$ , which map the message symbols of $\mathbf{m}$ to their corresponding initial and final stripes. The initial partition $\mathcal{P}_{I}\subseteq 2^{[M]}$ is composed of $M/k^{I}$ disjoint subsets of size $k^{I}$ , and the final partition $\mathcal{P}_{F}\subseteq 2^{[M]}$ is composed of $M/k^{F}$ disjoint subsets of size $k^{F}$ . In the initial (respectively, final) configuration, the data indexed by each subset $S\in\mathcal{P}_{I}\ (\text{respectively},\mathcal{P}_{F})$ is encoded using the code $\mathcal{C}^{I}\ (\text{respectively},\mathcal{C}^{F})$ . The codewords $\{\mathcal{C}^{I}(\mathbf{m}_{S}),\,S\in\mathcal{P}_{I}\}$ are referred to as initial stripes, and the codewords $\{\mathcal{C}^{F}(\mathbf{m}_{S}),\,S\in\mathcal{P}_{F}\}$ are referred to as final stripes, where $\mathbf{m}_{S}$ corresponds to the projection of $\mathbf{m}$ onto the coordinates in $S$ and $\mathcal{C}(\mathbf{m}_{S})$ is the encoding of $\mathbf{m}_{S}$ under code $\mathcal{C}$ . We now formally define code conversion and convertible codes.

Definition 1 (Code conversion).

A conversion from an initial code $\mathcal{C}^{I}$ to a final code $\mathcal{C}^{F}$ with initial partition $\mathcal{P}_{I}$ and final partition $\mathcal{P}_{F}$ is a procedure, denoted by $T_{{\mathcal{C}^{I}}\!\to{\mathcal{C}^{F}}}$, that for any $\mathbf{m}$, takes the set of initial stripes $\{\mathcal{C}^{I}(\mathbf{m}_{S})\mid S\in\mathcal{P}_{I}\}$ as input, and outputs the corresponding set of final stripes $\{\mathcal{C}^{F}(\mathbf{m}_{S})\mid S\in\mathcal{P}_{F}\}$.

The descriptions of the initial and final partitions and codes, along with the conversion procedure, define a convertible code.

Definition 2 (Convertible code).

A $({n^{I}},{k^{I}};{n^{F}},{k^{F}})$ convertible code over $\mathbb{F}_{q}$ is defined by: (1) a pair of codes $(\mathcal{C}^{I},\mathcal{C}^{F})$ where $\mathcal{C}^{I}$ is an $[n^{I},k^{I}]$ code over $\mathbb{F}_{q}$ and $\mathcal{C}^{F}$ is an $[n^{F},k^{F}]$ code over $\mathbb{F}_{q}$ ; (2) a pair of partitions $\mathcal{P}_{I},\mathcal{P}_{F}$ of $[M=\operatorname{lcm}(k^{I},k^{F})]$ such that each subset in $\mathcal{P}_{I}$ is of size $k^{I}$ and each subset in $\mathcal{P}_{F}$ is of size $k^{F}$ ; and (3) a conversion procedure $T_{{\mathcal{C}^{I}}\!\to{\mathcal{C}^{F}}}$ that on input $\{\mathcal{C}^{I}(\mathbf{m}_{S})\mid S\in\mathcal{P}_{I}\}$ outputs $\{\mathcal{C}^{F}(\mathbf{m}_{S})\mid S\in\mathcal{P}_{F}\}$ for all $\mathbf{m}\in\mathbb{F}_{q}^{M}$ .

Typically, additional constraints on the distance (i.e., decodability) of the codes $\mathcal{C}^{I}$ and $\mathcal{C}^{F}$ are also imposed, such as requiring both codes to be MDS.

Example 2.

Suppose we want to transition from a $[n^{I}=3,k^{I}=2]$ code $\mathcal{C}^{I}$ to a $[n^{F}=5,k^{F}=3]$ code $\mathcal{C}^{F}$ . We consider data $\mathbf{m}$ of length $M=\operatorname{lcm}(k^{I}=2,k^{F}=3)=6$ . In the initial configuration, the data is partitioned into three stripes, each one composed of three blocks encoding two message symbols. For example, if $\mathcal{P}_{I}=\{\{1,2\},\{3,4\},\{5,6\}\}$ then the initial stripes are $\mathcal{C}^{I}(\mathbf{m}_{1},\mathbf{m}_{2}),\,\mathcal{C}^{I}(\mathbf{m}_{3},\mathbf{m}_{4})$ , and $\mathcal{C}^{I}(\mathbf{m}_{5},\mathbf{m}_{6})$ . In the final configuration, the data is partitioned into two stripes, each one composed of five blocks encoding three message symbols. For example, if $\mathcal{P}_{F}=\{\{1,2,3\},\{4,5,6\}\}$ then the final stripes are $\mathcal{C}^{F}(\mathbf{m}_{1},\mathbf{m}_{2},\mathbf{m}_{3})$ , and $\mathcal{C}^{F}(\mathbf{m}_{4},\mathbf{m}_{5},\mathbf{m}_{6})$ . Note that a different valid final partition could have been $\mathcal{P}_{F}=\{\{1,3,5\},\{2,4,6\}\}$ .

The conversion procedure $T_{{\mathcal{C}^{I}}\!\to{\mathcal{C}^{F}}}$ must take $\{\mathcal{C}^{I}(\mathbf{m}_{1},\mathbf{m}_{2}),\,\mathcal{C}^{I}(\mathbf{m}_{3},\mathbf{m}_{4}),\,\mathcal{C}^{I}(\mathbf{m}_{5},\mathbf{m}_{6})\}$ as input, and output $\{\mathcal{C}^{F}(\mathbf{m}_{1},\mathbf{m}_{2},\mathbf{m}_{3}),\allowbreak\mathcal{C}^{F}(\mathbf{m}_{4},\mathbf{m}_{5},\mathbf{m}_{6})\}$ . In this example, the codes $\mathcal{C}^{I},\,\mathcal{C}^{F}$ , the partitions $\mathcal{P}_{I},\,\mathcal{P}_{F}$ , and procedure $T_{{\mathcal{C}^{I}}\!\to{\mathcal{C}^{F}}}$ define a $({n^{I}={3}},{k^{I}={2}};{n^{F}={5}},{k^{F}={3}})$ convertible code.
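The bookkeeping in Example 2 can be reproduced with a short sketch. The helper names are hypothetical, and consecutive index blocks are only one valid choice of partition, as the example itself notes. The sketch requires Python 3.9 or later for `math.lcm`.

```python
from math import lcm


def consecutive_partition(m: int, size: int) -> list:
    """Split {1, ..., m} into consecutive subsets of the given size."""
    return [set(range(start, start + size)) for start in range(1, m + 1, size)]


def is_valid_partition(parts: list, m: int, size: int) -> bool:
    """Check that parts are disjoint subsets of the given size that cover {1, ..., m}."""
    flattened = sorted(x for part in parts for x in part)
    return all(len(part) == size for part in parts) and flattened == list(range(1, m + 1))


k_i, k_f = 2, 3
M = lcm(k_i, k_f)                      # 6 message symbols
P_I = consecutive_partition(M, k_i)    # [{1, 2}, {3, 4}, {5, 6}]
P_F = consecutive_partition(M, k_f)    # [{1, 2, 3}, {4, 5, 6}]

# Which initial stripes does each final stripe draw its message symbols from?
sources = [[i for i, s_i in enumerate(P_I) if s_i & s_f] for s_f in P_F]
print(M, P_I, P_F, sources)

# The alternative final partition mentioned in the example is also valid.
assert is_valid_partition([{1, 3, 5}, {2, 4, 6}], M, k_f)
```

With the consecutive partitions, each final stripe mixes symbols from two of the three initial stripes (`sources` is `[[0, 1], [1, 2]]`), whereas the alternative partition $\{\{1,3,5\},\{2,4,6\}\}$ draws one symbol from every initial stripe.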

Remark 1.

Note that the definition of convertible codes (Definition 2) assumes that $({n^{I}},{k^{I}};{n^{F}},{k^{F}})$ are fixed a priori, and are known at code construction time. This will be helpful in understanding the fundamental limits of the conversion process. In practice, this assumption might not always hold. For example, the parameters $n^{F},k^{F}$ depend on the node failure rates that are yet to be observed. Interestingly, it is indeed possible for a $({n^{I}},{k^{I}};{n^{F}},{k^{F}})$ convertible code to facilitate conversion for multiple values of $n^{F},k^{F}$ , as is the case for the code constructions presented in this paper.

The overhead of conversion in a convertible code is determined by the cost of the conversion procedure $T_{{\mathcal{C}^{I}}\!\to{\mathcal{C}^{F}}}$ , as a function of the parameters $({n^{I}},{k^{I}};{n^{F}},{k^{F}})$ . Towards minimizing the overhead of the conversion, our general objective is to design codes $(\mathcal{C}^{I},\mathcal{C}^{F})$ , partitions $(\mathcal{P}_{I},\mathcal{P}_{F})$ and conversion procedure $T_{{\mathcal{C}^{I}}\!\to{\mathcal{C}^{F}}}$ that satisfy Definition 2 and minimize the conversion cost for given parameters $({n^{I}},{k^{I}};{n^{F}},{k^{F}})$ , subject to desired decodability constraints on $\mathcal{C}^{I}$ and $\mathcal{C}^{F}$ .

Depending on the relative importance of various resources in the cluster, one might be interested in optimizing the conversion with respect to various types of costs such as access, network bandwidth, disk IO, CPU, etc., or a combination of these costs. The general formulation of code conversions above provides a powerful framework to theoretically reason about convertible codes. In what follows, we will focus on a specific regime and a specific cost model.

IV Lower bounds on access cost of code conversion

The focus of this section is on deriving lower bounds on the access cost of code conversion. We consider one of the fundamental regimes of convertible codes, which corresponds to merging several initial stripes of a code into a single, longer final stripe. Specifically, the convertible codes in this regime have $k^{F}=\lambda k^{I}$ , where $\lambda\geq 2$ is the number of initial stripes merged, with arbitrary values of $n^{I}$ and $n^{F}$ . We call this regime the merge regime. We additionally require that both the initial and final codes are linear and MDS. Since linear MDS codes are widely used in storage systems and are well understood in the coding theory literature, they constitute a good starting point.

We focus on the access cost of code conversion, that is, the total number of blocks accessed during conversion. Each new block needs to be written, and hence requires accessing a node. Similarly, each block from the initial stripes that is read requires accessing a node. Therefore, minimizing access cost amounts to minimizing the sum of the number of new blocks written and the number of blocks read from the initial stripes. (Readers who are familiar with the literature on regenerating codes might observe that convertible codes optimizing for the access cost are “scalar” codes as opposed to being “vector” codes.) Keeping this number small makes code conversion less disruptive and allows the unaffected nodes to remain available for application-specific purposes throughout the procedure, for example, to serve client requests in a storage system. Furthermore, reducing the number of accesses also reduces disk IO, network bandwidth and CPU consumed.

In Section V, we will show that the lower bounds on the access cost derived in this section are in fact achievable. Therefore, we refer to MDS convertible codes in the merge regime that achieve these lower bounds as access-optimal.

Definition 3 (Access-optimal).

A linear MDS $({n^{I}},{k^{I}};{n^{F}},{k^{F}=\lambda k^{I}})$ convertible code is said to be access-optimal if and only if it attains the minimum access cost over all linear MDS $({n^{I}},{k^{I}};{n^{F}},{k^{F}=\lambda k^{I}})$ convertible codes.

We first start with a description of the notation in Section IV-A and then derive lower bounds on the access cost in Section IV-B.

IV-A Notation

Let $\mathcal{C}^{I}$ be an $[n^{I},k^{I}]$ MDS code over field $\mathbb{F}_{q}$ , specified by generator matrix $\mathbf{G}^{I}$ , with columns (that is, encoding vectors) $\{\mathbf{g}^{I}_{1},\ldots,\mathbf{g}^{I}_{n^{I}}\}\subseteq\mathbb{F}_{q}^{k^{I}}$ . Let $\lambda\geq 2$ be an integer, and let $\mathcal{C}^{F}$ be an $[n^{F},k^{F}=\lambda k^{I}]$ MDS code over field $\mathbb{F}_{q}$ , specified by generator matrix $\mathbf{G}^{F}$ , with columns (that is, encoding vectors) $\{\mathbf{g}^{F}_{1},\ldots,\mathbf{g}^{F}_{n^{F}}\}\subseteq\mathbb{F}_{q}^{k^{F}}$ . Let $r^{I}=n^{I}-k^{I}$ and $r^{F}=n^{F}-k^{F}$ . When $\mathcal{C}^{I}$ and $\mathcal{C}^{F}$ are systematic, $r^{I}$ and $r^{F}$ correspond to the initial number of parities and final number of parities, respectively. All vectors are assumed to be column vectors. We will use the notation $\mathbf{v}[l]$ to denote the $l$ -th coordinate of a vector $\mathbf{v}$ .

We will represent all the code symbols in the initial stripes as being generated by a single $\lambda k^{I}\times\lambda n^{I}$ matrix $\tilde{\mathbf{G}}^{I}$ , with encoding vectors $\{\mathbf{\tilde{g}}^{I}_{{i},{j}}\mid i\in[\lambda],j\in[n^{I}]\}\subseteq\mathbb{F}_{q}^{k^{F}}$ . This representation can be viewed as embedding the column vectors of the generator matrix $\mathbf{G}^{I}$ in an $\lambda k^{I}$ -dimensional space, where the index set $\mathcal{K}_{i}=\{(i-1)k^{I}+1,\ldots,ik^{I}\},i\in[\lambda]$ corresponds to the encoding vectors for initial stripe $i$ . Let $\mathbf{\tilde{g}}^{I}_{{i},{j}}$ denote the $j$ -th encoding vector in the initial stripe $i$ in this (embedded) representation. Thus, $\mathbf{\tilde{g}}^{I}_{{i},{j}}[l]=\mathbf{g}^{I}_{j}[l-(i-1)k^{I}]$ for $l\in\mathcal{K}_{i}$ , and $\mathbf{\tilde{g}}^{I}_{{i},{j}}[l]=0$ otherwise. As an example, Figure 2 shows the values of the defined terms for the single parity-check code from Figure 1 with $n^{I}=3,k^{I}=2,n^{F}=5,k^{F}=4$ .
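As a minimal sketch of this embedding, the following Python snippet builds the vectors $\mathbf{\tilde{g}}^{I}_{i,j}$ for the $[3,2]$ single parity-check code with $\lambda=2$. The function name `embed` and the list-of-lists representation are illustrative choices, and entries are written as plain integers 0 and 1.

```python
def embed(G_init, lam):
    """Return {(i, j): vector} with the embedded encoding vectors.

    G_init is a k x n generator matrix given as a list of rows; column j is
    the encoding vector g_j. Stripe i occupies coordinates K_i and every
    other coordinate is zero.
    """
    k = len(G_init)       # k^I
    n = len(G_init[0])    # n^I
    embedded = {}
    for i in range(1, lam + 1):
        for j in range(1, n + 1):
            vec = [0] * (lam * k)
            for l in range(k):
                vec[(i - 1) * k + l] = G_init[l][j - 1]
            embedded[(i, j)] = vec
    return embedded

# [3, 2] single parity-check code: columns are e1, e2 and (1, 1).
G_init = [[1, 0, 1],
          [0, 1, 1]]
vectors = embed(G_init, lam=2)
for index in sorted(vectors):
    print(index, vectors[index])
```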

At times, focus will be only on the coordinates of an encoding vector of a certain initial stripe $i$ . For this purpose, define $\operatorname{proj}_{\mathcal{K}_{i}}(\mathbf{v})\in\mathbb{F}_{q}^{k^{I}}$ to be the projection of $\mathbf{v}\in\mathbb{F}_{q}^{k^{F}}$ to the coordinates in an index set $\mathcal{K}_{i}$ , and for a set $\mathcal{V}$ of vectors, $\operatorname{proj}_{\mathcal{K}_{i}}(\mathcal{V})=\{\operatorname{proj}_{\mathcal{K}_{i}}(\mathbf{v})\mid\mathbf{v}\in\mathcal{V}\}$ . For example, $\operatorname{proj}_{\mathcal{K}_{i}}(\mathbf{\tilde{g}}^{I}_{{i},{j}})=\mathbf{g}^{I}_{j}$ for all $i\in[\lambda]$ and $j\in[n^{I}]$ .

The following sets of vectors are defined: the encoding vectors from initial stripe $i$ , $\mathcal{S}^{I}_{i}=\{\mathbf{\tilde{g}}^{I}_{{i},{j}}\mid j\in[n^{I}]\}$ , all the encoding vectors from all the initial stripes, $\mathcal{S}^{I}=\cup_{i\in[\lambda]}\mathcal{S}^{I}_{i}$ , and all the encoding vectors from the final stripe $\mathcal{S}^{F}=\{\mathbf{g}^{F}_{j}\mid j\in[n^{F}]\}$ .

We use the term unchanged blocks to refer to blocks from the initial stripes that remain as is (that is, unchanged) in the final stripe. The blocks in the final stripe that were not present in the initial stripes are called new, and the blocks from the initial stripes that do not carry over to the final stripe are called retired. For example, in Figure 1, all the data blocks are unchanged blocks (unshaded boxes), the single parity block of the final stripe is a new block, and the two parity blocks from the initial stripes are retired blocks. Each unchanged block corresponds to a pair of identical initial and final encoding vectors, that is, a tuple of indices $(i,j,l)$ such that $\mathbf{\tilde{g}}^{I}_{{i},{j}}=\mathbf{g}^{F}_{l}$ . For instance, the example in Figure 1 has four unchanged blocks, corresponding to the identical encoding vectors $\mathbf{\tilde{g}}^{I}_{{i},{j}}=\mathbf{g}^{F}_{2(i-1)+j}$ for $i,j\in[2]$ . The final encoding vectors $\mathcal{S}^{F}$ can thus be partitioned into the following sets: unchanged encoding vectors from initial stripe $i$ , $\mathcal{U}_{i}=\mathcal{S}^{F}\cap\mathcal{S}^{I}_{i}$ for all $i\in[\lambda]$ , and new encoding vectors $\mathcal{N}=\mathcal{S}^{F}\setminus\mathcal{S}^{I}$ .

From the point of view of conversion cost, unchanged blocks are ideal, because they require no extra work. On the other hand, constructing new blocks requires accessing blocks from the initial stripes. When a block from the initial stripes is accessed, all of its contents are downloaded to a central location, where they are available for the construction of all new blocks. For example, in Figure 1, one block from each initial stripe is accessed during conversion.
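The single parity-check merge can be sketched concretely. The snippet below assumes, purely for illustration, that blocks are byte strings and that the parity is a bytewise XOR (a field of characteristic 2). The block contents and the helper `xor_blocks` are made up.

```python
def xor_blocks(*blocks):
    """Bytewise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for pos, byte in enumerate(block):
            out[pos] ^= byte
    return bytes(out)

# Two initial [3, 2] stripes: two data blocks and one parity block each.
d1, d2 = b"\x01\x02", b"\x10\x20"
d3, d4 = b"\x0a\x0b", b"\xf0\x0f"
p1 = xor_blocks(d1, d2)
p2 = xor_blocks(d3, d4)

# Conversion to one [5, 4] stripe: read only the two old parities,
# write one new parity; the four data blocks stay unchanged.
new_parity = xor_blocks(p1, p2)
blocks_read, blocks_written = 2, 1
print(new_parity.hex(), blocks_read + blocks_written)
```

Under these assumptions the new parity equals the XOR of all four data blocks, even though no data block was read.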

During conversion, new blocks are constructed by reading blocks from the initial stripes. That is, every new encoding vector is simply a linear combination of a specific subset of $\mathcal{S}^{I}$ . Define the read access set for an MDS $({n^{I}},{k^{I}};{n^{F}},{k^{F}=\lambda k^{I}})$ convertible code as the set of tuples $\mathcal{D}\subseteq[\lambda]\times[n^{I}]$ such that the set of new encoding vectors $\mathcal{N}$ is contained in the span of the set $\{\mathbf{\tilde{g}}^{I}_{{i},{j}}\mid(i,j)\in\mathcal{D}\}$ . Furthermore, define the index sets $\mathcal{D}_{i}=\{j\mid(i,j)\in\mathcal{D}\}$ , $\forall i\in[\lambda]$ which denote the encoding vectors accessed from each initial stripe.
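Whether a candidate set $\mathcal{D}$ is a valid read access set is a span-membership question. The sketch below checks it for the single parity-check example, assuming the field is $\mathbb{F}_{2}$ and encoding vectors of length 4 are stored as 4-bit integers (leftmost bit is coordinate 1). The helper names are hypothetical.

```python
def reduce_gf2(v, basis):
    """Reduce bitmask v against a basis keyed by leading-bit position."""
    while v:
        lead = v.bit_length() - 1
        if lead not in basis:
            return v
        v ^= basis[lead]
    return 0

def in_span_gf2(target, vectors):
    """True if `target` is a GF(2) linear combination of `vectors`."""
    basis = {}
    for v in vectors:
        v = reduce_gf2(v, basis)
        if v:
            basis[v.bit_length() - 1] = v
    return reduce_gf2(target, basis) == 0

# Embedded encoding vectors of two [3, 2] single parity-check stripes.
g = {(1, 1): 0b1000, (1, 2): 0b0100, (1, 3): 0b1100,
     (2, 1): 0b0010, (2, 2): 0b0001, (2, 3): 0b0011}
new_vector = 0b1111   # the single new parity of the [5, 4] final stripe

def is_read_access_set(D):
    return in_span_gf2(new_vector, [g[idx] for idx in D])

print(is_read_access_set({(1, 3), (2, 3)}))   # the two old parities
print(is_read_access_set({(1, 3)}))           # one stripe only
```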

IV-B Lower bounds on the access cost of code conversion

In this subsection, we present lower bounds on the access cost of linear MDS convertible codes in the merge regime. This is done in four steps:

1. We show that in the merge regime, all possible pairs of partitions $\mathcal{P}_{I}$ and $\mathcal{P}_{F}$ are equivalent up to relabeling, and hence do not need to be specified.

2. An upper bound on the maximum number of unchanged blocks is proved. We call convertible codes that meet this bound “stable”.

3. Lower bounds on the access cost of linear MDS convertible codes are proved, under the added restriction that the convertible codes are stable.

4. The stability restriction is removed, by showing that non-stable linear MDS convertible codes necessarily incur higher access cost, and hence it suffices to consider only stable MDS convertible codes.

We now start with the first step. In the general regime, partition functions need to be specified since they indicate how message symbols from the initial stripes are mapped into the final stripes. In the merge regime, however, there is only one final stripe, and hence the choice of the partition functions does not matter.

Proposition 1.

For every $({n^{I}},{k^{I}};{n^{F}},{k^{F}=\lambda k^{I}})$ convertible code, all possible pairs of initial and final partitions $(\mathcal{P}_{I},\mathcal{P}_{F})$ are equivalent up to relabeling.

Proof.

Given that $M=\operatorname{lcm}(k^{I},\lambda k^{I})=\lambda k^{I}$ , there is only one possible final partition $\mathcal{P}_{F}=\{[\lambda k^{I}]\}$ . Thus, regardless of $\mathcal{P}_{I}$ , all data in the initial stripes will get mapped to the same final stripe. By relabeling blocks, any two initial partitions can be made equivalent. ∎

Thus, the analysis of convertible codes in the merge regime can be simplified by noting that the choice of partitions $\mathcal{P}_{I}$ and $\mathcal{P}_{F}$ is inconsequential.

Since one of the terms in access cost is the number of new blocks, a natural way to reduce access cost is to maximize the number of unchanged blocks. However, there is a limit on the number of blocks that can remain unchanged.

Proposition 2.

In an MDS $({n^{I}},{k^{I}};{n^{F}},{k^{F}=\lambda k^{I}})$ convertible code, there can be at most $k^{I}$ unchanged vectors from each initial stripe. Thus, there can be at most $\lambda k^{I}$ unchanged vectors in total, or in other words, there will be at least $r^{F}$ new vectors.

Proof.

Every subset $\mathcal{V}\subseteq\mathcal{S}^{I}_{i}$ of size at least $k^{I}+1$ is linearly dependent, and thus if $\mathcal{V}\subseteq\mathcal{S}^{F}$ then $\mathcal{C}^{F}$ cannot be MDS. Hence, for each stripe $i\in[\lambda]$ , the number of unchanged vectors $|\mathcal{U}_{i}|$ is at most $k^{I}$ . ∎

Since new blocks are constructed using only the contents of blocks read, it is clear that both the quantities that make up access cost are going to be related. Intuitively, more new blocks means that more blocks need to be read, resulting in higher access cost. With this intuition in mind, we will first focus on the case where the number of new blocks is the minimum: $|\mathcal{N}|=n^{F}-\lambda k^{I}=n^{F}-k^{F}=r^{F}$ . We refer to such codes as stable convertible codes.

Definition 4 (Stability).

An MDS $({n^{I}},{k^{I}};{n^{F}},{k^{F}=\lambda k^{I}})$ convertible code is stable if and only if it has exactly $\lambda k^{I}$ unchanged blocks, or in other words, exactly $r^{F}$ new blocks.

We first prove lower bounds on the access cost of stable linear MDS convertible codes, and then show that the access cost of conversion in MDS codes without this stability property can only be higher.

A natural question now is characterizing the minimum size of the read access set for conversion $\mathcal{D}$ for MDS codes. Clearly, accessing $k^{I}$ blocks from each initial stripe will always suffice, since this is sufficient to decode all the original data. Thus, in a minimum size $\mathcal{D}$ we can upper bound the size of each $\mathcal{D}_{i}$ by $|\mathcal{D}_{i}|\leq k^{I},\;i\in[\lambda]$ .

The first lower bound on the size of $\mathcal{D}_{i}$ will be given by the interaction between $r^{F}$ and the MDS property.

Lemma 3.

For all linear stable MDS $({n^{I}},{k^{I}};{n^{F}},{k^{F}=\lambda k^{I}})$ convertible codes, the read access set $\mathcal{D}_{i}$ from each initial stripe $i\in[\lambda]$ satisfies $|\mathcal{D}_{i}|\geq\min\{k^{I},r^{F}\}$ .

Proof.

By the MDS property, every subset $\mathcal{V}\subseteq\mathcal{S}^{F}$ of size at most $k^{F}=\lambda k^{I}$ is linearly independent. For any initial stripe $i\in[\lambda]$ , consider the set of all unchanged encoding vectors from other stripes, $\cup_{\ell\neq i}\mathcal{U}_{\ell}$ , and pick any subset of new encoding vectors $\mathcal{W}\subseteq\mathcal{N}$ of size $|\mathcal{W}|=\min\{k^{I},r^{F}\}$ . Consider the subset $\mathcal{V}=(\cup_{\ell\neq i}\mathcal{U}_{\ell}\cup\mathcal{W})$ : since the code is stable, $|\mathcal{U}_{\ell}|=k^{I}$ for every $\ell$ , so it is true that $\mathcal{V}\subseteq\mathcal{S}^{F}$ and $|\mathcal{V}|=(\lambda-1)k^{I}+\min\{k^{I},r^{F}\}\leq k^{F}$ . Therefore, all the encoding vectors in $\mathcal{V}$ are linearly independent.

Notice that the encoding vectors in $\mathcal{V}\setminus\mathcal{W}$ contain no information about initial stripe $i$ and complete information about every other initial stripe $\ell\neq i$ . Therefore, the information about initial stripe $i$ in each encoding vector in $\mathcal{W}$ has to be linearly independent since, otherwise, $\mathcal{V}$ could not be linearly independent. Formally, it must be the case that $\mathcal{W}_{i}=\operatorname{proj}_{\mathcal{K}_{i}}(\mathcal{W})$ has rank equal to $\min\{k^{I},r^{F}\}$ (recall from Section IV-A that $\mathcal{K}_{i}$ is the set of coordinates belonging to initial stripe $i$ ). However, by definition, the subset $\mathcal{W}_{i}$ must be contained in the span of $\{\mathbf{g}^{I}_{j}\mid j\in\mathcal{D}_{i}\}$ . Therefore, the rank of $\{\mathbf{g}^{I}_{j}\mid j\in\mathcal{D}_{i}\}$ is at least that of $\mathcal{W}_{i}$ , which implies that $|\mathcal{D}_{i}|\geq\min\{k^{I},r^{F}\}$ . ∎

Therefore, in general we need to access at least $r^{F}$ vectors from each initial stripe, unless $r^{F}\geq k^{I}$ , in which case we need to access $k^{I}$ encoding vectors, that is, the full data.

We next show that, in a linear MDS stable convertible code in the merge regime, when the number of new blocks $r^{F}$ is bigger than $r^{I}$ , at least $k^{I}$ blocks need to be accessed from each initial stripe. The intuition behind this result is the following: in an MDS stable convertible code in the merge regime, when the number of new blocks $r^{F}$ is bigger than $r^{I}$ , during a conversion one is forced to read more than $r^{I}$ blocks. Hence there must exist blocks from the initial stripes that are both unchanged and are read during conversion. Since the unchanged blocks that are read are also present in the final stripe, the information read from these blocks is not useful in creating a new block that retains the MDS property for the final code unless $k^{I}$ blocks (that is, full data) are read.

Lemma 4.

For all linear stable MDS $({n^{I}},{k^{I}};{n^{F}},{k^{F}=\lambda k^{I}})$ convertible codes, if $r^{I}<r^{F}$ then the read access set $\mathcal{D}_{i}$ from each initial stripe $i\in[\lambda]$ satisfies $|\mathcal{D}_{i}|\geq k^{I}$ .

Proof.

When $r^{F}\geq k^{I}$ , this lemma is equivalent to Lemma 3, so assume $r^{I}<r^{F}<k^{I}$ . From the proof of Lemma 3, for every initial stripe $i\in[\lambda]$ it holds that $|\mathcal{D}_{i}|\geq r^{F}$ . Since $r^{F}>r^{I}$ , this implies that $\mathcal{D}_{i}$ must contain at least one index of an unchanged encoding vector.

Choose a subset of at most $k^{F}=\lambda k^{I}$ encoding vectors from $\mathcal{S}^{F}$ , which must be linearly independent by the MDS property. In this subset, include all the unchanged encoding vectors from the other initial stripes, $\cup_{l\neq i}\mathcal{U}_{l}$ . Then, choose all the unchanged encoding vectors from initial stripe $i$ that are accessed during conversion, $\mathcal{W}_{1}=(\{\mathbf{\tilde{g}}^{I}_{{i},{j}}\mid j\in\mathcal{D}_{i}\}\cap\mathcal{U}_{i})$ . For the remaining vectors (if any), choose an arbitrary subset of new encoding vectors, $\mathcal{W}_{2}\subseteq\mathcal{N}$ , such that:

$$|\mathcal{W}_{2}|=\min\{k^{I}-|\mathcal{W}_{1}|,\,r^{F}\}.\tag{1}$$

It is easy to check that the subset $\mathcal{V}=\cup_{l\neq i}\mathcal{U}_{l}\cup\mathcal{W}_{1}\cup\mathcal{W}_{2}$ is of size at most $k^{F}=\lambda k^{I}$ , and therefore it is linearly independent. This choice of $\mathcal{V}$ follows from the idea that the information contributed by $\mathcal{W}_{1}$ to the new encoding vectors is already present in the unchanged encoding vectors, which will be at odds with the linear independence of $\mathcal{V}$ .

Since the elements of $\mathcal{W}_{1}$ and $\mathcal{W}_{2}$ are the only encoding vectors in $\mathcal{V}$ that contain information from initial stripe $i$ , it must be the case that $\widetilde{\mathcal{W}}=\operatorname{proj}_{\mathcal{K}_{i}}(\mathcal{W}_{1})\cup\operatorname{proj}_{\mathcal{K}_{i}}(\mathcal{W}_{2})$ has rank $|\mathcal{W}_{1}|+|\mathcal{W}_{2}|$ . Moreover, $\widetilde{\mathcal{W}}$ is contained in the span of $\{\mathbf{g}^{I}_{j}\mid j\in\mathcal{D}_{i}\}$ by definition, so it holds that:

$$|\mathcal{D}_{i}|\geq|\mathcal{W}_{1}|+|\mathcal{W}_{2}|.\tag{2}$$

From Equation 1, there are two cases:

Case 1: $k^{I}-|\mathcal{W}_{1}|\leq r^{F}$ . Then $|\mathcal{W}_{2}|=k^{I}-|\mathcal{W}_{1}|$ and by Equation 2 it holds that $|\mathcal{D}_{i}|\geq|\mathcal{W}_{1}|+|\mathcal{W}_{2}|=k^{I}$ .

Case 2: $k^{I}-|\mathcal{W}_{1}|>r^{F}$ . Then $|\mathcal{W}_{2}|=r^{F}$ and by Equation 2 it holds that:

$$|\mathcal{D}_{i}|\geq|\mathcal{W}_{1}|+r^{F}.\tag{3}$$

Notice that there are only $r^{I}$ retired (i.e. not unchanged) encoding vectors in stripe $i$ . Since every accessed encoding vector is either in $\mathcal{W}_{1}$ or is a retired encoding vector, it holds that:

$$|\mathcal{D}_{i}|\leq|\mathcal{W}_{1}|+r^{I}.\tag{4}$$

By combining Equation 3 and Equation 4, we arrive at the contradiction $r^{F}\leq r^{I}$ , which occurs because there are not enough retired blocks in the initial stripe $i$ to ensure that the final code has the MDS property. Therefore, case 1 always holds, and $|\mathcal{D}_{i}|\geq k^{I}$ . ∎

Combining the above results leads to the following theorem on the lower bound of read access set size of linear stable MDS convertible codes.

Theorem 5.

Let $d^{*}({n^{I}},{k^{I}};{n^{F}},{k^{F}})$ denote the minimum integer $d$ such that there exists a linear stable MDS $({n^{I}},{k^{I}};{n^{F}},{k^{F}=\lambda k^{I}})$ convertible code with read access set $\mathcal{D}$ of size $|\mathcal{D}|=d$ . For all valid parameters, $d^{*}({n^{I}},{k^{I}};{n^{F}},{k^{F}})\geq\lambda\min\{k^{I},r^{F}\}$ . Furthermore, if $r^{I}<r^{F}$ , then $d^{*}({n^{I}},{k^{I}};{n^{F}},{k^{F}})\geq\lambda k^{I}$ .

Proof.

Follows directly from Lemma 3 and Lemma 4. ∎

So far we have focused on deriving lower bounds on the access cost of conversion for stable MDS convertible codes, which have the maximum number of unchanged blocks. That is, convertible codes that have $\lambda k^{I}$ unchanged blocks and $r^{F}$ new blocks. We next show that this lower bound generally applies even for non-stable convertible codes by proving that increasing the number of new blocks from the minimum possible does not decrease the lower bound on the size of the read access set $\mathcal{D}$ .

Lemma 6.

The lower bounds on the size of the read access set from Theorem 5 hold for all (including non-stable) linear MDS $({n^{I}},{k^{I}};{n^{F}},{k^{F}=\lambda k^{I}})$ convertible codes.

Proof.

We show that, even for non-stable convertible codes, that is, when there are more than $r^{F}$ new blocks, the bounds on the read access set $\mathcal{D}$ from Theorem 5 still hold.

Case 1: $r^{I}\geq r^{F}$ . Let $i\in[\lambda]$ be an arbitrary initial stripe. We lower bound the size of $\mathcal{D}_{i}$ by invoking the MDS property on a subset $\mathcal{V}\subseteq\mathcal{S}^{F}$ of size $|\mathcal{V}|=\lambda k^{I}$ that minimizes the size of the intersection $|\mathcal{V}\cap\mathcal{U}_{i}|$ . There are exactly $r^{F}$ encoding vectors in $\mathcal{S}^{F}\setminus\mathcal{V}$ , so the minimum size of the intersection $|\mathcal{V}\cap\mathcal{U}_{i}|$ is $\max\{|\mathcal{U}_{i}|-r^{F},0\}$ . Clearly, the subset $\operatorname{proj}_{\mathcal{K}_{i}}(\mathcal{V})$ has rank $k^{I}$ due to the MDS property. Therefore, it holds that $|\mathcal{D}_{i}|+\max\{|\mathcal{U}_{i}|-r^{F},0\}\geq k^{I}$ . By reordering, the following is obtained:

$$|\mathcal{D}_{i}|\geq k^{I}-\max\{|\mathcal{U}_{i}|-r^{F},0\}\geq\min\{k^{I},r^{F}\},$$

which means that the bound on $\mathcal{D}_{i}$ established in Lemma 3 continues to hold for non-stable codes.

Case 2: $r^{I}<r^{F}$ . Let $i\in[\lambda]$ be an arbitrary initial stripe, let $\mathcal{W}_{1}=(\{\mathbf{\tilde{g}}^{I}_{{i},{j}}\mid j\in\mathcal{D}_{i}\}\cap\mathcal{U}_{i})$ be the unchanged encoding vectors that are accessed during conversion, and let $\mathcal{W}_{2}=\mathcal{U}_{i}\setminus\mathcal{W}_{1}$ be the unchanged encoding vectors that are not accessed during conversion. Consider the subset $\mathcal{V}\subseteq\mathcal{S}^{F}$ of $k^{F}=\lambda k^{I}$ encoding vectors from the final stripe such that $\mathcal{W}_{1}\subseteq\mathcal{V}$ and the size of the intersection $\mathcal{W}_{3}=(\mathcal{V}\cap\mathcal{W}_{2})$ is minimized. Since $\mathcal{V}$ may exclude at most $r^{F}$ encoding vectors from the final stripe, it holds that:

$$|\mathcal{W}_{3}|=\max\{|\mathcal{W}_{2}|-r^{F},0\}.\tag{5}$$

By the MDS property, $\mathcal{V}$ is a linearly independent set of encoding vectors of size $k^{F}$ , and thus, must contain all the information to recover the contents of every initial stripe, and in particular, initial stripe $i$ . Since all the information in $\mathcal{V}$ about stripe $i$ is in either $\mathcal{W}_{3}$ or the accessed encoding vectors, it must hold that:

$$|\mathcal{D}_{i}|+|\mathcal{W}_{3}|\geq k^{I}.\tag{6}$$

From Equation 5, there are two cases:

Subcase 2.1: $|\mathcal{W}_{2}|-r^{F}\leq 0$ . Then $|\mathcal{W}_{3}|=0$ , and by Equation 6 it holds that $|\mathcal{D}_{i}|\geq k^{I}$ , which matches the bound of Lemma 4.

Subcase 2.2: $|\mathcal{W}_{2}|-r^{F}>0$ . Then $|\mathcal{W}_{3}|=|\mathcal{W}_{2}|-r^{F}$ , and by Equation 6 it holds that:

$$|\mathcal{D}_{i}|+|\mathcal{W}_{2}|-r^{F}\geq k^{I}.\tag{7}$$

The initial stripe $i$ has $k^{I}+r^{I}$ blocks. By the principle of inclusion-exclusion we have that:

$$|\mathcal{D}_{i}|+|\mathcal{U}_{i}|-|\mathcal{W}_{1}|\leq k^{I}+r^{I}.\tag{8}$$

By using Equation 7, Equation 8 and the fact that $|\mathcal{W}_{2}|=|\mathcal{U}_{i}|-|\mathcal{W}_{1}|$ , we conclude that $r^{I}\geq r^{F}$ , which is a contradiction and means that subcase 2.1 always holds in this case. ∎

The above result, along with the fact that the lower bound in Theorem 5 is achievable (as will be shown in Section V), implies that all access-optimal linear MDS convertible codes in the merge regime have the minimum possible number of new blocks (which is $r^{F}$ as shown in Proposition 2), that is, they are stable.

Lemma 7.

All access-optimal MDS $({n^{I}},{k^{I}};{n^{F}},{k^{F}=\lambda k^{I}})$ convertible codes are stable.

Proof.

Lemma 6 shows that the lower bound on the read access set $\mathcal{D}$ for stable linear MDS convertible codes continues to hold in the non-stable case. Furthermore, this bound is achievable by stable linear MDS convertible codes in the merge regime (as will be shown in Section V). The number of new blocks written during conversion under stable MDS convertible codes is $r^{F}$ . On the other hand, the number of new blocks under a non-stable convertible code is strictly greater than $r^{F}$ . Thus, the overall access cost of a non-stable MDS $({n^{I}},{k^{I}};{n^{F}},{k^{F}=\lambda k^{I}})$ convertible code is strictly greater than the access cost of an access-optimal $({n^{I}},{k^{I}};{n^{F}},{k^{F}=\lambda k^{I}})$ convertible code. ∎

Thus, for MDS convertible codes in the merge regime, it suffices to focus only on stable codes. Combining all the results above leads to the following key result.

Theorem 8.

For all linear MDS $({n^{I}},{k^{I}};{n^{F}},{k^{F}=\lambda k^{I}})$ convertible codes, the access cost of conversion is at least $r^{F}+\lambda\min\{k^{I},r^{F}\}$ . Furthermore, if $r^{I}<r^{F}$ , the access cost of conversion is at least $r^{F}+\lambda k^{I}$ .

Proof.

Follows from Theorem 5, Lemma 6, and the definition of access cost. ∎

In Section V we show that the lower bound of Theorem 8 is achievable for all parameters. Thus, Theorem 8 implies that it is possible to perform conversion of MDS convertible codes in the merge regime with significantly less access cost than the naïve strategy if and only if $r^{F}\leq r^{I}$ and $r^{F}<k^{I}$ . For example, for an MDS $({n^{I}={14}},{k^{I}={10}};{n^{F}={24}},{k^{F}={20}})$ convertible code the naïve strategy has an access cost of $n^{F}=24$ , while the optimal access cost is $(\lambda+1)r^{F}=12$ , which corresponds to savings in access cost of $50\%$ .
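The bound of Theorem 8 is easy to evaluate for concrete parameters. The following sketch computes it next to the cost of the naïve strategy (read all $k^{F}$ data blocks, write $r^{F}$ new blocks). The function names are illustrative and not taken from the paper.

```python
def access_cost_lower_bound(n_init, k_init, n_final, k_final):
    """Lower bound of Theorem 8 for linear MDS codes in the merge regime."""
    if k_final % k_init != 0 or k_final // k_init < 2:
        raise ValueError("merge regime requires k_final = lam * k_init, lam >= 2")
    lam = k_final // k_init
    r_init = n_init - k_init
    r_final = n_final - k_final
    if r_init < r_final:
        blocks_read = lam * k_init                 # all data must be read
    else:
        blocks_read = lam * min(k_init, r_final)
    return r_final + blocks_read                   # new blocks written + blocks read

def naive_access_cost(n_final, k_final):
    """Read k_final data blocks and write r_final new blocks."""
    return k_final + (n_final - k_final)

bound = access_cost_lower_bound(14, 10, 24, 20)
naive = naive_access_cost(24, 20)
print(bound, naive, 1 - bound / naive)
```

For the parameters of the example above this gives 12 against 24, matching the stated savings of 50%.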

V Achievability: Explicit access-optimal convertible codes in the merge regime

In this section, we present an explicit construction of access-optimal MDS convertible codes for all parameters in the merge regime. In Section V-A, we describe the construction of the generator matrices for the initial and final code. Then, in Section V-B, we prove that the code described by this construction has optimal access cost during code conversion.

V-A Explicit construction

Recall that, in the merge regime, $k^{F}=\lambda k^{I}$ , for any integer $\lambda\geq 2$ and arbitrary $n^{I}$ and $n^{F}$ . Also, recall that $r^{I}=n^{I}-k^{I}$ and $r^{F}=n^{F}-k^{F}$ . Notice that when $r^{I}<r^{F}$ , or $k^{I}\leq r^{F}$ , constructing an access-optimal convertible code is trivial. In those cases, one can simply access all the $k^{F}=\lambda k^{I}$ data blocks of the initial stripes, which meets the bound stated in Theorem 5. Thus, assume $r^{F}\leq\min\{r^{I},k^{I}\}$ .

Let $\mathbf{G}^{I},\mathbf{G}^{F}$ be the generator matrices of $\mathcal{C}^{I},\mathcal{C}^{F}$ respectively. Our construction is systematic, that is, both $\mathcal{C}^{I}$ and $\mathcal{C}^{F}$ are systematic MDS codes. Thus $\mathbf{G}^{I},\mathbf{G}^{F}$ are of the form $\mathbf{G}^{I}=[\mathbf{I}|\mathbf{P}^{I}]$ and $\mathbf{G}^{F}=[\mathbf{I}|\mathbf{P}^{F}]$ , where $\mathbf{P}^{I}$ is a $k^{I}\times r^{I}$ matrix and $\mathbf{P}^{F}$ is a $k^{F}\times r^{F}$ matrix. Therefore, to define the initial and final code, only $\mathbf{P}^{I}$ and $\mathbf{P}^{F}$ need to be specified. Let $\mathbb{F}_{q}$ be a finite field of size $q=p^{D}$ , where $p$ is any prime and the degree $D$ depends on the convertible code parameters and will be specified later in this section. Let $\theta$ be a primitive element of $\mathbb{F}_{q}$ .

Define entry $(i,j)$ of $\mathbf{P}^{I}\in\mathbb{F}_{q}^{k^{I}\times r^{I}}$ as $\theta^{(i-1)(j-1)}$ , where $(i,j)$ ranges over $[k^{I}]\times[r^{I}]$ . Entry $(i,j)$ of $\mathbf{P}^{F}\in\mathbb{F}_{q}^{k^{F}\times r^{F}}$ is defined in an identical fashion, as $\theta^{(i-1)(j-1)}$ , where $(i,j)$ ranges over $[k^{F}]\times[r^{F}]$ .

For example, for $k^{I}=3,r^{I}=3,k^{F}=6,r^{F}=3$ , the matrices $\mathbf{P}^{I}$ and $\mathbf{P}^{F}$ would be:

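$$\mathbf{P}^{I}=\begin{bmatrix}1&1&1\\1&\theta&\theta^{2}\\1&\theta^{2}&\theta^{4}\end{bmatrix},\qquad\mathbf{P}^{F}=\begin{bmatrix}1&1&1\\1&\theta&\theta^{2}\\1&\theta^{2}&\theta^{4}\\1&\theta^{3}&\theta^{6}\\1&\theta^{4}&\theta^{8}\\1&\theta^{5}&\theta^{10}\end{bmatrix}.$$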
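To avoid committing to a particular field, the construction can be sketched in terms of exponents of $\theta$ only: entry $(i,j)$ is represented by the integer $(i-1)(j-1)$. The helper `exponent_matrix` is a hypothetical name used for this illustration.

```python
def exponent_matrix(rows, cols):
    """Entry (i, j), 1-indexed, holds the exponent (i-1)(j-1) of theta."""
    return [[(i - 1) * (j - 1) for j in range(1, cols + 1)]
            for i in range(1, rows + 1)]

k_init, r_init, k_final, r_final = 3, 3, 6, 3
P_init_exp = exponent_matrix(k_init, r_init)     # exponents of P^I
P_final_exp = exponent_matrix(k_final, r_final)  # exponents of P^F

for row in P_final_exp:
    print(row)
```

The first $k^{I}$ rows of the final exponent matrix coincide with the initial one, which reflects that the first initial stripe keeps its parity structure.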

Our explicit construction is stable (recall from Lemma 7 that all access-optimal MDS convertible codes in the merge regime are stable), that is, it has exactly $k^{F}=\lambda k^{I}$ unchanged encoding vectors. Given that our construction is also systematic it follows that these unchanged encoding vectors correspond exactly to the systematic elements of $\mathcal{C}^{F}$ .

V-B Proof of optimal access cost during conversion

Throughout this section, we use the following notation for submatrices: let $M$ be an $n\times m$ matrix; the submatrix of $M$ defined by row indices $\{i_{1},\ldots,i_{a}\}$ and column indices $\{j_{1},\ldots,j_{b}\}$ is denoted by $M[i_{1},\ldots,i_{a};j_{1},\ldots,j_{b}]$ . For conciseness, we use $*$ to denote all row or column indices, e.g., $M[*;j_{1},\ldots,j_{b}]$ denotes the submatrix composed of columns $\{j_{1},\ldots,j_{b}\}$ , and $M[i_{1},\ldots,i_{a};*]$ denotes the submatrix composed of rows $\{i_{1},\ldots,i_{a}\}$ .

We first recall an important fact about systematic MDS codes.

Proposition 9 ([12]).

Let $\mathcal{C}$ be an $[n,k]$ code with generator matrix $G=[I|P]$ . Then $\mathcal{C}$ is MDS if and only if $P$ is superregular, that is, every square submatrix of $P$ is nonsingular. (This definition of superregularity is different from the definition introduced in [58], which is sometimes used in the context of convolutional codes.) ∎
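For small matrices, superregularity can be checked by brute force over all square submatrices. The sketch below works over a prime field $\mathbb{F}_{p}$ for simplicity, which is an assumption made for illustration (the construction in this paper uses $\mathbb{F}_{q}$ with $q=p^{D}$). The function names are hypothetical.

```python
from itertools import combinations

def det_mod_p(matrix, p):
    """Determinant modulo a prime p via Gaussian elimination."""
    a = [[x % p for x in row] for row in matrix]
    n = len(a)
    det = 1
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col]), None)
        if pivot is None:
            return 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                           # a row swap flips the sign
        det = det * a[col][col] % p
        inv = pow(a[col][col], p - 2, p)         # inverse by Fermat's little theorem
        for r in range(col + 1, n):
            factor = a[r][col] * inv % p
            for c in range(col, n):
                a[r][c] = (a[r][c] - factor * a[col][c]) % p
    return det % p

def is_superregular(P, p):
    """True if every square submatrix of P is nonsingular modulo p."""
    rows, cols = len(P), len(P[0])
    for t in range(1, min(rows, cols) + 1):
        for ri in combinations(range(rows), t):
            for ci in combinations(range(cols), t):
                sub = [[P[r][c] for c in ci] for r in ri]
                if det_mod_p(sub, p) == 0:
                    return False
    return True

print(is_superregular([[1, 1], [1, 2]], 5))
print(is_superregular([[1, 1], [1, 1]], 5))
```

The number of submatrices grows exponentially, so this check is only practical for small parameters.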

Thus, to be MDS, both $\mathbf{P}^{I}$ and $\mathbf{P}^{F}$ need to be superregular.

From the bound in Lemma 3, to be access-optimal during conversion when $r^{F}\leq k^{I}$ , the columns of $\mathbf{P}^{F}$ (that is, the new encoding vectors) have to be such that they can be constructed by only accessing $r^{F}$ columns of $\mathbf{G}^{I}$ (that is, the initial encoding vectors) during conversion. Thus, it suffices to show that the columns of $\mathbf{P}^{F}$ can be constructed by accessing only $r^{F}$ columns of $\mathbf{P}^{I}$ during conversion. To capture this property, we introduce the following definition.

Definition 5 ( $t$ -column constructible).

We will say that an $n\times m_{1}$ matrix $M_{1}$ is $t$ -column constructible from an $n\times m_{2}$ matrix $M_{2}$ if and only if there exists a subset $S\subseteq\operatorname{cols}(M_{2})$ of size $t$ , such that the $m_{1}$ columns of $M_{1}$ are in the span of $S$ . We say that a $\lambda n\times m_{1}$ matrix $M_{1}$ is $t$ -column block-constructible from an $n\times m_{2}$ matrix $M_{2}$ if and only if for every $i\in[\lambda]$ , the submatrix $M_{1}[(i-1)n+1,\ldots,in;*]$ is $t$ -column constructible from $M_{2}$ .

Theorem 10.

A systematic $({n^{I}},{k^{I}};{n^{F}},{k^{F}=\lambda k^{I}})$ convertible code with $k^{I}\times r^{I}$ initial parity generator matrix $\mathbf{P}^{I}$ and $k^{F}\times r^{F}$ final parity generator matrix $\mathbf{P}^{F}$ is MDS and access-optimal, if the following two conditions hold: (1) if $r^{I}\geq r^{F}$ then $\mathbf{P}^{F}$ is $r^{F}$ -column block-constructible from $\mathbf{P}^{I}$ , and (2) $\mathbf{P}^{I},\mathbf{P}^{F}$ are superregular.

Proof.

Follows from Proposition 9 and Definition 5. ∎

Thus, we can reduce the problem of proving the optimality of a systematic MDS convertible code in the merge regime to that of showing that matrices $\mathbf{P}^{I}$ and $\mathbf{P}^{F}$ satisfy the two properties mentioned in Theorem 10.

We first show that the construction specified in Section V-A satisfies condition (1) of Theorem 10.

Lemma 11.

Let $\mathbf{P}^{I},\mathbf{P}^{F}$ be as defined in Section V-A. Then $\mathbf{P}^{F}$ is $r^{F}$ -column block-constructible from $\mathbf{P}^{I}$ .

Proof.

Consider the first $r^{F}$ columns of $\mathbf{P}^{I}$ , which we denote as $\mathbf{P}^{I}_{r^{F}}=\mathbf{P}^{I}[*;1,\ldots,r^{F}]$ . Notice that $\mathbf{P}^{F}$ can be written as the following block matrix:

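$$\mathbf{P}^{F}=\begin{bmatrix}\mathbf{P}^{I}_{r^{F}}\\\mathbf{P}^{I}_{r^{F}}\operatorname{diag}(1,\theta^{k^{I}},\theta^{2k^{I}},\ldots,\theta^{(r^{F}-1)k^{I}})\\\vdots\\\mathbf{P}^{I}_{r^{F}}\operatorname{diag}(1,\theta^{(\lambda-1)k^{I}},\theta^{2(\lambda-1)k^{I}},\ldots,\theta^{(r^{F}-1)(\lambda-1)k^{I}})\end{bmatrix},$$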

where $\operatorname{diag}(a_{1},a_{2},\ldots,a_{n})$ is the $n\times n$ diagonal matrix with $a_{1},\ldots,a_{n}$ as the diagonal elements. From this representation, it is clear that $\mathbf{P}^{F}$ can be constructed from the first $r^{F}$ columns of $\mathbf{P}^{I}$ . ∎
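The block structure used in this proof can be checked numerically. The sketch below rebuilds $\mathbf{P}^{F}$ block by block from the first $r^{F}$ columns of $\mathbf{P}^{I}$, scaling column $j$ of block $s$ by $\theta^{(s-1)k^{I}(j-1)}$. The prime field $\mathbb{F}_{257}$ and $\theta=3$ are assumptions chosen only so that the arithmetic is concrete. They are not the field prescribed by the construction, and superregularity is not checked here.

```python
P_MOD, THETA = 257, 3   # illustrative field and element (assumption)

def parity_matrix(rows, cols):
    """Entry (i, j), 0-indexed here, equals theta^(i*j) mod P_MOD."""
    return [[pow(THETA, i * j, P_MOD) for j in range(cols)] for i in range(rows)]

def block_from_initial(P_init, r_final, k_init, s):
    """Block s (1-indexed) of P^F built from the first r_final columns of P^I."""
    scale = [pow(THETA, (s - 1) * k_init * j, P_MOD) for j in range(r_final)]
    return [[row[j] * scale[j] % P_MOD for j in range(r_final)] for row in P_init]

k_init, r_init, lam, r_final = 3, 3, 2, 3
P_init = parity_matrix(k_init, r_init)
P_final = parity_matrix(lam * k_init, r_final)

rebuilt = [row
           for s in range(1, lam + 1)
           for row in block_from_initial(P_init, r_final, k_init, s)]
print(rebuilt == P_final)
```

Each block only rescales columns of $\mathbf{P}^{I}_{r^{F}}$, so every new parity needs just $r^{F}$ initial parity blocks per stripe.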

It only remains to show that the construction specified in Section V-A satisfies condition (2) of Theorem 10, that is, that $\mathbf{P}^{I}$ and $\mathbf{P}^{F}$ are superregular. To do this, we consider the minors of $\mathbf{P}^{I}$ and $\mathbf{P}^{F}$ as polynomials in $\theta$ . We show that, due to the structure of the matrices $\mathbf{P}^{I}$ and $\mathbf{P}^{F}$ as specified in Section V-A, none of these polynomials can have $\theta$ as a root as long as the field size is sufficiently large. Therefore none of the minors can be zero.

Lemma 12.

Let $\mathbf{P}^{I},\mathbf{P}^{F}$ be as defined in Section V-A. Then $\mathbf{P}^{I}$ and $\mathbf{P}^{F}$ are superregular, for sufficiently large field size.

Proof.

Let $\mathbf{R}$ be a $t\times t$ submatrix of $\mathbf{P}^{I}$ or $\mathbf{P}^{F}$ , determined by the row indices $i_{1}<i_{2}<\cdots<i_{t}$ and the column indices $j_{1}<j_{2}<\cdots<j_{t}$ , and denote entry $(i,j)$ of $\mathbf{R}$ as $\mathbf{R}[i,j]$ . The determinant of $\mathbf{R}$ is defined by the Leibniz formula:

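$$\det(\mathbf{R})=\sum_{\sigma\in\operatorname{Perm}(t)}\operatorname{sgn}(\sigma)\prod_{l=1}^{t}\mathbf{R}[l,\sigma(l)]=\sum_{\sigma\in\operatorname{Perm}(t)}\operatorname{sgn}(\sigma)\,\theta^{E_{\sigma}},\qquad\text{where }E_{\sigma}=\sum_{l=1}^{t}(i_{l}-1)(j_{\sigma(l)}-1).$$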

$\operatorname{Perm}(t)$ is the set of all permutations on $t$ elements, and $\operatorname{sgn}(\sigma)\in\{-1,1\}$ is the sign of the permutation $\sigma\in\operatorname{Perm}(t)$ (the sign of a permutation $\sigma$ depends on the number of inversions in $\sigma$ ). Clearly, $\det(\mathbf{R})$ defines a univariate polynomial $f_{\mathbf{R}}\in\mathbb{F}_{p}[\theta]$ . We will now show that $\deg(f_{\mathbf{R}})=\sum_{l=1}^{t}(i_{l}-1)(j_{l}-1)$ by showing that there is a unique permutation $\sigma^{*}\in\operatorname{Perm}(t)$ for which $E_{\sigma^{*}}$ achieves this value, and that this is the maximum over all permutations in $\operatorname{Perm}(t)$ . This means that $f_{\mathbf{R}}$ has a leading term of degree $E_{\sigma^{*}}$ .

To prove this, we show that any permutation $\sigma\in\operatorname{Perm}(t)\backslash\{\sigma^{*}\}$ can be modified into a permutation $\sigma^{\prime}$ such that $E_{\sigma^{\prime}}>E_{\sigma}$ . Specifically, we show that $\sigma^{*}=\operatorname{\sigma_{\mathrm{id}}}$ , the identity permutation. Consider $\sigma\in\operatorname{Perm}(t)\backslash\{\operatorname{\sigma_{\mathrm{id}}}\}$ : let $a$ be the smallest index such that $\sigma(a)\neq a$ , let $b=\sigma^{-1}(a)$ , and let $c=\sigma(a)$ . Let $\sigma^{\prime}$ be such that $\sigma^{\prime}(a)=a$ , $\sigma^{\prime}(b)=c$ , and $\sigma^{\prime}(d)=\sigma(d)$ for $d\in[t]\backslash\{a,b\}$ . In other words, $\sigma^{\prime}$ is the result of “swapping” the images of $a$ and $b$ in $\sigma$ . Notice that $a<b$ and $a<c$ . Then, we have that:

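$$E_{\sigma^{\prime}}-E_{\sigma}=(i_{a}-1)(j_{a}-1)+(i_{b}-1)(j_{c}-1)-(i_{a}-1)(j_{c}-1)-(i_{b}-1)(j_{a}-1)=(i_{b}-i_{a})(j_{c}-j_{a})>0.$$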

The last inequality comes from the fact that $a<b$ implies $i_{a}<i_{b}$ and $a<c$ implies $j_{a}<j_{c}$ . Therefore, $\deg(f_{\mathbf{R}})=\max_{\sigma\in\operatorname{Perm}(t)}E_{\sigma}=E_{\operatorname{\sigma_{\mathrm{id}}}}$ .
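The swapping argument can be sanity-checked by enumerating all permutations for a small submatrix. This sketch takes $E_{\sigma}=\sum_{l}(i_{l}-1)(j_{\sigma(l)}-1)$, the exponent implied by the degree formula above; the row and column indices are arbitrary illustrative choices.

```python
from itertools import permutations

def exponent(rows, cols, sigma):
    """E_sigma for 1-indexed row indices i_l and column indices j_l (sigma is 0-indexed)."""
    return sum((rows[l] - 1) * (cols[sigma[l]] - 1) for l in range(len(rows)))

rows = [2, 3, 5]  # i_1 < i_2 < i_3
cols = [1, 2, 4]  # j_1 < j_2 < j_3

values = {sigma: exponent(rows, cols, sigma) for sigma in permutations(range(3))}
best = max(values, key=values.get)
print(best, values[best])  # (0, 1, 2) 14
```

The identity permutation attains the maximum exponent $14$, and no other permutation ties with it, so the leading term of $f_{\mathbf{R}}$ cannot cancel.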

Let $E^{*}(\lambda,k^{I},r^{I},r^{F})$ be the maximum degree of $f_{\mathbf{R}}$ over all submatrices $\mathbf{R}$ of $\mathbf{P}^{I}$ or $\mathbf{P}^{F}$ . Then, $E^{*}(\lambda,k^{I},r^{I},r^{F})$ corresponds to the diagonal with the largest elements in $\mathbf{P}^{I}$ or $\mathbf{P}^{F}$ . In $\mathbf{P}^{F}$ this is the diagonal of the square submatrix formed by the bottom $r^{F}$ rows. In $\mathbf{P}^{I}$ it can be either the diagonal of the square submatrix formed by the bottom $r^{I}$ rows, or by the right $k^{I}$ columns. Thus, we have that:

[TABLE]

Let $D=E^{*}(\lambda,k^{I},r^{I},r^{F})+1$ . Then, if $\det(\mathbf{R})=0$ for some submatrix $\mathbf{R}$ , $\theta$ is a root of $f_{\mathbf{R}}$ , which is a nonzero polynomial because its leading coefficient is $\pm 1$ . This is a contradiction since $\theta$ is a primitive element and the minimal polynomial of $\theta$ over $\mathbb{F}_{p}$ has degree $D>\deg(f_{\mathbf{R}})$ [12]. ∎

This construction is practical only for small values of the code parameters, since the required field size grows rapidly with the lengths of the initial and final codes. In Section VI we present practical low-field-size constructions.

Combining the above results leads to the following key result on the achievability of the lower bounds on access cost derived in Section IV.

Theorem 13.

The explicit construction provided in Section V-A yields access-optimal linear MDS convertible codes for all parameter values in the merge regime.

Proof.

Follows from Theorem 10, Lemma 11, and Lemma 12. ∎

VI Low field-size constructions based on superregular Hankel arrays

In this section we present alternative constructions for $({n^{I}},{k^{I}};{n^{F}},{k^{F}=\lambda k^{I}})$ convertible codes that require a significantly lower (polynomial) field size than the general construction presented in Section V.

Key idea. The key idea behind our constructions is to take the matrices $\mathbf{P}^{I}$ and $\mathbf{P}^{F}$ as submatrices from a specially constructed triangular array of the following form:

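$$T_{m}:\quad\begin{array}{ccccc}b_{1}&b_{2}&\cdots&b_{m-1}&b_{m}\\ b_{2}&b_{3}&\cdots&b_{m}&\\ \vdots&\vdots&&&\\ b_{m-1}&b_{m}&&&\\ b_{m}&&&&\end{array}\tag{13}$$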

such that every submatrix of $T_{m}$ is superregular. Here, (1) $b_{1},\ldots,b_{m}$ are (not necessarily distinct) elements from $\mathbb{F}_{q}$ , and (2) $m$ is at most the field size $q$ . The array $T_{m}$ is said to have Hankel form, which means that $T_{m}[i,j]=T_{m}[i-1,j+1]$ , for all $i\in[2,m],\,j\in[m-1]$ . We call $T_{m}$ a superregular Hankel array. Such an array can be constructed by employing the algorithm proposed in [59] (where the algorithm was employed to construct generalized Cauchy matrices to yield generalized Reed-Solomon codes). We note that the algorithm outlined in [59] takes the field size $q$ as input, and generates $T_{q}$ as the output. It is easy to see that $T_{q}$ thus generated can be truncated to generate the triangular array $T_{m}$ for any $m\leq q$ .
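The indexing of a Hankel triangular array can be illustrated with a short sketch. It only demonstrates the Hankel form $T_{m}[i,j]=b_{i+j-1}$ on symbolic labels; it does not reproduce the algorithm of [59], so the entries carry no superregularity guarantee. The function name `hankel_triangle` is hypothetical.

```python
def hankel_triangle(b):
    """Triangular array whose row i (0-indexed) is b[i], b[i+1], ..., b[m-1]."""
    m = len(b)
    return [[b[i + j] for j in range(m - i)] for i in range(m)]

T = hankel_triangle(["b1", "b2", "b3", "b4"])
for row in T:
    print(" ".join(row))

# Hankel form: each entry equals the entry one row up and one column to the right.
hankel_ok = all(
    T[i][j] == T[i - 1][j + 1]
    for i in range(1, len(T))
    for j in range(len(T[i]))
)
print(hankel_ok)
```

Truncating to $T_{m^{\prime}}$ with $m^{\prime}<m$ amounts to keeping only the entries with $i+j-1\leq m^{\prime}$.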

We construct the initial and final codes by taking submatrices $\mathbf{P}^{I}$ and $\mathbf{P}^{F}$ from superregular Hankel arrays (the submatrices have to be contained in the triangle where the array is defined). This guarantees that $\mathbf{P}^{I}$ and $\mathbf{P}^{F}$ are superregular. In addition, we exploit the Hankel form of the array by carefully choosing the submatrices that form $\mathbf{P}^{I}$ and $\mathbf{P}^{F}$ to ensure that $\mathbf{P}^{F}$ is $r^{F}$ -column block-constructible from $\mathbf{P}^{I}$ . Given the way we construct these matrices and the properties of $T_{m}$ , all the initial and final codes presented in this subsection are generalized doubly-extended Reed-Solomon codes [59].

The above idea yields a sequence of constructions with a tradeoff between the field size and the range of $r^{F}$ supported. We first present the two constructions at the extreme ends of this tradeoff, which we call Hankel-I and Hankel-II. Construction Hankel-I, described in Section VI-A, can be applied whenever $r^{F}\leq\lfloor r^{I}/\lambda\rfloor$ , and requires a field size of $q\geq\max\{n^{I}-1,n^{F}-1\}$ . Construction Hankel-II, described in Section VI-B, can be applied whenever $r^{F}\leq r^{I}-\lambda+1$ , and requires a field size of $q\geq k^{I}r^{I}$ . We then discuss the constructions that fall in between these two constructions in the tradeoff between field size and coverage of $r^{F}$ values in Section VI-C. In Section VI-C we also provide a discussion on the ability of these constructions to be optimal even when parameters of the final code are a priori unknown. Throughout this section we will assume that $\lambda\leq r^{I}\leq k^{I}$ . The ideas presented here are still applicable when $r^{I}>k^{I}$ , but the constructions and analysis change in minor ways.
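The applicability conditions and field-size bounds stated above can be tabulated for given parameters. The following is a small sketch; the function name `hankel_field_size_bounds` is hypothetical, the function does not check the standing assumption $\lambda\leq r^{I}\leq k^{I}$, and the returned numbers are lower bounds on $q$ (an actual field size must additionally be a prime power).

```python
def hankel_field_size_bounds(n_i, k_i, n_f, k_f):
    """Lower bound on q for each applicable construction (None if not applicable)."""
    if k_f % k_i != 0:
        raise ValueError("merge regime requires k^F = lambda * k^I")
    lam = k_f // k_i
    r_i, r_f = n_i - k_i, n_f - k_f
    bounds = {"Hankel-I": None, "Hankel-II": None}
    if r_f <= r_i // lam:          # Hankel-I: r^F <= floor(r^I / lambda)
        bounds["Hankel-I"] = max(n_i - 1, n_f - 1)
    if r_f <= r_i - lam + 1:       # Hankel-II: r^F <= r^I - lambda + 1
        bounds["Hankel-II"] = k_i * r_i
    return bounds

print(hankel_field_size_bounds(9, 5, 12, 10))  # {'Hankel-I': 11, 'Hankel-II': 20}
print(hankel_field_size_bounds(7, 4, 10, 8))   # {'Hankel-I': None, 'Hankel-II': 12}
```

The two calls use the parameters of Example 3 and Example 4 respectively. For the second set only Hankel-II applies, since $\lfloor r^{I}/\lambda\rfloor=1<r^{F}=2$.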

VI-A Hankel-I construction

The Hankel-I construction provides an access-optimal linear MDS $({n^{I}},{k^{I}};{n^{F}},{k^{F}=\lambda k^{I}})$ convertible code when $r^{F}\leq\lfloor r^{I}/\lambda\rfloor$ , and requires a field size of $q\geq\max\{n^{I}-1,n^{F}-1\}$ . Notice that this construction incurs no field-size penalty for access-optimal conversion, since its field size requirement equals the larger of the requirements of a pair of $[n^{I},k^{I}]$ and $[n^{F},k^{F}]$ Reed-Solomon codes [12]. We start by illustrating the construction with an example.

Example 3.

Consider the parameters $({n^{I}={9}},{k^{I}={5}};{n^{F}={12}},{k^{F}={10}})$ . First, we construct a superregular Hankel array of size $n^{F}-1=11$ , $T_{11}$ , employing the algorithm in [59]. Then choose $\mathbf{P}^{I}$ and $\mathbf{P}^{F}$ from $T_{11}$ as shown in Figure 3. That these matrices are superregular follows from the superregularity of $T_{11}$ . Furthermore, notice that the chosen parity matrices have the following structure:

[TABLE]

From this structure, it is clear that $\mathbf{P}^{F}$ is $2$ -column block-constructible from $\mathbf{P}^{I}$ . The field size required for this construction is $n^{F}-1=11$ .

General construction. Now we describe how to construct $\mathbf{P}^{I},\mathbf{P}^{F}$ for all valid parameters $\lambda,k^{I},r^{I},r^{F}$ , where $r^{F}\leq\lfloor r^{I}/\lambda\rfloor$ . As seen in Example 3, this construction works by splitting the encoding vectors corresponding to the $r^{I}$ initial parities into $\lambda$ groups, which are then combined to obtain the (at most) $\lfloor r^{I}/\lambda\rfloor$ new encoding vectors.

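Let $T_{m}$ be as defined in Equation 13, with $m=n^{F}-1$ . Choose $\mathbf{P}^{F}$ to be the $k^{F}\times r^{F}$ submatrix of the top-left elements of $T_{m}$ . Denote the $k^{I}\times((\lambda-1)k^{I}+r^{F})$ submatrix of the top-left elements of $T_{m}$ as $\mathbf{Q}$ :

$$\mathbf{Q}=\begin{bmatrix}b_{1}&b_{2}&\cdots&b_{(\lambda-1)k^{I}+r^{F}}\\ b_{2}&b_{3}&\cdots&b_{(\lambda-1)k^{I}+r^{F}+1}\\ \vdots&\vdots&\ddots&\vdots\\ b_{k^{I}}&b_{k^{I}+1}&\cdots&b_{k^{F}+r^{F}-1}\end{bmatrix}$$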

We choose $\mathbf{P}^{I}$ to be any $k^{I}\times r^{I}$ submatrix of $\mathbf{Q}$ that includes columns $\{l,k^{I}+l,\ldots,(\lambda-1)k^{I}+l\}$ for every $l\in[r^{F}]$ . The Hankel form of array $T_{m}$ implies that $T_{m}[k^{I}(i-1)+j,l]=T_{m}[j,k^{I}(i-1)+l]$ for all $i\in[\lambda],\,j\in[k^{I}],\,l\in[r^{F}]$ . As a consequence, we have that the $l$ -th column of $\mathbf{P}^{F}$ is equal to the vertical concatenation of columns $(l,k^{I}+l,\ldots,(\lambda-1)k^{I}+l)$ of $\mathbf{Q}$ .

Since both $\mathbf{P}^{I}$ and $\mathbf{P}^{F}$ are submatrices of $T_{m}$ , they are superregular. Furthermore, since every column of $\mathbf{P}^{F}$ is the concatenation of $\lambda$ columns of $\mathbf{P}^{I}$ , it is clear that $\mathbf{P}^{F}$ is $r^{F}$ -column block-constructible from $\mathbf{P}^{I}$ . Thus $\mathbf{P}^{I}$ and $\mathbf{P}^{F}$ satisfy both sufficient conditions of Theorem 10, and hence the Hankel-I construction is access-optimal during conversion.
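The column-concatenation property can be checked symbolically for the parameters of Example 3. This is a sketch only: entries are the labels $b_{i+j-1}$ rather than field elements, so it verifies the Hankel bookkeeping and says nothing about superregularity. The choice of $\mathbf{P}^{I}$ as columns $\{1,2,6,7\}$ of $\mathbf{Q}$ follows from the column rule above with $r^{I}=4$; the helper names are illustrative.

```python
lam, k_i, r_i, r_f = 2, 5, 4, 2   # Example 3: (n^I, k^I; n^F, k^F) = (9, 5; 12, 10)
k_f = lam * k_i
m = k_f + r_f - 1                 # n^F - 1 = 11

def T(i, j):
    """1-indexed entry of the Hankel array, as the symbolic label b_{i+j-1}."""
    assert i + j - 1 <= m, "entry lies outside the triangle"
    return f"b{i + j - 1}"

def column(mat, c):
    """Column c (1-indexed) of a matrix stored as a list of rows."""
    return [row[c - 1] for row in mat]

P_F = [[T(i, l) for l in range(1, r_f + 1)] for i in range(1, k_f + 1)]
Q = [[T(j, c) for c in range(1, (lam - 1) * k_i + r_f + 1)] for j in range(1, k_i + 1)]

# Columns of Q that P^I must contain: {l, k^I + l, ...} for each l in [r^F].
needed = sorted((i - 1) * k_i + l for l in range(1, r_f + 1) for i in range(1, lam + 1))
print(needed)  # [1, 2, 6, 7]

# Column l of P^F is the vertical concatenation of those columns of Q.
concat_ok = all(
    column(P_F, l) == [x for i in range(1, lam + 1) for x in column(Q, (i - 1) * k_i + l)]
    for l in range(1, r_f + 1)
)
print(concat_ok)
```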

(Access-optimal) Conversion process. During conversion, the $k^{I}$ data blocks from each of the $\lambda$ initial stripes remain unchanged, and become the $k^{F}=\lambda k^{I}$ data blocks from the final stripe. The $r^{F}$ new (parity) blocks from the final stripe are constructed by accessing blocks from the initial stripes as detailed below. To construct the $l$ -th new block (corresponding to the $l$ -th column of $\mathbf{P}^{F}$ , $l\in[r^{F}]$ ), read the parity block corresponding to column $(i-1)k^{I}+l$ of $\mathbf{Q}$ from each initial stripe $i\in[\lambda]$ , and then sum the $\lambda$ blocks read. The encoding vector of the new block will be equal to the sum of the encoding vectors of the blocks read (recall from Section IV-A that the initial encoding vectors are embedded into a $k^{F}$ dimensional space). This is done for every new encoding vector $l\in[r^{F}]$ .
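The conversion step can be simulated over a small prime field. The sketch below uses the parameters of Example 3 with $q=11$ and fills the Hankel array with arbitrary nonzero placeholder values. These values are not produced by the algorithm of [59] and are not claimed to be superregular, because the identity being checked depends only on the Hankel form. All function names are illustrative.

```python
import random

p = 11                          # prime field F_11
lam, k_i, r_f = 2, 5, 2         # Example 3 parameters
k_f = lam * k_i
m = k_f + r_f - 1               # 11

random.seed(0)
b = [random.randrange(1, p) for _ in range(m)]   # placeholder entries, NOT superregular

def T(i, j):
    """1-indexed Hankel entry T_m[i, j] = b_{i+j-1}."""
    return b[i + j - 2]

# Data of the lam initial stripes, k_i symbols each.
stripes = [[random.randrange(p) for _ in range(k_i)] for _ in range(lam)]
all_data = [x for stripe in stripes for x in stripe]

def initial_parity(data, c):
    """Parity of one initial stripe under column c of Q."""
    return sum(x * T(j, c) for j, x in enumerate(data, start=1)) % p

def final_parity_direct(l):
    """Parity l of the final stripe, computed from all k^F data symbols."""
    return sum(x * T(row, l) for row, x in enumerate(all_data, start=1)) % p

def final_parity_converted(l):
    """Parity l of the final stripe, computed from lam initial parities only."""
    return sum(
        initial_parity(stripes[i - 1], (i - 1) * k_i + l) for i in range(1, lam + 1)
    ) % p

converted = [final_parity_converted(l) for l in range(1, r_f + 1)]
direct = [final_parity_direct(l) for l in range(1, r_f + 1)]
print(converted == direct)
```

In a real system the initial parities would be read from storage rather than recomputed; conversion then touches $\lambda r^{F}=4$ parity blocks and no data blocks.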

VI-B Hankel-II construction

The Hankel-II construction, in contrast to the Hankel-I construction above, can handle a broader range of parameter values, at the cost of a slightly larger field-size requirement. In particular, we present a construction of access-optimal MDS $({n^{I}},{k^{I}};{n^{F}},{k^{F}=\lambda k^{I}})$ convertible codes for all $r^{F}\leq r^{I}-\lambda+1$ , requiring a field size of $q\geq k^{I}r^{I}$ . We start with an example illustrating this construction.

Example 4.

Consider parameters $({n^{I}={7}},{k^{I}={4}};{n^{F}={10}},{k^{F}={8}})$ . First, we construct a superregular Hankel array of size $k^{I}r^{I}=12$ , $T_{12}$ , by choosing $q=13$ as the field size, and employing the algorithm in [59]. Then choose $\mathbf{P}^{I}$ and $\mathbf{P}^{F}$ from $T_{12}$ as shown in Figure 4. Both matrices are superregular by the superregularity of $T_{12}$ . Notice that the chosen parity matrices have the following structure:

[TABLE]

It is easy to see that $\mathbf{P}^{F}$ is $2$ -column block-constructible from $\mathbf{P}^{I}$ .

General construction. Now we describe how to construct $\mathbf{P}^{I}$ and $\mathbf{P}^{F}$ for all valid parameters $\lambda,k^{I},r^{I},r^{F}$ such that $r^{F}\leq r^{I}-\lambda+1$ . As seen in Example 4, this construction works by choosing the $r^{I}$ initial parity encoding vectors so that any $\lambda$ consecutive initial parity encoding vectors can be combined into a new encoding vector.

Let $T_{m}$ be as in Equation 13, with $m\geq k^{I}r^{I}$ . We take $\mathbf{P}^{I}$ and $\mathbf{P}^{F}$ as the following submatrices of $T_{m}$ :

[TABLE]

The Hankel form of array $T_{m}$ guarantees that the $i$ -th column of $\mathbf{P}^{F}$ corresponds to the concatenation of columns $(i,i+1,\ldots,i+\lambda-1)$ of $\mathbf{P}^{I}$ . Thus, $\mathbf{P}^{F}$ is $r^{F}$ -column block-constructible from $\mathbf{P}^{I}$ . Furthermore, since $\mathbf{P}^{I}$ and $\mathbf{P}^{F}$ are submatrices of $T_{m}$ , they are superregular.

(Access-optimal) Conversion process. During conversion, the $k^{I}$ data blocks from each of the $\lambda$ initial stripes remain unchanged, and become the $k^{F}=\lambda k^{I}$ data blocks from the final stripe. The $r^{F}$ new (parity) blocks from the final stripe are constructed by accessing blocks from the initial stripes as detailed below. To construct the $l$ -th new block (corresponding to the $l$ -th column of $\mathbf{P}^{F}$ , $l\in[r^{F}]$ ), read parity block $l+i-1$ from each initial stripe $i\in[\lambda]$ , and then sum the $\lambda$ blocks read. The encoding vector of the new block will be equal to the sum of the encoding vectors of the blocks read (recall from Section IV-A that the initial encoding vectors are embedded into a $k^{F}$ dimensional space). This is done for every new encoding vector $l\in[r^{F}]$ .
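The same kind of simulation applies to Hankel-II with the parameters of Example 4 and $q=13$. The sketch assumes that parity column $c$ of both $\mathbf{P}^{I}$ and $\mathbf{P}^{F}$ is taken from column $(c-1)k^{I}+1$ of $T_{m}$. This is one choice consistent with the stated concatenation property and with $m\geq k^{I}r^{I}$, but the submatrix definition of the paper is not reproduced here, so treat it as an assumption. The array entries are again arbitrary placeholders, not a superregular array.

```python
import random

p = 13                                  # field size chosen in Example 4
lam, k_i, r_i, r_f = 2, 4, 3, 2         # (n^I, k^I; n^F, k^F) = (7, 4; 10, 8)
k_f = lam * k_i
m = k_i * r_i                           # 12

random.seed(1)
b = [random.randrange(1, p) for _ in range(m)]   # placeholder entries, NOT superregular

def T(i, j):
    """1-indexed Hankel entry T_m[i, j] = b_{i+j-1}."""
    return b[i + j - 2]

def array_column(c):
    """ASSUMED column of T_m backing parity column c of P^I and P^F."""
    return (c - 1) * k_i + 1

stripes = [[random.randrange(p) for _ in range(k_i)] for _ in range(lam)]
all_data = [x for stripe in stripes for x in stripe]

def initial_parity(i, c):
    """Parity block c of initial stripe i (both 1-indexed)."""
    return sum(x * T(j, array_column(c)) for j, x in enumerate(stripes[i - 1], start=1)) % p

# For new block l, stripe i contributes its parity block l + i - 1.
reads = {l: [(i, l + i - 1) for i in range(1, lam + 1)] for l in range(1, r_f + 1)}

converted = [sum(initial_parity(i, c) for i, c in reads[l]) % p for l in range(1, r_f + 1)]
direct = [
    sum(x * T(row, array_column(l)) for row, x in enumerate(all_data, start=1)) % p
    for l in range(1, r_f + 1)
]
print(reads)
print(converted == direct)
```

Here consecutive initial parities are combined across stripes (stripe 1 contributes parity $l$ , stripe 2 contributes parity $l+1$ ), in contrast with the grouped columns of Hankel-I.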

VI-C Sequence of Hankel-based constructions and Handling a priori unknown parameters

Sequence of Hankel-based constructions. Our idea of Hankel-array-based construction yields a sequence of access-optimal MDS convertible codes with a tradeoff between field size and the range of $r^{F}$ supported. The two constructions presented in Section VI-A and Section VI-B are the two extreme points of this tradeoff.

In particular, our construction can support, for all $s\in\{\lambda,\lambda+1,\ldots,r^{I}\}$ :

[TABLE]

The parameter $s$ corresponds to the number of groups into which the encoding vectors corresponding to the $r^{I}$ initial parities are split. That is, each group of consecutive initial parity encoding vectors has size $\lfloor r^{I}/s\rfloor$ or $\lceil r^{I}/s\rceil$ . The Hankel-I construction corresponds to $s=\lambda$ and Hankel-II corresponds to $s=r^{I}$ .

Handling a priori unknown parameters. So far, we have assumed that the parameters of the final code, $n^{F},k^{F}$ , are known a priori and are fixed. As discussed in Section III, this is useful in developing an understanding of the fundamental limits of code conversion. When realizing code conversion in practice, however, the parameters $n^{F},k^{F}$ might not be known at code construction time (as they depend on the empirically observed failure rates). Thus, it is of interest to be able to convert a code optimally to multiple different parameters. The Hankel-array-based constructions presented above indeed provide such flexibility. Our constructions continue to enable access-optimal conversion for any ${k^{F}}^{\prime}=\lambda^{\prime}k^{I}$ and ${n^{F}}^{\prime}={r^{F}}^{\prime}+{k^{F}}^{\prime}$ with $0\leq{r^{F}}^{\prime}\leq r^{F}$ and $2\leq{\lambda}^{\prime}\leq\lambda$ .

VII Conclusions and Future directions

In this paper, we propose the “code conversion” problem, which models the problem of converting data encoded with an $[n^{I},k^{I}]$ code into data encoded with an $[n^{F},k^{F}]$ code in a resource-efficient manner. The proposed problem is motivated by the practical necessity of reducing the overhead of redundancy adaptation in erasure-coded storage systems. This is a new opportunity for coding theorists to enable large-scale real-world storage systems to adapt their redundancy levels to varying failure rates of storage devices, thereby achieving significant savings in resources and energy consumption. We present the framework of convertible codes for studying code conversions, and fully characterize the fundamental limits for the access cost of conversions for an important regime of convertible codes. Furthermore, we present practical low-field-size constructions for access-optimal convertible codes for a wide range of parameters.

This work leads to a number of challenging and potentially impactful open problems. An important future direction is to go beyond the merge regime considered in this paper and study the fundamental limits on the access cost and construct optimal convertible codes for general parameter regimes. Another important future direction is to analyze the fundamental limits on the overhead of other cluster resources during code conversions, such as network bandwidth, disk IO, and CPU consumption, and construct convertible codes optimizing these resources. Note that while the access-optimal convertible codes considered in this paper also reduce the total network bandwidth, disk IO, and CPU overhead during conversion as compared to the default approach, the overhead on these other resources may not be optimal.

Acknowledgements

We thank Michael Rudow for his valuable feedback and helpful comments during the writing of this paper.

Bibliography

[1] D. Ford, F. Labelle, F. Popovici, M. Stokely, V. Truong, L. Barroso, C. Grimes, and S. Quinlan, “Availability in globally distributed storage systems,” in USENIX Symposium on Operating Systems Design and Implementation, 2010.
[2] K. V. Rashmi, N. B. Shah, D. Gu, H. Kuang, D. Borthakur, and K. Ramchandran, “A solution to the network challenges of data recovery in erasure-coded distributed storage systems: A study on the Facebook warehouse cluster,” in Proceedings of USENIX HotStorage, Jun. 2013.
[3] K. V. Rashmi, N. B. Shah, D. Gu, H. Kuang, D. Borthakur, and K. Ramchandran, “A Hitchhiker’s guide to fast and efficient data reconstruction in erasure-coded data centers,” in ACM SIGCOMM, 2014.
[4] M. Sathiamoorthy, M. Asteris, D. Papailiopoulos, A. G. Dimakis, R. Vadali, S. Chen, and D. Borthakur, “XORing elephants: Novel erasure codes for big data,” in VLDB Endowment, 2013.
[5] S. Ghemawat, H. Gobioff, and S. Leung, “The Google file system,” in ACM SIGOPS Operating Systems Review, vol. 37, no. 5. ACM, 2003, pp. 29–43.
[6] D. Borthakur, R. Schmidt, R. Vadali, S. Chen, and P. Kling, “HDFS RAID - Facebook.” [Online]. Available: http://www.slideshare.net/ydn/hdfs-raid-facebook
[7] C. Huang, H. Simitci, Y. Xu, A. Ogus, B. Calder, P. Gopalan, J. Li, and S. Yekhanin, “Erasure coding in Windows Azure storage,” in Proceedings of USENIX Annual Technical Conference (ATC), 2012.
[8] Apache Software Foundation, “Apache hadoop: HDFS erasure coding,” accessed: 2019-07-23. [Online]. Available: https://hadoop.apache.org/docs/r3.0.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html
