Secure and Private Cloud Storage Systems with Random Linear Fountain Codes
Mohsen Karimzadeh Kiskani, Hamid Sadjadpour

TL;DR
This paper introduces SAPIR, an information-theoretic framework for secure and private data retrieval in distributed cloud storage using Random Linear Fountain codes, ensuring secrecy and privacy even with server collusion.
Contribution
It presents a novel coding scheme combining RLF codes with a PIR protocol that guarantees asymptotic perfect secrecy and privacy against colluding servers.
Findings
Achieves asymptotic perfect secrecy with at least one uncorrupted server.
Provides a PIR scheme that protects user privacy against multiple colluding servers.
Demonstrates the effectiveness of the approach in distributed storage systems.
Abstract
An information theoretic approach to security and privacy called Secure And Private Information Retrieval (SAPIR) is introduced. SAPIR is applied to distributed data storage systems. In this approach, random combinations of all contents are stored across the network. Our coding approach is based on Random Linear Fountain (RLF) codes. To retrieve a content, a group of servers collaborate with each other to form a Reconstruction Group (RG). SAPIR achieves asymptotic perfect secrecy if at least one of the servers within an RG is not compromised. Further, a Private Information Retrieval (PIR) scheme based on random queries is proposed. The PIR approach ensures that users privately download their desired contents without the servers learning the indices of the requested contents. The proposed scheme is adaptive and can provide privacy against a significant number of colluding servers.
M. K. Kiskani and H. R. Sadjadpour are with the Department of Electrical Engineering, University of California, Santa Cruz. Email: {mohsen, hamid}@soe.ucsc.edu
Index Terms:
Cloud Storage, Security, Private Information Retrieval
I Introduction
Cloud networks have become a popular platform for data storage during the past decade. Cloud systems have been used in different applications such as healthcare [1]. Security of the stored data has always been a major concern for cloud service providers, and many of them encrypt the data on their servers. Dropbox, for instance, uses the Advanced Encryption Standard (AES) to store the contents on its servers (https://www.dropbox.com/en/help/27). Since such encryption algorithms are only computationally secure, an adversary may eventually be able to break them. For instance, the Data Encryption Standard (DES), which was once the official Federal Information Processing Standard (FIPS) in the US, is no longer considered secure. An interesting problem for highly sensitive cloud services is therefore to design information-theoretically secure solutions that remain immune to attackers over time.
To achieve perfect information theoretic secrecy using the Shannon cipher system [2], the number of keys should be at least equal to the number of messages. Therefore, to retrieve contents from the cloud with an information theoretically secure approach in which each content is directly encoded with a different key, each user would need to store a huge number of keys, which is not practical. In this paper, we propose to use the storage capability of the trusted servers to generate the keys from the contents themselves and achieve asymptotic perfect secrecy. Our proposed technique is based on Random Linear Fountain (RLF) codes [3], which have been shown [4, 5, 6] to be very useful in distributed storage systems.
On the other hand, in many distributed storage applications, such as Peer-to-Peer (P2P) distributed storage systems or systems in which some of the servers are under the control of an oppressive government, a user wants to download a content in such a way that the servers cannot determine which content is requested. This is widely known as the Private Information Retrieval (PIR) problem.
Our next contribution in this paper is a novel technique to address the PIR problem in distributed storage systems. Users request data from the servers using random queries. These random queries are designed so that they can be used to retrieve any desired content while preventing any malicious agent that knows up to half of the random queries from gaining information about the requested content. This important feature of the proposed technique provides privacy in the presence of many colluding servers, and it has not been presented in prior information theoretic PIR approaches [7] for coded storage systems. The proposed Secure And Private Information Retrieval (SAPIR) scheme provides both security and privacy for information retrieval.
The rest of the paper is organized as follows. Section II is dedicated to related work on PIR and security in distributed storage systems. The assumptions and problem formulation are described in Section III. We study the security and PIR aspects of SAPIR in Sections IV and V, respectively. The simulation results are provided in Section VI, and the paper is concluded in Section VII.
II Related Works
In this paper, we use Random Linear Fountain (RLF) codes [3] to encode the contents within the servers in the network. Significant capacity improvements can be achieved in wireless ad hoc and cellular networks [4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15, 16] using RLF codes. The application of fountain codes in distributed storage systems was previously studied in [17]. Similar coding techniques have been used in references such as [18, 19, 20, 21, 22, 23, 24, 25] to provide quality of service in wireless systems.
The capacity of wireless ad hoc cached networks was studied in [5] and it was shown that RLF codes can achieve perfect secrecy asymptotically without considering the PIR problem. In the current paper, RLF codes are used to simultaneously achieve security and privacy in distributed cloud storage systems.
While MDS codes [26, 27] show good repair capability, these codes are not particularly designed to provide security. The authors of [28] studied the security of distributed storage systems with MDS codes, and [29] proposed a construction for repairable and secure fountain codes. Reference [29] achieves security by concatenating Gabidulin codes with Repairable Fountain Codes (RFC); this design allows Locally Repairable Fountain Codes to be used for secure repair of lost data. Unlike [29], which focuses on the security of the repair links using concatenated codes, the current paper provides simultaneous security and privacy for the data storage nodes using only RLF codes. References [30] and [31] studied the problem of security in the presence of overhearing interference in cooperative communications. Further, [32, 33] studied the same problem in multi-tier networks.
The authors of [34] numerically studied the wiretap network with a simple topology in which there is a relaying node between the transmitter and the receiver. In the current paper, we consider a general network with a cloud infrastructure in which the servers cooperate to reconstruct the contents.
The idea of PIR was originally introduced in [35] for uncoded databases. Recently, there has been renewed interest in studying PIR for storage systems that use different coding techniques. Reference [36] was among the first works to study PIR for coded storage systems; it proved that PIR can be achieved by downloading only one extra bit. However, the solution in [36] requires the number of servers to grow with the data record size. Reference [37] assumed that the number of servers is fixed, established the trade-off between storage and retrieval costs, and demonstrated the fundamental limits on the cost of PIR for coded storage systems. The authors of [7] studied PIR for MDS coded storage systems and introduced a scheme to achieve PIR in MDS coded databases, but the security aspect was not addressed in that paper. They also assumed that the databases are able to store all the contents, which may not be realistic. Unlike prior work [36, 37, 7], which only studied PIR for coded databases, we are interested in achieving security and PIR simultaneously. Further, to the best of our knowledge, this is the first work to study PIR for a fountain-code-based distributed storage system. The proposed PIR scheme easily scales to cases in which up to half of the servers collude to obtain information about the content or the content index, which makes the technique very robust against a large number of colluding servers.
III Problem Formulation
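The network is composed of $N$ servers, each capable of storing $M$ encoded files. These servers are denoted by $s_1, s_2, \dots, s_N$. A total number of $K$ contents $W_1, \dots, W_K$ exist within the network, and each content has $F$ bits, i.e., $W_j \in \{0,1\}^F$ for $j \in \{1, \dots, K\}$.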
III-A RLF Coding-Based Storage
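The contents are randomly encoded and stored on the servers during the data preloading phase. The encoded file in the $\ell$-th storage location of server $s_i$, for any $i \in \{1, \dots, N\}$ and $\ell \in \{1, \dots, M\}$, has the form
$$c_{i,\ell} = \mathbf{a}_{i,\ell}\mathbf{W}, \tag{1}$$
where $\mathbf{W} = [W_1, \dots, W_K]^T$ denotes the vector of all contents and $\mathbf{a}_{i,\ell}$ denotes a $1 \times K$ random encoding vector of 0's and 1's (throughout the paper, vectors are denoted in bold characters). Each content belongs to the Galois Field $\mathrm{GF}(2^F)$, i.e., $W_j \in \mathrm{GF}(2^F)$. Throughout the paper, unless otherwise stated, all vector and matrix operations are in $\mathrm{GF}(2)$. The encoded files stored in server $s_i$ are $\mathbf{c}_i = [c_{i,1}, \dots, c_{i,M}]^T$, where $c_{i,\ell}$ is given by (1). Note that $\mathbf{c}_i = \mathbf{A}_i\mathbf{W}$, where $\mathbf{A}_i = [\mathbf{a}_{i,1}^T, \dots, \mathbf{a}_{i,M}^T]^T$ is the $M \times K$ random encoding matrix of server $s_i$.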
In RLF, all random vectors $\mathbf{a}_{i,\ell}$ are chosen independently and uniformly from $\mathrm{GF}(2)^K$, which results in a uniformly random encoding matrix $\mathbf{A}_i$ whose elements are 0 or 1 with equal probability. Such an encoding matrix is not necessarily full rank and may contain linearly dependent rows. This wastes storage and may jeopardize security by revealing more information. Hence, we propose a full rank encoding scheme based on RLF codes in which a randomly created encoding vector is discarded if it already lies in the span of the previously selected encoding vectors. In other words, for each server $s_i$ we select $M$ linearly independent vectors to construct a full rank matrix $\mathbf{A}_i$ of size $M \times K$, for $M \le K$.
The encoding can be performed in a decentralized way: each server can fill up its storage space independently of all the other servers during the data preloading phase. It can be shown [6] that the average minimum number of encoded files required to decode any desired content is very close to the optimal value of $K$.
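The full rank encoding step can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: each length-$K$ encoding vector is packed into a Python integer (bit $j$ selects content $W_{j+1}$), each content is an $F$-bit integer so that addition in $\mathrm{GF}(2)$ is XOR, and the helper names are hypothetical. A uniformly drawn candidate is kept only if Gaussian elimination against the already accepted rows leaves a non-zero residual.

```python
import random


def insert_into_basis(basis, vec):
    """Reduce `vec` (a K-bit int) against an echelon basis {pivot_bit: vector}.

    Adds the reduced vector and returns True if `vec` is linearly independent
    of the basis over GF(2); returns False if it already lies in the span.
    """
    while vec:
        pivot = vec.bit_length() - 1
        if pivot not in basis:
            basis[pivot] = vec
            return True
        vec ^= basis[pivot]  # cancel the leading bit (GF(2) subtraction = XOR)
    return False


def full_rank_encoding_matrix(M, K, rng):
    """Draw M linearly independent, uniformly random encoding rows (M <= K)."""
    if M > K:
        raise ValueError("a full rank M x K matrix needs M <= K")
    basis, rows = {}, []
    while len(rows) < M:
        candidate = rng.getrandbits(K)  # uniform over {0,1}^K
        if insert_into_basis(basis, candidate):
            rows.append(candidate)  # keep the original (unreduced) row
        # otherwise the candidate is discarded, as in the full rank scheme
    return rows


def encode(rows, contents):
    """c_l = a_l W over GF(2): XOR of the contents selected by each row."""
    coded = []
    for a in rows:
        c = 0
        for j, w in enumerate(contents):
            if (a >> j) & 1:
                c ^= w
        coded.append(c)
    return coded


# Decentralized preloading: each server fills its M slots on its own.
rng = random.Random(2017)
K, M, F = 8, 4, 32
contents = [rng.getrandbits(F) for _ in range(K)]
A_1 = full_rank_encoding_matrix(M, K, rng)
c_1 = encode(A_1, contents)
```

Because each server only consults its own accepted rows, this step can run independently on every server, in line with the decentralized preloading described above.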
III-B Reconstruction Groups (RG)
After the data preloading phase, users can reconstruct their desired contents during the content delivery phase. A desired file can be written as $W_j = \mathbf{e}_j\mathbf{W}$, where $\mathbf{e}_j$ is a $1 \times K$ all-zero vector except in the $j$-th location, which is equal to 1. To retrieve $W_j$, the user needs to access enough encoded files on the network servers to construct $\mathbf{e}_j$ from the $\mathbf{a}_{i,\ell}$'s.
Since the codes are constructed in $\mathrm{GF}(2)$, users need $K$ linearly independent encoding vectors to be able to retrieve any of the $K$ contents. We assume that the servers are divided into many different Reconstruction Groups (RGs). Servers within each RG collaborate with each other to retrieve any requested content. Therefore, the number of encoded files within a single RG should be at least equal to $K$. The RGs are denoted by $R_1, R_2, \dots, R_G$, and the number of servers within RG $R_g$ by $n_g$, where $g \in \{1, \dots, G\}$. It is shown in [6] that the average minimum number of encoded files within each RG needed to retrieve all the contents is only slightly larger than $K$. Therefore, for each RG $R_g$, the minimum value of $n_g M$ is only slightly larger than $K$. Notice that if $n_g M$ is smaller than $K$, the servers cannot form a full rank matrix to retrieve every desired content. A storage system that stores uncoded contents needs exactly $K$ cache locations to store $K$ files, which is very close to our RLF technique and shows that the RLF-based approach uses storage space efficiently. For large values of $M$, i.e., $M = K$, each server can form an RG by itself.
III-C Content Retrieval
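Each RG $R_g$ stores $n_g M$ randomly encoded files. Indexing the servers of the RG as $s_1, \dots, s_{n_g}$, their matrices $\mathbf{A}_i$ are stacked to form the $n_g M \times K$ matrix $\mathbf{A} = [\mathbf{A}_1^T, \dots, \mathbf{A}_{n_g}^T]^T$, which is full rank (rank $K$). Therefore, any content with index $j$ can be retrieved from these servers by solving the linear equation $\mathbf{y}\mathbf{A} = \mathbf{e}_j$ in $\mathrm{GF}(2)$. Since this matrix is full rank, one possible solution is
$$\mathbf{y}_j = \mathbf{e}_j\mathbf{A}^{-L}, \tag{2}$$
where $\mathbf{A}^{-L}$ is a $K \times n_g M$ left inverse of $\mathbf{A}$ over $\mathrm{GF}(2)$, i.e., $\mathbf{A}^{-L}\mathbf{A} = \mathbf{I}_K$.
To solve $\mathbf{y}\mathbf{A} = \mathbf{e}_j$, the servers within the RG send their encoding matrices to one of the RG servers, called the coordinating server $s_c$, which forms $\mathbf{A}$ and computes $\mathbf{y}_j$ from (2). (The servers of an RG only need to send this information to $s_c$ once; this could be done even right after the data preloading phase.) If $\mathbf{y}_j = [\mathbf{y}_{j,1}, \dots, \mathbf{y}_{j,n_g}]$ is such a solution, where $\mathbf{y}_{j,i}$ is the $1 \times M$ local decoding vector for server $s_i$, then $s_c$ sends $\mathbf{y}_{j,i}$ to server $s_i$, and $s_i$ transmits $\mathbf{y}_{j,i}\mathbf{c}_i$ to the requesting user. All of the server responses are then aggregated by the user to retrieve $W_j$ as
$$W_j = \sum_{i=1}^{n_g}\mathbf{y}_{j,i}\mathbf{c}_i = \mathbf{y}_j\mathbf{A}\mathbf{W} = \mathbf{e}_j\mathbf{W}. \tag{3}$$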
However, this solution reveals the identity of the downloaded content to all the servers of the RG. This simple solution cannot be used for PIR, but we show in Section IV that perfect secrecy can be achieved with it. A solution that preserves the privacy of the users is presented in Section V.
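The retrieval in (2) and (3) can be illustrated with the following Python sketch. It assumes that each length-$K$ row of $\mathbf{A}$ is packed into a Python integer (bit $j$ selects $W_{j+1}$) and that contents are integers combined by XOR. Instead of forming a left inverse explicitly, the hypothetical helper `decoding_vector` runs Gaussian elimination while tracking which rows were combined, which yields one solution $\mathbf{y}_j$ of $\mathbf{y}\mathbf{A} = \mathbf{e}_j$. The two-server RG with hand-picked rows is purely illustrative.

```python
def decoding_vector(rows, target):
    """Return a bitmask y over row indices with XOR_{i in y} rows[i] == target,
    i.e. a solution of y A = target over GF(2), or None if none exists."""
    basis = {}  # pivot bit -> (reduced row, mask of original rows it combines)
    for idx, row in enumerate(rows):
        vec, mask = row, 1 << idx
        while vec:
            p = vec.bit_length() - 1
            if p not in basis:
                basis[p] = (vec, mask)
                break
            vec ^= basis[p][0]
            mask ^= basis[p][1]
    vec, mask = target, 0  # invariant: vec == target XOR (rows selected by mask)
    while vec:
        p = vec.bit_length() - 1
        if p not in basis:
            return None  # target is not in the row span of A
        vec ^= basis[p][0]
        mask ^= basis[p][1]
    return mask


def retrieve(server_rows, server_coded, j):
    """Coordinator computes y_j; server s_i sends y_{j,i} c_i; the user XORs."""
    stacked = [a for A_i in server_rows for a in A_i]  # A = [A_1; ...; A_ng]
    y = decoding_vector(stacked, 1 << j)
    if y is None:
        raise ValueError("the RG matrix is not full rank")
    responses, offset = [], 0
    for c_i in server_coded:
        local = (y >> offset) & ((1 << len(c_i)) - 1)  # local vector y_{j,i}
        resp = 0
        for l, c in enumerate(c_i):
            if (local >> l) & 1:
                resp ^= c
        responses.append(resp)  # what server s_i transmits
        offset += len(c_i)
    W_j = 0
    for resp in responses:
        W_j ^= resp  # aggregation at the user, as in (3)
    return W_j, responses


# Toy RG with K = 3: server s_1 stores rows 011 and 110, server s_2 stores 111.
contents = [5, 9, 12]  # W_1, W_2, W_3 as small integers
server_rows = [[0b011, 0b110], [0b111]]
server_coded = [[5 ^ 9, 9 ^ 12], [5 ^ 9 ^ 12]]
print([retrieve(server_rows, server_coded, j)[0] for j in range(3)])  # [5, 9, 12]
```

Tracking the combination mask during elimination avoids computing a full inverse; any solution of $\mathbf{y}\mathbf{A} = \mathbf{e}_j$ works for (3), since they all reconstruct the same $W_j$.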
IV Security
This section studies the security of our approach. If an adversary can wiretap all of the communication links between the RG servers and the user, it can perfectly retrieve $W_j$ using equation (3). We prove that asymptotic perfect secrecy can be achieved when the adversary can wiretap all communication links between the servers and the user except one. We prove this for the case in which the user directly sends the request to the servers and the servers respond accordingly. In this scenario, the adversary knows the requested content index but is still unable to reduce its equivocation about the requested content.
Consider RG $R_g$ (with its servers indexed as $s_1, \dots, s_{n_g}$) and, without loss of generality, assume that an adversary can wiretap all of the links between servers $s_1, \dots, s_{n_g-1}$ and the user. Further assume that the user wants to directly download the content $W_j$ from these servers by sending the query $\mathbf{e}_j$ to all of them. Such a scenario is much more vulnerable to adversarial attacks than a scenario in which the requested base vectors are expanded in terms of random queries to guarantee privacy. When the query is received by all the servers, they collectively solve the linear equation $\mathbf{y}\mathbf{A} = \mathbf{e}_j$ to find the decoding vector $\mathbf{y}_j$. Equation (3) can be rewritten as
$$W_j = \sum_{i=1}^{n_g-1}\mathbf{y}_{j,i}\mathbf{c}_i + \mathbf{y}_{j,n_g}\mathbf{c}_{n_g}. \tag{4}$$
Since we assume that the responses of all servers except $s_{n_g}$ can be wiretapped, the first term of the above equation is known to the adversary while the second term is secret. Let $U = \sum_{i=1}^{n_g-1}\mathbf{y}_{j,i}\mathbf{c}_i$ and $Z = \mathbf{y}_{j,n_g}\mathbf{c}_{n_g}$. The requested content can be written as $W_j = U + Z$, and since all operations are in $\mathrm{GF}(2)$, we have
$$U = W_j + Z. \tag{5}$$
This is similar to the Shannon cipher system [2], in which an encoding function $e: \mathcal{M} \times \mathcal{K} \to \mathcal{X}$ maps a message $m \in \mathcal{M}$ and a key $k \in \mathcal{K}$ to a codeword $x = e(m, k) \in \mathcal{X}$. In our problem, $W_j$, $Z$, and $U$ can be regarded as the message, key, and codeword, respectively. The eavesdropper knows the codeword $U$, but it cannot obtain any information about the message if a unique key with uniform distribution is used for each message.
The following theorem provides the necessary and sufficient conditions [38] for perfect secrecy.
Theorem 1.
If $|\mathcal{M}| = |\mathcal{X}| = |\mathcal{K}|$, a coding scheme achieves perfect secrecy if and only if
- for each pair $(m, x) \in \mathcal{M} \times \mathcal{X}$, there exists a unique key $k \in \mathcal{K}$ such that $x = e(m, k)$;
- the key $k$ is uniformly distributed in $\mathcal{K}$.
Proof.
The proof can be found in section 3.1 of [38]. ∎
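The two conditions of Theorem 1 can be checked exhaustively for the XOR cipher $U = W_j + Z$ of (5) when the file size is tiny. The Python sketch below is an idealized illustration: it assumes the key $Z$ is exactly uniform over $F$-bit strings (Lemma 1 below only guarantees this asymptotically), uses a toy $F = 3$, and the function names are hypothetical.

```python
from fractions import Fraction


def codeword_distribution(F, w, key_prob):
    """P(U = u | W_j = w) for U = w XOR Z, where Z has pmf key_prob[z]."""
    dist = {}
    for z in range(2 ** F):
        u = w ^ z
        dist[u] = dist.get(u, 0) + key_prob[z]
    return dist


def satisfies_theorem_1(F, key_prob):
    """Check both conditions of Theorem 1 for the XOR cipher on F-bit strings."""
    size = 2 ** F
    # Condition 1: for every (message, codeword) pair exactly one key works.
    for w in range(size):
        for u in range(size):
            if sum(1 for z in range(size) if w ^ z == u) != 1:
                return False
    # Condition 2: the key is uniformly distributed.
    return all(p == Fraction(1, size) for p in key_prob)


F = 3
uniform_key = [Fraction(1, 2 ** F)] * 2 ** F
biased_key = [Fraction(1, 2)] + [Fraction(1, 2 * (2 ** F - 1))] * (2 ** F - 1)
print(satisfies_theorem_1(F, uniform_key))  # True
print(satisfies_theorem_1(F, biased_key))   # False
# With a uniform key, the codeword distribution does not depend on the message:
print(codeword_distribution(F, 0, uniform_key) == codeword_distribution(F, 5, uniform_key))  # True
```

The biased key shows why uniformity of $Z$ matters: with a non-uniform key, the distribution of $U$ changes with $W_j$ and therefore leaks information about the message.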
We use Theorem 1 to prove that our approach achieves asymptotic perfect secrecy. To use this theorem, we first show that for large enough values of $K$, the key $Z$ is uniformly distributed.
Lemma 1.
The bits of the coded files on the servers are asymptotically uniformly distributed.
Proof.
The proof is skipped due to page limitations. A similar proof appears in [13]. ∎
This lemma paves the way to prove the following theorem.
Theorem 2.
For the proposed full rank encoding scheme, if $K$ is large and $n_g M = K$, then the proposed encoding strategy provides asymptotic perfect secrecy against any eavesdropper that is capable of wiretapping all but one of the links from the servers of an RG to a user.
Proof.
We formulated this problem as a Shannon cipher system with message $W_j$, key $Z$, and codeword $U$. The condition $n_g M = K$ makes $\mathbf{A}$ a square matrix, so a unique decoding vector $\mathbf{y}_j$ exists for each requested message. Since the full rank encoding scheme is used, $\mathbf{A}$ is full rank, which guarantees that a unique key $Z = \mathbf{y}_{j,n_g}\mathbf{c}_{n_g}$ exists for each requested message $W_j$. Notice that if the size of the RG is large enough, the unique choice of the key does not affect the solvability of the linear equation $\mathbf{y}\mathbf{A} = \mathbf{e}_j$. Therefore, for any pair $(W_j, U)$, a unique key $Z$ exists such that $U = W_j + Z$. Further, since $W_j$, $Z$, and $U$ are all $F$-bit strings, we are guaranteed to have $|\mathcal{M}| = |\mathcal{K}| = |\mathcal{X}| = 2^F$.
Notice that the key $Z$ belongs to the set of all possible $F$-bit strings. Lemma 1 shows that each encoded file is uniformly distributed among all $F$-bit strings. Hence each key, which is a unique summation of such encoded files, is uniformly distributed over the set of all $F$-bit strings. In other words, regardless of the distribution of the bits in the files, $Z$ can be any $F$-bit string with equal probability for large values of $K$. Therefore, the conditions of Theorem 1 are met and perfect secrecy is achieved. ∎
Remark 1.
In this paper, we assume that the decoding vector $\mathbf{y}_j$ and the encoding matrix $\mathbf{A}$ are computed securely during the data preloading phase. Therefore, the eavesdropper cannot decode this information on any of the servers or have any knowledge about the key $Z$.
Remark 2.
A naive approach to achieve perfect secrecy using the Shannon cipher system is to choose $K$ different keys from the set of uniform $F$-bit strings, store them, and use them to encode the files. However, since the file size is very large, storing the keys on the trusted servers requires a significant amount of storage space and doubles the required storage capacity. An important contribution of our approach is that users do not need to store keys, and yet perfect secrecy can still be achieved with the help of trusted servers.
V Private Information Retrieval
In PIR, the goal is to ensure that when a user downloads the content $W_j$ with index $j$, the content index remains secret from all of the servers. This is desirable in applications such as Peer-to-Peer networks and in situations where some servers may have been compromised by an adversary. To achieve PIR, users send queries to the servers, and the servers respond based on those queries. These queries should be designed so that they reveal no information to the servers about the requested content index. To formally define information theoretic PIR, let $\theta$ be a random variable denoting the requested content index and let $\mathcal{T}$ be a subset of at most $t$ queries. We have the following definition.
Definition 1.
A PIR scheme achieves perfect information theoretic PIR against $t$ colluding servers if, for the set $\mathcal{T}$ of all queries available to these servers and any number of contents $K$, we have
$$I(\theta; \mathcal{T}) = 0, \tag{6}$$
where $I(\cdot\,;\cdot)$ denotes the mutual information.
V-A Random Query Generation
To achieve PIR, the user chooses a fixed integer $m \ge 0$ and sets $L = K + m$. It then picks $L$ query vectors $\mathbf{q}_1, \dots, \mathbf{q}_L$ from $\mathrm{GF}(2)^K$ uniformly at random and statistically independent of each other. These form the set of random queries $\mathcal{Q} = \{\mathbf{q}_1, \dots, \mathbf{q}_L\}$, i.e., a set of $L$ i.i.d. random query vectors. In the following, we prove that with a probability of at least $1 - 2^{-m}$, these random vectors span the whole $K$-dimensional space $\mathrm{GF}(2)^K$. The properties of random binary matrices that we use for our coding technique were previously studied in [39].
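Theorem 3.
Let $\mathbf{B}$ be a matrix of size $K \times (K+m)$ whose elements are independent random variables taking the values 0 and 1 with equal probability, and let $\rho_K$ be the rank of $\mathbf{B}$ in $\mathrm{GF}(2)$. Let $s \ge 0$ and $m$ be fixed integers with $m + s \ge 0$. If $K \to \infty$, then
$$\Pr(\rho_K = K - s) \to 2^{-s(s+m)}\prod_{i=s+1}^{\infty}\left(1 - 2^{-i}\right)\prod_{i=1}^{s+m}\left(1 - 2^{-i}\right)^{-1}, \tag{7}$$
where the last product equals 1 for $m + s = 0$.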
Proof.
This is Theorem 3.2.1 on page 126 of [39]. ∎
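Corollary 1.
For a matrix $\mathbf{B}$ of size $K \times (K+m)$ as in Theorem 3, where $m \ge 0$, if $K \to \infty$ we have
$$\Pr(\rho_K = K) \to \prod_{i=m+1}^{\infty}\left(1 - 2^{-i}\right). \tag{8}$$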
Proof.
The proof follows by setting $s = 0$ in Theorem 3. ∎
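Corollary 1 can be checked numerically. The Python sketch below, assuming that Monte Carlo sampling with a fixed seed is adequate for illustration, compares the empirical probability that a uniformly random $K \times (K+m)$ binary matrix has rank $K$ with the limit $\prod_{i=m+1}^{\infty}(1 - 2^{-i})$, truncated after a finite number of factors. Columns are packed into $K$-bit integers, and the helper names are hypothetical.

```python
import random


def gf2_rank(columns):
    """Rank over GF(2) of a matrix given as a list of K-bit integer columns."""
    basis, rank = {}, 0
    for v in columns:
        while v:
            p = v.bit_length() - 1
            if p not in basis:
                basis[p] = v
                rank += 1
                break
            v ^= basis[p]
    return rank


def full_rank_limit(m, terms=200):
    """prod_{i=m+1}^{inf} (1 - 2^-i), truncated after `terms` factors."""
    prob = 1.0
    for i in range(m + 1, m + 1 + terms):
        prob *= 1.0 - 2.0 ** (-i)
    return prob


def full_rank_frequency(K, m, trials, rng):
    """Fraction of random K x (K+m) binary matrices that have rank K."""
    hits = 0
    for _ in range(trials):
        columns = [rng.getrandbits(K) for _ in range(K + m)]
        hits += gf2_rank(columns) == K
    return hits / trials


rng = random.Random(1)
for m in (0, 1, 3):
    print(m, round(full_rank_limit(m), 4), full_rank_frequency(30, m, 2000, rng))
```

For $m = 0$ the limit is about 0.2888, so a square random binary matrix is singular most of the time; a few extra columns raise the full rank probability quickly.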
In the following, we will use these results for our proofs.
Definition 2.
We define the random variable $X$ as the minimum number of random query vectors needed to span $\mathrm{GF}(2)^K$.
Lemma 2.
The probability of the event $X < K$ is zero, and for any $m \ge 0$ we have
$$\lim_{K\to\infty}\Pr(X \le K + m) = \prod_{i=m+1}^{\infty}\left(1 - 2^{-i}\right). \tag{9}$$
Proof.
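This is a direct consequence of Corollary 1, since $X \le K + m$ if and only if the first $K + m$ random query vectors span $\mathrm{GF}(2)^K$, i.e., the $K \times (K+m)$ matrix they form has rank $K$; moreover, fewer than $K$ vectors can never span $\mathrm{GF}(2)^K$. ∎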
Lemma 3.
For any $m \ge 0$, the probability of the event $X > K + m$ is asymptotically less than $2^{-m}$.
Proof.
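Let $P_m \triangleq \prod_{i=m+1}^{\infty}\left(1 - 2^{-i}\right)$. It is easy to verify from equation (9) that for $m \ge 0$ we have
$$P_m = \left(1 - 2^{-(m+1)}\right)P_{m+1}. \tag{10}$$
Since $0 < P_{m+1} < 1$, from equation (10) we arrive at
$$P_m \le 1 - 2^{-(m+1)} \quad\text{and}\quad 1 - P_m < \left(1 - P_{m+1}\right) + 2^{-(m+1)}. \tag{11}$$
Hence, applying the second inequality in (11) repeatedly and using $P_{m'} \to 1$ as $m' \to \infty$, we obtain
$$\lim_{K\to\infty}\Pr(X > K + m) = 1 - P_m < \sum_{i=m+1}^{\infty}2^{-i} = 2^{-m}. \tag{12}$$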
∎
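Lemma 4.
For any $m \ge 0$, the probability of the event $X \le K + m$ is asymptotically at least $1 - 2^{-m}$ and at most $1 - 2^{-(m+1)}$, i.e.,
$$1 - 2^{-m} \le \lim_{K\to\infty}\Pr(X \le K + m) \le 1 - 2^{-(m+1)}. \tag{13}$$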
Proof.
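The upper bound is already proved in equation (11), since the limit equals $P_m$ by (9). From Lemma 3 we have
$$\lim_{K\to\infty}\Pr(X \le K + m) = 1 - \lim_{K\to\infty}\Pr(X > K + m) > 1 - 2^{-m}.$$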
∎
Theorem 4.
Asymptotically in $K$, with a probability of at least $1 - 2^{-m}$, the set of $L = K + m$ random queries, where $m \ge 0$, spans the whole $K$-dimensional space $\mathrm{GF}(2)^K$.
Proof.
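From Lemma 4, we have
$$\lim_{K\to\infty}\Pr\left(\mathcal{Q} \text{ spans } \mathrm{GF}(2)^K\right) = \lim_{K\to\infty}\Pr(X \le K + m) \ge 1 - 2^{-m}.$$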
This proves the theorem. ∎
Theorem 4 states that the probability of spanning $\mathrm{GF}(2)^K$ can be made arbitrarily close to 1, provided that the number $m$ of additional random vectors grows logarithmically with the inverse of the tolerated failure probability. For example, to span $\mathrm{GF}(2)^K$ with a probability of at least $1 - \varepsilon$, it is enough to have $K + \lceil \log_2(1/\varepsilon) \rceil$ random vectors. Using these random query vectors, we can now show that even with a large number of colluding servers, no information about the requested content index can be obtained. To prove this result, we need some lemmas.
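The bounds of Lemma 4 and the choice of $m$ in Theorem 4 can be evaluated directly. The Python sketch below (hypothetical helper names) computes the truncated product $P_m$, checks it against $1 - 2^{-m}$ and $1 - 2^{-(m+1)}$, and returns the smallest $m$ with $2^{-m} \le \varepsilon$, i.e., the number of extra queries that Theorem 4 suggests for a target failure probability $\varepsilon$.

```python
import math


def P(m, terms=200):
    """P_m = prod_{i=m+1}^{inf} (1 - 2^-i), truncated after `terms` factors."""
    prob = 1.0
    for i in range(m + 1, m + 1 + terms):
        prob *= 1.0 - 2.0 ** (-i)
    return prob


def lemma4_bounds_hold(m):
    """Check 1 - 2^-m < P_m <= 1 - 2^-(m+1) numerically."""
    return 1.0 - 2.0 ** (-m) < P(m) <= 1.0 - 2.0 ** (-(m + 1))


def extra_queries(eps):
    """Smallest m >= 0 with 2^-m <= eps, so K + m queries suffice (Theorem 4)."""
    if not 0 < eps <= 1:
        raise ValueError("eps must lie in (0, 1]")
    return math.ceil(math.log2(1.0 / eps))


print(all(lemma4_bounds_hold(m) for m in range(16)))  # True
print(extra_queries(1e-3))  # 10, i.e. K + 10 queries span GF(2)^K w.p. >= 0.999
```

The check stops at $m = 15$ because, for much larger $m$, the gap between $P_m$ and $1 - 2^{-m}$ falls below double-precision resolution.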
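Let $\mathbf{Q}$ be the $K \times L$ matrix whose columns are the random query vectors $\mathbf{q}_1, \dots, \mathbf{q}_L$. Matrix $\mathbf{Q}$ contains $L$ statistically independent random vectors. Let $E_{\mathbf{v},j}$ be the event that, for a specific vector $\mathbf{v} \in \mathrm{GF}(2)^L$ and a specific base vector $\mathbf{e}_j$, we have $\mathbf{Q}\mathbf{v} = \mathbf{e}_j^T$.
Lemma 5.
For any specific non-zero vector $\mathbf{v} \in \mathrm{GF}(2)^L$ and any base vector $\mathbf{e}_j$, we have
$$\Pr(E_{\mathbf{v},j}) = 2^{-K}.$$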
Proof.
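Let us assume that vector $\mathbf{v}$ has $w$ ones. If $w \ge 1$, then $w$ vectors from the set of all $L$ query vectors are added together to create $\mathbf{Q}\mathbf{v}$. Let us denote these vectors by $\mathbf{q}_{i_1}, \dots, \mathbf{q}_{i_w}$, and let $q_i(k)$ and $e_j(k)$ denote the $k$-th elements of $\mathbf{q}_i$ and $\mathbf{e}_j$, respectively. Since the vectors are independent and their elements are also mutually independent, using binary summation in $\mathrm{GF}(2)$, we have
$$\Pr(E_{\mathbf{v},j}) = \prod_{k=1}^{K}\Pr\Big(\sum_{l=1}^{w}q_{i_l}(k) = e_j(k)\Big) = \Pr\Big(\sum_{l=1}^{w}q_{i_l}(j) = 1\Big)\prod_{k \ne j}\Pr\Big(\sum_{l=1}^{w}q_{i_l}(k) = 0\Big). \tag{14}$$
We can easily prove that $\Pr\big(\sum_{l=1}^{w}q_{i_l}(k) = 1\big) = 1/2$ by induction on $w$. The equation holds for the base case $w = 1$. Assume that it holds for $w - 1$. We have
$$\begin{aligned}\Pr\Big(\sum_{l=1}^{w}q_{i_l}(k) = 1\Big) &= \Pr\Big(\sum_{l=1}^{w-1}q_{i_l}(k) = 1\Big)\Pr\big(q_{i_w}(k) = 0\big) + \Pr\Big(\sum_{l=1}^{w-1}q_{i_l}(k) = 0\Big)\Pr\big(q_{i_w}(k) = 1\big)\\ &= \tfrac{1}{2}\cdot\tfrac{1}{2} + \tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{1}{2}.\end{aligned} \tag{15}$$
Similarly, $\Pr\big(\sum_{l=1}^{w}q_{i_l}(k) = 0\big) = 1/2$. Hence, equation (14) simplifies to $\Pr(E_{\mathbf{v},j}) = 2^{-K}$. ∎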
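Lemma 5 and the parity step (15) can be sanity-checked in Python. The sketch below (hypothetical helper names) enumerates all bit patterns to confirm that the $\mathrm{GF}(2)$ sum of $w$ fair bits equals 1 with probability exactly 1/2, and estimates $\Pr(\mathbf{Q}\mathbf{v} = \mathbf{e}_j^T)$ by sampling, to be compared with $2^{-K}$; the sampled estimate is only approximate.

```python
import random
from fractions import Fraction
from itertools import product


def parity_one_probability(w):
    """Exact P(b_1 + ... + b_w = 1 over GF(2)) for w independent fair bits."""
    patterns = list(product((0, 1), repeat=w))
    odd = sum(1 for bits in patterns if sum(bits) % 2 == 1)
    return Fraction(odd, len(patterns))


def estimate_event_probability(K, L, v, j, trials, rng):
    """Monte Carlo estimate of P(Q v = e_j^T) for a uniform K x L binary Q.

    Columns of Q are K-bit integers; bit l of the integer v selects column l.
    """
    target = 1 << j
    hits = 0
    for _ in range(trials):
        total = 0
        for l in range(L):
            column = rng.getrandbits(K)  # fresh uniform column q_l
            if (v >> l) & 1:
                total ^= column
        hits += total == target
    return hits / trials


print([parity_one_probability(w) for w in range(1, 6)])  # all equal to 1/2
rng = random.Random(3)
print(estimate_event_probability(K=4, L=6, v=0b101101, j=2, trials=20000, rng=rng), 2 ** -4)
```

The estimate does not depend on which non-zero $\mathbf{v}$ is chosen, which is exactly what makes the union bound in the next proof a simple counting argument.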
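Lemma 6.
The following inequalities hold for $0 < \alpha \le 1/2$:
$$\frac{1}{\sqrt{8L\alpha(1-\alpha)}}\,2^{L H(\alpha)} \le \sum_{w=0}^{\alpha L}\binom{L}{w} \le 2^{L H(\alpha)},$$
where $\alpha L$ is taken to be an integer (the upper bound also holds with the sum taken up to $\lfloor \alpha L \rfloor$), and $H(\cdot)$ denotes the binary entropy function, i.e., $H(\alpha) = -\alpha\log_2\alpha - (1-\alpha)\log_2(1-\alpha)$.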
Proof.
The proof can be found in the appendix of [40]. ∎
We are now ready to prove the following theorem, which shows that accessing a significant number of random queries in $\mathcal{Q}$ does not help in reconstructing any of the base vectors for large $K$.
Theorem 5.
Consider the set $\mathcal{Q}$ of $L = K + m$ statistically independent, uniformly random query vectors. For large enough values of $K$, with probability arbitrarily close to 1, none of the base vectors lies in the span of any subset $\mathcal{S} \subseteq \mathcal{Q}$ with cardinality at most $\alpha K$, where $\alpha < 1/2$.
Proof.
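Consider any base vector $\mathbf{e}_j$ and a non-zero vector $\mathbf{v} \in \mathrm{GF}(2)^L$. Computing $\mathbf{Q}\mathbf{v}$ in $\mathrm{GF}(2)$ is equivalent to adding the subset of columns of $\mathbf{Q}$ whose indices equal the indices of the non-zero elements of $\mathbf{v}$. If $\mathbf{Q}\mathbf{v} = \mathbf{e}_j^T$ for some $\mathbf{v}$, then any subset $\mathcal{S} \subseteq \mathcal{Q}$ that contains these columns also spans $\mathbf{e}_j$. In fact, the number of non-zero elements of $\mathbf{v}$, i.e., its Hamming weight $\mathrm{Ham}(\mathbf{v})$, equals the number of query vectors that must be added to reconstruct $\mathbf{e}_j$.
Consider all vectors $\mathbf{v}$ with Hamming weight less than or equal to $\alpha K$, where $\alpha < 1/2$. Lemma 5 shows that for any such non-zero $\mathbf{v}$, we have $\Pr(E_{\mathbf{v},j}) = 2^{-K}$. Therefore, the probability that some subset with cardinality at most $\alpha K$ spans a base vector can be bounded for large values of $K$ as
$$\begin{aligned}\Pr\Big(\bigcup_{j=1}^{K}\ \bigcup_{\mathbf{v}:\,1 \le \mathrm{Ham}(\mathbf{v}) \le \alpha K}E_{\mathbf{v},j}\Big) &\overset{(a)}{\le} \sum_{j=1}^{K}\ \sum_{\mathbf{v}:\,1 \le \mathrm{Ham}(\mathbf{v}) \le \alpha K}\Pr(E_{\mathbf{v},j})\\ &\overset{(b)}{=} K\,2^{-K}\sum_{w=1}^{\lfloor \alpha K \rfloor}\binom{L}{w}\\ &\overset{(c)}{\le} K\,2^{-K}\sum_{w=0}^{\lfloor \alpha K \rfloor}\binom{L}{w}\\ &\overset{(d)}{\le} K\,2^{-K}\sum_{w=0}^{\lfloor \alpha L \rfloor}\binom{L}{w}\\ &\overset{(e)}{\le} K\,2^{-K}\,2^{L H(\alpha)}\\ &\overset{(f)}{=} 2^{-\left[K\left(1 - H(\alpha)\right) - m H(\alpha) - \log_2 K\right]},\end{aligned}$$
where (a) follows from the union bound, (b) from Lemma 5 together with counting the vectors $\mathbf{v} \in \mathrm{GF}(2)^L$ of each Hamming weight $w$, (c) from adding the $w = 0$ term, (d) from $K \le L$, (e) from Lemma 6, and (f) from $L = K + m$. Inequality (e) is only valid for $\alpha \le 1/2$. Since $H(\alpha) < 1$ for $\alpha < 1/2$, the exponent in (f) grows linearly in $K$, so the probability that any base vector lies in the span of a subset of at most $\alpha K$ query vectors goes to zero as $K$ grows. ∎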
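The chain of bounds in the proof of Theorem 5 can be evaluated numerically to see how fast it decays. The Python sketch below computes the bound after step (b) and the entropy bound after step (e) for given $K$, $m$, and $\alpha$; the function names and the parameter values in the usage loop are illustrative assumptions, not the paper's simulation settings.

```python
import math


def binary_entropy(a):
    """H(a) = -a log2 a - (1 - a) log2(1 - a), with H(0) = H(1) = 0."""
    if a in (0.0, 1.0):
        return 0.0
    return -a * math.log2(a) - (1 - a) * math.log2(1 - a)


def counting_bound(K, m, alpha):
    """K 2^-K sum_{w=1}^{floor(alpha K)} C(L, w): the bound after step (b)."""
    L = K + m
    count = sum(math.comb(L, w) for w in range(1, math.floor(alpha * K) + 1))
    return K * count / 2 ** K


def entropy_bound(K, m, alpha):
    """K 2^-K 2^(L H(alpha)): the bound after step (e), valid for alpha <= 1/2."""
    L = K + m
    return K * 2.0 ** (L * binary_entropy(alpha) - K)


for K in (100, 200, 400, 800):
    print(K, counting_bound(K, 10, 0.3), entropy_bound(K, 10, 0.3))
```

Both bounds can exceed 1 for small $K$ (where they say nothing) but then decay exponentially, at rate roughly $1 - H(\alpha)$ per unit of $K$, once $K$ is large enough.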
Remark 3.
In practice, the user generates enough random vectors to span $\mathrm{GF}(2)^K$; hence it has a set $\mathcal{Q}$ of $L$ random vectors in total. It then chooses a subset $\tilde{\mathcal{Q}} = \{\tilde{\mathbf{q}}_1, \dots, \tilde{\mathbf{q}}_K\} \subseteq \mathcal{Q}$ of $K$ linearly independent vectors and uses them as its query vectors. This guarantees that the queries span $\mathrm{GF}(2)^K$ and that any base vector can be represented in terms of these independent query vectors as
$$\mathbf{e}_j^T = \sum_{l=1}^{K}\beta_{j,l}\,\tilde{\mathbf{q}}_l, \qquad \beta_{j,l} \in \{0, 1\}. \tag{16}$$
The following lemma shows that the average number of random vectors required to span $\mathrm{GF}(2)^K$ is very close to $K$, so in practice only a few random queries more than $K$ are needed.
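The practical procedure of Remark 3 (draw random vectors until $K$ of them are linearly independent, keep those as $\tilde{\mathcal{Q}}$, and expand each $\mathbf{e}_j^T$ as in (16)) can be sketched in Python. Vectors of $\mathrm{GF}(2)^K$ are packed into $K$-bit integers, and the helper names are hypothetical; the coefficient mask returned by `expand` plays the role of $(\beta_{j,1}, \dots, \beta_{j,K})$.

```python
import random


def reduce_vec(basis, vec, mask=0):
    """Cancel leading bits of vec using basis {pivot: (vector, mask)} over GF(2)."""
    while vec:
        p = vec.bit_length() - 1
        if p not in basis:
            break
        vec ^= basis[p][0]
        mask ^= basis[p][1]
    return vec, mask


def draw_independent_queries(K, rng):
    """Draw uniform vectors until K independent ones are kept (Remark 3)."""
    basis, chosen, drawn = {}, [], 0
    while len(chosen) < K:
        q = rng.getrandbits(K)
        drawn += 1
        residual, mask = reduce_vec(basis, q, 1 << len(chosen))
        if residual:  # q is independent of the queries kept so far
            basis[residual.bit_length() - 1] = (residual, mask)
            chosen.append(q)
    return chosen, drawn


def expand(queries, j):
    """Bitmask beta such that the XOR of queries[l] over bits l of beta is e_j."""
    basis = {}
    for idx, q in enumerate(queries):
        residual, mask = reduce_vec(basis, q, 1 << idx)
        if residual == 0:
            raise ValueError("query vectors must be linearly independent")
        basis[residual.bit_length() - 1] = (residual, mask)
    residual, beta = reduce_vec(basis, 1 << j)
    if residual:
        raise ValueError("queries do not span GF(2)^K")
    return beta


rng = random.Random(42)
K = 16
queries, drawn = draw_independent_queries(K, rng)
beta = expand(queries, 3)
check = 0
for l, q in enumerate(queries):
    if (beta >> l) & 1:
        check ^= q
print(drawn, check == 1 << 3)
```

The number of draws `drawn` is a sample of $X$ from Definition 2, so its average over many runs can be compared with the $K + \gamma_K$ of Lemma 7.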
Lemma 7.
If $\mathbf{q}$ is a random vector in $\mathrm{GF}(2)^K$ whose elements are uniformly distributed, the average minimum number of such vectors needed to span $\mathrm{GF}(2)^K$ equals
$$\mathbb{E}[X] = K + \gamma_K, \tag{17}$$
where $\gamma_K$ asymptotically approaches the Erdős–Borwein constant ($\approx 1.6067$).
Proof.
The proof can be found in [6]. ∎
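Lemma 7 can be cross-checked with a short computation. Each new uniform vector increases the rank from $r$ to $r+1$ with probability $1 - 2^{r-K}$, so $\mathbb{E}[X] = \sum_{i=1}^{K}(1 - 2^{-i})^{-1} = K + \sum_{i=1}^{K}(2^i - 1)^{-1}$. The Python sketch below (hypothetical helper names) evaluates this sum, the Erdős–Borwein constant it approaches, and a Monte Carlo estimate.

```python
import random


def expected_vectors_to_span(K):
    """E[X] = sum_{i=1}^{K} 1 / (1 - 2^-i), a sum of geometric waiting times."""
    return sum(1.0 / (1.0 - 2.0 ** (-i)) for i in range(1, K + 1))


def erdos_borwein(terms=200):
    """Erdős–Borwein constant sum_{i>=1} 1 / (2^i - 1), truncated."""
    return sum(1.0 / (2.0 ** i - 1.0) for i in range(1, terms + 1))


def sample_vectors_to_span(K, rng):
    """Number of uniform random vectors drawn until they span GF(2)^K."""
    basis, rank, count = {}, 0, 0
    while rank < K:
        v = rng.getrandbits(K)
        count += 1
        while v:
            p = v.bit_length() - 1
            if p not in basis:
                basis[p] = v
                rank += 1
                break
            v ^= basis[p]
    return count


K = 20
rng = random.Random(7)
samples = [sample_vectors_to_span(K, rng) for _ in range(2000)]
print(expected_vectors_to_span(K) - K, erdos_borwein(), sum(samples) / len(samples) - K)
```

The overhead $\gamma_K$ is already essentially at its limit for moderate $K$, which is why only one or two extra queries are needed on average.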
Remark 4.
Since $\tilde{\mathcal{Q}} \subseteq \mathcal{Q}$, if a vector does not lie in the span of any subset $\mathcal{S} \subseteq \mathcal{Q}$ with $|\mathcal{S}| \le \alpha K$, it will not lie in the span of any such subset of $\tilde{\mathcal{Q}}$ either. So Theorem 5 remains valid for this choice of random queries. This means that, in practice, every base vector is guaranteed to lie in the span of the query vectors in $\tilde{\mathcal{Q}}$, but with probability close to one, none of the base vectors lies in the span of any subset $\mathcal{S} \subseteq \tilde{\mathcal{Q}}$ with $|\mathcal{S}| \le \alpha K$ for $\alpha < 1/2$.
V-B Responding to Queries
In this section, we assume that the user has chosen the $K$ linearly independent random query vectors in $\tilde{\mathcal{Q}}$ and wants to download the content $W_j$. Since $\tilde{\mathcal{Q}}$ spans $\mathrm{GF}(2)^K$, the user can expand the base vector $\mathbf{e}_j$ in terms of the query vectors in $\tilde{\mathcal{Q}}$ as in (16). Hence, the requested content can be expanded in terms of the query vectors as
$$W_j = \mathbf{e}_j\mathbf{W} = \sum_{l=1}^{K}\beta_{j,l}\,\tilde{\mathbf{q}}_l^T\mathbf{W}, \tag{18}$$
where each $\beta_{j,l}$ is either 0 or 1. Based on equation (18), the user requests some parts of the desired content from each RG so that none of the RGs can learn any information about the requested content.
To accomplish PIR, the user partitions the set of random queries whose decoding gains $\beta_{j,l}$ are non-zero into $r$ disjoint subsets $\mathcal{Q}_1, \dots, \mathcal{Q}_r$. The number of subsets $r$ depends on the number of colluding servers. Each subset of queries is then sent to a different RG, as depicted in Figure 1 (without loss of generality, $\mathcal{Q}_g$ is sent to $R_g$). Therefore, the requested content can be retrieved as
$$W_j = \sum_{g=1}^{r}\ \sum_{\mathbf{q} \in \mathcal{Q}_g}\mathbf{q}^T\mathbf{W}. \tag{19}$$
The ultimate goal in PIR is to prevent any colluding group of servers from gaining information about the requested content index. Assume that the number of colluding servers is $t$. If two colluding servers lie within the same RG, they receive the same subset of queries from the user. Therefore, without loss of generality, we consider the worst-case scenario in which all the colluding servers lie within different RGs and are able to collaboratively obtain all of their queries, denoted by $\mathcal{Q}_c$. Based on Theorem 5, if the number of query vectors in $\mathcal{Q}_c$ is at most $\alpha K$ for some $\alpha < 1/2$, then no information can be obtained about the requested content index. This gives the technique a significant PIR capability.
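Notice that since the RGs have full rank encoding matrices, they can respond to any query that they receive. Assume that RG $R_g$ with the full rank encoding matrix $\mathbf{A}^{(g)}$ receives the set of queries $\mathcal{Q}_g$. This RG needs to send $\sum_{\mathbf{q} \in \mathcal{Q}_g}\mathbf{q}^T\mathbf{W}$ to the user. It can solve the linear equation
$$\mathbf{y}^{(g)}\mathbf{A}^{(g)} = \sum_{\mathbf{q} \in \mathcal{Q}_g}\mathbf{q}^T \tag{20}$$
in $\mathrm{GF}(2)$ for $\mathbf{y}^{(g)}$ as
$$\mathbf{y}^{(g)} = \Big(\sum_{\mathbf{q} \in \mathcal{Q}_g}\mathbf{q}^T\Big)\big(\mathbf{A}^{(g)}\big)^{-L}, \tag{21}$$
where $(\mathbf{A}^{(g)})^{-L}$ is a left inverse of $\mathbf{A}^{(g)}$ over $\mathrm{GF}(2)$.
As before, the coordinating server $s_c^{(g)}$ of RG $R_g$, which has already acquired all the information in matrix $\mathbf{A}^{(g)}$, computes the overall query decoding solution $\mathbf{y}^{(g)}$, which is a vector of size $1 \times n_g M$. If this vector is divided into $n_g$ equal-size pieces as $\mathbf{y}^{(g)} = [\mathbf{y}^{(g)}_1, \dots, \mathbf{y}^{(g)}_{n_g}]$, then $s_c^{(g)}$ sends the $i$-th portion $\mathbf{y}^{(g)}_i$ to server $s_i$ in $R_g$. More precisely, server $s_i$ receives a query response vector $\mathbf{y}^{(g)}_i$ of size $1 \times M$ from $s_c^{(g)}$ for each $i \in \{1, \dots, n_g\}$. Server $s_i$ then sends $\mathbf{y}^{(g)}_i\mathbf{c}_i$ back to $s_c^{(g)}$, which aggregates all the data received from the servers in $R_g$ to construct
$$\sum_{i=1}^{n_g}\mathbf{y}^{(g)}_i\mathbf{c}_i = \mathbf{y}^{(g)}\mathbf{A}^{(g)}\mathbf{W} = \sum_{\mathbf{q} \in \mathcal{Q}_g}\mathbf{q}^T\mathbf{W}. \tag{22}$$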
The coordinating server of the RG then transmits $\sum_{\mathbf{q} \in \mathcal{Q}_g}\mathbf{q}^T\mathbf{W}$ to the user.
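The query-response protocol of (18) to (22) can be simulated end to end. The Python sketch below is a toy model under stated assumptions: contents are small integers, vectors of $\mathrm{GF}(2)^K$ are packed into $K$-bit integers, every server keeps $M$ linearly independent rows, each RG is re-drawn until its stacked matrix has rank $K$, and the active queries are split round-robin among $r$ RGs. All names (`make_rg`, `rg_respond`, `pir_download`) are hypothetical.

```python
import random


def reduce_vec(basis, vec, mask=0):
    """Cancel leading bits of vec using basis {pivot: (vector, mask)} over GF(2)."""
    while vec:
        p = vec.bit_length() - 1
        if p not in basis:
            break
        vec ^= basis[p][0]
        mask ^= basis[p][1]
    return vec, mask


def solve(vectors, target):
    """Bitmask y with XOR of vectors[i] (bit i of y set) == target, else None."""
    basis = {}
    for idx, v in enumerate(vectors):
        residual, mask = reduce_vec(basis, v, 1 << idx)
        if residual:
            basis[residual.bit_length() - 1] = (residual, mask)
    residual, y = reduce_vec(basis, target)
    return None if residual else y


def xor_selected(values, mask):
    """XOR of values[i] over the bits i set in mask (a GF(2) inner product)."""
    acc = 0
    for i, value in enumerate(values):
        if (mask >> i) & 1:
            acc ^= value
    return acc


def make_rg(n_servers, M, contents, rng):
    """Toy RG: each server keeps M independent rows; redraw until the RG has rank K."""
    K = len(contents)
    while True:
        server_rows = []
        for _ in range(n_servers):
            rows = []
            while len(rows) < M:
                a = rng.getrandbits(K)
                if a and solve(rows, a) is None:  # discard rows already in the span
                    rows.append(a)
            server_rows.append(rows)
        stacked = [a for rows in server_rows for a in rows]
        if all(solve(stacked, 1 << j) is not None for j in range(K)):
            coded = [[xor_selected(contents, a) for a in rows] for rows in server_rows]
            return server_rows, coded


def rg_respond(server_rows, coded, group):
    """Coordinator solves y A = sum of the group's queries; server s_i returns y_i c_i."""
    target = 0
    for q in group:
        target ^= q
    y = solve([a for rows in server_rows for a in rows], target)
    response, offset = 0, 0
    for rows, c_i in zip(server_rows, coded):
        y_i = (y >> offset) & ((1 << len(rows)) - 1)  # local vector for server s_i
        response ^= xor_selected(c_i, y_i)             # partial response y_i c_i
        offset += len(rows)
    return response  # equals (sum of q in group)^T W, as in (22)


def pir_download(j, K, rgs, r, rng):
    """User: K independent random queries, expand e_j, split active ones among r RGs."""
    if len(rgs) < r:
        raise ValueError("need at least r reconstruction groups")
    queries = []
    while len(queries) < K:
        q = rng.getrandbits(K)
        if q and solve(queries, q) is None:
            queries.append(q)
    beta = solve(queries, 1 << j)  # coefficients beta_{j,l}, as in (16)
    active = [q for l, q in enumerate(queries) if (beta >> l) & 1]
    groups = [active[g::r] for g in range(r)]  # r disjoint query subsets
    answer = 0
    for (rows, coded), group in zip(rgs, groups):
        answer ^= rg_respond(rows, coded, group)  # XOR of RG responses, as in (19)
    return answer


rng = random.Random(2017)
K = 8
contents = [rng.getrandbits(32) for _ in range(K)]
rgs = [make_rg(3, 4, contents, rng) for _ in range(3)]
print(all(pir_download(j, K, rgs, 3, rng) == contents[j] for j in range(K)))  # True
```

Each RG only ever sees the XOR of its own query subset, and each returns a single $F$-bit value, which matches the download cost discussed in the next subsection.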
Each RG transmits only one encoded file to the user. However, all the servers within an RG need to collaborate with each other before responding to the queries sent by the user. Notice that communication between the servers is carried over high-bandwidth fiber optic links, while transmissions from the servers to the user take place over low-bandwidth links. In computing the communication cost of achieving PIR, we only consider the communication between the servers and the user over the low-bandwidth links.
Remark 5.
It is worth mentioning that in this approach, coordination between the servers of an RG is necessary because the servers do not have enough storage capacity to store all the contents. In fact, if we assume that each server has high storage capacity, similar to [7], then each server can act as an RG by itself and no communication between servers is needed.
V-C Trade-off Between Communication Cost and Privacy Level
To achieve PIR, each user needs to download more information. This additional bandwidth utilization is referred to as the communication Price of Privacy (cPoP) [7], which is defined as follows. Note that the cost of sending the queries is ignored, because the contents are assumed to be significantly larger than the queries.
Definition 3.
The communication Price of Privacy (cPoP) is the ratio of the total number of bits downloaded by the user from the servers to the size of the requested file.
To explain the trade-off between communication cost and level of privacy, assume that the user divides the queries into $r$ equal-size groups of about $K/r$ queries and sends each group to a different RG. Each RG then responds to at most $\lceil K/r \rceil$ queries. If $t$ RGs collude to gain information about the requested content index, they have access to at most $t\lceil K/r \rceil$ queries. We proved that knowing $\alpha K$ queries asymptotically gives no information about the requested content index if $\alpha < 1/2$. Hence, if $t/r < 1/2$, the colluding RGs obtain no information about the requested content index; in other words, if fewer than half of the RGs collude, they cannot gain any information. We can increase $r$ to get the maximum possible level of privacy. However, the downside of increasing $r$ is that the cPoP also increases.
As discussed earlier, if the queries are sent to $r$ RGs, then responses from all $r$ RGs are required to retrieve a content. Since each RG transmits an encoded file of size $F$ bits to the user, the total number of bits downloaded by the user is $rF$, and therefore the cPoP equals $r$.
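The trade-off can be tabulated with a few lines of Python. The sketch below uses the asymptotic criterion derived above (colluders learn nothing when they see fewer than $K/2$ queries, following Theorem 5) together with cPoP $= r$; the helper names and the example value $K = 1000$ are assumptions for illustration.

```python
import math


def colluders_learn_nothing(K, r, t):
    """Asymptotic criterion: t colluding RGs see at most t*ceil(K/r) < K/2 queries."""
    return t * math.ceil(K / r) < K / 2


def max_tolerated_colluders(K, r):
    """Largest t (possibly 0) satisfying the criterion above."""
    t = 0
    while t + 1 <= r and colluders_learn_nothing(K, r, t + 1):
        t += 1
    return t


K = 1000
for r in (2, 4, 10, 20):
    print(f"r={r}: cPoP={r}, tolerated colluding RGs={max_tolerated_colluders(K, r)}")
```

Doubling $r$ roughly doubles the number of tolerated colluding RGs while also doubling the cPoP, which is the trade-off described above.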
V-D Full Size Servers
Assume that the servers have large storage capability, such that each RG is composed of only one server. Our full rank encoding scheme guarantees that a server able to store $K$ encoded files can be used to retrieve any desired content. In [7], the authors studied the use of MDS codes for PIR. They considered full size storage systems with MDS codes and the case in which only one of the databases is compromised. They proposed a PIR technique that achieves a cPoP of $\frac{1}{1 - R_c}$ in full size databases, where $R_c$ is the MDS code rate. To compare our results with [7], notice that if we assume that there is only one malicious server in the cloud, then we can choose any two servers and send half of the queries to each of them. This yields a cPoP of 2, which is better than the results in [7] for $R_c > 1/2$.
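The comparison with [7] reduces to simple arithmetic. The Python sketch below (hypothetical helper names, exact rational arithmetic) evaluates $1/(1 - R_c)$ for a few illustrative code rates against the cPoP of 2 obtained by splitting the queries between two single-server RGs.

```python
from fractions import Fraction


def cpop_mds(rate):
    """cPoP 1 / (1 - R_c) of the MDS-based scheme in [7], for code rate R_c < 1."""
    return 1 / (1 - Fraction(rate))


CPOP_TWO_SERVERS = 2  # half of the queries to each of two full-size servers

for rate in (Fraction(1, 3), Fraction(1, 2), Fraction(2, 3), Fraction(4, 5)):
    mds = cpop_mds(rate)
    print(rate, mds, "two-server scheme better" if CPOP_TWO_SERVERS < mds else "not better")
```

The two costs coincide at $R_c = 1/2$, and the advantage of the two-server split grows quickly as the MDS code rate approaches 1.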
VI Simulation
To numerically verify the results proved in Section V, we created $K$ linearly independent random query vectors, which are used to expand the base vectors. Figure 2 shows, for the simulated values of $K$, the probability of the event that at least one of the base vectors lies in the span of $w$ of these vectors. Consistent with our results in Section V, this probability quickly goes to zero.
It has been proved [41] that the problem of finding the minimum spanning set of vectors is NP-complete, and it has even been proved [42] that this problem is NP-hard to approximate. Therefore, in general, it is NP-hard to determine whether a given base vector lies in the span of at most $w$ vectors out of the $K$ vectors. For our simulations, we used a brute-force approach to check whether a given base vector lies in the span of at most $w$ vectors out of the $K$ random query vectors.
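A brute-force check of this kind can be sketched in Python as follows. It enumerates every subset of at most $w$ query vectors (packed into $K$-bit integers) and tests whether its $\mathrm{GF}(2)$ sum is a base vector; the cost grows as $\sum_{i \le w}\binom{K}{i}$, which limits such checks to small $K$. The helper names and parameter values are hypothetical, and this is not necessarily the authors' simulation code.

```python
import random
from itertools import combinations


def base_in_small_span(queries, K, w):
    """True if some subset of at most w queries sums (XOR) to a base vector e_j."""
    bases = {1 << j for j in range(K)}
    for size in range(1, w + 1):
        for subset in combinations(queries, size):
            acc = 0
            for q in subset:
                acc ^= q
            if acc in bases:
                return True
    return False


def independent_queries(K, rng):
    """K linearly independent uniform random vectors of GF(2)^K (K-bit ints)."""
    basis, chosen = {}, []
    while len(chosen) < K:
        q = original = rng.getrandbits(K)
        while q:
            p = q.bit_length() - 1
            if p not in basis:
                basis[p] = q
                chosen.append(original)
                break
            q ^= basis[p]
    return chosen


def estimate_probability(K, w, trials, rng):
    """Fraction of random query sets in which some base lies in a <= w span."""
    hits = sum(base_in_small_span(independent_queries(K, rng), K, w) for _ in range(trials))
    return hits / trials


rng = random.Random(0)
for w in (1, 2, 3):
    print(w, estimate_probability(12, w, 200, rng))
```

Checking every subset of size up to $w$ is sufficient, because a base vector lies in the span of some $w$-subset exactly when it equals the XOR of one of that subset's sub-subsets.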
VII Conclusions
In this paper, we have studied the problems of security and private information retrieval in distributed storage systems that use a full rank encoding scheme based on Random Linear Fountain (RLF) codes. We have proposed an approach based on uniform random queries to achieve the information theoretic PIR property. We have proved that our proposed technique can asymptotically achieve perfect secrecy for a distributed storage system. Our proposed solution is robust against a significant number of colluding servers in the network. We have also shown that, in terms of PIR cost, our technique can outperform MDS codes for storage systems in certain regimes.
- [1] Mohammad-Parsa Hosseini, Hamid Soltanian-Zadeh, Kost Elisevich, and Dario Pompili. Cloud-based deep learning of big EEG data for epileptic seizure prediction. In 2016 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pages 1151–1155. IEEE, 2016.
- [2] Claude E. Shannon. Communication theory of secrecy systems. Bell System Technical Journal, 28(4):656–715, 1949.
- [3] David J. C. MacKay. Fountain codes. IEE Proceedings-Communications, 152(6):1062–1068, 2005.
- [4] Mohsen Karimzadeh Kiskani and Hamid R. Sadjadpour. Capacity of cellular networks with femtocache. In IEEE Conference on Computer Communications Workshops, INFOCOM Workshops 2016, San Francisco, CA, USA, April 10-14, 2016, pages 9–14, 2016.
- [5] Mohsen Karimzadeh Kiskani and Hamid R. Sadjadpour. Secure coded caching in wireless ad-hoc networks. In International Conference on Computing, Networking and Communications (ICNC), January 2017.
- [6] Mohsen Karimzadeh Kiskani and Hamid R. Sadjadpour. Throughput analysis of decentralized coded content caching in cellular networks. IEEE Trans. Wireless Communications, 16(1):663–672, 2017.
- [7] Razan Tajeddine and Salim El Rouayheb. Private information retrieval from MDS coded data in distributed storage systems. arXiv preprint arXiv:1602.01458, 2016.
- [8] Sajad Hataminia, Saeed Vahidian, Mohammadali Mohammadi, and Mahmoud Ahmadian-Attari. Performance analysis of two-way decode-and-forward relaying in the presence of co-channel interferences. IET Commun., 8(18):3349–3356, Dec. 2014.
