Cumulus: Blockchain-Enabled Privacy Preserving Data Audit in Cloud
Prabal Banerjee, Nishant Nikam, Subhra Mazumdar, Sushmita Ruj

TL;DR
This paper introduces Cumulus, a blockchain-based system for privacy-preserving data audits in cloud storage that handles malicious and colluding parties, using smart contracts and state channels for practical implementation.
Contribution
It presents a novel blockchain-enabled protocol for secure data auditing that accounts for adversarial behavior and collusion, improving upon existing PoR schemes.
Findings
Prototype implementation on Ethereum demonstrates practical performance.
The protocol effectively handles malicious and colluding parties.
Blockchain smart contracts facilitate dispute resolution and payments.
Abstract
Data owners upload large files to cloud storage servers, but malicious servers may tamper with the data. To check the integrity of remote data, Proof of Retrievability (PoR) schemes were introduced. Existing PoR protocols assume that data owners and third-party auditors are honest and audit only the potentially malicious cloud server to check the integrity of stored data. In this paper we consider a system where any party may attempt to cheat others, and we consider collusion cases. We design a protocol that is secure under such adversarial assumptions and use blockchain smart contracts to act as mediator in case of dispute and payment settlement. We use state channels to reduce blockchain interactions in order to build a practical audit solution. We implement and evaluate a prototype using Ethereum as the blockchain platform and show that our scheme has comparable performance.
| #Queries | Gas Cost | Gas Cost ($) | Block Overhead | Auditing Time | BC Time |
|---|---|---|---|---|---|
| 1 | 110087 | | 1394 B | 72.808527 ms | 6.220183447 s |
| 3 | 115220 | | 1458 B | 65.074692 ms | 6.215553521 s |
| 5 | 120486 | | 1554 B | 69.638309 ms | 7.229788269 s |
| 7 | 125619 | | 1618 B | 67.183777 ms | 5.818474706 s |
| 10 | 133451 | | 1746 B | 69.449573 ms | 6.219303512 s |
Taxonomy
Topics: Blockchain Technology Applications and Security · Cryptography and Data Security · Cloud Data Security Solutions
Prabal Banerjee (Avail and Indian Statistical Institute, Kolkata, West Bengal, India), Nishant Nikam, Subhra Mazumdar (TU Wien and Christian Doppler Laboratory Blockchain Technologies for the IoT, Vienna, Austria), and Sushmita Ruj (University of New South Wales, Sydney, Australia)
Abstract.
Data owners upload large files to cloud storage servers, but malicious servers may tamper with the data. To check the integrity of remote data, Proof of Retrievability (PoR) schemes were introduced. Existing PoR protocols assume that data owners and third-party auditors are honest and audit only the potentially malicious cloud server to check the integrity of stored data. In this paper we consider a system where any party may attempt to cheat others, and we consider collusion cases. We design a protocol, Cumulus (see footnote 1), that is secure under such adversarial assumptions and use blockchain smart contracts to act as mediator in case of dispute and payment settlement. We use state channels to reduce blockchain interactions in order to build a practical audit solution. The security of the protocol has been proven in the Universal Composability (UC) framework. Finally, we illustrate several applications of our basic protocol and evaluate practicality of our approach via a prototype implementation for fairly selling large files over the cryptocurrency Ethereum. We implement and evaluate a prototype using Ethereum as the blockchain platform and show that our scheme has comparable performance.
Footnote 1: Cloud storage has no relation to clouds in the sky! But since this type of cloud indicates fair weather, often popping up on bright sunny days, we chose this name to illustrate that our protocol would make the future of cloud storage brighter.
Keywords: PoR, Cloud, Storage, Audit, Blockchain, DLT, Privacy
CCS Concepts: Security and privacy → Privacy-preserving protocols
1. Introduction
Recent years have seen an enormous amount of data generated, with people using multiple devices connected to the Internet. To cater to the need for accessing data across devices, cloud storage providers have emerged. Similarly, there has been an increase in Anything-as-a-Service (XaaS) offerings, which need data to be uploaded and stored on remote servers. To ensure integrity of uploaded data, Proof-of-Storage algorithms have been proposed. With publicly verifiable Proof-of-Retrievability schemes, data owners can outsource this auditing task to third-party auditors who send challenges to storage servers. The storage server computes a response for each challenge and returns it. By validating the challenge-response pair, the auditor ensures that the stored file is intact and retrievable.
The inherent assumption in the existing PoR schemes is that the data owner is always honest. The server may be malicious and try to erase data in an attempt to reduce its cost. The PoR schemes guarantee that if the server responds correctly to the challenge set, then the file is recoverable with overwhelming probability. The auditor is trusted with checking the integrity of the stored data without gaining access to the data itself, but previous studies showed that a malicious auditor might gain access by carefully selecting the challenge set it sends to the server. Also, the data owner may act maliciously and refuse payment to the server and auditor in a bid to cut costs.
Blockchain, a distributed tamper-resistant ledger, has seen several use cases lately. With smart contract support, arbitrary logic can be enforced in a distributed manner, even in the presence of some malicious players. These abilities are used to make blockchain a trusted party to resolve disputes. Several works have used blockchain as a judge to settle disputes among involved parties (Bentov and Kumaresan, 2014; Kumaresan and Bentov, 2014; Dziembowski et al., 2018). Most major blockchain platforms have an inherent currency which is used to make meaningful monetary contracts among participants. Our objective in this paper is to address the problems discussed above without any trust assumptions. The proposed solution must be efficient, or at least comparable to the state of the art.
1.1. Our Contribution
In this paper we propose a blockchain-based proof-of-storage model and prove its security even if the data owner is corrupted. We study the collusion cases and argue that our system is secure unless the data owner, auditor and cloud server all collude and act maliciously together. We implement a prototype using a modified version of Ethereum. We use the inherent currency of the blockchain platform to model and enforce our incentive structure. To minimize blockchain interactions, we use state channels and perform off-chain transactions. Our experiments show that our improved system has comparable performance. We summarize the contributions as follows:
- We propose a blockchain-based data audit model, Cumulus, where the auditor can be a third party.
- We design a state-channel-based audit protocol to minimize blockchain commits.
- We use blockchain-based payment to incentivize players in the system.
- We prove the security of our protocol in the UC framework and show that the scheme is resilient to any form of collusion between participating entities.
- We implement a prototype on modified Ethereum and show comparable performance with an overhead of around 6 seconds per audit phase.
2. Preliminaries and Background
2.1. Notations
We take $\lambda$ to be the security parameter. An algorithm $\mathcal{A}(1^{\lambda})$ is a probabilistic polynomial-time algorithm when its running time is polynomial in $\lambda$ and its output $y$ is a random variable which depends on the internal coin tosses of $\mathcal{A}$. A function $f:\mathbb{N}\rightarrow\mathbb{R}$ is called negligible in $\lambda$ if for all positive integers $c$ and for all sufficiently large $\lambda$, we have $f(\lambda)<\frac{1}{\lambda^{c}}$. An element $a$ chosen uniformly at random from set $S$ is denoted as $a\xleftarrow{R}S$. We use a secure digital signature algorithm (Gen, Sign, SigVerify), where Gen() is the key generation algorithm, Sign() is the signing algorithm and SigVerify() is the signature verification algorithm. We use a collision-resistant cryptographic hash function.
2.2. Bilinear Pairings
Definition: Let $G_{1}, G_{2}$ be two additive cyclic groups of prime order $p$, and $G_{T}$ another cyclic group of order $p$ written multiplicatively. A pairing is a map $e: G_{1}\times G_{2}\rightarrow G_{T}$, which satisfies the following properties:
- Bilinearity: $\forall a,b\in\mathbb{Z}_{p}^{*},\ \forall P\in G_{1},\ \forall Q\in G_{2}:\ e(aP,bQ)=e(P,Q)^{ab}$, where $\mathbb{Z}_{p}^{*}=\{1,\ldots,p-1\}$ with group operation of multiplication modulo $p$.
- Non-Degeneracy: $e(g_{1},g_{2})\neq 1_{G_{T}}$, where $g_{1}$ and $g_{2}$ are generators of $G_{1}$ and $G_{2}$ respectively and $1_{G_{T}}$ is the identity element in $G_{T}$.
- Computability: There exists an efficient algorithm to compute $e$.
A pairing is called symmetric if $G_{1}=G_{2}$. When we use symmetric bilinear pairings, we refer to a map of the form $e: G\times G\rightarrow G_{T}$ with group $G$'s support being $\mathbb{Z}_{p}$ (Galbraith et al., 2008).
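The three pairing properties can be checked mechanically on a toy instance. The sketch below is an illustration only and is not taken from the paper: it assumes a deliberately insecure symmetric "pairing" on the additive group Z_11, mapping into the order-11 subgroup of Z_23^*. The names `Q`, `P_MOD`, `GT_GEN` and `pairing` are hypothetical.

```python
# Toy symmetric pairing used only to check the algebraic properties.
# It is NOT secure: discrete logarithms in Z_11 are trivial.
Q = 11        # prime order of the groups (the "p" of the definition)
P_MOD = 23    # prime modulus with Q dividing P_MOD - 1
GT_GEN = 2    # element of order Q in Z_23^* (2**11 % 23 == 1)


def pairing(a: int, b: int) -> int:
    """Map (a, b) in Z_Q x Z_Q to GT_GEN**(a*b) in the order-Q subgroup of Z_23^*."""
    return pow(GT_GEN, (a * b) % Q, P_MOD)


def is_bilinear() -> bool:
    """Exhaustively check e(aP, bQ) == e(P, Q)**(ab) over the whole toy group."""
    for a in range(1, Q):
        for b in range(1, Q):
            for p_elem in range(Q):
                for q_elem in range(Q):
                    lhs = pairing(a * p_elem % Q, b * q_elem % Q)
                    rhs = pow(pairing(p_elem, q_elem), a * b, P_MOD)
                    if lhs != rhs:
                        return False
    return True


def is_non_degenerate() -> bool:
    """The generator 1 of Z_Q must not pair to the identity of the target group."""
    return pairing(1, 1) != 1
```

A real deployment would use a pairing-friendly elliptic curve; the toy map only makes the algebra easy to inspect.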
2.3. Proofs of Retrievability
PoR schemes are used to guarantee to a client that her uploaded data stored with the server has not been tampered with. They were first introduced by Juels and Kaliski (Juels and Kaliski Jr, 2007). In the setup phase, the client encodes the file with erasure codes to obtain a preprocessed file, each block of which is an element of $\mathbb{Z}_{p}$. It computes an authenticator tag for each block of the preprocessed file and uploads the preprocessed file to the server along with the authenticators. During the audit phase, the client sends random challenges to the server, which acts as the prover and responds with a proof. The client verifies the proof, and the server passes the audit if the verification goes through.
The correctness of the PoR algorithm ensures that an honest server always passes an audit, i.e., the verification of the challenge-response pair succeeds. The soundness property ensures that the file can be retrieved from a server which passes the audits with non-negligible probability.
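To give a feel for why a small random challenge set is enough, the following sketch computes the chance that a random challenge touches at least one corrupted block. This is a standard counting argument added for illustration, not a result from the paper. The function name and parameters are hypothetical, and it assumes the challenged indices are a uniformly random subset.

```python
from fractions import Fraction
from math import comb


def detection_probability(n: int, corrupted: int, challenged: int) -> Fraction:
    """Probability that a uniformly random challenge set hits a corrupted block."""
    if not (0 <= corrupted <= n and 0 <= challenged <= n):
        raise ValueError("need 0 <= corrupted, challenged <= n")
    # Probability that every challenged block falls among the intact ones.
    miss = Fraction(comb(n - corrupted, challenged), comb(n, challenged))
    return 1 - miss
```

For example, with 10,000 blocks of which 1% are corrupted, challenging 460 blocks already yields a detection probability above 99% under this model.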
There are mainly two types of PoR schemes: privately verifiable and publicly verifiable. In privately verifiable schemes, the client herself audits the server, or the auditor knows a secret about the data. In publicly verifiable PoR schemes, any third-party auditor can generate challenges and verify responses by knowing the public parameters of the client. A PoR scheme is called privacy preserving if the responses to a challenge do not reveal any knowledge about the data.
2.4. File Processing and Query Generation
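File: A file $F$ is broken into $n$ chunks, where each chunk is one element of $\mathbb{Z}_{p}$. Let the file be $b$ bits long. We refer to each file chunk as $m_{i}$ where $1\leq i\leq n$ and $n=\lceil b/\lg p\rceil$. We use chunk and block interchangeably to refer to each file chunk.
Query: Let $Q=\{(i,\nu_{i})\}$ be an $l$-element set, where $l$ is a system parameter, $1\leq i\leq n$ and $\nu_{i}\in\mathbb{Z}_{p}$. The verifier chooses an $l$-element subset $I$ of $[1,n]$, uniformly at random. For each $i\in I$, $\nu_{i}\xleftarrow{R}\mathbb{Z}_{p}$.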
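The query construction described above can be sketched as follows. This is a minimal illustration, assuming that a query is a list of (index, coefficient) pairs. The function name `gen_query` and the parameter names `n` (number of chunks), `l` (query size) and `p` (group order) are hypothetical labels.

```python
import secrets


def gen_query(n: int, l: int, p: int) -> list[tuple[int, int]]:
    """Return l pairs (i, nu_i): distinct indices in [1, n], coefficients in Z_p."""
    if not 1 <= l <= n:
        raise ValueError("need 1 <= l <= n")
    rng = secrets.SystemRandom()
    indices = rng.sample(range(1, n + 1), l)  # uniformly random l-element subset
    return [(i, secrets.randbelow(p)) for i in sorted(indices)]
```

Sorting the indices is a presentational choice only; the verifier and prover just need to agree on the same set of pairs.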
2.5. Shacham Waters Public Verifiability Scheme
We use the Shacham and Waters Compact Proofs of Retrievability scheme with public verifiability (Shacham and Waters, 2008), which uses symmetric bilinear pairings. Let a user have a key pair $(sk,pk)$ where $sk=x\in\mathbb{Z}_{p}$ and $pk=v=g^{x}\in G$. Let $u\in G$ be a generator. For file block $m_{i}$, authentication tag $\sigma_{i}=[H(i)\,u^{m_{i}}]^{x}$. The prover receives the query $Q=\{(i,\nu_{i})\}$ and sends back $\sigma=\prod_{(i,\nu_{i})\in Q}\sigma_{i}^{\nu_{i}}$ and $\mu=\sum_{(i,\nu_{i})\in Q}\nu_{i}m_{i}$. The verification equation is:
$$e(\sigma,g)=e\Big(\prod_{(i,\nu_{i})\in Q}H(i)^{\nu_{i}}\cdot u^{\mu},\ v\Big)$$
The scheme is publicly verifiable: the private key is required only to generate authentication tags, whereas the public key suffices to run the proof-of-retrievability protocol. In this paper, we refer to this scheme as Shacham-Waters.
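The tag, proof and verification steps of a Shacham-Waters-style scheme can be exercised end to end on a toy group. The sketch below is an assumption-laden illustration, not the authors' implementation. It uses an insecure toy pairing in which a group element g^a is stored as its exponent a, so products become sums and exponentiation becomes multiplication. Every name (`Q`, `P_MOD`, `GT_GEN`, `hash_to_group`, `keygen`, `tag`, `prove`, `verify`, `demo`) is hypothetical.

```python
import hashlib
import secrets

# Toy symmetric pairing: an element g^a is stored as the exponent a. NOT secure;
# in particular the "public key" below trivially reveals the secret key.
Q = 1019       # prime group order
P_MOD = 2039   # prime, P_MOD = 2*Q + 1
GT_GEN = 4     # a square mod P_MOD other than 1, hence of order Q
G = 1          # generator g, stored as exponent 1


def pairing(a: int, b: int) -> int:
    return pow(GT_GEN, (a * b) % Q, P_MOD)


def hash_to_group(i: int) -> int:
    """H(i): hash a block index to a group element (stored as an exponent)."""
    return int.from_bytes(hashlib.sha256(str(i).encode()).digest(), "big") % Q


def keygen() -> tuple[int, int]:
    x = 1 + secrets.randbelow(Q - 1)   # sk = x, nonzero
    return x, (x * G) % Q              # pk = v = g^x


def tag(x: int, u: int, i: int, m_i: int) -> int:
    """sigma_i = (H(i) * u^{m_i})^x, written additively."""
    return x * (hash_to_group(i) + u * m_i) % Q


def prove(blocks: dict, tags: dict, query: list) -> tuple[int, int]:
    """Aggregate sigma = prod sigma_i^{nu_i} and mu = sum nu_i * m_i."""
    sigma = sum(nu * tags[i] for i, nu in query) % Q
    mu = sum(nu * blocks[i] for i, nu in query) % Q
    return sigma, mu


def verify(v: int, u: int, query: list, sigma: int, mu: int) -> bool:
    """Check e(sigma, g) == e(prod H(i)^{nu_i} * u^mu, v)."""
    rhs_elem = (sum(nu * hash_to_group(i) for i, nu in query) + u * mu) % Q
    return pairing(sigma, G) == pairing(rhs_elem, v)


def demo() -> tuple[bool, bool]:
    """Return (honest server passes, server with a modified block passes)."""
    x, v = keygen()
    u = 1 + secrets.randbelow(Q - 1)
    blocks = {i: secrets.randbelow(Q) for i in range(1, 9)}
    tags = {i: tag(x, u, i, m) for i, m in blocks.items()}
    query = [(i, 1 + secrets.randbelow(Q - 1)) for i in (2, 5, 7)]
    honest = verify(v, u, query, *prove(blocks, tags, query))
    tampered = dict(blocks)
    tampered[5] = (blocks[5] + 1) % Q
    cheating = verify(v, u, query, *prove(tampered, tags, query))
    return honest, cheating
```

The aggregated pair (sigma, mu) has constant size regardless of the number of challenged blocks, which is the property that makes such a proof cheap to submit to a ledger.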
2.6. Privacy Preserving Public Auditing for Secure Cloud Storage
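To achieve strong privacy guarantees, we use the Privacy Preserving Public Auditing for Secure Cloud Storage scheme by Wang et al. (Wang et al., 2013). Let $H:\{0,1\}^{*}\rightarrow G$ and $h:G_{T}\rightarrow\mathbb{Z}_{p}$ be hash functions and $g$ be a generator of $G$. Let a user have a key pair $(sk,pk)$ where $sk=x$ and $pk=(v,g,u,e(u,v))$ such that $v=g^{x}$ and $u\in G$. For file block $m_{i}$ with identifier $id$, authentication tag $\sigma_{i}=(H(W_{i})\cdot u^{m_{i}})^{x}$, where $W_{i}=id\,\|\,i$. The prover receives the query $Q=\{(i,\nu_{i})\}$. It selects a random element $r\in\mathbb{Z}_{p}$ and calculates $R=e(u,v)^{r}$ along with $\gamma=h(R)$. Finally the prover sends back $(\mu,\sigma,R)$ where $\mu=r+\gamma\sum_{(i,\nu_{i})\in Q}\nu_{i}m_{i}$ and $\sigma=\prod_{(i,\nu_{i})\in Q}\sigma_{i}^{\nu_{i}}$. The verification equation is:
$$R\cdot e(\sigma^{\gamma},g)=e\Big(\Big(\prod_{(i,\nu_{i})\in Q}H(W_{i})^{\nu_{i}}\Big)^{\gamma}\cdot u^{\mu},\ v\Big)$$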
The scheme is also publicly verifiable like Shacham-Waters protocol. Additionally, it is privacy preserving. In this paper, we refer to this scheme as PPSCS.
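Random masking can be illustrated with the same kind of toy group. The sketch below assumes the commonly stated form of the Wang et al. response, in which the linear combination of blocks is blinded by a fresh random value r. It is not the authors' code. Group elements are stored as exponents of an insecure toy pairing, the hash into Z_Q is forced to be nonzero as a simplification for the toy, and all names are hypothetical.

```python
import hashlib
import secrets

Q, P_MOD, GT_GEN, G = 1019, 2039, 4, 1   # insecure toy parameters


def pairing(a: int, b: int) -> int:
    return pow(GT_GEN, (a * b) % Q, P_MOD)


def hash_to_group(label: str) -> int:
    return int.from_bytes(hashlib.sha256(label.encode()).digest(), "big") % Q


def hash_gt_to_scalar(r_elem: int) -> int:
    """h: G_T -> Z_Q, forced nonzero so the toy check never degenerates."""
    digest = hashlib.sha256(b"gt" + str(r_elem).encode()).digest()
    return 1 + int.from_bytes(digest, "big") % (Q - 1)


def tag(x: int, u: int, name: str, i: int, m_i: int) -> int:
    """sigma_i = (H(name || i) * u^{m_i})^x, written additively."""
    return x * (hash_to_group(f"{name}|{i}") + u * m_i) % Q


def prove(blocks, tags, query, u, v):
    r = secrets.randbelow(Q)                     # fresh mask for every audit
    big_r = pow(pairing(u, v), r, P_MOD)         # R = e(u, v)^r
    gamma = hash_gt_to_scalar(big_r)
    mu_prime = sum(nu * blocks[i] for i, nu in query) % Q
    mu = (r + gamma * mu_prime) % Q              # masked linear combination
    sigma = sum(nu * tags[i] for i, nu in query) % Q
    return mu, sigma, big_r


def verify(v, u, name, query, mu, sigma, big_r) -> bool:
    """Check R * e(sigma^gamma, g) == e((prod H(W_i)^{nu_i})^gamma * u^mu, v)."""
    gamma = hash_gt_to_scalar(big_r)
    lhs = big_r * pairing(gamma * sigma % Q, G) % P_MOD
    hashed = sum(nu * hash_to_group(f"{name}|{i}") for i, nu in query)
    return lhs == pairing((gamma * hashed + u * mu) % Q, v)


def demo() -> tuple[bool, bool]:
    """Return (honest server passes, server with a modified block passes)."""
    x = 1 + secrets.randbelow(Q - 1)
    v = (x * G) % Q
    u = 1 + secrets.randbelow(Q - 1)
    blocks = {i: secrets.randbelow(Q) for i in range(1, 9)}
    tags = {i: tag(x, u, "file", i, m) for i, m in blocks.items()}
    query = [(i, 1 + secrets.randbelow(Q - 1)) for i in (1, 4, 6)]
    honest = verify(v, u, "file", query, *prove(blocks, tags, query, u, v))
    bad = dict(blocks)
    bad[4] = (blocks[4] + 1) % Q
    cheating = verify(v, u, "file", query, *prove(bad, tags, query, u, v))
    return honest, cheating
```

Because r is drawn afresh for each response, the value mu seen by the auditor no longer equals the plain linear combination of blocks, which is the intuition behind the privacy claim.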
2.7. Blockchain
Blockchain is a tamper-resistant, append-only distributed ledger. Apart from acting as a non-repudiable log, blockchain can host distributed applications and perform arbitrary functions in the form of smart contracts. First introduced in Bitcoin (Satoshi and Nakamoto, 2008), it is a hash-linked chain of blocks, each block potentially containing multiple transactions. Participating nodes broadcast ledger updates in form of transactions. While there are various flavours of blockchain systems present based on the type of consensus protocol they use, we use a Proof-of-Work(PoW) based blockchain system. In a blockchain platform following PoW consensus protocol, special nodes, called miners, form blocks containing multiple transactions. The miners compete and solve a hash based challenge and the winner gets to propose the next block along with getting a mining reward.
Ethereum (Wood, 2014) is one of the most popular blockchain platforms supporting Turing-complete languages to write smart contracts. Ether is the cryptocurrency of the Ethereum platform and it is used to incentivize computations on the platform. The contracts are executed inside Ethereum Virtual Machine(EVM) which is uniform across all nodes, so as to have same output across the network. The amount of work done in terms of the number of operations done is calculated in terms of gas. A user submits transactions along with ethers to compensate for the work done by miners, according to the gas price. This acts as a transaction fee for the miners, and at the same time prevents running bad code like infinite loops which might harm the miners. Ethereum has a set of pre-compiled contracts which are codes running inside the host machine and not inside the EVM. Hence, the pre-compiled contracts cost less gas. Ethereum is open-source with an active community and has seen large scale adoption.
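The relation between gas and the fee a user pays can be written out as simple arithmetic. In the sketch below, the gas figure 110087 is the value the results table reports for a single query, while the gas price of 20 gwei is purely an assumed number for illustration. The function names are hypothetical.

```python
from decimal import Decimal

WEI_PER_GWEI = 10**9
WEI_PER_ETHER = 10**18


def fee_wei(gas_used: int, gas_price_gwei: int) -> int:
    """Transaction fee in wei: gas consumed times the price paid per unit of gas."""
    return gas_used * gas_price_gwei * WEI_PER_GWEI


def fee_ether(gas_used: int, gas_price_gwei: int) -> Decimal:
    """Same fee expressed in ether, using exact decimal arithmetic."""
    return Decimal(fee_wei(gas_used, gas_price_gwei)) / Decimal(WEI_PER_ETHER)


# 110087 gas at an assumed 20 gwei per gas unit.
example_fee = fee_ether(110087, 20)
```

Converting the result to a fiat amount additionally requires an exchange rate, which is not assumed here.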
While active research is being performed to lower consensus time in public blockchain systems, traditional PoW chains need considerable time before a transaction reaches finality. Hence, it becomes hard to implement multi-commit protocols which are practical. For example, in Ethereum, the average block generation time is between 10-19 seconds. This delay might not be suitable for high frequency applications. On top of that, for each transaction to get mined, the user needs to incur additional cost in terms of transaction fee or gas costs. These problems make protocols like audit unsuitable as an auditor and server need to interact multiple times to exchange challenges and responses. A typical technique used to bypass these problems is performing off-chain transactions by opening state channels between pairs of users. The participants of a state channel exchange signed messages and perform on-chain transactions only when either they are finished with their interaction or some dispute arises. The blockchain either saves the final states of the participants or resolves disputes, whichever applicable. This reduces time and cost for the users.
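The saving from a state channel can be seen in a small single-process simulation. This is a sketch under stated assumptions, not the paper's contract: HMAC stands in for a real digital signature, one object plays both the parties and the ledger, and the names `Channel`, `SignedState`, `update` and `close` are hypothetical.

```python
import hashlib
import hmac
from dataclasses import dataclass


def sign(key: bytes, payload: bytes) -> bytes:
    """Stand-in for a digital signature (an HMAC, so it is not publicly verifiable)."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def encode(nonce: int, payload: bytes) -> bytes:
    return nonce.to_bytes(8, "big") + payload


@dataclass(frozen=True)
class SignedState:
    nonce: int
    payload: bytes
    sig_a: bytes
    sig_b: bytes


class Channel:
    def __init__(self, key_a: bytes, key_b: bytes):
        self.key_a, self.key_b = key_a, key_b
        self.latest = None
        self.on_chain_txs = 1  # opening the channel is one on-chain transaction

    def update(self, payload: bytes) -> SignedState:
        """Off-chain step: both parties sign the next state; the chain is not touched."""
        nonce = 0 if self.latest is None else self.latest.nonce + 1
        msg = encode(nonce, payload)
        self.latest = SignedState(nonce, payload,
                                  sign(self.key_a, msg), sign(self.key_b, msg))
        return self.latest

    def close(self, state: SignedState) -> bool:
        """On-chain step: accept a final state only if both signatures verify."""
        msg = encode(state.nonce, state.payload)
        ok = (hmac.compare_digest(state.sig_a, sign(self.key_a, msg))
              and hmac.compare_digest(state.sig_b, sign(self.key_b, msg)))
        if ok:
            self.on_chain_txs += 1
        return ok
```

However many challenge-response rounds are exchanged through `update`, only the opening and the closing reach the chain in this model.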
3. Related Work
On Cloud Storage: The idea of auditing cloud storage servers was first introduced by Ateniese et al. (Ateniese et al., 2009), who defined the Provable Data Possession (PDP) model. Juels and Kaliski (Juels and Kaliski Jr, 2007) first described a PoR scheme for static data using sentinels. A similar construction was provided by Naor et al. (Naor and Rothblum, 2009) using MAC-based authenticators. A study of variants of PoR schemes with private verifiability was done by Dodis et al. (Dodis et al., 2009). The first fully dynamic provable data possession scheme was given by Erway et al. (Erway et al., 2009). A secure distributed cloud storage scheme called HAIL (high-availability and integrity layer), which attains PoR guarantees, was proposed by Bowers et al. (Bowers et al., 2009). Shacham and Waters provided a PoR construction using BLS signatures, which had both public and private verifiability. Wang et al. argued that the auditor can retrieve information about files and hence proposed a privacy-preserving data audit scheme in (Wang et al., 2013). For dynamic data, an ORAM-based audit protocol was given in (Cash et al., 2017). More efficient protocols were given in (Dautrich et al., 2014; Wang et al., 2014; Sengupta and Ruj, 2016; Wang et al., 2017; Sengupta and Ruj, 2017; Fu et al., 2018; Li and Liu, 2020; Rabaninejad et al., 2020). Multiple server-based PoR schemes were formalized in (Paterson et al., 2016). Multi-user based data integrity check with revocable user access was proposed in (Thokchom and Saikia, 2020). An identity-based auditing scheme for medical data was proposed in (Xu et al., 2020a). Nayak and Tripathy (Nayak and Tripathy, 2021) came up with a secure and efficient privacy-preserving provable data possession scheme (SEPDP) for cloud storage and extended it to support multiple data owners, batch auditing, and dynamic data operations. Their scheme fails to guarantee soundness and does not achieve zero-knowledge privacy. Ni et al. (Ni et al., 2022) addressed this issue by proposing an IDentity-based Privacy-Preserving Provable Data Possession scheme (ID-P3DP) based on the RSA assumption for secure cloud storage. It supports data privacy preservation against third-party auditors and multi-user aggregate verification. Yang et al. (Yang et al., 2022) propose an ID-based PDP with compressed cloud storage where cloud storage auditing can be achieved by using only encrypted data blocks in a self-verified way and original data blocks can be reconstructed from the outsourced data.
In OPOR(Armknecht et al., 2014), the authors define a formal framework where the auditing task is outsourced and provide a construction based on Shacham-Waters. Although it talks about handling all possible collusion cases, the scheme is shown to be secure for arbitrary collusion in terms of file recovery and not reliability, i.e., a misbehaving auditor may be able to prove innocence if it colludes with a malicious data owner. Also, in OPOR, the file is assumed to be encrypted and uploaded by the data owner to the server as the scheme does not ensure the privacy of the file. It also fails to provide a payment mechanism or arbitration. For multi-user owned data integrity check, Yuan et al. (Yuan and Yu, 2014) proposed a scheme that handled collusion between parties, but was shown to be flawed in (Zhang et al., 2015). In our paper, we provide an end-to-end system that handles data privacy, payment, and arbitration and is resilient to arbitrary collusion between parties.
Auditing using Blockchain: Wang et al. (Wang et al., 2020) proposed the notion of Non-Interactive Public Provable Data Possession (NI-PPDP) and designed a blockchain-based fair payment smart contract for cloud storage based on this primitive. However, this work still suffers from some limitations that it is not easy to trace the malicious participant and works if the cloud service provider does not collude with the auditor. Yue et al. (Yue et al., 2020) proposed a framework for decentralized Edge-Cloud Storage. Efficient verification is ensured by sampling verification and formulated rational sampling strategies. However, this work does not have any incentive mechanisms, and the computational cost and the communication cost cannot be evaluated. A blockchain-based efficient public integrity auditing scheme to resist misbehaved third-party auditor is proposed in (Li et al., 2021). Here the user is required to check the behaviors of auditors for a longer duration compared with that of the data integrity auditing performed by the auditor. The protocol fails to work in the presence of a fully malicious cloud server. Zhang et al. (Zhang et al., 2021) used the blockchain to record the interactions among users, service providers, and organizers in the data auditing process as evidence. Moreover, the smart contract was employed to detect service disputes. Xie et al. (Xie et al., 2022) use smart contracts to improve the reliability and stability of audit results. However, the authors assume that the data owner is a trusted entity. In (Li et al., 2022a), the authors proposed a privacy-preserving auditable service with traceable timeliness for public cloud Storage. However, it is assumed that the data owner does not collude with the cloud service provider. Additionally, the scheme works if the auditor does not collude with the data owner and the cloud.
On Storage with Blockchain: In recent times, various blockchain-based cloud servers have come up. IPFS(Benet, 2014) introduced a blockchain-based naming and storage system. Several other systems (Wilkinson et al., 2014; Ali et al., 2016; Vorick and Champine, 2014) use the concept of cloud storage in a decentralized fashion in a P2P network. In (Ateniese et al., 2017), the authors make the storage accountable and show how to integrate with Bitcoin. In (Benet and Greco, 2018), the designers use IPFS and cryptocurrency to make a storage-based marketplace. They use Proof of Replication to enforce storage among their peers. SpaceMint(Park et al., 2015) introduced a new cryptocurrency that adapts proof of space, and also proposed a different blockchain format and transaction types. Moran et al.(Moran and Orlov, 2016) introduced Proofs of Space-Time (PoSTs) and implemented a practical protocol for these proofs. An in-depth analysis of Proof of Replication mechanisms was done in (Fisch, 2018). While the use of blockchain as an enforcer and incentive distribution mechanism was tapped in these works, most works did not consider fairness among the services offered by the parties. IntegrityChain, a decentralized storage framework supporting provable data possession (PDP) based on blockchain was proposed in (Li et al., 2020). Fairness in trading is ensured as a party loses coins upon misbehaving and earns if it behaves honestly.
Several works use deduplication technology by performing auditing on one copy of multiple same data, thereby significantly reducing storage overhead. In (Xu et al., 2020b), a client-side data deduplication scheme based on bilinear-pair techniques has been proposed. Blockchain records the behaviors of entities in both data outsourcing and auditing processes, ensuring the credibility of audit results. The paper lacks a discussion on fairness. Tian et al. (Tian et al., 2022) proposed a blockchain-based secure deduplication scheme in decentralized storage. The scheme is based on the double-server storage model to achieve efficient space-saving while protecting data users from losing data under a single point of failure and duplicate-faking attack. Transparent integrity auditing was introduced in (Li et al., 2023) based on the blockchain. The authors have constructed a secure transparent deduplication scheme that supports deduplication over encrypted data. Li et al. (Li et al., 2022b) proposed a secure transparent deduplication scheme based on the blockchain that supports deduplication over encrypted data and enables users to attest the deduplication pattern on the cloud server. Another paper (Song et al., 5555) discusses a scheme that bridges secure deduplication and integrity auditing in encrypted cloud storage. However, all these works consider the cloud server to be semi-trusted or the third-party auditor to be trusted. In (Mishra et al., 2022), a blockchain-based secure decentralized public auditing in decentralized cloud storage has been proposed. The authors have used redactability for blockchain to handle security issues. Additionally, the model uses an efficient deduplication scheme to attain adequate storage savings while preserving the users from data loss due to duplicate faking attacks. Liu et al. (Liu et al., 2023) propose a blockchain-based compact audit-enabled deduplication scheme in decentralized storage. 
The protocol adopts an aggregatable vector commitment to generate audit tags to overcome the low coupling problem between deduplication and auditing. The drawback of the protocol is that it considers the storage service provider to be honest-but-curious and does not work if the party is malicious.
4. Protocol Overview
In this section we give an overview of the protocol and outline the deliverables we seek out of it. Figure 1 provides an outline of our protocol.
4.1. System Model
The protocol has three entities: data owner, cloud storage server and auditor. We use the blockchain layer as an arbitrator in case of disputes, using the Turing-complete capability of the blockchain platform to codify the actions to be taken in a dispute. The native currency of the platform is used to distribute and control incentives. The immutability of the blockchain helps keep a log of audit results and provides a transparent infrastructure without sacrificing privacy. We assume that the data owner sets up the smart contract, and that the cloud server and auditor interact only if they agree to the terms of the contract. If a permissionless blockchain is used, any entity can run the blockchain network and the smart contract can be deployed on it. If a private blockchain network is used, we expect the storage providers, auditors and data owners to participate in the blockchain. The work done by the participants is incentivized in terms of gas cost, and hence there might be entities who participate in the network just to gain that incentive.
4.2. Adversarial Model & Assumptions
We call a user honest if she follows the protocol. Otherwise, we call her malicious. A malicious user can deviate arbitrarily.
An adversary is a polynomial-time algorithm that can make any user malicious at any point of time, subject to some upper bound. Our adversary is dynamic in nature, that is, it can select its target based on the current configuration of the system. It can make coordinated attacks, that is, it can control the malicious users and send or receive messages on their behalf. It can, of course, isolate a malicious user and prescribe arbitrary instructions for her to perform (Gilad et al., 2017).
However, the adversary cannot break cryptographic primitives like hash functions or signatures, except with negligible probability. It cannot interfere with honest users or their exchanges. Apart from this, we make the following assumptions:
(i) Among the peers in the blockchain, the adversary can only corrupt up to the bound of the underlying consensus protocol. For example, in PoW-based blockchains the bound is 49%, and for PoS-based blockchains the bound is 33%.
(ii) The three parties apart from the blockchain - owner, server and auditor - cannot be corrupted together. At most two of the three parties can be malicious at any point of time.
(iii) We assume that the adversary will not corrupt without sufficient incentive. We think of the adversary as a rational adversary.
4.3. Security Goals
We define the security goals that must be realized by Cumulus.
- Authenticity: The authenticity of storage requires that the cloud server cannot forge a valid proof of storage corresponding to the challenge set without storing the challenged chunks and their respective authentication tags untampered, except with a probability negligible in the security parameter.
- Extractability: The extractability property requires the FetchFile() function to be able to recover the original file when interacting with a prover that correctly computes responses for a non-negligible fraction of the query space.
- Privacy: The privacy of audit requires that the auditor learn nothing about the stored file chunks. The auditor generates queries and receives responses, and it should not be able to derive any file chunk from a response.
- Fairness: The cloud server and auditor offer services in exchange for payment from the data owner. The fairness property requires the following:
  - If the cloud server stores all file chunks, then it receives an adequate incentive. If it fails to keep the files intact, it gets penalized.
  - If the auditor generates queries correctly, verifies responses and submits the aggregated response to the blockchain, then it receives an appropriate incentive. If not, it gets penalized.
  - If the data owner gets services from the cloud server and auditor as intended, then she has to pay according to the agreement. If she incurs losses due to a malicious party, she is paid for the damage.
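The fairness conditions above can be pictured as a settlement rule over the parties' balances. The sketch below is only one possible reading. The rule that a forfeited deposit goes to the owner, and all names and amounts, are assumptions made for illustration and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    server_kept_file: bool    # server passed the audit
    auditor_behaved: bool     # auditor generated valid queries and reported honestly


def settle(outcome: Outcome, storage_fee: int, audit_fee: int,
           server_deposit: int, auditor_deposit: int) -> dict[str, int]:
    """Return the net balance change of each party (hypothetical rule)."""
    delta = {"owner": 0, "server": 0, "auditor": 0}
    if outcome.server_kept_file:
        delta["owner"] -= storage_fee        # owner pays for the service received
        delta["server"] += storage_fee
    else:
        delta["server"] -= server_deposit    # penalty compensates the owner
        delta["owner"] += server_deposit
    if outcome.auditor_behaved:
        delta["owner"] -= audit_fee
        delta["auditor"] += audit_fee
    else:
        delta["auditor"] -= auditor_deposit
        delta["owner"] += auditor_deposit
    return delta
```

Whatever rule is chosen, the changes sum to zero in this model, since coins only move between the three parties.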
4.4. Protocol Phases
We break our protocol into four different phases. Let us outline the details of each of these phases.
Phase 0: Initialization Phase
- KeyGen: Initialized by the data owner, this algorithm generates a random public-private key pair and public parameters based on the security parameter.
- RegisterOwner: A new data owner uses this function to set up her account. She deposits the prerequisite money to initialize her account, which is used for future payments. The identity information, ownerID, will be used to authorize all further transactions by this data owner. She also submits her public key, which is used to verify signed messages submitted by the owner.
- RegisterServer: An existing data owner uses this function to supply identity information about the publicly known server, serverID, and the public key of the server with whom she wants to store her data.
- RegisterAuditor: This function is called by the operative data owner to specify the publicly known auditor, auditorID, she wants to assign. If the owner wants the selected auditor to audit only a particular server, the owner may additionally supply that information.
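The registration functions of Phase 0 can be modelled as bookkeeping on contract state. The sketch below is an in-memory stand-in written for illustration and does not reproduce the paper's smart contract. The class name, method names, storage layout and error handling are all hypothetical.

```python
from typing import Optional


class AuditRegistry:
    """In-memory model of the registration state a contract might keep."""

    def __init__(self) -> None:
        self.owners: dict = {}     # ownerID -> {"public_key", "balance"}
        self.servers: dict = {}    # (ownerID, serverID) -> server public key
        self.auditors: dict = {}   # (ownerID, auditorID) -> serverID or None

    def _require_owner(self, owner_id: str) -> None:
        if owner_id not in self.owners:
            raise KeyError(f"unknown owner {owner_id!r}")

    def register_owner(self, owner_id: str, public_key: bytes, deposit: int) -> None:
        if deposit <= 0:
            raise ValueError("a positive deposit is required")
        if owner_id in self.owners:
            raise ValueError("owner already registered")
        self.owners[owner_id] = {"public_key": public_key, "balance": deposit}

    def register_server(self, owner_id: str, server_id: str,
                        server_public_key: bytes) -> None:
        self._require_owner(owner_id)
        self.servers[(owner_id, server_id)] = server_public_key

    def register_auditor(self, owner_id: str, auditor_id: str,
                         server_id: Optional[str] = None) -> None:
        self._require_owner(owner_id)
        # Optionally restrict the auditor to one already registered server.
        if server_id is not None and (owner_id, server_id) not in self.servers:
            raise KeyError(f"server {server_id!r} not registered by this owner")
        self.auditors[(owner_id, auditor_id)] = server_id
```

Keying servers and auditors by the owner's identity reflects the text's requirement that only an existing owner can register them.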
Phase 1: Owner - Server
- FileTransfer: The owner divides the file F into blocks. She generates an authentication tag for each block and calculates a hash. She sends the blocks and their tags to the server. She receives a signature tag from the server and verifies the server's response, including the signature via SigVerify. Then, she sends a signed message, produced with Sign, to the blockchain.
Phase 2: Server - Auditor
- OpenChannel: This function collects deposits from the server and auditor and opens a state channel between them to interact off-chain. It freezes the owner's money so that the necessary parties can be paid once the channel is closed.
- GenQuery: This function generates an audit query for the auditor based on the randomness derived from the last block of the blockchain. The query is then sent to the server for a response.
- GenResponse: Given a query, this function generates an audit response. The server sends the response to the auditor.
- Verify: Given a response, this function verifies whether the response is correct. Based on the outcome of verification, we proceed with the next set of challenges.
- CloseChannel: This function receives the aggregated challenge-response along with the complaint, if any. It verifies whether the queries were valid and whether the responses pass the audit. In case of a complaint, it punishes the guilty party; otherwise, it pays the server and auditor as per the norms of payment.
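Deriving queries from block randomness means anyone can recompute them later, which is what lets a contract check that the queries were valid when the channel closes. The sketch below shows one way to do this and is not the paper's construction. It assumes SHA-256 in counter mode as the expansion function, and `gen_query_from_block` and `query_is_valid` are hypothetical names.

```python
import hashlib
from typing import Iterator


def _stream(seed: bytes) -> Iterator[int]:
    """Yield an endless sequence of 256-bit integers derived from the seed."""
    counter = 0
    while True:
        digest = hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        yield int.from_bytes(digest, "big")
        counter += 1


def gen_query_from_block(block_hash: bytes, n: int, l: int,
                         p: int) -> list[tuple[int, int]]:
    """Deterministically derive l (index, coefficient) pairs from a block hash."""
    if not 1 <= l <= n:
        raise ValueError("need 1 <= l <= n")
    values = _stream(block_hash)
    chosen: dict[int, int] = {}
    while len(chosen) < l:
        i = 1 + next(values) % n          # candidate index in [1, n]
        if i not in chosen:               # skip repeats to keep indices distinct
            chosen[i] = next(values) % p  # coefficient in Z_p
    return sorted(chosen.items())


def query_is_valid(block_hash: bytes, n: int, l: int, p: int,
                   query: list[tuple[int, int]]) -> bool:
    """Re-derive the query and compare, as an on-chain check might do."""
    return query == gen_query_from_block(block_hash, n, l, p)
```

The reduction modulo n or p of a 256-bit value introduces a small bias, which a careful implementation would remove with rejection sampling.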
Phase 3: Owner - Server
- FetchFile: The data owner retrieves the stored file from the server using this function.
For a protocol to fit into our framework, the PoR scheme has to be publicly verifiable and needs to produce short aggregated proofs. Although multiple PoR schemes have our required properties, we chose Shacham-Waters in our first protocol named Audit using Blockchain(AuB). Shacham-Waters have complete security proofs along with practical overhead in terms of implementation. Also, it uses Homomorphic Linear Authenticators (HLA) which helps us have a very concise proof, which can be submitted to the ledger for verification upon closing state channel. The major drawback of Shacham-Waters is that it lacks privacy. Attacks have been shown that reveal parts of data from audit proofs. Hence, we further define a privacy preserving audit scheme using blockchain named Privacy Preserving Audit using Blockchain(PPAuB) using PPSCS which gives privacy guarantees by random masking. We discuss the designs of AuB and PPAuB in this section. Figure 2(b) provides a generic overview of the data flow in our construction.
We assume that the server and the auditor are known entities in the system, i.e., their public keys, identities and addresses are known throughout the system. We also assume that the server and the auditor have some coins deposited in the system, which can be used to penalize them in case of misbehavior. For simplicity, we assume a single file uploaded by a single owner to the server. We assume that the owner authorizes an auditor as the third-party auditor for performing audits. The owner might as well act as the auditor herself and perform the audit protocol. Our security assumptions allow such a case because the protocol is resilient against collusion between the owner and the auditor.
5. Our Security Model
In this section, we model the security properties of Cumulus in the global Universally Composable (UC) framework (Canetti, 2000) and discuss the security guarantee of our proposed protocol in this framework. We assume that the set of parties involved in the protocol is fixed and that the public keys of all parties are known. For our protocol, we consider static corruption, where a PPT (probabilistic polynomial time) adversary can corrupt any party at the beginning of the protocol. Once a party is corrupted, the adversary can read the internal state, as well as all of the incoming and outgoing messages, of that party.
We analyze the security of Cumulus in the real world and simulate its execution in an ideal world. We define an ideal functionality that acts like a trusted party, and all the parties interact with it. The ideal functionality is an abstraction of our proposed protocol, against which we prove the security properties realized by our protocol. We also define an ideal-world adversary that attacks the ideal functionality in the same way as the adversary attacks the protocol execution in the real world, and a PPT environment that sends and receives information in both the real and the ideal world. Our protocol is UC-secure if the environment can distinguish its interaction with the real world from its interaction with the ideal world only with negligible probability. We next discuss the communication model, the global ledger functionality, and the global random oracle before describing the ideal functionality for file integrity audit.
Communication Model
It is assumed that communication between parties happens in a synchronized fashion, with protocol execution taking place in rounds. All honest parties are assumed to follow an ideal global clock (Badertscher et al., 2017), which keeps track of the time of each round. Offline communication between honest parties is assumed to occur via an ideal functionality that ensures secure message transmission. A message sent by party at round reaches party at . The adversary learns when a message is sent but does not learn its content. Messages exchanged between parties and the environment, or between parties and the adversary, are assumed to take 0 rounds for transmission.
Global random oracles
We will use a global random oracle (Dziembowski et al., 2018) which is accessible by several instances of the protocol. It is like a log file, where given an input, an output is randomly generated. If the oracle is queried more than once on the same input, it returns the same output without regenerating one.
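The "log file" behaviour described above is commonly modelled by lazy sampling. The following is a minimal sketch; the class name `GlobalRandomOracle` and the 32-byte output length are assumptions made for this example and do not come from the paper.

```python
import secrets


class GlobalRandomOracle:
    """Lazily sampled random oracle shared by all protocol instances."""

    def __init__(self, output_bytes: int = 32):
        self._table = {}  # the "log": input -> previously sampled output
        self._output_bytes = output_bytes

    def query(self, data: bytes) -> bytes:
        # Sample a fresh uniformly random output only on the first query for
        # this input; repeated queries replay the logged value.
        if data not in self._table:
            self._table[data] = secrets.token_bytes(self._output_bytes)
        return self._table[data]
```

Because one table is shared, two protocol instances that query the same input observe the same output, which is what makes the oracle global.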
Global Ledger Functionality
We define a ledger functionality that captures the basic functionality of transferring coins between parties and also locking and unlocking coins from a smart contract (Dziembowski et al., 2019). We assume that has access to the global random oracle . The ledger functionality is used both in the real and ideal world and can be accessed by multiple instances of the protocol, hence it is called the global ideal functionality (Dziembowski et al., 2018).
We define the ledger functionality in Fig. 3, as stated in (Dziembowski et al., 2018). A list is maintained that keeps track of the coins locked in each contract, denoted by its contract identifier. The update function maintains the balance of each party in a local list. Any party present in the instantiation of a contract can lock coins in the contract by calling the module freeze. If the balance of the party is more than 0, the coins can be frozen in the contract; otherwise, the error "no balance" is returned. By calling the module unfreeze, the parties can unlock their locked coins from the contract. Upon querying the latest state of , the global random oracle is called with as the input. This value acts as the seed that is used by the auditor in generating queries for the contract.
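The freeze and unfreeze modules can be sketched as simple bookkeeping over two tables. This is an illustrative model only: the class `Ledger`, the returned status strings other than "no balance", and the check that the balance covers the requested amount are assumptions made for this example rather than details taken from Fig. 3.

```python
class Ledger:
    """Toy model of a ledger with coins lockable per contract identifier."""

    def __init__(self, balances: dict):
        self.balances = dict(balances)  # party -> free coins
        self.locked = {}                # contract id -> coins frozen in it

    def freeze(self, contract_id: str, party: str, coins: int) -> str:
        """Move `coins` from a party's balance into the contract."""
        # Assumption: the party must be able to cover the full amount.
        if coins <= 0 or self.balances.get(party, 0) < coins:
            return "no balance"
        self.balances[party] -= coins
        self.locked[contract_id] = self.locked.get(contract_id, 0) + coins
        return "frozen"

    def unfreeze(self, contract_id: str, party: str, coins: int) -> str:
        """Release `coins` from the contract to `party`."""
        if coins <= 0 or self.locked.get(contract_id, 0) < coins:
            return "no funds"
        self.locked[contract_id] -= coins
        self.balances[party] = self.balances.get(party, 0) + coins
        return "unfrozen"
```

Letting unfreeze pay any named party, and not only the party that froze the coins, is what allows a deposit to be redirected as compensation.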
Ideal functionality for file integrity audit
The ideal functionality shows how a file owner shares a file with a cloud server and later entrusts an auditor with the task of guaranteeing that the server has stored the file without deleting or modifying it and can retrieve it later as well.
The ideal functionality has four phases, as defined in Fig. 4. In the initialize phase, the file is transferred from the owner to the server, and the owner locks in the ledger the coins to be paid to the server and the auditor upon successful completion of service. In the channel open phase, the auditor initiates the opening of a payment channel by sending a request to the server. The purpose of the payment channel is to enable off-chain auditing of files and to promise payment if the audit is successful. If the server does not agree to open the channel, the coins locked by the owner are refunded. If both the auditor and the server are willing to open the channel, each of them locks a deposit in the channel. If either of them misbehaves, it loses its deposit, and those coins are used to compensate the owner and the other party.
In the query and response phase, the auditor sends a query. The ideal functionality checks whether the query is generated from the seed, which it retrieves by querying the ledger for its latest state. If the query is invalid, the auditor is penalized. Otherwise, the functionality waits for a response from the server. If the server sends abort, it cannot answer the query and hence loses its coins. If the response is correct, nothing is done. If the response is wrong, the server is penalized. If the auditor sends abort after receiving the response, its coins are used to compensate the server and the owner. In the channel close phase, if sends abort, it means that it had colluded with and was trying to send a wrong query-response set; hence, is penalized. If this is not the case, the functionality checks whether the number of queries made is at least the threshold. If this holds, the server and the auditor receive their respective payments, marking the successful completion of file auditing. If the number of queries generated is less than the threshold, then is penalized.
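The decisions taken in a single query and response round can be summarised as a small decision function. This is a sketch of one reading of the paragraph above, in which the party sending the query is the auditor and the party responding is the server; the names `Outcome` and `resolve_round` and the order in which the checks are applied are assumptions made for this example. The channel close phase is not modelled.

```python
from enum import Enum


class Outcome(Enum):
    CONTINUE = "continue"
    PENALIZE_AUDITOR = "penalize auditor"
    PENALIZE_SERVER = "penalize server"


def resolve_round(query_valid: bool, server_aborted: bool,
                  response_correct: bool, auditor_aborted: bool) -> Outcome:
    """Decide the outcome of one off-chain audit round."""
    if not query_valid:
        # The query was not derived from the ledger seed.
        return Outcome.PENALIZE_AUDITOR
    if server_aborted:
        # The server could not answer the query.
        return Outcome.PENALIZE_SERVER
    if not response_correct:
        return Outcome.PENALIZE_SERVER
    if auditor_aborted:
        # The auditor walked away after receiving a response.
        return Outcome.PENALIZE_AUDITOR
    return Outcome.CONTINUE
```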
Privacy and Security Properties
We discuss how the ideal functionality guarantees the following privacy and security properties:
- •
Authenticity: The ideal functionality checks the response corresponding to each query and matches it against the file chunks it had received from the owner. If the response is wrong, the server is penalized.
- •
Extractability: Since a threshold is set that is a significant fraction of the total file size, if the auditing is successful, the owner is assured with high confidence that it will receive the correct file from the server. If the number of queries is less than the threshold, is penalized by the ideal functionality.
- •
Privacy: does not come to know about the file content, it only sees whether the auditing was correct or not based on unfreezing of coins from .
- •
Fairness: If the auditor and the server fail to open the channel, the coins are refunded to the owner. If the auditor is malicious, the ideal functionality penalizes it and compensates the owner and the server using its coins. If the server is malicious, the ideal functionality penalizes the server. If both the auditor and the server behave as per the protocol specification, both are rewarded.
6. Formal Description of the Protocol
The protocol is defined in the -hybrid world. We define the judge contract functionality before explaining the steps of the protocol. The ideal functionality (Doerner et al., 2018) is used to provide the interface for signing and verifying messages sent by the parties. Since ECDSA signatures are strongly existentially unforgeable (Johnson et al., 2001), we leverage this property to argue for the security of Cumulus.
The protocol interacts with an instance of the global ledger and the judge contract defined by ideal functionality . Any off-chain communication between any two honest parties occurs using ideal functionality for secure message transmission, (Canetti, 2000). The protocol is defined in Fig.8. It calls the judge contract defined in Fig.6. We mention the difference in the steps of the protocol for the privacy-preserving version in Fig.10 and the corresponding changes in the judge contract in Fig. 9.
7. Security Analysis
We show that any attack that can be performed on our protocol can also be simulated on , or in other words that our protocol is at least as secure as . To prove this, we design a simulator Sim, that acts like an ideal attacker for the ideal functionality. We show that no PPT environment can distinguish between interacting with the real world and interacting with the ideal world. In the real world, the environment sends instructions to a real attacker and interacts with our protocol. In the ideal world, sends attack instructions to Sim and interacts with .
UC Definition of security. Consider the protocol Cumulus, denoted as with access to the judge contract functionality , the global random oracle , ECDSA-signature functionality , and the global ledger functionality . Let be the ensemble of the outputs of the environment when interacting with the attacker and users running protocol on input , where is the security parameter, and auxiliary input , in the hybrid world having access to the ideal functionalities. The parties interact with each other via in the ideal world in the presence of the ideal attacker . Let be the ensemble of the outputs of the environment when interacting with the ideal world.
Theorem 7.1.
Global UC Security. Given that is the security parameter and ECDSA signatures are strongly existentially unforgeable, a protocol is said to GUC-realize an ideal functionality in the -hybrid world if for every computationally bounded adversary attacking , there exists a probabilistic polynomial time (PPT) simulator such that for every PPT environment and for all , , and are computationally indistinguishable.
Proof: To prove that our proposed protocol is GUC-secure, we need to show that any PPT environment can distinguish the execution of the protocol in the hybrid world from the ideal world only with negligible probability. We construct a simulator that outputs all messages such that its output looks like the hybrid-world execution to the environment. The first version of Cumulus, defined in Fig. 8, is more efficient but has a weaker privacy guarantee; the second version, defined in Fig. 10, is more privacy-preserving but less efficient. The simulator orchestrates the response depending on which version of the protocol it is trying to simulate in the ideal world.
The transcript of the protocol generated in the hybrid world must not be distinguishable even in the presence of corrupted parties. We consider a total of eight cases: protocol execution with all honest parties, execution with a malicious owner, execution with a dishonest server, execution with a dishonest auditor, three cases in which two of the three parties are dishonest, and the case where all parties are corrupt.
We consider that internally simulates the execution of , suppressing any calls made to ledger and has access to the global random oracle and for signing messages on behalf of each party. We assume that are public parameters.
- •
Simulation without corruptions: The simulator , defined in Fig.11, is just required to match the transcript of all messages of the execution of Cumulus in the hybrid world. Since all three parties are honest, none of the messages exchanged between these parties is leaked to . Since has no access to the file , it generates a random bit string having length same as the size of the file, and performs all the operations on the dummy file. who has no access to the file, cannot distinguish between the output of and output by an auditor upon channel closure.
- •
Simulation when is dishonest: The simulator is defined in Fig. 12. The dishonest owner will be able to generate such that it matches with the hash of the chunks had it possessed the correct file with probability . This is the only bad case, in which aborts but the protocol execution in the hybrid world continues. Since this probability is negligible, the two executions can be distinguished only with negligible probability.
- •
Simulation when is dishonest: We define the simulator for a dishonest server in Fig. 13. In the first step, even if the dishonest server manages to generate tag without actually having the message , it will never pass the check if . Since we consider the ECDSA signature to be existentially unforgeable, the malicious server cannot forge a signature without possessing the correct secret key.
In the auditing phase, given a query , the server may try to guess the response . The probability of guessing the tuple correctly is , where and . This is a bad event, and aborts if it occurs. Since is a negligible quantity, the probability of aborting is negligible. Since we consider the signature scheme to be secure, the probability of this bad event occurring is negligible.
- •
Simulation when is dishonest: We define the simulator for a dishonest auditor in Fig. 14. A bad case can arise in two situations: (i) raises a false complaint that has sent a wrong query. However, in that case, needs to forge a signature as well, and this is possible only with negligible probability. (ii) submits a channel closure request but without the correct response. This will be detected by the ideal judge contract, so the probability of this bad event is 0. It is also possible that the malicious auditor has not interacted with the server and tries to generate the response for a given query set itself. Since the probability of guessing such a tuple is , as shown in the previous case where was dishonest, this bad event is possible only with negligible probability.
- •
Simulation when are dishonest: We define the simulator for this case in Fig. 15. The only point where the two parties can collude and try to steal coins from an honest is when they falsely raise a complaint about a wrong query. However, since the ECDSA signature is existentially unforgeable, the attack is possible only with negligible probability.
- •
Simulation when are dishonest: In no way can the two dishonest parties cheat an honest server who possesses the correct response to each query. We define the simulator in Fig. 16.
- •
Simulation when are dishonest: We define the simulator for this case in Fig. 17. It is possible that the malicious auditor-server pair tries to generate a response for a given query set. The probability of guessing the tuple correctly is , where and . This is a bad event, and aborts if it occurs. Since is a negligible quantity, the probability of aborting is negligible.
- •
Simulation when all parties are dishonest: With all three parties behaving dishonestly, the action set is exponential in size and cannot be captured here. This case does not guarantee any fairness and might not even terminate. We therefore skip discussions for this bad case.
8. Implementation and Performance Analysis
In this section, we describe our implementation of the blockchain-enabled data audit scheme and analyze its performance in a realistic cloud setting.
8.1. Implementation Setup
We implement AuB and evaluate a prototype using Ethereum as the blockchain platform. Our entire code is approximately lines, consisting of Ethereum smart contracts written in Solidity, Go-Ethereum modifications written in Golang, and other experimentation glue code written in Python and Bash (available at https://bit.ly/2L6W55n and https://bit.ly/2J0z1Ct).
We needed to perform bilinear pairing checks for a symmetric pairing inside an Ethereum smart contract in order to verify audit responses. The original Ethereum platform does not natively support pairing-based operations on symmetric groups. Since the Byzantium release, it has offered pairing operations on a fixed asymmetric group in order to support zero-knowledge proof verification. It was impractical to port a pairing-based cryptography library into Solidity, and hence we modified the Ethereum code to include a new pre-compiled contract that supports pairing-based operations. To be specific, the new pre-compiled contract verifies the audit equation given in Eq. 1.
We have used the most popular Ethereum implementation, Go Ethereum, also known as geth, which is written in Golang. For the mathematical operations, we needed a library which supports arithmetic in , elliptic curve groups with operations and bilinear pairing computation. Hence, we used the Golang wrapper (U, 2022) of the popular PBC Library (Lynn, 2019). We included the PBC library inside the geth code and introduced the pre-compiled contract. We used Type A pairings, which are very fast but whose elements take a lot of space to represent. Because of the modified Ethereum code, we used a private network for our experimentation.
For the Shacham-Waters audit code implementation, we used their extended definition with file sectors. As discussed in the transfer algorithm, we split file into blocks . For each block , tag is calculated, where is group whose support is . Calculating tags for the main file causes a significant overhead if we generate tags as above. So, we used the concept of sectors as introduced by Shacham-Waters. Let be a parameter and let each block consist of sectors, where . As there is only one tag per block (which contains sectors), the tag generation overhead is reduced by if is large enough.
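The effect of sectors on the number of tags can be seen from the data layout alone. The following is a minimal sketch with no cryptography; the function names, the zero padding of the last block, and the sector sizes used in the examples are assumptions made for this illustration. In a real instantiation the sector size must be chosen so that each sector fits in the underlying field.

```python
def split_into_blocks(data: bytes, sector_size: int, sectors_per_block: int) -> list:
    """Split a file into blocks, each holding `sectors_per_block` integer sectors."""
    if sector_size <= 0 or sectors_per_block <= 0:
        raise ValueError("sector_size and sectors_per_block must be positive")
    # Interpret each run of sector_size bytes as one big-endian integer sector.
    sectors = [
        int.from_bytes(data[i:i + sector_size], "big")
        for i in range(0, len(data), sector_size)
    ]
    # Assumption: pad the final block with zero sectors so all blocks are full.
    remainder = len(sectors) % sectors_per_block
    if remainder:
        sectors.extend([0] * (sectors_per_block - remainder))
    return [
        sectors[i:i + sectors_per_block]
        for i in range(0, len(sectors), sectors_per_block)
    ]


def tag_count(file_size: int, sector_size: int, sectors_per_block: int) -> int:
    """Number of tags needed when one tag authenticates one block."""
    sector_total = -(-file_size // sector_size)      # ceiling division
    return -(-sector_total // sectors_per_block)     # ceiling division
```

For instance, with a hypothetical 20-byte sector, a 1,000,000-byte file needs 50,000 tags with one sector per block, but only 50 tags with 1000 sectors per block.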
8.2. Evaluation
We deployed our implementation on a private Ethereum network consisting of two nodes. We used a single machine with -core Intel Xeon and GB of RAM running Linux (Manjaro 64-Bit XFCE). The storage server, owner and auditor codes were running alongside the Ethereum nodes. The elliptic curve utilized in our experiment is a supersingular curve, with a base field size bits and the embedding degree . We use different file sizes, starting from KB to MB.
Sector size , is Bytes in our construction which is dependent upon parameters we used for the construction of elliptic curves. We use 1000 sectors per block in our construction, which we found to be a well-tuned value for the current setup.
The main objective of our prototype implementation was to observe the overhead introduced by the distributed computing platform. In particular, we wanted to calculate the latency from each party's perspective, i.e., how much additional time the owner spends in uploading and downloading files, the auditor spends in challenge-response, and the server spends on file and audit management. To observe this, we perform the same experiment with and without the blockchain-related calls and compare the latencies in each case.
Firstly, we look into the latency faced by the owner during upload of file. In Fig. 18, we note that although for small files the blockchain latency remains considerable, with increasing file sizes, the commit time becomes negligible compared to the file upload time. As practical storage servers store files in order of Gigabytes, the overhead for the owner is negligible. A thing to note is that the same applies for the server as well because the owner latency includes the server signing during file uploads.
The protocol does not demand any additional ledger interactions during proof generation and hence we observed no significant overhead for the servers, as shown in Fig. 19.
An auditor performs audits over a long period of time. For example, an auditor may send one query to the server every hour and may have to send the aggregated response to the smart contract only at the end of the day. Hence, although in Fig. 20 we observe a dominant overhead of blockchain interaction compared to response verification over 10 queries, we note that the time axis is not an honest representation of practice, where the audit will be performed over a considerably long period of time compared to the commit time.
Overall, in Fig. 21, we see that for the owner and server, our protocol adds minimal overhead. For the auditor, if it is compared against the span of the entire audit process, the additional latency remains negligible, given that the auditor performs the audit over a sufficiently long duration of time.
Table 1 shows the metrics calculated with different query sizes, keeping the file size constant at 1 MB. The gas cost in USD is calculated at an average gas price of 3 gwei and an exchange rate of 1 ETH = 153 USD. The empty block size in our private network is 540 bytes. This shows that if a single audit takes 1400 bytes, 1 MB of data on the blockchain would accommodate roughly 750 audits. We note here that the block size increase is not linear in the number of queries, as only the aggregated response is submitted to the blockchain. Each channel session can carry a large number of audits; hence, in practice, thousands of such audits can be done with an overhead of a few kilobytes.
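The conversions behind these figures can be reproduced with a few lines of arithmetic. The sketch below uses the stated gas price and exchange rate; the function names are hypothetical, and treating 1 MB as 1024 × 1024 bytes is an assumption made for this example.

```python
GWEI_PER_ETH = 10**9


def gas_cost_usd(gas_used: int, gas_price_gwei: float = 3,
                 usd_per_eth: float = 153) -> float:
    """Convert a gas amount into USD at a given gas price and exchange rate."""
    return gas_used * gas_price_gwei * usd_per_eth / GWEI_PER_ETH


def audits_per_capacity(capacity_bytes: int, bytes_per_audit: int) -> int:
    """Number of whole audits that fit into a given amount of block space."""
    return capacity_bytes // bytes_per_audit


# 110087 is the gas figure reported for a single query in Table 1.
single_query_usd = gas_cost_usd(110087)          # about 0.0505 USD
audits = audits_per_capacity(1024 * 1024, 1400)  # 748, i.e. roughly 750
```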
In comparison to previous work like (Campanelli et al., 2017), not only is our proof generation time lower, our proof size is also smaller as it is aggregated over hundreds of proofs over time. This is because of the off-chain nature of our solution.
9. Discussion
We wanted to use the blockchain as the source of randomness for generating the query set. As shown in (Bonneau et al., 2015), for small amounts of randomness, if the stakes are low enough, the blockchain can be used as a source of randomness. We believe that for our audit purposes, the incentive for parties to collude with miners is low enough. Any other public source of randomness could have been used, but external sources of randomness carry a separate trust assumption, and we would then have needed to consider all the collusion cases with the random source. Our only requirement is that the peers of the blockchain network have access to the same source and read the same random value in order to reach consensus. We refer to a single block hash for each contract instance; hence, the query set for a channel can be derived at once, after the opening of the channel.
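Deriving a whole query set from a single block hash can be sketched as a hash-based expansion of the seed. This is an illustration only, not the derivation used in the prototype: the function `derive_query`, the use of SHA-256 with a counter, and the `coeff_modulus` parameter are assumptions made for this example, and the modulo reduction introduces a small bias that a careful implementation would remove.

```python
import hashlib


def derive_query(block_hash: bytes, num_blocks: int, num_challenges: int,
                 coeff_modulus: int) -> list:
    """Expand a block hash into (block index, coefficient) challenge pairs."""
    if not 0 < num_challenges <= num_blocks:
        raise ValueError("num_challenges must be between 1 and num_blocks")
    query, seen, counter = [], set(), 0
    while len(query) < num_challenges:
        digest = hashlib.sha256(block_hash + counter.to_bytes(8, "big")).digest()
        counter += 1
        index = int.from_bytes(digest[:16], "big") % num_blocks
        if index in seen:
            continue  # keep challenged block indices distinct
        seen.add(index)
        coeff = int.from_bytes(digest[16:], "big") % coeff_modulus
        query.append((index, coeff))
    return query
```

Since the expansion is deterministic, any party that knows the block hash, including a contract checking a complaint, can recompute the same query set and detect a query that was not derived from it.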
File upload time is very much dependent on number of sectors per block as well as size of sector. Sector size , is determined by the parameters and choice of algorithm used for elliptic curve generation.
We have implemented our pairing check as a new pre-compiled contract. Hence, the gas required by the contract has been estimated by us. In a practical situation, either such symmetric bilinear pairing support comes built into Ethereum, in which case the community decides upon the gas cost, or a private network is set up among interested parties, who themselves decide upon the gas requirement. The asymmetric pairing check pre-compiled contract takes as the gas ( is the number of points on the curve). Using a similar calculation, our audit check transaction took gas. We have not used this in our performance metric as we think this will depend upon the platform.
The channel-closing code in our contract is far from ideal. It does not take into account all possible corner cases, but arbitrarily complicated code could be implemented based on the requirements. We have only shown sample code for the prototype. We also note that in a classical audit scenario, both the auditor and the server need to be online during the audit phase. Our proposed state channel has exactly the same requirement and hence imposes no additional restrictions.
10. Conclusion
In this paper we introduced a blockchain-based privacy-preserving audit protocol which is resilient even when any two out of the data owner, storage server and auditor are malicious. We used state channels to minimize blockchain commits, thereby improving efficiency. Through smart contracts, we enforced the incentive mechanism in the system. We also built a prototype on modified Ethereum and showed that the protocol incurs minimal overhead compared to the existing PoR scheme.
In terms of future work, we wish to explore possibilities to enhance efficiency of the protocol by using other elliptic curves. We also aim to adopt an audit protocol without bilinear pairing operations so that it can be readily deployed on blockchain platforms like Ethereum, without modifying the codebase. This would enable us to test on networks beyond a private network, like testnets and main network.
Acknowledgment
This work is partially supported by Cisco University Research Program Fund, CyberGrants ID: #698039 and Silicon Valley Community Foundation. The authors would like to thank Chris Shenefiel and Samir Saklikar for their comments and suggestions. The work is also partially supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research (grant agreement 771527-BROWSEC), by the Austrian Science Fund (FWF) through the projects PROFET (grant agreement P31621), and the project W1255-N23, by the Austrian Research Promotion Agency (FFG) through the COMET K1 SBA and COMET K1 ABC, by the Vienna Business Agency through the project Vienna Cybersecurity and Privacy Research Center (VISP), by the Austrian Federal Ministry for Digital and Economic Affairs, the National Foundation for Research, Technology and Development and the Christian Doppler Research Association through the Christian Doppler Laboratory Blockchain Technologies for the Internet of Things (CDL-BOT).
- Ali et al. (2016) Muneeb Ali, Jude Nelson, Ryan Shea, and Michael J Freedman. 2016. Blockstack: A Global Naming and Storage System Secured by Blockchains. In USENIX Annual Technical Conference.
- Armknecht et al. (2014) Frederik Armknecht, Jens-Matthias Bohli, Ghassan O Karame, Zongren Liu, and Christian A Reuter. 2014. Outsourced proofs of retrievability. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security. ACM, 831–843.
- Ateniese et al. (2017) Giuseppe Ateniese, Michael T. Goodrich, Vassilios Lekakis, Charalampos Papamanthou, Evripidis Paraskevas, and Roberto Tamassia. 2017. Accountable storage. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). https://doi.org/10.1007/978-3-319-61204-1_31
- Ateniese et al. (2009) Giuseppe Ateniese, Seny Kamara, and Jonathan Katz. 2009. Proofs of Storage from Homomorphic Identification Protocols. In Advances in Cryptology – ASIACRYPT 2009, Mitsuru Matsui (Ed.). Springer Berlin Heidelberg, Berlin, Heidelberg, 319–333.
- Badertscher et al. (2017) Christian Badertscher, Ueli Maurer, Daniel Tschudi, and Vassilis Zikas. 2017. Bitcoin as a transaction ledger: A composable treatment. In Advances in Cryptology – CRYPTO 2017: 37th Annual International Cryptology Conference, Santa Barbara, CA, USA, August 20–24, 2017, Proceedings, Part I. Springer, 324–356.
- Benet (2014) Juan Benet. 2014. IPFS - Content Addressed, Versioned, P2P File System. CoRR (2014). arXiv:1407.3561v1
- Benet and Greco (2018) Juan Benet and Nicola Greco. 2018. Filecoin: A Decentralized Storage Network. Protocol Labs (2018).
