Location Privacy in Cognitive Radios with Multi-Server Private Information Retrieval
Mohamed Grissa, Attila A. Yavuz, and Bechir Hamdaoui

TL;DR
This paper proposes a multi-server PIR approach to enhance location privacy for both primary and secondary users in spectrum database-based cognitive radio networks, achieving high efficiency and information-theoretic privacy.
Contribution
It introduces the novel use of multi-server PIR in CRNs, leveraging synchronized databases to provide optimal privacy with reduced overhead.
Findings
Multi-server PIR achieves high efficiency in CRNs.
Provides information-theoretic privacy for PUs and SUs.
Validated through analytical and empirical evaluations.
Abstract
Spectrum database-based cognitive radio networks (CRNs) have become the de facto approach for enabling unlicensed secondary users (SUs) to identify spectrum vacancies in channels owned by licensed primary users (PUs). Despite its merits, the use of spectrum databases incurs privacy concerns for both SUs and PUs. Single-server private information retrieval (PIR) has been used as the main tool to address this problem. However, such techniques incur extremely large communication and computation overheads while offering only computational privacy. Besides, some of these PIR protocols have been broken. In this paper, we show that it is possible to achieve high efficiency and (information-theoretic) privacy for both PUs and SUs in database-driven CRN with multi-server PIR. Our key observation is that, by design, database-driven CRNs comprise multiple databases that are required, by the…
| Notation | Description |
| --- | --- |
| DB | Spectrum database |
| SU | Secondary user |
| CRN | Cognitive radio network |
| $\ell$ | Number of spectrum databases |
| $D$ | Matrix modeling the content of DB |
| $r$ | Number of records in $D$ |
| $n$ | Size of the database in bits |
| $b$ | Size of one record of the database in bits |
| $w$ | Size of one word of the database in bits |
| $s$ | Number of words per block |
| $\beta$ | Index of the record sought by SU |
| $t$ | Privacy level (tolerated number of colluding DBs) |
| $k$ | Number of responding DBs |
| $v$ | Number of byzantine DBs |
Location Privacy in Cognitive Radios with Multi-Server Private Information Retrieval
Mohamed Grissa, Attila A. Yavuz, and Bechir Hamdaoui
Oregon State University, grissam,[email protected]
University of South Florida, [email protected]
Abstract
Spectrum database-based cognitive radio networks (CRNs) have become the de facto approach for enabling unlicensed secondary users (SUs) to identify spectrum vacancies in channels owned by licensed primary users (PUs). Despite its merits, the use of spectrum databases raises privacy concerns for both SUs and PUs. Single-server private information retrieval (PIR) has been used as the main tool to address this problem. However, such techniques incur extremely large communication and computation overheads while offering only computational privacy. Besides, some of these PIR protocols have been broken.
In this paper, we show that it is possible to achieve high efficiency and (information-theoretic) privacy for both PUs and SUs in database-driven CRNs with multi-server PIR. Our key observation is that, by design, database-driven CRNs comprise multiple databases that are required, by the Federal Communications Commission, to synchronize their records. To the best of our knowledge, we are the first to exploit this observation to harness multi-server PIR technology to guarantee optimal privacy for both SUs and PUs, thanks to the unique properties of database-driven CRNs. We show, analytically and empirically with deployments on actual cloud systems, that multi-server PIR is an ideal tool to provide efficient location privacy in database-driven CRNs.
Index Terms:
Database-driven cognitive radio networks, location privacy, dynamic spectrum access, private information retrieval.
I Introduction
The rapid growth of connected wireless devices has dramatically increased the demand for wireless spectrum and led to a serious shortage in spectrum resources. Cognitive radio networks (CRNs) [1] have emerged as a promising technology for solving this shortage problem by enabling dynamic spectrum access (DSA), which improves the spectrum utilization efficiency by allowing unlicensed/secondary users (SUs) to exploit unused spectrum bands (aka spectrum holes or white spaces) of licensed/primary users (PUs).
Currently, two approaches are being adopted to identify these white spaces: spectrum sensing and geolocation spectrum databases. In the spectrum sensing-based approach, SUs need to sense the PU channel to determine whether the channel is available for opportunistic use. The spectrum database-based approach, on the other hand, waives the sensing requirement and instead enables SUs to query a database (DB) to learn about spectrum opportunities in their vicinity. This approach, already promoted and adopted by the Federal Communications Commission (FCC), was introduced as a way to overcome the technical hurdles faced by the spectrum sensing-based approaches, thereby enhancing the efficiency of spectrum utilization, improving the accuracy of available spectrum identification, and reducing the complexity of terminal devices [2]. Moreover, it pushes the responsibility and complexity of complying with spectrum policies to DB and eases the adoption of policy changes by limiting updates to just a handful of databases, as opposed to updating large numbers of devices [3].
FCC has designated nine entities (e.g. Google [4], iconectiv [5], and Microsoft [6]) as TV bands device database administrators, which are required to follow the guidelines provided by the PAWS (Protocol to Access White Space) standard [3]. PAWS sets guidelines and operational requirements for both the spectrum database and the SUs querying it. These include: SUs need to be equipped with geo-location capabilities; SUs must query DB with their specific location to check channel availability before starting their transmissions; DB must register SUs and manage their access to the spectrum; DB must respond to SUs' queries with the list of available channels in their vicinity along with the appropriate transmission parameters. As specified by the PAWS standard, SUs may be served by several spectrum databases and are required to register to one or more of these databases prior to querying them for spectrum availability. The spectrum databases are reachable via the Internet, and SUs querying these databases are expected to have some form of Internet connectivity [7].
FCC has established a new service in the 3.5 GHz band, known as Citizens Broadband Radio Service (CBRS), in which the spectrum is also managed through a central database-driven CRN, aka spectrum access system (SAS), to enable spectrum sharing between military and federal incumbents and SUs. A separate entity with Environmental Sensing Capability (ESC) is responsible for populating DBs with data regarding PUs that do not wish to reveal their operational information such as their location or transmission characteristics. A similar concept, named licensed shared access (LSA), for the 2.3-2.4 GHz band is also being developed in Europe to enable SUs to opportunistically access spectrum resources in this band owned by incumbent military aircraft services and police wireless communications. A major difference compared to SAS is that in LSA, PUs are responsible for populating DBs by providing their a priori information; i.e. their activities, and therefore the spectrum availability information, are known upfront [8].
I-A Location Privacy Issues in Database-Driven CRNs
Despite their benefits, database-driven CRNs suffer from serious security and privacy threats. Since they could be seen as a variant of location based service (LBS), the disclosure of location information of SUs represents the main threat to SUs when it comes to obtaining spectrum availability from DBs. The fine-grained location, when combined with publicly available information, can easily reveal other personal information about an individual including his/her behavior, health condition, personal habits or even beliefs. For instance, an adversary can learn some information about the health condition of a user by observing that the user regularly goes to a hospital. The frequency and duration of these visits can even reveal the seriousness of a user's illness and even the type of illness if the location corresponds to that of a specialty clinic. Matters get worse when SUs are mobile. As per the PAWS requirements, SUs need to query DBs whenever they change their location by at least 100 meters. This will make SUs constantly share their location as they move, which could be exploited by a malicious service provider for tracking purposes.
The location privacy of SUs is not the only privacy concern that database-driven CRNs suffer from. Indeed, the location privacy of PUs may also be critical in CRN systems such as SAS, in the 3.5 GHz CBRS band, and LSA, in the 2.3-2.4 GHz band, where PUs are not commercial but rather military and governmental entities. To achieve efficient spectrum sharing without interference to military and federal incumbents, these systems require PUs, or entities with sensing capabilities such as ESC, to report PUs' operational data (including their location, frequencies, time of use, etc.) to be included in the spectrum databases, which may present serious privacy risks to these PUs.
Being aware of such potential privacy threats, both SUs and PUs may refuse to share their sensitive information with DBs, which may present a serious barrier to the adoption of database-based CRNs, and to the public acceptance and promotion of the dynamic spectrum sharing paradigm. Therefore, there is a critical need for developing techniques to protect the location privacy of both SUs and PUs while still allowing them to harness the benefits of the CRN paradigm, without disrupting the functionalities that these systems are designed to provide to promote dynamic spectrum sharing.
I-B Research Gap and Objectives
Despite the importance of the location privacy issue in CRNs, only recently has it started to gain interest from the research community [9]. Some works focus on addressing this issue in the context of collaborative spectrum sensing [10, 11, 12, 13, 14]; others address it in the context of dynamic spectrum auction [15]. Protecting SUs' location privacy in database-driven CRNs is a more challenging task, simply because SUs are required, by protocol design, to provide their physical location to DB to learn about spectrum opportunities in their vicinity. The heterogeneity of wireless devices and the versatility of services relying on the CRN technology [16] could also present some challenges in designing privacy-preserving mechanisms for users in CRNs. In fact, privacy-preserving solutions need to embrace the different resource constraints of each SU device and the various requirements of each service in terms of data rates and delay sensitivities. This makes it hard to leverage general purpose public key encryption-based techniques due to their high cost in terms of computation and communication overheads, especially on resource-constrained devices. It is therefore crucial to design cost-effective protocols that offer strong privacy guarantees to users and also adapt to different system requirements regardless of the constraints of the users.
The existing location privacy preservation techniques for database-driven CRNs (e.g., [17, 2, 18, 19, 20, 21]) generally rely on three main lines of privacy preserving technologies: (i) k-anonymity [22], (ii) differential privacy [23] and (iii) single-server Private Information Retrieval (PIR) [24]. However, the direct adaptation of k-anonymity based techniques has been shown to yield either insecure or extremely costly results [25]. The solutions adapting differential privacy (e.g., [20]) not only incur a non-negligible overhead, but also introduce noise over the queries, and therefore they may negatively impact the accuracy of spectrum availability information.
Among these alternatives, single-server PIR seems to be the most popular. PIR technology is a suitable choice for database-driven CRNs, as it permits privacy preserving queries on a public database, and therefore can enable an SU to retrieve spectrum availability information from the database without leaking its location information. However, single-server PIR protocols rely on highly costly partial homomorphic encryption schemes, which need to be executed over the entire database for each query. Indeed, as we also demonstrate with our experiments in Section IV, the execution of a single query even with some of the most efficient single-server PIR schemes [26] takes on the order of seconds on a moderate size database. An end-to-end delay on the order of seconds might be undesirable for the spectrum sensing needs of SUs in real-life applications. Also, some of the state-of-the-art efficient computational PIR schemes [27] that are used in the context of CRNs have been shown to be broken [26]. Thus, there is a significant need for practical location privacy preservation approaches for database-driven CRNs that can meet the efficiency and functionality requirements of SUs.
I-C Our Observation and Contribution
The objective of this paper is to develop efficient techniques for database-driven CRNs that preserve the location privacy of SUs during their process of acquiring spectrum availability information. We also try to protect the operational privacy of PUs in systems that require incumbents to provide spectrum availability information to DBs. Specifically, we aim for the following design objectives: (i) (location privacy of SUs) Preserve the location privacy of SUs, whether fixed or mobile, while allowing them to receive spectrum availability information; (ii) (efficiency and practicality) Incur minimum computation, communication and storage overhead. The cryptographic delay must be minimal to permit fast spectrum availability decisions for the SUs, and storage/processing cost must be low to enable practical deployments; (iii) (fault-tolerance and robustness) Mitigate the effects of system failures or misbehaving entities (e.g., colluding databases); (iv) (location privacy of PUs) Protect the location information of PUs while still being able to provide spectrum availability information to SUs. It is very challenging to meet all of these seemingly conflicting design goals simultaneously.
The main idea behind our proposed approaches is to harness special properties and characteristics of database-driven CRN systems to employ private query techniques that can overcome the significant performance, robustness and privacy limitations of the state-of-the-art techniques. Specifically, our proposed approach is based on the following observation:
Observation: FCC requires that all of its certified databases synchronize their records obtained through registration procedures with one another [28, 29] and that they be consistent with the other databases by providing exactly the same spectrum availability information, in any region, in response to SUs' queries [30]. That is, the same copy of the spectrum database is available and accessible to the SUs via multiple (distinct) spectrum database administrators/providers. Is it possible to exploit this observation to achieve efficient location privacy preservation techniques for database-driven CRNs?
In practice, as stated in the PAWS standard [3], SUs have the option to register to multiple spectrum databases belonging to multiple service providers. Currently, many companies (e.g. Google [4], iconectiv [5], etc.) have obtained authorization from FCC to operate geo-location spectrum databases upon successfully complying with regulatory requirements. Several other companies are still in the process of acquiring this authorization [31]. Thus, it is more natural and realistic to take this fact into consideration when designing privacy preserving protocols for database-based CRNs. Based on this observation, our main contribution is as follows:
Our Contribution: To the best of our knowledge, we are the first to exploit the fact that multiple copies of spectrum DBs are available by nature in database-driven CRNs, and therefore it is possible to harness multi-server PIR techniques [24, 33] that offer information-theoretic privacy with substantial efficiency advantages over single-server PIR. This is achieved by relying on Shamir secret sharing-based techniques to divide either the content of SUs' queries or the spectrum availability information, or both, among the different DBs to prevent these DBs from inferring SUs' location from their queries or from learning PUs' sensitive operational data from the spectrum availability information.
We show, analytically and experimentally with deployments on cloud systems, that our adaptation of multi-server PIR techniques significantly outperforms the state-of-the-art location privacy preservation methods as demonstrated in Table I and detailed in Section IV. Moreover, our adaptations achieve information-theoretic privacy while existing alternatives offer only computational privacy. This feature provides an assurance against even post-quantum adversaries [34] and can avoid recent attacks on computational PIR [26].
Notice that multi-server PIR techniques require the availability of multiple (synchronized) replicas of the database. Therefore, despite their high efficiency and security, they have received little attention from practitioners. For instance, in traditional data outsourcing settings (e.g., private cloud storage), the application requires a client to outsource only a single copy of its database. The distribution and maintenance of multiple copies of the database across different service providers brings additional architectural and deployment costs, which might not be economically attractive for the client.
In this paper, we showcase one of the first natural use-cases of multi-server PIR, in which the multiple copies of synchronized databases are already available by the original design of the application (i.e., spectrum availability information in multi-database CRNs), and therefore multi-server PIR does not introduce any extra overhead on top of the application. Exploiting this synergy between multi-database CRNs and multi-server PIR permits us to provide information-theoretic location privacy for SUs with significantly better efficiency compared to existing single-server PIR approaches.
Desirable Properties: We outline the desirable properties of our approaches below.
- Computational efficiency: The adapted approaches are much more efficient than existing location privacy preserving schemes. For instance, as shown in Table I, LP-Chor and LP-Goldberg are orders of magnitude faster than the schemes proposed by Troja et al. [18, 19], and also faster than XPIR [26] and the scheme in [2].
- Information-Theoretic Privacy Guarantees: They can achieve information-theoretic privacy, which is the optimal privacy level that could be reached, as opposed to the computational privacy guarantees offered by existing approaches. In fact, some of these approaches are prone to recent attacks on computational PIR protocols [26] and are not secure against post-quantum adversaries [34].
- Low communication overhead: Our approaches incur a reasonable communication overhead that is a middle ground between the fastest computational PIR [26] and the most communication-efficient computational PIR [35].
- Fault-Tolerance and Robustness: Our proposed approaches are resilient to the issues that are associated with multi-server architectures: failures, byzantine behavior, and collusion. Collusion of all of the service providers is unlikely, given the competing nature of these companies and the regulatory enforcement by bodies such as FCC to protect users' data; we nevertheless consider collusion in our system and security model. All proposed approaches can handle collusion of multiple DBs up to a certain limit that is different for each approach. In addition, some of the proposed approaches can also handle faulty and byzantine DBs. Besides, simply hacking DBs, when the proposed approaches are in place, will not be sufficient to learn users' information, since some of these protocols offer hybrid privacy protection by combining both computational and information-theoretic PIR protocols, enabling them to offer computational privacy even when all of the DBs are compromised.
- Experimental evaluation on actual cloud platforms: We deploy our proposed approaches on a real cloud platform, GENI [36], to show their feasibility. In our experiment, we create multiple geographically distributed VMs, each playing the role of a DB. A laptop plays the role of an SU that queries DBs, i.e. the VMs. Our experiments confirm the superior computational advantages of the adoption of multi-server PIR over the existing alternatives.
I-D Differences Compared to the Preliminary Version
The main differences between this paper and its preliminary versions [37, 38] are as follows: (i) We further consider the location privacy issue of mobile SUs and offer a way to amortize the cost incurred by mobility. (ii) We also leverage multi-server PIR to address the location privacy issue of PUs in database-driven CRN systems that require PUs to provide spectrum availability to DBs. (iii) We also discuss a way to reduce the cost of PIR by partitioning the spectrum database instead of simply replicating it, using the RAID-PIR protocol [39], and we discuss the privacy-performance tradeoff of relying on such an approach. (iv) We provide a more detailed performance evaluation that takes into account the latest advances in PIR technology, namely SealPIR [32], which relies on fully homomorphic encryption.
II Preliminaries and Models
II-A Notation and Building Blocks
We summarize our notations in Table II. Our adaptations of multi-server PIR rely on the following building blocks.
Private Information Retrieval (PIR): PIR allows a user to retrieve a data item of its choice from a database, while preventing the server owning the database from gaining information on the identity of the item being retrieved [40]. One trivial solution to this problem is to make the server send an entire copy of the database to the querying user. Obviously, this is a very inefficient solution to the PIR problem as its communication complexity may be prohibitively large. However, it is considered the only protocol that can provide information-theoretic privacy, i.e. perfect privacy, to the user's query in the single-server setting. There are two main classes of PIR protocols according to their privacy level: information-theoretic PIR (IT-PIR) and computational PIR (CPIR).
- Information-theoretic or multi-server PIR: It guarantees information-theoretic privacy to the user, i.e. privacy against computationally unbounded servers. This could be achieved efficiently only if the database is replicated at multiple non-communicating servers [24, 33]. The main idea behind these protocols consists of decomposing each user's query into several sub-queries to prevent leaking any information about the user's intent.
- Computational or single-server PIR: It guarantees privacy against computationally bounded server(s). In other words, a server cannot get any information about the identity of the item retrieved by the user unless it solves a certain computationally hard problem (e.g. prime factorization of large numbers), which is a common assumption in modern cryptography. Thus, these protocols offer weaker privacy than their IT-PIR counterparts [27, 41].
Shamir Secret Sharing: This is a concept introduced by Shamir [42] to allow a secret holder to divide its secret $\sigma$ into $\ell$ shares and distribute these shares to $\ell$ parties. In $(\ell, t)$-Shamir secret sharing, where $t < \ell$, if $t$ or fewer parties combine their shares, they learn no information about $\sigma$. However, if more than $t$ come together, they can easily recover $\sigma$. Given a secret $\sigma$ chosen arbitrarily from a finite field $\mathbb{F}$, the $(\ell, t)$-Shamir secret sharing scheme works as follows: the secret holder chooses $\ell$ arbitrary non-zero distinct elements $\alpha_1, \ldots, \alpha_\ell \in \mathbb{F}$. Then, it selects $t$ elements $a_1, \ldots, a_t \in \mathbb{F}$ uniformly at random. Finally, the secret holder constructs the polynomial $f(x) = a_0 + a_1 x + \cdots + a_t x^t$, where $a_0 = \sigma$. The $\ell$ shares that are given to the parties are $(\alpha_i, f(\alpha_i))$, for $1 \le i \le \ell$. Any $t+1$ or more parties can recover the polynomial $f$ using Lagrange interpolation and thus they can reconstruct the secret $\sigma = f(0)$. However, $t$ or fewer parties can learn nothing about $\sigma$. In other words, if $t+1$ shares of $\sigma$ are available then $\sigma$ can be easily recovered.
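The sharing and reconstruction steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the prime modulus `P`, the function names `make_shares` and `recover`, and the choice of evaluation points 1, 2, ... are all hypothetical choices made for readability.

```python
import secrets

# Assumption: a prime field with modulus 2^61 - 1 (a Mersenne prime).
P = 2**61 - 1

def make_shares(secret, num_shares, t, p=P):
    """Split `secret` into `num_shares` shares of a random degree-t polynomial.

    Any t or fewer shares reveal nothing; any t + 1 shares determine the secret.
    """
    # f(x) = secret + a_1 x + ... + a_t x^t with uniformly random a_1..a_t
    coeffs = [secret % p] + [secrets.randbelow(p) for _ in range(t)]
    shares = []
    for alpha in range(1, num_shares + 1):  # distinct non-zero evaluation points
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * alpha + c) % p
        shares.append((alpha, y))
    return shares

def recover(shares, p=P):
    """Recover f(0) from at least t + 1 shares by Lagrange interpolation."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % p
                den = (den * (xi - xj)) % p
        # pow(den, -1, p) is the modular inverse (Python 3.8+)
        secret = (secret + yi * num * pow(den, -1, p)) % p
    return secret
```

For example, with five shares and threshold `t = 2`, any three shares passed to `recover` return the original secret, while two shares are consistent with every possible secret.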
II-B System Model and Security Definitions
We consider a database-driven CRN that contains $\ell$ DBs, where $\ell \ge 2$, and an SU registered to these DBs to learn spectrum availability information in its vicinity. We assume that these DBs share the same content and that they are synchronized as mandated by the PAWS standard [3]. We also assume that DBs may collude in order to infer SU's location. In the following, we present our security definitions.
Definition 1. Byzantine DB: This is a faulty DB that runs but produces incorrect answers, possibly chosen maliciously or computed in error. This might be due to a corrupted or obsolete copy of the database caused by a synchronization problem with the other DBs.
Definition 2. $t$-private PIR: The privacy of the query is information-theoretically protected, even if up to $t$ of the $\ell$ DBs collude, where $0 < t < \ell$.
Definition 3. $v$-Byzantine-robust PIR: Even if $v$ of the responding DBs are Byzantine, SU can reconstruct the correct database item, and determine which of the DBs provided an incorrect response.
Definition 4. $k$-out-of-$\ell$ PIR: SU can reconstruct the correct record if it receives at least $k$-out-of-$\ell$ responses, where $k \le \ell$.
Definition 5. Robust PIR: It can deal with DBs that do not respond to SU's queries and allows SU to reconstruct the correct output of the queries in this situation.
Definition 6. $\tau$-independent PIR: The content of the database itself is information-theoretically protected from a coalition of up to $\tau$ DBs.
III Proposed Approaches
In the proposed approaches, we tailor multi-server PIR to the context of multi-DB CRNs. We start by illustrating the structure of the spectrum database that we consider. Then, we give several approaches, each of which adapts a multi-server PIR protocol with different security and performance properties and use cases. We model the content of each DB as an $r \times s$ matrix $D$ of size $n$ bits, where $s$ is the number of words of size $w$ bits in each record/block of the database and $r$ is the number of records in the database, i.e. $r = n/b$, where $b = s \cdot w$ is the block size in bits. The $j$-th row of $D$ is the $j$-th record of the database.
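$$D = \begin{bmatrix} d_{11} & d_{12} & \cdots & d_{1s} \\ d_{21} & d_{22} & \cdots & d_{2s} \\ \vdots & \vdots & \ddots & \vdots \\ d_{r1} & d_{r2} & \cdots & d_{rs} \end{bmatrix}$$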
We further assume that each row of the database corresponds to a unique combination of the tuple $(lat, lon, ch, ts)$, where $lat$ and $lon$ represent one location's latitude and longitude, respectively, $ch$ is a channel number, and $ts$ is a time-stamp. We also assume that SUs can associate their location information with the index $\beta$ of the corresponding record of interest in the database using some inverted index technique that is agreed upon with DBs. An SU that wishes to retrieve record $D_\beta$ without any privacy consideration can simply send to DB a row vector $e_\beta$ consisting of all zeros except at position $\beta$ where it has the value $1$. Upon receiving $e_\beta$, DB multiplies it with $D$ and sends record $D_\beta$ back to SU as we illustrate below:
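$$e_\beta \cdot D = \begin{bmatrix} 0 & \cdots & 1 & \cdots & 0 \end{bmatrix} \begin{bmatrix} d_{11} & \cdots & d_{1s} \\ \vdots & \ddots & \vdots \\ d_{r1} & \cdots & d_{rs} \end{bmatrix} = \begin{bmatrix} d_{\beta 1} & \cdots & d_{\beta s} \end{bmatrix} = D_\beta$$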
This trivial approach makes it easy for DBs to learn SU's location from the vector $e_\beta$, as $D$ is indexed based on location. In the following, we present approaches that hide the content of $e_\beta$ from DBs, and thus preserve SU's location privacy. The approaches present a tradeoff between efficiency and some additional security features.
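To make the leak concrete, the following sketch walks through the non-private lookup. The inverted index is not specified in the text, so `inv_index` below is a purely hypothetical row-major mapping over a discretized grid; `one_hot` and `vec_mat` are likewise illustrative helper names.

```python
def inv_index(lat_idx, lon_idx, ch, ts, n_lon, n_ch, n_ts):
    """Hypothetical inverted index: row-major position of (cell, channel, slot)."""
    return ((lat_idx * n_lon + lon_idx) * n_ch + ch) * n_ts + ts

def one_hot(beta, r):
    """Standard basis vector e_beta of length r."""
    return [1 if j == beta else 0 for j in range(r)]

def vec_mat(vec, D):
    """Row vector times matrix: returns the length-s row vec * D."""
    s = len(D[0])
    return [sum(vec[j] * D[j][q] for j in range(len(D))) for q in range(s)]

# Toy database: 2 x 2 grid cells, 2 channels, 1 time slot -> r = 8 records.
D = [[j, j + 100] for j in range(8)]
beta = inv_index(1, 0, 1, 0, n_lon=2, n_ch=2, n_ts=1)
query = one_hot(beta, len(D))
record = vec_mat(query, D)
# The position of the single 1 in the query is exactly beta, so whoever
# sees the query learns which location-indexed record was requested.
leaked = query.index(1)
```

Because `leaked` always equals `beta`, the query itself discloses the grid cell of the requester, which is the problem the private variants address.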
III-A Location Privacy with Chor (LP-Chor)
Our first approach, termed LP-Chor, harnesses the simple and efficient PIR protocol proposed by Chor et al. [24]. We describe the different steps of LP-Chor in Algorithm 1 and highlight these steps in Figure 1. Elements of $D$ in this scheme belong to $GF(2)$, i.e. $w = 1$ bit and $s = b$.
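In LP-Chor, SU starts by invoking the inverted index subroutine, which takes as input the coordinates of the user, its channel of interest, and a time-stamp and returns a value $\beta$. This value corresponds to the index of the record of $D$ that SU is interested in. SU then constructs $e_\beta$, which is a standard basis vector having $0$ everywhere except at position $\beta$, which has the value $1$, as we discussed previously. SU also picks $\ell - 1$ $r$-bit binary strings $\rho_1, \ldots, \rho_{\ell-1}$ uniformly at random from $GF(2)^r$, and computes $\rho_\ell = \rho_1 \oplus \cdots \oplus \rho_{\ell-1} \oplus e_\beta$. Finally, SU sends $\rho_i$ to $DB_i$, for $1 \le i \le \ell$. Upon receiving the bit-string $\rho_i$ of length $r$, $DB_i$ computes $R_i = \rho_i \cdot D$, which could also be seen as the XOR of those blocks $D_j$ in $D$ for which the $j$-th bit of $\rho_i$ is $1$, then sends $R_i$ back to SU. SU receives the $R_i$'s from the $DB_i$'s, $1 \le i \le \ell$, and computes $R_1 \oplus \cdots \oplus R_\ell = (\rho_1 \oplus \cdots \oplus \rho_\ell) \cdot D = e_\beta \cdot D = D_\beta$, which is the $\beta$-th block of the database that SU is interested in, from which it can retrieve the spectrum availability information.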
LP-Chor is very efficient thanks to its reliance on simple XOR operations only, as we discuss in Section IV. It is also $(\ell - 1)$-private, by Definition 2, as collusion of up to $\ell - 1$ DBs cannot enable them to learn $e_\beta$, and consequently SU's location. In fact, only if all $\ell$ DBs collude will they be able to learn $e_\beta$, by simply XORing their $\rho_i$'s. However, this approach suffers from two main drawbacks. First, it is not robust, since even if one DB fails to respond, SU will not be able to recover $D_\beta$. Second, it is not byzantine robust; if one or more DBs return a wrong response, SU will reconstruct a wrong block and also will not be able to recognize which DB misbehaved so as not to rely on it for future queries. In Section III-B we discuss a second approach that improves on these two aspects but with some additional overhead.
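The XOR-based query, answer, and reconstruction steps of Chor-style multi-server PIR can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, records are modeled as equal-length `bytes` objects, and the network exchange between SU and the DBs is replaced by direct function calls.

```python
import secrets

def xor_bytes(a, b):
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def chor_queries(beta, r, num_dbs):
    """SU side: build num_dbs bit-vectors (lists of 0/1) whose XOR is e_beta."""
    queries = [[secrets.randbelow(2) for _ in range(r)]
               for _ in range(num_dbs - 1)]
    last = [1 if j == beta else 0 for j in range(r)]  # start from e_beta
    for q in queries:
        last = [a ^ b for a, b in zip(last, q)]
    return queries + [last]

def chor_answer(query, D):
    """DB side: XOR of the records whose query bit is 1."""
    ans = bytes(len(D[0]))
    for bit, record in zip(query, D):
        if bit:
            ans = xor_bytes(ans, record)
    return ans

def chor_reconstruct(answers):
    """SU side: XOR of all answers yields the requested record."""
    out = bytes(len(answers[0]))
    for a in answers:
        out = xor_bytes(out, a)
    return out
```

Each individual query vector is uniformly random on its own, so a single DB (or any strict subset of them) sees nothing related to `beta`; only the XOR of all vectors equals the one-hot selector. The sketch also makes the robustness limitation visible: `chor_reconstruct` needs every answer, and a single wrong answer silently corrupts the result.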
III-B Location Privacy with Goldberg (LP-Goldberg)
Our second approach, termed LP-Goldberg, is based on Goldberg's PIR protocol [33], which uses Shamir secret sharing to hide $e_\beta$, i.e. SU's query. It is a modification of Chor's scheme [24] to achieve both robustness and byzantine robustness. Rather than working over $GF(2)$ (binary arithmetic), this scheme works over a larger field $\mathbb{F}$, where each element can represent $w$ bits. The database $D$ in this scheme is an $r \times s$ matrix of elements of $\mathbb{F}$. Each row represents one block of size $b$ bits, consisting of $s$ words of $w$ bits each. Again, $D$ is replicated among $\ell$ databases $DB_1, \ldots, DB_\ell$. We summarize the main steps of the LP-Goldberg protocol in Algorithm 2 and illustrate them in Figure 2.
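To determine the index $\beta$ of the record that corresponds to its location, SU starts by invoking the inverted index subroutine, then constructs the standard basis vector $e_\beta$ as explained earlier. SU then uses $(\ell, t)$-Shamir secret sharing to divide the vector $e_\beta$ into $\ell$ independent shares to ensure a $t$-private PIR protocol as in Definition 2. That is, SU chooses $\ell$ distinct non-zero elements $\alpha_1, \ldots, \alpha_\ell \in \mathbb{F}$ and creates $r$ random degree-$t$ polynomials $f_1, \ldots, f_r$ satisfying $f_j(0) = e_\beta[j]$. SU then sends to each $DB_i$ its share $\rho_i = \langle f_1(\alpha_i), \ldots, f_r(\alpha_i) \rangle$ corresponding to the vector $e_\beta$. Each $DB_i$ then computes the product $R_i = \rho_i \cdot D$ and sends $R_i$ to SU.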
Some DBs may fail to respond to SU's query and only $k$-out-of-$\ell$ send their responses to SU. SU collects $k$ responses from the $k$ responding DBs and tries to recover the record at index $\beta$ from the $R_i$'s by using the EasyRecover() subroutine from [33], which uses Lagrange interpolation to recover $D_\beta$ from the secret shares $(\alpha_1, R_1), \ldots, (\alpha_k, R_k)$. This is possible thanks to the use of $(\ell, t)$-Shamir secret sharing as long as $k > t$ and these $k$ DBs are honest. In fact, by the linearity property of Shamir secret sharing, since $\{\rho_i\}$ is a set of $(\ell, t)$-Shamir secret shares of $e_\beta$, then $\{R_i\}$ will also be a set of $(\ell, t)$-Shamir secret shares of $e_\beta \cdot D$, which is the $\beta$-th block of the database. Thus, it is possible for SU to reconstruct $D_\beta$ using Lagrange interpolation as explained in Section II, by relying only on the $k$ responses, which makes LP-Goldberg robust by Definition 5. Also, EasyRecover can detect the DBs that responded honestly, and thus those that are byzantine as well, which should discourage DBs from misbehaving. More details about this subroutine can be found in [33].
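The query sharing, per-DB product, and interpolation-based recovery described above can be illustrated with the sketch below. It is a simplified stand-in under several assumptions: arithmetic is done in a small prime field with modulus `P` rather than whatever field an actual deployment would use, all names are hypothetical, and `easy_recover` only performs Lagrange interpolation at zero for honest responders. It does not reproduce the consistency checks of the EasyRecover() subroutine in [33].

```python
import secrets

P = 65537  # assumption: a small prime modulus chosen for readability

def share_query(beta, r, num_dbs, t, p=P):
    """SU side: return {alpha_i: rho_i}, Shamir shares of e_beta."""
    # One random degree-t polynomial per coordinate j, with f_j(0) = e_beta[j].
    polys = [[1 if j == beta else 0] + [secrets.randbelow(p) for _ in range(t)]
             for j in range(r)]
    shares = {}
    for alpha in range(1, num_dbs + 1):  # distinct non-zero points
        shares[alpha] = [
            sum(c * pow(alpha, d, p) for d, c in enumerate(f)) % p
            for f in polys
        ]
    return shares

def db_answer(rho, D, p=P):
    """DB side: R_i = rho_i * D over the field."""
    s = len(D[0])
    return [sum(rho[j] * D[j][q] for j in range(len(D))) % p for q in range(s)]

def easy_recover(answers, p=P):
    """SU side: interpolate each word at x = 0 from {alpha_i: R_i}.

    Assumes every responder is honest and that more than t DBs responded.
    """
    alphas = list(answers)
    s = len(answers[alphas[0]])
    record = [0] * s
    for a in alphas:
        lam = 1  # Lagrange coefficient of point a evaluated at x = 0
        for b in alphas:
            if b != a:
                lam = lam * (-b) % p * pow((a - b) % p, -1, p) % p
        for q in range(s):
            record[q] = (record[q] + lam * answers[a][q]) % p
    return record
```

With five DBs and `t = 2`, responses from any three DBs suffice, which mirrors the robustness property: the two silent DBs do not prevent reconstruction.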
Moreover, $v$ DBs among the $k$ responding ones may even be byzantine, as in Definition 1, and produce incorrect responses. In that case, it would be impossible for SU to simply rely on Lagrange interpolation to recover the correct responses. Since Shamir secret sharing is based on polynomial interpolation, the problem of recovering the response in the case of byzantine failures corresponds to noisy polynomial reconstruction, which is exactly the problem of decoding Reed-Solomon codes [43]. Thus, SU would rather rely on error correction codes, and more precisely on the Guruswami-Sudan list decoding algorithm [44], which can correct these incorrect responses. In fact, for each word position $q$, the vector $\langle R_1[q], \ldots, R_k[q] \rangle$ is a Reed-Solomon codeword encoding a polynomial $g_q$ of degree at most $t$, and the client wishes to compute $g_q(0)$ for each $1 \le q \le s$ to recover all the $s$ words forming the record $D_\beta$. This is done through the HardRecover() subroutine from [33]. This makes LP-Goldberg also $v$-Byzantine-robust, by Definition 3, and solves the robustness issues that LP-Chor suffers from. However, this comes at the cost of an additional overhead, as we discuss in Section IV.
**Corollary 1.**
*LP-Chor and LP-Goldberg directly inherit the security properties of Chor's [24] PIR and Goldberg's [33] PIR, respectively.*
III-C Location Privacy of Mobile SUs Through Batching
Thus far, we have been concerned only with non-mobile SUs that periodically submit an individual query to DBs to learn spectrum availability at their fixed location. However, things get more interesting with mobility. A mobile SU needs to query DBs multiple times as its location changes. While the previous two approaches perform well for non-mobile SUs, they incur a significant overhead on both SU and DBs, especially when SU is moving at a relatively high speed, which requires a large number of PIR queries.
Our third approach aims to protect the location privacy of mobile SUs while reducing the mobility-associated overhead. The idea is to exploit the fact that a mobile SU usually has a priori knowledge of its trajectory, so that it can query DBs for its current and future locations by batching these queries together instead of sending them separately. We achieve this by relying on the PIR protocol of Lueks et al. [45], which extends the scheme of Goldberg [33] to support batching of queries using fast matrix multiplication mechanisms inspired by batch codes [46]. We refer to this approach as and we describe it in the following.
Each DB that receives simultaneous queries from an SU can process them using LP-Goldberg by simply multiplying each query with the database matrix, as illustrated in Step 10 of Algorithm 2. Alternatively, it can group these queries into a matrix of size , where each row corresponds to a query , before computing the matrix product . The careful reader will notice that this naive multiplication method would cost around operations (including multiplications and additions), which can be prohibitively expensive, especially for a large or . This problem boils down to a fast matrix multiplication problem and can therefore benefit from fast matrix multiplication algorithms such as Strassen's [47].
Strassen's algorithm consists of dividing both matrices into four equally sized block matrices. Then, instead of naively multiplying these submatrices, which would result in 8 submatrix multiplications (fundamentally equivalent to simple matrix multiplication), Strassen's algorithm creates linear combinations of blocks in a way that reduces the number of submatrix multiplications to 7. The same approach is then applied recursively to the submatrix multiplications of the previous step. This simple yet powerful matrix multiplication technique significantly reduces the overhead for DBs, and therefore the delay that SUs experience to learn spectrum availability while moving, as illustrated in Section IV.
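The recursion described above can be sketched in a few lines of Python. This is a minimal illustration over plain integers for square matrices whose size is a power of two; the batching protocol of [45] operates over a finite field and on rectangular matrices, so the helper names and the `cutoff` parameter here are assumptions rather than the authors' implementation.

```python
def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def naive(A, B):
    """Schoolbook matrix product, used as the base case and as a reference."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def split(M):
    """Split a square matrix into its four equally sized blocks."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])

def strassen(A, B, cutoff=1):
    """Multiply two n x n matrices (n a power of two) with 7 recursive products."""
    if len(A) <= cutoff:
        return naive(A, B)
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    # Stitch the four result blocks back together.
    return [l + r for l, r in zip(C11, C12)] + [l + r for l, r in zip(C21, C22)]

Q = [[1, 0, 2, 3], [4, 1, 0, 2], [0, 5, 1, 1], [2, 2, 3, 0]]
D = [[3, 1, 0, 2], [1, 4, 2, 0], [0, 2, 5, 1], [2, 0, 1, 3]]
print(strassen(Q, D) == naive(Q, D))
```

Each level of recursion replaces 8 block products with 7 at the price of extra additions, which is why a larger `cutoff` is usually preferable for small blocks.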
A row in the resulting matrix, , corresponds to DB's response to the query. SU then recovers the spectrum availability by combining the same-index rows of the different DBs, as in LP-Goldberg.
III-D Location Privacy of PUs
As we mentioned earlier, in database-driven CRNs, the DBs' content comprises operational information of PUs, which may be very sensitive in systems such as SAS in the 3.5 GHz CBRS band, where PUs are military and governmental entities. The service providers use this operational data to feed their models and populate the spectrum databases with availability information, but do not share the PUs' location information in response to SUs' queries. Therefore, SUs do not present a serious threat to PUs' privacy, as opposed to the service providers, which could be malicious and could misuse PUs' sensitive operational data.
In this subsection, we present another approach that takes into account the privacy of these PUs as well. For this, we make use of another extension of the Goldberg PIR scheme, known as τ-independence, to prevent DBs from learning the content of the database even if up to τ DBs collude, as defined in Definition 6. This is achieved by making PUs, instead of the service providers, populate the DBs with spectrum availability information pertaining to their respective channels, by secretly sharing each record they want to add among the different service providers using Shamir secret sharing, similar to how SUs secretly share their queries. That way, each service provider will not be able to decode this data, and only SUs that have access to the secret can retrieve the record by combining the different shares from the different DBs. This is motivated by the fact that DBs are expected to be populated by PUs themselves, as is the case in LSA systems, or by a highly trusted independent entity, the ESC, as in SAS systems. Therefore, whenever a PU or an ESC submits a PU activity record of index to DBs, it divides it into words and distributes Shamir secret shares of every word among the DBs, as reflected in Algorithm 3. Each will now have a different content :
[TABLE]
where form a -Shamir secret sharing of word . This requires that the random values used to create Shamir secret shares, as explained in Section II-A, are shared beforehand among PUs and SUs. This could be done by the FCC during the registration phase, for instance, and these values must not be communicated to DBs.
This way, records revealing operational data of PUs, which could be used by DBs to build knowledge of the activity of these PUs and track them, are information-theoretically protected from DBs as long as no more than of these DBs collude. However, for this protocol to work, this condition must hold: . While this extension of LP-Goldberg should have no impact on the performance on the DBs' and SUs' side, as we show in Section IV, it does have an impact on the t-privacy of the protocol. In fact, as the τ-independence level, which controls how many DBs can collude to learn the record submitted by PU and sought by SU, increases, the maximum achievable t-privacy level decreases, since must always hold.
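The interplay between the two privacy levels can be seen in a small Python sketch. It is an illustration under stated assumptions, not Algorithm 3: the modulus `P`, the evaluation points `xs`, and the function names are hypothetical, and the variable `tau` stands for the independence level. Every database word is shared with a degree-tau polynomial and every query entry with a degree-t polynomial, so each answer word is an evaluation of a polynomial of degree t + tau, and the sketch therefore interpolates from t + tau + 1 responses.

```python
import random

P = 2**31 - 1  # illustrative prime field modulus (assumption)

def shamir_shares(secret, degree, xs):
    """Shares of `secret` under a random polynomial of the given degree."""
    coeffs = [secret] + [random.randrange(P) for _ in range(degree)]
    return [sum(c * pow(x, j, P) for j, c in enumerate(coeffs)) % P for x in xs]

def lagrange_at_zero(points):
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def private_lookup(records, beta, t, tau, xs):
    r, s = len(records), len(records[0])
    # Record owner side: every server k stores only shares of each word.
    stored = [[[0] * s for _ in range(r)] for _ in xs]
    for i in range(r):
        for c in range(s):
            for k, share in enumerate(shamir_shares(records[i][c], tau, xs)):
                stored[k][i][c] = share
    # Querier side: Shamir-share the unit vector selecting record beta.
    query = [[0] * r for _ in xs]
    for i in range(r):
        for k, share in enumerate(shamir_shares(1 if i == beta else 0, t, xs)):
            query[k][i] = share
    # Server side: each server multiplies its query share with its own shares.
    answers = [[sum(query[k][i] * stored[k][i][c] for i in range(r)) % P
                for c in range(s)] for k in range(len(xs))]
    # Querier side: t + tau + 1 answers determine the degree-(t + tau) polynomial.
    need = t + tau + 1
    return [lagrange_at_zero([(xs[k], answers[k][c]) for k in range(need)])
            for c in range(s)]

records = [[1, 2], [3, 4], [5, 6]]
print(private_lookup(records, beta=1, t=1, tau=1, xs=[1, 2, 3, 4]))
```

With a fixed number of servers, raising `tau` in this sketch leaves fewer responses available for `t`, which mirrors the tradeoff between the two privacy levels discussed above.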
III-E Location Privacy of SUs in Partitioned-database CRNs
In this section, we present another location privacy-preserving approach for SUs, for the case where the spectrum database content is distributed among the different DBs instead of simply being replicated as in the previous approaches. This could be motivated by the fact that some database-driven CRNs may have multiple DBs covering different or slightly overlapping regions. It could also be a way to reduce cost by making each DB manage only a portion of the database.
For that, we rely on the RAID-PIR protocol due to Demmler et al. [39], which builds on Chor's scheme to reduce the communication overhead and the computation required at the server side. The idea is very similar to that of Chor's scheme, but here the vector is divided into chunks. Each query sent to is divided into chunks, as illustrated in Figure 3, where is a redundancy parameter that controls the minimum number of DBs that need to collude to recover the record, with . This parameter also controls the number of chunks in every query and how often the chunks overlap throughout these queries [39].
The details of this approach are described in Algorithm 4. To optimize the cost, SU can use a pseudo-random generator, , to generate the chunks of , as illustrated in Algorithm 4. For that, SU randomly generates seeds of size bits each, where is the symmetric security parameter, and expands each seed into random chunks , using , each of size , as depicted in Step 5 of Algorithm 4. The first chunk of query , denoted as , is computed to cancel out the other chunks of each of the other DBs, if applicable, and is obtained by XORing those chunks with the chunk of . Thanks to the use of the pseudo-random generator, SU does not need to send the whole query and needs only to send a compacted version of , denoted as , composed of and the seed used to generate the other chunks of the full query , to . Then, will use the same pseudo-random generator with the seed that it received to generate the full query . Once the query is recovered, will construct its answer by XORing the records whose indices match those of the set bits in . Finally, SU needs only to XOR the results from the different DBs to recover the record.
As the size of the query is just , each DB now needs to store and process only records of , which is beneficial to DBs, especially as the number of these databases increases.
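The seed-compression and XOR steps described in this subsection can be sketched in Python as follows. This is a simplified illustration and not Algorithm 4 or the RAID-PIR protocol of [39]: it omits the chunking and the redundancy parameter, every server holds the full database, and SHAKE-256 from the standard library serves as a stand-in pseudo-random generator. All function names and the 16-byte seed length are assumptions.

```python
import hashlib
import secrets

def expand(seed, r):
    """Expand a short seed into r pseudo-random query bits (stand-in generator)."""
    stream = hashlib.shake_256(seed).digest((r + 7) // 8)
    return [(stream[i // 8] >> (i % 8)) & 1 for i in range(r)]

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def make_queries(beta, r, num_dbs, seed_len=16):
    """Client side: one explicit bit vector plus one short seed per remaining server."""
    seeds = [secrets.token_bytes(seed_len) for _ in range(num_dbs - 1)]
    first = [1 if i == beta else 0 for i in range(r)]
    for seed in seeds:
        # Fold every seeded query into the explicit one so they cancel out later.
        first = [a ^ b for a, b in zip(first, expand(seed, r))]
    return first, seeds

def answer(query_bits, db):
    """Server side: XOR together the records selected by the set bits of the query."""
    acc = bytes(len(db[0]))
    for bit, record in zip(query_bits, db):
        if bit:
            acc = xor_bytes(acc, record)
    return acc

def retrieve(db, beta, num_dbs=3):
    first, seeds = make_queries(beta, len(db), num_dbs)
    replies = [answer(first, db)] + [answer(expand(s, len(db)), db) for s in seeds]
    out = bytes(len(db[0]))
    for reply in replies:
        out = xor_bytes(out, reply)
    return out

db = [b"ch21", b"ch36", b"ch44", b"ch51"]
print(retrieve(db, beta=2))
```

Only one server receives a full-length bit vector while the others receive a short seed, which is where the saving in upload size comes from in this sketch.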
IV Evaluation and Analysis
IV-A *Analytical Comparison*
We start by studying the proposed approaches' performance analytically and we compare them to existing approaches. For , we choose to simplify the cost of computations as in [43], since in , additions are XOR operations on bytes and multiplications are lookup operations into a KB table [43]. We summarize the system communication complexity and the computation incurred by both DB and SU, and we illustrate the difference in architecture and privacy level of the different approaches in Table III. As we mentioned earlier, existing research focuses on the single-DB setting. We compare the proposed approaches to existing techniques despite the difference in architecture, to show the benefits that multi-server PIR brings in terms of performance and privacy. We briefly discuss these existing approaches in the following.
Gao et al. [2] propose a PIR-based approach, termed PriSpectrum, that relies on the PIR scheme of Trostle et al. [27] to defend against the new attack that they identify. This attack exploits spectrum utilization patterns to localize SUs. Troja et al. [18, 19] propose two other PIR-based approaches that try to minimize the number of PIR queries, either by allowing SUs to share their availability information with other SUs [18] or by exploiting trajectory information to make SUs retrieve information for their current and future positions in the same query [19].
Despite their merit in providing location privacy to SUs, these PIR-based approaches incur high overhead, especially in terms of computation. This is because they rely on computational PIR protocols to provide location privacy to SUs, and such protocols are known to be computationally expensive. In fact, answering an SU's query through a PIR protocol requires DB to process all of its records; otherwise, DB would learn that SU is not interested in the skipped records and would thereby learn partial information about the record and, consequently, SU's location. This makes the computational cost of most PIR-based location-preserving schemes linear in the database size on the DB side, as we illustrate in Table III. This is not exclusive to computational PIR protocols, as even information-theoretic PIR protocols may require processing all the records to guarantee privacy; however, the main difference is that computational PIR protocols have a very large cost per bit of the database, usually involving expensive group operations such as multiplication modulo a large modulus [26], as opposed to multi-server PIR protocols. This can be seen clearly in Table III, as both LP-Chor and LP-Goldberg require DB to perform a very efficient XOR operation per bit of the database. The same applies to the overhead incurred by SU, which only performs XOR operations in both LP-Chor and LP-Goldberg, while it performs expensive modular multiplications and even exponentiations over large primes in the computational PIR-based approaches.
In terms of communication overhead, the proposed approaches incur a cost that is linear in the number of records and their size . As an optimal choice of these parameters is usually [24, 33, 43, 26], this cost can be seen as to retrieve a record of size bits, which is a reasonable cost for information-theoretic privacy.
Moreover, as illustrated in Table III, existing approaches fail to provide information-theoretic privacy, as their underlying security relies on computational PIR schemes. The only approaches that provide information-theoretic location privacy are LP-Chor, LP-Goldberg, and RAID-LP-Chor, which are -private, -private, and ()-private respectively, by Definition 2. It is worth mentioning that PriSpectrum [2] relies on the well-known PIR scheme of Trostle et al. [27], which represented the state of the art in efficient computational PIR. However, this PIR scheme has been broken [26, 48]. Since the security of PriSpectrum follows from that of the broken PIR scheme of Trostle et al. [27], PriSpectrum fails to meet the privacy objective that it was designed for. Nevertheless, we include it in our performance analysis for completeness.
IV-B Experimental Evaluation
We further evaluate the performance of the proposed schemes experimentally to confirm the analytical observations.
Hardware setting and configuration. We have deployed the proposed approaches on the GENI [36] cloud platform using the percy++ library [49]. We have created virtual machines (VMs), each playing the role of a DB, and they all share the same copy of the database. We deploy these GENI VMs in different locations in the US to account for the network delay and to bring our experiment closer to the real-world scenario, where spectrum service providers are located in different places. These VMs are running Ubuntu , each having GB of RAM, GB SSD, and vCPUs, Intel Xeon X5650 GHz or Intel Xeon E5-2450 GHz. To assess the SU overhead, we use a Lenovo Yoga 3 Pro laptop with GB RAM running Ubuntu with an Intel Core m Processor 5Y70 CPU GHz. The client laptop communicates with the remote VMs through ssh tunnels. We are also aware of the advances in single-server PIR technology, and more precisely of the fastest such protocols in the literature: XPIR, proposed by Aguilar et al. [26], and SealPIR, due to Angel et al. [32]. We include these protocols in our experiment to illustrate how multi-server PIR performs against the best known single-server PIR schemes if they were to be deployed in CRNs. We use the available implementations of these protocols provided in [50] and [51]; we deploy their server components on a remote GENI VM, while the client component is deployed on the Lenovo Yoga 3 Pro laptop.
Dataset. Spectrum service providers (e.g., Google and Microsoft) offer graphical web interfaces and APIs to interact with their databases, which allow retrieving basic spectrum availability information for a user-specified location. Access to full data from real spectrum databases was not possible; thus, we generated random data for our experiment. The generated data consists of a matrix that models the content of the database, , with a fixed block size B while varying the number of records . The value of is estimated based on the public raw data that the FCC [52] provides on a daily basis and that service providers use to populate their spectrum databases.
Results and Comparison. We first measure the query end-to-end delay of the proposed approaches and plot the results in Figure 4. We also include the delay introduced by the existing schemes, based on our estimation of the operations included in Table III. The end-to-end delay that we measure takes into consideration the time needed by SU to generate the query, the network delay, the time needed by DB to process the query, and finally the time needed by SU to extract the record of the database. We consider two different internet speed configurations in our experiment. We first rely on a high-speed internet connection of on the download and on the upload for all compared approaches. Then we use a low-speed internet connection of on the upload and download to assess the impact of the bandwidth on LP-Chor and LP-Goldberg, as well as on XPIR.
Figure 4 shows that the proposed schemes perform much better than the existing approaches in terms of delay, even with a low-speed internet connection. They also perform better than the fastest existing single-server PIR protocols, XPIR and SealPIR. This shows the benefit of relying on multi-server PIR in multi-DB CRNs. Also, as expected, the LP-Chor scheme performs better than LP-Goldberg thanks to its simplicity. As we will see later, LP-Goldberg also incurs larger communication overhead than LP-Chor. This could be acceptable knowing that LP-Goldberg can handle collusion among DBs and is robust both to non-responding DBs and to byzantine DBs, as opposed to LP-Chor. This means that LP-Goldberg could be more suitable for real-world scenarios, as failures and byzantine behaviors are common in practice. Figure 4 also shows that the network bandwidth has a significant impact on the end-to-end latency. This is due to the relatively large amount of data that needs to be exchanged during the execution of these protocols, which requires higher internet speeds.
We also compare the computational complexity experienced by each DB and SU separately in the different approaches, as shown in Table III. We further illustrate this through experimentation and we plot the results in Figure 5(a), which shows that the proposed schemes incur lower overhead on the SU than the existing approaches. The same observation applies to the computation experienced by each DB, which again involves only efficient XOR operations in the proposed schemes. We illustrate this in Figure 5(b).
We also study the impact of non-responding DBs on the end-to-end delay experienced by the SU in LP-Goldberg, as illustrated in Figure 6. This figure shows that, as the number of faulty DBs increases, the end-to-end delay decreases, since SU needs to process fewer shares to recover the record . As opposed to LP-Chor, in LP-Goldberg, SU is still able to recover the record even if only -out-of- DBs respond. Recall also that our experiment was performed on resource-constrained VMs to emulate DBs; in reality, DBs should have much more powerful computational resources than those of the VMs we used, which would further reduce the overhead of the proposed approaches.
Figure 7 illustrates the impact of SU's desired privacy level t in LP-Goldberg on the processing time incurred by both SU and DBs. As expected, increasing the value of t, which controls the number of DBs that can collude without inferring the content of the query, does not have any impact on each DB, as they always perform the same operations regardless of the privacy level. However, since the results sent by DBs can also be considered as a -Shamir secret sharing of the retrieved record, when t increases, the number of secret shares required to recover the record increases, which results in more computation for the SU when performing Lagrange interpolation over higher-degree polynomials.
We further study the impact of the number of byzantine DBs on the processing time on the SU side in LP-Goldberg, as depicted in Figure 8. As expected, having more byzantine DBs increases the complexity of decoding the different shares that SU receives from DBs, using the relatively expensive HardRecover subroutine from [33].
As for -, the τ-independence extension will have no impact on the processing time of DBs and should also have no impact on SUs as long as is constant. This means that both PUs and SUs will always seek the maximum privacy levels for their data and queries such that . This is reflected in Figure 9. However, the processing time will be linear in , similar to Figure 7(a).
As for the case of mobile SUs, we compare the performance of batching multiple queries for the future locations of an SU to that of sending separate consecutive queries using , SealPIR, and XPIR, as depicted in Figure 10. Batching mainly reduces the computation on the DBs' side and reduces the end-to-end delay for answering the queries of the moving SU.
We also demonstrate the benefit, on the DBs' side, of relying on RAID-LP-Chor and partitioning the database content among DBs instead of simply replicating it, for several values of the redundancy parameter . As expected, yields the best performance; however, it also offers the lowest level of resistance to collusion. Setting to be equal to is equivalent to the original scheme LP-Chor and will have the best performance. Therefore, RAID-LP-Chor offers a performance-privacy tradeoff that is controlled by the redundancy parameter .
In terms of communication overhead, most of the approaches, including ours, have a cost that is linear in the number of records in the database, as shown in Table III. What really differentiates these schemes' communication overheads is the associated constant factor, which can be very large for some protocols. Based on our experiment and the expressions displayed in Table III, we plot in Figure 12 the communication overhead incurred for each private spectrum availability query issued by SU, for the different schemes. The scheme with the lowest communication overhead is that of Troja et al. [19], especially for a large number of records, thanks to the use of the Gentry et al. PIR [35], which is the most communication-efficient single-server protocol in the literature, with a constant communication overhead. However, this scheme is computationally expensive, just like most of the existing single-server PIR-based approaches, as we show in Figure 4. RAID-LP-Chor is the second best scheme in terms of communication overhead, followed by LP-Chor, and they also provide information-theoretic privacy. As shown in Figure 12, RAID-LP-Chor is significantly more efficient than LP-Chor, which again shows the benefit, in terms of overhead, of distributing the spectrum availability information among multiple DBs. Figure 12 also shows that LP-Chor incurs much lower communication overhead than LP-Goldberg, thanks to the simplicity of the underlying Chor PIR protocol. However, as we discussed earlier, LP-Goldberg provides additional security features compared to LP-Chor. SealPIR has a relatively high communication overhead, especially for smaller database sizes, but its overhead becomes comparable to that of when the database's size gets larger, as shown in Figure 12. SealPIR could be a good alternative to the single-server PIR schemes used in the context of CRNs, especially since it introduces much lower latency, which is critical in this context.
Still, the proposed approaches have better performance and also provide information-theoretic privacy to SUs, which shows their practicality in the real world.
V Related Work
There are other approaches that address the location privacy issue in database-driven CRNs. However, for the reasons mentioned below, we decided not to consider them in our performance analysis. For instance, Zhang et al. [17] rely on the concept of k-anonymity to make each SU query DB by sending a square cloak region that includes its actual location. k-anonymity guarantees that SU's location is indistinguishable among a set of k points. This can be achieved through the use of dummy locations, by generating properly selected dummy points and performing queries to DB using the real and dummy locations. Their approach relies on a tradeoff between providing a high location privacy level and maximizing some utility, so achieving a high location privacy level results in a decrease in spectrum utility. Moreover, k-anonymity-based approaches cannot achieve high location privacy without incurring substantial communication and computation overhead. Furthermore, a recent study led by Sprint and Technicolor [25] has shown that anonymization-based techniques are not effective in providing location privacy guarantees and may even leak some location information. Grissa et al. [54, 21] propose an information-theoretic approach, LPDB, which can be considered a variant of the trivial PIR solution. They achieve this by using set-membership probabilistic data structures (filters) to compress the content of the database and send it to SU, which then needs to try several combinations of channels and transmission parameters to check their existence in the data structure. However, LPDB is only suitable for situations where the structure of the database is known to SUs, which is not always realistic. Also, LPDB relies on probabilistic data structures, which makes it prone to false positives that can lead to erroneous spectrum availability decisions and cause interference to PU's transmission.
Zhang et al. [20] rely on the ε-geo-indistinguishability mechanism [55], derived from differential privacy, to protect the bilateral location privacy of both PUs and SUs, which is different from what we try to achieve in this paper. This mechanism helps SUs obfuscate their location; however, it introduces noise to SU's location, which may impact the accuracy of the retrieved spectrum availability information.
VI Conclusion
In this paper, building on the key observation that database-driven CRNs contain multiple synchronized DBs having the same content, we harnessed multi-server PIR techniques to achieve optimal location privacy for both SUs and PUs, for different use cases, with high efficiency. Our analytical and experimental analysis indicates that our adaptations of multi-server PIR for database-driven CRNs achieve end-to-end delays that are orders of magnitude lower than those of the fastest state-of-the-art single-server PIR adaptation, while offering an information-theoretic privacy guarantee. Given the demonstrated benefits of multi-server PIR approaches, which incur no extra architectural overhead on database-driven CRNs, we hope this work will provide an incentive for the research community to consider this direction when designing location privacy preservation protocols for CRNs.
Acknowledgment
This work was supported in part by the US National Science Foundation under NSF awards CNS-1162296 and CNS-1652389.
References
- [1] J. Mitola and G. Q. Maguire, "Cognitive radio: making software radios more personal," IEEE Personal Communications, vol. 6, no. 4, pp. 13-18, 1999.
- [2] Z. Gao, H. Zhu, Y. Liu, M. Li, and Z. Cao, "Location privacy in database-driven cognitive radio networks: Attacks and countermeasures," in INFOCOM, 2013 Proceedings IEEE, 2013, pp. 2751-2759.
- [3] V. Chen, S. Das, L. Zhu, J. Malyar, and P. McCann, "Protocol to access white-space (PAWS) databases," Tech. Rep., 2015.
- [4] "Google spectrum database," https://www.google.com/get/spectrumdatabase/, accessed: 2017-04-14.
- [5] "iconectiv white spaces database," https://spectrum.iconectiv.com/main/home/, accessed: 2017-04-14.
- [6] "Microsoft white spaces database," http://whitespaces.microsoftspectrum.com/, accessed: 2017-04-14.
- [7] A. Mancuso, S. Probasco, and B. Patil, "Protocol to access white-space (PAWS) databases: Use cases and requirements," Tech. Rep., 2013.
- [8] M. Massaro, "Next generation of radio spectrum management: Licensed shared access for 5G," Telecommunications Policy, vol. 41, no. 5-6, pp. 422-433, 2017.
