Location Privacy in Cognitive Radios with Multi-Server Private Information Retrieval

Mohamed Grissa, Attila A. Yavuz, and Bechir Hamdaoui

arXiv:1907.02518·cs.NI·July 5, 2019


TL;DR

This paper proposes a multi-server PIR approach to enhance location privacy for both primary and secondary users in spectrum database-based cognitive radio networks, achieving high efficiency and information-theoretic privacy.

Contribution

It introduces the novel use of multi-server PIR in CRNs, leveraging synchronized databases to provide optimal privacy with reduced overhead.

Findings

• Multi-server PIR achieves high efficiency in CRNs.

• It provides information-theoretic privacy for both PUs and SUs.

• Both properties are validated through analytical and empirical evaluations.


Tables

Table I: Performance comparison

Scheme	Comm.	Delay (DB)	Delay (SU)	Delay (total)	Privacy
LP-Chor	753 KB	0.48 s	0.0077 s	0.62 s	$(\ell-1)$-private
LP-Goldberg	6000 KB	1.21 s	0.32 s	1.78 s	$t$-private, $\ell$-computationally-private
RAID-LP-Chor	125 KB	0.022 s	0.00041 s	0.21 s	$(\pi-1)$-private
PriSpectrum [2]	512.8 KB	21 s	0.084 s	24.2 s	underlying PIR broken
Troja et al. [19]	8.4 KB	11760 s	5.62 s	11766 s	computationally-private
Troja et al. [18]	12120 KB	11760 s	48 s	11820 s	computationally-private
XPIR [26]	4321 KB	17.66 s	0.34 s	20.53 s	computationally-private
SealPIR [32]	512 KB	11.03 s	0.008 s	11.35 s	computationally-private

Table II: Notations

$\mathit{DB}$	Spectrum database
$\mathit{SU}$	Secondary user
$\mathit{CRN}$	Cognitive radio network
$\ell$	Number of spectrum databases
$\bm{D}$	Matrix modeling the content of $\mathit{DB}$
$r$	Number of records in $\bm{D}$
$n$	Size of the database in bits
$b$	Size of one record of the database in bits
$w$	Size of one word of the database in bits
$s$	Number of words per block
$\beta$	Index of the record sought by $\mathit{SU}$
$t$	Privacy level (tolerated number of colluding $\mathit{DB}$s)
$k$	Number of responding $\mathit{DB}$s
$\vartheta$	Number of Byzantine $\mathit{DB}$s

Table III: Comparison with existing schemes

Scheme	Communication	Computation (DB)	Computation (SU)	Setting	Privacy
LP-Chor	$(r+b)\cdot\ell$	$n\,t_{\oplus}$	$(r+b)\cdot(\ell-1)\cdot t_{\oplus}$	$\ell$ DBs	$(\ell-1)$-private
LP-Goldberg	$r\cdot w\cdot\ell+k\cdot b$	$(n/w)\cdot t_{\oplus}$	$\ell\cdot(\ell-1)\cdot r\,t_{\oplus}+3\ell\cdot(\ell+1)\,t_{\oplus}$	$\ell$ DBs	$t$-private, $\ell$-computationally-private
RAID-LP-Chor	$r+\ell\cdot\kappa+\ell\cdot b$	$(\pi/\ell)\cdot n\,t_{\oplus}$	$(r\cdot(\pi-1)+b\cdot(\ell-1))\,t_{\oplus}$	$\ell$ DBs	$(\pi-1)$-private
PriSpectrum [2]	$(2\sqrt{r}+3)\cdot\lceil\log p\rceil$	$\mathcal{O}(r)\cdot\mathit{Mul}_p$	$4\sqrt{r}\cdot\mathit{Mul}_p$	1 DB	underlying PIR broken
Troja et al. [19]	$12\delta\cdot b$	$\mathcal{O}(n)\cdot\mathit{Mul}_p$	$4\sqrt{n}\cdot\mathit{Mul}_p$	1 DB	computationally-private
Troja et al. [18]	$n_{g}\cdot\psi\cdot\log_{2}q+(2\sqrt{n}+3)\cdot\lceil\log p\rceil$	$\mathcal{O}(n)\cdot\mathit{Mul}_p$	$n_{g}\cdot\psi\cdot(2\,\mathit{Exp}_p+\mathit{Mul}_p)+4\sqrt{n}\cdot\mathit{Mul}_p$	1 DB	computationally-private
XPIR [26]	$\mathcal{O}(N d\sqrt[d]{n})$	$2d\cdot(r/\alpha)\cdot(b/\ell_{0})\cdot\mathit{Mul}_p$	$d\cdot(r/\alpha)^{1/d}\cdot\mathit{Enc}+d\cdot\alpha\cdot b/\ell_{0}\cdot\mathit{Dec}$	1 DB	computationally-private
SealPIR [32]	$\mathcal{O}(N d\lceil\sqrt[d]{n}/N\rceil)$	$\mathcal{O}(d\sqrt[d]{n})$	$d\cdot\mathcal{E}+(F^{d-1}+1)\cdot\mathcal{D}$	1 DB	computationally-private

Equations

$$\bm{D}=\begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1s}\\ w_{21} & w_{22} & \cdots & w_{2s}\\ \vdots & \vdots & \ddots & \vdots\\ w_{r1} & w_{r2} & \cdots & w_{rs} \end{bmatrix}$$

$$\bm{e}_{\beta}\,\bm{D}=[0\cdots 0\,1\,0\cdots 0]\begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1s}\\ w_{21} & w_{22} & \cdots & w_{2s}\\ \vdots & \vdots & \ddots & \vdots\\ w_{r1} & w_{r2} & \cdots & w_{rs} \end{bmatrix}=[w_{\beta 1}\; w_{\beta 2}\; \cdots\; w_{\beta s}]$$

$$\bm{D}^{(i)}=\begin{bmatrix} w^{(i)}_{11} & w^{(i)}_{12} & \cdots & w^{(i)}_{1s}\\ w^{(i)}_{21} & w^{(i)}_{22} & \cdots & w^{(i)}_{2s}\\ \vdots & \vdots & \ddots & \vdots\\ w^{(i)}_{r1} & w^{(i)}_{r2} & \cdots & w^{(i)}_{rs} \end{bmatrix}$$


Full text


Oregon State University

University of South Florida

Abstract

Spectrum database-based cognitive radio networks ($\mathit{CRN}$s) have become the de facto approach for enabling unlicensed secondary users ($\mathit{SU}$s) to identify spectrum vacancies in channels owned by licensed primary users ($\mathit{PU}$s). Despite their merits, spectrum databases raise privacy concerns for both $\mathit{SU}$s and $\mathit{PU}$s. Single-server private information retrieval ($\mathit{PIR}$) has been the main tool used to address this problem. However, such techniques incur extremely large communication and computation overheads while offering only computational privacy. Moreover, some of these $\mathit{PIR}$ protocols have been broken.

In this paper, we show that it is possible to achieve both high efficiency and information-theoretic privacy for $\mathit{PU}$s and $\mathit{SU}$s in database-driven $\mathit{CRN}$s with multi-server $\mathit{PIR}$. Our key observation is that, by design, database-driven $\mathit{CRN}$s comprise multiple databases that are required by the Federal Communications Commission to synchronize their records. To the best of our knowledge, we are the first to exploit this observation, harnessing multi-server $\mathit{PIR}$ technology and the unique properties of database-driven $\mathit{CRN}$s to guarantee optimal privacy for both $\mathit{SU}$s and $\mathit{PU}$s. We show, analytically and empirically with deployments on actual cloud systems, that multi-server $\mathit{PIR}$ is an ideal tool for providing efficient location privacy in database-driven $\mathit{CRN}$s.

Index Terms:

Database-driven cognitive radio networks, location privacy, dynamic spectrum access, private information retrieval.

I Introduction

The rapid growth of connected wireless devices has dramatically increased the demand for wireless spectrum and led to a serious shortage of spectrum resources. Cognitive radio networks ($\mathit{CRN}$s) [1] have emerged as a promising technology for solving this shortage by enabling dynamic spectrum access (DSA), which improves spectrum utilization efficiency by allowing unlicensed/secondary users ($\mathit{SU}$s) to exploit unused spectrum bands (aka spectrum holes or white spaces) of licensed/primary users ($\mathit{PU}$s).

Currently, two approaches are being adopted to identify these white spaces: spectrum sensing and geolocation spectrum databases. In the spectrum sensing-based approach, $\mathit{SU}$s need to sense the $\mathit{PU}$ channel to determine whether the channel is available for opportunistic use. The spectrum database-based approach, on the other hand, waives the sensing requirement and instead enables $\mathit{SU}$s to query a database ($\mathit{DB}$) to learn about spectrum opportunities in their vicinity. This approach, already promoted and adopted by the Federal Communications Commission (FCC), was introduced to overcome the technical hurdles faced by spectrum sensing-based approaches, thereby enhancing the efficiency of spectrum utilization, improving the accuracy of available spectrum identification, and reducing the complexity of terminal devices [2]. Moreover, it shifts the responsibility and complexity of complying with spectrum policies to the $\mathit{DB}$ and eases the adoption of policy changes by limiting updates to just a handful of databases, as opposed to updating large numbers of devices [3].

The FCC has designated nine entities (e.g., Google [4], iconectiv [5], and Microsoft [6]) as TV bands device database administrators, which are required to follow the guidelines of the PAWS (Protocol to Access White Space) standard [3]. PAWS sets guidelines and operational requirements for both the spectrum database and the $\mathit{SU}$s querying it. These include: $\mathit{SU}$s need to be equipped with geolocation capabilities; $\mathit{SU}$s must query the $\mathit{DB}$ with their specific location to check channel availability before starting their transmissions; the $\mathit{DB}$ must register $\mathit{SU}$s and manage their access to the spectrum; and the $\mathit{DB}$ must respond to $\mathit{SU}$s' queries with the list of available channels in their vicinity along with the appropriate transmission parameters. As specified by the PAWS standard, $\mathit{SU}$s may be served by several spectrum databases and are required to register with one or more of these databases prior to querying them for spectrum availability. The spectrum databases are reachable via the Internet, and $\mathit{SU}$s querying these databases are expected to have some form of Internet connectivity [7].

The FCC has established a new service in the 3.5 GHz band, known as the Citizens Broadband Radio Service (CBRS), in which the spectrum is also managed through a central database-driven $\mathit{CRN}$, aka the spectrum access system (SAS), to enable spectrum sharing between military and federal incumbents and $\mathit{SU}$s. A separate entity with Environmental Sensing Capability (ESC) is responsible for populating the $\mathit{DB}$s with data regarding $\mathit{PU}$s that do not wish to reveal their operational information, such as their location or transmission characteristics. A similar concept, named licensed shared access (LSA), is also being developed in Europe for the 2.3-2.4 GHz band to enable $\mathit{SU}$s to opportunistically access spectrum resources in this band owned by incumbent military aircraft services and police wireless communications. A major difference compared to SAS is that, in LSA, $\mathit{PU}$s are responsible for populating the $\mathit{DB}$s by providing their a priori information; i.e., their activities, and therefore the spectrum availability information, are known upfront [8].

I-A Location Privacy Issues in Database-Driven $\mathit{CRN}$s

Despite their benefits, database-driven $\mathit{CRN}$s suffer from serious security and privacy threats. Since they can be seen as a variant of location-based services (LBS), the disclosure of $\mathit{SU}$s' location information represents the main threat to $\mathit{SU}$s when they obtain spectrum availability from $\mathit{DB}$s. Fine-grained location, when combined with publicly available information, can easily reveal other personal information about an individual, including his/her behavior, health condition, personal habits, or even beliefs. For instance, an adversary can infer information about a user's health by observing that the user regularly goes to a hospital. The frequency and duration of these visits can even reveal the seriousness of the illness, and even its type if the location corresponds to a specialty clinic. Matters get worse when $\mathit{SU}$s are mobile. As per the PAWS requirements, $\mathit{SU}$s need to query $\mathit{DB}$s whenever they change their location by at least 100 meters. This forces $\mathit{SU}$s to share their location constantly as they move, which could be exploited by a malicious service provider for tracking purposes.
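To make the mobility cost concrete, here is a minimal Python sketch of the 100-meter re-query rule described above. It assumes a spherical Earth (mean radius 6,371 km) and uses the haversine great-circle distance; the helper names (`haversine_m`, `must_requery`, `count_queries`) are hypothetical and are not part of PAWS. Each query counted here is one location-bearing request that a $\mathit{DB}$ could log.

```python
import math

EARTH_RADIUS_M = 6_371_000.0   # assumption: spherical Earth, mean radius in meters
REQUERY_DISTANCE_M = 100.0     # PAWS re-query distance quoted in the text

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def must_requery(last_query_pos, current_pos):
    """True once the SU is at least 100 m away from where it last queried."""
    return haversine_m(*last_query_pos, *current_pos) >= REQUERY_DISTANCE_M

def count_queries(track):
    """Number of location-revealing DB queries issued along a (lat, lon) track."""
    if not track:
        return 0
    queries, last = 1, track[0]          # initial query at the start position
    for pos in track[1:]:
        if must_requery(last, pos):
            queries += 1
            last = pos                   # the new query resets the reference point
    return queries
```

For example, an $\mathit{SU}$ walking 1.5 km due north, with its position sampled every 15 m, issues 15 queries, each of which discloses where it is.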

The location privacy of $\mathit{SU}$s is not the only privacy concern in database-driven $\mathit{CRN}$s. Indeed, the location privacy of $\mathit{PU}$s may also be critical in $\mathit{CRN}$ systems such as SAS, in the 3.5 GHz CBRS band, and LSA, in the 2.3-2.4 GHz band, where $\mathit{PU}$s are not commercial but rather military and governmental entities. To achieve efficient spectrum sharing without interfering with military and federal incumbents, these systems require $\mathit{PU}$s, or entities with sensing capabilities such as ESC, to report $\mathit{PU}$s' operational data (including their location, frequencies, time of use, etc.) for inclusion in the spectrum databases, which may present serious privacy risks to these $\mathit{PU}$s.

Aware of such potential privacy threats, both $\mathit{SU}$s and $\mathit{PU}$s may refuse to share their sensitive information with $\mathit{DB}$s, which may present a serious barrier to the adoption of database-based $\mathit{CRN}$s and to the public acceptance and promotion of the dynamic spectrum sharing paradigm. Therefore, there is a critical need for techniques that protect the location privacy of both $\mathit{PU}$s and $\mathit{SU}$s while still allowing $\mathit{SU}$s to harness the benefits of the $\mathit{CRN}$ paradigm, without disrupting the functionalities that database-driven $\mathit{CRN}$s provide to promote dynamic spectrum sharing.

I-B Research Gap and Objectives

Despite the importance of the location privacy issue in $\mathit{CRN}$s, it has only recently started to gain interest from the research community [9]. Some works address this issue in the context of collaborative spectrum sensing [10, 11, 12, 13, 14]; others address it in the context of dynamic spectrum auctions [15]. Protecting $\mathit{SU}$s' location privacy in database-driven $\mathit{CRN}$s is more challenging, mainly because $\mathit{SU}$s are required, by protocol design, to provide their physical location to the $\mathit{DB}$ to learn about spectrum opportunities in their vicinity. The heterogeneity of wireless devices and the versatility of services relying on CRN technology [16] further complicate the design of privacy-preserving mechanisms for users in $\mathit{CRN}$s. Privacy-preserving solutions need to accommodate the different resource constraints of each $\mathit{SU}$ device and the various requirements of each service in terms of data rates and delay sensitivity. This makes it hard to leverage general-purpose public key encryption-based techniques, due to their high computation and communication overheads, especially on resource-constrained devices. It is therefore crucial to design cost-effective protocols that offer strong privacy guarantees to users and adapt to different system requirements regardless of the users' constraints.

The existing location privacy preservation techniques for database-driven $\mathit{CRN}$s (e.g., [17, 2, 18, 19, 20, 21]) generally rely on three main lines of privacy-preserving technologies: (i) k-anonymity [22], (ii) differential privacy [23], and (iii) single-server private information retrieval ($\mathit{PIR}$) [24]. However, direct adaptations of k-anonymity-based techniques have been shown to yield either insecure or extremely costly results [25]. Solutions adapting differential privacy (e.g., [20]) not only incur non-negligible overhead but also add noise to the queries, and may therefore degrade the accuracy of spectrum availability information.

Among these alternatives, single-server $\mathit{PIR}$ seems to be the most popular. $\mathit{PIR}$ technology is a suitable choice for database-driven $\mathit{CRN}$s, as it permits privacy-preserving queries on a public database and can therefore enable an $\mathit{SU}$ to retrieve spectrum availability information from the database without leaking its location. However, single-server $\mathit{PIR}$ protocols rely on highly costly partially homomorphic encryption schemes, which need to be executed over the entire database for each query. Indeed, as our experiments in Section IV also demonstrate, executing a single query with even one of the most efficient single-server $\mathit{PIR}$ schemes [26] takes approximately 20 seconds over an 80 Mbps/30 Mbps link on a moderate-size database (e.g., $10^{6}$ entries). An end-to-end delay on the order of 20 seconds may be unacceptable for $\mathit{SU}$s' spectrum access in real-life applications. Moreover, some state-of-the-art efficient computational $\mathit{PIR}$ schemes [27] that have been used in the context of $\mathit{CRN}$s have been shown to be broken [26]. Thus, there is a significant need for practical location privacy preservation approaches for database-driven $\mathit{CRN}$s that meet the efficiency and functionality requirements of $\mathit{SU}$s.

I-C Our Observation and Contribution

The objective of this paper is to develop efficient techniques for database-driven $\mathit{CRN}$s that preserve the location privacy of $\mathit{SU}$s while they acquire spectrum availability information. We also aim to protect the operational privacy of $\mathit{PU}$s in systems that require incumbents to provide spectrum availability information to $\mathit{DB}$s. Specifically, we target the following design objectives: $(i)$ (location privacy of $\mathit{SU}$s) preserve the location privacy of $\mathit{SU}$s, whether fixed or mobile, while allowing them to receive spectrum availability information; $(ii)$ (efficiency and practicality) incur minimal computation, communication, and storage overhead: the cryptographic delay must be minimal to permit fast spectrum availability decisions for $\mathit{SU}$s, and storage/processing costs must be low to enable practical deployments; $(iii)$ (fault tolerance and robustness) mitigate the effects of system failures or misbehaving entities (e.g., colluding databases); $(iv)$ (location privacy of $\mathit{PU}$s) protect the location information of $\mathit{PU}$s while still allowing them to provide spectrum availability information to $\mathit{DB}$s. Meeting all of these seemingly conflicting design goals simultaneously is very challenging.

The main idea behind our proposed approaches is to harness special properties of database-driven $\mathit{CRN}$ systems to employ private query techniques that overcome the significant performance, robustness, and privacy limitations of state-of-the-art techniques. Specifically, our approach is based on the following observation:

Observation: The FCC requires all of its certified databases to synchronize the records obtained through registration procedures with one another [28, 29], and to remain consistent with each other by providing exactly the same spectrum availability information, in any region, in response to $\mathit{SU}$s' queries [30]. That is, the same copy of the spectrum database is available and accessible to $\mathit{SU}$s via multiple (distinct) spectrum database administrators/providers. Is it possible to exploit this observation to achieve efficient location privacy preservation techniques for database-driven $\mathit{CRN}$s?

In practice, as stated in the PAWS standard [3], $\mathit{SU}$s have the option to register with multiple spectrum databases belonging to multiple service providers. Currently, many companies (e.g., Google [4], iconectiv [5], etc.) have obtained authorization from the FCC to operate geolocation spectrum databases after successfully complying with regulatory requirements. Several other companies are still in the process of acquiring this authorization [31]. Thus, it is natural and realistic to take this fact into consideration when designing privacy-preserving protocols for database-based $\mathit{CRN}$s. Based on this observation, our main contribution is as follows:

Our Contribution: To the best of our knowledge, we are the first to exploit the fact that multiple copies of the spectrum $\mathit{DB}$ are available by nature in database-driven $\mathit{CRN}$s, which makes it possible to harness multi-server $\mathit{PIR}$ techniques [24, 33] that offer information-theoretic privacy with substantial efficiency advantages over single-server $\mathit{PIR}$. This is achieved by relying on Shamir secret sharing-based techniques to divide the content of $\mathit{SU}$s' queries, the spectrum availability information, or both among the different $\mathit{DB}$s, which prevents these $\mathit{DB}$s from inferring $\mathit{SU}$s' locations from their queries or learning $\mathit{PU}$s' sensitive operational data from the spectrum availability information.

We show, analytically and experimentally with deployments on cloud systems, that our adaptation of multi-server $\mathit{PIR}$ techniques significantly outperforms state-of-the-art location privacy preservation methods, as shown in Table I and detailed in Section IV. Moreover, our adaptations achieve information-theoretic privacy, while existing alternatives offer only computational privacy. This provides assurance even against post-quantum adversaries [34] and avoids recent attacks on computational $\mathit{PIR}$ [26].

Note that multi-server $\mathit{PIR}$ techniques require multiple (synchronized) replicas of the database. Therefore, despite their high efficiency and security, they have received little attention from practitioners. For instance, in traditional data outsourcing settings (e.g., private cloud storage), the application requires a client to outsource only a single copy of its database. Distributing and maintaining multiple copies of the database across different service providers brings additional architectural and deployment costs, which might not be economically attractive for the client.

In this paper, we showcase one of the first natural use cases of multi-server $\mathit{PIR}$, in which multiple copies of synchronized databases are already available by the original design of the application (i.e., spectrum availability information in multi-database $\mathit{CRN}$s), so multi-server $\mathit{PIR}$ does not introduce any extra overhead on top of the application. Exploiting this synergy between multi-database $\mathit{CRN}$s and multi-server $\mathit{PIR}$ allows us to provide information-theoretic location privacy for $\mathit{SU}$s with significantly better efficiency than existing single-server $\mathit{PIR}$ approaches.

Desirable Properties: We outline the desirable properties of our approaches below.

•

Computational efficiency: The adapted approaches are much more efficient than existing location privacy-preserving schemes. For instance, as shown in Table I, LP-Chor and LP-Goldberg are more than $3$ orders of magnitude faster than the schemes proposed by Troja et al. [18, 19], and more than $10$ times faster than XPIR [26] and PriSpectrum [2].

•

Information-Theoretic Privacy Guarantees: They achieve information-theoretic privacy, the optimal privacy level, as opposed to the computational privacy guarantees offered by existing approaches. In fact, some of these approaches are prone to recent attacks on computational $\mathit{PIR}$ protocols [26] and are not secure against post-quantum adversaries [34].

•

Low communication overhead: Our approaches incur a reasonable communication overhead that is a middle ground between the fastest computational $\mathit{PIR}$ [26] and the most communication-efficient computational $\mathit{PIR}$ [35].

•

Fault Tolerance and Robustness: Our proposed approaches are resilient to the issues associated with multi-server architectures: failures, Byzantine behavior, and collusion. Even though collusion among all of the service providers is unlikely, due to the competitive nature of these companies and to regulatory enforcement by bodies such as the FCC to protect users' data, we nevertheless consider collusion in our system and security model. All proposed approaches can handle collusion of multiple $\mathit{DB}$s up to a limit that differs for each approach. In addition, some of the proposed approaches can also handle faulty and Byzantine $\mathit{DB}$s. Moreover, when the proposed approaches are in place, simply hacking the $\mathit{DB}$s is not sufficient to learn users' information, since some of these protocols offer hybrid privacy protection by combining computational and information-theoretic $\mathit{PIR}$ protocols, which lets them provide computational privacy even when all of the $\mathit{DB}$s are compromised.

•

Experimental evaluation on actual cloud platforms: We deploy our proposed approaches on a real cloud platform, GENI [36], to show their feasibility. In our experiments, we create multiple geographically distributed VMs, each playing the role of a $\mathit{DB}$. A laptop plays the role of an $\mathit{SU}$ that queries the $\mathit{DB}$s, i.e., the VMs. Our experiments confirm the significant computational advantages of adopting multi-server $\mathit{PIR}$ over the existing alternatives.
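The speedup figures in the first bullet can be checked directly against the total delays reported in Table I. The short Python sketch below does only that arithmetic; it assumes the PriSpectrum total, which the original table printed without a unit, is in seconds like the other entries.

```python
import math

# Total end-to-end delays in seconds, copied from Table I.
TOTAL_DELAY_S = {
    "LP-Chor": 0.62,
    "LP-Goldberg": 1.78,
    "PriSpectrum": 24.2,          # assumed to be seconds
    "Troja et al. [19]": 11766,
    "Troja et al. [18]": 11820,
    "XPIR": 20.53,
}
OURS = ("LP-Chor", "LP-Goldberg")

def min_speedup(baselines):
    """Smallest delay ratio baseline/ours over all (baseline, ours) pairs."""
    return min(TOTAL_DELAY_S[b] / TOTAL_DELAY_S[o] for b in baselines for o in OURS)

vs_troja = min_speedup(("Troja et al. [19]", "Troja et al. [18]"))
vs_xpir_prispectrum = min_speedup(("XPIR", "PriSpectrum"))
print(f"vs. Troja et al.: at least {vs_troja:.0f}x "
      f"({math.log10(vs_troja):.2f} orders of magnitude)")
print(f"vs. XPIR/PriSpectrum: at least {vs_xpir_prispectrum:.1f}x")
```

The worst case in both comparisons is LP-Goldberg (about 6,600 times faster than Troja et al. [19] and about 11.5 times faster than XPIR), so both claims hold for both schemes.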

I-D Differences Compared to the Preliminary Version

The main differences between this paper and its preliminary versions [37, 38] are as follows: (i) we further consider the location privacy issue of mobile $\mathit{SU}$s and offer a way to amortize the cost incurred by mobility; (ii) we also leverage multi-server $\mathit{PIR}$ to address the location privacy issue of $\mathit{PU}$s in database-driven $\mathit{CRN}$ systems that require $\mathit{PU}$s to provide spectrum availability to $\mathit{DB}$s; (iii) we discuss a way to reduce the cost of LP-Chor by partitioning the spectrum database, instead of simply replicating it, using the RAID-PIR protocol [39], and we discuss the privacy-performance tradeoff of this approach; (iv) we provide a more detailed performance evaluation that takes into account the latest advances in $\mathit{PIR}$ technology, namely SealPIR [32], which relies on fully homomorphic encryption.

II Preliminaries and Models

II-A Notation and Building Blocks

We summarize our notations in Table II. Our adaptations of multi-server $\mathit{PIR}$ rely on the following building blocks.

Private Information Retrieval ($\bm{\mathit{PIR}}$): $\mathit{PIR}$ allows a user to retrieve a data item of its choice from a database while preventing the server owning the database from gaining information on the identity of the retrieved item [40]. One trivial solution to this problem is to have the server send an entire copy of the database to the querying user. Obviously, this is a very inefficient solution to the $\mathit{PIR}$ problem, as its communication cost may be prohibitively large. However, in the single-server setting, it is the only protocol that can provide information-theoretic, i.e., perfect, privacy for the user's query. There are two main classes of $\mathit{PIR}$ protocols according to their privacy level: information-theoretic $\mathit{PIR}$ ($\mathit{itPIR}$) and computational $\mathit{PIR}$ ($\mathit{cPIR}$).

•

Information-theoretic or multi-server $\mathit{PIR}$: It guarantees information-theoretic privacy to the user, i.e., privacy against computationally unbounded servers. This can be achieved efficiently only if the database is replicated at $\ell\geq 2$ non-communicating servers [24, 33]. The main idea behind these protocols consists of decomposing each user's query into several sub-queries so that no individual server learns anything about the user's intent.

•

Computational or single-server $\mathit{PIR}$: It guarantees privacy against computationally bounded servers. In other words, a server cannot get any information about the identity of the item retrieved by the user unless it solves a computationally hard problem (e.g., factoring large integers), as is common in modern cryptography. Thus, these protocols offer weaker privacy than their $\mathit{itPIR}$ counterparts [27, 41].
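To illustrate how multi-server $\mathit{PIR}$ decomposes a query, the Python sketch below implements the classic XOR-based scheme of Chor et al., generalized to $\ell$ servers. It is not the paper's code; we assume it is the construction LP-Chor builds on, given its name and its $(\ell-1)$-privacy in Table I. The $\mathit{SU}$ splits $\bm{e}_{\beta}$ into $\ell$ bit vectors that XOR to $\bm{e}_{\beta}$. Any $\ell-1$ of them are uniformly random, so up to $\ell-1$ colluding servers learn nothing about $\beta$. Records are modeled as Python integers of $b$ bits, and the function names are hypothetical.

```python
import secrets
from functools import reduce
from operator import xor

def chor_queries(beta, r, ell):
    """Split e_beta into ell random bit vectors whose XOR equals e_beta."""
    if ell < 2:
        raise ValueError("need at least two non-colluding servers")
    queries = [[secrets.randbelow(2) for _ in range(r)] for _ in range(ell - 1)]
    last = [reduce(xor, column, 0) for column in zip(*queries)]  # XOR of first ell-1
    last[beta] ^= 1                     # flip position beta so the total is e_beta
    queries.append(last)
    return queries

def db_respond(query, records):
    """Server side: XOR of all records whose query bit is 1."""
    answer = 0
    for bit, record in zip(query, records):
        if bit:
            answer ^= record
    return answer

def chor_retrieve(beta, records, ell):
    """SU side: send one query per server and XOR the ell answers together."""
    answers = [db_respond(q, records) for q in chor_queries(beta, len(records), ell)]
    return reduce(xor, answers, 0)
```

In a real deployment each query would be sent to a different $\mathit{DB}$; here all $\ell$ answers are computed locally only to check that their XOR equals record $\beta$.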

Shamir Secret Sharing: This concept, introduced by Shamir [42], allows a secret holder to divide a secret $\mathcal{S}$ into $\ell$ shares $\mathcal{S}_{1},\cdots,\mathcal{S}_{\ell}$ and distribute these shares to $\ell$ parties. In $(t,\ell)$-Shamir secret sharing, where $t<\ell$, any $t$ or fewer parties that combine their shares learn no information about $\mathcal{S}$, whereas any $t+1$ or more parties can recover $\mathcal{S}$. Given a secret $\mathcal{S}$ chosen arbitrarily from a finite field $\mathbb{F}$, the $(t,\ell)$-Shamir secret sharing scheme works as follows. The secret holder chooses $\ell$ arbitrary distinct non-zero elements $\alpha_{1},\cdots,\alpha_{\ell}\in\mathbb{F}$. Then, it selects $t$ elements $\sigma_{1},\cdots,\sigma_{t}\in\mathbb{F}$ uniformly at random. Finally, it constructs the polynomial $f(x)=\sigma_{0}+\sigma_{1}x+\sigma_{2}x^{2}+\cdots+\sigma_{t}x^{t}$, where $\sigma_{0}=\mathcal{S}$. The $\ell$ shares $\mathcal{S}_{1},\cdots,\mathcal{S}_{\ell}$ given to the parties are $(\alpha_{1},f(\alpha_{1})),\cdots,(\alpha_{\ell},f(\alpha_{\ell}))$. Any $t+1$ or more parties can recover the polynomial $f$ using Lagrange interpolation and thus reconstruct the secret $\mathcal{S}=f(0)$, while $t$ or fewer parties learn nothing about $\mathcal{S}$.
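The $(t,\ell)$-Shamir procedure above can be sketched in a few lines of Python. The sketch assumes the prime field $\mathbb{F}_p$ with $p=2^{61}-1$ and evaluation points $\alpha_i$ supplied by the caller; the function names are illustrative. Reconstruction is Lagrange interpolation evaluated at $x=0$.

```python
import secrets

P = 2**61 - 1  # assumption: a Mersenne prime defining the field F_p

def make_shares(secret, t, alphas, p=P):
    """(t, l)-Shamir: f(0) = secret, t random coefficients, one share per alpha."""
    if len(set(alphas)) != len(alphas) or any(a % p == 0 for a in alphas):
        raise ValueError("alphas must be distinct and non-zero in F_p")
    coeffs = [secret % p] + [secrets.randbelow(p) for _ in range(t)]  # sigma_0..sigma_t
    shares = []
    for a in alphas:
        y = 0
        for c in reversed(coeffs):      # Horner's rule evaluates f(a)
            y = (y * a + c) % p
        shares.append((a, y))
    return shares

def reconstruct(shares, p=P):
    """Lagrange interpolation of f at x = 0; correct with any t+1 or more shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != i:
                num = num * (-xm) % p
                den = den * (xi - xm) % p
        secret = (secret + yi * num * pow(den, -1, p)) % p  # modular inverse, Python 3.8+
    return secret
```

With $t=2$ and $\ell=5$, any three of the five shares reconstruct the secret, while any two shares are consistent with every possible secret in $\mathbb{F}_p$.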

II-B System Model and Security Definitions

We consider a database-driven $\mathit{CRN}$ that contains $\ell$ $\mathit{DB}$s, where $\ell\geq 2$, and an $\mathit{SU}$ registered with these $\mathit{DB}$s to learn spectrum availability information in its vicinity. We assume that these $\mathit{DB}$s share the same content and are synchronized, as mandated by the PAWS standard [3]. We also assume that $\mathit{DB}$s may collude in order to infer the $\mathit{SU}$'s location. We now present our security definitions.

Definition 1.

Byzantine $\mathit{DB}$: This is a faulty $\mathit{DB}$ that runs but produces incorrect answers, possibly chosen maliciously or computed in error. This might be due to a corrupted or obsolete copy of the database caused by a synchronization problem with the other $\mathit{DB}$s.

Definition 2.

$t$-private $\mathit{PIR}$: The privacy of the query is information-theoretically protected, even if up to $t$ of the $\ell$ $\mathit{DB}$s collude, where $0<t<\ell$.

Definition 3.

$\bm{\vartheta}$-Byzantine-robust $\mathit{PIR}$: Even if $\vartheta$ of the responding $\mathit{DB}$s are Byzantine, $\mathit{SU}$ can reconstruct the correct database item and determine which of the $\mathit{DB}$s provided incorrect responses.

Definition 4.

$\bm{k}$-out-of-$\bm{\ell}$ $\mathit{PIR}$: $\mathit{SU}$ can reconstruct the correct record if it receives at least $k$ of the $\ell$ responses, where $2\leq k\leq\ell$.

Definition 5.

Robust $\mathit{PIR}$: The protocol tolerates $\mathit{DB}$s that do not respond to $\mathit{SU}$’s queries and still allows $\mathit{SU}$ to reconstruct the correct output of its queries.

Definition 6.

$\bm{\tau}$-independent $\mathit{PIR}$: The content of the database itself is information-theoretically protected from any coalition of up to $\tau$ $\mathit{DB}$s, where $0\leq\tau<k-t$.

III Proposed Approaches

In the proposed approaches, we tailor multi-server $\mathit{PIR}$ to the context of multi-$\mathit{DB}$ $\mathit{CRN}$s. We start by illustrating the structure of the spectrum database that we consider. Then, we give several approaches, each adapting a multi-server $\mathit{PIR}$ protocol with different security and performance properties and use cases. We model the content of each $\mathit{DB}$ as an $r\times s$ matrix $\bm{D}$ of size $n$ bits, where $s$ is the number of words of $w$ bits in each record/block of the database and $r$ is the number of records in the database, i.e., $r=n/b$, where $b=s\times w$ is the block size in bits. The $k^{th}$ row of $\bm{D}$ is the $k^{th}$ record of the database.

[TABLE]

We further assume that each row of the database corresponds to a unique combination of the tuple $(\mathit{l_{x}},\mathit{l_{y}},\mathit{C},\mathit{ts})$, where $\mathit{l_{x}}$ and $\mathit{l_{y}}$ represent a location’s latitude and longitude, respectively, $\mathit{C}$ is a channel number, and $\mathit{ts}$ is a time-stamp. We also assume that $\mathit{SU}$s can map their location information to the index $\beta$ of the corresponding record of interest in the database using an inverted index technique agreed upon with the $\mathit{DB}$s. An $\mathit{SU}$ that wishes to retrieve record $\bm{D}_{\beta}$ without any privacy consideration can simply send to $\mathit{DB}$ a row vector $\bm{e}_{\beta}$ consisting of all zeros except at position $\beta$, where it has the value $1$. Upon receiving $\bm{e}_{\beta}$, $\mathit{DB}$ multiplies it with $\bm{D}$ and sends record $\bm{D}_{\beta}$ back to $\mathit{SU}$, as we illustrate below:

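$$\bm{e}_{\beta}\cdot\bm{D}=\begin{pmatrix}0&\cdots&0&1&0&\cdots&0\end{pmatrix}\cdot\begin{pmatrix}\bm{D}_{1}\\\vdots\\\bm{D}_{\beta}\\\vdots\\\bm{D}_{r}\end{pmatrix}=\bm{D}_{\beta}$$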

This trivial approach makes it easy for $\mathit{DB}$s to learn $\mathit{SU}$’s location from the vector $\bm{e}_{\beta}$, since $\bm{D}$ is indexed by location. In the following, we present two approaches that hide the content of $\bm{e}_{\beta}$ from $\mathit{DB}$s and thus preserve $\mathit{SU}$’s location privacy. These approaches trade off efficiency against additional security features.
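The following toy sketch makes the non-private retrieval concrete. The `inv_index` mapping (row-major over a discretized location cell, channel, and time slot) is a hypothetical stand-in for the agreed-upon $InvIndex$, and the 4-record database is made up. It shows that $\bm{e}_{\beta}\cdot\bm{D}$ returns the requested record, but also that the query exposes $\beta$ in the clear.

```python
# Hypothetical InvIndex: row-major index over (location cell, channel, time slot).
N_CELLS, N_CH, N_TS = 2, 2, 1

def inv_index(cell, channel, slot):
    return (cell * N_CH + channel) * N_TS + slot

def basis_vector(beta, r):
    """e_beta: all zeros except a 1 at position beta."""
    return [1 if j == beta else 0 for j in range(r)]

def vec_mat_gf2(v, D):
    """v . D over GF(2): XOR of the rows (blocks) of D selected by v."""
    out = [0] * len(D[0])
    for bit, row in zip(v, D):
        if bit:
            out = [a ^ b for a, b in zip(out, row)]
    return out

# Toy database: r = N_CELLS * N_CH * N_TS = 4 records, each an 8-bit block.
D = [[0, 1, 1, 0, 0, 1, 0, 1],
     [1, 1, 0, 0, 1, 0, 1, 0],
     [0, 0, 1, 1, 1, 1, 0, 0],
     [1, 0, 1, 0, 1, 0, 1, 1]]

beta = inv_index(cell=1, channel=0, slot=0)   # = 2
e_beta = basis_vector(beta, len(D))
print(vec_mat_gf2(e_beta, D) == D[beta])      # True: DB returns record D_beta
print(e_beta.index(1))                        # 2: but DB also reads beta in the clear
```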

III-A Location Privacy with Chor ( $\mathit{LP\mathchar 45\relax Chor}$ )

Our first approach, termed $\mathit{LP\mathchar 45\relax Chor}$ , harnesses the simple and efficient $\mathit{itPIR}$ protocol proposed by Chor et al. [24]. We describe the different steps of $\mathit{LP\mathchar 45\relax Chor}$ in Algorithm 1 and highlight these steps in Figure 1. Elements of $\bm{\mathit{D}}$ in this scheme belong to $GF(2)$ , i.e. $\mathit{w}{}=1$ bit and $\mathit{b}{}=\mathit{s}{}$ .

In $\mathit{LP\mathchar 45\relax Chor}$, $\mathit{SU}$ starts by invoking the inverted index subroutine $InvIndex(\mathit{l_{x}},\mathit{l_{y}},\mathit{C},\mathit{ts})$, which takes as input the coordinates of the user, its channel of interest, and a time-stamp, and returns a value $\beta$. This value is the index of the record $\bm{D}_{\beta}$ of $\bm{D}$ that $\mathit{SU}$ is interested in. $\mathit{SU}$ then constructs $\bm{e}_{\beta}$, the standard basis vector $\bm{\overrightarrow{1}_{\beta}}\in GF(2)^{r}$ having $0$ everywhere except at position $\beta$, which has the value $1$, as discussed previously. $\mathit{SU}$ also picks $\ell-1$ $r$-bit binary strings $\bm{\rho}_{1},\cdots,\bm{\rho}_{\ell-1}$ uniformly at random from $GF(2)^{r}$ and computes $\bm{\rho}_{\ell}=\bm{\rho}_{1}\oplus\cdots\oplus\bm{\rho}_{\ell-1}\oplus\bm{e}_{\beta}$. Finally, $\mathit{SU}$ sends $\bm{\rho}_{i}$ to $\mathit{DB}_{i}$, for $1\leq i\leq\ell$. Upon receiving the $r$-bit string $\bm{\rho}_{i}=\rho_{i1}\rho_{i2}\cdots\rho_{ir}$, $\mathit{DB}_{i}$ computes $\bm{R}_{i}=\bm{\rho}_{i}\cdot\bm{D}$, which can also be seen as the XOR of those blocks $\bm{D}_{j}$ of $\bm{D}$ for which the $j^{th}$ bit of $\bm{\rho}_{i}$ is $1$, and sends $\bm{R}_{i}$ back to $\mathit{SU}$. $\mathit{SU}$ receives the $\bm{R}_{i}$s from the $\mathit{DB}_{i}$s, $1\leq i\leq\ell$, and computes $\bm{R}_{1}\oplus\cdots\oplus\bm{R}_{\ell}=(\bm{\rho}_{1}\oplus\cdots\oplus\bm{\rho}_{\ell})\cdot\bm{D}=\bm{e}_{\beta}\cdot\bm{D}$, which is the $\beta^{th}$ block of the database that $\mathit{SU}$ is interested in and from which it can retrieve the spectrum availability information.
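A minimal Python sketch of the $\mathit{LP\mathchar 45\relax Chor}$ query, answer, and reconstruction steps follows. Records are modeled as byte strings (XOR on bytes is bitwise XOR over $GF(2)$), the database is random toy data, and the function names are hypothetical rather than taken from Algorithm 1. The last lines also show the lack of Byzantine robustness: one flipped answer yields a wrong block with no indication of which $\mathit{DB}$ misbehaved.

```python
import secrets

def xor_bytes(x, y):
    return bytes(a ^ b for a, b in zip(x, y))

def chor_query(beta, r, l):
    """Return l random r-bit vectors rho_1..rho_l whose XOR equals e_beta."""
    rhos = [[secrets.randbits(1) for _ in range(r)] for _ in range(l - 1)]
    last = [1 if j == beta else 0 for j in range(r)]
    for rho in rhos:
        last = [a ^ b for a, b in zip(last, rho)]
    return rhos + [last]

def db_answer(rho, D):
    """R_i = rho_i . D: XOR of the blocks D_j whose bit rho_i[j] is 1."""
    out = bytes(len(D[0]))
    for bit, block in zip(rho, D):
        if bit:
            out = xor_bytes(out, block)
    return out

def chor_reconstruct(answers):
    """R_1 xor ... xor R_l = e_beta . D = D_beta."""
    out = bytes(len(answers[0]))
    for R in answers:
        out = xor_bytes(out, R)
    return out

D = [secrets.token_bytes(4) for _ in range(8)]   # r = 8 blocks of 32 bits
beta, l = 5, 3
rhos = chor_query(beta, len(D), l)
answers = [db_answer(rho, D) for rho in rhos]    # one answer per DB_i
assert chor_reconstruct(answers) == D[beta]

# A single wrong answer silently corrupts the result (no Byzantine robustness).
bad = [xor_bytes(answers[0], b"\xff\x00\x00\x00")] + answers[1:]
assert chor_reconstruct(bad) != D[beta]
```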

$\mathit{LP\mathchar 45\relax Chor}$ is very efficient because it relies only on simple XOR operations, as we discuss in Section IV. It is also $(\ell-1)$-private, by Definition 2, since a collusion of up to $\ell-1$ $\mathit{DB}$s cannot learn $\bm{e}_{\beta}$, and consequently $\mathit{SU}$’s location: any $\ell-1$ of the $\bm{\rho}_{i}$s are uniformly random and independent of $\beta$. Only if all $\ell$ $\mathit{DB}$s collude can they learn $\bm{e}_{\beta}$, by simply XORing their $\{\bm{\rho}_{i}\}_{i=1}^{\ell}$. However, this approach suffers from two main drawbacks. First, it is not robust: if even one $\mathit{DB}$ fails to respond, $\mathit{SU}$ cannot recover $\bm{D}_{\beta}$. Second, it is not Byzantine-robust: if one or more $\mathit{DB}$s return a wrong response, $\mathit{SU}$ will reconstruct a wrong block and cannot identify which $\mathit{DB}$ misbehaved so as to avoid relying on it for future queries. In Section III-B, we discuss a second approach that addresses these two drawbacks at the cost of some additional overhead.

III-B Location Privacy with Goldberg ( $\mathit{LP\mathchar 45\relax Goldberg}$ )

Our second approach, termed $\mathit{LP\mathchar 45\relax Goldberg}$, is based on Goldberg’s $\mathit{itPIR}$ protocol [33], which uses Shamir secret sharing to hide $\bm{e}_{\beta}$, i.e., $\mathit{SU}$’s query. It modifies Chor’s scheme [24] to achieve both robustness and Byzantine robustness. Rather than working over $GF(2)$ (binary arithmetic), this scheme works over a larger field $\mathbb{F}$, where each element can represent $w$ bits. In this scheme, the database $\bm{D}=(w_{jk})\in\mathbb{F}^{r\times s}$ is an $r\times s$ matrix of elements of $\mathbb{F}=GF(2^{w})$. Each row represents one block of size $b$ bits, consisting of $s$ words of $w$ bits each. Again, $\bm{D}$ is replicated among $\ell$ databases $\mathit{DB}_{i}$. We summarize the main steps of the $\mathit{LP\mathchar 45\relax Goldberg}$ protocol in Algorithm 2 and illustrate them in Figure 2.

To determine the index $\beta$ of the record that corresponds to its location, $\mathit{SU}$ starts by invoking the subroutine $InvIndex(\mathit{l_{x}},\mathit{l_{y}},\mathit{C},\mathit{ts})$ and then constructs the standard basis vector $\bm{e}_{\beta}\in\mathbb{F}^{r}$ as explained earlier. $\mathit{SU}$ then uses $(t,\ell)$-Shamir secret sharing to divide the vector $\bm{e}_{\beta}$ into $\ell$ shares $(\alpha_{1},\bm{\rho}_{1}),\cdots,(\alpha_{\ell},\bm{\rho}_{\ell})$ to obtain a $t$-private $\mathit{PIR}$ protocol as in Definition 2. That is, $\mathit{SU}$ chooses $\ell$ distinct non-zero elements $\alpha_{i}\in\mathbb{F}^{*}$ and creates $r$ random degree-$t$ polynomials $f_{1},\cdots,f_{r}$ satisfying $f_{j}(0)=\bm{e}_{\beta}[j]$. $\mathit{SU}$ then sends to each $\mathit{DB}_{i}$ its share, namely the vector $\bm{\rho}_{i}=\langle f_{1}(\alpha_{i}),\cdots,f_{r}(\alpha_{i})\rangle$. Each $\mathit{DB}_{i}$ then computes the product $\bm{R}_{i}=\bm{\rho}_{i}\cdot\bm{D}=\langle\sum_{j}f_{j}(\alpha_{i})w_{j1},\cdots,\sum_{j}f_{j}(\alpha_{i})w_{js}\rangle\in\mathbb{F}^{s}$ and sends $\bm{R}_{i}$ to $\mathit{SU}$.

Some $\mathit{DB}$s may fail to respond to $\mathit{SU}$’s query, so that only $k$ out of the $\ell$ $\mathit{DB}$s send their responses. $\mathit{SU}$ collects the $k$ responses and recovers the record at index $\beta$ from the $\bm{R}_{i}$s using the EasyRecover() subroutine from [33], which applies Lagrange interpolation to the secret shares $(\alpha_{1},\bm{R}_{1}),\cdots,(\alpha_{k},\bm{R}_{k})$ to recover $\bm{D}_{\beta}$. This is possible thanks to the use of $(t,\ell)$-Shamir secret sharing, as long as $k>t$ and these $k$ $\mathit{DB}$s are honest. Indeed, by the linearity of Shamir secret sharing, since $\{(\alpha_{i},\bm{\rho}_{i})\}_{i=1}^{\ell}$ is a set of $(t,\ell)$-Shamir secret shares of $\bm{e}_{\beta}$, $\{(\alpha_{i},\bm{R}_{i})\}_{i=1}^{\ell}$ is also a set of $(t,\ell)$-Shamir secret shares of $\bm{e}_{\beta}\cdot\bm{D}$, which is the $\beta^{th}$ block of the database. Thus, $\mathit{SU}$ can reconstruct $\bm{D}_{\beta}$ using Lagrange interpolation, as explained in Section II-A, by relying only on the $k$ responses, which makes $\mathit{LP\mathchar 45\relax Goldberg}$ robust by Definition 5. EasyRecover can also identify the $\mathit{DB}$s that responded honestly, and hence those that are Byzantine, which should discourage $\mathit{DB}$s from misbehaving. More details about this subroutine can be found in [33].
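The query, answer, and $k$-out-of-$\ell$ recovery steps can be sketched as below. This is a simplified illustration, not percy++ code: it assumes a prime field $GF(P)$ instead of $GF(2^{w})$, uses a seeded non-cryptographic RNG only to make the demo reproducible, and its hypothetical `recover` function performs only the Lagrange interpolation part of EasyRecover, without the checks that identify dishonest $\mathit{DB}$s.

```python
import random

P = 2**61 - 1   # prime field stand-in for GF(2^w)

def eval_poly(coeffs, x):
    y = 0
    for c in reversed(coeffs):               # Horner's rule
        y = (y * x + c) % P
    return y

def goldberg_query(beta, r, t, alphas, rng):
    """rho_i[j] = f_j(alpha_i), with f_j random of degree t and f_j(0) = e_beta[j]."""
    polys = [[1 if j == beta else 0] + [rng.randrange(P) for _ in range(t)]
             for j in range(r)]
    return [[eval_poly(f, a) for f in polys] for a in alphas]

def db_answer(rho, D):
    """R_i = rho_i . D, one field element per word column q."""
    return [sum(rho[j] * D[j][q] for j in range(len(D))) % P
            for q in range(len(D[0]))]

def lagrange_at_zero(xs, ys):
    total = 0
    for i, xi in enumerate(xs):
        num = den = 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * xj % P
                den = den * (xj - xi) % P
        total = (total + ys[i] * num * pow(den, P - 2, P)) % P
    return total

def recover(alphas, answers):
    """Interpolate g_q(0) for every word q from k > t honest answers."""
    return [lagrange_at_zero(alphas, [R[q] for R in answers])
            for q in range(len(answers[0]))]

rng = random.Random(1)          # fixed seed for a reproducible demo only
D = [[rng.randrange(P) for _ in range(3)] for _ in range(6)]   # r = 6, s = 3
alphas, t, beta = [1, 2, 3, 4, 5], 2, 4                        # l = 5
rhos = goldberg_query(beta, len(D), t, alphas, rng)
answers = [db_answer(rho, D) for rho in rhos]
alive = [0, 2, 3]               # DB_2 and DB_5 do not respond: k = 3 > t
record = recover([alphas[i] for i in alive], [answers[i] for i in alive])
assert record == D[beta]
```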

Moreover, $\vartheta$ of the $k$ responding $\mathit{DB}$s may even be Byzantine, as in Definition 1, and produce incorrect responses. In that case, $\mathit{SU}$ cannot simply rely on Lagrange interpolation to recover the correct record. Since Shamir secret sharing is based on polynomial interpolation, recovering the response in the presence of Byzantine failures corresponds to noisy polynomial reconstruction, which is exactly the problem of decoding Reed-Solomon codes [43]. Thus, $\mathit{SU}$ instead relies on error-correcting codes, more precisely on the Guruswami-Sudan list decoding algorithm [44], which can correct $\vartheta<k-\lfloor\sqrt{kt}\rfloor$ incorrect responses. In fact, the vector $\langle\bm{R}_{1}[q],\bm{R}_{2}[q],\cdots,\bm{R}_{\ell}[q]\rangle$ is a Reed-Solomon codeword encoding the polynomial $g_{q}=\sum_{j}f_{j}w_{jq}$, and $\mathit{SU}$ wishes to compute $g_{q}(0)$ for each $1\leq q\leq s$ to recover all the $s$ words forming the record $\bm{D}_{\beta}=\langle g_{1}(0),\cdots,g_{s}(0)\rangle$. This is done through the HardRecover() subroutine from [33]. This makes $\mathit{LP\mathchar 45\relax Goldberg}$ also $\vartheta$-Byzantine-robust, by Definition 3, and solves the robustness issues of $\mathit{LP\mathchar 45\relax Chor}$. However, this comes at the cost of additional overhead, as we discuss in Section IV.
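Guruswami-Sudan list decoding is too involved for a short example. As a swapped-in stand-in, the sketch below performs brute-force unique decoding of one word column: it interpolates every subset of $t+1$ answers and accepts the polynomial that disagrees with at most $\vartheta$ answers, which also identifies the Byzantine $\mathit{DB}$s. It assumes a prime field, uses hypothetical names, is exponential in $k$, and only succeeds when $k>t+2\vartheta$, a stricter condition than the Guruswami-Sudan bound.

```python
import itertools

P = 2**61 - 1   # prime field stand-in for GF(2^w)

def lagrange_eval(xs, ys, x):
    """Evaluate at x the unique polynomial of degree < len(xs) through (xs, ys)."""
    total = 0
    for i, xi in enumerate(xs):
        num = den = 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + ys[i] * num * pow(den, P - 2, P)) % P
    return total

def brute_force_recover(alphas, ys, t, max_bad):
    """Find a degree-t polynomial agreeing with all but at most max_bad answers.

    Returns (g(0), indices of inconsistent answers) or None if none exists.
    """
    k = len(alphas)
    for subset in itertools.combinations(range(k), t + 1):
        xs = [alphas[i] for i in subset]
        sv = [ys[i] for i in subset]
        bad = [i for i in range(k) if lagrange_eval(xs, sv, alphas[i]) != ys[i]]
        if len(bad) <= max_bad:
            return lagrange_eval(xs, sv, 0), bad
    return None

g = [7, 3, 5]                          # g(x) = 7 + 3x + 5x^2, so g(0) = 7
alphas = list(range(1, 8))             # k = 7 responding DBs, t = 2
ys = [sum(c * a**e for e, c in enumerate(g)) % P for a in alphas]
ys[3] = (ys[3] + 1) % P                # two Byzantine answers (vartheta = 2)
ys[5] = (ys[5] + 12345) % P
print(brute_force_recover(alphas, ys, t=2, max_bad=2))   # (7, [3, 5])
```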

Corollary 1.

$\mathit{LP\mathchar 45\relax Chor}$ and $\mathit{LP\mathchar 45\relax Goldberg}$ directly inherit the security properties of Chor’s [24] $\mathit{PIR}$ and Goldberg’s [33] $\mathit{PIR}$, respectively.

III-C Location Privacy of Mobile $\mathit{SU}$ s Through Batching

Thus far, we have considered only non-mobile $\mathit{SU}$s that periodically submit an individual query to $\mathit{DB}$s to learn spectrum availability at their fixed location. Mobility, however, changes the picture: a mobile $\mathit{SU}$ needs to query $\mathit{DB}$s multiple times as its location changes. While the previous two approaches perform well for non-mobile $\mathit{SU}$s, they would incur significant overhead on both $\mathit{SU}$ and $\mathit{DB}$s, especially when $\mathit{SU}$ moves at relatively high speed, since this requires a large number of $\mathit{PIR}$ queries.

Our third approach aims to protect the location privacy of mobile $\mathit{SU}$s while reducing the overhead associated with mobility. The idea is to exploit the fact that a mobile $\mathit{SU}$ usually has a priori knowledge of its trajectory, and to let it query $\mathit{DB}$s for its current and future locations by batching these queries together instead of sending them separately. We achieve this by relying on the $\mathit{itPIR}$ protocol of Lueks et al. [45], which extends the scheme of Goldberg [33] to support batching of queries using fast matrix multiplication mechanisms inspired by batch codes [46]. We refer to this approach as $\mathit{LP\mathchar 45\relax BatchPIR}$ and describe it in the following.

Each $\mathit{DB}_{i}$ that receives $q$ simultaneous queries $\bm{\rho}_{i}^{(1)},\cdots,\bm{\rho}_{i}^{(q)}$ from an $\mathit{SU}$ can process them using $\mathit{LP\mathchar 45\relax Goldberg}$ by simply multiplying each query with $\bm{D}$, as illustrated in Step 10 of Algorithm 2. Alternatively, it can group these queries into a $q\times r$ matrix $\bm{Q}_{i}$, where row $j$ is the query $\bm{\rho}_{i}^{(j)}$, and then compute the matrix product $\bm{Q}_{i}\cdot\bm{D}$. The careful reader will notice that this naive multiplication costs about $2qrs$ field operations (multiplications and additions), which can be prohibitively expensive for a large $\bm{D}$ or $q$. This is a fast matrix multiplication problem and can therefore benefit from fast matrix multiplication algorithms such as Strassen’s [47].

Strassen’s algorithm consists of dividing both matrices $\bm{Q}_{i}$ and $\bm{D}$ into four equally sized block matrices. Instead of naively multiplying these submatrices, which requires $8$ submatrix multiplications (equivalent to standard matrix multiplication), Strassen’s algorithm forms linear combinations of blocks in a way that reduces the number of submatrix multiplications to $7$. The same approach is then applied recursively to the submatrix multiplications. This simple yet powerful technique significantly reduces the overhead for $\mathit{DB}$s, and therefore the delay that $\mathit{SU}$s experience in learning spectrum availability while moving, as illustrated in Section IV.
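A plain Strassen recursion applied to a batch of queries is sketched below, checking that row $j$ of $\bm{Q}_{i}\cdot\bm{D}$ equals the naive product, i.e., $\mathit{DB}_{i}$’s answer to the $j^{th}$ query. Assumptions: a prime field stands in for $GF(2^{w})$, the names are hypothetical, and the rectangular $q\times r$ and $r\times s$ matrices are zero-padded to a power-of-two square, which is simple but wasteful; a practical implementation would tune the recursion cutoff and avoid such padding.

```python
import random

P = 2**61 - 1   # prime field stand-in for GF(2^w)

def add(A, B):
    return [[(a + b) % P for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[(a - b) % P for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def naive_mul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) % P for col in cols] for row in A]

def strassen(A, B, cutoff=2):
    """Multiply two n x n matrices (n a power of two) using 7 recursive products."""
    n = len(A)
    if n <= cutoff:
        return naive_mul(A, B)
    h = n // 2
    def split(M):
        return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
                [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(sub(add(M1, M3), M2), M6)
    top = [r1 + r2 for r1, r2 in zip(C11, C12)]
    bottom = [r1 + r2 for r1, r2 in zip(C21, C22)]
    return top + bottom

def pad(M, n):
    """Zero-pad M to an n x n matrix."""
    return ([list(row) + [0] * (n - len(row)) for row in M]
            + [[0] * n for _ in range(n - len(M))])

rng = random.Random(0)                 # reproducible toy data
q, r, s = 3, 5, 4
Q = [[rng.randrange(P) for _ in range(r)] for _ in range(q)]   # batched queries
D = [[rng.randrange(P) for _ in range(s)] for _ in range(r)]
n = 8                                  # next power of two >= max(q, r, s)
R = [row[:s] for row in strassen(pad(Q, n), pad(D, n))[:q]]
assert R == naive_mul(Q, D)            # row j = DB_i's answer to query j
```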

Row $j$ of the resulting matrix $\bm{\mathcal{R}_{i}}=\bm{Q}_{i}\cdot\bm{D}$ corresponds to $\mathit{DB}_{i}$’s response to the $j^{th}$ query. $\mathit{SU}$ then recovers the spectrum availability by combining same-index rows of the different $\bm{\mathcal{R}_{i}}$s, as in $\mathit{LP\mathchar 45\relax Goldberg}$.

III-D Location Privacy of $\mathit{PU}$ s

As we mentioned earlier, in database-driven $\mathit{CRN}$s, the $\mathit{DB}$s’ content comprises operational information of $\mathit{PU}$s, which may be very sensitive in systems such as $\mathit{SAS}$ in the 3.5 GHz CBRS band, where $\mathit{PU}$s are military and governmental entities. The service providers use this operational data to feed their models and populate the spectrum databases with availability information, but do not share the $\mathit{PU}$s’ location information in response to $\mathit{SU}$s’ queries. Therefore, $\mathit{SU}$s do not present a serious threat to $\mathit{PU}$s’ privacy, as opposed to the service providers, which could be malicious and could misuse $\mathit{PU}$s’ sensitive operational data.

In this subsection, we present another approach that also takes into account the privacy of these $\mathit{PU}$s. For this, we make use of another extension of the Goldberg $\mathit{PIR}$ scheme, known as $\tau$-independence, to prevent $\mathit{DB}$s from learning the content of $\bm{D}$ even if up to $\tau$ $\mathit{DB}$s collude, as defined in Definition 6. This is achieved by having $\mathit{PU}$s, instead of the service providers, populate the $\mathit{DB}$s with spectrum availability information pertaining to their respective channels: each record they want to add is secretly shared among the different service providers using Shamir secret sharing, similar to how $\mathit{SU}$s secretly share their queries. That way, no individual service provider can decode this data, and only $\mathit{SU}$s, which have access to the secret, can retrieve the record by combining the shares from the different $\mathit{DB}$s. This is motivated by the fact that $\mathit{DB}$s are expected to be populated by the $\mathit{PU}$s themselves, as is the case in LSA systems, or by a highly trusted independent entity, the ESC, as in $\mathit{SAS}$ systems. Therefore, whenever a $\mathit{PU}$ or an ESC submits a $\mathit{PU}$ activity record of index $j$ to the $\mathit{DB}$s, it divides the record into $s$ words $W_{j1},\cdots,W_{js}$ and distributes Shamir secret shares of every word among the $\ell$ $\mathit{DB}$s, as reflected in Algorithm 3. Each $\mathit{DB}_{i}$ now holds a different content $\bm{D}^{(i)}$:

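$$\bm{D}^{(i)}=\begin{pmatrix}w^{(i)}_{11}&w^{(i)}_{12}&\cdots&w^{(i)}_{1s}\\\vdots&\vdots&\ddots&\vdots\\w^{(i)}_{r1}&w^{(i)}_{r2}&\cdots&w^{(i)}_{rs}\end{pmatrix}$$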

where $\{w^{(i)}_{jc}\}_{1\leq i\leq\ell}$ form a $(\tau,\ell)$-Shamir secret sharing of word $W_{jc}$. This requires that the random values $\alpha_{i}$, used to create Shamir secret shares as explained in Section II-A, be shared beforehand among $\mathit{SU}$s and $\mathit{PU}$s, since $\mathit{SU}$s must share their queries at the same points. This could be done, for instance, by the FCC during the registration phase, and these values must not be communicated to the $\mathit{DB}$s.

This way, records revealing operational data of $\mathit{PU}$s, which $\mathit{DB}$s could otherwise use to learn the activity of these $\mathit{PU}$s and track them, are information-theoretically protected from $\mathit{DB}$s as long as no more than $\tau$ of these $\mathit{DB}$s collude. However, for this protocol to work, the condition $0<t\leq t+\tau<k\leq\ell$ must hold: each response $\bm{R}_{i}=\bm{\rho}_{i}\cdot\bm{D}^{(i)}$ is the evaluation at $\alpha_{i}$ of a polynomial of degree at most $t+\tau$ (a sum of products of degree-$t$ query polynomials and degree-$\tau$ data polynomials), so $\mathit{SU}$ needs more than $t+\tau$ responses to interpolate it. Although this extension of $\mathit{LP\mathchar 45\relax Goldberg}$ should have no impact on performance on the $\mathit{SU}$ and $\mathit{DB}$ sides, as we show in Section IV, it does affect the $t$-privacy of the protocol. As the $\tau$-independence level sought by $\mathit{PU}$s (i.e., the number of colluding $\mathit{DB}$s against which their records are protected) increases, the maximum achievable $t$-privacy level decreases, since $t+\tau<k$ must always hold.
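The following sketch illustrates $\tau$-independence end to end for one $\mathit{SU}$ query. It assumes a prime field instead of $GF(2^{w})$, hypothetical secret evaluation points $\alpha_{i}$, toy sizes, and hypothetical names. Because each $\mathit{DB}_{i}$ multiplies degree-$t$ query shares by degree-$\tau$ data shares, its answer lies on a polynomial of degree $t+\tau$, so $k=t+\tau+1$ answers suffice here.

```python
import random

P = 2**61 - 1   # prime field stand-in for GF(2^w)

def shares_at(secret, deg, alphas, rng):
    """Values f(alpha_i) of a random degree-`deg` polynomial with f(0) = secret."""
    coeffs = [secret % P] + [rng.randrange(P) for _ in range(deg)]
    out = []
    for a in alphas:
        y = 0
        for c in reversed(coeffs):           # Horner's rule
            y = (y * a + c) % P
        out.append(y)
    return out

def lagrange_at_zero(xs, ys):
    total = 0
    for i, xi in enumerate(xs):
        num = den = 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * xj % P
                den = den * (xj - xi) % P
        total = (total + ys[i] * num * pow(den, P - 2, P)) % P
    return total

rng = random.Random(3)                       # reproducible demo only
l, t, tau = 6, 2, 2
alphas = [3, 8, 11, 19, 23, 42]              # known to PUs and SUs, not to DBs
r, s, beta = 4, 2, 1
W = [[rng.randrange(P) for _ in range(s)] for _ in range(r)]   # PU words W_jc

# PU/ESC side: DB_i stores D^(i)[j][c], a (tau, l)-share of W_jc.
D_i = [[[0] * s for _ in range(r)] for _ in range(l)]
for j in range(r):
    for c in range(s):
        for i, y in enumerate(shares_at(W[j][c], tau, alphas, rng)):
            D_i[i][j][c] = y

# SU side: (t, l)-shares of e_beta at the same alphas.
rho = [[0] * r for _ in range(l)]
for j in range(r):
    for i, y in enumerate(shares_at(1 if j == beta else 0, t, alphas, rng)):
        rho[i][j] = y

# DB side: R_i = rho_i . D^(i), computed on shares only.
R = [[sum(rho[i][j] * D_i[i][j][c] for j in range(r)) % P for c in range(s)]
     for i in range(l)]

k = t + tau + 1                              # smallest k with t + tau < k
record = [lagrange_at_zero(alphas[:k], [R[i][c] for i in range(k)])
          for c in range(s)]
assert record == W[beta]
```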

III-E Location Privacy of $\mathit{SU}$ s in Partitioned-database $\mathit{CRN}$ s

In this subsection, we present another location privacy-preserving approach for $\mathit{SU}$s for the case where the spectrum database content is distributed among the different $\mathit{DB}$s instead of being replicated as in the previous approaches. This could be motivated by the fact that some database-driven $\mathit{CRN}$s may have multiple $\mathit{DB}$s covering different or slightly overlapping regions. It could also be a way to reduce cost by making each $\mathit{DB}$ manage only a portion of the database.

For that, we rely on the RAID-PIR protocol of Demmler et al. [39], which builds on Chor’s scheme to reduce the communication overhead and the computation required on the server side. The idea is very similar to Chor’s, but here the vector $\bm{e}_{\beta}$ is divided into $\ell$ chunks. Each query $q_{i}$ sent to $\mathit{DB}_{i}$ consists of $\pi$ chunks, as illustrated in Figure 3, where $\pi$ is a redundancy parameter, with $2\leq\pi\leq\ell$, that controls the minimum number of $\mathit{DB}$s that need to collude to learn which record $\bm{D}_{\beta}$ $\mathit{SU}$ requests. This parameter also controls the number of chunks in every query and how often the chunks overlap across these queries [39].

The details of this approach are described in Algorithm 4. To optimize the cost, $\mathit{SU}$ can use a pseudo-random generator, $PRG$, to generate the $\pi-1$ random chunks of $q_{i}$. For that, $\mathit{SU}$ randomly generates $\ell$ seeds $s_{1},\cdots,s_{\ell}$ of $\kappa$ bits each, where $\kappa$ is the symmetric security parameter, and uses $PRG$ to expand each seed $s_{i}$ into $\pi-1$ random chunks $rnd_{i}[j]$ of size $\frac{r}{\ell}$ each, as depicted in step 5 of Algorithm 4. The first chunk of query $q_{i}$, denoted $f_{i}$, is computed to cancel out the $\pi-1$ random $i^{th}$ chunks generated for the other $\mathit{DB}$s, if applicable, and is obtained by XORing those $\pi-1$ chunks with the $i^{th}$ chunk of $\bm{e}_{\beta}$. Thanks to the $PRG$, $\mathit{SU}$ does not need to send the whole query: it only sends to $\mathit{DB}_{i}$ a compact version of $q_{i}$, denoted $q^{\prime}_{i}$, composed of $f_{i}$ and the seed $s_{i}$ used to generate the other chunks of $q_{i}$. $\mathit{DB}_{i}$ then uses the same $PRG$ with the received seed to regenerate the full query $q_{i}$. Once $q_{i}$ is recovered, $\mathit{DB}_{i}$ constructs its answer $\bm{R}_{i}$ by XORing the records of $\bm{D}$ whose indices match the set bits of $q_{i}$. Finally, $\mathit{SU}$ only needs to XOR the results from the different $\mathit{DB}$s to recover the $\beta^{th}$ record.

Since each query $q_{i}$ covers only $\pi/\ell\cdot r$ record indices, each $\mathit{DB}$ needs to store and process only $\pi/\ell\cdot r$ records of $\bm{D}$, which is especially beneficial as the number of $\mathit{DB}$s increases.
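A minimal sketch of the query compression and answer steps described above follows. It rests on several assumptions that are ours, not Algorithm 4’s: $\ell$ divides $r$; $\mathit{DB}_{i}$ covers the $\pi$ cyclically consecutive chunks $i,\ldots,i+\pi-1 \pmod{\ell}$ (0-indexed in the code), with chunk $i$ as its first chunk $f_{i}$; SHAKE-256 serves as the $PRG$; and all names are hypothetical. The details may differ from Demmler et al.’s implementation. The final check confirms that each $\mathit{DB}$ touches only $\pi r/\ell$ records.

```python
import hashlib
import secrets

def prg_bits(seed, nbits):
    """Expand a seed into nbits pseudo-random bits (SHAKE-256 as an illustrative PRG)."""
    stream = hashlib.shake_256(seed).digest((nbits + 7) // 8)
    return [(stream[k // 8] >> (k % 8)) & 1 for k in range(nbits)]

def chunks_of(i, l, pi):
    """Chunks covered by DB_i's query: i, i+1, ..., i+pi-1 (mod l)."""
    return [(i + d) % l for d in range(pi)]

def raid_query(beta, r, l, pi, kappa=128):
    """Return the compact queries q'_i = (f_i, seed_i) for i = 0..l-1."""
    c = r // l                                   # chunk size r / l
    seeds = [secrets.token_bytes(kappa // 8) for _ in range(l)]
    rnd = {}                                     # (DB index, chunk) -> random bits
    for i in range(l):
        bits = prg_bits(seeds[i], (pi - 1) * c)
        for d, ch in enumerate(chunks_of(i, l, pi)[1:]):
            rnd[(i, ch)] = bits[d * c:(d + 1) * c]
    e = [1 if j == beta else 0 for j in range(r)]
    compact = []
    for i in range(l):
        f = e[i * c:(i + 1) * c]
        for (_, ch), bits in rnd.items():
            if ch == i:                          # other DBs' random chunks covering chunk i
                f = [a ^ b for a, b in zip(f, bits)]
        compact.append((f, seeds[i]))
    return compact

def db_answer(compact, i, D, l, pi):
    """DB_i regenerates q_i from (f_i, seed_i) and XORs the selected records."""
    f, seed = compact
    c = len(f)
    bits = prg_bits(seed, (pi - 1) * c)
    parts = [(i, f)] + [(ch, bits[d * c:(d + 1) * c])
                        for d, ch in enumerate(chunks_of(i, l, pi)[1:])]
    out, touched = bytes(len(D[0])), set()
    for ch, chunk_bits in parts:
        for k, bit in enumerate(chunk_bits):
            idx = ch * c + k
            touched.add(idx)
            if bit:
                out = bytes(a ^ b for a, b in zip(out, D[idx]))
    return out, touched

l, pi, r = 4, 2, 16
D = [secrets.token_bytes(8) for _ in range(r)]
beta = 9
queries = raid_query(beta, r, l, pi)
answers = [db_answer(queries[i], i, D, l, pi) for i in range(l)]
record = bytes(8)
for R, _ in answers:
    record = bytes(a ^ b for a, b in zip(record, R))
assert record == D[beta]
assert all(len(touched) == pi * r // l for _, touched in answers)
```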

IV Evaluation and Analysis

IV-A Analytical Comparison

We start by studying the proposed approaches’ performance analytically and comparing them to existing approaches. For $\mathit{LP\mathchar 45\relax Goldberg}$, we choose $w=8$ to simplify computations, as in [43]: in $GF(2^{8})$, additions are XOR operations on bytes and multiplications are lookups into a $64$ KB table [43]. We summarize the communication complexity of the system and the computation incurred by both $\mathit{DB}$ and $\mathit{SU}$, and we illustrate the differences in architecture and privacy level among the approaches in Table III. As mentioned earlier, existing research focuses on the single-$\mathit{DB}$ setting. We compare the proposed approaches to existing techniques despite the difference in architecture to show the benefits that multi-server $\mathit{PIR}$ brings in terms of performance and privacy. We briefly discuss these existing approaches in the following.
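The $GF(2^{8})$ arithmetic mentioned above can be sketched as follows: addition is a byte XOR, and multiplication becomes a single lookup in a precomputed $256\times 256$ table of one-byte products, i.e., $65{,}536$ bytes or 64 KB. The reduction polynomial $x^{8}+x^{4}+x^{3}+x+1$ (the one used by AES) is an assumption; the polynomial used in [43] or percy++ may differ, and the function names are hypothetical.

```python
def gf256_mul_slow(a, b):
    """Carry-less multiply a*b in GF(2^8), reducing by x^8+x^4+x^3+x+1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

# 256 x 256 table of one-byte products = 65,536 bytes = 64 KB.
MUL_TABLE = bytes(gf256_mul_slow(a, b) for a in range(256) for b in range(256))

def gf256_add(a, b):
    return a ^ b                       # addition in GF(2^8) is XOR

def gf256_mul(a, b):
    return MUL_TABLE[(a << 8) | b]     # multiplication is one table lookup

print(len(MUL_TABLE))                  # 65536
print(hex(gf256_mul(0x57, 0x83)))      # 0xc1
```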

Gao et al. [2] propose a $\mathit{PIR}$-based approach, termed $\mathit{PriSpectrum}$, that relies on the $\mathit{PIR}$ scheme of Trostle et al. [27] to defend against a new attack that they identify, which exploits spectrum utilization patterns to localize $\mathit{SU}$s. Troja et al. [18, 19] propose two other $\mathit{PIR}$-based approaches that try to minimize the number of $\mathit{PIR}$ queries, either by allowing $\mathit{SU}$s to share their availability information with other $\mathit{SU}$s [18] or by exploiting trajectory information so that $\mathit{SU}$s retrieve information for their current and future positions in the same query [19].

Despite their merit in providing location privacy to $\mathit{SU}$s, these $\mathit{PIR}$-based approaches incur high overhead, especially in terms of computation, because they rely on $\mathit{cPIR}$ protocols, which are known to be computationally expensive. In fact, answering an $\mathit{SU}$’s query through a $\mathit{cPIR}$ protocol requires $\mathit{DB}$ to process all of its records; otherwise, $\mathit{DB}$ would learn that $\mathit{SU}$ is not interested in the skipped records, and thus learn partial information about the record $\bm{D}_{\beta}$ and consequently about $\mathit{SU}$’s location. This makes the computational cost of most $\mathit{cPIR}$-based location-preserving schemes linear in the database size on the $\mathit{DB}$ side, as we illustrate in Table III. This is not exclusive to $\mathit{cPIR}$ protocols, as even $\mathit{itPIR}$ protocols may require processing all records to guarantee privacy. The main difference is that $\mathit{cPIR}$ protocols have a very large cost per bit of the database, usually involving expensive group operations such as multiplication modulo a large modulus [26], unlike multi-server $\mathit{itPIR}$ protocols. This can be seen clearly in Table III, as both $\mathit{LP\mathchar 45\relax Chor}$ and $\mathit{LP\mathchar 45\relax Goldberg}$ require $\mathit{DB}$ to perform only very cheap operations per bit of the database (XORs, plus $GF(2^{8})$ table lookups in $\mathit{LP\mathchar 45\relax Goldberg}$). The same applies to the overhead incurred by $\mathit{SU}$, which performs only such lightweight operations in both $\mathit{LP\mathchar 45\relax Chor}$ and $\mathit{LP\mathchar 45\relax Goldberg}$, whereas the $\mathit{cPIR}$-based approaches require expensive modular multiplications and even exponentiations over large primes.

In terms of communication overhead, the proposed approaches incur a cost that is linear in the number of records $r$ and in their size $b$. These parameters can be balanced so that queries and responses have similar sizes [24, 33, 43, 26]: $r=b=\sqrt{n}$ for binary words ($w=1$), and more generally $b=\sqrt{nw}$, i.e., $r=\sqrt{n/w}$, for $w$-bit words. The cost per $\mathit{DB}$ is then $\mathcal{O}(\sqrt{nw})$ to retrieve a record of size $\sqrt{nw}$ bits, which is reasonable for information-theoretic privacy.
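The balancing argument can be checked with a short computation, assuming each $\mathit{DB}$ receives a query of $r=n/b$ words of $w$ bits and returns a $b$-bit block, so the per-$\mathit{DB}$ cost is $rw+b=nw/b+b$, which is minimized at $b=\sqrt{nw}$. The database size below is a hypothetical example, and the function name is illustrative.

```python
import math

def per_db_bits(n, w, b):
    """Bits exchanged with one DB: query of r = n/b words of w bits, answer of b bits."""
    r = n // b
    return r * w + b

n, w = 2**31, 8                       # hypothetical: a 256 MiB database, 8-bit words
b_opt = math.isqrt(n * w)             # sqrt(n w) = 2**17 bits
print(b_opt, n // b_opt, per_db_bits(n, w, b_opt))   # 131072 16384 262144

# Scanning power-of-two block sizes confirms that the balanced choice is cheapest.
best = min((2**e for e in range(1, 31)), key=lambda b: per_db_bits(n, w, b))
print(best == b_opt)                  # True
```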

Moreover, as illustrated in Table III, existing approaches fail to provide information-theoretic privacy, as their security relies on computational $\mathit{PIR}$ schemes. The only approaches that provide information-theoretic location privacy are $\mathit{LP\mathchar 45\relax Chor}$, $\mathit{LP\mathchar 45\relax Goldberg}$, and $\mathit{RAID\mathchar 45\relax LP\mathchar 45\relax Chor}$, which are $(\ell-1)$-private, $t$-private, and $(\pi-1)$-private, respectively, by Definition 2. It is worth mentioning that $\mathit{PriSpectrum}$ [2] relies on the well-known $\mathit{cPIR}$ of Trostle et al. [27], which represented the state of the art in efficient $\mathit{cPIR}$. However, this $\mathit{cPIR}$ scheme has been broken [26, 48]. Since the security of $\mathit{PriSpectrum}$ rests on that of the broken $\mathit{cPIR}$ of Trostle et al. [27], $\mathit{PriSpectrum}$ fails to provide the privacy it was designed for. We nevertheless include it in our performance analysis for completeness.

IV-B Experimental Evaluation

We further evaluate the performance of the proposed schemes experimentally to confirm the analytical observations.

Hardware setting and configuration. We have deployed the proposed approaches on the GENI [36] cloud platform using the percy++ library [49]. We have created $6$ virtual machines (VMs), each playing the role of a $\mathit{DB}$, and they all share the same copy of $\bm{D}$. We deploy these GENI VMs in different locations in the US to account for network delay and to make our experiment closer to a real scenario, in which spectrum service providers are located in different places. These VMs run Ubuntu $14.04$, each with $8$ GB of RAM, a $15$ GB SSD, and $4$ vCPUs (Intel Xeon X5650 $2.67$ GHz or Intel Xeon E5-2450 $2.10$ GHz). To assess the $\mathit{SU}$ overhead, we use a Lenovo Yoga 3 Pro laptop with $8$ GB of RAM running Ubuntu $16.10$ with an Intel Core m Processor 5Y70 CPU at $1.10$ GHz. The client laptop communicates with the remote VMs through SSH tunnels. We also consider recent advances in $\mathit{cPIR}$, namely the fastest $\mathit{cPIR}$ protocols in the literature: XPIR, proposed by Aguilar et al. [26], and SealPIR, due to Angel et al. [32]. We include these protocols in our experiment to show how multi-server $\mathit{PIR}$ performs against the best known $\mathit{cPIR}$ schemes if they were deployed in $\mathit{CRN}$s. We use the available implementations of these protocols provided in [50] and [51], deploying their server components on a remote GENI VM and their client components on the Lenovo Yoga 3 Pro laptop.

Dataset. Spectrum service providers (e.g., Google and Microsoft) offer graphical web interfaces and APIs to their databases that allow users to retrieve basic spectrum availability information for a user-specified location. Since access to the full data of real spectrum databases was not possible, we generated random data for our experiment. The generated data consists of a matrix that models the content of the database, $\bm{D}$, with a fixed block size $b=560$ B and a varying number of records $r$. The value of $b$ is estimated from the public raw data that the FCC provides on a daily basis [52] and that service providers use to populate their spectrum databases.

Results and Comparison. We first measure the end-to-end query delay of the proposed approaches and plot the results in Figure 4. We also include the delay of the existing schemes, estimated from the operations listed in Table III. The measured end-to-end delay includes the time needed by $\mathit{SU}$ to generate the query, the network delay, the time needed by $\mathit{DB}$ to process the query, and the time needed by $\mathit{SU}$ to extract the $\beta^{th}$ record of the database. We consider two internet speed configurations. We first use a high-speed connection of $80$ Mbps download and $30$ Mbps upload for all compared approaches. We then use a low-speed connection of $1$ Mbps for both upload and download to assess the impact of bandwidth on $\mathit{LP\mathchar 45\relax Chor}$, $\mathit{LP\mathchar 45\relax Goldberg}$, and XPIR.

Figure 4 shows that the proposed schemes achieve much lower delay than the existing approaches, even with the low-speed connection. They also outperform the fastest existing $\mathit{cPIR}$ protocols, XPIR and SealPIR, which shows the benefit of relying on multi-server $\mathit{itPIR}$ in multi- $\mathit{DB}$ $\mathit{CRN}$ s. As expected, $\mathit{LP\mathchar 45\relax Chor}$ performs better than $\mathit{LP\mathchar 45\relax Goldberg}$ thanks to its simplicity; as we will see later, $\mathit{LP\mathchar 45\relax Goldberg}$ also incurs a larger communication overhead. This extra cost may be acceptable because, unlike $\mathit{LP\mathchar 45\relax Chor}$ , $\mathit{LP\mathchar 45\relax Goldberg}$ can handle collusion of up to $\mathit{\ell}$ $\mathit{DB}$ s (information-theoretically against up to $\mathit{t}$ colluding $\mathit{DB}$ s and computationally beyond that), and tolerates up to $(\mathit{\ell}{}-\mathit{k}{})$ non-responding $\mathit{DB}$ s and $\mathit{\vartheta}$ byzantine $\mathit{DB}$ s. This makes $\mathit{LP\mathchar 45\relax Goldberg}$ potentially better suited to real-world deployments, where failures and byzantine behavior are common. Figure 4 also shows that network bandwidth has a significant impact on end-to-end latency, because these protocols exchange relatively large amounts of data and therefore benefit from higher connection speeds.

Table III compares the computational complexity experienced by each $\mathit{SU}$ and each $\mathit{DB}$ separately in the different approaches. We confirm this experimentally: Figure 5(a) shows that the proposed schemes incur lower overhead on the $\mathit{SU}$ than the existing approaches. The same observation holds for the computation performed by each $\mathit{DB}$ , which in the proposed schemes again involves only efficient XOR operations, as illustrated in Figure 5(b).
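To make the XOR-only $\mathit{DB}$ workload concrete, here is a minimal sketch of the textbook Chor et al. multi-server PIR on which $\mathit{LP\mathchar 45\relax Chor}$ builds. It assumes $\mathit{\ell}$ replicated $\mathit{DB}$ s that each hold all of $\bm{\mathit{D}}$ as a list of equal-size byte blocks. The function names are illustrative and do not come from the paper's percy++-based implementation.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def chor_query(num_records: int, beta: int, num_dbs: int) -> list[list[int]]:
    """SU side: split the unit vector e_beta into num_dbs random bit vectors.

    The first num_dbs - 1 vectors are uniformly random; the last one is chosen so
    that the XOR of all vectors equals e_beta. Any num_dbs - 1 of them reveal nothing.
    """
    queries = [[secrets.randbits(1) for _ in range(num_records)]
               for _ in range(num_dbs - 1)]
    last = [0] * num_records
    for q in queries:
        last = [a ^ b for a, b in zip(last, q)]
    last[beta] ^= 1
    queries.append(last)
    return queries


def chor_answer(db: list[bytes], query: list[int]) -> bytes:
    """DB side: XOR together the records whose query bit is 1 (XORs only)."""
    answer = bytes(len(db[0]))
    for record, bit in zip(db, query):
        if bit:
            answer = xor_bytes(answer, record)
    return answer


def chor_reconstruct(answers: list[bytes]) -> bytes:
    """SU side: the XOR of all DB answers equals the requested record D_beta."""
    return reduce(xor_bytes, answers)


if __name__ == "__main__":
    db = [secrets.token_bytes(560) for _ in range(64)]   # 64 records of b = 560 B
    queries = chor_query(len(db), beta=17, num_dbs=6)    # one query per DB
    answers = [chor_answer(db, q) for q in queries]
    assert chor_reconstruct(answers) == db[17]
```

Each $\mathit{DB}$ touches every record once and performs at most $\mathit{r}$ block XORs, while the $\mathit{SU}$ performs only $\mathit{\ell}-1$ block XORs to recover the record.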

We also study the impact of non-responding $\mathit{DB}$ s on the end-to-end delay experienced by the $\mathit{SU}$ in $\mathit{LP\mathchar 45\relax Goldberg}$ , as illustrated in Figure 6. The figure shows that the end-to-end delay decreases as the number of faulty $\mathit{DB}$ s increases, since the $\mathit{SU}$ needs to process fewer shares to recover the record $\bm{\mathit{D}}{}_{\mathit{\beta}}{}$ . Unlike in $\mathit{LP\mathchar 45\relax Chor}$ , in $\mathit{LP\mathchar 45\relax Goldberg}$ the $\mathit{SU}$ can still recover record $\mathit{\beta}$ even if only $\mathit{k}$ out of $\mathit{\ell}$ $\mathit{DB}$ s respond. Recall also that our experiment used resource-constrained VMs to emulate the $\mathit{DB}$ s; in practice, $\mathit{DB}$ s are expected to have far more powerful computational resources, which would further reduce the overhead of the proposed approaches.
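The reason a subset of responses suffices can be seen in a simplified sketch in the spirit of Goldberg's Shamir-based PIR. The $\mathit{SU}$ secret-shares each coordinate of $e_{\beta}$ with a random degree- $\mathit{t}$ polynomial, each $\mathit{DB}$ returns a vector-matrix product, and any $\mathit{t}+1$ responses determine $\bm{\mathit{D}}_{\mathit{\beta}}$ . The prime field, the evaluation points $1,\dots,\mathit{\ell}$ , and all function names are illustrative assumptions, not the paper's or percy++'s choices.

```python
import random

P = 2**31 - 1  # illustrative prime modulus for the field GF(P)


def shamir_share(secret: int, t: int, xs: list[int]) -> list[int]:
    """Evaluate a random degree-t polynomial with f(0) = secret at each x in xs."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t)]
    return [sum(c * pow(x, e, P) for e, c in enumerate(coeffs)) % P for x in xs]


def lagrange_at_zero(points: list[tuple[int, int]]) -> int:
    """Interpolate the polynomial through points and return its value at x = 0."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * xj % P
                den = den * (xj - xi) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def goldberg_query(num_records: int, beta: int, t: int, xs: list[int]) -> list[list[int]]:
    """SU side: share every coordinate of e_beta; returns one query vector per DB."""
    per_record = [shamir_share(1 if i == beta else 0, t, xs) for i in range(num_records)]
    return [[per_record[i][j] for i in range(num_records)] for j in range(len(xs))]


def goldberg_answer(db: list[list[int]], query: list[int]) -> list[int]:
    """DB side: vector-matrix product query * D over GF(P)."""
    words = len(db[0])
    return [sum(q * rec[w] for q, rec in zip(query, db)) % P for w in range(words)]


def goldberg_reconstruct(responses: dict[int, list[int]], t: int) -> list[int]:
    """SU side: interpolate each word from any t + 1 responses (point -> answer)."""
    if len(responses) < t + 1:
        raise ValueError("need at least t + 1 responses")
    chosen = list(responses.items())[: t + 1]
    words = len(chosen[0][1])
    return [lagrange_at_zero([(x, ans[w]) for x, ans in chosen]) for w in range(words)]


if __name__ == "__main__":
    ell, t, beta = 6, 2, 5
    xs = list(range(1, ell + 1))            # public evaluation point of each DB
    db = [[random.randrange(P) for _ in range(4)] for _ in range(20)]
    queries = goldberg_query(len(db), beta, t, xs)
    responding = [0, 2, 3, 5]               # two DBs do not respond (k = 4)
    responses = {xs[j]: goldberg_answer(db, queries[j]) for j in responding}
    assert goldberg_reconstruct(responses, t) == db[beta]
```

Each response is the evaluation, at that $\mathit{DB}$ 's point, of a degree- $\mathit{t}$ polynomial whose constant term is the requested word, so missing $\mathit{DB}$ s only matter once fewer than $\mathit{t}+1$ answers remain.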

Figure 7 illustrates the impact of the $\mathit{SU}$ ’s desired privacy level in $\mathit{LP\mathchar 45\relax Goldberg}$ on the processing time incurred by both the $\mathit{SU}$ and the $\mathit{DB}$ s. As expected, increasing $\mathit{t}$ , the maximum number of colluding $\mathit{DB}$ s that learn nothing about the query, has no impact on the $\mathit{DB}$ s, since they perform the same operations regardless of the privacy level. However, since the responses sent by the $\mathit{DB}$ s can be viewed as a $(\mathit{t}{},\mathit{\ell}{})$ -Shamir secret sharing of the retrieved record, increasing $\mathit{t}$ increases the number of shares required to recover the record, and hence the $\mathit{SU}$ ’s computation, since it must perform Lagrange interpolation on polynomials of higher degree $\mathit{t}$ .
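For reference, the interpolation step can be written explicitly. Assuming (our notation) that $\mathit{DB}$ $j$ is assigned a public nonzero evaluation point $\alpha_j$ , that $\rho_j$ denotes its response, and that $S$ is any set of $\mathit{t}+1$ responding $\mathit{DB}$ s, the $\mathit{SU}$ recovers each word of the record as

$$
\bm{\mathit{D}}_{\mathit{\beta}} = \sum_{j \in S} \lambda_j \, \rho_j ,
\qquad
\lambda_j = \prod_{m \in S,\; m \neq j} \frac{\alpha_m}{\alpha_m - \alpha_j} ,
\qquad
|S| = \mathit{t} + 1 .
$$

The coefficients $\lambda_j$ depend only on $S$ , so computing them costs $O(\mathit{t}^2)$ field operations once, after which each word costs $\mathit{t}+1$ multiplications. Both terms grow with $\mathit{t}$ , whereas the $\mathit{DB}$ -side computation does not involve $\mathit{t}$ at all.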

We further study the impact of the number of byzantine $\mathit{DB}$ s on the $\mathit{SU}$ ’s processing time in $\mathit{LP\mathchar 45\relax Goldberg}$ , as depicted in Figure 8. As expected, more byzantine $\mathit{DB}$ s increase the cost of decoding the shares that the $\mathit{SU}$ receives from the $\mathit{DB}$ s, which relies on the relatively expensive HardRecover subroutine from [33].
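To illustrate why wrong answers make decoding costlier, the sketch below swaps in a naive exhaustive decoder. It is not HardRecover from [33]. It searches over $(\mathit{t}+1)$ -subsets of the $\mathit{k}$ responses for a degree- $\mathit{t}$ polynomial that agrees with at least $\mathit{k}-\mathit{\vartheta}$ of them, which is unique whenever $\mathit{k} > \mathit{t} + 2\mathit{\vartheta}$ . The field, names, and single-word setting are assumptions made for illustration.

```python
from itertools import combinations

P = 2**31 - 1  # illustrative prime field GF(P)


def interpolate_at(points, x: int) -> int:
    """Evaluate at x the polynomial of degree len(points) - 1 through points, over GF(P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def naive_robust_recover(points, t: int, max_faulty: int) -> int:
    """Return f(0) for the degree-t polynomial agreeing with >= k - max_faulty points.

    Exhaustive search over (t+1)-subsets; unique decoding needs k > t + 2 * max_faulty.
    """
    k = len(points)
    if k <= t + 2 * max_faulty:
        raise ValueError("too many byzantine responses for unique decoding")
    for subset in combinations(points, t + 1):
        agree = sum(1 for x, y in points if interpolate_at(subset, x) == y)
        if agree >= k - max_faulty:
            return interpolate_at(subset, 0)
    raise ValueError("no consistent polynomial found")


if __name__ == "__main__":
    t, secret = 2, 123456
    coeffs = [secret, 77, 99]                       # f(x) = secret + 77x + 99x^2
    points = [(x, sum(c * x**e for e, c in enumerate(coeffs)) % P) for x in range(1, 8)]
    points[1] = (points[1][0], (points[1][1] + 5) % P)   # byzantine DB 2
    points[4] = (points[4][0], (points[4][1] + 9) % P)   # byzantine DB 5
    assert naive_robust_recover(points, t, max_faulty=2) == secret
```

The more responses are corrupted, the more candidate subsets contain a bad share and must be rejected, so the search tends to run longer; dedicated decoders such as the one in [33] are far more efficient than this exhaustive search, but their cost still grows with the number of errors.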

As for $\tau$ - $\mathit{LP\mathchar 45\relax Goldberg}$ , the $\tau$ -independence extension has no impact on the processing time of the $\mathit{DB}$ s, and should not affect the $\mathit{SU}$ s either as long as $\mathit{t}{}+\tau$ remains constant. Consequently, $\mathit{PU}$ s and $\mathit{SU}$ s can seek the maximum privacy levels for their data and queries, subject to $\mathit{t}{}+\tau<\mathit{k}$ , as reflected in Figure 9. The $\mathit{SU}$ ’s processing time, however, grows linearly in $\mathit{t}{}+\tau$ , similar to Figure 7(a).
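A minimal sketch of why the $\mathit{SU}$ 's cost depends on $\mathit{t}+\tau$ , assuming (as in Goldberg's $\tau$ -independence technique) that each $\mathit{DB}$ stores only degree- $\tau$ Shamir shares of $\bm{\mathit{D}}$ instead of $\bm{\mathit{D}}$ itself. The product of a degree- $\mathit{t}$ query share and a degree- $\tau$ data share lies on a polynomial of degree $\mathit{t}+\tau$ , so $\mathit{t}+\tau+1$ responses are needed. The field, evaluation points, and function names are illustrative assumptions.

```python
import random

P = 2**31 - 1  # illustrative prime field GF(P)


def share(secret: int, degree: int, xs: list[int]) -> list[int]:
    """Shamir-share secret with a random polynomial of the given degree."""
    coeffs = [secret] + [random.randrange(P) for _ in range(degree)]
    return [sum(c * pow(x, e, P) for e, c in enumerate(coeffs)) % P for x in xs]


def lagrange_at_zero(points: list[tuple[int, int]]) -> int:
    """Value at x = 0 of the polynomial through points, over GF(P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * xj % P
                den = den * (xj - xi) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def tau_independent_retrieve(db: list[list[int]], beta: int, t: int, tau: int,
                             xs: list[int]) -> list[int]:
    """Retrieve db[beta] when each DB stores only degree-tau shares of db."""
    ell, r, words = len(xs), len(db), len(db[0])
    if t + tau + 1 > ell:
        raise ValueError("need t + tau < number of responding DBs")
    # stored[i][w][j]: DB j's share of word w of record i (degree tau).
    stored = [[share(word, tau, xs) for word in record] for record in db]
    # query[i][j]: DB j's share of the i-th coordinate of e_beta (degree t).
    query = [share(int(i == beta), t, xs) for i in range(r)]
    # DB j computes a dot product using only its own shares.
    answers = [[sum(query[i][j] * stored[i][w][j] for i in range(r)) % P
                for w in range(words)] for j in range(ell)]
    # Each answer lies on a polynomial of degree t + tau: t + tau + 1 answers suffice.
    used = range(t + tau + 1)
    return [lagrange_at_zero([(xs[j], answers[j][w]) for j in used]) for w in range(words)]


if __name__ == "__main__":
    xs = list(range(1, 7))                      # six DBs
    db = [[random.randrange(P) for _ in range(3)] for _ in range(12)]
    assert tau_independent_retrieve(db, beta=5, t=2, tau=2, xs=xs) == db[5]
```

The $\mathit{DB}$ work is the same dot product regardless of $\mathit{t}$ and $\tau$ , while the $\mathit{SU}$ interpolates over $\mathit{t}+\tau+1$ points, matching the dependence on $\mathit{t}+\tau$ described above.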

For mobile $\mathit{SU}$ s, we compare batching multiple queries for the future locations of an $\mathit{SU}$ against sending separate consecutive queries, using $\mathit{LP\mathchar 45\relax Goldberg}$ , SealPIR, and XPIR, as depicted in Figure 10. Batching mainly reduces the computation on the $\mathit{DB}$ side and thereby reduces the end-to-end delay for answering the queries of the moving $\mathit{SU}$ .
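One way a $\mathit{DB}$ can benefit from a batch is sketched below, assuming that batching lets it stack the $m$ query vectors of a mobile $\mathit{SU}$ (one per predicted location) into a matrix $Q$ and compute $Q\cdot\bm{\mathit{D}}$ in a single pass over the database. This is an illustrative assumption, not the paper's exact batching procedure, and the function names are hypothetical.

```python
import random

P = 2**31 - 1  # illustrative prime field GF(P)


def answer_one(db: list[list[int]], query: list[int]) -> list[int]:
    """One query: a full scan of D computing query * D over GF(P)."""
    words = len(db[0])
    return [sum(q * rec[w] for q, rec in zip(query, db)) % P for w in range(words)]


def answer_batch(db: list[list[int]], queries: list[list[int]]) -> list[list[int]]:
    """m stacked queries Q: compute Q * D while reading each record of D once."""
    words = len(db[0])
    out = [[0] * words for _ in queries]
    for i, record in enumerate(db):          # single pass over the database
        for row, q in zip(out, queries):
            coeff = q[i]
            if coeff:
                for w in range(words):
                    row[w] = (row[w] + coeff * record[w]) % P
    return out


if __name__ == "__main__":
    db = [[random.randrange(P) for _ in range(4)] for _ in range(50)]
    # Hypothetical: one (already secret-shared) query vector per predicted location.
    batch = [[random.randrange(P) for _ in range(len(db))] for _ in range(5)]
    assert answer_batch(db, batch) == [answer_one(db, q) for q in batch]
```

In this naive form, the number of multiplications is unchanged. The gain comes from reading $\bm{\mathit{D}}$ once instead of $m$ times and, in optimized implementations, from using fast matrix-matrix multiplication routines.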

We also evaluate the $\mathit{DB}$ -side benefit of $\mathit{RAID\mathchar 45\relax LP\mathchar 45\relax Chor}$ , which partitions the database content among the $\mathit{DB}$ s instead of simply replicating it, for several values of the redundancy parameter $\mathit{\pi}$ . As expected, $\mathit{\pi}{}=2$ yields the best performance, but it also offers the lowest resistance to collusion. Setting $\mathit{\pi}{}$ equal to $\mathit{\ell}$ is equivalent to the original $\mathit{LP\mathchar 45\relax Chor}$ scheme, which offers the highest resistance to collusion at the highest cost. Therefore, $\mathit{RAID\mathchar 45\relax LP\mathchar 45\relax Chor}$ offers a performance-privacy tradeoff controlled by the redundancy parameter $\mathit{\pi}{}$ .
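A minimal sketch of the partitioning idea follows the RAID-PIR layout as we understand it, which may differ in detail from $\mathit{RAID\mathchar 45\relax LP\mathchar 45\relax Chor}$ . The database is split into $\mathit{\ell}$ equal chunks, and $\mathit{DB}$ $j$ stores the $\mathit{\pi}$ consecutive chunks $j,\dots,j+\mathit{\pi}-1$ (mod $\mathit{\ell}$ ). For every chunk, the $\mathit{\pi}$ holders receive random bit vectors that XOR to $e_{\beta}$ restricted to that chunk. Query vectors are sent in full here (RAID-PIR also compresses them with a seeded generator, which is omitted), and all names are hypothetical.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def layout(ell: int, pi: int) -> list[list[int]]:
    """Chunks stored by each DB: DB j holds chunks j, j+1, ..., j+pi-1 (mod ell)."""
    return [[(j + c) % ell for c in range(pi)] for j in range(ell)]


def raid_query(chunk_len: int, ell: int, pi: int, beta: int) -> list[dict[int, list[int]]]:
    """For every chunk, the pi DBs holding it get random bit vectors that XOR to
    e_beta restricted to that chunk (all zeros unless the chunk contains beta)."""
    holders: dict[int, list[int]] = {c: [] for c in range(ell)}
    for j, chunks in enumerate(layout(ell, pi)):
        for c in chunks:
            holders[c].append(j)
    queries: list[dict[int, list[int]]] = [{} for _ in range(ell)]
    for c, dbs in holders.items():
        acc = [0] * chunk_len
        if beta // chunk_len == c:
            acc[beta % chunk_len] = 1
        for j in dbs[:-1]:                      # pi - 1 uniformly random shares
            v = [secrets.randbits(1) for _ in range(chunk_len)]
            queries[j][c] = v
            acc = [a ^ b for a, b in zip(acc, v)]
        queries[dbs[-1]][c] = acc               # last share fixes the XOR
    return queries


def raid_answer(chunks: dict[int, list[bytes]], query: dict[int, list[int]]) -> bytes:
    """DB side: XOR of the selected records over its pi chunks (pi * r / ell records)."""
    block_len = len(next(iter(chunks.values()))[0])
    answer = bytes(block_len)
    for c, bits in query.items():
        for record, bit in zip(chunks[c], bits):
            if bit:
                answer = xor_bytes(answer, record)
    return answer


if __name__ == "__main__":
    ell, pi, chunk_len, beta = 6, 3, 8, 29
    db = [secrets.token_bytes(16) for _ in range(ell * chunk_len)]   # r = 48
    parts = [db[c * chunk_len:(c + 1) * chunk_len] for c in range(ell)]
    stored = [{c: parts[c] for c in chunks} for chunks in layout(ell, pi)]
    queries = raid_query(chunk_len, ell, pi, beta)
    answers = [raid_answer(stored[j], queries[j]) for j in range(ell)]
    assert reduce(xor_bytes, answers) == db[beta]
```

Each $\mathit{DB}$ scans only $\mathit{\pi}\,\mathit{r}/\mathit{\ell}$ records, and any $\mathit{\pi}-1$ colluding $\mathit{DB}$ s see only uniformly random bits for every chunk. Setting $\mathit{\pi}=\mathit{\ell}$ makes every $\mathit{DB}$ hold and scan everything, which recovers plain Chor.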

In terms of communication overhead, most approaches, including ours, have a cost linear in the number of records in the database, as shown in Table III. What really differentiates these schemes' communication overheads is the associated constant factor, which can be very large for some protocols. Based on our experiment and the expressions in Table III, we plot in Figure 12 the communication overhead that the $\mathit{CRN}$ experiences for each private spectrum availability query issued by an $\mathit{SU}$ under the different schemes. The scheme with the lowest communication overhead is that of Troja et al. [19], especially for a large number of records, thanks to its use of the $\mathit{PIR}$ of Gentry et al. [35], the most communication-efficient single-server protocol in the literature, with constant communication overhead. However, this scheme is computationally expensive, like most existing $\mathit{cPIR}$ -based approaches, as shown in Figure 4. $\mathit{RAID\mathchar 45\relax LP\mathchar 45\relax Chor}$ is the second-best scheme in terms of communication overhead, followed by $\mathit{LP\mathchar 45\relax Chor}$ , and both also provide information-theoretic privacy. $\mathit{RAID\mathchar 45\relax LP\mathchar 45\relax Chor}$ is significantly more efficient than $\mathit{LP\mathchar 45\relax Chor}$ , which again shows the overhead benefit of distributing the spectrum availability information among multiple $\mathit{DB}$ s. Figure 12 also shows that $\mathit{LP\mathchar 45\relax Chor}$ incurs much lower communication overhead than $\mathit{LP\mathchar 45\relax Goldberg}$ thanks to the simplicity of the underlying Chor $\mathit{PIR}$ protocol. However, as discussed earlier, $\mathit{LP\mathchar 45\relax Goldberg}$ provides additional security features compared to $\mathit{LP\mathchar 45\relax Chor}$ .
SealPIR has a relatively high communication overhead, especially for smaller databases, but its overhead becomes comparable to that of $\mathit{LP\mathchar 45\relax Chor}$ as the database grows, as shown in Figure 12. SealPIR could therefore be a good alternative to the $\mathit{cPIR}$ schemes previously used in $\mathit{CRN}$ s, especially since it introduces much lower latency, which is critical in this context. Still, the proposed approaches perform better and also provide information-theoretic privacy to $\mathit{SU}$ s, which shows their practicality in real-world settings.
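The exact expressions of Table III are not reproduced here. As a rough back-of-the-envelope sketch, assuming the textbook Chor cost (an $\mathit{r}$ -bit query to each of the $\mathit{\ell}$ $\mathit{DB}$ s and one $\mathit{b}$ -byte block back from each) and a RAID-PIR-style layout without seed compression (each $\mathit{DB}$ receives only $\mathit{\pi}\,\mathit{r}/\mathit{\ell}$ query bits), the two costs compare as follows. The function names are hypothetical, and protocol headers are ignored.

```python
def chor_comm_bytes(r: int, b: int, ell: int) -> int:
    """Textbook Chor PIR: an r-bit query to each of ell DBs, one b-byte block back from each."""
    return ell * ((r + 7) // 8 + b)


def raid_comm_bytes(r: int, b: int, ell: int, pi: int) -> int:
    """RAID-style layout, no seed compression: each DB gets pi * r / ell query bits."""
    query_bytes = (pi * r + 8 * ell - 1) // (8 * ell)   # ceil(pi * r / (8 * ell))
    return ell * (query_bytes + b)


if __name__ == "__main__":
    b, ell = 560, 6
    for r in (10_000, 100_000, 1_000_000):
        print(r, chor_comm_bytes(r, b, ell), raid_comm_bytes(r, b, ell, pi=2))
```

Both costs are linear in $\mathit{r}$ ; the partitioned layout only shrinks the constant factor by $\mathit{\pi}/\mathit{\ell}$ on the query side, and $\mathit{\pi}=\mathit{\ell}$ gives back exactly the Chor cost.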

V Related Work

There are other approaches that address the location privacy issue in database-driven $\mathit{CRN}$ s; however, for the reasons below, we did not include them in our performance analysis. For instance, Zhang et al. [17] rely on the concept of k-anonymity: each $\mathit{SU}$ queries $\mathit{DB}$ by sending a square cloaking region that includes its actual location. k-anonymity guarantees that the $\mathit{SU}$ ’s location is indistinguishable among a set of $k$ points. This can be achieved with dummy locations, by generating $k-1$ properly selected dummy points and performing $k$ queries to $\mathit{DB}$ using the real and dummy locations. Their approach relies on a tradeoff between location privacy and some utility measure, so achieving a high location privacy level decreases spectrum utility. Moreover, k-anonymity-based approaches cannot achieve high location privacy without incurring substantial communication/computation overhead. Furthermore, a recent study led by Sprint and Technicolor [25] has shown that anonymization-based techniques are not effective at providing location privacy guarantees and may even leak some location information. Grissa et al. [54, 21] propose an information-theoretic approach, LPDB, which can be considered a variant of the trivial $\mathit{PIR}$ solution. LPDB uses set-membership probabilistic data structures (filters) to compress the content of the database and sends it to the $\mathit{SU}$ , which then tries several combinations of channels and transmission parameters to check whether they exist in the data structure. However, LPDB is only suitable when the structure of the database is known to $\mathit{SU}$ s, which is not always realistic.
Also, LPDB relies on probabilistic data structures, which makes it prone to false positives that can lead to erroneous spectrum availability decisions and cause interference to $\mathit{PU}$ transmissions. Zhang et al. [20] rely on the $\epsilon$ -geo-indistinguishability mechanism [55], derived from differential privacy, to protect the bilateral location privacy of both $\mathit{PU}$ s and $\mathit{SU}$ s, which is different from what we aim to achieve in this paper. This mechanism helps $\mathit{SU}$ s obfuscate their location; however, it adds noise to the $\mathit{SU}$ ’s location, which may degrade the accuracy of the retrieved spectrum availability information.

VI Conclusion

In this paper, building on the key observation that database-driven $\mathit{CRN}$ s contain multiple synchronized $\mathit{DB}$ s with the same content, we harnessed multi-server $\mathit{PIR}$ techniques to achieve optimal location privacy for both $\mathit{SU}$ s and $\mathit{PU}$ s, across different use cases, with high efficiency. Our analytical and experimental analysis indicates that our adaptation of multi-server $\mathit{PIR}$ to database-driven $\mathit{CRN}$ s achieves end-to-end delays orders of magnitude lower than the fastest state-of-the-art single-server $\mathit{PIR}$ adaptation, while offering an information-theoretic privacy guarantee. Given the demonstrated benefits of multi-server $\mathit{PIR}$ approaches, which incur no extra architectural overhead on database-driven $\mathit{CRN}$ s, we hope this work will encourage the research community to consider this direction when designing location privacy preservation protocols for $\mathit{CRN}$ s.

Acknowledgment

This work was supported in part by the US National Science Foundation under NSF awards CNS-1162296 and CNS-1652389.

Bibliography

[1] J. Mitola and G. Q. Maguire, “Cognitive radio: making software radios more personal,” IEEE Personal Communications, vol. 6, no. 4, pp. 13–18, 1999.
[2] Z. Gao, H. Zhu, Y. Liu, M. Li, and Z. Cao, “Location privacy in database-driven cognitive radio networks: Attacks and countermeasures,” in Proc. IEEE INFOCOM, 2013, pp. 2751–2759.
[3] V. Chen, S. Das, L. Zhu, J. Malyar, and P. McCann, “Protocol to access white-space (PAWS) databases,” Tech. Rep., 2015.
[4] “Google spectrum database,” https://www.google.com/get/spectrumdatabase/, accessed: 2017-04-14.
[5] “iconectiv white spaces database,” https://spectrum.iconectiv.com/main/home/, accessed: 2017-04-14.
[6] “Microsoft white spaces database,” http://whitespaces.microsoftspectrum.com/, accessed: 2017-04-14.
[7] A. Mancuso, S. Probasco, and B. Patil, “Protocol to access white-space (PAWS) databases: Use cases and requirements,” Tech. Rep., 2013.
[8] M. Massaro, “Next generation of radio spectrum management: Licensed shared access for 5G,” Telecommunications Policy, vol. 41, no. 5-6, pp. 422–433, 2017.
