Utility-aware and privacy-preserving mobile query services

Emre Yigitoglu; Mehmet Emre Gursoy; Ling Liu

arXiv:1907.06778·cs.CR·July 17, 2019

TL;DR

This paper introduces StarCloak, a novel privacy-preserving system for mobile location queries on road networks that balances user privacy, utility, and attack resilience, while ensuring scalability and efficiency.

Contribution

StarCloak is the first to integrate user-defined privacy, utility constraints, and attack-resilience using star-based cloaking graphs on road networks.

Findings

1. StarCloak improves query success rate and throughput.

2. It reduces anonymization time and network usage.

3. StarCloak demonstrates higher attack-resilience compared to existing methods.

Abstract

Location-based queries enable fundamental services for mobile road network travelers. While the benefits of location-based services (LBS) are numerous, exposure of mobile travelers' location information to untrusted LBS providers may lead to privacy breaches. In this paper, we propose StarCloak, a utility-aware and attack-resilient approach to building a privacy-preserving query system for mobile users traveling on road networks. StarCloak has several desirable properties. First, StarCloak supports user-defined k-user anonymity and l-segment indistinguishability, along with user-specified spatial and temporal utility constraints, for utility-aware and personalized location privacy. Second, unlike conventional solutions which are indifferent to underlying road network structure, StarCloak uses the concept of stars and proposes cloaking graphs for effective location cloaking on road…

Tables

Table 1. Default parameter settings used in our experiments

Parameter	$k$	$δ_{k}$	$δ_{l}$	$σ_{s}$	$σ_{t}$	$γ$	$λ$	$α$
Mean	$5$	$5$	$5$	$4$	$10$	$20$	$1$	$2$
Deviation	$1$	$1.5$	$1.5$	$1$	$2$	$2$	$0$	$0$

Equations

$$BV(S)=\{v \mid \exists w\in(V_{G}\setminus V_{S})\ \text{s.t.}\ (v,w)\in E_{G}\}$$

$$\textit{like}[S \mid u\leftarrow s,K_{ad}]=\frac{|S^{\prime}\cap S|}{|S|}$$

$$\textit{like}^{c}[S \mid u\leftarrow s,K_{ad}]=\sum_{i=1}^{\left(\!\!\binom{S}{k-1}\!\!\right)}\text{Pr}[s]\cdot\text{Pr}[m_{i}]\cdot\textit{like}[S \mid u\leftarrow s,m_{i},K_{ad}]$$

$$\text{link}[u\leftarrow s^{*} \mid S,K_{ad}]=\frac{\textit{like}^{c}[S \mid u\leftarrow s^{*},K_{ad}]}{\sum_{s\in S}\textit{like}^{c}[S \mid u\leftarrow s,K_{ad}]}$$

$$\mathcal{R}(q,s)\subseteq\mathcal{O}_{s}(q,s)\cup\mathcal{O}_{v}(q,v^{s}_{b})\cup\mathcal{O}_{v}(q,v^{s}_{e})$$

$$\mathcal{R}(q,S)\subseteq\Big(\bigcup_{s\in S}\mathcal{O}_{s}(q,s)\Big)\cup\Big(\bigcup_{v\in BV(S)}\mathcal{O}_{v}(q,v)\Big)$$

$$cost_{eval}(q,S)=\mathcal{C}_{s}\cdot|S|+\mathcal{C}_{v}\cdot|BV(S)|$$

$$|\mathcal{R}(q,S)|\leq res\_size\cdot|BV(S)|+\sum_{s\in S}\sum_{e\in s}|\mathcal{O}_{e}(q,e)|$$

$$cost_{comm}(q,S)=\mathcal{C}_{o}\cdot\Big[res\_size\cdot|BV(S)|+\rho_{o}\cdot\sum_{s\in S}\sum_{e\in s}|e|\Big]$$

$$cost(q,S)=\beta\cdot cost_{comm}(q,S)+(1-\beta)\cdot cost_{eval}(q,S)$$

$$\phi_{min}$$

$$\forall s\in AS,\ \exists\Phi\in\phi,\ s\leftarrow\Phi$$

$$\langle\delta_{k}^{v},\delta_{l}^{v},\sigma_{s}^{v}\rangle:=\Big\langle\max_{q\in v.Q}\delta_{k}^{q},\ \max_{q\in v.Q}\delta_{l}^{q},\ \min_{q\in v.Q}\sigma_{s}^{q}\Big\rangle$$

$$\forall v\in NS:\ \Big[\delta_{k}^{v}\leq\sum_{\hat{v}\in NS}|\hat{v}.Q|\Big]\ \land\ \Big[\delta_{l}^{v}\leq|seg(\bigcap_{\hat{v}\in NS}\hat{v}.\varTheta)|\Big]$$

$$\vartheta=\bigcap_{v\in NS}v.\Theta$$

$$H(S)=-\sum_{s\in S}\text{link}[u\leftarrow s]\cdot\log_{2}(\text{link}[u\leftarrow s])$$

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Vehicular Ad Hoc Networks (VANETs) · Internet Traffic Analysis and Secure E-voting

Full text

Utility-Aware and Privacy-Preserving Mobile Query Services

Emre Yigitoglu, Mehmet Emre Gursoy, and Ling Liu

Emre Yigitoglu, Mehmet Emre Gursoy, and Ling Liu are with the School of Computer Science, Georgia Institute of Technology, Atlanta, GA, 30332.

E-mail: {eyigitoglu,memregursoy}@gatech.edu, [email protected]

Abstract

Location-based queries enable fundamental services for mobile road network travelers. While the benefits of location-based services (LBS) are numerous, exposure of mobile travelers’ location information to untrusted LBS providers may lead to privacy breaches. In this paper, we propose StarCloak, a utility-aware and attack-resilient approach to building a privacy-preserving query system for mobile users traveling on road networks. StarCloak has several desirable properties. First, StarCloak supports user-defined $k$ -user anonymity and $l$ -segment indistinguishability, along with user-specified spatial and temporal utility constraints, for utility-aware and personalized location privacy. Second, unlike conventional solutions which are indifferent to underlying road network structure, StarCloak uses the concept of stars and proposes cloaking graphs for effective location cloaking on road networks. Third, StarCloak achieves strong attack-resilience against replay and query injection-based attacks through randomized star selection and pruning. Finally, to enable scalable query processing with high throughput, StarCloak makes cost-aware star selection decisions by considering query evaluation and network communication costs. We evaluate StarCloak on two real-world road network datasets under various privacy and utility constraints. Results show that StarCloak achieves improved query success rate and throughput, reduced anonymization time and network usage, and higher attack-resilience in comparison to XStar, its most relevant competitor.

Index Terms:

Privacy, location privacy, location-based services, road networks, mobile query services

1 Introduction

The growth of location-based services (LBSs) is fueled by ubiquitous wireless connectivity, the universal presence of smart mobile devices with multi-modal sensing capability, and increased investment from industry and government in the Internet of Things. Juniper Research [1] forecast the LBS market to reach \$43.3 billion in revenue in 2019, rising from an estimated \$12.2 billion in 2014. According to [2], 74% of adult smartphone owners use their phones to get directions or information based on their current location. As more and more mobile travelers and vehicles are connected continuously and automatically, they benefit from life-enriching location-based experiences and services, including but not limited to improved emergency assistance, real-time traffic alerts, and location recommendations.

While there is ongoing research on answering queries and providing services for mobile users traveling on road networks [3, 4, 5, 6], users’ location privacy remains an important concern. Unauthorized location exposure may leave users vulnerable to abuse such as unwanted advertising, stalking, and location spoofing. In addition, when the private location data of a mobile user is linked to sensitive public locations such as health clinics, cancer treatment centers, nightclubs, or religious organizations, such unauthorized linkage may pose ethical, professional, and social risks both to individuals and to society at large. As a result, it becomes imperative to protect road network travelers’ location privacy as they interact with third-party LBS providers via service queries.

One viable approach to protecting the location privacy of road network travelers is location anonymization through obfuscating or cloaking the mobile user’s actual location. A practical anonymization framework should consider multiple aspects. First, the road network structure must be taken into account during anonymization, both for effective privacy protection and for efficient query processing with anonymized locations. Second, the framework should support user-defined, personalized privacy goals such as $k$-user anonymity and $l$-segment indistinguishability. Third, anonymization should incur as little utility loss as possible; in particular, if there are any utility constraints such as a maximum spatial cloaking region size or a maximum tolerable time delay in query response, the anonymization framework should satisfy these constraints. Together, these properties enable the anonymization framework to be flexible in serving users with different privacy and utility needs. Fourth, the anonymization framework should be resilient to replay or query injection-based inference attacks. A sophisticated adversary who observes a cloaked region should not be able to infer the user’s true location. Finally, the anonymization framework should have low communication and I/O cost, i.e., anonymized cloaked locations should be compact enough to be sent through a wireless network without much network overhead, and they should be usable without increased processing effort.

To meet these goals, we propose and develop StarCloak, a utility-aware and attack-resilient approach to building a privacy-preserving location query system for mobile users traveling on road networks. StarCloak relies on optimized data structures and algorithms for effectively and efficiently determining cloaked regions for incoming queries, such as the star, star graph, and cloaking graph data structures. StarCloak maintains its internal data structures as new queries are processed, and generates candidate star-sets as cloaked regions when it identifies that certain users’ queries can be successfully served. StarCloak’s candidate star-set pruner, which is implemented with high parallelism, enables pruning of candidate star-sets to generate low-cost cloaked subgraphs with improved attack-resilience via randomized pruning. In addition, we also propose two variants of StarCloak, namely spatially bounded StarCloak and hybrid StarCloak, for generating more compact cloaked regions with negligible sacrifice in query success rate and throughput.

We evaluate StarCloak and its variants through extensive experiments on real-world Georgia and California road networks of different scales, under varying privacy and utility constraints. We also compare StarCloak with two baseline anonymization approaches (random sampling and network expansion) as well as XStar [7], which is the most relevant work to ours in the literature. Results show that StarCloak offers significantly improved query success rate and throughput. Furthermore, compared to XStar, StarCloak achieves substantially reduced anonymization time and network bandwidth usage, as well as improved resilience to inference attacks.

2 StarCloak Overview and Concepts

StarCloak can be viewed as a trusted location anonymization service. It forms a middle layer between mobile users and their untrusted LBS providers. Assume that user Alice issues a service query while she is moving on a road segment. Without StarCloak, Alice’s device directly sends her query with her true current location to an untrusted LBS provider, which executes the query based on Alice’s location and sends the results to Alice’s device. However, if Alice is using StarCloak, StarCloak will first compute an anonymized location for Alice and replace her true location with the anonymized location, transparently to Alice, before the query is sent to the untrusted LBS provider.

Figure 1 illustrates the reference architecture. Let $q$ denote the original query of mobile user $u$ . When $u$ issues query $q$ with their true location, the location and query are intercepted by the location anonymization engine. The engine transforms $u$ ’s true location to a cloaked location $S$ while meeting the personalized privacy and utility profile of $u$ . Next, the engine relays the anonymized location and query to the LBS provider. The LBS provider computes a candidate result, and the candidate result is received by the location anonymization engine. Since the cloaked location often has lower resolution than the actual location to meet privacy goals, the candidate result received by the anonymization engine may contain false positives. The anonymization engine performs post-processing of results to filter false positives. Finally, the anonymization engine delivers the exact query answer to $u$ .

This section presents an overview of StarCloak and describes its privacy, utility, attack, and cost models. StarCloak assumes that mobile users travel on spatially constrained road networks or walk paths. Thus, we first introduce the basic models for road networks, location privacy, inference attacks, and query costs, followed by our problem statement combining these models.

2.1 Road Network Model

We represent a road network as an undirected graph $G=\langle V_{G},E_{G}\rangle$ with the node set $V_{G}$ denoting road junctions and edge set $E_{G}$ denoting road segments, respectively. Each road segment connects a pair of junctions. Figure 2 illustrates a road network. We use $d_{G}(v)$ to denote the degree of a node $v$ with respect to $G$ , $d_{G}(v)=|\{w|(v,w)\in E_{G}\}|$ . We call $v$ an intersection node if $d_{G}(v)\geq 3$ . For example, in Figure 2, $v_{5}$ is an intersection node.

An anonymized location in the road network can be represented as a subgraph. Border nodes are nodes that connect a subgraph $S$ to the remainder of the main graph $G$ .

Definition 1 (Subgraph).

$S$ is a subgraph of road network $G$, denoted by $S=\langle V_{S},E_{S}\rangle$, if and only if $V_{S}\subset V_{G}$ and $E_{S}\subset E_{G}$.

Definition 2 (Border Node).

Let $S$ denote a subgraph of $G$. The set of border nodes of $S$, denoted $BV(S)$, consists of the nodes of $S$ that have at least one incident edge that is in $E_{G}$ but not in $E_{S}$. Formally:

$$BV(S)=\{v \mid \exists w\in(V_{G}\setminus V_{S})\ \text{s.t.}\ (v,w)\in E_{G}\}$$

Equivalently, border nodes are those nodes $v$ in $S$ that satisfy the condition $d_{G}(v)>d_{S}(v)$. As an example, we can construct a subgraph $S$ in Figure 2 as $V_{S}=\{v_{2},v_{4},v_{5},v_{6},v_{7},v_{10}\}$ and $E_{S}=\{(v_{4},v_{5}),(v_{5},v_{10}),(v_{5},v_{6}),(v_{6},v_{7}),(v_{2},v_{6})\}$. Then, it holds that $BV(S)=\{v_{2},v_{4},v_{7},v_{10}\}$. A sequence of edges $(v_{0},v_{1}),\ldots,(v_{i},v_{i+1}),\ldots,(v_{L-1},v_{L})$, where all $v_{i}$ are unique and satisfy the conditions $d_{G}(v_{0})\geq 3$, $d_{G}(v_{L})\geq 3$, and $d_{G}(v_{i})=2$ for $0<i<L$, constitutes a segment denoted by $\overline{v_{0}v_{L}}$.
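As an illustration of Definition 2, the following Python sketch computes border nodes through the degree condition $d_{G}(v)>d_{S}(v)$. Figure 2 is not reproduced here, so the edges leaving the subgraph (to the nodes `x0` to `x7`) are hypothetical and chosen only so that the result matches the example above; the function names are likewise illustrative and not part of StarCloak.

```python
from collections import defaultdict

def degrees(edges):
    """Degree of every node in an undirected edge list."""
    deg = defaultdict(int)
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

def intersection_nodes(graph_edges):
    """Nodes v with d_G(v) >= 3."""
    return {v for v, d in degrees(graph_edges).items() if d >= 3}

def border_nodes(graph_edges, sub_nodes, sub_edges):
    """Nodes of S whose degree in G exceeds their degree in S."""
    d_g = degrees(graph_edges)
    d_s = degrees(sub_edges)
    return {v for v in sub_nodes if d_g[v] > d_s[v]}

# Subgraph S from the example in the text.
V_S = {"v2", "v4", "v5", "v6", "v7", "v10"}
E_S = [("v4", "v5"), ("v5", "v10"), ("v5", "v6"), ("v6", "v7"), ("v2", "v6")]

# Hypothetical remainder of G: two outgoing edges per outer node of S.
outer = ["v2", "v2", "v4", "v4", "v7", "v7", "v10", "v10"]
E_G = E_S + [(v, f"x{i}") for i, v in enumerate(outer)]

BV = border_nodes(E_G, V_S, E_S)
```

Under these assumptions `BV` contains `v2`, `v4`, `v7`, and `v10`, while `v5` and `v6` are interior because all of their incident edges belong to $E_{S}$.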

2.2 Utility-Aware Location Privacy Model

StarCloak enforces location privacy for mobile travelers while considering privacy and utility metrics simultaneously. It supports personalized location $k$ -user anonymity and $l$ -segment indistinguishability, such that instead of using a system-supplied fixed $k$ or $l$ for all users and queries [8], it achieves high versatility via user-specified privacy needs and specifications [9]. In addition, we introduce two utility metrics to capture location utility constraints: maximum spatial and temporal cloaking resolutions. These utility metrics constrain and regulate StarCloak so that it performs anonymization while meeting the spatial and temporal tolerances.

StarCloak performs location anonymization via cloaking, a process that transforms the user’s exact location into a cloaked region with lower resolution in order to satisfy user-defined privacy requirements. The goal is to choose the cloaked region with as little utility loss and query cost as possible. We start by formalizing the privacy notions: $k$-user anonymity and $l$-segment indistinguishability. $k$-user anonymity protects user $u$’s location by “hiding $u$ in a crowd”, i.e., ensuring that at least $k-1$ other users in the vicinity of $u$ report the same cloaked location. We observe that $k$-anonymity alone is not sufficient to prevent the linkage of user $u$ with a sensitive public location or road segment, since the cloaked $k$-anonymized region may lack sufficient segment diversity, e.g., it may contain only a single road segment. This motivates the proposal of $l$-segment indistinguishability.

Definition 3 ( $k$ -user anonymity).

An anonymized location $S$ (subgraph of road network $G$ ) is said to satisfy $k$ -user anonymity, if at least $k$ active users report $S$ .

Definition 4 ( $l$ -segment indistinguishability).

An anonymized location $S$ is said to satisfy $l$ -segment indistinguishability, if it contains at least $l$ different road segments and any one segment could be plausibly associated with a user reporting $S$ .

In StarCloak, a query $q$ is allowed to specify a custom privacy requirement as $(\delta^{q}_{k},\delta^{q}_{l})$ , such that $\delta^{q}_{k}\geq 1$ is the desired $k$ -user anonymity level and $\delta^{q}_{l}\geq 1$ is the desired $l$ -segment indistinguishability level.

A trivial approach to achieving maximum protection would be to assign the whole road network $G$ as the anonymized location. However, this approach clearly provides weak utility and low quality of service. Hence, we incorporate spatial and temporal cloaking resolutions as utility constraints. The spatial constraint $\sigma_{s}$ bounds the spatial resolution of the anonymized location. This is necessary so that anonymized locations are not arbitrarily large. The temporal constraint $\sigma_{t}$ bounds the maximum time delay resulting from anonymization. This is necessary so that the query-issuing user receives a response in a timely manner.

Definition 5 (Query profile).

For user $u$ with query $q$ , we denote by $(\delta^{q}_{k},\delta^{q}_{l},\sigma^{q}_{s},\sigma^{q}_{t})$ the complete service profile of $q$ , where $\delta^{q}_{k},\delta^{q}_{l}$ are the privacy parameters and $\sigma^{q}_{s},\sigma^{q}_{t}$ are the utility parameters.

We expect StarCloak to operate in an environment with diverse user and query profiles and diverse road conditions. It is sometimes possible, e.g., due to low traffic density or few active users in the system at night, that the desired $\delta^{q}_{k}$-anonymity level is unachievable under strict $\sigma^{q}_{s}$ and $\sigma^{q}_{t}$ constraints for some query $q$. In such cases, where $q$ cannot be serviced, it is discarded (dropped).
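The query profile of Definition 5 and the checks implied by Definitions 3 and 4 can be sketched in Python as follows. This is a minimal illustration, not StarCloak's implementation: the class and function names are hypothetical, and the units of the spatial size and of the delay are left to the caller because the text does not fix them at this point. The profile values below are the default means from Table 1.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class QueryProfile:
    delta_k: int    # desired k-user anonymity level (>= 1)
    delta_l: int    # desired l-segment indistinguishability level (>= 1)
    sigma_s: float  # maximum spatial cloaking resolution
    sigma_t: float  # maximum tolerable anonymization delay

def satisfies_privacy(profile, num_users, num_segments):
    """True if the cloaked region is shared by enough users and spans enough segments."""
    return num_users >= profile.delta_k and num_segments >= profile.delta_l

def satisfies_utility(profile, spatial_size, delay):
    """True if the cloaked region and the incurred delay stay within the user's tolerances."""
    return spatial_size <= profile.sigma_s and delay <= profile.sigma_t

def can_serve(profile, num_users, num_segments, spatial_size, delay):
    """A query is served only if privacy and utility hold together; otherwise it is dropped."""
    return (satisfies_privacy(profile, num_users, num_segments)
            and satisfies_utility(profile, spatial_size, delay))

# Default means from Table 1: delta_k = 5, delta_l = 5, sigma_s = 4, sigma_t = 10.
profile = QueryProfile(delta_k=5, delta_l=5, sigma_s=4, sigma_t=10)
```

For instance, a region with only four users fails `satisfies_privacy` for this profile regardless of how many segments it covers, which corresponds to the night-time scenario described above.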

2.3 Inference Attack Models

An adversary may run sophisticated inference attacks with the goal of estimating, for each segment $s$ in the anonymized location $S$, the probability that $s$ is the user’s actual segment. From an attack-resilience perspective, the ideal case is when the association of the mobile user with the segments in $S$ follows a uniform distribution (with equal probability $1/|S|$). In order to formalize an adversary’s association power, we use the notion of linkability [7].

Definition 6 (Linkability).

For user $u$ with anonymized location $S$, the linkability of $u$ with a specific segment $s^{*}\in S$ is the probability that the adversary associates $u$ with $s^{*}$ based on adversarial background knowledge $K_{ad}$, denoted as $\text{link}[u\leftarrow s^{*}|S,K_{ad}]$.

The background knowledge considered here includes knowledge of the location anonymization algorithm, underlying road network structure, and estimation of overall query cost (Sec. 2.4).

In a general replay attack, the adversary observes the anonymized location as a set of segments $S$ and attempts to reverse-engineer it using knowledge of the anonymization algorithm. Specifically, the adversary re-runs the anonymization algorithm, denoted $\mathcal{A}(\cdot)$, for each segment $s\in S$ that could potentially be the mobile user’s actual location. The similarity between $S$ and the output $S^{\prime}$ generated by $\mathcal{A}$ is used to estimate the likelihood of $s$ having generated $S$:

$$\textit{like}[S \mid u\leftarrow s,K_{ad}]=\frac{|S^{\prime}\cap S|}{|S|}$$

We propose that this can be improved in two ways to create a correlation-based replay attack. First, the general replay attack only takes into account the placement of a single user in $S$; however, in utility-preserving $k$-anonymity algorithms, one user’s location is cloaked together with other active users in the vicinity. Hence, the placement of the remaining $k-1$ users in $S$ should also play a role in the likelihood calculation. Note that there are a total of $\left(\!\!\binom{S}{k-1}\!\!\right)$ different possible placements of $k-1$ queries in $S$. We denote each placement by $m_{i}$, such that $1\leq i\leq\left(\!\!\binom{S}{k-1}\!\!\right)$. Second, the adversary may have statistical knowledge of mobile users’ distribution on the road network. For example, the adversary may know the traffic density distribution of the city during rush hour, which enables the adversary to predict that the user is more likely to be located on a dense segment than on a sparse segment. We denote by $\text{Pr}[s]$ and $\text{Pr}[m_{i}]$ the probability of user $u$ being located on segment $s$ and of the remaining $k-1$ users being located as in placement $m_{i}$, respectively, according to the background distribution knowledge. Combining the two improvements, we compute $\textit{like}^{c}[S|u\leftarrow s,K_{ad}]$ as:

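$$\textit{like}^{c}[S \mid u\leftarrow s,K_{ad}]=\sum_{i=1}^{\left(\!\!\binom{S}{k-1}\!\!\right)}\text{Pr}[s]\cdot\text{Pr}[m_{i}]\cdot\textit{like}[S \mid u\leftarrow s,m_{i},K_{ad}]$$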

Then, linkability can be calculated as:

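$$\text{link}[u\leftarrow s^{*} \mid S,K_{ad}]=\frac{\textit{like}^{c}[S \mid u\leftarrow s^{*},K_{ad}]}{\sum_{s\in S}\textit{like}^{c}[S \mid u\leftarrow s,K_{ad}]}$$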
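To make the correlation-based replay attack concrete, the Python sketch below enumerates all placements of the remaining $k-1$ users as multisets over $S$, accumulates $\textit{like}^{c}$, and normalizes it into linkability. It is only a toy under stated assumptions: `toy_anonymize` is a hypothetical stand-in for $\mathcal{A}(\cdot)$ (it simply returns the occupied segments), the background distributions are uniform, and all names are illustrative.

```python
from itertools import combinations_with_replacement
from math import comb

def like(S, S_prime):
    """General replay likelihood: |S' ∩ S| / |S|."""
    S = set(S)
    return len(S & set(S_prime)) / len(S)

def like_c(S, s, k, anonymize, pr_segment, pr_placement):
    """Correlation-based likelihood: sum over placements m_i of the other k-1 users."""
    total = 0.0
    for m in combinations_with_replacement(sorted(S), k - 1):
        S_prime = anonymize(s, m)  # hypothetical re-run of A(.) for this placement
        total += pr_segment(s) * pr_placement(m) * like(S, S_prime)
    return total

def linkability(S, k, anonymize, pr_segment, pr_placement):
    """Normalize like^c over the segments of S so that the values sum to 1."""
    scores = {s: like_c(S, s, k, anonymize, pr_segment, pr_placement) for s in S}
    norm = sum(scores.values())
    return {s: score / norm for s, score in scores.items()}

# Toy setting: 3 segments, k = 3, uniform background knowledge.
S = {"s1", "s2", "s3"}
k = 3
n_placements = comb(len(S) + k - 2, k - 1)  # number of multisets of size k-1 over S

def toy_anonymize(s, m):
    """Stand-in for A(.): cloak exactly the occupied segments (not StarCloak itself)."""
    return {s, *m}

link = linkability(S, k, toy_anonymize,
                   pr_segment=lambda s: 1 / len(S),
                   pr_placement=lambda m: 1 / n_placements)
```

Because the toy anonymizer and the priors treat all segments symmetrically, every segment receives the same linkability, which is the uniform ideal mentioned earlier. A non-uniform `pr_segment`, such as a rush-hour density estimate, would skew the result toward dense segments.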

In replay attacks, the assumption is that the adversary is an observer only. Next, we also consider active adversaries who can inject queries into the system, i.e., execute a query injection attack. We expect anonymization algorithms with strong minimality (tightness) to be more vulnerable to the query injection attack. Consider an anonymization algorithm that places a segment in the anonymized location if and only if the segment contains at least one active query or lies on a shortest path between active queries. This knowledge can be exploited by an inference attack. For example, consider the anonymized location $S$ that consists of the bold lines in Figure 2. Suppose that $q_{1}$ and $q_{2}$ are the queries injected by the adversary with privacy profiles $(\delta_{k},\delta_{l})=(3,3)$. Then, by the minimality property, the adversary can infer that the third (actual) query was issued from either segment $\overline{v_{4}v_{5}}$ or segment $\overline{v_{5}v_{10}}$. To capture the effects of the query injection attack, we slightly modify the $\textit{like}^{c}$ calculation: we assign zero to the likelihood value of placement $m_{i}$ if the segments corresponding to this placement conflict with the injected queries’ locations.

2.4 Query Cost Model

An important challenge in finding an optimal anonymized location $S$ to a query $q$ is to minimize the cost of the query when executed with the anonymized location. We study two types of cost: cost of query evaluation and cost of communication.

Cost of Anonymized Query Evaluation: Most query processing approaches for road networks are based on two types of fundamental operations: edge-based and node-based. The edge-based operation takes a query $q$ and an edge $e$ as input and returns a set of objects on $e$ denoted $\mathcal{O}_{e}(q,e)$ which satisfy the query condition. For segment $s$ potentially composed of a sequence of edges, we have: $\mathcal{O}_{s}(q,s)=\cup_{e\in s}\mathcal{O}_{e}(q,e)$ . We denote by $\mathcal{C}_{s}$ the average computation cost of evaluating the query on a segment. While $\mathcal{C}_{s}$ depends on a variety of factors, in our current system, we set $\mathcal{C}_{s}$ statically according to the underlying spatial index implementation (e.g., look-up table, R-Tree). The node-based operation takes a query $q$ and a node $v$ as input and returns a set of objects in the vicinity of $v$ denoted $\mathcal{O}_{v}(q,v)$ which satisfy the query condition. The computation cost of evaluating a node-based query is denoted by $\mathcal{C}_{v}$ .

Let $q$ denote a query issued at some position while traveling on segment $s$ , and let $v^{s}_{b}$ and $v^{s}_{e}$ denote the two ends of $s$ . The query result $\mathcal{R}(q,s)$ satisfies the following:

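$$\mathcal{R}(q,s)\subseteq\mathcal{O}_{s}(q,s)\cup\mathcal{O}_{v}(q,v^{s}_{b})\cup\mathcal{O}_{v}(q,v^{s}_{e})$$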

We give an example in Figure 3. A 3-nearest neighbor query is issued by a user $u$ located on segment $\overline{v_{5}v_{6}}$ . The exact answer to this query is $\mathcal{R}(q,s)=\{o_{5},o_{6},o_{7}\}$ , which is indeed a subset of the union of ${\cal O}_{s}(q,\overline{v_{5}v_{6}})$ = $\{o_{5},o_{6}\}$ , ${\cal O}_{v}(q,v_{5})$ = $\{o_{1},o_{6},o_{7}\}$ and ${\cal O}_{v}(q,v_{6})$ = $\{o_{3},o_{4},o_{5}\}$ . We extend this model from a single segment $s$ to anonymized locations which potentially consist of a set of segments $S$ by employing the concept of border nodes (see Definition 2). Concretely, the result of query $q$ with $S$ as its anonymized location satisfies:

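$$\mathcal{R}(q,S)\subseteq\Big(\bigcup_{s\in S}\mathcal{O}_{s}(q,s)\Big)\cup\Big(\bigcup_{v\in BV(S)}\mathcal{O}_{v}(q,v)\Big)$$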
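The containment above can be checked on the 3-nearest-neighbor example from Figure 3 with a few lines of Python. The object sets are the ones quoted in the text; the function name and the dictionary layout are hypothetical, and the two endpoints of the single segment play the role of the border nodes.

```python
def candidate_result(segments, border, objs_on_segment, objs_near_node):
    """Union of edge-based results over S and node-based results over BV(S)."""
    candidates = set()
    for s in segments:
        candidates |= objs_on_segment[s]
    for v in border:
        candidates |= objs_near_node[v]
    return candidates

# Values from the single-segment example in the text.
objs_on_segment = {("v5", "v6"): {"o5", "o6"}}
objs_near_node = {"v5": {"o1", "o6", "o7"}, "v6": {"o3", "o4", "o5"}}
exact = {"o5", "o6", "o7"}

cands = candidate_result([("v5", "v6")], ["v5", "v6"],
                         objs_on_segment, objs_near_node)
false_positives = cands - exact  # removed by the anonymization engine's post-processing
```

Here the candidate set holds six objects, of which three are false positives that the anonymization engine would filter out before returning the exact answer.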

Finally, the evaluation cost of $q$ with anonymized location $S$ , denoted by $cost_{eval}(q,S)$ can be estimated as:

$$cost_{eval}(q,S)=\mathcal{C}_{s}\cdot|S|+\mathcal{C}_{v}\cdot|BV(S)|$$

where $|BV(S)|$ denotes the number of border nodes in $S$ and $|S|$ denotes the number of segments in $S$.

Cost of Communication: We presented the architecture and communication phases of StarCloak in Figure 1. We focus specifically on the cost that is added by a location anonymization service such as StarCloak. For query $q$, the communication costs of the mobile client’s request and of the exact result it receives do not depend on whether an anonymization engine is used, since a service request takes a fixed encoded format and the size of the exact answer is fixed. With respect to the messages exchanged between the location anonymization engine and the LBS provider, we measure communication cost as the length of the sent and received messages, and use $\Arrowvert x\Arrowvert$ to denote the encoded length of object $x$. For the message sent from the location anonymization engine to the LBS provider, the query remains intact while the location information is anonymized by cloaking it to a set of segments $S$. Therefore, the communication cost here is $\Arrowvert q\Arrowvert+\Arrowvert S\Arrowvert$. The message sent from the LBS provider to the location anonymization engine contains the candidate result $\mathcal{R}(q,S)$; hence, the communication cost here is $\Arrowvert{\cal R}(q,S)\Arrowvert$. As discussed above, a query $q$ usually has fixed length. Also, for given location privacy requirements, the number of segments in $S$ tends to be fairly stable. As such, we conclude that $\Arrowvert{\cal R}(q,S)\Arrowvert$ is the dominant and most “optimizable” communication cost.

For query $q$ , let $res\_size$ denote the average exact result size of $q$ , e.g., if $q$ is the popular $k$ -NN query, then $res\_size=k$ . Following Equation 4, given a query $q$ and anonymized location $S$ represented as a set of segments, the size of the candidate result $\mathcal{R}(q,S)$ can be estimated as:

$$|\mathcal{R}(q,S)|\leq res\_size\cdot|BV(S)|+\sum_{s\in S}\sum_{e\in s}|\mathcal{O}_{e}(q,e)|$$

Then, denoting by $\rho_{o}$ the average number of objects on an edge and $\mathcal{C}_{o}$ the cost of sending/receiving an object $o$ over the wireless channel (e.g., sending unique identifier of $o$ ), the total communication cost for $q$ with anonymized location $S$ is:

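$$cost_{comm}(q,S)=\mathcal{C}_{o}\cdot\Big[res\_size\cdot|BV(S)|+\rho_{o}\cdot\sum_{s\in S}\sum_{e\in s}|e|\Big]$$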

Overall Cost: It is desirable to combine $cost_{eval}$ and $cost_{comm}$ into a single estimate of the overall cost. In StarCloak, we consider a linear combination scheme:

$$cost(q,S)=\beta\cdot cost_{comm}(q,S)+(1-\beta)\cdot cost_{eval}(q,S)$$

where $\beta$ is the parameter tuning the trade-off between evaluation cost (mainly CPU computation on server side) and the communication cost (mainly bandwidth of wireless channel).
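The cost model of this section can be sketched in Python as three small functions. This is an illustrative reading rather than the authors' code: the function and parameter names are hypothetical, each segment is passed as a list of per-edge sizes $|e|$ supplied by the caller, and the constants in the example are arbitrary.

```python
def cost_eval(num_segments, num_border, c_s, c_v):
    """cost_eval(q, S) = C_s * |S| + C_v * |BV(S)|."""
    return c_s * num_segments + c_v * num_border

def cost_comm(segments, num_border, res_size, c_o, rho_o):
    """cost_comm(q, S) = C_o * [res_size * |BV(S)| + rho_o * sum of |e| over all edges in S]."""
    total_edge_size = sum(sum(edge_sizes) for edge_sizes in segments)
    return c_o * (res_size * num_border + rho_o * total_edge_size)

def overall_cost(comm, evaluation, beta):
    """cost(q, S) = beta * cost_comm + (1 - beta) * cost_eval, with 0 <= beta <= 1."""
    return beta * comm + (1 - beta) * evaluation

# Arbitrary example: two segments (one with two edges, one with one edge), two border nodes.
segments = [[1, 1], [1]]
ev = cost_eval(num_segments=len(segments), num_border=2, c_s=1.0, c_v=2.0)
cm = cost_comm(segments, num_border=2, res_size=3, c_o=1.0, rho_o=2.0)
total = overall_cost(cm, ev, beta=0.5)
```

With these numbers the evaluation cost is 6, the communication cost is 12, and an even weighting gives an overall cost of 9. Raising `beta` toward 1 makes star selection favor low bandwidth use over low server-side computation.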

2.5 Problem Statement

Given a road network represented as a graph $G$ with mobile users traveling on the road network while issuing queries, where each user $u$ ’s query $q$ is associated with its profile $(\delta^{q}_{k},\delta^{q}_{l},\sigma^{q}_{s},\sigma^{q}_{t})$ , the principles and objectives of StarCloak are:

• It transforms $u$’s true location to an anonymized (cloaked) location $S$, where $S$ is a subgraph of $G$.

• $S$ satisfies the privacy requirements of $q$ in terms of $\delta^{q}_{k}$-user anonymity and $\delta^{q}_{l}$-segment indistinguishability.

• $S$ satisfies the utility constraints of $q$, i.e., the spatial size of $S$ is no larger than $\sigma^{q}_{s}$ and the temporal delay caused by location anonymization is no more than $\sigma^{q}_{t}$.

• $S$ achieves high attack-resilience, measured in terms of low linkability and high segment entropy.

• Anonymized location $S$ yields low $cost(q,S)$.
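The attack-resilience objective can be illustrated with a short Python sketch of segment entropy, taken here to be the Shannon entropy of the linkability values, $H(S)=-\sum_{s\in S}\text{link}[u\leftarrow s]\cdot\log_{2}(\text{link}[u\leftarrow s])$. The function name and the two example distributions are hypothetical.

```python
from math import log2

def segment_entropy(link):
    """Shannon entropy (in bits) of a linkability distribution over segments."""
    return -sum(p * log2(p) for p in link.values() if p > 0)

# Uniform linkability over four segments: the ideal case for the user.
uniform = {"s1": 0.25, "s2": 0.25, "s3": 0.25, "s4": 0.25}

# Skewed linkability: the adversary strongly suspects segment s1.
skewed = {"s1": 0.7, "s2": 0.1, "s3": 0.1, "s4": 0.1}

h_uniform = segment_entropy(uniform)
h_skewed = segment_entropy(skewed)
```

The uniform case reaches the maximum of $\log_{2}4=2$ bits, whereas the skewed case yields a lower value, reflecting the adversary's greater confidence.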

3 StarCloak Algorithms

This section explains the StarCloak constructs and algorithms in detail. We first describe the concept of cloaking star, star graph, and relevant data structures used in implementing StarCloak efficiently in Section 3.1. We explain how an incoming query $q$ is pre-processed by StarCloak and added to the appropriate data structures in Section 3.2. The overview of the main StarCloak algorithm is presented in Section 3.3. The main algorithm relies on several methods, such as selecting a star, updating the cloaking graph (adding and removing queries from the cloaking graph), candidate star-set selection, and star-set pruning. These methods are described in Sections 3.4, 3.5, 3.6, and 3.7, respectively.

3.1 Star Concept and StarCloak Data Structures

Unlike conventional solutions, which are indifferent to the underlying road network structure, use a random waypoint mobility model, and rely on a rectangular or circular region as the basic unit of location cloaking, StarCloak introduces the star as the basic unit of location cloaking. Each star is defined by a vertex together with its list of adjacent segments in $G$.

Definition 7 (Star).

Let $G=\langle V_{G},E_{G}\rangle$ denote the road network of interest. We define a star $\Phi_{i}$ anchored at vertex $v_{i}\in V_{G}$ as a subgraph of $G$ , denoted by $\Phi_{i}=\langle V_{\Phi}^{i},E_{\Phi}^{i}\rangle$ , and $V_{\Phi}^{i}=\{v_{i}\}$ and $E_{\Phi}^{i}=\{w|w\neq v_{i},w\in V_{G},(v_{i},w)\in E_{G}\}$ .

Accordingly, every node $v_{i}$ with $d_{G}(v_{i})\geq 3$ is associated with a unique star $\Phi_{i}$ , which consists of vertex $v_{i}$ and all of its adjacent road segments, that is, those segments with $v_{i}$ as one of two end nodes. For example, in the left plot of Figure 4, star $\Phi_{5}$ is composed of node $v_{5}$ and segments $\{\overline{v_{5}v_{4}}$ , $\overline{v_{5}v_{6}}$ , $\overline{v_{5}v_{10}}\}$ .

The road network can then be transformed into a star graph, as shown on the right of Figure 4. Each vertex in the star graph is a star in $G$ , and two vertices are adjacent in the star graph if and only if their corresponding stars in $G$ share a segment. All edges in the star graph are of unit length. The hop distance between two stars $\Phi_{i}$ and $\Phi_{j}$ in a road network $G$ is measured by the number of hops in the shortest path between $\Phi_{i}$ and $\Phi_{j}$ . For example, in Figure 4, the hop distance between $\Phi_{6}$ and $\Phi_{10}$ is 2, since their shortest path in the star graph is $\Phi_{6}\rightarrow\Phi_{5}\rightarrow\Phi_{10}$ .
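A minimal Python sketch of the star graph and of hop distance follows. It assumes that the road network has already been reduced to segments given as pairs of anchor nodes, so that two stars are adjacent exactly when a segment joins their anchors; the segment list is hypothetical (Figure 4 is not reproduced here) and was chosen to agree with the $\Phi_{6}\rightarrow\Phi_{5}\rightarrow\Phi_{10}$ example.

```python
from collections import defaultdict, deque

def build_star_graph(segments):
    """Adjacency between stars: two stars are neighbors iff they share a segment."""
    adj = defaultdict(set)
    for a, b in segments:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def hop_distance(adj, src, dst):
    """Number of hops on a shortest path in the unit-length star graph (BFS)."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj[node]:
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # dst is unreachable from src

# Hypothetical segments, each written as the pair of star anchors it connects.
segments = [(5, 4), (5, 6), (5, 10), (6, 7), (6, 2)]
star_graph = build_star_graph(segments)
d = hop_distance(star_graph, 6, 10)
```

Breadth-first search suffices because all edges of the star graph have unit length.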

In addition to the star concept, StarCloak uses some important data structures for improved effectiveness and efficiency.

Query Queue, $\mathcal{Q}$ : A first-in-first-out (FIFO) queue that records the incoming queries which must be anonymized before they are relayed to the respective LBS provider. Incoming queries are inserted into the queue from the tail. The anonymization engine pops each query from $\mathcal{Q}$ to find a suitable cloaked subgraph $S$ .

Expiration Heap, $H$ : A max-min heap that maintains the queries in the order of their expiration time, which is computed from the query arrival time and the temporal delay constraint $\sigma^{q}_{t}$ . The anonymization engine checks $H$ to identify queries that are close to their expiration time in order to prioritize certain queries, or to identify queries that have expired and should be removed from $\mathcal{Q}$ .

Cloaking Graph, $G_{C}$ : An undirected graph dynamically constructed in memory, which records the set of queries associated with a star based on their similarities with respect to their privacy requirements, their spatial proximity, and their expiration deadlines. The cloaking graph is explained further in Section 3.5.

Star-Map, $M_{S}$ and Query-Map, $M_{Q}$ : We create one hash map to index stars, called the Star-Map, and similarly one hash map to index the queries associated with a node in the cloaking graph, called the Query-Map, for fast star and query look-up.

Candidate Star-Set Queue, $\mathcal{Q}_{C}$ : A FIFO-based queue structure that records generated candidate cloaking star-sets. The pruning unit in StarCloak pops star-sets from $\mathcal{Q}_{C}$ and applies an effective and randomized pruning algorithm to generate the final cloaked star regions.

3.2 Incoming Query Pre-processing

Let $q$ denote an incoming location service query. StarCloak pre-processes $q$ to generate the internal representation of $q$ by performing the following sequence of tasks. First, a unique identifier is assigned to $q$ using a secure hash function with user ID and query issue time, i.e., $hash(q.u||q.t)$ . Second, using the latitude and longitude values of $q$ ’s focal location, the spatial index and road network graph, the road segment of $q$ is determined. Third, $q$ is inserted to queue $\mathcal{Q}$ . Fourth, $q$ is inserted to the expiration heap $H$ with query expiration time as key and query identifier as value. Query expiration time is the sum of query issue time and user-specified temporal delay constraint: $q.t_{exp}=q.t+\sigma^{q}_{t}$ .
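The four pre-processing tasks can be sketched as below. The sketch makes several assumptions that are not in the text: SHA-256 stands in for the unspecified secure hash function, Python's `heapq` (a binary min-heap keyed on expiration time) stands in for $H$, and `locate_segment` is a hypothetical callback that replaces the spatial index lookup. The `Query` fields are illustrative names.

```python
import hashlib
import heapq
from collections import deque
from dataclasses import dataclass

@dataclass
class Query:
    user_id: str
    issue_time: float      # q.t
    lat: float
    lon: float
    sigma_t: float         # user-specified temporal delay constraint
    qid: str = ""
    segment: object = None
    t_exp: float = 0.0

def preprocess(q, query_queue, expiration_heap, locate_segment):
    # 1. Unique identifier: hash(q.u || q.t). SHA-256 is an assumption.
    q.qid = hashlib.sha256(f"{q.user_id}||{q.issue_time}".encode()).hexdigest()
    # 2. Map the focal location to a road segment (hypothetical lookup callback).
    q.segment = locate_segment(q.lat, q.lon)
    # 3. Insert into the FIFO query queue from the tail.
    query_queue.append(q)
    # 4. Insert into the expiration heap with q.t_exp = q.t + sigma_t as key.
    q.t_exp = q.issue_time + q.sigma_t
    heapq.heappush(expiration_heap, (q.t_exp, q.qid))
    return q
```

A call such as `preprocess(Query("u1", 100.0, 33.7, -84.4, 10.0), deque(), [], lookup)` leaves the query at the tail of the queue and the pair `(110.0, qid)` on the heap.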

3.3 Main StarCloak Procedure

Before going into the details of each step, we summarize the main StarCloak procedure in Algorithm 1. The location anonymization engine continuously pops queries from the query queue $\mathcal{Q}$ and processes them to find an anonymized location $S$ that fits the desired privacy and utility requirements. Prior to processing a new query $q$ , StarCloak removes expired queries from the system. When an expired query, denoted $q_{e}$ , is removed from a cloaking graph node, it may become possible to find a cloaked subgraph for the remaining queries in the node. All updated nodes are stored in the list $L_{u}$ after expired queries are removed, and StarCloak attempts to find anonymized locations for these nodes before processing new queries (lines 12-15). Next, it pops a new query from the query queue $\mathcal{Q}$ and selects the star to assign to it (lines 16-17). Thereafter, StarCloak updates the cloaking graph with the new query $q_{i}$ and searches for a possible cloaked location for the updated cloaking graph node (lines 18-21). Finally, in the pruning phase, StarCloak randomly selects and removes non-active stars from the candidate star-sets. In the next sections, we present the details of these operations.

3.4 Select Star

The StarCloak engine performs anonymization by scanning through the FIFO query queue $\mathcal{Q}$ . All segments that are associated with active queries are marked as active. The anonymization engine first selects a star, to which the queries on the active segment are assigned, as the initial cloaking star. Each segment has two end nodes, and if both nodes are intersection nodes, i.e., $d_{G}(v_{s})\geq 3$ and $d_{G}(v_{t})\geq 3$ , StarCloak needs to determine to which star this active segment should be assigned: $\Phi_{s}$ or $\Phi_{t}$ . For example, in Figure 4, when $q_{1}$ arrives and segment $\overline{v_{5}v_{6}}$ becomes active, one of the two possible stars $\Phi_{5}$ or $\Phi_{6}$ will be determined as the initial cloaking star. When a star $\Phi$ is “selected” and segment $s$ is assigned to $\Phi$ , we denote this by $s\leftarrow\Phi$ .

In StarCloak, we use a cost-aware star selection strategy, taking into account the cost model described in Section 2.4. Let $q$ be the query, $AS$ denote the set of currently active segments on the road network $G$ , and $\phi$ be the set of selected stars. Then, the minimization of the overall cost can be formally stated as:

[TABLE]

This optimization problem aims at finding an assignment between stars and segments such that the stars cover all segments with active queries, while having the minimum total cost. It can be shown that the optimization problem in Expression 7 is NP-Hard. The proof follows from a reduction from the weighted Vertex-Cover problem, which is a well-known NP-Complete problem. Specifically, if for all stars $\Phi$ in the star graph we set $cost(q,\Phi)=1$ , i.e., all stars have identical cost, then the problem is equivalent to the classical Vertex-Cover problem. Motivated by the hardness of finding a globally optimal solution to our optimization problem, we propose a randomized algorithm called Select Star, which finds approximate solutions with high assignment quality and attack-resilience. The intuition is that, for each query whose segment has two endpoints that are both viable stars, the algorithm selects one of the two stars at random, with probability inversely proportional to its $cost$ .

The technical description of our Select Star algorithm is given in Algorithm 2. The algorithm works as follows. Let $q$ be an incoming query with travel segment $s$ , and let $\Phi_{a}$ and $\Phi_{b}$ be the two stars on the two endpoints of segment $s$ . For simplicity, we assume both endpoints are stars; if not, then $s$ is trivially assigned to the endpoint which is a star. If only one of $\Phi_{a}$ or $\Phi_{b}$ is currently active, Select Star assigns $s$ to the active star. If both $\Phi_{a}$ and $\Phi_{b}$ are active, then $s$ is assigned to $\Phi_{a}$ with probability $cost(q,\Phi_{b})/[cost(q,\Phi_{a})+cost(q,\Phi_{b})]$ , or to $\Phi_{b}$ otherwise. If neither star is active, then the same probabilistic assignment to either $\Phi_{a}$ or $\Phi_{b}$ is carried out; additionally, the assigned star is marked as active for subsequent iterations. This assignment has the desirable property that the outcome of our randomized Select Star algorithm is not far from an optimal assignment. More formally, denoting by $cost^{opt}$ the cost achieved by the optimal assignment, and denoting by $cost^{rnd}$ the cost achieved by our Select Star algorithm, it holds in expectation that: $\mathbb{E}\left[cost^{rnd}\right]$ $\leq$ $2\cdot cost^{opt}$ .
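The case analysis above can be sketched in a few lines. This is an illustrative reading of the description, not a transcription of Algorithm 2: the function signature is hypothetical, the two cost values are passed in as numbers rather than computed from the cost model of Section 2.4, and both costs are assumed to be positive.

```python
import random

def select_star(star_a, star_b, cost_a, cost_b, active, rng=random):
    """Assign a segment with star endpoints star_a/star_b to one of them.

    `active` is the set of currently active stars and is updated in place.
    """
    a_active = star_a in active
    b_active = star_b in active
    # Exactly one endpoint is active: reuse it.
    if a_active and not b_active:
        return star_a
    if b_active and not a_active:
        return star_b
    # Both or neither active: pick star_a with probability
    # cost_b / (cost_a + cost_b), so the cheaper star is more likely.
    p_a = cost_b / (cost_a + cost_b)
    chosen = star_a if rng.random() < p_a else star_b
    # Marks the chosen star active (a no-op when both were already active).
    active.add(chosen)
    return chosen
```

With `cost_a = 1` and `cost_b = 3`, for instance, the cheaper star `star_a` is chosen with probability 0.75 whenever the choice is randomized.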

3.5 Cloaking Graph Update

We use the cloaking graph data structure (previously introduced and denoted by $G_{C}$ ) to group nearby queries and efficiently index other query groups that can be cloaked together for easy access. The cloaking graph $G_{C}(V_{C},E_{C})$ is an undirected graph, where $V_{C}$ is the set of vertices each representing a set of requests grouped by the star they are assigned to and their profiles (similarities in privacy and utility requirements). $E_{C}$ is the set of edges; there is an edge $e=(v_{i},v_{j})\in E_{C}$ between $v_{i}$ and $v_{j}$ iff queries associated with both vertices can be cloaked together based on $k$ -user anonymity, $l$ -segment indistinguishability, and spatial tolerance. Each vertex $v$ in $V_{C}$ stores the following information.

The corresponding star $v.\Phi$ : for each active star, there is at least one vertex in $V_{C}$ . The query set $v.Q$ stores the queries assigned to $v.\Phi$ . We compute the combined privacy and utility requirements $\langle\delta_{k}^{v},\delta_{l}^{v},\sigma^{v}_{s}\rangle$ of the queries in $v.Q$ as:

[TABLE]

We denote by covered star-set $v.\varTheta$ the set that contains the identifiers of stars which are within $\sigma^{v}_{s}$ distance from $v.\Phi$ . The segment count $v.sc$ denotes the number of segments associated with stars in star-set $v.\varTheta$ . Finally, adjacency list $v.N$ is stored with $v$ , where being neighbors indicates that requests in the corresponding nodes can be cloaked together. Two cloaking nodes $v_{i}$ and $v_{j}$ are considered to be neighbors iff: (i) the star associated with each node is an element of the star-set of the other node, i.e., $v_{i}.\Phi\in v_{j}.\varTheta$ and $v_{j}.\Phi\in v_{i}.\varTheta$ , and (ii) the overlap between the two covered star-sets is enough to satisfy their $l$ -segment indistinguishability requirements, i.e., $|v_{i}.\varTheta\cap v_{j}.\varTheta|\geq\max\{\delta_{l}^{v_{i}},\delta_{l}^{v_{j}}\}$ .
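A cloaking graph vertex and the neighbor test can be sketched as follows. The class and field names are hypothetical. The combination rule in `combined` is an assumption, since the defining equation is not reproduced here: it takes the strictest value of each requirement, i.e., the maximum $\delta_{k}$ and $\delta_{l}$ and the minimum $\sigma_{s}$ over the queries in the vertex.

```python
from dataclasses import dataclass, field

@dataclass
class CloakNode:
    star: int                                     # v.Phi
    queries: list = field(default_factory=list)   # v.Q as (delta_k, delta_l, sigma_s) tuples
    covered: set = field(default_factory=set)     # v.Theta, the covered star-set

    def combined(self):
        # Assumed rule: strictest requirement among the queries in this vertex.
        delta_k = max(q[0] for q in self.queries)
        delta_l = max(q[1] for q in self.queries)
        sigma_s = min(q[2] for q in self.queries)
        return delta_k, delta_l, sigma_s

def are_neighbors(vi, vj):
    # (i) Each vertex's star lies in the other's covered star-set.
    mutual = vi.star in vj.covered and vj.star in vi.covered
    # (ii) |Theta_i ∩ Theta_j| >= max(delta_l of vi, delta_l of vj).
    shared = len(vi.covered & vj.covered)
    return mutual and shared >= max(vi.combined()[1], vj.combined()[1])
```

Storing the covered star-sets as Python sets keeps both membership tests and the intersection in condition (ii) inexpensive.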

StarCloak performs two types of updates on the cloaking graph: adding a query to the cloaking graph and removing a query from it. The function for adding queries to the cloaking graph is given in Algorithm 3. When the function is called to insert a new query, it checks all cloaking graph vertices associated with the corresponding star to add the new query (lines 4-14). If there is no suitable vertex to add it to, a new vertex is created (lines 15-16). The new query can be added to an existing vertex only if its privacy profile does not conflict with the profile of the existing node. A conflict occurs when the new spatial tolerance cannot satisfy the new $l$ -segment indistinguishability requirement. Thus, we need to perform the checks under lines 4-14 to avoid any conflicts.

The function for removing queries from the cloaking graph is given in Algorithm 4. Say that $q_{e}$ is the expired query that should be removed. We first perform a look-up from $M_{Q}$ to find the cloaking graph node $v_{u}$ associated with $q_{e}$ . If $|v_{u}.Q|>1$ , i.e., $v_{u}$ contains other queries as well, its information is updated based on remaining queries after deletion of $q_{e}$ . The update is performed according to Equation 8 to re-compute $\delta_{k}^{v_{u}}$ , $\delta_{l}^{v_{u}}$ , and $\sigma_{s}^{v_{u}}$ . If the updated $v_{u}$ now has either $\delta_{l}^{v_{u}}<\delta_{l}^{q_{e}}$ or $\sigma_{s}^{v_{u}}>\sigma_{s}^{q_{e}}$ , then ${v_{u}}.\varTheta$ , ${v_{u}}.sc$ and $v_{u}.N$ are also re-computed. Note that the latter is only necessary if segment indistinguishability or spatial tolerance requirements are relaxed. On the other hand, if $|v_{u}.Q|=1$ , i.e., $q_{e}$ was the only query associated with $v_{u}$ , then $v_{u}$ is removed from $G_{C}$ , and $M_{S}$ is updated. The return value of the function is $v_{u}$ , which is an input for the next step (candidate star-set selection).
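The removal logic can be sketched as below. This is a simplified reading rather than Algorithm 4 itself: the vertex is a plain dictionary, queries are `(delta_k, delta_l, sigma_s)` tuples, `recompute_cover` is a hypothetical callback that rebuilds the covered star-set, the combined requirements are assumed to be the maximum $\delta_{l}$ and minimum $\sigma_{s}$ of the remaining queries, and the map updates are omitted.

```python
def remove_query(node, q_e, recompute_cover):
    """Remove expired query q_e from a cloaking-graph vertex.

    Returns (node, deleted), where `deleted` tells the caller to drop the
    vertex from the cloaking graph because q_e was its only query.
    """
    node["queries"].remove(q_e)
    if not node["queries"]:
        return node, True
    # Re-derive the combined requirements from the remaining queries.
    delta_l = max(q[1] for q in node["queries"])
    sigma_s = min(q[2] for q in node["queries"])
    # Only a relaxed requirement can change the covered star-set, so the
    # expensive recomputation is skipped otherwise.
    if delta_l < q_e[1] or sigma_s > q_e[2]:
        node["covered"] = recompute_cover(node["star"], sigma_s)
    return node, False
```

The guard before `recompute_cover` mirrors the observation that the covered star-set, segment count, and adjacency list need refreshing only when the departing query was the one imposing the strictest requirement.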

3.6 Candidate Star-Set Selection

The goal of this step is to discover a set of stars, called a candidate star-set, which constitutes a possible anonymized sub-graph for certain queries. In order to find such a star-set, StarCloak searches over the cloaking graph and identifies a set of nodes, denoted by $NS$ , that satisfy the privacy requirements of all queries associated with each node. Formally, let $\vartheta$ denote a candidate star-set, and let $seg(\vartheta)$ be a function that returns all segments associated with the input stars. $NS$ meets $k$ -user anonymity and $l$ -segment indistinguishability if and only if:

[TABLE]

Such an $NS$ forms the candidate star-set $\vartheta$ from all stars shared within the covered star-set of each node in $NS$ :

[TABLE]

We assume the existence of a procedure named checkReqs(NS), which takes as input a set of nodes $NS$ , performs the privacy check given in Equation 9, and returns either the $\vartheta$ built in Equation 10 if $NS$ passes the privacy check or an empty set $\emptyset$ otherwise. We use the checkReqs procedure in Algorithm 5 for candidate star-set selection.
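One possible shape for checkReqs is sketched below. Because the two defining equations are not reproduced here, the conditions are assumptions inferred from the surrounding prose: $\vartheta$ is taken as the intersection of the covered star-sets of the nodes in $NS$, $k$-user anonymity is taken to mean that the total number of queries reaches the largest $\delta_{k}$, and $l$-segment indistinguishability that $|seg(\vartheta)|$ reaches the largest $\delta_{l}$. The dictionary keys and the `segments_of` mapping are hypothetical.

```python
def check_reqs(nodes, segments_of):
    """Return the candidate star-set for `nodes`, or an empty set on failure.

    nodes: list of dicts with keys 'num_queries', 'delta_k', 'delta_l', 'covered'.
    segments_of: mapping from star id to the set of its segments.
    """
    if not nodes:
        return set()
    # Assumed construction: stars shared by every node's covered star-set.
    theta = set.intersection(*(n["covered"] for n in nodes))
    # seg(theta): all segments associated with the stars in theta.
    segments = set().union(*(segments_of[s] for s in theta))
    k_ok = sum(n["num_queries"] for n in nodes) >= max(n["delta_k"] for n in nodes)
    l_ok = len(segments) >= max(n["delta_l"] for n in nodes)
    return theta if (k_ok and l_ok) else set()
```

Returning an empty set on failure lets the caller treat the result as a truth value while searching over node combinations.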

Algorithm 5 specifies the technical details of the candidate star-set selection process. The search over the cloaking graph for a candidate star-set starts with the updated vertex $v_{u}$ . If the number of queries assigned to this vertex is fewer than the $k$ -user anonymity requirement of the vertex, then the algorithm continues the search process over the neighboring nodes, ordered by the hop distance between their associated star and the star of the starting vertex. For each neighbor node, it applies checkReqs to $v_{u}$ and the neighbor node combined (lines 6-8). If a candidate star-set still cannot be found, the neighbor node is evaluated with all possible node combinations generated with the previously processed neighbor nodes (lines 10-16). On line 10, we denote by $C$ a clique in $Q_{Comb}$ , and line 11 checks if the clique satisfies the $l$ -segment indistinguishability requirement. The possible node combinations are tracked by variable $Q_{Comb}$ , which is enlarged in each iteration so that newly visited nodes are added (lines 16-17). The output of the algorithm is $\vartheta$ , a candidate star-set.

3.7 Star-Set Pruning

The final component of StarCloak is star-set pruning, i.e., the pruning of extra segments from a candidate star-set. As specified in the main StarCloak procedure in Algorithm 1, the candidate star-sets found by Algorithm 5 are added to the candidate star-set queue denoted $\mathcal{Q}_{C}$ , and then they are pruned by the star-set pruning component. Star-set pruning plays an important role in generating low-cost cloaked subgraphs and in improving attack-resilience, because it randomizes the star selection from the outer boundary toward the center of the candidate star-set. Note that star-set pruning is highly parallelizable, i.e., it is possible to implement one or more pruning processes running in parallel (each popping from $\mathcal{Q}_{C}$ ) while another process in the anonymization engine performs the remaining tasks and adds to $\mathcal{Q}_{C}$ .

The function for pruning star-sets is given in Algorithm 6. Pruning starts by popping a candidate star-set from queue $\mathcal{Q}_{C}$ . Let $\vartheta$ denote the popped star-set. We find the set of boundary stars $BS$ of $\vartheta$ , which are the stars that have at least one neighbor star not in $\vartheta$ , as well as the set of active stars $AS$ of $\vartheta$ , which cannot be removed from the star-set. Let $l^{max}$ denote the maximum $l$ -segment indistinguishability requirement in the star-set. We run multiple iterations, and within each iteration, the following steps are performed. First, a random star denoted $\Phi_{r}$ is selected from $BS\setminus AS$ . If $\vartheta$ still satisfies $l^{max}$ -segment indistinguishability after removing $\Phi_{r}$ from $\vartheta$ , then $\Phi_{r}$ is removed from $\vartheta$ , $BS$ is updated by removing $\Phi_{r}$ and adding any stars that have newly become boundary stars, and we proceed to the next iteration. However, if $l^{max}$ -segment indistinguishability is violated after removing $\Phi_{r}$ from $\vartheta$ , then the pruning stops here and the current $\vartheta$ (without removing $\Phi_{r}$ ) is produced as the final output of the pruning process.
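The pruning loop can be sketched as follows. This is an illustrative version, not Algorithm 6 verbatim: the function and parameter names are hypothetical, the boundary set is recomputed from scratch in every iteration instead of being updated incrementally, and stars are assumed to be sortable identifiers so that a seeded random generator gives reproducible runs.

```python
import random

def prune_star_set(theta, active, star_graph, segments_of, l_max, rng=random):
    """Randomly peel non-active boundary stars off theta while l_max still holds."""
    theta = set(theta)

    def segment_count(stars):
        return len(set().union(*(segments_of[s] for s in stars)))

    while True:
        # Removable boundary stars: non-active stars with a neighbor outside theta.
        boundary = sorted(
            s for s in theta
            if s not in active and any(n not in theta for n in star_graph[s])
        )
        if not boundary:
            return theta
        pick = rng.choice(boundary)
        # Stop at the first removal that would violate l_max-segment
        # indistinguishability, keeping the picked star in the output.
        if segment_count(theta - {pick}) < l_max:
            return theta
        theta.remove(pick)
```

Every returned star-set contains all active stars and still covers at least `l_max` segments, provided the input star-set did.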

We give an example of star-set pruning in Figure 5. The candidate star-set is shown on the left with the black and gray circles. Black circles depict active nodes that cannot be removed because of their association with active queries. Suppose that the maximum segment requirement is 9. First, one generates the boundary star list $\{\Phi_{5},\Phi_{7},\Phi_{9},\Phi_{13}\}$ , i.e., gray circles that are connected with the white circles. Then, one of the boundary stars is selected randomly, say $\Phi_{5}$ for the sake of example. After removing $\Phi_{5}$ , the remaining stars still meet the segment requirement. The boundary star set is updated with the new star $\Phi_{10}$ and another star is selected randomly from the set. Suppose $\Phi_{7}$ is selected for pruning; the indistinguishability requirement is still satisfied with the remaining stars. Note that there are no new boundary stars because $\Phi_{12}$ is an active star. One selects another star from the current boundary star set, which is $\{\Phi_{9},\Phi_{10},\Phi_{13}\}$ . Assume $\Phi_{9}$ is selected for removal; then the boundary star set is updated with the new star $\Phi_{15}$ . However, removing any of the current stars now violates the queries’ segment indistinguishability requirement, thus we have to add the selected star back to the star list. The associated segments constitute the cloaked subgraph. We show the resulting cloaked subgraph with bold lines on the right side of Figure 5.

4 Variants and Optimizations

In this section, we introduce two variants of StarCloak for finding cloaking regions with lower cost, query processing time, and network bandwidth usage without sacrificing privacy.

4.1 Spatially Bounded StarCloak

Basic StarCloak generates cloak regions whenever it finds a star-set that satisfies all queries’ privacy requirements. However, this approach may produce cloak regions that consist of stars that are far from each other and scattered across the part of the road network within $\sigma_{s}$ . An example scenario is given in the supplementary material. We propose spatially bounded StarCloak to generate more compact cloaked subgraphs. The essence of this optimization is to sacrifice anonymization time in favor of lower query processing and communication costs. We define a system parameter $\lambda\geq 1$ , called the compactness factor, that controls the maximum hop distance between selected vertices in the candidate star-set. To generate more compact cloaked subgraphs, we make some modifications to the candidate star-set selection algorithm. First, we group neighbors by their distance $d$ to the starting node; $\lfloor d/\lambda\rfloor$ determines the level of each group element. At each level, the algorithm only considers neighbor nodes which can be cloaked with the node combinations generated in the previous level. The algorithm searches level by level, iteratively, in a top-down manner. Spatially bounded StarCloak enforces compactness by selecting active stars such that, for each star in the star-set, there is at least one other star within a hop distance of $2\lambda-1$ .
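The level grouping can be illustrated with a short sketch. The function name and the input format (a mapping from neighbor star to its hop distance $d$ from the starting node) are hypothetical, and $\lambda$ is assumed to be a positive integer.

```python
def group_by_level(neighbor_hops, lam):
    """Group neighbor stars by level floor(d / lam), in increasing level order."""
    levels = {}
    for star, d in neighbor_hops.items():
        levels.setdefault(d // lam, []).append(star)
    # Sorted so that the search can proceed level by level.
    return {level: levels[level] for level in sorted(levels)}
```

For hypothetical hop distances `{12: 1, 9: 2, 13: 2}`, a compactness factor of 1 yields levels `{1: [12], 2: [9, 13]}`, while a factor of 2 merges more distant neighbors into lower levels: `{0: [12], 1: [9, 13]}`.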

Illustrative Example: Figure 6 shows an example scenario with queries $q_{1},q_{2},\ldots,q_{9}$ distributed on the road network. Suppose that queries are issued in the order of their ID, and all queries have a 3-user anonymity requirement. (For simplicity, we assume the spatial tolerance is high enough to cover all stars in our small road network, and no $l$ -segment indistinguishability requirement exists.) Figure 6 also gives an example star assignment for the 9 queries. When we apply basic StarCloak to the given queries, the selected stars are as shown in the panels of Figure 6. It can be observed that the selected stars are spread all over the network. On the other hand, the panels of Figure 7 show a more compact star selection possibility for the same queries with different combinations.

Suppose that for the above example, we used spatially bounded StarCloak with compactness factor $\lambda=1$ . During the processing of $q_{3}$ , it would first check neighbor nodes which are within 1 hop distance from the star $\Phi_{6}$ . Since there is no neighbor in level 1, the anonymization engine would then accept new queries to process. While it is processing $q_{7}$ , it first adds the neighbor node associated with the star $\Phi_{12}$ , which is in the 1 hop distance level. The two nodes do not yet satisfy the $k$ -user anonymity requirement, thus the anonymization engine continues to process neighbor nodes at the second level ( $\Phi_{9}$ and $\Phi_{13}$ ). Since we use a FIFO-based query processing order to decrease waiting time, $\Phi_{13}$ is selected in the next iteration. The three nodes together meet all users’ privacy requirements and can be removed from the cloaking graph.

4.2 Hybrid StarCloak

The main difficulty in spatially bounded StarCloak is the choice of $\lambda$ . At first sight, $\lambda$ can be determined by the query density of a general area. However, query density is often highly dynamic and changes street-by-street or star-by-star. Even neighboring segments may have different densities. Thus, a $\lambda$ determined based on the query density of a general area may be undesirable for sparse sub-areas, and it is not possible to define an optimal compactness factor for each individual star at each time. To overcome this problem, we propose hybrid StarCloak, which leverages advantages from both basic StarCloak and spatially bounded StarCloak. In hybrid StarCloak, we first try to generate cloaking regions with spatially bounded StarCloak, and then for queries which could not be cloaked yet and are close to their expiration time, we apply basic StarCloak. We use a consideration factor denoted by $\alpha$ as the system parameter to decide when to apply basic StarCloak. Hybrid StarCloak periodically checks the expiration heap $H$ to see if any query is closer than $\alpha$ to its expiration time.
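The periodic check against the expiration heap can be sketched as below. This is a minimal illustration under stated assumptions: the heap is modeled with Python's `heapq` as a min-heap of `(expiration_time, query_id)` pairs, the function name is hypothetical, and queries that have already expired are skipped because they are handled by the removal step rather than by basic StarCloak.

```python
import heapq

def queries_near_expiry(expiration_heap, now, alpha):
    """Return ids of unexpired queries whose remaining time is below alpha."""
    heap = list(expiration_heap)   # a shallow copy of a valid heap is still a valid heap
    due = []
    while heap and heap[0][0] - now < alpha:
        t_exp, qid = heapq.heappop(heap)
        if t_exp >= now:           # skip queries that have already expired
            due.append(qid)
    return due
```

The queries returned by this check are the ones that hybrid StarCloak would hand over to basic StarCloak; working on a copy leaves $H$ itself untouched.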

To demonstrate the usefulness of hybrid StarCloak, we consider the example from Figure 6. When we use spatially bounded StarCloak with compactness factor $\lambda=1$ for the example in Figure 6, queries $q_{3},q_{4},q_{9}$ would remain in the system until new queries were issued in their neighborhood, since the distance between stars $\Phi_{6}$ and $\Phi_{8}$ is two hops. Assume now that these queries have an expiration time, then they may have to be dropped before the new queries arrive, even though there is a possible cloaked subgraph. With hybrid StarCloak, we would be able to apply basic StarCloak towards the queries’ expiration time, and we would be able to cloak those stars together, thereby saving the queries from being dropped.

5 Experimental Evaluation

5.1 Experimental Setup

We used two different road network datasets, California (http://www.cs.utah.edu/~lifeifei/SpatialDataset.htm) and Georgia (https://www.census.gov/geo/maps-data/data/tiger-geodatabases.html), with varying sizes to observe the effect of map density on the efficiency and effectiveness of StarCloak. The California road network contains only highways, with 21,693 edges and 21,048 nodes. 87,635 points of interest from 62 different classes (e.g., hospital, school, etc.) are associated with the road network. Georgia is the larger road network dataset, which contains primary and secondary roads with 430,849 edges and 428,708 nodes. To simulate user movements, we used the Brinkhoff data generator for moving objects (http://iapg.jade-hs.de/personen/brinkhoff/generator/). We assign the same number of moving objects (10,000) to each map, with the intention of simulating high user density and low user density conditions, since the two maps have different scales. In each simulation, we define two classes of moving objects: vehicles with fast speed (such as passenger cars) and vehicles with slow speed (such as trucks).

During the simulation, each vehicle generates $k$ -NN queries with a randomized probability, with parameters specified as follows: (1) $k$ denotes the number of nearest points of interest requested; (2) $\delta_{k}$ and $\delta_{l}$ are the personalized privacy parameters; (3) $\sigma_{s}$ and $\sigma_{t}$ are the personalized spatial and temporal tolerance constraints; (4) $\gamma$ is the waiting time, i.e., the amount of time a vehicle waits until its previous query is either answered or dropped, before issuing another query. The parameter values of each individual query are drawn independently from Gaussian distributions with the default mean and standard deviation parameters listed in Table I. The values of parameters $\sigma_{t}$ , $\alpha$ , and $\gamma$ are in seconds. The compactness factor $\lambda$ and consideration factor $\alpha$ are only used in spatially bounded StarCloak and hybrid StarCloak. All algorithms are implemented in Java and tested on a Windows 7 platform with Intel(R) Core(TM) CPU (4.00 GHz) and 16GB memory.
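Drawing one query profile from the defaults in Table I can be sketched as follows. The means and deviations are copied from the table; everything else is an assumption made for illustration, in particular rounding $k$, $\delta_{k}$ and $\delta_{l}$ to integers of at least 1 and clamping the remaining values at zero, neither of which is stated in the setup.

```python
import random

# (mean, standard deviation) pairs from Table I.
TABLE_I = {
    "k": (5, 1), "delta_k": (5, 1.5), "delta_l": (5, 1.5),
    "sigma_s": (4, 1), "sigma_t": (10, 2), "gamma": (20, 2),
}
# Assumption: these three parameters are counts, so draws are rounded and kept >= 1.
COUNT_PARAMS = {"k", "delta_k", "delta_l"}

def sample_profile(rng):
    """Draw each parameter independently from its Gaussian distribution."""
    profile = {}
    for name, (mean, deviation) in TABLE_I.items():
        value = rng.gauss(mean, deviation)
        if name in COUNT_PARAMS:
            value = max(1, round(value))
        else:
            value = max(0.0, value)   # assumption: tolerances cannot be negative
        profile[name] = value
    return profile
```

Passing a seeded generator, e.g. `sample_profile(random.Random(42))`, makes a simulated workload reproducible across runs.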

5.2 Compared Approaches

In our evaluation, we compare multiple approaches. Random sampling and network expansion serve as two baseline anonymization approaches. XStar [7] is the most relevant system to StarCloak. We also include three versions of StarCloak in our comparison: basic, spatially bounded, and hybrid.

Random Sampling: Given an incoming query with profile $(\delta^{q}_{k},\delta^{q}_{l},\sigma^{q}_{s},\sigma^{q}_{t})$ , this approach iteratively samples segments randomly from the spatial region within $\sigma^{q}_{s}$ one-by-one, and adds them to the anonymized location. It terminates when $(\delta^{q}_{k},\delta^{q}_{l})$ privacy requirements are satisfied. The strength of random sampling is its high resilience to inference attacks. Its weakness is the high query processing cost due to random segment selection.

Network Expansion: For incoming query $q$ , this approach starts from the actual segment of the query and incrementally adds a neighboring segment using Dijkstra’s deterministic network expansion algorithm. The order of expansion is based on the distance between $q$ ’s focal position and neighboring segments’ midpoints. The approach terminates when $(\delta^{q}_{k},\delta^{q}_{l})$ privacy requirements are satisfied. Network expansion results in a cloaked location connected as a densely compact subgraph. Its advantage is low query processing cost. Its main weakness is added vulnerability to attack since the expansion follows a deterministic best-first search.

XStar: The most related work to StarCloak in the literature is XStar [7], which performs road network anonymization under utility and privacy constraints. Our evaluation shows StarCloak is superior to XStar in aspects including reduced query processing and anonymization time, higher success rate, and higher attack-resilience.

StarCloak and Variants: In our result graphs, we denote by StarCloak the basic version of StarCloak. We include its two optimized variants, which are spatially bounded StarCloak and hybrid StarCloak, in our experimental comparison.

5.3 Evaluation Metrics

To evaluate the performance of different mechanisms, we use multiple metrics: success rate in anonymization, anonymization time, query processing time, size of candidate result set, successful throughput, and segment entropy against inference attacks.

Success Rate: An effective anonymization engine should successfully anonymize as many queries as possible, and drop as few queries as possible. Success rate measures the number of successfully anonymized queries divided by the total number of queries issued by the mobile users.

Anonymization Time: When users issue queries, they want fast answers (low temporal delay). However, an anonymization engine needs a certain amount of time to perform the anonymization. The anonymization time metric measures the average time elapsed from the query issue time until successful anonymization. From the user’s perspective, it is desired that the anonymization time is as low as possible. Note that in order to ensure a fair comparison among multiple compared approaches, we measure anonymization time only on successfully anonymized queries.

Query Processing Time: This metric measures the processing time cost for anonymized queries. With anonymization, queries are evaluated on cloaked subgraphs instead of the user’s exact location. The number of border nodes and edges in the cloaked subgraph impacts query processing time.

Candidate Result Size: This metric aims to measure the added bandwidth overhead for the communication between the anonymization engine and the LBS provider. More compact anonymized subgraphs lead to smaller candidate result sets, thus lower communication cost.

Successful Throughput: We use the throughput metric to evaluate the scalability of the anonymization approaches. The rate of successful throughput equals the product of the query execution rate (number of queries processed per second) and the success rate.

Entropy: We use entropy as a quantitative measure of adversarial uncertainty achieved by anonymization, where higher entropy means higher attack-resilience. Given an anonymized location $S$ for user $u$ as a subgraph consisting of multiple segments, the segment entropy of $S$ can be calculated by:

[TABLE]

Note that the number of segments in the generated anonymized locations may vary from anonymization algorithm to algorithm, based on their segment selection strategy. Thus, using simple entropy for different sized anonymized locations may not adequately capture the strength of protection. For this reason, we use normalized entropy [10] defined as: $H(S)/\log_{2}(|S|)$ .
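The normalization can be made concrete with a short sketch. It assumes that the segment entropy is the Shannon entropy (base 2) of the adversary's probability distribution over the segments of $S$, which is an assumption about the entropy formula rather than a restatement of it; the function names and the list-of-probabilities input are hypothetical.

```python
import math

def segment_entropy(probs):
    """Shannon entropy in bits; zero-probability segments contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def normalized_entropy(probs):
    """H(S) / log2(|S|), where |S| is the number of segments in S."""
    n = len(probs)
    if n < 2:
        return 0.0   # log2(1) = 0, so a single segment offers no uncertainty
    return segment_entropy(probs) / math.log2(n)
```

Under this assumption, a uniform belief over four segments has entropy 2 bits and normalized entropy 1, whereas a skewed belief such as `[0.5, 0.25, 0.25]` has entropy 1.5 bits and a normalized entropy below 1.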

5.4 Experiment Results

Results on Success Rate: Figure 8 shows the percentage success rates of the compared approaches with respect to varying $\delta_{k}$ , $\delta_{l}$ , $\sigma_{s}$ , and $\sigma_{t}$ for the California and Georgia maps. Generally, StarCloak and its optimized variants have high success rates because of their ability to handle different user requirements effectively. The results show that increasing the privacy requirements often decreases success rate, but the effects are different for different maps and different privacy requirements. For example, when $\delta_{k}$ increases, success rate decreases faster on the Georgia map compared to California. The reason for this is the query density of the maps. Keeping the number of queries constant across maps, since the Georgia map is more detailed than California, the distribution of query density on Georgia is sparser. Thus, on Georgia, the chance of finding enough queries to cloak together under the same spatial constraint is smaller, causing more queries to be dropped and a lower success rate. On the other hand, for XStar, increasing $\delta_{l}$ impacts success rate on California more than Georgia, unlike StarCloak. The reason for this is also related to query density. XStar anonymizes queries on the same star together, whereas in StarCloak, if there is a conflict between two queries’ $l$ -segment indistinguishability and $\sigma_{s}$ spatial tolerance, they are cloaked on different vertices of the cloaking graph. This allows StarCloak to maintain high success rates despite increasing $\delta_{l}$ .

Comparing the three StarCloak variants, the basic version achieves the highest success rate, followed by hybrid StarCloak and then spatially bounded StarCloak. This is expected because spatially bounded StarCloak aims at finding compact cloak regions, whereas basic StarCloak allows suboptimal regions for a higher success rate. Hybrid StarCloak achieves a trade-off between success rate and compactness of a cloak region. The rightmost two graphs within Figure 8 display the impact of changing spatial and temporal tolerance constraints on success rate. When users have higher tolerance, their queries are anonymized with a higher success rate. We see clearly that lower spatial tolerance affects XStar’s success rate negatively far more than it affects StarCloak, once again showing StarCloak’s superiority.

Results on Successful Throughput: The throughputs of compared approaches with respect to varying $\delta_{k}$ , $\delta_{l}$ , $\sigma_{s}$ and $\sigma_{t}$ are shown in Figure 9 for California and Georgia maps. Note that the y-axes of these figures are in logarithmic scale. We observe that throughputs of the baseline approaches (random sampling and network expansion) are often significantly lower than XStar and StarCloak. While XStar and StarCloak maintain high throughput despite increasing $\delta_{k}$ (stricter privacy), the throughput of XStar drops when $\delta_{l}$ is increased. Hence, we find that StarCloak is much more capable of satisfying challenging $l$ -segment indistinguishability requirements than compared approaches.

With respect to varying spatial and temporal tolerances, we observe that the StarCloak variants are capable of handling a variety of tolerance values without significant degradation in throughput. In contrast, the throughputs of the baseline approaches are often 5-6 times lower than that of StarCloak. Furthermore, while XStar's throughput is comparable to StarCloak's when $\sigma_{s}$ is high, XStar may perform even worse than the baselines for small $\sigma_{s}$ (see Figure 9). Collectively, these results show the superiority of StarCloak and its variants in query service and scalability compared to both XStar and the baseline approaches, under varying privacy and utility settings.

Results on Anonymization Time: In Figure 10, we report the average anonymization times for the California and Georgia maps. The results show that the StarCloak variants have significantly lower anonymization time than the compared approaches under various privacy and utility constraints. XStar often has the highest anonymization time on the California map. Among the StarCloak variants, hybrid StarCloak and spatially bounded StarCloak are similar, whereas basic StarCloak has the lowest anonymization time. This is because basic StarCloak has no preference towards "waiting for a better opportunity" to generate cloaked regions for incoming queries, whereas the other two variants can wait until closer to the query expiration time before anonymizing.

Results on Query Processing Time: We measure the average server-side processing time of an anonymized query and report the results in Figure 11. Since the compared anonymization approaches may have different success rates, to ensure a fair comparison we pick the same number of anonymized locations for all approaches, both in this set of experiments and in those reported in the next subsection. This number is the count of anonymized locations achieved by the approach with the lowest success rate. Anonymized locations with scattered segments are expected to cause higher query processing time. We observe that StarCloak's results are often significantly better than those of its main competitor, XStar. Among the three StarCloak variants, basic StarCloak has the highest query processing time, whereas the hybrid and spatially bounded versions have similar, lower processing times because of their more compact cloaked regions. The improvement of spatially bounded StarCloak becomes particularly significant when $\sigma_{s}$ is relaxed (increased).
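The fair-comparison step described above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name `equalize_sample_sizes`, the dictionary layout, and the use of seeded uniform random sampling are all assumptions, since the text only states that the same number of anonymized locations is picked for every approach.

```python
import random

def equalize_sample_sizes(results_by_approach, seed=0):
    """Keep the same number of anonymized locations for every approach.

    The common size is the count achieved by the approach with the fewest
    successful anonymizations. Uniform random sampling is an assumption;
    the source does not say how the subset is chosen.
    """
    rng = random.Random(seed)
    n = min(len(items) for items in results_by_approach.values())
    return {
        name: rng.sample(list(items), n)
        for name, items in results_by_approach.items()
    }

# Hypothetical counts of successfully anonymized queries per approach.
anonymized = {
    "StarCloak": list(range(10)),
    "XStar": list(range(7)),
    "NetworkExpansion": list(range(4)),
}
equalized = equalize_sample_sizes(anonymized)
sizes = {name: len(items) for name, items in equalized.items()}
```

After this step every approach contributes the same number of cloaked locations, so average processing time is not skewed by approaches that simply anonymized more queries.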

Results on Candidate Result Size: The size of the candidate result is an important measure of the added network bandwidth cost caused by anonymization. The larger the number of items returned in the candidate result set, the higher the communication bandwidth cost. We measure the candidate result set size under varying $\delta_{k}$, $\delta_{l}$, $\sigma_{s}$, and $k$ parameters, and report the results in Figure 12. Spatially bounded and hybrid StarCloak often provide the best results due to their compact output cloak regions. StarCloak's competitors are comparable when the $\delta_{k}$ and $\delta_{l}$ privacy requirements are relaxed, but as the privacy requirements become stricter, the bandwidth cost of XStar in particular grows significantly. The increase in candidate result size caused by large $\sigma_{s}$ can be explained by the fact that relaxed spatial tolerance inevitably makes the StarCloak approaches less strict about the compactness of the output cloak regions, so the query candidate result sets are also more scattered and diverse. The increase in candidate result size due to increased $k$ is expected, since $k$ is the parameter controlling the number of nearest neighbors returned by the $k$-NN query. With higher $k$, more candidates have to be returned, hence the candidate result set is larger.

Results on Attack-Resilience: We use the normalized entropy metric to measure the attack-resilience of the compared approaches, with higher entropy meaning higher attack-resilience. The results are shown in Figure 13. In this set of experiments, random sampling is expected by nature to give the highest entropy, whereas network expansion is expected to give the lowest. The results in Figure 13 confirm these expectations, and show that the entropies of the StarCloak variants and XStar lie between those of random sampling and network expansion. Under a variety of settings, StarCloak has higher entropy than XStar. Furthermore, StarCloak's entropy values are similar to those of random sampling, showing that it achieves near-optimal attack-resilience. As $\delta_{k}$ increases, entropy increases, since more users are cloaked together. The increase in entropy is clearer for spatially bounded StarCloak and hybrid StarCloak than for basic StarCloak, as their output cloak regions are more compact (focused on the users' actual locations) at small $\delta_{k}$ in the first place. In the rightmost two graphs in Figure 13, we show the impact of the number of injected queries on entropy in the query injection attack. For XStar and StarCloak, more query injections generally lead to a more successful attack, but the vulnerability of XStar becomes significantly higher than that of StarCloak when 4 or more queries are injected. Unlike XStar and StarCloak, random sampling and network expansion do not consider nearby queries' locations during cloak region generation, thus their entropy remains unaffected by query injections.
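To make the metric concrete, the sketch below computes a normalized entropy for an adversary's belief over candidate segments. It assumes the common definition (Shannon entropy divided by its maximum, $\log_2 n$ for $n$ candidates); the paper's exact formulation is given in its evaluation metrics section and may differ. The function name and the example probabilities are hypothetical.

```python
import math

def normalized_entropy(probs):
    """Shannon entropy of a distribution divided by log2(n), in [0, 1].

    `probs` holds the adversary's (possibly unnormalized) belief that each
    candidate segment in a cloaked region is the user's true segment.
    Assumed definition, not taken verbatim from the paper.
    """
    n = len(probs)
    if n <= 1:
        return 0.0  # a single candidate leaves no uncertainty
    total = sum(probs)
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    entropy = 0.0
    for p in probs:
        q = p / total
        if q > 0:
            entropy -= q * math.log2(q)
    return entropy / math.log2(n)

# Adversary cannot distinguish the four segments: maximal uncertainty.
uniform = normalized_entropy([0.25, 0.25, 0.25, 0.25])
# Adversary strongly suspects one segment: lower entropy, weaker protection.
skewed = normalized_entropy([0.7, 0.1, 0.1, 0.1])
```

Under this definition a value near 1 means the adversary's belief is close to uniform over the cloaked segments, which matches the reading above that higher entropy indicates higher attack-resilience.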

6 Related Work

Location privacy has been an active research area for more than a decade. Several location and trajectory obfuscation mechanisms have been developed to satisfy privacy notions such as $k$-anonymity, differential privacy, and geo-indistinguishability [9, 11, 12, 13, 14, 15, 16]. However, these mechanisms operate in Euclidean space and do not take the road network structure into consideration. In this paper, we study location privacy protection for mobile travelers on road networks.

Location privacy approaches on road networks can be studied under three categories: mobile permission systems, mix-zones, and location obfuscation. Two recent works in the permission systems category are SmarPer [17] and PrivacyZone [18]. Permission systems are not comparable to StarCloak because they either completely block location access or randomly perturb the user's location when the user is in a designated sensitive zone. Mix-zones were proposed to circumvent the risks of continuous location tracking on road networks. After a set of users enter a mix-zone, they change pseudonyms and exit the mix-zone such that the mapping between users' old and new pseudonyms is hidden. Among recent works in this category, MobiMix considers road network, time spent in the mix-zone, and travel speed constraints to build attack-resilient mix-zones [19, 20]. Palanisamy and Liu [21] further improve effectiveness and attack-resilience by studying continuous query correlation attacks and non-rectangular mix-zones. The approach in [22] enables distribution of group secret keys in cryptographic mix-zones in the presence of malicious eavesdroppers, without relying on trusted dealers. Vaas et al. [23] propose using fictive chaff vehicles to establish attack-resilient mix-zones in areas with low traffic density. Mix-zones differ from location obfuscation and StarCloak in several ways. Most importantly, mix-zones do not anonymize users on demand (i.e., when a user issues a query to an LBS) but rather when sufficiently many users enter a mix-zone.

StarCloak falls under the location obfuscation category. Under this category, Mouratidis and Yiu [24] provide $k$ -anonymity for road network travelers under the reciprocity requirement. Chow et al. [25] support personalized privacy specifications such that a cloaked region satisfies $k$ -anonymity and includes a total minimum segment length of $L$ . Li and Palanisamy [26] propose reversible cloaking schemes such that anonymity levels can be reduced to accommodate multi-level privacy and selective de-anonymization. Yang et al. [27] study the orthogonal problem of path privacy, and define the M-cut requirement to achieve path privacy. A similar path privacy problem is studied in [28]. Another orthogonal problem is semantic-aware and privacy-preserving sharing of sensitive locations under road network constraints [29, 30]. In contrast, StarCloak does not require semantic annotation. Most closely related to our work under this category is XStar [7]. We empirically compare against XStar and show that StarCloak is superior to XStar in several aspects.

7 Conclusion

In this paper, we proposed and evaluated StarCloak, a utility-aware and privacy-preserving location query system for mobile users traveling on road networks. StarCloak has an array of desirable features, including utility-aware and personalized location privacy protection, cost-aware star selection, and randomized star-set pruning for improved attack-resilience. The two optimized variants of StarCloak, namely spatially bounded StarCloak and hybrid StarCloak, improve network bandwidth usage and query processing time, with a small sacrifice in success rate, throughput, and anonymization time. In comparison to XStar, StarCloak achieves reduced query processing and anonymization time, a higher success rate in anonymization, and higher entropy against the considered attacks.

Bibliography

[1] "Juniper research: Mobile context and location services," http://www.juniperresearch.com/press-release/context-and-location-based-services-pr 2, 2014.
[2] P. R. Center, "Location-based Services," http://www.pewinternet.org/2013/09/12/location-based-services, 2013.
[3] K. Lee, L. Liu, B. Palanisamy, and E. Yigitoglu, "Road network-aware spatial alarms," IEEE Transactions on Mobile Computing, vol. 15, no. 1, pp. 188–201, 2016.
[4] X. Miao, Y. Gao, G. Mai, G. Chen, and Q. Li, "On efficiently monitoring continuous aggregate k nearest neighbors in road networks," IEEE Transactions on Mobile Computing, 2019.
[5] X. Miao, Y. Gao, S. Guo, and G. Chen, "On efficiently answering why-not range-based skyline queries in road networks," IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 9, pp. 1697–1711, 2018.
[6] S. Luo, B. Kao, G. Li, J. Hu, R. Cheng, and Y. Zheng, "Toain: a throughput optimizing adaptive index for answering dynamic kNN queries on road networks," Proceedings of the VLDB Endowment, vol. 11, no. 5, pp. 594–606, 2018.
[7] T. Wang and L. Liu, "Privacy-aware mobile services over road networks," VLDB, vol. 2, pp. 1042–1053, 2009.
[8] M. Gruteser and D. Grunwald, "Anonymous usage of location-based services through spatial and temporal cloaking," in Proceedings of the 1st International Conference on Mobile Systems, Applications and Services. ACM, 2003, pp. 31–42.
