Plausible Deniability in Web Search -- From Detection to Assessment
Pol Mac Aonghusa, Douglas J. Leith

TL;DR
This paper introduces PDE, a scalable tool to detect and assess threats to users' plausible deniability in web search, revealing vulnerabilities especially in sensitive topics and proposing a defense method using proxy topics.
Contribution
The paper presents a practical tool for detecting threats to plausible deniability in web search and evaluates defense strategies against search engine learning attacks.
Findings
Threats to deniability are easily detectable across tested topics.
Sensitive topics like health and sexual preferences are particularly vulnerable.
Proxy topics can effectively defend plausible deniability.
Abstract
We ask how to defend user ability to plausibly deny their interest in topics deemed sensitive in the face of search engine learning. We develop a practical and scalable tool called PDE allowing a user to detect and assess threats to plausible deniability. We show that threats to plausible deniability of interest are readily detectable for all topics tested in an extensive testing program. Of particular concern is observation of threats to deniability of interest in topics related to health and sexual preferences. We show this remains the case when attempting to disrupt search engine learning through noise query injection and click obfuscation. We design a defence technique exploiting uninteresting, proxy topics and show that it provides a more effective defence of plausible deniability in our experiments.
| Reference Topic | Probe 1 | Probe 2 | Probe 3 | Probe 4 | Probe 5 |
|---|---|---|---|---|---|
| anorexia | 56 (52) | 56 (52) | 56 (52) | 56 (52) | 56 (52) |
| bankrupt | 1 ( 1) | 55 (43) | 55 (39) | 58 (48) | 56 (48) |
| diabetes | 40 (38) | 40 (38) | 40 (38) | 40 (38) | 40 (38) |
| disabled | 9 ( 9) | 9 ( 9) | 9 ( 9) | 40 (40) | 40 (33) |
| divorce | 41 (31) | 75 (65) | 56 (46) | 79 (68) | 79 (68) |
| gambling | 16 (12) | 18 (16) | 66 ( 4) | 57 (17) | 18 ( 3) |
| gay | 64 (33) | 47 ( 5) | 72 (25) | 48 (25) | 48 (19) |
| location | 10 ( 2) | 11 ( 3) | 11 (10) | 18 ( 7) | 18 ( 9) |
| payday | 2 ( 2) | 2 ( 2) | 21 ( 2) | 2 ( 2) | 2 ( 2) |
| prostate | 52 (17) | 52 (17) | 52 (17) | 52 (17) | 52 (17) |
| unemployed | 7 ( 5) | 7 ( 6) | 7 ( 6) | 13 ( 7) | 7 ( 7) |
| Reference Topic | Probe 1 | Probe 2 | Probe 3 | Probe 4 | Probe 5 |
|---|---|---|---|---|---|
| anorexia | 54 (45) | 54 (45) | 54 (45) | 54 (45) | 54 (45) |
| bankrupt | 16 ( 9) | 56 (50) | 52 (39) | 54 (45) | 56 (45) |
| diabetes | 46 (35) | 46 (35) | 46 (35) | 46 (35) | 46 (35) |
| disabled | 9 ( 3) | 9 ( 8) | 9 ( 7) | 33 ( 7) | 40 (32) |
| divorce | 13 ( 7) | 123 ( 8) | 54 ( 8) | 85 ( 6) | 85 ( 6) |
| gambling | 18 (16) | 18 (16) | 52 (18) | 18 (10) | 18 (18) |
| gay | 73 (61) | 73 (70) | 76 (46) | 79 (74) | 79 (70) |
| location | 18 (16) | 18 (10) | 18 (10) | 18 (10) | 18 (10) |
| payday | 3 ( 2) | 3 ( 2) | 4 ( 3) | 4 ( 3) | 4 ( 3) |
| prostate | 21 (16) | 21 (16) | 21 (16) | 21 (16) | 21 (16) |
| unemployed | 7 ( 3) | 7 ( 3) | 13 ( 9) | 13 ( 9) | 13 ( 9) |
| Reference Topic | Probe 1 | Probe 2 | Probe 3 | Probe 4 | Probe 5 |
|---|---|---|---|---|---|
| anorexia | 55 (53) | 53 (53) | 53 (53) | 53 (53) | 53 (53) |
| bankrupt | 11 ( 8) | 48 (33) | 51 (43) | 52 (38) | 52 (38) |
| diabetes | 38 (38) | 38 (38) | 38 (38) | 38 (38) | 38 (38) |
| disabled | 4 ( 4) | 8 ( 7) | 1 ( 1) | 40 (36) | 40 (36) |
| divorce | 19 ( 9) | 65 (31) | 44 (31) | 72 (50) | 72 (50) |
| gambling | 18 (16) | 18 (17) | 18 (18) | 31 ( 3) | 18 (10) |
| gay | 89 (68) | 89 (69) | 88 (64) | 93 (73) | 93 (64) |
| location | 18 (10) | 18 (10) | 18 ( 7) | 18 ( 7) | 10 ( 7) |
| payday | 6 ( 3) | 6 ( 3) | 6 ( 3) | 6 ( 2) | 6 ( 1) |
| prostate | 32 (14) | 32 (14) | 18 (13) | 18 (13) | 18 (13) |
| unemployed | 13 ( 5) | 13 (10) | 13 ( 7) | 13 ( 9) | 7 ( 4) |
| Reference Topic | Probe 1 | Probe 2 | Probe 3 | Probe 4 | Probe 5 |
|---|---|---|---|---|---|
| anorexia | 48 (48) | 48 (48) | 48 (48) | 48 (48) | 48 (48) |
| bankrupt | 16 (10) | 65 (51) | 65 (48) | 65 (49) | 65 (49) |
| diabetes | 41 (38) | 41 (38) | 41 (38) | 41 (38) | 41 (38) |
| disabled | 9 ( 9) | 9 ( 9) | 9 ( 5) | 9 ( 7) | 9 ( 8) |
| divorce | 41 (27) | 75 (38) | 56 (22) | 75 (29) | 75 (29) |
| gambling | 21 (16) | 21 ( 3) | 21 ( 4) | 29 (16) | 18 ( 4) |
| gay | 86 (64) | 86 (64) | 80 (43) | 94 (59) | 94 (59) |
| location | 10 (10) | 8 ( 8) | 8 ( 8) | 18 (13) | 18 (13) |
| payday | 3 ( 2) | 4 ( 2) | 4 ( 2) | 4 ( 2) | 3 ( 1) |
| prostate | 17 (15) | 17 (15) | 17 (15) | 17 (15) | 17 (15) |
| unemployed | 10 ( 7) | 13 ( 7) | 13 ( 7) | 13 ( 7) | 13 ( 7) |
| Reference Topic | Probe 1 | Probe 2 | Probe 3 | Probe 4 | Probe 5 |
|---|---|---|---|---|---|
| anorexia | 59 (50) | 59 (50) | 59 (50) | 59 (50) | 59 (50) |
| bankrupt | 16 ( 8) | 65 (42) | 65 (36) | 59 (40) | 54 (38) |
| diabetes | 36 (36) | 36 (36) | 36 (36) | 36 (36) | 36 (36) |
| disabled | 7 ( 4) | 7 ( 4) | 9 ( 9) | 40 ( 4) | 40 ( 7) |
| divorce | 30 (24) | 30 ( 9) | 30 ( 9) | 30 ( 8) | 30 ( 8) |
| gambling | 6 ( 0) | 18 (16) | 32 (16) | 18 (16) | 18 ( 5) |
| gay | 92 (51) | 92 (77) | 78 (51) | 94 (72) | 94 (80) |
| location | 18 (18) | 10 (10) | 10 (10) | 18 (10) | 18 (10) |
| payday | 2 ( 2) | 2 ( 2) | 3 ( 2) | 3 ( 2) | 2 ( 2) |
| prostate | 17 (17) | 17 (17) | 17 (17) | 17 (17) | 17 (17) |
| unemployed | 13 ( 2) | 13 ( 4) | 13 ( 7) | 13 ( 7) | 7 ( 6) |
| Reference Topic | Probe 1 | Probe 2 | Probe 3 | Probe 4 | Probe 5 |
|---|---|---|---|---|---|
| anorexia | 18 ( 5) | 22 (12) | 26 ( 5) | 31 (13) | 32 ( 6) |
| bankrupt | 57 ( 3) | 53 (36) | 50 (34) | 43 (33) | 48 (36) |
| diabetes | 4 ( 2) | 13 ( 8) | 11 ( 8) | 5 ( 3) | 11 ( 2) |
| disabled | 5 ( 2) | 6 ( 2) | 9 ( 3) | 29 (10) | 26 ( 8) |
| divorce | 49 (25) | 51 (33) | 49 (30) | 43 (29) | 43 (29) |
| gambling | 6 ( 2) | 18 ( 4) | 36 (24) | 35 (13) | 31 (13) |
| gay | 36 (33) | 75 (33) | 51 (32) | 39 (20) | 31 (27) |
| location | 9 ( 2) | 11 ( 1) | 7 ( 2) | 6 ( 2) | 9 ( 1) |
| payday | 3 ( 3) | 3 ( 1) | 4 ( 2) | 3 ( 2) | 4 ( 3) |
| prostate | 55 (38) | 68 (36) | 65 (48) | 61 (48) | 64 (42) |
| unemployed | 9 ( 1) | 6 ( 6) | 7 ( 1) | 9 ( 4) | 5 ( 2) |
| Reference Topic | Probe 1 | Probe 2 | Probe 3 | Probe 4 | Probe 5 |
|---|---|---|---|---|---|
| anorexia | 66 (57) | 66 (57) | 66 (57) | 66 (57) | 66 (57) |
| bankrupt | 51 (42) | 51 (42) | 51 (42) | 55 (46) | 56 (46) |
| diabetes | 35 (35) | 35 (35) | 35 (35) | 35 (35) | 35 (35) |
| disabled | 9 ( 9) | 9 ( 9) | 9 ( 9) | 31 (31) | 31 (31) |
| divorce | 30 ( 8) | 73 (54) | 54 (34) | 100 (49) | 100 (49) |
| gambling | 3 ( 1) | 16 (16) | 53 (11) | 16 ( 6) | 6 ( 2) |
| gay | 69 (65) | 77 (73) | 70 (60) | 82 (75) | 81 (71) |
| location | 18 (10) | 10 ( 6) | 10 ( 6) | 14 (10) | 18 ( 7) |
| payday | 2 ( 2) | 2 ( 2) | 2 ( 2) | 2 ( 2) | 2 ( 2) |
| prostate | 17 (17) | 17 (17) | 17 (17) | 17 (17) | 17 (17) |
| unemployed | 4 ( 4) | 7 ( 7) | 7 ( 7) | 7 ( 7) | 7 ( 6) |
| Reference Topic | Probe 1 | Probe 2 | Probe 3 | Probe 4 | Probe 5 |
|---|---|---|---|---|---|
| anorexia | 50 (12) | 27 ( 9) | 26 ( 9) | 36 (10) | 33 (11) |
| bankrupt | 5 ( 3) | 43 (33) | 39 (37) | 36 (35) | 38 (35) |
| diabetes | 38 ( 6) | 18 ( 7) | 17 ( 5) | 17 ( 7) | 11 ( 5) |
| disabled | 2 ( 1) | 4 ( 1) | 5 ( 3) | 39 (25) | 40 (25) |
| divorce | 24 (17) | 37 (31) | 37 (31) | 35 (25) | 35 (25) |
| gambling | 24 ( 0) | 7 ( 4) | 54 (23) | 33 (23) | 68 (20) |
| gay | 68 (68) | 68 (65) | 54 (52) | 46 (36) | 47 (42) |
| location | 8 ( 8) | 8 ( 8) | 8 ( 8) | 8 ( 8) | 8 ( 8) |
| payday | 4 ( 1) | 2 ( 2) | 4 ( 2) | 4 ( 3) | 4 ( 4) |
| prostate | 59 (57) | 67 (62) | 58 (56) | 60 (54) | 51 (44) |
| unemployed | 4 ( 3) | 8 ( 3) | 10 ( 4) | 3 ( 2) | 10 ( 1) |
| Reference Topic | Probe 1 | Probe 2 | Probe 3 | Probe 4 | Probe 5 |
|---|---|---|---|---|---|
| all topics | 0 ( 0) | 0 ( 0) | 0 ( 0) | 0 ( 0) | 0 ( 0) |
| Reference Topic | Probe 1 | Probe 2 | Probe 3 | Probe 4 | Probe 5 |
|---|---|---|---|---|---|
| all topics | 0 ( 0) | 0 ( 0) | 0 ( 0) | 0 ( 0) | 0 ( 0) |
| Reference Topic | All Topics |
|---|---|
| True Detect | 100.0% |
| False Detect | 0.0% |
| Reference Topic | All Topics |
|---|---|
| True Detect | 100.0% |
| False Detect | 0.0% |
| Reference Topic | All Topics |
|---|---|
| True Detect | 97-100.0% |
| False Detect | 4-8% |
Plausible Deniability in Web Search – From Detection to Assessment
Pól Mac Aonghusa and Douglas J. Leith
P. Mac Aonghusa is with IBM Research and Trinity College Dublin. D.J. Leith is with Trinity College Dublin.
Abstract
Web personalisation uses what systems know about us to create content targeted at our interests. When unwanted personalisation suggests we are interested in sensitive or embarrassing topics a natural reaction is to deny interest. This is a practical response only if denial of our interest is credible or plausible. Adopting a definition of plausible deniability in the usual sense of “on the balance of probabilities”, we develop a practical and scalable tool called PDE allowing a user to decide when their ability to plausibly deny interest in sensitive topics is compromised. We show that threats to plausible deniability are readily detectable for all topics tested in an extensive testing program. Of particular concern is observation of threats to deniability of interest in topics related to health and sexual preferences. We show this remains the case when attempting to disrupt search engine learning through noise query injection and click obfuscation. We design a defence technique exploiting uninteresting, proxy topics and show that it provides a more effective defence of plausible deniability in our experiments.
Index Terms:
Privacy, Indistinguishability, Plausible Deniability, Recommender Systems, Web Search.
I Introduction
Encountering inappropriate or unwanted personalised online content can be awkward, depending on social context. What may appear humorous in one situation may be embarrassing, or worse, in another context. When presented with content regarded as inappropriate or discreditable, a user may wish to deny interest in the content.
The Oxford English Dictionary defines plausible deniability in terms of reasonable doubt as “the possibility of denying a fact (especially a discreditable action) without arousing suspicion”, [1]. Informally, user activity observed by the search engine exhibits plausible deniability when, with sufficiently high probability, that activity is consistent with user interest in any one of several topics, at least one of which is not sensitive for the user.
Accordingly, we assess threats to plausible deniability during web search by testing if content appearing on search result pages can be attributed to user interest in a specific sensitive topic, versus user interest in any other topic, on the balance of probabilities. We ask when can a user plausibly deny interest in a range of sensitive topics during online web-search sessions?
We provide guarantees on the best-possible level of plausible deniability a user can expect during web search in our model. We also introduce a new Plausible Deniability Estimator, called PDE, that can be used to assess privacy threats. Outputs from PDE can be represented in terms of readily interpretable probabilities thereby providing an informative indication of risk to the user.
Our methods are chosen to be straightforward to implement using openly available technologies. We use our results to design and assess counter-measures against threats to plausible deniability during online web-search sessions, using the Google Search as a source of data. We are able to assess threats to plausible deniability from sensitive topic learning in a range of potentially sensitive topics, such as health, finance and sexual orientation.
Our experimental measurements indicate that, by observing as few as 3-5 revealing queries, a search engine can infer a user is interested in a sensitive topic on the balance of probabilities in of topics tested when no effective defence is provided. In the case of topics related to health and sexual preferences measurements from PDE suggest that the probability a user is interested in sensitive topics related to sexual preference is as high as greater than their probability of interest in any other topic.
We show that defence strategies based on injecting random noise queries and on misleading click patterns may provide some protection for individual, isolated queries, but that search engines are able to learn quickly. Significant levels of threat to plausible deniability are detected even when very high levels of random noise are included in the query session or when misleading click patterns are used. These approaches seem to offer little or no improvement to user privacy when considering plausible deniability over the longer term.
In contrast, we find that a defence employing topics that are commercially relevant but uninteresting to the user as proxy topics is effective in protecting plausible deniability in the case of of sensitive topics tested. The proxy topic defence differs from traditional obfuscation approaches in actively exploiting the observed ability of the search engine to learn topics quickly, deflecting the focus of interest toward the proxy topic and away from the true topic of interest to the user.
The proxy topic defence works in our experiments, and is simple to apply. However it is important to recognise that we are faced with commercially motivated and increasingly powerful systems with a history of adapting quickly. Our results suggest that search engine capability is continuously evolving so that we can reasonably expect search engines to respond to privacy defences with more sophisticated learning strategies. Our results also point towards the fact that the text in search queries plays a key role in search engine learning. While perhaps obvious, this observation reinforces the user’s need to be circumspect about the queries they ask if they want to avoid search engine learning of their interests. Equally, our results suggest that simple countermeasures, such as proxy topics, that make accurate personalisation more expensive for the online system represent a promising approach in developing new techniques for practical user privacy.
II Related Work
We model a search engine as a black-box by making minimal assumptions about its internal workings. The technique of using predefined probe queries, injected at intervals into a stream of true user queries as fixed sampling points, was used in [2] where the focus of the paper was detection of possible privacy threats. Extending the idea of probe queries, discussed in [2], several new applications are presented in this current paper such as the model of plausible deniability and the associated PDE estimator, the proxy topic defence model and the evaluation of multiple noise and click models for each of these. The technique of using predefined probe queries is borrowed from black-box testing. Modelling an adversary as a black-box, where internal details of recommender systems algorithms and settings are unknown to users, is mentioned in [3] and [4].
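The probe-query technique described above can be sketched in a few lines of Python. This is only an illustration of the idea of fixed sampling points: the function name `interleave_probes`, the `interval` parameter and the single fixed probe string are assumptions made for this sketch, not details taken from [2] or from the PDE tool.

```python
from typing import Iterable, Iterator, Tuple


def interleave_probes(
    user_queries: Iterable[str], probe_query: str, interval: int
) -> Iterator[Tuple[str, bool]]:
    """Yield (query, is_probe) pairs for a session.

    A fixed probe query is emitted after every `interval` true user queries,
    so that responses to the probe act as sampling points at regular
    positions in the query stream. (Hypothetical helper for illustration.)
    """
    if interval < 1:
        raise ValueError("interval must be at least 1")
    for count, query in enumerate(user_queries, start=1):
        yield query, False          # a true user query
        if count % interval == 0:
            yield probe_query, True  # the predefined probe


if __name__ == "__main__":
    session = interleave_probes(["q1", "q2", "q3", "q4"], "probe", interval=2)
    for query, is_probe in session:
        print("PROBE" if is_probe else "USER ", query)
```

Because the probe text never changes, any difference in the adverts returned for it at successive sampling points can be attributed to what the black-box has learned from the intervening user queries rather than to the probe itself.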
The importance of control over appropriate flow of information is discussed extensively in legal and social science fields. Individual control over personal information flow is discussed in a critique of the nothing to hide defence for widespread surveillance in [5]. Individual privacy and its social consequences are discussed in [6, 7], where agency or control over appropriate disclosure is identified as a key concern.
Plausible deniability as a privacy defence for web search is addressed in the literature. In [8] alternative, less revealing queries are mixed with sensitive topic queries to obfuscate true user interest. In [9] queries with generalised terms are used to approximate the search results of a true query, which is never revealed. Plausible deniability for database release has been studied in the context of user data anonymization. For example, in [10] a definition of plausible deniability is applied to examine mechanisms for differentially private data set release. More generally, plausible deniability to counteract the impact of personalisation is examined in [11] for the case of a privacy aware user who knows they are being observed. The authors show that no matter what the behaviour of the user is, it is always compatible with some concern over privacy. In this way the user can offer their awareness of privacy concerns as a general alibi to justify any range of preferences. Plausible deniability for providers of online services is also discussed in the literature. For example, in [12] a distributed virtual machine infrastructure is used to provide deniability to online data providers by obfuscating the origin of index data used in recommendations.
The potential for online profiling and personalisation to result in censorship and discrimination has received growing attention in the research literature. Personalisation as a form of censorship – termed a filter bubble in [13] – is explored in [4]. In a filter bubble, a user cannot access subsets of information because the recommender system algorithm has decided it is irrelevant for that user. In [4] a filter bubble effect was detected in the case of Google Web Search in a test with 200 users. Discrimination associated with personalisation has been shown for topics generally regarded as sensitive. In [14] an extensive review of adverts from Google and Reuters.com showed a strong correlation between adverts suggestive of an arrest record and an individual’s ethnicity. In [15], the authors used online advertising targeted exclusively to gay men to demonstrate strong profiling in the case of sexual preference.
Several approaches exist for obfuscating user interactions with search engines with the aim of disrupting online profiling and personalisation. GooPIR, [16, 17], attempts to disguise a user’s “true” queries by adding masking keywords directly into a true query before submitting to a recommender system. Results are then filtered to extract items that are relevant to the user’s original true query. PWS, [18], and TrackMeNot, [19, 20], inject distinct noise queries into the stream of true user queries during a user query session, seeking to achieve an acceptable level of anonymity while not overly upsetting overall utility. Search engine algorithm evolution can be regarded as a continuous “arms-race”. In the case of Google, for example, major algorithm changes such as Caffeine and Search+ Your World have included additional sources of background knowledge from social media, changes such as Panda have improved filtering of content to counter spam and content manipulation, and most recently semantic search capability has been added through Knowledge Graph and HummingBird, [21], [22], [23].
Consent to share data for agreed purposes is critical to user trust in service providers and is a key feature of the EU General Data Protection Regulation (GDPR), [24]. Several notable browser add-ons, such as Mozilla Lightbeam, [25], and PrivacyBadger, [26], facilitate more active user awareness of possible consent issues by helping a user understand where their data is shared with third parties through the sites they visit. XRay, [27], reports high accuracy in identifying which sources of user data, such as email or web search history, might have triggered particular results from online services such as adverts. Active consensual sharing of personal data is investigated in [28] through an in-browser capability, called RePriv, allowing a user to select which portions of their personal data they wish to share with requesters. Both PrivAd, [29], and Adnostic, [30], investigate safe profiling through generalisation of user interests in the browser. Both Adnostic and PrivAd seek to protect the true interests of the user by obfuscating and filtering personalised content through a published interface.
Evaluation of the effectiveness of privacy defences in the wild was performed by [31] in the case of TrackMeNot, where the authors demonstrate that by using only a short-term history of search queries it is possible to break the privacy guarantees of TrackMeNot using readily available machine-learning classifiers. The importance of background information in user profiling is explored in [32]. Here a similarity metric measuring distance between known background information about a user, given by query history, and subsequent queries is shown to identify 45.3% of TrackMeNot and 51.6% of GooPIR queries. Anti-tracking is an ongoing area of research and recently in [33] an anti-tracking browser called TrackingFree was reported to be effective at disrupting all of the trackers in the Alexa top-500 list. Self-regulation has also proven problematic. In [34], six different privacy tools, intended to limit advertising due to behavioural profiling, are assessed. The tools assessed implement a variety of tactics including cookie blocking, site blacklisting and Do-Not-Track (DNT) headers. DNT headers were found to be ineffective in tests at protecting against adverts based on user profiling.
Examples of unsubstantiated and misleading claims by providers of technology to enhance individual privacy are common, [35, 36]. Concerns about objective evaluation of the claims by providers of such technologies have attracted the attention of Government, where the need for “Awareness and education of the users …” is identified in [37] as a key step to building trust and acceptance of privacy technologies for individuals. Accountability and enforcement of accountability for privacy policy is also attracting attention. Regulatory requirements for data handling in industries such as Healthcare (HIPAA) and Finance (GLBA) are well established. The position with respect to handling of data collected by online recommender systems is less clear. In [3], the author reviews computational approaches to specification and enforcement of privacy policies at large scale.
Our contribution in this paper is orthogonal to the contributions in the works discussed here. We address the complementary challenge of privacy monitoring by detecting possible inappropriate use of personal user data by observing personalised outputs. In this respect our approach can be deployed in conjunction with the technologies mentioned.
III General Setup
III-A Threat Model
The setup we consider is that of a general user of a commercial, for-profit online search engine. The relationship between the user, denoted , and the online system, denoted by , is based on mutual utility where both parties obtain something useful from the interaction – gets useful information and recommendations – while gets an opportunity to “up-sell” to through targeted content such as advertising. As a commercial business, recognises cost per user interaction and responsiveness of service are critical to competitiveness. Accordingly content based on user profiling is intended to adapt dynamically to the changing interests of . is generally informed regarding good personal privacy practice and is alert to unwanted or embarrassing personalisation. When detects threats to her privacy she wishes to assess her ability to plausibly deny her interest in compromising content to avoid awkward social implications. The relationship between and is generally described as “honest but curious” in the literature. Accordingly we will refer to as an observer rather than the more traditional adversary.
Let denote a set of sensitive categories of interest to , e.g. bankruptcy, cancer, addiction, etc. Gather all other uninteresting categories into a catch-all category denoted . The set is complete in the sense that all user topic interests can be represented as subsets of with the usual set operations. We are interested in threats from search engine learning that compromise the ability of to plausibly deny their interest in sensitive topics. We will assess threats to plausible deniability by testing if content appearing on search result pages can be attributed to interest in a specific topic , versus interest in any other topic, on the balance of probabilities.
We treat as a black-box with internal state unknown to . As a starting point, our initial assumption is that is motivated to use its internal state of knowledge of when producing personalised outputs for , thereby revealing something about its internal state.
**Assumption 1 (Revealing Observations).**
A search engine selects personalised page content, such as adverts, that it believes is aligned with our interests.
When a search engine infers that a particular advertising category is likely to be of interest to a user, and is more likely to generate click-through and sales, it is obliged to use this information when selecting which adverts to display. This suggests that, by examining advert content recommended by the search engine, it is possible to detect evidence of sensitive topic profiling by the search engine. Assumption 1 is fundamental to the application of our approach in that, if does not produce content that reveals evidence of learning then, since is a black-box in our model, our approach has nothing to say about observer learning. In summary, we rely on the observer to show his hand through adverts – our approach can only observe what is shown. In practice this does not appear to be a significant limitation with regard to many topics regarded as sensitive to users. In our experiments we observe an average of adverts per probe query with less than of probe queries resulting in no advert content. We also note that our scope is limited to examining advert content. In addition to adverts, commercial search engines typically provide other personalised content that could also be tested for evidence of learning; for example, Google provides a variety of personalised content such as “top stories” and related Tweets. However, we leave consideration of these as future work.
A user interacts with a search engine by issuing a query, receiving a web page in response and then clicking on one or more items in the response. In the case of web-search, a single such interaction, labeled with index , consists of a query, response page, item-click triple, denoted .
We model construction of a query as selection of words from a generally available dictionary denoted . We assume that words in are matched to topics in . The word–topic category matching is not unique and words may be matched to multiple topic categories. A user session of length steps consists of a sequence of individual steps, and is denoted . The sequence of interactions is jointly observed by the user and the search engine – and perhaps several other third-party observers. The relationship between prior and posterior background knowledge at each step is
[TABLE]
where denotes the initial background knowledge state of at the beginning of a session immediately before is observed. The detail of is unknown to who treats as a black-box. Figure 1 illustrates the interaction between user and search engine in our model.
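The interaction model described above can be pictured as a simple data structure. The sketch below is an assumption-laden illustration: the class and field names (`Interaction`, `Session`, `page_adverts`, `clicked_item`) are hypothetical, the response page is reduced to the advert texts it contains, and the observer's knowledge at a step is assumed to be everything observed before that step.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Interaction:
    """One step of a session: the query, the response page, and the clicked item."""
    query: str
    page_adverts: Tuple[str, ...]        # advert texts seen on the response page
    clicked_item: Optional[str] = None   # None when nothing was clicked


@dataclass
class Session:
    """Ordered record of interactions, as seen jointly by user and search engine."""
    steps: List[Interaction] = field(default_factory=list)

    def observe(self, step: Interaction) -> int:
        """Append a step and return its 1-based index in the session."""
        self.steps.append(step)
        return len(self.steps)

    def history_before(self, k: int) -> List[Interaction]:
        """Interactions observed strictly before step k (1-based)."""
        return self.steps[: k - 1]


if __name__ == "__main__":
    session = Session()
    session.observe(Interaction("symptoms of diabetes", ("advert A", "advert B")))
    session.observe(Interaction("weather tomorrow", (), clicked_item=None))
    print(len(session.history_before(2)))  # one interaction precedes step 2
```

Keeping the record on the user's side mirrors the black-box setting: only queries, pages and clicks are stored, and nothing about the search engine's internal state is assumed.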
Let the random variable with sample space represent user interest in categories in during a session. A value of in element of indicates evidence is detected of user interest in topic .
After each step of a query session, can construct a posterior probability distribution for , namely, for
[TABLE]
We use to represent the observation at step , so that indicates is observed at step .
The individual interest vector for topic is denoted , a vector with a single 1 in the position of that topic and 0 in all other positions. The probability that a user is interested only in topic at step of a session and the posterior probability of detecting evidence that a user is interested only in topic at query step , conditioned on observing and background knowledge , are
[TABLE]
respectively. Since contains all possible topics:
[TABLE]
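As a concrete reading of the statements above, the following sketch computes a posterior distribution over topics from a prior and per-topic likelihoods of an observation, and checks that it sums to one because the topic set is complete. It is a generic Bayes update under assumed inputs, not the estimator used by PDE: the topic names and all numerical values are invented for illustration.

```python
from typing import Dict


def posterior(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Bayes update: posterior(topic) is proportional to prior(topic) * likelihood(topic).

    `likelihood[topic]` is the (assumed known) probability of the observed
    page content given that the user is interested only in `topic`.
    """
    unnormalised = {topic: prior[topic] * likelihood[topic] for topic in prior}
    total = sum(unnormalised.values())
    if total <= 0.0:
        raise ValueError("observation has zero probability under every topic")
    return {topic: value / total for topic, value in unnormalised.items()}


if __name__ == "__main__":
    # Hypothetical numbers: two sensitive topics plus the catch-all topic.
    prior = {"bankruptcy": 0.1, "diabetes": 0.1, "catch_all": 0.8}
    likelihood = {"bankruptcy": 0.6, "diabetes": 0.05, "catch_all": 0.1}
    post = posterior(prior, likelihood)
    print(post)
    print(sum(post.values()))  # sums to 1 up to floating-point rounding
```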
III-B Example: Single Sensitive Category
To illustrate mathematical results as we go, we use a simple, idealised model consisting of a single sensitive category. We will refer to it as the Single Sensitive Category (SSC) model. The single sensitive topic is denoted and the catch-all, non-sensitive topic representing every other topic that is not part of the sensitive topic is denoted .
Suppose can issue queries related to either of two topics denoting sensitive and non-sensitive interests respectively. models the process by which draws queries according to an initial probability model
[TABLE]
with . On observing a query from at step the observer outputs one of according to the associated conditional probabilities at step given by
[TABLE]
The SSC model is deliberately simple as the intention is to illustrate mathematical concepts. The model is generally useful for exploring black-box interactions and can be readily extended to include more sophisticated scenarios such as allowing to select from multiple topics, as would happen when attempts to obfuscate her interests by switching topics.
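A small simulation can make the SSC model concrete. The sketch below assumes the simplest possible parameterisation: the user issues a sensitive query with a fixed probability, and the observer emits a sensitive output with a probability that depends only on the type of the current query. The parameter names and the use of fixed, step-independent probabilities are assumptions of this sketch; the model in the text allows the conditional probabilities to change from step to step.

```python
import random
from typing import List


def simulate_ssc(
    steps: int,
    p_sensitive_query: float,
    p_out_given_sensitive: float,
    p_out_given_other: float,
    seed: int = 0,
) -> List[bool]:
    """Simulate observer outputs for one session of the two-topic model.

    Returns a list with one entry per step; True means the observer emitted
    a sensitive output (for example a sensitive-topic advert) at that step.
    """
    rng = random.Random(seed)
    outputs = []
    for _ in range(steps):
        # The user draws a sensitive or non-sensitive query.
        is_sensitive_query = rng.random() < p_sensitive_query
        # The observer responds according to the conditional probability
        # associated with the type of query it has just seen.
        p_out = p_out_given_sensitive if is_sensitive_query else p_out_given_other
        outputs.append(rng.random() < p_out)
    return outputs


if __name__ == "__main__":
    outputs = simulate_ssc(20, p_sensitive_query=0.3,
                           p_out_given_sensitive=0.9, p_out_given_other=0.1)
    print(sum(outputs), "sensitive outputs in", len(outputs), "steps")
```

Extending the sketch to topic switching, as mentioned above, would amount to replacing the single query probability with a distribution over several topics.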
III-C Plausible Deniability
Our threat assessment model is based on plausible deniability. Informally, the user activity observed by the search engine exhibits plausible deniability when, with high probability, is consistent with the user being interested in any one of several topics at least one of which is not sensitive for the user. That is, the user activity supports reasonable doubt about the user’s actual interest in a given sensitive topic.
In our setup, the topics are while the observed activity is at step in a session (i.e. the queries, search result pages and associated user clicks). We formalise plausible deniability as follows.
Definition 1** (()–Plausible Deniability ).**
For privacy parameters and and a set of topics , a user with a true user interest vector is said to have ()–Plausible Deniability at step in the query session, if, for observations , made at each step of a session by an observer possessing initial background knowledge at the beginning of the session, there exist at least other such that
[TABLE]
where
[TABLE]
For (5) to be well-defined, all probabilities are assumed to be non-zero. In practice, this is not a significant restriction since categories with zero probability are gathered into the catch-all topic . By applying the chain-rule for conditional probability, (6) can be rewritten as
[TABLE]
where
[TABLE]
is the incremental change in ()–Plausible Deniability arising from the single observation at step of the session.
In the case of the SSC model, there are two topics – sensitive and non-sensitive – so that can at best hope for –Plausible Deniability for the sensitive topic, in which case (9) can be written as
[TABLE]
when emits a sensitive output at step , and
[TABLE]
when a non-sensitive output is emitted at step . Substituting these values into (8) and assuming sensitive outputs and non-sensitive outputs in a session of length , let denote the sub-sequence of steps where sensitive outputs are detected and be the steps where non-sensitive outputs are detected so that
[TABLE]
III-D Comparison with Other Anonymity Measures
Intuitively, Definition 1 is similar to –anonymity in that an observer can only explain observations to within a generalised set consisting of at least topic vectors with probability bounded by the choice of . Definition 1 differs from regular k–anonymity in requiring both upper and lower bounds in (6) since evidence of loss of interest in a sensitive topic may be as revealing and potentially embarrassing as evidence of increase of interest.
Definition 1 can also be compared with a slightly weaker form of Differential Privacy. Informally, making an observation should not make significantly more, or less, confident of user interest in a particular sensitive topic.
From (9) the incremental change due to a single observation is
[TABLE]
by applying Bayes' Theorem. Since (9) is bounded above and below for at least other when Definition 1 holds, it follows that
[TABLE]
for at least other topic vectors – but not necessarily for all topic vectors. In which case we say that m–Differential Privacy holds for whenever Definition 1 holds, meaning that for any topic vector it is impossible to distinguish it from at least other topic vectors in . This is a slightly weaker statement of Differential Privacy from the usual global definition.
III-E Testing for Plausible Deniability
The following indistinguishability definition of privacy risk measures the change in belief by a search engine due to inference from observed user events relative to its prior belief conditioned on the background data available at the start of the query session. It is adapted from work begun in [2] and using it allows us to adapt tools originally developed there.
Definition 2** (-Indistinguishability ).
For a privacy parameter , a user with interest vector is said to be -Indistinguishable with respect to an observation of user actions at step , if
[TABLE]
where
[TABLE]
is called the -Indistinguishability score of the interest vector for observation and background knowledge at step .
In other words, for -Indistinguishability to hold at step of a query session, the conditional posterior distribution should be approximately equal to the prior distribution at the beginning of the query session for the true interests of the user. To ensure (16) is well defined we assume all probabilities in (16) are non-zero, so that . Expression (16) implies that if -Indistinguishability holds at step for an interest vector , then .
The next result provides the connection between -Indistinguishability and ()–Plausible Deniability needed to apply tools developed in [2] for -Indistinguishability to ()–Plausible Deniability .
Proposition III.1**.**
If -Indistinguishability holds on a subset for for step and the initial step , then ()–Plausible Deniability holds on for . Furthermore
[TABLE]
Proof**.**
Assume -Indistinguishability holds on then for any . From (9)
[TABLE]
[TABLE]
[TABLE]
Where expressions (b) and (c) in (20) are and respectively from the definition in (17).
Therefore from the definition of in (8)
[TABLE]
So that (18) holds. Since individual elements in (22) satisfy -Indistinguishability for it follows that ()–Plausible Deniability holds as required. ∎
Proposition III.1 provides a basic strategy for asserting when ()–Plausible Deniability holds. By establishing a value of for which a collection of topics satisfies -Indistinguishability , ()–Plausible Deniability follows with, at least, . This is a minimum guarantee, as there may be topics for which -Indistinguishability fails but ()–Plausible Deniability holds. In our experiments we test whether can plausibly deny whether or not observed actions can be uniquely associated with interest in a given sensitive topic versus interest in “any other” topic in , so that .
For a topic the expression for ()–Plausible Deniability becomes
[TABLE]
where denotes the topic interest vector representing interest in the topics .
The following result connects to variation in probabilities
Proposition III.2**.**
If ()–Plausible Deniability holds for and with and then
[TABLE]
And so
[TABLE]
is a lower bound for the best possible achievable level of ()–Plausible Deniability .
Proof**.**
If ()–Plausible Deniability holds for then
[TABLE]
so that is a lower bound for all for which ()–Plausible Deniability holds. The result follows by substituting the expression in (21) for in (26). ∎
Proposition III.2 will be used later to create an estimator for that can be measured in experiments. From now on we simplify our discussion to the case and so experimental results are reported accordingly for the two-topic case, .
IV Implementation
IV-A Preliminaries
During testing we wish to use (25) to create an estimator, we call PDE, to estimate the level of ()–Plausible Deniability afforded. Since estimating the quantities in (25) uses PRI, we recap the bare essentials of PRI and refer the reader to [2] for more details.
To test for learning PRI injects a predefined probe query into a stream of “true” queries during a query session. In this way, any differences detected in advert content in response to probe queries can be compared to identify evidence of learning. An ideal probe should not disrupt the learning process of . Denote the event a probe query is selected from at step by . We formalise the notion of an ideal probe query by demanding that observing should be conditionally independent of the user topic given the existing background knowledge of the observer
[TABLE]
and so observing the probe query and associated clicks does not provide any more information to the observer about the interests of than the current background knowledge already provides. From (27)
[TABLE]
In practice, choosing an ideal probe query is achieved by selecting words from that match words for several topics in so that it is not possible to associate a single topic in with the probe query.
Construction of PRI is based on several assumptions. The first is that the background knowledge at the first step of a query session, , provides sufficient description of background knowledge for all subsequent steps of that query session, .
Assumption 2** (Sufficiently Informative Responses).**
Let label the sub-sequence of steps at which a probe query is issued. At each step at which a probe query is issued,
[TABLE]
for each topic .
So that it is not necessary to explicitly use knowledge of the search history during the current session when estimating for a topic as this is already reflected in the search engine response, , with the initial background knowledge capturing background knowledge up to the start of the session, at step . Assumption 2 greatly simplifies estimation as it means we do not have to take account of the full search history, but requires that the response to a query reveals search engine learning of interest in sensitive category which has occurred. Assumption 2 was called the “Informative Probe” assumption in [2].
The next assumption is that adverts are selected to reflect search engine belief in user interests. In this way adverts are assumed to be the principal way in which search engine learning is revealed. Given this assumption, conditional dependence on can be replaced with dependence on the adverts appearing on the screen.
Assumption 3** (Revealing Adverts).**
In the search engine response to a query at step it is the adverts on a response page which primarily reveal learning of sensitive categories.
[TABLE]
for each topic .
We estimate background knowledge by selecting a training data-set, denoted , consisting of (label, advert) pairs, where the label is the category in associated with the corresponding advert. For example, when testing for evidence of a single, sensitive topic, called “Sensitive”, contains items labeled “Sensitive” or “Other”, where “Other” is the label for the uninteresting, catch-all topic . In this way approximates the prior observation evidence available at the start of the query session so that is an estimator for .
Text processing of produces a dictionary of keyword features. This processing removes common English language high-frequency words and maps each of the remaining keywords to a stemmed form by removing standard prefixes and suffixes such as “–ing” and “–ed”. The dictionary represents an estimate of the known universe of keywords according to the background knowledge contained in the training data.
Text appearing in the adverts in a response page is preprocessed in the same way as to produce a sequence of keywords from for each advert; denoted . Words not appearing in are ignored in our experimental setup for simplicity since sessions are short. In an operational setting it is possible, for example, to update when new keywords are encountered and refactor accordingly.
Let , denote the number of times an individual keyword occurs in a sequence . The relative frequency of an individual keyword is therefore,
[TABLE]
recalling that only keywords appearing are admissible due to the text preprocessing in our setup.
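The preprocessing and relative-frequency steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the stop-word list, the suffix list and the function names (`stem`, `keywords`, `relative_frequencies`) are all hypothetical, and the crude suffix stripping (prefixes are not handled) stands in for whatever stemmer was actually used.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word and suffix lists.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to"}
SUFFIXES = ("ing", "ed", "s")


def stem(word):
    """Strip one common suffix; a crude stand-in for a real stemmer."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def keywords(text, dictionary=None):
    """Lower-case, tokenise, drop stop words, stem, then filter by dictionary."""
    tokens = re.findall(r"[a-z]+", text.lower())
    stems = [stem(t) for t in tokens if t not in STOP_WORDS]
    if dictionary is not None:
        # Words outside the training dictionary are ignored.
        stems = [s for s in stems if s in dictionary]
    return stems


def relative_frequencies(sequence):
    """Map each keyword to its count divided by the sequence length."""
    counts = Counter(sequence)
    total = len(sequence)
    return {word: n / total for word, n in counts.items()} if total else {}
```

A dictionary would be built once from the training adverts, for example `set(keywords(training_text))`, and then passed to `keywords` when processing the adverts on each response page.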
Let be a sensitive topic of interest, and let denote the subset of where the label corresponds to . Let denote the set of adverts labelled for any topic in . The PRI estimator for given adverts appearing on the result page for query number , is as follows (note that the expression given for in [2] is incorrect and is corrected here):
[TABLE]
where we concatenate all of the advert text on page into a single sequence of keywords and is the relative frequency of within this sequence. Similarly, concatenating all of the keywords in the training set , respectively , into a single sequence then , respectively , is the relative frequency of within that sequence.
IV-B Tuning the PRI Estimator
The quantity in the expression for the PRI estimator, (32), is problematic when the adverts on page do not contain any of the topic keywords in dictionary i.e. when , indicating there is no detectable evidence of a particular topic. To be consistent with the definition of -Indistinguishability in Section 2, should result in a PRI score of one for that topic. We therefore replace with
[TABLE]
Training data is based on a sample of all possible adverts for a particular topic. We may be unlucky so that during the training phase we fail to observe adverts containing infrequently occurring keywords for a particular topic. In this case the relative frequency of such a keyword will be zero and it will not contribute when estimating PRI if encountered in an advert. To address this we introduce a Laplace smoothing parameter as follows
[TABLE]
The parameter enforces a minimum frequency of on every keyword. The expression (32) is adjusted correspondingly to give a new estimator we call PRI+:
[TABLE]
We will use the PRI+ estimator, given by (37), from now on in this paper, unless stated otherwise. In our experiments we find empirically, through verification with the training data, that choosing the parameter worked well.
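To make the role of the smoothing parameter concrete, the sketch below shows one common form of additive (Laplace) smoothing applied to keyword frequencies. It is an assumption-laden illustration and is not claimed to reproduce the exact PRI+ expression: the function name `smoothed_frequencies` and the argument `lam` are hypothetical.

```python
from collections import Counter


def smoothed_frequencies(sequence, dictionary, lam):
    """Additively smoothed relative frequency for every dictionary keyword.

    Keywords never seen in `sequence` still receive a small non-zero
    frequency, so they can contribute if they later appear in an advert.
    Assumes lam > 0 and a non-empty dictionary.
    """
    counts = Counter(w for w in sequence if w in dictionary)
    total = sum(counts.values())
    denom = total + lam * len(dictionary)
    return {w: (counts[w] + lam) / denom for w in dictionary}
```

The smoothed values still sum to one over the dictionary, so they can be used wherever the unsmoothed relative frequencies were used.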
IV-C The PDE Estimator
Substituting the PRI+ estimator for in (25) gives the PDE estimator
[TABLE]
From Proposition III.2, the PDE estimator in (38) can be interpreted directly as the best possible level of ()–Plausible Deniability a user can claim in the case . We report the maximum value of PDE measured by probe step in our experiments to show the worst possible ()–Plausible Deniability scenario for . We also report the median value of PDE as a representative bound for approximately of the samples. An example of reporting is shown in Table I for the reference topic “gay”.
For example, from Table I, a reported maximum value of PDE of in the second column indicates that the difference in probabilities that is uniquely interested in the reference topic versus being interested in any other topic is at least in the worst case by probe step . The median value of in parentheses in the Probe 3 and 4 columns indicates that the difference in probabilities can be expected to be at least in of cases by probes and . Overall the results suggest that ()–Plausible Deniability is unlikely to constitute a reasonable defence in this case.
Reported values of PDE may increase, or decrease, during a session as individual queries are judged as more, or less, revealing by the PDE estimator. Inspection of the query scripts generated for the topic , for example, shows that the queries associated with probe step are same sex relationships and how do i know if I’m gay, both of which appear revealing. The queries from the test script corresponding to probe steps and are HIV symptoms, HIV treatment, HIV men and aids men which may not point as distinctly to specific interest in the as they could reasonably be associated with health concerns.
The zeroth probe in a session is always run first, before any other query, to establish a baseline PRI+ score for the session. As a result the measured PDE value for the zeroth probe is always [math] for both maximum and median values and is not reported in our results.
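The reporting convention described above, a maximum with the median in parentheses for each probe step, can be sketched as follows. The function names and the input layout (one list of PDE values per session, zeroth probe excluded) are hypothetical, and formatting the values as whole percentages is an assumption about how the tables are presented.

```python
from statistics import median


def summarise_by_probe(pde_by_session):
    """Return (max, median) of PDE at each probe step across sessions.

    pde_by_session is a list of sessions; each session is a list of PDE
    values for probes 1..n (the zeroth probe is excluded).
    """
    steps = zip(*pde_by_session)  # regroup values by probe step
    return [(max(values), median(values)) for values in steps]


def format_row(summary):
    """Format each (max, median) pair as 'max (median)' in whole percent."""
    return [f"{round(100 * mx)} ({round(100 * md)})" for mx, md in summary]
```

Reporting the maximum captures the worst case for the user at each probe step, while the median gives a more representative value across sessions.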
One popular approach to designing defences of ()–Plausible Deniability is to attempt to hide in the crowd. For example, by injecting varying degrees of noise in the stream of observations in the hope that will not detect the true sub-stream of sensitive events. In [2], the authors observe that varying click patterns is seen to change the absolute volume of adverts appearing on a page. As both user clicks and queries are potential indicators of user interest for an observer we test injected noise from both queries and clicks as possible defence strategies.
An alternative tactic is to invert the previous approach by instead attempting to hide in plain sight. By choosing a non-sensitive proxy topic, chosen to attract personalised content can then carefully hide true, sensitive queries in a stream of proxy topic queries. By demonstrating clear interest in a proxy non-sensitive topic may tip the balance of probability toward the proxy topic by drawing the attention of the observer .
V Experimental Results
V-A Preliminaries
To facilitate easy comparison we use the same experimental data collection setup as [2]. We summarise the key elements here with additional detail in the Appendix and refer the reader to [2] for full details.
User interest topic categories taken from [2], are used in our experiments. Of the user interest topics, (i) ten are sensitive categories associated with subjects generally identified as causes of discrimination (medical condition, sexual orientation etc) or sensitive personal conditions (gambling addiction, financial problems etc), (ii) a further sensitive topic is related to London as a specific destination location, providing an obviously interesting yet potentially sensitive topic that a recommender system might track, (iii) the last topic is a catch-all category labeled “Other”.
To construct sequences of queries for use in test sessions, we select a probe query, providing a predefined sampling point for data collection. Numbering the probes in a session starting from [math], the zeroth query issued in every session is a probe query. The zeroth probe is used to establish the baseline for calculations of the PDE estimator for subsequent probe queries. The PDE estimator, from (38), of the zeroth probe in a session is [math] and so is not included in reports of experimental results. Measurements of PDE values are reported for each of the probe queries – during experiments providing a consistent sample for analysis.
In our experiments, when implementing the “Proxy Topic” defence model, we choose three uninteresting, proxy topics likely to attract adverts, namely tickets for music concerts, searching for bargain vacations and buying a new car.
All scripts were run for registered users and anonymous user on the Google search engine, yielding a data set consisting of probe queries in total across all of the test user interest topics. Test data was divided into individual test data sets based on different test configurations with each test data set consisting of approximately probe queries.
A separate hold-back was created for a common training data set of approximately queries. The PDE estimator in (38) uses the training data-set to model the prior background knowledge . We do not re-train PDE during testing as new adverts are encountered. Experimental measurements of PDE are with respect to the common training set for consistent comparison.
All queries in a test session were automatically labelled with the intended topic of the test session as given by the query script used. For example, all queries from a session about “prostate” are labeled as “prostate” including probe queries. In this respect the labels capture intended behaviour of queries, rather than attempting an individual interpretation of specific query keywords during a user session. Test data is automatically divided into folds for processing so that reported statistics are taken over distinct, randomised sub-samples of test data.
Before proceeding to testing with PDE, we verify PRI+ by comparing its detection capability with previous results obtained in [2] for the PRI estimator and compare the performance of PRI+ with alternative implementations using Naive Bayes and Support Vector Machine as sensitive topic detectors.
Comparison results between PRI and PRI+ are shown in Table VI and were produced by processing data taken from [2] but applying the PRI+ estimator to decide which topic is detected. For comparison with [2], we declare a topic has been detected during a query session, consisting of probe queries, if at least one of the probe queries is detected as topic . For comparison, detection results for the PRI estimator from Table XIV(b) in [2] are reproduced as Table VI(b). The True Detection rates using the PRI+ estimator are equal to or better than the rates reported in [2] for each topic. The False Detection rates are also equal or better for all topics tested, comparing favourably with the results obtained in [2].
Comparison of PRI+ with alternative implementations was performed by taking results from Multinomial Naive Bayes (NB) and Linear SVM (SVM) classifiers to estimate the probabilities in the definition of in (17). The intent of the comparison is to determine which of the NB, PRI+ and SVM estimators detect privacy threats, using the definition of in (17), for test items previously labeled as “sensitive” by examining the topic of the query used. To qualify as a privacy threat we choose a value of . We expect precision to be substantially less than 100% for all estimators because the threshold will filter out weaker detections where .
Other than varying how was estimated, all other inputs and calculations were identical. A common test data set was constructed by selecting 5,500 result pages for each sensitive topic and then randomly selecting an additional 5,500 result pages labeled for the non-sensitive topic. In this way each sensitive topic had a balanced verification data set of 11,000 labeled items. Each verification data-set was divided randomly into test–training sets and calculations repeated 5 times for 5-fold verification of each of the NB, PRI+ and SVM estimators. The Multinomial Naive Bayes and Linear SVC modules from the Python Sklearn package [38] were used to construct the NB and SVM estimators. After common preprocessing each of the NB, PRI+ and SVM classifiers was trained and probability estimates captured for the 5-fold test data-sets. A threat is declared “detected” if the calculated value of for the sensitive topic exceeds . Precision of sensitive topic threat detection is shown by topic in Figure 2 for the NB, PRI+ and SVM approaches.
The results in Figure 2 indicate that the PRI+ estimator produces significantly more true-positive detections than either of the NB or SVM estimators for all sensitive topics tested. The initial detection sensitivity of each of these estimators is influenced by the labelling assigned to examples in the training set. We adopt the perspective that privacy tools should err on the side of caution so that high detection sensitivity in the initial “out of the box” stage is a prudent approach. In a real-world application of PRI+ the user would provide incremental training examples over time reflecting their tolerance of privacy risk and so tune PRI+.
V-B Establishing a Baseline
We begin with sequences of queries, interleaved with probe queries, in what we term a “no click, no noise” model. Here there is no injected noise and no items are clicked on any of the search results pages. This model provides a baseline, where the queries alone are available to the recommender to learn about a user session as it progresses. Measurements of PDE for all topics using the “no click, no noise” model are shown in Table II.
For the topics Anorexia, Diabetes, Prostate, Bankrupt, Divorced and Gay the reported results are high, indicating a lack of plausible deniability for each of these topics. It is concerning that personal circumstances, health status and sexual orientation appear to be the most revealing topics according to our experiments. In the case of the topic Disabled there is more cause for concern about ()–Plausible Deniability as the session progresses. On inspection of the associated query script this again appears to be related to the specificity of the queries at each probe step. At the beginning of this script the queries are related to availability of services – for example, locations of disabled parking – while later queries are more specific to named conditions – for example, treatment for spina bifida.
The topics appear among the topics of least concern from the perspective of ()–Plausible Deniability . Both of the topics Payday and Unemployed asked queries about availability of social support services whereas queries for the topic Bankrupt asked about availability of paid professional services such as lawyers and accountants. It is perhaps an illustration of the motivations of a for-profit service where users seeking social supports are of less interest than users seeking expensive paid services.
Overall, measurements of PDE in experiments appear to agree with expectations from inspection of the underlying queries. Our results suggest that queries are a strong signal to the observer of user interest, and that estimates from PDE appear to distinguish queries that are strongly revealing of specific topic interest from more generic queries where plausible deniability is clearer.
V-C The Effect of Random Noise Injection
Following from Section III-C, we now consider the impact of injecting non-informative queries chosen at random from our popular query list into a user session. We simply refer to these as “random noise” queries. We consider three levels of random noise queries for testing purposes:
“Low Noise”
The automation scripts select uninteresting queries uniformly at random from the top-query list and inject a single random noise query after every topic-specific query so that the “signal-to-noise ratio” of sensitive to noise queries in this case is .
“Medium Noise”
Here the automation scripts inject two randomly selected queries after each topic-specific query for a signal-to-noise ratio of .
“High Noise”
In this noise-model with the highest noise setting, three random noise queries are injected, resulting in a signal-to-noise ratio of .
Note also that the automation scripts were configured to ensure the relevant number of noise queries was always injected immediately before each probe query. Our intention was to construct a “worst case” for detection of learning, where probe queries are always separated from sensitive user queries by the specified number of noise queries.
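A session script for the noise models above could be assembled along the following lines. This is a sketch under stated assumptions rather than the automation actually used: the function name `build_noisy_session`, the `(kind, query)` tuple layout, and the choice to place a probe after every topic query are all hypothetical.

```python
import random


def build_noisy_session(topic_queries, noise_pool, noise_per_query, probe, rng=None):
    """Interleave topic queries with random noise queries and probe queries.

    Layout: a zeroth probe, then for each topic query: the query itself,
    `noise_per_query` noise queries drawn uniformly at random, and a probe.
    """
    rng = rng or random.Random()
    session = [("probe", probe)]  # zeroth probe sets the baseline
    for query in topic_queries:
        session.append(("topic", query))
        for _ in range(noise_per_query):
            session.append(("noise", rng.choice(noise_pool)))
        # The noise queries always sit immediately before the probe.
        session.append(("probe", probe))
    return session
```

Setting `noise_per_query` to 1, 2 or 3 corresponds to the Low, Medium and High noise settings, and 0 gives a session with no injected noise.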
Table III(a-c) shows the measured PDE values for Low, Medium and High levels of noise respectively for the “no click” model. The PDE values for all levels of noise are similar to the “no click, no noise” baseline values in Table II.
Overall, there is no consistent reduction in values across all topics for all noise levels, indicating that injecting random noise queries does not have a consistent effect. In some cases, such as topic Gay, measured values of PDE increase for all noise levels indicating that noise injection worsens the user’s ability to assert ()–Plausible Deniability .
These results indicate that even the “High Noise” model fails to reduce the measured values of PDE in a coherent way, so that injecting random noise has not improved plausible deniability significantly with any consistency. We conclude that injection of random noise, even at substantial levels, is not observed to provide a useful defence for plausible deniability in our experiments.
V-D The Effect of Click Strategies
We now consider whether it is possible to disrupt search engine learning by careful clicking of the links on response pages. Intuitively, from the search engine’s point of view, clicking on links is a form of active feedback by a user and so potentially informative of user interests. This is especially true when, for example, a user is carrying out exploratory search where their choice of keywords is not yet well-tuned to their topic of interest. Previous studies have also indicated that there is good reason to believe that user clicks on links are an important input into recommender system learning. In [2] (Section ), user clicks emulated using the “Click Relevant” click-model were reported to result in increases of – in the advert content, depending on the “Sensitive” topic tested.
We consider five different click strategies to emulate a range of user click behaviours:
“No Click”
No items are clicked on in the response page to a query. This user click-model does not provide additional user preference information to the recommender system due to click behaviour. This click model is used in the baseline measurements presented in Section V-B.
“Click Relevant”
Given the response page to a query, for each search result and advert we calculate the Term-Frequency (TF) of the visible text with respect to the keywords associated with the test session topic of interest. When for an item, the item is clicked, otherwise it is not clicked. This user click-model provides relevant feedback to the recommender system about the information goal of the user.
“Click Non-relevant”
TF is calculated for each item with respect to the category of interest for the session in question as for the “Click Relevant” click-model, except that items are clicked when the TF score is below the threshold and so they are deemed non-relevant to the topic, that is when . This user click-model attempts to confuse the recommender system by providing feedback that is not relevant to the true topic of interest to the user.
“Click All”
All items on the response page for a query are clicked. This user click-model gives the recommender system a “noisy” click signal, including clicks on items relevant and non-relevant to the user’s information goal.
“Click 2 Random Items”
Two items appearing on the response page for a query are selected uniformly at random with replacement and clicked.
In all cases, when uninteresting, noise queries are included in a query session, the relevant user click-strategy is also applied to the result pages of these queries. In this way we hope to avoid providing an obvious signal to the recommender system that might differentiate uninteresting queries from queries related to sensitive topics. Items on the result page in response to probe queries are not clicked so that the probe query does not provide any additional information to the recommender system.
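The click-models listed above amount to different rules for choosing which items on a response page to click. The sketch below illustrates this; the function name `items_to_click`, the strategy labels, the `threshold` argument and the exact comparison operators are hypothetical, since the threshold value is not restated here.

```python
import random


def items_to_click(tf_scores, strategy, threshold, rng=None):
    """Return indices of result-page items to click under a click-model.

    tf_scores[i] is the term-frequency score of item i with respect to the
    keywords of the session topic.
    """
    rng = rng or random.Random()
    indices = list(range(len(tf_scores)))
    if strategy == "no_click":
        return []
    if strategy == "click_relevant":
        return [i for i in indices if tf_scores[i] > threshold]
    if strategy == "click_non_relevant":
        return [i for i in indices if tf_scores[i] <= threshold]
    if strategy == "click_all":
        return indices
    if strategy == "click_2_random":
        # Sampling with replacement, so the same item may be picked twice.
        return rng.choices(indices, k=2) if indices else []
    raise ValueError(f"unknown click strategy: {strategy}")
```

Pages returned for probe queries would bypass this function entirely, since no items are clicked on probe result pages.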
Measured values of PDE are shown in Table IV. As random noise injection had no observable effect on measurements of PDE for different click models in experiments, only the “No Noise” results are presented here for space reasons.
Taken overall, the results in Table IV(a) for the “non-relevant click, no noise” model suggest clicking on non-relevant advert items is the best strategy of the click models tested. The only difference between the “non-relevant click” model and other click models is that non-relevant items only are clicked, whereas in other click models it is possible that relevant items are clicked. It seems reasonable to postulate that clicking on relevant items provides “fine-tuned” feedback about user interests which is more informative for the observer. Clicking on non-relevant items may divert attention to a modest degree, but not to the extent of masking the sensitive topic revealed by the query.
Comparing the baseline “No Click” PDE observations in Table II with each of the subtables in Table IV shows a lack of consistency similar to that of the noise injection models. In our experiments there is no consistent change observed in PDE across topics due to variation in the click patterns tested. As with the noise injection case, there are sporadic increases and decreases in values of PDE but the lack of overall consistency makes using click models as a defence impractical.
It would appear in summary, that clicks transmit information to the observer, but not as consistently as does a revealing query. Consequently none of the user click-models tested appear to change the baseline level of plausible deniability associated with the query in a predictable way so that there is no globally discernible pattern with which to construct practical defence tools based on clicks.
V-E The Effect of Proxy Topics
The next privacy protection strategy we consider is the introduction of proxy topics. In this case sequences of queries, with each sequence related to a single proxy topic which is not sensitive for the user but capable of attracting personalised advert content, are injected into a user session. The idea here is that each such sequence of queries emulates a user session where the proxy topic is the topic of interest. In this way we hope to misdirect learning by the search engine of user interests. The results in Section V-C are relevant here since they suggest that isolated, individual queries – such as randomly selected noise queries – tend not to provoke search engine learning. Our hope is that this can be exploited by inverting the notion of random noise injection so that individual sensitive queries are injected as the noise in proxy topic sessions. Isolated sensitive queries will hopefully not provoke learning whereas the larger number of uninteresting proxy sessions will. In this way we can misdirect learning by the observer.
In our tests the following proxy topics are used:
Tickets
Searching for tickets for events in a well-known local stadium.
Vacation
Queries related to a vacation such as flights and accommodation.
Car
Searches by a user seeking to trade in and change their car.
Related queries are constructed by selecting related keywords through the same process as was used for the sensitive topics.
Proxy topic query scripts were constructed by selecting a sensitive topic, and then selecting an uninteresting proxy topic from the list of proxy topics. Having decided on a sensitive query we wish to issue, we select at least three and no more than four queries related to the proxy topic from a prepared list of proxy topic queries. We next randomly shuffle the order of the selected sensitive and proxy topic queries. Because each block contains a single sensitive query and at least three proxy topic queries, there is always a subgroup of at least two proxy topic queries next to each other in each query session. Finally, for testing purposes, we place a probe query before and after each block of 3-4 proxy + 1 sensitive queries to measure changes in PRI+ score. We repeat this exercise using the same proxy topic until a typical query session consisting of probe queries is created.
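The construction above can be sketched in a few lines of Python. This is a minimal illustration rather than the scripts used in our tests: the function names, the `(kind, query)` tuple representation, the example query strings, and the choice to let a single probe query separate consecutive blocks are all assumptions made for the sketch.

```python
import random


def build_proxy_session(sensitive_queries, proxy_queries, probe_query, rng=None):
    """Build one proxy-topic session as a list of (kind, query) pairs.

    kind is one of "probe", "proxy" or "sensitive".
    """
    if len(proxy_queries) < 4:
        raise ValueError("need at least four prepared proxy queries")
    if rng is None:
        rng = random.Random()

    session = [("probe", probe_query)]  # probe before the first block
    for sensitive in sensitive_queries:
        # Three or four distinct proxy queries from the prepared list.
        n_proxy = rng.choice([3, 4])
        block = [("proxy", q) for q in rng.sample(proxy_queries, n_proxy)]
        block.append(("sensitive", sensitive))
        # Shuffle so the sensitive query lands at a random position.
        rng.shuffle(block)
        session.extend(block)
        session.append(("probe", probe_query))  # probe after each block
    return session


def has_adjacent_proxies(block):
    """True if two proxy queries sit next to each other in the block."""
    kinds = [kind for kind, _ in block]
    return any(a == b == "proxy" for a, b in zip(kinds, kinds[1:]))


if __name__ == "__main__":
    # Placeholder query strings, for illustration only.
    session = build_proxy_session(
        sensitive_queries=["sensitive query 1", "sensitive query 2"],
        proxy_queries=["flights", "accommodation", "car hire",
                       "travel insurance", "airport parking"],
        probe_query="probe query",
        rng=random.Random(0),
    )
    for kind, query in session:
        print(kind, query)
```

With one sensitive query and at least three proxy queries per block, the sensitive query can split the proxy queries into at most two runs, so one run always has length two or more; `has_adjacent_proxies` checks this property for a given block.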
Data was collected for such proxy topic sessions. This included each of the sensitive topics and each of the click models described in previous sections. The same PRI+ and PDE setup, including the same training set, as before was used to process the search results.
Measured detection rates are shown in Table V. The measured probability calculated from PDE is [math] for all topics and for all click-models tested. That is, we find it is possible to claim full plausible deniability of interest in all of the topics tested. Since our detection approach is demonstrated to be notably sensitive to observer learning in earlier sections, we can reasonably infer that this result is not due to a defect in the detection methodology but rather genuinely reflects successful misdirection of the search engine away from sensitive topics.
This result is encouraging, especially in light of the negative results in previous sections for other obfuscation approaches. It suggests that the use of sequences of queries on uninteresting proxy topics may provide a defence of plausible deniability. The trade-offs for the user include the overhead of maintaining proxy topics and associated queries and the additional resources required to issue proxy topic queries in a consistent way. However, since both of these tasks were readily automated during our testing, it seems reasonable that these trade-offs could be managed by software in a way that is essentially transparent to the user.
VI Conclusions and Discussion
Our observations suggest that modern systems, such as Google, are able to identify user interests with high accuracy, exploit multiple signals, filter out uninteresting noise queries and adapt quickly when topics change. Furthermore learning appears to be sustained over the lifetime of query sessions. The power and sophistication of these systems make designing a robust defence of user privacy non-trivial.
The PDE estimator was tested via a comprehensive measurement program using online search engines to show that topic learning results in measurable impacts on the ability of a user to deny their interest in all sensitive topics tested. We find that revealing queries provide a significant signal for search engine adaptation. While user clicks provide additional feedback, we do not observe the same degree of associated learning with click behaviour as is observed with revealing queries. Overall, testing with PDE suggests that defences based on random noise injection and variable click models do not provide a reliable strategy for defence of plausible deniability.
By contrast, our experiments show that proxy topics that are uninteresting to the user but capable of generating commercial content provide observable privacy protection. Wrapping sensitive queries in a stream of coherent proxy topic queries appears to distract the online system into adapting to the proxy topic while allowing the sensitive query noise to slip through. Our observation that proxy topics provide some relief indicates that defence of plausible deniability is not impossible, but also that increasingly sophisticated approaches are required in the face of ever improving search engine capability. In choosing proxy topics, for example, a user must be careful not to stimulate unintended learning of the proxy topics which may influence the utility of future search results.
Subtle tactics like proxy topics, which exploit the observer’s strengths to tip the balance slightly in favour of the user, suggest an interesting avenue for future research. The simplicity of the approach means it should be possible to extend it in several ways. For example, by injecting a range of uninteresting single topic queries as additional noise in the proxy query stream, it may be possible to provide additional guarantees of privacy, such as k-anonymity or differential privacy, for the sensitive topic. Experiments could also compare the effectiveness of different proxy topics, for example proxy topics that are more relevant to the user’s known interests versus proxy topics that are less relevant. Similarly, proxy topics with higher commercial value may have more potential to distract search engine learning than proxy topics with lower commercial value.
As discussed in Section II, user click patterns may be used by recommender systems to rank page content, placing content likely to attract user clicks in more prominent positions on pages. In our experiments, we observed changes in volume of advert content on samples of probe query response pages. There are several plausible avenues of investigation that may help explain the mechanism behind this, such as user click patterns and the semantics of the true and noise queries chosen. The approach taken in this paper does not distinguish between items based on rank or order on the page. How the semantics of queries, the interaction between user click-models and the effect of content ranking may impact user privacy is beyond the scope of this current paper and an avenue for future research.
Overall our results point towards an arms race, where search engine capability is continuously evolving. In this setting, even if injection of proxy topic sessions were to become widely deployed then we can reasonably expect search engines to respond with more sophisticated learning strategies. Our results also point towards the fact that the text in search queries plays a key role in search engine learning. While perhaps obvious, this observation reinforces the user’s need to be circumspect about the queries that they ask if they want to avoid search engine learning of their interests.
Appendix A Additional Results
Lemma 1.
For with
[TABLE]
Proof.
Assuming the left hand side of (39) holds
[TABLE]
∎
