Plausible Deniability in Web Search -- From Detection to Assessment

Pol Mac Aonghusa; Douglas J. Leith

arXiv:1703.03471·cs.CR·June 27, 2017


TL;DR

This paper introduces PDE, a scalable tool to detect and assess threats to users' plausible deniability in web search. It reveals vulnerabilities in sensitive topics, particularly health and sexual preference, and proposes a defense based on proxy topics.

Contribution

The paper presents a practical tool for detecting threats to plausible deniability in web search and evaluates defense strategies against search engine learning attacks.

Findings

01

Threats to deniability are easily detectable across tested topics.

02

Sensitive topics like health and sexual preferences are particularly vulnerable.

03

Proxy topics can effectively defend plausible deniability.

Abstract

We ask how to defend users' ability to plausibly deny their interest in topics deemed sensitive in the face of search engine learning. We develop a practical and scalable tool called PDE allowing a user to detect and assess threats to plausible deniability. We show that threats to plausible deniability of interest are readily detectable for all topics tested in an extensive testing program. Of particular concern is observation of threats to deniability of interest in topics related to health and sexual preferences. We show this remains the case when attempting to disrupt search engine learning through noise query injection and click obfuscation. We design a defence technique exploiting uninteresting, proxy topics and show that it provides a more effective defence of plausible deniability in our experiments.

Tables

Table 1. TABLE I: Measured $\widehat{\epsilon}_{*,k}$ for Reference Topic versus Any Other Topic, reported as “max (median)”, by Probe Query Sequence

Reference Topic	Probe 1	Probe 2	Probe 3	Probe 4	Probe 5
gay	64 (33)	47 ( 5)	72 (25)	48 (25)	48 (19)

Table 2. TABLE II: Measured $\widehat{\epsilon}_{*,k}$ for Reference Topic versus Any Other Topic, reported as “max (median)”, by Probe Query Sequence


Table 3. (a) No Click, No Noise

Reference Topic	Probe 1	Probe 2	Probe 3	Probe 4	Probe 5
anorexia	56 (52)	56 (52)	56 (52)	56 (52)	56 (52)
bankrupt	1 ( 1)	55 (43)	55 (39)	58 (48)	56 (48)
diabetes	40 (38)	40 (38)	40 (38)	40 (38)	40 (38)
disabled	9 ( 9)	9 ( 9)	9 ( 9)	40 (40)	40 (33)
divorce	41 (31)	75 (65)	56 (46)	79 (68)	79 (68)
gambling	16 (12)	18 (16)	66 ( 4)	57 (17)	18 ( 3)
gay	64 (33)	47 ( 5)	72 (25)	48 (25)	48 (19)
location	10 ( 2)	11 ( 3)	11 (10)	18 ( 7)	18 ( 9)
payday	2 ( 2)	2 ( 2)	21 ( 2)	2 ( 2)	2 ( 2)
prostate	52 (17)	52 (17)	52 (17)	52 (17)	52 (17)
unemployed	7 ( 5)	7 ( 6)	7 ( 6)	13 ( 7)	7 ( 7)

Table 4. TABLE III: Measured $\widehat{\epsilon}_{*,k}$ for Reference Topic versus Any Other Topic, reported as “max (median)”, by Probe Query Sequence


Table 5. (a) No Click, Low Noise

Reference Topic	Probe 1	Probe 2	Probe 3	Probe 4	Probe 5
anorexia	54 (45)	54 (45)	54 (45)	54 (45)	54 (45)
bankrupt	16 ( 9)	56 (50)	52 (39)	54 (45)	56 (45)
diabetes	46 (35)	46 (35)	46 (35)	46 (35)	46 (35)
disabled	9 ( 3)	9 ( 8)	9 ( 7)	33 ( 7)	40 (32)
divorce	13 ( 7)	123 ( 8)	54 ( 8)	85 ( 6)	85 ( 6)
gambling	18 (16)	18 (16)	52 (18)	18 (10)	18 (18)
gay	73 (61)	73 (70)	76 (46)	79 (74)	79 (70)
location	18 (16)	18 (10)	18 (10)	18 (10)	18 (10)
payday	3 ( 2)	3 ( 2)	4 ( 3)	4 ( 3)	4 ( 3)
prostate	21 (16)	21 (16)	21 (16)	21 (16)	21 (16)
unemployed	7 ( 3)	7 ( 3)	13 ( 9)	13 ( 9)	13 ( 9)

Table 6. (b) No Click, Med Noise

Reference Topic	Probe 1	Probe 2	Probe 3	Probe 4	Probe 5
anorexia	55 (53)	53 (53)	53 (53)	53 (53)	53 (53)
bankrupt	11 ( 8)	48 (33)	51 (43)	52 (38)	52 (38)
diabetes	38 (38)	38 (38)	38 (38)	38 (38)	38 (38)
disabled	4 ( 4)	8 ( 7)	1 ( 1)	40 (36)	40 (36)
divorce	19 ( 9)	65 (31)	44 (31)	72 (50)	72 (50)
gambling	18 (16)	18 (17)	18 (18)	31 ( 3)	18 (10)
gay	89 (68)	89 (69)	88 (64)	93 (73)	93 (64)
location	18 (10)	18 (10)	18 ( 7)	18 ( 7)	10 ( 7)
payday	6 ( 3)	6 ( 3)	6 ( 3)	6 ( 2)	6 ( 1)
prostate	32 (14)	32 (14)	18 (13)	18 (13)	18 (13)
unemployed	13 ( 5)	13 (10)	13 ( 7)	13 ( 9)	7 ( 4)

Table 7. (c) No Click, High Noise

Reference Topic	Probe 1	Probe 2	Probe 3	Probe 4	Probe 5
anorexia	48 (48)	48 (48)	48 (48)	48 (48)	48 (48)
bankrupt	16 (10)	65 (51)	65 (48)	65 (49)	65 (49)
diabetes	41 (38)	41 (38)	41 (38)	41 (38)	41 (38)
disabled	9 ( 9)	9 ( 9)	9 ( 5)	9 ( 7)	9 ( 8)
divorce	41 (27)	75 (38)	56 (22)	75 (29)	75 (29)
gambling	21 (16)	21 ( 3)	21 ( 4)	29 (16)	18 ( 4)
gay	86 (64)	86 (64)	80 (43)	94 (59)	94 (59)
location	10 (10)	8 ( 8)	8 ( 8)	18 (13)	18 (13)
payday	3 ( 2)	4 ( 2)	4 ( 2)	4 ( 2)	3 ( 1)
prostate	17 (15)	17 (15)	17 (15)	17 (15)	17 (15)
unemployed	10 ( 7)	13 ( 7)	13 ( 7)	13 ( 7)	13 ( 7)

Table 8. TABLE IV: Measured Plausible Deniability versus any other tested topics as probability of interest, by Probe Query Sequence when the true topic of interest is “Other” with range $(\mu\pm 3\sigma)$


Table 9. (a) Click Relevant, No Noise

Reference Topic	Probe 1	Probe 2	Probe 3	Probe 4	Probe 5
anorexia	59 (50)	59 (50)	59 (50)	59 (50)	59 (50)
bankrupt	16 ( 8)	65 (42)	65 (36)	59 (40)	54 (38)
diabetes	36 (36)	36 (36)	36 (36)	36 (36)	36 (36)
disabled	7 ( 4)	7 ( 4)	9 ( 9)	40 ( 4)	40 ( 7)
divorce	30 (24)	30 ( 9)	30 ( 9)	30 ( 8)	30 ( 8)
gambling	6 ( 0)	18 (16)	32 (16)	18 (16)	18 ( 5)
gay	92 (51)	92 (77)	78 (51)	94 (72)	94 (80)
location	18 (18)	10 (10)	10 (10)	18 (10)	18 (10)
payday	2 ( 2)	2 ( 2)	3 ( 2)	3 ( 2)	2 ( 2)
prostate	17 (17)	17 (17)	17 (17)	17 (17)	17 (17)
unemployed	13 ( 2)	13 ( 4)	13 ( 7)	13 ( 7)	7 ( 6)

Table 10. (b) Click Non-relevant, No Noise

Reference Topic	Probe 1	Probe 2	Probe 3	Probe 4	Probe 5
anorexia	18 ( 5)	22 (12)	26 ( 5)	31 (13)	32 ( 6)
bankrupt	57 ( 3)	53 (36)	50 (34)	43 (33)	48 (36)
diabetes	4 ( 2)	13 ( 8)	11 ( 8)	5 ( 3)	11 ( 2)
disabled	5 ( 2)	6 ( 2)	9 ( 3)	29 (10)	26 ( 8)
divorce	49 (25)	51 (33)	49 (30)	43 (29)	43 (29)
gambling	6 ( 2)	18 ( 4)	36 (24)	35 (13)	31 (13)
gay	36 (33)	75 (33)	51 (32)	39 (20)	31 (27)
location	9 ( 2)	11 ( 1)	7 ( 2)	6 ( 2)	9 ( 1)
payday	3 ( 3)	3 ( 1)	4 ( 2)	3 ( 2)	4 ( 3)
prostate	55 (38)	68 (36)	65 (48)	61 (48)	64 (42)
unemployed	9 ( 1)	6 ( 6)	7 ( 1)	9 ( 4)	5 ( 2)

Table 11. (c) Click All, No Noise

Reference Topic	Probe 1	Probe 2	Probe 3	Probe 4	Probe 5
anorexia	66 (57)	66 (57)	66 (57)	66 (57)	66 (57)
bankrupt	51 (42)	51 (42)	51 (42)	55 (46)	56 (46)
diabetes	35 (35)	35 (35)	35 (35)	35 (35)	35 (35)
disabled	9 ( 9)	9 ( 9)	9 ( 9)	31 (31)	31 (31)
divorce	30 ( 8)	73 (54)	54 (34)	100 (49)	100 (49)
gambling	3 ( 1)	16 (16)	53 (11)	16 ( 6)	6 ( 2)
gay	69 (65)	77 (73)	70 (60)	82 (75)	81 (71)
location	18 (10)	10 ( 6)	10 ( 6)	14 (10)	18 ( 7)
payday	2 ( 2)	2 ( 2)	2 ( 2)	2 ( 2)	2 ( 2)
prostate	17 (17)	17 (17)	17 (17)	17 (17)	17 (17)
unemployed	4 ( 4)	7 ( 7)	7 ( 7)	7 ( 7)	7 ( 6)

Table 12. (d) Click 2 Random Items, No Noise

Reference Topic	Probe 1	Probe 2	Probe 3	Probe 4	Probe 5
anorexia	50 (12)	27 ( 9)	26 ( 9)	36 (10)	33 (11)
bankrupt	5 ( 3)	43 (33)	39 (37)	36 (35)	38 (35)
diabetes	38 ( 6)	18 ( 7)	17 ( 5)	17 ( 7)	11 ( 5)
disabled	2 ( 1)	4 ( 1)	5 ( 3)	39 (25)	40 (25)
divorce	24 (17)	37 (31)	37 (31)	35 (25)	35 (25)
gambling	24 ( 0)	7 ( 4)	54 (23)	33 (23)	68 (20)
gay	68 (68)	68 (65)	54 (52)	46 (36)	47 (42)
location	8 ( 8)	8 ( 8)	8 ( 8)	8 ( 8)	8 ( 8)
payday	4 ( 1)	2 ( 2)	4 ( 2)	4 ( 3)	4 ( 4)
prostate	59 (57)	67 (62)	58 (56)	60 (54)	51 (44)
unemployed	4 ( 3)	8 ( 3)	10 ( 4)	3 ( 2)	10 ( 1)

Table 13. TABLE V: Measured Plausible Deniability versus any other tested topics as probability of interest, by Probe Query Sequence when the true topic of interest is “Other” with range $(\mu\pm 3\sigma)$


Table 14. (a) All Click and Noise Models

Reference Topic	Probe 1	Probe 2	Probe 3	Probe 4	Probe 5
all topics	0 ( 0)	0 ( 0)	0 ( 0)	0 ( 0)	0 ( 0)

Table 15. TABLE VI: Comparison of measured detection rate of at least one individual “Sensitive” topic in a session of 5 probes.


Table 16. (a) PRI+

Reference Topic	All Topics
True Detect	100.0%
False Detect	0.0%

Table 17. (b) PRI

Reference Topic	All Topics
True Detect	97-100.0%
False Detect	4-8%

Equations

$$\{\omega_k, E_{k-1}\}$$

$$P(\bar{X}=\bar{x} \mid \Omega_k=\omega_k, E_k) = P(\bar{X}=\bar{x} \mid E_{k+1})$$

$$P(\bar{X}=\bar{c}_i \mid E_k)$$

$$\sum_{i=0}^{N} P(\bar{X}=\bar{c}_i \mid E_k) = 1, \quad k=1,2,\dots$$

$$P(\bar{X}=c^1 \mid E_0) := p_0^s, \quad P(\bar{X}=c^o \mid E_0) := p_0^n$$

$$P(\Omega_k=\omega_k^s \mid \bar{X}=c^1, E_k) = P(\Omega_k=\omega_k^n \mid \bar{X}=c^o, E_k)$$

$$P(\Omega_k=\omega_k^n \mid \bar{X}=c^1, E_k) = P(\Omega_k=\omega_k^s \mid \bar{X}=c^o, E_k)$$
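The update $P(\bar{X}=\bar{x}\mid\Omega_k=\omega_k,E_k)=P(\bar{X}=\bar{x}\mid E_{k+1})$, combined with the symmetric binary observation model above, can be illustrated with a minimal Python sketch. It assumes two hypotheses (sensitive $c^1$, non-sensitive $c^o$), a per-step value `pi_k` standing for $P(\Omega_k=\omega_k^s\mid\bar{X}=c^1,E_k)$ (so, by the symmetry equations and assuming only two outcomes, the sensitive outcome has probability $1-\pi_k$ under $c^o$), and hypothetical numbers; the function names are illustrative and not part of PDE.

```python
def bayes_update(p_s, pi_k, observed_sensitive):
    """Return P(X = c^1 | E_{k+1}) given P(X = c^1 | E_k) = p_s.

    pi_k: assumed P(Omega_k = omega_k^s | X = c^1, E_k); by symmetry the
          sensitive outcome has probability 1 - pi_k under c^o.
    observed_sensitive: True if omega_k^s was observed, else omega_k^n.
    """
    like_s = pi_k if observed_sensitive else 1.0 - pi_k   # likelihood under c^1
    like_n = 1.0 - pi_k if observed_sensitive else pi_k   # likelihood under c^o
    num = like_s * p_s
    # Normalise so the two posteriors sum to 1.
    return num / (num + like_n * (1.0 - p_s))


def run_session(p0_s, pis, outcomes):
    """Posterior of c^1 after each observation, starting from the prior p0_s."""
    history = [p0_s]
    for pi_k, obs in zip(pis, outcomes):
        history.append(bayes_update(history[-1], pi_k, obs))
    return history
```

For example, `run_session(0.5, [0.6] * 3, [True] * 3)` multiplies the prior odds by $0.6/0.4$ three times, ending at odds $3.375$, i.e. a posterior of about $0.771$; a sensitive outcome followed by a non-sensitive one at the same $\pi$ returns the posterior to $0.5$.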

$$e^{-\epsilon} < D_k(\bar{x}, \bar{x}_i, \{\omega_j\}_{j=1}^{k}) < e^{\epsilon}$$

$$D_k(\bar{x}, \bar{x}_i, \{\omega_j\}_{j=1}^{k}) = \frac{P(\Omega_k=\omega_k, \dots, \Omega_1=\omega_1 \mid \bar{X}=\bar{x}, E_0)}{P(\Omega_k=\omega_k, \dots, \Omega_1=\omega_1 \mid \bar{X}=\bar{x}_i, E_0)}$$

$$= \prod_{j=0}^{k-1} \frac{P(\Omega_{k-j}=\omega_{k-j} \mid \bar{X}=\bar{x}, \Omega_{k-j-1}=\omega_{k-j-1}, \dots, \Omega_1=\omega_1, E_0)}{P(\Omega_{k-j}=\omega_{k-j} \mid \bar{X}=\bar{x}_i, \Omega_{k-j-1}=\omega_{k-j-1}, \dots, \Omega_1=\omega_1, E_0)}$$

$$= \prod_{j=0}^{k-1} \frac{P(\Omega_{k-j}=\omega_{k-j} \mid \bar{X}=\bar{x}, E_{k-j})}{P(\Omega_{k-j}=\omega_{k-j} \mid \bar{X}=\bar{x}_i, E_{k-j})} = \prod_{j=0}^{k-1} d_{k-j}(\bar{x}, \bar{x}_i, \omega_{k-j})$$

$$d_j(\bar{x}, \bar{x}_i, \omega_j) := \frac{P(\Omega_j=\omega_j \mid \bar{X}=\bar{x}, E_j)}{P(\Omega_j=\omega_j \mid \bar{X}=\bar{x}_i, E_j)}$$
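The chain-rule factorisation above means $D_k$ can be accumulated one probe at a time as a product of per-step ratios $d_j$. The sketch below assumes the per-step likelihoods $P(\Omega_j=\omega_j\mid\bar{X}=\cdot,E_j)$ are already available as numbers (how they would be estimated is not shown), sums the log ratios $\log d_j$, and tests the condition $e^{-\epsilon}<D_k<e^{\epsilon}$, which is equivalent to $|\log D_k|<\epsilon$. Working in log space avoids underflow over long sessions. Function names are hypothetical.

```python
import math


def log_D(lik_x, lik_xi):
    """log D_k as the sum over steps of log d_j = log(lik_x[j] / lik_xi[j]).

    lik_x[j]:  assumed P(Omega_j = omega_j | X = x,   E_j) at step j
    lik_xi[j]: assumed P(Omega_j = omega_j | X = x_i, E_j) at step j
    """
    if len(lik_x) != len(lik_xi):
        raise ValueError("need one likelihood pair per observation")
    return sum(math.log(a) - math.log(b) for a, b in zip(lik_x, lik_xi))


def is_deniable(lik_x, lik_xi, eps):
    """True when e^{-eps} < D_k < e^{eps}, i.e. |log D_k| < eps."""
    return abs(log_D(lik_x, lik_xi)) < eps
```

With two steps whose likelihoods are $0.6$ versus $0.4$, $\log D_2 = 2\log 1.5 \approx 0.81$, so `is_deniable([0.6, 0.6], [0.4, 0.4], 1.0)` is true while the same call with `eps=0.5` is false.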

$$d_j(c^1, c^o, \omega_j^s)$$

$$D_k(c^1, c^o, \{\omega_j\}_{j=1}^{k}) = \prod_{i \in k_s} \frac{\pi^i}{1-\pi^i} \prod_{j \in k_n} \frac{1-\pi^j}{\pi^j}$$
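In the binary case the product collapses to the closed form above. The sketch assumes $\pi^j$ is the probability of the sensitive outcome at step $j$ under $c^1$, and reads $k_s$ and $k_n$ as the sets of steps at which the sensitive and non-sensitive outcomes were observed. That reading of the index sets is an assumption consistent with the symmetry equations, not a quotation from the paper.

```python
import math


def log_D_binary(pis, outcomes):
    """log D_k(c^1, c^o, ...) in the binary symmetric model.

    Adds log(pi / (1 - pi)) for steps in k_s and log((1 - pi) / pi)
    for steps in k_n.
    pis[j]:      assumed P(Omega_j = omega_j^s | X = c^1, E_j), 0 < pi < 1
    outcomes[j]: True if step j is in k_s (sensitive outcome observed)
    """
    total = 0.0
    for pi, sensitive in zip(pis, outcomes):
        log_odds = math.log(pi) - math.log(1.0 - pi)
        total += log_odds if sensitive else -log_odds
    return total
```

Three sensitive observations at $\pi=0.6$ give $\log D_3 = 3\log 1.5 \approx 1.22$, so deniability in the $e^{\pm\epsilon}$ sense fails for any $\epsilon$ below that value; at $\pi=0.5$ every step is uninformative and $\log D_k = 0$.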

$$\frac{P(\Omega_j=\omega_j \mid \bar{X}=\bar{x}, E_j)}{P(\Omega_j=\omega_j \mid \bar{X}=\bar{x}_i, E_j)} = \frac{P(\bar{X}=\bar{x} \mid \Omega_j=\omega_j, E_j)}{P(\bar{X}=\bar{x}_i \mid \Omega_j=\omega_j, E_j)}$$

$$e^{-\epsilon} < \frac{P(\bar{X}=\bar{x} \mid \Omega_j=\omega_j, E_j)}{P(\bar{X}=\bar{x}_i \mid \Omega_j=\omega_j, E_j)} < e^{\epsilon}$$

$$e^{-\epsilon}$$

$$M_k(\bar{x}, \omega_k) = \frac{P(\bar{X}=\bar{x} \mid E_{k+1})}{P(\bar{X}=\bar{x} \mid E_0)} \quad \text{(applying } P(\bar{X}=\bar{x} \mid \Omega_k=\omega_k, E_k) = P(\bar{X}=\bar{x} \mid E_{k+1})\text{)}$$

$$D_k(\bar{x}_1, \bar{x}_2, \{\omega_j\}_{j=1}^{k}) = \frac{M_k(\bar{x}_1, \omega_k)}{M_k(\bar{x}_2, \omega_k)} \cdot \frac{M_1(\bar{x}_2, \omega_1)}{M_1(\bar{x}_1, \omega_1)}$$

$$d_k(\bar{x}_1, \bar{x}_2, \omega_k) = \frac{P(\Omega_k=\omega_k \mid \bar{X}=\bar{x}_1, E_k)}{P(\Omega_k=\omega_k \mid \bar{X}=\bar{x}_2, E_k)}$$

$$= \frac{P(\bar{X}=\bar{x}_1 \mid \Omega_k=\omega_k, E_k)}{P(\bar{X}=\bar{x}_1 \mid E_k)} \cdot \frac{P(\bar{X}=\bar{x}_2 \mid E_k)}{P(\bar{X}=\bar{x}_2 \mid \Omega_k=\omega_k, E_k)}$$

$$= \underbrace{\frac{P(\bar{X}=\bar{x}_1 \mid \Omega_k=\omega_k, E_k)}{P(\bar{X}=\bar{x}_1 \mid E_0)}}_{(a)} \cdot \underbrace{\frac{P(\bar{X}=\bar{x}_1 \mid E_0)}{P(\bar{X}=\bar{x}_1 \mid E_k)}}_{(b)} \times \dots$$

$$\dots \times \underbrace{\frac{P(\bar{X}=\bar{x}_2 \mid E_k)}{P(\bar{X}=\bar{x}_2 \mid E_0)}}_{(c)} \cdot \underbrace{\frac{P(\bar{X}=\bar{x}_2 \mid E_0)}{P(\bar{X}=\bar{x}_2 \mid \Omega_k=\omega_k, E_k)}}_{(d)}$$

$$= \frac{M_k(\bar{x}_1, \omega_k)\, M_{k-1}(\bar{x}_2, \omega_{k-1})}{M_{k-1}(\bar{x}_1, \omega_{k-1})\, M_k(\bar{x}_2, \omega_k)}$$

$$D_k(\bar{c}_i, \bar{c}_{-i}, \{\omega_j\}_{j=1}^{k}) := \frac{P(\Omega_k=\omega_k, \dots, \Omega_1=\omega_1 \mid \bar{X}=\bar{c}_i, E_0)}{P(\Omega_k=\omega_k, \dots, \Omega_1=\omega_1 \mid \bar{X}=\bar{c}_{-i}, E_0)}$$

$$\left| \;\dots\; - P(\Omega_k=\omega_k, \dots, \Omega_1=\omega_1 \mid \bar{X}=\bar{c}_{-i}, E_0) \right|$$

$$\geq \log\left(\frac{M_k(\bar{x}_1, \omega_k)}{M_k(\bar{x}_2, \omega_k)} \cdot \frac{M_1(\bar{x}_2, \omega_1)}{M_1(\bar{x}_1, \omega_1)}\right)$$

$$\epsilon_* := \log\left(\frac{M_k(\bar{x}_1, \omega_k)}{M_k(\bar{x}_2, \omega_k)} \cdot \frac{M_1(\bar{x}_2, \omega_1)}{M_1(\bar{x}_1, \omega_1)}\right)$$

$$\log\left(D_k(\bar{c}_i, \bar{c}_{-i}, \{\omega_j\}_{j=1}^{k})\right) < \epsilon$$
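Because each $M$ term divides by the prior $P(\bar{X}=\bar{x}\mid E_0)$, the priors cancel in $\epsilon_*$: with $M_k(\bar{x},\omega_k)=P(\bar{X}=\bar{x}\mid E_{k+1})/P(\bar{X}=\bar{x}\mid E_0)$, the expression reduces to the log-odds of $\bar{x}_1$ versus $\bar{x}_2$ under $P(\bar{X}\mid E_{k+1})$ minus the same log-odds under $P(\bar{X}\mid E_2)$, the distribution after $\omega_1$. The sketch below assumes posterior distributions over topics are already available, represented as hypothetical dictionaries keyed by topic name; it is an illustration of the formula, not the PDE implementation.

```python
import math


def epsilon_star(posteriors, x1, x2):
    """epsilon_* = log(M_k(x1)/M_k(x2) * M_1(x2)/M_1(x1)).

    posteriors: list of dicts mapping topic -> probability, where
        posteriors[0]  is assumed to be P(X | E_2)     (after omega_1) and
        posteriors[-1] is assumed to be P(X | E_{k+1}) (after omega_k).
    The prior P(X | E_0) cancels, so it is not required.
    """
    first, last = posteriors[0], posteriors[-1]
    log_odds_last = math.log(last[x1]) - math.log(last[x2])
    log_odds_first = math.log(first[x1]) - math.log(first[x2])
    return log_odds_last - log_odds_first
```

For instance, if a hypothetical topic "gay" moves from $0.3$ against "other" at $0.7$ to $0.6$ against $0.4$, the odds change by a factor of $1.5/(3/7)=3.5$ and $\epsilon_*=\log 3.5\approx 1.25$; a single posterior gives $\epsilon_*=0$.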


Full text

Plausible Deniability in Web Search – From Detection to Assessment

Pól Mac Aonghusa and Douglas J. Leith

P. Mac Aonghusa is with IBM Research and Trinity College Dublin. D.J. Leith is with Trinity College Dublin.

Abstract

Web personalisation uses what systems know about us to create content targeted at our interests. When unwanted personalisation suggests we are interested in sensitive or embarrassing topics, a natural reaction is to deny interest. This is a practical response only if denial of our interest is credible or plausible. Adopting a definition of plausible deniability in the usual sense of “on the balance of probabilities”, we develop a practical and scalable tool called PDE allowing a user to decide when their ability to plausibly deny interest in sensitive topics is compromised. We show that threats to plausible deniability are readily detectable for all topics tested in an extensive testing program. Of particular concern is observation of threats to deniability of interest in topics related to health and sexual preferences. We show this remains the case when attempting to disrupt search engine learning through noise query injection and click obfuscation. We design a defence technique exploiting uninteresting, proxy topics and show that it provides a more effective defence of plausible deniability in our experiments.

Index Terms:

Privacy, Indistinguishability, Plausible Deniability, Recommender Systems, Web Search.

I Introduction

Encountering inappropriate or unwanted personalised online content can be awkward, depending on social context. What may appear humorous in one situation may be embarrassing, or worse, in another context. When presented with content regarded as inappropriate or discreditable, a user may wish to deny interest in the content.

The Oxford English Dictionary defines plausible deniability in terms of reasonable doubt as “the possibility of denying a fact (especially a discreditable action) without arousing suspicion”, [1]. Informally, user activity observed by the search engine exhibits plausible deniability when, with sufficiently high probability, that activity is consistent with user interest in any one of several topics, at least one of which is not sensitive for the user.

Accordingly, we assess threats to plausible deniability during web search by testing whether content appearing on search result pages can be attributed, on the balance of probabilities, to user interest in a specific sensitive topic rather than to user interest in any other topic. We ask: when can a user plausibly deny interest in a range of sensitive topics during online web-search sessions?

We provide guarantees on the best-possible level of plausible deniability a user can expect during web search in our model. We also introduce a new Plausible Deniability Estimator, called PDE, that can be used to assess privacy threats. Outputs from PDE can be represented in terms of readily interpretable probabilities thereby providing an informative indication of risk to the user.

Our methods are chosen to be straightforward to implement using openly available technologies. Using Google Search as a source of data, we use our results to design and assess counter-measures against threats to plausible deniability during online web-search sessions. We are able to assess threats to plausible deniability from sensitive topic learning across a range of potentially sensitive topics, such as health, finance and sexual orientation.

Our experimental measurements indicate that, by observing as few as 3-5 revealing queries, a search engine can infer that a user is interested in a sensitive topic on the balance of probabilities in $100\%$ of topics tested when no effective defence is provided. For topics related to health and sexual preferences, measurements from PDE suggest that the probability a user is interested in sensitive topics related to sexual preference is as much as $90\%$ greater than their probability of interest in any other topic.

We show that defence strategies based on injecting random noise queries and on misleading click patterns may provide some protection for individual, isolated queries, but that search engines are able to learn quickly. Significant levels of threat to plausible deniability are detected even when very high levels of random noise are included in the query session or when misleading click patterns are used. These approaches seem to offer little or no improvement to user privacy when plausible deniability is considered over the longer term.

In contrast, we find that a defence employing topics that are commercially relevant but uninteresting to the user as proxy topics is effective in protecting plausible deniability in the case of $100\%$ of sensitive topics tested. The proxy topic defence differs from traditional obfuscation approaches in actively exploiting the observed ability of the search engine to learn topics quickly, deflecting the focus of interest toward the proxy topic and away from the true topic of interest to the user.

The proxy topic defence works in our experiments and is simple to apply. However, it is important to recognise that we are faced with commercially motivated and increasingly powerful systems with a history of adapting quickly. Our results suggest that search engine capability is continuously evolving, so we can reasonably expect search engines to respond to privacy defences with more sophisticated learning strategies. Our results also indicate that the text in search queries plays a key role in search engine learning. While perhaps obvious, this observation reinforces the user’s need to be circumspect about the queries they ask if they want to avoid search engine learning of their interests. Equally, our results suggest that simple countermeasures, such as proxy topics, that make accurate personalisation more expensive for the online system are a promising approach to developing new techniques for practical user privacy.

II Related Work

We model a search engine as a black box, making minimal assumptions about its internal workings. The technique of using predefined probe queries, injected at intervals into a stream of true user queries as fixed sampling points, was used in [2], where the focus was detection of possible privacy threats. The present paper extends the probe-query idea of [2] with several new applications: the model of plausible deniability and the associated PDE estimator, the proxy topic defence model, and the evaluation of multiple noise and click models for each of these. The technique of using predefined probe queries is borrowed from black-box testing. Modelling an adversary as a black box, where the internal details of recommender system algorithms and settings are unknown to users, is mentioned in [3] and [4].
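The probe-query technique, in which predefined probe queries are injected at intervals into a stream of true user queries as fixed sampling points, might be sketched as follows. The interval, the probe strings and the function name are hypothetical choices for illustration; the sketch only builds the interleaved schedule and does not submit anything to a search engine.

```python
from typing import List, Tuple


def interleave_probes(user_queries: List[str], probes: List[str],
                      interval: int) -> List[Tuple[str, str]]:
    """Insert the next probe after every `interval` user queries.

    Returns (kind, query) pairs with kind "user" or "probe", so that the
    result pages returned for probes can later be compared across a session.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    schedule: List[Tuple[str, str]] = []
    probe_iter = iter(probes)
    for n, q in enumerate(user_queries, start=1):
        schedule.append(("user", q))
        if n % interval == 0:
            probe = next(probe_iter, None)  # stop adding once probes run out
            if probe is not None:
                schedule.append(("probe", probe))
    return schedule
```

With four user queries, two probes and `interval=2`, the schedule is user, user, probe, user, user, probe. Keeping the probe text fixed is what makes the probes usable as sampling points: any change in their result pages can be attributed to what the engine has learned from the surrounding queries.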

The importance of control over appropriate flow of information is discussed extensively in legal and social science fields. Individual control over personal information flow is discussed in a critique of the “nothing to hide” defence of widespread surveillance in [5]. Individual privacy and its social consequences are discussed in [6, 7], where agency or control over appropriate disclosure is identified as a key concern.

Plausible deniability as a privacy defence for web search is addressed in the literature. In [8] alternative, less revealing queries are mixed with sensitive topic queries to obfuscate true user interest. In [9] queries with generalised terms are used to approximate the search results of a true query, which is never revealed. Plausible deniability for database release has been studied in the context of user data anonymization. For example, in [10] a definition of plausible deniability is applied to examine mechanisms for differentially private data set release. More generally, plausible deniability to counteract the impact of personalisation is examined in [11] for the case of a privacy aware user who knows they are being observed. The authors show that no matter what the behaviour of the user is, it is always compatible with some concern over privacy. In this way the user can offer their awareness of privacy concerns as a general alibi to justify any range of preferences. Plausible deniability for providers of online services is also discussed in the literature. For example, in [12] a distributed virtual machine infrastructure is used to provide deniability to online data providers by obfuscating the origin of index data used in recommendations.

The potential for online profiling and personalisation to result in censorship and discrimination has received growing attention in the research literature. Personalisation as a form of censorship – termed a filter bubble in [13] – is explored in [4]. In a filter bubble, a user cannot access subsets of information because the recommender system algorithm has decided they are irrelevant for that user. In [4] a filter bubble effect was detected in the case of Google Web Search in a test with 200 users. Discrimination associated with personalisation has been shown for topics generally regarded as sensitive. In [14] an extensive review of adverts from Google and Reuters.com showed a strong correlation between adverts suggestive of an arrest record and an individual’s ethnicity. In [15], the authors used online advertising targeted exclusively to gay men to demonstrate strong profiling in the case of sexual preference.

Several approaches exist for obfuscating user interactions with search engines with the aim of disrupting online profiling and personalisation. GooPIR, [16, 17], attempts to disguise a user’s “true” queries by adding masking keywords directly into a true query before submitting it to a recommender system. Results are then filtered to extract items that are relevant to the user’s original true query. PWS, [18], and TrackMeNot, [19, 20], inject distinct noise queries into the stream of true user queries during a user query session, seeking to achieve an acceptable level of anonymity without overly degrading overall utility. Search engine algorithm evolution can be regarded as a continuous “arms race”. In the case of Google, for example, major algorithm changes such as Caffeine and Search+ Your World have included additional sources of background knowledge from social media, Panda has improved filtering of content to counter spam and content manipulation, and, most recently, semantic search capability has been added through Knowledge Graph and Hummingbird, [21], [22], [23].
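Noise-injection tools such as PWS and TrackMeNot insert noise queries into the stream of true queries. The sketch below illustrates only the general idea under a simple assumption: after each true query, one query drawn from a hypothetical noise pool is inserted independently with probability `rate`. It is not the injection policy of either tool. A seeded `random.Random` instance makes the schedule reproducible.

```python
import random
from typing import List, Optional


def inject_noise(true_queries: List[str], noise_pool: List[str],
                 rate: float, seed: Optional[int] = None) -> List[str]:
    """Return the true queries, in order, with random noise queries interleaved."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    rng = random.Random(seed)
    out: List[str] = []
    for q in true_queries:
        out.append(q)
        # random() lies in [0, 1), so rate=1.0 always injects and rate=0.0 never does.
        if noise_pool and rng.random() < rate:
            out.append(rng.choice(noise_pool))
    return out
```

Note that the true queries are still submitted unchanged and in order; the noise only dilutes them. This is consistent with the observation above that an engine able to learn quickly from the true queries can still pick out the sensitive topic.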

Consent to share data for agreed purposes is critical to user trust in service providers and is a key feature of the EU General Data Protection Regulation (GDPR), [24]. Several notable browser add-ons, such as Mozilla Lightbeam, [25], and PrivacyBadger, [26], facilitate more active user awareness of possible consent issues by helping a user understand where their data is shared with third parties through the sites they visit. XRay, [27], reports high accuracy in identifying which sources of user data, such as email or web search history, might have triggered particular results from online services such as adverts. Active consensual sharing of personal data is investigated in [28] through an in-browser capability, called RePriv, allowing a user to select which portions of their personal data they wish to share with requesters. Both PrivAd, [29], and Adnostic, [30], investigate safe profiling through generalisation of user interests in the browser. Both Adnostic and PrivAd seek to protect the true interests of the user by obfuscating and filtering personalised content through a published interface.

Evaluation of the effectiveness of privacy defences in the wild was performed in [31] for TrackMeNot, where the authors demonstrate that, using only a short-term history of search queries, it is possible to break the privacy guarantees of TrackMeNot with readily available machine-learning classifiers. The importance of background information in user profiling is explored in [32]. Here a similarity metric measuring the distance between known background information about a user, given by query history, and subsequent queries is shown to identify 45.3% of TrackMeNot and 51.6% of GooPIR queries. Anti-tracking is an ongoing area of research, and recently in [33] an anti-tracking browser called TrackingFree was reported to be effective at disrupting all of the trackers in the Alexa top-500 list. Self-regulation has also proven problematic: in [34], six different privacy tools intended to limit advertising due to behavioural profiling are assessed. The tools assessed implement a variety of tactics including cookie blocking, site blacklisting and Do-Not-Track (DNT) headers. In these tests, DNT headers were found to be ineffective at protecting against adverts based on user profiling.
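The attack in [32] rests on a similarity metric between a user's background query history and subsequent queries. The exact metric is not given here, so the sketch below swaps in a plain bag-of-words cosine similarity as an assumed stand-in, and assumes that a query scoring low against the history is flagged as a likely injected noise query. The threshold value and function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def flag_unlikely_queries(history: List[str], queries: List[str],
                          threshold: float = 0.1) -> List[str]:
    """Return queries whose similarity to the pooled history is below threshold."""
    profile = Counter(w for q in history for w in q.lower().split())
    return [q for q in queries
            if cosine(profile, Counter(q.lower().split())) < threshold]
```

Given a history of "diabetes diet" and "diabetes insulin dose", the query "insulin pump" shares a term with the profile (similarity about 0.27) and is kept, while "cheap flights paris" shares none and is flagged. Noise queries drawn at random from an unrelated pool tend to fall into the second group, which is the weakness such attacks exploit.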

Examples of unsubstantiated and misleading claims by providers of technology to enhance individual privacy are common, [35, 36]. Concerns about objective evaluation of the claims by providers of such technologies have attracted the attention of Government, where the need for “Awareness and education of the users …” is identified in [37] as a key step to building trust and acceptance of privacy technologies for individuals. Accountability and enforcement of accountability for privacy policy is also attracting attention. Regulatory requirements for data handling in industries such as Healthcare (HIPAA) and Finance (GLBA) are well established. The position with respect to handling of data collected by online recommender systems is less clear. In [3], the author reviews computational approaches to specification and enforcement of privacy policies at large scale.

Our contribution in this paper is orthogonal to the contributions in the works discussed here. We address the complementary challenge of privacy monitoring by detecting possible inappropriate use of personal user data through observation of personalised outputs. In this respect our approach can be deployed in conjunction with the technologies mentioned.

III General Setup

III-A Threat Model

The setup we consider is that of a general user of a commercial, for-profit online search engine. The relationship between the user, denoted $\mathcal{U}$, and the online system, denoted $\mathcal{S}$, is based on mutual utility, where both parties obtain something useful from the interaction: $\mathcal{U}$ gets useful information and recommendations, while $\mathcal{S}$ gets an opportunity to “up-sell” to $\mathcal{U}$ through targeted content such as advertising. As a commercial business, $\mathcal{S}$ recognises that cost per user interaction and responsiveness of service are critical to competitiveness. Accordingly, content based on user profiling is intended to adapt dynamically to the changing interests of $\mathcal{U}$. $\mathcal{U}$ is generally informed regarding good personal privacy practice and is alert to unwanted or embarrassing personalisation. When $\mathcal{U}$ detects threats to her privacy, she wishes to assess her ability to plausibly deny her interest in compromising content, to avoid awkward social implications. The relationship between $\mathcal{U}$ and $\mathcal{S}$ is generally described in the literature as “honest but curious”. Accordingly, we refer to $\mathcal{S}$ as an observer rather than the more traditional adversary.

Let $\{c_{1},\ldots,c_{N}\}$ denote a set of sensitive categories of interest to $\mathcal{U}$ , e.g. bankruptcy, cancer, addiction, etc. Gather all other uninteresting categories into a catch-all category denoted $c_{0}$ . The set $\mathcal{C}=\{c_{0},c_{1},\ldots,c_{N}\}$ is complete in the sense that all user topic interests can be represented as subsets of $\mathcal{C}$ with the usual set operations. We are interested in threats from search engine learning that compromise the ability of $\mathcal{U}$ to plausibly deny their interest in sensitive topics. We will assess threats to plausible deniability by testing if content appearing on search result pages can be attributed to interest in a specific topic $c_{i}\in\mathcal{C}$ , versus interest in any other topic, on the balance of probabilities.

We treat $\mathcal{S}$ as a black-box with internal state unknown to $\mathcal{U}$ . As a starting point, our initial assumption is that $\mathcal{S}$ is motivated to use its internal state of knowledge of $\mathcal{U}$ when producing personalised outputs for $\mathcal{U}$ , thereby revealing something about its internal state.

Assumption 1 (Revealing Observations).

A search engine selects personalised page content, such as adverts, that it believes is aligned with the user's interests.

When a search engine infers that a particular advertising category is likely to be of interest to a user, and so more likely to generate click-through and sales, it is obliged to use this information when selecting which adverts to display. This suggests that, by examining the advert content recommended by the search engine, it is possible to detect evidence of sensitive topic profiling by the search engine. Assumption 1 is fundamental to the application of our approach: if $\mathcal{S}$ does not produce content that reveals evidence of learning then, since $\mathcal{S}$ is a black-box in our model, our approach has nothing to say about observer learning. In summary, we rely on the observer to show his hand through adverts; our approach can only observe what is shown. In practice this does not appear to be a significant limitation for many topics regarded as sensitive by users. In our experiments we observe an average of $2$ to $3$ adverts per probe query, with fewer than $10\%$ of probe queries resulting in no advert content. We also note that our scope is limited to examining advert content. In addition to adverts, commercial search engines typically provide other personalised content that could also be tested for evidence of learning; for example, Google provides a variety of personalised content such as “top stories” and related Tweets. We leave consideration of these as future work.

A user interacts with a search engine by issuing a query, receiving a web page in response and then clicking on one or more items in the response. In the case of web-search, a single such interaction, labeled with index $j$ , consists of a query, response page, item-click triple, denoted $\omega_{j}=\left(q_{j},p_{j},l_{j}\right)$ .

We model construction of a query $q_{j}$ as selection of words from a generally available dictionary denoted $\mathcal{D}$ . We assume that words in $\mathcal{D}$ are matched to topics in $\mathcal{C}$ . The word–topic category matching is not unique and words may be matched to multiple topic categories. A user session of length $k>0$ steps consists of a sequence of $k$ individual steps, and is denoted $\left\{\omega_{k}\right\}_{k\geq 1}$ . The sequence of interactions $\left\{\omega_{k}\right\}_{k\geq 1}$ is jointly observed by the user and the search engine – and perhaps several other third-party observers. The relationship between prior and posterior background knowledge at each step $k$ is

[TABLE]

where $\mathscr{E}_{0}$ denotes the initial background knowledge state of $\mathcal{S}$ at the beginning of a session immediately before $\omega_{1}$ is observed. The detail of $\mathscr{E}_{0}$ is unknown to $\mathcal{U}$ who treats $\mathcal{S}$ as a black-box. Figure 1 illustrates the interaction between user and search engine in our model.

Let the random variable $\mathbf{\bar{X}}$ with sample space $\{0,1\}^{N+1}$ represent user interest in the categories in $\mathcal{C}$ during a session. A value of $1$ in element $i$ of $\mathbf{\bar{X}}$ indicates that evidence of user interest in topic $c_{i}$ has been detected.

After each step $k$ of a query session, $\mathcal{S}$ can construct a posterior probability distribution for $\mathbf{\bar{X}}$, namely, for $\mathbf{\bar{x}}\in\{0,1\}^{N+1}$

[TABLE]

We use $\Omega_{k}$ to represent the observation at step $k$ , so that $\Omega_{k}=\omega_{k}$ indicates $\omega_{k}$ is observed at step $k$ .

The individual interest vector for topic $i$ is denoted $\mathbf{\bar{c}}_{i}$, a vector with a single $1$ in the $i^{th}$ position and $0$ in all other positions. The probability that a user is interested only in topic $i$ at step $k$ of a session, and the posterior probability of detecting evidence that a user is interested only in topic $i$ at query step $k$, conditioned on observing $\omega_{k}$ and background knowledge $\mathscr{E}_{k}$, are

[TABLE]

respectively. Since $\mathcal{C}$ contains all possible topics:

[TABLE]
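To make the observer's posterior over interest vectors concrete, the Python sketch below updates $P(\mathbf{\bar{X}}=\mathbf{\bar{x}}\mid\omega_{1},\ldots,\omega_{k})$ one observation at a time using Bayes' rule. It is a minimal illustration under assumptions that are not part of the model above: the likelihoods $P(\Omega_{j}=\omega\mid\mathbf{\bar{X}}=\mathbf{\bar{x}})$ come from a hypothetical table that is the same at every step, and observations are conditionally independent given $\mathbf{\bar{X}}$. The function name `posterior_update` and the output labels `"s"` and `"n"` are illustrative only.

```python
from typing import Dict, Hashable, Sequence


def posterior_update(prior: Dict[Hashable, float],
                     likelihood: Dict[Hashable, Dict[Hashable, float]],
                     observations: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Return the observer's posterior P(X = x | omega_1, ..., omega_k).

    prior[x]         -- P(X = x | E_0) for each interest vector x (a tuple of 0/1)
    likelihood[x][o] -- hypothetical P(Omega_j = o | X = x), identical at every step
    observations     -- the observed outputs omega_1, ..., omega_k
    Observations are assumed conditionally independent given X.
    """
    post = dict(prior)
    for omega in observations:
        # Bayes' rule: weight each hypothesis by how well it explains omega ...
        post = {x: p * likelihood[x][omega] for x, p in post.items()}
        # ... then renormalise so the posterior sums to one.
        total = sum(post.values())
        post = {x: p / total for x, p in post.items()}
    return post


# Two topics: (1, 0) = interest only in catch-all c_0, (0, 1) = only in sensitive c_1.
prior = {(1, 0): 0.8, (0, 1): 0.2}
likelihood = {(1, 0): {"s": 0.3, "n": 0.7}, (0, 1): {"s": 0.6, "n": 0.4}}
print(posterior_update(prior, likelihood, ["s", "s"]))
```

With these hypothetical numbers, two sensitive-looking outputs raise the posterior on $(0,1)$ from $0.2$ to about $0.5$, which is the kind of belief shift the deniability measures below are designed to bound.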

III-B Example: Single Sensitive Category

To illustrate mathematical results as we go, we use a simple ideal model, consisting of a single sensitive category, as an illustrative example. We will refer to it as the Single Sensitive Category (SSC) model. The single sensitive topic is denoted $c^{1}$ and the catch-all, non-sensitive topic representing every other topic that is not part of the sensitive topic is denoted $c^{o}$.

Suppose $\mathcal{U}$ can issue queries related to either of two topics $\{c^{1},c^{o}\}$ denoting sensitive and non-sensitive interests respectively. $\mathcal{S}$ models the process by which $\mathcal{U}$ draws queries according to an initial probability model

[TABLE]

with $p^{1}+p^{0}=1$. On observing a query from $\mathcal{U}$ at step $k$, the observer $\mathcal{S}$ outputs one of $\{\omega^{s}_{k},\omega^{n}_{k}\}$ according to the associated conditional probabilities at step $k$, given by

[TABLE]

The SSC model is deliberately simple as the intention is to illustrate mathematical concepts. The model is generally useful for exploring black-box interactions and can be readily extended to include more sophisticated scenarios such as allowing $\mathcal{U}$ to select from multiple topics, as would happen when $\mathcal{U}$ attempts to obfuscate her interests by switching topics.

III-C Plausible Deniability

Our threat assessment model is based on plausible deniability. Informally, the user activity observed by the search engine exhibits plausible deniability when, with high probability, it is consistent with the user being interested in any one of several topics, at least one of which is not sensitive for the user. That is, the user activity supports reasonable doubt about the user’s actual interest in a given sensitive topic.

In our setup, the topics are $c_{i}\in\mathcal{C}$ while the observed activity is $\{\omega_{j}\}_{j=1}^{k}$ at step $k$ in a session (i.e. the queries, search result pages and associated user clicks). We formalise plausible deniability as follows.

Definition 1 (($\epsilon,m$)–Plausible Deniability).

For privacy parameters $\epsilon>0$ and $m>0$ and a set of $N+1$ topics $\mathcal{C}=\{c_{0},\ldots,c_{N}\}$, a user with a true user interest vector $\mathbf{\bar{x}}\in\{0,1\}^{N+1}$ is said to have ($\epsilon,m$)–Plausible Deniability at step $k$ in the query session if, for observations $\{\omega_{j}\}_{j=1}^{k}$ made at each step $j=1,\ldots,k$ of a session by an observer possessing initial background knowledge $\mathscr{E}_{0}$ at the beginning of the session, there exist at least $m-1$ other $\mathbf{\bar{x}}_{i}\in\{0,1\}^{N+1}\setminus\{\mathbf{\bar{x}}\}$ such that

[TABLE]

where

[TABLE]

For (5) to be well-defined, all probabilities are assumed to be non-zero. In practice, this is not a significant restriction since categories with zero probability are gathered into the catch-all topic $c_{0}$ . By applying the chain-rule for conditional probability, (6) can be rewritten as

[TABLE]

where

[TABLE]

is the incremental change in ( $\epsilon,m$ )–Plausible Deniability arising from the single observation $\omega_{k}$ at step $k$ of the session.
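Once the ratios $\mathbb{D}_{k}(\mathbf{\bar{x}},\mathbf{\bar{x}}_{i},\{\omega_{j}\}_{j=1}^{k})$ are available for a true interest vector against each alternative, checking Definition 1 reduces to counting how many alternatives stay inside the two-sided bound. The sketch below assumes that bound takes the form $e^{-\epsilon}\leq\mathbb{D}_{k}\leq e^{\epsilon}$, consistent with the upper and lower bounds in (6) described in the text; the function name is hypothetical.

```python
import math
from typing import Iterable


def has_plausible_deniability(d_values: Iterable[float], eps: float, m: int) -> bool:
    """Check (eps, m)-Plausible Deniability for one true interest vector x.

    d_values holds D_k(x, x_i, {omega_j}) for each alternative vector x_i != x.
    Assumed bound: e^{-eps} <= D_k <= e^{eps}.
    """
    lo, hi = math.exp(-eps), math.exp(eps)
    # Count alternatives the observer cannot separate from x within the bound.
    plausible = sum(1 for d in d_values if lo <= d <= hi)
    return plausible >= m - 1
```

For example, with ratios `[1.05, 3.0, 0.9]` and $\epsilon=0.2$, two alternatives fall inside $[e^{-0.2},e^{0.2}]\approx[0.82,1.22]$, so the check passes for $m=3$ but fails for $m=4$.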

In the case of the SSC model, there are two topics – sensitive and non-sensitive – so that $\mathcal{U}$ can at best hope for $(\epsilon,m=2)$ –Plausible Deniability for the sensitive topic, in which case (9) can be written as

[TABLE]

when $\mathcal{S}$ emits a sensitive output $\omega^{s}_{j}$ at step $j$ , and

[TABLE]

when a non-sensitive output is emitted at step $j$. Suppose a session of length $k$ contains $a$ sensitive outputs and $k-a$ non-sensitive outputs. Let $k_{s}$ denote the sub-sequence of steps at which sensitive outputs are detected and $k_{n}=[k]\setminus k_{s}$ the steps at which non-sensitive outputs are detected. Substituting these values into (8) then gives

[TABLE]
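For the SSC model the product in (8) collapses to two repeated factors, so $\log\mathbb{D}_{k}$ is a weighted sum of two log-ratios. The sketch below computes it under stated assumptions: each incremental term in (9) is taken to be the likelihood ratio $P(\omega\mid c^{1})/P(\omega\mid c^{o})$ obtained via Bayes' theorem, outputs are conditionally independent given the topic, and the argument names `p_s_given_1` and `p_s_given_0` (the probabilities of a sensitive output under each topic) are hypothetical.

```python
import math


def ssc_log_d(k: int, a: int, p_s_given_1: float, p_s_given_0: float) -> float:
    """log D_k for the SSC model after a sensitive and k - a non-sensitive outputs.

    Assumes each incremental factor is P(omega | c^1) / P(omega | c^o), so the
    chain-rule product over the session becomes a sum of log-ratios.
    """
    if not 0 <= a <= k:
        raise ValueError("need 0 <= a <= k")
    log_ratio_s = math.log(p_s_given_1 / p_s_given_0)              # sensitive output
    log_ratio_n = math.log((1 - p_s_given_1) / (1 - p_s_given_0))  # non-sensitive output
    return a * log_ratio_s + (k - a) * log_ratio_n
```

With hypothetical values $P(\omega^{s}\mid c^{1})=0.6$, $P(\omega^{s}\mid c^{o})=0.3$, $k=5$ and $a=2$, this gives $|\log\mathbb{D}_{5}|\approx 0.29$, so under these assumptions $(\epsilon,2)$–Plausible Deniability holds only for $\epsilon\geq 0.29$. Sensitive and non-sensitive outputs pull the log-ratio in opposite directions, which is why the bound depends on the mix $a$ versus $k-a$.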

III-D Comparison with Other Anonymity Measures

Intuitively, Definition 1 is similar to $k$–anonymity in that an observer can only explain observations $\{\omega_{j}\}_{j=1}^{k}$ to within a generalised set of at least $m$ topic vectors (so $m$ plays the role of $k$ in $k$–anonymity), with probability bounded by the choice of $\epsilon$. Definition 1 differs from regular $k$–anonymity in requiring both upper and lower bounds in (6), since evidence of loss of interest in a sensitive topic may be as revealing, and potentially as embarrassing, as evidence of increased interest.

Definition 1 can also be compared with a slightly weaker form of Differential Privacy. Informally, making an observation should not make $\mathcal{S}$ significantly more, or less, confident of user interest in a particular sensitive topic.

From (9), the incremental change due to a single observation $\Omega_{j}=\omega_{j}$ is

[TABLE]

by applying Bayes’ Theorem. Since (9) is bounded above and below for at least $m-1$ other $x_{i}$ when Definition 1 holds, it follows that

[TABLE]

for at least $m-1$ other topic vectors $x_{i}$, but not necessarily for all topic vectors. In this case we say that $m$–Differential Privacy holds for $\epsilon>0$ whenever Definition 1 holds, meaning that no topic vector $x$ can be distinguished, to within $\epsilon$, from at least $m-1$ other topic vectors in $\{0,1\}^{N+1}$. This is a slightly weaker statement than the usual, global definition of Differential Privacy.

III-E Testing for Plausible Deniability

The following indistinguishability definition of privacy risk measures the change in belief of a search engine due to inference from observed user events, relative to its prior belief conditioned on the background data available at the start of the query session. It is adapted from work begun in [2], and using it allows us to reuse tools originally developed there.

Definition 2 ($\epsilon$-Indistinguishability).

For a privacy parameter $\epsilon>0$, a user with interest vector $\mathbf{\bar{x}}\in\{0,1\}^{N+1}$ is said to be $\epsilon$-Indistinguishable with respect to an observation of user actions $\omega_{k}$ at step $k$ if

[TABLE]

where

[TABLE]

is called the $\epsilon$-Indistinguishability score of the interest vector $\mathbf{\bar{x}}$ for observation $\omega_{k}$ and background knowledge $\mathscr{E}_{k}$ at step $k$.

In other words, for $\epsilon$-Indistinguishability to hold at step $k$ of a query session, the conditional posterior distribution for the true interests of the user should be approximately equal to the prior distribution at the beginning of the query session. To ensure (16) is well defined, we assume all probabilities in (16) are non-zero, so that $0<\mathbb{M}_{k}(\mathbf{\bar{x}},\omega_{k})<\infty$. Expression (16) implies that if $\epsilon$-Indistinguishability holds at step $k$ for an interest vector $\mathbf{\bar{x}}$, then $e^{-\epsilon}<(\mathbb{M}_{k-1}(\mathbf{\bar{x}},\omega_{k}))^{-1}<e^{\epsilon}$.
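The $\epsilon$-Indistinguishability test amounts to checking that the posterior-to-prior ratio stays strictly inside $(e^{-\epsilon},e^{\epsilon})$. The Python sketch below assumes, as the paragraph above describes, that the score $\mathbb{M}_{k}$ is a ratio of posterior to prior probability; the function names are hypothetical and the probabilities are supplied directly rather than estimated.

```python
import math


def indistinguishability_score(posterior: float, prior: float) -> float:
    """Score M_k taken as posterior / prior (assumed form of the score)."""
    if posterior <= 0 or prior <= 0:
        # The definition assumes all probabilities are non-zero.
        raise ValueError("probabilities must be non-zero")
    return posterior / prior


def is_eps_indistinguishable(posterior: float, prior: float, eps: float) -> bool:
    """True when e^{-eps} < M_k < e^{eps} (strict two-sided bound)."""
    m = indistinguishability_score(posterior, prior)
    return math.exp(-eps) < m < math.exp(eps)
```

Because the bound is symmetric in $\log\mathbb{M}_{k}$, a score and its reciprocal pass or fail together, which is the observation used for $(\mathbb{M}_{k-1})^{-1}$ above.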

The next result provides the necessary connection between $\epsilon$-Indistinguishability and ($\epsilon,m$)–Plausible Deniability, so that tools developed in [2] for $\epsilon$-Indistinguishability can be applied to ($\epsilon,m$)–Plausible Deniability.

Proposition III.1.

If $\epsilon$-Indistinguishability holds on a subset $\mathcal{I}\subseteq\{0,1\}^{N+1}$ for $\epsilon>0$ at step $k$ and at the initial step $1$, then ($4\epsilon,m$)–Plausible Deniability holds on $\mathcal{I}$ for $m\leq|\mathcal{I}|$. Furthermore

[TABLE]

Proof.

Assume $\epsilon$-Indistinguishability holds on $\mathcal{I}\subseteq\{0,1\}^{N+1}$, and let $\mathbf{\bar{x}}_{1},\mathbf{\bar{x}}_{2}\in\mathcal{I}$. From (9),

[TABLE]

where expressions (b) and (c) in (20) are $(\mathbb{M}_{k-1}({\mathbf{\bar{x}}_{1},\omega_{k-1}}))^{-1}$ and $(\mathbb{M}_{k-1}(\mathbf{\bar{x}}_{2},\omega_{k-1}))^{-1}$ respectively, from the definition in (17).

Therefore from the definition of $\mathbb{D}_{k}(\mathbf{\bar{x}},\mathbf{\bar{x}}_{i},\{\omega_{j}\}_{j=1}^{k})$ in (8)

[TABLE]

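so that (18) holds. Since the individual factors in (22) satisfy $\epsilon$-Indistinguishability for $\epsilon>0$, each lies in $(e^{-\epsilon},e^{\epsilon})$; the product of the four such factors therefore lies in $(e^{-4\epsilon},e^{4\epsilon})$, and ($4\epsilon,m$)–Plausible Deniability holds as required. ∎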

Proposition III.1 provides a basic strategy for asserting when ($\epsilon,m$)–Plausible Deniability holds. By establishing a value of $\epsilon$ for which a collection of topics $\mathcal{C}=\{c_{0},\ldots,c_{N}\}$ satisfies $\epsilon$-Indistinguishability, ($4\epsilon,m$)–Plausible Deniability follows with at least $m=|\mathcal{C}|=N+1$. This is a minimum guarantee, as there may be topics for which $\epsilon$-Indistinguishability fails but ($4\epsilon,m$)–Plausible Deniability holds. In our experiments we test whether observed actions can be uniquely associated with interest in a given sensitive topic $c_{i}$, as opposed to interest in “any other” topic in $\mathcal{C}\setminus\{c_{i}\}$, so that $m=2$.

For a topic $c_{i}\in\mathcal{C}$ the expression for ( $\epsilon,m$ )–Plausible Deniability becomes

[TABLE]

where $\mathbf{\bar{c}}_{-i}$ denotes the topic interest vector representing interest in the topics $\mathcal{C}\setminus\{c_{i}\}$ .

The following result connects $\mathbb{D}_{k}(\mathbf{\bar{c}}_{i},\mathbf{\bar{c}}_{-i},\{\omega_{j}\}_{j=1}^{k})$ to variation in probabilities.

Proposition III.2.

If ( $\epsilon,m$ )–Plausible Deniability holds for $\mathbf{\bar{c}}_{i}$ and $\mathbf{\bar{c}}_{-i}$ with $\epsilon>0$ and $m=2$ then

[TABLE]

And so

[TABLE]

is a lower bound for the best possible achievable level of ( $\epsilon,m$ )–Plausible Deniability .

Proof.

If ($\epsilon,m$)–Plausible Deniability holds for $\epsilon>0$, then

[TABLE]

so that $|\log(\mathbb{D}_{k}(\mathbf{\bar{c}}_{i},\mathbf{\bar{c}}_{-i},\{\omega_{j}\}_{j=1}^{k}))|$ is a lower bound for all $\epsilon>0$ for which ( $\epsilon,m$ )–Plausible Deniability holds. The result follows by substituting the expression in (21) for $\mathbb{D}_{k}(\mathbf{\bar{c}}_{i},\mathbf{\bar{c}}_{-i},\{\omega_{j}\}_{j=1}^{k})$ in (26). ∎

Proposition III.2 will be used later to create an estimator for $\epsilon_{*}$ that can be measured in experiments. From now on we simplify our discussion to the case $m=2$ and so experimental results are reported accordingly for the two-topic case, $\{c_{i},c_{-i}\}$ .

IV Implementation

IV-A Preliminaries

During testing we wish to use (25) to construct an estimator, which we call PDE, of the level of ($\epsilon,m$)–Plausible Deniability afforded. Since estimating the quantities in (25) relies on PRI, we recap the bare essentials of PRI and refer the reader to [2] for more details.

To test for learning, PRI injects a predefined probe query into a stream of “true” queries during a query session. Differences detected in the advert content returned in response to probe queries can then be compared to identify evidence of learning. An ideal probe should not disrupt the learning process of $\mathcal{S}$. Denote by $\Omega_{k+1}=\omega_{k+1}^{P}$ the event that a probe query is selected from $\mathcal{D}$ at step $k+1$. We formalise the notion of an ideal probe query by requiring that observing $\Omega_{k+1}=\omega_{k+1}^{P}$ be conditionally independent of the user topic $\mathbf{\bar{X}}$ given the existing background knowledge of the observer

[TABLE]

and so observing the probe query and associated clicks does not provide any more information to the observer about the interests of $\mathcal{U}$ than the current background knowledge already provides. From (27)

[TABLE]

In practice, choosing an ideal probe query is achieved by selecting words from $\mathcal{D}$ that match words for several topics in $\mathcal{C}$ so that it is not possible to associate a single topic in $\mathcal{C}$ with the probe query.
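A simple way to realise this selection rule is to keep only dictionary words that are matched to several topics. The sketch below assumes a hypothetical word-to-topic matching `word_topics` for $\mathcal{D}$ and a hypothetical threshold `min_topics`; it is not the probe construction used in [2], only an illustration of the ambiguity criterion.

```python
from typing import Dict, List, Set


def ambiguous_probe_words(word_topics: Dict[str, Set[str]], min_topics: int = 2) -> List[str]:
    """Return dictionary words matched to at least `min_topics` topics.

    word_topics is a hypothetical word -> topic-set matching for D. Words tied to
    several topics cannot be attributed to a single topic in C, so they are
    candidate building blocks for a probe query.
    """
    return sorted(w for w, topics in word_topics.items() if len(topics) >= min_topics)
```

For example, with `{"symptom": {"cancer", "hiv"}, "chemotherapy": {"cancer"}, "loan": {"bankruptcy", "other"}}`, only `"loan"` and `"symptom"` qualify; `"chemotherapy"` points to a single topic and would make a poor probe.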

Construction of PRI is based on several assumptions. The first is that the background knowledge at the first step of a query session, $\mathscr{E}_{1}$, provides a sufficient description of the background knowledge at all subsequent steps of that query session, $\mathscr{E}_{k}$.

Assumption 2 (Sufficiently Informative Responses).

Let $\mathcal{K}\subset\{1,2,\cdots\}$ label the sub-sequence of steps at which a probe query is issued. At each step $k\in\mathcal{K}$ at which a probe query is issued,

[TABLE]

for each topic ${c}_{i}\in\mathcal{C}$.

It is therefore not necessary to use knowledge of the search history during the current session explicitly when estimating $\mathbb{M}_{k}$ for a topic $c$ at step $k$, since this history is already reflected in the search engine response $\omega_{k}$, with the initial background knowledge $\mathscr{E}_{1}$ capturing background knowledge up to the start of the session. Assumption 2 greatly simplifies estimation, as it means we do not have to take account of the full search history, but it requires that the response to a query reveal any search engine learning of interest in sensitive category $c$ that has occurred. Assumption 2 was called the “Informative Probe” assumption in [2].

The next assumption is that adverts are selected to reflect search engine belief in user interests, so that adverts are the principal way in which search engine learning is revealed. Given this assumption, conditional dependence on $\omega_{k}$ can be replaced with dependence on the adverts appearing on the result page.

Assumption 3 (Revealing Adverts).

In the search engine response to a query at step $k$ it is the adverts ${a}_{k}$ on a response page which primarily reveal learning of sensitive categories.

[TABLE]

for each topic ${c}_{i}\in\mathcal{C}$.

We estimate background knowledge $\mathscr{E}_{1}$ by selecting a training data set, denoted $\mathcal{T}$, consisting of (label, advert) pairs, where the label is the category in $\mathcal{C}$ associated with the corresponding advert. For example, when testing for evidence of a single sensitive topic, called “Sensitive”, $\mathcal{T}$ contains items labeled “Sensitive” or “Other”, where “Other” is the label for the uninteresting, catch-all topic $c_{0}$. In this way $\mathcal{T}$ approximates the prior observation evidence available at the start of the query session, so that $\mathcal{T}$ is an estimator for $\mathscr{E}_{1}$.

Text processing of $\mathcal{T}$ produces a dictionary $\mathcal{D}$ of keyword features. This processing removes common English language high-frequency words and maps each of the remaining keywords to a stemmed form by removing standard prefixes and suffixes such as “–ing” and “–ed”. The dictionary $\mathcal{D}$ represents an estimate of the known universe of keywords according to the background knowledge contained in the training data.
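The preprocessing steps above (stop-word removal followed by stemming) can be sketched as follows. This is a deliberately crude stand-in, not the pipeline used in our experiments: the stop-word list `STOP_WORDS` and suffix list `SUFFIXES` are hypothetical and tiny, and the suffix stripper only approximates what a proper stemmer (for example a Porter stemmer) would do.

```python
import re
from typing import Iterable, List, Set

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a full list.
STOP_WORDS: Set[str] = {"a", "an", "and", "the", "for", "of", "to", "in", "on", "is", "are", "your"}
# Crude suffix stripping standing in for a proper stemmer.
SUFFIXES = ("ing", "ed", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def keywords(text: str) -> List[str]:
    """Lower-case, tokenise on letters, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def build_dictionary(training_texts: Iterable[str]) -> Set[str]:
    """Keyword dictionary D: every stemmed keyword seen in the training adverts."""
    return {w for text in training_texts for w in keywords(text)}
```

For instance, `keywords("Treating the symptoms of cancer")` yields `["treat", "symptom", "cancer"]`. Advert text on a result page would pass through the same `keywords` function, so that page keywords and dictionary entries are directly comparable.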

Text appearing in the adverts in a response page is preprocessed in the same way as $\mathcal{T}$ to produce a sequence of keywords from $\mathcal{D}$ for each advert; denoted $W=\{w_{1},w_{2},\cdots,w_{|W|}\}$ . Words not appearing in $\mathcal{D}$ are ignored in our experimental setup for simplicity since sessions are short. In an operational setting it is possible, for example, to update $\mathcal{D}$ when new keywords are encountered and refactor $\mathscr{E}_{1}$ accordingly.

Let $n_{\mathcal{D}}(w|W):=|\{i:i\in\{1,\cdots,|W|\},w_{i}=w\}|$ denote the number of times an individual keyword $w\in\mathcal{D}$ occurs in a sequence $W=\{w_{1},w_{2},\cdots,w_{|W|}\}$. The relative frequency of an individual keyword $w\in\mathcal{D}$ is therefore

[TABLE]

recalling that only keywords $w$ appearing in $\mathcal{D}$ are admissible, due to the text preprocessing in our setup.
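The relative frequency $n_{\mathcal{D}}(w|W)/|W|$ is a normalised word count over the admissible keywords. The minimal sketch below drops words outside $\mathcal{D}$ first, mirroring the preprocessing just described, so $|W|$ counts only admissible keywords; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def relative_frequency(words: Iterable[str], dictionary: Set[str]) -> Dict[str, float]:
    """Relative frequency n_D(w | W) / |W| of each admissible keyword w in W.

    Words not in the dictionary D are discarded before counting, so |W| is the
    number of admissible keywords. Returns {} when no admissible keyword remains.
    """
    kept = [w for w in words if w in dictionary]
    if not kept:
        return {}
    counts = Counter(kept)
    return {w: n / len(kept) for w, n in counts.items()}
```

For example, the sequence `["hiv", "test", "hiv", "zzz"]` with $\mathcal{D}=\{\text{hiv},\text{test}\}$ gives frequencies $2/3$ and $1/3$; `"zzz"` is ignored. The empty result for a page with no admissible keywords is exactly the case that Section IV-B has to treat separately.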

Let $c_{i}\in\mathcal{C}$ be a sensitive topic of interest, and let $\mathcal{T}(c_{i})$ denote the subset of $\mathcal{T}$ whose labels correspond to $c_{i}$. Let $\mathcal{T}(\mathcal{C})$ denote the set of adverts labelled for any topic in $\mathcal{C}$. The PRI estimator for $\mathbb{M}_{k}(\mathbf{\bar{x}},\omega_{k})$, given adverts $a_{k}$ appearing on the result page for query number $k$, is as follows (note that the expression for $\widehat{M}_{k}(\mathbf{\bar{c}}_{i},\omega_{k})$ given in [2] is incorrect and is corrected here):

[TABLE]

where we concatenate all of the advert text on page $k$ into a single sequence of keywords, and $\psi_{\mathcal{D}}(w|a_{k})$ is the relative frequency of $w$ within this sequence. Similarly, $\phi_{\mathcal{D}}(w|\mathcal{T}(c_{i}))$ (respectively $\phi_{\mathcal{D}}(w|\mathcal{T}(\mathcal{C}))$) is the relative frequency of $w$ within the single sequence obtained by concatenating all of the keywords in the training set $\mathcal{T}(c_{i})$ (respectively $\mathcal{T}(\mathcal{C})$).
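To show how the three relative frequencies combine, the sketch below implements one natural PRI-style score. The form used here, $\widehat{\mathbb{M}}_{k}(\mathbf{\bar{c}}_{i},\omega_{k})=\sum_{w\in\mathcal{D}}\psi_{\mathcal{D}}(w|a_{k})\,\phi_{\mathcal{D}}(w|\mathcal{T}(c_{i}))/\phi_{\mathcal{D}}(w|\mathcal{T}(\mathcal{C}))$, is an assumption: it follows from applying Bayes' rule word by word, $P(c_{i}|w)/P(c_{i})=P(w|c_{i})/P(w)$, and averaging over the advert keywords, but the exact expression (32) may differ. The function name `pri_score` is hypothetical.

```python
from typing import Dict


def pri_score(psi_ad: Dict[str, float],
              phi_topic: Dict[str, float],
              phi_all: Dict[str, float]) -> float:
    """Sketch of a PRI-style score for topic c_i on one result page.

    psi_ad:    psi_D(w | a_k), keyword frequencies in the page's adverts
    phi_topic: phi_D(w | T(c_i)), keyword frequencies in training adverts for c_i
    phi_all:   phi_D(w | T(C)), keyword frequencies over all training adverts
    Assumed form: sum_w psi(w|a_k) * phi(w|T(c_i)) / phi(w|T(C)).
    """
    score = 0.0
    for w, psi in psi_ad.items():
        denom = phi_all.get(w, 0.0)
        if denom > 0:  # keywords never seen in training contribute nothing
            score += psi * phi_topic.get(w, 0.0) / denom
    return score
```

A score above $1$ means the page's adverts are skewed toward vocabulary that is over-represented in $c_{i}$ relative to all training adverts; for example, a page split evenly between a keyword that makes up a quarter of all training text but all of the $c_{i}$ text, and a keyword absent from $c_{i}$, scores $2$.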

IV-B Tuning the PRI Estimator

The quantity $\psi_{\mathcal{D}}(w|a_{k})$ in the expression for the PRI estimator, (32), is problematic when the adverts $a_{k}$ on page $k$ do not contain any of the topic keywords in dictionary $\mathcal{D}$, i.e. when $a_{k}=\emptyset$, indicating there is no detectable evidence of a particular topic. For consistency with the definition of $\epsilon$-Indistinguishability in Section III-E, this case should result in a PRI score of one for that topic. We therefore replace $\psi_{\mathcal{D}}(w|a_{k})$ with

[TABLE]

Training data is based on a sample of all possible adverts for a particular topic. By chance, the training phase may fail to observe adverts containing infrequently occurring keywords for a particular topic. The relative frequency of such a keyword is then zero, and it does not contribute to the PRI estimate if encountered in an advert. To address this, we introduce a Laplace smoothing parameter $\lambda$ as follows

[TABLE]

The parameter $0\leq\lambda<1$ enforces a minimum frequency of $1/|\mathcal{D}|$ on every keyword. The expression (32) is adjusted correspondingly to give a new estimator we call PRI+:

[TABLE]

We will use the PRI+ estimator, given by (37), from now on in this paper, unless stated otherwise. In our experiments we found empirically, through verification with the training data, that choosing the parameter $\lambda=0.001$ works well.
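The two PRI+ adjustments can be sketched together, assuming the estimator has the form $\sum_{w}\psi_{\mathcal{D}}(w|a_{k})\,\phi_{\mathcal{D}}(w|\mathcal{T}(c_{i}))/\phi_{\mathcal{D}}(w|\mathcal{T}(\mathcal{C}))$. Two further assumptions are made: when the page has no dictionary keywords, $\psi$ is replaced by the training distribution $\phi_{\mathcal{D}}(w|\mathcal{T}(\mathcal{C}))$, which makes the score exactly $\sum_{w}\phi_{\mathcal{D}}(w|\mathcal{T}(c_{i}))=1$ as required; and smoothing uses standard additive (Laplace) smoothing $(n(w)+\lambda)/(|T|+\lambda|\mathcal{D}|)$, which may differ in detail from (36). Function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Set


def smoothed_frequency(words: Sequence[str], dictionary: Set[str], lam: float) -> Dict[str, float]:
    """Additive (Laplace) smoothing over D: (n(w) + lam) / (len + lam * |D|)."""
    counts = Counter(w for w in words if w in dictionary)
    total = sum(counts.values()) + lam * len(dictionary)
    return {w: (counts[w] + lam) / total for w in dictionary}


def pri_plus_score(ad_words: Sequence[str], topic_words: Sequence[str],
                   all_words: Sequence[str], dictionary: Set[str],
                   lam: float = 0.001) -> float:
    """Sketch of a PRI+ style score for one page and one topic (assumed form)."""
    phi_topic = smoothed_frequency(topic_words, dictionary, lam)  # phi(w | T(c_i))
    phi_all = smoothed_frequency(all_words, dictionary, lam)      # phi(w | T(C))
    ad_kw = [w for w in ad_words if w in dictionary]
    if ad_kw:
        counts = Counter(ad_kw)
        psi = {w: n / len(ad_kw) for w, n in counts.items()}      # psi(w | a_k)
    else:
        # No topic keywords on the page: fall back on the training distribution,
        # which makes the score sum_w phi(w|T(c_i)) = 1, i.e. no evidence either way.
        psi = phi_all
    return sum(p * phi_topic[w] / phi_all[w] for w, p in psi.items())
```

With $\lambda>0$ every keyword in $\mathcal{D}$ has a non-zero smoothed frequency, so keywords missed during training still contribute a small amount instead of vanishing, and the division by $\phi_{\mathcal{D}}(w|\mathcal{T}(\mathcal{C}))$ is always defined.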

IV-C The PDE Estimator

Substituting the PRI+ estimator $\widehat{\mathbb{M}}_{k}$ for $\mathbb{M}_{k}$ in (25) gives the PDE estimator

[TABLE]

From Proposition III.2, the PDE estimator in (38) can be interpreted directly as the best possible level of ($\epsilon,m$)–Plausible Deniability a user can claim in the case $m=2$. We report the maximum value of PDE measured at each probe step in our experiments, to show the worst-case ($\epsilon,m$)–Plausible Deniability scenario for $\mathcal{U}$. We also report the median value of PDE as a representative bound for approximately $50\%$ of the samples. An example of reporting is shown in Table I for the reference topic “gay”.

For example, from Table I, a reported maximum PDE value of $47\%$ in the second column indicates that the difference between the probability that $\mathcal{U}$ is uniquely interested in the reference topic and the probability that she is interested in any other topic is at least $47\%$ in the worst case by probe step $5$. The median value of $25\%$ in parentheses in the Probe 3 and Probe 4 columns indicates that this difference can be expected to be at least $25\%$ in $50\%$ of cases by probes $3$ and $4$. Overall, the results suggest that ($\epsilon,m$)–Plausible Deniability is unlikely to constitute a reasonable defence in this case.
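The “max (median)” reporting convention used in Table I is a simple per-probe aggregation. The sketch below assumes PDE samples for each probe step are available as fractions; the function name and the sample values in the usage note are hypothetical and are not experimental results.

```python
from statistics import median
from typing import Dict, List


def summarise_pde(pde_by_probe: Dict[int, List[float]]) -> Dict[int, str]:
    """Format PDE samples for each probe step as 'max (median)' percentages."""
    out = {}
    for probe, values in sorted(pde_by_probe.items()):
        # max = worst case for the user; median = bound met by about half the samples
        out[probe] = f"{max(values):.0%} ({median(values):.0%})"
    return out
```

For instance, hypothetical samples `{1: [0.10, 0.20, 0.30]}` would be reported as `"30% (20%)"` for probe 1.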

Reported values of PDE may increase, or decrease, during a session as individual queries are judged more, or less, revealing by the PDE estimator. Inspection of the query scripts generated for the topic $c_{i}=\text{Gay}$, for example, shows that the queries associated with probe step $3$ are “same sex relationships” and “how do i know if I’m gay”, both of which appear revealing. The queries from the test script corresponding to probe steps $4$ and $5$ are “HIV symptoms”, “HIV treatment”, “HIV men” and “aids men”, which may not point as distinctly to specific interest in $c_{i}=\text{Gay}$, as they could reasonably be associated with health concerns.

The zeroth probe in a session is always run first, before any other query, to establish a baseline PRI+ score for the session. As a result, the measured PDE value for the zeroth probe is always $0$, for both maximum and median values, and is not reported in our results.

One popular approach to designing defences of ($\epsilon,m$)–Plausible Deniability is to attempt to hide in the crowd, for example by injecting varying degrees of noise into the stream of observations $\{\omega_{j}\}$ in the hope that $\mathcal{S}$ will not detect the true sub-stream of sensitive events. In [2], the authors observe that varying click patterns changes the absolute volume of adverts appearing on a page. As both user clicks and queries are potential indicators of user interest for an observer, we test noise injected through both queries and clicks as possible defence strategies.

An alternative tactic is to invert the previous approach and instead attempt to hide in plain sight. By choosing a non-sensitive proxy topic that attracts personalised content, $\mathcal{U}$ can carefully hide her true, sensitive queries in a stream of proxy topic queries. By demonstrating clear interest in the non-sensitive proxy topic, $\mathcal{U}$ draws the attention of the observer $\mathcal{S}$ and may tip the balance of probability toward the proxy topic.
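The difference between the two defences lies in how the query script is assembled. The sketch below builds both kinds of session: `inject_noise` scatters randomly chosen queries from a mixed noise pool after each true query (hiding in the crowd), while `proxy_topic_session` opens with a warm-up on a single proxy topic and then follows each sensitive query with further proxy queries (hiding in plain sight). Both helpers, their parameters and any example query strings are hypothetical; they only generate scripts, and click obfuscation is not modelled.

```python
import random
from itertools import cycle, islice
from typing import List, Sequence


def inject_noise(true_queries: Sequence[str], noise_pool: Sequence[str],
                 noise_per_query: int, seed: int = 0) -> List[str]:
    """Follow each true query with `noise_per_query` random queries from a mixed pool."""
    rng = random.Random(seed)
    session: List[str] = []
    for q in true_queries:
        session.append(q)
        session.extend(rng.choice(noise_pool) for _ in range(noise_per_query))
    return session


def proxy_topic_session(sensitive: Sequence[str], proxy: Sequence[str],
                        warm_up: int, proxy_per_sensitive: int) -> List[str]:
    """Hide sensitive queries in a stream drawn from one non-sensitive proxy topic.

    The session opens with `warm_up` proxy queries to establish visible interest in
    the proxy topic, then follows each sensitive query with `proxy_per_sensitive`
    further proxy queries, cycling through the proxy list.
    """
    proxy_stream = cycle(proxy)
    session = list(islice(proxy_stream, warm_up))
    for q in sensitive:
        session.append(q)
        session.extend(islice(proxy_stream, proxy_per_sensitive))
    return session
```

The design choice reflects the argument above: noise from many unrelated topics spreads the observer's attention thinly, whereas a coherent proxy stream (for example, queries about concert tickets) gives the observer one strong, uninteresting explanation of the session.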

V Experimental Results

V-A Preliminaries

To facilitate easy comparison we use the same experimental data collection setup as [2]. We summarise the key elements here with additional detail in the Appendix and refer the reader to [2] for full details.

User interest topic categories taken from [2] are used in our experiments. Of these topics, (i) ten are sensitive categories associated with subjects generally identified as causes of discrimination (medical condition, sexual orientation, etc.) or sensitive personal conditions (gambling addiction, financial problems, etc.); (ii) a further sensitive topic relates to London as a specific destination, providing an obviously interesting yet potentially sensitive topic that a recommender system might track; and (iii) the last topic is a catch-all category labeled “Other”.

To construct sequences of queries for use in test sessions, we select a probe query, which provides a predefined sampling point for data collection. Numbering the probes in a session starting from $0$, the zeroth query issued in every session is a probe query. The zeroth probe establishes the baseline for calculating the PDE estimator for subsequent probe queries. The PDE estimator (38) for the zeroth probe in a session is $0$, and so it is not included in reports of experimental results. PDE values are reported for each of the probe queries $1$ to $5$ during experiments, providing a consistent sample for analysis.

In our experiments, when implementing the “Proxy Topic” defence model, we choose three uninteresting, proxy topics likely to attract adverts, namely tickets for music concerts, searching for bargain vacations and buying a new car.

All scripts were run for $3$ registered users and $1$ anonymous user on the Google search engine, yielding a data set consisting of $21,861$ probe queries in total across all of the test user interest topics. Test data was divided into individual test data sets based on different test configurations with each test data set consisting of approximately $1,000$ probe queries.

A separate hold-back set of approximately $1,000$ queries was reserved as a common training data set. The PDE estimator in (38) uses this training data set to model the prior background knowledge $\mathscr{E}_{0}$. We do not re-train PDE during testing as new adverts are encountered, so experimental measurements of PDE are all made with respect to the common training set, for consistent comparison.

All queries in a test session were automatically labelled with the intended topic of the test session, as given by the query script used. For example, all queries from a session about “prostate”, including probe queries, are labelled “prostate”. In this respect the labels capture the intended behaviour of queries, rather than attempting an individual interpretation of specific query keywords during a user session. Test data is automatically divided into $7$ folds for processing, so that reported statistics are taken over $7$ distinct, randomised sub-samples of test data.
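One simple way to produce $7$ distinct, randomised sub-samples is to shuffle the test items once and deal them round-robin into disjoint folds. The sketch below assumes this procedure purely for illustration; the exact fold construction is not specified here, and `random_folds` is a hypothetical helper.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def random_folds(items: Sequence[T], n_folds: int = 7, seed: int = 0) -> List[List[T]]:
    """Shuffle items and deal them round-robin into n_folds disjoint folds."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Slice i takes every n_folds-th item starting at position i.
    return [shuffled[i::n_folds] for i in range(n_folds)]
```

Every item lands in exactly one fold and fold sizes differ by at most one, so statistics computed per fold are taken over comparable, non-overlapping samples.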

Before proceeding to testing with PDE, we verify PRI+ by comparing its detection capability with previous results obtained in [2] for the PRI estimator and compare the performance of PRI+ with alternative implementations using Naive Bayes and Support Vector Machine as sensitive topic detectors.

Comparison results between PRI and PRI+ are shown in Table VI, and were produced by processing data taken from [2] but applying the PRI+ estimator to decide which topic is detected. For consistency with [2], we declare that a topic $c_{i}$ has been detected during a query session of $5$ probe queries if at least one of the $5$ probe queries is detected as topic $c_{i}$. For comparison, detection results for the PRI estimator from Table XIV(b) in [2] are reproduced as Table VI(b). The True Detection rates using the PRI+ estimator are better than or equal to the rates reported in [2] for every topic. The False Detection rates are also better than or equal to those reported in [2] for all topics tested.
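The session-level detection rule can be written down directly. The sketch below also computes per-topic detection rates, under the assumption (ours, for illustration) that the True Detection rate is the fraction of sessions labelled with the topic in which it is detected, and the False Detection rate the fraction of sessions with any other label in which it is detected. Function names are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def session_detects(probe_topics: Sequence[str], topic: str) -> bool:
    """A topic is detected in a session if at least one probe is classified as it."""
    return any(t == topic for t in probe_topics)


def detection_rates(sessions: Iterable[Tuple[str, Sequence[str]]],
                    topic: str) -> Tuple[float, float]:
    """Return (true detection rate, false detection rate) for `topic`.

    sessions holds (session label, per-probe detected topics) pairs.
    """
    sessions = list(sessions)
    on = [probes for label, probes in sessions if label == topic]
    off = [probes for label, probes in sessions if label != topic]
    tdr = sum(session_detects(p, topic) for p in on) / len(on) if on else float("nan")
    fdr = sum(session_detects(p, topic) for p in off) / len(off) if off else float("nan")
    return tdr, fdr
```

Because a single positive probe suffices, this rule favours detection over specificity; it raises both rates compared with requiring a majority of probes.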

PRI+ was compared with alternative implementations by using Multinomial Naive Bayes (NB) and Linear SVM (SVM) classifiers to estimate the probabilities in the definition of $\mathbb{M}_{k}$ in (17). The intent of the comparison is to determine which of the NB, PRI+ and SVM estimators detect privacy threats, using the definition of $\mathbb{M}_{k}$ in (17), for test items previously labeled as “sensitive” according to the topic of the query used. To qualify as a privacy threat we require $e^{\epsilon}>1.1$. We expect precision to be substantially less than 100% for all estimators because this threshold filters out weaker detections where $1.0<e^{\epsilon}\leq 1.1$.

Other than the way $\mathbb{M}_{k}$ was estimated, all other inputs and calculations were identical. A common test data set was constructed by selecting 5,500 result pages for each sensitive topic and then randomly selecting an additional 5,500 result pages labeled with the non-sensitive topic. In this way each sensitive topic had a balanced verification data set of 11,000 labeled items. Each verification data set was divided randomly into $20\%$–$80\%$ test–training sets, and calculations were repeated 5 times for 5-fold verification of each of the NB, PRI+ and SVM estimators. The NB and SVM estimators were constructed with the Multinomial Naive Bayes and Linear SVC modules of the Python scikit-learn package [38]. After common preprocessing, each of the NB, PRI+ and SVM classifiers was trained and probability estimates were captured for the 5-fold test data sets. A threat is declared “detected” if the calculated value of $\mathbb{M}_{k}$ for the sensitive topic exceeds $1.0$. Precision of sensitive topic threat detection is shown by topic in Figure 2 for the NB, PRI+ and SVM approaches.
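The evaluation harness around the classifiers can be sketched with the standard library alone. This is a hedged sketch, not the authors' code: the function names are hypothetical, the 5-fold partition is one reading of “20%–80% split repeated 5 times”, and `scores` stands in for the per-item value of $\mathbb{M}_{k}$ produced by whichever estimator is evaluated. The classifiers themselves are not reproduced; note that scikit-learn's `LinearSVC` has no `predict_proba`, so some calibration step (for example `CalibratedClassifierCV`) would be needed to obtain probabilities, though the paper does not say how this was done.

```python
import random

def balanced_verification_set(sensitive, non_sensitive, n=5500, seed=None):
    """Sample n sensitive (label 1) and n non-sensitive (label 0) result pages."""
    rng = random.Random(seed)
    positives = [(page, 1) for page in rng.sample(sensitive, n)]
    negatives = [(page, 0) for page in rng.sample(non_sensitive, n)]
    return positives + negatives

def k_fold_splits(data, k=5, seed=None):
    """Yield (train, test) pairs; each test fold holds about 1/k (here 20%) of the data."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [item for j, fold in enumerate(folds) if j != i for item in fold]
        yield train, folds[i]

def threat_precision(labels, scores, threshold):
    """Share of items scoring above threshold that are truly sensitive (label 1)."""
    flagged = [label for label, score in zip(labels, scores) if score > threshold]
    return sum(flagged) / len(flagged) if flagged else 0.0
```

For instance, `threat_precision([1, 0, 1, 0], [1.5, 1.2, 1.05, 0.9], threshold=1.1)` flags the first two items, one of which is sensitive, giving a precision of 0.5.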

The results in Figure 2 indicate that the PRI+ estimator produces significantly more true-positive detections than either the NB or the SVM estimator for all sensitive topics tested. The initial detection sensitivity of each of these estimators is influenced by the labelling assigned to examples in the training set. We adopt the perspective that privacy tools should err on the side of caution, so high detection sensitivity in the initial “out of the box” stage is a prudent approach. In a real-world application of PRI+, the user would provide incremental training examples over time reflecting their tolerance of privacy risk, and so tune PRI+.

V-B Establishing a Baseline

We begin with sequences of queries, interleaved with probe queries, in what we term a “no click, no noise” model. Here no noise is injected and no items are clicked on any of the search result pages. This model provides a baseline in which the queries alone are available to the recommender to learn about a user session as it progresses. Measurements of PDE for all topics using the “no click, no noise” model are shown in Table II.

For the topics Anorexia, Diabetes and Prostate (health), Bankrupt and Divorced (personal circumstances) and Gay (sexual orientation), the reported values are high, indicating a lack of plausible deniability for each of these topics. It is concerning that personal circumstances, health status and sexual orientation appear to be the most revealing topics according to our experiments. In the case of the topic Disabled there is more cause for concern about ($\epsilon,m$)–Plausible Deniability as the session progresses. On inspection of the associated query script, this again appears to be related to the specificity of the queries at each probe step. At the beginning of the script the queries concern availability of services, for example locations of disabled parking, while later queries are more specific to named conditions, for example treatment for spina bifida.

The topics $\{Location, Payday, Unemployed\}$ appear among the topics of least concern from the perspective of ($\epsilon,m$)–Plausible Deniability. The query scripts for both Payday and Unemployed asked about availability of social support services, whereas queries for the topic Bankrupt asked about availability of paid professional services such as lawyers and accountants. This is perhaps an illustration of the motivations of a for-profit service, for which users seeking social supports are of less interest than users seeking expensive paid services.

Overall, measurements of PDE in experiments appear to agree with expectations from inspection of the underlying queries. Our results suggest that queries are a strong signal to the observer of user interest, and that estimates from PDE appear to distinguish queries that are strongly revealing of specific topic interest from more generic queries where plausible deniability is clearer.

V-C The Effect of Random Noise Injection

Following from Section III-C, we now consider the impact of injecting non-informative queries chosen at random from our popular query list into a user session. We simply refer to these as “random noise” queries. We consider three levels of random noise queries for testing purposes:

“Low Noise”

The automation scripts select uninteresting queries uniformly at random from the top-query list and inject a single random noise query after every topic-specific query so that the “signal-to-noise ratio” of sensitive to noise queries in this case is $1:1$ .

“Medium Noise”

Here the automation scripts inject two randomly selected queries after each topic-specific query, for a signal-to-noise ratio of $1:2$.

“High Noise”

In this noise model, with the highest noise setting, three random noise queries are injected after each topic-specific query, resulting in a signal-to-noise ratio of $1:3$.

Note also that the automation scripts were configured to ensure the relevant number of noise queries was always injected immediately before each probe query. Our intention was to construct a “worst case” for detection of learning, where probe queries are always separated from sensitive user queries by the specified number of noise queries.
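The noise injection scheme can be sketched as below, assuming a session is built from non-empty blocks of topic-specific queries, each block closed by a probe query. Because noise follows every topic-specific query, each probe is then immediately preceded by the configured number of noise queries, as described above. `PROBE`, `NOISE_LEVELS` and `inject_noise` are hypothetical names for illustration.

```python
import random

PROBE = "<probe query>"  # hypothetical placeholder for a probe query

# Noise queries injected after each topic-specific query.
NOISE_LEVELS = {"low": 1, "medium": 2, "high": 3}

def inject_noise(blocks, noise_pool, level, seed=None):
    """Build a session of (kind, query) pairs from blocks of topic-specific queries.

    Each block must be non-empty; a probe query closes each block, so every probe
    is immediately preceded by NOISE_LEVELS[level] noise queries.
    """
    per_query = NOISE_LEVELS[level]
    rng = random.Random(seed)
    session = []
    for block in blocks:
        for query in block:
            session.append(("topic", query))
            # Uniform random choice from the popular-query list.
            session.extend(("noise", rng.choice(noise_pool)) for _ in range(per_query))
        session.append(("probe", PROBE))
    return session
```

With the “high” setting, two blocks holding three topic-specific queries in total yield nine noise queries and two probes.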

Table III(a-c) shows the measured PDE values for Low, Medium and High levels of noise respectively for the “no click” model. The PDE values for all levels of noise are similar to the “no click, no noise” baseline values in Table II.

Overall, there is no consistent reduction in values across all topics for all noise levels, indicating that injecting random noise queries does not have a consistent effect. In some cases, such as the topic Gay, measured values of PDE increase for all noise levels, indicating that noise injection worsens the user’s ability to assert ($\epsilon,m$)–Plausible Deniability.

These results indicate that even the “High Noise” model fails to reduce the measured values of PDE in a coherent way, so that injecting random noise has not improved plausible deniability significantly with any consistency. We conclude that injection of random noise, even at substantial levels, is not observed to provide a useful defence for plausible deniability in our experiments.

V-D The Effect of Click Strategies

We now consider whether it is possible to disrupt search engine learning by careful clicking of the links on response pages. Intuitively, from the search engine’s point of view, clicking on links is a form of active feedback by a user and so is potentially informative of user interests. This is especially true when, for example, a user is carrying out exploratory search and their choice of keywords is not yet well tuned to their topic of interest. Previous studies also give good reason to believe that user clicks on links are an important input into recommender system learning. In [2] (Section $6.4$), user clicks emulated using the “Click Relevant” click model were reported to increase advert content by $60\%$–$450\%$, depending on the “Sensitive” topic tested.

We consider five different click strategies to emulate a range of user click behaviours:

“No Click”

No items are clicked on the response page to a query. This user click model provides the recommender system with no additional user preference information from click behaviour. This click model is used in the baseline measurements presented in Section V-B.

“Click Relevant”

Given the response page to a query, for each search result and advert we calculate the Term-Frequency (TF) of the visible text with respect to the keywords associated with the test session topic of interest. When $TF>0.1$ for an item, the item is clicked, otherwise it is not clicked. This user click-model provides relevant feedback to the recommender system about the information goal of the user.

“Click Non-relevant”

TF is calculated for each item with respect to the topic of interest for the session, as for the “Click Relevant” click model, except that items are clicked when the TF score does not exceed the threshold, that is when $TF\leq 0.1$, so that they are deemed non-relevant to the topic. This user click model attempts to confuse the recommender system by providing feedback that is not relevant to the user’s true topic of interest.

“Click All”

All items on the response page for a query are clicked. This user click-model gives the recommender system a “noisy” click signal, including clicks on items relevant and non-relevant to the user’s information goal.

“Click 2 Random Items”

Two items appearing on the response page for a query are selected uniformly at random with replacement and clicked.

In all cases, when uninteresting noise queries are included in a query session, the relevant user click strategy is also applied to the result pages of these queries. In this way we hope to avoid providing an obvious signal to the recommender system that might differentiate uninteresting queries from queries related to sensitive topics. Items on the result pages returned for probe queries are not clicked, so that the probe queries do not provide any additional information to the recommender system.
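The five click models, together with the rule that probe result pages are never clicked, can be sketched as follows. The text does not define TF precisely, so here it is assumed to be the fraction of tokens in an item's visible text that are topic keywords; the tokeniser, the model names and `items_to_click` are likewise hypothetical.

```python
import random
import re

def term_frequency(text, keywords):
    """Assumed TF: fraction of tokens in the visible text that are topic keywords."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    if not tokens:
        return 0.0
    keyword_set = {k.lower() for k in keywords}
    return sum(token in keyword_set for token in tokens) / len(tokens)

def items_to_click(items, keywords, model, is_probe=False, threshold=0.1, rng=None):
    """Return indices of the result-page items to click under the given click model."""
    if is_probe or model == "no_click":
        return []  # probe result pages are never clicked
    if model == "click_all":
        return list(range(len(items)))
    tf = [term_frequency(text, keywords) for text in items]
    if model == "click_relevant":
        return [i for i, score in enumerate(tf) if score > threshold]
    if model == "click_non_relevant":
        return [i for i, score in enumerate(tf) if score <= threshold]
    if model == "click_2_random":
        rng = rng or random.Random()
        # Two uniform draws with replacement, so the same item may be chosen twice.
        return [rng.randrange(len(items)) for _ in range(2)] if items else []
    raise ValueError(f"unknown click model: {model}")
```

For the items `"prostate cancer treatment options"` (TF 0.75) and `"cheap flights to rome"` (TF 0.0) with keywords prostate, cancer and treatment, “Click Relevant” selects only the first item and “Click Non-relevant” only the second.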

Measured values of PDE are shown in Table IV. As random noise injection had no observable effect on measurements of PDE for the different click models in our experiments, only the “No Noise” results are presented here for reasons of space.

Taken overall, the results in Table IV(a) for the “non-relevant click, no noise” model suggest that clicking on non-relevant advert items is the best of the click strategies tested. The only difference between the “non-relevant click” model and the other click models is that only non-relevant items are clicked, whereas in the other click models relevant items may be clicked. It seems reasonable to postulate that clicking on relevant items provides “fine-tuned” feedback about user interests which is more informative for the observer. Clicking on non-relevant items may divert attention to a modest degree, but not to the extent of masking the sensitive topic revealed by the query.

Compared with the baseline “No Click” PDE observations in Table II, each of the subtables in Table IV shows a similar lack of consistency to the noise injection models. In our experiments no consistent change in PDE is observed across topics as the click patterns vary. As in the noise injection case, there are sporadic increases and decreases in PDE, but the lack of overall consistency makes click models impractical as a defence.

In summary, it appears that clicks transmit information to the observer, but not as consistently as a revealing query does. Consequently, none of the user click models tested appears to change the baseline level of plausible deniability associated with the query in a predictable way, and there is no globally discernible pattern on which to build practical click-based defence tools.

V-E The Effect of Proxy Topics

The next privacy protection strategy we consider is the introduction of proxy topics. Here, sequences of queries are injected into a user session, each sequence related to a single proxy topic that is not sensitive for the user but is capable of attracting personalised advert content. Each such sequence emulates a user session in which the proxy topic is the topic of interest, with the aim of misdirecting the search engine’s learning of user interests. The results in Section V-C are relevant here, since they suggest that isolated, individual queries, such as randomly selected noise queries, tend not to provoke search engine learning. Our hope is that this can be exploited by inverting the notion of random noise injection, so that individual sensitive queries are injected as the noise in proxy topic sessions. Isolated sensitive queries will hopefully not provoke learning, whereas the larger number of uninteresting proxy sessions will, thereby misdirecting learning by the observer.

In our tests the following proxy topics are used:

Tickets

Searching for tickets for events in a well-known local stadium.

Vacation

Queries related to a vacation such as flights and accommodation.

Car

Searches by a user seeking to trade in and change their car.

Related queries for each proxy topic were constructed by selecting related keywords through the same process as was used for the sensitive topics.

Proxy topic query scripts were constructed by selecting a sensitive topic and then selecting an uninteresting proxy topic from the list of $3$ proxy topics. Having decided on a sensitive query to issue, we select at least three and no more than four queries related to the proxy topic from a prepared list of proxy topic queries. We then randomly shuffle the order of the selected sensitive and proxy topic queries. Since each block contains a single sensitive query and at least three proxy topic queries, there is always a run of at least two consecutive proxy topic queries in each query session. Finally, for testing purposes, we place a probe query before and after each block of 3–4 proxy queries plus 1 sensitive query to measure changes in the PRI+ score. We repeat this process using the same proxy topic until a typical query session consisting of $5$ probe queries has been created.
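The script construction can be sketched as follows. This is a minimal illustration under stated assumptions: adjacent blocks are assumed to share a probe, so $5$ probes bound $4$ blocks; `proxy_queries` is assumed to hold at least four queries on a single proxy topic; and `PROBE` and `build_proxy_session` are hypothetical names.

```python
import random

PROBE = "<probe query>"  # hypothetical placeholder for a probe query

def build_proxy_session(sensitive_queries, proxy_queries, n_probes=5, seed=None):
    """Wrap each sensitive query in 3-4 proxy topic queries, with probes between blocks.

    Assumes adjacent blocks share a probe, so n_probes probes bound n_probes - 1 blocks,
    and that proxy_queries holds at least four queries on one proxy topic.
    """
    rng = random.Random(seed)
    session = [PROBE]
    for i in range(n_probes - 1):
        sensitive = sensitive_queries[i % len(sensitive_queries)]
        k = rng.randint(3, 4)  # at least three, at most four proxy queries
        block = rng.sample(proxy_queries, k) + [sensitive]
        rng.shuffle(block)  # random position for the sensitive query
        session.extend(block)
        session.append(PROBE)
    return session
```

With one sensitive query among at least three proxy queries in a block, any shuffle leaves at least two proxy queries adjacent, which matches the property noted in the text.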

Data was collected for $2,300$ such proxy topic sessions, covering each of the sensitive topics and each of the click models described in previous sections. The same PRI+ and PDE setup as before, including the same training set, was used to process the search results.

Measured detection rates are shown in Table V. The measured probability calculated from PDE is [math] for all topics and all click models tested. That is, we find it is possible to claim full plausible deniability of interest in all of the topics tested. Since earlier sections demonstrate that our detection approach is notably sensitive to observer learning, we can reasonably infer that this result is not due to a defect in the detection methodology, but genuinely reflects successful misdirection of the search engine away from the sensitive topics.

This result is encouraging, especially in light of the negative results in previous sections for other obfuscation approaches. It suggests that sequences of queries on uninteresting proxy topics may provide a defence of plausible deniability. The trade-offs for the user include the overhead of maintaining proxy topics and associated queries, and the additional resources required to issue proxy topic queries in a consistent way. However, since both of these tasks were readily automated during our testing, it seems reasonable that these trade-offs could be managed by software in a way that is essentially transparent to the user.

VI Conclusions and Discussion

Our observations suggest that modern systems, such as Google, are able to identify user interests with high accuracy, exploit multiple signals, filter out uninteresting noise queries and adapt quickly when topics change. Furthermore learning appears to be sustained over the lifetime of query sessions. The power and sophistication of these systems make designing a robust defence of user privacy non-trivial.

The PDE estimator was tested via a comprehensive measurement program using online search engines to show that topic learning results in measurable impacts on the ability of a user to deny their interest in all sensitive topics tested. We find that revealing queries provide a significant signal for search engine adaptation. While user clicks provide additional feedback, we do not observe the same degree of associated learning with click behaviour as is observed with revealing queries. Overall, testing with PDE suggests that defences based on random noise injection and variable click models do not provide a reliable strategy for defence of plausible deniability.

By contrast, our experiments show that proxy topics which are uninteresting to the user but capable of generating commercial content provide observable privacy protection. Wrapping sensitive queries in a stream of coherent proxy topic queries appears to distract the online system into adapting to the proxy topic, while allowing the sensitive query noise to slip through. Our observation that proxy topics provide some relief indicates that defence of plausible deniability is not impossible, but also that increasingly sophisticated approaches are required in the face of ever-improving search engine capability. In choosing proxy topics, for example, a user must be careful not to stimulate unintended learning of the proxy topics, which may affect the utility of future search results.

Subtle tactics like proxy topics, which exploit the observer’s strengths to tip the balance slightly in favour of the user, suggest an interesting avenue for future research. The simplicity of the approach means it should be possible to extend it in several ways. For example, by injecting a range of uninteresting single-topic queries as additional noise in the proxy query stream, it may be possible to provide additional privacy guarantees, such as k-anonymity or differential privacy, for the sensitive topic. Further investigation of proxy topics is also warranted, for example experiments comparing the effectiveness of different proxy topics, such as proxy topics that are more relevant to the user’s known interests versus proxy topics that are less relevant. Similarly, proxy topics with higher commercial value may have more potential to distract search engine learning than proxy topics with lower commercial value.

As discussed in Section II, user click patterns may be used by recommender systems to rank page content, placing content likely to attract user clicks in more prominent positions on pages. In our experiments, we observed changes in volume of advert content on samples of probe query response pages. There are several plausible avenues of investigation that may help explain the mechanism behind this, such as user click patterns and the semantics of the true and noise queries chosen. The approach taken in this paper does not distinguish between items based on rank or order on the page. How the semantics of queries, the interaction between user click-models and the effect of content ranking may impact user privacy is beyond the scope of this current paper and an avenue for future research.

Overall, our results point towards an arms race in which search engine capability is continuously evolving. In this setting, even if injection of proxy topic sessions were to become widely deployed, we can reasonably expect search engines to respond with more sophisticated learning strategies. Our results also indicate that the text of search queries plays a key role in search engine learning. While perhaps obvious, this observation reinforces the user’s need to be circumspect about the queries they ask if they want to avoid search engine learning of their interests.

Appendix A Additional Results

Lemma 1.

For $x,y,\epsilon\in\mathbb{R}^{+}$ with $0<x,y<1$

[TABLE]

Proof.

Assuming the left hand side of (39) holds

[TABLE]

∎

References


[1] Oxford English Dictionary Online.
[2] Pól Mac Aonghusa and Douglas J. Leith. Don’t let Google know I’m lonely. ACM Transactions on Privacy and Security, 19(1):3:1–3:25, August 2016.
[3] Anupam Datta. Privacy through accountability: A computer science perspective. In International Conference on Distributed Computing and Internet Technology, pages 43–49. Springer, 2014.
[4] Aniko Hannak, Piotr Sapiezynski, Arash Molavi Kakhki, Balachander Krishnamurthy, David Lazer, Alan Mislove, and Christo Wilson. Measuring personalization of web search. In Proceedings of the 22nd International Conference on World Wide Web, WWW ’13, pages 527–538, Republic and Canton of Geneva, Switzerland, 2013. International World Wide Web Conferences Steering Committee.
[5] Daniel J. Solove. “I’ve got nothing to hide” and other misunderstandings of privacy. San Diego Law Review, 44, 2007.
[6] Helen Nissenbaum. Privacy in Context: Technology, Policy, and the Integrity of Social Life. Stanford University Press, Stanford, CA, USA, 2009.
[7] Alice E. Marwick et al. Social privacy in networked publics: Teens’ attitudes, practices, and strategies. 2011.
[8] Avi Arampatzis, George Drosatos, and Pavlos S. Efraimidis. A versatile tool for privacy-enhanced web search. In European Conference on Information Retrieval, pages 368–379. Springer, 2013.
