Real-Time Privacy-Preserving Data Release for Smart Meters

Mohammadhadi Shateri; Francisco Messina; Pablo Piantanida; Fabrice Labeau

arXiv:1906.06427·eess.SP·November 29, 2021

TL;DR

This paper introduces a real-time, privacy-preserving data release mechanism for smart meters using an adversarial deep learning framework, effectively balancing privacy and data utility against online inference threats.

Contribution

It proposes a novel deep learning adversarial approach to optimize privacy-utility trade-offs in real-time smart meter data sharing, incorporating flexible distortion measures.

Findings

01 Effective privacy protection against online inference attacks.

02 Outperforms existing methods in occupancy detection privacy.

03 Robust to data mismatch scenarios.

Equations

$$p_{Z^{T}\hat{X}^{T}|W^{T}}(z^{T},\hat{x}^{T}|w^{T})=\prod_{t=1}^{T}p_{Z_{t}|W^{t}}(z_{t}|w^{t})\,p_{\hat{X}_{t}|Z^{t}}(\hat{x}_{t}|z^{t}).$$

$$\underset{\hat{x}_{t}\in\mathcal{X}}{\operatorname{argmax}}\;p_{\hat{X}_{t}|Z^{t}}(\hat{x}_{t}|z^{t}),$$

$$I\big(X^{T}\rightarrow\hat{X}^{T}\big)\coloneqq\sum_{t=1}^{T}I(X^{t};\hat{X}_{t}|\hat{X}^{t-1}),$$

$$D(Z^{T},Y^{T})\coloneqq\frac{\mathbb{E}[d(Z^{T},Y^{T})]}{T},$$

$$d(z^{T},y^{T})\coloneqq\|z^{T}-y^{T}\|_{p}=\left(\sum_{t=1}^{T}|z_{t}-y_{t}|^{p}\right)^{1/p},$$

$$\inf_{p_{Z^{T}|W^{T}}}I\big(X^{T}\rightarrow\hat{X}^{T}\big)\quad\text{subject to}\quad D(Z^{T},Y^{T})\leq\varepsilon.$$

$$\underset{p_{\hat{X}^{t}|Z^{t}}}{\inf}\;\mathrm{KL}\big(p_{X^{t}Z^{t}}\,\|\,p_{\hat{X}^{t}Z^{t}}\big)$$

$$\begin{aligned}I\big(X^{T}\rightarrow\hat{X}^{T}\big)&\overset{(i)}{\leq}\sum_{t=1}^{T}\left[H(\hat{X}_{t}|\hat{X}^{t-1})-H(\hat{X}_{t}|\hat{X}^{t-1},X^{t},Z^{t})\right]\\&\overset{(ii)}{=}\sum_{t=1}^{T}\left[H(\hat{X}_{t}|\hat{X}^{t-1})-H(\hat{X}_{t}|Z^{t})\right]\\&\overset{(iii)}{\leq}T\log|\mathcal{X}|-\sum_{t=1}^{T}H(\hat{X}_{t}|Z^{t}),\end{aligned}$$

$$f(P_{e,t})\coloneqq h(P_{e,t})+P_{e,t}\log(|\mathcal{X}|-1)\geq H(\hat{X}_{t}|X_{t})\geq H(\hat{X}_{t}|Z^{t}),$$

$$-\frac{1}{T}\sum_{t=1}^{T}H(\hat{X}_{t}|Z^{t}),$$

$$-\frac{1}{T}\sum_{t=1}^{T}H(X_{t}|Z^{T},X^{t-1}).$$

$$\mathcal{L}_{\mathcal{R}}(\theta,\phi,\lambda)\coloneqq D(Z^{T},Y^{T})-\frac{\lambda}{T}\sum_{t=1}^{T}H\big(\hat{X}_{t}|Z^{t}\big),$$

$$\mathcal{L}_{\mathcal{A}}(\phi)\coloneqq\frac{1}{T}\sum_{t=1}^{T}\mathbb{E}\left[-\log p_{\hat{X}_{t}|Z^{t}}(X_{t}|Z^{t})\right],$$

$$\mathcal{L}_{\mathcal{R}}(\theta,\phi,\lambda)\approx\frac{1}{BT}\sum_{b=1}^{B}d(z^{(b)T},y^{(b)T})+\frac{\lambda}{BT}\sum_{b=1}^{B}\sum_{t=1}^{T}\sum_{\hat{x}_{t}^{(b)}\in\mathcal{X}}p_{\hat{X}_{t}|Z^{t}}(\hat{x}_{t}^{(b)}|z^{(b)t})\log p_{\hat{X}_{t}|Z^{t}}(\hat{x}_{t}^{(b)}|z^{(b)t}),$$

$\mathcal{L}_{\mathcal{A}}(\phi)$

$f_{t}$

$g_{t}$

$i_{t}$

$o_{t}$

$C_{t}$

$h_{t}$

$$\mathrm{NE}_{p}\coloneqq\frac{\mathbb{E}\left[\|Y^{T}-Z^{T}\|_{p}\right]}{\mathbb{E}\left[\|Y^{T}\|_{p}\right]}.$$

$$\text{Balanced Accuracy}\coloneqq\frac{1}{2}\left(\frac{c_{11}}{c_{11}+c_{12}}+\frac{c_{22}}{c_{22}+c_{21}}\right).$$

Full text

Real-Time Privacy-Preserving Data Release for Smart Meters

Mohammadhadi Shateri

Francisco Messina

Pablo Piantanida

Fabrice Labeau

M. Shateri, F. Messina, and F. Labeau are with the Department of Electrical and Computer Engineering, McGill University, QC, Canada. Email: {mohammadhadi.shateri,francisco.messina}@mail.mcgill.ca

P. Piantanida is with Laboratoire des Signaux et Systèmes, CentraleSupélec-CNRS-Université Paris Sud, Gif-sur-Yvette, France and with Montreal Institute for Learning Algorithms (Mila), Université de Montréal, QC, Canada.

Abstract

Smart Meters (SMs) are able to share the power consumption of users with utility providers almost in real-time. These fine-grained signals carry sensitive information about users, which has raised serious concerns from the privacy viewpoint. In this paper, we focus on real-time privacy threats, i.e., potential attackers that try to infer sensitive information from SMs data in an online fashion. We adopt an information-theoretic privacy measure and show that it effectively limits the performance of any attacker. Then, we propose a general formulation to design a privatization mechanism that can provide a target level of privacy by adding a minimal amount of distortion to the SMs measurements. On the other hand, to cope with different applications, a flexible distortion measure is considered. This formulation leads to a general loss function, which is optimized using a deep learning adversarial framework, where two neural networks –referred to as the releaser and the adversary– are trained with opposite goals. An exhaustive empirical study is then performed to validate the performance of the proposed approach and compare it with state-of-the-art methods for the occupancy detection privacy problem. Finally, we also investigate the impact of data mismatch between the releaser and the attacker.

Index Terms:

Privacy-preserving mechanism, Deep learning, Adversarial training, Time series data, Recurrent Neural Networks, Long-Short Term Memory (LSTM) cell, Directed information, Privacy-utility trade-off, Smart meters privacy.

I Introduction

I-A Motivation

SMs are a cornerstone for the development of smart electrical grids. These devices are able to report power consumption measurements of a house to a utility provider every hour or even every few minutes. This feature generates a considerable amount of useful data, which enables several applications in almost real-time such as power quality monitoring, timely fault detection, demand response, energy theft prevention, etc. [1, 2]. However, this fine-grained power consumption monitoring poses a threat to consumers' privacy. As a matter of fact, it has been shown that simple algorithms, known in general as Non-Intrusive Load Monitoring (NILM) methods, can readily be used to infer the types of appliances being used at a household at a given time from the SMs data [3]. Since these features are highly correlated with the presence of people at the dwelling and their personal habits, this induces serious privacy concerns which can have an impact on the acceptance and deployment pace of SMs [4]. The natural challenge raised here is: how can privacy be enhanced while preserving the utility of the SMs data? Although the privacy problem has been widely studied in the field of data science [5], the time series structure of SMs data requires a particular treatment [6]. For a recent survey about SMs privacy, the reader is referred to [7].

I-B Related work

Simple approaches for preservation of privacy in the context of SMs include data aggregation and encryption [8, 9], the use of pseudonyms rather than the real identities of users [10], downsampling of the data [11, 12] and random noise addition [13]. However, these methods often restrict the potential applications of the SMs data in an uncontrolled way. For instance, downsampling of the data may incur time delays to detect critical events, while data aggregation degrades the positioning and accuracy of the power measurements.

A formal approach to the privacy problem has been presented in [14] from an information-theoretic perspective, where it has been proposed to assess privacy by the mutual information (MI) between the sensitive variables to be hidden and the power measurements distorted by a privatizer mechanism. More specifically, the authors model the power measurements of SMs with a hidden Markov model in which the distribution of the measurements is controlled by the state of the appliances, and for each particular state, the distribution of power consumption is assumed to be Gaussian. This model is then used to obtain the privacy-utility trade-off using tools from rate-distortion theory [15]. Although this approach is very appealing, it has two important limitations for its application to real-time privacy problems using actual data. First, the privacy-preserving data release mechanism works with blocks of samples, which is not well-suited for real-time processing. Second, the Gaussian model may be quite restrictive to model SMs signals. The information-theoretic approach was used in other privacy-aware SMs studies such as [16], where the MI between the distorted SMs data and sensitive appliance states at time slot $t$ was considered as the privacy measure. However, in that work, the temporal correlation in SMs data is not taken into account.

More sophisticated approaches consider the use of Rechargeable Batteries (RBs) and Renewable Energy Sources (RES) in homes in order to modify the actual energy consumption of users with the goal of hiding the sensitive information [17, 18, 19, 20, 21, 22]. The main motivation to introduce the use of physical resources into the privacy problem comes from the observation that this strategy does not require any distortion in the actual SMs measurements, which means that there is no loss in terms of utility. However, the incorporation of physical resources may not only make the problem more complex and limited in scope, but can also generate a significant cost to users due to the faster wear and tear of the RBs as a consequence of the increased charging/discharging rate [7]. On the other hand, the required level of distortion for a specific privacy goal in a realistic scenario in which the attacker threatening privacy has only partial information is still an open question. Thus, the need for and convenience of these solutions are still questionable. However, it is also important to note that these approaches are complementary to the ones based on distorting the power measurements.

The use of neural networks to model a privacy attacker has been considered in [23]. However, a more powerful formulation of the problem is obtained if one assumes that both the releaser (i.e., the privatizer) and the attacker are deep neural networks (DNNs). In this framework, the releaser can be trained by simulating an attacker based on a minimax game, an idea that is inspired by the well-known Generative Adversarial Networks (GANs) [24]. This concept can be referred to as Generative Adversarial Privacy (GAP) [25] and is the basis for the approach taken in the present work. It should be mentioned that the concept of GAP has been studied for different applications related to image classification [26, 27] but, to the best of our knowledge, not in the context of SMs. In these works, the authors consider independent and identically distributed (i.i.d.) data and deep feed-forward neural networks for the releaser and attacker, which are unable to capture and exploit the time correlation in the time series SMs signals. The idea of time-series generation with an adversarial approach has been considered in [28] for medical data, based on the principle of differential privacy.

I-C Contributions

In this paper, we adopt a distortion-based real-time privacy-preserving strategy. For simplicity, we assume that no RBs and/or RESs are available and thus, the distortion on power measurements is the only means to achieve a desired privacy level. The main contributions of this work, which is an extension of a short version in [29], are the following:

(i) We apply directed information (DI) as a privacy measure and show its theoretical relevance for the privacy problem under consideration. It is worth mentioning that DI was first used in [30], but in a different manner. In addition, unlike [30] and other works such as [14], we impose no explicit assumptions on the generating model of the power measurements, but take a more versatile data-driven approach.

(ii) We study different distortion measures to provide more flexibility to control the specific features to be preserved in the released signals, i.e., the relevant characteristics for the targeted applications of the data.

(iii) For the sake of computational tractability, we propose a loss function for training the privacy-preserving releaser based on an upper bound of the DI. Then, considering an attacker that minimizes a Kullback-Leibler divergence between the true and approximate distributions of the sensitive variables given the released signal, we provide a relaxed formulation of the original problem suitable for a deep learning framework.

(iv) We perform an extensive statistical study with actual data to characterize the utility-privacy trade-offs and the nature of the distortion generated by the releaser network.

(v) We investigate the data mismatch problem in the context of SMs privacy, which occurs when the data available to the attacker is not the same as the one used for training the releaser mechanism, and show that it has an important impact on the privacy-utility trade-off. This confirms that, under some conditions, the privacy-utility trade-off can indeed be much less severe than expected.

(vi) To the best of our knowledge, this is the first time that the concept of generative adversarial privacy is used in the context of SMs data privacy preservation. In addition, in this paper we consider deep recurrent neural networks (RNNs) to capture and exploit the time correlation of SMs signals.

I-D Organization of the paper

The rest of the paper is organized as follows. In Section II, we present the theoretical formulation of the problem. This leads to the loss functions for the releaser and attacker posed in Section III, where the privacy-preserving adversarial framework is introduced along with the training algorithm. Extensive results are presented and discussed in Section IV. Finally, some concluding remarks are presented in Section V.

II Problem Formulation

The privacy-preserving framework studied in this paper is presented in Fig. 1. As shown, a data releaser (which is aware of the private attributes that the user wants to hide) manipulates the actual SM measurements before sharing them with the utility provider (UP), in order to prevent leakage of sensitive information that could be inferred by a malicious attacker such as an eavesdropper.

There are four main types of variables that should be clearly defined in the privacy-preserving data release setting: (i) the private/sensitive attribute which we aim to hide $X^{T}$ (e.g., occupancy state of a house over time); (ii) the useful process for the utility provider $Y^{T}$ (e.g., actual electricity consumption of the household), which is generally highly correlated with the private data; (iii) the observed signal $W^{T}$ , a combination of private and useful variables, which is the input to the data release system; (iv) and the released process $Z^{T}$ , a sanitized version of $Y^{T}$ which is the output of the data release system.

We assume that $X_{t}$ takes values on a fixed discrete alphabet $\mathcal{X}$ for each $t\in\{1,\ldots,T\}$ . At each time $t$ , a releaser produces the released process $Z_{t}$ based on the observation $W^{t}$ , while an attacker attempts to infer $X_{t}$ based on $Z^{t}$ by using an approximation of $p_{X^{T}|Z^{T}}$ , which we shall denote by $p_{\hat{X}^{T}|Z^{T}}$ . Notice that the releaser must be causal in order to avoid delays in the SMs data report process. In addition, we treat the case in which the attacker is performing the inference in real-time, so that it is also causal. This assumption is reasonable for scenarios in which the sensitive information is valuable in a timely manner (e.g., in the case of targeted burglary based on occupancy detection [6]). However, it should be noted that not all privacy threats fall under this umbrella. Alternative attacker structures, which may be interesting in other scenarios, are left for future work and are out of the scope of this paper. Note that due to the previous assumption the distribution $p_{Z^{T}\hat{X}^{T}|W^{T}}$ can be decomposed as follows:

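$$p_{Z^{T}\hat{X}^{T}|W^{T}}(z^{T},\hat{x}^{T}|w^{T})=\prod_{t=1}^{T}p_{Z_{t}|W^{t}}(z_{t}|w^{t})\,p_{\hat{X}_{t}|Z^{t}}(\hat{x}_{t}|z^{t}).\tag{1}$$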

In abstract terms, the goal of the releaser is to minimize the information leakage of the sensitive process $X^{T}$ while simultaneously keeping the distortion between the released time series $Z^{T}$ and the useful signal $Y^{T}$ small. On the other hand, the goal of the attacker is to infer $p_{X_{t}|Z^{t}}$ , for each $t$ , as accurately as possible. Note that after the approximation $p_{\hat{X}_{t}|Z^{t}}$ is obtained, the attacker can estimate the private information $x^{t}$ from observations $z^{t}$ in an online (causal) fashion, by solving

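$$\underset{\hat{x}_{t}\in\mathcal{X}}{\operatorname{argmax}}\;p_{\hat{X}_{t}|Z^{t}}(\hat{x}_{t}|z^{t}),\tag{2}$$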

at each $t=1,\dots,T$ . Thus, the attacker can be interpreted as a classifier or hypothesis test, as stated in [31]. However, in the present case, we consider the more realistic and general scenario in which the statistical test is sub-optimal due to the fact that the attacker has no access to the actual conditional distributions $p_{X_{t}|Z^{t}}$ but only to its approximation $p_{\hat{X}_{t}|Z^{t}}$ .

In order to take into account the causal relation between $X^{T}$ and $\hat{X}^{T}$ , the information leakage is quantified by the DI [32]:

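$$I\big(X^{T}\rightarrow\hat{X}^{T}\big)\coloneqq\sum_{t=1}^{T}I(X^{t};\hat{X}_{t}|\hat{X}^{t-1}),\tag{3}$$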

where $I(X^{t};\hat{X}_{t}|\hat{X}^{t-1})$ is the conditional MI between $X^{t}$ and $\hat{X}_{t}$ conditioned on $\hat{X}^{t-1}$ [15].
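As a small worked illustration (added here, not taken from the original text), consider $T=2$. Expanding the DI definition and, for comparison, applying the chain rule to the ordinary MI between the two sequences gives:

```latex
\begin{align*}
I\big(X^{2}\rightarrow\hat{X}^{2}\big) &= I(X_{1};\hat{X}_{1}) + I(X^{2};\hat{X}_{2}\mid\hat{X}_{1}),\\
I\big(X^{2};\hat{X}^{2}\big) &= I(X^{2};\hat{X}_{1}) + I(X^{2};\hat{X}_{2}\mid\hat{X}_{1}).
\end{align*}
```

The two expressions differ only in the first term: the DI counts what $\hat{X}_{1}$ reveals about $X_{1}$ alone, whereas the MI also counts dependence between $\hat{X}_{1}$ and the future value $X_{2}$. Since $I(X_{1};\hat{X}_{1})\leq I(X^{2};\hat{X}_{1})$, the DI never exceeds the MI in this example, which matches the causal structure assumed for the attacker.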

The normalized expected distortion between $Y^{T}$ and its noisy (or disturbed) observation $Z^{T}$ is defined as:

$$D(Z^{T},Y^{T})\coloneqq\frac{\mathbb{E}[d(Z^{T},Y^{T})]}{T},\tag{4}$$

where $d\colon\mathbb{R}^{T}\times\mathbb{R}^{T}\to\mathbb{R}$ is any distortion function (i.e., a metric on $\mathbb{R}^{T}$ ). To ensure the quality of the release, it is natural to impose the following constraint: $D(Z^{T},Y^{T})\leq\varepsilon$ for some given $\varepsilon\geq 0$ . In previous works, the normalized squared-error was considered as a distortion function (e.g., [14, 29]). Nevertheless, other distortion measures can also be relevant within the framework of SMs. For instance, demand response programs usually require an accurate knowledge of peak power consumption, so a distortion function closer to the infinity norm would be more meaningful for those particular applications. Thus, for the sake of generality and to keep the distortion function simple, we propose to use an $\ell_{p}$ distance:

$$d(z^{T},y^{T})\coloneqq\|z^{T}-y^{T}\|_{p}=\left(\sum_{t=1}^{T}|z_{t}-y_{t}|^{p}\right)^{1/p},\tag{5}$$

where $p\geq 2$ is a fixed parameter. Note that this distortion function leads to the root-mean-squared error when $p=2$ , while it converges to the maximum error between the components of $z^{T}$ and $y^{T}$ as $p\to\infty$ .
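The behaviour of the $\ell_{p}$ distortion for small and large $p$ can be checked numerically. The following is a minimal sketch, not the authors' code: the function names `lp_distance` and `normalized_distortion` and the toy sequences are assumptions made for illustration, and the expectation in the normalized distortion is replaced by an average over a batch.

```python
def lp_distance(z, y, p):
    """l_p distance between two equal-length sequences z and y."""
    if len(z) != len(y):
        raise ValueError("sequences must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1")
    diffs = [abs(a - b) for a, b in zip(z, y)]
    m = max(diffs, default=0.0)
    if m == 0.0:
        return 0.0
    # Factor out the largest error so that large p does not overflow/underflow.
    return m * sum((d / m) ** p for d in diffs) ** (1.0 / p)


def normalized_distortion(z_batch, y_batch, p):
    """Batch average of the l_p distance, divided by the sequence length T."""
    T = len(y_batch[0])
    total = sum(lp_distance(z, y, p) for z, y in zip(z_batch, y_batch))
    return total / (len(y_batch) * T)


# Toy example (hypothetical values): one large error at the last time step.
y = [1.0, 2.0, 3.0, 4.0]
z = [1.0, 2.5, 3.0, 2.0]
print(lp_distance(z, y, 2))    # Euclidean distance
print(lp_distance(z, y, 50))   # approaches the maximum absolute error
print(normalized_distortion([z], [y], 2))
```

With a large exponent the distance is dominated by the single largest deviation, which is why a large $p$ suits applications that care about peak consumption.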

Therefore, the problem of finding an optimal releaser subject to the optimal (Bayesian) attacker and distortion constraint can be formally written as follows:

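$$\inf_{p_{Z^{T}|W^{T}}}I\big(X^{T}\rightarrow\hat{X}^{T}\big)\quad\text{subject to}\quad D(Z^{T},Y^{T})\leq\varepsilon.\tag{6}$$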

Note that the solution to this optimization problem requires a balance between the attacker $p_{\hat{X}^{T}|Z^{T}}$ and the releaser $p_{Z^{T}|W^{T}}$ , where the optimal attacker consists in inferring the private information $\hat{X}^{T}\approx{X}^{T}$ and thus, the attacker attempts to minimize the Kullback-Leibler divergence [15] between the corresponding predictors:

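$$\underset{p_{\hat{X}^{t}|Z^{t}}}{\inf}\;\mathrm{KL}\big(p_{X^{t}Z^{t}}\,\|\,p_{\hat{X}^{t}Z^{t}}\big),\tag{7}$$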

where the expectation is with respect to $p_{X^{T}Z^{T}}$ . Note that solving (7) is equivalent to minimizing $\mathbb{E}[-\log p_{\hat{X}^{t}|Z^{t}}(X^{t}|Z^{t})]$ , the so-called cross-entropy.
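The equivalence with the cross-entropy follows in one step. The derivation below is an added sketch; it assumes that the attacker's joint distribution factorizes as $p_{\hat{X}^{t}Z^{t}}=p_{\hat{X}^{t}|Z^{t}}\,p_{Z^{t}}$, so that the marginal of $Z^{t}$ cancels inside the logarithm:

```latex
\begin{align*}
\mathrm{KL}\big(p_{X^{t}Z^{t}}\,\|\,p_{\hat{X}^{t}Z^{t}}\big)
&= \mathbb{E}\left[\log\frac{p_{X^{t}|Z^{t}}(X^{t}|Z^{t})}{p_{\hat{X}^{t}|Z^{t}}(X^{t}|Z^{t})}\right]\\
&= \mathbb{E}\big[-\log p_{\hat{X}^{t}|Z^{t}}(X^{t}|Z^{t})\big] - H(X^{t}|Z^{t}).
\end{align*}
```

The term $H(X^{t}|Z^{t})$ does not depend on the attacker's model, so minimizing the divergence over $p_{\hat{X}^{t}|Z^{t}}$ is the same as minimizing the cross-entropy term.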

Unfortunately, the optimization problem (6) is, in general, computationally intractable. For instance, simply computing the DI would take $O(|\mathcal{X}|^{T})$ operations, where $|\mathcal{X}|$ is the size of $\mathcal{X}$ , which makes it not scalable for large sequences of data. However, it can be exploited to obtain a more convenient surrogate objective function for the releaser, by considering the following simpler upper bound:

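$$\begin{aligned}I\big(X^{T}\rightarrow\hat{X}^{T}\big)&\overset{(i)}{\leq}\sum_{t=1}^{T}\left[H(\hat{X}_{t}|\hat{X}^{t-1})-H(\hat{X}_{t}|\hat{X}^{t-1},X^{t},Z^{t})\right]\\&\overset{(ii)}{=}\sum_{t=1}^{T}\left[H(\hat{X}_{t}|\hat{X}^{t-1})-H(\hat{X}_{t}|Z^{t})\right]\\&\overset{(iii)}{\leq}T\log|\mathcal{X}|-\sum_{t=1}^{T}H(\hat{X}_{t}|Z^{t}),\end{aligned}\tag{8}$$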

where (i) is due to the fact that conditioning reduces entropy; equality (ii) is due to the Markov chain $(X^{t},\hat{X}^{t-1})\,-\!\!\circ\!\!-\,Z^{t}\,-\!\!\circ\!\!-\,\hat{X}_{t}$ , which follows from (1); and (iii) is due to the trivial bounds ${H(\hat{X}_{t}|\hat{X}^{t-1})\leq H(\hat{X}_{t})\leq\log(|\mathcal{X}|)}$ . Note that minimizing the upper bound (8) corresponds to maximizing $\sum_{t=1}^{T}H(\hat{X}_{t}|Z^{t})$ , which amounts to maximizing the total uncertainty of the attacker. In fact, from Fano’s inequality [15], we have that

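$$f(P_{e,t})\coloneqq h(P_{e,t})+P_{e,t}\log(|\mathcal{X}|-1)\geq H(\hat{X}_{t}|X_{t})\geq H(\hat{X}_{t}|Z^{t}),\tag{9}$$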

where $P_{e,t}\coloneqq\mathbb{P}(X_{t}\neq\hat{X}_{t})$ and $h(p)\coloneqq-p\log(p)-(1-p)\log(1-p)$ is the so-called binary entropy function. In addition, the bound is tight (i.e., it can not be strengthened without further assumptions) [15, 33]. It should be noted that this bound constrains $P_{e,t}$ to be in an interval around $P_{e,t}=(|\mathcal{X}|-1)/|\mathcal{X}|$ , which corresponds to the performance of an attacker that does uniform random guessing to infer the value of $X_{t}$ . Indeed, in the extreme case in which $H(\hat{X}_{t}|Z^{t})=\log(|\mathcal{X}|)$ , we have that $P_{e,t}=(|\mathcal{X}|-1)/|\mathcal{X}|$ . Physically, this means that $Z^{t}$ is not providing any information to the attacker to infer $X_{t}$ . Moreover, the length of this interval decreases monotonically when $H(\hat{X}_{t}|Z^{t})$ is increased. Fig. 2 is presented to illustrate the proposed scenario for the binary case. From Fig. 2 it can be noticed that as $H(\hat{X}_{t}|Z^{t})$ increases, the interval over which $P_{e,t}$ lies shrinks. Also, in the extreme case in which $H(\hat{X}_{t}|Z^{t})=\log(2)$ , Fano’s inequality (9) implies ${P_{e,t}=1/2}$ , which corresponds to the performance of random guessing the value of $X_{t}$ , meaning that $Z^{t}$ does not provide any valuable information. Therefore, a releaser which attempts to maximize $H(\hat{X}_{t}|Z^{t})$ is trying to constrain $P_{e,t}$ to be close to random guessing performance. This can be considered as a universal privacy guarantee and justifies the usefulness of the DI and, in particular, the surrogate upper bound (8).
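The shrinking interval can be reproduced numerically. The sketch below is illustrative only: it evaluates $f(P_{e,t})=h(P_{e,t})+P_{e,t}\log(|\mathcal{X}|-1)$ on a grid and returns the range of error probabilities compatible with a given value of $H(\hat{X}_{t}|Z^{t})$. The function names and the grid search are assumptions, and entropies are measured in nats.

```python
import math


def binary_entropy(p):
    """h(p) = -p log p - (1 - p) log(1 - p) in nats, with h(0) = h(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)


def fano_bound(p_err, alphabet_size):
    """f(P_e) = h(P_e) + P_e log(|X| - 1); alphabet_size must be at least 2."""
    return binary_entropy(p_err) + p_err * math.log(alphabet_size - 1)


def feasible_error_interval(cond_entropy, alphabet_size, grid=10001):
    """Grid search for the error probabilities with f(P_e) >= cond_entropy."""
    points = [i / (grid - 1) for i in range(grid)]
    ok = [p for p in points
          if fano_bound(p, alphabet_size) >= cond_entropy - 1e-12]
    return (min(ok), max(ok)) if ok else None


# Binary case: the interval narrows as the attacker's uncertainty grows.
for H in (0.3, 0.5, 0.6, math.log(2)):
    print(round(H, 4), feasible_error_interval(H, alphabet_size=2))
```

In the binary case the last line of the loop corresponds to maximal uncertainty, where the feasible set collapses to the single point $P_{e,t}=1/2$.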

Therefore, in this work, the information leakage is measured by the following average conditional entropy (ignoring constant terms):

$$-\frac{1}{T}\sum_{t=1}^{T}H(\hat{X}_{t}|Z^{t}),\tag{10}$$

where the factor $1/T$ has been introduced for normalization purposes. It is interesting to notice that this is different from the formulation in [14], in which the authors consider the MI $I(X^{T};Z^{T})=H(X^{T})-H(X^{T}|Z^{T})$ as the information leakage measure, which can be equivalently written as follows (again ignoring constant terms and normalizing):

$$-\frac{1}{T}\sum_{t=1}^{T}H(X_{t}|Z^{T},X^{t-1}).\tag{11}$$

By comparing (10) and (11), the differences between the two privacy measures are clear. The fact that we have assumed that the attacker has a causal structure explains why $Z^{t}$ appears in (10) instead of $Z^{T}$ as in (11). More fundamentally, the expression (11) corresponds to assuming that the attacker is optimal (i.e., $p_{\hat{X}_{t}|Z^{t}}=p_{X_{t}|Z^{t}}$ ) and has access to $X^{t-1}$ to infer $X_{t}$ . These latter assumptions are not expected to hold in practice.

III Privacy-Preserving Model

We now shift the focus from the abstract theoretical formulation to a practical one based on deep learning ideas. In particular, in this section, we model the releaser $p_{Z^{T}|W^{T}}$ and attacker $p_{\hat{X}^{T}|Z^{T}}$ as RNNs, which are well-suited for the time-series structure of the SMs data and online processing. In the following, we describe in detail the loss functions, the network architectures and the learning algorithm used to train the releaser mechanism.

III-A Loss Functions

Considering (6) and (10), the loss function for the releaser is defined as follows:

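$$\mathcal{L}_{\mathcal{R}}(\theta,\phi,\lambda)\coloneqq D(Z^{T},Y^{T})-\frac{\lambda}{T}\sum_{t=1}^{T}H\big(\hat{X}_{t}|Z^{t}\big),\tag{12}$$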

where $\lambda\geq 0$ controls the privacy-utility trade-off, $\theta$ are the parameters of the releaser and $\phi$ are the parameters of the adversary. It should be mentioned that for $\lambda=0$ , the loss function $\mathcal{L}_{\mathcal{R}}(\theta,\phi,\lambda)$ reduces to the expected distortion and is therefore independent of the adversary. In such a scenario, the releaser offers no privacy guarantees. Conversely, for very large values of $\lambda$ , the loss function $\mathcal{L}_{\mathcal{R}}(\theta,\phi,\lambda)$ is dominated by the second term, so that privacy is the main goal of the releaser. In this regime, we expect the attacker to fail in inferring $X^{T}$ , i.e., to approach random guessing performance.

On the other hand, from (7), the adversary loss function is defined as follows:

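$$\mathcal{L}_{\mathcal{A}}(\phi)\coloneqq\frac{1}{T}\sum_{t=1}^{T}\mathbb{E}\left[-\log p_{\hat{X}_{t}|Z^{t}}(X_{t}|Z^{t})\right],\tag{13}$$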

where the expectation is with respect to $p_{X_{t}Z^{t}}$ .

It should be mentioned that, for training, the previous loss functions are approximated by evaluating the expectations empirically, as shown next. Let $\{(x^{(b)T},y^{(b)T})\}_{b=1}^{B}$ be a sample of $B$ examples and $\{z^{(b)T}\}_{b=1}^{B}$ the corresponding outputs of the releaser. Then, the loss functions are approximated as follows:

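$$\mathcal{L}_{\mathcal{R}}(\theta,\phi,\lambda)\approx\frac{1}{BT}\sum_{b=1}^{B}d(z^{(b)T},y^{(b)T})+\frac{\lambda}{BT}\sum_{b=1}^{B}\sum_{t=1}^{T}\sum_{\hat{x}_{t}^{(b)}\in\mathcal{X}}p_{\hat{X}_{t}|Z^{t}}(\hat{x}_{t}^{(b)}|z^{(b)t})\log p_{\hat{X}_{t}|Z^{t}}(\hat{x}_{t}^{(b)}|z^{(b)t}),\tag{14}$$

$$\mathcal{L}_{\mathcal{A}}(\phi)\approx-\frac{1}{BT}\sum_{b=1}^{B}\sum_{t=1}^{T}\log p_{\hat{X}_{t}|Z^{t}}(x_{t}^{(b)}|z^{(b)t}).\tag{15}$$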
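The empirical losses can be sketched in a few lines of plain Python. This is an illustrative reading of the batch approximations, not the authors' implementation. The names `releaser_loss` and `adversary_loss` are hypothetical, and `probs[b][t]` is assumed to hold the adversary's predicted distribution over $\mathcal{X}$ at time $t$ for example $b$ (for instance, a softmax output with strictly positive entries).

```python
import math


def lp_distance(z, y, p=2):
    """l_p distance between two equal-length sequences."""
    return sum(abs(a - b) ** p for a, b in zip(z, y)) ** (1.0 / p)


def releaser_loss(z_batch, y_batch, probs, lam, p=2):
    """Batch distortion plus lam times the negative entropy of the adversary."""
    B, T = len(y_batch), len(y_batch[0])
    distortion = sum(lp_distance(z, y, p) for z, y in zip(z_batch, y_batch))
    # sum of q log q over classes is minus the entropy of each prediction
    neg_entropy = sum(q * math.log(q)
                      for seq in probs for dist in seq for q in dist if q > 0.0)
    return distortion / (B * T) + lam * neg_entropy / (B * T)


def adversary_loss(x_batch, probs):
    """Average negative log-likelihood (cross-entropy) of the true labels."""
    B, T = len(x_batch), len(x_batch[0])
    nll = -sum(math.log(probs[b][t][x_batch[b][t]])
               for b in range(B) for t in range(T))
    return nll / (B * T)


# Toy batch (hypothetical): B = 1, T = 2, binary alphabet, undistorted release.
y_batch = [[1.0, 2.0]]
z_batch = [[1.0, 2.0]]
x_batch = [[0, 1]]
uniform = [[[0.5, 0.5], [0.5, 0.5]]]   # adversary is maximally uncertain
print(releaser_loss(z_batch, y_batch, uniform, lam=1.0))
print(adversary_loss(x_batch, uniform))
```

When the adversary outputs uniform predictions, the entropy term attains its maximum $\log|\mathcal{X}|$ per time step, so the releaser loss is as small as the privacy term allows while the adversary loss equals $\log 2$ in this binary toy case.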

III-B Recurrent Neural Networks and Long Short-Term Memory

RNNs are a class of neural networks that are able to process sequential data by modeling the temporal correlation in data. Therefore, the output of an RNN at time step $t-1$ generally affects the output at time $t$ . Training of RNNs is generally performed by gradient descent using the backpropagation through time algorithm [34]. However, learning long-term dependencies of time series data by RNNs may lead to the gradient vanishing or exploding problems, thus preventing successful training [35]. To resolve this issue, the so-called LSTM cell was introduced in [36] and further improved in [37]. Fig. 3 represents the architecture of an LSTM cell in detail.

The LSTM cell includes four gating units to control the flow of information. All the gating units have a sigmoid activation function ( $\sigma$ ) except for the input unit, which uses a hyperbolic tangent activation function ( $\tanh$ ) by default. Considering $w^{T}=(w_{1},w_{2},\dots,w_{T})$ as the input, the formulations of the LSTM nodes at time step $t$ are represented in equation (16), where the parameters $b,K,V$ are respectively biases, input weights, and recurrent weights:

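[Equation (16): LSTM update equations defining $f_{t}$ , $g_{t}$ , $i_{t}$ , $o_{t}$ , $C_{t}$ , and $h_{t}$ ]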

In the LSTM architecture, the forget gate $f_{t}$ uses the output of the previous cell (which is called hidden state $h_{t-1}$ ) to control the cell state $C_{t}$ to remove irrelevant information. On the other hand, the input gate $g_{t}$ and input unit $i_{t}$ add new information to $C_{t}$ from the current input. Finally, the output gate $o_{t}$ generates the output of the cell from the current input and cell state. For more details on RNNs and LSTMs, the reader is referred to [38] and references therein.
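To make the role of each unit concrete, the following sketch implements one step of a textbook LSTM cell with a scalar input and a scalar state. It is an assumption-laden illustration: the exact form of the paper's equation (16) is not reproduced here, so the update rules below are the standard LSTM formulation written with the symbols used in the text ( $f_{t}$ , $g_{t}$ , $i_{t}$ , $o_{t}$ , $C_{t}$ , $h_{t}$ and parameters $b$ , $K$ , $V$ ), and the `params` layout is hypothetical.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def lstm_step(w_t, h_prev, c_prev, params):
    """One step of a standard scalar LSTM cell (assumed form, not equation (16)).

    params[name] holds a bias "b", an input weight "K" and a recurrent
    weight "V" for each unit name in ("f", "g", "i", "o").
    """
    pre = {name: params[name]["b"]
                 + params[name]["K"] * w_t
                 + params[name]["V"] * h_prev
           for name in ("f", "g", "i", "o")}
    f_t = sigmoid(pre["f"])       # forget gate: how much of C_{t-1} to keep
    g_t = sigmoid(pre["g"])       # input gate: how much new information to admit
    i_t = math.tanh(pre["i"])     # input unit: candidate new information
    o_t = sigmoid(pre["o"])       # output gate: how much of the cell to expose
    c_t = f_t * c_prev + g_t * i_t
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t


# Run the cell causally over a short input sequence (hypothetical values).
params = {name: {"b": 0.0, "K": 0.5, "V": 0.1} for name in ("f", "g", "i", "o")}
h, c = 0.0, 0.0
for w in [1.0, 0.2, -0.4]:
    h, c = lstm_step(w, h, c, params)
    print(h, c)
```

Because each step only uses the current input and the previous state, the cell processes the sequence causally, which is the property required of both the releaser and the real-time attacker.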

III-C Privacy-Preserving Adversarial Learning

Based on the previous formulation, an adversarial modeling framework consisting of two RNNs, a releaser $\mathcal{R}_{\theta}$ and an adversary $\mathcal{A}_{\phi}$ , is considered (see Fig. 4). Note that independent noise $U^{T}$ (with dimension $m$ ) is appended to $W^{T}$ in order to randomize the released variables $Z^{T}$ , which is a popular approach in privacy-preserving methods. In addition, the available theoretical results show that, for Gaussian distributions, the optimal release contains such a noise component [14, 26]. For both networks, an LSTM architecture is selected. Training in the suggested framework is performed using Algorithm 1, which requires $k$ gradient steps to train $\mathcal{A}_{\phi}$ followed by one gradient step to train $\mathcal{R}_{\theta}$ . It is worth emphasizing that $k$ should be large enough to ensure that $\mathcal{A}_{\phi}$ represents a strong adversary. However, if $k$ is too large, this could lead to overfitting and thus a poor adversary. After the training of both networks is completed, a new network is trained from scratch in order to test the privacy achieved by the releaser network.
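The alternating schedule described above ( $k$ adversary updates, then one releaser update) can be sketched as a framework-agnostic loop. Algorithm 1 itself is not reproduced in this text, so the skeleton below is only an assumed outline: `adversary_step`, `releaser_step` and `sample_batch` are hypothetical callbacks that a real implementation would back with gradient updates in a deep learning library.

```python
def train_adversarially(adversary_step, releaser_step, sample_batch,
                        iterations, k):
    """Alternate k adversary updates with one releaser update per iteration."""
    if k < 1:
        raise ValueError("k must be at least 1")
    history = []
    for _ in range(iterations):
        for _ in range(k):
            # Strengthen the adversary against the current releaser.
            adv_loss = adversary_step(sample_batch())
        # Update the releaser once against the refreshed adversary.
        rel_loss = releaser_step(sample_batch())
        history.append((adv_loss, rel_loss))
    return history


# Stub callbacks that only count calls, to show the update schedule.
calls = {"adversary": 0, "releaser": 0}


def fake_sample_batch():
    return []


def fake_adversary_step(batch):
    calls["adversary"] += 1
    return 0.0


def fake_releaser_step(batch):
    calls["releaser"] += 1
    return 0.0


history = train_adversarially(fake_adversary_step, fake_releaser_step,
                              fake_sample_batch, iterations=3, k=4)
print(calls)  # {'adversary': 12, 'releaser': 3}
```

The stub run only demonstrates the ratio of updates; choosing $k$ in practice involves the trade-off between a weak adversary and an overfitted one noted above.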

IV Results and Discussion

We validate our results on the Electricity Consumption & Occupancy (ECO) dataset, collected and published by [41], which includes 1 Hz power consumption measurements and occupancy information of five houses in Switzerland over a period of $8$ months. Occupancy labels are set to $1$ when someone is at home and $0$ otherwise. Thus, for this application, the privacy attacker is a binary classifier that attempts to infer whether a household is occupied or not at a given time. In this study, we re-sampled the data to have hourly samples. We model the time dependency over each day, so the dataset is reshaped into sample sequences of length $24$ . A total number of $11225$ sample sequences were collected. The dataset is split into training and test sets with a ratio of roughly 85:15, while $10\%$ of the training data is held out for validation, which is used to set the hyperparameters. The network architectures and hyperparameter values are summarized in Table I. A stronger attacker composed of 3 LSTM layers is used for the test.
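The reshaping and splitting steps can be sketched as follows. This is a hypothetical preprocessing outline rather than the authors' pipeline: the function names are invented, the input is assumed to be an already re-sampled hourly series, and the random shuffle before splitting is an assumption, since the text does not say how the sequences are assigned to each set.

```python
import random


def to_daily_sequences(hourly_values, seq_len=24):
    """Cut a flat hourly series into non-overlapping sequences of length seq_len."""
    n = len(hourly_values) // seq_len
    return [hourly_values[i * seq_len:(i + 1) * seq_len] for i in range(n)]


def split_sequences(sequences, test_frac=0.15, val_frac=0.10, seed=0):
    """Hold out test_frac for testing, then val_frac of the rest for validation.

    Shuffling is an assumption made for this sketch.
    """
    seqs = list(sequences)
    random.Random(seed).shuffle(seqs)
    n_test = int(len(seqs) * test_frac)
    test, rest = seqs[:n_test], seqs[n_test:]
    n_val = int(len(rest) * val_frac)
    return rest[n_val:], rest[:n_val], test


# Synthetic stand-in for 100 days of hourly readings (not ECO data).
hourly = [float(h % 24) for h in range(100 * 24)]
days = to_daily_sequences(hourly)
train, val, test = split_sequences(days)
print(len(days), len(train), len(val), len(test))
```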

To clearly assess the distortion with respect to the actual power consumption measurements, we define the Normalized Error (NE) for the different $\ell_{p}$ distortion functions as follows:

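$$\mathrm{NE}_{p}\coloneqq\frac{\mathbb{E}\left[\|Y^{T}-Z^{T}\|_{p}\right]}{\mathbb{E}\left[\|Y^{T}\|_{p}\right]}.\tag{17}$$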
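As a minimal sketch, the normalized error can be estimated from a batch by taking the ratio of the average $\ell_{p}$ norm of the error $Y^{T}-Z^{T}$ to the average $\ell_{p}$ norm of $Y^{T}$ (the batch size cancels in the ratio). The function names and the toy values below are assumptions made for illustration.

```python
def lp_norm(v, p):
    """l_p norm of a sequence."""
    return sum(abs(a) ** p for a in v) ** (1.0 / p)


def normalized_error(y_batch, z_batch, p=2):
    """Ratio of the summed l_p error norms to the summed l_p signal norms."""
    num = sum(lp_norm([a - b for a, b in zip(y, z)], p)
              for y, z in zip(y_batch, z_batch))
    den = sum(lp_norm(y, p) for y in y_batch)
    return num / den


# Hypothetical two-sample batch with T = 2.
y_batch = [[3.0, 4.0], [6.0, 8.0]]
print(normalized_error(y_batch, y_batch))                    # no distortion
print(normalized_error(y_batch, [[0.0, 0.0], [0.0, 0.0]]))   # release is all zeros
```

A value of 0 means the release equals the original signal, while a value of 1 means the error is as large as the signal itself, as when the release is identically zero.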

In addition, performance of the attacker on inferring the private attributes is quantified based on the balanced accuracy. This is common in classification problems to deal with the data imbalance problem, which occurs when the number of samples for each class is quite different. Balanced accuracy is defined as the average recall calculated for each class [42]. Concretely, let $c_{ij}$ represent the fraction of examples of class $i$ classified as class $j$ . Then, the balanced accuracy can be defined as

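$$\text{Balanced Accuracy}\coloneqq\frac{1}{2}\left(\frac{c_{11}}{c_{11}+c_{12}}+\frac{c_{22}}{c_{22}+c_{21}}\right).\tag{18}$$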

This metric provides a fair assessment of the quality of the attacker independently of the degree of data imbalance, thus simplifying the analysis of the results. In the following, we use the term accuracy to refer to the balanced accuracy.
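A short example shows why balanced accuracy is preferred under class imbalance. The sketch below assumes the two-class definition given above, with $c_{ij}$ passed as counts (fractions work equally well, since each recall is a ratio); the function names and the numbers are hypothetical.

```python
def balanced_accuracy(c11, c12, c21, c22):
    """Average of the two per-class recalls for a binary confusion matrix."""
    return 0.5 * (c11 / (c11 + c12) + c22 / (c22 + c21))


def plain_accuracy(c11, c12, c21, c22):
    """Fraction of all examples that are classified correctly."""
    return (c11 + c22) / (c11 + c12 + c21 + c22)


# Hypothetical imbalanced set: 90 examples of class 1, 10 of class 2,
# and an attacker that always predicts class 1.
c11, c12, c21, c22 = 90, 0, 10, 0
print(plain_accuracy(c11, c12, c21, c22))     # 0.9
print(balanced_accuracy(c11, c12, c21, c22))  # 0.5
```

The trivial attacker looks strong under plain accuracy but is scored as random guessing by balanced accuracy, which is the behaviour wanted when judging privacy.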

IV-A $\ell_{2}$ Distortion

In this section, we consider the $\ell_{2}$ distortion function (i.e., $p=2$ in (5)). Fig. 5 shows the empirically found privacy-utility trade-off for this scenario. Note that by increasing the distortion of the release, the accuracy of the attacker changes from more than $80\%$ (almost no privacy) to $50\%$ (full privacy).

To assess the quality of the released signal, utility providers may be interested in several different indicators. These include, for instance, the mean, skewness, kurtosis, standard deviation to mean ratio, and maximum to mean ratio [43]. Thus, for completeness, we present these indicators in Table II for three different cases along the privacy-utility trade-off curve. We observe that, in general, the errors in these indicators are small when the privacy constraints are lax and increase as the constraints become stricter. Nevertheless, no simple relation can be expected between $\text{NE}_{2}$ and the values of those indicators.
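The indicators listed above can be computed for the actual and released signals and then compared. The following sketch makes several assumptions that may differ from [43] and from Table II: population (biased) moments, non-excess kurtosis, a strictly positive mean, and a relative absolute error as the comparison measure. All names are hypothetical.

```python
import math


def indicators(signal):
    """Summary indicators of a power signal (population moments assumed)."""
    n = len(signal)
    mean = sum(signal) / n
    m2 = sum((x - mean) ** 2 for x in signal) / n
    m3 = sum((x - mean) ** 3 for x in signal) / n
    m4 = sum((x - mean) ** 4 for x in signal) / n
    std = math.sqrt(m2)
    return {
        "mean": mean,
        "skewness": m3 / std ** 3,
        "kurtosis": m4 / m2 ** 2,      # non-excess kurtosis
        "std_to_mean": std / mean,
        "max_to_mean": max(signal) / mean,
    }


def relative_error(actual_value, released_value):
    """Relative absolute error of one indicator (assumed comparison measure)."""
    return abs(actual_value - released_value) / abs(actual_value)
```

Applying `indicators` to an actual sequence and to its release, and then `relative_error` to each pair of values, gives one row of a comparison of this kind.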

IV-A1 Comparison with regular random noise addition approach

As discussed in Section III, the proposed model provides privacy by distorting the SMs data. However, in contrast with the regular random noise addition approaches [13], where a random noise $E_{t}$ is added to the SMs data (i.e., $Z_{t}=Y_{t}+E_{t},$ for $t\in\{1,2,\dots,T\}$ ), our model distorts the SMs data by performing a noisy recurrent transformation on $Z^{T}$ . To compare our method with the regular random noise addition method, four different distributions for the random noise $E_{t}$ were considered: Laplacian, Gaussian, Uniform, and U-quadratic. In all cases, the amount of distortion is controlled by the variance of the noise. The same type of attacker as the one presented in Table I is used to infer the private data from the distorted data. Fig. 6 shows the privacy-utility trade-off for these random noise addition approaches as compared with our model. It can be clearly seen that, for the same amount of distortion, our method is more successful in preventing the attacker from inferring the private information. This is expected, as our method is able to learn a noise distribution that fits both the actual demand load and the sensitive information being hidden from the attacker, which is a much more powerful approach than using a fixed noise distribution. Notice also that, as expected, all results are similar as the distortion approaches zero. For some of the random noise addition cases, the fitted curves cross over that of our model, but only in the low-distortion, low-privacy region of the graph, which is not of interest.
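The random noise addition baseline $Z_{t}=Y_{t}+E_{t}$ can be sketched as below for the four noise families, each parameterized so that the noise has zero mean and a prescribed positive variance. This is an illustrative sketch only: the zero-mean, symmetric-support parameterization and the function names are assumptions, and the samplers use standard closed-form relations between the variance and each distribution's scale.

```python
import math
import random


def sample_noise(kind, variance, rng):
    """Draw one zero-mean noise sample with the given (positive) variance."""
    if kind == "gaussian":
        return rng.gauss(0.0, math.sqrt(variance))
    if kind == "laplacian":
        # Laplace(0, b) has variance 2*b^2; the difference of two i.i.d.
        # exponentials with mean b is Laplace(0, b).
        b = math.sqrt(variance / 2.0)
        return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)
    if kind == "uniform":
        # Uniform on [-a, a] has variance a^2 / 3.
        a = math.sqrt(3.0 * variance)
        return rng.uniform(-a, a)
    if kind == "u-quadratic":
        # U-quadratic on [-c, c] has variance 3*c^2 / 5; inverse-CDF sampling
        # gives c * cbrt(2u - 1) for u uniform on [0, 1).
        c = math.sqrt(5.0 * variance / 3.0)
        v = 2.0 * rng.random() - 1.0
        return c * math.copysign(abs(v) ** (1.0 / 3.0), v)
    raise ValueError(f"unknown noise kind: {kind}")


def add_noise(y, kind, variance, rng):
    """Release Z_t = Y_t + E_t with i.i.d. noise of the chosen family."""
    return [y_t + sample_noise(kind, variance, rng) for y_t in y]
```

Because the noise is drawn independently of both $Y_{t}$ and the sensitive attribute, sweeping `variance` traces one trade-off curve per noise family.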

IV-A2 Comparison with PPAN model

The main limitation of the random noise addition approach is that the noise mechanism is independent of the SMs data. A more sophisticated strategy that addresses this issue is the PPAN model [26]. This method uses a variational lower bound on $I(X_{t};Z_{t})$ to train a releaser using an adversarial learning approach. For more details on this approach, the reader is referred to [26]. Fig. 7 shows the privacy-utility trade-off of the PPAN model as compared with our method, using the same type of attacker as presented in Table I. For the PPAN, both the mechanism and adversary networks are deep neural networks with three hidden layers of 64 nodes and rectified linear unit (ReLU) activation functions, and the RMSprop optimizer [44] with learning rate 0.01 is used.

From Fig. 7, it can be seen that our method clearly outperforms the PPAN. This gap can be attributed to both the different cost function formulation (i.e., the fact that we are using an upper bound on $I(X^{T}\to\hat{X}^{T})$ instead of a lower bound on $I(X_{t};Z_{t})$ ) and the recurrent structure of the proposed releaser mechanism.

IV-B $\ell_{p}$ Distortion

As already discussed in Section II, the distortion function should be properly matched to the intended application of the released variables $Z^{T}$ in order to preserve the characteristics of the target variables $Y^{T}$ that are considered essential. In this section, we consider the $\ell_{p}$ distortion (5) with $p=4,5$ as an alternative to the $\ell_{2}$ distortion function used in the previous section and study their potential benefits.

The privacy-utility trade-off curve for these distortion functions is shown in Fig. 8. As a first observation, it is clear that the choice of the distortion measure has a non-negligible impact on the privacy-utility trade-off curve. In fact, it can be seen that, for a given amount of normalized distortion, the releaser trained with the $\ell_{4}$ or $\ell_{5}$ distortion measure achieves a higher level of privacy than the one trained with the $\ell_{2}$ distortion function. We also considered other norms, such as the $\ell_{10}$ , and observed a privacy-utility trade-off that is similar to, though slightly different from, the one corresponding to the $\ell_{4}$ norm.

As discussed in Section II, in some applications, such as demand response programs, the utilities are mostly interested in the peak power consumption of the customers. It is also expected that higher-order $\ell_{p}$ norms are better at preserving these signal characteristics than the $\ell_{2}$ norm. To verify this notion, we considered 60 random days of the ECO dataset in a full privacy scenario (i.e., with an attacker accuracy very close to $50\%$ ) and plotted the actual power consumption along with the corresponding released signals for both the $\ell_{4}$ and $\ell_{2}$ distortion functions. The results shown in Fig. 9 clearly indicate that the number of peaks preserved by the releaser trained with the $\ell_{4}$ distortion function is much higher than the number kept by the releaser trained with the $\ell_{2}$ distortion function. This suggests that, for these applications, higher-order $\ell_{p}$ distortion functions should be considered.
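One way to build intuition for this behavior is to compare how different $\ell_{p}$ norms score two error patterns of equal $\ell_{2}$ norm. The toy vectors below are hypothetical and are not taken from the experiments; they only illustrate that a higher-order norm penalizes an error concentrated on a single sample (such as a clipped peak) more heavily than an error spread evenly over time.

```python
def lp_norm(v, p):
    """l_p norm of a sequence of numbers."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


# Two hypothetical error vectors (actual minus released) over four time steps.
peak_error = [2.0, 0.0, 0.0, 0.0]    # all error concentrated on one peak sample
spread_error = [1.0, 1.0, 1.0, 1.0]  # same l_2 norm, spread evenly

# For each p, compare the cost assigned to the two error patterns.
costs = {p: (lp_norm(peak_error, p), lp_norm(spread_error, p)) for p in (2, 4, 5)}
```

Under the $\ell_{2}$ norm both patterns cost the same, whereas under the $\ell_{4}$ and $\ell_{5}$ norms the concentrated error costs more, so a releaser trained with those measures has an incentive to place its distortion away from isolated large deviations.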

IV-C Attacker with Data Mismatch Problem

All the previous results are based on the assumption that the attacker has access to exactly the same training dataset used by the releaser-adversary system. This case should be considered as a worst-case analysis of the performance of the releaser. However, this assumption may not hold in practice. To examine its impact, we consider two different cases. It should be noted that the total number of samples used for training and testing was kept fixed in all the different scenarios. In the first case, we assume that, out of the five houses in the ECO dataset, the releaser uses the data of all the houses for training while the attacker only has access to the data of houses $1$ and $3$ . In the second case, we assume that the releaser is trained with the data of houses $\{1,2,4,5\}$ but the attacker only has access to data from house $3$ . These scenarios try to capture different degrees of the data mismatch problem, which could have an impact on the privacy-utility trade-off due to the different generalization errors. The results are presented in Fig. 10 along with the worst-case scenario. We conclude that the overlap between the training datasets of the releaser and the attacker can strongly affect the performance of the model. In fact, when the attacker does not have access to the same dataset as the releaser, its performance degrades considerably, which means that a target level of privacy requires much less distortion. In the extreme case where the attacker has no access to the releaser training dataset, a very high level of privacy can be achieved with negligible distortion. It should be mentioned that we repeated this experiment with different choices among the 5 houses and obtained similar results.

V Discussion and Concluding Remarks

Privacy concerns associated with SMs data sharing are an important problem, since they can affect the deployment pace of SMs and the advancement of smart grid technologies. As a consequence, it is essential to understand and mitigate real privacy risks in order to provide an adequate solution for sharing SMs data. A summary of the privacy-aware SMs methodology proposed in this study and the key findings is provided below:

• We proposed to measure privacy based on the DI between the sensitive time series and its inference by a potential attacker optimized for the dedicated task. This captures the causal time dependencies present in the time series data and its sequential processing. For the sake of computational tractability, we proposed an upper bound on the DI, which leads to our training objective. Then, using Fano’s inequality, we showed that this bound can constrain the performance of the optimal (Bayesian) attacker.

• We explored a data release framework that balances the trade-off between privacy of the sensitive information and distortion of the useful data. The desired releaser was trained using RNNs in an adversarial framework to optimize this objective, while an adversary mimics a real attacker. After convergence, an actual attacker was trained to test the level of privacy achieved by the releaser.

• A detailed study of the inference of household occupancy using an actual SMs dataset was performed. The performance of the proposed model was compared with two methodologies: random noise addition and the PPAN-based approach. In both cases, the results show that our method clearly outperforms the other algorithms in limiting the attacker's inference ability. This is not surprising, as our method is able to fit the noise distribution to the actual consumer load and to the sensitive feature that the releaser is trying to hide in a more effective manner.

• We showed that the choice of the distortion measure can have a significant impact on the privacy-utility trade-off curve. Indeed, the $\ell_{4}$ distortion measure generates a release that preserves most of the power consumption peaks even under a full privacy regime, which is not the case for the $\ell_{2}$ distortion function. This result may be of considerable importance for some applications such as demand response. More generally, our formulation is amenable to training different release systems tailored to several potential applications based on SMs data.

• We studied the impact of the data mismatch problem in this application, which occurs when the training dataset of the attacker is not exactly the same as the one used by the releaser. The results show that this effect can greatly affect the privacy-utility trade-off. Since this phenomenon is expected in practice, at least to some degree, these findings suggest that the level of distortion required to achieve desired privacy targets may be negligible in several cases of interest.

To conclude, two possible extensions for future work are briefly discussed. The first one is modeling the availability of side information at the attacker, which represents prior knowledge of the sensitive information as in [45] and cannot be distorted by the releaser, and studying its impact on the privacy-utility trade-offs. The second is combining this SM data distortion approach with approaches that rely on physical resources for demand shaping. This would require incorporating the electricity cost, leading to the study of privacy-utility-cost trade-offs.

Acknowledgment

This work was supported by Hydro-Quebec, the Natural Sciences and Engineering Research Council of Canada, and McGill University in the framework of the NSERC/Hydro-Quebec Industrial Research Chair in Interactive Information Infrastructure for the Power Grid (IRCPJ406021-14). This project has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 792464.

References

[1] D. Alahakoon and X. Yu, “Smart electricity meter data intelligence for future energy systems: A survey,” IEEE Transactions on Industrial Informatics, vol. 12, pp. 425–436, Feb 2016.
[2] Y. Wang, Q. Chen, T. Hong, and C. Kang, “Review of smart meter data analytics: Applications, methodologies, and challenges,” IEEE Transactions on Smart Grid, vol. 10, pp. 3125–3148, May 2019.
[3] A. Molina-Markham, P. Shenoy, K. Fu, E. Cecchet, and D. Irwin, “Private memoirs of a smart meter,” in Proceedings of the 2nd ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Building, BuildSys ’10, (New York, NY, USA), pp. 61–66, ACM, 2010.
[4] E. McKenna, I. Richardson, and M. Thomson, “Smart meter data: Balancing consumer privacy concerns with legitimate applications,” Energy Policy, vol. 41, pp. 807–814, 2012. Modeling Transport (Energy) Demand and Policies.
[5] P. Jain, M. Gyanchandani, and N. Khare, “Big data privacy: a technological perspective and review,” Journal of Big Data, vol. 3, p. 25, Nov 2016.
[6] M. R. Asghar, G. Dán, D. Miorandi, and I. Chlamtac, “Smart meter data privacy: A survey,” IEEE Communications Surveys & Tutorials, vol. 19, pp. 2820–2835, Fourthquarter 2017.
[7] G. Giaconi, D. Gunduz, and H. V. Poor, “Privacy-aware smart metering: Progress and challenges,” IEEE Signal Processing Magazine, vol. 35, no. 6, pp. 59–78, 2018.
[8] F. Li, B. Luo, and P. Liu, “Secure information aggregation for smart grids using homomorphic encryption,” in 2010 First IEEE International Conference on Smart Grid Communications, pp. 327–332, Oct 2010.
