Information-Theoretic Privacy through Chaos Synchronization and Optimal Additive Noise
Carlos Murguia, Iman Shames, Farhad Farokhi, Dragan Nesic

TL;DR
This paper introduces a privacy-preserving method using synchronized chaotic oscillators to generate optimal additive noise, minimizing information leakage in data queries over public channels.
Contribution
It proposes a novel approach combining chaos synchronization with convex optimization to enhance data privacy in communication systems.
Findings
Optimal noise distribution reduces mutual information effectively.
Chaotic oscillators can be synchronized to generate identical noise realizations.
Simulations demonstrate the method's effectiveness in privacy preservation.
**Information-Theoretic Privacy through Chaos Synchronization and Optimal Additive Noise**
Carlos Murguia (1), Iman Shames (1), Farhad Farokhi (1, 2), and Dragan Nešić (1)
(1) Department of Electrical and Electronic Engineering, University of Melbourne, Australia
(2) The Commonwealth Scientific and Industrial Research Organisation (CSIRO), Data61, Australia
1 Abstract
We study the problem of maximizing privacy of data sets by adding random vectors generated via synchronized chaotic oscillators. In particular, we consider the setup where information about data sets, queries, is sent through public (unsecured) communication channels to a remote station. To hide private features (specific entries) within the data set, we corrupt the response to queries by adding random vectors. We send the distorted query (the sum of the requested query and the random vector) through the public channel. The distribution of the additive random vector is designed to minimize the mutual information (our privacy metric) between private entries of the data set and the distorted query. We cast the synthesis of this distribution as a convex program in the probabilities of the additive random vector. Once we have the optimal distribution, we propose an algorithm to generate pseudorandom realizations from this distribution using trajectories of a chaotic oscillator. At the other end of the channel, we have a second chaotic oscillator, which we use to generate realizations from the same distribution. Note that if we obtain the same realizations on both sides of the channel, we can simply subtract the realization from the distorted query to recover the requested query. To generate equal realizations, we need the two chaotic oscillators to be synchronized, i.e., we need them to generate exactly the same trajectories on both sides of the channel synchronously in time. We force the two chaotic oscillators into exponential synchronization using a driving signal. Exponential synchronization implies that trajectories of the oscillators converge to each other exponentially fast for all admissible initial conditions and are perfectly synchronized in the limit only. Thus, in finite time, there is always a “small” difference between their trajectories. 
To implement our algorithm, we assume (as is often done in related work) that the systems have been operating for a sufficiently long time so that this small difference is negligible and the oscillators are practically synchronized. We quantify the worst-case distortion induced by assuming perfect synchronization, and show that this distortion vanishes exponentially fast. Simulations are presented to illustrate our results.
Keywords: Privacy, Data Sets, Queries, Mutual Information, Chaos.
2 Introduction
In a hyperconnected world, scientific and technological advances have led to an overwhelming amount of user data being collected and processed by hundreds of companies over public networks. Companies mine this data to provide targeted advertising and personalized services. However, these new technologies have also led to an alarmingly widespread loss of privacy in society. Depending on the adversary's resources, opponents may infer private user information from public data available on the internet and on unsecured/public servers. A motivating example of privacy loss is the potential use of data from smart electrical meters by criminals, advertising agencies, and governments for monitoring the presence and activities of occupants [1, 2]. Other examples are privacy loss caused by information sharing in distributed control systems and cloud computing [3]; the use of travel data for traffic estimation in intelligent transportation systems [4]; and data collection and sharing by the Internet-of-Things (IoT) [5], which is, most of the time, done without the user's informed consent. These privacy concerns show that there is an acute need for privacy-preserving mechanisms capable of handling the new privacy challenges induced by an interconnected world.
In this manuscript, we consider the problem of hiding private information of users (modeled as discrete random vectors $X$) within datasets when publicly sharing requested queries $Y$ from the same source. In particular, the aim of our privacy scheme is to respond to queries $Y$ with distorted queries of the form $Z = Y + V$ such that, when releasing $Z$, the private $X$ is "hidden". Realizations of the vector $Z$ are transmitted over a public (unsecured) communication channel to a remote station. If we did not distort $Y$ before transmission, information about $X$ would be directly accessible through the public channel. The first problem that we address is the design of the probability distribution of $V$ to maximize privacy, i.e., the distribution of $V$ must be constructed so that $Z$ carries as little information about $X$ as possible. Here, we follow an information-theoretic approach to privacy. We use the mutual information between private information $X$ and distorted queries $Z$, $I[X;Z]$, as privacy metric. The design of the discrete additive vector $V$ is cast as an optimization problem where we minimize $I[X;Z]$ using the probability mass function of $V$, $p_V(v)$, as optimization variables. That is, the optimal distribution, $p_V^*(v)$, is given by $p_V^*(v) := \arg\min_{p_V(v)} I[X;Z]$, where the minimum is taken over a class of probability mass functions. Contrary to related work [6]-[11], we do not consider any sort of privacy-distortion trade-off in our formulation. We aim at making $I[X;Z]$ as small as possible regardless of the distortion between $Y$ and $Z$ induced by $V$. Distortion is not an issue because we seek to generate exactly the same realization of $V$ at the remote station; then, we can recover the query $Y$ by simply subtracting this realization from the one of $Z$. In order to accomplish this, we propose an algorithm to generate pseudorandom realizations from $p_V^*(v)$ at both sides of the channel using trajectories of two synchronized chaotic oscillators.
There are a number of requirements that the oscillators must satisfy for our algorithm to work: 1) trajectories of the oscillators must be bounded and chaotic; 2) they must be synchronized, i.e., we need them to generate exactly the same trajectories on both sides of the channel synchronously in time; and 3) the synchronous solution, regarded as a random process, must be stationary. Before giving the algorithm, we provide general guidelines for selecting the dynamics of the oscillators so that all the aforementioned requirements are satisfied. In particular, we use a range of well-known results in the literature to provide a synthesis procedure that allows one to choose suitable oscillators. For boundedness, we use the notion of Input-to-State-Stability (ISS); for chaos, we employ standard largest Lyapunov exponent methods [12] and the (0-1) test [13]; for synchronization, we introduce the notion of convergent systems [14]; and for stationarity, we use hyperbolicity of the chaotic trajectories [15].
To generate equal realizations, our algorithm needs trajectories of the two chaotic oscillators (one at each side of the channel) to be synchronized. We force the oscillators into exponential synchronization using a driving signal. Exponential synchronization implies that trajectories of the oscillators converge to each other exponentially for all admissible initial conditions and are perfectly synchronized in the limit only. Therefore, in finite time, there is always a "small" difference between their trajectories. However, because oscillators synchronize exponentially fast, and it is often possible in practice to select initial conditions from a known compact set (known to both sides of the channel), it is safe to assume that the interconnected systems have been operating for a sufficiently long time such that oscillators are practically synchronized, i.e., the synchronization error is so small that trajectories can be assumed to be equal. This is a standard assumption that is made in most, if not all, of the existing work on chaotic encryption based on synchronization [16]-[20]. Here, we give sufficient conditions for exponential synchronization to occur, provide tools for selecting the oscillators such that these conditions are satisfied, and assume that, after transients have settled down, trajectories are perfectly synchronized to some chaotic trajectory. Our algorithm uses any entry of this synchronous trajectory to generate realizations from the optimal distribution $p_V^*(v)$; the selected entry takes values in some compact set that characterizes its support. Because oscillators are selected such that the synchronous trajectory, regarded as a random process, is stationary, samples from it follow a stationary probability density function. We obtain this density through Monte Carlo simulations [21] and divide its support into a finite set of cells such that the probability that a sample lies in each of these cells equals the corresponding value of the optimal probability distribution $p_V^*(v)$. That is, we generate pseudorandom realizations from $p_V^*(v)$ by properly selecting the cells and evaluating in which cell the sampled trajectory lies at the sampling instants.
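The cell-partition idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's algorithm: the logistic map stands in for the synchronized chaotic oscillator, the stationary density is replaced by the empirical quantiles of one long trajectory, and the target distribution `probs` as well as the function names `logistic_trajectory`, `cell_edges`, and `to_symbol` are hypothetical.

```python
def logistic_trajectory(x0, n, burn_in=1000):
    """Samples of the logistic map x -> 4x(1-x); a stand-in chaotic source."""
    x, out = x0, []
    for k in range(burn_in + n):
        x = 4.0 * x * (1.0 - x)
        if k >= burn_in:          # discard the transient
            out.append(x)
    return out

def cell_edges(samples, probs):
    """Thresholds that split the empirical distribution into cells of mass probs."""
    ordered = sorted(samples)
    edges, cum = [], 0.0
    for p in probs[:-1]:
        cum += p
        idx = min(int(round(cum * len(ordered))), len(ordered) - 1)
        edges.append(ordered[idx])
    return edges

def to_symbol(sample, edges):
    """Index of the cell that contains the sample."""
    return sum(sample >= e for e in edges)

probs = [0.5, 0.3, 0.2]                      # hypothetical target pmf
train = logistic_trajectory(0.3141, 20000)   # Monte Carlo style sample set
edges = cell_edges(train, probs)
symbols = [to_symbol(s, edges) for s in train]
freqs = [symbols.count(k) / len(symbols) for k in range(len(probs))]
```

Two parties that evaluate the same trajectory against the same `edges` obtain the same sequence of symbols, which is what allows the added realization to be subtracted at the remote station.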
The use of additive noise to preserve privacy is common practice. There are mainly two classes of privacy metrics considered in the literature; namely, differential privacy [22]-[23] and information-theoretic metrics, e.g., mutual information, conditional entropy, Kullback-Leibler divergence, and Fisher information [24]-[28]. In differential privacy, because it provides certain privacy guarantees, Laplace noise is usually used [29]. However, when maximal privacy is desired, Laplace noise is generally not the optimal solution. This raises the fundamental question: what is the noise distribution achieving maximal privacy? This question has many possible answers depending on the particular privacy metric being considered and the system configuration, see, e.g., [6]-[8],[11], for differential privacy based results, and [24]-[28], for information-theoretic results. In general, if the data to be kept private follows continuous distributions, the problem of finding the optimal additive noise to maximize privacy is hard to solve. If a closed-form solution for the distribution is desired, the problem amounts to solving a set of nonlinear partial differential equations which, in general, might not have a solution, and even if they do have a solution, it is hard to find [24]. This problem has been addressed by imposing some particular structure on the considered distributions or by assuming that the data to be kept private is deterministic [24],[7],[8]. The authors in [7],[8] consider deterministic input data sets and treat optimal distributions as distributions that concentrate probability around zero as much as possible while ensuring differential privacy. Under this framework, they obtain a family of piecewise constant probability density functions that achieve minimal distortion for a given level of privacy. In [24], the authors consider the problem of preserving the privacy of deterministic databases using additive continuous noise with constrained support. They use the Fisher information and the Cramér-Rao bound to construct a privacy metric between the deterministic data and the data with additive noise, and find the probability density function that minimizes it. Moreover, they prove that, in the unconstrained support case, the optimal noise distribution minimizing the Fisher information is Gaussian. This observation has also been made in [30] when using mutual information as a measure of privacy. We remark that most of the aforementioned papers consider privacy-distortion trade-offs when designing their distorting mechanisms. We do not consider this trade-off here because, at the end of the channel, we remove the distortion that we induce using our synchronization-based formulation.
Existing work on chaotic encryption based on synchronization [16]-[20] directly uses the states of the chaotic oscillators to mask private information. That is, standard algorithms do not use chaotic trajectories to generate pseudorandom realizations from probability distributions (as we do here); instead, they simply add the value of the sampled chaotic trajectory (or functions of it) to private messages. Although the latter succeeds in masking messages, it does not give any privacy guarantees (neither information-theoretic nor in a differential privacy sense) on the private information, and it is not optimal in any sense. Hence, the contributions of our scheme with respect to existing work on chaotic encryption [16]-[20] are the treatment of fully stochastic datasets, the information-theoretic privacy guarantees that our framework provides, and the optimal performance of the designed distorting additive vector (optimal in the sense of minimizing the mutual information $I[X;Z]$). The work here is inspired by the experimental results presented in [31], where the authors propose a framework similar to ours for deterministic data using an electronic circuit implementation of the Mackey-Glass chaotic oscillator [32]. The contribution of our work with respect to [31] is threefold: 1) we consider fully stochastic data, which makes the privacy scheme fundamentally different; 2) we provide a general formulation that encompasses a large class of chaotic systems, not only the electronic circuit implementation of the Mackey-Glass oscillator; and 3) we generate realizations from optimal distorting distributions, whereas [31] considers uniform distributions only, which are not optimal for stochastic data.
Next, we summarize the main contributions of this work.
**Contributions:**
1) We provide a general information-theoretic privacy framework based on optimal additive distorting random vectors and synchronization of chaotic oscillators;
2) We prove that the synthesis of the probability mass function of the distorting random vector can be posed as a convex program in $p_V(v)$ over a class of probability mass functions;
3) We provide an algorithm to generate pseudorandom realizations from this distribution using trajectories of chaotic oscillators;
4) Using off-the-shelf results in the literature, we provide a synthesis procedure for selecting the dynamics of the oscillators so that our algorithm is guaranteed to work.
The remainder of the paper is organized as follows. In Section 3, we present some preliminary results needed for the subsequent sections. We introduce the notion of convergent systems and the concept of mutual information. The general formulation and the specific problems to be addressed are given in Section 4. In Section 5, we pose the synthesis of the probability distribution of the optimal distorting vector. General guidelines for selecting the chaotic oscillators are given in Section 6. The algorithm for generating pseudorandom realizations from the optimal distribution is presented in Section 7. Simulation results are given in Section 8 and concluding remarks in Section 9.
3 Notation and Preliminaries
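The symbol $\mathbb{R}$ stands for the real numbers, $\mathbb{R}_{>0}$ ($\mathbb{R}_{\geq 0}$) denotes the set of positive (non-negative) real numbers. The symbol $\mathbb{N}$ stands for the set of natural numbers. The Euclidean norm in $\mathbb{R}^{n}$ is denoted simply as $\|x\|$, $\|x\|^{2} = x^{\top}x$, where ⊤ defines transposition. For a given measurable function $u(t)$, $u: \mathbb{R}_{\geq 0} \to \mathbb{R}^{m}$, we denote its $\mathcal{L}_{\infty}$ norm as $\|u\|_{\infty} := \operatorname{ess\,sup}_{t \geq 0} \|u(t)\|$, where $\operatorname{ess\,sup}$ denotes essential supremum. Matrices composed of only ones and only zeros of dimension $n \times m$ are denoted by $\mathbf{1}_{n \times m}$ and $\mathbf{0}_{n \times m}$, respectively, or simply $\mathbf{1}$ and $\mathbf{0}$ when their dimensions are clear. For square matrices $A$, $\rho(A)$ denotes the spectral radius of $A$. A continuous function $\gamma: \mathbb{R}_{\geq 0} \to \mathbb{R}_{\geq 0}$ is said to belong to class $\mathcal{K}$ if it is strictly increasing and $\gamma(0) = 0$. Similarly, a continuous function $\beta: \mathbb{R}_{\geq 0} \times \mathbb{R}_{\geq 0} \to \mathbb{R}_{\geq 0}$ belongs to class $\mathcal{KL}$ if, for fixed $t$, $\beta(r,t)$ belongs to class $\mathcal{K}$ with respect to $r$ and, for fixed $r$, $\beta(r,t)$ is decreasing with respect to $t$ and $\lim_{t \to \infty} \beta(r,t) = 0$. Consider a discrete random vector $X$ with alphabet $\mathcal{X} = \{x_1, \ldots, x_N\}$, $x_i \in \mathbb{R}^{m}$, $m \in \mathbb{N}$, $i \in \{1, \ldots, N\}$, and probability mass function $p_X(x) = \Pr[X = x]$, $x \in \mathcal{X}$, where $\Pr[B]$ denotes probability of event $B$. Similarly, for two random vectors $X$ and $Y$, taking values in the alphabets $\mathcal{X}$ and $\mathcal{Y}$, respectively, their joint probability mass function is denoted by $p_{X,Y}(x,y)$, the marginal distribution of $X$ is given by $p_X(x) = \sum_{y \in \mathcal{Y}} p_{X,Y}(x,y)$, and the conditional distribution of $X$ given $Y$ as $p_{X|Y}(x|y) = p_{X,Y}(x,y)/p_Y(y)$. Analogously, for a continuous random vector $X$, we denote its (multivariate) probability density function as $f_X(x)$. The notation $X \sim f_X(x)$ ($X \sim p_X(x)$) stands for continuous (discrete) random vectors following the probability density (mass) function $f_X(x)$ ($p_X(x)$). We denote by "Simplex" the probability simplex defined by $\sum_{x \in \mathcal{X}} p_X(x) = 1$, $p_X(x) \geq 0$ for all $x \in \mathcal{X}$. The notation $E[X]$ denotes the expected value of the random vector $X$. We denote independence between two random vectors, $X$ and $Y$, as $X \perp\!\!\!\perp Y$.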
3.1 Mutual Information
**Definition 1.**
Consider two random vectors, $X$ and $Y$, with joint probability mass function $p_{X,Y}(x,y)$ and marginal probability mass functions, $p_X(x)$ and $p_Y(y)$, respectively. Their mutual information $I[X;Y]$ is defined as the relative entropy between the joint distribution and the product distribution $p_X(x)p_Y(y)$, i.e.,
$$I[X;Y] := \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p_{X,Y}(x,y) \log \frac{p_{X,Y}(x,y)}{p_X(x)\,p_Y(y)}.$$
Mutual information between two jointly distributed vectors, $X$ and $Y$, is a measure of the dependence between $X$ and $Y$.
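As a quick numerical illustration of Definition 1, the following sketch evaluates mutual information from a joint probability mass function. The dictionary representation, the function name `mutual_information`, and the choice of base-2 logarithms (bits) are assumptions made for this example only.

```python
import math

def mutual_information(joint):
    """Mutual information in bits of a joint pmf given as {(x, y): prob}."""
    # Marginals p_X and p_Y are obtained by summing the joint pmf.
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0.0) + p
        p_y[y] = p_y.get(y, 0.0) + p
    # Relative entropy between the joint and the product of the marginals;
    # zero-probability pairs contribute nothing (0 log 0 := 0).
    return sum(p * math.log2(p / (p_x[x] * p_y[y]))
               for (x, y), p in joint.items() if p > 0.0)

# Independent fair bits carry no information about each other.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
# Two identical fair bits share exactly one bit of information.
identical = {(0, 0): 0.5, (1, 1): 0.5}
mi_independent = mutual_information(independent)
mi_identical = mutual_information(identical)
```

The two test cases mark the extremes of the dependence measure: zero for independent vectors and the full entropy of the source when one vector determines the other.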
3.2 Convergent Systems
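Consider the dynamical system:
$$\dot{x}(t) = f(x(t), u(t)), \tag{1}$$
with $t \in \mathbb{R}_{\geq 0}$, state $x \in \mathbb{R}^{n}$, input $u \in \mathbb{R}^{m}$, and vector field $f: \mathbb{R}^{n} \times \mathbb{R}^{m} \to \mathbb{R}^{n}$. The vector field $f(x,u)$ is continuously differentiable in $x$, and $u(t)$ is piecewise continuous in $t$ and takes values in some compact set $\mathcal{U} \subset \mathbb{R}^{m}$.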
**Definition 2.**
[33]. *System (1) is said to be globally asymptotically convergent if and only if for any bounded input $u(t)$ there is a unique bounded globally asymptotically stable solution $\bar{x}_u(t)$ such that $\lim_{t \to \infty} \|x(t) - \bar{x}_u(t)\| = 0$ for all initial conditions.*
For a convergent system, the limit solution is solely determined by the external excitation and not by the initial conditions. A sufficient condition for convergence obtained by Demidovich [33] and later extended in [14] is presented in the following proposition.
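**Proposition 1.**
[33, 14]. If there exists a positive definite matrix $P \in \mathbb{R}^{n \times n}$ such that all the eigenvalues $\lambda_i(J)$ of the symmetric matrix
$$J(x,u) := P \frac{\partial f}{\partial x}(x,u) + \left(\frac{\partial f}{\partial x}(x,u)\right)^{\top} P$$
are negative and separated from zero, i.e., there exists a constant $c > 0$ such that $\lambda_i(J(x,u)) \leq -c < 0$ for all $i \in \{1,\ldots,n\}$, $x \in \mathbb{R}^{n}$, and $u \in \mathcal{U}$, then system (1) is globally exponentially convergent; and, for any pair of solutions $x_1(t)$, $x_2(t)$ of (1), the following is satisfied:
$$\|x_1(t) - x_2(t)\| \leq K e^{-\alpha (t - t_0)} \|x_1(t_0) - x_2(t_0)\|, \tag{2}$$
with constants $K := \sqrt{\lambda_{\max}(P)/\lambda_{\min}(P)}$ and $\alpha := c/(2\lambda_{\max}(P))$, $\lambda_{\max}(P)$ ($\lambda_{\min}(P)$) being the largest (smallest) eigenvalue of the symmetric matrix $P$.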
**Remark 1.**
*There are other methods to verify that trajectories of system (1) converge to a limit solution that is independent of the initial conditions and solely determined by the external excitation. For instance, contraction theory [34], the Lyapunov function approach to incremental stability [35], the quadratic (QUAD) inequality approach (a Lipschitz-like condition) [36], and differential dissipativity [37], which are all concepts that are closely related to the notion of convergent systems [14] that we use here.*
4 Problem Setup
Let $X$ be a discrete random vector that must be kept private, with alphabet $\mathcal{X}$ and probability mass function $p_X(x)$, $x \in \mathcal{X}$. The entries of $X$ represent, for instance, private entries of users within a dataset that is stored by a trusted server. The server admits queries $Y$ obtained from $X$ through some (stochastic or deterministic) mapping characterized by the transition probabilities $p_{Y|X}(y|x)$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$, where $\mathcal{Y}$ denotes the alphabet of $Y$. The aim of our privacy scheme is to respond to queries $Y$ with distorted queries $Z = Y + V$, for some discrete random vector $V$ (with $V \perp\!\!\!\perp Y$), such that, when releasing $Z$, the individual entries of $X$ are "hidden". Realizations of the vector $Z$ are transmitted over a public (unsecured) communication channel to a remote station, see Figure 1. If we did not add $V$ to $Y$ before transmission, information about $X$ would be directly accessible through the public channel. As a preliminary problem that we need to solve for the subsequent results, we address the design of the probability distribution of $V$ to maximize privacy, i.e., the distribution of $V$ must be constructed so that the sum, $Z = Y + V$, carries as little information about $X$ as possible. In this manuscript, we use the mutual information between $X$ and $Z$, $I[X;Z]$, as privacy metric. We aim at finding the probability mass function of $V$, $p_V(v)$, that minimizes $I[X;Z]$ over a class of probability mass functions. That is, we cast the design of $p_V(v)$ as an optimization problem with cost function $I[X;Z]$, optimization variables $p_V(v)$, and subject to $V \perp\!\!\!\perp Y$ and the usual probability simplex constraints. Note that, contrary to related work [9]-[11],[27],[28], we do not consider any sort of privacy-distortion trade-off in our formulation. We minimize $I[X;Z]$ regardless of the distortion between $Y$ and $Z$ induced by $V$. Distortion is not an issue because we seek to generate exactly the same realization of $V$ at the remote station and then recover the query $Y$ by subtracting this realization from the one of $Z$. This is addressed in Problem 2 and Problem 3 below.
We let $V$ be a discrete random vector with alphabet $\mathcal{Y}$ and probability mass function $p_V(v)$, $v \in \mathcal{Y}$, i.e., the alphabet of $V$ and the one of the query $Y$ are equal. Having equal alphabets imposes a tractable convex structure on the cost and reduces the optimization variables to the probabilities of each element of the alphabet. The case with arbitrary alphabet leads to a combinatorial optimization problem where the objective changes its structure for different combinations. We do not address this case in this manuscript; it is left as future work. In what follows, we formally present the optimization problem we seek to address.
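**Problem 1.**
[Optimal Distribution of the Additive Distorting Signal] *For given $p_X(x)$ and $p_{Y|X}(y|x)$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$, find the probability mass function $p_V^*(v)$, solution of the optimization problem:*
$$p_V^{*}(v) := \arg\min_{p_V(v)} I[X;Z] \quad \text{s.t.} \quad Z = Y + V, \quad V \perp\!\!\!\perp Y, \quad p_V(v) \in \text{Simplex}. \tag{3}$$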
Here, $p_V^*(v)$ denotes the optimal distribution solving (3). To hide $X$, once we have obtained $p_V^*(v)$, we aim at generating realizations from this distribution, adding them to the requested query ($Z = Y + V$), and sending realizations of the sum to the remote station through the public channel. At the other end of the channel, we seek to generate the exact same realizations from $p_V^*(v)$ so that we can recover the query $Y$ by simply subtracting $V$ from $Z$, see Figure 2. Note that Figure 2 shows a recovered query at the remote station rather than the actual $Y$. This is because, due to practical errors in our algorithm (e.g., communication delays and transients), the realizations of $V$ generated at the two ends of the channel might be slightly different in practice. To generate these realizations, we use trajectories of a chaotic dynamical system of the form:
$$\dot{x}_1 = f(x_1, u), \qquad y_1 = h(x_1), \tag{4}$$
with state $x_1$, output $y_1$, input $u$ continuous in $t$ and taking values in some compact set, continuous function $h$, and vector field $f$ continuously differentiable in its first argument, uniformly in its second argument. Hereafter, system (4) is referred to as responder 1. Responder 1 is placed at the side of the trusted server, see Figure 2. The input signal $u$ is generated by a chaotic autonomous exosystem:
$$\dot{z} = q(z), \qquad u = g(z), \tag{5}$$
with state $z$, output $u$, and vector fields $q$ and $g$. The vector field $q$ is locally Lipschitz in $z$ and $g$ is continuous. We refer to (5) as the driver system. We let $u$ be connected to the remote station via the public channel, see Figure 2. At the other end of the channel, driven by the same input signal $u$, we have a third chaotic oscillator with the same dynamics as (4) but with potentially different initial conditions, i.e., the second responder is given by
$$\dot{x}_2 = f(x_2, u), \qquad y_2 = h(x_2), \tag{6}$$
with state $x_2$ and output $y_2$. Trajectories of (6) are denoted analogously to those of (4). System (6) is referred to as responder 2. Note that if $x_1(t) = x_2(t)$ for all $t$, i.e., if systems (4) and (6) are synchronized, and we use the synchronous chaotic solution to generate realizations from $p_V^*(v)$, we could have the same realization of $V$ at both sides of the channel.
**Problem 2.**
[Boundedness, Chaos, and Synchronization] *State sufficient conditions on the vector fields and output functions of the coupled system (4)-(6) such that: 1) trajectories of (4)-(6) exist and are bounded and chaotic; and 2) systems (4) and (6) exponentially synchronize, i.e., the difference between the trajectories of the two responders converges to zero exponentially fast.*
**Remark 2.**
*Problem 2 seeks to enforce exponential synchronization by selecting the dynamics of the oscillators. Exponential synchronization implies that trajectories of the responders converge to each other exponentially for all initial conditions and are perfectly synchronized in the limit only. It follows that, in finite time, there is always a “small” difference between their trajectories. Nevertheless, because oscillators synchronize exponentially fast, and it is often possible in practice to select initial conditions from a known compact set (known to both the trusted server and the remote station), it is safe to assume that the interconnected systems have been operating for sufficiently large time such that oscillators are practically synchronized, i.e., the synchronization error is so small that trajectories can be assumed to be equal. This is a standard assumption that is made in most, if not all, of the existing work on chaotic encryption based on synchronization [16]-[20]. *
Finally, once we have found functions that solve Problem 2, which guarantees exponential synchronization of the responders, and assuming that the responders are synchronized (see Remark 2), we aim at designing a procedure to generate pseudorandom realizations from $p_V^*(v)$ using the synchronous chaotic solution. Note that, under synchronization, the outputs of the two responders coincide for all times. Moreover, because the synchronous state trajectory is bounded and the output function is continuous, the synchronous output takes values in some compact set. To reduce the complexity of the algorithm, we use this lower dimensional synchronous solution (the output rather than the full state) to generate the realizations from $p_V^*(v)$.
**Problem 3.**
[Generation of Optimal Pseudorandom Numbers] *Using the lower dimensional synchronous solution, design an algorithm to generate pseudorandom realizations from the optimal distribution $p_V^*(v)$, $v \in \mathcal{Y}$.*
5 Optimal Distribution of the Additive Distorting Signal
In this section, we prove that Problem 1 can be posed as a convex program in the probabilities $p_V(v)$, $v \in \mathcal{Y}$. We derive an explicit expression for the cost function $I[X;Z]$, $Z = Y + V$, in terms of the given $p_X(x)$ and $p_{Y|X}(y|x)$ and the variables $p_V(v)$, restricted to satisfy the independence constraint $V \perp\!\!\!\perp Y$.
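**Lemma 1.**
$I[X;Z]$ with $Z = Y + V$, $V \perp\!\!\!\perp Y$, is a convex function of $p_V(v)$, $v \in \mathcal{Y}$, for given $p_X(x)$ and $p_{Y|X}(y|x)$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$; and can be written compactly in terms of $p_X(x)$, $p_{Y|X}(y|x)$, and $p_V(v)$, as follows:
$$I[X;Z] = \sum_{x \in \mathcal{X}} \sum_{z \in \mathcal{Z}} p_X(x)\, p_{Z|X}(z|x) \log \frac{p_{Z|X}(z|x)}{p_Z(z)}, \tag{7a}$$
$$p_{Z|X}(z|x) = \sum_{y \in \mathcal{Y}} p_V(z - y)\, p_{Y|X}(y|x), \tag{7b}$$
$$p_Z(z) = \sum_{y \in \mathcal{Y}} p_V(z - y)\, p_Y(y), \qquad p_Y(y) = \sum_{x \in \mathcal{X}} p_{Y|X}(y|x)\, p_X(x), \tag{7c}$$
where $\mathcal{Z}$ denotes the alphabet of $Z$ and $p_V(z - y) := 0$ whenever $z - y \notin \mathcal{Y}$.
Proof: The expression on the right-hand side of (7a) follows by inspection of Definition 1 and the fact that $p_{X,Z}(x,z) = p_{Z|X}(z|x)\,p_X(x)$. By [38, Theorem 2.7.4], cost (7a) is convex in $p_{Z|X}(z|x)$ for given $p_X(x)$. However, our optimization variables are $p_V(v)$ and not $p_{Z|X}(z|x)$. Note that $X$, $Y$, and $Z$ form a Markov chain in that order [39]; therefore, $p_{X,Y,Z}(x,y,z) = p_{Z|Y}(z|y)\,p_{Y|X}(y|x)\,p_X(x)$. Marginalizing with respect to $Y$ and then conditioning with respect to $X$ yields $p_{X,Z}(x,z) = \sum_{y} p_{Z|Y}(z|y)\,p_{Y|X}(y|x)\,p_X(x)$ and $p_{Z|X}(z|x) = \sum_{y} p_{Z|Y}(z|y)\,p_{Y|X}(y|x)$, respectively. Note that $p_{Z|X}(z|x)$ is just a linear transformation of $p_{Z|Y}(z|y)$. Hence, convexity with respect to $p_{Z|X}(z|x)$ implies convexity with respect to $p_{Z|Y}(z|y)$ because convexity is preserved under affine transformations [40]. Next, consider $p_{Z|Y}(z|y)$. By definition, $p_{Z|Y}(z|y) = \Pr[Z = z \mid Y = y]$, $z \in \mathcal{Z}$, $y \in \mathcal{Y}$. Note that
$$\Pr[Z = z \mid Y = y] = \Pr[V = z - y \mid Y = y] \overset{(a)}{=} \Pr[V = z - y],$$
where (a) follows from independence between $V$ and $Y$. Thus,
$$p_{Z|Y}(z|y) = p_V(z - y).$$
We have concluded convexity of $I[X;Z]$ with respect to $p_{Z|Y}(z|y)$ above. Hence, because $p_{Z|Y}(z|y) = p_V(z - y)$ and $p_{Z|Y}(z|y)$ is a linear transformation of $p_V(v)$ ($p_V(z - y)$ for $z - y \in \mathcal{Y}$ and zero otherwise), the cost is convex in $p_V(v)$. Moreover, since $p_{Z|X}(z|x) = \sum_{y} p_{Z|Y}(z|y)\,p_{Y|X}(y|x)$ and $p_{Z|Y}(z|y) = p_V(z - y)$, equality (7b) holds true. It remains to prove that $p_Z(z)$ can be written as (7c). Because $Z = Y + V$, $\Pr[Z = z]$, for a given $z$, can be written as the sum of the probabilities of all $y$ and $v$ that result in $z$, i.e.,
$$p_Z(z) = \sum_{y \in \mathcal{Y}} \Pr[Y = y, V = z - y] \overset{(b)}{=} \sum_{y \in \mathcal{Y}} p_Y(y)\, p_V(z - y),$$
where (b) follows from independence between $V$ and $Y$.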
By Lemma 1, the cost $I[X;Z]$, for $V \perp\!\!\!\perp Y$, is convex in $p_V(v)$ and parametrized by $p_X(x)$ and $p_{Y|X}(y|x)$. In what follows, we cast the nonlinear program for solving Problem 1.
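**Theorem 1.**
Given $p_X(x)$ and $p_{Y|X}(y|x)$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$, the mapping $p_V^*(v)$, $v \in \mathcal{Y}$, that minimizes $I[X;Z]$, $Z = Y + V$, subject to $V \perp\!\!\!\perp Y$ can be found by solving the following convex program:
$$p_V^{*}(v) = \arg\min_{p_V(v)} \sum_{x \in \mathcal{X}} \sum_{z \in \mathcal{Z}} p_X(x)\, p_{Z|X}(z|x) \log \frac{p_{Z|X}(z|x)}{p_Z(z)} \quad \text{s.t.} \quad \begin{cases} p_{Z|X}(z|x) = \sum_{y \in \mathcal{Y}} p_V(z - y)\, p_{Y|X}(y|x), \\ p_Z(z) = \sum_{x \in \mathcal{X}} p_X(x)\, p_{Z|X}(z|x), \\ p_V(v) \in \text{Simplex}, \end{cases}$$
where $\mathcal{Z}$ denotes the alphabet of $Z$.
Proof: Theorem 1 follows from Lemma 1.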
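To make the convex program concrete, the following toy sketch evaluates the cost $I[X;Z]$ as a function of the noise distribution and minimizes it by brute force. Everything specific here is an assumption made for illustration: a binary private variable, the identity query $Y = X$, ordinary integer addition for $Z = Y + V$, base-2 logarithms, and a one-dimensional grid search standing in for a proper convex solver. The function name `leakage` is hypothetical.

```python
import math

def leakage(p_x, p_y_given_x, p_v):
    """I[X;Z] in bits for Z = Y + V with V independent of (X, Y).

    p_x: {x: prob}; p_y_given_x: {x: {y: prob}}; p_v: {v: prob}.
    """
    # p_{Z|X}(z|x) = sum_y p_V(z - y) p_{Y|X}(y|x), using independence of V.
    p_z_given_x = {}
    for x, row in p_y_given_x.items():
        dist = {}
        for y, py in row.items():
            for v, pv in p_v.items():
                dist[y + v] = dist.get(y + v, 0.0) + py * pv
        p_z_given_x[x] = dist
    # p_Z(z) = sum_x p_X(x) p_{Z|X}(z|x).
    p_z = {}
    for x, dist in p_z_given_x.items():
        for z, p in dist.items():
            p_z[z] = p_z.get(z, 0.0) + p_x[x] * p
    return sum(p_x[x] * p * math.log2(p / p_z[z])
               for x, dist in p_z_given_x.items()
               for z, p in dist.items() if p > 0.0 and p_x[x] > 0.0)

p_x = {0: 0.7, 1: 0.3}                     # hypothetical prior on the private bit
p_y_given_x = {0: {0: 1.0}, 1: {1: 1.0}}   # query Y = X (assumption)
grid = [k / 100 for k in range(101)]       # candidate values of p_V(1)
costs = [leakage(p_x, p_y_given_x, {0: 1.0 - q, 1: q}) for q in grid]
best = min(range(len(grid)), key=costs.__getitem__)
```

With no noise (`grid[0]`), the cost equals the entropy of the private bit; the grid minimum lies strictly inside the simplex, and the sampled costs satisfy the midpoint inequality expected of a convex function.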
6 Boundedness, Chaos, and Synchronization
6.1 Existence, Uniqueness, and Boundedness of Solutions
We start by addressing existence, uniqueness, and boundedness of the solutions of the coupled systems (4)-(6). To be able to use synchronous solutions to generate realizations from $p_V^*(v)$, we first need these solutions to exist and be bounded. In the system description given above, we have assumed that the vector field of the responders is continuously differentiable in the state uniformly in the input, the output function is continuous in the state, and the vector field of the driver is locally Lipschitz. These alone imply uniqueness and existence of solutions of (4)-(6) over some finite time interval [41, Theorem 2.2]. To conclude the latter for arbitrarily large time intervals, besides the locally Lipschitz assumption on the functions, we need boundedness of the solutions of (4)-(6) [41, Theorem 2.4]. Note that the coupled systems (4)-(6) have a cascade structure. The driver dynamics is independent of the responders' states, and its output is the input of the responders. Then, an approach to conclude boundedness of the overall system is to conclude boundedness of the driver first, and then boundedness of the responders when driven by bounded inputs. In what follows, we formally introduce the notion of boundedness that we use here.
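**Definition 3.**
[41] *Let $z(t)$ denote the state of the driver (5) and $t_0$ the initial time instant. The solutions of (5) are bounded for a bounded set of initial conditions if there exists a positive constant $c$, independent of the initial time instant, and for every $a \in (0,c)$, there is $\beta = \beta(a) > 0$, independent of the initial time instant, such that $\|z(t_0)\| \leq a \Rightarrow \|z(t)\| \leq \beta$, $\forall\, t \geq t_0$. If the latter holds for arbitrarily large $a$; then, the solutions of (5) are globally bounded.*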
**Remark 3.**
*Because the output function of the driver is continuous, by the extreme value theorem, boundedness of the driver state implies boundedness of the driving signal.*
**Remark 4.**
*We do not give conditions for boundedness of the solutions of (5). It is assumed that the vector field is such that the solutions of the driver are globally bounded. We refer the reader to, for instance, [41, Theorem 4.18], where sufficient conditions for boundedness are given in terms of Lyapunov-like results. *
Next, for bounded solutions of the driver, we need the solutions of the responders to be bounded when driven by the driving signal. To address this, we use the notion of Input-to-State-Stability (ISS) [42].
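**Definition 4.**
[42] *System (4) (and thus system (6) as well) is said to be Input-to-State-Stable if there exist a class $\mathcal{KL}$ function $\beta$ and a class $\mathcal{K}$ function $\gamma$ such that for any initial condition $x_1(t_0)$ and any bounded input $u(t)$, the solution $x_1(t)$ exists for all $t \geq t_0$ and satisfies: $\|x_1(t)\| \leq \beta(\|x_1(t_0)\|, t - t_0) + \gamma(\|u\|_{\infty})$.*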
Remark 5**.**
ISS* of the responders with respect to guarantees that, for any bounded , the states and are bounded. Moreover, as increases, and are ultimately bounded [41] by , see [42] for further details. *
Remark 6**.**
*Sufficient conditions for the responders to be ISS with input are not provided here. We assume that the vector field is such that systems (4) and (5) are ISS with respect to . We refer the reader to, for instance, [41, Theorem 4.19], where sufficient conditions for ISS are given in terms of ISS-Lyapunov functions. *
Remark 7**.**
*The weaker property of integral Input-to-State-Stable (iISS) [43] could be used to conclude boundedness of the responder’s trajectories when driven by “sufficiently small” inputs. We refer the reader to [44], where sufficient conditions for *iISS * and related boundedness results are given. *
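As a worked illustration of Definition 4, consider a special case that is an assumption made here for concreteness and not a result stated in the text: a linear responder $\dot{x} = Ax + Bu$ with $A$ Hurwitz, where the symbols $A$, $B$, $c$, and $\lambda$ are hypothetical. If $\|e^{At}\| \le c\,e^{-\lambda t}$ for some $c, \lambda > 0$, the variation-of-constants formula gives an explicit ISS estimate:

```latex
\begin{aligned}
x(t) &= e^{A(t-t_0)}x(t_0) + \int_{t_0}^{t} e^{A(t-s)} B\,u(s)\,ds, \\
\|x(t)\| &\le c\,e^{-\lambda (t-t_0)}\,\|x(t_0)\|
          + c\,\|B\| \sup_{t_0 \le s \le t}\|u(s)\| \int_{t_0}^{t} e^{-\lambda (t-s)}\,ds \\
         &\le c\,e^{-\lambda (t-t_0)}\,\|x(t_0)\|
          + \frac{c\,\|B\|}{\lambda} \sup_{t_0 \le s \le t}\|u(s)\| .
\end{aligned}
```

The first term plays the role of the class $\mathcal{KL}$ function (it decays with time and grows with the initial condition), and the second the role of the class $\mathcal{K}$ function of the input bound. As time grows, the state is ultimately bounded by the second term alone, which is the behaviour described in Remark 5.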
6.2 Synchronization
Next, we give sufficient conditions on such that , i.e., the responders exponentially synchronize. We assume that solutions of the coupled systems (4)-(6) exist and are bounded, i.e., vector fields , , and satisfy the conditions stated in the previous subsection. Then, for bounded , a sufficient condition for the responders to exponentially synchronize is that systems (4) and (6) are convergent systems in the sense of Definition 2. The latter implies that, because both responders are driven by the input and their dynamics are described by the same set of differential equations, trajectories of (4) and (6) converge to the same limit solution, , and this solution is solely determined by and not by the initial conditions. In the following corollary of Proposition 1, we give a sufficient condition for the responders to be exponentially convergent (and thus to exponentially synchronize).
Corollary 1**.**
Consider the responders (4) and (6). If there exists a positive definite matrix such that, for all and , all the eigenvalues of the symmetric matrix:
[TABLE]
*are negative and separated from zero; then, responders (4) and (6) are globally exponentially convergent, and thus , exponentially fast. *
Remark 8**.**
*If the driver’s output is to be sent over a network and quantization (or some sort of coding) is required, we would need to drive responders by the same quantized , say , to achieve exponential synchronization. That is, if we quantize to obtain , and we drive both responders by (with, e.g., a Zero-Order-Hold (ZOH)), they would also exponentially synchronize. They would synchronize to a different trajectory than when driven by , but they would synchronize exponentially fast. *
Besides the notion of convergent systems, there are other methods available in the literature that can be used to verify that trajectories of responders asymptotically synchronize to a limit solution that is independent of the initial conditions. See Remark 1 for details.
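The eigenvalue condition of Corollary 1 can be probed numerically before attempting a proof. The sketch below assumes the symmetric matrix has the usual Demidovich form $\tfrac{1}{2}\big(P\,\partial f/\partial x + (\partial f/\partial x)^{\top} P\big)$; the matrix itself is not reproduced in this section, so that form is an assumption. The vector field, the choice $P = I$, and the sampling ranges are all hypothetical. Sampling only gives evidence, not a guarantee, since the corollary requires the bound for all states and inputs.

```python
import numpy as np

def jacobian(x, u):
    """Jacobian df/dx of the hypothetical responder field
    f(x, u) = [-2*x1 + sin(x2) + u, -3*x2 + tanh(x1)]."""
    return np.array([[-2.0, np.cos(x[1])],
                     [1.0 - np.tanh(x[0]) ** 2, -3.0]])

def worst_eigenvalue(jac, P, states, inputs):
    """Largest eigenvalue of 0.5*(P J + J^T P) over all sampled (x, u) pairs."""
    worst = -np.inf
    for x in states:
        for u in inputs:
            J = jac(x, u)
            S = 0.5 * (P @ J + J.T @ P)
            # eigvalsh returns eigenvalues in ascending order
            worst = max(worst, np.linalg.eigvalsh(S)[-1])
    return worst

rng = np.random.default_rng(0)
states = rng.uniform(-10.0, 10.0, size=(500, 2))   # sampled responder states
inputs = rng.uniform(-20.0, 20.0, size=10)         # sampled driver outputs
P = np.eye(2)                                      # candidate positive definite matrix

max_eig = worst_eigenvalue(jacobian, P, states, inputs)
print(f"largest sampled eigenvalue: {max_eig:.3f}")
```

For this particular field the off-diagonal entry of the symmetric part has magnitude at most one, so the largest eigenvalue can be bounded by hand by $-2.5 + \sqrt{1.25} < 0$, which is consistent with what the sampling reports.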
6.3 Chaotic Dynamics
There are mainly two branches of methods to identify chaotic dynamics; namely, standard largest Lyapunov exponent methods [12], and the more recent (0-1) test [13]. Both methods use trajectories (numerical or experimental) of the systems under study to decide whether they are chaotic or not. In general, there are no sufficient conditions directly on the differential equations (the vector fields and ) such that chaotic trajectories are guaranteed to occur. There are, however, many well known systems in the literature known to exhibit chaotic trajectories. For instance, the Lorenz system [45], Duffing [46] and van der Pol [47] oscillators, the Rössler [48] and Chua [49] systems, and neural oscillators [50] (e.g., the Hodgkin-Huxley, Morris-Lecar, Hindmarsh-Rose, and FitzHugh-Nagumo oscillators). We can use any of these chaotic systems (if they satisfy all the required extra conditions, see Section 6.4) as driver and then select a pair of responders with convergent dynamics. Indeed, we need to verify that the responders that we choose produce chaotic trajectories when driven by the chaotic driver. Moreover, to generate the pseudorandom realizations from (this is addressed in the next section), we need the chaotic trajectories of the responders, regarded as a random process, to be stationary, i.e., after transients have settled down, trajectories must follow a stationary probability distribution [39] which is independent of the initial conditions. The latter is a strong condition that is not satisfied for all chaotic systems. The existence of stationary distributions for chaotic trajectories has been proven for hyperbolic and quasi-hyperbolic (also called singular-hyperbolic) chaotic systems [15]. The definition of (quasi) hyperbolic dynamical systems [15, 51] is technical and not needed for the subsequent results. It requires concepts from differential topology that we prefer to omit here for readability of the manuscript. 
It suffices to know that the chaotic system that we use for the driver must lead to stationary distributions of the responders. This can be tested numerically by Monte Carlo simulations [21]. Moreover, there are many well-known chaotic systems with (quasi) hyperbolic dynamics in the literature, e.g., the Lorenz and Chua systems [52], neural oscillators [53], the many predator-prey-like systems given in [54, 55], and some mechanical nonlinear oscillators [56]. In the next subsection, we provide a synthesis procedure to choose the functions of the coupled systems (4)-(6) such that all the required conditions mentioned above are satisfied.
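The largest-Lyapunov-exponent test mentioned in this subsection can be sketched with the classical two-trajectory (Benettin-style) renormalization scheme. This is an illustrative sketch and not the procedure of [12]: the Lorenz parameters (10, 28, 8/3), the fixed-step RK4 integrator, the step size, and the run lengths are all assumptions. A clearly positive estimate is numerical evidence of chaos.

```python
import numpy as np

def lorenz(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Lorenz vector field with the classical (assumed) parameters."""
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def rk4_step(f, s, dt):
    """One fixed-step fourth-order Runge-Kutta step."""
    k1 = f(s)
    k2 = f(s + 0.5 * dt * k1)
    k3 = f(s + 0.5 * dt * k2)
    k4 = f(s + dt * k3)
    return s + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

def largest_lyapunov(f, s0, dt=0.01, n_transient=2000, n_steps=10000, d0=1e-8):
    """Average exponential growth rate of a small separation between two
    trajectories, renormalized to length d0 after every step."""
    s = np.asarray(s0, dtype=float)
    for _ in range(n_transient):            # discard the initial transient
        s = rk4_step(f, s, dt)
    offset = np.zeros_like(s)
    offset[0] = d0
    p = s + offset                          # perturbed companion trajectory
    log_sum = 0.0
    for _ in range(n_steps):
        s = rk4_step(f, s, dt)
        p = rk4_step(f, p, dt)
        d = np.linalg.norm(p - s)
        log_sum += np.log(d / d0)
        p = s + (p - s) * (d0 / d)          # rescale the separation back to d0
    return log_sum / (n_steps * dt)

lam = largest_lyapunov(lorenz, [1.0, 1.0, 1.0])
print(f"estimated largest Lyapunov exponent: {lam:.2f}")
```

The same routine can be pointed at the responder outputs by integrating the full cascade instead of the driver alone; the estimate depends on the run length, so it should be read as a finite-time approximation.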
6.4 General Guidelines
**Synthesis Procedure:**
1) Select a driver dynamics (5) (i.e., the vector field ) known to be chaotic and (quasi) hyperbolic (e.g., systems in [52]-[56]).
2) Verify that the corresponding is locally Lipschitz and the trajectories of the driver are globally bounded, in the sense of Definition 3, using, e.g., [41, Theorem 4.18].
3) In (5), let , , and , , i.e., fix the output of the driver to be any state of (5). In doing this, we ensure that is continuous, bounded, chaotic, and (quasi) hyperbolic.
4) For the responders (4) and (6), select any continuously differentiable vector field (with respect to ) leading to ISS dynamics, see Remark 6, and satisfying the conditions for convergence in Corollary 1, e.g., , for any matrix with spectral radius and differentiable vector field . Then, we ensure that the responders have bounded trajectories and exponentially synchronize.
5) Verify that the trajectories of the responders, when driven by the chaotic driver, are chaotic (using Lyapunov exponents or the (0-1) test) and, after transients have settled down, lead to a stationary probability distribution independent of the initial conditions. See Section 6.3 for details.
6) In (4) (and respectively in (6)), let , , and , , i.e., fix the output of the responders to be any state of (4) and (6), respectively. Indeed, we need the same for both responders, i.e., and . In doing this, we ensure that and are continuous, bounded, chaotic, and lead to stationary probability distributions.
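The procedure above can be tried out on a small cascade. The sketch below is a minimal illustration under stated assumptions, not the configuration used later in the paper: the driver is a Lorenz system with the classical parameters, its first state is taken as the output, and the two responders share a hypothetical Hurwitz matrix `A` and a hypothetical input map `g`. Because both responders see the same input, their difference obeys a linear, exponentially stable equation, so the synchronization error should decay exponentially from any pair of initial conditions.

```python
import numpy as np

def lorenz(s, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Driver: Lorenz vector field with classical (assumed) parameters."""
    x, y, z = s
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

A = np.diag([-1.0, -2.0])                   # hypothetical Hurwitz responder matrix

def g(y):
    """Hypothetical differentiable input map of the responders."""
    return np.array([y, np.tanh(y)])

def coupled(w):
    """Cascade: driver state w[0:3], responder states w[3:5] and w[5:7]."""
    d, x1, x2 = w[:3], w[3:5], w[5:7]
    y = d[0]                                # driver output = first driver state
    return np.concatenate([lorenz(d), A @ x1 + g(y), A @ x2 + g(y)])

def rk4_step(f, s, dt):
    """One fixed-step fourth-order Runge-Kutta step."""
    k1 = f(s)
    k2 = f(s + 0.5 * dt * k1)
    k3 = f(s + 0.5 * dt * k2)
    k4 = f(s + dt * k3)
    return s + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

dt, n_steps = 0.01, 3000
# Responders start far apart: (5, -5) versus (-5, 5).
w = np.array([1.0, 1.0, 1.0, 5.0, -5.0, -5.0, 5.0])
err = np.empty(n_steps)
for k in range(n_steps):
    w = rk4_step(coupled, w, dt)
    err[k] = np.linalg.norm(w[3:5] - w[5:7])   # synchronization error

print(f"error after first step: {err[0]:.2f}, final error: {err[-1]:.2e}")
```

Step 5 of the procedure (chaos and stationarity of the responder trajectories) is not checked by this sketch and would still have to be verified separately.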
7 Generation of Optimal Pseudorandom Numbers
In this section, we assume that the driver and the responders dynamics have been designed following the general guidelines in Section 6.4. Then, for sufficiently large , the chaotic trajectories of the responders are practically synchronized, i.e., for any finite , there is , such that and , for all , where denotes the asymptotic synchronous solution for some compact set ; and samples from follow a stationary probability distribution. Here, we assume that the responders have been operating for sufficiently large time such that the synchronization error, , is so small that trajectories of the responders can be assumed to be equal to (see Remark 2), i.e., is sufficiently large so that is practically zero. In Section 7.1, we quantify the worst-case distortion induced by assuming in finite time. In particular, we give an upper bound on the mean squared error , where denotes the estimate of realizations of using , , and the algorithm provided below. In the remainder of this section, we assume . Note that the sample space of , regarded as a random process, is some compact set , i.e., the sample space is a subset of the real line and thus samples from follow some stationary probability density function (pdf), say , for some virtual continuous random variable . That is, for , define the sampled sequence for sampling time-instants , , , and sampling period ; then, for all . Because we know the dynamics (4)-(6), we can obtain by Monte Carlo simulations [21]. If we know , we can always find a set of cells , , , such that , , and for and . In other words, using the pdf , we can select the cells so that the probability that lies in the cells equals the optimal probability distribution . It follows that we can generate pseudorandom realizations from by properly selecting . Note that, because realizations are being generated by a deterministic process, there would be high correlation between consecutive realizations for small sampling period . 
However, because the is a stationary process (see Section 6.3), the larger the , the smaller the correlation between and for all . Indeed, large would introduce large time-delays for generating realizations. There is a trade-off between correlation and time-delay that should be taken into account in practice. One way to deal with this trade-off is to compute the normalized autocorrelation function [15, 20] of . Then, we select the smallest time-delay that leads to a desired correlation between and , , and use the delayed sequence to generate realizations from . In the following algorithm, we summarize the ideas introduced above.
**Algorithm 1: Pseudorandom Number Generation:**
1) Consider the probability mass function , , solution to Problem 1; and the synchronous solution of the responders.
2) Fix the sampling period and obtain, by Monte Carlo simulations [21], the probability density function of the sampled sequence , , .
3) Select a finite set of cells , , , such that , , and for all .
4) Generate realizations from using the piecewise function:
[TABLE]
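The steps of Algorithm 1, together with the delay selection based on the normalized autocorrelation discussed before it, can be sketched as follows. Several things here are assumptions made for illustration: the sampled responder output is replaced by a surrogate correlated stationary sequence (an AR(1) process), the target distribution `p_star` is made up, the cells are obtained from empirical quantiles instead of an estimated density, and the rule "first lag at which the absolute correlation falls below a threshold" is one possible reading of the selection criterion.

```python
import numpy as np

def normalized_autocorr(x, max_lag):
    """Normalized autocorrelation of x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

def smallest_lag_below(x, threshold, max_lag=200):
    """First lag whose absolute correlation is at most `threshold` (or None)."""
    rho = normalized_autocorr(x, max_lag)
    hits = np.flatnonzero(np.abs(rho) <= threshold)
    return int(hits[0]) if hits.size else None

def cell_edges(samples, pmf):
    """Interior cell boundaries such that P(sample in cell i) ~ pmf[i]."""
    return np.quantile(samples, np.cumsum(pmf)[:-1])

def realize(samples, edges):
    """Piecewise map: index of the cell that contains each sample."""
    return np.searchsorted(edges, samples, side="right")

# Surrogate for the sampled synchronous trajectory (assumption, not chaos).
rng = np.random.default_rng(0)
a, n = 0.9, 200_000
noise = rng.standard_normal(n)
zeta = np.empty(n)
zeta[0] = noise[0]
for k in range(1, n):
    zeta[k] = a * zeta[k - 1] + noise[k]

p_star = np.array([0.5, 0.3, 0.2])            # hypothetical optimal pmf
lag = smallest_lag_below(zeta, threshold=0.1) # delay between used samples
edges = cell_edges(zeta, p_star)              # step 3: choose the cells
levels = realize(zeta[::lag], edges)          # step 4: generate realizations
freq = np.bincount(levels, minlength=len(p_star)) / len(levels)
print(lag, edges, freq)
```

Both sides of the channel would run `realize` with the same `edges` on their own (synchronized) samples; identical samples necessarily give identical realizations because the map is deterministic.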
7.1 Distortion Induced by Synchronization Errors
Algorithm 1 in Section 7 is constructed under the assumption that responders are perfectly synchronized. However, because we only have exponential synchronization, in finite time, there is always a “small” difference between and due to potentially different initial conditions. It follows that there is also a difference between realizations generated using , denoted as , and realizations generated through , where . Exponential synchronization implies that for any finite , there is (denoted as for simplicity), parametrized by and the initial synchronization error , such that for all , and . Consider the cell , , with end points and , , the length of is defined as . If (or ), . Without loss of generality, let , , and . Note that, if , and are at most one level apart from each other, e.g., if , then either or ; and if , then , , or . It follows that , , is of the form depicted in Figure 3, where denotes the estimate of realizations of using , , and Algorithm 1. Similarly, if , and are at most two levels apart from each other and thus lead to a different structure of the transition probabilities. Here, we only consider the case where . Distortion induced by larger synchronization errors can be estimated following the same methods. Note that, because responders synchronize exponentially, as (), for , and , for , for all . That is, distortion due to synchronization errors disappears exponentially fast. The actual values of the transition probabilities depend on the responders' and driver's dynamics, the initial conditions, and the cells . However, we do not need these probabilities; only the structure of depicted in Figure 3 is used to derive an upper bound on the expected distortion. Let denote the set of pairs for which there is a nonzero transition probability between and , , as depicted in Figure 3. The set is parametrized by the upper bound on the synchronization error . Define the distortion function .
The function is a deterministic function of two jointly distributed random vectors, and , with joint distribution . Hence, see [39] for details, we can write the expected distortion as follows
[TABLE]
where the left-hand side of (11) follows from the definition of above, and the last inequality from the fact that for all . The constant provides an upper bound on the worst-case distortion induced by a synchronization error. Moreover, as , ; therefore, . That is, distortion due to synchronization errors is bounded by and vanishes exponentially fast.
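The structural argument of this subsection can be checked numerically: when two signals differ by less than the shortest bounded cell, the levels assigned to them differ by at most one, and the resulting mean squared error shrinks with the synchronization error. In the sketch below the cell boundaries, the noise alphabet, the signal range, and the error model (a uniform perturbation of size `eps`) are all hypothetical, and the squared-error distortion is an assumed stand-in for the distortion function defined in the text.

```python
import numpy as np

edges = np.array([-1.0, 0.0, 1.5])            # hypothetical boundaries (4 cells)
alphabet = np.array([0.0, 1.0, 2.0, 3.0])     # hypothetical noise value per cell
min_len = np.diff(edges).min()                # shortest bounded cell (1.0)
max_gap = np.abs(np.diff(alphabet)).max()     # largest gap between adjacent values

def levels(z):
    """Index of the cell that contains each sample."""
    return np.searchsorted(edges, z, side="right")

rng = np.random.default_rng(1)
z1 = rng.uniform(-3.0, 3.0, size=100_000)     # samples seen by one responder

jumps, rates, mses = [], [], []
for eps in (0.4, 0.04, 0.004):                # all below min_len
    z2 = z1 + rng.uniform(-eps, eps, size=z1.size)   # other responder's samples
    i1, i2 = levels(z1), levels(z2)
    jumps.append(int(np.abs(i1 - i2).max()))         # worst level difference
    rates.append(float(np.mean(i1 != i2)))           # fraction of mismatches
    mses.append(float(np.mean((alphabet[i1] - alphabet[i2]) ** 2)))

for eps, j, r, m in zip((0.4, 0.04, 0.004), jumps, rates, mses):
    print(f"eps={eps}: max level jump={j}, mismatch rate={r:.4f}, MSE={m:.4f}")
```

Since mismatched levels are adjacent, each squared error is at most `max_gap**2`, so the mean squared error is bounded by `max_gap**2` times the mismatch rate, which mirrors the kind of worst-case bound derived above.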
8 Simulation Results
We next present an evaluation of our algorithms on real data. We use the adult-dataset, available from the UCI Machine Learning Repository [57], which contains census data. Each attribute within the dataset has entries. We use three of these attributes: race, sex, and income, which take values on finite discrete sets. We let race and sex be the private information, , and use income as the information requested by the query, . The probability mass functions of and , and part of the one of are given in Table 1. In Figure 4, we depict , , and with mass points indexed in the order given in Table 1. We first compute the optimal distribution of the distorting additive noise . We solve the convex program (8) in Theorem 1. The optimal distribution is depicted in Figure 5 and the corresponding numerical values are given in Table 2. This leads to while the mutual information without distortion is , i.e., according to our metric, by optimally distorting the query, we leak about ten times less information. To generate realizations from this distribution at both sides of the channel, we use trajectories of two chaotic responders as introduced in Section 2. We use the synthesis procedure in Section 6.4 to select suitable driver and responders. As driver (5), we use the Lorenz system:
[TABLE]
with states and driving signal . The Lorenz system produces bounded trajectories [58], and is known to be chaotic and quasi-hyperbolic [52]. For the responders and , we let , with and . Because is diagonal and has negative eigenvalues, responders satisfy the conditions of Corollary 1 with ; hence, they are convergent systems and thus exponentially synchronize when driven by the same input . Moreover, since responders are linear in and is Hurwitz, systems can be proved to be ISS with input [42]. Because is bounded and is continuous, by the extreme value theorem, is bounded, which, together with ISS, imply boundedness of the responders’ trajectories [42]. We let the outputs of the responders be and (their second state). In Figure 6, we show traces of the chaotic driver and responders trajectories obtained by computer simulations (using Matlab from Mathworks), and in Figure 7, we plot the synchronization error between the outputs of the responders. We initialized the responders in antiphase , and far from the limit trajectory. Note, in Figure 7, that responders synchronize exponentially and are practically synchronized for . Moreover, after , the synchronization error is within Matlab’s precision (). Because the Lorenz system is quasi-hyperbolic, samples from the driving signal follow a stationary distribution that is independent of the initial conditions of the driver, see Section 6.3. Then, according to the synthesis procedure in Section 6.4, we next verify, using Monte Carlo simulations, that samples (see Section 7), from the synchronous trajectory, , are also stationary. To do so, we compute the probability density function , , for different initial conditions and verify that all of them lead to the same density. In Figure 8, we depict probability densities of for twenty different initial conditions, sampling instants , , and . Note that they all lead to the same density . The support (obtained numerically) of is given . 
Finally, we use the piecewise function (10) to generate realizations from using samples, , from the synchronous trajectory. Following the algorithm given in Section 7, we have to divide the support of into a set of partitions , such that the probability that lies in the cells equals the optimal probability distribution . This can be done using the empirical Cumulative Distribution Function (CDF), , corresponding to . We depict this CDF in Figure 9. Then, we simply select the cells such that for all , (the cardinality of the alphabet of ). For this CDF and in Table 2, we obtain the following cells:
[TABLE]
In Figure 10, we show realizations generated by the piecewise function (10) at both sides of the channel, and the corresponding probability mass functions. To generate these realizations, at the trusted server, we use samples from and, at the remote station, we sample . Note that, as expected, all samples are perfectly synchronized and their probability mass functions are equal to in Figure 5.
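The leakage comparison reported at the start of this section (mutual information with and without the additive noise) can be reproduced in miniature. The sketch below is illustrative only: the joint distribution `p_xy` and the noise distribution `p_v` are made-up numbers, not the values of Tables 1 and 2, the alphabets are assumed to be the integers 0, 1, 2, ... with ordinary addition, and mutual information is measured in bits. The noise is assumed independent of the private and queried variables.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information (bits) of a joint pmf given as a matrix joint[a, b]."""
    joint = np.asarray(joint, dtype=float)
    pa = joint.sum(axis=1, keepdims=True)     # marginal of the row variable
    pb = joint.sum(axis=0, keepdims=True)     # marginal of the column variable
    mask = joint > 0                          # skip zero-probability pairs
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (pa @ pb)[mask])))

def distorted_joint(p_xy, p_v):
    """Joint pmf of (X, Z) with Z = Y + V and V independent of (X, Y)."""
    p_xy = np.asarray(p_xy, dtype=float)
    nx, ny = p_xy.shape
    p_xz = np.zeros((nx, ny + len(p_v) - 1))
    for v, pv in enumerate(p_v):
        p_xz[:, v:v + ny] += pv * p_xy        # shift the query by the noise value v
    return p_xz

# Hypothetical joint pmf of private data X (rows) and query Y (columns).
p_xy = np.array([[0.30, 0.10, 0.05],
                 [0.05, 0.15, 0.35]])
p_v = np.array([0.5, 0.3, 0.2])               # hypothetical noise pmf

leak_plain = mutual_information(p_xy)
leak_noisy = mutual_information(distorted_joint(p_xy, p_v))
print(f"I[X;Y] = {leak_plain:.4f} bits, I[X;Y+V] = {leak_noisy:.4f} bits")
```

Because the distorted query depends on the private data only through the original query, the data processing inequality guarantees that the second value never exceeds the first; how much smaller it gets depends on the choice of `p_v`, which is what the convex program optimizes.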
9 Conclusions
Using an information-theoretic privacy metric (mutual information), we have provided a general privacy framework based on additive distorting random vectors and exponential synchronization of chaotic systems. The synthesis of the optimal probability distribution, , of the additive distorting vector has been posed as a convex program in . We have provided an algorithm for generating pseudorandom realizations from this distribution using trajectories of chaotic oscillators. To generate equal realizations at both sides of the channel, we have induced exponential synchronization on two chaotic oscillators (one at each side of the channel), and used their trajectories and the proposed algorithm to generate realizations. However, exponential synchronization implies that, in finite time, there is always a small error between trajectories (and thus also between realizations). We have derived an upper bound on the worst-case distortion induced by finite-time synchronization errors and showed that this distortion disappears exponentially fast. Using off-the-shelf results in the literature, we have provided general guidelines for selecting the dynamics of the responders and driver so that our algorithm for generating synchronized realizations from is guaranteed to work. We have presented simulation results to illustrate the proposed method.
References
- [1] S. R. Rajagopalan, L. Sankar, S. Mohajer, and H. V. Poor, “Smart meter privacy: A utility-privacy framework,” in 2011 IEEE International Conference on Smart Grid Communications (Smart Grid Comm), 2011, pp. 190–195.
- [2] O. Tan, D. Gunduz, and H. V. Poor, “Increasing smart meter privacy through energy harvesting and storage devices,” IEEE Journal on Selected Areas in Communications, vol. 31, pp. 1331–1341, 2013.
- [3] Z. Huang, Y. Wang, S. Mitra, and G. E. Dullerud, “On the cost of differential privacy in distributed control systems,” in Proceedings of the 3rd International Conference on High Confidence Networked Systems, 2014, pp. 105–114.
- [4] B. Hoh, M. Gruteser, H. Xiong, and A. Alrabady, “Enhancing security and privacy in traffic-monitoring systems,” IEEE Pervasive Computing, vol. 5, pp. 38–46, 2006.
- [5] R. H. Weber, “Internet of things – new security and privacy challenges,” Computer Law and Security Review, vol. 26, pp. 23–30, 2010.
- [6] S. Han, U. Topcu, and G. J. Pappas, “Differentially private convex optimization with piecewise affine objectives,” in 53rd IEEE Conference on Decision and Control, 2014.
- [7] J. Soria-Comas and J. Domingo-Ferrer, “Optimal data-independent noise for differential privacy,” Information Sciences, vol. 250, pp. 200–214, 2013.
- [8] Q. Geng and P. Viswanath, “The optimal mechanism in differential privacy,” in 2014 IEEE International Symposium on Information Theory, 2014, pp. 2371–2375.
