Interpretable Encrypted Searchable Neural Networks
Kai Chen, Zhongrui Lin, Jian Wan, Chungen Xu

TL;DR
This paper introduces Interpretable Encrypted Searchable Neural Networks (IESNN), combining machine learning and encryption to enable efficient, dynamic, and interpretable search in cloud environments with reduced computational and communication costs.
Contribution
The paper presents a novel IESNN framework that uses probabilistic learning and adversarial training for encrypted search, improving efficiency and interpretability over traditional searchable encryption methods.
Findings
Query complexity reduced to approximately O(log N)
Lower computational and communication overhead
Enhanced adaptability with automatic weight updates
Taxonomy
Topics: Cryptography and Data Security · Privacy-Preserving Technologies in Data · Cloud Data Security Solutions
1 School of Science, Nanjing University of Science and Technology, Nanjing, CHN
2 School of Computer Science and Engineering, NJUST, Nanjing, CHN
Email: {kaichen,zhongruilin,wanjian,xuchung}@njust.edu.cn
Kai Chen (1), Zhongrui Lin (2), Jian Wan (2), Chungen Xu (1)
Abstract
In cloud security, traditional searchable encryption (SE) requires high computation and communication overhead for dynamic search and update. Combining machine learning (ML) with SE may be a new way to solve this problem. This paper proposes interpretable encrypted searchable neural networks (IESNN) to explore probabilistic query, balanced index tree construction and automatic weight update in an encrypted cloud environment. In IESNN, probabilistic learning is used to obtain the search ranking for the searchable index, and probabilistic query is performed on the ciphertext index, which reduces the computational complexity of the query significantly. In contrast to traditional SE, adversarial learning and automatic weight update are proposed to respond to users’ timely queries of the latest data set without expensive communication overhead. The proposed IESNN performs better than previous works, bringing the query complexity closer to O(log N) and introducing low overhead on computation and communication.
Keywords:
Searchable Encryption · Searchable Neural Networks · Probabilistic Learning · Adversarial Learning · Automatic Weight Update
1 Introduction
The frequent and massive disclosure of private data has drawn growing public attention to cyberspace security. Meanwhile, with the rapid development of cloud computing, cloud storage services are increasingly attracting individuals and enterprises to outsource data to cloud servers. Unfortunately, outsourcing data to a cloud server may compromise data privacy [10, 14]. In cloud security, searchable encryption (SE) has received widespread attention because it protects the privacy of outsourced data and prevents sensitive information from leaking [5]. However, traditional SE [1, 3, 6, 8, 9, 10, 12, 13, 14] requires high computation and communication overhead to enable dynamic search and dynamic update, so it still cannot adequately satisfy user experience and the requirements of practical applications. Machine learning (ML) can provide intelligent and efficient means, yet current popular ML supports only plaintext training data and cannot satisfy the special requirements of encrypted cloud data. Therefore, it is necessary to discuss the cross-fusion problem of ML and SE, and to introduce intelligence and high efficiency into SE.
SE has been continuously developed since it was proposed [8], and multi-keyword ranked search schemes are recognized as particularly effective [5]. Cao et al. [1] first discussed privacy-preserving multi-keyword ranked search over encrypted cloud data (MRSE) for the single data owner model and established strict privacy requirements. They were the first to use asymmetric scalar-product preserving encryption (ASPE) [12] to obtain the similarity score of the query vector and the index vector. In this way, the cloud server can retrieve the top-k documents that are most relevant to the data user’s query request. However, since matrix operations incur high computation overhead, MRSE is not suitable for practical application scenarios. To manage the keyword dictionary dynamically and improve system performance, Li et al. [6] proposed efficient multi-keyword ranked query over encrypted data in cloud computing (MKQE) based on MRSE, which has a low-overhead index construction algorithm and a novel trapdoor generation algorithm. However, it still makes no major breakthrough in search efficiency when the data set is large. To achieve dynamic search, Xia et al. [13] provided a secure and dynamic multi-keyword ranked search scheme over encrypted cloud data (EDMRS) that supports dynamic operations in SE. For tree-based index structures, search efficiency is improved by the greedy depth-first search (GDFS) algorithm and parallel computing. Regrettably, the search efficiency of the ordinary balanced binary tree they used gradually decreases and tends towards that of linear search when migrating to the multiple data owners model with a large amount of differential data. Moreover, maintaining such an index tree is neither flexible nor efficient. Guo et al. [3] discussed secure multi-keyword ranked search over encrypted cloud data for the multiple data owners model (MKRS_MO) and designed a heuristic weight generation algorithm based on the relationships among keywords, documents and owners (KDO). 
They considered the correlation among documents and the impact of document quality on search results. Experiments on a real-world data set showed that MKRS_MO is better than schemes using the traditional keyword weight model [9]. However, calculating index similarity in MKRS_MO may lead to the “curse of dimensionality”, which limits the availability of the system. Last but not least, they did not address security in the known background model [1] (a threat model that measures the ability of the cloud server to evaluate private data and the risk of revealing private information in an SE system).
For the first time, this paper proposes interpretable encrypted searchable neural networks (IESNN) to explore intelligent SE. Based on the neural network, we propose a sorting network and employ probabilistic learning to obtain the query ranking for the encrypted searchable index. Specifically, the sorting network first performs a sufficient number of random queries (drawn from a uniform distribution) and then calculates, for each index vector, the sum of its inner products with all random query vectors. Finally, it sorts the index vectors by match score from high to low. The probabilistic ranking of the index is therefore close to the ranking in the actual query, which reduces the computational complexity of the query significantly. Moreover, probabilistic query with computational complexity close to O(log N) is used to retrieve the top-k documents. To achieve secure weight update without revealing private information to the “semi-trusted” cloud server [10, 14], we propose a searching adversarial network and a weight update network in an encrypted cloud environment. Specifically, to respond to users’ timely queries of the latest data set, we employ adversarial learning [2] and optimal game equilibrium to make the probabilistic ranking of the index close to its popular ranking. Furthermore, we combine a backpropagation neural network [4] with a discrete Hopfield neural network [7] to enable automatic weight update. It is worth mentioning that the update operations are done in the cloud, so there is no expensive communication overhead. IESNN can therefore be used for model training and intelligent system implementation. On the one hand, it introduces intelligence into the SE system, which improves user experience and reduces system overhead. On the other hand, training data for ML can be derived from ciphertext. 
This means that data mining based on ciphertext analysis can not only obtain results consistent with plaintext analysis but also strengthen data privacy protection.
Our main contributions are summarized as follows:
(1) Towards intelligent SE by combining popular ML with traditional SE effectively;
(2) We employ probabilistic learning method to achieve maximum likelihood searching and improve search efficiency significantly;
(3) We use IESNN to implement flexible dynamic operation and maintenance in an encrypted cloud environment.
The remainder of this paper is organized as follows: Section 2 describes the SE model. Section 3 describes the details of IESNN and its performance tests. Section 4 discusses our solution and its implications.
2 Searchable Encryption Model
2.1 System Model
The system model proposed in this paper consists of three parties, as depicted in Fig. 1 and described below:
Data owners:
are responsible for building the searchable index and the original IESNN, encrypting the data and sending them to the cloud server.
Data users:
are consumers of cloud services. Once the license is granted, they can retrieve the encrypted cloud data.
Cloud server:
is considered “semi-trusted” in SE [10, 14]. It provides cloud services, including running authorized access controls, performing searches over encrypted cloud data based on query requests, returning top-k documents to data users and enabling dynamic operation and maintenance with IESNN.
2.2 System Framework
Setup:
Based on privacy requirements in known background model [1], determines the size of dictionary , the number of pseudo-keyword, sets the parameter . For all data owners = {,…,}, we have = {,…,}, = {,…,}, = {,…,}.
KeyGen():
generate secret key = {,…,}, where = {, , }, and are two invertible matrices with the dimension and is a random -length vector.
Extended-KeyGen():
For dynamic search [6], if new keywords are added into the -th dictionary , generates a new = {, , }, two invertible matrices and with the dimension , and a new -length vector .
BuildIndex():
In order to reduce the possibility that “semi-trusted” cloud server [10, 14] evaluates the private data successfully, first build searchable indexes for documents and obtain the weighted index vectors, and then fill index vectors with random pseudo-keywords (obey Gaussian distribution) and obtain secure index vectors with high privacy protection strength [1]. Finally they use secure index vectors to build IESNN () and send to (specific example: “splits” index vector into two random vectors . Specifically, if , = = ; else if , is a random value, . encrypts as = with ).
Trapdoor():
send query request (query keywords and ) to . generate query = {,…,} (where is a weighted vector with dimension ) and calculate the trapdoor = {,…,} using and send to (specific example: “splits” query vector into two random vectors . Specifically, if , is a random value, and ; else if , . Finally, encrypts as = with ).
Query():
send trapdoors, query instruction and attribute identification to . performs searches based on the query, and returns top-k documents to .
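The BuildIndex and Trapdoor steps above describe a vector "split" followed by matrix encryption, which matches the ASPE construction used in MRSE [1]. The following is a minimal sketch under that assumption, not the authors' implementation. The names `keygen`, `split`, `encrypt_index`, `trapdoor` and `secure_score` are hypothetical, the symbols `S`, `M1`, `M2` are assumed labels for the random vector and the two invertible matrices, and pseudo-keyword padding is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_invertible(n):
    """Draw Gaussian matrices until one is comfortably invertible."""
    while True:
        M = rng.normal(size=(n, n))
        if np.linalg.cond(M) < 1e6:
            return M

def keygen(n):
    """Assumed key layout: a random bit vector S and two invertible n x n matrices."""
    S = rng.integers(0, 2, size=n)
    return S, random_invertible(n), random_invertible(n)

def split(v, S, random_where):
    """Split v into (a, b): a + b == v where S == random_where, a == b == v elsewhere."""
    v = np.asarray(v, dtype=float)
    a, b = v.copy(), v.copy()
    mask = S == random_where
    a[mask] = rng.normal(size=int(mask.sum()))
    b[mask] = v[mask] - a[mask]
    return a, b

def encrypt_index(I, key):
    """Index vectors are split randomly where S[j] == 1, then multiplied by M^T."""
    S, M1, M2 = key
    I1, I2 = split(I, S, random_where=1)
    return M1.T @ I1, M2.T @ I2

def trapdoor(Q, key):
    """Query vectors are split randomly where S[j] == 0, then multiplied by M^-1."""
    S, M1, M2 = key
    Q1, Q2 = split(Q, S, random_where=0)
    return np.linalg.inv(M1) @ Q1, np.linalg.inv(M2) @ Q2

def secure_score(enc_index, td):
    """Inner product of ciphertexts; equals the plaintext inner product I . Q."""
    return enc_index[0] @ td[0] + enc_index[1] @ td[1]

key = keygen(6)
I = rng.random(6)
Q = rng.random(6)
score = secure_score(encrypt_index(I, key), trapdoor(Q, key))
print(np.isclose(score, I @ Q))
```

The two splits are complementary, so every coordinate contributes exactly `I[j] * Q[j]` to the ciphertext score, and the server can rank documents without seeing either plaintext vector.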
3 Interpretable Encrypted Searchable Neural Networks
3.1 Maximum Likelihood Searching
We employ inner product similarity [11] to quantitatively evaluate the similarity between the query vector and the index vector. As illustrated in Fig. 2 (which shows the unencrypted network for an intuitive understanding), the sorting network performs a sufficient number of random queries (drawn from a uniform distribution) and then calculates, for each index vector, the sum of its inner products with all random query vectors using formula 1. Finally, it sorts the index vectors by match score from high to low. The index ranking obtained by probabilistic learning is therefore close to the ranking in the actual query.
[TABLE]
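The sorting step described above can be sketched on plaintext vectors as follows. This is an illustration only: the function name `probabilistic_ranking`, the number of random queries and the uniform range [0, 1) are assumptions, since the exact parameters of the distribution are not recoverable from the text.

```python
import random

def probabilistic_ranking(index_vectors, num_queries=1000, seed=0):
    """Rank index vectors by their summed inner product with random queries."""
    rnd = random.Random(seed)
    dim = len(index_vectors[0])
    scores = [0.0] * len(index_vectors)
    for _ in range(num_queries):
        # One random query, each component uniform on [0, 1) (assumed range).
        q = [rnd.random() for _ in range(dim)]
        for i, vec in enumerate(index_vectors):
            scores[i] += sum(w * x for w, x in zip(vec, q))
    # Highest accumulated match score first.
    order = sorted(range(len(index_vectors)), key=lambda i: scores[i], reverse=True)
    return order, scores

index_vectors = [
    [0.1, 0.2, 0.0],
    [0.9, 0.8, 0.7],
    [0.4, 0.5, 0.3],
]
order, scores = probabilistic_ranking(index_vectors)
print(order)  # [1, 2, 0]
```

In this toy input each vector dominates the next one in every component, so the ranking is the same for any non-negative random queries. On real index vectors the ranking is only an estimate of how often each index scores highly.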
We implement the proposed scheme in Python on the Windows 10 operating system with an Intel Core i5 2.40 GHz processor and test its efficiency on a real-world document set (IEEE INFOCOM publications, including 400 papers and 2,000 keywords). The probabilistic query algorithm, based on the probabilistic ranking of the encrypted searchable index, brings the query complexity closer to O(log N). As shown in Fig. 3, when retrieving the same number of top-k documents, probabilistic query performs better than related works based on tree search [3, 13] and matrix operation [1, 6]. Because the ordered property of a balanced binary tree is not guaranteed in the index tree, and a query based on matrix operations must traverse all indexes to retrieve the top-k documents, the number of indexes these schemes retrieve is far larger than the number of documents they return.
3.2 Adversarial Learning
The adversarial network takes effect when the probabilistic ranking of the index deviates from the index ranking in the actual query result. As shown in Fig. 4, it employs optimal game equilibrium to bring the probabilistic ranking of the index close to its popular ranking (described by formula 2 in terms of the probability distributions of the index ranking and of the query result).
[TABLE]
Searching adversarial networks (SAN) are inspired by generative adversarial networks (GAN) [2] and self-attention generative adversarial networks (SAGAN) [15], but unlike GAN and SAGAN they do not require complex gradient calculations or extensive iterative training. In fact, SAN require only simple residual calculations and index ranking shifts. Specifically, after a query is completed, the ranked search result list is fed back to the adversarial network in SAN. The adversarial network calculates the residual between the weights before and after the change for the indexes corresponding to the top-k documents, and calculates the relative shift between the index ranking of the feedback result (i.e. the new index ranking) and the original index ranking. Finally, SAN send the weight residuals and the index ranking changes to the weight update network as the target for the index update (see Fig. 5 for details).
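The residual and ranking-shift calculation described above can be sketched as follows. The function name `san_feedback`, the dictionary layout and the sign convention for the shift are hypothetical choices made for illustration, and the sketch works on plaintext weights rather than ciphertext.

```python
def san_feedback(old_weights, new_weights, old_ranking, new_ranking):
    """Return {index_id: (weight_residual, rank_shift)} for the fed-back top-k indexes.

    rank_shift > 0 means the index moved up relative to the original ranking.
    """
    old_pos = {idx: pos for pos, idx in enumerate(old_ranking)}
    feedback = {}
    for new_pos, idx in enumerate(new_ranking):
        residual = new_weights[idx] - old_weights[idx]
        shift = old_pos[idx] - new_pos
        feedback[idx] = (residual, shift)
    return feedback

old_ranking = ["d1", "d2", "d3"]
old_weights = {"d1": 0.5, "d2": 0.375, "d3": 0.25}
# Hypothetical feedback from a top-2 query result.
new_ranking = ["d3", "d1"]
new_weights = {"d3": 0.75, "d1": 0.5}

feedback = san_feedback(old_weights, new_weights, old_ranking, new_ranking)
print(feedback)  # {'d3': (0.5, 2), 'd1': (0.0, -1)}
```

Only the indexes that appear in the feedback list are touched, which is consistent with the text's point that no gradient computation or iterative training is involved.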
3.3 Automatic Weight Update
As illustrated in Fig. 5, in order to achieve automatic weight update and respond to users’ queries for the latest data sets in a timely manner, the weight update network (WUN) combines a backpropagation neural network (BPNN) [4] with a discrete Hopfield neural network (DHNN) [7]. In WUN, the update of index weights uses vector and matrix operations to approximate the actual increments, which exhibits local homomorphism between ciphertext operations and plaintext operations. For instance, consider an index vector, a query vector and two invertible matrices. The update principle of the ciphertext index is as follows:
Matrix and vector multiplication: Secure inner product calculation: Index vector update: Inner product approximation:
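The homomorphic behaviour mentioned above can be checked numerically. The sketch below is an assumption-laden illustration: it uses a single invertible matrix `M` instead of the split two-matrix form, it treats the update as an additive increment `delta` on the index vector, and it does not model which party supplies the encrypted increment. The names `encrypt`, `make_trapdoor` and `update_ciphertext` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

# Hypothetical secret matrix; the diagonal shift keeps it well conditioned in practice.
M = rng.normal(size=(n, n)) + n * np.eye(n)
M_inv = np.linalg.inv(M)

def encrypt(v):
    """Matrix and vector multiplication: ciphertext index is M^T v."""
    return M.T @ v

def make_trapdoor(q):
    """Ciphertext query is M^-1 q."""
    return M_inv @ q

def update_ciphertext(enc_index, enc_delta):
    """Index vector update carried out directly on ciphertext."""
    return enc_index + enc_delta

index = rng.random(n)
query = rng.random(n)
delta = np.zeros(n)
delta[2] = 0.3  # change the weight of one keyword

enc_index = encrypt(index)
td = make_trapdoor(query)

# Secure inner product before the update equals the plaintext inner product.
print(np.isclose(enc_index @ td, index @ query))

# After a ciphertext-side update the score matches the updated plaintext score.
enc_updated = update_ciphertext(enc_index, encrypt(delta))
print(np.isclose(enc_updated @ td, (index + delta) @ query))
```

Because encryption here is linear, adding an encrypted increment to the ciphertext index gives the same score as re-encrypting the updated plaintext, so the index never has to be downloaded and decrypted.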
Asynchronous work mode of WUN: The update task from SAN to WUN is only updating the weight of an index, while other indexes still retain their original weight. i.e.
[TABLE]
Synchronous work mode of WUN: The synchronous work mode is parallel, i.e. the weights of all indexes are changed in one update. The adjustment of each weight is determined by the current input value. Once a weight update is complete, the updated weight of an index is used for the next update. When the weight of every index has stabilized, the work ends.
[TABLE]
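The difference between the two work modes can be illustrated with a small sketch. The update formulas themselves are not recoverable from the text, so the relaxation rule in `sync_update` (move each weight a fixed fraction towards a target) is a hypothetical stand-in, and the names `async_update`, `sync_update`, `rate` and `tol` are assumptions.

```python
def async_update(weights, i, residual):
    """Asynchronous mode: change only index i, all other weights keep their value."""
    updated = list(weights)
    updated[i] += residual
    return updated

def sync_update(weights, targets, rate=0.5, tol=1e-6, max_rounds=1000):
    """Synchronous mode: change every weight in each round until all are stable.

    The rule w <- w + rate * (target - w) is an assumed placeholder.
    """
    current = list(weights)
    for rounds in range(1, max_rounds + 1):
        nxt = [w + rate * (t - w) for w, t in zip(current, targets)]
        # Stop once no weight moved by more than tol in this round.
        if max(abs(a - b) for a, b in zip(nxt, current)) < tol:
            return nxt, rounds
        current = nxt
    return current, max_rounds

print(async_update([0.5, 0.25, 0.125], 1, 0.25))  # [0.5, 0.5, 0.125]

final, rounds = sync_update([0.0, 1.0], [1.0, 0.0])
print(rounds, final)
```

The asynchronous call touches one entry per update task, while the synchronous loop reuses each round's output as the next round's input and stops when the weights no longer change.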
When updating an index, the schemes [3, 13] that employ a tree-based index need to update the index vector itself (a leaf node of the index tree) and its other associated data (the parent node of the leaf node). Moreover, in order to achieve dynamic search, the current schemes [1, 3, 6, 9, 13] need to download the ciphertext index from the cloud, update its plaintext after local decryption, and finally upload the new ciphertext index to the cloud. In comparison, our solution only needs to update the index vector in the cloud, touching a smaller amount of data and introducing low overhead on computation and communication.
3.4 Overall Operation and Maintenance of IESNN
As shown in Fig. 6, IESNN consists of a sorting net, an adversarial net, a searching net and a weight update net. Except for the initial index weights, which must be generated by data owners, all automatic update operations (“add, delete, change and investigate” operations on the index) are completed in an encrypted cloud environment. The system forms a “query-learning-update-learning-query” self-attention [15] loop and an “automatic operation and maintenance” mechanism. Dynamic operation and maintenance of the SE system are almost entirely done in the cloud server. On the one hand, implementing dynamic operation and maintenance in an encrypted cloud environment not only improves the usability and flexibility of the SE system, but also strengthens privacy protection. On the other hand, when the index in the cloud server needs to be updated, our solution, unlike traditional SE [1, 3, 6, 9, 13], eliminates the need to rebuild the index locally and upload a new index to overwrite the old one stored in the cloud, which keeps overhead on computation, communication and local storage low.
4 Discussion
In this paper, we discuss the cross-fusion problem of ML and SE, and propose IESNN. We combine popular ML with traditional SE in order to explore intelligent SE. We employ a probabilistic learning method to generate a sorting network trained on a sufficient number of random queries, which helps achieve maximum likelihood searching and brings the query complexity closer to O(log N). This suggests that exploiting ML to optimize the query is effective in an uncertain system, and can even outperform special construction methods. Traditional query algorithms based on matrix operations and tree searching are not promising in big data environments, because high-dimensional data processing can lead to the “curse of dimensionality” and even system crashes. Implementing flexible dynamic operation and maintenance in an encrypted cloud environment with IESNN reduces communication overhead, protects data privacy and leverages cloud computing well.
Acknowledgment
This work was supported by “the Fundamental Research Funds for the Central Universities” (No. 30918012204), “the National Undergraduate Training Program for Innovation and Entrepreneurship” (Item number: 201810288061), and the NJUST graduate Scientific Research Training of ‘Hundred, Thousand and Ten Thousand’ Project “Research on Intelligent Searchable Encryption Technology”.
- [1] Cao, N., Wang, C., Li, M., Ren, K., Lou, W.: Privacy-preserving multi-keyword ranked search over encrypted cloud data. IEEE Trans. Parallel Distrib. Syst. 25(1), 222–233 (2014)
- [2] Goodfellow, I.J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A.C., Bengio, Y.: Generative adversarial networks. CoRR abs/1406.2661 (2014)
- [3] Guo, Z., Zhang, H., Sun, C., Wen, Q., Li, W.: Secure multi-keyword ranked search over encrypted cloud data for multiple data owners. Journal of Systems and Software 137(3), 380–395 (2018)
- [4] Hinton, G.E., Osindero, S., Welling, M., Teh, Y.W.: Unsupervised discovery of nonlinear structure using contrastive backpropagation. Cognitive Science 30(4), 725–731 (2006)
- [5] Kumar, D.V.N.S., Thilagam, P.S.: Approaches and challenges of privacy preserving search over encrypted data. Inf. Syst. 81, 63–81 (2019)
- [6] Li, R., Xu, Z., Kang, W., Yow, K., Xu, C.: Efficient multi-keyword ranked query over encrypted data in cloud computing. Future Generation Comp. Syst. 30(1), 179–190 (2014)
- [7] Park, J.H., Kim, Y.S., Eom, I.K., Lee, K.Y.: Economic load dispatch for piecewise quadratic cost function using Hopfield neural network. IEEE Trans. Power Syst. 8(3), 1030–1038 (1993)
- [8] Song, D.X., Wagner, D.A., Perrig, A.: Practical techniques for searches on encrypted data. In: IEEE S & P 2000. pp. 44–55. IEEE Computer Society (2000)
