Privacy-Preserving Clustering of Unstructured Big Data for Cloud-Based Enterprise Search Solutions
SM Zobaed, Mohsen Amini Salehi

TL;DR
This paper introduces privacy-preserving clustering methods for unstructured encrypted big data, enabling efficient and accurate enterprise search while maintaining data confidentiality in cloud environments.
Contribution
It presents novel clustering schemes tailored for static, semi-dynamic, and dynamic encrypted datasets, enhancing search efficiency and accuracy in privacy-sensitive cloud services.
Findings
30% to 60% improvement in cluster coherency
Search time reduced by up to 78%
Search accuracy increased by up to 35%
Abstract
Cloud-based enterprise search services (e.g., Amazon Kendra) appeal to big data owners by providing them with convenient search solutions over their enterprise big datasets. However, individuals and businesses that deal with confidential big data (e.g., credential documents) are reluctant to fully embrace such services, due to valid concerns about data privacy. Solutions based on client-side encryption have been explored to mitigate privacy concerns. Nonetheless, such solutions hinder data processing, specifically clustering, which is pivotal in dealing with different forms of big data. For instance, clustering is critical to limit the search space and perform real-time search operations on big datasets. To overcome the hindrance in clustering encrypted big data, we propose privacy-preserving clustering schemes for three forms of unstructured encrypted big datasets, namely static,…
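The abstract's point that clustering limits the search space can be illustrated with a small sketch. This is not the paper's scheme: the key, the helper names (token, build_index, search), and the hand-assigned clusters are all hypothetical, and a keyed hash stands in for whatever client-side encryption a real system would use. It only shows that a query over opaque tokens needs to scan just the clusters that contain the query token.

```python
from __future__ import annotations

import hashlib
import hmac

KEY = b"client-side-secret"  # hypothetical key; assumed to stay on the client


def token(word: str) -> str:
    """Deterministic keyed hash, so the server only ever sees opaque tokens."""
    return hmac.new(KEY, word.lower().encode(), hashlib.sha256).hexdigest()


def build_index(docs: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each opaque token to the ids of the documents that contain it."""
    index: dict[str, set[str]] = {}
    for doc_id, words in docs.items():
        for word in words:
            index.setdefault(token(word), set()).add(doc_id)
    return index


def search(query: str, clusters: list[dict[str, set[str]]]) -> tuple[set[str], int]:
    """Return matching document ids and the number of clusters actually scanned."""
    t = token(query)
    hits: set[str] = set()
    scanned = 0
    for cluster in clusters:
        if t in cluster:  # prune: skip clusters that lack the query token
            scanned += 1
            hits |= cluster[t]
    return hits, scanned


# Clusters are assigned by hand here; producing them over encrypted data
# is the paper's contribution and is not reproduced in this sketch.
finance = build_index({"d1": ["invoice", "tax"], "d2": ["tax", "audit"]})
hr = build_index({"d3": ["payroll", "leave"]})

hits, scanned = search("tax", [finance, hr])
print(sorted(hits), scanned)
```

In this toy setup the query touches one of the two clusters, which is the effect the abstract attributes to clustering: fewer candidates to examine per search.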
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Privacy, Security, and Data Protection · Cryptography and Data Security
