Privacy-Preserving Clustering of Unstructured Big Data for Cloud-Based Enterprise Search Solutions

SM Zobaed; Mohsen Amini Salehi

arXiv:2005.11317·cs.DC·June 10, 2022

Open Access

TL;DR

This paper introduces privacy-preserving clustering methods for unstructured encrypted big data, enabling efficient and accurate enterprise search while maintaining data confidentiality in cloud environments.

Contribution

It presents novel clustering schemes tailored for static, semi-dynamic, and dynamic encrypted datasets, enhancing search efficiency and accuracy in privacy-sensitive cloud services.

Findings

01. 30% to 60% improvement in cluster coherency

02. Search time reduced by up to 78%

03. Search accuracy increased by up to 35%

Abstract

Cloud-based enterprise search services (e.g., Amazon Kendra) appeal to big data owners by providing convenient search solutions over their enterprise big datasets. However, individuals and businesses that deal with confidential big data (e.g., credential documents) are reluctant to fully embrace such services because of valid concerns about data privacy. Solutions based on client-side encryption have been explored to mitigate privacy concerns. Nonetheless, such solutions hinder data processing, specifically clustering, which is pivotal in dealing with different forms of big data. For instance, clustering is critical to limit the search space and perform real-time search operations on big datasets. To overcome the hindrance in clustering encrypted big data, we propose privacy-preserving clustering schemes for three forms of unstructured encrypted big datasets, namely static,…

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Privacy, Security, and Data Protection · Cryptography and Data Security