# ClustCrypt: Privacy-Preserving Clustering of Unstructured Big Data in the Cloud

**Authors:** SM Zobaed, Sahan Ahmad, Raju Gottumukkala, Mohsen Amini Salehi

arXiv: 1908.04960 · 2019-08-15

## TL;DR

ClustCrypt is a novel method for privacy-preserving clustering of encrypted unstructured big data in the cloud, enhancing search accuracy and efficiency without compromising data confidentiality.

## Contribution

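ClustCrypt introduces a dynamic clustering approach for encrypted data, which estimates the number of clusters from the statistical characteristics of the encrypted data itself. The paper integrates this approach into S3BD, a secure cloud-based semantic search system, where it improves both clustering quality and search performance.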

## Key findings

- 60% average improvement in cluster coherency, across three datasets
- Up to 78% reduction in search-time overhead
- Up to 35% increase in search-result accuracy

## Abstract

Security and confidentiality of big data stored in the cloud are important concerns for many organizations considering cloud services. One common approach to addressing these concerns is client-side encryption, where data is encrypted on the client machine before being stored in the cloud. Having encrypted data in the cloud, however, limits the ability to perform data clustering, which is a crucial part of many data analytics applications, such as search systems. To overcome this limitation, in this paper, we present an approach named ClustCrypt for efficient topic-based clustering of encrypted unstructured big data in the cloud. ClustCrypt dynamically estimates the optimal number of clusters based on the statistical characteristics of encrypted data. It also provides a clustering approach for encrypted data. We deploy ClustCrypt within the context of a secure cloud-based semantic search system (S3BD). Experimental results obtained from evaluating ClustCrypt on three datasets demonstrate an average 60% improvement in cluster coherency. ClustCrypt also decreases the search-time overhead by up to 78% and increases the accuracy of search results by up to 35%.
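The abstract does not spell out how clustering can work on ciphertext, so here is a minimal sketch of the general idea under stated assumptions. It is not ClustCrypt's actual estimator or clustering algorithm. First, the client turns each keyword into a deterministic keyed token. HMAC-SHA256 stands in here for whatever deterministic encryption the real system uses. Because equal words yield equal tokens, the cloud can count token frequencies without ever seeing plaintext. The number of clusters `k` is then estimated by a hypothetical heuristic, which links frequent tokens whose document sets overlap strongly (Jaccard similarity of at least `link`) and counts the resulting connected groups. All function names, thresholds (`min_df`, `max_df_ratio`, `link`), and the sample documents are illustrative.

```python
import hashlib
import hmac
from collections import Counter


def encrypt_token(key: bytes, word: str) -> str:
    """Client side (assumption): keyed deterministic token via HMAC-SHA256.
    Equal words map to equal tokens, so frequencies survive 'encryption'."""
    return hmac.new(key, word.lower().encode("utf-8"), hashlib.sha256).hexdigest()


def encrypt_document(key: bytes, text: str) -> Counter:
    """Client side: turn a document into a bag of encrypted tokens with counts."""
    return Counter(encrypt_token(key, w) for w in text.split())


def jaccard(a: set, b: set) -> float:
    """Overlap of two document-id sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_encrypted(docs, min_df=2, max_df_ratio=0.9, link=0.5):
    """Cloud side (hypothetical heuristic): sees only encrypted token counts.
    Returns (estimated k, cluster label per document)."""
    n = len(docs)
    # Posting lists: which documents contain each encrypted token.
    postings = {}
    for i, bag in enumerate(docs):
        for tok in bag:
            postings.setdefault(tok, set()).add(i)
    # Candidate topic tokens: not too rare, not near-ubiquitous.
    cands = sorted(t for t, d in postings.items()
                   if min_df <= len(d) <= max_df_ratio * n)
    # Union-find over candidates; join tokens whose document sets overlap strongly.
    parent = {t: t for t in cands}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for i, a in enumerate(cands):
        for b in cands[i + 1:]:
            if jaccard(postings[a], postings[b]) >= link:
                parent[find(a)] = find(b)
    groups = {}
    for t in cands:
        groups.setdefault(find(t), set()).add(t)
    topics = list(groups.values())  # k = number of token groups found in the data
    # Assign each document to the topic it shares the most token occurrences with.
    labels = []
    for bag in docs:
        scores = [sum(bag[t] for t in topic) for topic in topics]
        labels.append(max(range(len(topics)), key=scores.__getitem__) if topics else -1)
    return len(topics), labels


# Usage: two security documents and two sports documents.
key = b"client-secret-key"  # hypothetical client-held key
plain = [
    "cloud encryption key cloud security",
    "encryption security key management",
    "soccer match goal team",
    "team goal league soccer",
]
encrypted = [encrypt_document(key, d) for d in plain]  # what the cloud stores
k, labels = cluster_encrypted(encrypted)
print(k, labels)  # k is 2; docs 0-1 share one label, docs 2-3 the other
```

Deterministic tokens are what allow the server to compute frequency statistics at all. The same property also reveals which documents share a keyword, and any scheme built this way has to accept or mitigate that trade-off.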

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.04960/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/1908.04960/full.md

## References

38 references — full list in the complete paper: https://tomesphere.com/paper/1908.04960/full.md

---
Source: https://tomesphere.com/paper/1908.04960