Privacy Preserving Multi-Server k-means Computation over Horizontally Partitioned Data

Riddhi Ghosal; Sanjit Chatterjee

arXiv:1808.03811·cs.CR·July 2, 2019

TL;DR

This paper introduces a privacy-preserving multi-server k-means clustering method that uses simple randomization instead of heavy cryptography, maintaining accuracy and efficiency while protecting data privacy.

Contribution

The paper presents a novel, efficient, and cryptography-light approach for privacy-preserving k-means clustering over horizontally partitioned data using multiple servers.

Findings

01 Achieves the same accuracy as standard k-means

02 Reduces computational overhead compared to cryptographic methods

03 Secure against honest but curious adversaries

Abstract

k-means clustering is one of the most popular clustering algorithms in data mining. Recently, much research has focused on the algorithm when the dataset is divided among multiple parties or when the dataset is too large to be handled by the data owner. In the latter case, servers are usually hired to perform the clustering. The data owner divides the dataset among the servers, which together perform k-means and return the cluster labels to the owner. The major challenge in this setting is to prevent the servers from gaining substantial information about the owner's actual data. Several algorithms designed in the past provide cryptographic solutions for privacy-preserving k-means. We provide a new method to perform k-means over a large dataset using multiple servers. Our technique avoids heavy cryptographic computations and…
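The data flow the abstract describes (an owner splits rows among servers, the servers jointly run k-means, and labels come back) can be sketched as follows. This is only an illustration of k-means over horizontally partitioned data, not the paper's protocol: the randomization step that provides privacy is not reproduced here because the abstract above is truncated, so each hypothetical server in this sketch sees its rows in the clear. The function names `nearest`, `local_stats`, and `multi_server_kmeans` are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Return the index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def local_stats(partition: Sequence[Point], centroids: Sequence[Point]):
    """Work done by one server: per-cluster coordinate sums and point counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in partition:
        j = nearest(point, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts


def multi_server_kmeans(
    partitions: Sequence[Sequence[Point]],
    centroids: Sequence[Point],
    iterations: int = 10,
) -> Tuple[List[Point], List[List[int]]]:
    """Lloyd's algorithm where each partition is processed independently.

    NOTE: no privacy mechanism is applied; this only shows that combining
    per-partition sums and counts gives the usual centroid update.
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        stats = [local_stats(part, centroids) for part in partitions]
        updated = []
        for j, old in enumerate(centroids):
            total = sum(counts[j] for _, counts in stats)
            if total == 0:
                updated.append(old)  # keep an empty cluster's centroid unchanged
                continue
            updated.append(
                tuple(
                    sum(sums[j][d] for sums, _ in stats) / total
                    for d in range(len(old))
                )
            )
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    labels = [[nearest(p, centroids) for p in part] for part in partitions]
    return centroids, labels


# Toy data: six 2-D points split row-wise across three hypothetical servers.
partitions = [
    [(0.0, 0.0), (0.0, 1.0)],
    [(10.0, 10.0), (10.0, 11.0)],
    [(1.0, 0.0), (11.0, 10.0)],
]
centroids, labels = multi_server_kmeans(partitions, [(0.0, 0.0), (10.0, 10.0)])
```

Because each centroid update depends only on per-cluster sums and counts, running the same function on a single partition that holds all six points yields the same centroids. That is the sense in which a partitioned computation can match standard k-means; how the paper keeps the servers from learning the data is not shown here.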
