# An Efficient and Scalable Privacy Preserving Algorithm for Big Data and Data Streams

**Authors:** M.A.P. Chamikara, P. Bertok, D. Liu, S. Camtepe, I. Khalil

arXiv: 1907.13498 · 2019-08-01

## TL;DR

This paper introduces SEAL (Secure and Efficient data perturbation Algorithm utilizing Local differential privacy), a scalable data perturbation algorithm for big data and data streams produced by smart cyber-physical systems. It aims to balance privacy against data utility.

## Contribution

The paper presents SEAL, a new data perturbation algorithm based on Chebyshev interpolation and Laplacian noise. Compared with existing privacy-preserving algorithms, it is reported to run faster, scale better, and offer a better balance between privacy and utility.
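To make the two ingredients named above concrete, the following is a minimal sketch of how a Chebyshev fit and Laplacian noise could be combined to perturb one window of a numeric stream. It is not the authors' SEAL procedure: the function name `perturb_window`, the parameters `degree`, `epsilon`, and `sensitivity`, the min-max scaling step, and the choice to add noise directly to the fitted coefficients are all assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def perturb_window(values, degree=4, epsilon=1.0, sensitivity=1.0, seed=None):
    """Illustrative only: smooth a window with a Chebyshev least-squares fit,
    then add Laplace noise to the fitted coefficients.

    `sensitivity` is a hypothetical placeholder, not a derived bound.
    """
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)

    # Scale the window to [0, 1]. Using the data's own min/max is a
    # simplification; a real deployment would use public domain bounds.
    lo, hi = values.min(), values.max()
    span = hi - lo if hi > lo else 1.0
    scaled = (values - lo) / span

    # Chebyshev polynomials are defined on [-1, 1], so map positions there.
    x = np.linspace(-1.0, 1.0, len(scaled))
    coeffs = C.chebfit(x, scaled, degree)

    # Laplace mechanism: noise scale = sensitivity / epsilon.
    noisy = coeffs + rng.laplace(0.0, sensitivity / epsilon, size=coeffs.shape)

    # Evaluate the noisy polynomial and map back to the original range.
    return C.chebval(x, noisy) * span + lo


if __name__ == "__main__":
    window = np.sin(np.linspace(0, 3, 50)) * 10 + 20
    print(perturb_window(window, degree=4, epsilon=2.0, seed=0)[:5])
```

In this sketch a smaller `epsilon` means a larger noise scale and therefore stronger perturbation, which is the usual direction of the privacy-utility trade-off in differential privacy.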

## Key findings

- In empirical comparisons with existing privacy-preserving algorithms, SEAL excels in execution speed and scalability.
- SEAL also excels in accuracy in those comparisons, preserving data utility while applying local differential privacy.
- The same comparisons show SEAL excelling in attack resistance.

## Abstract

A vast amount of valuable data is produced and is becoming available for analysis as a result of advancements in smart cyber-physical systems. The data comes from various sources, such as healthcare, smart homes, and smart vehicles, and often includes private, potentially sensitive information that needs appropriate sanitization before being released for analysis. The incremental and fast nature of data generation in these systems necessitates scalable privacy-preserving mechanisms with high privacy and utility. However, privacy preservation often comes at the expense of data utility. We propose a new data perturbation algorithm, SEAL (Secure and Efficient data perturbation Algorithm utilizing Local differential privacy), based on Chebyshev interpolation and Laplacian noise, which provides a good balance between privacy and utility with high efficiency and scalability. Empirical comparisons with existing privacy-preserving algorithms show that SEAL excels in execution speed, scalability, accuracy, and attack resistance. SEAL provides flexibility in choosing the best possible privacy parameters, such as the amount of added noise, which can be tailored to the domain and dataset.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.13498/full.md

## Figures

21 figures with captions in the complete paper: https://tomesphere.com/paper/1907.13498/full.md

## References

90 references — full list in the complete paper: https://tomesphere.com/paper/1907.13498/full.md

---
Source: https://tomesphere.com/paper/1907.13498