Computing on Masked Data: a High Performance Method for Improving Big Data Veracity

Jeremy Kepner; Vijay Gadepally; Pete Michaleas; Nabil Schear; Mayank Varia; Arkady Yerukhimovich; Robert K. Cunningham (MIT)

arXiv:1406.5751·cs.CR·May 26, 2015

TL;DR

This paper introduces Computing on Masked Data (CMD), a high-performance method that improves data veracity by enabling computation directly on masked data with low overhead, which makes it practical for big data applications.

Contribution

The paper presents CMD, a technique for efficient computation on masked data that improves data veracity without a significant performance cost.

Findings

01. CMD supports a wide range of linear algebraic operations on masked data.

02. CMD incurs significantly less overhead than traditional cryptographic methods.

03. CMD is demonstrated on DNA matching and social media data processing.

Abstract

The growing gap between data and users calls for innovative tools that address the challenges faced by big data volume, velocity and variety. Along with these standard three V's of big data, an emerging fourth "V" is veracity, which addresses the confidentiality, integrity, and availability of the data. Traditional cryptographic techniques that ensure the veracity of data can have overheads that are too large to apply to big data. This work introduces a new technique called Computing on Masked Data (CMD), which improves data veracity by allowing computations to be performed directly on masked data and ensuring that only authorized recipients can unmask the data. Using the sparse linear algebra of associative arrays, CMD can be performed with significantly less overhead than other approaches while still supporting a wide range of linear algebraic operations on the masked data. Databases…
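The idea in the abstract, computing on masked data with sparse associative arrays and unmasking only at an authorized recipient, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the row and column labels are masked with HMAC-SHA256 as a stand-in for a deterministic masking scheme, the associative array is a plain dictionary of (row, column) pairs, and the names mask, mask_array, transpose_multiply, and unmask are hypothetical. The point is only that a computation such as A^T A depends on which labels are equal, not on what the labels say, so it can run without the key.

```python
import hashlib
import hmac
from collections import defaultdict


def mask(key: bytes, label: str) -> str:
    """Deterministically mask a label (HMAC-SHA256 is an assumed stand-in scheme)."""
    return hmac.new(key, label.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_array(key: bytes, triples):
    """Mask row/column labels of (row, col, value) triples.

    Returns the masked sparse array {(row, col): value} and a lookup table
    (masked label -> original label) that only the key holder keeps.
    """
    masked, lookup = {}, {}
    for row, col, val in triples:
        mr, mc = mask(key, row), mask(key, col)
        lookup[mr], lookup[mc] = row, col
        masked[(mr, mc)] = masked.get((mr, mc), 0) + val
    return masked, lookup


def transpose_multiply(a):
    """Compute A^T A on a sparse array {(row, col): value}; no key is needed."""
    by_row = defaultdict(list)
    for (r, c), v in a.items():
        by_row[r].append((c, v))
    out = defaultdict(int)
    for entries in by_row.values():
        for c1, v1 in entries:
            for c2, v2 in entries:
                out[(c1, c2)] += v1 * v2
    return dict(out)


def unmask(result, lookup):
    """Map masked labels back to the originals (authorized recipient only)."""
    return {(lookup[r], lookup[c]): v for (r, c), v in result.items()}


# Hypothetical data: which words appear in which documents.
key = b"demo-key-known-only-to-data-owner"
triples = [
    ("doc1", "alice", 1), ("doc1", "bob", 1),
    ("doc2", "alice", 1), ("doc2", "carol", 1),
]

masked, lookup = mask_array(key, triples)      # done by the data owner
masked_result = transpose_multiply(masked)     # done by an untrusted server
cooccurrence = unmask(masked_result, lookup)   # done by the authorized recipient

plain = {(r, c): v for r, c, v in triples}
print(cooccurrence == transpose_multiply(plain))
```

In this sketch the server sees only hexadecimal digests, yet the unmasked result matches the same computation on the plaintext array. A deterministic mask still reveals which labels are equal and how often they occur, so the sketch shows the mechanics rather than a complete security design.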
