Computing on Masked Data: a High Performance Method for Improving Big Data Veracity
Jeremy Kepner, Vijay Gadepally, Pete Michaleas, Nabil Schear, Mayank Varia, Arkady Yerukhimovich, Robert K. Cunningham (MIT)

TL;DR
This paper introduces Computing on Masked Data (CMD), a high-performance method that enhances data veracity by enabling computations directly on masked data with minimal overhead, suitable for big data applications.
Contribution
The paper presents a novel technique called CMD that allows efficient computation on masked data, improving data veracity without significant performance costs.
Findings
CMD supports a wide range of linear algebraic operations.
CMD incurs significantly less overhead than traditional cryptographic methods.
CMD is demonstrated on DNA matching and social media data processing.
Abstract
The growing gap between data and users calls for innovative tools that address the challenges faced by big data volume, velocity and variety. Along with these standard three V's of big data, an emerging fourth "V" is veracity, which addresses the confidentiality, integrity, and availability of the data. Traditional cryptographic techniques that ensure the veracity of data can have overheads that are too large to apply to big data. This work introduces a new technique called Computing on Masked Data (CMD), which improves data veracity by allowing computations to be performed directly on masked data and ensuring that only authorized recipients can unmask the data. Using the sparse linear algebra of associative arrays, CMD can be performed with significantly less overhead than other approaches while still supporting a wide range of linear algebraic operations on the masked data. Databases…
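The idea of computing directly on masked data can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes HMAC-SHA256 as a stand-in deterministic mask (equal plaintexts map to equal masks), represents the associative array as a plain Python dictionary keyed by masked (row, column) labels, and assumes the authorized recipient unmasks results with a lookup table kept alongside the secret key. The names `mask`, `build_masked_array`, `column_correlation`, and `unmask` are hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict


def mask(key: bytes, value: str) -> str:
    """Deterministic keyed mask (assumption: HMAC-SHA256 as a stand-in)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def build_masked_array(key: bytes, records):
    """Build a sparse associative array whose row/column labels are masked.

    Returns the masked array plus an unmask table that only the key holder
    should keep (assumption: unmasking is done by table lookup).
    """
    array = defaultdict(int)
    table = {}
    for row, col in records:
        m_row, m_col = mask(key, row), mask(key, col)
        table[m_row], table[m_col] = row, col
        array[(m_row, m_col)] += 1
    return dict(array), table


def column_correlation(array):
    """Compute A^T A on the sparse array using only the labels it is given.

    The function never sees plaintext; it works identically on masked
    or unmasked labels because the mask preserves label equality.
    """
    by_row = defaultdict(dict)
    for (row, col), value in array.items():
        by_row[row][col] = value
    result = defaultdict(int)
    for cols in by_row.values():
        for c1, v1 in cols.items():
            for c2, v2 in cols.items():
                result[(c1, c2)] += v1 * v2
    return dict(result)


def unmask(result, table):
    """Map masked labels back to plaintext (authorized recipient only)."""
    return {(table[a], table[b]): v for (a, b), v in result.items()}
```

In this sketch, an untrusted party could run `column_correlation` on the masked array and return the result, and only the holder of the unmask table could read the labels. A deterministic mask still reveals which labels are equal and how often they occur, so it trades some confidentiality for the ability to compute.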
