Semi-Private Computation of Data Similarity with Applications to Data Valuation and Pricing
René Bødker Christensen, Shashi Raj Pandey, Petar Popovski

TL;DR
This paper develops two-party multiparty computation protocols that measure the similarity of two data sets via their correlation, supporting data valuation and pricing while keeping information leakage controlled. Both protocols have complexity linear in the number of data samples, and the paper bounds the maximal error of the approximation.
Contribution
The paper introduces protocols for exact and approximate correlation computation between two data providers with controlled information leakage, applicable to data valuation and pricing scenarios.
Findings
Both protocols have computational and communication complexities that are linear in the number of data samples.
Exact and approximate correlation computation methods are developed.
Error bounds for approximate correlation are established and analyzed.
Abstract
Consider two data providers that want to contribute data to a certain learning model. Recent works have shown that the value of one provider's data depends on its similarity to the data owned by the other provider. It would thus be beneficial if the two providers could calculate the similarity of their data while keeping the actual data private. In this work, we devise multiparty computation protocols to compute the similarity of two data sets based on correlation, while offering controllable privacy guarantees. We consider a simple model with two participating providers and develop methods to compute exact and approximate correlation, respectively, with controlled information leakage. Both protocols have computational and communication complexities that are linear in the number of data samples. We also provide general bounds on the maximal error in the approximation…
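To make the similarity measure concrete, the sketch below computes the ordinary Pearson correlation in a way that separates per-provider work from the single joint step. It is not the protocol from the paper and offers no privacy at all. It only illustrates the standard fact that, once each provider has centered and normalized its own samples locally, the correlation reduces to one inner product whose cost is linear in the number of samples. The function names and toy data are hypothetical, chosen for illustration.

```python
import math


def standardize(values):
    """Center one provider's samples and scale them to unit Euclidean norm.

    This step uses only the provider's own data, so it can be done locally.
    """
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0:
        raise ValueError("correlation is undefined for constant data")
    return [c / norm for c in centered]


def correlation_from_standardized(u, v):
    """Pearson correlation as the inner product of two standardized vectors.

    This is the only step that needs both providers' data. It is computed
    in the clear here; a private protocol would have to protect this step.
    """
    if len(u) != len(v):
        raise ValueError("both providers must hold the same number of samples")
    return sum(a * b for a, b in zip(u, v))


# Hypothetical toy data for two providers (assumption, not from the paper).
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.1, 5.9, 8.2]
rho = correlation_from_standardized(standardize(x), standardize(y))
print(f"correlation: {rho:.4f}")
```

Both functions make a single pass over the samples, which matches the linear scaling stated in the abstract, although the paper's protocols add communication and privacy mechanisms that this sketch omits.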
Taxonomy
Topics: Cryptography and Data Security · Privacy-Preserving Technologies in Data · Complexity and Algorithms in Graphs
