Yahtzee: An Anonymized Group Level Matching Procedure
Jason J. Jones, Robert M. Bond, Christopher J. Fariss, Jaime E. Settle, Adam Kramer, Cameron Marlow, James H. Fowler

TL;DR
The paper introduces the Yahtzee procedure, a privacy-preserving method for matching data from different sources by anonymizing individuals into groups, demonstrated on Facebook and voter records.
Contribution
The paper presents a novel anonymized group matching procedure called Yahtzee, enhancing privacy in data integration while maintaining data utility.
Findings
The Yahtzee procedure keeps information about specific individuals from leaking in either direction between the two data sources.
The method performs well on real-world Facebook and voter data.
Theoretical analysis confirms privacy guarantees.
Abstract
Researchers often face the problem of needing to protect the privacy of subjects while also needing to integrate data that contains personal information from diverse data sources in order to conduct their research. The advent of computational social science and the enormous amount of data about people that is being collected makes protecting the privacy of research subjects ever more important. However, strict privacy procedures can make joining diverse sources of data that contain information about specific individual behaviors difficult. In this paper we present a procedure to keep information about specific individuals from being "leaked" or shared in either direction between two sources of data. To achieve this goal, we randomly assign individuals to anonymous groups before combining the anonymized information between the two sources of data. We refer to this method as the Yahtzee…
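The abstract describes assigning individuals to anonymous groups and then combining only group-level information. The following is a minimal sketch of that general idea, not the authors' actual procedure: the summary above does not say how group assignment is carried out, so the sketch assumes both data holders derive a group number from a shared identifier using a keyed hash (HMAC) with a secret they agree on in advance. The function names `group_of`, `aggregate`, and `join_groups`, the identifier format, and the choice of HMAC-SHA256 are all illustrative assumptions.

```python
import hashlib
import hmac
from collections import defaultdict


def group_of(identifier: str, secret: bytes, n_groups: int) -> int:
    """Map an identifier to a pseudo-random group number in [0, n_groups).

    Assumption: a keyed hash stands in for the random assignment, so that
    both sources place the same person in the same group without
    exchanging identifiers.
    """
    digest = hmac.new(secret, identifier.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big") % n_groups


def aggregate(records, secret: bytes, n_groups: int):
    """Collapse (identifier, value) records into {group: (count, total)}.

    Only these group-level aggregates would leave a data source.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for identifier, value in records:
        g = group_of(identifier, secret, n_groups)
        counts[g] += 1
        totals[g] += value
    return {g: (counts[g], totals[g]) for g in counts}


def join_groups(a, b):
    """Pair the aggregates of two sources on group number only."""
    return {g: (a[g], b[g]) for g in a.keys() & b.keys()}
```

In this sketch each source would call `aggregate` on its own records and share only the result; `join_groups` then aligns the two sources by group number. How large the groups must be for a given privacy requirement is not addressed here.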
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Internet Traffic Analysis and Secure E-voting · Human Mobility and Location-Based Analysis
