Confidentiality and linked data
Felix Ritchie, Jim Smith

TL;DR
This paper discusses the challenges and methods of linking data from different sources while protecting privacy, focusing on confidentiality risks and potential solutions for micro-data sharing.
Contribution
It introduces principles and methods for data linking, analyzes confidentiality risks, especially the 'intruder' problem, and reviews potential solutions for micro-data release.
Findings
Identification of confidentiality risks in data linking
Analysis of the 'intruder' problem in data privacy
Overview of statistical and non-statistical solutions
Abstract
Data providers such as government statistical agencies perform a balancing act: maximising information published to inform decision-making and research, while simultaneously protecting privacy. The emergence of identified administrative datasets with the potential for sharing (and thus linking) offers huge potential benefits but significant additional risks. This article introduces the principles and methods of linking data across different sources and points in time, focusing on potential areas of risk. We then consider confidentiality risk, focusing in particular on the "intruder" problem central to the area, and looking at both risks from data producer outputs and from the release of micro-data for further analysis. Finally, we briefly consider potential solutions to micro-data release, both the statistical solutions considered in other contributed articles and non-statistical…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Data Mining Algorithms and Applications
