Options for encoding names for data linking at the Australian Bureau of Statistics
Chris Culnane, Benjamin I. P. Rubinstein, Vanessa Teague

TL;DR
This paper evaluates different encoding methods for names in data linking at the Australian Bureau of Statistics, analyzing their privacy and accuracy implications and exploring future options.
Contribution
It provides a comparative analysis of five encoding approaches for names, including privacy and accuracy guarantees, and discusses potential future techniques.
Findings
Encryption offers reversible encoding with a secret key.
Lossy encoding prevents reversal without auxiliary data.
HMAC-based methods limit re-identification by anyone who does not hold the secret key.
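To make the contrast between unkeyed and keyed encodings concrete, here is a minimal sketch in Python. It is illustrative only and is not the ABS's or the authors' actual scheme: the `normalise` rule, the function names, the sample names, and the key value are all assumptions introduced for this example. It shows that an unkeyed hash of a name can be matched by anyone who hashes a list of candidate names, whereas an HMAC encoding cannot be recomputed without the secret key.

```python
import hashlib
import hmac


def normalise(name: str) -> str:
    # Hypothetical normalisation rule: collapse whitespace and upper-case.
    return " ".join(name.split()).upper()


def plain_hash(name: str) -> str:
    # Unkeyed SHA-256: anyone can recompute this for any candidate name.
    return hashlib.sha256(normalise(name).encode("utf-8")).hexdigest()


def keyed_hash(name: str, key: bytes) -> str:
    # HMAC-SHA256: recomputing the encoding requires the secret key.
    return hmac.new(key, normalise(name).encode("utf-8"), hashlib.sha256).hexdigest()


def dictionary_attack(target: str, candidates, encode):
    # Encode every candidate name and return the first one matching the target.
    for candidate in candidates:
        if hmac.compare_digest(encode(candidate), target):
            return candidate
    return None


# Illustrative values only; a real key would be long, random, and tightly held.
secret_key = b"illustrative-key-only"
candidates = ["Alice Smith", "Bob Jones", "Carol Nguyen"]

encoded_plain = plain_hash("bob  jones")
encoded_keyed = keyed_hash("bob  jones", secret_key)

# The unkeyed hash falls to a simple dictionary attack.
recovered_plain = dictionary_attack(encoded_plain, candidates, plain_hash)

# Without the right key, the attacker's guesses do not match the HMAC output.
recovered_keyed = dictionary_attack(
    encoded_keyed, candidates, lambda n: keyed_hash(n, b"wrong-guess")
)

# The key holder can still link records that carry the same normalised name.
linkable = keyed_hash("Bob Jones", secret_key) == encoded_keyed
```

In this sketch the key holder retains the ability to link records, and also to mount the same dictionary attack, so protection still rests on how the key is managed.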
Abstract
Publicly, ABS has said it would use a cryptographic hash function to convert names collected in the 2016 Census of Population and Housing into an unrecognisable value in a way that is not reversible. In 2016, the ABS engaged the University of Melbourne to provide expert advice on cryptographic hash functions to meet this objective. For complex unit-record level data, including Census data, auxiliary data can often be used to link individual records, even without names. This is the basis of ABS's existing bronze linking. This means that records can probably be re-identified without the encoded name anyway. Protection against re-identification depends on good processes within ABS. The undertaking on the encoding of names should therefore be considered in the full context of auxiliary data and ABS processes. There are several reasonable interpretations: 1. That the encoding cannot…
Taxonomy
Topics: Data Quality and Management · Geographic Information Systems Studies · Data-Driven Disease Surveillance
