Creating a surrogate commuter network from Australian Bureau of Statistics census data
Kristopher M. Fair, Cameron Zachreson, Mikhail Prokopenko

TL;DR
This paper develops a re-sampling method to correct inconsistencies in Australian census commuter data caused by new privacy policies, creating a high-resolution surrogate dataset with significantly improved accuracy.
Contribution
It introduces a novel re-sampling approach that enhances data consistency and accuracy in census-derived commuter networks affected by privacy-preserving data modifications.
Findings
Reduces the discrepancy between aggregated and true commuter totals from ~34% to ~7%.
Improves data consistency across partition resolutions.
Provides a high-resolution surrogate dataset for 2016 commuter data.
Abstract
Between the 2011 and 2016 national censuses, the Australian Bureau of Statistics changed its anonymity policy compliance system for the distribution of census data. The new method has resulted in dramatic inconsistencies when comparing low-resolution data to aggregated high-resolution data. Hence, aggregated totals do not match true totals, and the mismatch gets worse as the data resolution gets finer. Here, we address several aspects of this inconsistency with respect to the 2016 usual-residence to place-of-work travel data. We introduce a re-sampling system that rectifies many of the artifacts introduced by the new ABS protocol, ensuring a higher level of consistency across partition sizes. We offer a surrogate high-resolution 2016 commuter dataset that reduces the difference between aggregated and true commuter totals from ~34% to only ~7%, which is on the order of the discrepancy…
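To make the aggregation mismatch concrete, the sketch below rolls fine-resolution origin-destination commuter counts up to a coarser partition and compares the result with a separately published coarse table. It is a minimal illustration, not the authors' re-sampling method: the function names, the region labels, and every count are hypothetical toy values, and the discrepancy measure (absolute difference in total commuters, relative to the coarse total) is an assumed definition that may differ from the one used in the paper.

```python
from collections import defaultdict


def aggregate_flows(fine_flows, parent):
    """Sum fine-resolution (origin, destination) counts into coarse regions.

    fine_flows: dict mapping (origin, destination) area codes to commuter counts.
    parent: dict mapping each fine area code to its coarse region code.
    """
    coarse = defaultdict(int)
    for (origin, dest), count in fine_flows.items():
        coarse[(parent[origin], parent[dest])] += count
    return dict(coarse)


def relative_discrepancy(aggregated, reference):
    """Assumed measure: |aggregated total - reference total| / reference total."""
    ref_total = sum(reference.values())
    if ref_total <= 0:
        raise ValueError("reference total must be positive")
    agg_total = sum(aggregated.values())
    return abs(agg_total - ref_total) / ref_total


# Hypothetical toy data: three fine areas nested in two coarse regions.
parent = {"a1": "A", "a2": "A", "b1": "B"}

# Fine-resolution counts after privacy perturbation (toy values).
fine_flows = {
    ("a1", "b1"): 30,
    ("a2", "b1"): 50,
    ("b1", "a1"): 10,
    ("a1", "a2"): 30,
}

# Coarse-resolution table published separately (toy values).
reference = {("A", "B"): 100, ("B", "A"): 20, ("A", "A"): 30}

aggregated = aggregate_flows(fine_flows, parent)
gap = relative_discrepancy(aggregated, reference)
print(f"aggregated total: {sum(aggregated.values())}")
print(f"reference total:  {sum(reference.values())}")
print(f"relative discrepancy: {gap:.0%}")
```

In this toy case the fine-resolution counts sum to 120 commuters against a coarse total of 150, a 20% shortfall. The abstract reports the same kind of gap, at ~34% before re-sampling and ~7% after.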
