A Unified Statistical And Computational Framework For Ex-Post Harmonisation Of Aggregate Statistics
Cynthia A. Huang

TL;DR
This paper introduces the Crossmaps Framework, a unified approach combining statistical and computational methods to improve ex-post harmonisation of aggregate statistics, ensuring data quality and provenance documentation.
Contribution
It presents a novel formal framework using computational graphs and new provenance concepts for harmonising datasets across standards, addressing a key challenge in data integration.
Findings
Defines the crossmap transform and shared mass array concepts
Formalises graph, matrix, and list encodings of crossmaps
Discusses implications for statistical properties and workflow design
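As a rough illustration of the list and matrix encodings mentioned above, the sketch below treats a crossmap as a set of weighted links from source categories to target categories and applies it to aggregate values. This is a minimal sketch under stated assumptions, not the paper's implementation: the category codes, the weights, the function names (to_matrix, weights_sum_to_one, apply_crossmap), and the rule that each source category's outgoing weights sum to one are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical crossmap in list encoding: (source_code, target_code, weight).
# Codes and weights are invented for illustration only.
crossmap_edges = [
    ("A1", "X", 1.0),   # A1 maps wholly to X
    ("A2", "X", 0.25),  # A2 is split between X and Y
    ("A2", "Y", 0.75),
    ("A3", "Y", 1.0),   # A3 maps wholly to Y
]


def to_matrix(edges):
    """Convert the list encoding to a source-by-target weight matrix."""
    sources = sorted({s for s, _, _ in edges})
    targets = sorted({t for _, t, _ in edges})
    matrix = [[0.0] * len(targets) for _ in sources]
    for s, t, w in edges:
        matrix[sources.index(s)][targets.index(t)] = w
    return sources, targets, matrix


def weights_sum_to_one(edges, tol=1e-9):
    """Assumed validity check: outgoing weights of each source sum to 1."""
    totals = defaultdict(float)
    for s, _, w in edges:
        totals[s] += w
    return all(abs(total - 1.0) <= tol for total in totals.values())


def apply_crossmap(edges, source_values):
    """Redistribute source aggregates onto target categories by weight."""
    target_values = defaultdict(float)
    for s, t, w in edges:
        target_values[t] += w * source_values[s]
    return dict(target_values)


# Hypothetical aggregates indexed by the source standard.
source_values = {"A1": 100.0, "A2": 40.0, "A3": 60.0}

sources, targets, matrix = to_matrix(crossmap_edges)
target_values = apply_crossmap(crossmap_edges, source_values)

print(weights_sum_to_one(crossmap_edges))
print(target_values)
```

When every source category's weights sum to one, the grand total of the aggregates is unchanged by the transformation, which is one simple property a harmonisation workflow could check before use.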
Abstract
Ex-post harmonisation is one of many data preprocessing processes used to combine the increasingly vast and diverse sources of data available for research and analysis. Documenting provenance and ensuring the quality of multi-source datasets is vital for ensuring trustworthy scientific research and encouraging reuse of existing harmonisation efforts. However, capturing and communicating statistically relevant properties of harmonised datasets is difficult without a universal standard for describing harmonisation operations. Our paper combines mathematical and computer science perspectives to address this need. The Crossmaps Framework defines a new approach for transforming existing variables collected under a specific measurement or classification standard to an imputed counterfactual variable indexed by some target standard. It uses computational graphs to separate intended…
Taxonomy
Topics: Bayesian Methods and Mixture Models
