Auditable and reusable crosswalks for fast, scaled integration of scattered tabular data
Gavin Chait

TL;DR
This paper introduces an open-source toolkit for creating auditable, schema-based crosswalks that enable fast, scalable, and reusable integration of scattered tabular data, supporting both code and no-code workflows.
Contribution
It presents a schema-centric, modular approach to data curation that simplifies complex data restructuring and enhances interoperability, with tools available as a Python package and a web app.
Findings
Successful integration of local council data into a unified database
Reduced complexity and resource use in data transformation
Toolkit supports both scripted and visual data curation workflows
Abstract
This paper presents an open-source curatorial toolkit intended to produce well-structured and interoperable data. Curation is divided into discrete components, with a schema-centric focus for auditable restructuring of complex and scattered tabular data to conform to a destination schema. Task separation allows development of software and analysis without source data being present. Transformations are captured as high-level sequential scripts describing schema-to-schema mappings, reducing complexity and resource requirements. Ultimately, data are transformed, but the objective is that any data meeting a schema definition can be restructured using a crosswalk. The toolkit is available both as a Python package, and as a 'no-code' visual web application. A visual example is presented, derived from a longitudinal study where scattered source data from hundreds of local councils are…
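The idea of a crosswalk as a sequential, schema-to-schema script can be illustrated with a minimal sketch in plain Python. This is not the toolkit's API: the field names, the single `RENAME` action, and the functions `validate_crosswalk` and `apply_crosswalk` are all hypothetical, chosen only to show how a mapping might be defined and checked against schemas before any source data are present.

```python
# Hypothetical sketch only: names and the action vocabulary are illustrative
# assumptions, not the toolkit's actual API.

# Schemas are reduced here to ordered lists of field names (assumed fields).
SOURCE_SCHEMA = ["Council", "Property Ref", "Rateable Value"]
DESTINATION_SCHEMA = ["council_name", "property_id", "rateable_value"]

# The crosswalk is an ordered script of actions written against the schemas,
# not against any particular data file.
CROSSWALK = [
    ("RENAME", "Council", "council_name"),
    ("RENAME", "Property Ref", "property_id"),
    ("RENAME", "Rateable Value", "rateable_value"),
]


def validate_crosswalk(crosswalk, source_schema, destination_schema):
    """Check a crosswalk against the two schemas alone; no data are needed."""
    for action, source_field, destination_field in crosswalk:
        if action != "RENAME":
            raise ValueError(f"Unsupported action: {action}")
        if source_field not in source_schema:
            raise ValueError(f"Unknown source field: {source_field}")
        if destination_field not in destination_schema:
            raise ValueError(f"Unknown destination field: {destination_field}")


def apply_crosswalk(rows, crosswalk, source_schema, destination_schema):
    """Restructure any rows that conform to the source schema."""
    validate_crosswalk(crosswalk, source_schema, destination_schema)
    restructured = []
    for row in rows:
        missing = [field for field in source_schema if field not in row]
        if missing:
            raise ValueError(f"Row does not meet the source schema: {missing}")
        restructured.append({dst: row[src] for _, src, dst in crosswalk})
    return restructured


if __name__ == "__main__":
    # Made-up sample row, for illustration only.
    sample = [{"Council": "Example Council", "Property Ref": "A1", "Rateable Value": 1000}]
    print(apply_crosswalk(sample, CROSSWALK, SOURCE_SCHEMA, DESTINATION_SCHEMA))
```

Because the script refers only to schema fields, the same crosswalk can be reused for any dataset that meets the source schema, and the ordered list of actions serves as the audit record of what was changed.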
Taxonomy
Topics: Data Quality and Management · Data Mining Algorithms and Applications · Data Management and Algorithms
