Implementing Automated Data Validation for Canadian Political Datasets

Lindsay Katz; Callandra Moore

arXiv:2309.12886·stat.ME·September 25, 2023

Open Access

TL;DR

This paper develops a suite of 200 automated data validation tests for Canadian charity, political donation, and government lobbying datasets, and reports preliminary findings from applying them.

Contribution

It introduces a detailed set of validation tests specifically designed for Canadian political and charity datasets, demonstrating their application and initial results.

Findings

01 Validation tests identified data inconsistencies

02 Preliminary insights into dataset quality and reliability

03 Framework for future automated data validation implementation

Abstract

This paper describes a series of automated data validation tests for datasets detailing charity financial information, political donations, and government lobbying in Canada. We motivate and document a series of 200 tests that check the validity, internal consistency, and external consistency of these datasets. We present preliminary findings after application of these tests to the political donations ($\approx 10.1$ million observations) and lobbying ($\approx 711{,}200$ observations) datasets, and to a sample of $\approx 380{,}880$ observations from the charities datasets. We conclude with areas for future work and lessons learnt for others looking to implement automated data validation in their own workflows.
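To make the three categories of test named in the abstract concrete, the following is a minimal sketch of one check of each kind. It is not the authors' implementation: the field names (`amount`, `donation_date`, `donation_year`, `province`), the helper functions, and the sample rows are all hypothetical and invented for illustration. The external check uses the standard two-letter postal abbreviations for Canadian provinces and territories as a stand-in for an outside reference source.

```python
from datetime import date

# Two-letter postal abbreviations for Canada's provinces and territories.
# Stand-in for an external reference source (assumption for this sketch).
PROVINCE_CODES = {
    "AB", "BC", "MB", "NB", "NL", "NS", "NT",
    "NU", "ON", "PE", "QC", "SK", "YT",
}


def check_validity(row):
    """Validity: a single value is plausible on its own (non-negative number)."""
    amount = row.get("amount")
    is_number = isinstance(amount, (int, float)) and not isinstance(amount, bool)
    return is_number and amount >= 0


def check_internal_consistency(row):
    """Internal consistency: two fields of the same record agree."""
    try:
        parsed = date.fromisoformat(row["donation_date"])
    except (KeyError, TypeError, ValueError):
        return False
    return parsed.year == row.get("donation_year")


def check_external_consistency(row):
    """External consistency: a field agrees with an outside reference list."""
    return row.get("province") in PROVINCE_CODES


# Hypothetical registry mapping a test name to its check function.
CHECKS = {
    "validity": check_validity,
    "internal_consistency": check_internal_consistency,
    "external_consistency": check_external_consistency,
}


def run_checks(rows):
    """Return, for each check, the indices of the rows that fail it."""
    failures = {name: [] for name in CHECKS}
    for index, row in enumerate(rows):
        for name, check in CHECKS.items():
            if not check(row):
                failures[name].append(index)
    return failures


# Made-up rows for illustration only; they are not drawn from the datasets.
sample_rows = [
    {"amount": 250.0, "donation_date": "2019-05-14", "donation_year": 2019, "province": "ON"},
    {"amount": -10.0, "donation_date": "2019-05-14", "donation_year": 2019, "province": "ON"},
    {"amount": 100.0, "donation_date": "2020-01-02", "donation_year": 2019, "province": "QC"},
    {"amount": 75.0, "donation_date": "2021-11-30", "donation_year": 2021, "province": "ZZ"},
]

report = run_checks(sample_rows)
print(report)
```

Returning failing row indices rather than a single pass/fail flag is one possible design choice: it lets each flagged observation be inspected individually, which matters when a test runs over millions of rows.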

Taxonomy

Topics: Topic Modeling · Data Quality and Management