Changing Data Sources in the Age of Machine Learning for Official Statistics
Cedric De Boom, Michael Reusens

TL;DR
This paper discusses the risks and challenges of changing data sources in machine learning-driven official statistics, emphasizing the importance of robustness, monitoring, and ethical considerations to maintain data integrity.
Contribution
It provides a comprehensive overview of the causes, risks, and mitigation strategies for data source changes in machine learning for official statistics.
Findings
Identifies key causes of data source changes including technical, ownership, and ethical factors.
Highlights effects such as concept drift, bias, and loss of data validity.
Recommends robustness and monitoring measures to ensure data quality and reliability.
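The findings above recommend monitoring as a safeguard against data source changes. One common way to monitor an input feature is to compare its current distribution against a reference period. The sketch below uses the Population Stability Index (PSI) for this. PSI is a swapped-in, widely used drift statistic, not a method taken from the paper, and the function names are hypothetical. It only detects shifts in an input's distribution. It does not directly detect concept drift, where the relationship between inputs and the target changes.

```python
import math
from bisect import bisect_right
from typing import Sequence


def quantile_edges(reference: Sequence[float], n_bins: int) -> list[float]:
    """Interior bin edges placed at the reference sample's empirical quantiles."""
    ordered = sorted(reference)
    n = len(ordered)
    return [ordered[(i * n) // n_bins] for i in range(1, n_bins)]


def bin_shares(values: Sequence[float], edges: list[float], eps: float) -> list[float]:
    """Fraction of values falling in each bin; eps avoids log(0) for empty bins."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    total = len(values)
    return [max(c / total, eps) for c in counts]


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI = sum over bins of (cur - ref) * ln(cur / ref); 0.0 means identical shares."""
    edges = quantile_edges(reference, n_bins)
    ref = bin_shares(reference, edges, eps)
    cur = bin_shares(current, edges, eps)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


if __name__ == "__main__":
    # Hypothetical data: a feature before and after a (simulated) source change.
    reference = [i / 100 for i in range(1000)]
    shifted = [x + 5 for x in reference]
    print(population_stability_index(reference, reference))  # identical data -> 0.0
    print(population_stability_index(reference, shifted))    # large value -> investigate
```

Binning at the reference quantiles gives each reference bin roughly equal weight, so a shift anywhere in the range shows up. The alert threshold is a local choice that should be calibrated per statistic. It is not something the paper summary above prescribes.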
Abstract
Data science has become increasingly essential for the production of official statistics, as it enables the automated collection, processing, and analysis of large amounts of data. Such data science practices enable more timely, more insightful, and more flexible reporting. However, the quality and integrity of data-science-driven statistics rely on the accuracy and reliability of the data sources and the machine learning techniques that support them. In particular, changes in data sources are inevitable and pose significant risks that must be addressed in the context of machine learning for official statistics. This paper gives an overview of the main risks, liabilities, and uncertainties associated with changing data sources in the context of machine learning for official statistics. We provide a checklist of the most prevalent origins and causes of…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Ethics and Social Impacts of AI
