Machine Learning for the Production of Official Statistics: Density Ratio Estimation using Biased Transaction Data for Japanese labor statistics

Yuya Takada; Kiyoshi Izumi

arXiv:2510.24153 · stat.AP · October 29, 2025

TL;DR

This paper introduces a machine learning approach using density ratio estimation to produce timely official labor statistics from biased transaction data, demonstrating its effectiveness with Japanese employment data.

Contribution

It presents a novel application of density ratio estimation to correct selection bias in transaction data for official statistics production.
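For readers unfamiliar with the idea, the standard importance-weighting identity behind density ratio methods can be written as follows. The notation here ($p_{\mathrm{src}}$ for the biased transaction data, $p_{\mathrm{tgt}}$ for the population of interest, $w$ for the ratio, $f$ for the quantity being averaged) is assumed for illustration and is not taken from the paper, whose exact estimator is not described in this summary.

```latex
\[
  w(x) = \frac{p_{\mathrm{tgt}}(x)}{p_{\mathrm{src}}(x)},
  \qquad
  \mathbb{E}_{p_{\mathrm{tgt}}}\!\left[f(x)\right]
  = \int f(x)\, w(x)\, p_{\mathrm{src}}(x)\, dx
  = \mathbb{E}_{p_{\mathrm{src}}}\!\left[w(x)\, f(x)\right]
  \approx \frac{1}{n} \sum_{i=1}^{n} w(x_i)\, f(x_i),
  \quad x_i \sim p_{\mathrm{src}}.
\]
```

The identity requires $p_{\mathrm{src}}(x) > 0$ wherever $p_{\mathrm{tgt}}(x) > 0$: groups that never appear in the biased data cannot be recovered by reweighting alone.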

Findings

01. Early release of labor market indicators is possible using biased data.

02. The method effectively adjusts for selection bias in non-survey data.

03. Timely statistics can be produced without waiting for traditional survey results.
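The bias adjustment described in these findings can be illustrated with a minimal sketch. It assumes a single categorical covariate and uses a count-based plug-in estimate of the density ratio, which is much simpler than a learned estimator and is not the paper's method. All names and the toy data (`source_x`, `source_y`, `target_x`, firm-size labels, wage values) are hypothetical.

```python
from collections import Counter


def density_ratio_weights(source_x, target_x):
    """Plug-in estimate of w(x) = p_target(x) / p_source(x) for categorical x."""
    src_counts = Counter(source_x)
    tgt_counts = Counter(target_x)
    n_src, n_tgt = len(source_x), len(target_x)
    # Only categories observed in the biased source can be reweighted.
    return {
        x: (tgt_counts.get(x, 0) / n_tgt) / (src_counts[x] / n_src)
        for x in src_counts
    }


def weighted_mean(values, xs, weights):
    """Self-normalized weighted mean of `values`, weighting each by w(x)."""
    w = [weights[x] for x in xs]
    return sum(wi * v for wi, v in zip(w, values)) / sum(w)


# Hypothetical toy data: the biased source over-represents large firms.
source_x = ["large"] * 8 + ["small"] * 2   # covariate in the transaction data
source_y = [30.0] * 8 + [20.0] * 2         # outcome observed only in the source
target_x = ["large"] * 3 + ["small"] * 7   # covariate in the reference population

weights = density_ratio_weights(source_x, target_x)
naive = sum(source_y) / len(source_y)
adjusted = weighted_mean(source_y, source_x, weights)

print(f"naive mean:    {naive:.2f}")
print(f"adjusted mean: {adjusted:.2f}")
```

In this toy case the naive mean is 28, while the reweighted mean is about 23, which matches the mean the target mix of firm sizes would produce. With continuous or high-dimensional covariates the counts would be replaced by a learned ratio estimator.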

Abstract

National statistical institutes are beginning to use non-traditional data sources to produce official statistics. These sources, originally collected for non-statistical purposes, include point-of-sale (POS) data and mobile phone Global Positioning System (GPS) data. Such data have the potential to significantly enhance the usefulness of official statistics. In the era of big data, many private companies are accumulating vast amounts of transaction data. Exploring how to leverage these data for official statistics is increasingly important. However, progress has been slower than expected, mainly because such data are not collected through sample-based survey methods and therefore exhibit substantial selection bias. If this bias can be properly addressed, these data could become a valuable resource for official statistics, substantially expanding their scope and improving the quality of…
