Imputing Missing Values in the Occupational Requirements Survey

Terry Leitch; Debjani Saha

arXiv:2201.09811 · stat.ME · January 25, 2022

TL;DR

This paper presents a novel iterative regression-based method using XGBoost to impute missing data in the Occupational Requirements Survey, enhancing data completeness for better workforce analysis.

Contribution

Introduces a new imputation technique leveraging survey features and iterative regression with XGBoost, applicable to occupational data with missing values.

Findings

01  Reports each imputed value with a 95% confidence interval derived from the distribution of predictions.

02  Enhances the utility of ORS data for workforce analysis.

03  Proposes a generalized imputation algorithm, WIGEM.

Abstract

The U.S. Bureau of Labor Statistics allows public access to much of the data acquired through its Occupational Requirements Survey (ORS). This data can be used to draw inferences about the requirements of various jobs and job classes within the United States workforce. However, the dataset contains a multitude of missing observations and estimates, which somewhat limits its utility. Here, we propose a method by which to impute these missing values that leverages many of the inherent features present in the survey data, such as known population limits and correlations between occupations and tasks. An iterative regression fit, implemented with a recent version of XGBoost and executed across a set of simulated values drawn from the distribution described by the known values and their standard deviations reported in the survey, is the approach used to arrive at a distribution of predicted…
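
As a rough illustration of the idea in the abstract, the Python sketch below draws simulated values from each known estimate and its standard deviation, runs an iterative regression to fill the gaps, clips predictions to known bounds, and summarizes the resulting distribution with a mean and a 95% interval. It is not the authors' implementation (the linked repository holds that). Ordinary least squares stands in for XGBoost so that the example needs only the standard library, and the two-column toy data, the helper names (`fit_line`, `clip`, `impute_once`, `impute`), the 0 to 100 percent bounds, and the percentile-based interval are all assumptions made for illustration.

```python
import random
import statistics

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x (a stand-in for an XGBoost regressor)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return my, 0.0
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def clip(v, lo=0.0, hi=100.0):
    """Keep a value inside the assumed population limits (percentages)."""
    return max(lo, min(hi, v))

def impute_once(rows, rng, n_iter=5):
    """One simulation: draw known values, then iteratively regress each column on the other."""
    # Draw one simulated value per known (mean, sd) estimate; None marks a missing cell.
    drawn = [[clip(rng.gauss(*c)) if c is not None else None for c in row]
             for row in rows]
    filled = [r[:] for r in drawn]
    # Initialize missing cells with the column mean of the drawn values.
    for j in range(2):
        col_mean = statistics.fmean(r[j] for r in drawn if r[j] is not None)
        for r in filled:
            if r[j] is None:
                r[j] = col_mean
    # Refit and re-predict the missing cells, alternating between the two columns.
    for _ in range(n_iter):
        for j in range(2):
            k = 1 - j
            train = [(f[k], f[j]) for f, d in zip(filled, drawn) if d[j] is not None]
            a, b = fit_line([t[0] for t in train], [t[1] for t in train])
            for f, d in zip(filled, drawn):
                if d[j] is None:
                    f[j] = clip(a + b * f[k])
    return filled

def impute(rows, n_sims=200, seed=0):
    """Return {(row, col): (mean, lower, upper)} for every missing cell."""
    rng = random.Random(seed)
    sims = [impute_once(rows, rng) for _ in range(n_sims)]
    out = {}
    for i, row in enumerate(rows):
        for j, cell in enumerate(row):
            if cell is None:
                vals = sorted(s[i][j] for s in sims)
                lower = vals[int(0.025 * (n_sims - 1))]
                upper = vals[int(0.975 * (n_sims - 1))]
                out[(i, j)] = (statistics.fmean(vals), lower, upper)
    return out

# Hypothetical data: (estimate, standard deviation) for two related requirements.
rows = [
    [(10.0, 1.0), (20.0, 2.0)],
    [(30.0, 1.5), (40.0, 2.0)],
    [(50.0, 2.0), None],
    [None, (80.0, 3.0)],
    [(70.0, 2.0), (85.0, 2.5)],
]
result = impute(rows)
for cell, (mean, lower, upper) in sorted(result.items()):
    print(cell, round(mean, 1), round(lower, 1), round(upper, 1))
```

Repeating the fit across many simulated draws is what turns a single point prediction into a distribution; the interval here simply takes the 2.5th and 97.5th percentiles of that distribution, which is one of several reasonable choices.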

Code & Models

Repositories

saharaja/imputeors (Official)

Taxonomy

Topics: Statistical Methods and Bayesian Inference · Survey Methodology and Nonresponse · Urban Transport and Accessibility