Imputing Missing Values in the Occupational Requirements Survey
Terry Leitch, Debjani Saha

TL;DR
This paper presents a novel iterative regression-based method using XGBoost to impute missing data in the Occupational Requirements Survey, enhancing data completeness for better workforce analysis.
Contribution
Introduces a new imputation technique leveraging survey features and iterative regression with XGBoost, applicable to occupational data with missing values.
Findings
Produces imputed estimates, each bounded by a 95% confidence interval.
Enhances the utility of ORS data for workforce analysis.
Proposes a generalized imputation algorithm, WIGEM.
Abstract
The U.S. Bureau of Labor Statistics allows public access to much of the data acquired through its Occupational Requirements Survey (ORS). These data can be used to draw inferences about the requirements of various jobs and job classes within the United States workforce. However, the dataset contains many missing observations and estimates, which limits its utility. Here, we propose a method to impute these missing values that leverages many of the inherent features present in the survey data, such as known population limits and correlations between occupations and tasks. We fit an iterative regression, implemented with a recent version of XGBoost, across a set of simulated values drawn from the distribution described by the known values and their standard deviations reported in the survey, to arrive at a distribution of predicted…
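The simulation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a one-feature least-squares line stands in for the XGBoost regressor so that the example has no dependencies, the iterative refit is omitted, and the data layout `(x, mean, sd)`, the numeric feature `x`, the 0 to 100 bounds, and all function names are assumptions made for illustration only.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for one feature (stand-in for XGBoost)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def impute_with_interval(known, x_missing, n_sims=2000, lo=0.0, hi=100.0, seed=0):
    """known: list of (x, mean, sd) per observed estimate (hypothetical layout)."""
    rng = random.Random(seed)
    xs = [x for x, _, _ in known]
    preds = []
    for _ in range(n_sims):
        # Draw one plausible value per known estimate from Normal(mean, sd),
        # clipped to the assumed population bounds [lo, hi].
        ys = [min(hi, max(lo, rng.gauss(m, s))) for _, m, s in known]
        slope, intercept = fit_line(xs, ys)
        # Predict the missing estimate from this simulated data set.
        preds.append(min(hi, max(lo, slope * x_missing + intercept)))
    preds.sort()
    mean_pred = statistics.fmean(preds)
    # Empirical 2.5th and 97.5th percentiles of the simulated predictions.
    lower = preds[int(0.025 * (n_sims - 1))]
    upper = preds[int(0.975 * (n_sims - 1))]
    return mean_pred, lower, upper

# Made-up example values: four known estimates and one missing at x = 3.
known = [(1.0, 10.0, 1.5), (2.0, 20.0, 2.0), (4.0, 41.0, 2.5), (5.0, 49.0, 1.0)]
mean_pred, lower, upper = impute_with_interval(known, x_missing=3.0)
print(f"imputed: {mean_pred:.1f} (95% interval {lower:.1f} to {upper:.1f})")
```

Each simulated data set yields one prediction, so the spread of predictions reflects the uncertainty reported for the known estimates. Swapping `fit_line` for a gradient-boosted model would bring the sketch closer to the approach the abstract names.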
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Survey Methodology and Nonresponse · Urban Transport and Accessibility
