On Prediction Feature Assignment in the Heckman Selection Model

Huy Mai; Xintao Wu

arXiv:2309.08043 · cs.LG · April 23, 2024

Open Access

TL;DR

This paper introduces Heckman-FA, a data-driven framework that automatically selects prediction features for the Heckman model to improve prediction accuracy under MNAR sample selection bias.

Contribution

Heckman-FA provides a novel automated method for choosing the prediction features of the Heckman model, removing the need to pick them by hand, which becomes difficult when the set of selection features is large.
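To give a sense of why choosing by hand scales poorly, the sketch below enumerates every candidate prediction feature set that can be drawn from a small set of selection features. It is only an illustration of the size of the search space, not the Heckman-FA procedure. The feature names and the `candidate_assignments` helper are hypothetical, and the requirement that at least one selection feature stays out of the prediction set is an assumption borrowed from the exclusion restriction commonly used with the Heckman model.

```python
from itertools import combinations


def candidate_assignments(selection_features, min_excluded=1):
    """Yield each non-empty subset of the selection features that leaves
    at least `min_excluded` of them out of the prediction feature set."""
    k = len(selection_features)
    for size in range(1, k - min_excluded + 1):
        for subset in combinations(selection_features, size):
            yield subset


# Hypothetical selection features, for illustration only.
selection_features = ["age", "education", "experience", "children"]

assignments = list(candidate_assignments(selection_features))
print(len(assignments))  # 14 for K = 4, i.e. 2**4 - 2
```

With K selection features and one feature held out, the number of candidates is 2^K - 2, so exhaustive manual comparison quickly becomes impractical as K grows.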

Findings

01. Heckman-FA yields more robust regression models under MNAR bias.

02. Experimental results show improved prediction performance over traditional methods.

03. The framework effectively handles high-dimensional selection features.

Abstract

Under missing-not-at-random (MNAR) sample selection bias, the performance of a prediction model is often degraded. This paper focuses on one classic instance of MNAR sample selection bias where a subset of samples have non-randomly missing outcomes. The Heckman selection model and its variants have commonly been used to handle this type of sample selection bias. The Heckman model uses two separate equations to model the prediction and selection of samples, where the selection features include all prediction features. When using the Heckman model, the prediction features must be properly chosen from the set of selection features. However, choosing the proper prediction features is a challenging task for the Heckman model. This is especially the case when the number of selection features is large. Existing approaches that use the Heckman model often provide a manually chosen set of…
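For reference, the two-equation structure described in the abstract is usually written in the following textbook form. The notation here is an assumption and may differ from the paper's: $\mathbf{x}_i^{(s)}$ and $\mathbf{x}_i^{(p)}$ are the selection and prediction features of sample $i$, with the entries of $\mathbf{x}_i^{(p)}$ drawn from those of $\mathbf{x}_i^{(s)}$; the errors $u_i^{(s)} \sim \mathcal{N}(0, 1)$ and $u_i^{(p)} \sim \mathcal{N}(0, \sigma^2)$ are jointly normal with correlation $\rho$; and $\phi$ and $\Phi$ are the standard normal density and distribution function.

```latex
\begin{align*}
d_i &= \mathbf{x}_i^{(s)} \boldsymbol{\gamma} + u_i^{(s)},
  \qquad s_i = \mathbb{1}[d_i > 0]
  && \text{(selection)} \\
y_i &= \mathbf{x}_i^{(p)} \boldsymbol{\beta} + u_i^{(p)},
  \qquad y_i \text{ observed only if } s_i = 1
  && \text{(prediction)} \\
\mathbb{E}[\, y_i \mid s_i = 1 \,]
  &= \mathbf{x}_i^{(p)} \boldsymbol{\beta}
   + \rho \, \sigma \, \lambda\!\left( \mathbf{x}_i^{(s)} \boldsymbol{\gamma} \right),
  \qquad \lambda(z) = \frac{\phi(z)}{\Phi(z)}
\end{align*}
```

The last line shows where the bias comes from: when $\rho \neq 0$, a regression fitted only on the observed samples omits the inverse Mills ratio term $\lambda(\cdot)$, which the Heckman correction adds back.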

Taxonomy

Topics: Machine Learning and Data Classification · Distributed Sensor Networks and Detection Algorithms · Statistical Methods and Inference