Inference on High Dimensional Selective Labeling Models
Shakeeb Khan, Elie Tamer, Qingsong Yao

TL;DR
This paper introduces a new distribution-free estimation method for high-dimensional selective labeling models, addressing computational challenges and selection bias in binary outcome analysis across various fields.
Contribution
It proposes a novel semiparametric estimation procedure combining batched gradient descent with sorting algorithms to handle large covariate sets and selection bias.
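As a rough illustration of why sorting can matter computationally here, the sketch below is an assumption and not the authors' algorithm. It uses a rank-correlation objective for a binary outcome: the number of pairs in which an observation with outcome 1 has a larger linear index than an observation with outcome 0. The objective, the simulated data, and the function names are hypothetical choices made for illustration, and the batched gradient descent step is not shown. A brute-force count over all pairs costs O(n^2), while sorting one group and binary-searching the other gives the same count in O(n log n).

```python
import numpy as np

def concordant_pairs_bruteforce(index, y):
    """Count pairs (i, j) with y_i = 1, y_j = 0 and index_i > index_j in O(n^2)."""
    pos = index[y == 1]
    neg = index[y == 0]
    return int(np.sum(pos[:, None] > neg[None, :]))

def concordant_pairs_sorted(index, y):
    """Same count in O(n log n): sort the y = 0 indices once, then binary-search."""
    neg_sorted = np.sort(index[y == 0])
    pos = index[y == 1]
    # With side="left", searchsorted returns for each positive index the number
    # of negative indices strictly below it.
    return int(np.searchsorted(neg_sorted, pos, side="left").sum())

# Hypothetical simulated data: 500 observations, 20 covariates.
rng = np.random.default_rng(0)
n, p = 500, 20
X = rng.normal(size=(n, p))
beta = rng.normal(size=p)
y = (X @ beta + rng.logistic(size=n) > 0).astype(int)

index = X @ beta
slow = concordant_pairs_bruteforce(index, y)
fast = concordant_pairs_sorted(index, y)
```

Both functions return the same integer for any candidate coefficient vector, so the sorted version can stand in for the pairwise loop whenever an objective of this kind has to be evaluated many times.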
Findings
The method is computationally efficient with many covariates.
Asymptotic properties are established as the dimension increases.
Finite-sample performance is validated through simulations and judicial bail data.
Abstract
A class of simultaneous equation models arises in many domains where observed binary outcomes are themselves a consequence of the existing choices of one of the agents in the model. These models are gaining increasing interest in the computer science and machine learning literatures, where the potentially endogenous sample selection is referred to as the "selective labels" problem. Empirical settings for such models arise in fields as diverse as criminal justice, health care, and insurance. For important recent work in this area, see for example Lakkaraju et al. (2017), Kleinberg et al. (2018), and Coston et al. (2021), where the authors focus on judicial bail decisions, and where one observes the outcome of whether a defendant failed to return for their court appearance only if the judge in the case decides to release the defendant on bail. Identifying and estimating such models can…
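The selection problem in the bail setting can be made concrete with a small simulation. The sketch below is a hypothetical data-generating process, not the paper's model: the coefficients, the correlation `rho`, and all variable names are assumptions chosen for illustration. A judge releases defendants based on an observed risk score plus private information that is correlated with the unobserved determinant of failure to appear, so the failure rate among released defendants understates the failure rate in the full population.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Observed risk score for each defendant (hypothetical single covariate).
x = rng.normal(size=n)

# u drives the outcome; v is the judge's private signal, correlated with u.
rho = 0.8  # assumed correlation between the two unobservables
u = rng.normal(size=n)
v = rho * u + np.sqrt(1 - rho**2) * rng.normal(size=n)

# Latent outcome: the defendant would fail to appear if released.
fail = (x + u > 0).astype(int)

# Selection: the judge releases defendants who look low-risk.
released = (x + v) < 0

# The label is observed only for released defendants.
true_rate = fail.mean()             # not observable in real data
naive_rate = fail[released].mean()  # what the selected sample shows
```

In this design `naive_rate` falls well below `true_rate`, because release depends on information that also predicts the outcome. Estimators that ignore the selection step inherit this gap.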
Taxonomy
Topics: Data Management and Algorithms
Methods: Focus
