Who Leads? Comparing Human-Centric and Model-Centric Strategies for Defining ML Target Variables
Mengtian Guo, David Gotz, Yue Wang

TL;DR
This paper investigates how human-machine teaming strategies influence the process of defining proxy target variables in predictive modeling, highlighting benefits and risks of different approaches.
Contribution
It compares relevance-first and performance-first teaming strategies, revealing their impact on iteration speed and alignment with application goals.
Findings
The performance-first strategy speeds up decision-making.
The performance-first strategy biases users toward well-performing but misaligned proxies.
Human-machine teaming offers opportunities and risks in operationalizing ML targets.
Abstract
Predictive modeling has the potential to enhance human decision-making. However, many predictive models fail in practice due to problematic problem formulation in cases where the prediction target is an abstract concept or construct and practitioners need to define an appropriate target variable as a proxy to operationalize the construct of interest. The choice of an appropriate proxy target variable is rarely self-evident in practice, requiring both domain knowledge and iterative data modeling. This process is inherently collaborative, involving both domain experts and data scientists. In this work, we explore how human-machine teaming can support this process by accelerating iterations while preserving human judgment. We study the impact of two human-machine teaming strategies on proxy construction: 1) relevance-first: humans leading the process by selecting relevant proxies, and 2)…
