Automating Data Annotation under Strategic Human Agents: Risks and Potential Solutions
Tian Xie, Xueru Zhang

TL;DR
This paper explores the long-term effects of using ML models to annotate data in social domains with strategic human agents, highlighting risks to fairness and proposing solutions for stable retraining.
Contribution
It formalizes the interaction between strategic agents and the model, analyzes how that interaction evolves as the model is repeatedly retrained, and proposes a refined retraining process to mitigate instability and fairness issues.
Findings
Agents become increasingly likely to receive positive decisions as the model is retrained.
The proportion of agents with positive labels may nevertheless decrease in the long run.
Enforcing fairness constraints at each round may not benefit disadvantaged groups.
Abstract
As machine learning (ML) models are increasingly used in social domains to make consequential decisions about humans, they often have the power to reshape data distributions. Humans, as strategic agents, continuously adapt their behaviors in response to the learning system. As populations change dynamically, ML systems may need frequent updates to ensure high performance. However, acquiring high-quality human-annotated samples can be highly challenging and even infeasible in social domains. A common practice to address this issue is using the model itself to annotate unlabeled data samples. This paper investigates the long-term impacts when ML models are retrained with model-annotated samples when they incorporate human strategic responses. We first formalize the interactions between strategic agents and the model and then analyze how they evolve under such dynamic interactions. We find…
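The retraining loop described in the abstract can be illustrated with a toy simulation. This is a minimal sketch under assumptions that are not taken from the paper: a one-dimensional feature, a threshold classifier, agents who move to the threshold when the gap is within a fixed `budget`, and pure gaming (moving the feature does not change the hidden true label). All function names and parameters are hypothetical, and the sketch is not meant to reproduce the paper's model or results.

```python
import random


def fit_threshold(samples):
    """Return the threshold with the lowest 0-1 training error on (x, label) pairs."""
    xs = sorted(x for x, _ in samples)
    candidates = xs + [xs[-1] + 1.0]  # the last candidate rejects everyone
    best_theta, best_err = candidates[0], len(samples) + 1
    for theta in candidates:
        err = sum((x >= theta) != bool(y) for x, y in samples)
        if err < best_err:
            best_theta, best_err = theta, err
    return best_theta


def respond(x, theta, budget):
    """Assumed strategic response: move to the threshold if it is within budget."""
    if theta - budget <= x < theta:
        return theta
    return x


def simulate(rounds=5, n_init=200, n_new=100, budget=0.5, seed=0):
    """Retrain a threshold model on its own annotations of strategic agents."""
    rng = random.Random(seed)

    def draw():
        x = rng.gauss(0.0, 1.0)
        y = int(x + rng.gauss(0.0, 0.5) > 0.0)  # hidden true label (assumption)
        return x, y

    train = [draw() for _ in range(n_init)]  # initial human-annotated samples
    theta = fit_threshold(train)
    history = []
    for t in range(rounds):
        agents = [draw() for _ in range(n_new)]
        shifted = [respond(x, theta, budget) for x, _ in agents]
        accepted = [int(x >= theta) for x in shifted]  # model-annotated labels
        history.append({
            "round": t,
            "theta": theta,
            "acceptance_rate": sum(accepted) / n_new,
            "qualification_rate": sum(y for _, y in agents) / n_new,
        })
        train.extend(zip(shifted, accepted))  # no human labels after round 0
        theta = fit_threshold(train)
    return history


if __name__ == "__main__":
    for row in simulate():
        print(row)
```

Each printed row pairs the share of agents the model accepts with the share whose hidden label is actually positive, which are the two quantities the findings contrast. Because this toy treats all feature changes as gaming, it only shows the mechanics of the loop; the paper's own agent model and retraining procedure should be consulted for the actual dynamics.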
Taxonomy
Topics: Data Quality and Management · Semantic Web and Ontologies · Big Data and Business Intelligence
