Stop! In the Name of Flaws: Disentangling Personal Names and Sociodemographic Attributes in NLP
Vagrant Gautam, Arjun Subramonian, Anne Lauscher, Os Keyes

TL;DR
This paper critically examines the use of personal names in NLP to infer sociodemographic attributes, highlighting methodological and ethical challenges, and offers guidelines to improve research practices.
Contribution
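The paper provides an interdisciplinary overview of names and naming, surveys the issues that arise when names are associated with sociodemographic attributes, and offers normative recommendations to address validity and ethical concerns in NLP work involving names.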
Findings
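Identifies validity problems in associating names with sociodemographic attributes, such as systematic error and problems of construct validity.
Highlights ethical concerns such as harms, differential impact, and cultural insensitivity.
Provides guiding questions and normative recommendations for future research that uses names and sociodemographic characteristics.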
Abstract
Personal names simultaneously differentiate individuals and categorize them in ways that are important in a given society. While the natural language processing community has thus associated personal names with sociodemographic characteristics in a variety of tasks, researchers have engaged to varying degrees with the established methodological problems in doing so. To guide future work that uses names and sociodemographic characteristics, we provide an overview of relevant research: first, we present an interdisciplinary background on names and naming. We then survey the issues inherent to associating names with sociodemographic attributes, covering problems of validity (e.g., systematic error, construct validity), as well as ethical concerns (e.g., harms, differential impact, cultural insensitivity). Finally, we provide guiding questions along with normative recommendations to avoid…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Interpreting and Communication in Healthcare
