Nigerian Software Engineer or American Data Scientist? GitHub Profile Recruitment Bias in Large Language Models
Takashi Nakano, Kazumasa Shimari, Raula Gaikovina Kula, Christoph Treude, Marc Cheong, Kenichi Matsumoto

TL;DR
This paper investigates whether large language models such as ChatGPT exhibit regional and role biases when automating recruitment from GitHub profiles, and what those biases imply for fairness.
Contribution
It provides empirical evidence of geographic and role biases in LLMs during recruitment automation, highlighting societal bias issues in AI models.
Findings
ChatGPT favors certain regions over others in recruitment tasks.
Profiles from specific countries are more likely to be assigned certain roles.
Biases persist in counterfactual tests, where the location strings of two profiles are swapped.
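The counterfactual check mentioned in the findings can be illustrated with a minimal sketch. This is not the authors' implementation: the `Profile` class, its field names, and the `swap_locations` helper are hypothetical, chosen only to show the idea of exchanging the location strings of two profiles while leaving everything else unchanged.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Profile:
    # Hypothetical, simplified stand-in for a GitHub user profile.
    login: str
    bio: str
    location: str


def swap_locations(a: Profile, b: Profile) -> tuple[Profile, Profile]:
    """Return copies of both profiles with their location strings exchanged.

    All other fields are left untouched, so any change in how a model
    treats the pair can be attributed to the location string alone.
    """
    return replace(a, location=b.location), replace(b, location=a.location)


# Illustrative placeholder data, not taken from the paper.
original_a = Profile(login="user_a", bio="Backend developer", location="Region A")
original_b = Profile(login="user_b", bio="Data enthusiast", location="Region B")

counterfactual_a, counterfactual_b = swap_locations(original_a, original_b)
```

Under these assumptions, one would present the original pair and the counterfactual pair to the model separately and compare the outcomes. If the selection follows the location string rather than the rest of the profile, that points to a location-driven bias.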
Abstract
Large Language Models (LLMs) have taken the world by storm, demonstrating their ability not only to automate tedious tasks, but also to show some degree of proficiency in completing software engineering tasks. A key concern with LLMs is their "black-box" nature, which obscures their internal workings and could lead to societal biases in their outputs. In the software engineering context, in this early results paper, we empirically explore how well LLMs can automate recruitment tasks for a geographically diverse software team. We use OpenAI's ChatGPT to conduct an initial set of experiments using GitHub User Profiles from four regions to recruit a six-person software development team, analyzing a total of 3,657 profiles over a five-year period (2019-2023). Results indicate that ChatGPT shows preference for some regions over others, even when swapping the location strings of two profiles…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
