Nigerian Software Engineer or American Data Scientist? GitHub Profile Recruitment Bias in Large Language Models

Takashi Nakano; Kazumasa Shimari; Raula Gaikovina Kula; Christoph Treude; Marc Cheong; Kenichi Matsumoto

arXiv:2409.12544 · cs.SE · January 15, 2025

TL;DR

This paper investigates how large language models such as ChatGPT exhibit regional and role biases when used to automate recruitment tasks based on GitHub profiles, revealing societal biases with implications for fairness.

Contribution

The paper provides empirical evidence of geographic and role biases in LLMs used for recruitment automation, highlighting the problem of societal bias in AI models.

Findings

01

ChatGPT favors certain regions over others in recruitment tasks.

02

Profiles from specific countries are more likely to be assigned certain roles.

03

Regional preferences persist even when the location strings of two profiles are swapped (counterfactual tests).
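Finding 01 is a statement about how often profiles from each region are selected. A minimal sketch of one common way to quantify such a preference, comparing per-region selection rates, follows. It is not the authors' analysis: the `Candidate` record, the region labels, and the `selection_rates` / `max_rate_ratio` helpers are hypothetical, and the toy values are invented purely to show the arithmetic.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical record: a GitHub login, its region label, and whether the LLM selected it."""
    login: str
    region: str
    selected: bool


def selection_rates(candidates: Iterable[Candidate]) -> dict[str, float]:
    """Fraction of candidates selected within each region."""
    totals: Counter[str] = Counter()
    chosen: Counter[str] = Counter()
    for c in candidates:
        totals[c.region] += 1
        chosen[c.region] += int(c.selected)
    return {region: chosen[region] / n for region, n in totals.items()}


def max_rate_ratio(rates: dict[str, float]) -> float:
    """Highest regional selection rate divided by the lowest; 1.0 means equal rates.
    Returns infinity when some region has a selection rate of zero."""
    lowest, highest = min(rates.values()), max(rates.values())
    return float("inf") if lowest == 0 else highest / lowest


if __name__ == "__main__":
    # Invented toy data, not results from the paper.
    toy = [
        Candidate("u1", "A", True), Candidate("u2", "A", False),
        Candidate("u3", "B", True), Candidate("u4", "B", False),
        Candidate("u5", "B", False), Candidate("u6", "B", False),
    ]
    rates = selection_rates(toy)
    print(rates)                  # {'A': 0.5, 'B': 0.25}
    print(max_rate_ratio(rates))  # 2.0
```

Comparing rates rather than raw counts matters because the regions need not contribute equal numbers of profiles; the ratio is only one possible summary of the gap.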

Abstract

Large Language Models (LLMs) have taken the world by storm, demonstrating not only their ability to automate tedious tasks but also some degree of proficiency in completing software engineering tasks. A key concern with LLMs is their "black-box" nature, which obscures their internal workings and could lead to societal biases in their outputs. In this early-results paper, we focus on software engineering and empirically explore how well LLMs can automate recruitment tasks for a geographically diverse software team. We use OpenAI's ChatGPT to conduct an initial set of experiments using GitHub user profiles from four regions to recruit a six-person software development team, analyzing a total of 3,657 profiles over a five-year period (2019-2023). Results indicate that ChatGPT shows a preference for some regions over others, even when the location strings of two profiles are swapped…
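The counterfactual check mentioned in the abstract, swapping the location strings of two profiles, can be sketched as below. This is an illustrative reconstruction under stated assumptions, not the authors' code: the `Profile` fields, the `Picker` callable standing in for a call to ChatGPT, and the helper names are all hypothetical. The idea is that if the location the model ends up choosing is the same before and after the swap, the choice tracked the location string rather than the rest of the profile.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Profile:
    """Hypothetical subset of GitHub profile fields shown to the model."""
    login: str
    bio: str
    location: str


# Stand-in for an LLM call: given two profiles, return the login of the one chosen.
Picker = Callable[[Profile, Profile], str]


def swap_locations(a: Profile, b: Profile) -> tuple[Profile, Profile]:
    """Return copies of a and b with only their location strings exchanged."""
    return replace(a, location=b.location), replace(b, location=a.location)


def chosen_location(pick: Picker, a: Profile, b: Profile) -> str:
    """Location string of whichever profile the picker selects."""
    return a.location if pick(a, b) == a.login else b.location


def location_driven(pick: Picker, a: Profile, b: Profile) -> bool:
    """True if the chosen location survives the swap, i.e. the pick followed
    the location string rather than the login and bio."""
    if a.location == b.location:
        raise ValueError("profiles must have different locations to swap")
    before = chosen_location(pick, a, b)
    after = chosen_location(pick, *swap_locations(a, b))
    return before == after


if __name__ == "__main__":
    a = Profile("alice", "ML engineer", "Region X")
    b = Profile("bob", "web developer", "Region Y")
    # Toy pickers for illustration only; neither models ChatGPT's real behavior.
    prefers_x: Picker = lambda p, q: p.login if p.location == "Region X" else q.login
    always_first: Picker = lambda p, q: p.login
    print(location_driven(prefers_x, a, b))     # True
    print(location_driven(always_first, a, b))  # False
```

An LLM's answer can vary between calls, so in practice a check like this would be repeated over many profile pairs and prompts rather than run once.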

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling