Inferring Gender from Names on the Web: A Comparative Evaluation of Gender Detection Methods
Fariba Karimi, Claudia Wagner, Florian Lemmerich, Mohsen Jadidi, Markus Strohmaier

TL;DR
This paper evaluates automated methods for inferring gender from names on the web, reveals their biases, and proposes a novel image-based approach to improve accuracy and reduce bias in demographic inference.
Contribution
It systematically compares existing name-based gender detection methods and introduces a new web image retrieval technique to enhance gender inference accuracy.
Findings
Name-based methods are biased by country of origin.
Combining name-based and image-based methods reduces bias.
The proposed image-based approach improves gender detection accuracy.
Abstract
Computational social scientists often harness the Web as a "societal observatory" where data about human social behavior is collected. This data enables novel investigations of psychological, anthropological and sociological research questions. However, in the absence of demographic information, such as gender, many relevant research questions cannot be addressed. To tackle this problem, researchers often rely on automated methods to infer gender from name information provided on the web. However, little is known about the accuracy of existing gender-detection methods and how biased they are against certain sub-populations. In this paper, we address this question by systematically comparing several gender detection methods on a random sample of scientists for whom we know their full name, their gender and the country of their workplace. We further suggest a novel method that employs…
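The comparison described in the abstract can be illustrated with a minimal sketch that scores a gender-detection method separately for each country, so that accuracy gaps between sub-populations become visible. Everything here is an assumption made for illustration: the function names `accuracy_by_group` and `toy_predict`, the tiny lookup table, and the sample records are hypothetical and are not the paper's data, methods, or results.

```python
from collections import defaultdict

def accuracy_by_group(records, predict):
    """Return accuracy per country for records of (full_name, true_gender, country)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for full_name, gender, country in records:
        total[country] += 1
        if predict(full_name) == gender:
            correct[country] += 1
    return {country: correct[country] / total[country] for country in total}

# Hypothetical first-name lookup standing in for a real name-based method.
TOY_LOOKUP = {"maria": "female", "john": "male", "anna": "female"}

def toy_predict(full_name):
    """Guess gender from the first token of the name; 'unknown' if not in the table."""
    parts = full_name.split()
    first = parts[0].lower() if parts else ""
    return TOY_LOOKUP.get(first, "unknown")

# Made-up sample: full name, known gender, country of workplace.
sample = [
    ("Maria Rossi", "female", "IT"),
    ("John Smith", "male", "US"),
    ("Anna Jones", "female", "US"),
    ("Wei Zhang", "male", "CN"),
    ("Li Na", "female", "CN"),
]

per_country = accuracy_by_group(sample, toy_predict)
for country, acc in sorted(per_country.items()):
    print(f"{country}: {acc:.2f}")
```

In this toy setup the lookup table covers only a few Western first names, so the per-country breakdown exposes a gap that a single overall accuracy figure would hide. That is the kind of bias the evaluation is designed to detect.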
Taxonomy
Topics: Authorship Attribution and Profiling · Names, Identity, and Discrimination Research · Digital Communication and Language
