
TL;DR
This paper investigates gender bias in big data analysis by comparing the output of gender prediction software tools against human-identified data, and outlines implications for public understanding of gender bias in computing and for the nature of the computing profession.
Contribution
It introduces a combined humanistic and data-driven approach to measure gender bias in big data tools using historical and current datasets.
Findings
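Gender prediction tools (Gender API, Namsor, and Genderize.io) show bias when their results are compared to personally identified computer science authors in the DBLP dataset (1950-1980).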
Preliminary results indicate discrepancies in gender classification accuracy.
Implications for understanding gender bias in computing are discussed.
Abstract
This article combines humanistic "data critique" with informed inspection of big data analysis. It measures gender bias when gender prediction software tools (Gender API, Namsor, and Genderize.io) are used in historical big data research. Gender bias is measured by contrasting personally identified computer science authors in the well-regarded DBLP dataset (1950-1980) with exactly comparable results from the software tools. Implications for public understanding of gender bias in computing and the nature of the computing profession are outlined. Preliminary assessment of the Semantic Scholar dataset is presented. The conclusion combines humanistic approaches with selective use of big data methods.
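The comparison described in the abstract, contrasting human-identified authors with the labels a prediction tool returns for the same authors, can be sketched as follows. This is a minimal illustration and not the paper's method: the function name `bias_summary`, the label codes `"F"`/`"M"`/`None`, and the toy records are all assumptions made for the example, and the numbers are invented, not results from DBLP or from any of the tools.

```python
from collections import Counter

def bias_summary(records):
    """Compare human-identified labels with tool-predicted labels.

    records: iterable of (human_label, tool_label) pairs, where each label
    is "F", "M", or None (None = the tool returned no prediction).
    All names and label codes here are hypothetical.
    """
    confusion = Counter(records)
    total = sum(confusion.values())
    if total == 0:
        raise ValueError("no records to compare")

    def count(pred):
        # Sum the pair counts for which the predicate holds.
        return sum(n for (h, t), n in confusion.items() if pred(h, t))

    def error_rate(label):
        # Share of human-identified `label` authors the tool got wrong,
        # counting "no prediction" as an error.
        actual = count(lambda h, t: h == label)
        wrong = count(lambda h, t: h == label and t != label)
        return wrong / actual if actual else 0.0

    return {
        "human_share_women": count(lambda h, t: h == "F") / total,
        "tool_share_women": count(lambda h, t: t == "F") / total,
        "error_rate_women": error_rate("F"),
        "error_rate_men": error_rate("M"),
    }

# Invented toy data: (human-identified label, tool label).
toy = [
    ("F", "F"), ("F", "M"), ("F", None), ("F", "F"),
    ("M", "M"), ("M", "M"), ("M", "M"), ("M", "F"), ("M", "M"), ("M", "M"),
]
summary = bias_summary(toy)
print(summary)
```

A gap between `human_share_women` and `tool_share_women`, or between the two error rates, is one simple way to express the kind of bias the abstract refers to; the paper may well use different measures.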
