Gender Inference using Statistical Name Characteristics in Twitter
Juergen Mueller, Gerd Stumme

TL;DR
This paper introduces a novel classifier that infers Twitter users' gender by analyzing statistical characteristics of their names, including ill-formed and international names, overcoming limitations of traditional name dictionaries.
Contribution
It presents a new method for gender inference that leverages name features, enabling classification of diverse and ill-formed names beyond existing dictionary-based approaches.
Findings
Effective classification of international names
Successful inference of ill-formed names
Improved gender prediction accuracy
Abstract
Much attention has been given to the task of gender inference of Twitter users. Although names are strong gender indicators, the names of Twitter users are rarely used as a feature, probably because of the high number of ill-formed names, which cannot be found in any name dictionary. Instead of relying solely on a name database, we propose a novel name classifier. Our approach extracts characteristics from the user names and uses them to assign each name to a gender. This enables us to classify international first names as well as ill-formed names.
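To make the idea of classifying names by their extracted characteristics concrete, here is a minimal Python sketch. It is not the authors' method: the abstract does not say which characteristics or which classifier the paper uses. The character bigram and suffix features, the naive Bayes model with add-one smoothing, the names `name_features` and `NameGenderClassifier`, and the toy training data are all assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def name_features(name):
    """Hypothetical features: character bigrams plus the last one and two letters."""
    # Drop digits, underscores, and other non-letters so that ill-formed
    # names such as "Anna_92" still yield usable features.
    cleaned = "".join(ch for ch in name.lower() if ch.isalpha())
    if not cleaned:
        return []
    padded = f"^{cleaned}$"  # mark the start and end of the name
    feats = [f"bi:{padded[i:i + 2]}" for i in range(len(padded) - 1)]
    feats.append(f"last:{cleaned[-1]}")
    feats.append(f"suffix2:{cleaned[-2:]}")
    return feats


class NameGenderClassifier:
    """Multinomial naive Bayes over name features (an assumed model choice)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing constant
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()

    def fit(self, names, labels):
        for name, label in zip(names, labels):
            self.label_counts[label] += 1
            for feat in name_features(name):
                self.feature_counts[label][feat] += 1
                self.vocabulary.add(feat)
        return self

    def predict(self, name):
        feats = name_features(name)
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # Log prior plus smoothed log likelihood of every feature.
            score = math.log(count / total)
            denom = (sum(self.feature_counts[label].values())
                     + self.alpha * len(self.vocabulary))
            for feat in feats:
                numer = self.feature_counts[label][feat] + self.alpha
                score += math.log(numer / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, hand-picked training data for illustration only.
train_names = ["anna", "maria", "julia", "laura", "peter", "john", "mark", "david"]
train_labels = ["female"] * 4 + ["male"] * 4

clf = NameGenderClassifier().fit(train_names, train_labels)
print(clf.predict("Anna_92"))
```

Because the features come from character patterns rather than dictionary lookups, a name never seen in training still receives a prediction. A realistic setup would need a large labeled name list and a held-out evaluation set.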
