Gender Inference using Statistical Name Characteristics in Twitter

Juergen Mueller; Gerd Stumme

arXiv:1606.05467 · cs.CL · July 4, 2016

TL;DR

This paper introduces a novel classifier that infers Twitter users' gender by analyzing statistical characteristics of their names, including ill-formed and international names, overcoming limitations of traditional name dictionaries.

Contribution

It presents a new method for gender inference that leverages name features, enabling classification of diverse and ill-formed names beyond existing dictionary-based approaches.

Findings

01 Effective classification of international names

02 Successful inference of ill-formed names

03 Improved gender prediction accuracy

Abstract

Much attention has been given to the task of gender inference of Twitter users. Although names are strong gender indicators, the names of Twitter users are rarely used as a feature, probably because of the high number of ill-formed names, which cannot be found in any name dictionary. Instead of relying solely on a name database, we propose a novel name classifier. Our approach extracts characteristics from the user names and uses them to assign each name to a gender. This enables us to classify international first names as well as ill-formed names.
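
To make the idea of classifying names by their characteristics rather than by dictionary lookup concrete, here is a minimal sketch in Python. It is not the authors' method: this summary does not list the paper's actual features or classifier. The feature set (last letter, two-letter suffix, character bigrams), the naive Bayes model, the names `name_features` and `NameGenderNB`, and the tiny training list are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def name_features(raw_name):
    """Extract simple character-level features (an assumed feature set)."""
    # Keep letters only, so an ill-formed name such as "xX_anna_99"
    # still yields usable character patterns ("xxanna").
    cleaned = "".join(ch for ch in raw_name.lower() if ch.isalpha())
    if not cleaned:
        return []
    padded = f"^{cleaned}$"  # mark the start and end of the name
    feats = [f"last={cleaned[-1]}", f"suffix2={cleaned[-2:]}"]
    feats += [f"bg={padded[i:i + 2]}" for i in range(len(padded) - 1)]
    return feats


class NameGenderNB:
    """Multinomial naive Bayes with add-one smoothing over name features."""

    def __init__(self):
        self.feature_counts = defaultdict(Counter)  # label -> feature -> count
        self.label_counts = Counter()
        self.vocab = set()

    def fit(self, names, labels):
        for name, label in zip(names, labels):
            self.label_counts[label] += 1
            for feat in name_features(name):
                self.feature_counts[label][feat] += 1
                self.vocab.add(feat)
        return self

    def predict(self, name):
        feats = name_features(name)
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.feature_counts[label].values()) + len(self.vocab)
            for feat in feats:
                # Add-one smoothing keeps unseen features from zeroing the score.
                score += math.log((self.feature_counts[label][feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data, far too small for real use.
TOY_NAMES = ["anna", "maria", "julia", "sophia", "laura", "emma",
             "peter", "john", "markus", "stefan", "thomas", "robert"]
TOY_LABELS = ["female"] * 6 + ["male"] * 6

model = NameGenderNB().fit(TOY_NAMES, TOY_LABELS)
print(model.predict("xX_anna_99"))  # an ill-formed variant of a known name
print(model.predict("Giulia"))      # a name absent from the training list
```

Because the model scores character patterns instead of looking up whole names, it can still return a label for a decorated or unseen name, which is the property the abstract attributes to the proposed classifier. A realistic version would need a large labeled name list and proper evaluation.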
