What Your Username Says About You

Aaron Jaech; Mari Ostendorf

arXiv:1507.02045·cs.CL·August 18, 2015

TL;DR

This paper investigates how much gender and language information can be inferred from usernames using unsupervised morphology induction, showing that morphological features outperform character n-gram baselines.

Contribution

It introduces a method leveraging unsupervised morphology induction to extract features from usernames for demographic inference, demonstrating improved accuracy over simple baselines.
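To make "decomposing a username into sub-units" concrete, here is a minimal sketch of the segmentation step only. It assumes a morph lexicon with probabilities is already available; the paper induces its sub-units without supervision, whereas `TOY_MORPH_PROBS`, `UNKNOWN_CHAR_PROB`, and `segment` below are hypothetical names with made-up values, used purely for illustration. The sketch picks the lowest-cost split with a Viterbi-style dynamic program and is not the authors' implementation.

```python
import math

# Hypothetical lexicon with made-up probabilities (an assumption for this
# sketch); an unsupervised morphology model would learn these from usernames.
TOY_MORPH_PROBS = {"mr": 0.05, "cool": 0.04, "guy": 0.05, "girl": 0.05, "99": 0.02}
# Assumed fallback probability for a single character not covered by the lexicon.
UNKNOWN_CHAR_PROB = 1e-4


def segment(username, morph_probs=TOY_MORPH_PROBS, max_len=8):
    """Split a username into the lowest-cost sequence of sub-units."""
    n = len(username)
    best_cost = [0.0] + [math.inf] * n  # best_cost[i]: cost of username[:i]
    back = [0] * (n + 1)                # back[i]: start index of the last piece
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = username[start:end]
            if piece in morph_probs:
                cost = -math.log(morph_probs[piece])
            elif len(piece) == 1:
                cost = -math.log(UNKNOWN_CHAR_PROB)
            else:
                continue  # unknown multi-character pieces are not allowed
            total = best_cost[start] + cost
            if total < best_cost[end]:
                best_cost[end] = total
                back[end] = start
    # Follow the back-pointers from the end of the string to recover the pieces.
    pieces = []
    end = n
    while end > 0:
        start = back[end]
        pieces.append(username[start:end])
        end = start
    return pieces[::-1]


print(segment("mrcoolguy99"))
```

The resulting sub-units could then serve as features for a gender or language classifier, in place of or alongside raw character n-grams.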

Findings

1. Morphological features outperform character n-gram baselines.

2. Gender and language can be inferred from usernames with reasonable accuracy.

3. Unsupervised morphology induction effectively captures meaningful sub-units in usernames.

Abstract

Usernames are ubiquitous on the Internet, and they are often suggestive of user demographics. This work looks at the degree to which gender and language can be inferred from a username alone by making use of unsupervised morphology induction to decompose usernames into sub-units. Experimental results on the two tasks demonstrate the effectiveness of the proposed morphological features compared to a character n-gram baseline.
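For comparison, a character n-gram baseline of the kind the abstract mentions can be sketched as follows. The n-gram orders, the lowercasing, and the `#` boundary marker are assumptions made for illustration, as the abstract does not specify them; `char_ngrams` is a hypothetical helper name.

```python
from collections import Counter


def char_ngrams(username, n_min=1, n_max=3, pad="#"):
    """Count character n-grams of a username, with boundary markers.

    The padding symbol and the n-gram range are illustrative assumptions.
    """
    padded = f"{pad}{username.lower()}{pad}"
    counts = Counter()
    for n in range(n_min, n_max + 1):
        # Slide a window of width n over the padded string.
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


print(char_ngrams("Bob", n_min=2, n_max=2))
```

Each count can be used as one feature of a bag-of-n-grams vector. Unlike morphological sub-units, these windows ignore where meaningful pieces of the username begin and end.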

Code & Models

Repositories

ajaech/username_analytics (official)