Nationality Classification Using Name Embeddings

Junting Ye; Shuchu Han; Yifan Hu; Baris Coskun; Meizhu Liu; Hong Qin; Steven Skiena

arXiv:1708.07903 · cs.SI · August 29, 2017 · 25 cites

TL;DR

This paper introduces a fine-grained nationality classifier built on name embeddings learned from communication patterns. It outperforms previously published systems and is used to analyze the demographics of social media followers.

Contribution

We develop a novel name embedding approach leveraging communication homophily, enabling a large-scale, fine-grained nationality classifier with superior accuracy.
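To make the homophily idea concrete, here is a minimal sketch of how names that share contact lists end up with similar vectors. It is not the authors' method: it swaps in raw co-occurrence count vectors and cosine similarity to a few labeled seed names, and every name, label, and function here (`CONTACT_LISTS`, `SEED_LABELS`, `cooccurrence_vectors`, `classify`) is invented for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations
from math import sqrt

# Toy, invented contact lists: each inner list is one user's contacts.
CONTACT_LISTS = [
    ["wei", "li", "chen", "ming"],
    ["li", "chen", "hua", "wei"],
    ["ming", "hua", "chen"],
    ["hans", "lukas", "klaus", "greta"],
    ["klaus", "greta", "hans", "fritz"],
    ["lukas", "fritz", "greta"],
]

# Hypothetical seed labels; a real system would need far more labeled names.
SEED_LABELS = {"wei": "Chinese", "li": "Chinese", "hans": "German", "klaus": "German"}


def cooccurrence_vectors(contact_lists):
    """Map each name to a sparse vector counting the names it shares a list with."""
    vectors = defaultdict(Counter)
    for names in contact_lists:
        for a, b in permutations(set(names), 2):
            vectors[a][b] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def classify(name, vectors, seed_labels):
    """Pick the label whose seed names are, in total, most similar to `name`."""
    scores = Counter()
    for seed, label in seed_labels.items():
        if seed != name:
            scores[label] += cosine(vectors[name], vectors[seed])
    return scores.most_common(1)[0][0]


VECTORS = cooccurrence_vectors(CONTACT_LISTS)
print(classify("hua", VECTORS, SEED_LABELS))
print(classify("fritz", VECTORS, SEED_LABELS))
```

In this toy data the two groups never share a contact list, so an unlabeled name is pulled entirely toward the seeds of its own group. Real contact lists are far noisier, which is one reason a learned dense embedding is a more plausible choice than raw counts.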

Findings

01. Achieved an F1 score of 0.795 on 13 classes, outperforming prior systems.

02. Classified 39 nationality groups covering over 90% of the world population.

03. Revealed demographic and ethnic patterns in social media followers.
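For readers unfamiliar with multi-class F1, the sketch below computes per-class F1 and its unweighted (macro) average on invented labels. This page does not say which averaging the paper uses, so macro averaging is an assumption, and the helper names `per_class_f1` and `macro_f1` are hypothetical.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {class: F1}, where F1 = 2*TP / (2*TP + FP + FN) for each class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for cls in set(y_true) | set(y_pred):
        denom = 2 * tp[cls] + fp[cls] + fn[cls]
        scores[cls] = 2 * tp[cls] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (an assumed averaging choice)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Invented toy labels, three classes instead of the paper's 13.
Y_TRUE = ["A", "A", "B", "B", "C", "C"]
Y_PRED = ["A", "B", "B", "B", "C", "A"]
print(per_class_f1(Y_TRUE, Y_PRED))
print(macro_f1(Y_TRUE, Y_PRED))
```

Macro averaging weights every class equally regardless of size, which matters when some nationality groups have far fewer labeled names than others.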

Abstract

Nationality identification unlocks important demographic information, with many applications in biomedical and sociological research. Existing name-based nationality classifiers use name substrings as features and are trained on small, unrepresentative sets of labeled names, typically extracted from Wikipedia. As a result, these methods achieve limited performance and cannot support fine-grained classification. We exploit the phenomenon of homophily in communication patterns to learn name embeddings, a new representation that encodes gender, ethnicity, and nationality, and that is readily applicable to building classifiers and other systems. Through our analysis of 57M contact lists from a major Internet company, we are able to design a fine-grained nationality classifier covering 39 groups representing over 90% of the world population. In an evaluation against other published systems over…

Code & Models

Repositories

AlokElashoff/Ethnicity_Classification_URAP

Taxonomy

Topics: Authorship Attribution and Profiling · Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts