What's in a Name?

Stasinos Konstantopoulos

arXiv:0710.1481·cs.CL·October 9, 2007

Open Access

TL;DR

This paper investigates language identification of names and short text fragments, introducing a new corpus and comparing general language models with names-only models to evaluate their effectiveness.

Contribution

It presents a new corpus for name-language matching and compares the performance of different language models on name and short fragment identification tasks.

Findings

01

Names-only models perform comparably to general models on name identification.

02

Performance varies between isolated names and short document fragments.

03

The new corpus enables more accurate evaluation of language identification methods.

Abstract

This paper describes experiments on identifying the language of a single name in isolation or in a document written in a different language. A new corpus has been compiled and made available, matching names against languages. This corpus is used in a series of experiments measuring the performance of general language models and names-only language models on the language identification task. Conclusions are drawn from the comparison between using general language models and names-only language models and between identifying the language of isolated names and the language of very short document fragments. Future research directions are outlined.
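As a rough illustration of the task the abstract describes, the sketch below identifies the language of an isolated name by scoring it against one names-only model per language. The abstract does not say which model family the paper uses, so everything here is an assumption: the models score a string by the add-one-smoothed relative frequency of its character trigrams (a naive-Bayes-style bag of n-grams), and the names `char_ngrams`, `CharNgramModel`, `identify`, and `TOY_NAMES` are hypothetical. The training names are toy data invented for this example and do not come from the paper's corpus.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the character n-grams of text, padded with boundary markers."""
    padded = "^" * (n - 1) + text.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramModel:
    """Bag-of-character-n-grams model with add-one smoothing.

    Assumption: this is an illustrative stand-in, not the paper's model.
    """

    def __init__(self, samples, n=3):
        self.n = n
        self.counts = Counter()
        for sample in samples:
            self.counts.update(char_ngrams(sample, n))
        self.total = sum(self.counts.values())
        # Distinct n-grams seen in training, plus one slot for unseen n-grams.
        self.vocab = len(self.counts) + 1

    def log_prob(self, text):
        """Sum of smoothed log relative frequencies of the text's n-grams."""
        return sum(
            math.log((self.counts[gram] + 1) / (self.total + self.vocab))
            for gram in char_ngrams(text, self.n)
        )


def identify(text, models):
    """Return the language whose model gives text the highest score."""
    return max(models, key=lambda lang: models[lang].log_prob(text))


# Toy, hypothetical training data (NOT the corpus described in the paper).
TOY_NAMES = {
    "greek": ["papadopoulos", "konstantopoulos", "nikolaidis", "georgiou"],
    "german": ["schmidt", "schneider", "hoffmann", "zimmermann"],
}

# One names-only model per language.
models = {lang: CharNgramModel(names) for lang, names in TOY_NAMES.items()}

if __name__ == "__main__":
    for name in ["papadakis", "bergmann"]:
        print(name, "->", identify(name, models))
```

Under the same assumptions, a general language model could be approximated by training `CharNgramModel` on running text in each language instead of a list of names, which is the kind of comparison the abstract refers to. With training data this small the scores are close, so the sketch shows the mechanics only and says nothing about real accuracy.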

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Authorship Attribution and Profiling