To What Extent are Name Variants Used as Named Entities in Turkish Tweets?
Dilek Küçük

TL;DR
This paper analyzes the use of name variants as named entities in Turkish tweets, highlighting their types and providing detailed annotations to aid social media NLP research.
Contribution
It offers a detailed analysis and publicly available annotations of name variants in Turkish social media texts, addressing a gap in named entity recognition research.
Findings
High prevalence of informal name variants in Turkish tweets
Finer-grained annotations distinguish well-formed names from variants
Annotations are intended to support named entity recognition research on social media
Abstract
Social media texts differ from regular texts in various aspects. One of the main differences is the common use of informal name variants in place of well-formed named entities. These name variants may come in the form of abbreviations, nicknames, contractions, and hypocoristic uses, in addition to names distorted due to capitalization and writing errors. In this paper, we present an analysis of the named entities in a publicly available tweet dataset in Turkish, examining which of them are name variants and which category of variant each belongs to. We also provide finer-grained annotations of the named entities as well-formed names and different categories of name variants, and these annotations are made publicly available. The analysis presented and the accompanying annotations will contribute to related research on the treatment of named entities in…
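To make the annotation idea concrete, here is a minimal sketch of how finer-grained labels of this kind could be represented and summarized. It is not the paper's actual annotation format: the label names in `NameForm`, the `EntityAnnotation` fields, the entity type strings, and the toy entries in `SAMPLE` are all assumptions chosen for illustration, loosely following the categories listed in the abstract.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class NameForm(Enum):
    """Hypothetical label set mirroring the categories named in the abstract."""
    WELL_FORMED = "well_formed"
    ABBREVIATION = "abbreviation"
    NICKNAME = "nickname"
    CONTRACTION = "contraction"
    HYPOCORISTIC = "hypocoristic"
    CAPITALIZATION_ERROR = "capitalization_error"
    WRITING_ERROR = "writing_error"


@dataclass(frozen=True)
class EntityAnnotation:
    """One annotated named entity mention (field names are assumptions)."""
    tweet_id: str
    surface: str       # the text as it appears in the tweet
    entity_type: str   # e.g. PERSON, LOCATION, ORGANIZATION
    form: NameForm     # well-formed name or a name-variant category


def variant_share(annotations):
    """Return the fraction of mentions that are not well-formed names."""
    if not annotations:
        return 0.0
    variants = sum(1 for a in annotations if a.form is not NameForm.WELL_FORMED)
    return variants / len(annotations)


def form_distribution(annotations):
    """Count how many mentions fall into each name-form category."""
    return Counter(a.form for a in annotations)


# Invented toy data for illustration only; not taken from the dataset.
SAMPLE = [
    EntityAnnotation("t1", "İstanbul", "LOCATION", NameForm.WELL_FORMED),
    EntityAnnotation("t2", "istanbul", "LOCATION", NameForm.CAPITALIZATION_ERROR),
    EntityAnnotation("t3", "TBMM", "ORGANIZATION", NameForm.ABBREVIATION),
    EntityAnnotation("t4", "Fener", "ORGANIZATION", NameForm.CONTRACTION),
]

if __name__ == "__main__":
    print(variant_share(SAMPLE))
    for form, count in form_distribution(SAMPLE).items():
        print(form.value, count)
```

Keeping the name form separate from the entity type lets the same mention be counted both ways, which is the kind of breakdown an analysis of name variants would need.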
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Authorship Attribution and Profiling
