Experiments to Improve Named Entity Recognition on Turkish Tweets

Dilek Küçük; Ralf Steinberger

arXiv:1410.8668 · cs.CL · November 3, 2014 · 1 citation

Open Access

TL;DR

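This paper explores ways to improve named entity recognition on Turkish tweets by adapting a baseline system: its capitalization constraint is relaxed, its lexical resources are expanded with diacritics-based variants, and tweets are passed through a simple normalization step. The adaptations are reported to improve performance.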

Contribution

It adapts a baseline NER system to Twitter language through a relaxed capitalization constraint, diacritics-based expansion of lexical resources, and a simple normalization scheme, and it evaluates the effect on two annotated Turkish tweet data sets.

Findings

01 Normalization improves recognition accuracy

02 Relaxing capitalization rules increases entity detection

03 Lexical resource expansion enhances system robustness

Abstract

Social media texts are significant information sources for several application areas including trend analysis, event monitoring, and opinion mining. Unfortunately, existing solutions for tasks such as named entity recognition that perform well on formal texts usually perform poorly when applied to social media texts. In this paper, we report on experiments that have the purpose of improving named entity recognition on Turkish tweets, using two different annotated data sets. In these experiments, starting with a baseline named entity recognition system, we adapt its recognition rules and resources to better fit Twitter language by relaxing its capitalization constraint and by diacritics-based expansion of its lexical resources, and we employ a simplistic normalization scheme on tweets to observe the effects of these on the overall named entity recognition performance on Turkish tweets.…
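
The abstract does not spell out how the two rule and resource adaptations work, so the following is only a minimal sketch of the general idea, assuming a single-token lexicon lookup. The function names (`diacritic_variants`, `expand_lexicon`, `find_entities`), the sample lexicon, and the sample tweet are all hypothetical and are not taken from the paper. The normalization scheme is not sketched because the abstract gives no details about it.

```python
from itertools import product

# Turkish letters with diacritics and the ASCII letters that tweets often
# use in their place (an assumed mapping for illustration only).
ASCII_FOLD = {
    "ç": "c", "ğ": "g", "ı": "i", "ö": "o", "ş": "s", "ü": "u",
    "Ç": "C", "Ğ": "G", "İ": "I", "Ö": "O", "Ş": "S", "Ü": "U",
}


def diacritic_variants(entry):
    """Return every spelling in which each diacritic letter is kept or folded."""
    options = [(ch, ASCII_FOLD[ch]) if ch in ASCII_FOLD else (ch,) for ch in entry]
    return {"".join(combo) for combo in product(*options)}


def expand_lexicon(entries):
    """Diacritics-based expansion: add all folded variants of every entry."""
    expanded = set()
    for entry in entries:
        expanded |= diacritic_variants(entry)
    return expanded


def case_key(text):
    """Case-insensitive key. 'İ' is replaced first because str.lower() would
    otherwise turn it into 'i' plus a combining dot. Mapping 'I' to 'i' is a
    simplification of Turkish casing, acceptable for this sketch."""
    return text.replace("İ", "i").lower()


def find_entities(tweet, lexicon, require_capital=True):
    """Return tokens found in the lexicon. With require_capital=False the
    capitalization constraint is relaxed and lowercase tokens may match."""
    keys = {case_key(entry) for entry in lexicon}
    hits = []
    for raw in tweet.split():
        token = raw.strip(".,!?:;#@\"'()")
        if not token:
            continue
        if require_capital and not token[0].isupper():
            continue
        if case_key(token) in keys:
            hits.append(token)
    return hits


# Hypothetical data: a two-entry lexicon and a lowercase, partly ASCII-folded tweet.
base_lexicon = {"İstanbul", "Beşiktaş"}
expanded_lexicon = expand_lexicon(base_lexicon)
tweet = "bugün istanbul cok güzel, besiktas maçı var"

strict = find_entities(tweet, base_lexicon)                               # []
relaxed = find_entities(tweet, base_lexicon, require_capital=False)       # ['istanbul']
adapted = find_entities(tweet, expanded_lexicon, require_capital=False)   # ['istanbul', 'besiktas']
```

In this toy setting the strict lookup finds nothing, relaxing capitalization recovers `istanbul`, and only the expanded lexicon also recovers `besiktas`, which illustrates why the two adaptations are complementary. Generating every kept-or-folded combination grows exponentially with the number of diacritic letters in an entry, so a real system would likely cap or restrict the variants.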

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text and Document Classification Technologies