Analysis of Named Entity Recognition and Linking for Tweets

Leon Derczynski; Diana Maynard; Giuseppe Rizzo; Marieke van Erp; Genevieve Gorrell; Raphaël Troncy; Johann Petrak; Kalina Bontcheva

arXiv:1410.7182 · cs.CL · November 26, 2014

TL;DR

This paper investigates the challenges of named entity recognition and linking in tweets, analyzing the robustness of existing systems on noisy, short texts and identifying key areas for improvement.

Contribution

It introduces a new Twitter entity disambiguation dataset and provides an empirical analysis of current NER and linking systems on tweet data.

Findings

1. State-of-the-art systems struggle with noisy tweet data.
2. The main errors stem from context ambiguity and noise.
3. The analysis identifies key challenges for future research.

Abstract

Applying natural language processing for mining and intelligent information access to tweets (a form of microblog) is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Information extraction from tweets is typically performed in a pipeline, comprising consecutive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation (e.g. with respect to DBpedia). In this work, we describe a new Twitter entity disambiguation dataset, and conduct an empirical analysis of named entity recognition and disambiguation, investigating how robust a number of state-of-the-art systems are on such noisy texts, what the main sources of error are, and which problems should be further…
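The abstract describes tweet information extraction as a cascade of stages: language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation against DBpedia. The sketch below illustrates only that pipeline structure. The stage functions, the `ENGLISH_HINTS`, `GAZETTEER` and `KB` tables, and the tweet-aware tokeniser regex are hypothetical toy stand-ins, not the systems evaluated in the paper. Because each stage consumes the previous stage's output, a mistake early on (for example, a tweet misclassified as non-English) affects every later stage.

```python
import re
from dataclasses import dataclass, field

# Hypothetical toy resources standing in for trained models and a real KB.
ENGLISH_HINTS = {"the", "is", "in", "at", "to", "and", "a", "of", "for"}
GAZETTEER = {"london": "LOC", "obama": "PER", "google": "ORG"}
KB = {  # surface form -> DBpedia resource URI (toy subset)
    "london": "http://dbpedia.org/resource/London",
    "obama": "http://dbpedia.org/resource/Barack_Obama",
    "google": "http://dbpedia.org/resource/Google",
}

# Tweet-aware tokeniser: URLs, @mentions and #hashtags stay single tokens.
TOKEN_RE = re.compile(r"https?://\S+|[@#]\w+|\w+(?:'\w+)?|[^\w\s]")


@dataclass
class Doc:
    text: str
    lang: str = ""
    tokens: list = field(default_factory=list)
    tags: list = field(default_factory=list)
    entities: list = field(default_factory=list)  # (token index, entity type)
    links: dict = field(default_factory=dict)  # token index -> URI or None


def identify_language(doc):
    # Crude stopword check; a trained classifier would be used in practice.
    words = {w.lower() for w in re.findall(r"\w+", doc.text)}
    doc.lang = "en" if words & ENGLISH_HINTS else "unknown"
    return doc


def tokenise(doc):
    doc.tokens = TOKEN_RE.findall(doc.text)
    return doc


def pos_tag(doc):
    # Rule-based placeholder tags, with Twitter-specific tags for users,
    # hashtags and URLs.
    def tag(tok):
        if tok.startswith("@"):
            return "USR"
        if tok.startswith("#"):
            return "HT"
        if tok.startswith(("http://", "https://")):
            return "URL"
        if tok[:1].isupper():
            return "NNP"
        return "X"

    doc.tags = [tag(t) for t in doc.tokens]
    return doc


def recognise_entities(doc):
    # Gazetteer lookup, ignoring a leading @ or # so "#google" matches "google".
    for i, tok in enumerate(doc.tokens):
        key = tok.lstrip("@#").lower()
        if key in GAZETTEER:
            doc.entities.append((i, GAZETTEER[key]))
    return doc


def disambiguate(doc):
    # Link each recognised mention to a KB URI (None if absent from the toy KB).
    for i, _ in doc.entities:
        doc.links[i] = KB.get(doc.tokens[i].lstrip("@#").lower())
    return doc


def run(text):
    doc = identify_language(Doc(text))
    if doc.lang != "en":
        return doc  # later English-only stages are skipped entirely
    for stage in (tokenise, pos_tag, recognise_entities, disambiguate):
        doc = stage(doc)
    return doc
```

For example, `run("@Obama is in London today #google http://t.co/abc").entities` returns `[(0, "PER"), (3, "LOC"), (5, "ORG")]`, while `run("Bonjour Paris")` stops after language identification and produces no tokens. Each stage takes and returns the same `Doc`, so any toy stage can be swapped for a real component without changing the others.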

Datasets

naist-nlp/derczynski (dataset)
