Analysis of Named Entity Recognition and Linking for Tweets
Leon Derczynski, Diana Maynard, Giuseppe Rizzo, Marieke van Erp, Genevieve Gorrell, Raphaël Troncy, Johann Petrak, Kalina Bontcheva

TL;DR
This paper investigates the challenges of named entity recognition and linking in tweets, analyzing the robustness of existing systems on noisy, short texts and identifying key areas for improvement.
Contribution
It introduces a new Twitter entity disambiguation dataset and provides an empirical analysis of current NER and linking systems on tweet data.
Findings
State-of-the-art systems struggle with noisy tweet data
Main errors stem from context ambiguity and noise
Identifies key challenges for future research
Abstract
Applying natural language processing for mining and intelligent information access to tweets (a form of microblog) is a challenging, emerging research area. Unlike carefully authored news text and other longer content, tweets pose a number of new challenges, due to their short, noisy, context-dependent, and dynamic nature. Information extraction from tweets is typically performed in a pipeline, comprising consecutive stages of language identification, tokenisation, part-of-speech tagging, named entity recognition and entity disambiguation (e.g. with respect to DBpedia). In this work, we describe a new Twitter entity disambiguation dataset, and conduct an empirical analysis of named entity recognition and disambiguation, investigating how robust a number of state-of-the-art systems are on such noisy texts, what the main sources of error are, and which problems should be further…
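The staged pipeline described in the abstract can be sketched as a chain of functions that each enrich a shared tweet record. This is a minimal illustration under stated assumptions, not the systems evaluated in the paper: every stage below is a toy placeholder, and the names `Tweet`, `TOY_KB`, and `run_pipeline` are hypothetical. It mainly shows how an error in an early stage (for example, tokenisation) would propagate to recognition and disambiguation.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Tweet:
    """Record that each pipeline stage enriches in turn."""
    text: str
    lang: str = ""
    tokens: List[str] = field(default_factory=list)
    pos: List[str] = field(default_factory=list)
    entities: List[str] = field(default_factory=list)
    links: Dict[str, str] = field(default_factory=dict)


# Hypothetical one-entry knowledge base standing in for DBpedia.
TOY_KB = {"London": "http://dbpedia.org/resource/London"}


def identify_language(t: Tweet) -> Tweet:
    t.lang = "en"  # placeholder: assumes every tweet is English
    return t


def tokenise(t: Tweet) -> Tweet:
    # Keep @mentions and #hashtags attached to their word; split punctuation.
    t.tokens = re.findall(r"[@#]?\w+|[^\w\s]", t.text)
    return t


def tag_pos(t: Tweet) -> Tweet:
    # Crude stand-in for a tagger: capitalised tokens are proper nouns.
    t.pos = ["PROPN" if tok[:1].isupper() else "X" for tok in t.tokens]
    return t


def recognise_entities(t: Tweet) -> Tweet:
    # Toy NER: every proper-noun token is treated as an entity mention.
    t.entities = [tok for tok, tag in zip(t.tokens, t.pos) if tag == "PROPN"]
    return t


def disambiguate(t: Tweet) -> Tweet:
    # Toy linking: exact-match lookup; unknown mentions stay unlinked.
    t.links = {e: TOY_KB[e] for e in t.entities if e in TOY_KB}
    return t


PIPELINE: List[Callable[[Tweet], Tweet]] = [
    identify_language,
    tokenise,
    tag_pos,
    recognise_entities,
    disambiguate,
]


def run_pipeline(text: str) -> Tweet:
    tweet = Tweet(text)
    for stage in PIPELINE:
        tweet = stage(tweet)
    return tweet
```

Calling `run_pipeline("omg London is gr8 #travel")` links the single mention `London` to its toy knowledge-base entry. A lowercase spelling such as `london` would be missed by the capitalisation heuristic and therefore never reach the linking stage, which is the kind of noise-driven failure the abstract alludes to.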
