SpeedRead: A Fast Named Entity Recognition Pipeline

Rami Al-Rfou'; Steven Skiena

arXiv:1301.2857 · cs.CL · January 15, 2013 · 5 cites

TL;DR

SpeedRead is a named entity recognition pipeline that runs at least 10 times faster than the Stanford NLP pipeline, making analysis of web-scale text corpora more practical.

Contribution

The paper introduces SpeedRead, a fast NER pipeline that combines an efficient tokenizer, a POS tagger, and a knowledge-based entity recognizer, and runs at least 10 times faster than the Stanford NLP pipeline.

Findings

01

SpeedRead runs at least 10 times faster than the Stanford NLP pipeline.

02

It pairs a high-performance Penn Treebank-compliant tokenizer with a POS tagger that is close to the state of the art.

03

The speedup reduces the computational cost that has held back NER over web-scale text corpora.

Abstract

Online content analysis employs algorithmic methods to identify entities in unstructured text. Both machine learning and knowledge-base approaches lie at the foundation of contemporary named entity extraction systems. However, progress in deploying these approaches at web scale has been hampered by the computational cost of NLP over massive text corpora. We present SpeedRead (SR), a named entity recognition pipeline that runs at least 10 times faster than the Stanford NLP pipeline. This pipeline consists of a high-performance Penn Treebank-compliant tokenizer, a close to state-of-the-art part-of-speech (POS) tagger, and a knowledge-based named entity recognizer.
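
The three stages named in the abstract (tokenizer, POS tagger, knowledge-based recognizer) can be pictured with a small sketch. This is not SpeedRead's code or API: the regex, the tiny lexicon, the gazetteer entries, and the function names `tokenize`, `pos_tag`, and `recognize` are all hypothetical stand-ins, and the tagging rules are deliberately crude. It only illustrates how a dictionary lookup over tagged tokens could work in principle.

```python
import re

# Hypothetical, simplified tokenizer: splits contractions roughly in the
# Penn Treebank style ("doesn't" -> "does", "n't") and isolates punctuation.
TOKEN_RE = re.compile(r"n't|'\w+|\w+(?=n't)|\w+|[^\w\s]")

# Toy lexicon and gazetteer; a real system would use far larger resources.
LEXICON = {"does": "VBZ", "n't": "RB", "in": "IN", "the": "DT", "'s": "POS"}
GAZETTEER = {
    ("Steven", "Skiena"): "PER",
    ("Stony", "Brook", "University"): "ORG",
    ("New", "York"): "LOC",
}
MAX_LEN = max(len(key) for key in GAZETTEER)


def tokenize(text):
    """Stage 1: split raw text into tokens."""
    return TOKEN_RE.findall(text)


def pos_tag(tokens):
    """Stage 2: assign a tag by lexicon lookup, then by crude surface rules."""
    tagged = []
    for tok in tokens:
        if tok.lower() in LEXICON:
            tag = LEXICON[tok.lower()]
        elif tok[0].isupper():
            tag = "NNP"  # assumption: capitalized unknown word is a proper noun
        elif tok.isdigit():
            tag = "CD"
        elif not tok[0].isalnum():
            tag = tok    # punctuation is tagged as itself
        else:
            tag = "NN"
        tagged.append((tok, tag))
    return tagged


def recognize(tagged):
    """Stage 3: longest-match gazetteer lookup starting at proper nouns."""
    tokens = [tok for tok, _ in tagged]
    entities = []
    i = 0
    while i < len(tokens):
        matched = False
        if tagged[i][1] == "NNP":
            # Try the longest candidate phrase first, then shorter ones.
            for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
                label = GAZETTEER.get(tuple(tokens[i:i + n]))
                if label is not None:
                    entities.append((" ".join(tokens[i:i + n]), label))
                    i += n
                    matched = True
                    break
        if not matched:
            i += 1
    return entities


def run_pipeline(text):
    return recognize(pos_tag(tokenize(text)))
```

Under these assumptions, `run_pipeline("Steven Skiena doesn't teach in New York.")` returns `[("Steven Skiena", "PER"), ("New York", "LOC")]`. Each stage is a single pass over the tokens with constant-time dictionary lookups, which is one plausible reason a knowledge-based design can be fast; the paper itself should be consulted for how SpeedRead actually achieves its speed.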

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques