Named Entity Recognition Using Web Document Corpus

Wahiba Ben Abdessalem Karaa

arXiv:1102.5728 · cs.IR · March 1, 2011

TL;DR

This paper presents a method for named entity recognition that leverages web document corpora to identify and classify contexts associated with entities like persons, locations, and organizations, using frequency-based weighting.

Contribution

It introduces a novel context-based classification approach for NE recognition utilizing web documents and frequency-weighted representations.

Findings

01. Effective identification of NE contexts using web corpus data

02. Improved NE classification accuracy through frequency and tf-idf weighting

03. Demonstrated applicability to various NE types such as persons and locations

Abstract

This paper introduces an approach to named entity recognition in a textual corpus. A Named Entity (NE) can be a named location, person, organization, date, time, etc., characterized by instances. An NE appears in text accompanied by contexts: the words to the left or right of the NE. The work mainly aims at identifying the contexts that indicate the NE's nature. For example, the occurrence of the word "President" in a text means that this word, or context, may be followed by the name of a president, as in "President Obama". Likewise, a word preceded by the string "footballer" is likely to be the name of a footballer. NE recognition may be viewed as a classification task in which every word is assigned to an NE class according to its context. The aim of this study is therefore to identify and classify the contexts most relevant for recognizing an NE, namely those frequently found with the NE. A learning…
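
The notion of a context described in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the function name `context_counts`, the `window` parameter, the seed list, and the example sentences are all assumptions made for illustration. The sketch takes a few known person names as seeds, collects the words immediately to their left and right, and counts how often each context word occurs, so that frequent contexts such as "president" rise to the top.

```python
from collections import Counter


def context_counts(sentences, seeds, window=1):
    """Count words found within `window` tokens left/right of a seed entity.

    Assumptions (not from the paper): whitespace tokenization,
    single-token entities, and lowercased context words.
    """
    left, right = Counter(), Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok in seeds:
                # Words immediately before the entity (left context).
                left.update(t.lower() for t in tokens[max(0, i - window):i])
                # Words immediately after the entity (right context).
                right.update(t.lower() for t in tokens[i + 1:i + 1 + window])
    return left, right


# Hypothetical toy corpus and seed instances of the "person" class.
sentences = [
    "President Obama visited Paris",
    "President Sarkozy met President Obama",
    "footballer Zidane retired",
]
seeds = {"Obama", "Sarkozy"}

left, right = context_counts(sentences, seeds)
print(left.most_common(1))  # [('president', 3)]
```

In this toy corpus the left context "president" occurs three times with the seed names, which is the kind of frequency signal the abstract describes. A real corpus would need proper tokenization and handling of multi-word names.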
