Entity Extraction with Knowledge from Web Scale Corpora

Zeyi Wen; Zeyu Huang; Rui Zhang

arXiv:1911.09373·cs.CL·November 22, 2019

TL;DR

This paper introduces techniques that leverage web-scale corpora to enhance entity extraction accuracy and efficiency in text mining tasks.

Contribution

It presents novel post-processing methods utilizing models trained on large web data, improving existing entity extraction techniques.

Findings

01 Significant improvement in extraction accuracy

02 Enhanced efficiency in processing large datasets

03 Robustness across diverse text sources

Abstract

Entity extraction is an important task in text mining and natural language processing. A popular method for entity extraction is to compare substrings of free text against a dictionary of entities. In this paper, we present several techniques that serve as a post-processing step to improve the effectiveness of this existing entity extraction technique. These techniques use models trained on web-scale corpora, which makes them robust and versatile. Experiments show that our techniques bring a notable improvement in both efficiency and effectiveness.
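To make the dictionary-based baseline mentioned in the abstract concrete, here is a minimal sketch of substring-to-dictionary matching. It illustrates only the general lookup idea, not the paper's post-processing techniques, which this page does not describe. The function name `extract_entities`, the whitespace tokenisation, the `max_len` cap, and the toy dictionary are all assumptions made for illustration.

```python
from typing import List, Set, Tuple


def extract_entities(
    text: str, dictionary: Set[str], max_len: int = 4
) -> List[Tuple[int, int, str]]:
    """Return (start_token, end_token, entity) for each token n-gram found in the dictionary.

    Hypothetical helper: exact, case-insensitive matching on whitespace tokens.
    """
    tokens = text.lower().split()
    matches = []
    for start in range(len(tokens)):
        # Try longer candidates first, so the longest match at a given start is listed first.
        for length in range(min(max_len, len(tokens) - start), 0, -1):
            candidate = " ".join(tokens[start:start + length])
            if candidate in dictionary:
                matches.append((start, start + length, candidate))
    return matches


# Toy dictionary (assumed for illustration; entries are stored lowercased).
dictionary = {"new york", "york", "university of melbourne"}
print(extract_entities("She moved from New York to the University of Melbourne", dictionary))
```

In this sketch both "new york" and the overlapping "york" are reported, which is the kind of ambiguity in raw dictionary matches that a post-processing step would need to resolve.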

Taxonomy

Topics: Topic Modeling · Web Data Mining and Analysis · Advanced Text Analysis Techniques