SWAT: A System for Detecting Salient Wikipedia Entities in Texts

Marco Ponza; Paolo Ferragina; Francesco Piccinno

arXiv:1804.03580·cs.IR·May 17, 2019·6 cites

Open Access

TL;DR

SWAT is a system designed to identify and classify Wikipedia entities in texts as salient or not, leveraging a supervised learning approach with extensive feature extraction and validated through comprehensive experiments.

Contribution

The paper introduces SWAT, a novel system that detects salient Wikipedia entities in texts using a multi-module approach and a large-scale supervised training process.

Findings

01

SWAT outperforms existing solutions on multiple datasets.

02

SWAT achieves high accuracy in entity salience detection.

03

Effective feature extraction improves classification performance.

Abstract

We study the problem of entity salience by proposing the design and implementation of SWAT, a system that identifies the salient Wikipedia entities occurring in an input document. SWAT consists of several modules that are able to detect and classify on-the-fly Wikipedia entities as salient or not, based on a large number of syntactic, semantic and latent features properly extracted via a supervised process which has been trained over millions of examples drawn from the New York Times corpus. The validation process is performed through a large experimental assessment, eventually showing that SWAT improves known solutions over all publicly available datasets. We release SWAT via an API that we describe and comment on in the paper in order to ease its use in other software.
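
The abstract describes classifying already-linked Wikipedia entities as salient or not from per-entity features, using a supervised model. The following is a minimal sketch of that general idea only; it is not SWAT's pipeline, feature set, or API. The two features (relative mention frequency and earliness of the first mention), the toy training data, the pre-linked mentions, and every function name are assumptions made up for illustration, and a plain logistic regression stands in for whatever learner the paper actually uses.

```python
import math


def entity_features(num_tokens, mentions):
    """Map each entity to [relative frequency, earliness of first mention].

    mentions: list of (entity_title, token_offset) pairs, assumed to come
    from an upstream entity linker that is not implemented here.
    """
    counts, first = {}, {}
    for entity, offset in mentions:
        counts[entity] = counts.get(entity, 0) + 1
        first[entity] = min(first.get(entity, offset), offset)
    total = len(mentions)
    return {
        e: [counts[e] / total, 1.0 - first[e] / num_tokens]
        for e in counts
    }


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(samples, labels, epochs=2000, lr=0.5):
    """Fit logistic regression with plain batch gradient descent."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(samples, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def salience_probability(model, features):
    weights, bias = model
    return sigmoid(sum(w * xi for w, xi in zip(weights, features)) + bias)


# Toy labelled examples, invented for illustration:
# [relative frequency, earliness], label 1 = salient, 0 = not salient.
TRAIN_X = [[0.6, 1.0], [0.5, 0.9], [0.4, 0.95],
           [0.1, 0.2], [0.05, 0.4], [0.1, 0.1]]
TRAIN_Y = [1, 1, 1, 0, 0, 0]
MODEL = train_logistic(TRAIN_X, TRAIN_Y)

# A 100-token document with hypothetical pre-linked mentions.
DOC_FEATURES = entity_features(
    100,
    [("Barack_Obama", 0), ("Barack_Obama", 12),
     ("Barack_Obama", 40), ("Hawaii", 85)],
)
SCORES = {e: salience_probability(MODEL, f) for e, f in DOC_FEATURES.items()}
```

In this sketch an entity would be reported as salient when its score exceeds 0.5. The abstract states that the real system relies on a much larger set of syntactic, semantic and latent features and on millions of training examples from the New York Times corpus.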

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Wikis in Education and Collaboration