Supervised Topical Key Phrase Extraction of News Stories using Crowdsourcing, Light Filtering and Co-reference Normalization
Luis Marujo, Anatole Gershman, Jaime Carbonell, Robert Frederking, João P. Neto

TL;DR
This paper enhances automated news story indexing by integrating semantic features, pre-processing techniques like light filtering and co-reference normalization, and crowdsourced labeling, resulting in improved key phrase extraction accuracy.
Contribution
It introduces the combination of semantic features and pre-processing steps, evaluated with crowdsourced data, to significantly improve key phrase extraction from news stories.
Findings
Semantic features and rhetorical signals improve accuracy.
Light filtering and co-reference normalization enhance key phrase extraction.
Deeper semantic features alone do not significantly improve results.
Abstract
Fast and effective automated indexing is critical for search and personalized services. Key phrases that consist of one or more words and represent the main concepts of the document are often used for the purpose of indexing. In this paper, we investigate the use of additional semantic features and pre-processing steps to improve automatic key phrase extraction. These features include the use of signal words and Freebase categories. Some of these features lead to significant improvements in the accuracy of the results. We also experimented with two forms of document pre-processing that we call light filtering and co-reference normalization. Light filtering removes sentences that are judged peripheral to the document's main content. Co-reference normalization unifies several written forms of the same named entity into a unique form. We also needed a "Gold Standard" - a set of…
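As a rough illustration of the co-reference normalization step described in the abstract, the sketch below rewrites several written forms of one named entity to a single form. It is not the authors' implementation: the function name `normalize_coreferences`, the hand-written `alias_groups` table, and the regular-expression matching are all assumptions made for this example, and a real system would obtain the alias groups from a co-reference resolver rather than a fixed dictionary.

```python
import re


def normalize_coreferences(text, alias_groups):
    """Replace every listed alias with its canonical form.

    alias_groups is a hypothetical hand-built table mapping a canonical
    entity name to the alternative written forms that should be unified.
    """
    lookup = {}
    for canonical, aliases in alias_groups.items():
        # Map the canonical form to itself so that a shorter alias
        # contained in it is not rewritten a second time.
        lookup[canonical] = canonical
        for alias in aliases:
            lookup[alias] = canonical
    if not lookup:
        return text

    # Try longer forms first so "Mr. Obama" wins over the bare "Obama".
    forms = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(f) for f in forms) + r")\b")
    return pattern.sub(lambda m: lookup[m.group(0)], text)


if __name__ == "__main__":
    groups = {"Barack Obama": ["Mr. Obama", "President Obama", "Obama"]}
    story = "Barack Obama spoke on Monday. Mr. Obama said the plan would pass."
    print(normalize_coreferences(story, groups))
```

Unifying the forms before extraction means that mentions of the same entity are counted together rather than split across several candidate phrases, which is the motivation the abstract gives for this pre-processing step.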
Taxonomy
Topics: Advanced Text Analysis Techniques · Topic Modeling · Natural Language Processing Techniques
