Mining the Web for Lexical Knowledge to Improve Keyphrase Extraction: Learning from Labeled and Unlabeled Data

Peter D. Turney (National Research Council of Canada)

arXiv:cs/0212011·cs.LG·May 23, 2007·28 cites

TL;DR

This paper introduces new features for automatic keyphrase extraction, derived by mining lexical knowledge from a very large collection of unlabeled web pages. The features improve performance while being neither domain-specific nor training-intensive.

Contribution

It proposes new features derived from large-scale web mining that enhance keyphrase extraction, reducing reliance on domain-specific and manually labeled data.

Findings

01. Improved keyphrase extraction performance with the new features
02. The features are effective across multiple domains
03. Reduces the need for extensive labeled training data

Abstract

Keyphrases are useful for a variety of purposes, including summarizing, indexing, labeling, categorizing, clustering, highlighting, browsing, and searching. The task of automatic keyphrase extraction is to select keyphrases from within the text of a given document. Automatic keyphrase extraction makes it feasible to generate keyphrases for the huge number of documents that do not have manually assigned keyphrases. Good performance on this task has been obtained by approaching it as a supervised learning problem. An input document is treated as a set of candidate phrases that must be classified as either keyphrases or non-keyphrases. To classify a candidate phrase as a keyphrase, the most important features (attributes) appear to be the frequency and location of the candidate phrase in the document. Recent work has demonstrated that it is also useful to know the frequency of the…
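The abstract frames extraction as classifying candidate phrases using features such as frequency and location in the document. Below is a minimal Python sketch of that feature step only, under our own assumptions: candidates are word n-grams of up to three words that do not cross punctuation and do not begin or end with a stopword, and "location" is the relative position of a phrase's first occurrence. The stopword list, `candidate_features`, and the `Candidate` record are hypothetical illustrations, not the paper's implementation, and the web-derived features the paper introduces are not shown.

```python
import re
from dataclasses import dataclass

# Hypothetical, deliberately tiny stopword list (an assumption for illustration).
STOPWORDS = {"a", "an", "and", "are", "as", "for", "from", "in", "is",
             "of", "on", "or", "the", "to", "with"}


@dataclass
class Candidate:
    phrase: str
    frequency: int          # number of occurrences in the document
    first_position: float   # token offset of first occurrence / total tokens


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def candidate_features(text: str, max_len: int = 3) -> dict[str, Candidate]:
    """Enumerate word n-grams (1..max_len) as candidate phrases and record
    two features for each: in-document frequency and the relative location
    of its first occurrence."""
    # Split on punctuation so that no candidate spans a clause boundary.
    segments = [tokenize(s) for s in re.split(r"[.,;:!?]", text)]
    total = sum(len(seg) for seg in segments)
    candidates: dict[str, Candidate] = {}
    offset = 0  # token offset of the current segment within the document
    for seg in segments:
        for start in range(len(seg)):
            for length in range(1, max_len + 1):
                end = start + length
                if end > len(seg):
                    break
                words = seg[start:end]
                # Skip phrases that begin or end with a stopword.
                if words[0] in STOPWORDS or words[-1] in STOPWORDS:
                    continue
                phrase = " ".join(words)
                if phrase in candidates:
                    candidates[phrase].frequency += 1
                else:
                    candidates[phrase] = Candidate(
                        phrase, 1, (offset + start) / total)
        offset += len(seg)
    return candidates


if __name__ == "__main__":
    doc = ("Keyphrase extraction selects phrases. "
           "Supervised keyphrase extraction uses features.")
    for cand in candidate_features(doc).values():
        print(cand.phrase, cand.frequency, round(cand.first_position, 2))
```

In a supervised setup, each candidate's feature vector would be paired with a keyphrase / non-keyphrase label from documents with manually assigned keyphrases and passed to any standard classifier; which learner to use is left open here.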

Taxonomy

Topics: Advanced Text Analysis Techniques · Topic Modeling · Information Retrieval and Search Behavior