Mining the Web for Lexical Knowledge to Improve Keyphrase Extraction: Learning from Labeled and Unlabeled Data
Peter D. Turney (National Research Council of Canada)

TL;DR
This paper improves automatic keyphrase extraction by adding features mined from a very large collection of unlabeled web data. The new features improve performance without being domain-specific or training-intensive.
Contribution
It proposes new features derived from large-scale web mining that enhance keyphrase extraction, reducing reliance on domain-specific and manually labeled data.
Findings
Improved keyphrase extraction performance with new features
Features are effective across multiple domains
Reduces need for extensive labeled training data
Abstract
Keyphrases are useful for a variety of purposes, including summarizing, indexing, labeling, categorizing, clustering, highlighting, browsing, and searching. The task of automatic keyphrase extraction is to select keyphrases from within the text of a given document. Automatic keyphrase extraction makes it feasible to generate keyphrases for the huge number of documents that do not have manually assigned keyphrases. Good performance on this task has been obtained by approaching it as a supervised learning problem. An input document is treated as a set of candidate phrases that must be classified as either keyphrases or non-keyphrases. To classify a candidate phrase as a keyphrase, the most important features (attributes) appear to be the frequency and location of the candidate phrase in the document. Recent work has demonstrated that it is also useful to know the frequency of the…
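The two baseline features named in the abstract, the frequency and location of a candidate phrase in the document, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from the paper: the function name `candidate_features`, the regex tokenizer, the use of word n-grams of up to three words as candidates, and the normalization of location to the fraction of the document preceding the first occurrence. A supervised classifier would consume these values as inputs; the classifier itself is not shown.

```python
import re
from collections import Counter


def candidate_features(text, max_len=3):
    """Map each candidate phrase to (frequency, first_position).

    Candidates are all word n-grams of 1..max_len words (an assumption;
    the paper's candidate selection may differ). first_position is the
    index of the first occurrence divided by the document length in words.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    n = len(words)
    freq = Counter()
    first = {}
    for i in range(n):
        for k in range(1, max_len + 1):
            if i + k > n:
                break
            phrase = " ".join(words[i:i + k])
            freq[phrase] += 1
            # Record only the earliest occurrence of each phrase.
            first.setdefault(phrase, i / n)
    return {p: (freq[p], first[p]) for p in freq}


if __name__ == "__main__":
    doc = "Keyphrase extraction selects phrases. Keyphrase extraction is supervised."
    feats = candidate_features(doc)
    for phrase in ("keyphrase extraction", "supervised"):
        print(phrase, feats[phrase])
```

In this toy document, a phrase that occurs often and early (such as "keyphrase extraction") gets a high frequency and a low position value, which matches the intuition behind using these two features to separate keyphrases from non-keyphrases.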
Taxonomy
Topics: Advanced Text Analysis Techniques · Topic Modeling · Information Retrieval and Search Behavior
