Learning Algorithms for Keyphrase Extraction

Peter D. Turney (National Research Council of Canada)

arXiv:cs/0212020 · cs.LG · May 23, 2007 · 112 cites

Open Access

TL;DR

This paper compares machine learning algorithms for automatic keyphrase extraction. It shows that GenEx, a custom algorithm that incorporates domain knowledge, outperforms the general-purpose C4.5 decision tree learner, and that about 80% of the keyphrases GenEx generates are acceptable to human readers.

Contribution

It introduces GenEx, an algorithm developed specifically for keyphrase extraction, which outperforms the general-purpose C4.5 decision tree induction algorithm on this task.

Findings

01. GenEx outperforms C4.5 in keyphrase quality

02. Approximately 80% of generated keyphrases are human-acceptable

03. Domain-specific algorithms improve keyphrase extraction results

Abstract

Many academic journals ask their authors to provide a list of about five to fifteen keywords, to appear on the first page of each article. Since these keywords are often phrases of two or more words, we prefer to call them keyphrases. There is a wide variety of tasks for which keyphrases are useful, as we discuss in this paper. We approach the problem of automatically extracting keyphrases from text as a supervised learning task. We treat a document as a set of phrases, which the learning algorithm must learn to classify as positive or negative examples of keyphrases. Our first set of experiments applies the C4.5 decision tree induction algorithm to this learning task. We evaluate the performance of nine different configurations of C4.5. The second set of experiments applies the GenEx algorithm to the task. We developed the GenEx algorithm specifically for automatically extracting…
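
The supervised framing in the abstract (a document as a set of phrases, each a positive or negative example) can be illustrated with a minimal sketch. This is not the paper's feature set or either of its algorithms. The function names `candidate_phrases` and `label_candidates`, the three-word length limit, the punctuation-based splitting, and the exact lowercase matching are all assumptions made for illustration.

```python
import re

def candidate_phrases(text, max_len=3):
    """Return the set of lowercase word n-grams (1..max_len words) in text.

    Hypothetical helper: the n-gram limit and tokenization are assumptions.
    """
    phrases = set()
    # Split on punctuation first so no phrase crosses a sentence or clause boundary.
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        words = re.findall(r"[a-z][a-z\-]*", fragment)
        for n in range(1, max_len + 1):
            for i in range(len(words) - n + 1):
                phrases.add(" ".join(words[i:i + n]))
    return phrases

def label_candidates(text, author_keyphrases, max_len=3):
    """Map each candidate phrase to True (positive) or False (negative).

    A phrase is positive only if it exactly matches an author-supplied
    keyphrase after lowercasing (an assumed, simplified matching rule).
    """
    gold = {k.lower() for k in author_keyphrases}
    return {p: (p in gold) for p in candidate_phrases(text, max_len)}

doc = ("Keyphrase extraction is a supervised learning task. "
       "Decision tree induction is one approach.")
labels = label_candidates(doc, ["keyphrase extraction", "supervised learning"])
positives = sorted(p for p, is_key in labels.items() if is_key)
print(positives)  # ['keyphrase extraction', 'supervised learning']
```

The resulting labeled phrases are the kind of training examples a classifier such as a decision tree learner could consume once each phrase is described by features; which features to use is left open here.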

Taxonomy

Topics: Advanced Text Analysis Techniques · Biomedical Text Mining and Ontologies · Information Retrieval and Search Behavior