Evaluating the Homogeneity of Keyphrase Prediction Models
Maël Houbre, Florian Boudin, Beatrice Daille

TL;DR
This paper introduces a method to evaluate the homogeneity of keyphrase prediction models, finding that extraction methods are competitive with generative models and that generating absent keyphrases may reduce homogeneity.
Contribution
The work proposes a novel evaluation method for homogeneity in keyphrase models and compares generative and extraction approaches on this metric.
Findings
Extraction methods are competitive with generative models.
Absent keyphrase generation can negatively impact homogeneity.
Homogeneity evaluation is a new benchmark for keyphrase models.
Abstract
Keyphrases, which are useful in several NLP and IR applications, are either extracted from text or predicted by generative models. Unlike keyphrase extraction approaches, keyphrase generation models can predict keyphrases that do not appear in a document's text, called `absent keyphrases`. This ability means that keyphrase generation models can associate a document with a notion that is not explicitly mentioned in its text. Intuitively, this suggests that for two documents treating the same subjects, a keyphrase generation model is more likely to be homogeneous in its indexing, i.e. to predict the same keyphrases for both documents regardless of whether those keyphrases appear in their respective texts; something a keyphrase extraction model would fail to do. Yet, homogeneity of keyphrase prediction models is not covered by current benchmarks. In this work, we introduce a method to…
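To make the intuition in the abstract concrete, here is a minimal sketch of what "predicting the same keyphrases for both documents" could look like when measured. It is not the paper's evaluation method, which is truncated above; the Jaccard overlap, the normalization, and the names `normalize`, `is_absent`, and `keyphrase_overlap` are all assumptions chosen for illustration.

```python
from typing import Iterable


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed, deliberately simple normalization)."""
    return " ".join(text.lower().split())


def is_absent(phrase: str, document: str) -> bool:
    """True if the phrase does not occur verbatim in the document text."""
    return normalize(phrase) not in normalize(document)


def keyphrase_overlap(pred_a: Iterable[str], pred_b: Iterable[str]) -> float:
    """Jaccard overlap between two predicted keyphrase sets.

    Hypothetical stand-in for a homogeneity score: 1.0 means both documents
    were indexed with exactly the same keyphrases, 0.0 means no shared keyphrase.
    """
    a = {normalize(p) for p in pred_a}
    b = {normalize(p) for p in pred_b}
    if not a and not b:
        return 0.0  # assumed convention: two empty predictions share nothing
    return len(a & b) / len(a | b)


# Two toy documents on the same subject; only the first mentions the phrase verbatim.
doc_a = "We study keyphrase generation with neural networks."
doc_b = "Sequence-to-sequence models are trained to produce index terms."

preds_a = ["keyphrase generation", "neural networks"]
preds_b = ["keyphrase generation", "index terms"]

shared_absent_in_b = is_absent("keyphrase generation", doc_b)
score = keyphrase_overlap(preds_a, preds_b)
```

In this toy case the shared keyphrase is absent from the second document, so only a model able to predict absent keyphrases could assign it to both, which is the scenario the abstract describes.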
Taxonomy
Topics: Advanced Text Analysis Techniques · Sentiment Analysis and Opinion Mining · Biomedical Text Mining and Ontologies
