Keyphrase Generation for Scientific Document Retrieval

Florian Boudin; Ygor Gallina; Akiko Aizawa

arXiv:2106.14726·cs.IR·June 29, 2021

TL;DR

This paper demonstrates that sequence-to-sequence keyphrase generation models can enhance scientific document retrieval and introduces an evaluation framework to analyze their limitations across different domains.

Contribution

The paper provides empirical evidence that keyphrase generation benefits document retrieval, and proposes a new extrinsic evaluation framework for analyzing the limitations of keyphrase generation models.

Findings

01. Keyphrase generation models improve retrieval performance.

02. Generating absent keyphrases (those not present in the text) remains challenging.

03. Cross-domain generalization remains difficult.

Abstract

Sequence-to-sequence models have led to significant progress in keyphrase generation, but it remains unknown whether they are reliable enough to be beneficial for document retrieval. This study provides empirical evidence that such models can significantly improve retrieval performance, and introduces a new extrinsic evaluation framework that allows for a better understanding of the limitations of keyphrase generation models. Using this framework, we point out and discuss the difficulties encountered with supplementing documents with keyphrases that are not present in the text, and with generalizing models across domains. Our code is available at https://github.com/boudinfl/ir-using-kg
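To make the idea of supplementing documents with keyphrases concrete, here is a minimal sketch of document expansion followed by lexical ranking. It is not the authors' pipeline: the BM25 scorer, the toy documents, and the hand-written keyphrases (standing in for the output of a generation model) are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenization on lowercased text; a real system would do more.
    return text.lower().split()


def expand(doc_text, keyphrases):
    # Document expansion: append the keyphrases to the document text.
    return doc_text + " " + " ".join(keyphrases)


def bm25_scores(query, docs, k1=1.2, b=0.75):
    # Basic BM25 over a small in-memory collection (illustrative only).
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy collection: the first document never uses the query terms.
docs = [
    "we train a neural encoder decoder to produce topic labels for papers",
    "we study soil erosion in alpine valleys",
]
# Hypothetical keyphrases, hand-written to stand in for model predictions.
# "keyphrase generation" is an absent keyphrase for the first document.
keyphrases = [
    ["keyphrase generation", "sequence-to-sequence"],
    ["soil erosion", "geomorphology"],
]
query = "keyphrase generation"

before = bm25_scores(query, docs)
expanded_docs = [expand(d, k) for d, k in zip(docs, keyphrases)]
after = bm25_scores(query, expanded_docs)

print("before expansion:", before)
print("after expansion:", after)
```

In this toy setup the first document shares no terms with the query until the absent keyphrase is appended, which is the kind of vocabulary mismatch that expansion is meant to address. An extrinsic evaluation in this spirit would compare retrieval metrics over a full test collection with and without the predicted keyphrases.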

Code & Models

Repositories

boudinfl/ir-using-kg (official)
