Redefining Absent Keyphrases and their Effect on Retrieval Effectiveness
Florian Boudin, Ygor Gallina

TL;DR
This paper examines the role of absent keyphrases in improving scientific document retrieval, proposing a new categorization scheme that highlights the significant impact of a small subset of keyphrase words on retrieval performance.
Contribution
The paper introduces a finer-grained categorization of absent keyphrases and shows that a small fraction of keyphrase words (around 20%) accounts for much of the observed gain in retrieval effectiveness.
Findings
Only about 20% of keyphrase words serve as document expansion.
This small subset accounts for much of the retrieval gains.
The scheme offers a new way to evaluate neural keyphrase generation models.
Abstract
Neural keyphrase generation models have recently attracted much interest due to their ability to output absent keyphrases, that is, keyphrases that do not appear in the source text. In this paper, we discuss the usefulness of absent keyphrases from an Information Retrieval (IR) perspective, and show that the commonly drawn distinction between present and absent keyphrases is not made explicit enough. We introduce a finer-grained categorization scheme that sheds more light on the impact of absent keyphrases on scientific document retrieval. Under this scheme, we find that only a fraction (around 20%) of the words that make up keyphrases actually serves as document expansion, but that this small fraction of words is behind much of the gains observed in retrieval effectiveness. We also discuss how the proposed scheme can offer a new angle to evaluate the output of neural keyphrase…
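As a rough illustration of the distinction the abstract draws, the sketch below separates keyphrase words that already occur in the source text from those that do not, since only the latter can act as document expansion. It is not the authors' implementation: the function names (`tokenize`, `split_keyphrase_words`, `expansion_ratio`, `is_present`) are hypothetical, and the matching is plain lowercase word matching with no stemming, which is a simplifying assumption.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_keyphrase_words(document, keyphrases):
    """Split keyphrase words into those seen in the document and those unseen.

    Assumption: exact lowercase word matching, no stemming or lemmatization.
    """
    doc_vocab = set(tokenize(document))
    seen, unseen = [], []
    for phrase in keyphrases:
        for word in tokenize(phrase):
            if word in doc_vocab:
                seen.append(word)
            else:
                unseen.append(word)
    return seen, unseen


def expansion_ratio(document, keyphrases):
    """Fraction of keyphrase words that do not occur in the document."""
    seen, unseen = split_keyphrase_words(document, keyphrases)
    total = len(seen) + len(unseen)
    return len(unseen) / total if total else 0.0


def is_present(document, phrase):
    """True if the phrase occurs as a contiguous word sequence in the document."""
    doc_tokens = tokenize(document)
    phrase_tokens = tokenize(phrase)
    n = len(phrase_tokens)
    if n == 0:
        return False
    return any(doc_tokens[i:i + n] == phrase_tokens
               for i in range(len(doc_tokens) - n + 1))


# Toy input, invented for illustration only.
document = "Neural models generate keyphrases for scientific document retrieval."
keyphrases = ["document retrieval", "retrieval document", "neural ranking"]

seen, unseen = split_keyphrase_words(document, keyphrases)
ratio = expansion_ratio(document, keyphrases)
```

In this toy input, "retrieval document" is absent as a phrase, yet both of its words already occur in the text, so it adds no new vocabulary; only "ranking" would expand the document. This is the kind of case a plain present/absent split does not make explicit.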
