LongKey: Keyphrase Extraction for Long Documents
Jeovane Honorio Alves, Radu State, Cinthia Obladen de Almendra Freitas, Jean Paul Barddal

TL;DR
LongKey is a new framework that effectively extracts keyphrases from long documents using an encoder-based model, outperforming existing methods across multiple datasets and domains.
Contribution
It introduces a novel encoder-based approach with max-pooling for keyphrase extraction from lengthy texts, addressing a gap in current short-document-focused methods.
Findings
LongKey outperforms existing unsupervised and language model-based methods on the LDKP datasets and six unseen datasets.
It effectively captures long-range dependencies in lengthy texts.
It demonstrates versatility across different domains.
Abstract
In an era of information overload, manually annotating the vast and growing corpus of documents and scholarly papers is increasingly impractical. Automated keyphrase extraction addresses this challenge by identifying representative terms within texts. However, most existing methods focus on short documents (up to 512 tokens), leaving a gap in processing long-context documents. In this paper, we introduce LongKey, a novel framework for extracting keyphrases from lengthy documents, which uses an encoder-based language model to capture extended text intricacies. LongKey uses a max-pooling embedder to enhance keyphrase candidate representation. Validated on the comprehensive LDKP datasets and six diverse, unseen datasets, LongKey consistently outperforms existing unsupervised and language model-based keyphrase extraction methods. Our findings demonstrate LongKey's versatility and superior…
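The max-pooling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes each keyphrase candidate has one embedding vector per occurrence in the document, and that these are combined by an element-wise maximum. The summary does not spell out LongKey's actual architecture, so the function names (`max_pool`, `pool_candidates`) and the toy vectors below are hypothetical and serve only as illustration.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def max_pool(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise maximum over a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must share the same dimension")
    # zip(*vectors) groups the values of each dimension across occurrences.
    return [max(column) for column in zip(*vectors)]


def pool_candidates(occurrences: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Collapse the per-occurrence embeddings of each candidate into one vector."""
    return {phrase: max_pool(vecs) for phrase, vecs in occurrences.items()}


# Toy data (made up): one candidate occurs twice, the other once.
occurrences = {
    "keyphrase extraction": [[0.1, 0.9, -0.3], [0.4, 0.2, 0.0]],
    "long documents": [[-0.5, 0.3, 0.7]],
}
pooled = pool_candidates(occurrences)
print(pooled["keyphrase extraction"])
```

Under this assumption, a candidate that appears many times across a long document still yields a single fixed-size representation, and the strongest signal in each dimension is kept regardless of where in the text it occurred.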
Taxonomy
Topics: Advanced Text Analysis Techniques
Methods: Focus
