LongKey: Keyphrase Extraction for Long Documents

Jeovane Honorio Alves; Radu State; Cinthia Obladen de Almendra Freitas; Jean Paul Barddal

arXiv:2411.17863 · cs.CL · January 22, 2025

TL;DR

LongKey is a new framework that effectively extracts keyphrases from long documents using an encoder-based model, outperforming existing methods across multiple datasets and domains.

Contribution

It introduces an encoder-based approach that uses a max-pooling embedder to represent keyphrase candidates in lengthy texts, addressing a gap left by current methods, which focus on short documents.
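The summary names a max-pooling embedder but gives no further detail. As a rough illustration only, the sketch below assumes each occurrence of a candidate already has a token span in an encoder's output. It builds one vector per occurrence by averaging that span's token vectors (an assumption standing in for whatever span encoder LongKey actually uses), then takes the element-wise maximum across all occurrences of the same candidate. The functions `occurrence_vector`, `max_pool`, `candidate_embeddings`, and `rank`, and the linear scorer with its `weights`, are hypothetical names, not the repository's API.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def occurrence_vector(token_embs: Sequence[Vector], start: int, end: int) -> Vector:
    """Mean of the token vectors in [start, end).

    Hypothetical stand-in for a real span encoder; averaging is an assumption
    made for illustration.
    """
    span = token_embs[start:end]
    if not span:
        raise ValueError("occurrence span must cover at least one token")
    dim = len(span[0])
    return [sum(tok[d] for tok in span) / len(span) for d in range(dim)]


def max_pool(vectors: Sequence[Vector]) -> Vector:
    """Element-wise maximum over all occurrence vectors of one candidate."""
    return [max(column) for column in zip(*vectors)]


def candidate_embeddings(
    token_embs: Sequence[Vector],
    occurrences: Dict[str, List[Tuple[int, int]]],
) -> Dict[str, Vector]:
    """One max-pooled vector per candidate, however many times it occurs."""
    return {
        cand: max_pool([occurrence_vector(token_embs, s, e) for s, e in spans])
        for cand, spans in occurrences.items()
    }


def rank(cands: Dict[str, Vector], weights: Vector, top_k: int) -> List[str]:
    """Score each candidate with a hypothetical linear layer and keep the top_k."""
    scores = {c: sum(w * x for w, x in zip(weights, v)) for c, v in cands.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy example: 4 tokens with 2-dimensional embeddings.
token_embs = [[0.1, 0.9], [0.5, 0.2], [0.8, 0.4], [0.3, 0.7]]
occurrences = {"neural network": [(0, 2), (2, 4)], "keyphrase": [(1, 2)]}
cands = candidate_embeddings(token_embs, occurrences)
print(rank(cands, weights=[1.0, 1.0], top_k=1))  # ['neural network']
```

Max-pooling keeps, for each dimension, the strongest value any occurrence produced, so a candidate that appears many times across a long document still ends up with a single fixed-size vector that can be scored against the others.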

Findings

01

LongKey outperforms existing methods on diverse datasets.

02

It effectively captures long-range dependencies in lengthy texts.

03

It demonstrates versatility across different domains.

Abstract

In an era of information overload, manually annotating the vast and growing corpus of documents and scholarly papers is increasingly impractical. Automated keyphrase extraction addresses this challenge by identifying representative terms within texts. However, most existing methods focus on short documents (up to 512 tokens), leaving a gap in processing long-context documents. In this paper, we introduce LongKey, a novel framework for extracting keyphrases from lengthy documents, which uses an encoder-based language model to capture extended text intricacies. LongKey uses a max-pooling embedder to enhance keyphrase candidate representation. Validated on the comprehensive LDKP datasets and six diverse, unseen datasets, LongKey consistently outperforms existing unsupervised and language model-based keyphrase extraction methods. Our findings demonstrate LongKey's versatility and superior…
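The abstract contrasts LongKey with methods limited to about 512 tokens. One generic way to feed a long document to an encoder with a fixed context window is to split it into windows and record each window's offset, so that spans found inside a window can be mapped back to document positions. The sketch below shows only that windowing step; `chunk_tokens`, its `max_len` and `stride` parameters, and the 512-token default are illustrative assumptions, not a description of how LongKey itself partitions its input.

```python
from typing import List, Sequence, Tuple


def chunk_tokens(
    tokens: Sequence[str], max_len: int = 512, stride: int = 512
) -> List[Tuple[int, List[str]]]:
    """Split a token sequence into windows of at most max_len tokens.

    Returns (offset, window) pairs, where offset is the index of the window's
    first token in the full document. A stride smaller than max_len produces
    overlapping windows; stride == max_len produces disjoint ones.
    """
    if max_len <= 0 or stride <= 0 or stride > max_len:
        raise ValueError("require 0 < stride <= max_len")
    windows = []
    start = 0
    while True:
        windows.append((start, list(tokens[start:start + max_len])))
        if start + max_len >= len(tokens):  # this window reaches the end
            break
        start += stride
    return windows


# A hypothetical 1,200-token document split into disjoint 512-token windows.
doc = [f"tok{i}" for i in range(1200)]
for offset, window in chunk_tokens(doc, max_len=512, stride=512):
    print(offset, len(window))
# 0 512
# 512 512
# 1024 176
```

A span (s, e) found inside the window at offset o corresponds to (o + s, o + e) in the full document. Overlapping windows give tokens near a boundary context on both sides, at the cost of encoding those tokens more than once.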

Code & Models

Repositories

jeohalves/longkey (PyTorch, official)

Taxonomy

Topics: Advanced Text Analysis Techniques

Methods: Focus