Enhancing Automatic Keyphrase Labelling with Text-to-Text Transfer Transformer (T5) Architecture: A Framework for Keyphrase Generation and Filtering
Jorge Gabín, M. Eduardo Ares, Javier Parapar

TL;DR
This paper introduces a T5-based framework for automatic keyphrase generation and filtering, significantly improving accuracy over existing methods by generating relevant keyphrases and effectively filtering irrelevant ones.
Contribution
The paper presents a novel T5-based keyphrase generation model and a filtering technique, enhancing the quality and relevance of generated keyphrases beyond prior extractive and generative approaches.
Findings
The model outperforms the baselines, with gains of over 100% in some cases.
Filtering technique achieves near-perfect accuracy in removing false positives.
Majority voting improves keyphrase relevance and diversity.
Abstract
Automatic keyphrase labelling refers to the ability of models to retrieve words or short phrases that adequately describe a document's content. Previous work has put much effort into exploring extractive techniques for this task; however, these methods cannot produce keyphrases that do not appear in the text. Given this limitation, keyphrase generation approaches have emerged recently. This paper presents a keyphrase generation model based on the Text-to-Text Transfer Transformer (T5) architecture. Given a document's title and abstract as input, we train a T5 model to generate keyphrases that adequately define its content. We name this model docT5keywords. We not only perform the classic inference approach, where the output sequence is directly selected as the predicted values, but also report results from a majority voting approach. In this approach, multiple sequences are generated,…
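The abstract is cut off before it explains how the votes are counted, so the snippet below is only a minimal sketch of one plausible majority-voting scheme, not the authors' procedure. It assumes (hypothetically) that each generated sequence is a comma-separated list of keyphrases, that each sequence casts at most one vote per keyphrase, and that a keyphrase is kept if it receives at least `min_votes` votes. The helper names `parse_keyphrases` and `majority_vote` are illustrative, and the generation step itself (producing several sequences from the T5 model) is not shown.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def parse_keyphrases(sequence: str, separator: str = ",") -> List[str]:
    """Split one generated sequence into normalised keyphrases.

    The comma separator is an assumption; the real output format of
    docT5keywords is not specified in this summary.
    """
    phrases = (p.strip().lower() for p in sequence.split(separator))
    # Drop empty strings and remove duplicates while preserving order, so a
    # single sequence votes at most once for each keyphrase.
    return list(dict.fromkeys(p for p in phrases if p))


def majority_vote(
    sequences: Iterable[str], min_votes: int = 2, separator: str = ","
) -> List[Tuple[str, int]]:
    """Hypothetical voting rule: keep keyphrases with >= min_votes votes."""
    votes: Counter = Counter()
    for seq in sequences:
        votes.update(parse_keyphrases(seq, separator))
    kept = [(phrase, count) for phrase, count in votes.items() if count >= min_votes]
    # Rank by vote count (descending), breaking ties alphabetically so the
    # output is deterministic.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept


if __name__ == "__main__":
    # Toy stand-ins for several sequences generated for the same document.
    generated = [
        "keyphrase generation, t5, transformers",
        "keyphrase generation, text-to-text, t5",
        "keyphrase generation, t5, information retrieval",
    ]
    print(majority_vote(generated, min_votes=2))
    # [('keyphrase generation', 3), ('t5', 3)]
```

Requiring agreement across several generated sequences is one simple way to favour keyphrases the model produces consistently; the threshold `min_votes` trades recall for precision and would need tuning on real data.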
Taxonomy
Topics: Advanced Text Analysis Techniques · Technology Adoption and User Behaviour
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Adafactor · Label Smoothing · Gated Linear Unit · SentencePiece · Byte Pair Encoding · Absolute Position Encodings
