Rethinking Model Selection and Decoding for Keyphrase Generation with Pre-trained Sequence-to-Sequence Models
Di Wu, Wasi Uddin Ahmad, Kai-Wei Chang

TL;DR
This paper systematically analyzes how model selection and decoding strategies affect keyphrase generation with pre-trained seq2seq models, and proposes a new decode-select algorithm that improves on greedy search.
Contribution
It provides a comprehensive analysis of model-selection and decoding choices in pre-trained language model (PLM)-based keyphrase generation (KPG) and introduces DeSel, a likelihood-based decode-select algorithm that improves F1 scores.
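The decode-then-select idea can be illustrated with a minimal, hypothetical sketch. It assumes the model has already produced one greedy output and several sampled outputs (each a list of phrases), and that a callable `phrase_logprob` is available as a stand-in for scoring a phrase's log-likelihood with the seq2seq PLM. The function name `decode_select`, the fixed `threshold`, and the frequency-based candidate ordering are all illustrative assumptions. This is a generic likelihood filter, not the authors' DeSel implementation.

```python
from collections import Counter
from typing import Callable, Iterable, List


def decode_select(
    greedy_phrases: List[str],
    sampled_outputs: Iterable[List[str]],
    phrase_logprob: Callable[[str], float],
    threshold: float,
) -> List[str]:
    """Hypothetical decode-then-select filter (not the paper's DeSel code).

    Keeps every greedy phrase, then adds sampled phrases whose log-likelihood,
    as scored by the assumed `phrase_logprob`, is at least `threshold`.
    """
    # Start from the greedy output, dropping duplicates but keeping order.
    selected = list(dict.fromkeys(greedy_phrases))
    seen = set(selected)

    # Pool candidate phrases from all samples; more frequent ones are checked first.
    candidates = Counter(p for sample in sampled_outputs for p in sample)
    for phrase, _count in candidates.most_common():
        if phrase in seen:
            continue
        # Likelihood-based selection: admit only phrases the model scores highly.
        if phrase_logprob(phrase) >= threshold:
            selected.append(phrase)
            seen.add(phrase)
    return selected


if __name__ == "__main__":
    greedy = ["keyphrase generation", "decoding"]
    samples = [
        ["keyphrase generation", "model selection"],
        ["decoding", "model selection", "neural networks"],
        ["beam search"],
    ]
    # Toy log-likelihoods standing in for scores from a seq2seq PLM.
    toy_scores = {"model selection": -1.2, "neural networks": -4.0, "beam search": -2.5}
    print(decode_select(greedy, samples, lambda p: toy_scores.get(p, -10.0), threshold=-3.0))
```

With the toy scores, this prints `['keyphrase generation', 'decoding', 'model selection', 'beam search']`: "neural networks" is dropped because its score falls below the threshold. A threshold set relative to the greedy output's own likelihood would be an equally plausible design; both choices here are for illustration only.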
Findings
Greedy search achieves strong F1 but lower recall than sampling-based decoding.
Merely increasing model size or performing task-specific adaptation is not parameter-efficient.
DeSel improves greedy search F1 by 4.7% on average.
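To see how one decoding strategy can score higher F1 yet lower recall than another, the sketch below computes set-level precision, recall, and F1 for two made-up prediction sets: a short, precise one (typical of greedy output) and a longer, noisier one (typical of pooling sampled outputs). It assumes exact matching after lowercasing; standard KPG evaluation commonly applies stemming before matching. The keyphrases and the helper name `precision_recall_f1` are illustrative, not taken from the paper.

```python
from typing import Iterable, Tuple


def precision_recall_f1(predicted: Iterable[str], gold: Iterable[str]) -> Tuple[float, float, float]:
    """Set-level P/R/F1 with exact matching after lowercasing (a simplifying assumption)."""
    pred = {p.strip().lower() for p in predicted}
    ref = {g.strip().lower() for g in gold}
    true_pos = len(pred & ref)
    if true_pos == 0:
        return 0.0, 0.0, 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(ref)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = ["keyphrase generation", "pre-trained language models", "decoding", "model selection"]
# Short, precise set (greedy-like): 2 of 2 predictions are correct.
greedy_like = ["keyphrase generation", "decoding"]
# Longer set (sampling-like): 3 of 6 predictions are correct.
sampled_like = ["keyphrase generation", "decoding", "model selection",
                "beam search", "text summarization", "neural networks"]

for name, preds in [("greedy-like", greedy_like), ("sampled-like", sampled_like)]:
    p, r, f = precision_recall_f1(preds, gold)
    print(f"{name}: P={p:.3f} R={r:.3f} F1={f:.3f}")
```

The greedy-like set wins on F1 (0.667 vs. 0.600), while the sampled-like set has higher recall (0.750 vs. 0.500). These numbers come from the toy data only, but they show the shape of the trade-off that a decode-select step tries to exploit: recover some of the extra recall without giving up the precision of greedy output.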
Abstract
Keyphrase Generation (KPG) is a longstanding task in NLP with widespread applications. The advent of sequence-to-sequence (seq2seq) pre-trained language models (PLMs) has ushered in a transformative era for KPG, yielding promising performance improvements. However, many design decisions remain unexplored and are often made arbitrarily. This paper undertakes a systematic analysis of the influence of model selection and decoding strategies on PLM-based KPG. We begin by elucidating why seq2seq PLMs are apt for KPG, anchored by an attention-driven hypothesis. We then establish that conventional wisdom for selecting seq2seq PLMs lacks depth: (1) merely increasing model size or performing task-specific adaptation is not parameter-efficient; (2) although combining in-domain pre-training with task adaptation benefits KPG, it does partially hinder generalization. Regarding decoding, we…
Taxonomy
Topics: Advanced Text Analysis Techniques · Topic Modeling · Natural Language Processing Techniques
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Sequence to Sequence
