A Joint Learning Approach based on Self-Distillation for Keyphrase Extraction from Scientific Documents
Tuan Manh Lai, Trung Bui, Doo Soon Kim, Quan Hung Tran

TL;DR
This paper introduces a joint learning method based on self-distillation to improve keyphrase extraction from scientific documents, leveraging large unlabeled datasets to outperform existing models and achieve state-of-the-art results.
Contribution
The paper presents a novel self-distillation-based joint learning approach that effectively utilizes unlabeled scientific articles for keyphrase extraction.
Findings
Consistently improves baseline models' performance
Outperforms previous methods on Inspec and SemEval-2017 datasets
Achieves new state-of-the-art results
Abstract
Keyphrase extraction is the task of extracting a small set of phrases that best describe a document. Most existing benchmark datasets for the task have limited numbers of annotated documents, making it challenging to train increasingly complex neural networks. In contrast, digital libraries store millions of scientific articles online, covering a wide range of topics. While a significant portion of these articles contain keyphrases provided by their authors, most other articles lack such annotations. Therefore, to effectively utilize these large amounts of unlabeled articles, we propose a simple and efficient joint learning approach based on the idea of self-distillation. Experimental results show that our approach consistently improves the performance of baseline models for keyphrase extraction. Furthermore, our best models outperform previous methods for the task,…
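The abstract names the idea (joint learning with self-distillation over labeled and unlabeled articles) but does not spell out the training loop. The sketch below shows one common way such a scheme can be set up, purely as an illustration: a teacher trained on labeled data pseudo-labels unlabeled examples on the fly, a student takes gradient steps on gold and pseudo-labeled batches together, and the teacher is periodically replaced by the student. Everything here is an assumption and not the authors' implementation: the toy logistic-regression `TokenTagger` (standing in for a neural sequence tagger), the function name `joint_self_distillation`, the hard 0.5 threshold for pseudo-labels, and the `refresh_every` schedule.

```python
import copy
import math
import random


class TokenTagger:
    """Toy stand-in for a keyphrase tagger: logistic regression over token features."""

    def __init__(self, n_features):
        self.w = [0.0] * n_features
        self.b = 0.0

    def predict_proba(self, x):
        """Probability that a token with features x belongs to a keyphrase."""
        z = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() numerically safe
        return 1.0 / (1.0 + math.exp(-z))

    def sgd_step(self, batch, lr):
        """One gradient step on mean binary cross-entropy over (features, label) pairs."""
        grad_w = [0.0] * len(self.w)
        grad_b = 0.0
        for x, y in batch:
            err = self.predict_proba(x) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        n = len(batch)
        self.w = [wi - lr * g / n for wi, g in zip(self.w, grad_w)]
        self.b -= lr * grad_b / n


def joint_self_distillation(labeled, unlabeled, n_features, steps=300,
                            batch_size=8, refresh_every=50, lr=0.5, seed=0):
    """Hypothetical joint training loop; hyperparameters are illustrative only.

    labeled:   list of (features, label) pairs with label in {0.0, 1.0}
    unlabeled: list of feature vectors without labels
    """
    rng = random.Random(seed)

    # Step 1: train an initial teacher on the labeled data only.
    teacher = TokenTagger(n_features)
    for _ in range(steps):
        teacher.sgd_step(rng.sample(labeled, batch_size), lr)

    # Step 2: train a fresh student jointly on gold and pseudo-labeled batches.
    student = TokenTagger(n_features)
    for step in range(1, steps + 1):
        gold = rng.sample(labeled, batch_size)
        raw = rng.sample(unlabeled, batch_size)
        # The teacher labels the unlabeled batch on the fly (hard labels here).
        pseudo = [(x, 1.0 if teacher.predict_proba(x) >= 0.5 else 0.0) for x in raw]
        student.sgd_step(gold + pseudo, lr)
        # Step 3: periodically let the current student become the new teacher.
        if step % refresh_every == 0:
            teacher = copy.deepcopy(student)
    return student
```

To try it, pass a list of `(features, label)` pairs as `labeled` and a list of bare feature vectors as `unlabeled`, each with at least `batch_size` entries. In a realistic setting the toy tagger would be replaced by a neural sequence labeler, and the pseudo-labels could be soft probabilities rather than thresholded values.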
