A Joint Learning Approach based on Self-Distillation for Keyphrase Extraction from Scientific Documents

Tuan Manh Lai; Trung Bui; Doo Soon Kim; Quan Hung Tran

arXiv:2010.11980 · cs.CL · October 26, 2020

TL;DR

This paper introduces a joint learning method based on self-distillation to improve keyphrase extraction from scientific documents, leveraging large unlabeled datasets to outperform existing models and achieve state-of-the-art results.

Contribution

The paper presents a novel self-distillation-based joint learning approach that uses unlabeled scientific articles to improve keyphrase extraction.

Findings

01 Consistently improves the performance of baseline models

02 Outperforms previous methods on the Inspec and SemEval-2017 datasets

03 Achieves new state-of-the-art results

Abstract

Keyphrase extraction is the task of extracting a small set of phrases that best describe a document. Most existing benchmark datasets for the task have a limited number of annotated documents, making it challenging to train increasingly complex neural networks. In contrast, digital libraries store millions of scientific articles online, covering a wide range of topics. While a significant portion of these articles contain keyphrases provided by their authors, most other articles lack such annotations. Therefore, to effectively utilize these large amounts of unlabeled articles, we propose a simple and efficient joint learning approach based on the idea of self-distillation. Experimental results show that our approach consistently improves the performance of baseline models for keyphrase extraction. Furthermore, our best models outperform previous methods for the task,…
