Self-Compositional Data Augmentation for Scientific Keyphrase Generation

Mael Houbre; Florian Boudin; Beatrice Daille; Akiko Aizawa

arXiv:2411.03039·cs.CL·November 7, 2024

TL;DR

This paper introduces a self-compositional data augmentation technique that combines similar documents based on shared keyphrases to enhance scientific keyphrase generation without external data.

Contribution

The proposed method generates synthetic training samples by combining related documents, improving keyphrase generation performance across multiple domains.

Findings

01. Consistent performance improvement on multiple datasets
02. Enhanced representativity of generated keyphrases in Computer Science
03. No external data or resources needed for augmentation

Abstract

State-of-the-art models for keyphrase generation require large amounts of training data to achieve good performance. However, obtaining keyphrase-labeled documents can be challenging and costly. To address this issue, we present a self-compositional data augmentation method. More specifically, we measure the relatedness of training documents based on their shared keyphrases, and combine similar documents to generate synthetic samples. The advantage of our method lies in its ability to create additional training samples that preserve domain coherence, without relying on external data or resources. Our results on multiple datasets spanning three different domains demonstrate that our method consistently improves keyphrase generation. A qualitative analysis of the keyphrases generated for the Computer Science domain confirms this improvement with respect to their representativity.
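
The idea of pairing training documents by shared keyphrases and merging them into synthetic samples can be illustrated with a minimal sketch. The abstract does not state which relatedness measure or combination strategy is used, so everything here is an assumption made for illustration: Jaccard overlap as the relatedness score, plain text concatenation as the combination step, and the names `keyphrase_overlap`, `compose_samples`, and `min_overlap` are hypothetical. It is not the authors' implementation.

```python
from itertools import combinations


def keyphrase_overlap(a, b):
    """Jaccard overlap between two keyphrase lists (an assumed relatedness score)."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def compose_samples(docs, min_overlap=0.25):
    """Merge every pair of documents whose keyphrase overlap reaches min_overlap.

    docs: list of dicts with "text" (str) and "keyphrases" (list of str).
    Returns a list of synthetic samples in the same format.
    The threshold value is an arbitrary illustrative choice.
    """
    synthetic = []
    for first, second in combinations(docs, 2):
        score = keyphrase_overlap(first["keyphrases"], second["keyphrases"])
        if score >= min_overlap:
            # Union of keyphrases, keeping first-seen order and dropping duplicates.
            merged = list(dict.fromkeys(first["keyphrases"] + second["keyphrases"]))
            synthetic.append({
                # Assumed combination step: simple concatenation of the two texts.
                "text": first["text"] + " " + second["text"],
                "keyphrases": merged,
            })
    return synthetic
```

Because the synthetic samples are built only from the existing training set, a sketch like this needs no external data, which matches the property the abstract emphasizes. Comparing all pairs is quadratic in the number of documents, so a real pipeline would likely need an index over keyphrases instead.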

Code & Models

Repositories

MHoubre/Self_Compo_DA_4KPG (official)
