PGA-SciRE: Harnessing LLM on Data Augmentation for Enhancing Scientific Relation Extraction

Yang Zhou; Shimin Shan; Hongkui Wei; Zhehuan Zhao; Wenshuo Feng

arXiv:2405.20787·cs.CL·June 3, 2024

Open Access

TL;DR

This paper introduces PGA, a data augmentation framework leveraging large language models to generate paraphrased and label-embedded pseudo-samples, significantly improving scientific relation extraction performance and reducing manual labeling costs.

Contribution

The paper presents a novel LLM-based data augmentation method for scientific relation extraction, enhancing model accuracy and efficiency.

Findings

01 Improved F1 scores across multiple RE models.

02 Effective reduction in manual labeling efforts.

03 Demonstrated benefits in scientific domain RE tasks.

Abstract

Relation Extraction (RE) aims to recognize the relation between pairs of entities mentioned in a text. Advances in LLMs have had a tremendous impact on NLP. In this work, we propose a textual data augmentation framework called PGA for improving the performance of RE models in the scientific domain. The framework introduces two ways of data augmentation. The first uses an LLM to paraphrase the original training set samples, yielding pseudo-samples with the same sentence meaning but different representations and forms. The second instructs the LLM to generate sentences that implicitly contain information about the corresponding labels, based on the relation and entities of the original training set samples. Each of these two kinds of pseudo-samples is used, together with the original dataset, to train the RE model. In the experiments, the PGA framework improves the F1…
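The two augmentation modes described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the prompt wording, the `Sample` field names, the `llm` callable (any function mapping a prompt string to generated text), and the filter that keeps only outputs still mentioning both entities are all assumptions made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical sample layout: sentence text, the two entity mentions, and the relation label.
Sample = Dict[str, str]  # keys: "text", "head", "tail", "relation"


def paraphrase_prompt(sample: Sample) -> str:
    """Prompt for mode 1: rewrite the sentence while preserving its meaning."""
    return (
        "Paraphrase the sentence below. Keep its meaning and keep the entity "
        f"mentions '{sample['head']}' and '{sample['tail']}' unchanged.\n"
        f"Sentence: {sample['text']}"
    )


def generation_prompt(sample: Sample) -> str:
    """Prompt for mode 2: write a new sentence that implies the relation label."""
    return (
        "Write one sentence in the style of a scientific paper that mentions "
        f"'{sample['head']}' and '{sample['tail']}' and implicitly expresses "
        f"the relation '{sample['relation']}' between them."
    )


def augment(
    samples: List[Sample],
    llm: Callable[[str], str],
    make_prompt: Callable[[Sample], str],
) -> List[Sample]:
    """Create pseudo-samples that reuse the original entities and label."""
    pseudo = []
    for s in samples:
        new_text = llm(make_prompt(s)).strip()
        # Assumed sanity filter: drop outputs that lost either entity mention.
        if s["head"] in new_text and s["tail"] in new_text:
            pseudo.append({**s, "text": new_text})
    return pseudo


def build_training_set(original: List[Sample], pseudo: List[Sample]) -> List[Sample]:
    """Combine the original data with ONE kind of pseudo-sample."""
    return original + pseudo
```

Following the abstract, each kind of pseudo-sample would be merged with the original data separately, for example `build_training_set(train, augment(train, llm, paraphrase_prompt))` for the paraphrase variant and the same call with `generation_prompt` for the label-embedded variant.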

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies

Methods: Sparse Evolutionary Training