Data Augmentation Techniques for Process Extraction from Scientific Publications

Yuni Susanti

arXiv:2405.14594·cs.CL·April 16, 2025·2 cites

Open Access

TL;DR

This paper introduces data augmentation techniques tailored to process extraction from scientific publications, improving F-score by up to 12.3 points on chemistry-domain datasets and potentially reducing overfitting in low-resource settings.

Contribution

The paper proposes novel data augmentation methods that leverage process-specific information, role label similarity, and sentence similarity for improved process extraction.

Findings

01 Up to a 12.3-point improvement in F-score on chemistry-domain datasets.

02 Enhanced performance on small and low-resource datasets.

03 Potential reduction in overfitting.

Abstract

We present data augmentation techniques for process extraction tasks in scientific publications. We cast the process extraction task as a sequence labeling task where we identify all the entities in a sentence and label them according to their process-specific roles. The proposed method attempts to create meaningful augmented sentences by utilizing (1) process-specific information from the original sentence, (2) role label similarity, and (3) sentence similarity. We demonstrate that the proposed methods substantially improve the performance of the process extraction model trained on chemistry-domain datasets, by up to 12.3 points in F-score. The proposed methods could potentially reduce overfitting as well, especially when training on small datasets or in a low-resource setting such as in chemistry and other scientific domains.
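To make the sequence-labeling framing concrete, the following is a minimal sketch of one simple augmentation strategy: replacing each labeled entity with another entity that carries the same role label elsewhere in the training data. It is not the paper's method, which also draws on role label similarity and sentence similarity. The BIO tagging scheme, the role names (`MATERIAL`, `CONDITION`), the toy sentences, and all function names are assumptions made for illustration.

```python
import random
from collections import defaultdict

# Toy corpus of (token, BIO tag) pairs. Role labels here are hypothetical.
corpus = [
    [("Heat", "O"), ("the", "O"), ("sodium", "B-MATERIAL"), ("chloride", "I-MATERIAL"),
     ("at", "O"), ("80", "B-CONDITION"), ("C", "I-CONDITION")],
    [("Stir", "O"), ("ethanol", "B-MATERIAL"), ("for", "O"),
     ("2", "B-CONDITION"), ("hours", "I-CONDITION")],
]


def extract_spans(sentence):
    """Return (start, end, role) for each entity span; end is exclusive."""
    spans, start, role = [], None, None
    # A sentinel "O" token closes a span that runs to the end of the sentence.
    for i, (_, tag) in enumerate(sentence + [("", "O")]):
        if tag.startswith("I-") and start is not None and tag[2:] == role:
            continue  # still inside the current span
        if start is not None:
            spans.append((start, i, role))
            start, role = None, None
        if tag.startswith("B-"):
            start, role = i, tag[2:]
    return spans


def build_role_pool(sentences):
    """Collect every entity mention seen in the corpus, grouped by role label."""
    pool = defaultdict(set)
    for sent in sentences:
        for s, e, role in extract_spans(sent):
            pool[role].add(tuple(tok for tok, _ in sent[s:e]))
    return pool


def augment(sentence, pool, rng):
    """Swap each entity for a different mention with the same role, re-tagging in BIO."""
    out, cursor = [], 0
    for s, e, role in extract_spans(sentence):
        out.extend(sentence[cursor:s])  # copy untouched tokens before the span
        original = tuple(tok for tok, _ in sentence[s:e])
        candidates = sorted(pool.get(role, set()) - {original})
        new = rng.choice(candidates) if candidates else original
        out.extend((tok, ("B-" if j == 0 else "I-") + role) for j, tok in enumerate(new))
        cursor = e
    out.extend(sentence[cursor:])  # copy tokens after the last span
    return out


pool = build_role_pool(corpus)
augmented = augment(corpus[0], pool, random.Random(0))
print(augmented)
```

Because replacements are drawn only from mentions with the same role, the augmented sentence keeps a valid label sequence even when the replacement has a different number of tokens. A similarity-based variant would rank or filter the candidates instead of sampling them uniformly.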

Taxonomy

Topics: Semantic Web and Ontologies