Data Augmentation Method Utilizing Template Sentences for Variable Definition Extraction
Kotaro Nagayama, Shota Kato, Manabu Kano

TL;DR
This paper introduces a data augmentation technique using template sentences to improve variable definition extraction from scientific papers, achieving higher accuracy without extensive new training data.
Contribution
The study proposes a novel template-based sentence generation method to augment training data for variable definition extraction across fields.
Findings
Achieved 89.6% accuracy on chemical process papers.
Outperformed existing models when trained with the augmented data.
Demonstrated effectiveness of template-based augmentation.
Abstract
The extraction of variable definitions from scientific and technical papers is essential for understanding these documents. However, the characteristics of variable definitions, such as the length and the words that make up the definition, differ among fields, which leads to differences in the performance of existing extraction methods across fields. Although preparing training data specific to each field can improve the performance of the methods, it is costly to create high-quality training data. To address this challenge, this study proposes a new method that generates new definition sentences from template sentences and variable-definition pairs in the training data. The proposed method has been tested on papers about chemical processes, and the results show that the model trained with the definition sentences generated by the proposed method achieved a higher accuracy of 89.6%,…
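The generation step described in the abstract, combining template sentences with variable-definition pairs from the training data, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the placeholder tokens `[VAR]` and `[DEF]`, the function names `fill_template` and `augment`, and the example templates and pairs are all hypothetical, and the paper's actual templates and selection rules may differ.

```python
from itertools import product

# Hypothetical placeholder tokens; the paper's real template format is not shown here.
VAR_SLOT = "[VAR]"
DEF_SLOT = "[DEF]"


def fill_template(template, variable, definition):
    """Insert one variable-definition pair into one template sentence."""
    if VAR_SLOT not in template or DEF_SLOT not in template:
        raise ValueError(f"template is missing a slot: {template!r}")
    # Fill each slot once, so the generated sentence holds exactly one pair.
    sentence = template.replace(VAR_SLOT, variable, 1)
    return sentence.replace(DEF_SLOT, definition, 1)


def augment(templates, pairs):
    """Generate a labeled definition sentence for every template/pair combination."""
    generated = []
    for template, (variable, definition) in product(templates, pairs):
        generated.append(
            {
                "sentence": fill_template(template, variable, definition),
                "variable": variable,
                "definition": definition,
            }
        )
    return generated


# Illustrative inputs only (assumed, not taken from the paper's data).
templates = ["where [VAR] is [DEF].", "Let [VAR] denote [DEF]."]
pairs = [("T", "the reactor temperature"), ("F", "the feed flow rate")]

augmented = augment(templates, pairs)
for example in augmented:
    print(example["sentence"])
```

Under these assumptions, every template is paired with every variable-definition pair, so two templates and two pairs yield four labeled sentences that could be added to the training set.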
Taxonomy
Topics: Semantic Web and Ontologies · Advanced Text Analysis Techniques · Natural Language Processing Techniques
