I-WAS: a Data Augmentation Method with GPT-2 for Simile Detection

Yongzhu Chang; Rongsheng Zhang; Jiashu Pu

arXiv:2308.04109·cs.CL·October 10, 2023

TL;DR

This paper introduces I-WAS, an iterative GPT-2-based data augmentation technique for simile detection that aims to improve the diversity and quality of the training data.

Contribution

The paper presents a GPT-2-based data augmentation method designed specifically for simile detection, addressing the limited size and limited diversity of existing simile corpora.

Findings

01

Improved simile detection accuracy with augmented data

02

Enhanced diversity of simile forms in the dataset

03

Effective augmentation method validated on a new diverse corpus

Abstract

Simile detection is a valuable task for many natural language processing (NLP)-based applications, particularly in the field of literature. However, existing research on simile detection often relies on corpora that are limited in size and do not adequately represent the full range of simile forms. To address this issue, we propose a simile data augmentation method based on Word replacement And Sentence completion using the GPT-2 language model. Our iterative process, called I-WAS, is designed to improve the quality of the augmented sentences. To better evaluate the performance of our method in real-world applications, we have compiled a corpus containing a more diverse set of simile forms for experimentation. Our experimental results demonstrate the effectiveness of our proposed data augmentation method for simile detection.
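
The abstract names three ingredients: word replacement, sentence completion, and an iterative loop meant to keep only good augmented sentences. The Python sketch below illustrates how those pieces might fit together; it is not the authors' implementation. Everything in it is an assumption made for illustration: the comparator list, the `synonyms` lookup, the choice to cut the sentence right after the comparator, and the `complete` and `keep` callables, which stand in for a GPT-2 generation call and a quality filter respectively.

```python
import random
from typing import Callable, Dict, List, Optional

# Assumed comparator words; the abstract does not list them.
COMPARATORS = {"like", "as"}


def replace_words(tokens: List[str], synonyms: Dict[str, List[str]],
                  rng: random.Random) -> List[str]:
    """Word replacement: swap every token that has an entry in `synonyms`."""
    return [rng.choice(synonyms[t]) if t in synonyms else t for t in tokens]


def prefix_through_comparator(tokens: List[str]) -> Optional[List[str]]:
    """Keep the tokens up to and including the first comparator, if any."""
    for i, tok in enumerate(tokens):
        if tok.lower() in COMPARATORS:
            return tokens[: i + 1]
    return None


def augment(sentence: str,
            synonyms: Dict[str, List[str]],
            complete: Callable[[str], str],
            keep: Callable[[str], bool],
            rounds: int = 3,
            seed: int = 0) -> List[str]:
    """Generate up to `rounds` candidates and return the distinct ones `keep` accepts.

    `complete` is a hypothetical stand-in for a language model continuation;
    `keep` is a hypothetical stand-in for a quality filter.
    """
    rng = random.Random(seed)
    accepted: List[str] = []
    for _ in range(rounds):
        tokens = replace_words(sentence.split(), synonyms, rng)
        prefix = prefix_through_comparator(tokens)
        if prefix is None:  # no comparator found, nothing to complete
            break
        prompt = " ".join(prefix)
        candidate = prompt + " " + complete(prompt).strip()
        if keep(candidate) and candidate not in accepted:
            accepted.append(candidate)
    return accepted


if __name__ == "__main__":
    # Stub callables so the sketch runs without any model download.
    result = augment(
        "Her voice flowed like water",
        synonyms={"voice": ["laugh"]},
        complete=lambda prompt: " a river.",
        keep=lambda text: True,
    )
    print(result)
```

In a real setting, `complete` would presumably wrap a GPT-2 text generation call and `keep` would wrap a simile classifier's decision; both are left as plain callables here so the control flow can be read and run without any model.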

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification

Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Discriminative Fine-Tuning · Linear Layer · Adam · Dense Connections · Residual Connection · Dropout