Transplant Then Regenerate: A New Paradigm for Text Data Augmentation

Guangzhan Wang; Hongyu Zhang; Beijun Shen; Xiaodong Gu

arXiv:2508.14723 · cs.CL · September 16, 2025

TL;DR

This paper introduces LMTransplant, a novel text data augmentation method using large language models that enhances diversity and content creativity while maintaining original attributes, outperforming existing methods.

Contribution

The paper proposes LMTransplant, a new paradigm that leverages LLMs for more diverse and attribute-preserving text augmentation through a transplant-then-regenerate approach.

Findings

01

LMTransplant outperforms existing augmentation methods.

02

It scales effectively with larger augmented datasets.

03

It demonstrates superior performance across various text tasks.

Abstract

Data augmentation is a critical technique in deep learning. Traditional methods like back-translation typically focus on lexical-level rephrasing, which primarily produces variations with the same semantics. While large language models (LLMs) have enhanced text augmentation through their "knowledge emergence" capability, controlling the style and structure of these outputs remains challenging and requires meticulous prompt engineering. In this paper, we propose LMTransplant, a novel text augmentation paradigm leveraging LLMs. The core idea of LMTransplant is transplant-then-regenerate: incorporating seed text into a context expanded by an LLM, and asking the LLM to regenerate a variant based on the expanded context. This strategy allows the model to create more diverse and creative content-level variants by fully leveraging the knowledge embedded in LLMs, while preserving the core attributes of…
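
The transplant-then-regenerate idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the prompt wording, the `[MISSING]` placeholder, and the choice to expand the seed with a preceding and a following passage are all assumptions made for the example, and `llm` is a hypothetical callable that maps a prompt string to a completion string.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: any function that takes a prompt and returns text.
LLM = Callable[[str], str]


def transplant(seed: str, llm: LLM) -> Tuple[str, str]:
    """Expand the seed into a larger context (assumed: text before and after it)."""
    preceding = llm(
        "Write a passage that could plausibly come before this text:\n" + seed
    )
    following = llm(
        "Write a passage that could plausibly come after this text:\n" + seed
    )
    return preceding, following


def regenerate(preceding: str, following: str, llm: LLM) -> str:
    """Ask the LLM for a new text that fits the slot the seed occupied.

    Assumption: the seed itself is withheld here, so the variant is driven
    by the expanded context rather than by paraphrasing the seed.
    """
    prompt = (
        preceding
        + "\n[MISSING]\n"
        + following
        + "\n\nFill in [MISSING] with new text that fits the context."
    )
    return llm(prompt)


def augment(seed: str, llm: LLM, n_variants: int = 1) -> List[str]:
    """Produce n_variants augmented texts from a single seed."""
    variants = []
    for _ in range(n_variants):
        before, after = transplant(seed, llm)
        variants.append(regenerate(before, after, llm))
    return variants
```

In practice `llm` would wrap a call to whichever model is available. Any attribute-preserving constraint, such as keeping the label or style of the seed, would have to be added to the prompts, which this sketch leaves out.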

Taxonomy

Topics: Natural Language Processing Techniques · Service-Oriented Architecture and Web Services