LLM-Generated Natural Language Meets Scaling Laws: New Explorations and Data Augmentation Methods
Zhenhua Wang, Guang Xu, Ming Ren

TL;DR
This paper explores the relationship between LLM-generated natural language and human language using scaling laws, introduces a novel data augmentation method called ZGPTDA for few-shot classification, and demonstrates its effectiveness through extensive experiments.
Contribution
It establishes a theoretical foundation for LLM-generated language using scaling laws and proposes a new data augmentation method that improves classification performance.
Findings
LLM-generated natural language (LLMNL) deviates slightly from Mandelbrot's law (by approximately 0.2 in the Mandelbrot exponent)
ZGPTDA improves F1 scores of BERT and RoBERTa by 7-10%
ZGPTDA surpasses recent methods such as AugGPT and GENCO by about 2% in accuracy
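For readers unfamiliar with the first finding: Mandelbrot's law (the Zipf-Mandelbrot law) models the frequency of the word at rank r as f(r) = C / (r + b)^a, where a is the exponent. The Python sketch below shows one simple way such an exponent could be estimated from a token list. It is an illustration only, not the paper's fitting procedure: the function names `rank_frequencies` and `fit_zipf_mandelbrot`, the grid search over b, and the log-space least-squares fit are all assumptions made for this example.

```python
import math
from collections import Counter


def rank_frequencies(tokens):
    """Return token counts sorted from most to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)


def fit_zipf_mandelbrot(freqs, b_grid=None):
    """Fit f(r) = C / (r + b)**a to ranked frequencies (hypothetical helper).

    For a fixed shift b, log f = log C - a * log(r + b) is a straight line,
    so a and log C follow from ordinary linear regression. The shift b is
    chosen by a plain grid search that minimises the squared log-space error.
    Requires at least two ranked frequencies, all positive.
    """
    if b_grid is None:
        b_grid = [i * 0.1 for i in range(0, 101)]  # assumed range: 0.0 to 10.0
    n = len(freqs)
    ys = [math.log(f) for f in freqs]
    mean_y = sum(ys) / n
    best = None
    for b in b_grid:
        xs = [math.log(r + b) for r in range(1, n + 1)]
        mean_x = sum(xs) / n
        sxx = sum((x - mean_x) ** 2 for x in xs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slope = sxy / sxx
        intercept = mean_y - slope * mean_x
        sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, -slope, b, math.exp(intercept))
    _, a, b, c = best
    return a, b, c
```

Under these assumptions, calling `fit_zipf_mandelbrot(rank_frequencies(text.split()))` once on a human-written corpus and once on an LLM-generated corpus yields two exponents a whose difference can be compared. Whitespace tokenization is itself a simplification; the paper may tokenize and fit differently.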
Abstract
With the ascent of large language models (LLMs), natural language processing has witnessed enhancements, such as LLM-based data augmentation. Nonetheless, prior research harbors two primary concerns: firstly, a lack of contemplation regarding whether the natural language generated by LLMs (LLMNL) truly aligns with human natural language (HNL), a critical foundational question; secondly, an oversight that augmented data is randomly generated by LLMs, implying that not all data may possess equal training value, which could impede the performance of classifiers. To address these challenges, we introduce the scaling laws to intrinsically calculate LLMNL and HNL. Through extensive experiments, we reveal slight deviations (approximately 0.2 Mandelbrot exponent) from Mandelbrot's law in LLMNL, underscore a complexity advantage in HNL, and supplement an interpretive discussion on language style.…
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Attention Is All You Need · Byte Pair Encoding · Linear Layer · Label Smoothing · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Weight Decay · Residual Connection · Transformer
