ALP: Data Augmentation using Lexicalized PCFGs for Few-Shot Text Classification
Hazel Kim, Daecheol Woo, Seong Joon Oh, Jeong-Won Cha, Yo-Sub Han

TL;DR
ALP is a data augmentation method that uses lexicalized probabilistic context-free grammars (PCFGs) to generate syntactically diverse, grammatically plausible samples, improving few-shot text classification performance.
Contribution
This paper presents ALP, a new augmentation technique leveraging lexicalized PCFGs for syntactic diversity, and proposes augmentation-based train-validation splitting strategies for better few-shot learning.
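This summary does not spell out the splitting strategies, so the following is only a minimal sketch of one plausible reading: train on augmented copies and hold out the original few-shot examples for validation. The names `augmentation_based_split`, `toy_augment`, and `SYNONYMS` are hypothetical, and the synonym-swap augmenter is a stand-in for ALP, not the authors' method.

```python
import random

# Hypothetical synonym table; a placeholder for a real augmenter such as ALP.
SYNONYMS = {"good": ["great", "fine"], "bad": ["poor", "awful"]}


def toy_augment(text, rng):
    """Replace every word that has an entry in SYNONYMS with a random synonym."""
    words = text.split()
    return " ".join(rng.choice(SYNONYMS[w]) if w in SYNONYMS else w for w in words)


def augmentation_based_split(originals, augment_fn, n_aug=2, seed=0):
    """Assumed strategy: augmented copies form the training set,
    the untouched original examples form the validation set."""
    rng = random.Random(seed)
    train = []
    for text, label in originals:
        for _ in range(n_aug):
            # Each augmented sample inherits the label of its source sentence.
            train.append((augment_fn(text, rng), label))
    val = list(originals)
    return train, val


if __name__ == "__main__":
    few_shot = [("a good film", 1), ("a bad plot", 0)]
    train, val = augmentation_based_split(few_shot, toy_augment, n_aug=3)
    print(len(train), "training samples,", len(val), "validation samples")
```

The point of such a split is that none of the scarce labeled examples has to be sacrificed for validation alone; whether this matches any of the strategies in the paper would need to be checked against the full text.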
Findings
ALP improves the performance of state-of-the-art classifiers on few-shot tasks.
Augmentation-based splitting strategies outperform traditional train-validation splits.
Lexicalized PCFGs generate syntactically diverse, plausible sentences without domain experts.
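To make the idea of sampling from a lexicalized PCFG concrete, here is a minimal sketch with a hand-written toy grammar. Each nonterminal is annotated with its head word (for example `VP[praised]`), which is what distinguishes a lexicalized PCFG from a plain one. The grammar, its probabilities, and the function names `sample` and `augment` are all illustrative assumptions; ALP itself derives its grammar from parse trees rather than from a fixed table like this.

```python
import random

# Hypothetical toy grammar. Keys are head-annotated nonterminals; values are
# (right-hand side, probability) pairs. Strings that are not keys are terminals.
GRAMMAR = {
    "S[praised]": [(("NP[critic]", "VP[praised]"), 0.5),
                   (("NP[crowd]", "VP[praised]"), 0.5)],
    "VP[praised]": [(("praised", "NP[film]"), 0.6),
                    (("praised", "NP[acting]"), 0.4)],
    "NP[critic]": [(("the", "critic"), 0.7), (("a", "critic"), 0.3)],
    "NP[crowd]": [(("the", "crowd"), 1.0)],
    "NP[film]": [(("the", "film"), 0.5), (("this", "film"), 0.5)],
    "NP[acting]": [(("the", "acting"), 1.0)],
}


def sample(symbol, rng):
    """Expand `symbol` top-down and return the resulting list of words."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: emit the word itself
    expansions = GRAMMAR[symbol]
    rhs_options = [rhs for rhs, _ in expansions]
    weights = [prob for _, prob in expansions]
    rhs = rng.choices(rhs_options, weights=weights, k=1)[0]
    words = []
    for child in rhs:
        words.extend(sample(child, rng))
    return words


def augment(n, seed=0):
    """Draw n sentences that share one syntactic frame but vary in wording."""
    rng = random.Random(seed)
    return [" ".join(sample("S[praised]", rng)) for _ in range(n)]


if __name__ == "__main__":
    for sentence in augment(5):
        print(sentence)
```

Every sampled sentence keeps the frame "NP praised NP" while the noun phrases vary, which mirrors the goal of producing varied word choices inside a grammatical structure.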
Abstract
Data augmentation has been an important ingredient for boosting the performance of learned models. Prior data augmentation methods for few-shot text classification have led to great performance boosts. However, they have not been designed to capture the intricate compositional structure of natural language. As a result, they fail to generate samples with plausible and diverse sentence structures. Motivated by this, we present data Augmentation using Lexicalized Probabilistic context-free grammars (ALP), which generates augmented samples with diverse syntactic structures and plausible grammar. The lexicalized PCFG parse trees consider both constituents and dependencies to produce a syntactic frame that maximizes the variety of word choices while preserving syntax, without requiring domain experts. Experiments on few-shot text classification tasks demonstrate that ALP…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text and Document Classification Technologies
