Intergenerational Test Generation for Natural Language Processing Applications
Pin Ji, Yang Feng, Weitao Huang, Jia Liu, Zhihong Zhao

TL;DR
This paper introduces NLPLego, an automated, linguistically grounded test generation method for NLP applications that creates diverse, grammatically correct test cases to detect model errors effectively across multiple tasks.
Contribution
It presents a novel, general test generation framework based on sentence parsing that automates error detection in NLP models using seed sentences and grammatical assembly.
Findings
Successfully detected thousands of errors in state-of-the-art models
Achieved around 95.7% precision in error detection
Demonstrated effectiveness across multiple NLP tasks
Abstract
The development of modern NLP applications often relies on benchmark datasets containing many manually labeled tests to evaluate performance. Constructing these datasets is costly, yet performance on held-out data may not properly reflect an application's capability in real-world scenarios, which can cause serious misunderstanding and monetary loss. To alleviate this problem, we propose an automated test generation method for detecting erroneous behaviors of various NLP applications. Our method is based on the sentence parsing process of classic linguistics, so it can assemble basic grammatical elements and adjuncts into a grammatically correct test with proper oracle information. We implement this method in NLPLego, which is designed to fully exploit the potential of seed sentences to automate the test…
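The idea of assembling basic grammatical elements and adjuncts into new tests can be illustrated with a toy sketch. This is not NLPLego's implementation: the chunk representation, the hand-labeled adjunct flags, and the function name `assemble_tests` are all assumptions made for illustration, and no parser or oracle is modeled. It only shows how one seed sentence could yield a family of progressively longer, still grammatical sentences.

```python
from typing import List, Tuple

# Hypothetical representation: a seed sentence is a list of (text, is_adjunct)
# chunks. The adjunct labels are written by hand here; a real system would
# derive them from a sentence parser.
Chunk = Tuple[str, bool]


def assemble_tests(chunks: List[Chunk]) -> List[str]:
    """Return the core sentence first, then versions with adjuncts
    restored one at a time in their original positions."""
    adjunct_positions = [i for i, (_, is_adj) in enumerate(chunks) if is_adj]
    tests = []
    for k in range(len(adjunct_positions) + 1):
        restored = set(adjunct_positions[:k])  # first k adjuncts are put back
        words = [
            text
            for i, (text, is_adj) in enumerate(chunks)
            if not is_adj or i in restored
        ]
        tests.append(" ".join(words) + ".")
    return tests


seed = [
    ("The cat", False),
    ("with white paws", True),
    ("slept", False),
    ("on the sofa", True),
]
tests = assemble_tests(seed)
for sentence in tests:
    print(sentence)
```

Each generated sentence would still need expected-result (oracle) information for the task under test before it could flag a model error; that step is left out of this sketch.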
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Software Engineering Research
