Improving Testsuites via Instrumentation
Norbert Broeker

TL;DR
This paper demonstrates how code instrumentation techniques from software engineering can improve the development and effectiveness of large-scale natural language grammar testsuites by identifying untested rules and redundancies.
Contribution
It introduces a novel methodology that reuses grammar writing knowledge to optimize testsuite development and coverage analysis in natural language processing.
Findings
Less than half of a large-coverage grammar for German is actually tested by two large testsuites.
10-30% of testing time is redundant.
Rule-usage information from test sentences reveals untested rules, redundant test sentences, and likely causes of overgeneration.
Abstract
This paper explores the usefulness of a technique from software engineering, namely code instrumentation, for the development of large-scale natural language grammars. Information about the usage of grammar rules in test sentences is used to detect untested rules, redundant test sentences, and likely causes of overgeneration. Results show that less than half of a large-coverage grammar for German is actually tested by two large testsuites, and that 10-30% of testing time is redundant. The methodology applied can be seen as a re-use of grammar writing knowledge for testsuite compilation.
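The idea of using rule-usage information can be illustrated with a small sketch. It is not the paper's implementation. It assumes an instrumented parser has already recorded, for each test sentence, the set of grammar rules used in parsing it. The rule names, the sentences, and the helper functions untested_rules and redundant_sentences are all hypothetical, and redundancy is simplified here to "adds no rule that other kept sentences do not already exercise".

```python
from typing import Dict, List, Set


def untested_rules(all_rules: Set[str], usage: Dict[str, Set[str]]) -> Set[str]:
    """Return the rules that no test sentence exercises."""
    used = set().union(*usage.values())
    return all_rules - used


def redundant_sentences(usage: Dict[str, Set[str]]) -> List[str]:
    """Greedy pass: flag a sentence if it adds no rule beyond those
    already covered by the sentences kept so far (simplified criterion)."""
    covered: Set[str] = set()
    redundant: List[str] = []
    # Visit sentences that use the most rules first, so they are kept;
    # ties are broken alphabetically to keep the result deterministic.
    for sentence in sorted(usage, key=lambda s: (-len(usage[s]), s)):
        new_rules = usage[sentence] - covered
        if new_rules:
            covered |= new_rules
        else:
            redundant.append(sentence)
    return redundant


# Hypothetical grammar and instrumentation output (assumed data).
grammar_rules = {"S", "NP", "VP", "PP", "RelClause"}
rule_usage = {
    "Der Hund bellt.": {"S", "NP", "VP"},
    "Die Katze miaut.": {"S", "NP", "VP"},
    "Der Hund bellt im Garten.": {"S", "NP", "VP", "PP"},
}

print(sorted(untested_rules(grammar_rules, rule_usage)))  # ['RelClause']
print(redundant_sentences(rule_usage))
```

In this toy data the RelClause rule is never exercised, so a new test sentence would be needed for it, while the two shorter sentences add no rule coverage beyond the longest one. A real setup would likely track finer-grained units than whole rules and weigh sentences by parsing time.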
Taxonomy
Topics: Software Testing and Debugging Techniques · Natural Language Processing Techniques · Educational Technology and Assessment
