Beyond Testers' Biases: Guiding Model Testing with Knowledge Bases using LLMs

Chenyang Yang; Rishabh Rustogi; Rachel Brower-Sinning; Grace A. Lewis; Christian Kästner; Tongshuang Wu

arXiv:2310.09668·cs.CL·October 17, 2023·2 cites

TL;DR

Weaver is an interactive tool leveraging large language models to generate knowledge bases and guide model testing, helping testers identify diverse and nuanced test cases beyond their biases.

Contribution

We introduce Weaver, a novel interactive system that uses LLMs to support requirements elicitation and diversify model testing through knowledge base generation.

Findings

01

With Weaver, testers identified more concepts, and more diverse concepts, worth testing.

02

Over 200 failing test cases found for stance detection using zero-shot ChatGPT.

03

Weaver aids real-world model testing in various application scenarios.

Abstract

Current model testing work has mostly focused on creating test cases. Identifying what to test is a step that is largely ignored and poorly supported. We propose Weaver, an interactive tool that supports requirements elicitation for guiding model testing. Weaver uses large language models to generate knowledge bases and recommends concepts from them interactively, allowing testers to elicit requirements for further testing. Weaver provides rich external knowledge to testers and encourages testers to systematically explore diverse concepts beyond their own biases. In a user study, we show that both NLP experts and non-experts identified more, as well as more diverse, concepts worth testing when using Weaver. Collectively, they found more than 200 failing test cases for stance detection with zero-shot ChatGPT. Our case studies further show that Weaver can help practitioners test models in…
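The workflow the abstract describes (expand a seed task into a knowledge base of concepts, then recommend concepts the tester has not yet explored) can be illustrated with a small sketch. This is not Weaver's actual implementation: `fake_llm_related` is a hypothetical stand-in for an LLM query and returns canned toy data, and the breadth-first expansion and the simple neighbor-based `recommend` function are assumptions chosen only to make the idea concrete.

```python
from collections import deque


def fake_llm_related(concept):
    """Hypothetical stand-in for an LLM query; returns canned toy concepts."""
    canned = {
        "stance detection": ["sarcasm", "negation", "quoted opinions"],
        "sarcasm": ["irony markers", "hyperbole"],
        "negation": ["double negation"],
    }
    return canned.get(concept, [])


def build_knowledge_base(seed, max_depth=2):
    """Expand the seed breadth-first into a parent -> children mapping."""
    kb = {seed: []}
    queue = deque([(seed, 0)])
    while queue:
        concept, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand concepts at the depth limit
        for child in fake_llm_related(concept):
            if child not in kb:  # skip concepts already in the knowledge base
                kb[child] = []
                kb[concept].append(child)
                queue.append((child, depth + 1))
    return kb


def recommend(kb, explored, k=3):
    """Return up to k unexplored concepts that neighbor an explored concept."""
    suggestions = []
    for concept in explored:
        for child in kb.get(concept, []):
            if child not in explored and child not in suggestions:
                suggestions.append(child)
    return suggestions[:k]


kb = build_knowledge_base("stance detection")
first_round = recommend(kb, ["stance detection"], k=2)
second_round = recommend(kb, ["stance detection", "sarcasm"], k=3)
```

In this toy run, the tester starts from the seed task, picks one suggested concept to explore, and then receives suggestions drawn from both the seed and the newly explored concept. A real system would replace the canned lookup with model queries and would likely use a more deliberate diversity criterion when ranking suggestions.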

Taxonomy

Topics: Software Engineering Research · Software Testing and Debugging Techniques · Software System Performance and Reliability