ChatGPT and Human Synergy in Black-Box Testing: A Comparative Analysis
Hiroyuki Kirinuki, Haruto Tanno

TL;DR
This study compares black-box test cases generated by ChatGPT with those written by human participants. It finds that human-AI collaboration improves coverage and effectiveness, but also identifies issues that need to be resolved before deployment.
Contribution
It provides a comparative analysis of ChatGPT and human test case generation, demonstrating the benefits of human-AI collaboration in software testing.
Findings
ChatGPT's test cases generally match or slightly surpass those of human participants in test viewpoint coverage.
Collaborative testing covers more viewpoints than individual efforts.
ChatGPT-generated test cases still have issues that must be resolved before practical use.
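The coverage findings above can be pictured as set arithmetic over test viewpoints. The sketch below is illustrative only: the viewpoint labels, the `coverage` helper, and the resulting numbers are assumptions invented for this example, not data or methods from the paper. It shows why the union of two partially overlapping viewpoint sets can cover more than either set alone.

```python
# Illustrative sketch only: viewpoint labels and numbers are hypothetical,
# not taken from the paper.

def coverage(covered: set[str], all_viewpoints: set[str]) -> float:
    """Return the fraction of known test viewpoints touched by a set of test cases."""
    return len(covered & all_viewpoints) / len(all_viewpoints)

# Hypothetical reference list of viewpoints for one specification.
all_viewpoints = {
    "valid input",
    "empty input",
    "boundary value",
    "invalid format",
    "max length",
}

# Hypothetical viewpoints covered by each source of test cases.
human = {"valid input", "empty input", "boundary value"}
chatgpt = {"valid input", "boundary value", "invalid format", "max length"}

# Collaboration modeled as the union of both viewpoint sets.
combined = human | chatgpt

human_cov = coverage(human, all_viewpoints)
chatgpt_cov = coverage(chatgpt, all_viewpoints)
combined_cov = coverage(combined, all_viewpoints)

print(f"human: {human_cov:.0%}, ChatGPT: {chatgpt_cov:.0%}, combined: {combined_cov:.0%}")
```

In this made-up example each source misses viewpoints the other catches, so the combined set covers more than either one alone. Whether that holds for a real specification depends on how much the two sets overlap.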
Abstract
In recent years, large language models (LLMs), such as ChatGPT, have been pivotal in advancing various artificial intelligence applications, including natural language processing and software engineering. A promising yet underexplored area is utilizing LLMs in software testing, particularly in black-box testing. This paper explores the test cases devised by ChatGPT in comparison to those created by human participants. In this study, ChatGPT (GPT-4) and four participants each created black-box test cases for three applications based on specifications written by the authors. The goal was to evaluate the real-world applicability of the proposed test cases, identify potential shortcomings, and comprehend how ChatGPT could enhance human testing strategies. ChatGPT can generate test cases that generally match or slightly surpass those created by human participants in terms of test viewpoint…
Taxonomy
Topics: Software Engineering Research · Software Testing and Debugging Techniques · Artificial Intelligence in Healthcare and Education
