Testing Question Answering Software with Context-Driven Question Generation
Shuang Liu, Zhirun Zhang, Jinhao Dong, Zan Wang, Qingchao Shen, Junjie Chen, Wei Lu, Xiaoyong Du

TL;DR
This paper introduces CQ^2A, a context-driven question generation method that improves testing of question-answering systems by producing more natural, diverse, and contextually relevant questions, leading to better bug detection and system refinement.
Contribution
The paper presents a novel context-driven question generation approach built on large language models that improves the quality and relevance of test questions for QA systems.
Findings
Outperforms state-of-the-art methods in bug detection
Generates more natural and contextually relevant questions
Reduces error rates in QA systems after fine-tuning
Abstract
Question-answering software is becoming increasingly integrated into our daily lives, with prominent examples including Apple Siri and Amazon Alexa. Ensuring the quality of such systems is critical, as incorrect answers could lead to significant harm. Current state-of-the-art testing approaches apply metamorphic relations to existing test datasets, generating test questions based on these relations. However, these methods have two key limitations. First, they often produce unnatural questions that humans are unlikely to ask, reducing the effectiveness of the generated questions in identifying bugs that might occur in real-world scenarios. Second, these questions are generated from pre-existing test datasets, ignoring the broader context and thus limiting the diversity and relevance of the generated questions. In this work, we introduce CQ^2A, a context-driven question generation…
Taxonomy
Topics: Software Testing and Debugging Techniques · Topic Modeling · Software Engineering Research
