Automated Testing of Task-based Chatbots: How Far Are We?
Diego Clerissi, Elena Masserini, Daniela Micucci, Leonardo Mariani

TL;DR
This paper evaluates the effectiveness of current testing techniques for task-based chatbots, highlighting their limitations in scenario complexity and oracle implementation through a systematic study of popular chatbots.
Contribution
It provides a comprehensive assessment of state-of-the-art chatbot testing methods on real-world chatbots, revealing key limitations and areas for improvement.
Findings
Testing techniques often produce simplistic scenarios
Weaknesses in oracle implementation limit testing effectiveness
Identified gaps suggest need for more sophisticated testing approaches
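As a rough illustration of the oracle finding above, the sketch below shows how a weak oracle can let a faulty conversation pass. Everything in it is hypothetical: `toy_bot`, `weak_oracle`, `strict_oracle`, and the seeded bug are invented for this example and are not taken from the paper or from any chatbot framework or testing tool.

```python
# Hypothetical toy chatbot; an illustration only, not the paper's subjects or tools.
FALLBACK = "Sorry, I did not understand."


def toy_bot(utterance: str) -> str:
    """Return a canned reply based on a keyword match."""
    text = utterance.lower()
    if "book" in text:
        return "Which date would you like to book?"
    if "cancel" in text:
        # Seeded bug (assumption for the demo): the cancel request
        # is answered with the booking prompt.
        return "Which date would you like to book?"
    return FALLBACK


def weak_oracle(response: str) -> bool:
    """Pass whenever the bot gives any non-fallback reply."""
    return bool(response) and response != FALLBACK


def strict_oracle(response: str, expected_keyword: str) -> bool:
    """Additionally require the reply to mention the expected task."""
    return weak_oracle(response) and expected_keyword in response.lower()


reply = toy_bot("I want to cancel my reservation")
weak_result = weak_oracle(reply)                # the seeded bug goes unnoticed
strict_result = strict_oracle(reply, "cancel")  # the seeded bug is detected
```

Under these assumptions, the weak oracle accepts the wrong reply because it only checks that the bot did not fall back, while the stricter check ties the verdict to the task the user asked for.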
Abstract
Task-based chatbots are software, typically embedded in real-world applications, that assist users in completing tasks through a conversational interface. As chatbots are gaining popularity, effectively assessing their quality has become crucial. Whereas traditional testing techniques fail to systematically exercise the conversational space of chatbots, several approaches specifically targeting chatbots have emerged from both industry and research. Although these techniques have shown advancements over the years, they still exhibit limitations, such as simplicity of the generated test scenarios and weakness in implemented oracles. In this paper, we conduct a confirmatory study to investigate such limitations by evaluating the effectiveness of state-of-the-art chatbot testing techniques on a curated selection of task-based chatbots from GitHub, developed using the most popular commercial…
Taxonomy
Topics: AI in Service Interactions · Speech and dialogue systems · Spreadsheets and End-User Computing
