Automated Testing of Task-based Chatbots: How Far Are We?

Diego Clerissi; Elena Masserini; Daniela Micucci; Leonardo Mariani

arXiv:2602.13072·cs.SE·February 16, 2026

Open Access

TL;DR

This paper evaluates how effective current testing techniques are on task-based chatbots. Through a study of a curated selection of chatbots from GitHub, it examines two limitations: the simplicity of the generated test scenarios and the weakness of the implemented oracles.

Contribution

It provides a comprehensive assessment of state-of-the-art chatbot testing methods on real-world chatbots, revealing key limitations and areas for improvement.

Findings

01 Testing techniques often produce simplistic scenarios

02 Weaknesses in oracle implementation limit testing effectiveness

03 Identified gaps suggest a need for more sophisticated testing approaches

Abstract

Task-based chatbots are software, typically embedded in real-world applications, that assist users in completing tasks through a conversational interface. As chatbots are gaining popularity, effectively assessing their quality has become crucial. Whereas traditional testing techniques fail to systematically exercise the conversational space of chatbots, several approaches specifically targeting chatbots have emerged from both industry and research. Although these techniques have shown advancements over the years, they still exhibit limitations, such as simplicity of the generated test scenarios and weakness in implemented oracles. In this paper, we conduct a confirmatory study to investigate such limitations by evaluating the effectiveness of state-of-the-art chatbot testing techniques on a curated selection of task-based chatbots from GitHub, developed using the most popular commercial…

Taxonomy

Topics: AI in Service Interactions · Speech and dialogue systems · Spreadsheets and End-User Computing