Towards the Assessment of Task-based Chatbots: From the TOFU-R Snapshot to the BRASATO Curated Dataset
Elena Masserini, Diego Clerissi, Daniela Micucci, João R. Campos, Leonardo Mariani

TL;DR
This paper introduces two datasets, TOFU-R and BRASATO, to support the evaluation of task-based chatbots' reliability, security, and robustness, addressing the lack of large-scale, high-quality datasets.
Contribution
It presents new datasets and tools for creating and maintaining datasets to improve automated quality assessment of task-based chatbots.
Findings
TOFU-R captures open-source Rasa chatbots from GitHub.
BRASATO offers a curated selection of relevant chatbots for research.
Both datasets facilitate reproducibility and the evaluation of chatbot reliability.
Abstract
Task-based chatbots are increasingly being used to deliver real services, yet assessing their reliability, security, and robustness remains underexplored, also due to the lack of large-scale, high-quality datasets. The emerging automated quality assessment techniques targeting chatbots often rely on limited pools of subjects, such as custom-made toy examples, or outdated, no longer available, or scarcely popular agents, complicating the evaluation of such techniques. In this paper, we present two datasets and the tool support necessary to create and maintain these datasets. The first dataset is RASA TASK-BASED CHATBOTS FROM GITHUB (TOFU-R), which is a snapshot of the Rasa chatbots available on GitHub, representing the state of the practice in open-source chatbot development with Rasa. The second dataset is BOT RASA COLLECTION (BRASATO), a curated selection of the most relevant chatbots…
Taxonomy
Topics: AI in Service Interactions · Spreadsheets and End-User Computing · Personal Information Management and User Behavior
