Quality Assurance of Generative Dialog Models in an Evolving Conversational Agent Used for Swedish Language Practice
Markus Borg, Johan Bengtsson, Harald Österling, Alexander Hagelborn, Isabella Gagner, Piotr Tomaszewski

TL;DR
This paper presents initial steps toward automating quality assurance for generative dialog models used in Swedish language practice, focusing on detecting meaningful differences between models in an evolving conversational agent.
Contribution
It introduces a set of automated test cases for evaluating generative dialog models and shows that six of them can detect meaningful differences between candidate models in a language-learning context.
Findings
Six test cases detect meaningful model differences
Automated framework aids in model selection
Progress toward MLOps integration for conversational agents
Abstract
Due to the migration megatrend, efficient and effective second-language acquisition is vital. One proposed solution involves AI-enabled conversational agents for person-centered interactive language practice. We present results from ongoing action research targeting quality assurance of proprietary generative dialog models trained for virtual job interviews. The action team elicited a set of 38 requirements, and for the 15 of particular interest to the evolving solution we designed corresponding automated test cases. Our results show that six of the test case designs can detect meaningful differences between candidate models. Although quality assurance of natural language processing applications is complex, we take initial steps toward an automated framework for machine learning model selection in the context of an evolving conversational agent. Future work will focus on model…
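The abstract does not say how a test case decides that two candidate models differ. One plausible reading, sketched here purely as an assumption, is to run each candidate over a fixed prompt set, score every response with a pass/fail check derived from a single requirement, and compare the two pass rates with a two-proportion z-test. The model stubs, the `contains_question` requirement, and the 1.96 threshold (two-sided, roughly 5% significance under a normal approximation) are all hypothetical and are not the authors' test cases.

```python
import math
from typing import Callable, Sequence

# Hypothetical type: a dialog model maps a user utterance to a reply.
DialogModel = Callable[[str], str]
# Hypothetical type: a test case derived from one requirement judges one exchange.
Check = Callable[[str, str], bool]


def pass_rate(model: DialogModel, prompts: Sequence[str], check: Check) -> tuple[int, int]:
    """Run the check on the model's reply to every prompt; return (passes, total)."""
    passes = sum(1 for p in prompts if check(p, model(p)))
    return passes, len(prompts)


def two_proportion_z(p1: int, n1: int, p2: int, n2: int) -> float:
    """Pooled two-proportion z statistic comparing pass counts p1/n1 and p2/n2."""
    pooled = (p1 + p2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both models passed (or failed) everything: no detectable difference.
        return 0.0
    return (p1 / n1 - p2 / n2) / se


def meaningful_difference(a: DialogModel, b: DialogModel, prompts: Sequence[str],
                          check: Check, z_crit: float = 1.96) -> bool:
    """Assumed decision rule: flag a difference when |z| exceeds z_crit."""
    pa, na = pass_rate(a, prompts, check)
    pb, nb = pass_rate(b, prompts, check)
    return abs(two_proportion_z(pa, na, pb, nb)) > z_crit


# Hypothetical requirement: an interviewer reply should contain a follow-up question.
def contains_question(prompt: str, reply: str) -> bool:
    return "?" in reply


# Hypothetical candidate models, stubbed for illustration.
def model_a(prompt: str) -> str:
    return "Tack! Vad har du för erfarenhet?"


def model_b(prompt: str) -> str:
    return "Tack."


prompts = [f"candidate answer {i}" for i in range(100)]
print(meaningful_difference(model_a, model_b, prompts, contains_question))  # True
print(meaningful_difference(model_a, model_a, prompts, contains_question))  # False
```

Keeping each requirement as an independent pass/fail check makes it easy to add or drop test cases as the agent evolves; with small prompt sets the normal approximation is weak, and an exact test would be a safer choice.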
Taxonomy
Topics: Speech and dialogue systems · AI in Service Interactions · Topic Modeling
