Multi-Level Testing of Conversational AI Systems

Elena Masserini

arXiv:2602.03311·cs.SE·February 4, 2026

Open Access

TL;DR

This paper proposes a multi-level testing framework for conversational AI systems, addressing the unique challenges of validating AI components and interactions at various granularities to improve reliability.

Contribution

It introduces a testing approach that evaluates conversational AI systems at three levels of granularity: the integration between the language and AI components, individual conversational agents, and multi-agent configurations. This fills a gap left by existing testing methods.

Findings

01. Developed a hierarchical testing methodology for conversational AI

02. Validated the approach on real-world conversational systems

03. Enhanced detection of integration and interaction issues

Abstract

Conversational AI systems combine AI-based solutions with the flexibility of conversational interfaces. However, most existing testing solutions do not straightforwardly adapt to the characteristics of conversational interaction or to the behavior of AI components. To address this limitation, this Ph.D. thesis investigates a new family of testing approaches for conversational AI systems, focusing on the validation of their constituent elements at different levels of granularity, from the integration between the language and the AI components, to individual conversational agents, up to multi-agent implementations of conversational AI systems.

Taxonomy

Topics: Speech and dialogue systems · AI in Service Interactions · Explainable Artificial Intelligence (XAI)