SDialog: A Python Toolkit for End-to-End Agent Building, User Simulation, Dialog Generation, and Evaluation

Sergio Burdisso; Séverin Baroudi; Yanis Labrak; David Grunert; Pawel Cyrta; Yiyang Chen; Srikanth Madikeri; Thomas Schaaf; Esaú Villatoro-Tello; Ahmed Hassoon; Ricard Marxer; Petr Motlicek

arXiv:2506.10622 · cs.CL · May 12, 2026

TL;DR

SDialog is an open-source Python toolkit that unifies dialog generation, evaluation, and interpretability for building and analyzing conversational agents with LLMs.

Contribution

SDialog introduces a framework that combines multi-agent simulation, evaluation metrics, interpretability tools, and audio simulation in a single dialog-centric architecture.

Findings

01. Supports persona-driven multi-agent simulation for controlled dialog generation.

02. Provides evaluation that combines linguistic metrics, LLM-as-a-judge, and functional correctness validators.

03. Includes interpretability tools for activation inspection and steering.

Abstract

We present SDialog, an MIT-licensed open-source Python toolkit that unifies dialog generation, evaluation and mechanistic interpretability into a single end-to-end framework for building and analyzing LLM-based conversational agents. Built around a standardized Dialog representation, SDialog provides: (1) persona-driven multi-agent simulation with composable orchestration for controlled, synthetic dialog generation, (2) comprehensive evaluation combining linguistic metrics, LLM-as-a-judge and functional correctness validators, (3) mechanistic interpretability tools for activation inspection and steering via feature ablation and induction, and (4) audio generation with full acoustic simulation including 3D room modeling and microphone effects. The toolkit integrates with all major LLM backends, enabling mixed-backend experiments under a unified API. By coupling generation, evaluation,…
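To make the idea of a shared dialog representation concrete, here is a minimal, self-contained sketch of persona-driven two-agent simulation feeding a simple linguistic metric. It does not use SDialog and is not its API: `Turn`, `Dialog`, `PersonaAgent`, `scripted_backend`, `simulate`, and `avg_turn_length` are hypothetical names chosen for illustration, and a scripted callable stands in for an LLM backend.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Turn:
    speaker: str
    text: str


@dataclass
class Dialog:
    """Hypothetical stand-in for a standardized dialog record (not SDialog's class)."""
    turns: List[Turn] = field(default_factory=list)

    def to_text(self) -> str:
        return "\n".join(f"{t.speaker}: {t.text}" for t in self.turns)


@dataclass
class PersonaAgent:
    """An agent whose replies are conditioned on a persona description."""
    name: str
    persona: str
    # Any callable mapping (persona, history) -> reply; an LLM call would go here.
    backend: Callable[[str, List[Turn]], str]

    def reply(self, history: List[Turn]) -> Turn:
        return Turn(self.name, self.backend(self.persona, history))


def scripted_backend(lines: List[str]) -> Callable[[str, List[Turn]], str]:
    """Deterministic placeholder backend that replays fixed lines in order."""
    it = iter(lines)
    return lambda persona, history: next(it)


def simulate(a: PersonaAgent, b: PersonaAgent, max_turns: int) -> Dialog:
    """Alternate between two agents, starting with `a`, for `max_turns` turns."""
    dialog = Dialog()
    speakers = (a, b)
    for i in range(max_turns):
        dialog.turns.append(speakers[i % 2].reply(dialog.turns))
    return dialog


def avg_turn_length(dialog: Dialog) -> float:
    """Toy linguistic metric: mean whitespace-separated tokens per turn."""
    if not dialog.turns:
        return 0.0
    return sum(len(t.text.split()) for t in dialog.turns) / len(dialog.turns)


patient = PersonaAgent("Patient", "anxious, brief answers",
                       scripted_backend(["I have a headache.", "Since yesterday."]))
doctor = PersonaAgent("Doctor", "calm, asks follow-up questions",
                      scripted_backend(["How long has it lasted?",
                                        "Please rest and drink water."]))
dialog = simulate(patient, doctor, max_turns=4)
print(dialog.to_text())
print(avg_turn_length(dialog))
```

Because generation and evaluation both operate on the same `Dialog` object, swapping the scripted backend for a real model would leave the metric code unchanged, which is the kind of coupling the abstract describes.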
