PyTOD: Programmable Task-Oriented Dialogue with Execution Feedback

Alexandru Coca; Bo-Hsiang Tseng; Pete Boothroyd; Jianpeng Cheng; Mark Gaynor; Zhenxing Zhang; Joe Stacey; Tristan Guigue; Héctor Martinez Alonso; Diarmuid Ó Séaghdha; Anders Johannsen

arXiv:2508.15456·cs.CL·August 22, 2025

Open Access

TL;DR

PyTOD introduces a programmable dialogue agent that generates executable code for state tracking, leveraging execution feedback and constrained decoding to improve accuracy and robustness in task-oriented dialogues.

Contribution

PyTOD tracks dialogue state by generating executable code and constrains decoding with a language model, rather than grammar rules, so that outputs follow API schemata. The authors report state-of-the-art state tracking performance on the SGD benchmark.

Findings

01

Achieves state-of-the-art state tracking performance on the SGD benchmark.

02

Surpasses strong baselines in both accuracy and robust user goal estimation as the dialogue progresses.

03

Shows that execution-aware state tracking, which uses policy and execution feedback for error correction, is effective.

Abstract

Programmable task-oriented dialogue (TOD) agents enable language models to follow structured dialogue policies, but their effectiveness hinges on accurate state tracking. We present PyTOD, an agent that generates executable code to track dialogue state and uses policy and execution feedback for efficient error correction. To this end, PyTOD employs a simple constrained decoding approach, using a language model instead of grammar rules to follow API schemata. This leads to state-of-the-art state tracking performance on the challenging SGD benchmark. Our experiments show that PyTOD surpasses strong baselines in both accuracy and robust user goal estimation as the dialogue progresses, demonstrating the effectiveness of execution-aware state tracking.
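
The idea of tracking state with executable code and correcting errors from execution feedback can be illustrated with a minimal sketch. This is not the authors' implementation: `SCHEMA`, `DialogueState`, `make_api`, and `track_turn` are hypothetical names, the schema is a toy stand-in for an SGD-style API schema, and the list of candidate programs stands in for successive language-model generations (a real agent would feed the collected error messages back to the model before regenerating).

```python
from dataclasses import dataclass, field
from functools import partial
from typing import Optional

# Hypothetical API schema: intent -> slot -> allowed values (None = free-form).
SCHEMA = {
    "find_restaurant": {
        "city": None,
        "cuisine": None,
        "price_range": {"cheap", "moderate", "expensive"},
    },
}


class SchemaError(Exception):
    """Raised when a generated program violates the API schema."""


@dataclass
class DialogueState:
    intent: Optional[str] = None
    slots: dict = field(default_factory=dict)


def make_api(state):
    """Expose one callable per schema intent; calling it updates the state."""

    def call(intent, **slots):
        for name, value in slots.items():
            if name not in SCHEMA[intent]:
                raise SchemaError(f"unknown slot {name!r} for {intent!r}")
            allowed = SCHEMA[intent][name]
            if allowed is not None and value not in allowed:
                raise SchemaError(f"invalid value {value!r} for slot {name!r}")
        if state.intent != intent:
            state.slots = {}  # a new intent starts from an empty slot set
        state.intent = intent
        state.slots.update(slots)

    return {intent: partial(call, intent) for intent in SCHEMA}


def track_turn(state, candidate_programs):
    """Execute candidates in order; keep the first that runs without error.

    Returns the new state and the list of error messages (the execution
    feedback) produced by rejected candidates.
    """
    feedback = []
    for program in candidate_programs:
        # Run against a copy so a failing program cannot corrupt the state.
        trial = DialogueState(state.intent, dict(state.slots))
        try:
            exec(program, {"__builtins__": {}}, make_api(trial))
        except Exception as err:
            feedback.append(f"{type(err).__name__}: {err}")
            continue
        return trial, feedback
    return state, feedback  # every candidate failed: state is unchanged


# Usage: the first candidate uses a value outside the schema and is rejected;
# the second executes cleanly and updates the tracked state.
state, feedback = track_turn(
    DialogueState(),
    [
        'find_restaurant(city="Cambridge", price_range="low")',
        'find_restaurant(city="Cambridge", price_range="cheap")',
    ],
)
print(state)
print(feedback)
```

In this sketch the schema check happens at execution time, so an invalid program surfaces as an error message rather than a silently wrong state. The paper's constrained decoding step, which steers generation toward schema-conformant code in the first place, is not modelled here.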

Taxonomy

Topics: Speech and dialogue systems · Topic Modeling · Multimodal Machine Learning Applications