ESAinsTOD: A Unified End-to-End Schema-Aware Instruction-Tuning Framework for Task-Oriented Dialog Modeling

Dechuan Teng; Chunlin Lu; Libo Qin; Wanxiang Che

arXiv:2603.09691·cs.CL·March 11, 2026

Open Access

TL;DR

ESAinsTOD is a unified framework that enhances task-oriented dialog systems by making them schema-aware and instruction-following, enabling better generalization, robustness, and adaptability across diverse datasets and low-resource scenarios.

Contribution

The paper introduces a structured end-to-end schema-aware instruction-tuning framework that improves adaptability and performance of task-oriented dialog models beyond traditional fine-tuning methods.

Findings

01 Outperforms state-of-the-art on multiple dialog benchmarks

02 Shows strong zero-shot generalization in low-resource settings

03 Enhances robustness against data noise and cascading errors

Abstract

Existing end-to-end modeling methods for modular task-oriented dialog systems are typically tailored to specific datasets, making it challenging to adapt to new dialog scenarios. In this work, we propose ESAinsTOD, a unified End-to-end Schema-Aware Instruction-tuning framework for general Task-Oriented Dialog modeling. This framework introduces a structured methodology to go beyond simply fine-tuning Large Language Models (LLMs), enabling flexible adaptation to various dialogue task flows and schemas. Specifically, we leverage full-parameter fine-tuning of LLMs and introduce two alignment mechanisms to make the resulting system both instruction-aware and schema-aware: (i) instruction alignment, which ensures that the system faithfully follows task instructions to complete various task flows from heterogeneous TOD datasets; and (ii) schema alignment, which encourages the system to make…
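
To make the two alignment ideas concrete, here is a minimal sketch of what a schema-aware, instruction-style input might look like. It is not the paper's prompt format or training code: the `Slot` and `Schema` classes, the `build_prompt` and `filter_to_schema` helpers, the prompt layout, and the hotel example are all hypothetical, chosen only to illustrate (a) putting the task instruction and the schema in the model input and (b) checking a predicted dialog state against that schema.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    # Hypothetical slot description; an empty allowed_values list means free-form.
    name: str
    description: str
    allowed_values: list = field(default_factory=list)


@dataclass
class Schema:
    # Hypothetical schema container: one domain and its slots.
    domain: str
    slots: list


def build_prompt(instruction: str, schema: Schema, history: list) -> str:
    """Serialize instruction, schema, and dialog history into one input string."""
    slot_lines = []
    for slot in schema.slots:
        values = ", ".join(slot.allowed_values) if slot.allowed_values else "any"
        slot_lines.append(f"- {slot.name}: {slot.description} (values: {values})")
    turn_lines = [f"{speaker}: {text}" for speaker, text in history]
    return "\n".join([
        f"Instruction: {instruction}",
        f"Schema ({schema.domain}):",
        *slot_lines,
        "Dialog:",
        *turn_lines,
        "Output:",
    ])


def filter_to_schema(schema: Schema, state: dict) -> dict:
    """Keep only slot-value pairs that the schema permits."""
    allowed = {s.name: s.allowed_values for s in schema.slots}
    return {
        name: value
        for name, value in state.items()
        if name in allowed and (not allowed[name] or value in allowed[name])
    }


# Illustrative data only; not taken from any dataset in the paper.
hotel = Schema("hotel", [
    Slot("area", "part of town", ["north", "south"]),
    Slot("name", "hotel name"),
])
prompt = build_prompt(
    "Track the dialog state.",
    hotel,
    [("user", "I need a hotel in the north.")],
)
state = filter_to_schema(hotel, {"area": "north", "price": "cheap"})
```

In this sketch, swapping the instruction string changes the requested task flow and swapping the `Schema` object changes the domain, while the surrounding code stays the same. The post-hoc filter is only one simple way to keep outputs within a schema; the abstract as shown here does not say how schema alignment is enforced.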

Taxonomy

Topics: Topic Modeling · Speech and dialogue systems · Multimodal Machine Learning Applications