Can Current Task-oriented Dialogue Models Automate Real-world Scenarios in the Wild?
Sang-Woo Lee, Sungdong Kim, Donghyeon Ko, Donghoon Ham, Youngki Hong, Shin Ah Oh, Hyunhoon Jung, Wangkyo Jung, Kyunghyun Cho, Donghyun Kwak, Hyungsuk Noh, Woomyoung Park

TL;DR
This position paper examines the limitations of current task-oriented dialogue systems in real-world scenarios and explores the WebTOD framework, in which a large-scale language model learns to understand web and mobile interfaces so that dialogue systems can scale.
Contribution
The paper identifies the current status and limitations of existing slot-filling-based TOD (SF-TOD) systems and explores the WebTOD framework as a scalable alternative powered by large language models.
Findings
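Current TOD benchmarks are limited to surrogates of real-world scenarios.
In WebTOD, the dialogue system learns to understand the web/mobile interface that a human agent interacts with.
WebTOD is a potential direction for more scalable, real-world dialogue systems when a web/mobile interface is available.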
Abstract
Task-oriented dialogue (TOD) systems are mainly based on the slot-filling-based TOD (SF-TOD) framework, in which dialogues are broken down into smaller, controllable units (i.e., slots) to fulfill a specific task. A series of approaches based on this framework have achieved remarkable success on various TOD benchmarks. However, we argue that current TOD benchmarks are limited to surrogates of real-world scenarios and that current TOD models are still a long way from covering those scenarios. In this position paper, we first identify the current status and limitations of SF-TOD systems. We then explore the WebTOD framework, an alternative direction for building a scalable TOD system when a web/mobile interface is available. In WebTOD, the dialogue system, powered by a large-scale language model, learns to understand the web/mobile interface that the human agent interacts with.
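To make the slot-filling idea in the abstract concrete, the following is a minimal Python sketch of how a dialogue might be reduced to slots. It is not taken from the paper: the restaurant-booking schema, the names `REQUIRED_SLOTS`, `DialogueState`, and `next_action`, and the action strings are all hypothetical, and slot extraction from the user's utterance is assumed to happen elsewhere.

```python
from dataclasses import dataclass, field

# Hypothetical slot schema for a restaurant-booking task (an assumption,
# not a schema from the paper).
REQUIRED_SLOTS = ("cuisine", "party_size", "time")


@dataclass
class DialogueState:
    """Tracks which slots have been filled so far in the dialogue."""
    slots: dict = field(default_factory=dict)

    def update(self, extracted: dict) -> None:
        # Keep only values for slots defined in the schema; ignore the rest.
        for name, value in extracted.items():
            if name in REQUIRED_SLOTS:
                self.slots[name] = value

    def missing(self) -> list:
        # Slots still unfilled, in schema order.
        return [s for s in REQUIRED_SLOTS if s not in self.slots]


def next_action(state: DialogueState) -> str:
    """Request the first missing slot, or book once every slot is filled."""
    missing = state.missing()
    if missing:
        return f"request({missing[0]})"
    args = ", ".join(f"{k}={state.slots[k]}" for k in REQUIRED_SLOTS)
    return f"book({args})"


state = DialogueState()
state.update({"cuisine": "korean"})           # values assumed extracted from a user turn
first = next_action(state)                     # asks for the next missing slot
state.update({"party_size": 4, "time": "19:00"})
final = next_action(state)                     # all slots filled, so the task runs
```

The sketch also hints at the scaling concern the abstract raises: every new task needs its own hand-written schema and action logic, which is the kind of per-task engineering a framework that reads an existing web/mobile interface would aim to avoid.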
Taxonomy
Topics: Speech and dialogue systems · Multi-Agent Systems and Negotiation · Context-Aware Activity Recognition Systems
