BiToD: A Bilingual Multi-Domain Dataset For Task-Oriented Dialogue Modeling
Zhaojiang Lin, Andrea Madotto, Genta Indra Winata, Peng Xu, Feijun Jiang, Yuxiang Hu, Chen Shi, Pascale Fung

TL;DR
BiToD introduces the first bilingual multi-domain dataset for task-oriented dialogue, enabling better evaluation and development of multilingual dialogue systems with cross-lingual transfer learning.
Contribution
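It provides a bilingual dataset of over 7k multi-domain dialogues (144k utterances) with a large, realistic bilingual knowledge base, along with baselines for end-to-end ToD modeling under monolingual, bilingual, and cross-lingual settings, addressing the lack of multilingual datasets.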
Findings
Bilingual ToD systems outperform separate monolingual systems.
Leveraging bilingual knowledge bases enhances low-resource system performance.
Cross-lingual transfer learning improves dialogue system robustness.
Abstract
Task-oriented dialogue (ToD) benchmarks provide an important avenue to measure progress and develop better conversational agents. However, existing datasets for end-to-end ToD modeling are limited to a single language, hindering the development of robust end-to-end ToD systems for multilingual countries and regions. Here we introduce BiToD, the first bilingual multi-domain dataset for end-to-end task-oriented dialogue modeling. BiToD contains over 7k multi-domain dialogues (144k utterances) with a large and realistic bilingual knowledge base. It serves as an effective benchmark for evaluating bilingual ToD systems and cross-lingual transfer learning approaches. We provide state-of-the-art baselines under three evaluation settings (monolingual, bilingual, and cross-lingual). The analysis of our baselines in different settings highlights 1) the effectiveness of training a bilingual ToD…
Taxonomy
Topics: Topic Modeling · Speech and dialogue systems · Natural Language Processing Techniques
