A Systematic Study of Performance Disparities in Multilingual Task-Oriented Dialogue Systems
Songbo Hu, Han Zhou, Moy Yuan, Milan Gritta, Guchun Zhang, Ignacio Iacobacci, Anna Korhonen, Ivan Vulić

TL;DR
This paper empirically analyzes performance disparities in multilingual task-oriented dialogue systems, revealing factors influencing these gaps and providing practical guidance for improving system development across languages.
Contribution
It introduces new quantitative measures for evaluating disparities and demonstrates how factors like language, data, and model affect performance in multilingual ToD systems.
Findings
Performance disparities depend on task, model, language, and data amount.
Even with parallel data, systems show reduced performance in Arabic and Turkish.
The paper offers insights and practical tips for data collection and system development in new languages.
Abstract
Achieving robust language technologies that can perform well across the world's many languages is a central goal of multilingual NLP. In this work, we take stock of and empirically analyse task performance disparities that exist between multilingual task-oriented dialogue (ToD) systems. We first define new quantitative measures of absolute and relative equivalence in system performance, capturing disparities across languages and within individual languages. Through a series of controlled experiments, we demonstrate that performance disparities depend on a number of factors: the nature of the ToD task at hand, the underlying pretrained language model, the target language, and the amount of ToD annotated data. We empirically prove the existence of the adaptation and intrinsic biases in current ToD systems: e.g., ToD systems trained for Arabic or Turkish using annotated ToD data fully…
Taxonomy
Topics: Speech and dialogue systems · Topic Modeling · Recommender Systems and Techniques
