Benchmarks Underestimate the Readiness of Multi-lingual Dialogue Agents
Andrew H. Lee, Sina J. Semnani, Galo Castillo-López, Gaël de Chalendar, Monojit Choudhury, Ashna Dua, Kapil Rajesh Kavitha, Sungkyun Kim, Prashant Kodali, Ponnurangam Kumaraguru, Alexis Lombard, Mehrad Moradshahi, Gihyun Park, Nasredine Semmar, Jiwon Seo, Tianhao Shen

TL;DR
This paper demonstrates that in-context learning with GPT-4 can effectively handle multilingual dialogue tasks, but current benchmarks underestimate its true performance due to annotation errors and metric limitations.
Contribution
It introduces a novel approach using in-context learning for multilingual dialogue state tracking and response generation, revealing that benchmarks undervalue its effectiveness.
Findings
GPT-4 achieves 89.6%-96.8% DST accuracy after correction
Response generation correctness exceeds 99% with improved evaluation
Current benchmarks significantly underestimate in-context learning capabilities
Abstract
Creating multilingual task-oriented dialogue (TOD) agents is challenging due to the high cost of training data acquisition. Following the research trend of improving training data efficiency, we show for the first time that in-context learning is sufficient to tackle multilingual TOD. To handle the challenging dialogue state tracking (DST) subtask, we break it down into simpler steps that are more compatible with in-context learning, where only a handful of few-shot examples are used. We test our approach on the multilingual TOD dataset X-RiSAWOZ, which has 12 domains in Chinese, English, French, Korean, Hindi, and code-mixed Hindi-English. Our turn-by-turn DST accuracy on the 6 languages ranges from 55.6% to 80.3%, seemingly worse than the SOTA results from fine-tuned models, which achieve from 60.7% to 82.8%; our BLEU scores in the response generation (RG) subtask are also significantly…
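As a rough illustration of what a turn-by-turn DST accuracy figure measures, the sketch below scores each turn as correct only when the predicted dialogue state matches the gold state exactly. This is an assumption about the metric, not the paper's evaluation code: the function names, the slot names, and the lowercase/whitespace normalization are all hypothetical.

```python
from typing import Dict, List

# A dialogue state is modeled here as a flat slot -> value mapping (an assumption).
State = Dict[str, str]


def normalize(state: State) -> State:
    """Lowercase and trim slot names and values so trivial formatting
    differences do not count as errors (hypothetical normalization)."""
    return {k.strip().lower(): v.strip().lower() for k, v in state.items()}


def turn_accuracy(predicted: List[State], gold: List[State]) -> float:
    """Fraction of turns whose predicted state exactly matches the gold state."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same turns")
    if not gold:
        return 0.0
    correct = sum(normalize(p) == normalize(g) for p, g in zip(predicted, gold))
    return correct / len(gold)


# Two turns: the first matches after normalization, the second has a wrong slot value.
gold = [
    {"domain": "hotel", "price": "cheap"},
    {"domain": "hotel", "price": "cheap", "area": "north"},
]
pred = [
    {"Domain": "Hotel", "price": "cheap"},
    {"domain": "hotel", "price": "moderate", "area": "north"},
]
print(turn_accuracy(pred, gold))
```

Under exact-match scoring of this kind, a single mislabeled slot in the gold annotation marks an otherwise correct prediction as wrong, which is one way annotation errors can depress a reported accuracy.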
Taxonomy
Topics: Speech and dialogue systems · Multi-Agent Systems and Negotiation · Topic Modeling
Methods: Attention Is All You Need · Dynamic Sparse Training · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout
