CALICO: Conversational Agent Localization via Synthetic Data Generation
Andy Rosenbaum, Pegah Kharazmi, Ershad Banijamali, Lu Zeng, Christopher DiPersio, Pan Wei, Gokmen Oz, Clement Chung, Karolina Owczarzak, Fabian Triefenbach, Wael Hamza

TL;DR
CALICO is a novel method that fine-tunes large language models to generate localized conversational data across multiple languages, improving slot translation accuracy. The authors also release a human-localized test set that is more challenging for multilingual conversational agents than the existing human-translated one.
Contribution
CALICO introduces a new approach for multilingual slot localization using synthetic data generation and an iterative filtering mechanism, outperforming existing literal translation methods.
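This summary does not say how the iterative filter decides which generated samples are noisy. The sketch below assumes one plausible instantiation: train a downstream model on the current synthetic set, drop samples whose predicted label disagrees with the generated label, retrain on the survivors, and repeat for a few rounds. For brevity it filters utterance-level intent labels rather than slots. The names `iterative_filter`, `train_first_word`, and `max_rounds`, and the toy first-word classifier, are hypothetical stand-ins; CALICO's actual filter may use a different model, agreement criterion, or stopping rule.

```python
from collections import Counter, defaultdict
from typing import Callable, List, Tuple

Sample = Tuple[str, str]  # (utterance, label produced by the generator)


def iterative_filter(
    samples: List[Sample],
    train: Callable[[List[Sample]], Callable[[str], str]],
    max_rounds: int = 3,
) -> List[Sample]:
    """Retrain on the surviving samples each round and drop those the model disagrees with."""
    kept = list(samples)
    for _ in range(max_rounds):
        predict = train(kept)
        survivors = [(x, y) for x, y in kept if predict(x) == y]
        if len(survivors) == len(kept):
            break  # nothing removed this round: the set has converged
        kept = survivors
        if not kept:
            break  # everything was filtered out; nothing left to train on
    return kept


def train_first_word(samples: List[Sample]) -> Callable[[str], str]:
    """Toy 'model' (hypothetical): predict the majority label seen for the first word."""
    votes = defaultdict(Counter)
    for text, label in samples:
        votes[text.split()[0]][label] += 1
    table = {word: counts.most_common(1)[0][0] for word, counts in votes.items()}
    return lambda text: table.get(text.split()[0], "")


synthetic = [
    ("flights from Berlin to Munich", "flight"),
    ("flights to Hamburg tomorrow", "flight"),
    ("flights to Paris", "airfare"),  # noisy label from the generator
    ("fares from Berlin to Rome", "airfare"),
]
clean = iterative_filter(synthetic, train_first_word)
print(clean)
```

Here the first round drops the mislabeled "flights to Paris" sample and the second round removes nothing, so the loop stops early. The `max_rounds` cap bounds training cost when the filter keeps finding disagreements.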
Findings
CALICO produces more accurate slot translations than state-of-the-art methods.
The new human-localized (HL) test set is more challenging than the original human-translated (HT) MultiATIS++ test set.
CALICO improves downstream conversational agent performance.
Abstract
We present CALICO, a method to fine-tune Large Language Models (LLMs) to localize conversational agent training data from one language to another. For slots (named entities), CALICO supports three operations: verbatim copy, literal translation, and localization, i.e. generating slot values more appropriate in the target language, such as city and airport names located in countries where the language is spoken. Furthermore, we design an iterative filtering mechanism to discard noisy generated samples, which we show boosts the performance of the downstream conversational agent. To prove the effectiveness of CALICO, we build and release a new human-localized (HL) version of the MultiATIS++ travel information test set in 8 languages. Compared to the original human-translated (HT) version of the test set, we show that our new HL version is more challenging. We also show that CALICO…
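To make the three slot operations concrete, here is a minimal sketch of what each one means for an annotated ATIS-style slot. In CALICO a fine-tuned LLM generates the whole target-language utterance. Here, hypothetical lookup tables (`TRANSLATIONS`, `LOCAL_VALUES`) and a hand-written German carrier template stand in for the model, so only the slot-level semantics are illustrated. The helper names `realize_slot` and `fill_template` are likewise illustrative assumptions, not the paper's code.

```python
from dataclasses import dataclass
from enum import Enum


class SlotOp(Enum):
    COPY = "copy"            # keep the source value verbatim
    TRANSLATE = "translate"  # literal translation of the source value
    LOCALIZE = "localize"    # replace with a value natural for the target locale


@dataclass(frozen=True)
class Slot:
    name: str   # ATIS-style slot label, e.g. "fromloc.city_name"
    value: str  # surface text of the slot
    op: SlotOp  # operation requested for this slot


# Hypothetical toy tables standing in for the fine-tuned LLM (illustration only).
TRANSLATIONS = {("de", "morning"): "Morgen"}
LOCAL_VALUES = {
    ("de", "fromloc.city_name"): "Berlin",
    ("de", "toloc.city_name"): "München",
}


def realize_slot(slot: Slot, lang: str) -> Slot:
    """Apply the slot's operation and return the target-language slot."""
    if slot.op is SlotOp.COPY:
        value = slot.value
    elif slot.op is SlotOp.TRANSLATE:
        value = TRANSLATIONS[(lang, slot.value)]
    else:  # SlotOp.LOCALIZE
        value = LOCAL_VALUES[(lang, slot.name)]
    return Slot(slot.name, value, slot.op)


def fill_template(template: str, slots: list[Slot]) -> str:
    """Insert slots into a carrier phrase using [label value] bracket annotation."""
    for s in slots:
        template = template.replace("{" + s.name + "}", f"[{s.name} {s.value}]")
    return template


source_slots = [
    Slot("fromloc.city_name", "Boston", SlotOp.LOCALIZE),
    Slot("toloc.city_name", "Denver", SlotOp.LOCALIZE),
    Slot("depart_time.period_of_day", "morning", SlotOp.TRANSLATE),
    Slot("airline_name", "United Airlines", SlotOp.COPY),
]
target_slots = [realize_slot(s, "de") for s in source_slots]
out = fill_template(
    "Flüge von {fromloc.city_name} nach {toloc.city_name} am "
    "{depart_time.period_of_day} mit {airline_name}",
    target_slots,
)
print(out)
```

Verbatim copy plausibly suits values such as airline names that should not change across languages, while localization is what lets the target data mention cities where the language is spoken, as the abstract describes.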
