Deep Exploration of Cross-Lingual Zero-Shot Generalization in Instruction Tuning
Janghoon Han, Changho Lee, Joongbo Shin, Stanley Jungkyu Choi, Honglak Lee, Kyunghoon Bae

TL;DR
This paper investigates cross-lingual zero-shot generalization in instruction tuning. It introduces a new Korean instruction dataset, reports performance improvements on unseen tasks across languages, and highlights the importance of multilingual data over language similarity.
Contribution
It introduces KORANI, a new Korean instruction meta-dataset, and proposes cross-lingual templates, advancing the understanding of multilingual instruction tuning and zero-shot generalization.
Findings
Cross-lingual generalization improves performance on unseen tasks.
Performance gains of 20.7% in English and 13.6% in Korean over baselines.
Cross-lingual training can surpass monolingual instruction tuning in some cases.
Abstract
Instruction tuning has emerged as a powerful technique, significantly boosting zero-shot performance on unseen tasks. While recent work has explored cross-lingual generalization by applying instruction tuning to multilingual models, previous studies have primarily focused on English, with a limited exploration of non-English tasks. For an in-depth exploration of cross-lingual generalization in instruction tuning, we perform instruction tuning individually for two distinct language meta-datasets. Subsequently, we assess the performance on unseen tasks in a language different from the one used for training. To facilitate this investigation, we introduce a novel non-English meta-dataset named "KORANI" (Korean Natural Instruction), comprising 51 Korean benchmarks. Moreover, we design cross-lingual templates to mitigate discrepancies in language and instruction-format of the template between…
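The evaluation protocol in the abstract (instruction tuning on one language's meta-dataset, then testing on unseen tasks in the other language) amounts to a particular train/evaluation split. The sketch below is a minimal illustration under stated assumptions, not the authors' code: the `Task` fields, the task names, and the choice to define "unseen" by holding out whole task categories are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """One benchmark in a meta-dataset (hypothetical fields)."""
    name: str
    language: str  # e.g. "en" or "ko"
    category: str  # task type; assumed here to define what counts as "unseen"


def cross_lingual_split(tasks, train_lang, eval_lang, held_out_categories):
    """Return (train, evaluation) task lists for a cross-lingual zero-shot run.

    Training uses only train_lang tasks whose category is NOT held out;
    evaluation uses only eval_lang tasks whose category IS held out,
    so every evaluated task type is unseen and in a different language.
    """
    if train_lang == eval_lang:
        raise ValueError("cross-lingual evaluation needs two different languages")
    held_out = set(held_out_categories)
    train = [t for t in tasks if t.language == train_lang and t.category not in held_out]
    evaluation = [t for t in tasks if t.language == eval_lang and t.category in held_out]
    # Sanity check: no evaluated category leaks into the training tasks.
    assert not {t.category for t in train} & {t.category for t in evaluation}
    return train, evaluation


# Hypothetical task names, for illustration only.
tasks = [
    Task("en_nli", "en", "nli"),
    Task("en_qa", "en", "qa"),
    Task("en_summarization", "en", "summarization"),
    Task("ko_nli", "ko", "nli"),
    Task("ko_qa", "ko", "qa"),
    Task("ko_summarization", "ko", "summarization"),
]

# English -> Korean: tune on English tasks, test on an unseen Korean task type.
train, evaluation = cross_lingual_split(tasks, "en", "ko", ["nli"])
print([t.name for t in train])       # ['en_qa', 'en_summarization']
print([t.name for t in evaluation])  # ['ko_nli']
```

Swapping the language arguments gives the Korean-to-English direction. The paper's actual task inventory and its exact definition of unseen tasks may differ from this sketch.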
