Cross-Lingual Knowledge Distillation for Answer Sentence Selection in Low-Resource Languages
Shivanshu Gupta, Yoshitomo Matsubara, Ankit Chadha, Alessandro Moschitti

TL;DR
This paper introduces Cross-Lingual Knowledge Distillation (CLKD), a method for training Answer Sentence Selection (AS2) models for low-resource languages from an English teacher, achieving competitive results without labeled data in the target language.
Contribution
The paper proposes CLKD, a novel cross-lingual distillation approach, and introduces new multilingual datasets for evaluating AS2 in low-resource languages.
Findings
CLKD outperforms or rivals supervised fine-tuning methods.
Introduces Xtr-WikiQA and TyDi-AS2 datasets for multilingual AS2 evaluation.
Demonstrates effectiveness across diverse languages and models.
Abstract
While impressive performance has been achieved on the task of Answer Sentence Selection (AS2) for English, the same does not hold for languages that lack large labeled datasets. In this work, we propose Cross-Lingual Knowledge Distillation (CLKD) from a strong English AS2 teacher as a method to train AS2 models for low-resource languages without the need for labeled data in the target language. To evaluate our method, we introduce 1) Xtr-WikiQA, a translation-based WikiQA dataset for 9 additional languages, and 2) TyDi-AS2, a multilingual AS2 dataset with over 70K questions spanning 8 typologically diverse languages. We conduct extensive experiments on Xtr-WikiQA and TyDi-AS2 with multiple teachers, diverse monolingual and multilingual pretrained language models (PLMs) as students, and both monolingual and multilingual training. The results demonstrate that CLKD either…
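The abstract does not spell out the training objective, so the following is only a generic sketch of soft-label knowledge distillation, not the authors' implementation. It assumes the teacher scores an English (question, candidate) pair, the student scores the same pair in the target language, and the student is trained to match the teacher's output distribution through a KL divergence. The function names, the two-class logit layout, and the example logits are all hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over [not-answer, answer] logits.

    Assumption: a standard soft-label distillation loss; the paper's
    exact objective is not given in the abstract.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(
        pi * (math.log(pi) - math.log(qi))
        for pi, qi in zip(p, q)
        if pi > 0
    )


# Hypothetical logits for one question/candidate pair:
# the teacher saw the English version, the student the target-language version.
teacher_logits = [-1.2, 2.3]
student_logits = [0.1, 0.4]

loss = distillation_loss(teacher_logits, student_logits)
print(f"distillation loss: {loss:.4f}")
```

Because the loss depends only on the teacher's predictions, no gold labels for the target language enter the computation, which is the property the abstract emphasizes.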
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Knowledge Distillation
