Cross-Lingual Retrieval Augmented Prompt for Low-Resource Languages
Ercong Nie, Sheng Liang, Helmut Schmid, Hinrich Schütze

TL;DR
This paper introduces PARC, a retrieval-augmented prompting method that enhances zero-shot cross-lingual transfer for low-resource languages by leveraging semantically similar high-resource language sentences, improving performance across multiple tasks.
Contribution
The paper proposes the PARC pipeline, a novel retrieval-augmented prompting approach that significantly boosts zero-shot performance on low-resource languages in multilingual models.
Findings
PARC improves zero-shot performance on low-resource languages by 5.1% in the unlabeled setting and by 16.3% in the labeled setting.
PARC-labeled outperforms the finetuning baseline by 3.7%.
Cross-lingual transfer performance correlates with language similarity and the amount of pretraining data.
Abstract
Multilingual Pretrained Language Models (MPLMs) have shown their strong multilinguality in recent empirical cross-lingual transfer studies. In this paper, we propose the Prompts Augmented by Retrieval Crosslingually (PARC) pipeline to improve the zero-shot performance on low-resource languages (LRLs) by augmenting the context with semantically similar sentences retrieved from a high-resource language (HRL) as prompts. PARC improves the zero-shot performance on three downstream tasks (binary sentiment classification, topic categorization and natural language inference) with multilingual parallel test sets across 10 LRLs covering 6 language families in both unlabeled settings (+5.1%) and labeled settings (+16.3%). PARC-labeled also outperforms the finetuning baseline by 3.7%. We find a significant positive correlation between cross-lingual transfer performance on one side, and the…
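The retrieve-then-prompt idea described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy 2-dimensional vectors stand in for embeddings from a multilingual sentence encoder, the cloze pattern "In summary, it was [MASK]." and the label words are hypothetical, and the function names `retrieve_top_k` and `build_prompt` are invented. Since this summary does not say how the unlabeled setting treats the retrieved sentence, the sketch simply prepends its raw text in that case.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_top_k(query_vec, hrl_pool, k=1):
    """Return the k high-resource-language entries most similar to the query.

    hrl_pool is a list of dicts with keys 'text', 'vec' and 'label'
    (a hypothetical layout chosen for this sketch).
    """
    ranked = sorted(hrl_pool, key=lambda item: cosine(query_vec, item["vec"]), reverse=True)
    return ranked[:k]


def build_prompt(lrl_sentence, retrieved, labeled=True, mask_token="[MASK]"):
    """Prepend retrieved HRL sentences to the LRL input as context.

    The cloze pattern is an assumption, not the paper's template.
    """
    parts = []
    for item in retrieved:
        if labeled:
            # Labeled setting: verbalize the known label of the retrieved sentence.
            parts.append(f'{item["text"]} In summary, it was {item["label"]}.')
        else:
            # Unlabeled setting (assumed here): use only the retrieved text.
            parts.append(item["text"])
    parts.append(f"{lrl_sentence} In summary, it was {mask_token}.")
    return " ".join(parts)


# Toy HRL pool; vectors are placeholders for real sentence embeddings.
hrl_pool = [
    {"text": "The movie was wonderful.", "vec": [0.9, 0.1], "label": "great"},
    {"text": "The food was awful.", "vec": [0.1, 0.9], "label": "terrible"},
]

query_vec = [0.8, 0.2]  # placeholder embedding of the LRL input
retrieved = retrieve_top_k(query_vec, hrl_pool, k=1)
prompt = build_prompt("<LRL input sentence>", retrieved, labeled=True)
print(prompt)
```

In a real pipeline the resulting prompt would be passed to a multilingual masked language model, and the prediction at the mask position would be mapped back to a class label.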
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Test
