BhashaSetu: Cross-Lingual Knowledge Transfer from High-Resource to Extreme Low-Resource Languages
Subhadip Maji, Arnab Bhattacharya

TL;DR
This paper introduces a novel graph-based method, GETR, for cross-lingual knowledge transfer that significantly improves NLP task performance in extremely low-resource languages by leveraging high-resource language data.
Contribution
The paper proposes GETR, a graph-enhanced token representation method, and demonstrates its effectiveness over existing baselines in low-resource language NLP tasks.
Findings
GETR outperforms baselines by 13-27 percentage points in key NLP tasks.
GETR yields significant improvements in POS tagging, sentiment analysis, and NER for low-resource languages.
Analysis reveals key transfer mechanisms and factors for successful knowledge transfer.
Abstract
Despite remarkable advances in natural language processing, developing effective systems for low-resource languages remains a formidable challenge, with performance typically lagging far behind that of high-resource counterparts due to data scarcity and insufficient linguistic resources. Cross-lingual knowledge transfer has emerged as a promising approach to address this challenge by leveraging resources from high-resource languages. In this paper, we investigate methods for transferring linguistic knowledge from high-resource languages to low-resource languages, where the number of labeled training instances is in the hundreds. We focus on sentence-level and word-level tasks. We introduce a novel method, GETR (Graph-Enhanced Token Representation), for cross-lingual knowledge transfer, along with two adopted baselines: (a) augmentation in hidden layers and (b) token embedding transfer through token…
Taxonomy
Topics: ICT in Developing Communities · Natural Language Processing Techniques · Topic Modeling
