Cross-Lingual Transfer for Low-Resource Natural Language Processing
Iker García-Ferrero

TL;DR
This thesis advances cross-lingual transfer learning for low-resource NLP by improving both annotation projection (data-based transfer) and model-based transfer methods. It also introduces Medical mT5, a multilingual medical model, to improve NLP performance in underrepresented languages.
Contribution
The thesis makes three contributions: T-Projection, a novel annotation projection method; a constrained decoding algorithm for zero-shot cross-lingual sequence labeling; and Medical mT5, a multilingual model for medical NLP applications.
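This summary does not describe the thesis's constrained decoding algorithm in detail, so the following is only a generic sketch of the idea, not the author's implementation. It assumes a text-to-text labeler whose output must be the input words, in order, with opening and closing label tags inserted around spans. The names `allowed_next`, `constrained_greedy_decode`, the `<eos>` marker, and the toy scorer are all hypothetical and chosen for illustration.

```python
from typing import Callable, List

EOS = "<eos>"  # hypothetical end-of-sequence marker


def allowed_next(source: List[str], labels: List[str], prefix: List[str]) -> List[str]:
    """Return the tokens that may legally follow `prefix`.

    Assumption: no source word is itself spelled like a tag such as "<PER>".
    """
    open_tags = {f"<{label}>": label for label in labels}
    close_tags = {f"</{label}>" for label in labels}

    copied = 0         # source words already emitted
    open_label = None  # label of the span currently open, if any
    span_len = 0       # words copied inside the open span
    for tok in prefix:
        if tok in open_tags:
            open_label, span_len = open_tags[tok], 0
        elif tok in close_tags:
            open_label = None
        else:
            copied += 1
            if open_label is not None:
                span_len += 1

    options: List[str] = []
    if copied < len(source):
        options.append(source[copied])   # copy the next source word
        if open_label is None:
            options.extend(open_tags)    # or open a new span
    if open_label is not None and span_len > 0:
        options.append(f"</{open_label}>")  # close a non-empty span
    if copied == len(source) and open_label is None:
        options.append(EOS)              # finish only when all words are copied
    return options


def constrained_greedy_decode(
    source: List[str],
    labels: List[str],
    score: Callable[[List[str], str], float],
) -> List[str]:
    """Greedy decoding in which only legal continuations are ever scored."""
    prefix: List[str] = []
    while True:
        options = allowed_next(source, labels, prefix)
        best = max(options, key=lambda tok: score(prefix, tok))
        if best == EOS:
            return prefix
        prefix.append(best)


# Toy stand-in for a model: it prefers a fixed target sequence.
source = ["Obama", "visited", "Paris"]
target = ["<PER>", "Obama", "</PER>", "visited", "<LOC>", "Paris", "</LOC>"]


def toy_score(prefix: List[str], tok: str) -> float:
    step = len(prefix)
    return 1.0 if step < len(target) and target[step] == tok else 0.0


tagged = constrained_greedy_decode(source, ["PER", "LOC"], toy_score)
print(" ".join(tagged))
```

In a real system the scorer would be the model's next-token probability, and the same legality check would typically be applied inside beam search rather than greedy search. The point of the constraint is that the decoder can never drop, reorder, or invent words, and can never emit unbalanced tags.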
Findings
T-Projection outperforms previous annotation projection methods.
The constrained decoding algorithm improves zero-shot sequence labeling.
Medical mT5 demonstrates practical impact in medical NLP tasks.
Abstract
Natural Language Processing (NLP) has seen remarkable advances in recent years, particularly with the emergence of Large Language Models that have achieved unprecedented performance across many tasks. However, these developments have mainly benefited a small number of high-resource languages such as English. The majority of languages still face significant challenges due to the scarcity of training data and computational resources. To address this issue, this thesis focuses on cross-lingual transfer learning, a research area aimed at leveraging data and models from high-resource languages to improve NLP performance for low-resource languages. Specifically, we focus on Sequence Labeling tasks such as Named Entity Recognition, Opinion Target Extraction, and Argument Mining. The research is structured around three main objectives: (1) advancing data-based cross-lingual transfer learning…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
