Cross-Lingual Transfer Learning for Complex Word Identification
George-Eduard Zaharia, Dumitru-Clementin Cercel, Mihai Dascalu

TL;DR
This paper explores cross-lingual transfer learning for complex word identification (CWI) with Transformer-based models, reporting state-of-the-art cross-lingual results in zero-shot scenarios and the best monolingual result for German.
Contribution
It introduces a novel application of zero-shot, one-shot, and few-shot learning with Transformers to multilingual CWI, surpassing previous state-of-the-art cross-lingual results.
Findings
Outperforms state-of-the-art cross-lingual results in zero-shot scenarios for English, German, and Spanish (macro F1-scores of 0.774, 0.782, and 0.734, respectively).
Achieves the best monolingual result for German CWI.
Demonstrates effectiveness of cross-lingual transfer learning in identifying complex words.
Abstract
Complex Word Identification (CWI) is a task centered on detecting hard-to-understand words, or groups of words, in texts from different areas of expertise. The purpose of CWI is to highlight problematic structures that non-native speakers would usually find difficult to understand. Our approach uses zero-shot, one-shot, and few-shot learning techniques, alongside state-of-the-art solutions for Natural Language Processing (NLP) tasks (i.e., Transformers). Our aim is to provide evidence that the proposed models can learn the characteristics of complex words in a multilingual environment by relying on the CWI shared task 2018 dataset, which is available for four languages (i.e., English, German, Spanish, and French). Our approach surpasses state-of-the-art cross-lingual results in terms of macro F1-score on English (0.774), German (0.782), and Spanish (0.734), for the…
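The abstract reports results as macro F1-scores. As a minimal sketch of what that metric measures, assuming CWI is framed as binary classification (1 = complex, 0 = non-complex), the snippet below computes per-class F1 and averages the classes with equal weight. The function names and the toy labels are hypothetical illustrations; this is not the shared task's evaluation script and does not reproduce any result from the paper.

```python
from typing import Hashable, Sequence


def f1_for_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], cls: Hashable) -> float:
    """F1 for one class, treating `cls` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0.0 when the class never occurs or is never predicted.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over all classes seen in gold or predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Toy labels, invented for illustration only (1 = complex, 0 = non-complex).
gold = [1, 0, 0, 1, 0, 0]
pred = [1, 0, 1, 0, 0, 0]
score = macro_f1(gold, pred)
print(f"macro F1 = {score:.3f}")
```

Because each class contributes equally to the average, a model that labels everything as non-complex scores poorly on macro F1 even when complex words are rare, which is one reason the metric suits imbalanced tasks such as CWI.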
