Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings
Iker García-Ferrero, Rodrigo Agerri, German Rigau

TL;DR
This paper compares data transfer and model transfer techniques for zero-resource cross-lingual sequence labelling, finding that model transfer with multilingual models generally outperforms data transfer, especially when high-capacity models are available.
Contribution
It provides an in-depth experimental comparison of data-based and model-based cross-lingual transfer methods, highlighting the superiority of model transfer with multilingual models in zero-shot settings.
Findings
Model transfer with multilingual models outperforms data transfer.
Translation-based data transfer can introduce discrepancies affecting performance.
Data transfer remains viable without high-capacity multilingual models.
Abstract
Zero-resource cross-lingual transfer approaches aim to apply supervised models from a source language to unlabelled target languages. In this paper we perform an in-depth study of the two main techniques employed so far for cross-lingual zero-resource sequence labelling, based either on data or model transfer. Although previous research has proposed translation and annotation projection (data-based cross-lingual transfer) as an effective technique for cross-lingual sequence labelling, in this paper we experimentally demonstrate that high capacity multilingual language models applied in a zero-shot (model-based cross-lingual transfer) setting consistently outperform data-based cross-lingual transfer approaches. A detailed analysis of our results suggests that this might be due to important differences in language use. More specifically, machine translation often generates a textual…
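The abstract contrasts model transfer with data transfer, where source-language training data is machine-translated and its labels are carried over by annotation projection. The projection step can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes that word alignments between each source sentence and its translation are already available, for example from an external word aligner, as sets of (source index, target index) pairs. The function names `bio_spans` and `project_labels`, the min/max span heuristic, and the decision to drop unaligned or overlapping entities are hypothetical choices made here for illustration.

```python
from typing import List, Set, Tuple

# Word alignment as (source_token_index, target_token_index) pairs (assumed input format).
Alignment = Set[Tuple[int, int]]


def bio_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Extract (start, end_exclusive, label) spans from a BIO tag sequence.

    An I- tag that does not continue the current label is treated leniently
    as the start of a new span.
    """
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if tag == "O" or starts_new:
            if start is not None:
                spans.append((start, i, label))
                start, label = None, None
            if starts_new:
                start, label = i, tag[2:]
    return spans


def project_labels(src_tags: List[str], alignment: Alignment, tgt_len: int) -> List[str]:
    """Project BIO labels from a source sentence onto its translation (hypothetical heuristic)."""
    tgt_tags = ["O"] * tgt_len
    for start, end, label in bio_spans(src_tags):
        # Target tokens aligned to any source token inside the entity span.
        tgt_positions = sorted({t for s, t in alignment if start <= s < end})
        if not tgt_positions:
            continue  # unaligned entity: dropped in this sketch
        # Assume the projected entity is contiguous: cover min..max aligned target token.
        lo, hi = tgt_positions[0], tgt_positions[-1]
        if any(tag != "O" for tag in tgt_tags[lo:hi + 1]):
            continue  # overlaps an already projected entity: skipped in this sketch
        tgt_tags[lo] = f"B-{label}"
        for i in range(lo + 1, hi + 1):
            tgt_tags[i] = f"I-{label}"
    return tgt_tags


# English "I visited the European Union" -> Spanish "Visité la Unión Europea".
src = ["O", "O", "O", "B-ORG", "I-ORG"]
align = {(0, 0), (1, 0), (2, 1), (3, 3), (4, 2)}  # word order of the entity is swapped
print(project_labels(src, align, tgt_len=4))  # ['O', 'O', 'B-ORG', 'I-ORG']
```

Because the projected span is rebuilt from whatever the aligner returns, a single wrong or missing alignment can shift, truncate, or drop an entity. This is one plausible way in which translation and alignment discrepancies add noise to data transfer, whereas model transfer applies the multilingual model directly to target-language text.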
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
