TL;DR
This paper introduces LabelPigeon, a framework that performs translation and label projection jointly via XML tags. It improves translation quality in 11 languages and cross-lingual transfer across 27 languages and three downstream tasks.
Contribution
The paper presents a joint translation and label projection method based on XML tags. It outperforms label projection baselines and improves translation quality, where prior joint approaches reported degradation.
Findings
LabelPigeon outperforms baselines in a direct evaluation of label projection.
Joint translation and label projection improves translation quality in 11 languages.
Cross-lingual transfer gains are substantial, up to +40.2 F1 in NER, across 27 languages and three downstream tasks.
Abstract
Label projection is an effective technique for cross-lingual transfer, extending span-annotated datasets from a high-resource language to low-resource ones. Most approaches perform label projection as a separate step after machine translation, and prior work that combines the two reports degraded translation quality. We re-evaluate this claim with LabelPigeon, a novel framework that jointly performs translation and label projection via XML tags. We design a direct evaluation scheme for label projection, and find that LabelPigeon outperforms baselines and actively improves translation quality in 11 languages. We further assess translation quality across 203 languages and varying annotation complexity, finding consistent improvement attributed to additional fine-tuning. Finally, across 27 languages and three downstream tasks, we report substantial gains in cross-lingual transfer over…
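To make the idea of joint translation and label projection via XML tags more concrete, here is a minimal sketch of the data flow around the translation step. It is not the authors' implementation. The tag format (`<s0>`, `<s1>`, ...), the function names `insert_tags` and `extract_spans`, and the hand-written German string standing in for model output are all assumptions for illustration, and no translation model is called.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, label), end exclusive, character offsets

# Hypothetical tag format: <sN>...</sN>, where N indexes the source span.
TAG_RE = re.compile(r"<s(\d+)>(.*?)</s\1>", re.DOTALL)


def insert_tags(text: str, spans: List[Span]) -> Tuple[str, List[str]]:
    """Wrap each non-overlapping span in an indexed XML-style tag.

    Returns the tagged text and the labels in tag-index order.
    """
    pieces, labels, cursor = [], [], 0
    for i, (start, end, label) in enumerate(sorted(spans)):
        pieces.append(text[cursor:start])
        pieces.append(f"<s{i}>{text[start:end]}</s{i}>")
        labels.append(label)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), labels


def extract_spans(tagged: str, labels: List[str]) -> Tuple[str, List[Span]]:
    """Strip tags from (translated) text and recover the projected spans."""
    pieces, spans, cursor, offset = [], [], 0, 0
    for match in TAG_RE.finditer(tagged):
        before = tagged[cursor:match.start()]
        inner = match.group(2)
        pieces.extend([before, inner])
        offset += len(before)
        # The tag index links the target span back to its source label.
        spans.append((offset, offset + len(inner), labels[int(match.group(1))]))
        offset += len(inner)
        cursor = match.end()
    pieces.append(tagged[cursor:])
    return "".join(pieces), spans


source = "Angela Merkel visited Paris."
source_spans = [(0, 13, "PER"), (22, 27, "LOC")]
tagged_source, labels = insert_tags(source, source_spans)

# Stand-in for the output of a translation model that preserves the tags
# (hand-written here; an assumption, not real model output).
tagged_target = "<s0>Angela Merkel</s0> besuchte <s1>Paris</s1>."
target, target_spans = extract_spans(tagged_target, labels)
print(tagged_source)
print(target, target_spans)
```

Indexing the tags rather than naming them after the label keeps the mapping intact when the translation reorders spans, since each target span is matched to its source label by index and not by position.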
