EntityCS: Improving Zero-Shot Cross-lingual Transfer with Entity-Centric Code Switching
Chenxi Whitehouse, Fenia Christopoulou, Ignacio Iacobacci

TL;DR
This paper introduces EntityCS, a novel entity-centric code-switching method that enhances cross-lingual transfer in multilingual models by leveraging entity alignment from Wikidata and Wikipedia, leading to improved performance on entity-centric tasks.
Contribution
EntityCS is the first approach to focus on entity-level code-switching, which aims to preserve semantics and syntax better than random word switching, and it uses external knowledge bases (Wikidata and English Wikipedia) for data augmentation.
Findings
10% improvement in Fact Retrieval accuracy
Consistent gains across four entity-centric tasks
Enhanced cross-lingual transfer with entity-aware training
Abstract
Accurate alignment between languages is fundamental for improving cross-lingual pre-trained language models (XLMs). Motivated by the natural phenomenon of code-switching (CS) in multilingual speakers, CS has been used as an effective data augmentation method that offers language alignment at the word- or phrase-level, in contrast to sentence-level via parallel instances. Existing approaches either use dictionaries or parallel sentences with word alignment to generate CS data by randomly switching words in a sentence. However, such methods can be suboptimal as dictionaries disregard semantics, and syntax might become invalid after random word switching. In this work, we propose EntityCS, a method that focuses on Entity-level Code-Switching to capture fine-grained cross-lingual semantics without corrupting syntax. We use Wikidata and English Wikipedia to construct an entity-centric CS…
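The entity-level switching idea in the abstract can be illustrated with a minimal sketch. It assumes that entity spans are already annotated with identifiers and that a table of per-language entity labels is available. The function name `entity_code_switch`, the `ENTITY_LABELS` table, and the example sentence are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import random

# Hypothetical label table (assumption): in the paper's setting such labels
# would come from Wikidata; these entries are illustrative only.
ENTITY_LABELS = {
    "Q64": {"en": "Berlin", "el": "Βερολίνο", "ja": "ベルリン"},
    "Q183": {"en": "Germany", "el": "Γερμανία", "ja": "ドイツ"},
}


def entity_code_switch(text, spans, labels, rng):
    """Replace each annotated entity span with its label in a random language.

    spans: non-overlapping (start, end, entity_id) character offsets.
    Entities with no non-English label are left unchanged.
    Only entity spans are touched, so the surrounding syntax stays intact.
    """
    out = []
    cursor = 0
    for start, end, entity_id in sorted(spans):
        out.append(text[cursor:start])  # copy text before the entity as-is
        candidates = {
            lang: label
            for lang, label in labels.get(entity_id, {}).items()
            if lang != "en"
        }
        if candidates:
            lang = rng.choice(sorted(candidates))  # pick a target language
            out.append(candidates[lang])
        else:
            out.append(text[start:end])  # no translation known: keep original
        cursor = end
    out.append(text[cursor:])  # copy the remainder of the sentence
    return "".join(out)


sentence = "Berlin is the capital of Germany."
spans = [(0, 6, "Q64"), (25, 32, "Q183")]
print(entity_code_switch(sentence, spans, ENTITY_LABELS, random.Random(0)))
```

Because only whole entity spans are replaced, the non-entity words and their order are untouched, which is the contrast with random word switching that the abstract draws.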
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
