CORI: CJKV Benchmark with Romanization Integration -- A step towards Cross-lingual Transfer Beyond Textual Scripts
Hoang H. Nguyen, Chenwei Zhang, Ye Liu, Natalie Parde, Eugene Rohrbaugh, Philip S. Yu

TL;DR
This paper emphasizes the importance of language contact in cross-lingual transfer, introduces a CJKV benchmark dataset, and proposes Romanization integration with contrastive learning to improve transfer performance.
Contribution
It highlights the significance of contact between source and target languages, creates a novel CJKV benchmark, and introduces Romanization integration with contrastive learning for better cross-lingual representations.
Findings
Romanization integration improves transfer accuracy.
Contact-aware source language selection enhances performance.
The CJKV benchmark facilitates in-depth language contact studies.
Abstract
Naively assuming English as the source language may hinder cross-lingual transfer for many languages by failing to consider the importance of language contact. Some languages are more well-connected than others, and target languages can benefit from transferring from closely related languages; for many languages, the set of closely related languages does not include English. In this work, we study the impact of source language choice on cross-lingual transfer, demonstrating the importance of selecting source languages that have high contact with the target language. We also construct a novel benchmark dataset for the Chinese-Japanese-Korean-Vietnamese (CJKV) languages, which are in close contact, to further encourage in-depth studies of language contact. To comprehensively capture contact between these languages, we propose to integrate Romanized transcription beyond textual scripts via Contrastive Learning…
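The abstract states only that Romanized transcriptions are integrated through contrastive learning; the exact objective is not given on this page. As a hedged sketch, assuming an InfoNCE-style loss in which a sentence's native-script embedding and the embedding of its Romanized transcription form a positive pair, while the other sentences in the batch act as negatives, the objective could look like the following. The function names `cosine` and `info_nce`, the temperature `tau`, and the use of cosine similarity are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(script_emb: List[Vector], roman_emb: List[Vector], tau: float = 0.1) -> float:
    """Mean InfoNCE loss over a batch (hypothetical sketch).

    Row i of script_emb (native-script sentence) and row i of roman_emb
    (its Romanized transcription) are assumed to be a positive pair; every
    other row of roman_emb is treated as an in-batch negative.
    """
    n = len(script_emb)
    if n == 0 or n != len(roman_emb):
        raise ValueError("batches must be non-empty and of equal size")
    total = 0.0
    for i in range(n):
        # Temperature-scaled similarities of anchor i against all candidates.
        logits = [cosine(script_emb[i], roman_emb[j]) / tau for j in range(n)]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-softmax probability of the positive pair.
        total += log_denom - logits[i]
    return total / n


if __name__ == "__main__":
    script = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]   # each Romanization matches its own sentence
    swapped = [[0.0, 1.0], [1.0, 0.0]]   # Romanizations paired with the wrong sentences
    print(info_nce(script, aligned))     # close to 0
    print(info_nce(script, swapped))     # close to 10 with tau = 0.1
```

Minimizing this loss pulls each sentence toward its own Romanized form and away from the others in the batch. Under this assumption, the intent is that sound-based similarity visible in Romanization can link CJKV text even where the writing systems differ. Many contrastive setups also average the loss in both directions (script to Romanization and back); that symmetric variant is a straightforward extension.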
Taxonomy
Topics: Historical Geopolitical and Social Dynamics
Methods: Sparse Evolutionary Training · Contrastive Learning
