Enhancing Cross-lingual Transfer via Phonemic Transcription Integration
Hoang H. Nguyen, Chenwei Zhang, Tao Zhang, Eugene Rohrbaugh, Philip S. Yu

TL;DR
This paper introduces PhoneXL, a framework that integrates phonemic transcriptions with orthographic data to improve cross-lingual transfer, especially among languages with different scripts, demonstrating consistent performance gains on CJKV language tasks.
Contribution
The paper proposes a novel phonemic-orthographic alignment framework and releases the first phonemic-orthographic alignment dataset for CJKV languages, advancing cross-lingual transfer methods beyond those that rely on script similarity.
Findings
Phonemic transcriptions provide essential information beyond orthography.
Incorporating phonemic data improves cross-lingual token-level task performance.
The framework achieves consistent improvements over traditional orthography-based models.
Abstract
Previous cross-lingual transfer methods are restricted to orthographic representation learning via textual scripts. This limitation hampers cross-lingual transfer and is biased towards languages sharing similar well-known scripts. To alleviate the gap between languages from different writing scripts, we propose PhoneXL, a framework incorporating phonemic transcriptions as an additional linguistic modality beyond the traditional orthographic transcriptions for cross-lingual transfer. Particularly, we propose unsupervised alignment objectives to capture (1) local one-to-one alignment between the two different modalities, (2) alignment via multi-modality contexts to leverage information from additional modalities, and (3) alignment via multilingual contexts where additional bilingual dictionaries are incorporated. We also release the first phonemic-orthographic alignment dataset on two…
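The first objective in the abstract, local one-to-one alignment between the two modalities, can be illustrated with a small sketch. The abstract does not give the exact loss, so the code below assumes an InfoNCE-style contrastive loss as a stand-in: the orthographic embedding of token i should be closer to the phonemic embedding of the same token than to the phonemic embeddings of other tokens. The function names, the temperature value, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def alignment_loss(ortho, phone, temperature=0.1):
    """InfoNCE-style loss (an assumed stand-in, not the paper's objective).

    ortho[i] and phone[i] are embeddings of the same token in the
    orthographic and phonemic modality. The loss is low when each
    orthographic vector is most similar to its own phonemic vector.
    """
    n = len(ortho)
    total = 0.0
    for i in range(n):
        # Similarity of orthographic token i to every phonemic token.
        logits = [cosine(ortho[i], phone[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_z = top + math.log(sum(math.exp(x - top) for x in logits))
        # Negative log-probability of picking the matching token.
        total += log_z - logits[i]
    return total / n


# Toy 2-d embeddings for two tokens (hypothetical values).
ortho_emb = [[1.0, 0.0], [0.0, 1.0]]
phone_aligned = [[0.9, 0.1], [0.1, 0.9]]   # each row matches its orthographic row
phone_shuffled = [[0.1, 0.9], [0.9, 0.1]]  # rows swapped, so pairs are mismatched

loss_aligned = alignment_loss(ortho_emb, phone_aligned)
loss_shuffled = alignment_loss(ortho_emb, phone_shuffled)
print(loss_aligned, loss_shuffled)
```

In this toy setup the aligned pairing yields a much smaller loss than the shuffled pairing, which is the behaviour a one-to-one alignment objective is meant to reward. In a real model the vectors would come from encoders over the orthographic and phonemic inputs, and the loss would be minimized by gradient descent.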
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
