Cross-Lingual Word Alignment for ASEAN Languages with Contrastive Learning

Jingshen Zhang; Xinying Qiu; Teng Shen; Wenyu Wang; Kailin Zhang; Wenhe Feng

arXiv:2407.05054·cs.CL·July 9, 2024

Open Access

TL;DR

This paper introduces a contrastive learning approach within a BiLSTM encoder-decoder model to improve cross-lingual word alignment accuracy for ASEAN languages, especially in low-resource settings.

Contribution

It proposes a novel contrastive learning method with multi-view negative sampling to explicitly model differences in word embeddings for better alignment.

Findings

01 Contrastive learning improves alignment accuracy across datasets.

02 The method outperforms previous models in low-resource scenarios.

03 The approach is validated on five bilingual datasets for ASEAN languages.

Abstract

Cross-lingual word alignment plays a crucial role in various natural language processing tasks, particularly for low-resource languages. A recent study proposed a BiLSTM-based encoder-decoder model that outperforms pre-trained language models in low-resource settings. However, that model considers only the similarity of word embedding spaces and does not explicitly model the differences between word embeddings. To address this limitation, we propose incorporating contrastive learning into the BiLSTM-based encoder-decoder framework. Our approach introduces a multi-view negative sampling strategy to learn the differences between word pairs in the shared cross-lingual embedding space. We evaluate our model on five bilingual aligned datasets spanning four ASEAN languages: Lao, Vietnamese, Thai, and Indonesian. Experimental results demonstrate that integrating contrastive learning…
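As a rough illustration of the idea in the abstract, the sketch below computes an InfoNCE-style contrastive loss over word vectors in a shared embedding space. It is not the authors' formulation: the function names (`cosine`, `info_nce`, `sample_negatives`), the temperature value, the toy vectors, and the reading of "multi-view" as drawing negatives from both the source-language and target-language sides are all assumptions made for this example.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss: small when the anchor is closer to the
    positive than to every negative. Temperature is an assumed value."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]


def sample_negatives(src_vecs, tgt_vecs, pair_index, k, rng):
    """Hypothetical two-view sampling: k negatives from the source side
    and k from the target side, never the aligned pair itself."""
    candidates = [i for i in range(len(src_vecs)) if i != pair_index]
    src_negatives = [src_vecs[i] for i in rng.sample(candidates, k)]
    tgt_negatives = [tgt_vecs[i] for i in rng.sample(candidates, k)]
    return src_negatives + tgt_negatives


# Toy embeddings: row i of src_vecs is aligned with row i of tgt_vecs.
rng = random.Random(0)
src_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tgt_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]

negatives = sample_negatives(src_vecs, tgt_vecs, pair_index=0, k=2, rng=rng)
aligned_loss = info_nce(src_vecs[0], tgt_vecs[0], negatives)
# Deliberately wrong "positive" to show that the loss penalizes it.
misaligned_loss = info_nce(src_vecs[0], tgt_vecs[1], negatives)
print(aligned_loss < misaligned_loss)
```

In a training setup, a term like this would typically be added to the alignment model's main objective so that non-aligned word pairs are pushed apart while aligned pairs are pulled together; how the paper weights or combines the terms is not stated in the text above.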

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems · Text Readability and Simplification

Methods: Sparse Evolutionary Training · Contrastive Learning