Multilingual Sentence Transformer as A Multilingual Word Aligner
Weikang Wang, Guanhua Chen, Hanqing Wang, Yue Han, Yun Chen

TL;DR
This paper explores the use of LaBSE, a multilingual sentence transformer, as a word aligner. It shows that LaBSE is effective out of the box and that fine-tuning on parallel corpora improves it further, beyond existing models.
Contribution
The study shows that LaBSE, originally designed for sentence embeddings, can be effectively adapted for word alignment, outperforming previous models and supporting zero-shot language pairs.
Findings
Vanilla LaBSE, without fine-tuning, outperforms other mPLMs on word alignment.
Fine-tuning LaBSE improves alignment accuracy across seven language pairs.
The fine-tuned model achieves state-of-the-art results, including on zero-shot language pairs.
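The findings above speak of "alignment accuracy" without naming a metric. Word-alignment work conventionally reports alignment error rate (AER), where lower is better. The sketch below assumes that convention and computes AER from sets of (source index, target index) links. It uses sure (S) and possible (P) gold links, with AER = 1 - (|A∩S| + |A∩P|) / (|A| + |S|). The function name `alignment_error_rate` is hypothetical.

```python
def alignment_error_rate(predicted, sure, possible):
    """Alignment error rate: 1 - (|A & S| + |A & P|) / (|A| + |S|).

    predicted, sure, possible: iterables of (src_idx, tgt_idx) pairs.
    Sure links count as possible links by convention, so S is merged into P.
    Lower is better; 0.0 means the prediction matches the sure links exactly.
    """
    a = set(predicted)
    s = set(sure)
    p = set(possible) | s  # every sure link is also a possible link
    if not a and not s:
        return 0.0  # nothing to predict and nothing predicted
    return 1.0 - (len(a & s) + len(a & p)) / (len(a) + len(s))


# Usage: one of three predicted links is wrong, so AER is about 0.2.
predicted = {(0, 0), (1, 2), (2, 2)}
sure = {(0, 0), (1, 2)}
possible = {(2, 1)}
print(alignment_error_rate(predicted, sure, possible))
```

Because possible links only add credit in the numerator, an aligner is not penalized for omitting them, but it is penalized for missing sure links.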
Abstract
Multilingual pretrained language models (mPLMs) have shown their effectiveness in multilingual word alignment induction. However, these methods usually start from mBERT or XLM-R. In this paper, we investigate whether the multilingual sentence Transformer LaBSE is a strong multilingual word aligner. This idea is non-trivial: LaBSE is trained to learn language-agnostic sentence-level embeddings, whereas the alignment extraction task requires the more fine-grained word-level embeddings to be language-agnostic. We demonstrate that vanilla LaBSE outperforms other mPLMs currently used in the alignment task, and then propose to fine-tune LaBSE on parallel corpora for further improvement. Experimental results on seven language pairs show that our best aligner outperforms previous state-of-the-art models of all varieties. In addition, our aligner supports different language pairs in a single model,…
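The abstract hinges on turning word-level embeddings into alignment links. The sketch below shows one common way to do that: mutual-argmax extraction over a cosine-similarity matrix, in the style of SimAlign's ArgMax method. This is an illustrative assumption and not necessarily the extraction procedure used in the paper. The input vectors are toy stand-ins for contextual word embeddings (for example, hidden states from LaBSE), and subword-to-word pooling is omitted. The names `cosine` and `mutual_argmax_alignments` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mutual_argmax_alignments(src_vecs, tgt_vecs):
    """Link source word i to target word j when each is the other's most similar word.

    Assumption: one embedding per word; this is a SimAlign-style ArgMax
    heuristic, not the paper's confirmed method.
    """
    if not src_vecs or not tgt_vecs:
        return set()
    # sim[i][j] = similarity between source word i and target word j
    sim = [[cosine(s, t) for t in tgt_vecs] for s in src_vecs]
    links = set()
    for i, row in enumerate(sim):
        j = max(range(len(row)), key=row.__getitem__)  # best target for source i
        column = [sim[k][j] for k in range(len(sim))]
        if max(range(len(column)), key=column.__getitem__) == i:  # i is also best source for j
            links.add((i, j))
    return links


# Usage with toy 3-dimensional "embeddings": each source word has one clear partner.
src = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tgt = [[0.0, 0.9, 0.1], [0.95, 0.05, 0.0], [0.1, 0.0, 0.9]]
print(sorted(mutual_argmax_alignments(src, tgt)))  # [(0, 1), (1, 0), (2, 2)]
```

Requiring agreement in both directions trades recall for precision. This heuristic only works well if word embeddings of translations lie close together across languages, which is why the abstract stresses that word-level, not just sentence-level, embeddings must be language-agnostic.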
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Attention Is All You Need · Linear Layer · Softmax · Absolute Position Encodings · XLM-R · Byte Pair Encoding · Adam · Layer Normalization · Label Smoothing · Multi-Head Attention
