Utilizing Language-Image Pretraining for Efficient and Robust Bilingual Word Alignment
Tuan Dinh, Jy-yong Sohn, Shashank Rajput, Timothy Ossowski, Yifei Ming, Junjie Hu, Dimitris Papailiopoulos, Kangwook Lee

TL;DR
This paper introduces WALIP, a novel method that leverages pretrained language-image models and visual observations to improve the efficiency and robustness of unsupervised bilingual word alignment across multiple languages.
Contribution
We propose WALIP, a new unsupervised word alignment method using language-image pretraining and visual observations, enhancing accuracy and robustness over existing approaches.
Findings
WALIP outperforms state-of-the-art bilingual word alignment methods.
WALIP demonstrates robustness across different language pairs and embeddings.
The method effectively utilizes visual information for improved alignment.
Abstract
Word translation without parallel corpora has become feasible, rivaling the performance of supervised methods. Recent findings have shown that the accuracy and robustness of unsupervised word translation (UWT) can be improved by making use of visual observations, which are universal representations across languages. In this work, we investigate the potential of using not only visual observations but also pretrained language-image models for enabling a more efficient and robust UWT. Specifically, we develop a novel UWT method dubbed Word Alignment using Language-Image Pretraining (WALIP), which leverages visual observations via the shared embedding space of images and texts provided by CLIP models (Radford et al., 2021). WALIP has a two-step procedure. First, we retrieve word pairs with high confidences of similarity, computed using our proposed image-based fingerprints, which define the…
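The first step described in the abstract (retrieving confident word pairs from image-based fingerprints) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration, not the authors' implementation. Random vectors stand in for CLIP text and image embeddings. The fingerprint is assumed to be a softmax over text-image cosine similarities, since the abstract's own definition is cut off above. The helper names `fingerprints`, `confident_pairs`, and `procrustes`, the temperature, and the threshold are all hypothetical. The orthogonal Procrustes step is included only because Procrustes is listed under Methods; how WALIP applies it is not stated in this summary.

```python
import numpy as np

def fingerprints(text_emb, image_emb, temperature=0.1):
    """Assumed fingerprint: softmax over images of text-image cosine similarity."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    logits = (t @ v.T) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def confident_pairs(fp_src, fp_tgt, threshold=0.9):
    """Keep mutual nearest neighbours whose fingerprint cosine similarity >= threshold."""
    a = fp_src / np.linalg.norm(fp_src, axis=1, keepdims=True)
    b = fp_tgt / np.linalg.norm(fp_tgt, axis=1, keepdims=True)
    sim = a @ b.T
    best_tgt = sim.argmax(axis=1)  # best target word for each source word
    best_src = sim.argmax(axis=0)  # best source word for each target word
    return [(i, int(j)) for i, j in enumerate(best_tgt)
            if best_src[j] == i and sim[i, j] >= threshold]

def procrustes(X, Y):
    """Orthogonal W minimizing ||X W - Y||_F, via the SVD of X^T Y."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

# --- Toy data (all synthetic; stands in for real CLIP and word embeddings) ---
rng = np.random.default_rng(0)
n_words, n_images, d_clip, d_word = 20, 50, 16, 8

image_emb = rng.normal(size=(n_images, d_clip))
src_clip = rng.normal(size=(n_words, d_clip))
perm = rng.permutation(n_words)  # target word j translates source word perm[j]
tgt_clip = src_clip[perm] + 0.01 * rng.normal(size=(n_words, d_clip))

# Step 1 (assumed form): match words through their image-based fingerprints.
pairs = confident_pairs(fingerprints(src_clip, image_emb),
                        fingerprints(tgt_clip, image_emb))

# Step 2 (assumed form): fit an orthogonal map between static word embeddings
# using only the retrieved pairs as a seed dictionary.
R = np.linalg.qr(rng.normal(size=(d_word, d_word)))[0]  # hidden ground-truth rotation
X = rng.normal(size=(n_words, d_word))                  # source word embeddings
Y = X[perm] @ R                                         # target word embeddings
src_idx = [i for i, _ in pairs]
tgt_idx = [j for _, j in pairs]
W = procrustes(X[src_idx], Y[tgt_idx])
```

In this toy setup the recovered map `W` can be compared against the hidden rotation `R`, and each retrieved pair `(i, j)` can be checked against `perm`. With real data, the text and image embeddings would come from a pretrained CLIP model for each language rather than from a random generator.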
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Domain Adaptation and Few-Shot Learning
Methods: Contrastive Language-Image Pre-training · Procrustes
