Limitations of Cross-Lingual Learning from Image Search
Mareike Hartmann, Anders Soegaard

TL;DR
This paper investigates whether cross-lingual representation learning from images extends beyond nouns to adjectives and verbs, and explores combining visual and textual embeddings, with experiments across five language pairs.
Contribution
It demonstrates that current image-based cross-lingual methods do not effectively extend to adjectives and verbs, highlighting the need for improved approaches.
Findings
Image-based methods struggle with non-noun parts of speech.
Combining visual and textual embeddings offers limited improvements.
Cross-lingual learning from images does not scale well beyond nouns.
Abstract
Cross-lingual representation learning is an important step in making NLP scale to all the world's languages. Recent work on bilingual lexicon induction suggests that it is possible to learn cross-lingual representations of words based on similarities between images associated with these words. However, that work focused on the translation of selected nouns only. In our work, we investigate whether the meaning of other parts-of-speech, in particular adjectives and verbs, can be learned in the same way. We also experiment with combining the representations learned from visual data with embeddings learned from textual data. Our experiments across five language pairs indicate that previous work does not scale to the problem of learning cross-lingual representations beyond simple nouns.
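The image-based lexicon induction idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each word is represented by the average of the feature vectors of its associated images and that translations are chosen by cosine similarity. The function names, the averaging step, and the toy 3-dimensional features are assumptions made for illustration, not the authors' actual pipeline or data.

```python
import math

def mean_vector(vectors):
    """Average a list of equal-length image feature vectors into one word vector."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def induce_lexicon(source_images, target_images):
    """Map each source word to the target word with the most similar image vector.

    Both arguments are dicts from a word to a list of image feature vectors.
    (Hypothetical helper; the aggregation and similarity are assumptions.)
    """
    source_vecs = {w: mean_vector(imgs) for w, imgs in source_images.items()}
    target_vecs = {w: mean_vector(imgs) for w, imgs in target_images.items()}
    lexicon = {}
    for s_word, s_vec in source_vecs.items():
        lexicon[s_word] = max(
            target_vecs, key=lambda t: cosine(s_vec, target_vecs[t])
        )
    return lexicon

# Toy, made-up features standing in for real image embeddings.
english = {
    "dog": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "car": [[0.1, 0.9, 0.1], [0.0, 0.8, 0.2]],
}
german = {
    "Hund": [[0.85, 0.15, 0.05], [0.9, 0.1, 0.1]],
    "Auto": [[0.05, 0.9, 0.1], [0.1, 0.85, 0.15]],
}

lexicon = induce_lexicon(english, german)
print(lexicon)
```

Concrete nouns such as these tend to have visually consistent images, which is what makes nearest-neighbour matching plausible. Images retrieved for adjectives and verbs are likely to be far more varied, which is the setting the paper examines.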
