Like a bilingual baby: The advantage of visually grounding a bilingual language model

Khai-Nguyen Nguyen; Zixin Tang; Ankur Mali; Alex Kelly

arXiv:2210.05487·cs.CL·February 15, 2023

Open Access

TL;DR

This paper demonstrates that visually grounding a bilingual language model improves its grasp of semantic similarity and lowers its perplexity, with no significant advantage found for abstract words, highlighting the importance of multi-sensory data in multilingual NLP.

Contribution

It introduces a visually grounded bilingual LSTM language model trained on English and Spanish image-caption data from MS-COCO-ES, showing improvements over non-grounded models in semantic similarity and perplexity.

Findings

01

Visual grounding improves semantic similarity within and across languages.

02

Grounded models show lower perplexity than non-grounded models.

03

No significant benefit observed for abstract words.
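The similarity finding can be made concrete with a minimal sketch of one common way to score semantic similarity between word representations: cosine similarity. This is an illustration only and is not necessarily the paper's evaluation procedure. The three-dimensional vectors and the `embeddings` dictionary are hypothetical toy values, not outputs of the model.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings (assumption: invented for illustration).
embeddings = {
    "dog": [0.9, 0.1, 0.0],    # English, concrete
    "perro": [0.8, 0.2, 0.1],  # Spanish translation of "dog"
    "idea": [0.0, 0.2, 0.9],   # abstract word
}

sim_translation = cosine_similarity(embeddings["dog"], embeddings["perro"])
sim_unrelated = cosine_similarity(embeddings["dog"], embeddings["idea"])

print(f"dog vs perro: {sim_translation:.3f}")
print(f"dog vs idea:  {sim_unrelated:.3f}")
```

In a model with good cross-lingual semantics, translation pairs such as "dog" and "perro" would be expected to score higher than unrelated pairs, which is the pattern the toy vectors are built to show.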

Abstract

Unlike most neural language models, humans learn language in a rich, multi-sensory and, often, multi-lingual environment. Current language models typically fail to fully capture the complexities of multilingual language use. We train an LSTM language model on images and captions in English and Spanish from MS-COCO-ES. We find that the visual grounding improves the model's understanding of semantic similarity both within and across languages and improves perplexity. However, we find no significant advantage of visual grounding for abstract words. Our results provide additional evidence of the advantages of visually grounded language models and point to the need for more naturalistic language data from multilingual speakers and multilingual datasets with perceptual grounding.
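For readers unfamiliar with the perplexity metric mentioned in the abstract, the sketch below shows its standard definition: the exponential of the average negative log-probability a model assigns to each token, so lower is better. The probability lists are hypothetical values chosen for illustration and are not results from the paper.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence, given the model's probability for each token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)


# Hypothetical per-token probabilities (assumption: invented numbers).
grounded_probs = [0.5, 0.25, 0.5]
baseline_probs = [0.25, 0.125, 0.25]

print(f"grounded: {perplexity(grounded_probs):.3f}")
print(f"baseline: {perplexity(baseline_probs):.3f}")
```

A model that assigns higher probability to the observed tokens ends up with lower perplexity, which is the sense in which the abstract says grounding "improves" it.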

Taxonomy

Topics: Multimodal Machine Learning Applications

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory