Does Vision Accelerate Hierarchical Generalization in Neural Language Learners?

Tatsuki Kuribayashi; Timothy Baldwin

arXiv:2302.00667·cs.CL·December 18, 2024·1 cites

TL;DR

This paper investigates whether visual information improves the syntactic generalization of neural language models. It finds that vision helps when the alignment between the linguistic and visual components is clear in the input and does not help otherwise, which points to a need for additional biases or signals such as mutual gaze.

Contribution

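The paper demonstrates that visual data can improve syntactic generalization in language models when the alignment between language and vision is explicit, and argues that without such alignment additional cues or biases are needed to connect the two modalities.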

Findings

01. Visual data improves syntactic generalization with clear alignments

02. Lack of alignment reduces the benefit of visual input

03. Additional biases like mutual gaze are necessary for better multimodal learning

Abstract

Neural language models (LMs) are arguably less data-efficient than humans from a language acquisition perspective. One fundamental question is why this human-LM gap arises. This study explores the advantage of grounded language acquisition, specifically the impact of visual information -- which humans can usually rely on but LMs largely do not have access to during language acquisition -- on syntactic generalization in LMs. Our experiments, following the poverty of stimulus paradigm under two scenarios (using artificial vs. naturalistic images), demonstrate that if the alignments between the linguistic and visual components are clear in the input, access to vision data does help with the syntactic generalization of LMs, but if not, visual input does not help. This highlights the need for additional biases or signals, such as mutual gaze, to enhance cross-modal alignment and enable…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications