Does Vision Accelerate Hierarchical Generalization in Neural Language Learners?
Tatsuki Kuribayashi, Timothy Baldwin

TL;DR
This paper investigates whether visual information can improve the syntactic generalization of neural language models, finding that vision helps when the alignment between linguistic and visual components is clear in the input, and that additional biases or signals are needed when it is not.
Contribution
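It demonstrates, under the poverty of stimulus paradigm with both artificial and naturalistic images, that access to visual data improves syntactic generalization in LMs when linguistic-visual alignments are clear in the input, and it points to additional biases or signals, such as mutual gaze, as a way to enhance cross-modal alignment.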
Findings
Visual data improves syntactic generalization with clear alignments
Without clear alignment, visual input does not help
Additional biases like mutual gaze are necessary for better multimodal learning
Abstract
Neural language models (LMs) are arguably less data-efficient than humans from a language acquisition perspective. One fundamental question is why this human-LM gap arises. This study explores the advantage of grounded language acquisition, specifically the impact of visual information -- which humans can usually rely on but LMs largely do not have access to during language acquisition -- on syntactic generalization in LMs. Our experiments, following the poverty of stimulus paradigm under two scenarios (using artificial vs. naturalistic images), demonstrate that if the alignments between the linguistic and visual components are clear in the input, access to vision data does help with the syntactic generalization of LMs, but if not, visual input does not help. This highlights the need for additional biases or signals, such as mutual gaze, to enhance cross-modal alignment and enable…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
