Neural Variational Learning for Grounded Language Acquisition
Nisha Pillai, Cynthia Matuszek, Francis Ferraro

TL;DR
This paper introduces a neural generative approach for grounded language learning that links language to visual percepts without predefined categories, enabling effective multilingual and low-resource language grounding.
Contribution
It presents a unified generative model that learns shared semantic-visual embeddings for grounded language acquisition without relying on pre-defined visual categories.
Findings
Effective in low-resource settings
Generalizes across multilingual datasets
Shows promising language-grounding results when evaluated with both neural and non-neural inputs
Abstract
We propose a learning system in which language is grounded in visual percepts without specific pre-defined categories of terms. We present a unified generative method to acquire a shared semantic/visual embedding that enables the learning of language about a wide range of real-world objects. We evaluate the efficacy of this learning by predicting the semantics of objects and comparing the performance with neural and non-neural inputs. We show that this generative approach exhibits promising results in language grounding without pre-specifying visual categories in low-resource settings. Our experiments demonstrate that this approach generalizes to multilingual, highly varied datasets.
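To make the idea of a generative model with a shared semantic/visual embedding more concrete, the following is a minimal sketch of a generic variational (VAE-style) training objective over paired visual features and word indicators. It is not the authors' model: the abstract does not specify the architecture, likelihoods, or features, so the function names, the Gaussian latent, the squared-error visual term, the Bernoulli word term, and the toy decoders are all illustrative assumptions.

```python
import math
import random


def gaussian_kl(mu, log_var):
    """KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def squared_error(targets, preds):
    """Assumed reconstruction term for real-valued visual features."""
    return 0.5 * sum((t - p) ** 2 for t, p in zip(targets, preds))


def bernoulli_nll(targets, probs, eps=1e-9):
    """Assumed reconstruction term for binary word-presence indicators."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep the logs finite
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total


def negative_elbo(visual, words, mu, log_var, decode_visual, decode_words, rng):
    """Single-sample negative ELBO: both modalities are decoded from one shared z."""
    z = reparameterize(mu, log_var, rng)
    recon = squared_error(visual, decode_visual(z)) + bernoulli_nll(words, decode_words(z))
    return recon + gaussian_kl(mu, log_var)


# Hypothetical toy decoders standing in for learned networks.
def toy_decode_visual(z):
    return list(z)


def toy_decode_words(z):
    return [1.0 / (1.0 + math.exp(-zi)) for zi in z]


rng = random.Random(0)
loss = negative_elbo(
    visual=[0.2, -0.1],   # made-up visual feature vector
    words=[1, 0],         # made-up word indicators for a description
    mu=[0.1, 0.0],        # encoder outputs would go here; fixed values for illustration
    log_var=[-1.0, -1.0],
    decode_visual=toy_decode_visual,
    decode_words=toy_decode_words,
    rng=rng,
)
```

In a sketch like this, no visual category labels appear anywhere in the objective: the only supervision is the pairing of percepts with the words used to describe them, which is the property the abstract emphasizes.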
