Interactive Grounded Language Acquisition and Generalization in a 2D World
Haonan Yu, Haichao Zhang, Wei Xu

TL;DR
This paper presents a virtual agent that learns language grounded in visual perception in a 2D maze. Through interactive learning and a shared concept detection function, the agent generalizes to new words and new word combinations.
Contribution
The work introduces an interactive learning framework that disentangles language grounding from other computational routines, enabling reliable zero-shot interpretation of sentences that contain new words or new word combinations in a 2D environment.
Findings
Outperforms five comparison methods in zero-shot sentence interpretation
Trained and evaluated on a population of over 1.6 million sentences with a diverse vocabulary
Demonstrates human-interpretable intermediate model outputs
Abstract
We build a virtual agent for learning language in a 2D maze-like world. The agent sees images of the surrounding environment, listens to a virtual teacher, and takes actions to receive rewards. It interactively learns the teacher's language from scratch based on two language use cases: sentence-directed navigation and question answering. It learns simultaneously the visual representations of the world, the language, and the action control. By disentangling language grounding from other computational routines and sharing a concept detection function between language grounding and prediction, the agent reliably interpolates and extrapolates to interpret sentences that contain new word combinations or new words missing from training sentences. The new words are transferred from the answers of language prediction. Such a language ability is trained and evaluated on a population of over 1.6…
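The idea of sharing one concept detection function between grounding and prediction can be illustrated with a small sketch. This is not the authors' model: the cosine-similarity scoring, the function names (`concept_scores`, `ground`, `predict`), and the toy feature map are all assumptions chosen for illustration. The sketch only shows why a word whose embedding was learned through prediction (answering "what is here?") becomes usable for grounding (answering "where is this word?") when both directions read the same scores.

```python
import numpy as np

def concept_scores(features, word_embeddings):
    """Hypothetical shared detector: cosine similarity between every
    feature vector (N, D) and every word embedding (V, D) -> (N, V)."""
    f = features / np.linalg.norm(features, axis=-1, keepdims=True)
    w = word_embeddings / np.linalg.norm(word_embeddings, axis=-1, keepdims=True)
    return f @ w.T

def ground(feature_map, word_embeddings, word_id):
    """Grounding direction: score every location of an (H, W, D) feature
    map against one word, giving an (H, W) map of where the word applies."""
    h, w, d = feature_map.shape
    scores = concept_scores(feature_map.reshape(-1, d), word_embeddings)
    return scores[:, word_id].reshape(h, w)

def predict(feature_map, word_embeddings, location):
    """Prediction direction: pick the word that best matches the feature
    at one location, using the same scores as `ground`."""
    r, c = location
    scores = concept_scores(feature_map[r, c][None, :], word_embeddings)
    return int(np.argmax(scores[0]))

# Toy setup (assumed values): 3 words embedded in a 4-dimensional space.
word_embeddings = np.eye(3, 4)

# A 2x2 "image" of 4-dimensional features with a uniform background.
feature_map = np.full((2, 2, 4), 0.1)
feature_map[0, 1] = word_embeddings[2]  # an object matching word 2
feature_map[1, 0] = word_embeddings[0]  # an object matching word 0

# Word 2 is detected where its matching feature sits ...
heat = ground(feature_map, word_embeddings, 2)
best_location = np.unravel_index(np.argmax(heat), heat.shape)

# ... and asking "what is at that location?" returns the same word.
answer = predict(feature_map, word_embeddings, (0, 1))
```

Because `ground` and `predict` call the same `concept_scores`, any embedding added to `word_embeddings` through one direction is immediately available to the other, which is the transfer property the abstract describes in words.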
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques
