A Deep Compositional Framework for Human-like Language Acquisition in Virtual Environment

Haonan Yu, Haichao Zhang, and Wei Xu

arXiv:1703.09831 · cs.CL · May 23, 2017 · 6 cites

Open Access

TL;DR

This paper presents a deep, compositional framework enabling a virtual agent to learn language and navigation in a 2D maze, achieving zero-shot command execution through grounded, modular learning.

Contribution

It introduces an end-to-end deep learning approach that learns visual, linguistic, and action representations simultaneously, enabling zero-shot language understanding in a virtual environment.

Findings

01 Agent can execute zero-shot commands involving new word combinations
02 Agent understands new object concepts learned from other tasks
03 Framework visualizes intermediate representations showing comprehension

Abstract

We tackle a task where an agent learns to navigate in a 2D maze-like environment called XWORLD. In each session, the agent perceives a sequence of raw-pixel frames, a natural language command issued by a teacher, and a set of rewards. The agent learns the teacher's language from scratch in a grounded and compositional manner, such that after training it is able to correctly execute zero-shot commands: 1) the combination of words in the command never appeared before, and/or 2) the command contains new object concepts that are learned from another task but never learned from navigation. Our deep framework for the agent is trained end to end: it learns simultaneously the visual representations of the environment, the syntax and semantics of the language, and the action module that outputs actions. The zero-shot learning capability of our framework results from its compositionality and…
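The first zero-shot condition in the abstract (a command whose word combination never appeared in training) can be illustrated with a small data split. This is a minimal sketch, not the paper's setup: the command templates, the object names, and the `split_zero_shot` and `words` helpers are all hypothetical, and the second condition (object concepts learned from another task) is not modeled here.

```python
from itertools import product

# Hypothetical vocabulary, chosen for illustration only (not taken from the paper).
TEMPLATES = ["go to the {}", "move next to the {}"]
OBJECTS = ["apple", "banana", "cup"]


def split_zero_shot(templates, objects, held_out):
    """Split template/object pairs into training and zero-shot commands.

    held_out is a set of (template, object) pairs excluded from training.
    """
    train, test = [], []
    for template, obj in product(templates, objects):
        command = template.format(obj)
        if (template, obj) in held_out:
            test.append(command)
        else:
            train.append(command)
    return train, test


def words(commands):
    """Return the set of distinct words used across a list of commands."""
    return {word for command in commands for word in command.split()}


# Hold out one combination; each of its words still occurs in other training commands.
held_out = {("move next to the {}", "banana")}
train, test = split_zero_shot(TEMPLATES, OBJECTS, held_out)

# Every word of the zero-shot command was seen in training,
# but the command as a whole was not.
assert words(test) <= words(train)
assert not set(test) & set(train)
```

The two assertions at the end state the property that makes the held-out command zero-shot in this sense: the vocabulary is covered by training, while the specific combination is new.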

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Speech and dialogue systems