Grounded Language Learning in a Simulated 3D World
Karl Moritz Hermann, Felix Hill, Simon Green, Fumin Wang, Ryan Faulkner, Hubert Soyer, David Szepesvari, Wojciech Marian Czarnecki, Max Jaderberg, Denis Teplyashin, Marcus Wainwright, Chris Apps, Demis Hassabis, Phil Blunsom

TL;DR
This paper introduces a simulated 3D environment where an agent learns to interpret and ground language through reinforcement and unsupervised learning, enabling understanding of novel instructions and improving language acquisition efficiency.
Contribution
The study presents a grounded language learning agent that, trained with a combination of reinforcement and unsupervised learning, generalizes its language understanding to novel instructions in a simulated 3D environment.
Findings
Agent successfully interprets language in new situations
Learning speed increases with accumulated semantic knowledge
Agent relates language to perceptual and action representations
Abstract
We are increasingly surrounded by artificially intelligent technology that takes decisions and executes actions on our behalf. This creates a pressing need for general means to communicate with, instruct and guide artificial agents, with human language the most compelling means for such communication. To achieve this in a scalable fashion, agents must be able to relate language to the world and to actions; that is, their understanding of language must be grounded and embodied. However, learning grounded language is a notoriously challenging problem in artificial intelligence research. Here we present an agent that learns to interpret language in a simulated 3D environment where it is rewarded for the successful execution of written instructions. Trained via a combination of reinforcement and unsupervised learning, and beginning with minimal prior knowledge, the agent learns to relate…
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Multimodal Machine Learning Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
