Learning Interpretable Spatial Operations in a Rich 3D Blocks World

Yonatan Bisk; Kevin J. Shih; Yejin Choi; Daniel Marcu

arXiv:1712.03463 · cs.CL · December 27, 2017

TL;DR

This paper introduces a new dataset and a neural model for mapping natural language instructions to complex spatial actions in a 3D blocks world. The model achieves competitive results while automatically discovering an interpretable inventory of spatial operations.

Contribution

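The paper presents a new dataset that pairs complex 3D spatial operations with rich natural language descriptions requiring spatial and pragmatic interpretation (e.g., "mirroring", "twisting", "balancing"), together with a neural architecture that learns an interpretable set of spatial operations from this data.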

Findings

01. Achieved competitive results on 3D spatial instruction tasks
02. Discovered an interpretable set of spatial operations automatically
03. Enhanced understanding of complex spatial language in 3D environments

Abstract

In this paper, we study the problem of mapping natural language instructions to complex spatial actions in a 3D blocks world. We first introduce a new dataset that pairs complex 3D spatial operations with rich natural language descriptions that require complex spatial and pragmatic interpretations such as "mirroring", "twisting", and "balancing". This dataset, built on the simulation environment of Bisk, Yuret, and Marcu (2016), attains language that is significantly richer and more complex, while also doubling the size of the original dataset in the 2D environment with 100 new world configurations and 250,000 tokens. In addition, we propose a new neural architecture that achieves competitive results while automatically discovering an inventory of interpretable spatial operations (Figure 5).