Representation Learning for Grounded Spatial Reasoning
Michael Janner, Karthik Narasimhan, Regina Barzilay

TL;DR
This paper introduces a model, trained with reinforcement learning, that learns a representation of a simulated environment steered by instruction text, reducing goal localization error by 45%.
Contribution
It presents a representation learning approach that aligns instruction text with the environment, matching local neighborhoods to their verbalizations while also handling global references.
Findings
45% reduction in goal localization error
Outperforms state-of-the-art methods on multiple metrics
Effective handling of local and global spatial references
Abstract
The interpretation of spatial references is highly contextual, requiring joint inference over both language and the environment. We consider the task of spatial reasoning in a simulated environment, where an agent can act and receive rewards. The proposed model learns a representation of the world steered by instruction text. This design allows for precise alignment of local neighborhoods with corresponding verbalizations, while also handling global references in the instructions. We train our model with reinforcement learning using a variant of generalized value iteration. The model outperforms state-of-the-art approaches on several metrics, yielding a 45% reduction in goal localization error.
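The abstract does not spell out which variant of generalized value iteration is used, so the following is only background: a minimal sketch of plain tabular value iteration on a grid, assuming deterministic four-way moves and a single rewarded terminal cell. The function name `value_iteration`, its arguments, and the 5x5 reward layout are hypothetical and do not come from the paper.

```python
import numpy as np

def value_iteration(rewards, terminal, gamma=0.9, n_iters=50):
    """Plain tabular value iteration on a 2D grid (illustrative only).

    rewards[i, j]  -- assumed reward for entering cell (i, j)
    terminal[i, j] -- True where the episode ends (e.g. the goal cell)
    Moves that would leave the grid keep the agent in its current cell.
    """
    values = np.zeros(rewards.shape, dtype=float)
    for _ in range(n_iters):
        # Return obtained by landing in each cell; edge padding makes an
        # off-grid move equivalent to staying in place.
        landing = np.pad(rewards + gamma * values, 1, mode="edge")
        # Bellman backup: best landing value over the four moves.
        values = np.max(np.stack([
            landing[:-2, 1:-1],   # up
            landing[2:, 1:-1],    # down
            landing[1:-1, :-2],   # left
            landing[1:-1, 2:],    # right
        ]), axis=0)
        # No future return once the episode has ended.
        values[terminal] = 0.0
    return values

# Hypothetical 5x5 world with a single goal in the centre.
rewards = np.zeros((5, 5))
rewards[2, 2] = 1.0
terminal = rewards > 0
values = value_iteration(rewards, terminal)
```

In this toy setting the value of a cell decays geometrically with its distance from the goal, so an agent that greedily moves to the best neighbouring cell reaches the goal along a shortest path. The paper's model learns its representation from instruction text and the environment rather than reading rewards from a fixed table as done here.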
Taxonomy
Topics: Multimodal Machine Learning Applications · Speech and dialogue systems · Topic Modeling
