What is Right for Me is Not Yet Right for You: A Dataset for Grounding Relative Directions via Multi-Task Learning
Jae Hee Lee, Matthias Kerzel, Kyra Ahrens, Cornelius Weber, Stefan Wermter

TL;DR
This paper introduces GRiD-3D, a new dataset for understanding relative spatial directions in images, and demonstrates that neural networks can learn to ground these directions through multi-task learning.
Contribution
The paper presents a novel dataset for relative directions and analyzes how end-to-end models can learn to ground these spatial relations.
Findings
Models can learn to answer relative direction questions.
Subtasks are learned in an order reflecting an intuitive processing pipeline.
Grounding relative directions is feasible with appropriate training.
Abstract
Understanding spatial relations is essential for intelligent agents to act and communicate in the physical world. Relative directions are spatial relations that describe the relative positions of target objects with regard to the intrinsic orientation of reference objects. Grounding relative directions is more difficult than grounding absolute directions because a model must not only detect objects in the image and identify spatial relations based on this information, but also recognize the orientation of objects and integrate this information into the reasoning process. We investigate the challenging problem of grounding relative directions with end-to-end neural networks. To this end, we provide GRiD-3D, a novel dataset that features relative directions and complements existing visual question answering (VQA) datasets, such as CLEVR, that involve only absolute…
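To make the distinction in the abstract concrete, the following is a minimal geometric sketch of why a relative direction depends on the reference object's orientation. It is not taken from the paper or from GRiD-3D: the function name `relative_direction`, the 2D ground-plane coordinates, the heading convention (radians, counter-clockwise from the +x axis), and the four direction labels are all illustrative assumptions.

```python
import math

def relative_direction(ref_pos, ref_heading, target_pos):
    """Classify where the target lies relative to the reference object's facing direction.

    Hypothetical helper for illustration only.
    ref_pos, target_pos: (x, y) positions on an assumed 2D ground plane.
    ref_heading: facing direction of the reference object in radians,
                 measured counter-clockwise from the +x axis (assumed convention).
    """
    dx = target_pos[0] - ref_pos[0]
    dy = target_pos[1] - ref_pos[1]
    # Project the offset onto the reference object's own axes:
    # "forward" points along the heading, "left" is the heading rotated by +90 degrees.
    forward = dx * math.cos(ref_heading) + dy * math.sin(ref_heading)
    left = -dx * math.sin(ref_heading) + dy * math.cos(ref_heading)
    # Pick the dominant axis, then its sign.
    if abs(forward) >= abs(left):
        return "front" if forward > 0 else "behind"
    return "left" if left > 0 else "right"

# Same target position, three different orientations of the reference object.
target = (0.0, -1.0)
print(relative_direction((0.0, 0.0), 0.0, target))          # right
print(relative_direction((0.0, 0.0), math.pi, target))      # left
print(relative_direction((0.0, 0.0), math.pi / 2, target))  # behind
```

The target never moves, yet the answer changes with the reference object's heading. An absolute direction would be determined by the two positions alone, which is why grounding relative directions additionally requires estimating object orientation from the image.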
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Geographic Information Systems Studies
