SIRI: Spatial Relation Induced Network For Spatial Description Resolution

Peiyao Wang; Weixin Luo; Yanyu Xu; Haojie Li; Shugong Xu; Jianyu Yang; Shenghua Gao

arXiv:2010.14301·cs.CV·October 28, 2020

TL;DR

This paper introduces SIRI, a novel network that models spatial relationships explicitly for language-guided localization in panoramic views, significantly improving accuracy over previous methods.

Contribution

The paper proposes a new spatial relationship induced network that mimics human spatial reasoning, incorporating object-level correlation, spatial relationship distillation, and global position priors.

Findings

01 Achieves 24% better accuracy than the state of the art on the Touchdown dataset.

02 Generalizes effectively to an extended dataset with similar settings.

03 Improves spatial description resolution through explicit relationship modeling.

Abstract

Spatial Description Resolution, as a language-guided localization task, is proposed for target location in a panoramic street view, given corresponding language descriptions. Explicitly characterizing object-level relationships while distilling spatial relationships is currently absent but crucial to this task. Mimicking humans, who sequentially traverse spatial relationship words and objects with a first-person view to locate their target, we propose a novel spatial relationship induced (SIRI) network. Specifically, visual features are first correlated at an implicit object level in a projected latent space; then they are distilled by each spatial relationship word, so that each differently activated feature represents one spatial relationship. Further, we introduce global position priors to fix the absence of positional information, which may result in global positional…
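The three steps named in the abstract (object-level correlation in a projected latent space, distillation by each spatial relationship word, and global position priors) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the operations, so the softmax affinity, the sigmoid channel gating, the coordinate channels, and every function and parameter name below are assumptions chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def correlate_features(feats, w_proj):
    """Correlate visual features in a projected latent space (assumed form).

    feats:  (H, W, C) visual feature map
    w_proj: (C, D) projection into a hypothetical latent space
    Returns a (H, W, C) map where each location aggregates the others,
    weighted by latent-space similarity.
    """
    h, w, c = feats.shape
    flat = feats.reshape(h * w, c)
    latent = flat @ w_proj                               # (HW, D)
    scores = latent @ latent.T / np.sqrt(latent.shape[1])
    affinity = softmax(scores, axis=-1)                  # (HW, HW), rows sum to 1
    return (affinity @ flat).reshape(h, w, c)


def distill_by_words(feats, word_embs):
    """Gate the feature map once per spatial relationship word (assumed form).

    feats:     (H, W, C) feature map
    word_embs: (K, C) one embedding per spatial relationship word
    Returns (K, H, W, C): one differently activated map per word.
    """
    gates = 1.0 / (1.0 + np.exp(-word_embs))             # sigmoid, values in (0, 1)
    return feats[None] * gates[:, None, None, :]


def add_position_priors(feats):
    """Append normalized x and y coordinate channels (assumed form of the prior)."""
    h, w, _ = feats.shape
    ys, xs = np.meshgrid(
        np.linspace(-1.0, 1.0, h), np.linspace(-1.0, 1.0, w), indexing="ij"
    )
    return np.concatenate([feats, xs[..., None], ys[..., None]], axis=-1)
```

With a hypothetical 4×8 feature map of 16 channels and 3 relationship words, `correlate_features` keeps the shape `(4, 8, 16)`, `distill_by_words` returns `(3, 4, 8, 16)`, and `add_position_priors` applied to one of those maps returns `(4, 8, 18)`.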

Videos

SIRI: Spatial Relation Induced Network For Spatial Description Resolution · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition