Building Category Graphs Representation with Spatial and Temporal Attention for Visual Navigation
Xiaobo Hu, Youfang Lin, HeHe Fan, Shuo Wang, Zhihao Wu, Kai Lv

TL;DR
This paper introduces an approach to visual navigation that combines a Category Relation Graph (CRG) with a Temporal-Spatial-Region (TSR) attention architecture, so that agents can use learned object-category relations to navigate unseen environments.
Contribution
The paper proposes a Category Relation Graph that captures layout relations between object categories and a Temporal-Spatial-Region attention model that captures long-term spatial-temporal dependencies of objects, both aimed at improving visual navigation performance.
Findings
Significantly outperforms existing methods on AI2-THOR
Improves navigation accuracy and efficiency
Effectively models long-term spatial-temporal dependencies
Abstract
Given an object of interest, visual navigation aims to reach the object's location based on a sequence of partial observations. To this end, an agent needs to 1) learn certain knowledge about the relations of object categories in the world during training and 2) look for the target object based on the pre-learned object category relations and its moving trajectory in the current unseen environment. In this paper, we propose a Category Relation Graph (CRG) to learn the knowledge of object category layout relations and a Temporal-Spatial-Region (TSR) attention architecture to perceive the long-term spatial-temporal dependencies of objects that aid navigation. We learn prior knowledge of object layout, establishing a category relation graph to deduce the positions of specific objects. Subsequently, we introduce TSR to capture the relationships of objects in temporal,…
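To make the idea of a graph over object categories concrete, here is a minimal sketch. It is not the authors' CRG, which the abstract describes as learned. It is a simple count-based co-occurrence graph, assuming each training scene is available as a list of category names. The function names `build_category_graph` and `most_related`, and the scene format, are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def build_category_graph(scenes):
    """Build a co-occurrence graph over object categories.

    ASSUMPTION: `scenes` is a list of lists of category names, one list
    per scene. This is an illustrative stand-in, not the paper's CRG.

    Returns (categories, adjacency), where adjacency[i][j] is the
    fraction of scenes containing category i that also contain j.
    """
    categories = sorted({c for scene in scenes for c in scene})
    index = {c: i for i, c in enumerate(categories)}
    n = len(categories)
    counts = [[0] * n for _ in range(n)]
    appearances = Counter()

    for scene in scenes:
        present = sorted(set(scene))  # ignore duplicates within a scene
        appearances.update(present)
        for a, b in combinations(present, 2):
            counts[index[a]][index[b]] += 1
            counts[index[b]][index[a]] += 1

    # Normalize each row by how often that category appears at all.
    adjacency = [
        [counts[i][j] / appearances[categories[i]] for j in range(n)]
        for i in range(n)
    ]
    return categories, adjacency


def most_related(target, categories, adjacency, k=2):
    """Return the k categories most often seen alongside `target`."""
    i = categories.index(target)
    ranked = sorted(
        (j for j in range(len(categories)) if j != i),
        key=lambda j: (-adjacency[i][j], categories[j]),  # ties by name
    )
    return [categories[j] for j in ranked[:k]]
```

With such a graph, an agent searching for a small object could first look for categories that tend to appear with it, which is the intuition the abstract gives for deducing object positions from category relations.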
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Geographic Information Systems Studies
Methods: Convolution · Sigmoid Activation · Max Pooling · Average Pooling
