What Is Near?: Room Locality Learning for Enhanced Robot Vision-Language-Navigation in Indoor Living Environments

Muraleekrishna Gopinathan; Jumana Abu-Khalaf; David Suter; Sidike Paheding; Nathir A. Rawashdeh

arXiv:2309.05036 · cs.RO · September 12, 2023

Open Access

TL;DR

This paper introduces WIN, a model that enhances robot navigation in indoor environments by learning local room layouts from prior knowledge, improving generalization and decision-making in unseen spaces.

Contribution

WIN is a novel commonsense learning model that predicts local neighborhood maps using visual cues and layout knowledge, aiding efficient navigation in unseen indoor environments.

Findings

01. WIN outperforms classical VLN agents in unseen environments.

02. WIN achieves a 68% success rate and 63% success weighted by path length (SPL).

03. Locality learning improves generalization and navigation efficiency.
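The two figures quoted in the findings are standard VLN evaluation metrics. As a minimal sketch, assuming the commonly used definition of SPL (per-episode success multiplied by the ratio of shortest-path length to the longer of the agent's path and the shortest path, averaged over all episodes), they could be computed as below. The `Episode` class, its field names, and the sample values are hypothetical and for illustration only; they are not taken from the paper or its evaluation code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    """One evaluation episode (hypothetical record layout, for illustration)."""
    success: bool                 # did the agent stop at the target?
    shortest_path_length: float   # shortest start-to-goal distance
    agent_path_length: float      # distance the agent actually travelled


def success_rate(episodes: Sequence[Episode]) -> float:
    """Fraction of episodes that ended in success."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(1 for e in episodes if e.success) / len(episodes)


def spl(episodes: Sequence[Episode]) -> float:
    """Success weighted by path length, averaged over all episodes."""
    if not episodes:
        raise ValueError("need at least one episode")
    total = 0.0
    for e in episodes:
        if not e.success:
            continue  # failed episodes contribute zero
        denom = max(e.agent_path_length, e.shortest_path_length)
        # Guard against a degenerate zero-length episode.
        total += e.shortest_path_length / denom if denom > 0 else 1.0
    return total / len(episodes)


# Made-up sample data, not results from the paper.
sample = [
    Episode(True, 10.0, 10.0),   # optimal path
    Episode(True, 10.0, 20.0),   # success, but path twice as long
    Episode(False, 10.0, 5.0),   # failure
    Episode(True, 10.0, 10.0),   # optimal path
]
sr_value = success_rate(sample)
spl_value = spl(sample)
```

Because every successful episode is scaled by a ratio of at most 1, SPL can never exceed the success rate, which is consistent with the reported 63% SPL sitting below the 68% success rate.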

Abstract

Humans use their knowledge of common house layouts obtained from previous experiences to predict nearby rooms while navigating in new environments. This greatly helps them navigate previously unseen environments and locate their target room. To provide layout prior knowledge to navigational agents based on common human living spaces, we propose WIN (What Is Near), a commonsense learning model for Vision Language Navigation (VLN) tasks. VLN requires an agent to traverse indoor environments based on descriptive navigational instructions. Unlike existing layout learning works, WIN predicts the local neighborhood map based on prior knowledge of living spaces and current observation, operating on an imagined global map of the entire environment. The model infers neighborhood regions based on visual cues of current observations, navigational history, and layout…

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques