MultiON: Benchmarking Semantic Map Memory using Multi-Object Navigation
Saim Wani, Shivansh Patel, Unnat Jain, Angel X. Chang, Manolis Savva

TL;DR
This paper introduces the multiON benchmark to evaluate semantic map memory in multi-object navigation tasks within photorealistic environments. It shows that navigation performance drops as task complexity increases and that even oracle map agents leave room for improvement.
Contribution
The paper presents the multiON benchmark for assessing map-based navigation, analyzes a variety of agent models across task complexities, and finds that a simple semantic map agent performs comparably to more complex neural feature map agents.
Findings
Navigation performance drops with increased task complexity
Simple semantic map agents perform comparably to complex neural feature map agents
Even oracle map agents show limited performance, indicating room for improvement
Abstract
Navigation tasks in photorealistic 3D environments are challenging because they require perception and effective planning under partial observability. Recent work shows that map-like memory is useful for long-horizon navigation tasks. However, a focused investigation of the impact of maps on navigation tasks of varying complexity has not yet been performed. We propose the multiON task, which requires navigation to an episode-specific sequence of objects in a realistic environment. MultiON generalizes the ObjectGoal navigation task and explicitly tests the ability of navigation agents to locate previously observed goal objects. We perform a set of multiON experiments to examine how a variety of agent models perform across a spectrum of navigation task complexities. Our experiments show that: i) navigation performance degrades dramatically with escalating task complexity; ii) a simple…
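To make the task description above concrete, here is a minimal sketch of how an episode with an ordered sequence of goal objects could be tracked. It is an illustration only, not the benchmark's implementation: the class name `MultiGoalEpisode`, the `declare_found` method, the 2D goal positions, the `success_radius` value, and the rule that a wrong declaration ends the episode are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class MultiGoalEpisode:
    """Hypothetical tracker for an episode with an ordered list of goals."""

    goals: list                  # ordered (x, y) goal positions, assumed 2D
    success_radius: float = 1.0  # assumed distance threshold, not from the paper
    next_goal: int = 0           # index of the goal the agent must reach next
    failed: bool = False

    @property
    def done(self):
        # The episode succeeds once every goal was reached in order.
        return self.next_goal == len(self.goals)

    @property
    def progress(self):
        # Fraction of goals reached so far.
        return self.next_goal / len(self.goals)

    def declare_found(self, agent_pos):
        """The agent claims it has reached the current goal."""
        if self.failed or self.done:
            return False
        if dist(agent_pos, self.goals[self.next_goal]) <= self.success_radius:
            self.next_goal += 1
            return True
        # Assumption: a wrong declaration ends the episode.
        self.failed = True
        return False


episode = MultiGoalEpisode(goals=[(0.0, 0.0), (5.0, 0.0)])
episode.declare_found((0.5, 0.0))  # close to the first goal
print(episode.progress)
```

Because the goals must be reached in a fixed order, an agent that passes the second goal while searching for the first benefits from remembering where it saw it, which is the kind of map memory the benchmark is meant to probe.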
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
