SAME: Learning Generic Language-Guided Visual Navigation with State-Adaptive Mixture of Experts

Gengze Zhou; Yicong Hong; Zun Wang; Chongyang Zhao; Mohit Bansal; Qi Wu

arXiv:2412.05552 · cs.CV · December 10, 2024

TL;DR

This paper introduces SAME, a versatile model that unifies various language-guided visual navigation tasks, enabling an agent to adaptively handle different instruction granularities and observations, achieving strong multi-task performance.

Contribution

The paper proposes a novel State-Adaptive Mixture of Experts model that effectively shares knowledge across diverse navigation tasks and adapts to task-specific requirements.
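As a rough illustration of what state-conditioned expert routing can look like, the sketch below picks a top-k mixture of experts from a single state vector and applies that same mixture to every token. This summary does not describe the paper's actual routing mechanism, so the example is a generic mixture-of-experts toy under stated assumptions, not the authors' implementation. All names (`route`, `moe_forward`, `router_weights`, `state`) and the choice of linear experts are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def route(state, router_weights, top_k):
    """Return (expert_index, gate) pairs for the top_k experts.

    Assumption: routing depends on one state vector, not on each token.
    """
    logits = matvec(router_weights, state)
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    gates = softmax([logits[i] for i in top])  # renormalise over selected experts
    return list(zip(top, gates))


def moe_forward(tokens, state, router_weights, experts, top_k=2):
    """Apply the state-selected expert mixture to every token."""
    selected = route(state, router_weights, top_k)
    outputs = []
    for tok in tokens:
        mixed = [0.0] * len(tok)
        for idx, gate in selected:
            expert_out = matvec(experts[idx], tok)  # each expert is a linear map here
            mixed = [m + gate * e for m, e in zip(mixed, expert_out)]
        outputs.append(mixed)
    return outputs, selected


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Toy setup: 3 experts, 4-dimensional features, 5 tokens, one state vector.
rng = random.Random(0)
dim, num_experts = 4, 3
router_weights = random_matrix(num_experts, dim, rng)
experts = [random_matrix(dim, dim, rng) for _ in range(num_experts)]
tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]
state = [rng.uniform(-1, 1) for _ in range(dim)]

outputs, selected = moe_forward(tokens, state, router_weights, experts, top_k=2)
```

Because the gate is computed once from `state`, a change in the state can switch which experts are active for the whole input, while the experts themselves stay shared across tasks.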

Findings

01. SAME outperforms or performs comparably to task-specific agents across seven navigation tasks.

02. The model generalizes well across different instruction granularities.

03. The unified framework simplifies multi-task learning in visual navigation.

Abstract

The academic field of learning instruction-guided visual navigation can be generally categorized into high-level category-specific search and low-level language-guided navigation, depending on the granularity of language instruction, in which the former emphasizes the exploration process, while the latter concentrates on following detailed textual commands. Despite the differing focuses of these tasks, the underlying requirements of interpreting instructions, comprehending the surroundings, and inferring action decisions remain consistent. This paper consolidates diverse navigation tasks into a unified and generic framework -- we investigate the core difficulties of sharing general knowledge and exploiting task-specific capabilities in learning navigation and propose a novel State-Adaptive Mixture of Experts (SAME) model that effectively enables an agent to infer decisions based on…

Code & Models

Repositories

gengzezhou/same (Official)

Models

🤗 ZGZzz/SAME (model, 15 downloads)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization