UNMuTe: Unifying Navigation and Multimodal Dialogue-like Text Generation

Niyati Rawal; Roberto Bigazzi; Lorenzo Baraldi; Rita Cucchiara

arXiv:2408.04423 · cs.RO · August 9, 2024

TL;DR

UNMuTe is a novel model that unifies navigation and multimodal dialogue generation, enabling autonomous agents to interact with humans or other agents for improved navigation in complex environments.

Contribution

This work introduces UNMuTe, which combines a GPT-2-based dialogue model with a navigation component so that agents can ask questions and receive guidance during navigation tasks.

Findings

01. Achieves state-of-the-art results on CVDN and NDH datasets.

02. Effectively generates questions and answers to improve navigation.

03. Demonstrates the benefit of integrated dialogue and navigation models.

Abstract

Smart autonomous agents are becoming increasingly important in various real-life applications, including robotics and autonomous vehicles. One crucial skill that these agents must possess is the ability to interact with their surrounding entities, such as other agents or humans. In this work, we aim at building an intelligent agent that can efficiently navigate in an environment while being able to interact with an oracle (or human) in natural language and ask for directions when it is unsure about its navigation performance. The interaction is started by the agent that produces a question, which is then answered by the oracle on the basis of the shortest trajectory to the goal. The process can be performed multiple times during navigation, thus enabling the agent to hold a dialogue with the oracle. To this end, we propose a novel computational model, named UNMuTe, that consists of two…
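
The ask-when-unsure loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the graph, the `toy_policy` confidence scores, the 0.5 threshold, and the templated question and answer strings are all assumptions made for illustration, standing in for the learned navigator and the GPT-2-based dialogue model. Only the overall flow follows the text: the agent starts the interaction, the oracle answers from the shortest trajectory to the goal, and the exchange may repeat during navigation.

```python
from collections import deque

# Hypothetical toy environment: node -> reachable neighbours.
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node list from start to goal, or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def oracle_answer(graph, position, goal):
    """Oracle stand-in: returns the next node on the shortest trajectory to the goal."""
    path = shortest_path(graph, position, goal)
    return path[1] if path and len(path) > 1 else position


def toy_policy(node):
    """Assumed navigator stand-in: returns (proposed next node, confidence)."""
    guesses = {"A": ("B", 0.9), "B": ("A", 0.2), "D": ("E", 0.8)}
    return guesses.get(node, (node, 0.0))


def navigate(graph, start, goal, policy, threshold=0.5, max_steps=20):
    """Follow the policy; when its confidence is low, ask the oracle instead."""
    position, trajectory, dialogue = start, [start], []
    for _ in range(max_steps):
        if position == goal:
            break
        next_node, confidence = policy(position)
        if confidence < threshold or next_node not in graph[position]:
            # The agent starts the interaction; strings are fixed templates here,
            # whereas the paper generates them with a dialogue model.
            hint = oracle_answer(graph, position, goal)
            dialogue.append((f"Where should I go from {position}?", f"Go to {hint}."))
            next_node = hint
        position = next_node
        trajectory.append(position)
    return trajectory, dialogue


trajectory, dialogue = navigate(GRAPH, "A", "E", toy_policy)
print(trajectory)
print(dialogue)
```

In this toy run the agent is confident at A and D but unsure at B, so it asks once there and follows the oracle's hint for that step. A lower threshold would trade fewer questions for more reliance on the navigator's own guesses.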

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems